Building Better Code Review at Heap

February 25, 20216 min read

Heap — like most companies — has a code review (CR) process. Until recently, that process often involved manually catching trivial issues in PRs (Pull Requests), which led to slow, inconsistent, and low-quality PR comments.

To address this, we adopted Prettier, Danger, some custom StyleLint rules, and a Buildkite step that automatically adds specific checklists to PRs depending on which files are changed in our monorepo. Here’s why we took these steps towards CR automation, how we did it, and what we learned along the way.

Note: If you’re as excited about the possibilities of CR automation as we are, we’d love to have you join our team! Check out our open roles here.

Why more automation?

Not everything that can be automated should be. The work we do to build automation has an opportunity cost; we have to weigh it against the value we can create by doing other things. The following xkcd comic illustrates this point nicely:

Given this, it’s understandable that many teams don’t invest in tools that streamline CR beyond simple linting and automated testing. Standard tools like ESLint do a decent job of catching many issues, and the trivial issues that they do miss only take a second or two to find. Since most of us aren’t reviewing more than five PRs in a day, it looks like building additional tooling to streamline CR isn’t a stellar investment (or at least according to the above chart).

There are, however, three big advantages to automating code review that don’t fit as neatly into this chart and make the investment worth it.

Consistency. Even if humans were just as fast as computers at catching issues, they wouldn’t be able to keep up for long. Humans get tired. They skip reviewing files that they “know” are issue-free. They forget standards that were worked out months ago.
Tighter feedback loop. Even if the humans reviewing code could consistently catch all issues in a PR in 300 milliseconds like a computer can, the humans writing the code are still better off with automation because it provides a tighter feedback loop. Humans can take a while to actually look at the PR (we do have other work to do, you know), and while we’re waiting, we’re losing precious context on the code we wrote.
More bandwidth to focus on what matters. When we lack the tools to catch trivial issues in our code, we risk spending our limited CR time bike-shedding instead of focusing on higher-leverage issues (e.g., Did we build the right thing? Is this component over architected?). We’re much better off when we can’t fool ourselves into thinking we’ve done a good job reviewing a PR simply because we’ve pointed out simple issues that a computer could spot.

Tools adopted, lessons learned

To start reaping the benefits of more automation in our code reviews at Heap, we adopted Prettier, Danger, custom StyleLint rules, and a Buildkite step that automatically adds specific checklists to PRs depending on which files are changed in our monorepo. Here’s what we learned.

Prettier

Prettier is an opinionated code formatter for JavaScript, similar to gofmt for Go and rufo for Ruby. It intentionally exposes fewer options than most devs would like in order to cut down on debates over formatting.

The biggest obstacle to getting Prettier adopted was the team’s doubt as to whether Prettier would do a “good enough” job formatting code, given that configuration options are limited. The best thing we did to gain momentum towards adoption was to point out that companies like Webpack, Babel, and Dropbox have successfully used Prettier for large projects. And once our team had more experience with the tool, they realized that it could do everything we needed it to do.

The second biggest obstacle to adoption was concern about how the formatting would affect git history. We solved this by formatting nearly all files at once. This didn’t lead to any bugs or incidents, and because we used git blame’s –ignore-revs-file option, we haven’t had issues working with our git history since we ran the formatter.

Danger

Danger is a tool that allows you to build automated checks against PR metadata instead of just the code contained in those PRs. For example, for SOC2 compliance, we require that all PRs contain a link to a Jira ticket, a Salesforce support issue, or a Slack message describing why a change is being made. Previously, engineers would point out the lack of these references during CR. Now, Danger does it for us. Another example: We have a check that fails the build if people introduce any new CoffeeScript. This helps facilitate our transition to typescript.

A key lesson we learned was how to organize our danger checks. Many projects seem to favor keeping all rules in one file. This makes the file unwieldy after a few checks are added. Once we worked around Danger’s odd import behavior, we found that we’re better off splitting each of our checks into their own file so that our Dangerfile looks something like this:

We followed the Danger docs recommendation of slowly introducing Danger checks, and we initially made Danger check failures optional to fix. Both of these were good moves, as we had some unexpected bugs with our initial Danger checks.

Custom StyleLint rules StyleLint formats and lints your CSS and CSS-like files (e.g., Sass, LESS, etc.). As we started introducing more tools, we noticed that CSS source code often doesn’t receive the same amount of scrutiny as other source files during CR. For example, we wouldn’t accept the use of magic constants in TypeScript, but we found that this same standard wasn’t consistently applied in our LESS files when we specified colors.

So we wrote a custom StyleLint rule that encourages use of LESS variables instead of hex values. This was easier than we expected. The meat of the rule is only a few lines of code:

Folder-specific PR checklists

Not every issue with our code can be caught via static analysis, but we can still introduce some automation into the manual checks we make on certain PRs. GitHub has PR templates with checklists for this, but in our monorepo, they’re not as useful. GitHub allows you to set a default template to apply to every PR in your repo, but this doesn’t scale.

Not every team has the same checklist items and a noisy PR template is one that’s more likely to be ignored. (Note: You can also pre-fill the PR with the template via a query parameter, but this relies on humans to remember to use a specific url.)

Using the monorepo-diff Buildkite plugin, we built a solution that’s more flexible. Teams can simply create a markdown checklist file and specify in YAML which directories that checklist is associated with:

Whenever a PR lands that touches a file in the tracker directory, the appropriate checklist is automatically added to the PR as a comment.

Results

Since we’ve adopted these tools, our PR process has allowed us to merge more consistent, higher-quality code without some of the “nitpicky” comments people were manually adding to PRs. The result: PR comments that are less trivial and more focused on relevant issues.

We’re still in the early stages of quantifying the impact of these tools since we adopted them late last year, but we can say for sure that Prettier has saved a lot of formatting headaches and that our Danger rules have already caught issues with 168 Pull Requests.

Given the success we’ve had with the tools we’ve adopted so far, we believe that CR automation is well worth our time, and we’ll continue to invest in it. If you have any helpful thoughts on CR automation or developer tooling in general, feel free to reach out to me via @philosohacker on Twitter, and if you’re interested in this kind of work, we’re hiring.

I’d like to give a shout-out to Greg Huels, Howie Benefiel, and Dan Loewenherz for their help with reviewing early drafts of this post, and also Heap’s engineering team for their willingness to experiment, reflect, and iterate on our code review to make it the best that it can be.