These 5 habits will make you a great code reviewer

High-quality code reviews are hard.

They are time-consuming and require significant mental effort.

Code reviewing is also not taught at school, and figuring it out on your own is hard work. In this post, I share five crucial habits all great code reviewers I know have in common.

Make the code review about the code

Code reviews are about the code, not the person who wrote the code. If you think the proposed change is incorrect or have suggestions to improve it, then, by all means, leave your comments (just remember to make them professional and high-quality). However, leave out comments that don’t relate to the code under review.

Understand the code and ask questions when in doubt

Understanding the code under review is the foundation of a solid code review.

It is also the most difficult part.

Whenever you have doubts about proposed changes, you should ask the author for clarification. You might not be aware of an assumption the author is making or need help understanding how their changes fit in. However, the difficulty in grasping the changes is also often a sign of a mistake.

Asking a question will prompt the author to explain their thought process. As a result, they will either answer the question and clear up your doubts or realize that something is indeed wrong and needs to be fixed.

Be clear about your expectations

Code reviews can generate a wide variety of comments. Some are nits that you would like to see fixed but are not real issues. Some, however, identify serious problems that must be corrected before merging. If you leave a comment, make sure that the author understands which category it falls into. Doing this will save you and the PR author time and frustration.

Sometimes, you may take on a PR that is outside of your area of expertise. You may realize it only after you already left some comments. You are now in a weird situation: the author expects you to finish the review and approve the PR, but you don’t feel confident you can. If this happens, instead of accepting the change you don’t understand, it is better to leave a comment recommending the author get a review from someone more familiar with the code they are changing.

Side note: reviewing code that is outside of your area of expertise is a good thing. Even if you don’t understand the change enough to approve it, you can still provide useful feedback, identify bugs, and learn something. Just make sure the author does not expect to get your approval.

Cross-check with the existing code

Code reviews show only code that changed. Most of the time, it is sufficient. But sometimes, to understand the change fully, you need to check how the new code works with the code not included in the review because it hasn’t changed.

This idea may be obvious to most developers, but I was surprised to meet some who had never considered checking the existing code when reviewing PRs. In my experience, the biggest surprises are caused by not what’s included in the PR but by what’s missing.

Occasionally, examining the existing code may not give you all the answers. The ultimate weapon for these situations is to use a debugger. You can check out the PR branch and step through the code. It should resolve all your doubts. I resort to a debugger very rarely. It’s almost always easier and faster to ask the author.

Resist the “stamp” pressure

The pressure to merge changes quickly for projects on tight timelines is high. And it only grows as the deadline nears.

During the end game, engineers enter a Pull Request frenzy. Eventually, due to the number of PRs, code reviews often become a bottleneck. So, the engineers try to unblock themselves by asking to “stamp the diff” (i.e., approve changes without looking).

On the one hand it is understandable – no one wants to miss the deadline. On the other hand, these are the times when code changes need even more scrutiny than usual. Due to the time pressure, most PRs are coded very hastily. The changes may not be validated thoroughly (or at all) and the stress only increases the likelihood of mistakes.

While a proper code review takes time, merging code with issues a review could have caught is more costly. At best, fixing the problem will require sending a new PR (which, by the way, will trigger a review). At worst, an embarrassing bug ships to customers.

I remember when one of my teams worked extremely hard to finish a project on time. We were very close, and then one of the team members dropped a 1000-line PR a few hours before the deadline. The manager was trying hard to find someone willing to approve the PR. It wasn’t easy because the PR had a bunch of red flags, like spotty test coverage or many TODO comments, but he eventually succeeded. As soon as our product shipped, we started getting reports from angry customers complaining that important scenarios stopped working. We found that the hastily merged PR was the culprit. The team scrambled to fix the issues, but the damage was done. This one PR cost us the reputation we’d been building for a long time.

On-call Manual: How to cope with on-call anxiety

On-call anxiety is real. I’ve been there, and I know engineers who experienced it. Many factors contribute to it, but from my experience, three stand out.

1. Unpredictability

Unpredictability is the number one reason for on-call anxiety. You might be responsible for a wide range of services. They may break anytime for various reasons like network issues, deployments, failing dependencies, shared infra outages, data center drains (a.k.a. storms), excavators damaging etc. On-calls, especially new ones, worry they won’t know what to do if they get an alert. How would they figure out what broke? How would they come up with a fix?

What to do about it?

With experience, the unpredictability aspect of the on-call gets easier. But even for the most seasoned on-call engineers, handling an outage can be difficult without the proper tools like:

Easy-to-navigate dashboards that allow to tell quickly if a service is working correctly and identify problematic areas in case of failures
Playbooks (a.k.a. runbooks) explaining troubleshooting and mitigation steps
Documentation describing the service and its dependencies, including the relevant on-call rotations to reach out if necessary

Having a team eager to jump in and help mitigate an outage quickly is priceless. Your team members understand some areas better than you. Knowing they have your back is reassuring.

2. Too many alerts and incidents

The second most common reason engineers fear their on-call is a never-ending litany of alerts, requests, and tasks. If you get a new alert when you barely finished acknowledging a previous one and are also expected to handle customer tickets and deal with requests from other teams, fretting your on-call is understandable. The exhaustion is usually exacerbated by the feeling of not doing a decent job. I was on a rotation like this once. After a while, I realized that everyone, not only me, was overwhelmed. Even though we toiled long hours, most alerts were ignored, customer tickets remained answered, and requests from other teams were only handled after they escalated them to the manager.

What to do about it?

There is no way a single person can fix a very heavy on-call by themselves. They won’t have the time during their shift, and by the time the shift ends, they will be so fed up that they won’t want to hear about anything on-call-related. There are, however, a few low-hanging fruits that can help improve the quality of the on-call quickly:

Delete alerts – find routinely ignored alerts and determine if they’re useful. If they aren’t – delete them.
Tune noisy but useful alerts – adjust thresholds and windows for flapping alerts, alerts that fire prematurely, and short-lived alerts.
Get a secondary on-call – a second person could help handle tasks the primary on-call does not have the capacity to deal with (e.g., customer tickets). This could be only temporary.

These ideas can alleviate on-call pain but are unlikely to fix a bad on-call for good. Improving a heavy on-call requires identifying and addressing problems at their source and demands effort from the entire team to maintain on-call quality. I wrote a post dedicated to this topic. Take a look.

3. Middle-of-the-night alerts

Many on-call rotations are 24/7. The on-call is responsible for dealing with incidents promptly, even if they happen in the middle of the night. Waking to an alert is not fun, and if it happens regularly, it is a valid reason to resent being on-call.

What to do about it?

While it may not be possible to avoid all middle-of-the-night alerts, there might be some actions you can take to reduce the disruption. A lot will depend on your specific situation, but here are some ideas:

Check your dashboards in the evening and address any issues that could raise an alert.
Increase alert thresholds outside working hours. If your traffic is cyclical – e.g., you have much lower traffic at night because most requests come from one timezone – you may be able to relax thresholds outside working hours. Even if an incident happens, its impact will be smaller. Also, alerts get much noisier if the traffic volume is low (e.g., if you get ten requests in an hour and one fails, you might get an alert due to a 10% error rate).
Disable alerts at night. Some outages won’t cause any impact unless they last for a long time. For instance, our team was responsible for a service that would work fine even if one of its dependencies was down for a day. This 24-hour grace period allowed us to turn off alerts at night.

On-Call manual: Onboarding a new person to the on-call rotation

One (selfish) reason to celebrate a new team member is that they will eventually join the on-call rotation. And when they do, the existing shifts will move farther apart. However, adding an unprepared engineer to the on-call rotation can be a disaster. This post describes what on-call onboarding looks like on our team.

The on-call onboarding process is the same for each new team member. It consists of the following steps:

Regular ramp-up
On-call overview
Shadow shift
Reverse shadow shift
First solo shift

Let’s look into each of these steps in more detail.

Regular ramp-up

The regular ramp-up aims to help new team members familiarize themselves with the problems the team is solving and teach them how to work effectively in the team’s codebase. We want new colleagues to work on the code they will be responsible for when they are on call later. This approach allows them to acquire basic context that will be useful for maintaining this code and troubleshooting issues.

On-call overview

Regular ramp-up is rarely sufficient for new people to grasp the entire infra the team is responsible for. And knowing this infra is just the tip of the iceberg. There is much more an effective on-call needs to be familiar with, for instance:

what are the dependencies, and what is the impact of their failures
how to find dashboards and use them for debugging
where to find the documentation (e.g., runbooks)
expectations, e.g., is the on-call responsible for alerts raised outside working hours
how to do deployments and rollbacks
tools used to troubleshoot and fix issues
standard operating procedures
and more

On our team, we organize knowledge-sharing sessions that give new team members an overview of all these areas. We record these sessions to make revisiting unclear topics easy.

Shadow on-call shift

During the shadow on-call shift, the on-call-in-training (a.k.a. secondary on-call) shadows an experienced on-call (a.k.a. primary on-call). Both on-calls are subscribed to all tasks and alerts, but resolving issues is the primary on-call’s responsibility. The primary on-call is expected to show the secondary on-call how to deal with outages. This is usually limited to problems occurring during working hours. Finally, the primary on-call can ask the secondary on-call to handle non-critical tasks, providing guidance as needed.

Reverse shadow on-call shift

After the shadow shift, things get real: the on-call in training becomes the primary on-call. They are now responsible for handling all alerts, tasks, deployments, etc. However, they are not alone—they have an experienced on-call having their back during the entire shift.

We schedule shadow and reverse shadow shifts back-to-back. This way, everything the on-call-in-training learned during the first shift is fresh when they become the primary on-call.

First solo shift

Once shadowing is complete, we add the new team member to the on-call rotation. We add them to the queue’s end, giving them additional time to learn more about our systems and the infrastructure.

In addition to training new on-calls, our team maintains a chat to discuss on-call problems and get help when resolving issues. Both new and experienced on-calls regularly use this chat when they are stuck because they know someone will be there to help them.

On-call Manual: Boost your career by improving your team’s on-call

I have yet to find a team maintaining critical systems that is happy with its on-call. Most engineers dread their on-call shifts and want to forget about on-call as soon as their shift ends. For some, hectic on-call shifts are the reason to leave the team or even the company.

But this is great news for you. All these factors make improving on-call a great career opportunity. Here are a few reasons:

Team-wide impact. Making the on-call better increases work satisfaction for everyone on the team.
Finding work is easy. No on-call is perfect. There’s always something to fix.
No competition. Most engineers consider work related to on-call uninteresting, so you can fully own the entire area. As a result, your scope might be bigger than any other development work you own.

Getting started

It is difficult to propose meaningful improvements to your team’s on-call before your first shift. You need to become familiar with your team’s on-call responsibilities and problems before trying to make it better.

Once you have a few shifts under your belt, you should know the most problematic areas. Come up with a few concrete actions to remedy the biggest issues. This list doesn’t have to be complete to get started. Some examples include tuning (or deleting) the noisiest alerts, refactoring fragile code, or automating time-consuming manual tasks.

Talk to your manager about the improvements you want to make. No manager who cares about their team would refuse the offer to improve the team’s on-call. If the timing is not right (e.g., your team is closing a big release), ask your manager when a better time would be. Mention that you may need their help to ensure the participation of all team members.

Set your expectations right. Despite the improvements, don’t expect your team members to suddenly start loving their on-call. It’s a win if they stop dreading it.

Execution

From my experience, the two most effective ways to improve the on-call is to have regular (e.g., twice a year) fixathons combined with ongoing maintenance.

During a fixathon, the entire team spends a few days fixing the biggest on-call issues. In most cases, these will be issues that started occurring since the previous fixathon but weren’t taken care of by on-calls during their shifts. You may need to work closely with your manager to ensure the entire team’s participation, especially at the beginning.

Ongoing maintenance involves fixing problems as they arise, usually done by the person on call. As some shifts are heavier than others, the on-call may not always be able to address all issues.

Your role

Before talking about what your role is, let’s talk about what your role isn’t.

Your role isn’t to single-handedly fix all on-call issues.

This approach doesn’t scale. If you try it, you will eventually burn out, struggling to do two full-time jobs simultaneously: your regular responsibilities and fixing on-call issues. The worst part is that your team members won’t feel responsible for maintaining the on-call quality. They might even care less because now somebody is fixing issues for them.

While you should still participate in fixing on-call issues, your main role is to:

organize fixathons – identify the most pressing issues and distribute issues for the team to work on, track progress, and measure the improvement
ensure on-calls are addressing issues they encountered during their shifts
build tools – e.g., dashboards to monitor the quality of the on-call or queries that allow to identify the biggest problems quickly

If you do this consistently, your team members will eventually find fixing on-call issues natural.

Skills you will learn

Driving on-call improvements will help you hone a few skills that are key for successful senior and even staff engineers:

leading without authority – as the owner of the on-call improvement area you’re responsible for coming up with the plan and leading its execution
scaling through others – because you involve the entire team, you can get much more done than if you did it yourself
influencing the engineering culture of the team – ingraining a sense of responsibility for the on-call quality in team members is an impactful change
holding people accountable – making sure everyone does their part is always a challenge
identifying problems worth solving – instead of being told what problems to solve, you are responsible for finding these problems and deciding if they are worth solving

Expanding your scope

Once you start seeing the results of your work, you can take it further to expand your scope.

You can become the engineer who manages the on-call rotation for your team. This work doesn’t take a lot of time but can save a lot of headaches for your manager. The typical responsibilities include:

managing the on-call schedule
organizing onboarding new team members to the on-call rotation
helping figure out shift swaps and substitutions

Another way to increase your scope is to share your experience with other teams. You can organize talks showing what you did, the results you achieved, and what worked and what didn’t. You can also generalize the tools you built so that other teams can use them.

“This code is s**t!” and Other Mistakes Code Reviewers Make

Code reviews are a standard practice in software development. Their purpose is to have another pair of eyes examine the code to catch issues before they affect users and to provide feedback that helps make the code cleaner and easier to understand.

While the goals of code reviews are noble, the experience for many developers is often less than stellar. There are many contributing factors, but one that stands out is how code reviewers approach and conduct code reviews.

Here are the top 5 mistakes I’ve seen code reviewers make (and I had made myself) that left fellow, often junior, developers discouraged and frustrated with the code review process.

Approving a PR without understanding the change

Many developers approve PRs very quickly, looking at the code only barely or not at all. While the speed matters, this attitude leads to problems:

bugs that could be identified during code reviews are missed, reach production, and impact users
the quality of the code deteriorates over time, making implementing new features more challenging
both the code reviewer and the author miss the opportunity to learn something new from the PR

Proper code review requires time, effort, and, often, additional context. I sometimes realize that a change I started reviewing requires more time than I can afford or that I don’t have enough context to understand it fully. If this happens, I will still review the change as best as I can, but I will let the author know that I can’t sign off on it.

“If you approve it, you’re responsible for it” was one piece of advice I received that completely changed my perspective on carelessly approving PRs.

Unprofessional feedback

Code reviews should be all about code. Sadly, they sometimes become personal attacks with harsh or condescending comments. This kind of “feedback” usually extends beyond code reviews and leads to a toxic team culture.

Even if the code sent for review has multiple issues, comments like: “this code is s**t!” are not helpful. Explaining the problems and suggesting solutions is a much more effective approach. Talking to the author is even more effective.

Too much focus on less important details

Flooding a PR with nitpicky comments is not good feedback. Not only is it borderline passive-aggressive behavior, but these comments can also drown out ones that raise important issues.

One great example is comments about code formatting. Asking the author to adhere to the Coding Style Guidelines adopted by the team is one thing, but commenting on each single incorrect indentation or misplaced parenthesis is not OK. Fortunately, this entire class of arguments can be easily avoided by integrating a code formatting tool. Not only will the tool end the petty arguments about code formatting, but it will also allow developers to focus on what’s important.

(I wrote about this in more detail here: The downsides of an inconsistent codebase and what you can do about it.)

Unclear or unactionable feedback

Comments like: “I am sure it can be done better” are not useful. They leave the author clueless about the reviewer’s expectations and the improvements they expect. In the best case, the author will ignore the feedback. In the worst case, they will try guessing what the reviewer meant and iterate on the code, often unnecessarily.

From my experience, illustrating comments with code suggestions is one of the clearest and most effective code review feedback.

Delaying code reviews

One of the most common complaints about code reviews is that they significantly slow software development. The most common reasons are:

reviewers are not picking up PRs for review
reviewers are not responding after the author addressed the feedback
changes are flooded with comments on minor issues, and resolving them requires many iterations

Assuming a PR is not intentionally blocked due to a serious concern, delaying reviewing it can be frustrating for the author. Often, reviewers are simply busy with their work and don’t have time to review someone else’s changes. But this is a double-edged sword—eventually, they will want someone to review their changes, and they shouldn’t expect quick reviews if they don’t review PRs promptly.

Sometimes, it is a matter of being better organized. Blocking half an hour daily on the calendar for code reviews should help team members move faster.

That being said, if your PRs are not getting reviewed, there may be something you can do about this. Check out my post here: 7 Tips To Accelerate Your Code Reviews.