A simple way to ship maintainable software

This was my first solo on-call shift on my new team. I was almost ready to go home when a Critical alert fired. I acknowledged it almost instantly and started troubleshooting. But this was not going well. Wherever I turned, I hit a roadblock. The alert runbook was empty. The dashboards didn’t work. And I couldn’t see any logs because logging was disabled.

Some team members were still around, and I turned to them for help. I learned that the impacted service shipped merely a week before, and barely anyone knew how it worked. The person who wrote and shipped it was on sick leave.

It took us a few hours to figure out what was happening and to mitigate the outage. This work made one thing apparent – this service was not ready for the prime time.

In the week following the incident, we filled the gaps we had found during the outage. Our main goal was to ensure that future on-calls wouldn’t have to scramble when encountering issues with this service.

But the bigger question left unanswered was: how can we avoid similar issues with any new service or feature we will ship in the future?

The idea we came up with was the Service Readiness Checklist.

What is the Service Readiness Checklist?

The Readiness Checklist is a checklist that contains requirements each service (or a bigger feature) needs to meet to be considered ready to ship. It serves two purposes:

  • to guarantee that none of the aspects related to operating the service have been forgotten
  • to make it clear who is responsible for ensuring that requirements have been reviewed and met

When we are close to shipping, we create a task that contains a copy of the readiness checklist and assign it to the engineer driving the project. They become responsible for ensuring all requirements on the checklist.

Having one engineer responsible for the checklist helps avoid situations where some requirements fall through the cracks because everyone thought someone else was taking care of them. The primary job of this engineer is to ensure all checkboxes are checked. They may do the work themselves if they choose to or assign items to people involved in the project and coordinate the work.

Occasionally, the checklist owner may decide that some requirements are inapplicable. For example, the checklist may call for setting up deployment, but there is nothing to do if the existing deployment infrastructure automatically covers it.

The checklist will usually contain more than ten requirements. They are all obvious, but it is easy to miss some just because of how many there are.

Example readiness checklist

There is no single readiness checklist that would work for every team because each team operates differently. They all follow different processes and have their own ways of running their code and detecting and troubleshooting outages. There is, however, a common subset of requirements that can be a starting point for a team-specific readiness checklist:

  • [ ] Has the service/feature been introduced to the on-call?
  • [ ] Has sufficient documentation been created for the service? Does it contain information about dependencies, including the on-calls who own them?
  • [ ] Does the service have working dashboards?
  • [ ] Have alerts been created and tested?
  • [ ] Does the service/feature have runbooks (a.k.a. playbooks)?
  • [ ] Has the service been load tested?
  • [ ] Is logging for the service/feature enabled at the appropriate level?
  • [ ] Is automated deployment configured?
  • [ ] Does the service/feature have sufficient test coverage?
  • [ ] Has a rollout plan been developed?

Success story

Our team was tasked to solve a relatively big problem on tight timelines. The solution required building a pipeline of a few services. Because we didn’t have enough people to implement this infrastructure within the allotted amount of time, we asked for help. Soon after, a few engineers temporarily joined our team. We were worried, however, that this partnership may not work out because of the differences in our engineering cultures. The Service Readiness Checklist was one of the things (others included coding guidelines, interface-based programming, etc.) that helped set clear expectations. With both teams on the same page, the collaboration was smooth, and we shipped the project on time.

These 5 habits will make you a great code reviewer

High-quality code reviews are hard.

They are time-consuming and require significant mental effort.

Code reviewing is also not taught at school, and figuring it out on your own is hard work. In this post, I share five crucial habits all great code reviewers I know have in common.

Make the code review about the code

Code reviews are about the code, not the person who wrote the code. If you think the proposed change is incorrect or have suggestions to improve it, then, by all means, leave your comments (just remember to make them professional and high-quality). However, leave out comments that don’t relate to the code under review.

Understand the code and ask questions when in doubt

Understanding the code under review is the foundation of a solid code review.

It is also the most difficult part.

Whenever you have doubts about proposed changes, you should ask the author for clarification. You might not be aware of an assumption the author is making or need help understanding how their changes fit in. However, the difficulty in grasping the changes is also often a sign of a mistake.

Asking a question will prompt the author to explain their thought process. As a result, they will either answer the question and clear up your doubts or realize that something is indeed wrong and needs to be fixed.

Be clear about your expectations

Code reviews can generate a wide variety of comments. Some are nits that you would like to see fixed but are not real issues. Some, however, identify serious problems that must be corrected before merging. If you leave a comment, make sure that the author understands which category it falls into. Doing this will save you and the PR author time and frustration.

Sometimes, you may take on a PR that is outside of your area of expertise. You may realize it only after you already left some comments. You are now in a weird situation: the author expects you to finish the review and approve the PR, but you don’t feel confident you can. If this happens, instead of accepting the change you don’t understand, it is better to leave a comment recommending the author get a review from someone more familiar with the code they are changing.

Side note: reviewing code that is outside of your area of expertise is a good thing. Even if you don’t understand the change enough to approve it, you can still provide useful feedback, identify bugs, and learn something. Just make sure the author does not expect to get your approval.

Cross-check with the existing code

Code reviews show only code that changed. Most of the time, it is sufficient. But sometimes, to understand the change fully, you need to check how the new code works with the code not included in the review because it hasn’t changed.

This idea may be obvious to most developers, but I was surprised to meet some who had never considered checking the existing code when reviewing PRs. In my experience, the biggest surprises are caused by not what’s included in the PR but by what’s missing.

Occasionally, examining the existing code may not give you all the answers. The ultimate weapon for these situations is to use a debugger. You can check out the PR branch and step through the code. It should resolve all your doubts. I resort to a debugger very rarely. It’s almost always easier and faster to ask the author.

Resist the “stamp” pressure

The pressure to merge changes quickly for projects on tight timelines is high. And it only grows as the deadline nears.

During the end game, engineers enter a Pull Request frenzy. Eventually, due to the number of PRs, code reviews often become a bottleneck. So, the engineers try to unblock themselves by asking to “stamp the diff” (i.e., approve changes without looking).

On the one hand it is understandable – no one wants to miss the deadline. On the other hand, these are the times when code changes need even more scrutiny than usual. Due to the time pressure, most PRs are coded very hastily. The changes may not be validated thoroughly (or at all) and the stress only increases the likelihood of mistakes.

While a proper code review takes time, merging code with issues a review could have caught is more costly. At best, fixing the problem will require sending a new PR (which, by the way, will trigger a review). At worst, an embarrassing bug ships to customers.

I remember when one of my teams worked extremely hard to finish a project on time. We were very close, and then one of the team members dropped a 1000-line PR a few hours before the deadline. The manager was trying hard to find someone willing to approve the PR. It wasn’t easy because the PR had a bunch of red flags, like spotty test coverage or many TODO comments, but he eventually succeeded. As soon as our product shipped, we started getting reports from angry customers complaining that important scenarios stopped working. We found that the hastily merged PR was the culprit. The team scrambled to fix the issues, but the damage was done. This one PR cost us the reputation we’d been building for a long time.

How to improve your coding skills (without spending a lot of time)

Every developer wants to get better at coding. But they sometimes don’t know how and feel stuck. The good news is that there are a few simple ways to improve coding skills that many developers neglect. Most can be done as part of a regular job, so they don’t require additional time. And doing them consistently can make you a better coder.

Code reviews

Code reviews are the easiest way to learn from others. Unfortunately, many developers treat code reviews as a chore. They want to spend as little time as possible on them. This way, they lose great learning opportunities.

What makes code reviews special is their interactivity. They allow asking the author about their choices and discuss alternatives they considered. Other participants often leave interesting comments or propose novel alternative solutions.

Reading other developers’ code

I learned a lot about programming by reading code written by other developers. I often want to know how a feature or library I use works. Usually, the fastest way to get this information is by inspecting its code.

For instance, TypeScript supported some features, e.g., async/await, before they were added to JavaScript. I was curious how it was possible. So, I wrote short TypeScript snippets and checked how they were transpiled.

Your company’s codebase is another great resource to learn from. When I get stuck, I often search my company repository to check how other developers solved a similar problem. Our repo is big, so if I can’t find anything useful, I am almost sure what I am trying to do is questionable.

I use the same strategy for my side projects but search GitHub.

If you find reading other developers’ code challenging (I sure did), take a look at this post.

Debugging

Stepping through the code, analyzing the stack trace, and inspecting variables will allow you to understand important nuances and easy-to-miss details much deeper. I often fire a debugger if I can’t answer all my questions after reading the code.

“Borrowing” code

Let’s be honest. Not all code needs to be written from scratch. Sometimes, we just need a boilerplate. But sometimes, we don’t know how to solve a problem. In these cases, copying and adapting code is often faster (and easier). It could be the code you wrote in the past or someone else’s code, e.g., copied from StackOverflow (I have yet to find a software developer who never copied code from StackOverflow.) AI-powered programming tools are built on this idea. Tools like Github Copilot ask you to constantly vet and adapt the code they generate. Here is the thing, though. You’ll learn nothing if you don’t try to understand why the code you copied works or can’t correctly adapt it.

Programming contests

Advent of Code taught me a lot. It is a light programming contest that takes place every December and consists of a series of small programming puzzles that can be solved in any programming language. I find it an excellent way to keep my coding skills sharp.

Solving Advent of Code problems is a good exercise, but examining other participants’ solutions is where the real learnings are. And quite frankly, it can be a humbling experience. The different ways and techniques the participants use to solve the problems can be astonishing. I remember being proud of my ultra-short, 30-line-long solution, only to see someone else solve the same problem in the same programming language with just two lines of code because they used a clever idea.

Do code reviews find bugs?

I recently overheard this: “Code reviews are a waste of time – they don’t find bugs! We should rely on our tests and skip all this code review mambo-jumbo.”

And I agree – tests are important. But they won’t identify many issues code reviews will. Here are a few examples.

Unwanted dependencies

Developers often add dependencies they could easily do without. Bringing in libraries that are megabytes in size only to use one small function or relying on packages like true may not be worth the cost. Code reviews are a great opportunity to spot new dependencies and discuss the value they bring.

Potential performance issues

In my experience, most automated tests use only basic test inputs. For instance, tests for code that operates on arrays rarely use arrays with more than a few items. These inputs might be sufficient to test the basic functionality but won’t put the code under stress.

Code reviews allow spotting suboptimal algorithms whose execution time rapidly grows with the input size or scenarios prone to combinatorial explosion.

The latter bit our team not too long ago. When reviewing a code change, one of the reviewers mentioned the possibility of a combinatorial explosion. After discussing it with the author, they concluded it would never happen. Fast forward a few weeks, and our service occasionally uses 100% CPU before it crashes due to running out of memory. Guess what? The hypothetical scenario did happen. Had we analyzed the concern mentioned in the code review more thoroughly, we would’ve avoided the problem completely.

Code complexity and readability

Computers execute all code with the same ease. They don’t care what it looks like. Humans are different. The more complex the code, the harder it is to understand and correctly modify. Code review is the best time to identify code that will become a maintenance nightmare due to its complexity and poor readability.

Missing test coverage

The purpose of automated tests is to flag bugs and regressions. But how do we ensure that these tests exist in the first place? Through a code review! If test coverage for a proposed change is insufficient, it is usually enough to ask in a code review for improving it.

Bugs? Yes, sometimes.

Code reviews do find bugs. They don’t find all of them, but any bug caught before it reaches the repository is a win. Code reviews are one of the first stages in the software development process making them the earliest chance to catch bugs. And the sooner a bug is found, the cheaper it is to fix.

Conclusion

Tests are not a replacement for code reviews. However, code reviews are also not a replacement for tests. These are two different tools. Even though there might be some overlap, they have different purposes. Having both helps maintain high-quality code.

Storytime

One memorable issue I found through a code review was the questionable use of regular expressions. My experience taught me to be careful with regular expressions because even innocent-looking ones can lead to serious performance issues. But when I saw the proposed change, I was speechless: the code generated regular expressions on the fly in a loop and executed them.

At that time, I was nineteen years into my Software Engineering career (including 11 years at Microsoft), and I hadn’t seen a problem whose solution would require generating regular expressions on the fly. I didn’t even completely understand what the code did, but I was convinced this code should never be checked in (despite passing tests). After digging deeper into the problem, we found that a single for loop with two indices could solve it. If not for the code review, this change would’ve made it to millions of phones, including, perhaps, even yours, because this app has hundreds of millions of downloads across iOS and Android.

The paradox of test coverage

When I learn that code owned by a team has low test coverage, I expect “here be dragons.” But I never know what to expect if the code coverage is high. I call this a paradox of high test coverage.

High test coverage does not tell much about the quality of unit tests. Low coverage does.

The low coverage argument is self-explanatory. If tests cover only a small portion of the product code, they cannot prevent bugs in the code that is not covered. The opposite is, however, not true: high test coverage does not guarantee a quality product. How is this possible?

Test issues

While unit tests ensure the quality of the product code, nothing, except the developer, ensures the quality of the unit tests. As a result, tests sometimes have issues that allow bugs to sneak in. Finding unit test issues is more luck than science. It usually happens by accident—usually when tests continue to pass despite code changes that should trigger test failures.

One of the simplest examples of a unit test issue is missing asserts. Tests without asserts are unlikely to flag issues. Other common problems include incorrect setup and bugs caused by copying existing tests and incorrectly adapting them to test a new scenario.

Mocking issues

Mocking allows the code under test to be isolated from its dependencies and simulate the dependency behavior. However, when the simulation is incorrect or the behavior of the dependency changes, tests may happily pass, hiding serious issues.

I’ve been working with C++ code bases, and I often see developers assume, without confirming, that a dependency they use won’t throw an exception. So, when they mock this dependency, they forget about the exception case. Even though their tests cover all the code, an exception in production takes the entire service down.

Uncovered code

Getting to 100% code coverage is usually impractical, if not impossible. As a result, a small amount of code is still not covered. Similar to the low coverage scenarios, any change to the code that is not covered can introduce a bug that won’t be detected.

Chasing the coverage number

Test coverage is only a metric. I’ve seen teams do whatever it takes to achieve the metric goal, especially if it was mandated externally, e.g., at the organization or company level. Occasionally, I encountered teams that wrote “test” code whose primary purpose was increasing coverage. Detecting or preventing bugs was a non-goal.

Low test coverage is only the tip of the iceberg

At first sight, low test coverage seems a benign issue. But it often signals bigger problems the team is facing, like:

  • spending a significant amount of time fixing regressions
  • shipping high-quality new features is slow due to excessive manual validation
  • many bugs reach production and are only caught and reported by users
  • the on-call, if the team has one, is challenging
  • the engineering culture of the team is poor, or the team is under pressure to ship new features at an unsustainable pace
  • the code is not very well organized and might be hard to work with, only slowing down the development even further
  • test coverage is likely lower than admitted to and will continue to deteriorate

I’ve worked on a few teams where developers understood the value of unit testing. They treated test code like product code and never sent a PR without unit tests. Because of this, even if they experienced the problems listed above, it was at a much smaller scale. They also never needed to worry about meeting the test coverage goals – they achieved them as a side effect.