How to quickly ramp up on new codebases

Joining a new team is intimidating. There is always much to learn – new people, new processes, and a new codebase.

Ramping up on your new team’s codebase is difficult, but knowing how to work effectively in it is crucial to your success. So, how do you do it quickly?

My mid- and early senior developer years were intense. Due to a mix of reorgs and personal interests, I found myself on a new team every year or so. As a result, I had to learn new codebases in quick succession. They included .NET System.Xml, OData, Entity Framework, the Entity Framework Designer, ASP.NET SignalR, ASP.NET Core, and the Alexa mobile app, and most of them were over one hundred thousand lines of code.

The first couple of transitions were slow and overwhelming. But they helped me develop a strategy I used later to quickly get up to speed on new codebases.

Get your hands dirty ASAP

The sooner you start working actively with the codebase, the sooner you will become productive. Reading documentation can be helpful, but nothing replaces hands-on experience.

When I joined Amazon, I asked my manager to assign me a simple bug to fix on my first day. Even though the fix amounted to a single line of code, it took me a few days to send it for review. It might seem long, but this quest was not about fixing the bug. Rather, it was about forcing myself to set up my development environment, learn how to build and run the code, write unit tests, and debug our product.

Review code

Each team has its way of letting members know there is a new PR (Pull Request) for review. It could be email, chat, or review tool notifications. Whatever it is, subscribe to this channel and start reviewing PRs. You won’t understand much initially, but you may still catch some bugs (e.g., off-by-one errors). More importantly, these reviews will allow you to ask questions and gather a broader context of what the team is working on.
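
For example, here is the kind of off-by-one bug a fresh pair of eyes can catch without knowing the codebase (a made-up Python snippet):

    def last_n(items, n):
        # Off-by-one: the "+ 1" drops the first of the last n elements.
        # The correct slice is items[len(items) - n:].
        return items[len(items) - n + 1:]

    print(last_n([1, 2, 3, 4, 5], 3))  # prints [4, 5] – expected [3, 4, 5]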

Identify code that matters

The 80/20 rule applies to codebases, too. This is especially noticeable in bigger ones, where most of the changes developers make are concentrated in one area. Knowing which code that is allows you to focus your ramp-up on the area you are likely to work on soon.

There are a couple of easy ways to tell which code is in the top 20%:

  • paying attention to code reviews
  • checking the commit history (see the sketch below)
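
For the latter, even a tiny script reveals the hot spots. Here is a minimal sketch, assuming a local git clone and Python, that counts how often each file changed over the last year:

    import subprocess
    from collections import Counter

    # List the files touched by every commit from the last year.
    log = subprocess.run(
        ["git", "log", "--since=1 year ago", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout

    counts = Counter(line for line in log.splitlines() if line)

    # The most frequently changed files are a good proxy for the top 20%.
    for path, n in counts.most_common(20):
        print(f"{n:5}  {path}")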

Take notes, draw diagrams

I discovered that taking notes and drawing diagrams is a very effective way to grok the most complex parts of the code. Call graphs, class diagrams, and dependency graphs all help organize the information and are great reference material in case you need to refresh your memory.

After I joined Amazon, I drew diagrams of a few areas of code I couldn’t understand. They became an instant hit. Even people who had been on the team for much longer than me wanted a copy.

Debug code

Reading a new codebase is like reading a book in a language you barely know. You can do it, but it is excruciating.

For me, one of the best ways to overcome this is to step through the code with the debugger. Initially, I have no idea what I am looking at. But if I follow the same code path a few times, I begin to recognize code I have already seen, and soon, everything starts to fall into place.

To get the most out of my debugging sessions, I do two things:

  • I continuously inspect the state – the stack trace, parameters, and local and class variables
  • I take notes and draw diagrams

Using the debugger to learn the codebase is slower and narrower than reading the code. But it is also much deeper. In fact, when debugging, I often realize that the understanding I gained from reading the code was incomplete or sometimes even incorrect.
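
What this looks like in practice depends on your stack; as an illustration, here is a session sketch with Python’s built-in pdb (the function is hypothetical, the commands are real):

    import pdb

    def handle_request(request):
        pdb.set_trace()  # pause here and look around
        ...

    # Typical commands once the debugger stops:
    #   w (where)  – print the stack trace: how did we get here?
    #   a (args)   – print the current function's arguments
    #   p expr     – inspect a local or class variable
    #   s / n      – step into / over the next line
    # Follow the same code path a few times, and it starts to look familiar.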

Read documentation

Reading documentation can help accelerate your onboarding. It is especially useful for the high-level architecture and concepts that are hard to deduce from the code.

However, you should take documentation with a grain of salt. It is often sparse and outdated. But this could be good news for you. Updating the documentation as part of your ramp-up could be a great contribution to your new team.

Join on-call rotation

Joining the on-call rotation to accelerate team onboarding might sound extreme, but I did it on my first team at Facebook. I was concerned that my ramp-up was slow, so I decided to push myself. I learned more about our services during that week than in the weeks before. After my shift ended, I knew which services our team owned, what their dependencies were, and where to find their code. Alerts immediately pointed me to the hot code paths, and troubleshooting issues forced me to dig into the code.

These 5 habits will make you a great code reviewer

High-quality code reviews are hard.

They are time-consuming and require significant mental effort.

Code reviewing is also not taught at school, and figuring it out on your own is hard work. In this post, I share five crucial habits all great code reviewers I know have in common.

Make the code review about the code

Code reviews are about the code, not the person who wrote the code. If you think the proposed change is incorrect or have suggestions to improve it, then, by all means, leave your comments (just remember to make them professional and high-quality). However, leave out comments that don’t relate to the code under review.

Understand the code and ask questions when in doubt

Understanding the code under review is the foundation of a solid code review.

It is also the most difficult part.

Whenever you have doubts about the proposed changes, ask the author for clarification. You might not be aware of an assumption the author is making, or you might need help understanding how their changes fit in. Difficulty grasping the changes is also often a sign of a mistake.

Asking a question will prompt the author to explain their thought process. As a result, they will either answer the question and clear up your doubts or realize that something is indeed wrong and needs to be fixed.

Be clear about your expectations

Code reviews can generate a wide variety of comments. Some are nits that you would like to see fixed but are not real issues. Some, however, identify serious problems that must be corrected before merging. If you leave a comment, make sure the author understands which category it falls into – for example, by prefixing minor suggestions with “nit:”. Doing this will save you and the PR author time and frustration.

Sometimes, you may take on a PR that is outside your area of expertise. You may realize it only after you have already left some comments. You are now in an awkward situation: the author expects you to finish the review and approve the PR, but you don’t feel confident that you can. If this happens, instead of accepting a change you don’t understand, it is better to leave a comment recommending that the author get a review from someone more familiar with the code they are changing.

Side note: reviewing code that is outside of your area of expertise is a good thing. Even if you don’t understand the change enough to approve it, you can still provide useful feedback, identify bugs, and learn something. Just make sure the author does not expect to get your approval.

Cross-check with the existing code

Code reviews show only the code that changed. Most of the time, that is sufficient. But sometimes, to understand a change fully, you need to check how the new code interacts with code that is not included in the review because it hasn’t changed.

This idea may be obvious to most developers, but I was surprised to meet some who had never considered checking the existing code when reviewing PRs. In my experience, the biggest surprises are caused not by what’s included in the PR but by what’s missing.

Occasionally, examining the existing code may not give you all the answers. The ultimate weapon for these situations is the debugger: check out the PR branch and step through the code, and it should resolve all your doubts. That said, I resort to the debugger very rarely – it’s almost always easier and faster to ask the author.

Resist the “stamp” pressure

The pressure to merge changes quickly for projects on tight timelines is high. And it only grows as the deadline nears.

During the endgame, engineers enter a Pull Request frenzy. Eventually, due to the sheer number of PRs, code reviews become a bottleneck. So engineers try to unblock themselves by asking reviewers to “stamp the diff” (i.e., approve changes without looking).

On the one hand, it is understandable – no one wants to miss the deadline. On the other hand, these are the times when code changes need even more scrutiny than usual. Under time pressure, most PRs are coded hastily, the changes may not be validated thoroughly (or at all), and the stress only increases the likelihood of mistakes.

While a proper code review takes time, merging code with issues a review could have caught is far more costly. At best, fixing the problem requires sending a new PR (which, by the way, will need a review of its own). At worst, an embarrassing bug ships to customers.

I remember when one of my teams worked extremely hard to finish a project on time. We were very close, and then one of the team members dropped a 1,000-line PR a few hours before the deadline. The manager tried hard to find someone willing to approve the PR. It wasn’t easy, because the PR had a bunch of red flags, like spotty test coverage and many TODO comments, but he eventually succeeded. As soon as our product shipped, we started getting reports from angry customers complaining that important scenarios had stopped working. We found that the hastily merged PR was the culprit. The team scrambled to fix the issues, but the damage was done. This one PR cost us the reputation we’d been building for a long time.

What it is like to work in Meta’s (Facebook’s) monorepo

I love monorepos! Or at least I love Meta’s (Facebook’s) monorepo, which happens to be the only monorepo I have ever worked with. Here is why.

Easy access to all code

Meta’s monorepo contains most of the company’s code. Any developer working at Meta has access to it. We can search it, read it, and check the commit history. We also can, and frequently do, modify code managed by other teams.

This easy access to all the code is great for developer productivity. Engineers can understand their dependencies more deeply, debug issues across the entire stack, and implement features or bug fixes regardless of who manages the code. All of this is available at their fingertips. They can hit the ground running without talking to other teams, reading their out-of-date wikis, or spending time figuring out how to clone and build their code.

Linear commit history

Meta’s monorepo does not use branches, so the commit history is linear. Linear commit history saves engineers from having to reverse-engineer a London Tube Map-like merge history to determine whether a given commit’s snapshot contains their changes. With linear history, answering this question boils down to comparing commit times.
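
For example, checking whether a snapshot contains your fix reduces to asking whether the fix is an ancestor of the snapshot – with linear history, that is the same as the fix being committed earlier. A minimal sketch using a real git command (the commit hashes are made up):

    import subprocess

    MY_FIX = "abc123"    # hypothetical commit hashes
    SNAPSHOT = "def456"

    # "git merge-base --is-ancestor A B" exits 0 if A is an ancestor of B,
    # i.e., the snapshot already contains the fix.
    result = subprocess.run(["git", "merge-base", "--is-ancestor", MY_FIX, SNAPSHOT])
    print("contains the fix" if result.returncode == 0 else "does not contain the fix")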

No versioning

Versioning is one of the most complex problems when working with multiple repos. Each repo is independent, and teams are free to decide which versions of dependencies they want to adopt. However, because each repo evolves at its own pace, different repos will inevitably end up with different versions of the same package. These inconsistencies lead to situations where a project may contain more than one version of the same dependency, but no single version works for everyone.

I experienced this firsthand during my time at Amazon. I was working on the Alexa app, which consisted of dozens of packages, each pulling in at least a few dependencies. It was versioning hell: conflicts were common, and resolving them was difficult. For example, one package used an older version of a dependency because the newer one contained a bug, while another package required the latest version because the older ones lacked the features it needed.
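
To see why such a conflict has no resolution, here is a toy illustration using Python’s third-party packaging library (the constraints and version numbers are made up):

    from packaging.specifiers import SpecifierSet

    package_a_needs = SpecifierSet("<2.0")   # pins below 2.0 because 2.x has a bug
    package_b_needs = SpecifierSet(">=2.3")  # needs a feature introduced in 2.3

    combined = package_a_needs & package_b_needs
    available = ["1.9", "2.0", "2.3", "2.4"]

    # Prints [] – no single version satisfies both packages.
    print(list(combined.filter(available)))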

A monorepo solves versioning issues in a simple way: there is no versioning. All code is built together, so each package or project has only one version for a given commit.

Atomic commits

Monorepos allow atomic cross-project commits. Developers can rename classes or change function signatures without breaking code or tests. They just need to fix all the code affected by their change in the same commit.

Doing the same is impossible in a multi-repo environment. Introducing breaking changes is either safe but slow (as it requires multiple commits for a proper migration) or fast but at the expense of broken builds.

This problem plagued the ASP.NET Core project in its early days (Project K, anyone?). The team was working on getting the abstractions right, so the foundational interfaces changed constantly. Many packages (each in its own repo) implemented or used these interfaces, and whenever the interfaces changed, most repos stopped compiling and needed fixes.

Build

Builds in monorepos are conceptually simple: all code in the repo is built at a given commit.

This approach makes it possible to quickly tell what’s included in the build and create bundles where all build artifacts match.

While the idea is simple, building the entire monorepo becomes increasingly challenging as the repository grows. Compiling big monorepos, like Meta’s, in a reasonable time is impossible without specialized build tools and massive infra.

Multiple repos make creating a list of matching packages surprisingly hard. I learned this while working on ASP.NET Core. The framework initially consisted of a couple dozen repos. Our build servers were constantly grinding because of what we called “build waves.” A build wave started with a single commit that triggered a build. When that build finished, it triggered builds in the repos that depended on it, and the process continued until all repos were built. Not only was this process slow and fragile, but with a steady stream of commits across all the repos, producing a list of matching packages was difficult.

The ASP.NET Core team eventually consolidated all the code into a single repository, adopting the monorepo approach. This change happened after I left the team, but I believe the challenge of getting fast and consistent builds was an important reason.

What are the problems with monorepos?

If monorepos are so great, why isn’t everyone using them? There are several reasons.

Scale

Scale poses the biggest challenge for monorepos. Meta’s repository measures in terabytes and receives thousands of commits each day. Detecting conflicts and ensuring that all changes merge correctly and don’t break the build – all without hurting developer productivity – is tough. Because most off-the-shelf tools cannot handle this scale, Meta has many dedicated teams that maintain the build infrastructure. Sometimes, they need to go to great lengths to do their job. Here is an example:

Back in 2013, tooling teams ran a simulation showing that in a few years, basic git commands would take 45 minutes to execute if the repo kept growing at its current rate. That was unacceptable, so Facebook engineers turned to the Git maintainers to solve the problem. At the time, the Git project was uninterested in modifying its SCM (Source Code Management) tool to support such a big repo. The Mercurial (hg) team, however, was more receptive. With significant contributions from Facebook, Mercurial was rearchitected to meet Facebook’s requirements. This is why Meta (a.k.a. Facebook) uses Mercurial (hg) as its source control.

Granular project permissions

Monorepos make accessing any code in the repository easy, which is great for developer productivity. However, companies often have sensitive code that only selected developers should be able to access. This requirement goes against the idea of the monorepo, which aims to make all code easily accessible, so restricting access to code in a monorepo is problematic. Creating separate repos for sensitive projects is also not ideal, especially if these projects use the common infrastructure the monorepo provides for free.

Release management

A common strategy to maintain multiple releases is to create a branch for each release. Follow-ups (e.g., bug fixes) can be merged to these branches without bringing unrelated changes that could destabilize the release. This strategy won’t work in monorepos with a linear history.

I must admit that I don’t know how teams that ship their products publicly handle their releases. Our team owns a few services we deploy to production frequently; if we find an issue, we roll back the deployment and fix forward.

A single commit can break the build

Because the entire monorepo is built at a given commit, merging a mistake that causes compilation errors will break the build. These situations happen despite the tooling that is supposed to prevent them. In practice, this is only rarely a problem: developers are affected only if the project that doesn’t compile is one of their dependencies, and even then, they can work around the breakage by working off an older commit until it is fixed.

How to effectively work in big codebases

You wouldn’t hire a software engineer who cannot navigate code. Yet, I turned out to be one after I joined Microsoft and explored my new team’s codebase. What I saw shocked me.

Before Microsoft, I worked at a small start-up, and our projects didn’t exceed tens of thousands of lines of code. We could open, edit, and compile these projects directly in the IDE (Integrated Development Environment). My new team’s codebase had a few hundred thousand lines written in several programming languages. It was about 15 years old and used pretty much every technology Microsoft had invented in those years. Compiling it successfully was impossible without setting dozens of environment variables and using magic command-line incantations. No single IDE could handle this. It took me a few weeks before I began to feel comfortable with this codebase and all the tools I had to use for development.

This was almost twenty years ago, and since then, I have worked in several other big codebases, including the .NET Framework, Visual Studio, ASP.NET Core, Amazon’s codebase, and Meta’s (Facebook’s) monorepo. Even though all these codebases were different, they shared many similar challenges, most of which could be overcome using similar tactics.

Trying to understand all code is futile

A single person cannot deeply understand a codebase that has a few hundred thousand lines. But this is not the only challenge. Large codebases are not static. They often receive hundreds of contributions each day, so they evolve rapidly.

On the bright side, understanding all the code is not necessary. Rather, it is better to have a very good understanding of the area your team works on and a decent knowledge of the areas your code interacts with.

Code searching

It’s hard to be productive if you can’t search code. But it gets exponentially harder if you can’t even find the repo. And this was my experience during my years at Microsoft.

At that time, each team managed its codebase and source control individually, and there wasn’t any tool to find these repositories. The internal search returned an incomplete list of often-outdated wikis. The easiest way to find code was to first find the team responsible for it and then get all the details from them.

(Around the time I was leaving Microsoft, it implemented its new engineering system, 1ES (One Engineering System), which I am sure brought significant improvements.)

Searching a large codebase on a dev machine may not be an option. Cloning the entire codebase to a dev box may not be feasible, especially if the codebase consists of thousands of federated repos, like Amazon’s. Even when cloning is possible, tools such as grep are often too slow. This is why most big codebases have dedicated tools that make searching the code fast. Many of them also support following references, which is extremely helpful.

One factor that tremendously simplifies searching code is consistent formatting. If coding style is not enforced, finding anything is almost impossible; searching a uniformly formatted codebase is much easier. This is why implementing a tool that enforces coding style is a good investment.
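
The enforcement itself doesn’t have to be fancy. Here is a minimal sketch of a CI gate, assuming a Python codebase and the black formatter:

    import subprocess
    import sys

    # "black --check" exits with a non-zero code if any file would be reformatted.
    result = subprocess.run(["black", "--check", "."])
    if result.returncode != 0:
        print("Unformatted code found – run `black .` before pushing.")
        sys.exit(1)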

Build system complexity

Understanding the build system is key to being productive when working with big codebases.

Big codebases tend to have extremely complex build systems, often consisting of custom scripts, one-off tools, and specialized extensions stitched together to do the job. Off-the-shelf developer tools (e.g., IDEs) can rarely handle this complexity, and developers may struggle for days when they encounter a build system issue.

Many big companies have built their own tools to rein in this complexity and make it easier and faster for developers to work on large, multi-language codebases. Meta has Buck, Amazon has Brazil, and Google has Bazel. But from my experience, especially with Brazil, these tools also have some rough edges, so understanding how they work can go a long way.

The development environment is constantly in flux

Due to the number of engineers working in large codebases, even small productivity improvements can yield savings measured in engineering years. Maintainers work all the time to identify and fix bottlenecks. Because of this, the developer environment changes constantly, and the transitions are often not smooth, ironically resulting in lost productivity.

In 2019, Facebook decided to move away from Nuclide as its main IDE and migrate to VS Code. As a fan and early adopter of VS Code (I even created an extension for it back in 2015!), I welcomed this change. But the ride was bumpy. The command I used the most (a few times per hour) during the first year was Developer: Reload Window. I had to fall back to Vim or Nuclide multiple times because VS Code stopped working. The early versions were bare – it took more than two years to bring all the features Nuclide offered to VS Code.

(To clarify, the tooling team did an awesome job. It supported both IDEs during the migration and put immense effort into making this migration successful. And it paid off—today, our VS Code is very stable, constantly gets new features, and is a pleasure to work with.)

Slow builds

Compiling a large codebase takes time. Fortunately, you rarely need to do it yourself. In most cases, you only need to build the sub-project you modified and integrate it with your product. However, even these steps can take considerable time, despite the miracles that build engineers perform.

Legacy code

The codebases of many successful products that have been around for decades (e.g., Microsoft Windows) are big. They grow organically over the years thanks to the contributions of hundreds or thousands of developers who merge code daily. New releases are developed by expanding previous releases. Consequently, large codebases accumulate a lot of legacy code that almost no one is familiar with. I am sure some of the code I considered legacy when I joined Microsoft twenty years ago is still around because the product I worked on is still on the market.

The paradox of test coverage

When I learn that code owned by a team has low test coverage, I expect “here be dragons.” But I never know what to expect when the coverage is high. I call this the paradox of high test coverage.

High test coverage does not tell much about the quality of unit tests. Low coverage does.

The low-coverage argument is self-explanatory: if tests cover only a small portion of the product code, they cannot prevent bugs in the code that is not covered. The opposite, however, is not true: high test coverage does not guarantee a quality product. How is this possible?

Test issues

While unit tests ensure the quality of the product code, nothing except the developer ensures the quality of the unit tests. As a result, tests sometimes have issues that allow bugs to sneak in. Finding unit test issues is more luck than science; it usually happens by accident, when tests continue to pass despite code changes that should have made them fail.

One of the simplest examples of a unit test issue is missing asserts. Tests without asserts are unlikely to flag issues. Other common problems include incorrect setup and bugs caused by copying existing tests and incorrectly adapting them to test a new scenario.
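
To make this concrete, here is a toy Python example. The first test exercises the buggy function and passes because it asserts nothing; adding an assert immediately exposes the bug:

    def median(values):
        ordered = sorted(values)
        return ordered[len(ordered) // 2]   # bug: wrong for even-length lists

    def test_median():
        median([1, 2, 3, 4])                # no assert – passes no matter what

    def test_median_with_assert():
        assert median([1, 2, 3, 4]) == 2.5  # fails and exposes the bug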

Mocking issues

Mocking isolates the code under test from its dependencies by simulating their behavior. However, when the simulation is incorrect or the behavior of the dependency changes, tests may happily pass, hiding serious issues.

I’ve been working in C++ codebases, and I often see developers assume, without confirming, that a dependency they use won’t throw an exception. So, when they mock this dependency, they forget about the exception case. Even though their tests cover all the code, an exception in production takes the entire service down.
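
Here is that failure mode translated into a Python sketch using unittest.mock (the payment gateway and its API are hypothetical):

    from unittest import mock

    def charge(gateway, amount):
        # Assumes gateway.charge() never raises – the unverified assumption.
        return gateway.charge(amount)

    def test_charge_happy_path():
        gateway = mock.Mock()
        gateway.charge.return_value = "ok"
        assert charge(gateway, 100) == "ok"  # passes; line coverage is 100%

    def test_charge_gateway_failure():
        gateway = mock.Mock()
        gateway.charge.side_effect = ConnectionError("gateway timeout")
        charge(gateway, 100)  # raises – charge() has no error handling

The happy-path test alone makes the coverage report look perfect; only the second test, which simulates the dependency failing, reveals that an exception would take the service down.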

Uncovered code

Getting to 100% code coverage is usually impractical, if not impossible, so a small amount of code always remains uncovered. As in the low-coverage scenario, any change to this uncovered code can introduce a bug that won’t be detected.

Chasing the coverage number

Test coverage is only a metric. I’ve seen teams do whatever it takes to achieve the metric’s goal, especially when it was mandated externally, e.g., at the organization or company level. Occasionally, I encountered teams that wrote “test” code whose primary purpose was increasing coverage; detecting or preventing bugs was a non-goal.

Low test coverage is only the tip of the iceberg

At first sight, low test coverage seems a benign issue. But it often signals bigger problems the team is facing, like:

  • spending a significant amount of time fixing regressions
  • shipping high-quality new features is slow due to excessive manual validation
  • many bugs reach production and are only caught and reported by users
  • the on-call, if the team has one, is challenging
  • the engineering culture of the team is poor, or the team is under pressure to ship new features at an unsustainable pace
  • the code is not very well organized and might be hard to work with, only slowing down the development even further
  • test coverage is likely lower than admitted to and will continue to deteriorate

I’ve worked on a few teams where developers understood the value of unit testing. They treated test code like product code and never sent a PR without unit tests. Because of this, even if they experienced the problems listed above, it was at a much smaller scale. They also never needed to worry about meeting the test coverage goals – they achieved them as a side effect.