Every developer wants to get better at coding. But they sometimes don’t know how and feel stuck. The good news is that there are a few simple ways to improve coding skills that many developers neglect. Most can be done as part of a regular job, so they don’t require additional time. And doing them consistently can make you a better coder.
Code reviews
Code reviews are the easiest way to learn from others. Unfortunately, many developers treat code reviews as a chore. They want to spend as little time as possible on them. As a result, they miss out on great learning opportunities.
What makes code reviews special is their interactivity. They let you ask the author about their choices and discuss the alternatives they considered. Other participants often leave interesting comments or propose novel alternative solutions.
Reading other developers’ code
I learned a lot about programming by reading code written by other developers. I often want to know how a feature or library I use works. Usually, the fastest way to get this information is by inspecting its code.
For instance, TypeScript supported some features, e.g., async/await, before they were added to JavaScript. I was curious how it was possible. So, I wrote short TypeScript snippets and checked how they were transpiled.
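If you want to try this yourself, here is a minimal sketch of the kind of snippet I would feed to the compiler (the function name and exact flags are just an example, not the snippets I actually used). Compiled with an ES5 target, the emitted JavaScript replaces async/await with a state machine built on TypeScript’s __awaiter and __generator helpers.

```typescript
// Compile with something like: tsc --target ES5 --lib es2015,dom snippet.ts
// and inspect the emitted JavaScript: the async function below is rewritten
// into a state machine driven by the __awaiter/__generator helpers.
async function fetchGreeting(name: string): Promise<string> {
  const greeting = await Promise.resolve(`Hello, ${name}!`);
  return greeting.toUpperCase();
}

fetchGreeting("world").then(console.log);
```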
Your company’s codebase is another great resource to learn from. When I get stuck, I often search my company repository to check how other developers solved a similar problem. Our repo is big, so if I can’t find anything useful, I am almost sure what I am trying to do is questionable.
I use the same strategy for my side projects but search GitHub.
If you find reading other developers’ code challenging (I sure did), take a look at this post.
Debugging
Stepping through the code, analyzing the stack trace, and inspecting variables will help you understand important nuances and easy-to-miss details much more deeply. I often fire up a debugger if I can’t answer all my questions after reading the code.
“Borrowing” code
Let’s be honest. Not all code needs to be written from scratch. Sometimes, we just need boilerplate. But sometimes, we don’t know how to solve a problem. In these cases, copying and adapting code is often faster (and easier). It could be code you wrote in the past or someone else’s code, e.g., copied from Stack Overflow (I have yet to find a software developer who has never copied code from Stack Overflow). AI-powered programming tools are built on this idea. Tools like GitHub Copilot expect you to constantly vet and adapt the code they generate. Here is the thing, though. You’ll learn nothing if you don’t try to understand why the copied code works or can’t adapt it correctly.
Programming contests
Advent of Code taught me a lot. It is a light programming contest that takes place every December and consists of a series of small programming puzzles that can be solved in any programming language. I find it an excellent way to keep my coding skills sharp.
Solving Advent of Code problems is a good exercise, but examining other participants’ solutions is where the real learning happens. And quite frankly, it can be a humbling experience. The different approaches and techniques participants use to solve the problems can be astonishing. I remember being proud of my ultra-short, 30-line solution, only to see someone else solve the same problem in the same programming language with just two lines of code because they used a clever idea.
I recently overheard this: “Code reviews are a waste of time – they don’t find bugs! We should rely on our tests and skip all this code review mumbo jumbo.”
And I agree – tests are important. But they won’t identify many issues code reviews will. Here are a few examples.
Unwanted dependencies
Developers often add dependencies they could easily do without. Bringing in libraries that are megabytes in size only to use one small function or relying on packages like true may not be worth the cost. Code reviews are a great opportunity to spot new dependencies and discuss the value they bring.
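As a hypothetical illustration (the helper below is invented, not taken from any real review), this is the kind of trade-off worth raising in a review: a few local lines versus a whole utility library pulled in for a single function.

```typescript
// Hypothetical example: instead of adding a multi-megabyte utility library
// just to split an array into chunks, a small local helper is often enough.
function chunk<T>(items: T[], size: number): T[][] {
  const result: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    result.push(items.slice(i, i + size));
  }
  return result;
}

console.log(chunk([1, 2, 3, 4, 5], 2)); // [[1, 2], [3, 4], [5]]
```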
Potential performance issues
In my experience, most automated tests use only basic test inputs. For instance, tests for code that operates on arrays rarely use arrays with more than a few items. These inputs might be sufficient to test the basic functionality but won’t put the code under stress.
Code reviews let reviewers spot suboptimal algorithms whose execution time grows rapidly with the input size, as well as scenarios prone to combinatorial explosion.
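To make the first case concrete, here is a hypothetical example (unrelated to the incident below): code like this passes tests that use a handful of items, yet a reviewer can flag the quadratic behavior just by reading it.

```typescript
// Fine for the tiny arrays used in tests, but the number of comparisons
// grows quadratically with the input size.
function hasDuplicates<T>(items: T[]): boolean {
  return items.some((item, index) => items.indexOf(item) !== index); // O(n^2)
}

// A reviewer might suggest a single pass instead.
function hasDuplicatesFast<T>(items: T[]): boolean {
  return new Set(items).size !== items.length; // O(n)
}
```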
The combinatorial explosion scenario bit our team not too long ago. When reviewing a code change, one of the reviewers mentioned the possibility of a combinatorial explosion. After discussing it with the author, they concluded it would never happen. Fast forward a few weeks, and our service was occasionally pegging the CPU at 100% before crashing because it ran out of memory. Guess what? The hypothetical scenario did happen. Had we analyzed the concern raised in the code review more thoroughly, we would’ve avoided the problem completely.
Code complexity and readability
Computers execute all code with the same ease. They don’t care what it looks like. Humans are different. The more complex the code, the harder it is to understand and correctly modify. Code review is the best time to identify code that will become a maintenance nightmare due to its complexity and poor readability.
Missing test coverage
The purpose of automated tests is to flag bugs and regressions. But how do we ensure that these tests exist in the first place? Through a code review! If test coverage for a proposed change is insufficient, it is usually enough to ask the author in the code review to improve it.
Bugs? Yes, sometimes.
Code reviews do find bugs. They don’t find all of them, but any bug caught before it reaches the repository is a win. Code reviews are one of the first stages in the software development process, making them one of the earliest chances to catch bugs. And the sooner a bug is found, the cheaper it is to fix.
Conclusion
Tests are not a replacement for code reviews. However, code reviews are also not a replacement for tests. These are two different tools. Even though there might be some overlap, they have different purposes. Having both helps maintain high-quality code.
Storytime
One memorable issue I found through a code review was the questionable use of regular expressions. My experience taught me to be careful with regular expressions because even innocent-looking ones can lead to serious performance issues. But when I saw the proposed change, I was speechless: the code generated regular expressions on the fly in a loop and executed them.
At that time, I was nineteen years into my Software Engineering career (including 11 years at Microsoft), and I hadn’t seen a problem whose solution would require generating regular expressions on the fly. I didn’t even completely understand what the code did, but I was convinced this code should never be checked in (despite passing tests). After digging deeper into the problem, we found that a single for loop with two indices could solve it. If not for the code review, this change would’ve made it to millions of phones, including, perhaps, even yours, because this app has hundreds of millions of downloads across iOS and Android.
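I can’t share the original code, so here is a made-up stand-in (the problem, names, and inputs are all hypothetical) that shows the general shape: generating regular expressions at runtime where a plain loop with two indices does the job.

```typescript
// Hypothetical anti-pattern: decide whether `needle` appears in `haystack`
// as a subsequence by building a regular expression at runtime.
function isSubsequenceRegex(needle: string, haystack: string): boolean {
  const escaped = [...needle].map((c) => c.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  // Produces a pattern like "a[\s\S]*?b[\s\S]*?c", which is hard to read and easy to get wrong.
  return new RegExp(escaped.join("[\\s\\S]*?")).test(haystack);
}

// The same check with a single loop and two indices: clearer and faster.
function isSubsequence(needle: string, haystack: string): boolean {
  let i = 0;
  for (let j = 0; j < haystack.length && i < needle.length; j++) {
    if (haystack[j] === needle[i]) i++;
  }
  return i === needle.length;
}
```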
I love monorepos! Or at least I love Meta’s (Facebook’s) monorepo, which happens to be the only monorepo I have ever worked with. Here is why.
Easy access to all code
Meta’s monorepo contains most of the company’s code. Any developer working at Meta has access to it. We can search it, read it, and check the commit history. We also can, and frequently do, modify code managed by other teams.
This easy access to all the code is great for developer productivity. Engineers can understand their dependencies more deeply, debug issues across the entire stack, and implement features or bug fixes regardless of who manages the code. All of this is available at their fingertips. They can hit the ground running without talking to other teams, reading their out-of-date wikis, or spending time figuring out how to clone and build their code.
Linear commit history
Meta’s monorepo does not use branches, so the commit history is linear. The linear commit history saves engineers from having to reverse-engineer a London Tube Map-like merge history to determine whether a given commit’s snapshot contains their changes. With a linear commit history, answering this question boils down to comparing commit times.
No versioning
Versioning is one of the most complex problems when working with multiple repos. Each repo is independent, and teams are free to decide which versions of dependencies they want to adopt. However, because each repo evolves at its own pace, different repos will inevitably end up with different versions of the same package. These inconsistencies lead to situations where a project may contain more than one version of the same dependency, but no single version works for everyone.
I experienced this firsthand during my time at Amazon. I was working on the Alexa app, which consisted of tens of packages, each pulling in at least a few dependencies. It was versioning hell: conflicts were common, and resolving them was difficult. For example, one package used an older dependency because a newer version contained a bug. Another package, however, required the latest version because older versions lacked the needed features.
A monorepo solves versioning issues in a simple way: there is no versioning. All code is built together, so each package or project has only one version for a given commit.
Atomic commits
Monorepos allow atomic cross-project commits. Developers can rename classes or change function signatures without breaking code or tests. They just need to fix all the code affected by their change in the same commit.
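Here is a small hypothetical sketch (the files and the logging function are invented) of what such an atomic change looks like: the breaking signature change and every affected call site land in the same commit, so no snapshot of the repo is ever broken.

```typescript
// shared/logger.ts: the function gains a required `level` parameter (breaking change).
export type Level = "info" | "warn" | "error";
export function log(level: Level, message: string): void {
  console.log(`[${level}] ${message}`);
}

// serviceA/main.ts: call site fixed in the same commit.
log("info", "service A started");

// serviceB/worker.ts: call site fixed in the same commit.
log("error", "nightly job failed");
```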
Doing the same is impossible in a multi-repo environment. Introducing breaking changes is either safe but slow (as it requires multiple commits for a proper migration) or fast but at the expense of broken builds.
This problem plagued the ASP.NET Core project in its early days (ProjectK anyone?). The team was working on getting abstractions right, so the foundational interfaces constantly changed. Many packages (each in its repo) implemented or used these interfaces. Whenever they changed, most repos stopped compiling and needed fixes.
Build
Builds in monorepos are conceptually simple: all code in the repo is built at a given commit.
This approach makes it possible to quickly tell what’s included in the build and create bundles where all build artifacts match.
While the idea is simple, building the entire monorepo becomes increasingly challenging as the repository grows. Compiling big monorepos, like Meta’s, in a reasonable time is impossible without specialized build tools and massive infrastructure.
Multiple repos make creating a list of matching packages surprisingly hard. I learned this when working on ASP.NET Core. The framework initially consisted of a couple dozen repos. Our build servers were constantly grinding because of what we called “build waves.” A build wave was initiated by a single commit that triggered a build. When this build finished, it triggered builds in the repos that depended on it. This process continued until all repos were built. Not only was this process slow and fragile, but with a steady stream of commits across all the repos, producing a list of matching packages was difficult.
The ASP.NET Core team eventually consolidated all the code into a single repository, adopting the monorepo approach. This change happened after I left the team, but I believe the challenges of getting fast and consistent builds were an important reason.
What are the problems with monorepos?
If monorepos are so great, why isn’t everyone using them? There are several reasons.
Scale
Scale poses the biggest challenge for monorepos. Meta’s repository is measured in terabytes and receives thousands of commits each day. Detecting conflicts and ensuring that all changes are merged correctly and don’t break the build, without hurting developers’ productivity, is tough. Because most off-the-shelf tools cannot handle this scale, Meta has many dedicated teams that maintain the build infrastructure. Sometimes, they need to go to great lengths to do their job. Here is an example:
Back in 2013, tooling teams ran a simulation showing that, in a few years, basic git commands would take 45 minutes to execute if the repo continued to grow at the rate it did. That was unacceptable, so Facebook engineers turned to the Git maintainers to solve this problem. At that time, the Git maintainers were uninterested in modifying their SCM (source code management) tool to support such a big repo. The Mercurial (hg) team, however, was more receptive. With significant contributions from Facebook, it rearchitected Mercurial to meet Facebook’s requirements. This is why Meta (a.k.a. Facebook) uses Mercurial (hg) as its source control.
Granular project permissions
Monorepos make accessing any code in the repository easy, which is great for developer productivity. However, companies often have sensitive code that only selected developers should be able to access. This requirement goes against the idea of the monorepo, which aims to make all code easily accessible. As a result, restricting access to code in a monorepo is problematic. Creating separate repos for sensitive projects is also not ideal, especially if these projects use the common infrastructure the monorepo provides for free.
Release management
A common strategy to maintain multiple releases is to create a branch for each release. Follow-ups (e.g., bug fixes) can be merged to these branches without bringing unrelated changes that could destabilize the release. This strategy won’t work in monorepos with a linear history.
I must admit that I don’t know how teams that ship their products publicly handle their releases. Our team owns a few services we deploy to production frequently. If we find an issue, we roll back our deployment and fix the bug forward.
A single commit can break the build
Because in a monorepo the entire codebase is built at a given commit, merging a mistake that causes compilation errors will break the build. These situations happen despite the tooling that is supposed to prevent them. In practice, this is only rarely a problem. Developers are affected only if the project that doesn’t compile is one of their dependencies. And even then, they can work around the problem by working off an older commit until the breakage is fixed.
I didn’t know what “gold plating” was until a senior engineer called it out on one of my code changes and recommended that I “stop gold plating.” I was clueless about what he meant, so I went to talk to him. That meeting ended up being a memorable lesson in my software engineering journey.
Wikipedia defines gold plating as: “the phenomenon of working on a project or task past the point of diminishing returns.” While the article talks about gold plating in the context of project management, the same phenomenon occurs in software development under a more familiar name: unnecessary refactoring.
The feedback I got from the senior engineer was that he noticed a pattern where I continued working on code that was already finished. I was polishing tests without covering new scenarios, changing perfectly fine variable or method names, or making functions slightly shorter.
I felt offended – I was making the code better!
How could a senior software engineer not see it?
How could they be against improving code?
So, he asked me to explain how my changes improved the code. I couldn’t. Only then did I realize he was right. I had to admit the new code was a bit different, but it wasn’t objectively better.
But then he went further and asked me how my changes impacted the team. I got confused: why would these small changes affect the team? It turned out they could, and in a few ways.
I didn’t use my time effectively.
I spent time working on unimportant changes instead of taking on work that mattered. As a result, someone else had to pick up tasks I could have worked on. Had I taken them on, we could have fixed more bugs, implemented more features, or shipped faster. I also hurt myself – important work is usually a good learning opportunity and can lead to a quicker promotion, but I chose to pass on it.
I stole time from team members.
Code reviews were standard practice on every team I worked on in the past 20 years. Reviewing even small changes requires time. By requesting reviews of unneeded changes, I demanded that my team members spend time on trivialities.
Changing any code can lead to merge conflicts and disrupt other developers’ work. Sometimes, it is unavoidable. But it is not fun when changes no one needs cause conflicts.
I occasionally introduced issues.
A few times, my gold plating resulted in bugs. I missed an edge case in the new code, and somehow, tests didn’t catch it. The bug would break the build or make it to production. Having to justify fixing issues introduced by unnecessary changes is always embarrassing.
Not all refactoring is gold plating.
I am not trying to convince anyone that refactoring, in general, is a waste of time. In most cases, it’s the opposite. Refactoring often aims to simplify implementing future changes, remove duplication, or make code more understandable. Sometimes, especially when deadlines loom, developers (myself included) take shortcuts or introduce hacks that are a ticking time bomb. Removing them is the right thing to do and substantially improves code quality. These kinds of refactoring are not gold plating. Gold plating is about changes nobody would notice if we never made them.
Now that you know what gold plating is, whenever you decide to refactor some code, you should ask yourself: “Is it a real improvement, or am I just gold plating?”
You wouldn’t hire a software engineer who cannot navigate code. Yet, I turned out to be one after I joined Microsoft and explored my new team’s codebase. What I saw shocked me.
Before Microsoft, I worked in a small start-up, and our projects didn’t exceed tens of thousands of lines of code. We could open, edit, and compile these projects directly in the IDE (Integrated Development Environment). My new team’s codebase had a few hundred thousand lines written in several programming languages. It was about 15 years old and used pretty much every technology Microsoft had invented in those years. Compiling it successfully was impossible without setting dozens of environment variables and using magic command line incantations. No single IDE could handle it. It took me a few weeks before I began to feel comfortable with this codebase and all the tools I had to use for development.
This was almost twenty years ago, and since then, I have worked in several other big codebases, including .NET Framework, Visual Studio, ASP.NET Core, Amazon’s codebase, and Meta’s (Facebook’s) monorepo. Even though all these codebases were different, they had many similar challenges, most of which could be overcome using similar tactics.
Trying to understand all code is futile
A single person cannot deeply understand a codebase that has a few hundred thousand lines. But this is not the only challenge. Large codebases are not static. They often receive hundreds of contributions each day, so they evolve rapidly.
Code search
It’s hard to be productive if you can’t search code. But it gets exponentially harder if you can’t even find the repo. And this was my experience during my years at Microsoft.
At that time, each team managed its codebase and source control individually, but there wasn’t any tool to find these repositories. The internal search returned an incomplete list of often-outdated wikis. The easiest way to find code was to first find the team responsible for it and then get all the details from them.
(Around the time I was leaving Microsoft, it implemented its new engineering system, 1ES (One Engineering System), which I am sure brought significant improvements.)
Searching large codebases on a dev machine may not be an option. Cloning the entire codebase to a dev box may not be feasible, especially if the codebase consists of thousands of federated repos, like Amazon’s. Even if cloning is possible, tools such as grep are often too slow. This is why most big codebases have dedicated tools that make searching the code fast. Many of them also support following references, which is extremely helpful.
One factor that tremendously simplifies searching the code is formatting. If coding style is not enforced, finding anything is almost impossible. Searching a uniformly formatted codebase is much easier. This is why implementing a tool that enforces coding style is a good investment.
Build system complexity
Understanding the build system is key to being productive when working with big codebases.
Big codebases tend to have extremely complex build systems, often consisting of custom scripts, one-off tools, and specialized extensions stitched together to do the job. Off-the-shelf developer tools (e.g., IDEs) can rarely handle this complexity. Developers may struggle for days when they encounter a build system issue.
Many big companies have built their own tools to rein in this complexity and make it easier and faster for developers to work on large, multi-language codebases. Meta has Buck, Amazon has Brazil, and Google has Bazel. But in my experience, especially with Brazil, these tools also have some rough edges, so understanding how they work can go a long way.
The development environment is constantly in flux
Due to the number of engineers working in large codebases, even small productivity improvements can yield savings measured in engineering years. Maintainers work all the time to identify and fix bottlenecks. Because of this, the developer environment changes constantly, and the transitions are often not smooth, ironically resulting in lost productivity.
In 2019, Facebook decided to move away from Nuclide as its main IDE and migrate to VS Code. As a fan and an early adopter of VS Code (I even created an extension, and that was back in 2015!), I welcomed this change. But the ride was bumpy. The command I used the most (a few times per hour) during the first year was Developer: Reload Window. I had to use Vim or go back to Nuclide multiple times because VS Code stopped working. The early versions were bare-bones – it took more than two years to bring all the features Nuclide offered to VS Code.
(To clarify, the tooling team did an awesome job. It supported both IDEs during the migration and put immense effort into making this migration successful. And it paid off—today, our VS Code is very stable, constantly gets new features, and is a pleasure to work with.)
Slow builds
Compiling large codebases takes time. Fortunately, you never need to do it yourself. In most cases, you only need to build and integrate with your product the sub-project you modified. However, even these steps can take considerable time despite the miracles that build engineers perform.
Legacy code
The codebases of many successful products that have been around for decades (e.g., Microsoft Windows) are big. They grow organically over the years thanks to the contributions of hundreds or thousands of developers who merge code daily. New releases are developed by expanding previous releases. Consequently, large codebases accumulate a lot of legacy code that almost no one is familiar with. I am sure some of the code I considered legacy when I joined Microsoft twenty years ago is still around because the product I worked on is still on the market.