How to improve your coding skills (without spending a lot of time)

Every developer wants to get better at coding. But they sometimes don’t know how and feel stuck. The good news is that there are a few simple ways to improve coding skills that many developers neglect. Most can be done as part of a regular job, so they don’t require additional time. And doing them consistently can make you a better coder.

Code reviews

Code reviews are the easiest way to learn from others. Unfortunately, many developers treat code reviews as a chore. They want to spend as little time as possible on them. This way, they lose great learning opportunities.

What makes code reviews special is their interactivity. They let you ask the author about their choices and discuss the alternatives they considered. Other participants often leave interesting comments or propose novel alternative solutions.

Reading other developers’ code

I learned a lot about programming by reading code written by other developers. I often want to know how a feature or library I use works. Usually, the fastest way to get this information is by inspecting its code.

For instance, TypeScript supported some features, e.g., async/await, before they were added to JavaScript. I was curious how it was possible. So, I wrote short TypeScript snippets and checked how they were transpiled.

Your company’s codebase is another great resource to learn from. When I get stuck, I often search my company repository to check how other developers solved a similar problem. Our repo is big, so if I can’t find anything useful, I am almost sure what I am trying to do is questionable.

I use the same strategy for my side projects but search GitHub.

If you find reading other developers’ code challenging (I sure did), take a look at this post.

Debugging

Stepping through the code, analyzing the stack trace, and inspecting variables will help you understand important nuances and easy-to-miss details much more deeply. I often fire up a debugger if I can’t answer all my questions after reading the code.

“Borrowing” code

Let’s be honest. Not all code needs to be written from scratch. Sometimes, we just need boilerplate. But sometimes, we don’t know how to solve a problem. In these cases, copying and adapting code is often faster (and easier). It could be code you wrote in the past or someone else’s code, e.g., copied from Stack Overflow. (I have yet to find a software developer who has never copied code from Stack Overflow.) AI-powered programming tools are built on this idea. Tools like GitHub Copilot require you to constantly vet and adapt the code they generate. Here is the thing, though. You’ll learn nothing if you don’t try to understand why the code you copied works, or if you can’t correctly adapt it.

Programming contests

Advent of Code taught me a lot. It is a light programming contest that takes place every December and consists of a series of small programming puzzles that can be solved in any programming language. I find it an excellent way to keep my coding skills sharp.

Solving Advent of Code problems is a good exercise, but examining other participants’ solutions is where the real learnings are. And quite frankly, it can be a humbling experience. The different ways and techniques the participants use to solve the problems can be astonishing. I remember being proud of my ultra-short, 30-line-long solution, only to see someone else solve the same problem in the same programming language with just two lines of code because they used a clever idea.

What is gold plating and why you should avoid it

I didn’t know what “gold plating” was until a senior engineer called it out on one of my code changes and recommended that I “stop gold plating.” I was clueless about what he meant, so I went to talk to him. This meeting ended up being a memorable lesson in my software engineering journey.

Wikipedia defines gold plating as: “the phenomenon of working on a project or task past the point of diminishing returns.” While the article talks about gold plating in the context of project management, the same phenomenon occurs in software development under a more familiar name: unnecessary refactoring.

The feedback I got from the senior engineer was that he noticed a pattern where I continued working on code that was already finished. I was polishing tests without covering new scenarios, changing perfectly fine variable or method names, or making functions slightly shorter.

I felt offended – I was making the code better!

How could a senior software engineer not see it?

How could they be against improving code?

So, he asked me to explain how my changes improved the code. I couldn’t. Only then did I realize he was right. I had to admit the new code was a bit different, but it wasn’t objectively better.

But then he went further and asked me how my changes impacted the team. I got confused: why would these small changes affect the team? It turned out they could, and in a few ways.

I didn’t use my time effectively.

I spent time working on unimportant changes instead of taking on work that mattered. As a result, someone else had to pick up tasks I could have worked on. Had I taken them, we could have fixed more bugs, implemented more features, or shipped faster. I also hurt myself – important work is usually a good learning opportunity and can lead to a quicker promotion, but I chose to pass on it.

I stole time from team members.

Code reviews were standard practice on every team I worked on in the past 20 years. Reviewing even small changes requires time. By requesting reviews of unneeded changes, I demanded that my team members spend time on trivialities.

Changing any code can lead to merge conflicts and disrupt other developers’ work. Sometimes, it is unavoidable. But it is not fun when changes no one needs cause conflicts.

I occasionally introduced issues.

A few times, my gold plating resulted in bugs. I missed an edge case in the new code, and somehow, tests didn’t catch it. The bug would break the build or make it to production. Having to justify fixing issues introduced by unnecessary changes is always embarrassing.

Not all refactoring is gold plating.

I am not trying to convince anyone that refactoring, in general, is a waste of time. In most cases, it’s the opposite. Refactoring code often aims to simplify implementing future changes, remove duplication, or make code more understandable. Sometimes, especially when deadlines loom, developers (myself included) take shortcuts or introduce hacks that are ticking time bombs. Removing them is the right thing to do and substantially improves code quality. These kinds of refactoring are not gold plating. Gold plating is about changes nobody would miss if they were never made.

Now that you know what gold plating is, whenever you decide to refactor some code, you should ask yourself: “Is it a real improvement, or am I just gold plating?”

How to effectively work in big codebases

You wouldn’t hire a software engineer who cannot navigate code. Yet, I turned out to be one after I joined Microsoft and explored my new team’s codebase. What I saw shocked me.

Before Microsoft, I worked in a small start-up, and our projects didn’t exceed tens of thousands of lines of code. We could open, edit, and compile these projects directly in the IDE (Integrated Development Environment). My new team’s codebase had a few hundred thousand lines written in several programming languages. It was about 15 years old and used pretty much every technology Microsoft had invented in those years. Compiling it successfully was impossible without setting dozens of environment variables and using magic command line incantations. No single IDE could handle this. It took me a few weeks before I began to feel comfortable with this codebase and all the tools I had to use for development.

This was almost twenty years ago, and since then, I have worked in several other big codebases, including .NET Framework, Visual Studio, ASP.NET Core, Amazon’s codebase, and Meta’s (Facebook’s) mono repo. Even though all these codebases were different, they had many similar challenges, most of which could be overcome using similar tactics.

Trying to understand all code is futile

A single person cannot deeply understand a codebase that has a few hundred thousand lines. But this is not the only challenge. Large codebases are not static. They often receive hundreds of contributions each day, so they evolve rapidly.

On the bright side, understanding all the code is not necessary. Rather, it is better to have a very good understanding of the area your team works on and a decent knowledge of the areas your code interacts with.

Code searching

It’s hard to be productive if you can’t search code. But it gets exponentially harder if you can’t even find the repo. And this was my experience during my years at Microsoft.

At that time, each team managed its codebase and source control individually, but there wasn’t any tool to find these repositories. The internal search returned an incomplete list of often outdated wikis. The easiest way to find code was to first find the team responsible for it and then get all the details from them.

(Around the time I was leaving Microsoft, it implemented its new engineering system, 1ES (One Engineering System), which I am sure brought significant improvements.)

Searching large codebases on a dev machine may not be an option. Cloning the entire codebase to a dev box may not be feasible, especially if the codebase consists of thousands of federated repos, like Amazon’s. Even if cloning is possible, tools such as grep are often too slow. This is why most big codebases have dedicated tools that make searching the code fast. Many of them also support following references, which is extremely helpful.
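To illustrate why a dedicated search tool can be so much faster than scanning files with grep, here is a minimal Python sketch of an inverted trigram index. It only illustrates the general indexing idea and does not describe how any particular company's tool works. The file paths and contents are made up for the example.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Map each trigram to the set of file paths that contain it.

    `files` is a hypothetical in-memory {path: content} mapping that
    stands in for a real repository.
    """
    index = defaultdict(set)
    for path, content in files.items():
        for gram in trigrams(content):
            index[gram].add(path)
    return index


def search(index, files, query):
    """Find files containing `query` without scanning every file."""
    grams = trigrams(query)
    if not grams:
        # Queries shorter than 3 characters cannot use the index: scan all.
        candidates = set(files)
    else:
        # Only files that contain every trigram of the query can match.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Confirm candidates, because sharing trigrams does not guarantee a match.
    return sorted(path for path in candidates if query in files[path])


# Made-up repository contents, for illustration only.
files = {
    "billing/invoice.py": "def compute_total(items): ...",
    "billing/tax.py": "def compute_tax(total): ...",
    "web/views.py": "def render(request): ...",
}
index = build_index(files)
print(search(index, files, "compute_t"))
# prints ['billing/invoice.py', 'billing/tax.py']
```

The index is built once and reused, so each query touches only the few files that share all of its trigrams instead of reading the whole codebase again.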

One factor that tremendously simplifies searching the code is formatting. If coding style is not enforced, finding anything is almost impossible. Searching a uniformly formatted codebase is much easier. This is why implementing a tool that enforces coding style is a good investment.
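A small Python sketch of the formatting problem: when style is not enforced, a plain-text search misses call sites that differ only in whitespace, and you have to fall back on a more forgiving pattern. The function name and the sample lines are invented for the example.

```python
import re


def literal_search(lines, needle):
    """Return the lines containing `needle` exactly as typed."""
    return [line for line in lines if needle in line]


def tolerant_search(lines, name):
    """Find calls to `name`, allowing any whitespace before the parenthesis."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\s*\(")
    return [line for line in lines if pattern.search(line)]


# The same call written in three styles, as in a codebase without a formatter.
unformatted = [
    "total = compute_total(items)",
    "total = compute_total (items)",
    "total = compute_total\t( items )",
]

print(len(literal_search(unformatted, "compute_total(")))  # prints 1
print(len(tolerant_search(unformatted, "compute_total")))  # prints 3
```

With an enforced style, all three lines would look like the first one, and the simple literal search would find every call.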

Build system complexity

Understanding the build system is key to being productive when working with big codebases.

Big codebases tend to have extremely complex build systems, often consisting of custom scripts, one-off tools, and specialized extensions stitched together to do the job. Off-the-shelf developer tools (e.g., IDEs) rarely can handle this complexity. Developers may struggle for days when they encounter a build system issue.

Many big companies have built their own tools to rein in this complexity and make it easier and faster for developers to work on large, multi-language codebases. Meta has Buck, Amazon has Brazil, and Google has Bazel. But from my experience, especially with Brazil, these tools also have some rough edges, so understanding how they work can go a long way.

The development environment is constantly in flux

Due to the number of engineers working in large codebases, even small productivity improvements can yield savings measured in engineering years. Maintainers work all the time to identify and fix bottlenecks. Because of this, the developer environment changes constantly, and the transitions are often not smooth, ironically resulting in lost productivity.

In 2019, Facebook decided to move away from Nuclide as its main IDE and migrate to VS Code. As a fan and an early adopter of VS Code (I even created an extension, and it was only in 2015!), I welcomed this change. But the ride was bumpy. The command I used the most (a few times per hour) during the first year was: Developer: Reload Window. I had to use Vim or go back to Nuclide multiple times because VS Code stopped working. The early versions were bare – it took more than two years to bring all the features Nuclide offered to VS Code.

(To clarify, the tooling team did an awesome job. It supported both IDEs during the migration and put immense effort into making this migration successful. And it paid off—today, our VS Code is very stable, constantly gets new features, and is a pleasure to work with.)

Slow builds

Compiling large codebases takes time. Fortunately, you rarely need to compile everything yourself. In most cases, you only need to build the sub-project you modified and integrate it with your product. However, even these steps can take considerable time despite the miracles that build engineers perform.
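The idea of building only what a change affects can be sketched as a walk over a reverse dependency graph. This is a simplified Python illustration, not how any specific build tool works. The target names and the `deps` mapping are hypothetical, and real build systems track far more than this.

```python
from collections import deque


def affected_targets(deps, changed):
    """Return every target that must be rebuilt when `changed` is modified.

    `deps` maps a target to the targets it depends on directly.
    """
    # Invert the graph: for each target, record who depends on it.
    dependents = {}
    for target, requirements in deps.items():
        for requirement in requirements:
            dependents.setdefault(requirement, set()).add(target)

    # Breadth-first walk from the changed target through its dependents.
    affected = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical dependency graph of a small product.
deps = {
    "app": ["ui", "billing"],
    "ui": ["widgets"],
    "billing": ["core"],
    "widgets": ["core"],
    "reports": ["billing"],
    "core": [],
}

print(sorted(affected_targets(deps, "widgets")))
# prints ['app', 'ui', 'widgets']
```

Changing `widgets` leaves `billing`, `reports`, and `core` untouched, so half of this toy product never needs rebuilding. Changing `core`, on the other hand, affects every target, which is one reason even incremental builds can be slow.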

Legacy code

The codebases of many successful products that have been around for decades (e.g., Microsoft Windows) are big. They grow organically over the years thanks to the contributions of hundreds or thousands of developers who merge code daily. New releases are developed by expanding previous releases. Consequently, large codebases accumulate a lot of legacy code that almost no one is familiar with. I am sure some of the code I considered legacy when I joined Microsoft twenty years ago is still around because the product I worked on is still on the market.

On-call Manual: Onboarding a new person to the on-call rotation

One (selfish) reason to celebrate a new team member is that they will eventually join the on-call rotation. And when they do, the existing shifts will move farther apart. However, adding an unprepared engineer to the on-call rotation can be a disaster. This post describes what on-call onboarding looks like on our team.

The on-call onboarding process is the same for each new team member. It consists of the following steps:

  1. Regular ramp-up
  2. On-call overview
  3. Shadow shift
  4. Reverse shadow shift
  5. First solo shift

Let’s look into each of these steps in more detail.

Regular ramp-up

The regular ramp-up aims to help new team members familiarize themselves with the problems the team is solving and teach them how to work effectively in the team’s codebase. We want new colleagues to work on the code they will be responsible for when they are on call later. This approach allows them to acquire basic context that will be useful for maintaining this code and troubleshooting issues.

On-call overview

Regular ramp-up is rarely sufficient for new people to grasp the entire infra the team is responsible for. And knowing this infra is just the tip of the iceberg. There is much more an effective on-call needs to be familiar with, for instance:

  • what are the dependencies, and what is the impact of their failures
  • how to find dashboards and use them for debugging
  • where to find the documentation (e.g., runbooks)
  • expectations, e.g., is the on-call responsible for alerts raised outside working hours
  • how to do deployments and rollbacks
  • tools used to troubleshoot and fix issues
  • standard operating procedures
  • and more

On our team, we organize knowledge-sharing sessions that give new team members an overview of all these areas. We record these sessions to make revisiting unclear topics easy.

Shadow on-call shift

During the shadow on-call shift, the on-call-in-training (a.k.a. secondary on-call) shadows an experienced on-call (a.k.a. primary on-call). Both on-calls are subscribed to all tasks and alerts, but resolving issues is the primary on-call’s responsibility. The primary on-call is expected to show the secondary on-call how to deal with outages. This is usually limited to problems occurring during working hours. Finally, the primary on-call can ask the secondary on-call to handle non-critical tasks, providing guidance as needed.

Reverse shadow on-call shift

After the shadow shift, things get real: the on-call in training becomes the primary on-call. They are now responsible for handling all alerts, tasks, deployments, etc. However, they are not alone—they have an experienced on-call having their back during the entire shift.

We schedule shadow and reverse shadow shifts back-to-back. This way, everything the on-call-in-training learned during the first shift is fresh when they become the primary on-call.

First solo shift

Once shadowing is complete, we add the new team member to the on-call rotation. We add them to the queue’s end, giving them additional time to learn more about our systems and the infrastructure.
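Adding a new engineer to the end of the queue can be sketched in a few lines of Python. This is only an illustration under simple assumptions: one-week shifts, a plain round-robin order, and made-up names. It is not a description of any particular scheduling tool.

```python
from collections import deque
from datetime import date, timedelta


def schedule(rotation, start, weeks):
    """Assign consecutive one-week shifts by cycling through `rotation`."""
    queue = deque(rotation)
    shifts = []
    for week in range(weeks):
        shifts.append((start + timedelta(weeks=week), queue[0]))
        queue.rotate(-1)  # the engineer who just served moves to the back
    return shifts


# Hypothetical team; "Dee" is the newly onboarded engineer.
rotation = ["Ana", "Ben", "Chen"]
rotation.append("Dee")  # new team members join at the end of the queue

for shift_start, engineer in schedule(rotation, date(2024, 1, 1), 5):
    print(shift_start, engineer)
# prints:
# 2024-01-01 Ana
# 2024-01-08 Ben
# 2024-01-15 Chen
# 2024-01-22 Dee
# 2024-01-29 Ana
```

In this example the new engineer's first solo shift comes three weeks after joining the rotation, and everyone else's shifts are now four weeks apart instead of three.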

In addition to training new on-calls, our team maintains a chat to discuss on-call problems and get help when resolving issues. Both new and experienced on-calls regularly use this chat when they are stuck because they know someone will be there to help them.

On-call Manual: Boost your career by improving your team’s on-call

I have yet to find a team maintaining critical systems that is happy with its on-call. Most engineers dread their on-call shifts and want to forget about on-call as soon as their shift ends. For some, hectic on-call shifts are the reason to leave the team or even the company.

But this is great news for you. All these factors make improving on-call a great career opportunity. Here are a few reasons:

  • Team-wide impact. Making the on-call better increases work satisfaction for everyone on the team.
  • Finding work is easy. No on-call is perfect. There’s always something to fix.
  • No competition. Most engineers consider work related to on-call uninteresting, so you can fully own the entire area. As a result, your scope might be bigger than any other development work you own.

Getting started

It is difficult to propose meaningful improvements to your team’s on-call before your first shift. You need to become familiar with your team’s on-call responsibilities and problems before trying to make it better.

Once you have a few shifts under your belt, you should know the most problematic areas. Come up with a few concrete actions to remedy the biggest issues. This list doesn’t have to be complete to get started. Some examples include tuning (or deleting) the noisiest alerts, refactoring fragile code, or automating time-consuming manual tasks.

Talk to your manager about the improvements you want to make. No manager who cares about their team would refuse the offer to improve the team’s on-call. If the timing is not right (e.g., your team is closing a big release), ask your manager when a better time would be. Mention that you may need their help to ensure the participation of all team members.

Set your expectations right. Despite the improvements, don’t expect your team members to suddenly start loving their on-call. It’s a win if they stop dreading it.

Execution

From my experience, the two most effective ways to improve the on-call are regular fixathons (e.g., twice a year) and ongoing maintenance.

During a fixathon, the entire team spends a few days fixing the biggest on-call issues. In most cases, these will be issues that started occurring since the previous fixathon but weren’t taken care of by on-calls during their shifts. You may need to work closely with your manager to ensure the entire team’s participation, especially at the beginning.

Ongoing maintenance involves fixing problems as they arise, usually done by the person on call. As some shifts are heavier than others, the on-call may not always be able to address all issues.

Your role

Before talking about what your role is, let’s talk about what your role isn’t.

Your role isn’t to single-handedly fix all on-call issues.

This approach doesn’t scale. If you try it, you will eventually burn out, struggling to do two full-time jobs simultaneously: your regular responsibilities and fixing on-call issues. The worst part is that your team members won’t feel responsible for maintaining the on-call quality. They might even care less because now somebody is fixing issues for them.

While you should still participate in fixing on-call issues, your main role is to:

  • organize fixathons – identify the most pressing issues and distribute issues for the team to work on, track progress, and measure the improvement
  • ensure on-calls are addressing issues they encountered during their shifts
  • build tools – e.g., dashboards to monitor the quality of the on-call or queries that make it possible to identify the biggest problems quickly

If you do this consistently, your team members will eventually find fixing on-call issues natural.
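A query that surfaces the biggest problems, like the ones mentioned in the list above, does not have to be sophisticated. Here is a minimal Python sketch that assumes you can export alert history as a list of records. The field names (`shift`, `alert`) and the sample data are hypothetical, so adapt them to whatever your alerting system provides.

```python
from collections import Counter


def noisiest_alerts(alert_log, top=3):
    """Return the `top` most frequently fired alerts as (name, count) pairs."""
    counts = Counter(entry["alert"] for entry in alert_log)
    return counts.most_common(top)


def alerts_per_shift(alert_log):
    """Count alerts per shift, a rough signal of how heavy each shift was."""
    return dict(Counter(entry["shift"] for entry in alert_log))


# Made-up alert history covering two weekly shifts.
alert_log = [
    {"shift": "2024-W01", "alert": "disk_usage_high"},
    {"shift": "2024-W01", "alert": "disk_usage_high"},
    {"shift": "2024-W01", "alert": "queue_latency"},
    {"shift": "2024-W02", "alert": "disk_usage_high"},
    {"shift": "2024-W02", "alert": "cert_expiry"},
    {"shift": "2024-W02", "alert": "queue_latency"},
]

print(noisiest_alerts(alert_log, top=2))
# prints [('disk_usage_high', 3), ('queue_latency', 2)]
print(alerts_per_shift(alert_log))
# prints {'2024-W01': 3, '2024-W02': 3}
```

The first function gives you candidates for tuning or deletion before a fixathon. Comparing the per-shift counts before and after is one simple way to measure the improvement.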

Skills you will learn

Driving on-call improvements will help you hone a few skills that are key for successful senior and even staff engineers:

  • leading without authority – as the owner of the on-call improvement area, you’re responsible for coming up with the plan and leading its execution
  • scaling through others – because you involve the entire team, you can get much more done than if you did it yourself
  • influencing the engineering culture of the team – ingraining a sense of responsibility for the on-call quality in team members is an impactful change
  • holding people accountable – making sure everyone does their part is always a challenge
  • identifying problems worth solving – instead of being told what problems to solve, you are responsible for finding these problems and deciding if they are worth solving

Expanding your scope

Once you start seeing the results of your work, you can take it further to expand your scope.

You can become the engineer who manages the on-call rotation for your team. This work doesn’t take a lot of time but can save a lot of headaches for your manager. The typical responsibilities include:

  • managing the on-call schedule
  • organizing onboarding new team members to the on-call rotation
  • helping figure out shift swaps and substitutions

Another way to increase your scope is to share your experience with other teams. You can organize talks showing what you did, the results you achieved, and what worked and what didn’t. You can also generalize the tools you built so that other teams can use them.