Want your productivity to skyrocket? Avoid this trap!

As a junior engineer, I felt the urge to jump on each new project that showed on the horizon. I never checked what I already had on my plate. Inevitably, I ended up with too many projects to work on simultaneously. Whenever my manager or a customer mentioned one of my projects, I immediately switched to it to show I was making progress. It took me years to realize that while this approach pleased the customer or the manager (and saved my junior dev butt) for the moment, it quietly hurt everyone.

The three-line execution graph

Executing multiple projects of the same priority at the same time looks like this:

Initially, there are two projects: Project A and Project B. You start working on Project A, but after a while, you receive a call from the customer inquiring about the progress of Project B. To make this customer happy, you switch to project B. In the meantime, a new interesting project, C, pops up. It is cool and seems small, so you pick it up. Your manager realizes that project A is dragging and asks about it. You somehow manage to finish your toy project C and move to A, which is well past the deadline. Then you pick B again.

If you didn’t jump from project to project, the execution could look like this:

If you compare these graphs, three things stand out in the second scenario:

Overall, the execution took less time. Resuming a paused project requires time to remember where the project was left off and get in the groove, i.e., to switch the context, which , which is time-consuming.
Projects A and B finished much quicker than in the first case. While you didn’t make customer B feel good by saying you were working on their project, ultimately, the project was completed sooner. In fact, both projects, A and B, were finished much sooner than they would have if you bounced between them.
Project C came in late, so it should wait unless it is a much higher priority than other projects. Otherwise, it disrupts the execution of these projects.

I know that life is not that simple. Completely avoiding context switching is rarely possible. But if you can limit it, your productivity will dramatically increase.

Does it mean you should only work on one thing at a time?

In the past I thought a good solution for junior engineers to combat context switching was to ask them to work on just one project at a time. But this idea has a serious drawback – projects often get stuck due to factors outside our control. If this happens and there is no backup, idling until the issue gets resolved is a waste of time.

Having two projects with different priorities works best. You execute on the higher-priority project whenever you can. If you can’t, you turn to the other project until the main project gets unblocked.

What I like about this approach is that it is always clear what to work on: the higher priority project wins unless working on it is impossible.

Falling back to the lower priority project means there might be some context switching. While it is not ideal, it is better than idly waiting until the issues blocking the main project are resolved.

But my TL (Tech Lead) always works on five projects!

Indeed, experienced senior and staff engineers often work on a few projects at the same time. In my experience, however, it is a different kind of work. It might be preparing a high-level design, working on an alignment with a partner team, breaking projects into smaller tasks, and tracking the progress.

The secret is that most of these activities don’t require as much focus as coding. Handling a few of them at the same time is much more manageable because the cost of switching contexts is much lower.

A simple way to ship maintainable software

This was my first solo on-call shift on my new team. I was almost ready to go home when a Critical alert fired. I acknowledged it almost instantly and started troubleshooting. But this was not going well. Wherever I turned, I hit a roadblock. The alert runbook was empty. The dashboards didn’t work. And I couldn’t see any logs because logging was disabled.

Some team members were still around, and I turned to them for help. I learned that the impacted service shipped merely a week before, and barely anyone knew how it worked. The person who wrote and shipped it was on sick leave.

It took us a few hours to figure out what was happening and to mitigate the outage. This work made one thing apparent – this service was not ready for the prime time.

In the week following the incident, we filled the gaps we had found during the outage. Our main goal was to ensure that future on-calls wouldn’t have to scramble when encountering issues with this service.

But the bigger question left unanswered was: how can we avoid similar issues with any new service or feature we will ship in the future?

The idea we came up with was the Service Readiness Checklist.

What is the Service Readiness Checklist?

The Readiness Checklist is a checklist that contains requirements each service (or a bigger feature) needs to meet to be considered ready to ship. It serves two purposes:

to guarantee that none of the aspects related to operating the service have been forgotten
to make it clear who is responsible for ensuring that requirements have been reviewed and met

When we are close to shipping, we create a task that contains a copy of the readiness checklist and assign it to the engineer driving the project. They become responsible for ensuring all requirements on the checklist.

Having one engineer responsible for the checklist helps avoid situations where some requirements fall through the cracks because everyone thought someone else was taking care of them. The primary job of this engineer is to ensure all checkboxes are checked. They may do the work themselves if they choose to or assign items to people involved in the project and coordinate the work.

Occasionally, the checklist owner may decide that some requirements are inapplicable. For example, the checklist may call for setting up deployment, but there is nothing to do if the existing deployment infrastructure automatically covers it.

The checklist will usually contain more than ten requirements. They are all obvious, but it is easy to miss some just because of how many there are.

Example readiness checklist

There is no single readiness checklist that would work for every team because each team operates differently. They all follow different processes and have their own ways of running their code and detecting and troubleshooting outages. There is, however, a common subset of requirements that can be a starting point for a team-specific readiness checklist:

[ ] Has the service/feature been introduced to the on-call?
[ ] Has sufficient documentation been created for the service? Does it contain information about dependencies, including the on-calls who own them?
[ ] Does the service have working dashboards?
[ ] Have alerts been created and tested?
[ ] Does the service/feature have runbooks (a.k.a. playbooks)?
[ ] Has the service been load tested?
[ ] Is logging for the service/feature enabled at the appropriate level?
[ ] Is automated deployment configured?
[ ] Does the service/feature have sufficient test coverage?
[ ] Has a rollout plan been developed?

Success story

Our team was tasked to solve a relatively big problem on tight timelines. The solution required building a pipeline of a few services. Because we didn’t have enough people to implement this infrastructure within the allotted amount of time, we asked for help. Soon after, a few engineers temporarily joined our team. We were worried, however, that this partnership may not work out because of the differences in our engineering cultures. The Service Readiness Checklist was one of the things (others included coding guidelines, interface-based programming, etc.) that helped set clear expectations. With both teams on the same page, the collaboration was smooth, and we shipped the project on time.

The 3 categories of skills every software developer needs to know

100% of software engineers who don’t keep their skills sharp become obsolete[*]. Many who do keep their skills sharp also become obsolete. It all boils down to what skills they focus on.

I divide skills into three main categories: company-specific, job-specific, and universal. This categorization makes it easy for me to decide where to invest my time when it comes to skill development.

Company-specific skills

Each company has internal infrastructure, processes, and tools. Knowing them before joining the company is impossible, but everyone must learn them after joining.

Being strategic about what to focus on can save you a lot of time. Company-specific knowledge and skills are, by definition, not transferable. You need to learn them to do your job, but they become useless the moment you move on.

My strategy is to have a solid understanding of company-specific tools and processes but not dive too deeply unless I have to. Being in the dark will slow me down, but drilling into all the details is not much better. Most of these things constantly change, and I will quickly forget what I don’t use. Instead, my time is better spent developing other, more general skills.

When I worked at Microsoft, I had a colleague named Ben. Ben was the most knowledgeable person I knew when it came to the .NET Framework build system. Although he was not on the build team, he knew all the scripts, hacks, and environment variables used in this build system. Acquiring and keeping this knowledge fresh ate much of Ben’s time. Having Ben around was great for the team. We could (and constantly did) ping him for help with build-related issues. Eventually, Ben moved to a different team outside our organization. His expertise evaporated instantly, even though he didn’t go far.

Job-specific skills

Job-specific skills are usually the skills that get us hired. Companies look for people who can hit the ground running, and having skills in demand increases the chances of getting hired substantially.

Job-specific skills are transferable. You learned them before getting hired and may use them in your next job. There is a caveat, though. It is only true if you keep your skills up-to-date. Knowing React and knowing React-as-of-four-years-ago is not the same thing.

I learned that trying to keep skills sharp just by using them at work doesn’t always work. Companies rarely move as fast as technology. For instance, the product you work on might have been on the bleeding edge a few years ago. But it got stuck there because there was never a good enough business reason to migrate it to the newest framework version. You still need to maintain it but can’t use the latest features.

Another thing to pay attention to is the signs that the technology you specialize in is becoming obsolete. Some technologies are more durable, but some can fade fast. You don’t want to wake up one day only to realize that everyone except you has moved on.

CoffeeScript was one of the most followed projects on GitHub in the early to mid-2010s. It was incredibly popular, and there was a lot of hype around it. Today, hardly anyone remembers it.

Even if you got hired for your job-specific skills, it doesn’t mean you can’t learn new skills on the job. If you want to develop a new skill, you may consider joining a new project or moving to a different team. It isn’t always easy because you don’t have the skills they need, but it is doable – especially if you are known as someone who learns fast.

And the biggest secret: not only do you develop a new skill that may land you your next job, but you are also paid to do this.

Universal skills

Finally, there are the universal skills. These are skills that never become obsolete. You can use them at your current job, at your next job, or for non-work-related purposes. You can also apply them instantly.

These are skills like writing, effective time management, delivering great presentations, etc.

The biggest problem with universal skills is that developing them never seems urgent. As a software developer, you will not lose your job because you write lengthy emails. But eventually, you may hit a glass ceiling and realize that what inhibits you is not coding but the lack of these universal skills.

These 5 habits will make you a great code reviewer

High-quality code reviews are hard.

They are time-consuming and require significant mental effort.

Code reviewing is also not taught at school, and figuring it out on your own is hard work. In this post, I share five crucial habits all great code reviewers I know have in common.

Make the code review about the code

Code reviews are about the code, not the person who wrote the code. If you think the proposed change is incorrect or have suggestions to improve it, then, by all means, leave your comments (just remember to make them professional and high-quality). However, leave out comments that don’t relate to the code under review.

Understand the code and ask questions when in doubt

Understanding the code under review is the foundation of a solid code review.

It is also the most difficult part.

Whenever you have doubts about proposed changes, you should ask the author for clarification. You might not be aware of an assumption the author is making or need help understanding how their changes fit in. However, the difficulty in grasping the changes is also often a sign of a mistake.

Asking a question will prompt the author to explain their thought process. As a result, they will either answer the question and clear up your doubts or realize that something is indeed wrong and needs to be fixed.

Be clear about your expectations

Code reviews can generate a wide variety of comments. Some are nits that you would like to see fixed but are not real issues. Some, however, identify serious problems that must be corrected before merging. If you leave a comment, make sure that the author understands which category it falls into. Doing this will save you and the PR author time and frustration.

Sometimes, you may take on a PR that is outside of your area of expertise. You may realize it only after you already left some comments. You are now in a weird situation: the author expects you to finish the review and approve the PR, but you don’t feel confident you can. If this happens, instead of accepting the change you don’t understand, it is better to leave a comment recommending the author get a review from someone more familiar with the code they are changing.

Side note: reviewing code that is outside of your area of expertise is a good thing. Even if you don’t understand the change enough to approve it, you can still provide useful feedback, identify bugs, and learn something. Just make sure the author does not expect to get your approval.

Cross-check with the existing code

Code reviews show only code that changed. Most of the time, it is sufficient. But sometimes, to understand the change fully, you need to check how the new code works with the code not included in the review because it hasn’t changed.

This idea may be obvious to most developers, but I was surprised to meet some who had never considered checking the existing code when reviewing PRs. In my experience, the biggest surprises are caused by not what’s included in the PR but by what’s missing.

Occasionally, examining the existing code may not give you all the answers. The ultimate weapon for these situations is to use a debugger. You can check out the PR branch and step through the code. It should resolve all your doubts. I resort to a debugger very rarely. It’s almost always easier and faster to ask the author.

Resist the “stamp” pressure

The pressure to merge changes quickly for projects on tight timelines is high. And it only grows as the deadline nears.

During the end game, engineers enter a Pull Request frenzy. Eventually, due to the number of PRs, code reviews often become a bottleneck. So, the engineers try to unblock themselves by asking to “stamp the diff” (i.e., approve changes without looking).

On the one hand it is understandable – no one wants to miss the deadline. On the other hand, these are the times when code changes need even more scrutiny than usual. Due to the time pressure, most PRs are coded very hastily. The changes may not be validated thoroughly (or at all) and the stress only increases the likelihood of mistakes.

While a proper code review takes time, merging code with issues a review could have caught is more costly. At best, fixing the problem will require sending a new PR (which, by the way, will trigger a review). At worst, an embarrassing bug ships to customers.

I remember when one of my teams worked extremely hard to finish a project on time. We were very close, and then one of the team members dropped a 1000-line PR a few hours before the deadline. The manager was trying hard to find someone willing to approve the PR. It wasn’t easy because the PR had a bunch of red flags, like spotty test coverage or many TODO comments, but he eventually succeeded. As soon as our product shipped, we started getting reports from angry customers complaining that important scenarios stopped working. We found that the hastily merged PR was the culprit. The team scrambled to fix the issues, but the damage was done. This one PR cost us the reputation we’d been building for a long time.

On-call Manual: How to cope with on-call anxiety

On-call anxiety is real. I’ve been there, and I know engineers who experienced it. Many factors contribute to it, but from my experience, three stand out.

1. Unpredictability

Unpredictability is the number one reason for on-call anxiety. You might be responsible for a wide range of services. They may break anytime for various reasons like network issues, deployments, failing dependencies, shared infra outages, data center drains (a.k.a. storms), excavators damaging etc. On-calls, especially new ones, worry they won’t know what to do if they get an alert. How would they figure out what broke? How would they come up with a fix?

What to do about it?

With experience, the unpredictability aspect of the on-call gets easier. But even for the most seasoned on-call engineers, handling an outage can be difficult without the proper tools like:

Easy-to-navigate dashboards that allow to tell quickly if a service is working correctly and identify problematic areas in case of failures
Playbooks (a.k.a. runbooks) explaining troubleshooting and mitigation steps
Documentation describing the service and its dependencies, including the relevant on-call rotations to reach out if necessary

Having a team eager to jump in and help mitigate an outage quickly is priceless. Your team members understand some areas better than you. Knowing they have your back is reassuring.

2. Too many alerts and incidents

The second most common reason engineers fear their on-call is a never-ending litany of alerts, requests, and tasks. If you get a new alert when you barely finished acknowledging a previous one and are also expected to handle customer tickets and deal with requests from other teams, fretting your on-call is understandable. The exhaustion is usually exacerbated by the feeling of not doing a decent job. I was on a rotation like this once. After a while, I realized that everyone, not only me, was overwhelmed. Even though we toiled long hours, most alerts were ignored, customer tickets remained answered, and requests from other teams were only handled after they escalated them to the manager.

What to do about it?

There is no way a single person can fix a very heavy on-call by themselves. They won’t have the time during their shift, and by the time the shift ends, they will be so fed up that they won’t want to hear about anything on-call-related. There are, however, a few low-hanging fruits that can help improve the quality of the on-call quickly:

Delete alerts – find routinely ignored alerts and determine if they’re useful. If they aren’t – delete them.
Tune noisy but useful alerts – adjust thresholds and windows for flapping alerts, alerts that fire prematurely, and short-lived alerts.
Get a secondary on-call – a second person could help handle tasks the primary on-call does not have the capacity to deal with (e.g., customer tickets). This could be only temporary.

These ideas can alleviate on-call pain but are unlikely to fix a bad on-call for good. Improving a heavy on-call requires identifying and addressing problems at their source and demands effort from the entire team to maintain on-call quality. I wrote a post dedicated to this topic. Take a look.

3. Middle-of-the-night alerts

Many on-call rotations are 24/7. The on-call is responsible for dealing with incidents promptly, even if they happen in the middle of the night. Waking to an alert is not fun, and if it happens regularly, it is a valid reason to resent being on-call.

What to do about it?

While it may not be possible to avoid all middle-of-the-night alerts, there might be some actions you can take to reduce the disruption. A lot will depend on your specific situation, but here are some ideas:

Check your dashboards in the evening and address any issues that could raise an alert.
Increase alert thresholds outside working hours. If your traffic is cyclical – e.g., you have much lower traffic at night because most requests come from one timezone – you may be able to relax thresholds outside working hours. Even if an incident happens, its impact will be smaller. Also, alerts get much noisier if the traffic volume is low (e.g., if you get ten requests in an hour and one fails, you might get an alert due to a 10% error rate).
Disable alerts at night. Some outages won’t cause any impact unless they last for a long time. For instance, our team was responsible for a service that would work fine even if one of its dependencies was down for a day. This 24-hour grace period allowed us to turn off alerts at night.