Using mental models to think about software

No person can function properly without building mental models to understand the world around them. The reality is simply too complex to deal with as is, so we need to abstract it.

Software is no different. The complexity of software systems has only been growing. As a result, the ability to quickly build mental models and use them to reason about these systems is an important software engineering skill.

Building good mental models is important because they allow us to better communicate with other software engineers, explain our ideas, and understand the impact of our changes on our software.

Building mental models

Reasoning about software systems requires building mental models that represent them. The granularity of these models depends on their purpose. Often, one level of granularity is not sufficient. The most effective software engineers can quickly build a few models and switch between them depending on the situation.

For example, if your team owns a few services, you can draw a lines-and-boxes diagram where boxes represent services and lines represent dependencies. You can then zoom in and build a model for each service. These service-level models could illustrate interactions between the libraries a service consists of. If you want more details, you can create a class diagram. You can continue zooming in and focus on methods, code blocks, statements, etc.
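
As a minimal sketch, the top-level lines-and-boxes model can be written down as a plain dependency map. The service names below are hypothetical, and an edge from A to B is assumed to mean "A depends on B". Even a model this coarse can answer an outage question such as "what breaks if this service goes down?"

```python
from collections import deque

# Hypothetical services; "A": ["B"] means service A depends on service B.
DEPENDENCIES = {
    "web-frontend": ["orders-api", "auth-service"],
    "orders-api": ["orders-db", "auth-service"],
    "auth-service": ["users-db"],
    "orders-db": [],
    "users-db": [],
}


def impacted_by(failed, dependencies):
    """Return every service that depends, directly or transitively, on `failed`."""
    # Invert the edges: for each service, who depends on it?
    dependents = {}
    for service, deps in dependencies.items():
        dependents.setdefault(service, [])
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk up the inverted graph.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return sorted(impacted)


print(impacted_by("users-db", DEPENDENCIES))
# ['auth-service', 'orders-api', 'web-frontend']
```

Zooming in would mean replacing each box with its own map of libraries or classes; the walk stays the same.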

Each of these models describes your system at a different level of granularity and has a unique set of applications. The high-level model could be useful to troubleshoot larger outages (or when talking with your director) but is unlikely to help you fix a small bug. The more detailed models are best suited for solving gnarly issues but won’t be helpful when explaining your infrastructure to other teams.

Building models covering different aspects of the same system is also common. If you want to analyze your system from the security perspective, your model will include different details than when focusing on performance.

Caveats

Mental models are so natural to people that we often forget about their flaws.

All models are wrong

Models, by definition, ignore details. As they only capture certain aspects of reality, they are inaccurate. Furthermore, some relevant information is often omitted because it doesn’t fit the model. Edge cases are an excellent example.

For instance, when software developers explain how their code works, they rarely mention error cases. They focus on their ifs and fors and the program flow but omit exceptions because exceptions make their model murkier. The problem is that error scenarios are important. Incorrect or missing error handling is a common cause of system outages.
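
To make the gap concrete, here is a small, hypothetical example (the function names and the default port are assumptions made for illustration). The first function is the version people tend to describe; the second spells out the error cases the description usually leaves out.

```python
def read_port_happy_path(config):
    # The version described at the whiteboard: look up the value, convert it.
    return int(config["port"])


def read_port(config, default=8080):
    """Same lookup, with the omitted error cases made explicit."""
    raw = config.get("port")
    if raw is None:
        return default  # the key is missing
    try:
        port = int(raw)
    except (TypeError, ValueError):
        # the value is not a number
        raise ValueError(f"port must be an integer, got {raw!r}") from None
    if not 1 <= port <= 65535:
        # the value is a number but not a valid port
        raise ValueError(f"port out of range: {port}")
    return port


print(read_port({"port": "443"}))  # 443
print(read_port({}))               # 8080
```

Three of the four paths through `read_port` are error handling, yet none of them would appear in a typical explanation of "how the code works".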

Models get more wrong with time

The world, including our software systems, is in constant flux. But mental models don’t automatically keep up with changing reality. Outdated mental models lead to misunderstandings and bad decisions.

I experienced this very problem recently. I started working on a feature that depended on a system I had never seen change. Everything was going swimmingly, and I only needed to tie up a few loose ends related to some new data requirements. But when doing this, I discovered, to my dismay, that the system I depended on had recently changed. It had been updated to accommodate the same data requirements I struggled with. Making my feature work now required additional information I didn’t have. Plumbing this data meant revisiting my implementation. Working from an outdated mental model meant I had to implement my feature twice.

No two models are identical

Building mental models requires deciding which details are important depending on the purpose of the model. However, even if the purpose of the model is well understood, different people will consider different information relevant.

Also, mental models built at different times will naturally differ because they capture different system versions.

The interesting fact about this phenomenon is that the overlap between mental models built by different people is usually significant. The differences are often discovered unexpectedly, e.g., when discussing small but important details.

How to quickly ramp up on new codebases

Joining a new team is intimidating. There is always much to learn – new people, new processes, and a new code base.

Ramping up on your new team’s code base is difficult, but knowing how to work effectively in it is crucial to your success. So, how do you do it quickly?

My mid- and early senior developer years were intense. Due to a mix of reorgs and personal interests, I found myself on a new team every year or so. As a result, I had to learn new codebases in quick succession. They included .NET System.Xml, OData, Entity Framework, Entity Framework Designer, ASP.NET SignalR, ASP.NET Core, and the Alexa mobile app, and most of them were over one hundred thousand lines.

The first couple of transitions were slow and overwhelming. But they helped me develop a strategy I used later to quickly get up to speed on new codebases.

Get your hands dirty ASAP

The sooner you start working actively with the code base, the sooner you will get productive. Reading documentation can be helpful, but nothing can replace the hands-on experience.

When I joined Amazon, I asked my manager to assign me a simple bug to fix on my first day. Even though the fix amounted to a single line of code, it took me a few days to send it for review. It might seem long, but this quest was not about fixing the bug. Rather, it was about forcing myself to set up my development environment, learn how to build and run the code, write unit tests, and debug our product.

Review code

Each team has its way of letting members know there is a new PR (Pull Request) for review. It could be email, chat, or review tool notifications. Whatever it is, subscribe to this channel and start reviewing PRs. You won’t understand much initially, but you may still catch some bugs (e.g., off-by-one errors). More importantly, these reviews will allow you to ask questions and gather a broader context of what the team is working on.

Identify code that matters

The 80/20 rule does apply to codebases. This is especially noticeable in the bigger ones, where most code changes developers make are concentrated in one area. Knowing which code it is allows you to focus your ramp-up on the area you are likely to work on soon.

There are a couple of easy ways to tell which code is in the top 20%:

paying attention to code reviews
checking the commit history
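
Checking the commit history can be partly automated. The sketch below counts how often each file appears in the output of `git log --name-only --pretty=format:`, which prints only the paths touched by each commit. The function names, the six-month window, and the sample paths are assumptions made for illustration.

```python
import subprocess
from collections import Counter


def count_changes(log_output):
    """Count how often each path appears in `git log --name-only` output."""
    paths = (line.strip() for line in log_output.splitlines())
    return Counter(path for path in paths if path)


def recent_git_log(repo_dir=".", since="6 months ago"):
    """Return the paths touched by each commit made since `since`."""
    result = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Sample output with hypothetical paths; blank lines separate commits.
SAMPLE_LOG = """
src/checkout/cart.py
src/checkout/payment.py

src/checkout/cart.py
docs/README.md

src/checkout/cart.py
src/search/index.py
"""

for path, changes in count_changes(SAMPLE_LOG).most_common(3):
    print(f"{changes:3d}  {path}")
```

On a real repository, `count_changes(recent_git_log())` should give the same kind of ranking. Files at the top of the list are a reasonable first guess at where the team spends its time.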

Take notes, draw diagrams

I discovered that taking notes and drawing diagrams is a very effective way to grok the most complex parts of the code. Call graphs, class diagrams, and dependency graphs all help organize the information and are great reference material in case you need to refresh your memory.

After I joined Amazon, I drew diagrams of a few areas of code I couldn’t understand. They became an instant hit. Even people who had been on the team for much longer than me wanted a copy.

Debug code

Reading a new codebase is like reading a book in a language you barely know. You can do it, but it is excruciating.

For me, one of the best ways to overcome this is to step through the code with the debugger. Initially, I have no idea what I am looking at. But if I follow the same code path a few times, I begin to recognize code I have already seen, and soon, everything starts to fall into place.

To get the most out of my debugging sessions, I do two things:

I continuously inspect the state – the stack trace, parameters, and local and class variables
I take notes and draw diagrams

Using the debugger to learn the code base is slower and narrower than reading the code. But it is also much deeper. In fact, when debugging, I often realize that the understanding I gained from reading the code was incomplete or sometimes even incorrect.

Read documentation

Reading documentation can help accelerate your onboarding. It could be especially useful when it comes to the high-level architecture and concepts that are hard to deduce from code.

However, you should take documentation with a grain of salt. It is often sparse and outdated. But this could be good news for you. Updating the documentation as part of your ramp-up could be a great contribution to your new team.

Join on-call rotation

Joining the on-call rotation to accelerate team onboarding might sound extreme, but I did it on my first team at Facebook. I was concerned that my ramp-up was slow, so I decided to push myself. I learned more about our services that week than in all the weeks before. After my shift ended, I knew what services our team owned, their dependencies, and where to find their code. Alerts immediately pointed me to the hot code paths, and troubleshooting issues forced me to dig into the code.

The Must-Have Skill Every Senior Developer Needs

Writing code is a fundamental skill every junior software developer needs to master. However, coding skills are no longer the biggest differentiator at senior and above levels. Every senior engineer is expected to have solid coding skills, and growing to higher levels based on coding alone is rare.

If not coding, then what?

If coding is not the skill to grow beyond senior levels, what skill is it?

This question has no correct answer, as no single skill can elevate you to the staff+ levels.

However, software development is a team sport, and successful senior engineers must focus on many areas besides coding. They are often responsible for projects spanning one or more teams. They drive the design, collaborate with partner teams, communicate progress, etc. Doing all this work effectively requires good communication skills, especially writing (which I consider one of the Universal Skills).

Why writing?

Writing clarifies thinking and promotes the exploration of ideas. I don’t know how many times I thought I understood something, only to struggle to summarize it in writing. But once I succeeded, I had a much deeper grasp of the concept and noticed new insights I hadn’t considered before.

Thanks to its durability and asynchronous nature, writing is also a great way to scale. You can write something once and refer to it later. Your readers can benefit from it even if you are not around. Here are a few examples from the software engineering field:

Project execution plans are useful for aligning all interested parties: the team that will execute the project, partner teams, your manager, etc., without having to talk to them individually.
Documentation helps avoid explaining the same concepts again and again. It protects the team from scrambling when a key team member leaves the project or the team (see also: bus factor)
Design documents allow for gathering feedback without holding a meeting for all interested parties. They are also an invaluable resource to understand why certain design choices were made and what alternatives were considered.

Writing is difficult

Writing is not natural for most people. Making the content clear, concise, and well-organized is grueling work.

Software developers are often dismayed when I ask them to write a rollout plan or a design doc. Some tell me they were relieved to graduate from college because it meant they would never have to write again, and I am shattering their world.

There are also other reasons why writing is difficult. Many developers have to write in a non-native language. But even native speakers often struggle because the way of writing they learned at school does not serve them at work.

Opportunities to practice writing

Even though writing becomes important gradually, it doesn’t mean you should wait to improve it. On the contrary, the sooner you start, the better. Fortunately, every developer has plenty of opportunities to practice writing on the job.

Emails

Emails are everyone’s bread and butter these days. However, many emails are hard to read and understand and, as a result, fail to achieve their goal.

In my first job, our manager asked us to send a weekly email summarizing what we worked on and accomplished in the past week. I was proud of my reports: they were very detailed and explained everything. Despite these emails, my manager kept asking me what I had been working on. When I saw one of these emails years later, I understood. He never read them. I couldn’t blame him – it was an unbearable wall of text.

Memos / Announcements

Posts, memos, and announcements meant for a wide audience need to be tailored to that audience. Otherwise, readers won’t understand them and will give up reading them.

I recently read a post from my co-worker reporting on the status of our project. The audience of this post was broad (more than 150 people) and included managers, directors, and partner teams. The technical details in this post left me lost despite my heavy involvement in this project. I can only guess what others took away from this post.

Design documents

Good design documents explain complex topics using simple language. This combination makes them hard to write, but the payoff is worth the effort. Confusing design documents lead to lengthy discussions, feedback on unimportant matters, and frustration.

I once asked a junior engineer to write a design document explaining how he planned to implement a feature we promised to deliver. What I got was an untitled Google Doc with no text and two pictures – a diagram and “The Starry Night” by van Gogh. While I have nothing against “The Starry Night”, the document didn’t give me the faintest idea about the design of the feature, its assumptions, or the alternatives considered.

Code review feedback

The main purpose of sending code for review is to gather feedback. But giving short, clear, and actionable feedback professionally is an art. The conclusion: if you want to improve your writing, you should review a lot of code (and provide feedback).

Code comments

I am not a huge fan of writing code comments, but in some situations, they are warranted. Unfortunately, many code comments are so poorly written that it is sometimes hard to tell if they are there to help you or make you more confused.

The main challenge with code comments is that they need to be short to not overshadow the code but must clearly explain intricate ideas that the code cannot express. These requirements make writing code comments good practice.
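
As a small illustration, the hypothetical retry helper below carries one comment. It stays short and explains the reasoning that the code alone cannot express, instead of restating the arithmetic. The function, the constant, and the stated rationale are all assumptions made up for this example.

```python
MAX_DELAY_SECONDS = 30.0


def retry_delay_seconds(attempt):
    # Double the wait after each failed attempt so a struggling dependency
    # gets room to recover; the cap keeps a long outage from turning into
    # multi-minute waits between retries.
    return min(0.5 * 2 ** attempt, MAX_DELAY_SECONDS)


print(retry_delay_seconds(0))   # 0.5
print(retry_delay_seconds(3))   # 4.0
print(retry_delay_seconds(10))  # 30.0
```

A comment such as "multiply 0.5 by 2 to the power of attempt" would be just as long and would tell the reader nothing the code does not already say.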

Documentation

Writing documentation is one of software developers’ least favorite tasks. Yet it is often one of the most impactful things they can do. Good documentation helps put out on-call fires faster, makes onboarding new team members easier, and reduces randomization caused by repeatedly answering the same questions. By writing documentation, you help your team achieve more and polish your writing skills.

Bug reports

If you want someone to do something for you, you need to make it as easy as possible for them to do it. If you don’t, what you are asking for will take a long time or will never get done.

This rule applies perfectly to bug reports. If you encounter a bug that blocks your work, writing a clear bug report dramatically increases the chances of getting the issue fixed. Despite this, many reported bugs are incomprehensible.

At Microsoft, I worked on a few high-profile open-source projects such as Entity Framework and ASP.NET Core. As thousands of developers used our products, we received a decent number of bug reports. Unfortunately, we often couldn’t understand what issue was being reported, how to reproduce it, or what the expected behavior was. Following up on these issues was painful. The back-and-forth took weeks. The “bugs” slipped from release to release while we were waiting for the details we requested. Eventually, we closed most of these bugs without resolution as it was hard to prioritize them over other issues we could immediately investigate and fix.

Understand the purpose of your work

One mistake I’ve seen junior software engineers repeat again and again is not understanding why they work on the tasks they work on. This confusion can be somewhat justified by the relatively small scope junior engineers typically have, but it’s a slippery slope. Doing something only “because my manager (or a senior engineer) asked me to do it” has a few drawbacks:

Inability to execute independently: making even the smallest decision without involving your manager or the tech lead will be hard if you don’t understand the bigger picture. You will get stuck if you can’t get hold of them. It will also be difficult for you to demonstrate you know how to solve problems at your level and are ready for bigger challenges.
Communication gap: your manager or senior engineer may unintentionally give you incomplete or incorrect information. If you don’t have enough context, you may not notice this. You may struggle to complete the task, but once you finally do, it may turn out that what you built is not what they hoped for and needs to be redone.
Hindered innovation: unawareness of where your work fits limits your ability to propose solutions beyond what you’re asked to do. Sometimes, the approach you’re instructed to follow may not be the best solution to the problem, but exploring alternatives is impossible if you don’t understand the broader context.
Incorrect prioritization: working on a task without knowing its purpose may lead to neglecting this task and unknowingly delaying work that depends on it.

How to understand the bigger picture?

The easiest way to understand where your work fits is to ask your manager or the tech lead. They are responsible for what the team needs to deliver, so they should be able to explain this instantly.

Your question may even come to them as a surprise. They probably assume everyone on the team already understands the purpose of their work. In my experience, this is not always the case. The bigger and more complex the project, the harder it is to connect the dots.

You can start small, but it is important to go deep. Start by asking about your task. You may hear that it contributes to a project the team is working on. An answer like this is not very helpful but could be a great starting point. It allows you to drive the discussion further and ask more interesting questions like:

Why are we working on this project? Why is it important?
What metrics is this work expected to move, and how?
How does it support the company’s goals and priorities?
What projects did we decide not to pursue to fund this work (a.k.a. opportunity cost)?

A different way to understand where your work fits might be by talking to your product manager or people from the UX (User Experience) or marketing team. Because of their different perspective, they can teach you things you would never learn from fellow engineers. The challenge with this approach is that you need to be able to explain your role in the project to them.

A simple way to ship maintainable software

This was my first solo on-call shift on my new team. I was almost ready to go home when a Critical alert fired. I acknowledged it almost instantly and started troubleshooting. But this was not going well. Wherever I turned, I hit a roadblock. The alert runbook was empty. The dashboards didn’t work. And I couldn’t see any logs because logging was disabled.

Some team members were still around, and I turned to them for help. I learned that the impacted service shipped merely a week before, and barely anyone knew how it worked. The person who wrote and shipped it was on sick leave.

It took us a few hours to figure out what was happening and to mitigate the outage. This work made one thing apparent – this service was not ready for prime time.

In the week following the incident, we filled the gaps we had found during the outage. Our main goal was to ensure that future on-calls wouldn’t have to scramble when encountering issues with this service.

But the bigger question left unanswered was: how can we avoid similar issues with any new service or feature we will ship in the future?

The idea we came up with was the Service Readiness Checklist.

What is the Service Readiness Checklist?

The Readiness Checklist is a checklist that contains requirements each service (or a bigger feature) needs to meet to be considered ready to ship. It serves two purposes:

to guarantee that none of the aspects related to operating the service have been forgotten
to make it clear who is responsible for ensuring that requirements have been reviewed and met

When we are close to shipping, we create a task that contains a copy of the readiness checklist and assign it to the engineer driving the project. They become responsible for ensuring all requirements on the checklist are met.

Having one engineer responsible for the checklist helps avoid situations where some requirements fall through the cracks because everyone thought someone else was taking care of them. The primary job of this engineer is to ensure all checkboxes are checked. They may do the work themselves if they choose to or assign items to people involved in the project and coordinate the work.

Occasionally, the checklist owner may decide that some requirements are inapplicable. For example, the checklist may call for setting up deployment, but there is nothing to do if the existing deployment infrastructure automatically covers it.

The checklist will usually contain more than ten requirements. They are all obvious, but it is easy to miss some just because of how many there are.

Example readiness checklist

There is no single readiness checklist that would work for every team because each team operates differently. They all follow different processes and have their own ways of running their code and detecting and troubleshooting outages. There is, however, a common subset of requirements that can be a starting point for a team-specific readiness checklist:

[ ] Has the service/feature been introduced to the on-call?
[ ] Has sufficient documentation been created for the service? Does it contain information about dependencies, including the on-calls who own them?
[ ] Does the service have working dashboards?
[ ] Have alerts been created and tested?
[ ] Does the service/feature have runbooks (a.k.a. playbooks)?
[ ] Has the service been load tested?
[ ] Is logging for the service/feature enabled at the appropriate level?
[ ] Is automated deployment configured?
[ ] Does the service/feature have sufficient test coverage?
[ ] Has a rollout plan been developed?
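
One way to picture how such a checklist is tracked is the sketch below. It models a single accountable owner, per-item assignees, and a "not applicable" status for requirements the owner decides to skip. The class names, the service, and the people are hypothetical; most teams would keep this in their task tracker instead of code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    OPEN = "open"
    DONE = "done"
    NOT_APPLICABLE = "n/a"  # the owner decided the requirement does not apply


@dataclass
class Item:
    question: str
    status: Status = Status.OPEN
    assignee: Optional[str] = None  # who does the work; may differ from the owner


@dataclass
class ReadinessChecklist:
    service: str
    owner: str  # the one engineer accountable for the whole list
    items: List[Item] = field(default_factory=list)

    def open_items(self) -> List[Item]:
        return [item for item in self.items if item.status is Status.OPEN]

    def ready_to_ship(self) -> bool:
        return not self.open_items()


# Hypothetical service and people; the questions come from the list above.
checklist = ReadinessChecklist(
    service="example-service",
    owner="alice",
    items=[
        Item("Does the service have working dashboards?", assignee="bob"),
        Item("Is automated deployment configured?"),
        Item("Has a rollout plan been developed?", assignee="alice"),
    ],
)
checklist.items[0].status = Status.DONE
checklist.items[1].status = Status.NOT_APPLICABLE  # existing infrastructure covers it

print(checklist.ready_to_ship())                           # False
print([item.question for item in checklist.open_items()])  # ['Has a rollout plan been developed?']
```

Keeping "not applicable" separate from "done" preserves the fact that someone looked at the requirement and made a decision about it.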

Success story

Our team was tasked with solving a relatively big problem on tight timelines. The solution required building a pipeline of a few services. Because we didn’t have enough people to implement this infrastructure within the allotted amount of time, we asked for help. Soon after, a few engineers temporarily joined our team. We were worried, however, that this partnership might not work out because of the differences in our engineering cultures. The Service Readiness Checklist was one of the things (others included coding guidelines, interface-based programming, etc.) that helped set clear expectations. With both teams on the same page, the collaboration was smooth, and we shipped the project on time.