Edward Huang

Posted on May 20, 2023 • Originally published at pathtosenior.substack.com on May 11, 2023

How to Avoid Falling Down the Rabbit Hole When Solving Problems

#career #softwareengineering #programming #softwaredevelopment

Developers suck at estimation.

When someone asks, "How long will it take to investigate the bug or solve X?", we often give an optimistic estimate: "It's a T-shirt size Medium." Then we dive into the problem and realize it is bigger than we initially thought: we need to touch multiple files, folders, and libraries just to change a single function. That's when many developers make the classic time-wasting mistake.


They go silent and grind away at their own solution for DAYS. They have fallen down the rabbit hole.

It is a different world down there. Simple problems and bugs with well-defined solutions turn into complex ones tangled up in interconnected code. By the time the developer submits their PR, everyone is shocked.

"How did it turn out to be so crazy? This problem can be solved with X, Y, Z." The team requests changes on the PR.

The developer who submitted the PR is unhappy because they have worked on it for days. The team isn't happy either, because they haven't heard anything from the developer for days.

Experienced developers know to stay out of the rabbit hole. When a problem gets gnarly, they stop spinning their wheels and ask for help.

Let's face it: we've all been pulled down the dreaded rabbit hole at some point in our careers, and the experience has taught us invaluable lessons.

This week, I want to share how you can avoid the rabbit hole, and escape it when you do fall in. I will walk through the three types of bugs I encounter most often in my software engineering career, then dive into strategies and tips to help you solve problems faster.

Let's get right into it!

Solving problems and debugging is often a process of forming and testing hypotheses. If you think the system should be doing one thing but it is doing another, then one of your assumptions about how the system works must be wrong.

These are the three types of bugs I encounter most often in my software engineering career, ranked from easiest to hardest to solve.

"The Human Error"

Humans are known to be unreliable. Human error bugs are bugs caused by a developer's mistake.

For example, you're trying to figure out why the program keeps throwing a compilation error saying it cannot find the implicit value "fool," only to realize you misnamed the variable: it should be "foo," not "fool."

Misconfiguration issues are the "common cold" of the data center. In Designing Data-Intensive Applications, Martin Kleppmann mentions that 85% of service outages are caused by configuration errors rather than hardware faults. Because of this, teams and companies put processes in place to prevent human error when designing and deploying services. For instance, some companies let developers release their features to the staging environment on their own but require approval from a manager or senior engineer for a production release.

Well-designed APIs and admin interfaces make it easy to do "the right thing" and discourage "the wrong thing."

Automate the steps where people make the most mistakes. I once caused an SI with a misconfigured version tag on our company's payment service. In our continuous delivery process, bumping the release tag required a manual change to a configuration file, and I accidentally set the tag to 0.51 instead of 0.515, dropping the final 5. Carelessness happens, but when writing the postmortem, I recommended automating tag bumps with a script to reduce human error.
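A script for this can be small. Below is a minimal Python sketch, assuming (hypothetically) that the tag lives on a single `release_tag: <dotted number>` line in a text config file and that a bump means incrementing the last numeric component; your real config format and versioning scheme will differ. Besides bumping the tag, it refuses any change that would move the tag backwards, which would catch a slip like writing 0.51 when the previous tag was higher.

```python
import re

# Hypothetical config line format, assumed for illustration: "release_tag: 0.515"
TAG_LINE = re.compile(r"^(release_tag:[ \t]*)(\d+(?:\.\d+)*)[ \t]*$", re.MULTILINE)


def parse_tag(tag: str) -> tuple[int, ...]:
    """Split a dotted numeric tag such as "0.515" into comparable integers: (0, 515)."""
    return tuple(int(part) for part in tag.split("."))


def check_tag_moves_forward(old_tag: str, new_tag: str) -> None:
    """Reject any new tag that is not strictly greater than the old one, component by component."""
    if parse_tag(new_tag) <= parse_tag(old_tag):
        raise ValueError(f"release tag would move from {old_tag} to {new_tag}")


def bump_tag(tag: str) -> str:
    """Increment the last numeric component: "0.515" -> "0.516"."""
    parts = list(parse_tag(tag))
    parts[-1] += 1
    return ".".join(str(part) for part in parts)


def bump_config(config_text: str) -> str:
    """Bump the single release_tag line in the config text and return the updated text."""
    matches = TAG_LINE.findall(config_text)
    if len(matches) != 1:
        raise ValueError(f"expected exactly one release_tag line, found {len(matches)}")
    old_tag = matches[0][1]
    new_tag = bump_tag(old_tag)
    check_tag_moves_forward(old_tag, new_tag)
    # Keep the "release_tag: " prefix (group 1) and swap in the new tag.
    return TAG_LINE.sub(lambda m: m.group(1) + new_tag, config_text)
```

In a pipeline, you would read the config file, pass its contents to `bump_config`, and write the result back, so nobody types the tag by hand. Note that this sketch compares tags as dotted integers, so 0.515 becomes (0, 515); if your tags follow a different scheme, `parse_tag` is the place to adjust.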

"The Client/Business Error"

A client error occurs when there is a mismatch in assumptions between the developer building the app and the user using it.

When a business hands you a product requirement, some edge cases will break the business's assumptions. For example, when I was tasked with designing a referral contest system, the product requirement was: "When the contest ends, the first-place winner, who referred the most people, gets X points; the second- through fourth-place winners get Y points." As I dug into all the scenarios where it could go wrong, I realized there could be multiple first-place winners. In that case, do we give all of them X points, or do we pick only one of them?
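Writing the tie rule down as code forces the question into the open. Here is a minimal Python sketch of one possible policy, not necessarily the one the product team chose: standard competition ranking ("1224"), where tied users share a rank, the next rank skips ahead, and everyone who lands on a rewarded rank gets that rank's reward. The function names and point values are hypothetical stand-ins for X and Y.

```python
from typing import Dict, Optional

# Hypothetical point values standing in for the "X" and "Y" in the requirement.
FIRST_PLACE_POINTS = 1000
RUNNER_UP_POINTS = 250


def competition_ranks(referrals: Dict[str, int]) -> Dict[str, int]:
    """Standard competition ranking: tied users share a rank, and the next rank skips ahead."""
    ordered = sorted(referrals.items(), key=lambda item: item[1], reverse=True)
    ranks: Dict[str, int] = {}
    previous_count: Optional[int] = None
    current_rank = 0
    for position, (user, count) in enumerate(ordered, start=1):
        if count != previous_count:
            # A lower referral count starts a new rank at this position.
            current_rank = position
            previous_count = count
        ranks[user] = current_rank
    return ranks


def award_points(referrals: Dict[str, int]) -> Dict[str, int]:
    """One possible policy: every user tied on a rewarded rank gets that rank's reward."""
    awards: Dict[str, int] = {}
    for user, rank in competition_ranks(referrals).items():
        if rank == 1:
            awards[user] = FIRST_PLACE_POINTS
        elif 2 <= rank <= 4:
            awards[user] = RUNNER_UP_POINTS
    return awards
```

With referral counts {ana: 12, ben: 12, cy: 9, dee: 7, eli: 7, fay: 3}, ana and ben both rank first, cy ranks third, and dee and eli tie for fourth, so five users get rewards instead of four. Whether that fits the budget is exactly the kind of question to settle with the business before launch.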

If the assumptions don't match, the feature will launch with an error: users who see themselves in first place on the leaderboard are unhappy because they didn't get any reward.

Sometimes bugs occur because of a misunderstanding about another service's assumptions. For instance, one of the notification services has an override feature that lets marketers force-send notifications to X users. The feature was built for sending notifications to customers in emergencies, so marketers need to take an extra step before using it. Many marketers filed bug tickets saying they couldn't send their campaigns to users because the feature kept throwing errors, only to learn that it was never meant for last-minute campaign sends.

Client and business errors are usually reproducible once you understand how the flow works. We can get to the root cause of these bugs by asking about the client's flow and the steps that led to the error.

The Infrastructure Problem

Infrastructure problems are problems that lie beneath the application code.

Infrastructure bugs are tricky because they're non-deterministic: even when you do things in the same order, you can get different results.

Say your service pods keep restarting every once in a while, seemingly out of nowhere. You don't know how to reproduce the bug because you don't know the root cause. You search online and read in a few blog posts that a common cause of pod restarts is running Out of Memory (OOM). So you look at the pod's memory utilization during the restart period, but you don't see any signs of OOM. Back to square one: you look for any other events that happened during those hours and try to correlate them with the restarts.
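Part of that correlation work can be done mechanically. The Python sketch below assumes you have already exported restart timestamps and a list of other timestamped events (deploys, cron jobs, traffic spikes) from whatever tools you use; the data shapes and the five-minute window are assumptions for illustration. It lists the events near each restart and counts how many restarts each event shows up around.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# (timestamp, description), e.g. pulled from deploy logs or a cron schedule (hypothetical source).
Event = Tuple[datetime, str]


def events_near_restarts(
    restarts: List[datetime],
    events: List[Event],
    window: timedelta = timedelta(minutes=5),
) -> Dict[datetime, List[str]]:
    """For each restart, collect the events that happened within `window` before or after it."""
    return {
        restart: [desc for ts, desc in events if abs(ts - restart) <= window]
        for restart in restarts
    }


def recurring_suspects(matches: Dict[datetime, List[str]]) -> Counter:
    """Count how many restarts each event description appears near (at most once per restart)."""
    counts: Counter = Counter()
    for descriptions in matches.values():
        counts.update(set(descriptions))
    return counts
```

An event that shows up near most restarts is a lead worth testing, not proof of the root cause.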

These bugs are hard to reproduce. Sometimes you can't tell whether you solved the problem or just got lucky, so you can neither accept nor reject your hypothesis yet.

One particular kind of bug is known as a "Heisenbug," nicknamed after physicist Werner Heisenberg, who was one of the first to assert that observing a phenomenon may change its behavior. In software, this means that when you try to investigate the bug (by adding a print statement or changing the flow of the code), the bug stops appearing! This can be frustrating for software engineers, because understanding the underlying root cause often takes up an entire sprint.

3 Tips to Avoid Getting Sucked into the Rabbit Hole When Solving Problems

Taking a Break

Taking a break helps because it forces your mind to look at the problem again from a different angle.

The longer you keep a problem in mind, the better your chances of finding the solution, even away from your desk. When I was stuck on a bug for a couple of hours, I would take a walk outside. However, I wouldn't stop thinking about the problem; even on a bathroom break, I kept thinking about it as I walked. Often, the root cause or the solution came to me in some ordinary moment of my day. A change of scenery seems to engage your brain differently and can trigger a breakthrough.

Innovation and problem-solving depend on a routine that systematically brings interesting ideas to the surface of our awareness. Tiago Forte describes how his dad used a similar "strategy" to integrate his creative process into every aspect of his life. During sermons at the local church, his dad would sketch biblical stories in a small paper notebook as he listened. At the supermarket, he would buy vegetables with unusual shapes to take home and incorporate into his still lifes. When the family watched TV together, Tiago would often catch him looking off to the side, at the living room wall, thinking about the next piece of art he wanted to produce.

Ever found an epiphany in the shower?

Millions of people get ideas and "think outside the box" in the shower. One hypothesis is that showering relaxes your body and brain, and a relaxed brain doesn't filter out distractions or imposed rules as easily. As a result, thought-provoking ideas and solutions flow more freely.

Identify When You Need to Converge

Solving problems follows a simple pattern of divergence and convergence.

When trying to fix a bug, you usually start by diverging. You search the codebase for the bug and try to reproduce it. You read the documentation to understand where in the flow the bug occurs. You look at the logs and the history of previous versions of the codebase and get a diff of what has changed. The number of resources you are looking at keeps growing; you search in breadth instead of depth.

You start converging once you understand the flow and can reproduce the bug. Convergence forces you to eliminate possible options and helps you dive deep into the root cause of the problem. For example, after researching and reading through multiple codebases in different repos, you identify that the bug occurs in repository A. So you dive deep into repository A and write a simple unit test to see if you can reproduce the bug.
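A reproduction test can be tiny. In this hypothetical Python example, the function, the bug, and the input are all made up for illustration: `split_bill` in "repository A" is suspected of dropping leftover cents. The test states the behavior we expect, and the standard library's `unittest.expectedFailure` decorator records that it currently fails, which confirms the bug reproduces. Once a fix lands, unittest reports an "unexpected success," a reminder to remove the decorator.

```python
import unittest


# Hypothetical function in "repository A": split a bill of `total_cents` across `parts` people.
# Suspected bug: integer division silently drops the leftover cents.
def split_bill(total_cents: int, parts: int) -> list[int]:
    share = total_cents // parts
    return [share] * parts


class SplitBillReproTest(unittest.TestCase):
    # The assertion encodes the expected behavior. expectedFailure marks that it
    # fails today, i.e. the bug reproduces. Remove the decorator after the fix.
    @unittest.expectedFailure
    def test_shares_add_up_to_total(self):
        shares = split_bill(100, 3)  # smallest input matching the (hypothetical) bug report
        self.assertEqual(sum(shares), 100)
```

You could run it with `python -m unittest` from the module's directory. Starting from the smallest failing input keeps the convergence focused: you are testing one hypothesis in one place rather than reading the whole repository.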

Engineers often get sucked into the rabbit hole by diverging aimlessly. They think they will understand the problem better if they know how things work "under the hood," so they command-click into a library's source code and read through it. After hours of reading through layers of abstraction they don't understand, they stop, having learned nothing.

Reading a thousand lines of code written and updated by many people over the years is hard, and you will forget most of what you read within a few weeks, if not hours.

Instead of reading source code because you want to learn, read it because you have a concrete reason. When debugging, start from the entry point of the flow and slowly peel back the abstractions, following the path of execution. For instance, if you encounter a problem in the customer flow, start by looking at the UI layer to find out which endpoint it calls. Then go deeper into the API layer and find where that endpoint is defined. Once you find the endpoint, go through the code of its implementation. If you realize the bug occurs because of a library call, then command-click into that library's implementation to understand the root cause.

This debugging path helps you unravel many layers of your team's stack and gives you concrete questions to ask and reasons to collaborate with your teammates. Because the knowledge comes from solving a real problem, the deep system understanding you gain actually sticks.

Ask Questions

I used to go dark for hours or days when I encountered a problem.

I got asked during standup, "How are things going?"

Although my team was trying to help me and check in on me, I often misread it as them monitoring my progress. A couple of days later, I would push a massive, hard-to-read PR with thousands of lines of code.

I was frustrated that I got pushback on every line of my PR.

"I spent HOURS thinking about the solutions in that PR, and it seems like they want to scratch all of them out," I thought.

I felt like I had spent days on unneeded solutions to imaginary problems.

I learned from that experience to make sure I'm on the right track before going deeper into a problem.

Communicating the solutions you have tried and discussing options with your team is an important skill. Often, you don't need all that extra effort; the problem can be solved another way.

The simple thing to do is ask questions before going deeper into the weeds. Now, whenever a solution I'm exploring seems hard to implement, I stop plowing forward. I ask the group channel or the principal engineer on my team about the challenge I've run into, lay out a couple of solutions I'm considering, and ask whether there is an alternative I haven't considered yet. Often, they give me insights that lead to a simpler solution, or they understand the trickiness of the problem and help me move forward.

Asking for help is not a sign of incompetence or weakness. It shows competence: you know when to pause and re-evaluate your situation.

Recap

Debugging can take up a lot of a developer's time. I have encountered three main types of bugs in my software engineering career. Human errors are the easiest to identify, and bugs get harder and harder to pin down as you move toward the infrastructure layer, because of its non-deterministic nature.

Developers often make a classic time-wasting mistake: they keep spinning their wheels when they run into a problem. One tip to avoid getting stuck in the rabbit hole is to pause and change your scenery; small light-bulb moments often pop up that give you more insight and an easier route to the solution. In addition, you can reach the root cause faster by tracing the bug from the entry point of the flow instead of reading code aimlessly. Lastly, before you go deeper, ask questions to check whether there are simpler options.

By deliberately pausing, asking questions, and digging into one specific avenue at a time, you can rapidly pinpoint the core issue and grow into an adept developer.

What other tips and methods have you used to avoid going down the rabbit hole when solving problems? Share them in the comments below!

💡 Want more actionable advice about Software engineering?

I’m Edward. I started writing as a Software Engineer at Disney Streaming Service, trying to document my learnings as I step into a Senior role. I write about functional programming, Scala, distributed systems, and career development.

Subscribe to the FREE newsletter to get actionable advice every week and topics about Scala, Functional Programming, and Distributed Systems: https://pathtosenior.substack.com/
