Dhruv Agarwal for Middleware

Posted on Jun 17

Write Less, Fix Never: The Art of Highly Reliable Code

#developer #productivity #programming #career

If you're a developer tirelessly pushing out new changes, only to be dragged back by errors in your past work, this post is incredibly relevant for you.

Over the past decade in software development, one of the key mistakes I've made, and seen others make repeatedly, is focusing on doing more work rather than ensuring that the work done (no matter how small) is robust and will continue to work properly. These recurring errors can significantly hamper productivity and motivation.

From my own share of mistakes, I’ve learned valuable lessons. Here, I’d like to share a few strategies that will not only help you ship robust software but also free you from the shackles of your past work.

We will talk about the top 5 strategies that worked for me:

Plan for 10x
Psst: Your old work got a bug and is calling you back
Make the Systems Work for You, Not the Other Way Around
Always Answer with a Link
Understand software building is a team sport.

1. Plan for 10x

There are two types of engineers IMHO: those who hack their way through for today and those who design for the distant future. Neither approach is sustainable on its own.

Your code should be able to handle the growth your business is about to experience. However, over-designing for future challenges can lead to unnecessary complexity. There's a term for this: over-engineering.

Here's my practical rule of thumb: plan for 10 times the current scale or consider how much your business is expected to grow in the next 2-3 years. Ensure your plans align with your business goals.

For example, if you're a cab company designing a booking module, and today your company handles 10,000 rides a day with an expectation to reach 100,000 rides a day in 2 years, use that as your benchmark. Designing a system for 10 million rides a day when you're only doing 10,000 rides might result in an overly complex and expensive solution.
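
To make that benchmark concrete, here is a rough back-of-the-envelope sketch in Python. The function name is hypothetical, and the peak factor of 5 is an assumption picked purely for illustration; real traffic shapes vary, so treat the numbers as orders of magnitude rather than a sizing guide.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def peak_bookings_per_second(rides_per_day: int, peak_factor: float = 5.0) -> float:
    """Estimate the peak booking rate from a daily ride count.

    peak_factor is an assumed ratio of peak traffic to average traffic.
    """
    average_per_second = rides_per_day / SECONDS_PER_DAY
    return average_per_second * peak_factor


current = peak_bookings_per_second(10_000)        # today's scale
target = peak_bookings_per_second(100_000)        # the 10x, two-year benchmark
overkill = peak_bookings_per_second(10_000_000)   # the 1000x design

print(f"today:    {current:.2f} bookings/s at peak")
print(f"10x:      {target:.2f} bookings/s at peak")
print(f"overkill: {overkill:.2f} bookings/s at peak")
```

Under these assumptions, the 10x target is still only a handful of bookings per second, while the 10-million-ride design has to cope with hundreds. Those two numbers usually call for very different architectures.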

2. Psst: Your old work got a bug and is calling you back

"Days and weeks of debugging can save you a few hours of writing tests" - someone wise.

Shipping code without testing all the edge cases is like a spray and pray strategy. The simplest way to ensure your code works as expected is by adding unit tests. This might sound obvious, but the importance of thorough testing cannot be overstated.

Unit tests not only act as the first line of defense against obvious errors but also serve as insurance for your code against unintended changes that could violate business requirements. That means fewer ad hoc bugs assigned to you every sprint 😉

A trick for the lazy (like me): Before you write the code:

Write tests covering every corner case you can think of.
Pretend you're trying to break someone else's system.
Write assert False in all the tests and run them.
Naturally, all tests will fail.

Now, just work towards making each test pass. In my experience, this approach takes less time overall and produces far more robust code.
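
Here is a minimal sketch of that starting point. The `split_fare` function and the three corner cases are hypothetical examples, not code from a real project. In a real codebase these would be ordinary `test_*` functions picked up by a runner such as pytest or unittest; a tiny hand-rolled runner is used here only to keep the sketch self-contained.

```python
from typing import Callable, Dict, List


def split_fare(total_cents: int, riders: int) -> List[int]:
    """Hypothetical function under test; deliberately not implemented yet."""
    raise NotImplementedError


# Step 1: name every corner case you can think of and make each one fail.
def check_even_split() -> None:
    assert False, "TODO: 1000 cents / 4 riders -> [250, 250, 250, 250]"


def check_remainder_is_not_lost() -> None:
    assert False, "TODO: 1000 cents / 3 riders must still sum to 1000"


def check_zero_riders_is_rejected() -> None:
    assert False, "TODO: riders=0 should raise ValueError"


CHECKS: List[Callable[[], None]] = [
    check_even_split,
    check_remainder_is_not_lost,
    check_zero_riders_is_rejected,
]


def run_checks(checks: List[Callable[[], None]]) -> Dict[str, str]:
    """Run each check and return {check name: failure message} for the failures."""
    failures: Dict[str, str] = {}
    for check in checks:
        try:
            check()
        except AssertionError as exc:
            failures[check.__name__] = str(exc)
    return failures


failures = run_checks(CHECKS)
print(f"{len(CHECKS)} checks run, {len(failures)} failing")
```

The failing list doubles as a to-do list: replace one `assert False` at a time with a real assertion against `split_fare`, then implement just enough to make it pass.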

3. Make the Systems Work for You, Not the Other Way Around

One of my managers once gave me the most impactful advice: "Act, don't react." This advice came when I was constantly being tagged on different Slack channels for problems, customer complaints, and payment failures. I was just reacting to each request, having no clue what might happen next.

That's when I started asking three questions for every feature I built:

How will I know it's working?
How will I know it failed?
How will I know it succeeded?

I then answered these questions at every level (feature, screen, app) by sending metrics to our APM tools like Datadog or New Relic.

After setting this up, I configured alerts to notify me if anything went wrong.
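
As a rough illustration, the sketch below instruments a hypothetical `book_ride` feature so that each of the three questions maps to a counter. `InMemoryMetrics` is a made-up stand-in so the example runs on its own; it is not the Datadog or New Relic client API, and the metric names and the 25% alert threshold are assumptions.

```python
from collections import Counter


class InMemoryMetrics:
    """Hypothetical stand-in for an APM client (not a real vendor API)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def increment(self, name: str) -> None:
        self.counts[name] += 1


metrics = InMemoryMetrics()


def book_ride(rider_id: str, payment_ok: bool) -> bool:
    """Hypothetical feature, instrumented to answer the three questions."""
    metrics.increment("booking.attempted")    # Is it working (is traffic flowing)?
    if not payment_ok:
        metrics.increment("booking.failed")   # Did it fail?
        return False
    metrics.increment("booking.succeeded")    # Did it succeed?
    return True


def failure_rate_alert(threshold: float = 0.25) -> bool:
    """Return True when the share of failed bookings exceeds the assumed threshold."""
    attempted = metrics.counts["booking.attempted"]
    if attempted == 0:
        return False
    return metrics.counts["booking.failed"] / attempted > threshold


# Simulate five bookings, two of which fail at the payment step.
for i, ok in enumerate([True, True, False, True, False]):
    book_ride(f"rider-{i}", payment_ok=ok)

print(dict(metrics.counts))
print("alert:", failure_rate_alert())
```

In a real setup the counters would go to your APM tool and the threshold check would live in its alerting rules, but the shape of the idea is the same: every code path that matters emits a signal you can alert on.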

By doing this, I became aware of bugs before they escalated into major issues, preventing reactive measures, poor customer experiences, and my own uncertainty about what might come next.

Start answering these three fundamental questions every time you build something to ensure you always act instead of react.

4. Always Answer with a Link

Just like bad work gets you tagged on various Slack channels for fixes, great work gets you tagged for context in areas you've worked on.

This can drain your energy when you least expect it, or worse, it can make you the go-to person for the same tasks because you know the complete picture.

Keep this secret trick to yourself:
Document everything. Include the context, architecture, and business-specific decisions you made while building the feature. When someone asks about the context of an area (feature, screen, app), just send them the link to the updated document. This will save you a few hours every time.

Additionally, thorough documentation makes onboarding new team members easier and ensures that your work remains accessible and understandable over time.

5. Understand software building is a team sport.

Software engineering often emphasizes the individual contributor path. However, reaching the end goal alone is impossible; you only reach it with your team, and your team only reaches it with you.

Understanding and adopting a process excellence mindset helps you leverage the team's collective productivity.

Sorry for that wordy statement 😄
To simplify, ensuring that reviews, deployments, and any collaborative activities involving code don't have significant wait times boosts your productivity immensely!

The best way to identify high waiting or blocked times in your team is to measure DORA metrics. You can use an open-source tool like Middleware, which provides DORA metrics out of the box.

middlewarehq / middleware

✨ Open-source DORA metrics platform for engineering teams ✨

Open-source engineering management that unlocks developer potential

Introduction

Middleware is an open-source tool designed to help engineering leaders measure and analyze the effectiveness of their teams using the DORA metrics. The DORA metrics are a set of four key values that provide insights into software delivery performance and operational efficiency.

They are:

Deployment Frequency: The frequency of code deployments to production or an operational environment.
Lead Time for Changes: The time it takes for a commit to make it into production.
Mean Time to Restore: The time it takes to restore service after an incident or failure.
Change Failure Rate: The percentage of deployments that result in failures or require remediation.
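
To show how these four definitions turn into numbers, here is a simplified sketch over a hypothetical list of deployment records. The `Deployment` fields and the sample data are invented for illustration, and this is not how Middleware itself computes the metrics; real tools derive them from pull request, deployment, and incident data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set only for failed deployments


def dora_summary(deployments: List[Deployment], period_days: int) -> dict:
    """Compute simplified versions of the four DORA metrics."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times)
            if restore_times else None
        ),
        "change_failure_rate": len(failures) / len(deployments),
    }


start = datetime(2024, 6, 1, 9, 0)
deployments = [
    Deployment(start, start + timedelta(hours=4)),
    Deployment(start + timedelta(days=1), start + timedelta(days=1, hours=8)),
    Deployment(
        start + timedelta(days=2),
        start + timedelta(days=2, hours=6),
        failed=True,
        restored_at=start + timedelta(days=2, hours=7),
    ),
    Deployment(start + timedelta(days=3), start + timedelta(days=3, hours=2)),
]

summary = dora_summary(deployments, period_days=7)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A long mean lead time with a healthy deployment frequency is often a hint that changes are waiting somewhere, for example in review or in a release queue.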

PS: I'm also a co-founder of Middleware, and our mission is to make engineering frictionless for engineers. Do consider giving us a star if you like what we've built!

Ship code like a boss!

By adopting these suggestions, you can significantly reduce the time spent revisiting and fixing past work. This will not only enhance your productivity but also ensure that your focus remains on innovating and delivering new features.

Be productive, not busy! All the best 😊

Top comments (33)

Jayant Bhawal • Jun 17

That's when I started asking three questions for every feature I built:

How will I know it's working?

How will I know it failed?

How will I know it succeeded?

This part. 👏

Dhruv Agarwal • Jun 17

🙏🏼 Jayant

John Hogerhuis • Jun 18 • Edited

"The simplest way to ensure your code works as expected is by adding unit tests."

I'd argue it's functional tests. They're less brittle, and more likely to provide confidence that the application does what it's supposed to do the way it's supposed to do it.

Unit tests make your codebase probably 10x larger or more. That's a lot more maintenance.

The argument is that unit tests allow you to make changes confidently because they will catch regression faults, so they would increase development speed. But the unavoidable counterargument is that there is so much more code to maintain. Code that is directly tied to the lowest level of code, binding to every interface, every type, and every property, creates a lot of inertia. How will it balance out?

And if you are really making big changes you may end up discarding a whole lot of unit tests as things get reworked. In which case, you ended up doing work that had no real longevity. Bad investment.

The balance I recommend is unit tests for hard fought, intricate internal business logic you want to lock in forever, and functional tests for everything else.

They're both really functional tests but unit tests can get you more solid testing on certain things, while being easier to write.

I wrote an emulator... I created unit tests for the CPU, and some hardware simulations. These are forever code but I want to ensure they stay working now and forever. Great investment. And they're fast and short... I can test every instruction at a very granular level without having the coding and execution overhead of other layers.

Outside the core of the emulation I expect a lot of churn. I want to detect memory leaks, 2nd level issues across modules, performance issues, basically stuff that unit tests won't see anyway. That stuff wants a functional regression suite.

Code that doesn't do something functional for a user and never prevents a bug is cost with no business benefit.

If your budgets and time are infinite, write all the unit tests you want.

Eljay-Adobe • Jun 24

I think both.

Functional tests ensure the product functions as expected. Unit tests (TDD-style) ensure the code complies with basic correctness.

Jayant Bhawal • Jun 19

That's a great take.
Love the balanced perspective on this, @jhoger!

Ben Goldberg • Jun 18

Divide up your unit tests into two separate groups, development and release.

If you refactor your code and your release tests break, it's a problem.

If you refactor your code and your development tests break, it might be safe to change the tests instead of the code.

Alexandra • Jun 22

No... you double your work and you repeat yourself.
You just need to refactor the tests before the code.

Benoit COUETIL 💫 • Jun 18

Thank you for sharing 😊

I would label "4. Always Answer with a Link" as a broader concept "always plan for being unnecessary to the project". It applies to more categories than the one you mentioned.

Keep up the good work!

Dhruv Agarwal • Jun 18

Thanks for the input @bcouetil! I agree with your point.

The advice to always share a link just makes it more actionable :)

Riccardo Bernardini • Jun 20

I would add

if your language allows it, use contracts (pre- and post-condition, type invariant, and so on)

I just love contracts:

They document the behavior of your function/procedure/method better than any comment because they are executable and, therefore, it is much harder for them to become outdated.
They are "implicit tests" of your function. Moreover, they refer to the "theoretical" behavior of your function, and because of this they rarely change. For example, if you have a package that implements an associative map (a very, very simple example), you can add as a post-condition to your insert procedure that the key given to the procedure must now be present in the map and the corresponding value must be the value given to the procedure. This is something that must be true, whatever implementation you choose, because it is part of the "abstract nature" of an associative map.
They are bug snipers. If something is wrong in a function/procedure/method, an exception will be raised right away (if you activated contract checking, a nice idea during development), circumscribing the troublesome portion of code.
If your language allows it, you can check them formally; that is, you can ask the checker to prove that, given the pre-conditions and the procedure body, the post-conditions follow. If the checker succeeds, you know that the function behaves as desired (or, at least, as specified by the contract).
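
The associative map example above can be sketched in Python, with the caveat that Python has no built-in contract support: plain `assert` statements stand in for post-conditions here, and they are skipped when Python runs with `-O`, which loosely mirrors switching contract checking off. The class and method names are hypothetical.

```python
from typing import Any, Dict, Hashable, Optional


class AssociativeMap:
    """Hypothetical map whose insert() checks its own post-conditions."""

    def __init__(self) -> None:
        self._items: Dict[Hashable, Any] = {}

    def contains(self, key: Hashable) -> bool:
        return key in self._items

    def get(self, key: Hashable) -> Any:
        return self._items[key]

    def insert(self, key: Hashable, value: Any) -> None:
        self._store(key, value)
        # Post-conditions: must hold for any correct implementation of _store.
        assert self.contains(key), "post-condition: key must be present after insert"
        assert self.get(key) == value, "post-condition: stored value must match"

    def _store(self, key: Hashable, value: Any) -> None:
        self._items[key] = value


class BuggyMap(AssociativeMap):
    def _store(self, key: Hashable, value: Any) -> None:
        # Bug: silently ignores updates to keys that already exist.
        self._items.setdefault(key, value)


good = AssociativeMap()
good.insert("a", 1)
good.insert("a", 2)  # passes: the value is overwritten

buggy = BuggyMap()
buggy.insert("a", 1)
caught: Optional[str] = None
try:
    buggy.insert("a", 2)  # the contract catches the ignored update
except AssertionError as exc:
    caught = str(exc)

print("good map value:", good.get("a"))
print("buggy map violation:", caught)
```

The post-conditions are written against the public `contains` and `get` methods rather than the internal dictionary, so they stay valid if the storage is swapped out later.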

Matt Trachsel • Jun 18

I feel compelled to point out to any newer devs that this sentence "Shipping code without testing all the edge cases is like a spray and pray strategy" is nonsense. It is literally impossible to test every possible edge case in an application with even a tiny amount of complexity. Your applications WILL fail, you should test the most complex/important/used parts of an application. And especially where those labels overlap. 100% code coverage is NOT a codebase with no errors.

Dhruv Agarwal • Jun 19

Thanks for the candid comment @lovetrax! While it is difficult to write tests for every possible edge case, it is indeed possible to define a scope relevant to the business/product and test for those cases. That will make the code robust for the defined scope.

When things out of your scope start to break, you can take an educated call to include it or ignore it 😊

BaasMurdo • Jun 19

Great article. In theory, I agree with 99% of what was said. Unfortunately, this is only in theory. There are many companies where there simply is no time to do all of this. There are companies where results and getting the latest features out on a hard deadline is the top priority and as soon as that is done, there is another feature that is expected to be done ASAP as there is a hard deadline and so the cycle continues, this is very common for startups that have investors and a board to keep happy.
Trying to convince board members who are not tech-savvy that 50% to 70% (a personal guesstimate) of development time will be tests and documentation is going to be a hard sell.

Dhruv Agarwal • Jun 19

Great point @baasmurdo! That's a change that tech leaders have to drive with other stakeholders not by hiding but by educating.

In one of my EM stints, I literally ran a tech process onboarding for non-tech joiners so that they understood what our estimates meant and everything a developer does apart from just writing code.

Shivam Chhuneja • Jun 17

interesting read Dhruv!

Dhruv Agarwal • Jun 17

Thanks Shivam!

Adnan Hashmi • Jun 17

This is a really good read. Lately I have been a victim of reacting to bugs 😅. I will start asking those 3 questions on everything I build from now on. It seems like a really good way to catch bugs early on and save those tens of hours of debugging. Thanks!! 😊