Davyd McColl

Posted on Nov 10, 2019 • Edited on May 21, 2020

Code is for co-workers, not compilers

#programming #style #refactoring

Whilst the primary intent of code is to convey the desired logic to a computer, we have to remember that we are likely to need other people to work with that code too. Co-workers, consumers of our apis, and, given enough time, us!

Haven't you ever gone back to some code you haven't been near for a long time and almost wondered who wrote it? The style may seem alien -- and if you're dedicated to the path of continual learning, I would argue that there must be a timeframe after which code you wrote should feel clumsy and a little alien. Because if you're continually learning and getting better, that means that, at some point in the past, you weren't at the level you're at now -- and it can become painfully obvious!

I have some code which I wrote probably a decade or more ago, which I still use today. Looking at the code, I can recognise that I've come a long way. I don't need to rewrite it or change it -- it does what it does sufficiently well. But it's interesting to look back at code you wrote a while ago and see how you have grown.

You can also use this process as a measure of your learning path and growth: I would argue that when we are evolving quickly, code we wrote even a month ago can evoke the desire to refactor or even rewrite! Resist the urge for the latter though, unless there's a good reason to rewrite: if it works sufficiently well, refactor if you like, but move on -- create new things with your learnings.

I would also caution that if you can look at code you wrote more than 6 months ago and find no fault, no better way to do things -- you may be stagnating. Perhaps not -- but it's still something I do to measure my "growth velocity" for lack of a better term.

So what makes good code for co-workers?

That's a loaded question, and I'm sure we all have our own preferences and ideas. I'll share mine here, and I'd appreciate dialogue in the comments. If you have a protip that can help me to write code which others can work with more effectively, I want to know about it!

Without further ado, here are my guiding principles.

Good code reads like a story

I've always found that the code I understand more naturally has been written in a style which lays out what it does at a specific level with well-named methods that convey the logic at that level in plain English. I'm a native-speaker of English, so I would naturally lean towards the language, but there are two thoughts I can give on that front anyway:

Adopt your native language for method and variable names. People you work with are more likely to understand and you are more likely to express yourself clearly.
Write in English if you'd like the widest audience and to open up the code for open-source work, since, whilst English may not be the primary language of the world, it's at least a secondary language for an overwhelming majority of the world.

We can fight for the sovereignty of our native tongues -- and we wouldn't be wrong to do so -- or we can accept that there is a language which is most likely to be understood by the most people, and run with that. Who knows? In a century, it may be Mandarin!

When I say that good code reads like a story, consider the following excerpt from TempDbMySqlBase, a class which is the base for the two flavors of TempDbMySql: TempDb.MySql.Data and TempDb.MySql.Connector which only differ in the library they expose and use for connectors (Oracle's MySql.Data and the opensource MySqlConnector):

protected override void CreateDatabase()
{
    Port = FindOpenRandomPort();

    var tempDefaultsPath = CreateTemporaryDefaultsFile();
    EnsureIsRemoved(DatabasePath);
    InitializeWith(MySqld, tempDefaultsPath);
    DumpDefaultsFileAt(DatabasePath);
    StartServer(MySqld, Port);
    CreateInitialSchema();
}

This block may even read a bit like pseudo-code. The code makes it quite obvious the steps which must be performed to bring up a temporary mysql database server:

Find an open random TCP port to listen on
Create the contents of defaults file (my.cnf) for the server to work against, based on sane defaults and user overrides
Ensure that there's no data at the path that has been determined (or provided) for storing the data of this instance
Dump out the defaults file out to the data folder
Start the server on the high port which was found to be open earlier
Run in any constructor-provided scripts which are generally used to generate the initial schema of the database

Each of these steps requires considerable work, and any one which seems interesting is a keypress (F12 in Visual Studio or Rider with the VS scheme) away if it's a place the reader would like to learn more about.

Part of my attempt to make code read like a story is method names that end in words like At or With, so that the entire line reads a little smoother.

Concepts are separated with whitespace

Where we have lines of logic which performs specific functionality, it's nice to group those lines into a block which is separated from other code by whitespace, at least as a first pass. Consider this simple example:

using (var connection = OpenConnection())
using (var command = connection.CreateCommand())
{
    command.CommandText = $"create schema if not exists `{schema}`";
    command.ExecuteNonQuery();
    command.CommandText = $"use `{schema}`";
    command.ExecuteNonQuery();
    SchemaName = schema;
}

It may not be immediately obvious that there are two commands being run here, and the current schema name is being stored for usage elsewhere. A little whitespace can help to clarify:

using (var connection = OpenConnection())
using (var command = connection.CreateCommand())
{
    command.CommandText = $"create schema if not exists `{schema}`";
    command.ExecuteNonQuery();

    command.CommandText = $"use `{schema}`";
    command.ExecuteNonQuery();

    SchemaName = schema;
}

It's a small change which immediately draws the eye to three distinct operations within the block.

Once you have two, three or four well-defined blocks within a method, consider pulling them out into well-named methods.

Another form of whitespace is indentation:

<html><head><title>Hello, World!</title><style>h2 {text-decoration: underline;font-weight: bolder;}
p {font-style: italic;}
</style><script>window.onload=function(){alert('Hello, there!');};console.log("loaded");</script></head><body>
<h2>This is my first html page!</h2><p>I hope you like it!</p></body></html>

<html>
  <head>
    <title>Hello, World!</title>
    <script>
      window.onload = function() {
        alert('Hello, there!');
      }
      console.log("loaded");
    </script>
    <style>
      h2 {
        text-decoration: underline;
        font-weight: bolder;
      }
      p {
        font-style: italic;
      }
    </style>
  </head>
  <body>
    <h2>This is my first html page!</h2>
    <p>I hope you like it!</p>
  </body>
</html>

Methods are kept short

I don't like to impose a hard-and-fast rule for methods, but when we start getting up at around 10-15 lines, I'm looking for a way to simplify the text on screen. Sometimes I can't find one, but often I can. A method which requires the reader to scroll to see more of it is definitely too long: there is no way for the reader to see the entire logic chain in one glance which makes it difficult to fully understand the flow of the method.

Names tell you what a thing is or does

Ever seen code like this?

public async Task<bool> Get(string u, string p)
{
    using (var x = WebRequest.Create(u))
    using (var y = await x.GetResponseAsync())
    using (var s = response.GetResponseStream())
    using (var w = new FileStream(p, FileMode.CreateNew))
    {

        var t = long.Parse(y.Headers["Content-Length"]);
        var i = 0;
        var a = 8192;

        var d = new byte[a];
        while (i < t)
        {
            var c = t - i;
            if (c > a)
                c = a;
            var b = await s.ReadAsync(d, 0, (int)c);
            await w.WriteAsync(d, 0, b);
            w.Flush();
            i += b;
        }
    }
    return true;
}

What do x, y, c, a and all of these variables do? What does the method mean, when it says it will Get?

It just downloads data from an url to a file on disk, but to actually know what this code does, you'd have to read the entire thing and hold that information in your head. There are a few moving parts here, but this code would suck a lot less if it just had some readable names:

private const int DEFAULT_CHUNK_SIZE = 8192; // 8kb

public async Task<bool> Download(string linkUrl, string outputPath)
{
    using (var disposer = new AutoDisposer())
    {
        var req = WebRequest.Create(linkUrl);
        var response = await disposer.Add(req.GetResponseAsync());
        var readStream = disposer.Add(response.GetResponseStream());
        var writeStream = disposer.Add(new FileStream(outputPath, FileMode.CreateNew));

        var expectedLength = long.Parse(response.Headers["Content-Length"]);
        var haveRead = 0;

        var thisChunk = new byte[DEFAULT_CHUNK_SIZE];
        while (haveRead < expectedLength)
        {
            var toRead = expectedLength - haveRead;
            if (toRead > DEFAULT_CHUNK_SIZE)
                toRead = DEFAULT_CHUNK_SIZE;
            var readBytes = await readStream.ReadAsync(thisChunk, 0, (int)toRead);
            await writeStream.WriteAsync(thisChunk, 0, readBytes);
            writeStream.Flush();
            haveRead += readBytes;
        }
    }
    return true;
}

(In addition, I added a class called AutoDisposer that disposes things when it was disposed, in the reverse order in which it was asked to track them. So I can cut down on the number of using statements, but get the same functionality).

It's arguably not perfect code -- but it's better than the first attempt.

Whenever I'm struggling to find the name for a property, variable or POCO for holding data, I ask "What information does this hold? What is it?". Whenever I'm struggling to find the name for a method or class, I ask "What does it do?". Asking the last question also helps me to be honest when I'm starting to make a class or method which does too much, in other words, when I should be considering more than one class or method. If there's an And in the name (eg FindRecordAndSoftDelete), then it's a clue that perhaps I need two methods.

Comments

I like to keep comments sparse. Comments tend to rot, that is, to go out of sync with the code that they are near. Like code I just saw recently at work:

switch (retries)
{
  case 1: return Retry.Immediately();
  // retry in 5 minutes
  case 2: return Retry.In(TimeSpan.FromMinutes(2));
  // and so on -- all the comments after this lied
  // but had told the truth at some point in time.
}

In addition, comments which tell you the obvious just waste the reader's time:

// initialize the database context
dbContext.Init();

// fetch the first user record
return dbContext.Users.First();

There are, however, valuable comments:

"WHY" comments: when the reason(s) as to why the code does what it does are not obvious. I went trawling through my open-source work and the only comments I could find along this line are within empty catch blocks, explaining that I'm intentionally suppressing errors. However, in a business use-case, there are often little idiosyncrasies that have to be taken into account, like perhaps certain operations can't ever happen on a Sunday because a third-party integration is always down for maintenance.
API documentation. This is where you want to use the documentation that fits your ecosystem, eg xmldoc for .net or jsdoc for JavaScript. API documentation is usually surfaced by editing tools as consumers are writing code against your libraries and really help to make the developer experience slick, when they are consistently helpful.

Remember though that the onus is on you to keep these comments up-to-date every time you're working in that area of code. The only comment worse than a missing, helpful one is a misleading one.

Also, if you're tempted to write a short, one-line comment above a block of code to describe what it does, why not just extract a method? You'll end up:

documenting what the code is doing
abstracting out that little block to make it easier to read the overall logic of your current method
helping to prevent comment-rot: people are (in my experience) more likely to rename a method than to update comments

Wrapping it up

A friend of mine used to say: "write your code like the next person who is going to work on it is a psychopathic killer, and he knows where you live!". Whilst perhaps he was taking it a bit far, the point remains:

Code is for co-workers, not compilers

Machines are not that fussy about the instructions they receive -- they just execute them. And compilers don't care if your methods have short, indecipherable names, or longer, more descriptive names. Even in the case of languages like JavaScript where the source is delivered to the user to be executed at her machine, we have code minifiers to optimise that experience -- so rather write code you'd be glad to come back to and maintain than code which loses its meaning the moment you close the editor.

I also highly recommend watching this talk by Kevlin Henny: