Ayron Wohletz

Posted on Nov 8, 2021 • Originally published at funtoimagine.com

How to separate concerns in code

#programming #codequality #architecture

We developers often praise separation of concerns. Neatly separated systems are easier to work with, because you can change one part without breaking anything else. Separated logic has wider applicability, because it's not tied to irrelevant context. However, it's not always obvious how to achieve this separation. Here I will share a heuristic for separating concerns.

I call that heuristic "varying the world." To vary the world means to consider what may change and then isolate the changing parts in their own modules. I picked up the idea years ago from the C2 wiki, where a certain Carl R. Castro wrote about "The Principle of Essential Representation". Carl wrote in the context of Object-Oriented design, but the heuristic could apply to many types of systems. Here I apply it to JS functions.

Example #1

Let's start with a typical Node.js procedure, written here in TypeScript. Here's one that reads user data from a CSV file, calculates some statistics, and then writes the result to another file.

import * as fs from "fs";
import {maxBy, map} from "lodash";

interface User {
    name: string;
    followers: number;
    posts: number;
    joinedOn: Date;
}

interface UserStats {
    numUsers: number;
    maxNumFollowers: number;
    mostRecentlyJoined: string;
}

function calculateUserStats() {
    try {
        const userData = fs.readFileSync('users.csv', 'utf8');
        const userRows = userData.split("\n").slice(1, -1); //assuming header row and blank last row
        const users: User[] = map(userRows, row => {
            const [name, followers, posts, joinedOn] = row.split(",");
            return {
                name,
                followers: parseInt(followers),
                posts: parseInt(posts),
                joinedOn: new Date(joinedOn)
            };
        });

        const stats: UserStats = {
            numUsers: users.length,
            maxNumFollowers: maxBy(users, u => u.followers).followers,
            mostRecentlyJoined: maxBy(users, u => u.joinedOn.getTime()).name
        }

        fs.writeFileSync("./stats.json", JSON.stringify(stats, null, 4));
        console.log("Done writing stats.json");

    } catch (err) {
        console.error(err)
        process.exit(1);
    }
}

calculateUserStats();

So with this users.csv:

name,followers,posts,joinedOn
Priya,2503,100,11/7/2020
John,300,5,10/1/2019
Georgette,2503,100,5/2/2018
Ayron,9000,1000,1/1/2021

It generates this stats.json:

{
    "numUsers": 4,
    "maxNumFollowers": 9000,
    "mostRecentlyJoined": "Ayron"
}

Now, with this calculateUserStats, think about the alternate realities/worlds/environments/contexts it might be used in. In other words, think about what might change. There is an alternate reality where we read user data from a JSON file instead of a CSV file. We might run this function in an AWS Lambda or on a local desktop computer. It could be called synchronously or asynchronously. We might want to write the statistics to a file, return them to the caller, or do something else with them.

This varying of the world reveals the changing parts of calculateUserStats. All these changing parts are bundled together in the same function, which narrows its range of applicability. You can only use it in environments that have a local file named "users.csv". It can only be called synchronously, blocking the caller. It forces the result to be written to a file. And on any error, it exits the whole process.

Varying the world sometimes also reveals parts of a function that don't change among the different contexts. We can call this the "essence" of the function. This is usually domain logic that our application has to perform to solve the user's problem.

I would say the essence of calculateUserStats is taking some user data and producing some statistics, as represented by this code:

const stats = {
    numUsers: users.length,
    maxNumFollowers: maxBy(users, u => u.followers).followers,
    mostRecentlyJoined: maxBy(users, u => u.joinedOn.getTime()).name
}

No matter where or how or when we run it, this is the essential task. This is the domain logic. Our user wants these statistics, and they don't care how we do it. We could do it with this code, by a human using pen and paper, or by a clever arrangement of gears and levers.

With the results of our "varying the world" thought experiment, we can start separating the parts of this function one by one. Let's start with the method of obtaining the user data. Clearly that can vary. So let's extract it into its own function and pass the users in as a parameter:

function getUsers(): User[] {
    try {
        const userData = fs.readFileSync('users.csv', 'utf8');
        const userRows = userData.split("\n").slice(1, -1); //assuming header row and blank last row
        return map(userRows, row => {
            const [name, followers, posts, joinedOn] = row.split(",");
            return {
                name,
                followers: parseInt(followers),
                posts: parseInt(posts),
                joinedOn: new Date(joinedOn)
            };
        });
    } catch (err) {
        console.error(err)
        process.exit(1);
    }
}

function calculateUserStats(users: User[]) {
    const stats = {
        numUsers: users.length,
        maxNumFollowers: maxBy(users, u => u.followers).followers,
        mostRecentlyJoined: maxBy(users, u => u.joinedOn.getTime()).name
    }

    fs.writeFileSync("./stats.json", JSON.stringify(stats, null, 4));
    console.log("Done writing stats.json");
}

calculateUserStats(getUsers());

The "decoupling" and "separation of concerns" are starting to emerge: calculateUserStats no longer has to know or care where users comes from.
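To illustrate, here is a minimal sketch of the JSON "alternate reality" mentioned earlier. The function name parseUserJson and the JSON shape (an array of objects with joinedOn as a date string) are assumptions made for illustration; they are not part of the original code. It takes a string rather than a filename so that it runs without any file on disk.

```typescript
interface User {
    name: string;
    followers: number;
    posts: number;
    joinedOn: Date;
}

// Hypothetical alternate source of users: a JSON string instead of a CSV file.
// Assumes the JSON is an array of objects whose joinedOn is a date string.
function parseUserJson(userJson: string): User[] {
    const rawUsers = JSON.parse(userJson) as Array<{
        name: string;
        followers: number;
        posts: number;
        joinedOn: string;
    }>;
    return rawUsers.map(raw => ({
        name: raw.name,
        followers: raw.followers,
        posts: raw.posts,
        joinedOn: new Date(raw.joinedOn) // convert the date string to a Date
    }));
}

const usersFromJson = parseUserJson(`[
    {"name": "Priya", "followers": 2503, "posts": 100, "joinedOn": "2020-11-07"},
    {"name": "John", "followers": 300, "posts": 5, "joinedOn": "2019-10-01"}
]`);
console.log(usersFromJson.length); // 2
```

Either source produces a User[], so calculateUserStats(parseUserJson(...)) would work the same way as calculateUserStats(getUsers()).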

Ok, how about the output? What we do with the stats can vary. So let's extract that part:

function getUsers(): User[] {
    ...
}

function writeStats(stats: UserStats) {
    fs.writeFileSync("./stats.json", JSON.stringify(stats, null, 4));
    console.log("Done writing stats.json");
}

function calculateUserStats(users: User[]): UserStats {
    return {
        numUsers: users.length,
        maxNumFollowers: maxBy(users, u => u.followers).followers,
        mostRecentlyJoined: maxBy(users, u => u.joinedOn.getTime()).name
    }
}

writeStats(calculateUserStats(getUsers()));

calculateUserStats is now a pure function. It has no side effects: no connections or entanglements with the outside world that we need to worry about. Give it the same users and you'll always get the same result. It doesn't know where you get the users or what you do with the user stats. This makes it a portable, testable, decoupled unit of domain logic.
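One practical payoff is that the function can be exercised with plain in-memory data, no files needed. Below is a rough sketch of that. To keep it dependency-free, it swaps lodash's maxBy for a small local stand-in, and it assumes a non-empty users array; the sample data mirrors the users.csv shown earlier.

```typescript
interface User {
    name: string;
    followers: number;
    posts: number;
    joinedOn: Date;
}

interface UserStats {
    numUsers: number;
    maxNumFollowers: number;
    mostRecentlyJoined: string;
}

// Local stand-in for lodash's maxBy so this sketch has no dependencies.
// Returns the first item with the highest score, or undefined for an empty array.
function maxBy<T>(items: T[], score: (item: T) => number): T | undefined {
    let best: T | undefined;
    let bestScore = -Infinity;
    for (const item of items) {
        const itemScore = score(item);
        if (best === undefined || itemScore > bestScore) {
            best = item;
            bestScore = itemScore;
        }
    }
    return best;
}

// Pure: the result depends only on the users passed in.
// Assumes users is non-empty (hence the non-null assertions).
function calculateUserStats(users: User[]): UserStats {
    return {
        numUsers: users.length,
        maxNumFollowers: maxBy(users, u => u.followers)!.followers,
        mostRecentlyJoined: maxBy(users, u => u.joinedOn.getTime())!.name
    };
}

// Same data as users.csv above (Date months are zero-based).
const sampleUsers: User[] = [
    { name: "Priya", followers: 2503, posts: 100, joinedOn: new Date(2020, 10, 7) },
    { name: "John", followers: 300, posts: 5, joinedOn: new Date(2019, 9, 1) },
    { name: "Georgette", followers: 2503, posts: 100, joinedOn: new Date(2018, 4, 2) },
    { name: "Ayron", followers: 9000, posts: 1000, joinedOn: new Date(2021, 0, 1) }
];

const sampleStats = calculateUserStats(sampleUsers);
console.log(sampleStats);
```

Because nothing here touches the file system, the same call works unchanged in a unit test, a Lambda handler, or a browser.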

We could take this further, for example, by considering that the users array may not fit in memory. calculateUserStats assumes a world in which it does. So we could make it accept an abstract stream interface instead. We could also separate writeStats and getUsers further.
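As a sketch of what that streaming variant might look like, the version below uses AsyncIterable<User> as the "abstract stream interface" and keeps only running totals in memory. That choice of interface, and the names calculateUserStatsFromStream and usersFromArray, are assumptions for illustration rather than something the original code defines.

```typescript
interface User {
    name: string;
    followers: number;
    posts: number;
    joinedOn: Date;
}

interface UserStats {
    numUsers: number;
    maxNumFollowers: number;
    mostRecentlyJoined: string;
}

// Hypothetical streaming variant: consumes users one at a time,
// so the full list never has to be held in memory.
async function calculateUserStatsFromStream(users: AsyncIterable<User>): Promise<UserStats> {
    let numUsers = 0;
    let maxNumFollowers = -Infinity;
    let mostRecent: User | undefined;

    for await (const user of users) {
        numUsers++;
        maxNumFollowers = Math.max(maxNumFollowers, user.followers);
        if (mostRecent === undefined || user.joinedOn.getTime() > mostRecent.joinedOn.getTime()) {
            mostRecent = user;
        }
    }

    if (mostRecent === undefined) {
        throw new Error("Cannot calculate stats for an empty stream of users");
    }
    return { numUsers, maxNumFollowers, mostRecentlyJoined: mostRecent.name };
}

// Helper for demonstration only: turns an array into an async stream.
// A real source might yield rows from a file or a database cursor instead.
async function* usersFromArray(users: User[]): AsyncGenerator<User> {
    for (const user of users) {
        yield user;
    }
}

const streamedStats = calculateUserStatsFromStream(usersFromArray([
    { name: "Priya", followers: 2503, posts: 100, joinedOn: new Date(2020, 10, 7) },
    { name: "Ayron", followers: 9000, posts: 1000, joinedOn: new Date(2021, 0, 1) }
]));
streamedStats.then(result => console.log(result));
```

The trade-off is that the function becomes asynchronous, which is itself one of the "worlds" the caller now has to live in.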

Example #2

Let's take getUsers from above. I can see at least four things that may vary:

1. Reading from a file with a specific name (users.csv): where do we locate the file?
2. Parsing the CSV string: what is the format of the rows and columns?
3. Error handling: what do we want to do in case of an error?
4. Sync or async: do we want to block the caller or not?

Here's my first pass at splitting it up:

// 1. Reading from a file with a specific name (users.csv)
function readFile(filename: string): Promise<string> {
    return new Promise((resolve, reject) =>
        fs.readFile(filename, 'utf8', (err, data) => {
            if (err) {
                reject(err);
                return; // don't fall through to resolve() after rejecting
            }
            resolve(data);
        }));
}

// 2. Parsing the CSV string
function parseUserCsv(userCsv: string): User[] {
    const userRows = userCsv.split("\n").slice(1, -1); //assuming header row and blank last row
    return map(userRows, row => {
        const [name, followers, posts, joinedOn] = row.split(",");
        return {
            name,
            followers: parseInt(followers),
            posts: parseInt(posts),
            joinedOn: new Date(joinedOn)
        };
    });
}

async function readAndParseUserCsv(filename: string): Promise<User[]> {
    const userCsv = await readFile(filename);
    return parseUserCsv(userCsv);
}

// 3. Error handling
async function getUsersOrExit(filename: string): Promise<User[]> {
    try {
        return await readAndParseUserCsv(filename);
    } catch (err) {
        console.error(err)
        process.exit(1);
    }
}

// Tie it together in a high-level interface if desired
const getUsers = () => getUsersOrExit("users.csv");

This splitting (or "slicing into layers") has opened up combinatorial possibilities for the caller. Now the caller, if they already have a CSV string from some other source, can call parseUserCsv. Or if they want to handle errors in their own way, they can call readAndParseUserCsv. Or if they just want to read a file into a string for some other purpose, they can call readFile. They are not forced into an all-or-nothing interface like the original getUsers. In sum, the code has become more flexible, composable, and testable.
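Here is a small sketch of two of those combinations. parseUserCsv follows the version above (with an explicit radix added to parseInt). The helper usersOrEmpty is a hypothetical example of a caller choosing its own error policy; it takes a loader function so the sketch runs without a file on disk. In real use you might pass () => readAndParseUserCsv("users.csv").

```typescript
interface User {
    name: string;
    followers: number;
    posts: number;
    joinedOn: Date;
}

// As in the article: assumes a header row and a blank last row.
function parseUserCsv(userCsv: string): User[] {
    const userRows = userCsv.split("\n").slice(1, -1);
    return userRows.map(row => {
        const [name, followers, posts, joinedOn] = row.split(",");
        return {
            name,
            followers: parseInt(followers, 10),
            posts: parseInt(posts, 10),
            joinedOn: new Date(joinedOn)
        };
    });
}

// Hypothetical caller-defined error policy: log a warning and carry on
// with no users, instead of exiting the process.
async function usersOrEmpty(load: () => Promise<User[]>): Promise<User[]> {
    try {
        return await load();
    } catch (err) {
        console.warn("Could not load users, continuing with none:", err);
        return [];
    }
}

// 1. The caller already has a CSV string (e.g. from an HTTP response or a test fixture).
const csvFromElsewhere =
    "name,followers,posts,joinedOn\nPriya,2503,100,11/7/2020\nJohn,300,5,10/1/2019\n";
const parsedUsers = parseUserCsv(csvFromElsewhere);
console.log(parsedUsers.map(u => u.name)); // [ 'Priya', 'John' ]

// 2. The caller handles a failed load in its own way (simulated failure here).
const fallbackUsers = usersOrEmpty(() => Promise.reject(new Error("users.csv not found")));
fallbackUsers.then(users => console.log(users.length)); // 0
```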

What is the essence of getUsers? I would say it doesn't have one in terms of the domain. Of course, we have not defined what the domain is for these examples. So let's say that the user needs these statistics to solve some business problem they have. They do not care where we get the user data from, how we store it, etc. The mechanics of how we do that aren't "essential" to the domain. However, we could consider that getUsers has its own mini-domain at its level of abstraction and think about what is essential or inessential from that perspective. For example, maybe we call it the domain of CSV parsing.

Avoid over-engineering

I'll end with a word of caution. There should be a good reason for separating and decoupling. Pursuing it as an end in itself leads to over-engineering. For example, in the cases above, if calculateUserStats were just a script I run once in a while, it would be overkill to separate it to this degree. Also, it's easy enough to just wait and separate this code if the need for it actually arises.

The "vary the world" heuristic is just a rule of thumb, a way of thinking, not an exact science. It's more about trying it, getting a feel for it, and being aware of the options for a procedure or function or module or system. I almost never explicitly think about it in low-level programming anymore, instead relying on intuition about what will be needed.

Have an opinion on separation vs over-engineering? Leave a comment or message me on Twitter.
