One Web Architect's Core Beliefs

Nathan Fritz

Our core beliefs are the principles that we've learned through experience or subscribed to from others in order to quickly make decisions and value judgements. They occasionally need to be re-evaluated, but without them, we'd spend all of our time analyzing instead of acting.

Over time, as a software developer, you'll adopt and develop software development core beliefs. It's important to be conscious of them, so that you can contextualize their application, debate your peers, and know when to re-evaluate.

Here are some of the web architecture core beliefs that I've developed and adopted in my 20 years as a software developer. Each of these could be a blog post, and probably already are. My former colleagues may be able to remember specific incidents and hard lessons learned that cemented some of these beliefs. 😬

Core beliefs, be they about software or life, are personal; I'm not asking you to agree with or adopt any of these, but if something rings true, certainly feel free.

Inactive Software Is Dead Software

In the web world, software ages very quickly. Dependencies are out of date in months, and with a culture of small packages, your software may have a lot of dependencies.

Every time you set a piece of software down, it's going out of date. Bit rot is real. The platform that you built it for moves on, your embedded dependencies have software vulnerabilities published, your services change APIs, etc. It'll take some effort to bring your software back up to date the next time you need to work on it.

When choosing dependencies, it's usually a bad idea in the web world to use anything that hasn't been updated in the last 6 months. You've been warned.

Are there exceptions? Sure: if your software targets a stable platform, has few or no dependencies, and is very small, it can sit idle longer without rotting. Again, context is king.

The KISS Principle (Keep It Simple, Stupid)

The KISS principle has been around for decades at this point. It states that complicated systems are much more prone to failure than simple ones.

There are a lot of these software principles. You may want to adopt some of them if they ring true to you.

I also like Gall's Law.

The thing is, we're engineers, and we overdo it sometimes. Over-engineering is always a temptation for software developers who love their craft, and it's probably the most common software development sin. There's a time and a place for complexity, which we'll get to in a bit.

Client State is a Cache

Client-side web applications have to hold some data, and the client is almost never the source of truth. That means that any data you maintain in a client is part of a cache and should be treated that way.

You have to be very deliberate with cached data. Here are some guidelines:

  • Only cache data that you're very likely to use.
  • Don't cache data that the user doesn't have permissions to see.
  • Avoid making decisions based on cached data.
  • Invalidate cached data when the source of truth updates.
  • Don't update the cache from user actions without marking it as pending.

Dealing properly with client state data is a very large topic, but thinking of it as a cache first can help you make the right choices for how to use and maintain that data.
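
To make the cache framing concrete, here is a minimal TypeScript sketch (the CacheEntry and ClientCache names are illustrative, not from any particular framework). It treats server data as authoritative, marks optimistic updates as pending, and invalidates entries when the source of truth changes.

// Illustrative sketch: client state modeled explicitly as a cache.
type CacheEntry<T> = {
  value: T;
  pending: boolean;   // true while an optimistic update awaits the server
  fetchedAt: number;  // lets callers decide when an entry is stale
};

class ClientCache<T> {
  private entries = new Map<string, CacheEntry<T>>();

  // Data that came from the source of truth (the server).
  setFromServer(key: string, value: T): void {
    this.entries.set(key, { value, pending: false, fetchedAt: Date.now() });
  }

  // Optimistic update from a user action: keep it, but mark it as pending.
  setOptimistic(key: string, value: T): void {
    this.entries.set(key, { value, pending: true, fetchedAt: Date.now() });
  }

  // Invalidate when the server says the source of truth has changed.
  invalidate(key: string): void {
    this.entries.delete(key);
  }

  get(key: string): CacheEntry<T> | undefined {
    return this.entries.get(key);
  }
}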

100% Test Coverage Is Achievable and Desirable

When writing a library or API, 100% test coverage should be a priority. Yes, it's doable. In the worst case, it's time-neutral, and the benefits are great and sometimes non-obvious.

I'm not an expert when it comes to testing on the client-side of a web application, but I have seen significant coverage of the data-management and component side done to great effect.

Tests not only validate that your code does what you intended, they also document that intent in great detail. They give future contributors a starting point when looking at a piece of functionality -- they can use your tests to suss out how to generate the required inputs, where in the code the functionality lives, and exactly what your expectations for the code were. Good test coverage reduces the cognitive overhead for anyone else working on your code.

Tests give you a solid foundation when refactoring -- if your coverage is good enough, you'll know where you need to keep working and when you're done refactoring. Tests give you the confidence to make big changes.

Code coverage will also help prevent spaghetti code. As Isaac points out, code that is hard to test is a code smell.
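
As a small illustration of tests doubling as documentation, here is a TypeScript sketch assuming a Jest-style runner (describe, it, expect); the slugify function is hypothetical.

// Hypothetical function under test.
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-|-$/g, '');
}

// The tests spell out the intended behavior, including edge cases,
// so a future contributor can read the expectations directly.
describe('slugify', () => {
  it('lowercases and replaces non-alphanumerics with dashes', () => {
    expect(slugify('Hello, World!')).toBe('hello-world');
  });

  it('strips leading and trailing dashes', () => {
    expect(slugify('  --Spaces--  ')).toBe('spaces');
  });
});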

Business Logic Needs to be Isolated

All of your business logic for a web app should live on the server. UX logic can live on the client, and data-management logic can live in the database, but you're asking for trouble if you put any business logic anywhere except your server process. Ideally, it should be in its own layer within that process.

When you spread business logic out, it becomes hard to find and maintain, and you greatly increase the complexity of your app. You're also opening yourself up to security vulnerabilities (a web client that carries authorization logic is a vulnerability in itself).

So what is business logic? Any logic that changes or accesses data directly related to the purpose of the application.

It's okay to use a stored procedure for actions that are not specific to the application, like maintenance, migrations, sharding, etc. It's okay to have client logic for displaying views and generating API calls. If you're checking to see whether a user can make an API call before making the API call, that may be a sign that something is amiss.

Keeping all of the business logic on one server layer makes it more testable, makes it easier to swap out database and API layers, and reduces the number of things a developer has to think about while making changes or debugging.
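
Here is a rough TypeScript sketch of that isolation; the ArticleService and ArticleStore names are illustrative, and the Express-style route is shown only as a comment to keep the transport layer thin.

// Data access is abstracted behind an interface.
interface ArticleStore {
  findById(id: string): Promise<{ id: string; authorId: string; body: string } | null>;
  save(article: { id: string; authorId: string; body: string }): Promise<void>;
}

// The business rules live in one server-side layer, not in the client
// or the database.
class ArticleService {
  constructor(private store: ArticleStore) {}

  async updateBody(userId: string, articleId: string, body: string): Promise<void> {
    const article = await this.store.findById(articleId);
    if (!article) throw new Error('not found');
    if (article.authorId !== userId) throw new Error('forbidden');
    await this.store.save({ ...article, body });
  }
}

// Illustrative Express-style handler: no business rules, just transport.
// app.put('/articles/:id', async (req, res) => {
//   await articleService.updateBody(req.user.id, req.params.id, req.body.body);
//   res.sendStatus(204);
// });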

Don't Expose More Surface Area Than You Use

Any code that you're not using in your project is technical debt, and any API that you haven't tested is a vulnerability. It's a trope at this point for developers to brag, with a wink, about how many lines of code they've deleted rather than how many they've written. It goes along with the KISS principle. Don't be overly clever in order to reduce your line count, but generally, less is more.

One of the reasons I generally don't think GraphQL is a great solution is that you're deploying more capability than you're using. You're giving clients and users a full query language of which, in most cases, they'll only use a small portion. When it doesn't work, or it surfaces a denial-of-service vulnerability because someone can write a query that performs poorly, that's a bug you're going to have to fix.

There are valid use cases for exposing a query interface like GraphQL, and there are ways of limiting its surface area and of getting complete test coverage, but all of those things take considered effort. Here is an Apollo GraphQL guide for dealing with some of these problems.

Every line of code and every piece of functionality you provide is an opportunity for bugs and increases maintenance. If you and your users aren't going to use it, don't include it.
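
As a small illustration, here is a hypothetical TypeScript sketch that returns only the fields clients actually use instead of exposing the full database row.

// Illustrative database row; most of these fields should never leave the server.
type UserRow = {
  id: string;
  email: string;
  passwordHash: string;
  billingAddress: string;
  displayName: string;
};

// The API exposes exactly the surface area that clients use.
type PublicUser = { id: string; displayName: string };

function toPublicUser(row: UserRow): PublicUser {
  // Deliberately pick fields rather than spreading the whole row.
  return { id: row.id, displayName: row.displayName };
}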

Don't Innovate When Developing an MVP

This Twitter thread pretty much covers my view here. It's a bit of a hot take, but it goes along with some of my other core beliefs and the KISS principle.

Basically, the time for innovation is not during the initial version of a project. You should do your experimentation separately, either as isolated prototypes and experiments, or incrementally after you prove to the business that the MVP has value.

Clever is the Enemy of Good

You should write code for people first, not computers. It's important that team members and your future self be able to easily see what your code is meant to do. Follow language idioms, don't take shortcuts just to shave off a few lines, and don't try to combine your logic to fulfill many purposes in one clever bit of code.

Instead, handle your logic clearly and idiomatically, walking through the problem step by step. Simplicity is handling the core logic and the edge cases with clear code. Don't go out of your way to save keystrokes; typing is not the time-consuming part of writing software, thinking is. You can always go back and make things more terse later if you've found a way to stay clear and handle more edge cases.

Your goal is to make your code appear pedestrian, despite solving the problems brilliantly. This may take some iteration and a lot of thinking. You'll know you're doing well if both senior and junior developers understand your code and pull requests at a glance. Make sure to give similar feedback when reviewing other developers' pull requests.

For example, creating a clever mix-in function for a language that only supports direct inheritance will make the code hard to follow, and make the software more difficult to debug. Keep things simple.
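
Here is an illustrative TypeScript contrast between a clever version and a plain one; both produce the same result, but the second is the one a teammate can debug at a glance.

// Clever: dedupe, filter, and sort in one dense expression.
const activeNamesClever = (users: { name: string; active: boolean }[]) =>
  [...users.reduce((s, u) => (u.active && s.add(u.name), s), new Set<string>())].sort();

// Clear: the same result, but each step is named and easy to inspect.
function activeNames(users: { name: string; active: boolean }[]): string[] {
  const active = users.filter(user => user.active);
  const names = active.map(user => user.name);
  const unique = Array.from(new Set(names));
  return unique.sort();
}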

Uncompromised Multi-Master Replication is a Myth

Warning: This section is about the CAP Theorem, so feel free to skip it. Do not operate heavy machinery while taking CAP Theorem.

Multi-master replication will always have compromises. The CAP theorem states that of Consistency, Availability, and Partition tolerance, you can never maintain all three at once.

Well-implemented distributed databases document their claims and don't claim to violate the CAP theorem. These claims are typically AP (available and partition-tolerant) or CP (consistent and partition-tolerant). On a practical level, all distributed databases aim to be partition-tolerant, safely recovering from node and network failures.

A partition is when one or more nodes are down or can't communicate with all of the other nodes. Partitions can happen in unexpected ways, where the boundaries of communication are different for any given node.

An AP database keeps data available during network partitions, but it can produce conflicts and inconsistencies, such as dangling foreign references or incomplete transactions. A CP system does not produce conflicts (usually; check your guarantees) and keeps data consistent, but some reads or writes can be blocked during a network partition. Relational (SQL) databases are typically CP, while document stores are more often AP.

Conflict resolution is a business-logic problem, and it needs to be solved in the specific context of the data being stored. If two writers are editing the same article during a network partition, a last-write-wins strategy could delete one editor's changes. In that particular case, you may want to resolve the conflict manually with a document-merge strategy. An AP database often resolves conflicts with a last-write-wins strategy by default, so be mindful.
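
To make that concrete, here is an illustrative TypeScript sketch contrasting last-write-wins with surfacing the conflict for manual, business-specific resolution; the Revision shape is hypothetical.

type Revision = { body: string; updatedAt: number; editor: string };

// Last-write-wins: simple, but it silently drops one editor's changes.
function lastWriteWins(a: Revision, b: Revision): Revision {
  return a.updatedAt >= b.updatedAt ? a : b;
}

// Business-aware alternative: keep both revisions and surface the conflict
// so a person (or a document-merge routine) can resolve it.
type Resolved =
  | { kind: 'merged'; value: Revision }
  | { kind: 'conflict'; candidates: [Revision, Revision] };

function resolveArticleEdit(a: Revision, b: Revision): Resolved {
  if (a.body === b.body) return { kind: 'merged', value: lastWriteWins(a, b) };
  return { kind: 'conflict', candidates: [a, b] };
}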

MongoDB famously made ridiculous performance and replication claims when it first launched in 2009, and it took quite a few years to clean them up. I learned a lot in those days from Aphyr's posts exploring database clustering claims with his testing tool, Jepsen. Beware of marketing-driven engineering products!

In short, when selecting a distributed database product, make sure you understand their claims. There's no magic bullet when it comes to horizontal scaling.

Make a Plan for Scaling

You can scale pretty high with a single application server and a single database server, but you should also have a plan for scaling up later. Implementing a scaling solution during development of an MVP introduces unnecessary complexity (violating several of my previous core beliefs) and is an example of premature optimization.

Depending on your business needs, you could scale horizontally, making it so that you can have any number of application servers and database servers (see the multi-master core belief above; this can be complicated to get right). Horizontal scaling makes sense if you need to support many writes per second and your users and data aren't logically siloed.

You could also scale through siloed sharding. If your product primarily manages logically grouped users (a common case for sharding), each group can have a dedicated set of application servers and databases, and interactions between user groups stay limited. This is one of the easier ways to scale.
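
As a rough illustration, here is a TypeScript sketch of routing a tenant to a dedicated silo; the shard list and hash are purely illustrative, and a real system would more likely use a lookup table or consistent hashing.

// Each logical user group (tenant) lives on exactly one silo of app
// servers and databases.
const SHARDS = ['shard-a.internal', 'shard-b.internal', 'shard-c.internal'];

function shardFor(tenantId: string): string {
  // Simple deterministic hash of the tenant id.
  let hash = 0;
  for (let i = 0; i < tenantId.length; i++) {
    hash = (hash * 31 + tenantId.charCodeAt(i)) >>> 0;
  }
  return SHARDS[hash % SHARDS.length];
}

// All of a tenant's traffic and data stays on one silo, so cross-shard
// coordination is only needed for rare cross-tenant interactions.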

If your user numbers are capped, as in an IT intranet application, simply writing an efficient application is enough to scale to tens of thousands of active users. You should still design your application so that it doesn't matter how many instances you run at once, for reasons of uptime and staged releases.

Keep in mind, running multiple instances of your API against a single database server (or a single master with many read replicas) will only scale to a point. Eventually you'll exceed your ability to write data to the database and will need to shard your users/data or use a distributed database.

Knowing how you're going to scale up ahead of time will help you make implementation decisions during the MVP stage of your application. Remember, having to scale up is a good problem to have, and you'll have more resources to implement your plan later.

Developer Experience (DX) Matters

At one of my first full-time programming jobs 20 years ago, I spoke with a senior developer about a problem I was having configuring the project for a niche QA use case. He pointed me to a CLI program he had written to manage that configuration. He was the only one using the tool so far, so I was shocked to see how nice it was. He'd taken the time to add self-documenting command-line arguments and a clear README that went over its purpose and usage.

Why had he bothered to take the time to make this script so easy to use when he was the only user? Because developer experience matters, even just for himself, and he suspected others might find it useful eventually. At the time, I would have just slapped together a script, made all of the options constants that I would have to manually edit each time, and moved on.

It left an impression, and I began taking more care with the experience of the software I was writing, because it usually paid dividends over time. I was never embarrassed to share my little tools and libraries with others.

Years later, when I started regularly writing open source code, I took the time to see what successful open source projects included in their README. It's made a huge difference for my career.

Your Core Beliefs

When I started this blog post, I asked people what their core software beliefs were. I got some pretty good answers. Feel free to send me any additional thoughts or feedback below or at @fritzy. For more, follow me here, on Twitter, and on github.com/fritzy.

In this post, I set out to explore and share some of my web architecture beliefs. Some of them are objective facts (like the limits of multi-master replication), some of them are value statements (DX matters), but they're all things that I keep in mind in order to quickly architect quality software.

Related

A similar article was published during the writing of this. There's a lot there that I agree with, but certainly not all of it. I don't endorse their views, but it's the kind of exercise that I'm encouraging with this article.

Comments (3)

Dave Cridland

Lovely to see you post here. As you know, it is inevitable that I shall disagree with one of your thoughts here. :-)


def multiply(a, b):
    return 4

def test_multiply():
    assert multiply(2, 2) == 4
    # 100% test coverage!

I'd love to have 100% automated check coverage, but I will never break my back trying to do it. It's lovely if your code falls into that form, but if it does, it probably means that you're just not doing the things that are hard to build automated checks around. That's great. Lucky you.

I think that Software Engineers, and programmers in general, are terrible about their approach to software quality. "100% test coverage" seems to be feeding that mentality rather than in opposition to it. Software Quality is about delivering software to the customer that does what they need, expect, and want. Automated checks form the middle third of that process at best - at worst, they're completely disconnected from it.

  • It assumes that the only valuable testing is automated checks. Manual testing - and more broadly, anything that doesn't run through in a couple of minutes within a development environment or CI pipeline - is discounted. That's a shame, because manual test scripts, automated large scale integration tests, and so on are all great too (and catch things unit tests can't), but are hard to extract the Holy Metric from.
  • It has a focus on metrics which is unhealthy - because we fixate on the numbers, and not the result. There's a corresponding risk that if we focus too much on the metrics, we lose sight of the goal, and quality actually goes down.

I find that automated checks written by the developer who also wrote the code regularly fall into a few pitfalls:

  • They test the same assumptions that the developer had when writing the code.
  • They often don't check nearly as much as they cover, because given a metric, we get excited over the number, and not the outcome.
  • In particular, Library and Backend developers have a tendency to write APIs in order to make them easy to check, rather than easy to use without error.

In the past couple of weeks at work, I've found two cases where the code wasn't actually checked in any useful way.

In one case, the developer had incorporated the same error into both their test and the code it was testing. The code in question had 100% test coverage, and therefore passed CI, but would never have worked in reality. Luckily, it would have been caught when it stopped the Dev environment from working. The fixed code (and test) had a new, more subtle bug in it that might have worked in Dev, Stage, and Prod. Or might have failed in one of them. We switched it out for a library.

In the second case, the developer had a check for a particular problem - and since I know you'll know this one, I can tell you it was that an XMPP client library was choking on milliseconds in timestamps in MAM, and the test ensured they weren't present. Only it turned out the test was wrong - it tested that it was a valid ISO 8601 timestamp, but passed whether or not it had milliseconds. It didn't, as it happens, but the fact the test was wrong went entirely unnoticed.

That latter bug was mine, and the reason why the code worked anyway was because, of course, I'd eyeballed the result and then tested it - manually - against the app. Manual testing is, and always will be, the gold standard of "Does it actually work?" - it's just lengthy and painstaking to do.

Ultimately, bugs creep into code at a reasonably constant rate by volume of code. If you have more test code, the statistics say that you'll have more bugs - we hope that the increased introspection of the overall code will mean fewer bugs, not more, but the bugs are just as capable of falling in the test code as in the code under test, after all. And who tests the automated check code? And how? (And, moreover, what coverage does it have?)

So am I about to argue that we shouldn't have automated checks at all? Far from it, of course. I expect automated checks to cover about 70-90% of the codebase. Higher if possible, but I find that above 90% we're reaching diminishing returns. YMMV; if you can comfortably get it higher, do.

As an aside, my security labelling library is at around 93%, with every line that's not covered examined (they're usually overly defensive conditionals), and a set of test vectors that's had at least some independent verification. (And, more fun, the test suite is data-driven, so you can provide your own test vectors.) I'm totally happy with that. I could spend days getting the last 7%, but ... why?

Let's face it, the important metric isn't coverage, but checking, and we don't have convenient tooling for that. We just use coverage as a proxy.

So, more important to me than coverage are two things:

  • Any bug that gets past the development stage should be - if at all possible - baked into an automated check.
  • A good automated check suite is, first and foremost, a developer's aid. It should be a really convenient way of getting a first pass over your code, a quick way of reproducing a bug, and of driving a deep part of your code through a debugging session easily.

These two rules will naturally drive coverage up. But that will happen without arbitrary rules. They'll also save you loads of time.

That all said, some software really can be checked exhaustively. Model verification and formal proofs are great if you can.

But when you can't, I truly believe that instead of chasing that last 7% of coverage, you'd be more efficient working through a manual test script.

Nathan Fritz • Edited

We've disagreed on this before. And to be fair, I'm not afraid to put coverage exemptions around conditionals that are a little over-defensive and configuration-specific lines. That said, I agree that many types of projects don't need 100% test coverage. Data-centered open source libraries and API services are most of what I write, and in those cases, I find it advantageous to have 100% coverage, at least on paper.

My games usually have little to no tests, as the desired behavior is an emergent property of several systems working together, and so it makes more sense to manually test. Even in the web world, most front-end UI rendering logic isn't worth testing, nor is it worth re-testing your APIs from a set of client tests.

In any case, with APIs and open source libraries, I find bugs by working toward 100% coverage, and I'm more confident to do refactoring later. It's also good marketing to legitimately have that 100% coverage badge on an open source library.

We can continue to disagree on this, but I think we understand each other's views pretty well. Hopefully, someday, we get to work together and butt heads on issues like this for code that we share.

Dave Cridland

I wonder, though, whether 100% coverage - and coverage in general - is a measure of how easily the software could be checked automatically, rather than how well it is. I think the thing that winds me up so much about coverage as a metric is that distinction seems to be missed.