Nicolas Fränkel

Posted on Jun 27, 2022 • Originally published at blog.frankel.ch

Hard things in Computer Science

#computerscience #discuss #hardthings

If you've more than a couple of years of experience in IT, you probably have stumbled upon the following quote:

There are only two hard things in computer science: cache invalidation and naming things.

-- Phil Karlton

Then, because it's such a great quote, it evolved:

However, I think that the initial quote is misleading. A lot of things are hard in computer science. This post aims to describe some of them.

Cache invalidation

I had a few use-cases for caching in my professional career. When I did, it was mostly to cache Hibernate entities. Only once did I implement my own cache. I used a simple HashMap as it was for a batch job as the cache size was small: no invalidation was needed.

Most cache implementations implement the Cache-Aside strategy. In Cache-Aside, the application tries to load data from the cache. If the cache has them, it returns them; if not, the application reads from the source of truth - in general, a database.

While data are stored in the cache, they may change in the database. In short, the probability that data in the cache and the source of truth diverge increases as time passes. For this reason, we sometimes need to synchronize data with the source of truth. It's achieved by cleaning data from the cache - cache invalidation, also known as TTL.

The TTL specifies how long an entry is valid. After the time has elapsed, the cache removes it, and the next read will load data from the source of truth.

The tricky thing, in this case, is to choose the correct TTL:

If the reference data in the source of truth changes before invalidation, clients read stale data
If the invalidation happens before the change, an unnecessary reload happens.

In essence, the smaller the TTL, the less chance to read stale data, but the less useful is the cache.

Naming things

If you have any experience as a developer, you probably are convinced that naming things is challenging indeed. If not, let's introduce another quote:

Programs are meant to be read by humans and only incidentally for computers to execute

-- Donald Knuth

Here's another one, slightly more inflammatory:

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

-- Martin Fowler

For example, here's some gibberish-y code:

data class Foo(val bar: String)

class Baz : IllegalArgumentException()

data class Qux(val quux: Double, val foo: Foo) {
    operator fun plus(qux: Qux) =
        if (foo != qux.foo) throw Baz()
        else Qux(quux + qux.quux, foo)
}

Thanks to the usage of APIs, namely IllegalArgumentException and plus(), you may infer somewhat what the code does. However, correctly renaming classes and fields reveals our intent:

data class Currency(val symbol: String)

class MismatchCurrencyException : IllegalArgumentException()

data class Amount(val number: Double, val currency: Currency) {
    operator fun plus(amount: Amount) =
        if (currency != amount.currency) throw MismatchCurrencyException()
        else Amount(number + amount.number, currency)
    }
}

When I worked on projects, I always cautioned about the following issues when talking with the business:

Using different words to cover the same reality
Using the same word to cover different realities

The second is much worse, as you may think you're talking about the same thing with the business or fellow developers, but you're not. If the talk ends in agreement, but each party has a different reality in their head, it's a recipe for a future disaster.

Within the scope of verbal communication, it's possible to ask questions about the meaning of such and such words. In code, it's not! Hence, naming classes and variables must convey the exact meaning.

It's hard because you need to be precise without being verbose.

Dates, times, and timezones

I already wrote about Date and time gotchas. To sum up:

The move from the Julian calendar to the Gregorian one happened at different dates depending on the country involved
Some countries' timezone are not entirely congruent to their geographical location
Some countries implement DST, some not. Even worse, some did and don't anymore.
Countries sometimes change their timezones. While it's not frequent, it happens more frequently than most think.
Not all timezones are separated by one hour. For example, India is UTC+5:30, but three timezones are spaced by 45 minutes.

Estimates

Estimates in software development projects are so hard to get right that some people working on the development side started the "No Estimate" movement. I won't delve into the pros and cons of not doing estimates; my point is just that estimating a non-trivial software project is challenging.

I think that entire books have been written on why estimates are demanding and how to improve them. Let's summarize the reasons:

Not only software projects:

Many who are involved in software projects but do not know how they work are eager to compare the activity with house building. Unfortunately, "house" can encompass different levels of industrialization. What the people refer to is an off-the-shelf building. It involves low or zero customization. Software development projects are on the opposite side of the scale, akin to unique architectural chef-d'oeuvre. For a more detailed explanation, look at the following On house building and software development projects post.
Problems along the way:

Estimates are provided before the start of the project, with all parameters available at the time. Unfortunately, while risk management accounts for the known knowns and the known unknowns, it's impossible to forecast the unknown unknowns. With the highly-mutable state of software projects, it's practically a given that something unexpected happens, invalidating our previous estimates.
The nature of estimates:

By definition, an estimate is a guess. The problem is that most treat them as a deadline. At this point, the organization will focus on respecting the deadline, regardless of the trade-offs - a recipe for failure.

Distributed systems

There's so much one can do with a single computer, even a multicore one. Adding more resources to a computer rapidly hits the point of diminishing returns. You can do nothing at this point but distribute the load across several computers. Welcome to the realm of distributed systems!

The problem with distributed systems is that they are easy "to get wrong". Here is the list of fallacies regarding distributed systems:

The network is reliable

Latency is zero

Bandwidth is infinite

The network is secure

Topology doesn't change

There is one administrator

Transport cost is zero

The network is homogeneous.

-- Wikipedia, Fallacies of distributed computing

Tons of books and theses have been written on distributed systems. People who can get them right deserve all my respect.

In my career, I stumbled upon two distributed systems problems:

Dual writes
Leader election

Dual writes

Imagine a system with two distributed data stores. We require that they must contain the same data.

It qualifies as the dual write problem. In a single database, we can solve it with database transactions. In two different databases, two-phase commits are available, even if they might not be reliable. If the stores that you need to write to cannot be enrolled in 2PC, as in microservices architectures, you need to rely on compensating transactions. Compensating transactions are fragile and complex to implement correctly, if at all.

From a theoretical point of view, the CAP teaches us that we can choose only two out of three capabilities: Consistency, Availability, and Partition tolerance. In reality, the choice is no choice at all. Because the system is distributed, we need to choose P ; because modern requirements forbid us to block, we need to choose A . The trade-off is consistency: both stores will converge to the same state over time.

While working on this problem, I discovered Change-Data-Capture. The idea behind CDC is to send updates to a single store and stream the diffs of the new state to the other one. To implement it by oneself is not trivial. I'd recommend using an existing product: I've used Debezium successfully in the past for my demos.

Leader election

Distributed systems rely on multiple nodes, and coordination across them is mandatory. Some rely on a specific node referred to as the leader to manage other nodes, while others are leaderless.

Most modern implementations are leader-based: it seems leaderless implementations are less reliable, though, truth to be told, I cannot tell the reason given the depth of my current understanding at the moment of this writing). Such implementations require all nodes to agree on which node is the leader among them - consensus. When a network partition happens, some nodes cannot communicate with others. It's relatively easy to reach a consensus in a partition; it's more challenging when the network reunites, and a single leader must be selected among all previous ones.

It's why the Paxos algorithm, or the Paxos family of algorithms, was invented. However, experts seem to think that Paxos is error-prone to implement: the Raft algorithm is an attractive easier-to-implement alternative. In any case, easier doesn't mean easy.

Proving code is bug-free

Traditional software engineering mandates testing to avoid bugs. Unfortunately, whatever approach you favor - unit, integration, end-to-end, or a mix of the three - doesn't guarantee your code has no bugs. It's, in fact, quite widespread to find bugs in production despite the infamous 100% code coverage. The only reliable way to have bug-free code is to prove it. It requires solid mathematical foundations and a programming language that allows formal proofs.

A couple of such languages exist. Unfortunately, all of them still belong to academia; the ones I've heard of are Coq, Idris, and Isabelle.

Until any of them makes it into the mainstream industry, writing bug-free code will be part of one the hard things in computer science.

Summary

Writing there are only two hard things in computer science is a strong claim. In this post, I tried to list several hard things I've been exposed to during my career. I believe there are many others: I'll be interested in the ones you've encountered.

To go further:

Originally published at A Java Geek on June 26^th, 2022

Top comments (17)

Matt McNeive • Jun 28 '22

I just graduated and started my first job as a developer. The thought of having to provide accurate estimates for my work seems really intimidating 😬

Thomas Hansen • Jun 28 '22

Multiply your own estimates by Pi (3.14). It ensures you always overdeliver and underpromise. People are only angry if you promise something you cannot keep. If you multiple your own estimates by Pi, you're 95% likely of being able to deliver below expectations ... ;)

hidden_dude • Jun 29 '22 • Edited

I'd also add that it's much easier to estimate small things than big things.

If your estimate is longer than 5 days, then break it up into parts until all your estimates are under 5 days.

Yet another tip: design your UI first, then make your estimate. Estimates become a LOT more certain once you've defined the UI (look up cone of uncertainty for a graphical explanation).

John Colagioia (he/him) • Aug 8 '22

This isn't entirely true for many projects. Developer management doesn't get angry if you over-estimate your work, sure, especially in a dysfunctional company where the development team doesn't care about the business side. Everybody who interacts with the outside world, though, and might need things done to beat competition or have releases available for key events get extremely angry when told to put off their big announcements, and then find out that their original deadline would have worked. Or worse, it means that projects get cancelled, because yours will take too long to help the overall strategy.

Nicolas Fränkel • Jul 1 '22

Estimating requires experience, lots of it, and even then, the gap between the estimates and the final time can be huge.

Asking a freshly-graduated teammate to estimate is a huge failure on the part of the organization. I'd advise to push back on this. My personal feeling is that it's a sign of a dysfunctional organization and it's just the tip of the iceberg.

John Colagioia (he/him) • Aug 8 '22

I've found it enough of a mess at every company that I wrote a post describing what I've done for fairly solid estimates.

How Long Will That Project Take?

John Colagioia (he/him) ・ Jan 28 '21 ・ 10 min read

#beginners #programming #productivity

It takes time itself, but I find that it's worth it to consistently hit the target.

Koas • Jun 29 '22 • Edited

OMG timezones are soooooo painful! Even if you store all your datetimes in UTC a looot of work is needed to accurately display data to users around the world.

Great article! There are really a lot of hard things in CS.

Sean Williams • Jun 28 '22

Though you alluded to it, read-after-write and write-after-write are incredibly difficult at nearly any scale of concurrency. Do you use locking, where even if you have a great acquisition-is-initialization sort of scheme, it gets bad if the locks are held too long? Or do you reconcile after the fact, potentially subjecting users to difference merging? Difference merging already causes angst in programmers, I can't imagine asking normal users to do it.

On the subject of proof languages, I'm extremely pessimistic. I feel like the only thing Prolog proves is that declarative languages can't really exist: writing correct (much less efficient) Prolog code means writing it against the behavior of the evaluation engine. That's not really any different from imperative programming. I never learned Coq, mostly because Prolog left such a bad taste in my mouth. Likewise with Idris, I wanted to like it, but could never find anything that actually made it better than Haskell.

The overall problem with bugs is, apart from hardware exceptions like invalid pointer dereference and integer divide by zero, a "bug" is software doing something other than what you intended. The challenge, then, is coming up with a formal language of intent—which is what programming languages already are. Proofs can only ever be a stripped-down analog of the language you're actually using, but once you try to scale proof specification up, it becomes just as error-prone. Or maybe more error-prone, since proof systems are themselves quite obtuse and complicated.

Nicolas Fränkel • Jun 30 '22

Do you use locking, where even if you have a great acquisition-is-initialization sort of scheme, it gets bad if the locks are held too long? Or do you reconcile after the fact, potentially subjecting users to difference merging?

"It depends" what you value most:

Do you need 100% correct values?
Or do you need to have values, regardless whether they are correct or close to?

Back to the CAP theorem. There's no correct answer, you need to choose. And the choice depends on your business requirements. Nowadays, the trend is toward Availability but I'm pretty sure that banks will frown upon that and value Consistency instead.

José Thomaz • Jun 27 '22

Compilers design and optmization are very hard things in computer science

Thomas Hansen • Jun 28 '22

Only if you believe so. We've got an execution runtime, on top of which we built our own programming language called Hyperlambda - And it's execution runtime is 10 lines of code. For the record, it's neither compiled nor interpreted, so arguably you're correct still ... ;)

Randall • Jun 28 '22

It would be nice if cache invalidation and naming things were the hardest things I had to do :) Lately the hardest thing for me has been release coordination. Not really "computer science" but I don't think cache invalidation or naming things really are either.