DEV Community: M. Alwi Sukra

TIL - Compatibility Direction in Schema Evolution

M. Alwi Sukra — Tue, 21 Jul 2026 06:08:21 +0000

This week I read DDIA Chapter 4, on encoding and evolution, and the part that stuck with me was how to reason about schema compatibility during a rolling upgrade.

In a large system, code changes don't happen all at once. We do a rolling upgrade, so old and new versions of the code run side by side for a while. For the system to keep working through that window, the data has to survive being read by whichever version happens to pick it up. That's what forward and backward compatibility are for.

Which direction we actually need isn't fixed. It depends on the dataflow, and on the order we roll things out.

Through a database

Say we add a field and update the code that reads and writes the row.

Backward compatibility is obviously needed: data written by the old code shouldn't break when the new code reads it. But forward compatibility is needed too, because during a rolling upgrade an old instance can read a row that a new instance already wrote.

And there's a twist a database adds that services don't have: old data just sits there. A row written two years ago is still there, in the old shape, and we have to keep supporting it. We could migrate (rewrite) the data to avoid that, but on a large database that's expensive, so most of the time we don't. So the database case needs both directions, and it needs them for a long time.
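A minimal sketch of both directions in Python, assuming JSON-encoded rows and a hypothetical schema change that adds an optional `nickname` field. The function and field names are made up for illustration; they aren't from the book.

```python
import json

# Hypothetical "old" and "new" readers for the same row.
# The new schema adds an optional `nickname` field.

def read_user_v1(raw):
    """Old code: knows only `id` and `name`, ignores anything else."""
    record = json.loads(raw)
    return {"id": record["id"], "name": record["name"]}

def read_user_v2(raw):
    """New code: also reads `nickname`, defaulting to None when absent."""
    record = json.loads(raw)
    return {
        "id": record["id"],
        "name": record["name"],
        "nickname": record.get("nickname"),  # missing in old rows
    }

old_row = json.dumps({"id": 1, "name": "Ana"})
new_row = json.dumps({"id": 2, "name": "Budi", "nickname": "bud"})

# Backward compatibility: new code reads old data.
print(read_user_v2(old_row))
# Forward compatibility: old code reads new data.
print(read_user_v1(new_row))
```

The old reader survives because it ignores fields it doesn't know, and the new reader survives because the added field has a default.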

Through services (REST and RPC)

Now say we change a request or response schema. Here it splits by who's processing what.

On the server side, we need backward compatibility: the new server code has to handle old requests from clients that haven't upgraded yet. What we usually don't strictly need is forward compatibility, because we always deploy the server fully first, then tell clients to upgrade. So an old server never sees a new request. I'd been doing this by instinct; the chapter gave me the reason it holds: the rollout order is what makes that direction safe to skip.

On the client side it's the mirror image. The client needs forward compatibility: old client code has to handle new responses, because the server upgraded first and is already sending the new shape. It doesn't need backward compatibility, because by the time the client upgrades, the server is already new.

So "the direction we care about" was never really a rule about APIs. It was a consequence of how we rolled out. Server first, clients after.

Through message passing

Producer and consumer versions coexist, and we can't force the order the way we can with server-then-client. A newer consumer might read a message from an older producer, and an older consumer might read a message from a newer producer. So the consumer needs both directions.

And even if we decide to upgrade the producer fully first, we still can't drop forward compatibility, because messages live in the broker. They sit there for retries and replays, so a message written by the new producer can still be waiting when an old consumer picks it up. We can't tell a message already sitting in the queue to upgrade itself.

That's the difference. With an API, the rollout order is something we control. With a queue, the message outlives the moment it was written, so the order isn't ours to enforce anymore.

The summary

Dataflow	Backward (old data → new code)	Forward (new data → old code)
Database	Yes	Yes
Service (server)	Yes	No, because server upgrades first
Service (client)	No, because server upgrades first	Yes
Message passing	Yes	Yes

The thing I took from it: I'd been handling both directions in practice, but as a set of habits, not a clear model. What this chapter gave me was the definition, backward and forward as two independent directions, and the reason the direction we need shifts with the dataflow and the rollout order. The moment the dataflow stops letting us control that order, like a message queue does, the shortcut of leaning on rollout order stops working, and we have to actually support both.

TIL - Picking a Database by Its Read/Write Pattern

M. Alwi Sukra — Sun, 14 Jun 2026 04:36:15 +0000

I've shipped systems on Postgres, BigTable, and a column store. If you asked me which one to reach for, I could answer from experience. But until I read DDIA Chapter 3, I couldn't have told you why they behave so differently, because I'd never actually thought about what the storage engine does with the bytes underneath.

That turned out to be the whole lesson. The internal data structure of a database isn't an implementation detail we can ignore. It's designed and tuned around a read/write access pattern. So once we understand the structures, "which database" stops being a vibe and starts being a consequence of how our workload reads and writes.

Two worlds: OLTP and OLAP

DDIA splits database workloads into two shapes. These aren't database types, they're descriptions of how a workload behaves:

OLTP (Online Transaction Processing): point reads/writes, low latency, high concurrency, few rows touched per query, needs fresh data. (The app serving user requests)
OLAP (Online Analytical Processing): large scans, few columns across many rows, throughput over latency, bulk/batch writes, tolerates staleness. (The analytics and reporting)

The thing that finally clicked for me: these labels are something we put on a workload after we've described it, not a bucket we pick first. And one system usually has both. The write path can be analytical-shaped while the read path is transactional-shaped.

Picking an engine is asking two questions about the workload

If the storage engine is tuned to the access pattern, then choosing one is really about describing how our workload reads and writes, then matching it. Two questions get us most of the way there.

One quick note before that. I'm only scratching the surface of each technology here. This is what clicked for me, not a deep dive into compaction internals or page layouts. If you want the full mechanics, the book itself (and plenty of good articles) goes far deeper. Treat the names below as starting points to go read about, not as the last word.

Q1: write-heavy or read-heavy? (LSM vs B-tree)

Inside the OLTP world, there are two ways to organize data on disk, and the difference is entirely about how each one handles a write.

Log-structured (LSM-tree). New and updated data is appended. Writes land in an in-memory sorted structure (the memtable), which gets flushed to disk as an immutable, sorted file (an SSTable). Background compaction merges those files and throws away superseded values. Because every write is a sequential append, write throughput is high. The cost is read amplification, where a key might live in several SSTables, so reads check multiple files (Bloom filters exist to skip the ones that definitely don't have the key). Databases built this way: Cassandra, RocksDB, LevelDB, HBase, and BigTable.
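To make the write and read paths concrete, here is a toy sketch in Python. It assumes everything fits in memory and uses plain lists as stand-ins for SSTables; the class name `TinyLSM` and its methods are hypothetical, and real engines add a write-ahead log, Bloom filters, and leveled or tiered compaction on top of this.

```python
class TinyLSM:
    """Toy log-structured store: a memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # in-memory buffer of recent writes
        self.sstables = []   # list of sorted (key, value) runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write only touches memory; disk work is deferred to flush().
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable as a sorted, immutable run.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Read amplification: check runs from newest to oldest.
        for run in reversed(self.sstables):
            for k, v in run:
                if k == key:
                    return v
        return None

    def compact(self):
        # Merge all runs; newer runs overwrite superseded values.
        merged = {}
        for run in self.sstables:
            merged.update(run)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")   # triggers the first flush
db.put("a", "3")
db.put("c", "4")   # triggers the second flush
print(db.get("a"), len(db.sstables))
db.compact()
print(db.sstables)
```

Before compaction, key `a` lives in two runs and the read has to pick the newest one; after compaction there is a single run and the old value is gone.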

Update-in-place (B-tree). Data lives in fixed-size pages, and an update rewrites the page where the key already is. Reads are predictable (a bounded number of page lookups), and this structure fits transactions cleanly. The cost is write amplification: writes are random I/O, and a single update often rewrites a whole page (plus the write-ahead log entry, and sometimes a page split). Databases built this way: Postgres, MySQL (InnoDB), and most traditional relational databases.

So the heuristic "LSM for write-heavy, B-tree for read-heavy or transactional" isn't a rule to memorize. It falls out of how each structure treats a write. That's the thesis in miniature: the engine is shaped by the access pattern. This is why Cassandra can absorb a firehose of writes while Postgres gives us clean transactions. They picked different sides of this tradeoff.

Q2: point access or analytical scan? (and why "row vs column" is the wrong axis)

This is the part that reframed how I think about it.

I used to assume the storage axis was "row-oriented vs column-oriented." After this chapter I think that's the wrong axis. The real distinction is whether our storage unit is explicitly keyed or positionally implied.

In a row store like Postgres or MySQL, a row is one keyed record with all its columns fused together. Update one column, and it rewrites the whole record.
In a wide-column store like BigTable, Cassandra, or HBase, the unit isn't row -> all columns. The on-disk key is the full cell coordinate (row key, column, timestamp) -> value. Each cell is independently keyed. A row's cells are sorted to sit next to each other (that's the only thing "row-oriented" really means here, just locality), but each cell is written and updated on its own.
In a column store (or columnar format) like Parquet, ClickHouse, Vertica, or BigQuery, each column is a separate file of bare values, with no key stored next to each value. Row N is whatever sits at position N in every file. Position is the implicit key.

For example, let's say we have these rows:

row A: impression=11, click=12, cost=13
row B: impression=14, click=15, cost=16
row C: impression=17, click=18, cost=19

The storage visualization for each type is:
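As a rough sketch of the three layouts in Python, with dicts and lists standing in for on-disk structures (an illustration of the idea only, not how any of these engines actually lays out bytes):

```python
rows = {
    "A": {"impression": 11, "click": 12, "cost": 13},
    "B": {"impression": 14, "click": 15, "cost": 16},
    "C": {"impression": 17, "click": 18, "cost": 19},
}

# Row store: one keyed record per row, all columns fused together.
row_store = {key: tuple(cols.values()) for key, cols in rows.items()}

# Wide-column store: one keyed entry per cell, sorted so that a row's
# cells sit next to each other. (The timestamp part of the key is omitted.)
wide_column = sorted(
    ((row_key, column), value)
    for row_key, cols in rows.items()
    for column, value in cols.items()
)

# Column store: one list of bare values per column.
# Position in the list is the implicit row key.
column_store = {
    column: [cols[column] for cols in rows.values()]
    for column in ("impression", "click", "cost")
}

print(row_store)
print(wide_column)
print(column_store)
```

Updating `click` for row B means replacing the whole tuple in `row_store`, replacing one entry in `wide_column`, and overwriting position 1 of one list in `column_store`. Inserting a new row in the middle is the painful case for the column store, since every list has to shift.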

That single distinction explains the write behavior the orientation framing can't:

A row store updates one column by rewriting the whole row, since the columns are fused into a single keyed record. Cheap when you're touching one row at a time, but two writers updating different columns of the same row still contend on the same record.
A wide-column store can update one column of one row without touching the others, because the cell is its own keyed thing. Independent writers writing different columns to the same row never collide.
A column store can't cheaply insert one row, because there's no key to address it by. Position ties every value to its row, so inserting means realigning every file (and the compressed, sorted columns reject cheap in-place edits). That same positional layout is exactly what makes scans and compression spectacular.

Strength and weakness come from the same design decision. Keyed buys us independent writes and locality. Positional buys us compression and scan speed. Neither is "better." Each is tuned to a different access pattern.

The read side mirrors it:

A row store fetches one whole record in a single read, but scanning one column means dragging every full row off disk.
A wide-column store reads one row as a contiguous scan over its adjacent cells, but it's still reading row by row, not scanning one column across everything.
A column store reads one column across millions of rows by touching just that file, but reassembling one whole row means gathering a value from every file.

Point access wants keyed; analytical scan wants positional. Read and write pull the same direction, because they're the same access-pattern bet.

Where I'd been relying on this without naming it

A system I worked on stored predictions for an ads insights platform: three prediction types (impression uplift, click uplift, cost saving), produced by several suggestion engines, kept in BigTable. We did reason our way there from the read/write pattern. What this chapter gave me was sharper vocabulary for the thing we were already doing.

Run the workload through the two questions:

Writes were high-volume and batch: several engines aggregating offline, then loading results in. That's append-heavy ingest, the LSM side of Q1.
Reads were point and prefix lookups on a composite key like shop#group (read one group, or a range of groups under a shop). Not analytical scans across one column, so keyed, not positional, on Q2.

Append-heavy writes, keyed point and prefix reads. That shape fits a wide-column store like BigTable almost exactly, so that's where we landed. None of it is "row vs column" or "SQL vs NoSQL." It's just the workload described honestly, and the engine fell out of the description.
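The two read shapes can be sketched over a sorted key space. This is a minimal Python illustration that assumes row keys of the form `shop#group` and made-up payloads; `point_read` and `prefix_scan` are hypothetical helpers, not BigTable client calls.

```python
from bisect import bisect_left

# Hypothetical rows, kept sorted by row key the way a wide-column store
# keeps them. The payload values are invented for the example.
table = sorted(
    {
        "shop1#groupA": {"impression_uplift": 0.10},
        "shop1#groupB": {"impression_uplift": 0.25},
        "shop2#groupA": {"impression_uplift": 0.05},
    }.items(),
    key=lambda item: item[0],
)
keys = [k for k, _ in table]

def point_read(row_key):
    """Read exactly one group: binary search for the full key."""
    i = bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return table[i][1]
    return None

def prefix_scan(prefix):
    """Read every group under a shop: seek to the prefix, scan until it ends."""
    start = bisect_left(keys, prefix)
    result = []
    for k, v in table[start:]:
        if not k.startswith(prefix):
            break
        result.append((k, v))
    return result

print(point_read("shop1#groupB"))
print([k for k, _ in prefix_scan("shop1#")])
```

Both reads are a seek plus a short contiguous scan, which is the shape a sorted, keyed layout is good at.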

The fit wasn't perfect, either. Some read paths still needed online aggregation, which traded read latency for write-path simplicity, but that's a story for another post.

The part the access pattern doesn't decide

Here's what I only noticed reading this chapter. We assessed the read/write pattern, it pointed at BigTable, and we stopped there. I never actually asked the next question: could Postgres have done this too?

The honest answer is, probably yes.

Whole-row update contention from multiple writers? Postgres has a workaround. Split each writer's columns into separate tables keyed on (shop, group). No shared row, no contention.

Avoid joins on large tables? That join only exists because I split the tables. Keep one wide table and there's no join at all. And even the split join is cheap when it's indexed on a bounded set of rows.

The data too big for one machine? It wasn't, at the time. It fit comfortably on a single node.

So the thing that pointed me at BigTable, the access pattern, wasn't actually the thing that ruled Postgres out. Postgres could have served the same reads and writes. I just never ran the comparison, because I'd found something that fit and didn't look back.

That's the lesson I took from it. The access pattern narrows you to an engine class, and that narrowing is real and useful. But it doesn't rule out the alternatives within reach of that class. What actually separates BigTable from Postgres here is how each one grows past a single machine. Postgres is single-node by design: scaling out means sharding it yourself, picking a shard key, routing queries in the application, and rebalancing by hand as it grows. BigTable partitions automatically, splitting data into tablets by row-key range and spreading them across machines on its own. So once the data outgrows one node, Postgres turns into an operational project and BigTable just keeps going. That's the real dividing line, and it's the question I never got around to asking out loud.

Takeaway

So the honest shape of database selection, after this chapter:

The read/write access pattern selects the engine class: LSM vs B-tree, keyed vs positional. This part is real and reasoned, and it's what I actually did.
Scale and transactional needs finish the selection, and these are independent of the access pattern. Two stores can match the access pattern perfectly and differ entirely on whether they shard or do multi-row ACID. This is the step it's easy to skip once you've found something that fits.

The data structure follows the access pattern. The product follows the data structure plus scale.

This is the storage-engine layer of the decision. Replication, partitioning, and consistency (later chapters in the book) add their own factors, but those sit on top of this layer; they don't replace it.

TIL - Graph Thinking Without a Graph Database

M. Alwi Sukra — Thu, 28 May 2026 17:19:16 +0000

This week I read DDIA Chapter 2, on data models. Most of it felt familiar: relational vs document, many-to-many with junction tables, schema-on-read vs schema-on-write. These were things I had opinions about already.

But the graph data model section was a blind spot. I assumed graph databases were for social networks, interesting but not relevant to anything I was doing.

What graph data models actually are

One key aspect the chapter emphasizes is how different data models handle many-to-many relationships. In a relational data model, we usually have several tables and a junction table that connects them. We can also add extra columns to that junction table.

For a graph data model, we can think of it as having two tables: nodes and edges. Nodes are entities. Edges are relationships between them. Both can have properties.

There is not much difference between a relational and graph data model for a single relationship at a fixed depth. For example, a friendships table with user_a_id, user_b_id, since, is_close_friend is basically an edge with properties. Relational handles that fine.

The difference shows up when we start traversing.

Say we want "friends of friends". With a junction table, that's a self-join. "Friends of friends of friends" is another join. "Anyone reachable from me through any number of friendship hops" is a recursive CTE. It works, but the query complexity has nothing to do with how simple the question sounds.
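Here is what those two SQL shapes look like, run through Python's built-in `sqlite3` module. The `follows` table and its columns are a hypothetical stand-in for the junction table, and the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
    INSERT INTO follows VALUES (1, 2), (2, 3), (3, 4), (5, 1);
""")

# Friends of friends: one self-join per extra hop.
friends_of_friends = conn.execute("""
    SELECT DISTINCT f2.followee_id
    FROM follows f1
    JOIN follows f2 ON f2.follower_id = f1.followee_id
    WHERE f1.follower_id = ?
    ORDER BY f2.followee_id
""", (1,)).fetchall()

# Anyone reachable through any number of hops: a recursive CTE.
# UNION (not UNION ALL) drops duplicates, so cycles terminate.
reachable = conn.execute("""
    WITH RECURSIVE reach(id) AS (
        SELECT followee_id FROM follows WHERE follower_id = ?
        UNION
        SELECT f.followee_id
        FROM follows f
        JOIN reach r ON f.follower_id = r.id
    )
    SELECT id FROM reach ORDER BY id
""", (1,)).fetchall()

print(friends_of_friends)
print(reachable)
```

The question only changed from "two hops" to "any number of hops", but the query changed shape entirely.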

In a graph query language, traversal is the native operation. Here's friends of friends in Cypher (Neo4j's query language):

MATCH (me:User {id: $userId})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(fof)
RETURN fof

We can read it almost like a sentence: match the pattern where I follow a friend, who follows a friend-of-friend. The arrows are edges; [:FOLLOWS] is the edge type to traverse.

And arbitrary depth is just one more character:

MATCH (me:User {id: $userId})-[:FOLLOWS*]->(reachable)
RETURN reachable

The * means "follow any number of these edges." Same query shape whether it's one hop or ten. In SQL, that jump from fixed depth to arbitrary depth means rewriting our query as a recursive CTE.

The second difference is that a graph treats different relationship types uniformly. In relational, follows, blocks, and memberships are usually separate tables, and traversing across them means a different join per table. In a graph, they're all just edges, and we can traverse across types in a single pattern.

So my take is that the real distinction is the traversal. Especially variable-depth traversal across multiple relationship types. It's a first-class operation in a graph model and an awkward bolt-on in SQL.

The shape that fits

The chapter convinced me that graph models suit problems where:

Relationships are recursive or variable-depth (friends of friends, transitive dependencies, reachability)
Multiple paths can exist between the same two entities
The type of relationship matters as much as the entities themselves
Queries are about traversal and reachability, not just lookup

If our data is mostly "fetch a row by ID" or "join two tables on a foreign key", relational is fine. But the moment we start asking "what's reachable from here through any valid path?", that's a graph question, whether we store it in a graph database or not.

Looking at my own work through this lens

I worked on an ads management system. The schema looked like this:

Some queries this service needed to answer:

Find all keywords in a shop.
Find all keywords in a group.
Find all keywords in an ad. (Keywords directly on the ad, plus keywords on the group that contains it.)

The schema and the queries are reasonable, and I'd worked with this code. But when I tried drawing the data as a graph, this is what I got:

There are multiple paths from a shop to a keyword: through an ad group, through an ad, through both. When I query "all keywords in a shop", I'm doing a graph traversal: "find every Keyword reachable from this Shop through any path of contains edges". I just hadn't been calling it that.

What actually changed for me

1. I started thinking about reachability instead of joins

Before: every question about the data felt like a question about which tables to join. To get keywords in a shop, I join keyword tables with ad/ad_group tables and filter by shop_id. The query was a sequence of join operations.

After: every question about the data feels like a question about which nodes are reachable from which. Find all keyword nodes reachable from this shop. The traversal is the question, and the join is just one implementation of the traversal.

The shift sounds subtle, but it's what made other approaches (denormalization, recursive CTEs, even just rephrasing the SQL) become visible. Once the question is "what's reachable from here?", the answer doesn't have to be "join these tables." It can be anything that gets us the same set of reachable nodes.
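As a sketch of "the traversal is the question", here is reachability written as a plain breadth-first search in Python. The `contains` edges below are hypothetical, loosely modelled on the shop, group, ad, and keyword hierarchy; the node names are made up.

```python
from collections import deque

# Hypothetical `contains` edges: parent -> children.
contains = {
    "shop:1": ["group:10", "ad:100"],
    "group:10": ["ad:101", "keyword:k1"],
    "ad:100": ["keyword:k2"],
    "ad:101": ["keyword:k3", "keyword:k1"],
}

def reachable(start, edges):
    """Return every node reachable from `start` through any path."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:  # a node on several paths is visited once
                seen.add(child)
                queue.append(child)
    return seen

keywords = sorted(
    n for n in reachable("shop:1", contains) if n.startswith("keyword:")
)
print(keywords)
```

A join, a recursive CTE, or a denormalized `shop_id` column are all ways of computing this same set.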

2. I noticed the multiple paths problem

In a tree, every node has exactly one parent. The hierarchy I was working with isn't a tree. A keyword can be reached from a shop through an ad group, or through an ad, or through both. An ad can belong to a shop directly, or be inside a group.

I'd been treating this as a quirk of the schema. The nullable columns, the join table, the two separate keyword tables, these were just "how things are." But the graph lens names it clearly: the data has multiple paths between the same kinds of nodes. That's a structural property, not a quirk.

And it explains why my SQL queries kept needing unions. Each UNION branch is one path. The graph is telling me up front that I'm going to need multiple branches; the schema was hiding that until the query made it visible.

3. I saw edges that weren't in my schema

The graph diagram has a contains edge from ad_group to ad. In my schema, that relationship lives in the ad_group_ad junction table.

But the graph also has implicit relationships that my schema doesn't model directly. The "keyword in shop X" relationship is real (and we query for it constantly), but no column or table represents it directly. It's a derived relationship, computed every time we run the traversal.

That's where the option space opens up. Once I can see shop-to-keyword as a meaningful relationship, I can ask whether to materialize it (denormalize shop_id onto every keyword) or keep deriving it (current approach with traversal). Both are valid; the graph view is what made the choice visible.

What it would look like as a graph query

In Cypher, "all keywords in a shop" is:

MATCH (s:Shop {id: $shopId})-[:CONTAINS*]->(k:Keyword)
RETURN k

In our actual schema, the same query takes two branches:

-- Keywords on groups in this shop
SELECT k.* FROM ad_group_keyword k
JOIN ad_group g ON k.ad_group_id = g.id
WHERE g.shop_id = $shopId

UNION

-- Keywords on ads in this shop
SELECT k.* FROM ad_keyword k
JOIN ad a ON k.ad_id = a.id
WHERE a.shop_id = $shopId AND a.status = 'active'

It works. But this query got slow on us in a way that took us a while to understand.

The issue wasn't the JOIN itself, or the UNION. It was the query planner. When a shop has many keywords, the planner sometimes picks an index path that ends up scanning across the ad tables (soft-deleted rows included), even when the request is for a small page of results.

The behavior was hard to predict because it depended on the shop's data distribution. Small shops were fine. Large shops sometimes triggered the bad path. And because the keyword list is a batch API, a single slow query multiplied across the batch and put real pressure on the database.

The team's fix was to stop letting the planner choose. We took the joins out of SQL and resolved them in application code: query each table separately with WHERE shop_id = X AND status = 'active' (which uses clean indexes predictably), then stitch the results in Go.

It works. But it's a workaround for a query that's conceptually one thing: find all keywords reachable from this shop. The graph traversal is happening, just spread across multiple queries and some application code, with the planner taken out of the loop entirely.

Would I actually use a graph database?

Constraints:

The hierarchy is only 3-4 levels deep. Graph databases shine on deep or unbounded traversal. Mine is bounded.
Nobody on the team has run Neo4j in production. PostgreSQL we know cold.
Our source of truth is already in Postgres. Adding a graph database means syncing two stores or migrating the source of truth, both big commitments.
Our queries are predictable. We're not doing pattern matching or shortest-path.

I'm not sure I'd reach for Neo4j here. The elegance of the Cypher query is real, but the operational cost feels high for the shape of problem I have.

What's more interesting is that the graph lens opened up another option.

A different option

What if every entity in the hierarchy carried its ancestor IDs directly?

ad:               id, shop_id, ad_group_id (nullable), status
ad_group_keyword: id, ad_group_id, tag, shop_id
ad_keyword:       id, ad_id, shop_id, ad_group_id (nullable), status

ad_group_ad still records the actual ad-to-group membership, but ad.ad_group_id is a maintained denormalization. Same idea on the keyword tables: each keyword carries shop_id, and ad_keyword also carries ad_group_id and status. The ID columns are nullable where the relationship doesn't apply.

Now the "find" queries get simpler:

-- All keywords in a shop
SELECT * FROM ad_group_keyword WHERE shop_id = $shopId
UNION
SELECT * FROM ad_keyword WHERE shop_id = $shopId AND status = 'active'

-- All keywords in a group
SELECT * FROM ad_group_keyword WHERE ad_group_id = $groupId
UNION
SELECT * FROM ad_keyword WHERE ad_group_id = $groupId AND status = 'active'

-- All keywords in an ad
-- Step 1: get the ad's group (single PK read)
SELECT ad_group_id FROM ad WHERE id = $adId
-- Step 2: pull keywords from both sources
SELECT * FROM ad_keyword WHERE ad_id = $adId AND status = 'active'
UNION
SELECT * FROM ad_group_keyword WHERE ad_group_id = $adGroupId

These are direct lookups on indexed columns. There's no join for the planner to mis-optimize, no intermediate result set whose size depends on data distribution. The query behavior is the same whether a shop has 50 keywords or 50,000.

The third case is two reads, but step 1 is just a primary-key lookup on the ad row we'd usually be fetching anyway.

The cost is that denormalization is now a system, not a single column. Moving an ad updates ad_group_id across its keywords; soft-deleting an ad updates status across its keywords. Two operations, both fan out to the keyword rows, both have to be transactional or the data drifts.
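A minimal sketch of those two fan-out writes, using Python's built-in `sqlite3` module. The table shapes follow the columns listed above, but the helper names, the `'deleted'` status value, and the sample rows are assumptions, and the update to the `ad_group_ad` junction table is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ad (
        id INTEGER PRIMARY KEY, shop_id INTEGER,
        ad_group_id INTEGER, status TEXT
    );
    CREATE TABLE ad_keyword (
        id INTEGER PRIMARY KEY, ad_id INTEGER, shop_id INTEGER,
        ad_group_id INTEGER, status TEXT
    );
    INSERT INTO ad VALUES (100, 1, 10, 'active');
    INSERT INTO ad_keyword VALUES
        (1, 100, 1, 10, 'active'),
        (2, 100, 1, 10, 'active');
""")

def move_ad(conn, ad_id, new_group_id):
    # `with conn` commits on success and rolls back if a statement raises,
    # so the ad row and its keyword rows change together or not at all.
    with conn:
        conn.execute(
            "UPDATE ad SET ad_group_id = ? WHERE id = ?",
            (new_group_id, ad_id),
        )
        conn.execute(
            "UPDATE ad_keyword SET ad_group_id = ? WHERE ad_id = ?",
            (new_group_id, ad_id),
        )

def soft_delete_ad(conn, ad_id):
    with conn:
        conn.execute("UPDATE ad SET status = 'deleted' WHERE id = ?", (ad_id,))
        conn.execute(
            "UPDATE ad_keyword SET status = 'deleted' WHERE ad_id = ?", (ad_id,)
        )

move_ad(conn, 100, 20)
print(conn.execute("SELECT id, ad_group_id FROM ad_keyword ORDER BY id").fetchall())
```

Every new write path that touches an ad has to go through helpers like these, which is exactly where the maintenance burden comes from.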

Drift is the part that worries me. The query patterns get dramatically cleaner, but the write paths get more places they could go wrong. Six months later, someone adds a new way to move ads between groups and forgets to update the keywords. The reads quietly return wrong results. So I'm not sure denormalization is the answer either.
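One way to at least notice drift is a periodic consistency check that compares each keyword's copied columns against its ad. This is a sketch under assumptions, not something we ran: the trimmed-down tables and sample rows are made up, and it uses SQLite's null-safe `IS NOT` comparison so that nullable `ad_group_id` values compare correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ad (id INTEGER PRIMARY KEY, ad_group_id INTEGER, status TEXT);
    CREATE TABLE ad_keyword (
        id INTEGER PRIMARY KEY, ad_id INTEGER,
        ad_group_id INTEGER, status TEXT
    );
    INSERT INTO ad VALUES (100, 20, 'active'), (101, NULL, 'active');
    INSERT INTO ad_keyword VALUES
        (1, 100, 20, 'active'),    -- in sync
        (2, 100, 10, 'active'),    -- stale group: the ad moved, the keyword didn't
        (3, 101, NULL, 'active');  -- in sync (the ad has no group)
""")

# Keywords whose denormalized columns disagree with their parent ad.
drifted = conn.execute("""
    SELECT k.id
    FROM ad_keyword k
    JOIN ad a ON a.id = k.ad_id
    WHERE k.ad_group_id IS NOT a.ad_group_id
       OR k.status IS NOT a.status
    ORDER BY k.id
""").fetchall()

print(drifted)
```

A check like this doesn't prevent drift; it only shortens the time the reads stay quietly wrong.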

What surprised me is that I'd been treating it as a normalization problem ("where should the foreign keys go?") instead of a modeling problem ("what shape does the data actually have?"). The graph perspective is what reframed it for me.

The takeaway

The data was always graph-shaped. The queries were always graph queries. The schema and the application code were doing graph work without the vocabulary to describe it. And once I could see the shape, alternatives I hadn't been considering became visible, even if I haven't decided which one is right.

I'm not sure I'll ever reach for a graph database. But learning about them is already changing how I think about modeling, even though I'm not using one.

TIL - What Response Time Metrics Really Mean

M. Alwi Sukra — Sun, 10 May 2026 07:55:53 +0000

I always thought high percentiles didn't really matter. They only impact a small number of users, right? I interpreted them as the worst case, something unlikely to affect most users.

This week I read DDIA and came across the part describing how Amazon sets its response time requirements at p99.9. That means the requirement is based on the slowest 1 in 1,000 requests :). But the reason is something I'd never thought of: the users in the high percentiles are most likely the ones with the most data, which makes them important users for Amazon.

I reflected on this with my experience working on an Ads Platform. Some processes were slow, and it was almost always the same small group of users with many ads, which I assume also correlates with ads revenue contribution. I wonder whether, if we had designed the system around those high-percentile users, we could have made the platform better for all users, and best for our most important sellers.

Response time isn't the same as latency

I don't know why, but somehow I just know that response time and latency are different:

Service time: how long the server actually spends processing the request.
Latency: time the request spends waiting (queued, in transit, blocked).
Response time: what the caller sees, i.e. service time + network + queueing + everything else.

Response time is from the caller's perspective. Service time is from the callee's. They're almost never equal. I think this is important because most of us only track one side.

Average hides the shape

Response times aren't a single number, they're a distribution. Most requests are fast, a few are very slow, and the average sits somewhere awkward between them.

Average doesn't tell us how many users actually experienced the delay. An average of 200ms can mean everyone gets ~200ms, or that most get 50ms while a few get 2 seconds. The average doesn't tell us which one we have.

That's why averages aren't enough. We need a metric that respects the shape.
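The 200ms example works out exactly with thirteen requests, as this small Python check shows (the sample values are made up to match the numbers above):

```python
from statistics import mean, median

# Everyone gets roughly the same response time.
everyone_similar = [200] * 13

# Most requests take 50 ms, one takes 2 seconds.
long_tail = [50] * 12 + [2000]

print(mean(everyone_similar), median(everyone_similar))
print(mean(long_tail), median(long_tail))
```

Both lists have a mean of 200ms, but in the second one the typical request takes 50ms and one request in thirteen takes ten times the average.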

Percentiles, properly

A percentile answers the question "what response time were X% of requests faster than?"

p50: half were faster.
p95: 95% were faster, 5% were slower.
p99: 99% were faster, 1% were slower.
p99.9: 99.9% were faster, 0.1% were slower.

If p99 = 500ms, it means 1 out of every 100 requests took longer than 500ms. That's the part I used to dismiss as noise.
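A small sketch of how these numbers can be computed from raw samples, using the nearest-rank definition. That is one of several common percentile definitions (monitoring tools often interpolate or work from histograms), and the sample data here is synthetic.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # 1-based rank; round() guards against floating-point noise in p * n.
    rank = math.ceil(round(p * len(ordered) / 100, 9))
    return ordered[max(rank, 1) - 1]

# Synthetic response times: 1 ms, 2 ms, ..., 1000 ms.
response_times_ms = list(range(1, 1001))

for p in (50, 95, 99, 99.9):
    print(p, percentile(response_times_ms, p))

# How many requests were slower than p99?
p99 = percentile(response_times_ms, 99)
slower = sum(1 for t in response_times_ms if t > p99)
print(slower, "of", len(response_times_ms))
```

With 1,000 samples, exactly 10 requests sit above p99, which is the "1 out of every 100" from the paragraph above.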

Which percentile to chase

Once we accept the tail matters, the next question is how far in?

Honestly, I don't know how to answer that question. Maybe the choice isn't really technical but a business question: which users have we decided to serve well? p99 means we're serving 99% of requests well. p99.9 means we're including heavy users (the ones who, going back to the Amazon insight, probably matter most).

A few things I wish I'd known earlier

Measure at both caller and callee. Callee might report p99 = 50ms while caller sees p99 = 300ms for the same calls. The 250ms gap is in the network, the connection pool, queueing, or the caller's own thread pool. If we only look at one side, we miss it.
Timeouts decouple the metrics. If the caller times out at 200ms and the callee takes 500ms, the callee's dashboard shows a successful 500ms response to a request the caller already gave up on. Both metrics are technically correct but are misleading on their own.
Don't average percentiles across servers. This is my second confession. For years, when our dashboard showed p99 from multiple servers, I'd take the average of those numbers and call it our "global p99." That's mathematically meaningless. The average of ten p99s is not the p99 of the combined population. The right way is to merge the underlying histograms first, then compute the percentile.
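To see how far off the averaged number can be, here is a sketch in Python with two made-up servers. It uses a `Counter` as a stand-in for a histogram (bucket value in ms mapped to a request count) and a nearest-rank percentile; `percentile_from_histogram` is a hypothetical helper, not any monitoring library's API.

```python
import math
from collections import Counter

def percentile_from_histogram(hist, p):
    """Nearest-rank percentile over a {bucket_ms: count} histogram."""
    total = sum(hist.values())
    rank = math.ceil(round(p * total / 100, 9))
    seen = 0
    for bucket in sorted(hist):
        seen += hist[bucket]
        if seen >= rank:
            return bucket
    raise ValueError("empty histogram")

# Server A handled 1000 requests, all around 10 ms.
server_a = Counter({10: 1000})
# Server B handled only 100 requests, 10 of them around 1000 ms.
server_b = Counter({10: 90, 1000: 10})

p99_a = percentile_from_histogram(server_a, 99)
p99_b = percentile_from_histogram(server_b, 99)

# The mistake: averaging the per-server percentiles.
averaged_p99 = (p99_a + p99_b) / 2

# The fix: merge the histograms first, then compute the percentile.
merged = server_a + server_b
merged_p99 = percentile_from_histogram(merged, 99)

print(p99_a, p99_b, averaged_p99, merged_p99)
```

The averaged figure is 505ms, while the p99 of the combined 1,100 requests is 10ms, because only 10 of them (under 1%) were slow. The average gives the small, slow server the same weight as the large, fast one.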

Takeaway

A metric isn't just a number. It's a statement about which users we've decided to serve well.

An average says "I care about the typical user." p99 says "I care about almost everyone." p99.9 says "I care about the heavy users too, the ones who probably matter most to the business."

For years, I was implicitly choosing the first one without realizing I was choosing anything.