Tawanda Nyahuye

Posted on Jun 28

The German Tank Problem: Why You Need UUIDs

#api #webdev #programming #backend

Business volume revealed by sequential IDs

In World War II, the Allies had a very expensive question and no good way to answer it: how many tanks is Germany actually building?

The official method was spies, intercepted chatter, and educated guessing. The official method was also, as it turns out, wildly, embarrassingly wrong: intelligence estimates put German tank production at well over a thousand a month.

Then some statisticians showed up and ruined everyone's mystique by doing arithmetic.

See, the Germans were excellent engineers, which is another way of saying they were pathologically organized. Every tank rolled off the line with neatly sequential serial numbers stamped on the gearbox, the chassis, the road wheels, everything. And every time the Allies captured or destroyed a tank, those numbers got written down.

So the statisticians stopped trying to spy on the factories and started reading the serial numbers of the wreckage. If you've captured a handful of tanks and the highest serial number you've seen is m, and you've seen k of them, then the total number produced is roughly:

N ≈ m + (m / k) − 1

The intuition is beautiful and slightly evil: the biggest number you've seen tells you roughly how close you are to the top, and how many you've seen tells you how confident to be about that. The gaps between the serials you do have tell you about the ones you don't.
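If you'd rather see the formula run than take it on faith, here is a minimal sketch. The function name estimate_total and the captured serials are made up for illustration, and it assumes serials start at 1 and each tank is only counted once.

```python
def estimate_total(serials):
    """Estimate total production from observed serial numbers.

    Applies N ≈ m + (m / k) − 1, where m is the largest serial seen
    and k is how many serials were seen. Assumes numbering starts at 1.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial")
    m = max(serials)
    return m + m / k - 1


# Hypothetical captured serials, for illustration only.
captured = [19, 40, 42, 60]
print(estimate_total(captured))  # 60 + 60/4 - 1 = 74.0
```

Four tanks, highest serial 60, estimate 74. More captures shrink the m / k term, so the estimate hugs the maximum more tightly.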

For one month in 1942, the spies said Germany was making around 1,500 tanks. The statisticians, armed with nothing but captured serial numbers and a formula, said 327. After the war, the actual German production records were recovered.

The real number was 342.

The spies missed by more than a thousand. The nerds missed by fifteen. Somewhere, a very smug statistician got a medal, and the lesson was carved into the bedrock of intelligence work forever:

Sequential numbers leak. If your serial numbers go 1, 2, 3, 4, anyone who sees a few of them can estimate how many of you exist.

I think about this every single time I look at a URL that says /users/1042.

Your database is the German army

Here's the uncomfortable part. Almost every backend you've ever written is the Wehrmacht, cheerfully stamping sequential serial numbers on everything and then handing them to strangers.

You spin up a Postgres table. The primary key is id SERIAL — auto-incrementing integer, because of course it is, that's the default and it's beautiful and it sorts nicely. User 1 is you. User 2 is your co-founder. User 3 is your mom. Everything is fine.

Then you build the profile page. The route is /users/3. You ship it. You are now a German tank.

Because here is what your competitor, or a bored teenager, or a journalist, or anyone with a browser, can now do. They sign up for your app today and get assigned id = 4,317. They wait a week. They sign up again with a different email and get id = 4,981.

Subtract.

You got 664 signups this week. They didn't breach anything. They didn't hack you. They read your serial numbers, exactly like the Allies read the gearboxes, and your growth rate fell out of the arithmetic. Your "we're crushing it, investors love us" pitch deck just got fact-checked by a stranger with two throwaway emails and the subtraction skills of a nine-year-old.
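The whole "attack" fits in a few lines. This is a sketch using the two IDs from the story above; the function name and the dates are invented for illustration.

```python
from datetime import date


def signups_per_day(first_id, first_seen, second_id, second_seen):
    """Estimate daily signups from two sequential IDs observed on two dates."""
    days = (second_seen - first_seen).days
    if days <= 0:
        raise ValueError("second observation must be at least a day later")
    return (second_id - first_id) / days


# Two throwaway signups, one week apart (dates are hypothetical).
rate = signups_per_day(4317, date(2025, 6, 1), 4981, date(2025, 6, 8))
print(round(rate, 1))  # 664 signups over 7 days, about 94.9 per day
```

Repeat it every week and you have a growth chart for a company you don't work at.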

It gets worse, because IDs leak more than count. They leak order and time:

/invoices/58 on launch day tells the world you have billed exactly 58 times in your company's entire existence.
A support ticket numbered #7 tells your enterprise customer they are, uh, one of your first seven enterprise customers. Inspiring.
Two orders placed a minute apart with IDs 9,000 and 9,003 tell a competitor you process roughly three orders a minute at peak.

And then there's the part the security people care about, which is that sequential IDs aren't just informative, they're guessable. If I can see /api/orders/9000, I can also just... try /api/orders/8999. And 8998. And if your authorization is even slightly lazy (and friend, it is), I am now reading other people's orders. This has a name. It's called IDOR (Insecure Direct Object Reference), it's been in the OWASP Top 10 in one form or another for approximately forever (these days it lives under Broken Access Control), and it gets dramatically easier to exploit the moment someone exposes a sequential primary key to the outside world.

Enter the UUID, wearing a fake moustache

The fix is to stop stamping your serial numbers in order.

A UUID (Universally Unique Identifier) is a 128-bit value that, in its most common form (v4), is essentially random:

f47ac10b-58cc-4372-a567-0e02b2c3d479

Look at that gorgeous nonsense. What's the previous user's ID? You have no idea. What's the next one? No idea. How many users exist? You cannot tell, because there's no sequence to read, no maximum to anchor on, no gaps to measure. The German Tank Problem needs serial numbers in a row. A UUID is a serial number that fell into a wood chipper. The formula has nothing to bite on.

As a bonus, and this is the part that wins over the people who don't care about counting attacks, UUIDs are globally unique without coordination. Two different servers, two different services, an offline mobile client on a plane, can all generate IDs at the same time and, for all practical purposes, never collide. No round-trip to the database to ask "what number am I allowed to use next?" You can generate the ID before the row even exists. For anyone building distributed systems, that property alone is worth the price of admission.
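In Python, "generate the ID before the row exists" looks roughly like this. It uses the standard library's uuid.uuid4(); the new_order helper and its fields are hypothetical.

```python
import uuid


def new_order(customer_email):
    """Build an order record with its ID minted locally.

    No database call and no shared counter are involved.
    """
    return {"id": str(uuid.uuid4()), "customer_email": customer_email}


order = new_order("someone@example.com")
print(order["id"])  # different on every run
```

The record already has its final ID while it is still just a dict in memory, which is exactly what an offline client or a second service needs.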

So: random, unguessable, count-hiding, coordination-free. We solved it. Ship it. Close the tab.

Okay, here's where I have to be the annoying friend

Because this idea has a failure mode on each end, and I refuse to write a post that pretends UUIDs are free.

They are not free. A bigint is 8 bytes. A UUID is 16, and if you store it as text like a maniac it's 36. Across a hundred-million-row table with a dozen foreign keys pointing at it, that overhead is not theoretical, it's your storage bill and your RAM.
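Here is the back-of-envelope version of that bill. It is a rough sketch that only counts raw key bytes and ignores row headers, index overhead, and compression, so treat the totals as a floor rather than a measurement.

```python
import uuid

u = uuid.uuid4()
BIGINT_BYTES = 8
uuid_binary_bytes = len(u.bytes)  # 16 bytes in binary form
uuid_text_chars = len(str(u))     # 36 characters as hyphenated text


def extra_bytes(rows, key_bytes, columns=1):
    """Extra raw key storage compared with a bigint in the same columns."""
    return rows * columns * (key_bytes - BIGINT_BYTES)


rows = 100_000_000
print(extra_bytes(rows, uuid_binary_bytes))  # 800000000
print(extra_bytes(rows, uuid_text_chars))    # 2800000000
```

That is per key column. Every foreign key that points at the table pays the same surcharge again.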

But the real knife is the index. Your database stores its primary key in a B-tree, which is fastest and tidiest when new values arrive in roughly increasing order: every insert tucks neatly onto the end. A random UUIDv4 arrives like a drunk guest who sits between two people at every table. The database has to constantly split pages, shuffle things around, and re-read cold parts of the index off disk. This is called write amplification and page fragmentation, and it's why someone, somewhere, migrated a high-traffic table to random UUIDs and watched their insert performance fall off a cliff and then wrote a furious blog post about it. (You will read that post right before making the same mistake. It's tradition.)
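You can get a feel for the drunk-guest effect without a database. The sketch below uses a plain sorted list as a stand-in for the key order of a B-tree (it is not a real B-tree and measures no actual I/O) and counts how many inserts land at the very end.

```python
import bisect
import uuid


def fraction_appended(keys):
    """Fraction of inserts that land at the end of a sorted index."""
    index = []
    appended = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            appended += 1  # tucked onto the end, the cheap case
        index.insert(pos, key)
    return appended / len(keys)


n = 2000
sequential_keys = list(range(1, n + 1))
random_keys = [uuid.uuid4().int for _ in range(n)]

print(fraction_appended(sequential_keys))  # 1.0
print(fraction_appended(random_keys))      # tiny, and different every run
```

Sequential keys always append. Random keys almost never do, which in a real index means touching pages all over the tree instead of just the last one.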

So we did what engineers always do: we fixed the fix.

UUIDv7 (standardized in 2024 as part of RFC 9562) puts a millisecond timestamp in the high bits and randomness in the low bits. So IDs trend upward over time and the B-tree is happy again, while the random bits keep them unguessable and uncountable. You can't subtract two of them to get a signup count. This is, for most apps, the correct default in 2026.
ULIDs do basically the same trick with a friendlier, sortable text encoding.
Snowflake IDs (the Twitter classic) cram a timestamp, a machine ID, and a counter into a compact 64 bits — smaller and sortable, at the cost of leaking a little timing info.
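To make the UUIDv7 layout concrete, here is a hand-rolled sketch following the RFC 9562 bit layout: 48 bits of Unix milliseconds, a 4-bit version, 12 random bits, a 2-bit variant, then 62 more random bits. The uuid7 function is written here for illustration; in production you'd likely reach for a maintained library or your database's built-in generator. This version does not add a counter, so IDs minted within the same millisecond are not guaranteed to sort in creation order.

```python
import secrets
import time
import uuid


def uuid7(ts_ms=None):
    """Build a UUIDv7: timestamp in the high bits, randomness in the low bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    rand_a = secrets.randbits(12)
    rand_b = secrets.randbits(62)
    value = (
        ((ts_ms & ((1 << 48) - 1)) << 80)  # 48-bit Unix time in milliseconds
        | (0x7 << 76)                      # version 7
        | (rand_a << 64)                   # 12 random bits
        | (0b10 << 62)                     # RFC variant bits
        | rand_b                           # 62 random bits
    )
    return uuid.UUID(int=value)


earlier = uuid7(1_000)
later = uuid7(2_000)
print(earlier < later)    # True: a later timestamp always sorts higher
print(uuid7().version)    # 7
```

The sort order comes entirely from the timestamp prefix. Everything after it is noise, which is why there is nothing useful to subtract.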

But notice the catch, because it's the whole German Tank Problem sneaking back in the side door: a time-ordered ID still leaks the one thing it's ordered by, time. UUIDv7 won't tell anyone your total user count, but it will whisper roughly when each record was created. That's a much smaller leak than "subtract for the growth rate," but it's not zero. If creation timestamps are sensitive in your domain, even v7 is a partial disrobing. Pick your poison on purpose.
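How loud is that whisper? Loud enough to read with a bit shift. The sketch below pulls the creation time out of the example UUIDv7 given in RFC 9562; the function name is invented for illustration.

```python
import uuid
from datetime import datetime, timezone


def uuid7_created_at(u):
    """Read the creation time out of a UUIDv7 (top 48 bits are Unix ms)."""
    ts_ms = u.int >> 80
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)


# Example UUIDv7 value from RFC 9562.
sample = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(uuid7_created_at(sample))  # 2022-02-22 19:22:22+00:00
```

Anyone holding a v7 ID can do this, no access to your system required.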

And the sneakiest trap of all: an unguessable ID is not an authorization system. A UUID being hard to guess is not the same as a UUID being protected. If your only defense against me reading someone else's invoice is "well, they'd have to guess a 122-bit random number," you have built a password and called it an ID. UUIDs slam the door on enumeration. They do absolutely nothing if you forget to check whether the person holding the ID is actually allowed to use it. Check your authz. The random ID is the lock; the auth check is the guard. You need both.
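A minimal sketch of what "check your authz" means in practice. The in-memory INVOICES store, the exception classes, and the user names are all hypothetical stand-ins for your real database and auth layer.

```python
import uuid

# Hypothetical in-memory store: invoice ID -> record with an owner.
INVOICES = {}


class NotFound(Exception):
    pass


class Forbidden(Exception):
    pass


def get_invoice(invoice_id, current_user_id):
    """Return an invoice only if the caller owns it."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        raise NotFound(invoice_id)
    # Knowing the ID is not enough: the caller must also own the record.
    if invoice["owner_id"] != current_user_id:
        raise Forbidden(invoice_id)
    return invoice


alice, bob = "user-alice", "user-bob"
invoice_id = str(uuid.uuid4())
INVOICES[invoice_id] = {"owner_id": alice, "total_cents": 4200}

print(get_invoice(invoice_id, alice)["total_cents"])  # 4200
try:
    get_invoice(invoice_id, bob)
except Forbidden:
    print("bob is not allowed, even with the right ID")
```

Some teams prefer to return "not found" in both failure cases so a caller can't even confirm that an ID exists. Either way, the ownership check runs on every request.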

So what do you actually do tomorrow

You can't un-stamp the serial numbers on a system you've already shipped, but you fully control two dials, so go set them on purpose:

Stop exposing your primary key. The cleanest move on most teams: keep a boring auto-increment bigint as the internal primary key (your indexes stay fast, your joins stay cheap) and add a separate random external ID — a UUIDv7 — for anything the outside world ever sees. URLs, API responses, invoice numbers. The fast key stays in the basement; the wood-chipper key goes out front.
Default new public-facing IDs to UUIDv7, not v4. You get the count-hiding without setting your write performance on fire. Reach for v4 only when you specifically want zero time signal and don't care about index locality.
Then check your authorization anyway, because the ID was never the security boundary. It just stops people from reading you like a captured gearbox.
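The two-key setup can be sketched with nothing but Python's built-in sqlite3 module. The table and column names are invented for illustration, and uuid.uuid4() stands in for a UUIDv7 generator, which you would swap in from a library or your database.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal: joins and indexes
        public_id TEXT NOT NULL UNIQUE,               -- external: URLs and API responses
        email     TEXT NOT NULL
    )
    """
)


def create_user(email):
    """Insert a user and return only the public ID."""
    public_id = str(uuid.uuid4())  # stand-in for a UUIDv7 generator
    conn.execute(
        "INSERT INTO users (public_id, email) VALUES (?, ?)",
        (public_id, email),
    )
    return public_id


def get_user_by_public_id(public_id):
    """Look a user up by the ID the outside world sees."""
    return conn.execute(
        "SELECT id, email FROM users WHERE public_id = ?",
        (public_id,),
    ).fetchone()


public_id = create_user("mom@example.com")
print(public_id)                         # what goes in the URL
print(get_user_by_public_id(public_id))  # internal id stays server-side
```

The route becomes /users/ followed by the public ID, and the integer never leaves the basement.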

The Allies won that particular round of the war with a formula and a pile of serial numbers, because the other side was tidy enough to number everything in order and careless enough to let those numbers be seen.

Don't be tidy where it counts against you. Number your tanks at random.

The statisticians are still out there. They are still very smug. And the next time someone signs up for your app twice in a week just to subtract the IDs, you get to be the insufferable person who already shipped UUIDv7, and is, annoyingly, fine.

Top comments (5)

UnitBuilds • Jun 28

Trust the Germans to out-engineer themselves 🫠 HH for cryptography (Enigma) and serial numbers for tank count... They really were way too organized for war...

The UUID thing is true, but be careful what you use it for. Did you know the enterprise billing system BillQuick still uses a SQL entry for license checks? Replace values with NULL and it resets the license key... UUID or not, if you store the table as "BQLicenses" and you store the entry as "License Key", no amount of UUID will save you.

Imo, SQL needs an obfuscation protocol, so the devs can see what's up, but in production every entry is obfuscated (correct me if I'm wrong, but it doesn't have that yet?). Cuz abstraction helps, but systemic cryptography makes it far more secure. Because while a UUID hides user count, etc., it does nothing to prevent attacks on the db. If an attack happens, why not have the data encrypted at rest?

Tawanda Nyahuye • Jun 29

Some great feedback there mate

Saleha Mubeen • Jun 30

This is a fantastic explanation of why identifier design matters beyond just database implementation. The connection to the German Tank Problem makes the risks of sequential public IDs very intuitive.

One point I'd emphasize is that using separate internal and external identifiers is often the sweet spot: keep an auto-incrementing primary key for efficient joins and indexing, while exposing a UUID (ideally UUIDv7 for most new systems) in APIs and URLs. And as you noted, UUIDs reduce enumeration risks—but they're not a substitute for proper authorization checks. Every request still needs server-side access validation.

Great read, especially for developers who default to exposing id values without considering the security and business implications.

Tawanda Nyahuye • Jun 30

Thank you for the feedback. That's a great suggestion, framing it as a double-edged sword with both internal and external aspects adds a valuable perspective. I'm now considering updating the blog to incorporate that idea, as I think it will make the article much stronger. I really appreciate your insight.

mote • Jul 2

The German Tank Problem applied to databases is well-established, but the same logic applies to AI agent internals, and nobody is thinking about it yet. If your agent's internal identifiers are sequential, an external observer (or a malicious tool) can infer: how many reasoning steps has this agent taken, when did it hit a failure, how many recovery attempts were made before success. Sequential IDs in agent state aren't just an enumeration risk; they're a timing side channel.

The UUIDv7 recommendation is correct for the "IDs exposed to the outside world" case. For internal agent state, there's an additional consideration: if your agent is doing structured reasoning (branching, backtracking, forking), a global time-ordering might actually be wrong for you. You want causal ordering, not wall-clock ordering. Two events that happened at similar timestamps might be causally unrelated (different branches) while two events from different timestamps might be on the same critical path. Causal IDs, something like Lamport timestamps or vector clocks, give you ordering guarantees that wall-clock UUIDs don't.

On the B-tree write amplification point for embedded contexts: it's even sharper than the article suggests. On a server with an OS page cache, the fragmentation cost is amortized. On an embedded system (a robot, a drone, a mobile app) there's no page cache. Every write amplification is a real flash write, and flash has finite endurance. The UUIDv7 approach helps significantly here by keeping inserts append-ish, which reduces fragmentation even in embedded B-tree implementations. Worth specifying UUIDv7 as the default for embedded agent runtimes specifically, not just general server-side systems.