Every time an article comes out about a large company's infrastructure, half the Hacker News comments are variations of "of course they use Kubernetes with 47 microservices and a distributed database with a custom consensus protocol." And when it turns out they don't—that they use plain PostgreSQL with a single primary and discipline—there's an uncomfortable silence.
That just happened with OpenAI.
The numbers nobody expected
Bohan Zhang, infrastructure engineer at OpenAI, published details about how they scale PostgreSQL for ChatGPT. The numbers:
- 800 million users
- A single PostgreSQL *primary* (writer) on Azure
- ~50 *read replicas*
- Millions of queries per second
- p99 of 10-19ms
- 99.999% availability
- One SEV-0 in a year (and that was from ImageGen's viral launch, which added 100 million new users in a week)
Read that again. One. Single. Writer. For 800 million users.
"But they should shard"
No. And the reason is brutally pragmatic.
Sharding PostgreSQL would have required modifying hundreds of endpoints in the application. Every query that assumes all data lives in the same database—which is practically all of them—would need to be rewritten to know which shard contains each piece of data.
The cost of that migration? Months of engineering work, new bugs at every corner, and a transition period where you maintain both systems.
What they did instead? They identified the heaviest writes and moved them to Cosmos DB. Not because Cosmos is better than PostgreSQL, but because those specific workloads fit better in a document model. The rest—the vast majority of business logic—stayed in PostgreSQL.
Instead of complicating the entire system, they isolated the problem and solved it where it hurt. Surgery with a scalpel, not a chainsaw.
PgBouncer: from 50ms to 5ms per connection
One of the first bottlenecks they found was connection establishment latency. PostgreSQL creates a process for each new connection. With thousands of simultaneous connections from hundreds of application pods, connection setup alone ate 50ms before a single query could run.
The solution: PgBouncer as a connection pooler. It maintains a pool of already-established connections and reuses them. Result: connection latency dropped to 5ms. 90% less, by changing a piece of plumbing.
It's not new technology. PgBouncer has been in production at companies of all sizes for over 15 years. But there it is: a battle-tested, boring tool solving a problem in one of the most-used applications on the planet.
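The article doesn't publish the exact pooler settings, but a minimal pgbouncer.ini sketch shows the idea. Host name, pool size, and connection counts below are assumptions, not OpenAI's values:

; illustrative pgbouncer.ini -- values are assumptions, not OpenAI's configuration
[databases]
chatdb = host=postgres-primary.internal port=5432 dbname=chatdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction       ; hand the server connection back after every transaction
max_client_conn = 5000        ; application pods connect here cheaply
default_pool_size = 50        ; warm PostgreSQL connections actually kept open

The application points at port 6432 instead of PostgreSQL directly; the database only ever sees the small pool of already-established connections.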
The ORM that did 12-table joins
This is my favorite. Because I've seen it in my students' projects, in startups, in banks. Everywhere.
The ORM generated queries with joins across 12 tables. Not because someone designed it that way, but because the models were related to each other and the ORM, obediently, followed the relationships to the end.
The solution wasn't changing ORMs or switching to manual queries for everything. It was moving logic to the application. Instead of asking PostgreSQL to do a monstrous join, they made several simpler queries and assembled the data in code.
Is that less elegant? Yes. Is it faster? Enormously. Because PostgreSQL can optimize simple queries much better than a 12-table join with cross conditions. And because you can cache partial results and reuse them.
-- BEFORE: the ORM generates this
SELECT u.*, p.*, s.*, t.*, ...
FROM users u
JOIN profiles p ON ...
JOIN settings s ON ...
JOIN teams t ON ...
JOIN ... -- 12 tables
WHERE u.id = $1;
-- AFTER: separate queries, logic in application
SELECT * FROM users WHERE id = $1;
SELECT * FROM profiles WHERE user_id = $1;
-- cacheable, parallelizable, debuggable
Each individual query is trivial. The query planner executes them in microseconds. And if one fails or runs slow, you know exactly which one.
The defenses nobody sees
What I find brilliant about Bohan Zhang's article isn't the big numbers, but the small defenses that prevent everything from falling apart:
idle_in_transaction_session_timeout
If a transaction sits open doing nothing, PostgreSQL kills it after a configurable time. Why does this matter? Because an open transaction blocks *autovacuum*. And without *autovacuum*, tables bloat, indexes degrade, and your database gets slower every day.
It's like leaving the fridge door open. Nothing happens for the first 5 minutes. But if you forget it all night, the next day everything is at room temperature.
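Enabling it is a one-liner. A minimal sketch, assuming a 10-minute value; the article doesn't say what OpenAI actually uses:

-- kill sessions that sit idle inside an open transaction (the value is illustrative)
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();  -- picked up without a restart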
Schema changes with 5-second timeout
When you do an ALTER TABLE in PostgreSQL, you need a lock on the table. If there are long transactions running, that lock waits. And while it waits, it blocks every new query against that table. A schema migration that takes 200ms can bring down your database if there's an old transaction that won't finish.
OpenAI's solution: SET lock_timeout = '5s'. If the migration can't get the lock in 5 seconds, it aborts. Better to fail fast and retry than block the entire system waiting.
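A migration script then looks something like this (the table and column are hypothetical):

-- hypothetical migration: give up after 5 seconds instead of queueing behind old transactions
BEGIN;
SET LOCAL lock_timeout = '5s';
ALTER TABLE users ADD COLUMN preferences jsonb;
COMMIT;

If the ALTER can't acquire its lock within 5 seconds, the whole transaction aborts cleanly and you retry once the blocker is gone.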
Rate limiting in 4 layers
Not one. Not two. Four layers of rate limiting:
- Edge/CDN — blocking abusive traffic before it reaches the application
- API gateway — limits per user/API key
- Application — limits per operation type
- Database — connection limits and statement timeouts
Each layer catches what the previous one lets through. Defense in depth. The same onion philosophy I apply for defenses against hallucinations, but for infrastructure.
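The database layer is the only one you can express in plain SQL. A sketch, with a hypothetical role name and illustrative limits:

-- database-level backstop: cap connections and runaway statements for the app role
ALTER ROLE chat_api CONNECTION LIMIT 200;           -- hypothetical role and limit
ALTER ROLE chat_api SET statement_timeout = '2s';   -- anything slower gets cancelled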
Workload isolation by priority
Not all queries are equal. A query for "show user's chat" is critical—if it fails, the user sees an error. A query for "generate analytics report" is important, but can wait 30 seconds.
OpenAI routes queries by priority to different read replicas. High-priority replicas have less load and respond faster. Low-priority ones can run hotter without affecting user experience.
It's common sense, but requires discipline. You have to classify each query, configure routing, and resist the temptation to send everything to the fast replica "because it's just one more query."
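The article doesn't say how the routing is implemented; one common way is to do it at the pooler, with separate logical databases pointing at different replica pools (all names below are hypothetical):

; illustrative PgBouncer routing, not OpenAI's actual setup
[databases]
chat_critical  = host=replica-high-priority.internal port=5432 dbname=chatdb
analytics_bulk = host=replica-low-priority.internal  port=5432 dbname=chatdb

The application then declares its priority simply by choosing which logical database it connects to.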
Backfills that take weeks
When you need to populate a new column for 800 million users, you can't do UPDATE users SET new_column = computed_value. A single giant UPDATE would hold locks on every row for hours, saturate the disk, and probably bring down the primary.
At OpenAI, backfills run with strict rate limiting. Weeks. A backfill that takes weeks.
Sound horrible? It's the opposite. It's the decision of a team that understands backfill speed is irrelevant compared to system stability. Better to take 3 weeks with nobody noticing than take 3 hours and have a SEV-0 at 2 AM.
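The shape of such a backfill, as a sketch (batch size, column, and value are assumptions):

-- one small batch; an external job repeats this in a loop and sleeps between iterations
UPDATE users
SET    new_column = 'default'        -- hypothetical column and value
WHERE  id IN (
    SELECT id
    FROM   users
    WHERE  new_column IS NULL
    LIMIT  1000                      -- small batches keep locks and WAL bursts short
);

Each batch commits on its own, so the primary never holds millions of row locks at once, and the job can be paused the moment replication lag creeps up.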
The cascading replication that's coming
Currently they have ~50 replicas connected directly to the primary. Each replica consumes a replication connection and bandwidth from the primary. With 50 it's manageable. With 100+ it would be a problem.
The solution they're developing: cascading replication. Replicas that replicate from other replicas, not from the primary. A tree instead of a star. The primary sends data to 5-10 first-level replicas, and those replicas feed the rest.
It's the same idea as BitTorrent. Instead of everyone downloading from the same server, nodes share with each other. Works for pirated movies, works for WAL segments.
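In PostgreSQL terms this mostly means pointing a standby at another standby instead of at the primary. A sketch with illustrative host names:

-- run on a second-tier replica: stream WAL from a first-tier replica, not the primary
ALTER SYSTEM SET primary_conninfo = 'host=replica-tier1.internal port=5432 user=replicator';
SELECT pg_reload_conf();  -- on PostgreSQL 13+ the WAL receiver reconnects without a restart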
The lesson nobody wants to hear
The industry has an addiction to over-engineering. Every week a new database comes out promising to solve problems most companies don't have. And every week, engineering teams adopt those technologies because they "scale better" or "are more modern," without asking whether PostgreSQL with a bit of discipline would do the job.
OpenAI—the company defining the future of AI, with one of the fastest-growing products in history—uses PostgreSQL. With a single primary. No sharding. No exotic distributed database.
They use PgBouncer (2007). Read replicas (concept from the 90s). Connection pooling (as old as relational databases). Rate limiting (invented before most of us were born).
The magic isn't in the technology. It's in the discipline:
- Simple queries instead of monstrous joins
- Aggressive timeouts instead of infinite waits
- Workload isolation instead of "everything on the same server"
- Migrate only what needs migrating, don't rewrite everything
For your next standup
The next time someone on your team proposes migrating to a distributed database, or sharding PostgreSQL, or adding a queue service between the API and database "because it won't scale," show them these numbers.
800 million users. One primary. p99 of 10-19ms. 99.999% uptime.
And ask: "Is our problem really that PostgreSQL doesn't scale? Or is it that our queries are a mess?"
Because it's almost always the second one.
Source: Inside the Postgres Setup Powering 800M ChatGPT Users — Bohan Zhang, OpenAI. If you read only one infrastructure article this year, make it this one.