https://www.youtube.com/watch?v=fY-pGkrLXg4
You're building a RAG app. Your team says: Postgres for the data, Pinecone for the vector search. You nod — because that's what you always do, one database per job.
Here's the thing nobody tells you up front: Postgres is the database that became a platform. The vector search, the geospatial queries, the time-series rollups, the fuzzy text search — all of it might already be Postgres. And three specific design decisions are what made that possible.
MVCC: Readers Don't Wait for Writers
Two transactions on the same row. One reads. One updates. Same instant.
In a traditional row-lock database, one of them has to wait. In Postgres, for ordinary reads and writes, neither one waits — and the reason is one fact that almost nobody teaches in an intro.
In Postgres, an UPDATE does not change the row. It writes a new row. Every row has two hidden system columns: xmin (the transaction that created it) and xmax (the transaction that ended its visibility by updating or deleting it). When one transaction reads and another updates the same row, each one is looking at a different version.
-- txid 100 reads: it sees the existing row version (whose xmin committed earlier).
SELECT email FROM users WHERE id = 42;
-- txid 101, at the same instant, updates: it writes a NEW row version with xmin=101
-- and stamps the old version with xmax=101.
UPDATE users SET email = 'new@example.com' WHERE id = 42;
Both queries return without waiting. This is MVCC — multi-version concurrency control. Postgres still uses locks for schema changes and explicit SELECT FOR UPDATE, but for ordinary read-write conflicts on the same row, versioning replaces blocking.
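You can watch the versioning happen yourself. This sketch (assuming the users table from above) queries the hidden system columns by name — they're real columns, just excluded from SELECT *:

```sql
-- xmin/xmax are queryable on any table, never shown by SELECT *.
SELECT xmin, xmax, email FROM users WHERE id = 42;
-- While no transaction has updated or deleted the row: xmax = 0.
-- After txid 101's UPDATE commits, a fresh snapshot sees the new
-- version (xmin = 101); the dead old version is later reclaimed by VACUUM.
```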
And critically: this behavior isn't reserved for the users table. Every Postgres extension inherits it. Your vector searches don't block writers. Your geospatial queries don't either. All of it, for free, from the core.
The Planner: EXPLAIN ANALYZE Shows You Everything
Paste EXPLAIN in front of any query and Postgres shows you the plan it's about to execute — not the SQL you wrote, but a tree of operators (scans, joins, sorts, indexes) with estimated costs. Add ANALYZE, and Postgres runs the query and fills in real timing at every node.
EXPLAIN ANALYZE
SELECT u.email, o.total
FROM users u JOIN orders o ON o.user_id = u.id
WHERE u.country = 'CA';
The output tells you: did it use an index on country? Did it hash-join or nested-loop? How many rows did each node actually produce vs. expect? You don't have to guess.
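For the query above, the output has this general shape (all numbers invented for illustration; your plan will differ):

```
Hash Join  (cost=13.25..32.70 rows=120 width=40) (actual time=0.11..0.42 rows=87 loops=1)
  Hash Cond: (o.user_id = u.id)
  ->  Seq Scan on orders o  (cost=0.00..15.00 rows=1000 width=12) (actual time=0.01..0.09 rows=1000 loops=1)
  ->  Hash  (cost=12.50..12.50 rows=60 width=36) (actual time=0.07..0.07 rows=12 loops=1)
        ->  Seq Scan on users u  (cost=0.00..12.50 rows=60 width=36) (actual time=0.02..0.05 rows=12 loops=1)
              Filter: (country = 'CA'::text)
Planning Time: 0.20 ms
Execution Time: 0.55 ms
```

A gap like rows=60 estimated vs. rows=12 actual is exactly the signal to look for: it usually means stale statistics, and running ANALYZE on the table tightens the estimates.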
The mental model: your SQL is the input; the plan is the output. The planner builds the plan, the executor runs it. You wrote what answer you want — the planner picks how.
Here's the part that matters for the platform argument. The same planner that handles a JOIN on users also handles a pgvector nearest-neighbor query. Same tree, same operator framework. Extensions can even teach the planner how to estimate costs for their own operators. That's why, when you bolt on vector search, geospatial, or time-series, the queries feel native — to the planner, they are native.
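You can see this directly: EXPLAIN works unchanged on an extension's operators. A sketch, assuming a docs table with a vector-typed embedding column (a 3-dimensional literal here, for brevity) and an ivfflat index on it:

```sql
EXPLAIN
SELECT id FROM docs
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 10;
-- With the index in place, the plan shows an Index Scan using the
-- ivfflat access method; without it, a Seq Scan followed by a Sort.
```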
Extensions: One Engine, Many Databases
Here's the decision that actually made Postgres what it is: CREATE EXTENSION.
Run this:
CREATE EXTENSION vector;  -- the extension pgvector installs is named "vector"
That one command installed new functionality into your Postgres instance:
- A new type: vector, for storing arrays of floats
- A new distance operator: <->, computing the L2 distance between two vectors
- New index access methods (ivfflat, hnsw) for nearest-neighbor search
And it all plugs into the same engine. Same transactions. Same MVCC. Same planner.
Now you can do this:
CREATE TABLE docs (
  id int PRIMARY KEY,
  content text,
  embedding vector(1536)
);
CREATE INDEX ON docs USING ivfflat (embedding vector_l2_ops);
SELECT id, content
FROM docs
ORDER BY embedding <-> $1
LIMIT 10;
That's vector similarity search, running inside the same transaction as your users table. No separate service. No separate API. No separate backup story.
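Because docs lives next to everything else, a single transaction can span both worlds. A sketch, where $1 stands for a client-supplied 1536-dimension embedding:

```sql
BEGIN;
-- A relational write and a vector-indexed read, atomically together.
INSERT INTO docs (id, content, embedding)
VALUES (1, 'shipping policy', $1);
SELECT id, content
FROM docs
ORDER BY embedding <-> $1
LIMIT 10;
COMMIT;
```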
And this is the pattern — not the feature. Once you see it once, you see it everywhere:
- PostGIS adds geometry types, spatial operators (ST_DWithin, ST_Intersects), and GiST-based spatial indexes. Same mechanism.
- pg_trgm adds trigram-based fuzzy text matching, so WHERE name % 'databse' matches "database". Same mechanism.
- TimescaleDB goes further — it layers its own chunking machinery for time-series — but it plugs into the same transactions, the same planner, the same write-ahead log.
One engine. Many databases.
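To make the pattern concrete, here's what the first two look like in use — a sketch with hypothetical table and column names:

```sql
-- PostGIS: stores within 1 km of a point (assumes a geography column).
SELECT name
FROM stores
WHERE ST_DWithin(location,
                 ST_SetSRID(ST_MakePoint(-79.38, 43.65), 4326)::geography,
                 1000);

-- pg_trgm: the % similarity operator matches despite the typo.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
SELECT name FROM products WHERE name % 'databse';
```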
And because every change to core Postgres storage — whether it's an INSERT into users or an update to your pgvector index — goes through the same write-ahead log before touching the data file, extensions built on that storage inherit the same crash-recovery story as your primary tables. Kill the process mid-write, restart, the log replays.
So What Do You Do With This
Next time your team is about to add another datastore, run through four checks:
- Does Postgres have the data type?
- Does it have an index for your query pattern?
- Is there an extension that covers the use case?
- Is the latency acceptable for your workload?
If it passes all four — don't add the service yet. If it fails one, or you need extreme scale, distribution, or operational isolation — then add the separate service, intentionally.
You don't have to like every database. You just have to know what Postgres already does.