Most developers use PostgreSQL every day without knowing
what happens under the hood when they run an INSERT. I
wanted to find out. So I built pgstream — a tool that
reads every database change in real time directly from
PostgreSQL's internal log.
What is the WAL?
Every time you insert, update, or delete a row, Postgres
doesn't immediately write to the actual data files. It
first writes the change to a file called the WAL — Write
Ahead Log. This is how Postgres guarantees nothing gets
lost if the server crashes mid-write.
I had read about WAL in docs before but never really
understood why it existed until I started building this.
It clicked when I realized — the WAL is basically a
journal. Postgres writes every intention down first,
then acts on it.
Logical Replication and pgoutput
WAL records are stored in raw binary. You can't just
read them like a text file. Postgres has a built-in
plugin called pgoutput that decodes that binary into
readable messages — INSERT, UPDATE, DELETE — with the
actual row data included.
To read from the WAL stream your program creates a
replication slot. Think of it as a bookmark. Postgres
tracks your position in the stream and holds WAL records
until your reader has consumed them. This means you can
stop and restart your reader without missing any changes.
Building pgstream
I built pgstream in Go using the pglogrepl library which
handles the low level replication protocol. The project
has four main parts:
Connector — opens a replication connection to Postgres
and creates the slot.
Decoder — this was the hardest part. It reads raw bytes
coming from Postgres and figures out what each message
means. Postgres sends different message types — Relation
messages that describe table schemas, Insert messages,
Update messages, Delete messages, and keepalive
heartbeats. The decoder has to handle all of them
correctly.
The tricky part was the Relation message. Before Postgres
sends an INSERT event it sends a Relation message
describing the table structure — column names, types.
You have to store that and reference it later when
decoding the actual row data. Without this you get raw
bytes with no idea which column is which.
Handler — takes the decoded event and prints it cleanly
to the terminal.
The part that surprised me
What surprised me most was how fast it is. The moment
you run an INSERT in pgAdmin the change appears in the
terminal almost instantly. There is basically no delay.
I knew replication was fast in theory but seeing it
happen live made it real. This is the same mechanism
companies use to sync databases across regions in
real time.
What I learned
I understood Postgres as a database before this. Now I
understand it as a system. WAL, logical decoding,
replication slots, LSN — these are the internals that
make Postgres reliable at scale.
Go was also new to me coming from JavaScript and C.
The concurrency model with goroutines and channels
is genuinely different from anything I had used before.
Try it yourself
pgstream is open source.
github.com/mujib77/pgstream
If you want to understand what's happening inside your
Postgres database — clone it, run it, and watch your
changes stream in real time.
Top comments (0)