Some thoughts are unpredictable.
For example:
"I wonder how pg_receivewal works internally?"
From the outside, it sounds almost innocent. Really, what could possibly be wrong with that? Just ordinary engineering curiosity. I will take a quick look,
understand the general structure, satisfy my curiosity, and then go on living peacefully.
But then, for some reason, this happens:
you are already building PostgreSQL from source, digging into receivelog.c, comparing the behavior of your little creation with the original step by
step, arguing with fsync, looking at .partial files like old friends, and suddenly discovering that you are writing
your own WAL receiver.
In short, everything started quite normally and with absolutely no signs of anything serious.
Why PostgreSQL in the First Place
I have been using PostgreSQL as the main DBMS in almost all of my projects for a long time - both personal and work-related. And the longer you
work with it, the more clearly you understand: this is not just a "good database". This is a system designed by people with a very
serious engineering culture.
When you read notes, discussions, and articles from PostgreSQL developers, you quickly notice how deeply they think through
changes, trade-offs, new features, and behavior in complex scenarios. After such materials, I usually
had a mixed feeling:
- admiration
- respect
- and a slight feeling that I had once again glimpsed work at a level unreachable for me
PostgreSQL gives you everything you need out of the box for backups and continuous WAL archiving. Including
pg_receivewal - the utility that eventually set everything in motion for me.
Why Exactly pg_receivewal
Because it is a very good utility. And good utilities are especially dangerous: they make you want to understand exactly how they
are built.
pg_receivewal continuously receives WAL segments, can work in synchronous and asynchronous replication modes, and in general
looks fairly straightforward. From a distance.
Up close, it turns out that there are quite a few subtle things there:
- how the main loop starts
- how connection drops are survived
- how restart is performed
- at what point .partial becomes a complete WAL file
- how timeline switching is handled
- where and when important fsync calls must happen
- what to do so that it is reliable, not slow, and not embarrassing
So, as usual: a simple utility with a decent amount of engineering accuracy hidden around it.
A Few Words About Other Good Solutions I Looked at With Respect and Envy
Before writing something of my own, of course, I spent a lot of time looking at already existing solutions.
I use two of them at work for continuous archiving of the most critical and main databases.
pgBackRest
pgBackRest is, without exaggeration, an engineering tank. Everything in its source code is impressive:
- logging
- testing
- architectural discipline
- incremental and differential backups
- support for large installations
- attention to edge cases
And, of course, validation by the community and by time.
When you read the code of this tool, you catch yourself thinking: yes, this is what a product
written by people who know what they are doing looks like.
And then you open your own repository and immediately become humble.
Barman
I like Barman for a different reason.
It does not try to magically solve everything in the world.
It is, essentially, a very understandable orchestrator around standard PostgreSQL tools: pg_receivewal and pg_basebackup.
It has a quality that I value a lot: a simple and reliable model.
Not "everything at once", but careful automation around already existing, proven tools.
This also strongly influenced how I started thinking about my own tool.
Why Go, If I Had to Look at So Much C
I decided to write my tool in Go.
The reasons are fairly ordinary:
- recently, I have really enjoyed writing in this concise language
- simplicity and a UNIX background
- it is convenient for writing network and system-level things
- concurrency is handled well in it
- it fits cloud-native scenarios very naturally
- and, importantly, it is still a little harder to accidentally shoot yourself in the foot with a grenade launcher
But there is an important nuance: to understand PostgreSQL, I had to seriously dig into C code.
And here I want to separately say something I formulated for myself a long time ago:
C is, in my opinion, both the most difficult and the most brilliant language at the same time.
I have not spent as much time on any other language trying to understand its semantics.
Syntax is nothing - semantics are everything. Pointers alone are a simple concept, but
hide a whole chain of icebergs underneath. There was even a time when I was making a compiler for C, with a preprocessor,
assembler, and PE32 output (*.exe). I played with that for a long time; it was a very interesting experience and time spent happily.
The C language is so direct, so honest, and so close to the metal that it becomes scary. It feels like
it is very easy to make six sextillion mistakes in it just while opening a file and taking a breath. One pointer going the wrong way -
and that is it, hello, a new form of humiliation. Segmentation Fault becomes a kind of spell that must not be said out loud, lest you
summon it.
With all that said, I cannot say that I know C.
Honestly, I probably know about three percent of it. And even that only on a good day.
But even those three percent were extremely useful to me.
Without them, I would not have been able to read PostgreSQL properly: to separate real logic from my own delusions,
follow the control flow, and at least roughly understand why everything here is arranged this way and not another.
So formally I wrote the tool in Go, but in practice this project also became my way of touching C a little more deeply
- and gaining even more respect for the people who have been writing such systems in it for years.
The Beginning: Compiling PostgreSQL, Debugging, and the First Signs of Recklessness
To understand the implementation details at all, I had to go into the PostgreSQL source code.
I had to learn how to:
- build PostgreSQL from source
- run it in debug mode
- attach a debugger
- watch how calls flow
- understand what happens inside the replication loop
- establish the relationship between components and functions
And here I got a surprise: all of this turned out to be less scary than I had imagined. PostgreSQL built, pg_receivewal
started, the debugger attached to the process, and this immediately gave me the dangerous confidence that "well,
now I will definitely figure this out quickly".
Of course, I did not figure it out.
The first thing I did was, like a true amateur, add the most aggressive tracing possible. I logged everything:
- function entries
- exits
- variable values
- branches
- important calls
- and sometimes, it seemed, the mere fact that the universe existed
At first, it seems very clever. Then you have gigantic logs, you no longer understand whether you are reading the system or whether it is slowly
breaking your mind, and the realization comes: many logs do not mean much understanding.
But at this stage, the overall picture started to emerge. I began to understand how entities are connected, where the WAL receiving
loop starts, how errors are survived, what happens to .partial, and at which moments decisions are made about completing a segment.
I discovered libraries, very well-written and years-polished file handling functions, and many more insanely cool things to squirrel away for later.
And at some point I could not resist: enough watching, time to write.
The First Prototype: "I Will Just Reproduce pg_receivewal"
I had a very naive idea: not to invent anything new, but simply to reproduce the behavior of
pg_receivewal as closely as possible.
In theory, it sounds wonderful.
In practice, it means that you voluntarily sign up for weeks of studying:
- exactly how the streaming loop starts
- how it reacts to connection drops with the database
- what a correct restart should look like, from which file and from which offset inside it
- when a .partial file can be considered complete
- how timeline changes are handled
- where you misunderstood something
- and where you no longer understand anything at all, but continue out of stubbornness
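The restart question - from which file and which offset - can at least be sketched. This is not pg_receivewal's actual code, just a simplified Go illustration of the idea: scan the archive directory, resume from the newest `.partial` file if one exists, otherwise start at the segment after the last completed one (real logic must also validate the timeline and the offset inside the partial file):

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

// findRestartSegment scans an archive directory and returns the WAL segment
// streaming should resume from: the last .partial file if one exists,
// otherwise the segment after the last completed one. Simplified sketch.
func findRestartSegment(dir string) (name string, partial bool, err error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", false, err
	}
	var names []string
	for _, e := range entries {
		base := strings.TrimSuffix(e.Name(), ".partial")
		if len(base) == 24 { // TTTTTTTTXXXXXXXXYYYYYYYY: timeline, log, segment
			names = append(names, e.Name())
		}
	}
	if len(names) == 0 {
		return "", false, nil // empty archive: start from the server's current position
	}
	sort.Strings(names) // fixed-width hex names sort correctly as strings
	last := names[len(names)-1]
	if strings.HasSuffix(last, ".partial") {
		return strings.TrimSuffix(last, ".partial"), true, nil
	}
	return nextSegment(last), false, nil
}

// nextSegment increments the segment part of a WAL file name, assuming the
// default 16 MiB segment size (256 segments per 4 GiB "log" file).
func nextSegment(name string) string {
	log, _ := strconv.ParseUint(name[8:16], 16, 64)
	seg, _ := strconv.ParseUint(name[16:24], 16, 64)
	seg++
	if seg == 0x100 {
		seg = 0
		log++
	}
	return fmt.Sprintf("%s%08X%08X", name[:8], log, seg)
}

func main() {
	fmt.Println(nextSegment("000000010000000000000012"))
}
```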
My first more-or-less stable prototype appeared after a couple of weeks. And those were very fun weeks. At times I
felt like a researcher and a super-cool mega-hacker, at other times - like a person who had crawled into an aircraft engine without a license, trying to repair it from someone else's notes.
But there is one thing I really want to point out: PostgreSQL code is surprisingly pleasant to read. Good comments, competent
decomposition, respect for the reader and colleagues. Even if you yourself understand about twenty percent, it is still clear that in front of you is very
strong engineering work.
When You Realize That Simply Receiving WAL Is Only the Beginning
When the prototype finally worked, the joy did not last long.
Because I already understood: receiving WAL is only half the job. And then the usual engineering carnival begins:
- compression
- encryption
- uploading to S3
- uploading to SFTP
- cleaning up old files
- monitoring
- external scripts
- cron
- more scripts
- and then scripts that fix the previous scripts
And I have never liked this universe of external glue. Because it almost always looks like it was written
at night under the threat of a production incident, and then everyone was afraid to touch it. And all of it smells bad and looks disgusting.
Scripts around WAL archiving are often fragile, non-obvious, poorly tested, and live on faith that "it somehow
works". And in critical things, I wanted exactly the opposite.
I wanted the main program itself to manage the archive:
- to know what can already be compressed
- to know what still cannot be deleted
- to understand when a file can be sent to remote storage
- and not to try to make such decisions through a layer of suspicious bash magic
So management components began to appear around the WAL receiver:
- one receives the log
- another archives and encrypts
- a third sends files to S3 or SFTP
- a fourth handles retention and automatic cleanup
- a fifth collects metrics and monitors process state
And at that point, the project stopped being "just a utility". It started turning into a small system where coordination,
order, and the absence of internal fights between components mattered.
About Base Backup: I Did Not Want To, but Curiosity Won
Initially, I had no intention of implementing base backup at all.
The reason is simple: the replication protocol is single-threaded. For small databases, that is fine. For large ones - not so rosy anymore.
If a backup takes ten hours every ten hours, that is, to put it mildly, not always convenient.
Multi-threaded approaches usually require the tool to live next to the database itself. And I wanted exactly the opposite: to remotely
collect WAL and make backups from databases located anywhere - in the cloud, on virtual machines, in Kubernetes - and at the same time not
require sidecar containers or any special infrastructure changes from them.
But then the thing that happens to many technical projects happened:
I did not plan this functionality, and then it simply became interesting.
In the end, I did implement streaming base backup. It does not claim to be a universal solution for huge
installations, but for databases around 200 GiB it turned out to be quite practical. A couple of hours for a nightly job is already a reasonable
scenario.
So it turned out not to be a "superweapon", but an honest working tool in a clear niche.
Why I Did Not Go Deeper Into Incremental Backups
Of course, I also looked at incremental / differential backups.
But there you quickly understand an unpleasant thing: taking an incremental backup is not victory yet. You then have to
assemble it back correctly. And that means a completely different level of complexity begins:
- either write your own analogue of pg_combinebackup
- or very carefully depend on an external tool
- or drown in the number of edge cases and incompatibilities
At that point I honestly looked at the task and decided that I already had enough problems without it.
pgBackRest does such things in a truly well-thought-out way. But reproducing that level is not "built over a couple of
weekends on enthusiasm". It is large, heavy engineering work for years. So I consciously stopped at a simpler
model: reliable base backup for small and medium production environments.
Without claims to world domination. Just a working, predictable thing.
Architecture: The Moment When You Are No Longer Writing a Utility but Coordinating Chaos
As soon as you have several background processes, it immediately becomes clear that the main difficulty is no longer WAL as
such, but making sure this whole household does not fight with itself.
You need to be able to:
- not start a backup if another one has not finished yet
- not start archiving if it is already running
- not delete something that may still be needed
- handle errors correctly
- carefully stop background processes
- keep the system in a predictable state
Here I had to seriously think about patterns:
- job queue
- worker pool
- supervisor
- pipes
- task lifecycle management
- safe shutdown
- goroutine coordination
At some point I realized that I was no longer "writing a WAL receiver". I was assembling a gearbox. And if even one gear
shifts a little, all of this will either start screaming or silently break. And silently breaking software is the worst kind of software.
At the same time, the main task was to make sure the main WAL receiving process was not affected by "noisy neighbors".
Streaming Large Files: Another Source of Creativity
There is another pleasant task as well: transferring large backup files to remote storage.
When a database weighs, for example, 300 GiB, you quickly understand:
- you do not want to save everything locally, and often it is not convenient
- you cannot pull it all into memory
- you also do not want to write a crooked intermediate scheme, because you will have to maintain it yourself later
So you need a proper streaming pipeline: read the data, transform it on the way, and immediately send it further - without
intermediate garbage, without extra storage, without special effects.
Here Go was useful again. It has good primitives for streaming processing. Although the presence of primitives, of course, does not
stop you from making design mistakes for a very long time.
fsync: The Most Subtle Part and My Own Little Nervous Breakdown
If I had to choose what drained the most blood from me, the winner is obvious: fsync.
This is the place where you first think: "well, this part is simple". And then you discover that you have been staring at
the receivelog.c source code for several hours with the expression of a person who has voluntarily entered a very strange stage of life.
The problem here is that it is easy to be wrong in both directions:
- call fsync too often - everything slows down
- call it too rarely - later you may look at the result very sadly
So it is either slow or shameful. Quite a rich choice, to put it mildly.
I had to literally compare the behavior of my implementation with pg_receivewal step by step:
- where exactly synchronization happens
- at what moment
- why exactly there
- which scenarios must force fsync
- and how to do neither too much nor too little
In the end, the key points turned out to be:
- fsync after finishing writing a segment
- fsync when renaming .partial to the final WAL file
- fsync on keepalive if the server requests a reply
- fsync on errors in the receiving loop
Then the truly fun part began: integration checks. I ran two receivers simultaneously (pg_receivewal, pgrwl), generated
WAL, compared timings, then compared the resulting files byte by byte, measured timing differences in milliseconds, and tried to remove
everything unnecessary.
I even got to logging: in places like this, you begin to understand that it can be either a helper or a quiet
saboteur. For example, you do not need to parse attributes if the logging level does not require it; extra CPU cycles
can be spent on more useful things.
In the end, I managed to achieve very similar behavior and complete matching of the resulting WAL files over the same interval. And
the small timing difference remained only where it is normal: two daemons cannot be started in the exact same
physical microsecond, no matter how hard you try.
In the fight against slowness, I even quickly wrote a small utility that injects
a defer into EVERY function to measure that function's runtime. Not the most rigorous check,
but, as practice showed, it helps quickly identify especially hot functions, and then point
the profiler, debugger, and so on at them. My tracing looks something like this:
FUNCTION CALLS TOTAL_NS TOTAL_SEC
-------- ----- -------- ---------
storecrypt.Put 70 23061361400 23.06
receivesuperv.uploadOneFile 35 11606918000 11.61
fsync.Fsync 106 8813968000 8.81
xlog.processOneMsg 4481 6818721600 6.82
xlog.processXLogDataMsg 4481 6814495400 6.81
xlog.CloseWalFile 35 6561511500 6.56
xlog.closeAndRename 35 6559979000 6.56
fsync.FsyncFname 70 6525596900 6.53
.....500 more lines
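A minimal version of such a defer-based tracer might look like this (hand-written here for illustration; in my case the defers were injected by the utility, and the names are hypothetical):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// trace records how long the enclosing function ran. Injected at the top of
// every function as: defer trace("pkg.Func")()
var (
	mu    sync.Mutex
	calls = map[string]int{}
	total = map[string]time.Duration{}
)

func trace(name string) func() {
	start := time.Now()
	return func() {
		mu.Lock()
		defer mu.Unlock()
		calls[name]++
		total[name] += time.Since(start)
	}
}

func work() {
	defer trace("main.work")()
	time.Sleep(5 * time.Millisecond) // stand-in for real work
}

func main() {
	for i := 0; i < 3; i++ {
		work()
	}
	// Same shape as the report above: FUNCTION, CALLS, TOTAL_NS.
	fmt.Printf("%-12s %5d %12d\n", "main.work", calls["main.work"], total["main.work"].Nanoseconds())
}
```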
Metrics: Because I Wanted to See Whether It Was Still Alive or Already Dead
Over time, I also added metrics:
- number of files
- archive size
- number of errors
- transferred bytes
- state of background tasks
- deleted files
- general runtime statistics
I even made a Grafana dashboard. Not the most beautiful one in the world, but useful enough to quickly answer: is everything still
alive, or is it already time to get nervous?
It was important to me to make metrics free if they are disabled. So wherever possible, I used the
noop approach: if observability is not needed, the system should not pay for it.
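The noop approach is simple in Go: call sites talk to a small interface, and when observability is off they get an implementation whose methods do nothing. A minimal sketch with hypothetical names:

```go
package main

import "fmt"

// Metrics is the narrow interface the rest of the system talks to.
type Metrics interface {
	IncFiles()
	AddBytes(n int64)
}

// noopMetrics satisfies the interface with empty methods; when observability
// is disabled, the call sites stay but cost almost nothing.
type noopMetrics struct{}

func (noopMetrics) IncFiles()       {}
func (noopMetrics) AddBytes(int64)  {}

// realMetrics is a trivial in-memory stand-in for an actual exporter.
type realMetrics struct {
	files int
	bytes int64
}

func (m *realMetrics) IncFiles()        { m.files++ }
func (m *realMetrics) AddBytes(n int64) { m.bytes += n }

func newMetrics(enabled bool) Metrics {
	if !enabled {
		return noopMetrics{}
	}
	return &realMetrics{}
}

func main() {
	m := newMetrics(false) // disabled: every call below is a no-op
	m.IncFiles()
	m.AddBytes(16 * 1024 * 1024)
	fmt.Printf("%T\n", m)
}
```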
Logging: Where I Also Realized I Still Have a Long Way to Go
Logging had its own coming-of-age story.
At first, I logged everything. Because, as everyone knows, any person who has deeply entered a complex system for the first time
starts with the phrase: "I will just add more logs and understand everything".
No.
Many logs are not understanding. They are just many logs.
Good logging is when, at the moment of a problem, logs really help you understand what is going on, and do not turn into
an additional source of noise and despair.
I have not yet managed to make this part as good as I would like. The current result is normal, but
not exemplary. And in this sense, pgBackRest still remains for me an example of a very smart and thoughtful approach: you can see
how much discipline and engineering care went specifically into diagnostics.
Integration Tests: The Hardest and Most Important Part
One of the most difficult and at the same time most necessary parts of the whole project is integration testing.
Because a daemon that depends on another daemon is already not the easiest object to test. And if you
also want to:
- start PostgreSQL
- generate WAL
- stop processes
- make a backup
- restore the database
- compare the state before and after
- run failure scenarios
- check compatibility and correctness
then life starts playing in especially bright colors.
I settled on this approach: simple shell scripts that start the test environment in a container,
populate the database, perform actions, then restore everything and check the result.
I also really did not want to drag a ton of dependencies like testcontainers into the project.
In the end, it turned out like this:
- shell scripts
- docker compose
- matrix in GitHub Actions
- isolated scenarios
- without unnecessary heavy magic where understandable mechanics are enough
That is how I got tests for:
- comparison with pg_receivewal
- backup/restore
- uploading to S3 and SFTP
- correctness of WAL files
- stopping and restarting
- different failure scenarios
And honestly, integration tests are what give me the main confidence in releases. Not one hundred percent, of course. One hundred
percent in such things is promised either by madmen or by marketers. But good, engineering-honest confidence - yes.
Unit tests, of course, also exist. But for me, integration checks are the main criterion
that all of this is not only nicely written (well, not everywhere), but actually works.
What Came Out of It
Over time, from the fairly harmless desire to "just see how pg_receivewal works", a tool grew that now has:
- streaming WAL receiver
- archiving
- compression
- encryption (streaming AES-256-GCM)
- uploading to S3 (streaming, +multipart)
- uploading to SFTP
- retention and automatic cleanup
- metrics
- logging (mostly zero-cost)
- base backup
- configuration through a file and environment variables
- controlled shutdown
- unit and integration tests
- behavior comparison with pg_receivewal
- documentation with diagrams and examples
- as many usage examples as possible (standalone/docker-compose/k8s)
- helm-chart (quite simple and working)
- website (in progress, but at least now it is clear how this is done and that it is possible)
- a set of patterns and libraries for further reuse in Go projects
So, as usually happens, the project long ago stopped being what it seemed to be at the beginning.
What Is Planned
- improve metrics, remove what is unnecessary, add what is needed, build a truly useful and beautiful dashboard
- improve logging quality, make it consistent, think through levels more carefully, preserve zero-cost semantics
- add new capabilities for base backup - around fine-tuning retention periods
- work through the huge amount of room left for refactoring and documentation
- add even more integration tests - I am planning a V2 version
- add every "breaking" scenario to the tests that my imagination can produce
- make the website properly, right now it is just a copy of the documentation
- create a user guide (because it is simply interesting)
- and much more
What I Took Away From This
Perhaps the main result is not that I wrote yet another tool.
The main result is something else:
- I understood PostgreSQL much more deeply
- I gained even more respect for C, although I know about a miserable three percent of it
- I saw how difficult it is to reproduce even a small part of the behavior of a well-made system utility
- and once again I became convinced that high-quality code written by others is the best way to quickly cure yourself of excessive self-confidence
Because one thing is to look at architecture from the outside and admire it.
And it is a completely different thing to try to reproduce at least part of that logic yourself and not fall apart along the way.
And yes. If it ever seems to you that the thought
"maybe I should also write some utility for PostgreSQL?"
sounds like a good idea for a couple of quiet weekends -
I have two pieces of news for you.
The first: the idea really is interesting.
The second: you most likely will not have quiet weekends anymore.
Links
- pg_receivewal Documentation
- pg_receivewal Source Code
- Streaming Replication Protocol
- Continuous Archiving and Point-in-Time Recovery
- Setting Up WAL Archiving
- pgBackRest
- Barman
Repository: https://github.com/pgrwl/pgrwl
Thanks for reading!