Some thoughts are unpredictable.
For example:
"I wonder how pg_receivewal works internally?"
From the outside, it sounds almost innocent. Really, what could possibly be wrong with that? Just ordinary engineering curiosity. I will take a quick look,
understand the general structure, satisfy my curiosity, and then go on living peacefully.
But then, for some reason, this happens:
you are already building PostgreSQL from source, digging into receivelog.c, comparing the behavior of your little creation with the original step by
step, arguing with fsync, looking at .partial files like old friends, and suddenly discovering that you are writing
your own WAL receiver.
In short, everything started quite normally and with absolutely no signs of anything serious.
Why PostgreSQL in the First Place
I have been using PostgreSQL as the main DBMS in almost all of my projects for a long time - both personal and work-related. And the longer you
work with it, the more clearly you understand: this is not just a "good database". This is a system designed by people with a very
serious engineering culture.
When you read notes, discussions, and articles from PostgreSQL developers, you quickly notice how deeply they think through
changes, trade-offs, new features, and behavior in complex scenarios. After such materials, I usually
had a mixed feeling:
- admiration
- respect
- and a slight feeling that I had once again glimpsed work at a level unreachable for me
PostgreSQL gives you everything you need out of the box for backups and continuous WAL archiving. Including
pg_receivewal - the utility that eventually set everything in motion for me.
Why Exactly pg_receivewal
Because it is a very good utility. And good utilities are especially dangerous: they make you want to understand exactly how they
are built.
pg_receivewal continuously receives WAL segments, can work in synchronous and asynchronous replication modes, and in general
looks fairly straightforward. From a distance.
Up close, it turns out that there are quite a few subtle things there:
- how the main loop starts
- how connection drops are survived
- how restart is performed
- at what point .partial becomes a complete WAL file
- how timeline switching is handled
- where and when important fsync calls must happen
- what to do so that it is reliable, not slow, and not embarrassing
So, as usual: a simple utility with a decent amount of engineering accuracy hidden around it.
A Few Words About Other Good Solutions I Looked at With Respect and Envy
Before writing something of my own, of course, I spent a lot of time looking at already existing solutions.
I use two of them at work for continuous archiving of the most critical and main databases.
pgBackRest
pgBackRest is, without exaggeration, an engineering tank. Everything in its source code is impressive:
- logging
- testing
- architectural discipline
- incremental and differential backups
- support for large installations
- attention to edge cases
And, of course, validation by the community and by time.
When you read the code of this tool, you catch yourself thinking: yes, this is what a product
written by people who know what they are doing looks like.
And then you open your own repository and immediately become humble.
Barman
I like Barman for a different reason.
It does not try to magically solve everything in the world.
It is, essentially, a very understandable orchestrator around standard PostgreSQL tools: pg_receivewal and pg_basebackup.
It has a quality that I value a lot: a simple and reliable model.
Not "everything at once", but careful automation around already existing, proven tools.
This also strongly influenced how I started thinking about my own tool.
Why Go, If I Had to Look at So Much C
I decided to write my tool in Go.
The reasons are fairly ordinary:
- recently, I have really enjoyed writing in this concise language
- simplicity and a UNIX background
- it is convenient for writing network and system-level things
- concurrency is handled well in it
- it fits cloud-native scenarios very naturally
- and, importantly, it is still a little harder to accidentally shoot yourself in the foot with a grenade launcher
But there is an important nuance: to understand PostgreSQL, I had to seriously dig into C code.
And here I want to separately say something I formulated for myself a long time ago:
C is, in my opinion, both the most difficult and the most brilliant language at the same time.
I have not spent as much time on any other language trying to understand its semantics.
Syntax is nothing - semantics are everything. Pointers alone are a simple concept, but
hide a whole chain of icebergs underneath. There was even a time when I was making a compiler for C, with a preprocessor,
assembler, and PE32 output (*.exe). I played with that for a long time; it was a very interesting experience and time spent happily.
The C language is so direct, so honest, and so close to the metal that it becomes scary. It feels like
it is very easy to make six sextillion mistakes in it just while opening a file and taking a breath. One pointer going the wrong way -
and that is it, hello, a new form of humiliation. Segmentation Fault becomes a kind of spell that must not be said out loud, lest you
summon it.
With all that said, I cannot say that I know C.
Honestly, I probably know about three percent of it. And even that only on a good day.
But even those three percent were extremely useful to me.
Without them, I would not have been able to read PostgreSQL properly: to separate real logic from my own delusions,
follow the control flow, and at least roughly understand why everything here is arranged this way and not another.
So formally I wrote the tool in Go, but in practice this project also became my way of touching C a little more deeply
- and gaining even more respect for the people who have been writing such systems in it for years.
The Beginning: Compiling PostgreSQL, Debugging, and the First Signs of Recklessness
To understand the implementation details at all, I had to go into the PostgreSQL source code.
I had to learn how to:
- build PostgreSQL from source
- run it in debug mode
- attach a debugger
- watch how calls flow
- understand what happens inside the replication loop
- establish the relationship between components and functions
And here I got a surprise: all of this turned out to be less scary than I had imagined. PostgreSQL built, pg_receivewal
started, the debugger attached to the process, and this immediately gave me the dangerous confidence that "well,
now I will definitely figure this out quickly".
Of course, I did not figure it out.
The first thing I did was, like a true amateur, add the most aggressive tracing possible. I logged everything:
- function entries
- exits
- variable values
- branches
- important calls
- and sometimes, it seemed, the mere fact that the universe existed
At first, it seems very clever. Then you have gigantic logs, you no longer understand whether you are reading the system or whether it is slowly
breaking your mind, and the realization comes: many logs do not mean much understanding.
But at this stage, the overall picture started to emerge. I began to understand how entities are connected, where the WAL receiving
loop starts, how errors are survived, what happens to .partial, and at which moments decisions are made about completing a segment.
I discovered libraries, very well-written and years-polished file handling functions, and many more insanely cool things to squirrel away for later.
And at some point I could not resist: enough watching, time to write.
The First Prototype: "I Will Just Reproduce pg_receivewal"
I had a very naive idea: not to invent anything new, but simply to reproduce the behavior of
pg_receivewal as closely as possible.
In theory, it sounds wonderful.
In practice, it means that you voluntarily sign up for weeks of studying:
- exactly how the streaming loop starts
- how it reacts to connection drops with the database
- what a correct restart should look like, from which file and from which offset inside it
- when a .partial file can be considered complete
- how timeline changes are handled
- where you misunderstood something
- and where you no longer understand anything at all, but continue out of stubbornness
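The restart question - from which file and which offset - can at least be sketched. This is not pg_receivewal's actual code, just a simplified Go illustration of the idea: scan the archive directory, resume from the newest `.partial` file if one exists, otherwise start at the segment after the last completed one (real logic must also validate the timeline and the offset inside the partial file):

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strconv"
	"strings"
)

// findRestartSegment scans an archive directory and returns the WAL segment
// streaming should resume from: the last .partial file if one exists,
// otherwise the segment after the last completed one. Simplified sketch.
func findRestartSegment(dir string) (name string, partial bool, err error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", false, err
	}
	var names []string
	for _, e := range entries {
		base := strings.TrimSuffix(e.Name(), ".partial")
		if len(base) == 24 { // TTTTTTTTXXXXXXXXYYYYYYYY: timeline, log, segment
			names = append(names, e.Name())
		}
	}
	if len(names) == 0 {
		return "", false, nil // empty archive: start from the server's current position
	}
	sort.Strings(names) // fixed-width hex names sort correctly as strings
	last := names[len(names)-1]
	if strings.HasSuffix(last, ".partial") {
		return strings.TrimSuffix(last, ".partial"), true, nil
	}
	return nextSegment(last), false, nil
}

// nextSegment increments the segment part of a WAL file name, assuming the
// default 16 MiB segment size (256 segments per 4 GiB "log" file).
func nextSegment(name string) string {
	log, _ := strconv.ParseUint(name[8:16], 16, 64)
	seg, _ := strconv.ParseUint(name[16:24], 16, 64)
	seg++
	if seg == 0x100 {
		seg = 0
		log++
	}
	return fmt.Sprintf("%s%08X%08X", name[:8], log, seg)
}

func main() {
	fmt.Println(nextSegment("000000010000000000000012"))
}
```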
My first more-or-less stable prototype appeared after a couple of weeks. And those were very fun weeks. At times I
felt like a researcher and a super-cool mega-hacker, at other times - like a person who had crawled into an aircraft engine without a license, trying to repair it from someone else's notes.
But there is one thing I really want to point out: PostgreSQL code is surprisingly pleasant to read. Good comments, competent
decomposition, respect for the reader and colleagues. Even if you yourself understand about twenty percent, it is still clear that in front of you is very
strong engineering work.
When You Realize That Simply Receiving WAL Is Only the Beginning
When the prototype finally worked, the joy did not last long.
Because I already understood: receiving WAL is only half the job. And then the usual engineering carnival begins:
- compression
- encryption
- uploading to S3
- uploading to SFTP
- cleaning up old files
- monitoring
- external scripts
- cron
- more scripts
- and then scripts that fix the previous scripts
And I have never liked this universe of external glue. Because it almost always looks like it was written
at night under the threat of a production incident, and then everyone was afraid to touch it. And all of it smells bad and looks disgusting.
Scripts around WAL archiving are often fragile, non-obvious, poorly tested, and live on faith that "it somehow
works". And in critical things, I wanted exactly the opposite.
I wanted the main program itself to manage the archive:
- to know what can already be compressed
- to know what still cannot be deleted
- to understand when a file can be sent to remote storage
- and not to try to make such decisions through a layer of suspicious bash magic
So management components began to appear around the WAL receiver:
- one receives the log
- another archives and encrypts
- a third sends files to S3 or SFTP
- a fourth handles retention and automatic cleanup
- a fifth collects metrics and monitors process state
And at that point, the project stopped being "just a utility". It started turning into a small system where coordination,
order, and the absence of internal fights between components mattered.
About Base Backup: I Did Not Want To, but Curiosity Won
Initially, I had no intention of implementing base backup at all.
The reason is simple: the replication protocol is single-threaded. For small databases, that is fine. For large ones - not so rosy anymore.
If a backup takes ten hours every ten hours, that is, to put it mildly, not always convenient.
Multi-threaded approaches usually require the tool to live next to the database itself. And I wanted exactly the opposite: to remotely
collect WAL and make backups from databases located anywhere - in the cloud, on virtual machines, in Kubernetes - and at the same time not
require sidecar containers or any special infrastructure changes from them.
But then the thing that happens to many technical projects happened:
I did not plan this functionality, and then it simply became interesting.
In the end, I did implement streaming base backup. It does not claim to be a universal solution for huge
installations, but for databases around 200 GiB it turned out to be quite practical. A couple of hours for a nightly job is already a reasonable
scenario.
So it turned out not to be a "superweapon", but an honest working tool in a clear niche.
Why I Did Not Go Deeper Into Incremental Backups
Of course, I also looked at incremental / differential backups.
But there you quickly understand an unpleasant thing: taking an incremental backup is not victory yet. You then have to
assemble it back correctly. And that means a completely different level of complexity begins:
- either write your own analogue of pg_combinebackup
- or very carefully depend on an external tool
- or drown in the number of edge cases and incompatibilities
At that point I honestly looked at the task and decided that I already had enough problems without it.
pgBackRest does such things in a truly well-thought-out way. But reproducing that level is not "built over a couple of
weekends on enthusiasm". It is large, heavy engineering work for years. So I consciously stopped at a simpler
model: reliable base backup for small and medium production environments.
Without claims to world domination. Just a working, predictable thing.
Architecture: The Moment When You Are No Longer Writing a Utility but Coordinating Chaos
As soon as you have several background processes, it immediately becomes clear that the main difficulty is no longer WAL as
such, but making sure this whole household does not fight with itself.
You need to be able to:
- not start a backup if another one has not finished yet
- not start archiving if it is already running
- not delete something that may still be needed
- handle errors correctly
- carefully stop background processes
- keep the system in a predictable state
Here I had to seriously think about patterns:
- job queue
- worker pool
- supervisor
- pipes
- task lifecycle management
- safe shutdown
- goroutine coordination
At some point I realized that I was no longer "writing a WAL receiver". I was assembling a gearbox. And if even one gear
shifts a little, all of this will either start screaming or silently break. And silently breaking software is the worst kind of software.
At the same time, the main task was to make sure the main WAL receiving process was not affected by "noisy neighbors".
Streaming Large Files: Another Source of Creativity
There is another pleasant task as well: transferring large backup files to remote storage.
When a database weighs, for example, 300 GiB, you quickly understand:
- you do not want to save everything locally, and often it is not convenient
- you cannot pull it all into memory
- you also do not want to write a crooked intermediate scheme, because you will have to maintain it yourself later
So you need a proper streaming pipeline: read the data, transform it on the way, and immediately send it further - without
intermediate garbage, without extra storage, without special effects.
Here Go was useful again. It has good primitives for streaming processing. Although the presence of primitives, of course, does not
stop you from making design mistakes for a very long time.
fsync: The Most Subtle Part and My Own Little Nervous Breakdown
If I had to choose what drained the most blood from me, the winner is obvious: fsync.
This is the place where you first think: "well, this part is simple". And then you discover that you have been staring at
the receivelog.c source code for several hours with the expression of a person who has voluntarily entered a very strange stage of life.
The problem here is that it is easy to be wrong in both directions:
- call fsync too often - everything slows down
- call it too rarely - later you may look at the result very sadly
So it is either slow or shameful. Quite a rich choice, to put it mildly.
I had to literally compare the behavior of my implementation with pg_receivewal step by step:
- where exactly synchronization happens
- at what moment
- why exactly there
- which scenarios must force fsync
- and how to do neither too much nor too little
In the end, the key points turned out to be:
- fsync after finishing writing a segment
- fsync when renaming .partial to the final WAL file
- fsync on keepalive if the server requests a reply
- fsync on errors in the receiving loop
Then the truly fun part began: integration checks. I ran two receivers simultaneously (pg_receivewal, pgrwl), generated
WAL, compared timings, then compared the resulting files byte by byte, measured timing differences in milliseconds, and tried to remove
everything unnecessary.
I even got to logging: in places like this, you begin to understand that it can be either a helper or a quiet
saboteur. For example, you do not need to parse attributes if the logging level does not require it; extra CPU cycles
can be spent on more useful things.
In the end, I managed to achieve very similar behavior and complete matching of the resulting WAL files over the same interval. And
the small timing difference remained only where it is normal: two daemons cannot be started in the exact same
physical microsecond, no matter how hard you try.
In the fight against slowness, I even quickly wrote a small utility that injects
a defer into EVERY function to measure that function's runtime. Not the most rigorous check,
but, as practice showed, it helps quickly identify especially hot functions, and then point
the profiler, debugger, and so on at them. My tracing looks something like this:
FUNCTION CALLS TOTAL_NS TOTAL_SEC
-------- ----- -------- ---------
storecrypt.Put 70 23061361400 23.06
receivesuperv.uploadOneFile 35 11606918000 11.61
fsync.Fsync 106 8813968000 8.81
xlog.processOneMsg 4481 6818721600 6.82
xlog.processXLogDataMsg 4481 6814495400 6.81
xlog.CloseWalFile 35 6561511500 6.56
xlog.closeAndRename 35 6559979000 6.56
fsync.FsyncFname 70 6525596900 6.53
.....500 more lines
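A minimal version of such a defer-based tracer might look like this (hand-written here for illustration; in my case the defers were injected by the utility, and the names are hypothetical):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// trace records how long the enclosing function ran. Injected at the top of
// every function as: defer trace("pkg.Func")()
var (
	mu    sync.Mutex
	calls = map[string]int{}
	total = map[string]time.Duration{}
)

func trace(name string) func() {
	start := time.Now()
	return func() {
		mu.Lock()
		defer mu.Unlock()
		calls[name]++
		total[name] += time.Since(start)
	}
}

func work() {
	defer trace("main.work")()
	time.Sleep(5 * time.Millisecond) // stand-in for real work
}

func main() {
	for i := 0; i < 3; i++ {
		work()
	}
	// Same shape as the report above: FUNCTION, CALLS, TOTAL_NS.
	fmt.Printf("%-12s %5d %12d\n", "main.work", calls["main.work"], total["main.work"].Nanoseconds())
}
```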
Metrics: Because I Wanted to See Whether It Was Still Alive or Already Dead
Over time, I also added metrics:
- number of files
- archive size
- number of errors
- transferred bytes
- state of background tasks
- deleted files
- general runtime statistics
I even made a Grafana dashboard. Not the most beautiful one in the world, but useful enough to quickly answer: is everything still
alive, or is it already time to get nervous?
It was important to me to make metrics free if they are disabled. So wherever possible, I used the
noop approach: if observability is not needed, the system should not pay for it.
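The noop approach is simple in Go: call sites talk to a small interface, and when observability is off they get an implementation whose methods do nothing. A minimal sketch with hypothetical names:

```go
package main

import "fmt"

// Metrics is the narrow interface the rest of the system talks to.
type Metrics interface {
	IncFiles()
	AddBytes(n int64)
}

// noopMetrics satisfies the interface with empty methods; when observability
// is disabled, the call sites stay but cost almost nothing.
type noopMetrics struct{}

func (noopMetrics) IncFiles()       {}
func (noopMetrics) AddBytes(int64)  {}

// realMetrics is a trivial in-memory stand-in for an actual exporter.
type realMetrics struct {
	files int
	bytes int64
}

func (m *realMetrics) IncFiles()        { m.files++ }
func (m *realMetrics) AddBytes(n int64) { m.bytes += n }

func newMetrics(enabled bool) Metrics {
	if !enabled {
		return noopMetrics{}
	}
	return &realMetrics{}
}

func main() {
	m := newMetrics(false) // disabled: every call below is a no-op
	m.IncFiles()
	m.AddBytes(16 * 1024 * 1024)
	fmt.Printf("%T\n", m)
}
```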
Logging: Where I Also Realized I Still Have a Long Way to Go
Logging had its own coming-of-age story.
At first, I logged everything. Because, as everyone knows, any person who has deeply entered a complex system for the first time
starts with the phrase: "I will just add more logs and understand everything".
No.
Many logs are not understanding. They are just many logs.
Good logging is when, at the moment of a problem, logs really help you understand what is going on, and do not turn into
an additional source of noise and despair.
I have not yet managed to make this part as good as I would like. The current result is normal, but
not exemplary. And in this sense, pgBackRest still remains for me an example of a very smart and thoughtful approach: you can see
how much discipline and engineering care went specifically into diagnostics.
Integration Tests: The Hardest and Most Important Part
One of the most difficult and at the same time most necessary parts of the whole project is integration testing.
Because a daemon that depends on another daemon is already not the easiest object to test. And if you
also want to:
- start PostgreSQL
- generate WAL
- stop processes
- make a backup
- restore the database
- compare the state before and after
- run failure scenarios
- check compatibility and correctness
then life starts playing in especially bright colors.
I settled on this approach: simple shell scripts that start the test environment in a container,
populate the database, perform actions, then restore everything and check the result.
I also really did not want to drag a ton of dependencies like testcontainers into the project.
In the end, it turned out like this:
- shell scripts
- docker compose
- matrix in GitHub Actions
- isolated scenarios
- without unnecessary heavy magic where understandable mechanics are enough
That is how I got tests for:
- comparison with pg_receivewal
- backup/restore
- uploading to S3 and SFTP
- correctness of WAL files
- stopping and restarting
- different failure scenarios
And honestly, integration tests are what give me the main confidence in releases. Not one hundred percent, of course. One hundred
percent in such things is promised either by madmen or by marketers. But good, engineering-honest confidence - yes.
Unit tests, of course, also exist. But for me, integration checks are the main criterion
that all of this is not only nicely written (well, not everywhere), but actually works.
What Came Out of It
Over time, from the fairly harmless desire to "just see how pg_receivewal works", a tool grew that now has:
- streaming WAL receiver
- archiving
- compression
- encryption (streaming AES-256-GCM)
- uploading to S3 (streaming, +multipart)
- uploading to SFTP
- retention and automatic cleanup
- metrics
- logging (mostly zero-cost)
- base backup
- configuration through a file and environment variables
- controlled shutdown
- unit and integration tests
- behavior comparison with pg_receivewal
- documentation with diagrams and examples
- as many usage examples as possible (standalone/docker-compose/k8s)
- helm-chart (quite simple and working)
- website (in progress, but at least now it is clear how this is done and that it is possible)
- a set of patterns and libraries for further reuse in Go projects
So, as usually happens, the project long ago stopped being what it seemed to be at the beginning.
What Is Planned
- improve metrics, remove what is unnecessary, add what is needed, build a truly useful and beautiful dashboard
- improve logging quality, make it consistent, think through levels more carefully, preserve zero-cost semantics
- add new capabilities for base backup - around fine-tuning retention periods
- work through the huge amount of room left for refactoring and documentation
- add even more integration tests - I am planning a V2 version
- add every "breaking" scenario to the tests that my imagination can produce
- make the website properly, right now it is just a copy of the documentation
- create a user guide (because it is simply interesting)
- and much more
What I Took Away From This
Perhaps the main result is not that I wrote yet another tool.
The main result is something else:
- I understood PostgreSQL much more deeply
- I gained even more respect for C, although I know about a miserable three percent of it
- I saw how difficult it is to reproduce even a small part of the behavior of a well-made system utility
- and once again I became convinced that high-quality code written by others is the best way to quickly cure yourself of excessive self-confidence
Because one thing is to look at architecture from the outside and admire it.
And it is a completely different thing to try to reproduce at least part of that logic yourself and not fall apart along the way.
And yes. If it ever seems to you that the thought
"maybe I should also write some utility for PostgreSQL?"
sounds like a good idea for a couple of quiet weekends -
I have two pieces of news for you.
The first: the idea really is interesting.
The second: you most likely will not have quiet weekends anymore.
Links
- pg_receivewal Documentation
- pg_receivewal Source Code
- Streaming Replication Protocol
- Continuous Archiving and Point-in-Time Recovery
- Setting Up WAL Archiving
- pgBackRest
- Barman
Repository: https://github.com/pgrwl/pgrwl
Thanks for reading!