DEV Community: alexey.zh

PostgreSQL to Go REST API, Generated

alexey.zh — Sat, 09 May 2026 18:03:40 +0000

Writing business logic can be interesting.

Writing yet another REST wrapper around a table is not.

Never.

Ever.

This is not software development; it is digital sock knitting. Except the socks then need to be covered with tests, documented, wrapped into DTOs, passed through a service, a repository, a handler, a mapper, a payload, and, preferably, completed without crying in the process.

Imagine a very ordinary situation.

A DBA adds a new table to the database. From the SQL side, everything is beautiful: a clean schema, proper indexes, relationships, constraints, the whole package. The DBA is happy. The database is happy. Somewhere far away, a tiny bird is singing.

Meanwhile, the backend developer realizes two things:

They now need to create a controller, service, repository, interfaces, CRUD implementation, entity, DTO, payload, mappers, documentation, and probably something else, because “that is how we do things here.”
They urgently need to reconsider their career path. Maybe become a painter. Or a baker. Or a person who herds goats and has no idea what CreateUserRequest is.

And that is only one table.

When was the last time you saw a database with just one table?

Usually, things are much more entertaining. Especially when the database was originally built as a large standalone project, and the backend and frontend were postponed until “later.” This is a beautiful architectural approach: first build the city, then think about roads, electricity, and maybe a fountain.

You get the password to the DEV environment.

You connect.

And there you see several hundred tables, a couple dozen schemas, historical artifacts, a users2 table, a users_new table, a users_final table, a users_final_really table, and also a reference dictionary that nobody has touched since 2016 because “it works.”

And that is still considered a small database.

At this point, you once again want to change jobs, move to the forest, raise livestock, grow tomatoes, and explain to your children that you used to write REST APIs, but now you have a normal life.

Routine Code Is Not Heroism

I love writing code. I am a programmer.

But I do not love monkey work disguised as “just a normal task for a couple of days.”

Writing another handler is not hard. Writing another service is not hard. Writing another repository is not hard either.

That is exactly the problem.

You have already done it a thousand times. You know where GetByID will be. You know where Create will be. You know that somewhere nearby there will be Update, then Delete, then “let’s add filtering,” then “let’s add pagination,” then “why is this field not named the same way as on the frontend?”

And you sit there thinking:
am I really a software engineer, or just a very expensive template copier?

Business logic is interesting.
Architectural decisions are interesting.
Understanding the domain is interesting.
Writing the twenty-seventh CRUD layer for the dict_operation_status table is not a spiritual journey. It is punishment.

So the Generator Appeared

With these thoughts in mind, while starting another project, I decided to spend a couple of weeks not writing CRUD by hand for the first 100 tables, but building a code generator instead.

At first, it was for Java.

This is not about generating business logic. There is no magic here. The generator does not know how your business should work, why an order cannot be canceled after payment, or why production should not be touched after 6 PM on Friday, even though everyone still touches it anyway.

This is only about primitive CRUD.

That same layer that is almost always needed, but writing it by hand every time feels like manually moving bytes from one folder to another while commenting on the process in Jira.

And, as practice showed, the idea was not wasted.

Instead of heroically suffering for several weeks, you can generate the foundation, open the project in your IDE, and do normal work. Or at least a more meaningful form of suffering.

Then Go Came Along

Later, I started working on Go projects, and I decided to adapt the same logic there.

In some ways, Java is simpler. Project structure usually lives by the principle: “everything must be here, named exactly like this, built with Maven, and please do not ask unnecessary questions.”

Go is closer to me. It is simpler, freer, and gives more room for creativity.

Of course, along with creativity comes the other side of freedom: ten logging libraries, ten HTTP libraries, ten opinions about project structure, and twenty people in the comments explaining why your version is wrong.

After reading forums, GitHub, other people’s projects, and surviving a mild existential crisis, the structure gradually started to take shape.

Generation could begin.

“Just Generate an Entity” Sounds Easy

Collecting the list of tables from a database and turning them into Go structs is not that hard.

Well, almost. And those structs are not really useful on their own anyway; you need the full CRUD layer with methods and everything else.

And then the real database begins.

And in a real database, you have:

multiple schemas
identical table names in different schemas
tables without primary keys, because “it is fine”
composite primary keys
data types that stare at you like ancient gods
legacy decisions that nobody understands but everyone is afraid to delete
column names that make you want to call both a linguist and a therapist

And that is where CRUD stops being so simple.

But most problems are solvable. Over time, roughly 90% of typical cases can be covered. The remaining 10% can be finished by hand, and that is fine.

The important part is that the main routine is already done.

That means you no longer need to manually create hundreds of nearly identical files while pretending this is “backend layer development.”

Why This Is Actually Usable

I ran the generator against several production databases that were large enough, complex enough, and honest enough.

And I came to the conclusion: yes, this can be used.

The generated code is easy to refactor in an IDE. With modern tooling this is not a big problem: rename packages, move files around, adjust the structure, make it fit the project style.

The important thing is that the initial routine work is already out of the way.

That same layer that made you want to run away to the forest and grow tomatoes in the morning is already sitting in the project.

Not perfect. Not the final architecture of your dreams. But good enough to start working instead of slowly turning into a boilerplate-code generator running on biological fuel.

A Small Example

For each table, we generate a package under internal/api/<schema>/<table>/:

internal/api/
  repository.go          # top-level repository interfaces (all tables)
  service.go             # top-level service interfaces (all tables)
  handler.go             # router: mounts all routes

  public/
    products/
      entity.go          # DB struct mapped from table schema
      repository.go      # pgx queries: Save, Update, Delete, Find, FindAll, paginated
      service.go         # business logic layer, delegates to repository
      dto.go             # CreateDto, UpdateDto, Dto (internal transfer types)
      payload.go         # HTTP request/response types with JSON tags
      handler.go         # net/http handlers with Swagger annotations

Let’s look at generation for a table like this:

create table products (
    record_id   serial primary key,
    category_id int          not null references categories (record_id),
    name        varchar(250) not null,
    description text
);

comment on table products is 'Stores products with a reference to their category.';
comment on column products.name is 'Name of the product.';

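We get (note that the generated code also references created_at, updated_at, and guid columns, which the DDL above does not show):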

entity.go:

type Products struct {
    RecordID    int       `json:"record_id"   db:"record_id"`
    CategoryID  int       `json:"category_id" db:"category_id"`
    Name        string    `json:"name"        db:"name"`
    Description *string   `json:"description" db:"description"`
    CreatedAt   time.Time `json:"created_at"  db:"created_at"`
    UpdatedAt   time.Time `json:"updated_at"  db:"updated_at"`
    GUID        string    `json:"guid"        db:"guid"`
}
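To make the entity step less abstract, here is a minimal sketch of how column metadata could be turned into struct fields like the ones above. This is not gofromdb's actual code: the Column type, goType, fieldName, and structField are hypothetical names, the type table covers only a handful of PostgreSQL types, and column names are assumed to be ASCII snake_case.

```go
package main

import (
	"fmt"
	"strings"
)

// Column is a hypothetical, simplified view of one table column.
type Column struct {
	Name     string // e.g. "category_id"
	DataType string // PostgreSQL type name, e.g. "integer"
	Nullable bool
}

// goType maps a few common PostgreSQL types to Go types.
// Nullable columns become pointers so NULL stays distinguishable from "".
func goType(c Column) string {
	base := "string" // fallback for text-like and unknown types
	switch c.DataType {
	case "integer":
		base = "int"
	case "bigint":
		base = "int64"
	case "boolean":
		base = "bool"
	case "timestamp with time zone", "timestamp without time zone":
		base = "time.Time"
	}
	if c.Nullable {
		return "*" + base
	}
	return base
}

// fieldName turns an ASCII snake_case column name into an exported Go
// identifier, upper-casing the initialisms "id" and "guid".
func fieldName(column string) string {
	var b strings.Builder
	for _, part := range strings.Split(column, "_") {
		switch part {
		case "":
			// skip empty parts produced by leading or doubled underscores
		case "id", "guid":
			b.WriteString(strings.ToUpper(part))
		default:
			b.WriteString(strings.ToUpper(part[:1]) + part[1:])
		}
	}
	return b.String()
}

// structField renders one struct field line with json and db tags.
func structField(c Column) string {
	return fmt.Sprintf("%s %s `json:%q db:%q`", fieldName(c.Name), goType(c), c.Name, c.Name)
}

func main() {
	columns := []Column{
		{Name: "record_id", DataType: "integer"},
		{Name: "category_id", DataType: "integer"},
		{Name: "name", DataType: "character varying"},
		{Name: "description", DataType: "text", Nullable: true},
	}
	fmt.Println("type Products struct {")
	for _, c := range columns {
		fmt.Println("\t" + structField(c))
	}
	fmt.Println("}")
}
```

The interesting decision is the pointer for nullable columns: it is the simplest way to keep NULL and the empty string apart without pulling in driver-specific null types.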

repository.go:

func (r *repo) Save(ctx context.Context, inputEntity *Products) (*Products, error) {
    query := `
        insert into public.products (category_id, name, description)
        values ($1, $2, $3)
        returning record_id, category_id, name, description, created_at, updated_at, guid
    `
    row := r.db.Pool.QueryRow(ctx, query,
        inputEntity.CategoryID,
        inputEntity.Name,
        inputEntity.Description,
    )
    return scanFullRow(row)
}

handler.go:

// @Summary Create new item
// @Tags products
// @Accept json
// @Produce json
// @Param request body productsCreateRequest true "Create input"
// @Success 201 {object} productsResponse
// @Router /api/v1/products [post]
func (h *Handler) Save(w http.ResponseWriter, r *http.Request) {
    req := &productsCreateRequest{}
    if err := httputils.ReadJSON(r, req); err != nil {
        httputils.WriteJSON(w, http.StatusBadRequest, httputils.ErrorResponse{Message: err.Error()})
        return
    }
    // validate -> map to DTO -> call service -> map to response
    resp, err := h.svc.Save(r.Context(), mapCreateRequestToCreateInputDto(req))
    ...
    httputils.WriteJSON(w, http.StatusCreated, dtoToPayload)
}

And a basic set of routes:

POST   /api/v1/products
PUT    /api/v1/products/{record_id}
DELETE /api/v1/products/{record_id}
GET    /api/v1/products/{record_id}
GET    /api/v1/products
GET    /api/v1/products/pageable

Everything is fairly simple, predictable, and clear: what goes where and why.

Why Put This in Open Source

Honestly, because it seems like this might be useful not only to me.

Many projects share the same pain: the database already exists, there are many tables, the API was needed yesterday, and for some reason the team wants to work on things that actually bring value instead of manually writing repository.go number 148.

gofromdb lets you quickly generate Go code from an existing database and get a foundation for further development.

Not instead of the developer.

Instead of the part of the developer that already runs on autopilot while sadly staring at the monitor.

You run the generator and get the templates.
Then you can write business logic, normalize the architecture, add rules, tests, validation, authorization, and everything that actually depends on the project.

Instead of sitting there and manually proving to the computer that the orders table really does need a GetOrderByID method.

What Comes Next

I see several possible directions for the project:

add a smarter type-handling system
add generation of tests and mocks
rethink the overall project structure and naming rules once again
improve the templates
remove what is unnecessary
add what is necessary
start growing tomatoes

I am especially interested in feedback on the project structure. In Go, this is always a lively topic, because every developer knows the one true project structure, and each of them has a different one.

Final Words

The project is here:

https://github.com/hashmap-kz/gofromdb

It runs with a single command. It should work correctly. At least, that is how every optimistic README usually begins.

If it looks interesting, try it.

If it does not work, open an issue, and I will try to help.

If it does work, write as well. Sometimes an open-source author needs to know that their project was not only downloaded by a CI bot, but also launched by at least one living person with a pulse who did not run away to the forest.

Happy coding!

Stop Shipping Breaking Go APIs by Accident

alexey.zh — Wed, 06 May 2026 16:51:35 +0000

Every Go release has one question that matters more than the diff itself:

Did we break something users compile against?

A pull request can look harmless. A few files changed, a type moved, a method now returns an error, a struct field disappeared. The commit history may look clean. The changelog may sound reasonable.

But users do not depend on your commit messages.

They depend on your public Go API.

That is the purpose of relimpact: a small, fast tool that compares two Git refs and reports what changed in the exported Go API.

Not every file.
Not every commit.
Not every line.

Only the public API surface that users can import, call, implement, or compile against.

Why another report?

A raw git diff is great when you want to inspect implementation details.

A changelog is great when you want to explain a release to humans.

But neither of them is the best tool for answering:

Which public Go symbols changed between these two refs?

That question needs a different view.

For example, this is what matters before a release:

- func Load(path string) *Config
+ func Load(path string) (*Config, error)

That is not just “one line changed”.

That is a breaking API change.

And this is useful, but not breaking:

+ func FromEnv(prefix string) (*Config, error)

That is a new public API.

relimpact separates those two ideas clearly:

Breaking changes: changed or removed public API.
New API: compatible additions.

The result is a report that is easier to review in a pull request and easier to attach to a release.

It is not a diff tool

relimpact is intentionally narrow.

It does not try to replace git diff.
It does not generate a raw changelog.
It does not summarize commits.
It does not care how many commits happened between two refs.

Instead, it snapshots the exported Go API at one ref, snapshots it again at another ref, and compares the API surface.

That means the report is based on API changes between refs, not on commit messages.

This distinction matters.

A messy commit history can still produce a clean API report.

A small commit can still produce a breaking public API change.

That is the whole point.
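The snapshot-and-compare idea can be illustrated with the standard go/parser and go/ast packages. The following is a deliberately simplified sketch, not relimpact's implementation: it looks only at exported top-level functions in a single source string, while a real tool also has to handle types, methods, fields, and whole packages at two Git refs. The snapshot and compare functions are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"sort"
)

const oldSrc = `package config

type Config struct{}

func Load(path string) *Config { return nil }
`

const newSrc = `package config

type Config struct{}

func Load(path string) (*Config, error) { return nil, nil }

func FromEnv(prefix string) (*Config, error) { return nil, nil }
`

// snapshot maps each exported top-level function to its printed signature.
func snapshot(src string) (map[string]string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "api.go", src, 0)
	if err != nil {
		return nil, err
	}
	api := make(map[string]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Recv != nil || !fn.Name.IsExported() {
			continue // skip non-functions, methods, and unexported names
		}
		api[fn.Name.Name] = types.ExprString(fn.Type)
	}
	return api, nil
}

// compare splits the differences into breaking changes and additions.
func compare(oldAPI, newAPI map[string]string) (breaking, added []string) {
	for name, oldSig := range oldAPI {
		newSig, ok := newAPI[name]
		switch {
		case !ok:
			breaking = append(breaking, "removed: "+name)
		case newSig != oldSig:
			breaking = append(breaking, "changed: "+name)
		}
	}
	for name := range newAPI {
		if _, ok := oldAPI[name]; !ok {
			added = append(added, name)
		}
	}
	sort.Strings(breaking)
	sort.Strings(added)
	return breaking, added
}

func main() {
	oldAPI, err := snapshot(oldSrc)
	if err != nil {
		panic(err)
	}
	newAPI, err := snapshot(newSrc)
	if err != nil {
		panic(err)
	}
	breaking, added := compare(oldAPI, newAPI)
	fmt.Println("Breaking changes:", breaking)
	fmt.Println("New API:", added)
}
```

Nothing in this comparison looks at commits or commit messages: the only inputs are the two snapshots, which is why a messy history and a clean history produce the same report.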

What the report looks like

A Markdown report is designed for pull request comments.

It starts with a small compatibility summary, then puts breaking changes first.

There is also an HTML report for CI artifacts and release review, but Markdown is usually the best format for pull requests.

Add it to a GitHub pull request

Here is a simple GitHub Actions workflow that runs relimpact, generates a Markdown report, and posts it as a sticky pull request comment.

name: API compatibility

on:
  pull_request:
    branches: [ master ]

permissions:
  contents: read
  pull-requests: write

jobs:
  relimpact:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: actions/setup-go@v5
        with:
          go-version: "1.25"

      - name: Install relimpact
        run: |
          go install github.com/hashmap-kz/relimpact@latest

      - name: Generate Markdown API report
        run: |
          relimpact \
            --old="${{ github.event.pull_request.base.sha }}" \
            --new="${{ github.event.pull_request.head.sha }}" \
            --format=markdown \
            --output api-report.md

      - name: Comment API report on PR
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          header: relimpact-api-report
          recreate: true
          path: api-report.md

That is all.

Every pull request gets a public API compatibility report.

If nothing public has changed, the report stays quiet.

If a method signature changed, a field disappeared, or a new exported type appeared, reviewers see it without digging through implementation diffs.

HTML as a CI artifact

For release review, you may also want a browser-friendly report:

      - name: Generate HTML API report
        run: |
          relimpact \
            --old="${{ github.event.pull_request.base.sha }}" \
            --new="${{ github.event.pull_request.head.sha }}" \
            --format=html \
            --output api-report.html

      - name: Upload HTML API report
        uses: actions/upload-artifact@v4
        with:
          name: api-report
          path: api-report.html

The HTML report keeps the same structure as Markdown:

verdict
summary
breaking changes
new API

The difference is presentation: package navigation, cleaner grouping, and a better browser view.

Why this matters

Go makes public API changes feel deceptively simple.

Changing a return value is easy.

Removing a field is easy.

Renaming a method is easy.

But for users, those changes may mean failed builds, broken imports, or migration work.

relimpact makes that visible before the release.

It helps reviewers focus on the question that actually matters:

Are we changing the contract?

Easy to try

relimpact is a single binary.

No server.
No database.
No external service.
No AI.

It works from your Git repository and compares two refs:

relimpact --old=v1.0.0 --new=HEAD

Generate HTML:

relimpact --old=v1.0.0 --new=HEAD --format=html --output api-report.html

Install with Go:

go install github.com/hashmap-kz/relimpact@latest

The project is here:

https://github.com/hashmap-kz/relimpact

If the idea feels useful, starring the repository is always motivating. It helps show that small, focused Go tools still matter.

Finding Structurally Duplicate Go Functions with AST Hashing

alexey.zh — Sun, 03 May 2026 15:50:48 +0000

You know that feeling when you're reviewing a PR and you see a function that looks suspiciously familiar? Same structure, different variable names, slightly different literals. Someone copy-pasted and tweaked it, and now there are two places to update every time something changes.

godedup catches this automatically. It finds structurally duplicate functions in Go codebases - even when the copies have been superficially modified. This post is about the algorithms that make it work, and a few implementation details I found interesting to think through.

The Problem With Text-Based Approaches

The naive approach is text diffing. Take two functions, run them through a standard diff algorithm, see if they're similar enough. This immediately falls apart:

// Function A
func (r *UserRepo) findByID(ctx context.Context, id int64) (*User, error) {
    row := r.db.QueryRowContext(ctx, "SELECT * FROM users WHERE id = $1", id)
    var u User
    if err := row.Scan(&u.ID, &u.Name, &u.Email); err != nil {
        return nil, fmt.Errorf("findByID: %w", err)
    }
    return &u, nil
}

// Function B
func (r *OrderRepo) findByID(ctx context.Context, id int64) (*Order, error) {
    row := r.db.QueryRowContext(ctx, "SELECT * FROM orders WHERE id = $1", id)
    var o Order
    if err := row.Scan(&o.ID, &o.Status, &o.Amount); err != nil {
        return nil, fmt.Errorf("findByID: %w", err)
    }
    return &o, nil
}

Text diff says these are 60% different. A human says these are the same function. The structure - a query, a scan, an error return, a result return - is identical. Only the types and column names differ.

What you actually want is to compare shape, not text. That means going through the AST.

The Core Insight: Normalize, Then Hash

The key idea is that two functions are structurally equivalent if their AST subtrees are isomorphic - same shape, same node types, same operators, same control flow - after you normalize away the parts that don't matter.

What doesn't matter for structural comparison:

Variable and parameter names (userID vs orderID)
String literals ("users" vs "orders")
Numeric literals (1 vs 42)
Package qualifiers (fmt.Println vs log.Println)

What does matter:

Node types (IfStmt, ForStmt, ReturnStmt)
Operators (+ vs - are different)
Control flow structure (a loop containing an if is different from an if containing a loop)
nil, true, false - these have semantic meaning

The implementation is a recursive hash function over the AST. For each node type, it combines a stable ID for the node type with the hashes of its children:

func (h *Hasher) hashNode(node ast.Node) uint64 {
    switch n := node.(type) {
    case *ast.IfStmt:
        return combine(nodeID("IfStmt"),
            h.hashNode(n.Init),
            h.hashNode(n.Cond),
            h.hashNode(n.Body),
            h.hashNode(n.Else),
        )

    case *ast.Ident:
        // Normalize: all identifiers hash identically
        // EXCEPT nil/true/false which have semantic meaning
        switch n.Name {
        case "nil":   return nodeID("nil")
        case "true":  return nodeID("true")
        case "false": return nodeID("false")
        }
        return nodeID("Ident") // all other identifiers are equivalent

    case *ast.BasicLit:
        // Normalize: literals hash only by kind, not value
        // "users" and "orders" produce the same hash
        return combine(nodeID("BasicLit"), uint64(n.Kind))

    case *ast.SelectorExpr:
        // Normalize package qualifier: only the selected name matters
        // fmt.Println and log.Println hash identically
        return combine(nodeID("SelectorExpr"), nodeID(n.Sel.Name))
    // ...
    }
}

The nodeID function maps a string name to a stable uint64 using the first 8 bytes of its SHA-256 hash. The combine function mixes multiple values with an FNV-1a-style xor-and-multiply step - fast, mixing well enough for grouping by hash, and order-dependent (so [A, B] and [B, A] produce different hashes):

func combine(vals ...uint64) uint64 {
    var h uint64 = 0xcbf29ce484222325 // FNV offset basis
    for _, v := range vals {
        h ^= v
        h *= 0x100000001b3 // FNV prime
    }
    return h
}

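After hashing, both findByID functions above are meant to produce the same uint64. One caveat: the SelectorExpr case as shown keeps the selected name, so field accesses such as u.Name and o.Status would still hash differently unless field selectors are normalized as well. Detecting exact clones is then just grouping by hash - O(n) with a map.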
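To see the normalization rules in action, here is a compact, runnable sketch. It is not godedup's code: instead of a recursive hashNode it folds a pre-order ast.Inspect walk into one hash (with an explicit "end" marker closing each subtree), it handles only a few node kinds specially, and it assumes nodeID reads the first 8 SHA-256 bytes in big-endian order. The functions a, b, and c are made up for the demonstration.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const src = `package p

func a(userID int) int {
	if userID > 1 { fmt.Println("users") }
	return userID + 1
}

func b(orderID int) int {
	if orderID > 42 { log.Println("orders") }
	return orderID + 7
}

func c(orderID int) int {
	if orderID > 42 { log.Println("orders") }
	return orderID - 7
}
`

func nodeID(name string) uint64 {
	sum := sha256.Sum256([]byte(name))
	return binary.BigEndian.Uint64(sum[:8]) // byte order is an assumption
}

func combine(vals ...uint64) uint64 {
	var h uint64 = 0xcbf29ce484222325
	for _, v := range vals {
		h ^= v
		h *= 0x100000001b3
	}
	return h
}

// hashStmt folds a pre-order walk of one statement into a single hash.
func hashStmt(stmt ast.Stmt) uint64 {
	h := nodeID("Stmt")
	ast.Inspect(stmt, func(n ast.Node) bool {
		switch x := n.(type) {
		case nil:
			h = combine(h, nodeID("end")) // closes the current subtree
			return false
		case *ast.Ident:
			switch x.Name {
			case "nil", "true", "false":
				h = combine(h, nodeID(x.Name))
			default:
				h = combine(h, nodeID("Ident"))
			}
		case *ast.BasicLit:
			h = combine(h, nodeID("BasicLit"), uint64(x.Kind))
		case *ast.SelectorExpr:
			h = combine(h, nodeID("SelectorExpr"), nodeID(x.Sel.Name))
			return false // do not descend: the qualifier is ignored
		case *ast.BinaryExpr:
			h = combine(h, nodeID("BinaryExpr"), uint64(x.Op))
		default:
			h = combine(h, nodeID(fmt.Sprintf("%T", n)))
		}
		return true
	})
	return h
}

// topHashes returns the body hash of every function in src, keyed by name.
func topHashes(src string) (map[string]uint64, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "p.go", src, 0)
	if err != nil {
		return nil, err
	}
	out := make(map[string]uint64)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		seq := make([]uint64, 0, len(fn.Body.List))
		for _, stmt := range fn.Body.List {
			seq = append(seq, hashStmt(stmt))
		}
		out[fn.Name.Name] = combine(seq...)
	}
	return out, nil
}

func main() {
	hashes, err := topHashes(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("a == b:", hashes["a"] == hashes["b"])
	fmt.Println("b == c:", hashes["b"] == hashes["c"])
}
```

Functions a and b differ only in identifier names, literals, and the package qualifier, so they hash identically. Function c swaps + for -, which changes the operator token and therefore the hash.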

Two Representations Per Function

Here's a design decision that enables both exact and near clone detection from a single parse pass.

Each function gets two hash representations stored in FuncInfo:

type FuncInfo struct {
    TopHash  uint64   // hash of the entire function body
    StmtSeq  []uint64 // per-statement hashes
    // ...
}

TopHash is the hash of the complete function - used for exact clone detection. If two functions have the same TopHash, they're structurally identical.

StmtSeq is a slice where each element is the hash of one top-level statement. This is what enables near-clone detection.

Computing both is trivial:

stmtSeq := make([]uint64, 0, len(fn.Body.List))
for _, stmt := range fn.Body.List {
    stmtSeq = append(stmtSeq, h.hashNode(stmt))
}
topHash := hashUint64Slice(stmtSeq)

TopHash is derived from StmtSeq - it's the hash of the sequence of statement hashes. So you get both representations for the cost of one AST traversal.

Near-Clone Detection: Edit Distance on Hash Sequences

Two functions are near-clones if one has a few extra or different statements compared to the other. The canonical algorithm for "how many insertions, deletions, or substitutions does it take to transform sequence A into sequence B" is Levenshtein edit distance.

The twist: instead of computing edit distance on characters or lines, we compute it on the StmtSeq - the sequence of statement hashes.

func editDistance(a, b []uint64) int {
    la, lb := len(a), len(b)
    dp := make([][]int, la+1)
    for i := range dp {
        dp[i] = make([]int, lb+1)
    }
    for i := 0; i <= la; i++ { dp[i][0] = i }
    for j := 0; j <= lb; j++ { dp[0][j] = j }

    for i := 1; i <= la; i++ {
        for j := 1; j <= lb; j++ {
            if a[i-1] == b[j-1] {
                dp[i][j] = dp[i-1][j-1] // statements are structurally identical
            } else {
                dp[i][j] = 1 + min(dp[i-1][j-1], dp[i-1][j], dp[i][j-1])
            }
        }
    }
    return dp[la][lb]
}

Similarity is then normalized to [0.0, 1.0]:

func Similarity(a, b *FuncInfo) float64 {
    maxLen := max(len(a.StmtSeq), len(b.StmtSeq))
    if maxLen == 0 {
        return 1.0 // two empty bodies are trivially identical; also avoids 0/0 (NaN)
    }
    dist := editDistance(a.StmtSeq, b.StmtSeq)
    return 1.0 - float64(dist)/float64(maxLen)
}

This means: if two functions share 8 out of 10 statements (by structure), they score 0.80. The default threshold is 0.85, so that pair would not be reported - you need at least 85% structural overlap.

The practical effect: adding a logging statement or an extra validation check doesn't make two otherwise-identical functions invisible to the detector. The near-clone detection catches exactly the "same function, one copy got an extra guard clause" pattern.
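A small worked example makes the numbers tangible. The sketch below re-implements the edit distance with two rolling rows instead of a full matrix (a memory-saving variation, not necessarily what godedup does) and runs it on made-up hash sequences: one pair with two changed statements, one pair with a single inserted guard clause.

```go
package main

import "fmt"

// editDistance computes the Levenshtein distance between two hash sequences
// while keeping only the previous and the current DP row in memory.
func editDistance(a, b []uint64) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			if a[i-1] == b[j-1] {
				curr[j] = prev[j-1]
			} else {
				curr[j] = 1 + min(prev[j-1], prev[j], curr[j-1])
			}
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// similarity normalizes the distance to [0.0, 1.0].
func similarity(a, b []uint64) float64 {
	maxLen := max(len(a), len(b))
	if maxLen == 0 {
		return 1.0
	}
	return 1.0 - float64(editDistance(a, b))/float64(maxLen)
}

func main() {
	base := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	twoChanged := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 90, 100}
	extraGuard := []uint64{1, 2, 30, 3, 4, 5, 6, 7, 8, 9, 10}

	fmt.Printf("two changed statements: %.2f\n", similarity(base, twoChanged))
	fmt.Printf("one extra guard clause: %.2f\n", similarity(base, extraGuard))
}
```

The first pair has distance 2 over 10 statements and lands at 0.80, below the default threshold. The second has distance 1 over 11 statements, about 0.91, and would be reported.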

Grouping Near-Clones: Union-Find

If function A is 90% similar to B, and B is 88% similar to C, then A, B, and C probably all belong to the same clone group. Union-Find handles this transitivity correctly.

The pairwise comparison is O(n^2) - for every pair of candidate functions (those not already in an exact clone group), compute similarity and union them if above threshold:

for i := 0; i < len(candidates); i++ {
    for j := i + 1; j < len(candidates); j++ {
        a, b := candidates[i], candidates[j]

        // fast pre-filter: skip pairs with very different statement counts
        ratio := float64(a.NumStmts) / float64(b.NumStmts)
        if ratio < 0.7 || ratio > 1.43 {
            continue
        }

        sim := hash.Similarity(&a, &b)
        if sim >= cfg.MinSimilarity {
            union(a.Name, b.Name, sim)
        }
    }
}

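The statement count pre-filter is worth noting: if function A has 5 statements and function B has 20, the edit distance is at least 15, so they can't possibly score above 0.25 similarity (at best 5 matching statements out of 20). In general the score is bounded by the ratio of the shorter length to the longer one, so a pair whose ratio falls outside [0.7, 1.43] can never reach the default 0.85 threshold. The filter skips such pairs before doing the O(n·m) dynamic programming - significant in practice.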

A subtle bug to be aware of: when you later compute the minimum similarity across a group, the similarity map is keyed by [2]string{a.Name, b.Name} in insertion order. But when you iterate over the group (which comes from a map), order is random. So you need to try both orderings:

na, nb := group[i].Name, group[j].Name
if s, ok := similarity[[2]string{na, nb}]; ok && s < minSim {
    minSim = s
} else if s, ok := similarity[[2]string{nb, na}]; ok && s < minSim {
    minSim = s
}

Miss this and near-clone groups all report 100% similarity regardless of actual score.
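For reference, here is a minimal Union-Find sketch keyed by function name, together with one way to sidestep the ordering bug entirely: store every similarity under a canonical, sorted pair key. This is an illustration under assumptions, not godedup's code. The names pairKey, unionFind, and groups are hypothetical, and union here does not take the similarity argument shown in the loop above.

```go
package main

import (
	"fmt"
	"sort"
)

// pairKey returns the same key for (a, b) and (b, a).
func pairKey(a, b string) [2]string {
	if a > b {
		a, b = b, a
	}
	return [2]string{a, b}
}

type unionFind struct{ parent map[string]string }

func newUnionFind() *unionFind {
	return &unionFind{parent: make(map[string]string)}
}

// find returns the representative of x's set, compressing the path.
func (u *unionFind) find(x string) string {
	p, ok := u.parent[x]
	if !ok {
		u.parent[x] = x
		return x
	}
	if p == x {
		return x
	}
	root := u.find(p)
	u.parent[x] = root
	return root
}

func (u *unionFind) union(a, b string) {
	ra, rb := u.find(a), u.find(b)
	if ra != rb {
		u.parent[ra] = rb
	}
}

// groups returns all sets, sorted so the output is stable.
func (u *unionFind) groups() [][]string {
	byRoot := make(map[string][]string)
	for name := range u.parent {
		root := u.find(name)
		byRoot[root] = append(byRoot[root], name)
	}
	out := make([][]string, 0, len(byRoot))
	for _, members := range byRoot {
		sort.Strings(members)
		out = append(out, members)
	}
	sort.Slice(out, func(i, j int) bool { return out[i][0] < out[j][0] })
	return out
}

func main() {
	similarity := map[[2]string]float64{}
	uf := newUnionFind()
	link := func(a, b string, sim float64) {
		similarity[pairKey(a, b)] = sim
		uf.union(a, b)
	}
	link("A", "B", 0.90)
	link("C", "B", 0.88) // note the "reversed" argument order
	link("D", "E", 0.95)

	for _, g := range uf.groups() {
		minSim := 1.0
		for i := range g {
			for j := i + 1; j < len(g); j++ {
				if s, ok := similarity[pairKey(g[i], g[j])]; ok && s < minSim {
					minSim = s
				}
			}
		}
		fmt.Println(g, minSim)
	}
}
```

With a canonical key a single lookup is enough, so the "try both orderings" branch disappears along with the chance of forgetting it.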

The Two-Pass Architecture

Detection runs in two sequential passes:

Pass 1 - Exact clones, O(n):
Group all functions by TopHash. Any group with 2+ members is an exact clone group. Mark all members so they're excluded from Pass 2.

Pass 2 - Near clones, O(n^2):
Only compare functions not already in an exact clone group. This is both a correctness choice (exact clones would trivially satisfy the near-clone threshold and pollute groups) and a performance choice (exact clones are often numerous - validate() copied across 10 packages - and skipping them keeps the O(n^2) set small).

// Pass 1: O(n) exact detection
exactGroups := make(map[uint64][]hash.FuncInfo)
for _, f := range funcs {
    exactGroups[f.TopHash] = append(exactGroups[f.TopHash], f)
}

// Pass 2: O(n^2) near detection on remaining functions
var candidates []hash.FuncInfo
for _, f := range funcs {
    if !inExactClone[f.Name] {
        candidates = append(candidates, f)
    }
}
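The snippet above uses inExactClone without showing where it comes from. One plausible way to build it, sketched here with a trimmed-down FuncInfo and a hypothetical splitExact helper: keep only the hash buckets with two or more members, and mark those members while collecting them.

```go
package main

import "fmt"

// FuncInfo is reduced to the two fields this sketch needs.
type FuncInfo struct {
	Name    string
	TopHash uint64
}

// splitExact groups functions by TopHash and returns the exact clone groups
// plus the leftover candidates for the near-clone pass.
func splitExact(funcs []FuncInfo) (exact [][]FuncInfo, candidates []FuncInfo) {
	byHash := make(map[uint64][]FuncInfo)
	for _, f := range funcs {
		byHash[f.TopHash] = append(byHash[f.TopHash], f)
	}

	inExactClone := make(map[string]bool)
	for _, group := range byHash {
		if len(group) < 2 {
			continue // a hash seen once is not a clone
		}
		exact = append(exact, group)
		for _, f := range group {
			inExactClone[f.Name] = true
		}
	}

	for _, f := range funcs {
		if !inExactClone[f.Name] {
			candidates = append(candidates, f)
		}
	}
	return exact, candidates
}

func main() {
	funcs := []FuncInfo{
		{Name: "auth.(*UserStore).GetUserByID", TopHash: 7},
		{Name: "auth.(*UserStore).GetUserByEmail", TopHash: 7},
		{Name: "auth.(*UserStore).GetUserByName", TopHash: 7},
		{Name: "orders.validate", TopHash: 9},
	}
	exact, candidates := splitExact(funcs)
	fmt.Println("exact groups:", len(exact), "candidates:", len(candidates))
}
```

Because the groups come out of a map, their order is not deterministic; sort them before printing if the report needs stable group numbers. Keying the marker map by name also assumes that names are fully qualified and therefore unique.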

What It Finds in Practice

Running it on a real codebase immediately found things worth fixing:

GROUP  TYPE   SIM   FUNCTION                          LOCATION                              STMTS  LINES
-----------------------------------------------------------------------------------------------------
1      EXACT  100%  auth.(*UserStore).GetUserByID     internal/auth/user_store.go:102       7      24
1      EXACT  100%  auth.(*UserStore).GetUserByEmail  internal/auth/user_store.go:127       7      23
1      EXACT  100%  auth.(*UserStore).GetUserByName   internal/auth/user_store.go:151       7      22
------------------------------------------------------------------------------------------------------
2      EXACT  100%  http.disableClientCache           internal/http/router.go:45            3      5
2      EXACT  100%  branding.disableClientCache       internal/wiki/branding/routes.go:249  3      5

The first group is three database query functions - same structure, different WHERE clauses. They should be one generic function or at least share a helper. The second is a middleware function that got copy-pasted into two packages instead of being placed in a shared location.

Both are actionable. Neither would have been caught by a linter.

What's Next

A few directions worth exploring:

Baseline support - the biggest adoption blocker for existing codebases is that there are already dozens of clones accumulated over years. A --save-baseline / --diff-baseline flag would let teams adopt the tool without failing CI on pre-existing debt.

SARIF output - SARIF is the standard format for GitHub Code Scanning. One output flag and findings appear as inline PR annotations with file links. Roughly 50 lines of output code.

LSH for scale - the O(n^2) near-clone pass starts showing latency on codebases with 5000+ functions. Locality-Sensitive Hashing on the StmtSeq arrays would reduce it to near-O(n) by only comparing functions that land in the same hash bucket.

The tool is at github.com/hashmap-kz/godedup - go install github.com/hashmap-kz/godedup@latest and point it at any Go project. It runs in seconds and exits 0, so there's no friction in trying it.

If you hit false positives or miss cases you expected to catch, issues are open. The normalization rules are the most interesting part to tune.

Don’t Trust Backups You Haven’t Restored

alexey.zh — Sun, 26 Apr 2026 14:39:58 +0000

0. A Backup Is Not a File, but a Promise

You can write anything you want in the logs:

backup completed successfully
WAL uploaded successfully
retention completed successfully
archive is healthy

But on the day of a real disaster, PostgreSQL will not read your beautiful logs.
It will simply ask for the next WAL file.

And if you cannot provide it, the whole story ends right there.

This article is not about the internal implementation of a WAL receiver.
There was already a separate long story about that.

This article is about how I see the future development of the tool.

The real question is this:

Can I restore PostgreSQL from what my tool has been so confidently saving?

1. Backups Are a Comforting Lie

Backups are one of the most comforting illusions in infrastructure.

The command completed successfully.
A file appeared in S3.
There is a green line in the logs.
Maybe somewhere on a dashboard, healthy is even glowing.

Everyone feels relaxed and confident.
The problem is that none of this proves that recovery is possible.

It only proves that some operation completed without an error.
Some bytes moved from one place to another.
Some process returned exit code 0.
Some object storage API said, “yes, I accepted the file.”

But the recovery process does not care about our bright feelings.

It cares about only one thing:

Give me the required WAL file.
Right now.
Under the correct name.
In the correct place.
And make sure it is not corrupted.

If the file exists, PostgreSQL continues restoring history.

If the file does not exist, the history ends.

At that exact moment, “backup completed successfully” turns from a pleasant phrase into a question:

What exactly did you successfully do, actually?

2. WAL Files Are Not the Goal

When you write a WAL receiver, it is easy to become emotionally attached to WAL files.

They stream beautifully.
They appear in a directory.
They have serious-looking names like:

00000001000000000000000A
00000001000000000000000B
00000001000000000000000C

They are uploaded to remote storage.
They can be compressed.
They can be encrypted.
They can be shown in a UI.
They can be counted, sorted, checked, deleted, and downloaded again.

At some point, it starts to feel like the project is about them.

But that is a trap.

WAL files are not the product.

The product is the restored database.

Nobody wakes up at 3 a.m. thinking:

“How wonderful that I have 438 beautiful WAL files sitting in S3.”

People think differently:

“Can I bring the database back before people start calling me?”

A WAL archive is not a museum of artifacts.
It does not exist to store pretty files with long names.

It exists so PostgreSQL can replay the history of changes, the transaction log.

In other words, it exists to restore the database state through the chain:

base-backup -> WAL -> WAL -> WAL -> target recovery point

Without a base backup, WAL files are useless.

Without WAL files, a base backup quickly becomes an outdated snapshot of the past.

Without restore_command, everything together turns into a collection of files that looks like a backup system, but has not yet proven that it can work as a backup system.

3. The Real Product Is Point-in-Time Recovery

PostgreSQL recovery is not “restoring one file.”

It is restoring history.

A base backup gives the database state at a specific moment.
WAL files after that moment provide the history of changes.
The recovery process applies that history up to the required point.

In simplified form:

[base-backup]
      |
      v
000000010000000000000001
      |
      v
000000010000000000000002
      |
      v
000000010000000000000003
      |
      v
[target recovery point]

The problem is that the smallest gap in this chain can break everything.

One missing WAL file, and PostgreSQL cannot continue replay.

It is like losing one page from a legal contract.
Except the contract is a production database, and the lawyer is PostgreSQL, which simply refuses to go any further.

So the question is not:

Do I have WAL files?

The real question is:

Do I have a continuous, recoverable WAL chain
starting from a known base backup?

This is where pgrwl stops being just a “WAL file receiver.”

It becomes part of the recovery chain.
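The continuity half of that question can be checked mechanically. Below is a minimal sketch in Go, not pgrwl's actual code: it assumes the default 16 MiB wal_segment_size and a single timeline, and the names parseWALName, walName, and findGaps are made up for illustration. It only answers "are there holes between the files I have?"; whether the chain starts at a known base backup is a separate check.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// segmentsPerXLogID is 256 for the default 16 MiB wal_segment_size
// (0x100000000 / 16 MiB). Other segment sizes need a different value.
const segmentsPerXLogID = 0x100

// walSegment is a parsed WAL file name.
type walSegment struct {
	Timeline uint32
	Number   uint64 // linear segment number within the timeline
}

// gap is an inclusive range of missing segment names.
type gap struct{ First, Last string }

// parseWALName splits a 24-character name into its three 8-digit hex parts:
// timeline, "log" part, and segment-within-log part.
func parseWALName(name string) (walSegment, error) {
	if len(name) != 24 {
		return walSegment{}, fmt.Errorf("not a WAL segment name: %q", name)
	}
	var parts [3]uint64
	for i := range parts {
		v, err := strconv.ParseUint(name[i*8:(i+1)*8], 16, 32)
		if err != nil {
			return walSegment{}, fmt.Errorf("not a WAL segment name: %q", name)
		}
		parts[i] = v
	}
	if parts[2] >= segmentsPerXLogID {
		return walSegment{}, fmt.Errorf("segment part out of range in %q", name)
	}
	return walSegment{uint32(parts[0]), parts[1]*segmentsPerXLogID + parts[2]}, nil
}

// walName is the inverse of parseWALName.
func walName(tli uint32, n uint64) string {
	return fmt.Sprintf("%08X%08X%08X", tli, n/segmentsPerXLogID, n%segmentsPerXLogID)
}

// findGaps reports every hole between the lowest and highest segment given.
// Mixed timelines are rejected: handling them is out of scope for this sketch.
func findGaps(names []string) ([]gap, error) {
	segs := make([]walSegment, 0, len(names))
	for _, name := range names {
		s, err := parseWALName(name)
		if err != nil {
			return nil, err
		}
		if len(segs) > 0 && s.Timeline != segs[0].Timeline {
			return nil, fmt.Errorf("mixed timelines are not handled by this sketch")
		}
		segs = append(segs, s)
	}
	sort.Slice(segs, func(i, j int) bool { return segs[i].Number < segs[j].Number })

	var gaps []gap
	for i := 1; i < len(segs); i++ {
		prev, cur := segs[i-1], segs[i]
		if cur.Number > prev.Number+1 {
			gaps = append(gaps, gap{
				First: walName(cur.Timeline, prev.Number+1),
				Last:  walName(cur.Timeline, cur.Number-1),
			})
		}
	}
	return gaps, nil
}

func main() {
	gaps, err := findGaps([]string{
		"00000001000000000000000A",
		"00000001000000000000000B",
		"00000001000000000000000E",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, g := range gaps {
		fmt.Printf("missing %s..%s\n", g.First, g.Last)
	}
}
```

Gaps are reported as ranges rather than as individual names, so a large hole does not turn into a million-line report.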

4. `restore_command` Is the Final Boss

There is a moment in PostgreSQL recovery where all theory ends.

That moment is restore_command.

On paper, everything looks beautiful:

base-backup + WAL archive = point-in-time recovery

But during recovery, PostgreSQL does not say:

“Show me a beautiful dashboard.”

It does not say:

“Tell me how well the uploader worked last week.”

It does not ask:

“Did you have logs and metrics?”

It asks for a specific file.

Something like this:

I need WAL file 00000001000000000000000A.
Put it here.
Return success if it worked.

And that is all.

If restore_command can fetch this file, recovery continues.

If it cannot, the whole system stops being a recovery system and becomes a sad collection of partially useful data.

That is why restore_command is the final boss of a backup system.

This is where it becomes clear whether the archive is actually usable for recovery.
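To make the contract concrete: PostgreSQL substitutes the requested file name for %f and the destination path for %p, and the stock example from the PostgreSQL documentation is a plain `cp /mnt/server/archivedir/%f %p`. A minimal Go sketch of the same idea follows. It is an illustration under assumptions, not pgrwl's restore path: restoreWAL and demoRestore are hypothetical names, and the "archive" is just a local directory.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// restoreWAL copies archiveDir/name to dest (the path PostgreSQL passes as %p).
// It writes to a temporary file first, so a failed copy never leaves a
// half-written file under the final name.
func restoreWAL(archiveDir, name, dest string) (err error) {
	src, err := os.Open(filepath.Join(archiveDir, filepath.Base(name)))
	if err != nil {
		return err
	}
	defer src.Close()

	tmp := dest + ".tmp"
	out, err := os.Create(tmp)
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			os.Remove(tmp) // never leave a partial file behind
		}
	}()
	if _, err = io.Copy(out, src); err != nil {
		out.Close()
		return err
	}
	if err = out.Sync(); err != nil {
		out.Close()
		return err
	}
	if err = out.Close(); err != nil {
		return err
	}
	return os.Rename(tmp, dest)
}

// demoRestore builds a throwaway archive with one fake segment and restores it.
func demoRestore() (string, error) {
	dir, err := os.MkdirTemp("", "restore-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	const name = "00000001000000000000000A"
	if err := os.WriteFile(filepath.Join(dir, name), []byte("fake WAL bytes"), 0o600); err != nil {
		return "", err
	}
	dest := filepath.Join(dir, "RECOVERYXLOG") // stand-in for the %p path
	if err := restoreWAL(dir, name, dest); err != nil {
		return "", err
	}
	data, err := os.ReadFile(dest)
	return string(data), err
}

func main() {
	got, err := demoRestore()
	if err != nil {
		fmt.Fprintln(os.Stderr, "restore failed:", err)
		os.Exit(1) // a restore_command must report failure with a nonzero status
	}
	fmt.Printf("restored %d bytes\n", len(got))
}
```

In a real wrapper, name and dest would come from the command line, and every error would have to end in a nonzero exit status, because that status is the only answer PostgreSQL listens to.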

For the development of pgrwl, this is an important shift in thinking.

At first, it seems that the main thing is to receive WAL:

PostgreSQL -> replication protocol -> pgrwl -> local directory

Then it seems that the main thing is to upload WAL:

local directory -> compression/encryption -> S3/SFTP

And then comes the realization:

S3/SFTP -> restore_command -> PostgreSQL recovery

That is the moment of truth.

5. A Successful Upload Is Not Proof of Successful Future Recovery

Object storage makes people optimistic.

You uploaded a file.
The API returned a successful response.
The file appeared in the bucket.
Everything looks reliable.

But for a backup system, that is not enough.

"Upload successful" proves only transport.

It does not prove:

that the file is complete
that the checksum matches
that encryption/decryption works
that compression/decompression works
that the file can be downloaded back
that restore_command will find it under the correct name
that retention will not delete it tomorrow
that the entire WAL chain is continuous

That is exactly why a backup tool must be unpleasantly suspicious.

It is not enough to ask:

Can I upload it?

You need to ask:

Can I upload it?
Can I list it?
Can I read it back?
Can I decrypt it?
Can PostgreSQL use it during recovery?

A backup uploaded to S3 but never downloaded back even once is a motivational poster, not a recovery strategy.
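The first few questions on that list (upload, read back, decompress, compare) can be turned into a round-trip check. Here is a rough sketch in Go. Everything in it is an assumption for illustration: the Storage interface, the in-memory memStorage, the truncatingStorage that simulates a bad backend, and the uploadVerified function are not taken from pgrwl. Encryption is left out to keep the sketch short; a decrypt step would sit next to the decompress step.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"errors"
	"fmt"
	"io"
)

// Storage is a hypothetical minimal remote-storage interface.
type Storage interface {
	Put(name string, data []byte) error
	Get(name string) ([]byte, error)
}

// memStorage is an in-memory stand-in for S3/SFTP, for illustration only.
type memStorage map[string][]byte

func (m memStorage) Put(name string, data []byte) error {
	m[name] = append([]byte(nil), data...)
	return nil
}

func (m memStorage) Get(name string) ([]byte, error) {
	data, ok := m[name]
	if !ok {
		return nil, errors.New("not found: " + name)
	}
	return data, nil
}

// truncatingStorage simulates a backend that silently loses the tail of an object.
type truncatingStorage struct{ memStorage }

func (t truncatingStorage) Get(name string) ([]byte, error) {
	data, err := t.memStorage.Get(name)
	if err != nil {
		return nil, err
	}
	return data[:len(data)/2], nil
}

// uploadVerified compresses data, uploads it, then reads it back,
// decompresses it, and compares SHA-256 checksums with the original.
func uploadVerified(s Storage, name string, data []byte) error {
	want := sha256.Sum256(data)

	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}
	if err := s.Put(name+".gz", buf.Bytes()); err != nil {
		return fmt.Errorf("upload: %w", err)
	}

	stored, err := s.Get(name + ".gz")
	if err != nil {
		return fmt.Errorf("read back: %w", err)
	}
	zr, err := gzip.NewReader(bytes.NewReader(stored))
	if err != nil {
		return fmt.Errorf("decompress: %w", err)
	}
	defer zr.Close()
	restored, err := io.ReadAll(zr)
	if err != nil {
		return fmt.Errorf("decompress: %w", err)
	}
	if sha256.Sum256(restored) != want {
		return errors.New("checksum mismatch after round trip")
	}
	return nil
}

func main() {
	payload := bytes.Repeat([]byte("fake WAL bytes "), 64)
	fmt.Println("honest storage:", uploadVerified(memStorage{}, "00000001000000000000000A", payload))
	fmt.Println("lossy storage: ", uploadVerified(truncatingStorage{memStorage{}}, "00000001000000000000000A", payload))
}
```

The point of the lossy backend is that both uploads "succeed"; only the read-back step tells them apart.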

6. Storage Is Where Backup Tools Become Dangerous

Cleaning up files sounds tempting.

Delete old files.
Free up space.
Clean the archive.

What could possibly go wrong?

In backup systems, almost everything.

Deleting the wrong “old file” can turn the entire chain into a useless set of data.

Bad cleanup logic thinks like this:

Delete WAL files older than 7 days.

More correct cleanup logic should think like this:

Delete only those WAL files that are definitely not needed
by any backup and by any recovery scenario.

The difference is huge.

A WAL file may look old by timestamp, but still be required to recover from a specific base backup.

If you delete a WAL that the oldest backup needs, that backup turns into beautiful garbage.

Cleanup is not housekeeping.
It is part of the recovery contract.

A backup tool should delete files with the confidence of a nervous accountant, not a shell script with rm -rf.
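A sketch of what the "nervous accountant" version might look like in Go. It is deliberately conservative and rests on assumptions: a single timeline history, uppercase 24-character segment names, and an oldestNeeded value that the caller obtained from the oldest backup still being kept (for example, the start segment recorded in that backup's backup_label). The function names are invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// isSegmentName reports whether name looks like a plain WAL segment:
// exactly 24 uppercase hex digits, no suffix.
func isSegmentName(name string) bool {
	return len(name) == 24 && strings.Trim(name, "0123456789ABCDEF") == ""
}

// walsSafeToDelete returns the archived segments that precede oldestNeeded,
// the first segment required by the oldest base backup still being kept.
// Names are compared without the 8-character timeline prefix; because the
// remaining hex digits are fixed-width, string order equals segment order.
// Anything that is not a plain segment (.history, .partial, stray files)
// is never proposed for deletion.
func walsSafeToDelete(archived []string, oldestNeeded string) ([]string, error) {
	if !isSegmentName(oldestNeeded) {
		return nil, fmt.Errorf("refusing to clean up: bad boundary %q", oldestNeeded)
	}
	var deletable []string
	for _, name := range archived {
		if !isSegmentName(name) {
			continue
		}
		if name[8:] < oldestNeeded[8:] {
			deletable = append(deletable, name)
		}
	}
	return deletable, nil
}

func main() {
	archived := []string{
		"000000010000000000000001",
		"000000010000000000000002",
		"000000010000000000000003",
		"00000002.history",
	}
	deletable, err := walsSafeToDelete(archived, "000000010000000000000002")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("safe to delete:", deletable)
}
```

Note what is missing on purpose: there is no timestamp anywhere. The boundary comes from a backup, not from a calendar.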

7. Backup Tool Development Is Based on Negative Scenarios

The most useful way to think about a backup system is to start not with the positive scenarios:

not with the receive command
not with a beautiful CLI
not with a dashboard

But with a disaster.

Imagine this:

the primary is unavailable
the local disk failed
the directory disappeared
a new PostgreSQL must be brought up
it must be recovered to the required point
people are waiting
coffee no longer helps

And now we ask the questions:

Where is the base backup?
Which WAL files are needed?
Are they in storage?
Can they be downloaded?
Can they be decrypted?
Can they be decompressed?
Does PostgreSQL know how to get them through restore_command?
Are there gaps in the chain?

When you design a system from this point, many features stop being “nice to have.”

A status API is not decoration.
It is a way to understand whether the receiver is alive.

WAL listing is not a toy for the UI.
It is a way to check the archive.

Backup metadata is not bureaucracy.
It is a recovery map.

Cleanup is not space saving.
It is a potentially dangerous operation.

Logging is not noise.
It is evidence.

A dashboard is not “for making things pretty.”
It is a way to quickly answer the question:

Are we okay, or do we just not yet know that we are already not okay?

8. What the Tool Should Know

At the beginning, it seems enough for the tool to know only a little:

where to read WAL from
where to write WAL
where to upload WAL

Then reality arrives with a long list of requirements.

9. What the Operator Should See

A backup system should not require archaeology.

If an operator needs to read 4,000 lines of logs to understand whether the archive is alive, the UX has already lost.

An infrastructure UI does not have to look like a spaceship.

It should quickly answer a few questions:

is the receiver alive?
is the WAL stream running?
what is the last received WAL?
what is the last uploaded WAL?
when was the last upload?
are there errors?
how many WAL files are stored locally?
how many WAL files are stored remotely?
which base backups exist?
which backup was the last successful one?
what will be required for recovery?

A green healthy label does not mean much by itself.

It is better to show evidence:

last received WAL: 00000001000000000000000A
last uploaded WAL: 000000010000000000000009
last upload: 42 seconds ago
slot: pgrwl_slot
mode: receive
storage: s3
errors: none

That is why the UI for pgrwl is not just about “making it pretty.”

It is an attempt to give the operator the state of the system without forcing them to read the tea leaves in logs.

10. Test Recovery Skeptically

The only honest test of a backup system is recovery.

Everything else is optimism.

A good test should be rough.

It should do unpleasant things:

create a PostgreSQL instance
generate data
take a base backup
continue writing data
stream WAL
upload WAL to storage
stop everything
delete the data directory
restore the base backup into a new location
configure restore_command
replay WAL
verify that the data is in place

Even better if the tests can:

restart the receiver
interrupt uploads
switch modes
generate WAL under load
check for gaps in the WAL chain
compare expected and restored data

A bad test:

command exited with 0

A good test:

new PostgreSQL instance started from restored backup
expected rows are present
target recovery point reached

If a backup test does not cause mild discomfort, it is probably a demo.

11. Things I Still Do Not Trust

There are parts of a backup system that I do not trust “just because.”

Not because they are necessarily broken.

But because they are too important to trust by default.

I do not trust retention until I understand why a file can be deleted.
I do not trust encryption until I have verified decryption.
I do not trust compression until I have verified decompression.
I do not trust object storage until I have read the data back.
I do not trust the restore procedure until I have started PostgreSQL from a restored backup.
I do not trust green badges on a dashboard if there are no concrete numbers behind them.

12. Make Recovery Boring

The goal of a backup system is not to make backups exciting.

The goal is to make recovery boring.

That means it should be:

predictable
documented
verified
observable
repeatable
without magic
without heroism
without “I think we also need to run this script here”

In an ideal world, recovery does not require a heroic engineer, three terminals, spiritual negotiations with object storage, and a random shell script from 2018.

It should be a procedure.

A boring procedure.

Because in infrastructure, boring is a compliment.

Thank you for reading!

WEB UI in Go? Nothing Can Stop Me!

alexey.zh — Sun, 26 Apr 2026 09:58:48 +0000

Introduction

Author's note: This article is not technical in nature.
Rather, it is a set of reflections and a fragmented stream of thoughts, something to read on the subway on the way to work.
It should not be taken too seriously.

Let's begin.

In the previous part, I briefly described my adventures while developing a WAL receiver.
Now it is time to continue.
Not because I need it, not because anyone else needs it.
Some things simply happen on their own: a moment ago you were asleep, and now you are already writing notes in the margins.

It all started with a timid internal question.

What if I attach the simplest possible web panel to the application? Sounds like an excellent idea.

Exactly until the moment you remember one small detail: you do not know how to write frontend at all. Not at all.

That is, at the level of ideas - yes, of course, now we will quickly make a neat web face, a couple of pages, a nice status view, tables, buttons. In practice, however, it looked roughly like this:
"Oh, cool idea."
Five minutes later:
"Wait... but the last time I touched frontend was who knows when."

But from a certain point, the idea became not just good, but truly tempting. Because if a task does not contain at least a drop of adventurism, self-deception, and light engineering recklessness, is it even really my project?

What Skill Loss Is

A couple of years ago, I already had a period when I studied frontend. And, to my surprise, I even liked it back then.
Specifically, Angular, the second one, the one on TypeScript.

Why Angular?
Because at that time, I was working with Java, and Angular seemed suspiciously similar in structure.
Understandable entities - services, components, dependency injection - all of this at least resembled not shamanism, but some kind of system. Yes, on top of it all, there was a generous sprinkling of HTML, CSS, and other frontend rituals, but in general, there was no feeling that you were picking at an ancient altar assembled from npm packages and developers' tears.

That is, back then, frontend did not look like hostile territory, but rather like a strange but tolerable neighbor.

But the years passed, and all that knowledge evaporated so thoroughly that it was as if it had never existed.
Not "I kind of forgot it", not "I need to refresh it", but precisely disappeared. Evaporated. Erased. Drowned somewhere between work tasks, Go, infrastructure, Postgres, Kubernetes, and other things that do not require discussing which state manager is fashionable today.

When I tried to look toward modern frontend again, gloom caught up with me rather quickly. And not some noble gloom, but the ordinary,
everyday kind: when you look at yet another stack and realize that just to start, you first need to reread half the internet, then install a thousand dependencies, then understand why it does not build, and then also find out that all of it already became obsolete yesterday.

And at that moment, it became clear: no, I am not climbing back into this from scratch. Life is one, and node_modules is unfortunately much larger.

With that thought, I successfully put the idea aside for a year, threw it onto the shelf to gather dust together with other intentions sprinkled with laziness.

Sometimes the Solution to Forgotten Problems Happens by Itself

One day, while reading some random projects on GitHub, I stumbled upon htmx paired with Go. After digging further, I realized that this is a perfectly workable pattern.

And then that rare feeling happened, when, instead of irritation, there is almost childlike joy. Suddenly, it turned out that you can avoid arranging a second higher education for yourself in SPA magic and make everything much simpler.

In pure Go, with templates, with functions.
Without a giant frontend stack.
Without worshiping "reactivity".
Without the feeling that, for the sake of two status pages, you are obliged to build a small copy of the modern web industry.

I immediately felt like Eric Cartman -

[breathless] Mom-Mom! I've only just heard.
They're making Chinpokomon dolls, mom.
You can collect them all.
You can collect them all, Mother; quick, come on. Let's go to the toy store!

Forward, this needs to be done quickly!

Planning and Implementation

That is, in essence, you can build a completely normal single-page application - not in the sense of "a revolutionary interface of the future", but in the sense of a normal, honest, utilitarian status page. One that does not try to impress investors, but simply shows you what is going on in the system, and preferably without unnecessary complications.

After that, the idea suddenly stopped being nonsense and turned into a plan. A dangerous symptom, by the way.
This is exactly how the story of any side project usually begins:
"I'll just try a little."
And then you look - you already have architecture, API, UI, separate models, and for some reason you are seriously discussing the convenience of switching between receivers.

Then everything went according to the classics of handcrafted engineering creativity. I quickly sketched out
what it should roughly look like, wandered around websites, peeked at other people's templates, assembled a general style in my head, and began drawing a mockup.

The old-fashioned way - on paper.

Yes, literally on paper.
Not in Figma, not in Sketch, not in anything else fashionable and respectable. Just a sheet of paper, a pen, rectangles, labels, arrows. Like a person who is not fighting
for the title of "frontend developer of the year", but simply wants to understand where the WAL files will be and where the refresh button will be.

And, honestly, there is even a certain charm in this.
When you have no claim to "world-class product design",
it is surprisingly easy to focus on what is actually needed.

And what is needed, in fact, is quite simple.

First, such a panel really helps with debugging.
You do not have to climb into the container every time,
wander through directories, run yet another ls, find, cat, curl, and pretend that this is normal UX for a living human being.
You open the page, and there in front of your eyes are WAL files, backups, configuration, and overall status. Everything in one place.

Besides, an important part is that you can check that a backup exists, take its end LSN, select all the WAL files, and make sure that none of them are missing and that recovery is possible. Next in the plans is to add something like an optional recovery verification.

Not that this is merely an option; rather, it is the most important thing in the process. If a backup is not verified, then consider that you do not have a backup at all.
But as a rule, such checks are performed manually (I do them myself too, though less often than I should, because of laziness).

And with an interface and the possibility of automation, there will also be a desire to perform checks more often; besides, it is not just interesting, it adds peace of mind.
Beauty? Well, maybe not beauty, but not archaeology either.
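The chain check mentioned above (take a backup's LSN, work out which WAL files it needs, confirm none are missing) could be sketched roughly like this in Go. This is an illustration, not the panel's real code. It assumes the default 16 MiB segment size and a single timeline, and it also uses the backup's start LSN, which is my assumption about where the required range begins. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const walSegmentSize = 16 * 1024 * 1024 // default wal_segment_size; an assumption

// parseLSN parses the textual "hi/lo" hex form of an LSN into a 64-bit position.
func parseLSN(s string) (uint64, error) {
	hi, lo, ok := strings.Cut(s, "/")
	if !ok {
		return 0, fmt.Errorf("invalid LSN %q", s)
	}
	h, err := strconv.ParseUint(hi, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q", s)
	}
	l, err := strconv.ParseUint(lo, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q", s)
	}
	return h<<32 | l, nil
}

// segmentName builds the 24-character file name for a linear segment number.
func segmentName(timeline uint32, segNo uint64) string {
	perID := uint64(0x100000000) / walSegmentSize
	return fmt.Sprintf("%08X%08X%08X", timeline, segNo/perID, segNo%perID)
}

// requiredSegments lists every segment from the one containing startLSN
// to the one containing endLSN, inclusive.
func requiredSegments(timeline uint32, startLSN, endLSN uint64) []string {
	var names []string
	for seg := startLSN / walSegmentSize; seg <= endLSN/walSegmentSize; seg++ {
		names = append(names, segmentName(timeline, seg))
	}
	return names
}

// missingSegments returns the required names that are absent from present.
func missingSegments(required, present []string) []string {
	have := make(map[string]bool, len(present))
	for _, n := range present {
		have[n] = true
	}
	var missing []string
	for _, n := range required {
		if !have[n] {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	start, err := parseLSN("0/2000028")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	end, err := parseLSN("0/4000100")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	required := requiredSegments(1, start, end)
	archived := []string{"000000010000000000000002", "000000010000000000000004"}
	fmt.Println("required:", required)
	fmt.Println("missing: ", missingSegments(required, archived))
}
```

If the missing list is empty, the backup is at least restorable to a consistent state on paper; only an actual recovery run proves it in practice.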

Second, you can switch between receivers.
Because, of course, there may be several of them.

Initially, of course, I did not plan to do this.
At all. Not because the idea is bad, but because it seemed like one could live without it anyway. The console suits me 500%.
Plus, any website immediately causes natural suspicions: now there will be polling, background requests, endless updates, meaningless HTTP load, and in the end, you will attach a small source of indignation to yourself.

That is, first you make a panel "for convenience", and then you catch yourself realizing that the panel itself has become the thing that needs to be watched. Very engineering-like.
Tests for tests.

But then I thought: what if I do not turn it into a hysterical television with constant flickering?
What if I abandon endless polling loops and automatic observation altogether, and leave only manual refresh?

And suddenly everything started to look very reasonable.

Because in this form it is, in essence, the same curl request - only in human form. No watch loop, no background fuss, no unnecessary magic. Wanted to - opened it. Wanted to - refreshed it.
Looked at the current state, got the information you needed, and moved on.

That is, without SPA, without multilayeredness, without assembling a universe for the sake of a couple of tables.
Without the feeling that you accidentally subscribed to a second project instead of making a small improvement.

Just Go, templates, htmx, and a bit of adventurism.
And sometimes, oddly enough, that is quite enough.
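For readers who have not seen the pattern, here is roughly what "Go, templates, htmx, manual refresh" boils down to. This is a minimal sketch, not the actual pgrwl dashboard: the status struct, the routes, and the fixed values are invented for illustration, and the htmx script URL and version are an assumption to be replaced with whatever you actually ship. The only htmx pieces used are the hx-get, hx-target, and hx-swap attributes on one button.

```go
package main

import (
	"fmt"
	"html/template"
	"net/http"
	"net/http/httptest"
)

// status is a hypothetical snapshot of what the panel shows.
type status struct {
	LastReceived string
	LastUploaded string
}

// The page embeds the fragment once; the button re-fetches only the fragment.
var tmpl = template.Must(template.New("ui").Parse(`
{{define "page"}}<!doctype html>
<html>
<head>
<title>status</title>
<script src="https://unpkg.com/htmx.org@1.9.12"></script>
</head>
<body>
<button hx-get="/status" hx-target="#status" hx-swap="innerHTML">Refresh</button>
<div id="status">{{template "status" .}}</div>
</body>
</html>{{end}}
{{define "status"}}<ul>
<li>last received WAL: {{.LastReceived}}</li>
<li>last uploaded WAL: {{.LastUploaded}}</li>
</ul>{{end}}
`))

// currentStatus returns fixed values here; a real panel would ask the receiver.
func currentStatus() status {
	return status{
		LastReceived: "00000001000000000000000A",
		LastUploaded: "000000010000000000000009",
	}
}

func render(w http.ResponseWriter, name string, data status) {
	if err := tmpl.ExecuteTemplate(w, name, data); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

// newMux wires the full page and the fragment endpoint. No polling anywhere:
// the fragment is rendered only when somebody presses the button.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		render(w, "page", currentStatus())
	})
	mux.HandleFunc("/status", func(w http.ResponseWriter, r *http.Request) {
		render(w, "status", currentStatus())
	})
	return mux
}

// fetch runs one request through the mux in-process and returns code and body.
func fetch(path string) (int, string) {
	rec := httptest.NewRecorder()
	newMux().ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	// A real panel would call http.ListenAndServe with newMux();
	// the in-process request keeps this sketch self-terminating.
	code, body := fetch("/status")
	fmt.Println(code)
	fmt.Println(body)
}
```

The whole "frontend" is one button and one fragment endpoint, which is exactly the curl request in human form.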

Closer to the fifth version, the implementation began to resemble what I had drawn in my imagination. Of course, smart people would have done it more correctly, while putting in less effort, with a more reasonable structure, using the right tool.

But - sometimes a hammer, a stick, and swearing work wonders.

Conclusion

The dashboard is working, and release 0.1.0 is ready to use.
From here on, it can be improved and polished.
Since this is a fully optional component that is not connected to the main application in any way (although the sources do live in the root of the project), its development can be planned separately.

Repository: https://github.com/pgrwl/pgrwl

Thank you for reading to the end!

A Long Story about how I dug into the PostgreSQL source code to write my own WAL receiver, and what came out of it

alexey.zh — Sat, 18 Apr 2026 03:36:23 +0000

Some thoughts are unpredictable.

For example:
"I wonder how pg_receivewal works internally?"

From the outside, it sounds almost innocent. Really, what could possibly be wrong with that? Just ordinary engineering curiosity. I will take a quick look,
understand the general structure, satisfy my curiosity, and then go on living peacefully.

But then, for some reason, this happens:
you are already building PostgreSQL from source, digging into receivelog.c, comparing the behavior of your little creation with the original step by
step, arguing with fsync, looking at .partial files like old friends, and suddenly discovering that you are writing
your own WAL receiver.

In short, everything started quite normally and with absolutely no signs of anything serious.

Why PostgreSQL in the First Place

I have been using PostgreSQL as the main DBMS in almost all of my projects for a long time - both personal and work-related. And the longer you
work with it, the more clearly you understand: this is not just a "good database". This is a system designed by people with a very
serious engineering culture.

When you read notes, discussions, and articles from PostgreSQL developers, you quickly notice how deeply they think through
changes, trade-offs, new features, and behavior in complex scenarios. After such materials, I usually
had a mixed feeling:

admiration
respect
and a slight feeling that I had once again looked at work of a level unreachable for me

PostgreSQL gives you everything you need out of the box for backups and continuous WAL archiving. Including
pg_receivewal - the utility that eventually set everything in motion for me.

Why Exactly `pg_receivewal`

Because it is a very good utility. And good utilities are especially dangerous: they make you want to understand exactly how they
are built.

pg_receivewal continuously receives WAL segments, can work in synchronous and asynchronous replication modes, and in general
looks fairly straightforward. From a distance.

Up close, it turns out that there are quite a few subtle things there:

how the main loop starts
how connection drops are survived
how restart is performed
at what point .partial becomes a complete WAL file
how timeline switching is handled
where and when important fsync calls must happen
what to do so that it is reliable, not slow, and not embarrassing

So, as usual: a simple utility with a decent amount of engineering accuracy hidden around it.

A Few Words About Other Good Solutions I Looked at With Respect and Envy

Before writing something of my own, of course, I spent a lot of time looking at already existing solutions.

I use two of them at work for continuous archiving of the most critical and main databases.

`pgBackRest`

pgBackRest is, without exaggeration, an engineering tank. Everything in its source code is impressive:

logging
testing
architectural discipline
incremental and differential backups
support for large installations
attention to edge cases

And, of course, validation by the community and by time.

When you read the code of this tool, you catch yourself thinking: yes, this is what a product
written by people who know what they are doing looks like.
And then you open your own repository and immediately become humble.

`Barman`

I like Barman for a different reason.
It does not try to magically solve everything in the world.
It is, essentially, a very understandable orchestrator around standard PostgreSQL tools: pg_receivewal and pg_basebackup.

It has a quality that I value a lot: a simple and reliable model.
Not "everything at once", but careful automation around already existing, proven tools.

This also strongly influenced how I started thinking about my own tool.

Why Go, If I Had to Look at So Much C

I decided to write my tool in Go.

The reasons are fairly ordinary:

recently, I have really enjoyed writing in this concise language
simplicity and a UNIX background
it is convenient for writing network and system-level things
concurrency is handled well in it
it fits cloud-native scenarios very naturally
and, importantly, it is still a little harder to accidentally shoot yourself in the foot with a grenade launcher

But there is an important nuance: to understand PostgreSQL, I had to seriously dig into C code.

And here I want to separately say something I formulated for myself a long time ago:
C is, in my opinion, both the most difficult and the most brilliant language at the same time.

I have not spent as much time on any other language trying to understand its semantics.
Syntax is nothing - semantics are everything. Pointers alone are a simple concept, but
hide a whole chain of icebergs underneath. There was even a time when I was making a compiler for C, with a preprocessor,
assembler, and PE32 output (*.exe). I played with that for a long time; it was a very interesting experience and time spent happily.

The C language is so direct, so honest, and so close to the metal that it becomes scary. It feels like
it is very easy to make six sextillion mistakes in it just while opening a file and taking a breath. One pointer going the wrong way -
and that is it, hello, a new form of humiliation. Segmentation Fault becomes a kind of spell that must not be said out loud, lest you
summon it.

With all that said, I cannot say that I know C.
Honestly, I probably know about three percent of it. And even that only on a good day.

But even those three percent were extremely useful to me.
Without them, I would not have been able to read PostgreSQL properly: to separate real logic from my own delusions,
follow the control flow, and at least roughly understand why everything here is arranged this way and not another.

So formally I wrote the tool in Go, but in practice this project also became my way of touching C a little more deeply
and gaining even more respect for the people who have been writing such systems in it for years.

The Beginning: Compiling PostgreSQL, Debugging, and the First Signs of Recklessness

To understand the implementation details at all, I had to go into the PostgreSQL source code.

I had to learn how to:

build PostgreSQL from source
run it in debug mode
attach a debugger
watch how calls flow
understand what happens inside the replication loop
establish the relationship between components and functions

And here I got a surprise: all of this turned out to be less scary than I had imagined. PostgreSQL built, pg_receivewal
started, the debugger attached to the process, and this immediately gave me the dangerous confidence that "well,
now I will definitely figure this out quickly".

Of course, I did not figure it out.

The first thing I did was, like a true amateur, add the most aggressive tracing possible. I logged everything:

function entries
exits
variable values
branches
important calls
and sometimes, it seemed, the mere fact that the universe existed

At first, it seems very clever. Then you have gigantic logs, you no longer understand whether you are reading the system or whether it is slowly
breaking your mind, and the realization comes: a lot of logs does not mean a lot of understanding.

But at this stage, the overall picture started to emerge. I began to understand how entities are connected, where the WAL receiving
loop starts, how errors are survived, what happens to .partial, and at which moments decisions are made about completing a segment.
I discovered libraries, very well-written and years-polished file handling functions, and many more insanely cool things for
the piggy bank of my mind.

And at some point I could not resist: enough watching, time to write.

The First Prototype: "I Will Just Reproduce `pg_receivewal`"

I had a very naive idea: not to invent anything new, but simply to reproduce the behavior of
pg_receivewal as closely as possible.

In theory, it sounds wonderful.
In practice, it means that you voluntarily sign up for weeks of studying:

exactly how the streaming loop starts
how it reacts to connection drops with the database
what a correct restart should look like, from which file and from which offset inside it
when a .partial file can be considered complete
how timeline changes are handled
where you misunderstood something
and where you no longer understand anything at all, but continue out of stubbornness
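To give one item from that list some shape, here is how the ".partial becomes a complete file" step could look in Go. This is a simplified reading of the idea, not a port of receivelog.c and not pgrwl's code: the segment counts as complete only when it has reached the full segment size, its contents are fsynced before the rename, and the directory is fsynced after it so the new name survives a crash. It assumes a POSIX-like system where a directory can be opened and synced. finalizePartial and demoFinalize are hypothetical names, and the demo uses a tiny 1024-byte "segment" to stay fast.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// finalizePartial turns "<segment>.partial" into "<segment>" once the file
// has reached segSize bytes. Order matters: flush the data, rename, then
// flush the directory entry.
func finalizePartial(partialPath string, segSize int64) (string, error) {
	if !strings.HasSuffix(partialPath, ".partial") {
		return "", fmt.Errorf("%s is not a .partial file", partialPath)
	}
	f, err := os.OpenFile(partialPath, os.O_RDWR, 0)
	if err != nil {
		return "", err
	}
	info, err := f.Stat()
	if err == nil && info.Size() != segSize {
		err = fmt.Errorf("segment incomplete: %d of %d bytes", info.Size(), segSize)
	}
	if err == nil {
		err = f.Sync() // contents must be durable before the name changes
	}
	if cerr := f.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		return "", err
	}

	final := strings.TrimSuffix(partialPath, ".partial")
	if err := os.Rename(partialPath, final); err != nil {
		return "", err
	}
	dir, err := os.Open(filepath.Dir(final))
	if err != nil {
		return "", err
	}
	defer dir.Close()
	if err := dir.Sync(); err != nil { // make the rename itself durable
		return "", err
	}
	return final, nil
}

// demoFinalize writes a fake .partial file of the given size into a temp dir
// and tries to finalize it against a 1024-byte "segment size".
func demoFinalize(size int) (string, error) {
	dir, err := os.MkdirTemp("", "partial-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	partial := filepath.Join(dir, "000000010000000000000001.partial")
	if err := os.WriteFile(partial, make([]byte, size), 0o600); err != nil {
		return "", err
	}
	final, err := finalizePartial(partial, 1024)
	if err != nil {
		return "", err
	}
	return filepath.Base(final), nil
}

func main() {
	name, err := demoFinalize(1024)
	fmt.Println("full segment:   ", name, err)
	name, err = demoFinalize(100)
	fmt.Println("partial segment:", name, err)
}
```

The incomplete file keeps its .partial suffix, which is the whole point: a name without the suffix is a promise that the segment is whole.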

My first more-or-less stable prototype appeared after a couple of weeks. And those were very fun weeks. At times I
felt like a researcher and a super-cool mega-hacker, at other times - like a person who crawled into an aircraft engine without a license to repair it
using someone else's notes.

But there is one thing I really want to point out: PostgreSQL code is surprisingly pleasant to read. Good comments, competent
decomposition, respect for the reader and colleagues. Even if you yourself understand about twenty percent, it is still clear that in front of you is very
strong engineering work.

When You Realize That Simply Receiving WAL Is Only the Beginning

When the prototype finally worked, the joy did not last long.

Because I already understood: receiving WAL is only half the job. And then the usual engineering carnival begins:

compression
encryption
uploading to S3
uploading to SFTP
cleaning up old files
monitoring
external scripts
cron
more scripts
and then scripts that fix the previous scripts

And I have never liked this universe of external glue. Because it almost always looks like it was written
at night under the threat of a production incident, and then everyone was afraid to touch it. And all of it smells bad and looks disgusting.

Scripts around WAL archiving are often fragile, non-obvious, poorly tested, and live on faith that "it somehow
works". And in critical things, I wanted exactly the opposite.

I wanted the main program itself to manage the archive:

to know what can already be compressed
to know what still cannot be deleted
to understand when a file can be sent to remote storage
and not to try to make such decisions through a layer of suspicious bash magic

So management components began to appear around the WAL receiver:

one receives the log
another archives and encrypts
a third sends files to S3 or SFTP
a fourth handles retention and automatic cleanup
a fifth collects metrics and monitors process state

And at that point, the project stopped being "just a utility". It started turning into a small system where coordination,
order, and the absence of internal fights between components mattered.

About Base Backup: I Did Not Want To, but Curiosity Won

Initially, I had no intention of implementing base backup at all.

The reason is simple: the replication protocol is single-threaded. For small databases, that is fine. For large ones - not so rosy anymore.
If a backup takes ten hours every ten hours, that is, to put it mildly, not always convenient.

Multi-threaded approaches usually require the tool to live next to the database itself. And I wanted exactly the opposite: to remotely
collect WAL and make backups from databases located anywhere - in the cloud, on virtual machines, in Kubernetes - and at the same time not
require sidecar containers or any special infrastructure changes from them.

But then the thing that happens to many technical projects happened:
I did not plan this functionality, and then it simply became interesting.

In the end, I did implement streaming base backup. It does not claim to be a universal solution for huge
installations, but for databases around 200 GiB it turned out to be quite practical. A couple of hours for a nightly job is already a reasonable
scenario.

So it turned out not to be a "superweapon", but an honest working tool in a clear niche.

Why I Did Not Go Deeper Into Incremental Backups

Of course, I also looked at incremental / differential backups.

But there you quickly understand an unpleasant thing: taking an incremental backup is not victory yet. You then have to
assemble it back correctly. And that means a completely different level of complexity begins:

either write your own analogue of pg_combinebackup
or very carefully depend on an external tool
or drown in the number of edge cases and incompatibilities

At that point I honestly looked at the task and decided that I already had enough problems without it.

pgBackRest does such things in a truly well-thought-out way. But reproducing that level is not something you build "over a couple of
weekends on enthusiasm". It is years of large, heavy engineering work. So I consciously stopped at a simpler
model: reliable base backup for small and medium production environments.

Without claims to world domination. Just a working, predictable thing.

Architecture: The Moment When You Are No Longer Writing a Utility but Coordinating Chaos

As soon as you have several background processes, it immediately becomes clear that the main difficulty is no longer WAL as
such, but making sure this whole household does not fight with itself.

You need to be able to:

not start a backup if another one has not finished yet
not start archiving if it is already running
not delete something that may still be needed
handle errors correctly
carefully stop background processes
keep the system in a predictable state

Here I had to seriously think about patterns:

job queue
worker pool
supervisor
pipes
task lifecycle management
safe shutdown
goroutine coordination

At some point I realized that I was no longer "writing a WAL receiver". I was assembling a gearbox. And if even one gear
shifts a little, all of this will either start screaming or silently break. And silently breaking software is the worst kind of software.
At the same time, the main task was to make sure the main WAL receiving process was not affected by "noisy neighbors".

Streaming Large Files: Another Source of Creativity

There is another pleasant task as well: transferring large backup files to remote storage.

When a database weighs, for example, 300 GiB, you quickly understand:

you do not want to save everything locally, and often it is not convenient
you cannot pull it all into memory
you also do not want to write a crooked intermediate scheme, because you will have to maintain it yourself later

So you need a proper streaming pipeline: read the data, transform it on the way, and immediately send it further - without
intermediate garbage, without extra storage, without special effects.

Here Go was useful again. It has good primitives for streaming processing. Although the presence of primitives, of course, does not
stop you from making design mistakes for a very long time.

`fsync`: The Most Subtle Part and My Own Little Nervous Breakdown

If I had to choose what drained the most blood from me, the winner is obvious: fsync.

This is the place where you first think: "well, this part is simple". And then you discover that you have been staring at
the receivelog.c source code for several hours with the expression of a person who has voluntarily entered a very strange stage of life.

The problem here is that it is easy to be wrong in both directions:

call fsync too often - everything slows down
call it too rarely - later you may look at the result very sadly

So it is either slow or shameful. Quite a rich choice, to put it mildly.

I had to literally compare the behavior of my implementation with pg_receivewal step by step:

where exactly synchronization happens
at what moment
why exactly there
which scenarios must force fsync
and how to do neither too much nor too little

In the end, the key points turned out to be:

fsync after finishing writing a segment
fsync when renaming .partial to the final WAL file
fsync on keepalive if the server requests a reply
fsync on errors in the receiving loop

Then the truly fun part began: integration checks. I ran two receivers simultaneously (pg_receivewal, pgrwl), generated
WAL, compared timings, then compared the resulting files byte by byte, measured timing differences in milliseconds, and tried to remove
everything unnecessary.

I even got to logging: in places like this, you begin to understand that it can be either a helper or a quiet
saboteur. For example, you do not need to parse attributes if the logging level does not require it; extra CPU cycles
can be spent on more useful things.

In the end, I managed to achieve very similar behavior and complete matching of the resulting WAL files over the same interval. And
the small timing difference remained only where it is normal: two daemons cannot be started in the exact same
physical microsecond, no matter how hard you try.

In the fight against slowness, I even quickly wrote a small utility that injects
a defer into EVERY function, where the runtime of that function is measured. Not the best check,
but, as practice showed, it helps quickly identify especially hot functions, and then point
the profiler, debugger, and so on at them. My tracing looks something like this:

FUNCTION                            CALLS  TOTAL_NS     TOTAL_SEC
--------                            -----  --------     ---------
storecrypt.Put                      70     23061361400  23.06
receivesuperv.uploadOneFile         35     11606918000  11.61
fsync.Fsync                         106    8813968000   8.81
xlog.processOneMsg                  4481   6818721600   6.82
xlog.processXLogDataMsg             4481   6814495400   6.81
xlog.CloseWalFile                   35     6561511500   6.56
xlog.closeAndRename                 35     6559979000   6.56
fsync.FsyncFname                    70     6525596900   6.53

.....500 more lines

Metrics: Because I Wanted to See Whether It Was Still Alive or Already Dead

Over time, I also added metrics:

number of files
archive size
number of errors
transferred bytes
state of background tasks
deleted files
general runtime statistics

I even made a Grafana dashboard. Not the most beautiful one in the world, but useful enough to quickly understand whether everything is still
alive or it is already time to get nervous.

It was important to me to make metrics free if they are disabled. So wherever possible, I used the
noop approach: if observability is not needed, the system should not pay for it.
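
The noop approach can be sketched as an interface with two implementations, chosen once at startup. The method names below echo function names from the trace output, but the interface, its signatures, and `counterMetrics` are assumptions made for this example, not the real pgrwl types.

```go
package main

import "fmt"

// Metrics is a hypothetical interface the rest of the code talks to.
type Metrics interface {
	AddWALBytesReceived(n int)
	IncWALFilesReceived()
}

// noopMetrics does nothing: calls are cheap and need no locks or allocations.
type noopMetrics struct{}

func (noopMetrics) AddWALBytesReceived(int) {}
func (noopMetrics) IncWALFilesReceived()    {}

// counterMetrics is a toy in-memory implementation used when metrics are on.
type counterMetrics struct {
	bytes int
	files int
}

func (c *counterMetrics) AddWALBytesReceived(n int) { c.bytes += n }
func (c *counterMetrics) IncWALFilesReceived()      { c.files++ }

// NewMetrics picks the implementation once, so hot paths never check a flag.
func NewMetrics(enabled bool) Metrics {
	if enabled {
		return &counterMetrics{}
	}
	return noopMetrics{}
}

func main() {
	m := NewMetrics(false)
	m.AddWALBytesReceived(16 << 20)
	m.IncWALFilesReceived()
	fmt.Printf("metrics implementation: %T\n", m)
}
```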

Logging: Where I Also Realized I Still Have a Long Way to Go

Logging had its own coming-of-age story.

At first, I logged everything. Because, as everyone knows, any person who has deeply entered a complex system for the first time
starts with the phrase: "I will just add more logs and understand everything".

No.

Many logs are not understanding. They are just many logs.

Good logging is when, at the moment of a problem, logs really help you understand what is going on, and do not turn into
an additional source of noise and despair.

I have not yet managed to make this part as good as I would like. The current result is normal, but
not exemplary. And in this sense, pgBackRest still remains for me an example of a very smart and thoughtful approach: you can see
how much discipline and engineering care went specifically into diagnostics.

Integration Tests: The Hardest and Most Important Part

One of the most difficult and at the same time most necessary parts of the whole project is integration testing.

Because a daemon that depends on another daemon is already not the easiest object to test. And if you
also want to:

start PostgreSQL
generate WAL
stop processes
make a backup
restore the database
compare the state before and after
run failure scenarios
check compatibility and correctness

then life starts playing in especially bright colors.

I settled on this approach: simple shell scripts that start the test environment in a container,
populate the database, perform actions, then restore everything and check the result.
I also really did not want to drag a ton of dependencies like testcontainers into the project.

In the end, it turned out like this:

shell scripts
docker compose
matrix in GitHub Actions
isolated scenarios
without unnecessary heavy magic where understandable mechanics are enough

That is how I got tests for:

comparison with pg_receivewal
backup/restore
uploading to S3 and SFTP
correctness of WAL files
stopping and restarting
different failure scenarios

And honestly, integration tests are what give me the main confidence in releases. Not one hundred percent, of course. One hundred
percent in such things is promised either by madmen or by marketers. But good, engineering-honest confidence - yes.

Unit tests, of course, also exist. But for me, integration checks are the main criterion
that all of this is not only nicely written (not nicely everywhere), but actually works.

What Came Out of It

Over time, from the fairly harmless desire to "just see how pg_receivewal works", a tool grew that now has:

streaming WAL receiver
archiving
compression
encryption (streaming AES-256-GCM)
uploading to S3 (streaming, +multipart)
uploading to SFTP
retention and automatic cleanup
metrics
logging (mostly zero-cost)
base backup
configuration through a file and environment variables
controlled shutdown
unit and integration tests
behavior comparison with pg_receivewal
documentation with diagrams and examples
as many usage examples as possible (standalone/docker-compose/k8s)
helm-chart (quite simple and working)
website (in progress, but at least now it is clear how this is done and that it is possible)
a set of patterns and libraries for further reuse in Go projects

So, as usually happens, the project long ago stopped being what it seemed to be at the beginning.

What Is Planned

improve metrics, remove what is unnecessary, add what is needed, build a truly useful and beautiful dashboard
improve logging quality, make it consistent, think through levels more carefully, preserve zero-cost semantics
add new capabilities for base backup - around fine-tuning retention periods
a huge amount of space for refactoring and documentation
add even more integration tests, I am planning a V2 version
add every "breaking" scenario to the tests that my imagination can produce
make the website properly, right now it is just a copy of the documentation
create a user guide (because it is simply interesting)
and much more

What I Took Away From This

Perhaps the main result is not that I wrote yet another tool.

The main result is something else:

I understood PostgreSQL much more deeply
I gained even more respect for C, although I know about a miserable three percent of it
I saw how difficult it is to reproduce even a small part of the behavior of a well-made system utility
and once again I became convinced that high-quality code written by others is the best way to quickly cure yourself of excessive self-confidence

Because it is one thing to look at architecture from the outside and admire it.
And it is a completely different thing to try to reproduce at least part of that logic yourself and not fall apart along the way.

And yes. If it ever seems to you that the thought

"maybe I should also write some utility for PostgreSQL?"
sounds like a good idea for a couple of quiet weekends -

I have two pieces of news for you.

The first: the idea really is interesting.
The second: you most likely will not have quiet weekends anymore.

Links

Repository: https://github.com/pgrwl/pgrwl

Thanks for reading!

SQL-First PostgreSQL Migrations Without the Magic

alexey.zh — Sun, 12 Apr 2026 14:29:11 +0000

If you work with PostgreSQL long enough, you start noticing a pattern: migration tools often become more complicated than the schema changes they are supposed to manage.

Some tools invent their own DSL.
Some hide behavior in config files.
Some couple migrations to an ORM.
Some force a directory layout that looks neat in a demo but awkward in a real project.

And then there is the simpler question:

Why can’t PostgreSQL migrations just stay plain SQL?

That is the idea behind gopgmigrate.

It is a SQL-first migration tool for PostgreSQL that keeps the core workflow boring in the best possible way:

write normal .sql files
organize them however you want
run them in order
track what was applied
support rollbacks
support repeatable migrations
make non-transactional migrations explicit

No YAML. No hidden DSL. No ORM lock-in. No magic comments.

Just SQL files and a clear naming convention.

Why this approach matters

A migration file should be easy to:

read in a code review
open in your editor
run directly with psql
troubleshoot at 2 AM
keep using even if you stop using the tool

That last point matters more than many teams realize.

A good migration format should outlive the tool that executes it. Your schema history is long-term infrastructure. It should not depend on a framework-specific abstraction that becomes painful to migrate away from later.

With gopgmigrate, the migration files remain usable as ordinary SQL. The tool adds safety and structure on top, but it does not take ownership of your database change process.

What gopgmigrate does

At a high level, the workflow is simple:

Scan a directory tree recursively for SQL migration files
Sort them globally by revision
Compare them with the migration history stored in PostgreSQL
Apply only what is pending
Record hashes and metadata for auditability
Support rolling back the last applied migrations
Re-run repeatable scripts only when their content changes

That gives you a clean PostgreSQL migration workflow with a small mental model.

The naming convention is the API

One of the nicest design choices in gopgmigrate is that the file name itself declares the migration behavior.

Example:

0000001-create-users-table.up.sql
0000001-create-users-table.down.sql
0000003-fn-get-users.r.up.sql
0000004-vacuum-users.notx.up.sql
0000005-refresh-stats.rnotx.up.sql

This is refreshingly explicit.

Versioned migrations

These run once in order:

0000002-add-roles-table.up.sql

Rollbacks

Rollback files are separate and predictable:

0000002-add-roles-table.down.sql

Repeatable migrations

Useful for functions, views, triggers, or other SQL objects you may want to refresh when the file changes:

0000003-fn-get-users.r.up.sql

Non-transactional migrations

Some PostgreSQL operations cannot run inside a transaction, for example:

VACUUM
CREATE INDEX CONCURRENTLY
DROP INDEX CONCURRENTLY
some forms of REINDEX
ALTER SYSTEM

Those are made explicit in the file name:

0000004-vacuum-users.notx.up.sql

And if a migration is both repeatable and non-transactional:

0000005-refresh-stats.rnotx.up.sql

This is a small detail, but it solves a real operational problem: the migration behavior is visible before you open the file.
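
To show how much information such a name carries, here is a sketch of a parser for the convention above. It is an illustration only, not the gopgmigrate parser: the `Migration` struct, `ParseName`, and the regular expression are assumptions derived from the example file names.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Migration holds what can be read from a file name alone (hypothetical type).
type Migration struct {
	Revision   int
	Name       string
	Down       bool
	Repeatable bool
	NoTx       bool
}

// nameRe matches <revision>-<name>[.r|.notx|.rnotx].<up|down>.sql
var nameRe = regexp.MustCompile(`^(\d+)-(.+?)(?:\.(r|notx|rnotx))?\.(up|down)\.sql$`)

// ParseName decodes the migration behavior declared by the file name.
func ParseName(file string) (Migration, error) {
	m := nameRe.FindStringSubmatch(file)
	if m == nil {
		return Migration{}, fmt.Errorf("unrecognized migration file name: %q", file)
	}
	rev, err := strconv.Atoi(m[1])
	if err != nil {
		return Migration{}, err
	}
	return Migration{
		Revision:   rev,
		Name:       m[2],
		Down:       m[4] == "down",
		Repeatable: m[3] == "r" || m[3] == "rnotx",
		NoTx:       m[3] == "notx" || m[3] == "rnotx",
	}, nil
}

func main() {
	files := []string{
		"0000001-create-users-table.up.sql",
		"0000001-create-users-table.down.sql",
		"0000003-fn-get-users.r.up.sql",
		"0000004-vacuum-users.notx.up.sql",
		"0000005-refresh-stats.rnotx.up.sql",
	}
	for _, f := range files {
		m, err := ParseName(f)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%+v\n", m)
	}
}
```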

Real projects are not flat folders

A lot of migration tools quietly assume every team wants the same directory structure.

Reality is messier.

Some teams want to split:

schema
data
functions
maintenance
environment-specific files
release-based groups

That is why I like that gopgmigrate does not force a rigid directory layout.

You can organize migrations by concern:

migrations/
  schema/
  data/
  functions/
  no-transaction/
  down/

Or by release:

migrations/
  v1.0.0/
  v1.1.0/
  down/

Or however your team naturally thinks about database changes.

The only rule is that version ordering remains global.

That is a practical compromise: freedom in layout, predictability in execution.

Why SQL-first migrations are still the best default

There is a reason SQL-first tools keep appealing to engineers who work close to PostgreSQL.

PostgreSQL already has a powerful language for schema and data changes. It is called SQL.

When a tool stays out of the way, you get a few concrete advantages:

Better reviewability

A migration diff is just SQL. Reviewers do not have to mentally decode a framework abstraction.

Better portability

You can run the file with psql, a database IDE, automation scripts, or CI jobs.

Better debugging

When something fails, you are looking at the actual statement PostgreSQL rejected.

Better longevity

Your migration history remains useful years later, even if your application stack changes.

That makes SQL-first migration tooling especially attractive for:

platform teams
backend teams with multiple services
teams that avoid ORM-heavy workflows
projects with long-lived PostgreSQL databases
teams that want plain operational ownership

Safety features that matter in practice

Simple does not mean naive.

For a migration tool to be usable in production, it needs a few guardrails. gopgmigrate includes some of the right ones:

Advisory locking

This helps prevent concurrent migration runs from stepping on each other.

Transactional safety by default

Most PostgreSQL DDL can run inside a transaction, and that is the safe default.

Explicit non-transactional mode

Instead of hiding exceptions, the tool makes them obvious in the filename.

Hash-based change detection

This is particularly useful for repeatable migrations. If the content changes, the tool knows it should re-apply the script.

History tracking

Applied migrations are recorded in a history table, along with metadata such as hash and timing-related details.

That is the kind of boring reliability you want from migration tooling.
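
The hash-based change detection described above boils down to comparing a stored digest with a fresh one. The sketch below assumes SHA-256 and an in-memory map in place of the history table; the algorithm and storage that gopgmigrate actually uses may differ, and `contentHash` and `needsReapply` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns a hex digest of the migration body.
func contentHash(sql []byte) string {
	sum := sha256.Sum256(sql)
	return hex.EncodeToString(sum[:])
}

// needsReapply reports whether a repeatable script should run again:
// either it was never applied, or its content changed since the last run.
func needsReapply(history map[string]string, file string, sql []byte) bool {
	prev, ok := history[file]
	return !ok || prev != contentHash(sql)
}

func main() {
	file := "0000003-fn-get-users.r.up.sql"
	v1 := []byte("create or replace function get_users() returns int as 'select 1' language sql;")
	v2 := []byte("create or replace function get_users() returns int as 'select 2' language sql;")

	// history stands in for the history table: file name -> last applied hash.
	history := map[string]string{}

	fmt.Println("first run:", needsReapply(history, file, v1))
	history[file] = contentHash(v1)
	fmt.Println("unchanged:", needsReapply(history, file, v1))
	fmt.Println("edited:   ", needsReapply(history, file, v2))
}
```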

Example CLI workflow

The CLI is intentionally straightforward.

Apply pending migrations:

gopgmigrate migrate \
  --dirname ./migrations \
  --connstr postgres://user:pass@localhost:5432/mydb

Preview without applying:

gopgmigrate migrate \
  --dirname ./migrations \
  --connstr postgres://user:pass@localhost:5432/mydb \
  --dry-run

Roll back the last N applied migrations (here, 2):

gopgmigrate rollback-count 2 \
  --dirname ./migrations \
  --connstr postgres://user:pass@localhost:5432/mydb

Use environment variables in CI:

export PGMIGRATE_DIRNAME=./migrations
export PGMIGRATE_CONNSTR=postgres://user:pass@localhost:5432/mydb

gopgmigrate migrate

That is the kind of interface that works well in local development, CI pipelines, containerized jobs, and release automation.

Where this fits especially well

I think gopgmigrate is especially appealing in a few scenarios.

1. PostgreSQL-first teams

If your team understands PostgreSQL and prefers direct SQL over framework migration layers, this fits naturally.

2. Teams with mixed migration types

Schema changes, data fixes, repeatable view/function refreshes, and non-transactional maintenance are all first-class cases here.

3. Repos with real structure

If your migration directory stopped being a cute flat demo folder a long time ago, recursive scanning and flexible layouts are genuinely useful.

4. CI/CD and automation

The CLI is simple enough to drop into pipelines without teaching your delivery system a new configuration language.

5. Engineers who dislike lock-in

Your migration files stay plain SQL. That is a strong long-term property.

What I like most about this design

The best tools often win not because they do more, but because they make fewer damaging decisions for you.

gopgmigrate seems built around a healthy principle:

the tool should manage execution, not redefine how SQL migrations ought to exist.

That means:

your files remain readable
your shell workflows still work
your database knowledge stays relevant
your migration history does not become framework glue

In database tooling, that is a strong design choice.

Final thoughts

There are plenty of PostgreSQL migration tools out there. Many are good. But a lot of them drift toward abstraction for its own sake.

If what you want is:

PostgreSQL migrations
plain SQL files
explicit rollbacks
repeatable migrations
non-transaction support
advisory locking
transactional safety
hash-based change detection
flexible directory layouts
clean CLI usage
minimal ceremony

then gopgmigrate is worth a look.

It takes a very practical path: keep migrations human-readable, keep behavior explicit, and keep the tool small enough that you can trust what it is doing.

That is a solid direction for database change management.

If you find gopgmigrate useful, consider giving the repo a star on GitHub. It helps more people discover the project.

Repository: https://github.com/hashmap-kz/gopgmigrate

Finding Hidden Bottlenecks in Go Apps: A Lazy, Hacky, and Bruteforce Method

alexey.zh — Thu, 02 Apr 2026 07:23:37 +0000

When developing pgrwl - a PostgreSQL WAL receiver - performance is a critical concern.

Every part of the program must be predictable. There should be no hidden bottlenecks.

But what about:

typos that silently degrade performance?
missing tests that fail to catch inefficiencies?
slow logic introduced "just for now"?
accidental O(n^2) behavior?

These issues are often hard to detect.

The Problem

For instance, you may concatenate a huge template inside a loop, even though it could be built once outside the loop. And this will work fine until heavy load reveals it.
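
A toy version of that mistake, with made-up function names: both variants return the same string, but the slow one rebuilds a loop-invariant value on every iteration.

```go
package main

import (
	"fmt"
	"strings"
)

// buildHeader stands in for an expensive, loop-invariant computation.
func buildHeader(columns []string) string {
	return strings.Join(columns, ",") + "\n"
}

// renderSlow rebuilds the identical header for every row.
func renderSlow(columns, rows []string) string {
	var b strings.Builder
	for _, r := range rows {
		header := buildHeader(columns) // same result on every iteration
		b.WriteString(header)
		b.WriteString(r)
		b.WriteByte('\n')
	}
	return b.String()
}

// renderFast builds the header once, outside the loop.
func renderFast(columns, rows []string) string {
	header := buildHeader(columns)
	var b strings.Builder
	for _, r := range rows {
		b.WriteString(header)
		b.WriteString(r)
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	cols := []string{"id", "name", "email"}
	rows := []string{"1,ann,a@example.com", "2,bob,b@example.com"}
	fmt.Println("same output:", renderSlow(cols, rows) == renderFast(cols, rows))
}
```

Nothing fails and no test turns red, which is exactly why this kind of slowdown tends to survive until production load.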

Of course you should profile CPU/RAM, and there are a lot of great tools, but sometimes it's not enough.

The Bruteforce Idea

A lazy way to see the whole picture is to trace the execution time of every function, along with the total number of times that function was called.

Yes, this will not help you inspect every loop, nasty condition, or memory leak. But it can help a LOT to find the really heavily loaded functions as fast as possible, so you can then profile them deeply with more advanced tools.

The Solution

I wrote a KISS library called gotrackfunc that injects timing code into every single function in the whole project with one CLI command.

How it works

gotrackfunc injects timing code into all functions
Run your program
Apply load
Stop execution
Analyze the report

It's rough and primitive, but it works!!!

You MUST use a version control system, of course, so that you can discard
all these changes afterwards.

Usage And Example Output

# execute in directory of your project
gotrackfunc ./...

# run your app (a gotrackfunc.log file will be produced)
go run main.go

# make a report (turn gotrackfunc.log into readable form)
gotrackfunc summarize

In this example, I found that my Put() function is the slowest
part of the whole thing. So I can inspect it, refactor, optimize,
write more unit tests, write integration tests, and measure again.

FUNCTION                            CALLS  TOTAL_NS     TOTAL_SEC
--------                            -----  --------     ---------
storecrypt.Put                      70     23061361400  23.06
receivesuperv.uploadOneFile         35     11606918000  11.61
fsync.Fsync                         106    8813968000   8.81
xlog.processOneMsg                  4481   6818721600   6.82
xlog.processXLogDataMsg             4481   6814495400   6.81
xlog.CloseWalFile                   35     6561511500   6.56
xlog.closeAndRename                 35     6559979000   6.56
fsync.FsyncFname                    70     6525596900   6.53
receivesuperv.performUploads        2      3036884600   3.04
receivesuperv.uploadFiles           1      3034237400   3.03
fsync.FsyncFnameAndDir              35     435023800    0.44
xlog.WriteAtWalFile                 4481   208340500    0.21
cmd.mustInitPgrw                    1      201671300    0.20
xlog.NewPgReceiver                  1      201671300    0.20
xlog.SyncWalFile                    1      48771200     0.05
codec.Flush                         35     42261100     0.04
xlog.OpenWalFile                    36     16938600     0.02
xlog.createFileAndTruncate          36     11396900     0.01
storecrypt.ListInfo                 4      9813800      0.01
receivesuperv.performRetention      2      4906900      0.00
conv.ToUint64                       4482   2664000      0.00
receivesuperv.filterFilesToUpload   2      2647200      0.00
fsx.FileExists                      35     2628000      0.00
xlog.XLogSegmentOffset              8963   2083300      0.00
conv.Uint64ToInt64                  4517   1656300      0.00
cmd.loadConfig                      1      1563600      0.00
config.MustLoad                     1      1563600      0.00
config.mustLoadCfg                  1      1563600      0.00
pipe.CompressAndEncryptOptional     35     1278200      0.00
receivemetrics.AddWALBytesReceived  4481   1121600      0.00
codec.Close                         35     1001200      0.00
xlog.sendFeedback                   3      690700       0.00
xlog.findStreamingStart             1      608600       0.00
receivemode.Init                    1      608600       0.00
storecrypt.NewLocal                 1      522500       0.00
xlog.GetSlotInformation             2      522500       0.00
shared.SetupStorage                 1      522500       0.00
xlog.parseReadReplicationSlot       2      522500       0.00
cmd.mustInitStorageIfRequired       1      522500       0.00
codec.NewWriter                     35     504900       0.00
storecrypt.fullPath                 37     504000       0.00
xlog.XLogFileName                   36     42900        0.00
shared.InitOptionalHandlers         1      0            0.00
config.IsLocalStor                  3      0            0.00
jobq.Start                          1      0            0.00
receivemetrics.IncJobsExecuted      4      0            0.00
receivemetrics.IncWALFilesReceived  35     0            0.00
xlog.IsPartialXLogFileName          2      0            0.00
receivemetrics.ObserveJobDuration   4      0            0.00
xlog.ScanWalSegSize                 1      0            0.00
cmd.needSupervisorLoop              1      0            0.00
shared.getWriteExt                  1      0            0.00
storecrypt.transformsFromName       35     0            0.00
config.checkBackupConfig            1      0            0.00
receivemode.NewReceiveModeService   1      0            0.00
conv.ParseUint32                    2      0            0.00
storecrypt.isSupportedWriteExt      1      0            0.00
receivemode.NewReceiveController    1      0            0.00
receivemetrics.IncJobsSubmitted     4      0            0.00
config.checkMode                    1      0            0.00
storecrypt.encodePath               35     0            0.00
config.expandEnvsWithPrefix         1      0            0.00
config.checkLogConfig               1      0            0.00
xlog.IsPowerOf2                     1      0            0.00
aesgcm.NewChunkedGCMCrypter         1      0            0.00
shared.NewHTTPSrv                   1      0            0.00
xlog.XLogSegmentsPerXLogId          72     0            0.00
xlog.IsValidWalSegSize              1      0            0.00
config.checkMainConfig              1      0            0.00
config.checkStorageConfig           1      0            0.00
logger.Init                         1      0            0.00
receivesuperv.NewArchiveSupervisor  1      0            0.00
middleware.Middleware               6      0            0.00
xlog.existsTimeLineHistoryFile      1      0            0.00
xlog.IsXLogFileName                 2      0            0.00
config.IsExternalStor               1      0            0.00
receivesuperv.log                   83     0            0.00
cmd.App                             1      0            0.00
xlog.parseShowParameter             2      0            0.00
config.expand                       1      0            0.00
jobq.log                            8      0            0.00
xlog.updateLastFlushPosition        37     0            0.00
receivemetrics.IncWALFilesUploaded  35     0            0.00
strx.HeredocTrim                    1      0            0.00
cmd.checkPgEnvsAreSet               1      0            0.00
storecrypt.NewVariadicStorage       1      0            0.00
middleware.Chain                    1      0            0.00
xlog.SetStream                      1      0            0.00
jobq.Submit                         4      0            0.00
config.String                       1      0            0.00
config.validate                     1      0            0.00
jobq.NewJobQueue                    1      0            0.00
config.checkReceiverConfig          1      0            0.00
xlog.calculateCopyStreamSleepTime   3      0            0.00
middleware.SafeHandlerMiddleware    3      0            0.00
xlog.NewStream                      1      0            0.00
conv.Uint32ToInt32                  1      0            0.00
xlog.XLByteToSeg                    36     0            0.00
cmd.initMetrics                     1      0            0.00
xlog.GetShowParameter               2      0            0.00
shared.log                          1      0            0.00
xlog.log                            107    0            0.00
config.Cfg                          3      0            0.00
receivesuperv.filterOlderThan       2      0            0.00
storecrypt.decodePath               70     0            0.00
storecrypt.supportedExts            70     0            0.00
xlog.CurrentOpenWALFileName         37     0            0.00
config.checkStorageModifiersConfig  1      0            0.00
xlog.GetStartupInfo                 1      0            0.00

Conclusion

This approach is rough, primitive - and surprisingly effective.

Sometimes, brute force wins.

Backup Is Not Enough: A PostgreSQL Recovery Story

alexey.zh — Tue, 31 Mar 2026 13:57:06 +0000

This experiment is designed to test and validate the pgrwl tool in
real conditions: https://github.com/pgrwl/pgrwl

Instead of synthetic examples, we simulate a real-world failure and
verify that recovery actually works end-to-end.

Let’s do something slightly uncomfortable.

We're going to simulate a database crash and recovery.

Think disk failure. Whole server gone.

And then bring it back - byte for byte - as if nothing happened.

Not "some" data.
Not "close enough".

Everything.

The Myth of "Backups"

Most people think:

"I have a backup, so I'm safe."

That's... half true.

A base backup is just a snapshot - a frozen picture of your
database at one moment.

But databases don't sit still.

Every insert, update, delete - all of that happens after your
backup.

So where does that data live?

-> In WAL (Write-Ahead Log)

The Real Rule

If you remember one thing from this post, let it be this:

Recovery = Base Backup + WAL

Without WAL:

your backup is outdated
your data is incomplete
your recovery is a lie

The Experiment

Warning: this is not intended to be run in a production environment.

Note: For simplicity, both the database and the backup tool are running on the same machine. In production, you should never store backups on the same host where the database is running.

We'll prove this using a bunch of simple shell commands.

Note: env-vars are omitted for simplicity.

A full working script will be attached at the end of the article.

Step 1 --- Build a Database From Nothing

log "Initializing PostgreSQL cluster..."
initdb -D "$PGDATA" -A trust \
  --auth-local=trust \
  --auth-host=trust >/dev/null

log "Starting PostgreSQL..."
pg_ctl -D "$PGDATA" -l "$WORKDIR/pg.log" start >/dev/null
wait_for_postgres

log "Creating physical replication slot: $REPL_SLOT"
psql -d postgres -v ON_ERROR_STOP=1 \
  -c "select pg_create_physical_replication_slot('$REPL_SLOT');" >/dev/null

log "Creating test database: $DBNAME"
createdb "$DBNAME"

We start from zero.

Step 2 --- Start Capturing WAL

log "Writing pgrwl configuration..."
cat >"$PGRWL_CONFIG" <<EOF
{
  "main": {
    "listen_port": 7070,
    "directory": "$WAL_ARCHIVE_DIR"
  },
  "receiver": {
    "slot": "$REPL_SLOT",
    "no_loop": true
  },
  "log": {
    "level": "debug",
    "format": "text",
    "add_source": false
  }
}
EOF

log "Starting pgrwl receiver..."
pgrwl daemon -m receive -c "$PGRWL_CONFIG" >"$WORKDIR/pgrwl-receive.log" 2>&1 &

This starts pgrwl in receive mode.

Step 3 --- Take a Base Backup

log "Creating base backup..."
pgrwl backup -c "$PGRWL_CONFIG"

This is your snapshot in time.

The backup is created using the PostgreSQL replication protocol (i.e., no additional tools are required).

Step 4 --- Populate DB

log "Initializing pgbench data (scale=10 ~ about 1 million rows in pgbench_accounts)..."
pgbench -i -s 10 "$DBNAME"

log "Running pgbench workload..."
pgbench -c 4 -j 2 -t 200 "$DBNAME"

None of this data is in the base backup. As far as recovery is concerned, it exists ONLY in WAL.

Step 5 --- Save the Truth

log "Dumping cluster state before destruction..."
pg_dumpall --quote-all-identifiers --restrict-key=0 >"$WORKDIR/before.sql"

This dump becomes our ground truth.
After restore + WAL replay, we expect the cluster to match this state.

Step 6 --- Delete Everything

log "Stopping PostgreSQL and pgrwl receiver..."
stop_postgres
stop_pgrwl_receive

log "Removing original PGDATA to simulate data loss..."
rm -rf "$PGDATA"

Database gone.

No tables.
No data.
No second chances.

Only backup and WAL remain.

Step 7 --- Restore the Base Backup

log "Restoring PGDATA from base backup..."
pgrwl restore --dest="$PGDATA" -c "$PGRWL_CONFIG"

chmod 0750 "$PGDATA"
chown -R postgres:postgres "$PGDATA"

# recovery.signal tells PostgreSQL to start in archive recovery mode.
touch "$PGDATA/recovery.signal"

We are back to snapshot state only.

Step 8 --- Replay History

log "Starting pgrwl restore server..."
pgrwl daemon -m serve -c "$PGRWL_CONFIG" >"$WORKDIR/pgrwl-serve.log" 2>&1 &
PGRWL_SERVE_PID=$!

cat >>"$PGDATA/postgresql.conf" <<EOF
restore_command = 'pgrwl restore-command --serve-addr=127.0.0.1:7070 %f %p'
EOF

log "Starting restored PostgreSQL cluster..."
pg_ctl -D "$PGDATA" -l "$WORKDIR/postgres-restored.log" start >/dev/null

wait_for_postgres
wait_until_out_of_recovery

pgrwl now runs in serve mode to answer restore_command requests. Once the cluster starts, PostgreSQL begins replaying WAL.

It is replaying history.

Every insert.
Every update.
Every commit.

Reconstructed from WAL.
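To get a feel for what PostgreSQL expects from restore_command: it runs the command once per WAL file it needs, substitutes %f with the file name and %p with the path to copy it to, and treats a non-zero exit status as "this file is not available". The sketch below is a plain-directory stand-in for that contract, not how pgrwl serves files. The function name restore_wal_from_dir and the archive layout are assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for a restore_command: copy one WAL file from a
# plain directory archive. PostgreSQL would invoke such a script roughly as
#   restore_command = '/path/to/script ARCHIVE_DIR %f %p'
# where %f is the WAL file name and %p is the destination path.
restore_wal_from_dir() {
  archive_dir=$1
  wal_name=$2      # value of %f
  dest_path=$3     # value of %p

  # A missing file must yield a non-zero status, so PostgreSQL knows the
  # archive does not have it.
  [ -f "$archive_dir/$wal_name" ] || return 1

  # Copy to a temporary name first so a partial file is never left at %p.
  cp "$archive_dir/$wal_name" "$dest_path.tmp" || return 1
  mv "$dest_path.tmp" "$dest_path"
}

# Demo with throwaway directories (no PostgreSQL needed).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/archive" "$demo_dir/pg_wal"
printf 'fake wal bytes' >"$demo_dir/archive/000000010000000000000001"

restore_wal_from_dir "$demo_dir/archive" 000000010000000000000001 \
  "$demo_dir/pg_wal/RECOVERYXLOG" && echo "restored"
restore_wal_from_dir "$demo_dir/archive" 000000010000000000000002 \
  "$demo_dir/pg_wal/RECOVERYXLOG" || echo "not in archive"
```

The second call fails on purpose: during recovery PostgreSQL will ask for files the archive does not have, and an honest "no" is part of the job.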

Step 9 --- Did It Work?

log "Dumping cluster state after recovery..."
pg_dumpall --quote-all-identifiers --restrict-key=0 >"$WORKDIR/after.sql"

log "Comparing dumps..."
if diff -u "$WORKDIR/before.sql" "$WORKDIR/after.sql" >"$WORKDIR/dump.diff"; then
  log "SUCCESS: restored cluster matches original state"
  echo "before: $WORKDIR/before.sql"
  echo "after : $WORKDIR/after.sql"
  echo "diff  : $WORKDIR/dump.diff (empty)"
else
  echo
  echo "FAIL: restored cluster differs from original state"
  echo "See diff: $WORKDIR/dump.diff"
  exit 1
fi

If there is no diff:

We recovered every single transaction.

Not approximately. Not logically. Exactly.

Mental Model

Think Git:

backup = commit
WAL = commits after
recovery = replay commits
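The analogy can be played out as a toy model: a "base backup" that holds a single number, and a "WAL" file with the changes recorded after it. Everything here (file names, the delta format, the recover function) is invented for illustration and has nothing to do with the real WAL format.

```shell
#!/bin/sh
# Toy model of the analogy above; file names and the delta format are invented.
work=$(mktemp -d)

echo 100 >"$work/base_backup"          # snapshot: balance at backup time
printf '%s\n' +50 -30 +5 >"$work/wal"  # changes recorded after the snapshot

# Recovery = start from the snapshot and replay every logged change in order.
recover() {
  state=$(cat "$1")
  while read -r delta; do
    state=$((state + $delta))
  done <"$2"
  echo "$state"
}

echo "snapshot only:  $(cat "$work/base_backup")"
echo "snapshot + WAL: $(recover "$work/base_backup" "$work/wal")"
```

Lose the wal file and the best you can do is the snapshot value. Keep both and you get the latest state back.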

Final Thought

If you don't understand WAL, you don't understand PostgreSQL recovery.

Using the Docker environment for integration tests

Integration Tests

#!/usr/bin/env bash
set -Eeuo pipefail

# Set up the docker-compose environment
cd /tmp
git clone https://github.com/pgrwl/pgrwl.git
cd pgrwl/test/integration/environ
make restart

# Open an interactive shell inside the primary container
docker exec -it pg-primary bash

# Inside the container: switch to the postgres user and run the test
su - postgres
cd scripts/tests
bash 011-basic-flow.sh

Full Script

#!/usr/bin/env bash
set -Eeuo pipefail

###############################################################################
# Simple 'Point In Time Recovery' tutorial with pgrwl
#
# What this script demonstrates:
#
#   1. Start a fresh PostgreSQL cluster
#   2. Start pgrwl in WAL receiver mode
#   3. Take a base backup
#   4. Generate more data AFTER the base backup
#   5. Save a logical dump of the final database state
#   6. Destroy PGDATA (simulate disaster)
#   7. Restore from the base backup
#   8. Replay archived WAL files
#   9. Compare the restored database with the original state
#
# Main idea:
#
#   A base backup is only a snapshot at one point in time.
#   All changes made after that snapshot live in WAL.
#   To recover to the latest committed transaction, we need BOTH:
#
#     - the base backup
#     - the WAL generated after the backup
#
###############################################################################

###############################################################################
# Configuration
###############################################################################

PGDATA="/tmp/pgrwl-basic/pgdata"
WAL_ARCHIVE_DIR="/tmp/pgrwl-basic/wal-archive"
PGRWL_CONFIG="/tmp/pgrwl-basic/pgrwl-config.json"

DBNAME="bench"
REPL_SLOT="pgrwl_v5"

export PGHOST="localhost"
export PGPORT="5432"
export PGUSER="postgres"
export PGPASSWORD="postgres"

PGRWL_RECEIVE_PID=""
PGRWL_SERVE_PID=""

###############################################################################
# Small helper functions
###############################################################################

log() {
  printf '\n[%s] %s\n' "$(date '+%F %T')" "$*"
}

die() {
  echo "ERROR: $*" >&2
  exit 1
}

wait_for_postgres() {
  log "Waiting for PostgreSQL to accept connections..."
  for _ in $(seq 1 120); do
    if pg_isready -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
  done
  die "PostgreSQL did not become ready in time"
}

wait_until_out_of_recovery() {
  log "Waiting for PostgreSQL to finish recovery..."
  for _ in $(seq 1 120); do
    if psql -d postgres -Atqc "select pg_is_in_recovery()" 2>/dev/null | grep -q '^f$'; then
      return 0
    fi
    sleep 1
  done
  die "PostgreSQL did not finish recovery in time"
}

stop_postgres() {
  if [[ -d "$PGDATA" ]]; then
    log "Stopping PostgreSQL..."
    pg_ctl -D "$PGDATA" -m immediate stop >/dev/null 2>&1 || true
  fi
}

stop_pgrwl_receive() {
  if [[ -n "${PGRWL_RECEIVE_PID:-}" ]]; then
    log "Stopping pgrwl receiver..."
    kill "$PGRWL_RECEIVE_PID" >/dev/null 2>&1 || true
    wait "$PGRWL_RECEIVE_PID" >/dev/null 2>&1 || true
    PGRWL_RECEIVE_PID=""
  fi
}

stop_pgrwl_serve() {
  if [[ -n "${PGRWL_SERVE_PID:-}" ]]; then
    log "Stopping pgrwl restore server..."
    kill "$PGRWL_SERVE_PID" >/dev/null 2>&1 || true
    wait "$PGRWL_SERVE_PID" >/dev/null 2>&1 || true
    PGRWL_SERVE_PID=""
  fi
}

cleanup() {
  stop_pgrwl_receive
  stop_pgrwl_serve
}
trap cleanup EXIT

###############################################################################
# Phase 0. Start from a clean state
###############################################################################

log "Cleaning up old processes and files..."
sudo pkill -9 postgres || true
sudo pkill -9 pgrwl || true
sudo rm -rf "/tmp/pgrwl-basic"

log "Preparing work directory: /tmp/pgrwl-basic"
mkdir -p "/tmp/pgrwl-basic" "$WAL_ARCHIVE_DIR"

###############################################################################
# Phase 1. Create and start a fresh PostgreSQL cluster
###############################################################################

log "Initializing PostgreSQL cluster..."
initdb -D "$PGDATA" -A trust --auth-local=trust --auth-host=trust >/dev/null

cat >>"$PGDATA/postgresql.conf" <<EOF
listen_addresses      = '*'

# Settings required for WAL streaming / archiving style workflows
wal_level                = replica
max_wal_senders          = 10
max_replication_slots    = 10
wal_keep_size            = 64MB

# Durability settings
fsync                    = on
synchronous_commit       = on
full_page_writes         = on

# Basic logging settings
log_directory            = '/tmp/pgrwl-basic'
log_filename             = 'pg.log'
log_lock_waits           = on
log_temp_files           = 0
log_checkpoints          = on
log_connections          = off
log_destination          = 'stderr'
log_error_verbosity      = 'DEFAULT' # TERSE, DEFAULT, VERBOSE
log_hostname             = off
log_min_messages         = 'WARNING' # DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, PANIC
log_timezone             = 'Asia/Aqtau'
log_line_prefix          = '%t [%p-%l] %r %q%u@%d '
EOF

log "Starting PostgreSQL..."
pg_ctl -D "$PGDATA" -l "/tmp/pgrwl-basic/pg.log" start >/dev/null
wait_for_postgres

log "Creating physical replication slot: $REPL_SLOT"
psql -d postgres -v ON_ERROR_STOP=1 \
  -c "select pg_create_physical_replication_slot('$REPL_SLOT');" >/dev/null

log "Creating test database: $DBNAME"
createdb "$DBNAME"

###############################################################################
# Phase 2. Configure and start pgrwl in receive mode
###############################################################################

log "Writing pgrwl configuration..."
cat >"$PGRWL_CONFIG" <<EOF
{
  "main": {
    "listen_port": 7070,
    "directory": "$WAL_ARCHIVE_DIR"
  },
  "receiver": {
    "slot": "$REPL_SLOT",
    "no_loop": true
  },
  "log": {
    "level": "debug",
    "format": "text",
    "add_source": false
  }
}
EOF

log "Starting pgrwl receiver..."
pgrwl daemon -m receive -c "$PGRWL_CONFIG" >"/tmp/pgrwl-basic/pgrwl-receive.log" 2>&1 &
PGRWL_RECEIVE_PID=$!

# Give the receiver a moment to connect and begin streaming.
sleep 3

###############################################################################
# Phase 3. Take a base backup
###############################################################################

log "Creating base backup..."
pgrwl backup -c "$PGRWL_CONFIG"

###############################################################################
# Phase 4. Generate data AFTER the base backup
#
# This is the important part.
# If we recover only from the base backup, these changes would be lost.
# They survive only because the WAL receiver captures the WAL stream.
###############################################################################

log "Initializing pgbench data (scale=10 ~ about 1 million rows in pgbench_accounts)..."
pgbench -i -s 10 "$DBNAME"

log "Running pgbench workload..."
pgbench -c 4 -j 2 -t 200 "$DBNAME"

###############################################################################
# Phase 5. Save the final logical state before disaster
#
# This dump becomes our ground truth.
# After restore + WAL replay, we expect the cluster to match this state.
###############################################################################

log "Dumping cluster state before destruction..."
pg_dumpall --quote-all-identifiers --restrict-key=0 >"/tmp/pgrwl-basic/before.sql"

###############################################################################
# Phase 6. Force PostgreSQL to emit final WAL and let receiver catch up
###############################################################################

log "Forcing checkpoint and WAL switch..."
psql -d postgres -v ON_ERROR_STOP=1 -c "checkpoint;" >/dev/null
psql -d postgres -v ON_ERROR_STOP=1 -c "select pg_switch_wal();" >/dev/null

# Give pgrwl time to receive the last WAL segment(s).
sleep 3

###############################################################################
# Phase 7. Simulate disaster
###############################################################################

log "Stopping PostgreSQL and pgrwl receiver..."
stop_postgres
stop_pgrwl_receive

log "Removing original PGDATA to simulate data loss..."
rm -rf "$PGDATA"

###############################################################################
# Phase 8. Restore the base backup
###############################################################################

log "Restoring PGDATA from base backup..."
pgrwl restore --dest="$PGDATA" -c "$PGRWL_CONFIG"

chmod 0750 "$PGDATA"
chown -R postgres:postgres "$PGDATA"

# recovery.signal tells PostgreSQL to start in archive recovery mode.
touch "$PGDATA/recovery.signal"

###############################################################################
# Phase 9. Start pgrwl in serve mode for restore_command
###############################################################################

log "Starting pgrwl restore server..."
pgrwl daemon -m serve -c "$PGRWL_CONFIG" >"/tmp/pgrwl-basic/pgrwl-serve.log" 2>&1 &
PGRWL_SERVE_PID=$!

cat >>"$PGDATA/postgresql.conf" <<EOF
restore_command = 'pgrwl restore-command --serve-addr=127.0.0.1:7070 %f %p'
EOF

###############################################################################
# Phase 10. Start restored PostgreSQL and let it replay WAL
###############################################################################

log "Starting restored PostgreSQL cluster..."
pg_ctl -D "$PGDATA" -l "/tmp/pgrwl-basic/postgres-restored.log" start >/dev/null

wait_for_postgres
wait_until_out_of_recovery

###############################################################################
# Phase 11. Dump restored state and compare
###############################################################################

log "Dumping cluster state after recovery..."
pg_dumpall --quote-all-identifiers --restrict-key=0 >"/tmp/pgrwl-basic/after.sql"

log "Comparing dumps..."
if diff -u "/tmp/pgrwl-basic/before.sql" "/tmp/pgrwl-basic/after.sql" >"/tmp/pgrwl-basic/dump.diff"; then
  log "SUCCESS: restored cluster matches original state"
  echo "before: /tmp/pgrwl-basic/before.sql"
  echo "after : /tmp/pgrwl-basic/after.sql"
  echo "diff  : /tmp/pgrwl-basic/dump.diff (empty)"
else
  echo
  echo "FAIL: restored cluster differs from original state"
  echo "See diff: /tmp/pgrwl-basic/dump.diff"
  exit 1
fi

PostgreSQL Streaming WAL Archiver and a backup tool in Go (pgrwl)

alexey.zh — Sat, 28 Mar 2026 08:13:15 +0000

A production-grade, cloud-native PostgreSQL WAL archiving system designed for:

streaming WAL to S3 with compression, encryption, and retention
Kubernetes-native PostgreSQL backup workflows
zero data loss and reliable Point-in-Time Recovery (PITR)

Project

https://github.com/pgrwl/pgrwl

Features

WAL receiver (replication protocol)
Continuous WAL streaming
Backup to S3 (MinIO, AWS, etc.)
Backup to SFTP (backup server)
WAL compression (gzip, zstd-ready)
WAL encryption (AES-GCM)
WAL retention management
WAL monitoring and observability
Kubernetes & container ready
Helm chart support
YAML / JSON / ENV config
Lightweight single binary
Structured logging
Integration tests (containerized)
Unit tests
Backup automation (streaming basebackup)
Continuous backup for PostgreSQL

Key Capabilities

Streaming WAL

Uses PostgreSQL replication protocol
Supports synchronous WAL streaming
Enables zero data loss setups

Storage Backends

S3-compatible storage
SFTP backup servers

Compression + Encryption

The processing pipeline is reflected in the file name suffixes:

000000010000000000000001.gz.aes

Flow:

compress -> encrypt -> upload
download -> decrypt -> decompress
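One way to read the naming scheme above: each upload step appends a suffix, so the restore side can undo the steps by peeling suffixes off from right to left. The helper below is a hypothetical sketch that only prints the resulting plan. It assumes just the two suffixes from the example (.gz and .aes), and restore_plan is not a pgrwl command.

```shell
#!/bin/sh
# Hypothetical helper: derive the restore-side steps from an archived WAL
# file name such as 000000010000000000000001.gz.aes. It only prints a plan;
# no decryption or decompression is performed.
restore_plan() {
  name=$1
  steps="download"
  while :; do
    case $name in
      *.aes) steps="$steps -> decrypt";    name=${name%.aes} ;;
      *.gz)  steps="$steps -> decompress"; name=${name%.gz} ;;
      *)     break ;;
    esac
  done
  # What is left is the plain 24-character WAL segment name.
  echo "$name: $steps"
}

restore_plan 000000010000000000000001.gz.aes
restore_plan 000000010000000000000001.gz
restore_plan 000000010000000000000001
```

Keeping the pipeline in the name means a file can be restored correctly even if the configuration changed after it was uploaded.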

Architecture

PostgreSQL
   | (replication protocol)
WAL Receiver
   |
Local FS (fsync)
   |
Uploader (S3 / SFTP)
   |
Retention manager
   |
HTTP server (restore_command)

Continuous Backup

real-time WAL streaming
safe off-site storage
full PITR support
near-zero RPO

Kubernetes Ready

run as StatefulSet
works with StatefulSets / CNPG / Virtual Machines
deploy via Helm
GitOps-friendly

Configuration Example

main:
  listen_port: 7070
  directory: wals
receiver:
  slot: pgrwl_v5
log:
  level: trace
  format: text
  add_source: true

Testing

integration tests with real PostgreSQL containers
end-to-end WAL validation
unit-tested components

Why pgrwl?

simple deployment (single binary)
production-grade reliability
cloud-native design
built for Kubernetes and containers
secure and efficient WAL handling

Contribute

Star the repo (https://github.com/pgrwl/pgrwl)
Open issues (https://github.com/pgrwl/pgrwl/issues)
Suggest improvements
Submit PRs (https://github.com/pgrwl/pgrwl/blob/master/CONTRIBUTING.md)

Summary

pgrwl is a lightweight, powerful, production-ready WAL archiving solution that brings:

streaming
security
automation
observability

to PostgreSQL backups.

Patch-based, environment-aware Kubernetes deployments using plain YAML and zero templating

alexey.zh — Wed, 25 Jun 2025 14:18:53 +0000

Meet kubepatch — a simple tool for deploying Kubernetes manifests using a patch-based approach.

Unlike tools that embed logic into YAML or require custom template languages, kubepatch keeps your base manifests clean and idiomatic.

Simple: No templates, DSLs, or logic in YAML, zero magic
Predictable: No string substitutions or regex hacks
Safe: Only native Kubernetes YAML manifests - readable, valid, untouched
Layered: Patch logic is externalized and explicit via JSON Patch (RFC 6902)
Declarative: Cross-environment deployment with predictable, understandable changes

🛠 Example

Given a base set of manifests for deploying a basic microservice (see examples):

---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  type: NodePort
  selector:
    app: myapp
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: "localhost:5000/restapiapp:latest"

A patches/prod.yaml might look like:

myapp-prod:
  deployment/myapp:
    - op: replace
      path: /spec/replicas
      value: 2
    - op: replace
      path: /spec/template/spec/containers/0/image
      value: "localhost:5000/restapiapp:1.21"
    - op: add
      path: /spec/template/spec/containers/0/env
      value:
        - name: RESTAPIAPP_VERSION
          value: prod
        - name: LOG_LEVEL
          value: info
    - op: add
      path: /spec/template/spec/containers/0/resources
      value:
        limits:
          cpu: "500m"
          memory: "512Mi"
        requests:
          cpu: "64m"
          memory: "128Mi"
  service/myapp:
    - op: add
      path: /spec/ports/0/nodePort
      value: 30266

A patches/dev.yaml might look like:

myapp-dev:
  deployment/myapp:
    - op: replace
      path: /spec/template/spec/containers/0/image
      value: "localhost:5000/restapiapp:1.22"
    - op: add
      path: /spec/template/spec/containers/0/env
      value:
        - name: RESTAPIAPP_VERSION
          value: dev
        - name: LOG_LEVEL
          value: debug
  service/myapp:
    - op: add
      path: /spec/ports/0/nodePort
      value: 30265

Apply the appropriate patch set based on the target environment.

kubepatch patch -f base/ -p patches/dev.yaml | kubectl apply -f -

The rendered manifest may look like this (note that the labels are set and all patches are applied):

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: myapp-dev
  name: myapp-dev
spec:
  ports:
    - nodePort: 30265
      port: 8080
      protocol: TCP
      targetPort: 8080
  selector:
    app: myapp-dev
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: myapp-dev
  name: myapp-dev
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp-dev
  template:
    metadata:
      labels:
        app: myapp-dev
    spec:
      containers:
        - env:
            - name: RESTAPIAPP_VERSION
              value: dev
            - name: LOG_LEVEL
              value: debug
          image: localhost:5000/restapiapp:1.22
          name: myapp

Installation

Manual Installation

Download the latest binary for your platform from the Releases page.
Place the binary in your system's PATH (e.g., /usr/local/bin).

Installation script

(
set -euo pipefail

OS="$(uname | tr '[:upper:]' '[:lower:]')"
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')"
TAG="$(curl -s https://api.github.com/repos/kubepatch/kubepatch/releases/latest | jq -r .tag_name)"

curl -L "https://github.com/kubepatch/kubepatch/releases/download/${TAG}/kubepatch_${TAG}_${OS}_${ARCH}.tar.gz" |
tar -xzf - -C /usr/local/bin && \
chmod +x /usr/local/bin/kubepatch
)

Package-based installation (suitable for CI/CD)

Debian

sudo apt update -y && sudo apt install -y curl
curl -LO https://github.com/kubepatch/kubepatch/releases/latest/download/kubepatch_linux_amd64.deb
sudo dpkg -i kubepatch_linux_amd64.deb

Alpine Linux

apk update && apk add --no-cache bash curl
curl -LO https://github.com/kubepatch/kubepatch/releases/latest/download/kubepatch_linux_amd64.apk
apk add kubepatch_linux_amd64.apk --allow-untrusted

✨ Key Features

JSON Patch Only

Patches are applied using JSON Patch:

- op: replace
  path: /spec/replicas
  value: 1

Every patch is minimal, explicit, and easy to understand. No string manipulation or text templating involved.

Plain Kubernetes YAML Manifests

Your base manifests are 100% pure Kubernetes objects - no logic, no annotations, no overrides, no preprocessing. This
ensures:

Easy editing
Compatibility with other tools
Clean Git diffs

Cross-Environment Deploys

Deploy to dev, staging, or prod just by selecting the right set of patches. All logic lives in patch files, not
your base manifests.

Common Labels Support

Inject common labels (like env, team, app), including deep paths like pod templates and selectors.

Env Var Substitution (in Patch Values Only)

You can inject secrets and configuration values directly into patch files:

- op: add
  path: /spec/template/spec/containers/0/env
  value:
    - name: PGPASSWORD
      value: ${IAM_SERVICE_PGPASS}

Strict env-var substitution (prefix-based) is only allowed inside patches - never in base manifests.
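The exact rules kubepatch applies are not spelled out here, so the following is only a sketch of what prefix-based, strict substitution could look like. It assumes that only ${NAME} references whose name starts with an allowed prefix are replaced, that a matching but unset variable is an error, and that everything else is left untouched. The subst_prefixed function and the IAM_ prefix are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of prefix-based env-var substitution; not kubepatch code.
# Reads text on stdin, replaces ${NAME} only when NAME starts with the prefix
# given as $1, and fails if such a variable is not set in the environment.
subst_prefixed() {
  awk -v prefix="$1" '
    {
      out = ""
      line = $0
      while (match(line, /[$][{][A-Za-z_][A-Za-z0-9_]*[}]/)) {
        name = substr(line, RSTART + 2, RLENGTH - 3)
        out = out substr(line, 1, RSTART - 1)
        if (index(name, prefix) == 1) {
          if (!(name in ENVIRON)) {
            print "unset variable: " name > "/dev/stderr"
            exit 1
          }
          out = out ENVIRON[name]
        } else {
          out = out substr(line, RSTART, RLENGTH)   # not ours, keep as is
        }
        line = substr(line, RSTART + RLENGTH)
      }
      print out line
    }'
}

IAM_SERVICE_PGPASS=s3cret
export IAM_SERVICE_PGPASS
printf '%s\n' 'value: ${IAM_SERVICE_PGPASS}' 'other: ${HOME}' | subst_prefixed IAM_
```

Restricting substitution to a prefix keeps unrelated variables from leaking into manifests by accident.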

Feedback

Have a feature request or issue? Feel free to open an issue or submit a PR!

Apply Kubernetes Manifests Atomically With Rollback

alexey.zh — Sat, 21 Jun 2025 06:57:27 +0000

katomik - Atomic Apply for Kubernetes Manifests with Rollback Support.

Applies multiple Kubernetes manifests with all-or-nothing guarantees. Like kubectl apply -f, but transactional:
if any resource fails to apply or become ready, all previously applied resources are rolled back automatically.

GitHub repo: https://github.com/hashmap-kz/katomik

Features

Atomic behavior: Applies multiple manifests as a unit. If anything fails, restores the original state.
Server-Side Apply (SSA): Uses PATCH with SSA to minimize conflicts and preserve intent.
Status tracking: Waits for all resources to become Current (Ready/Available) before succeeding.
Rollback support: Automatically restores previous state if apply or wait fails.
Recursive: Like kubectl, supports directories and -R for recursive traversal.
STDIN support: Use -f - to read from stdin.

Installation

Manual Installation

Download the latest binary for your platform from the Releases page.
Place the binary in your system's PATH (e.g., /usr/local/bin).

Installation script

(
set -euo pipefail

OS="$(uname | tr '[:upper:]' '[:lower:]')"
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')"
TAG="$(curl -s https://api.github.com/repos/hashmap-kz/katomik/releases/latest | jq -r .tag_name)"

curl -L "https://github.com/hashmap-kz/katomik/releases/download/${TAG}/katomik_${TAG}_${OS}_${ARCH}.tar.gz" |
tar -xzf - -C /usr/local/bin && \
chmod +x /usr/local/bin/katomik
)

Homebrew installation

brew tap hashmap-kz/homebrew-tap
brew install katomik

Usage

# Apply multiple files atomically
katomik apply -f manifests/

# Read from stdin
katomik apply -f - < all.yaml

# Apply recursively
katomik apply -R -f ./deploy/

# Set a custom timeout (default: 5m)
katomik apply --timeout 2m -f ./manifests/

# Process and apply a manifest located on a remote server
katomik apply \
  -f https://raw.githubusercontent.com/user/repo/refs/heads/master/manifests/deployment.yaml

Example Output

# katomik apply -f test/integration/k8s/manifests/

┌───────────────────────────────────────┬──────────────┐
│               RESOURCE                │  NAMESPACE   │
├───────────────────────────────────────┼──────────────┤
│ Namespace/katomik-test                │ (cluster)    │
│ ConfigMap/postgresql-init-script      │ katomik-test │
│ ConfigMap/postgresql-envs             │ katomik-test │
│ ConfigMap/postgresql-conf             │ katomik-test │
│ Service/postgres                      │ katomik-test │
│ PersistentVolumeClaim/postgres-data   │ katomik-test │
│ StatefulSet/postgres                  │ katomik-test │
│ ConfigMap/prometheus-config           │ katomik-test │
│ PersistentVolumeClaim/prometheus-data │ katomik-test │
│ Service/prometheus                    │ katomik-test │
│ StatefulSet/prometheus                │ katomik-test │
│ PersistentVolumeClaim/grafana-data    │ katomik-test │
│ Service/grafana                       │ katomik-test │
│ ConfigMap/grafana-datasources         │ katomik-test │
│ Deployment/grafana                    │ katomik-test │
└───────────────────────────────────────┴──────────────┘

+ watching
| Service/grafana                       katomik-test Unknown
| Deployment/grafana                    katomik-test Unknown
| StatefulSet/postgres                  katomik-test InProgress
| StatefulSet/prometheus                katomik-test InProgress
+ watching

✓ Success

Quick Start

cd test/integration/k8s
bash 00-setup-kind.sh
katomik apply -f manifests/

🔒 Rollback Guarantees

On failure (bad manifest, missing dependency, timeout, etc.):

Existing objects are reverted to their exact pre-apply state.
New objects are deleted.

This guarantees your cluster remains consistent - no partial updates.
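The two rollback rules can be illustrated with a file-based analogy. This is not katomik's implementation and it does not talk to Kubernetes: the "cluster" and the "manifests" are plain directories with one file per object, object names are assumed to contain no spaces, and a manifest whose content is the word invalid simulates a failed apply. Readiness waiting is left out entirely.

```shell
#!/bin/sh
# File-based analogy of all-or-nothing apply; NOT katomik's implementation.
# "Cluster" and "manifests" are plain directories, one file per object.
# A manifest whose content is the word "invalid" simulates a failed apply.
apply_atomically() {
  cluster=$1
  manifests=$2
  backup=$(mktemp -d)
  created=""

  for m in "$manifests"/*; do
    obj=$(basename "$m")
    if [ -e "$cluster/$obj" ]; then
      cp "$cluster/$obj" "$backup/$obj"      # remember pre-apply state
    else
      created="$created $obj"                # remember what is new
    fi
    if grep -qx invalid "$m"; then
      # Roll back: restore changed objects, delete newly created ones.
      for b in "$backup"/*; do
        if [ -e "$b" ]; then cp "$b" "$cluster/$(basename "$b")"; fi
      done
      for new_obj in $created; do
        rm -f "$cluster/$new_obj"
      done
      rm -rf "$backup"
      return 1
    fi
    cp "$m" "$cluster/$obj"
  done
  rm -rf "$backup"
}

# Demo: one existing object gets updated, one new object gets created,
# then the third manifest fails and everything is reverted.
demo=$(mktemp -d)
mkdir "$demo/cluster" "$demo/manifests"
echo v1 >"$demo/cluster/a-config"
echo v2 >"$demo/manifests/a-config"
echo new >"$demo/manifests/b-service"
echo invalid >"$demo/manifests/c-broken"

apply_atomically "$demo/cluster" "$demo/manifests" || echo "apply failed, rolled back"
ls "$demo/cluster"
```

After the failed run the "cluster" directory holds only a-config with its original content, which is the file-level equivalent of "no partial updates".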

Flags

Flag	Description
`-f`	File, directory, or `-` for stdin
`-R`	Recurse into directories
`--timeout`	Timeout to wait for readiness

Feedback

Have a feature request or issue? Feel free to open an issue
or submit a PR!
