There is a category of bug that does not show up in your test suite, does not trigger an alert, and does not produce a stack trace. It looks like this: the wrong version of your content is running in production, and you have no reliable way to prove otherwise.
For most applications, this is embarrassing. For software in regulated industries (medical devices, industrial systems, certified training applications, etc.) it can be a compliance failure with real consequences.
This post is about why this happens, why the obvious fixes do not actually fix it, and what a correct architecture looks like.
The problem with treating content like database state
Most content pipelines work roughly like this: content lives somewhere editable (a CMS, a database, Notion, a spreadsheet), a build process or runtime query pulls it out, and the application delivers it to users.
The fundamental assumption baked into this model is that "current content" means "whatever is in the database right now." That assumption is fine for a marketing website where you want changes to go live immediately. It is quietly disastrous for applications where what was delivered to a user needs to be auditable, reproducible, and tied to a specific approval event.
Consider a company building certified training software for medical device manufacturers. Their content — the training material that end users complete to be certified on a device — must reflect what was reviewed and approved by the manufacturer. If an editor makes a change in the CMS, saves it, and that change goes live immediately in the training application, you have a pipeline where:
- There is no reliable record of what content was actually delivered to a specific user session
- An approved state can be silently overwritten by any subsequent edit
- Content for the next operating procedure revision cannot be developed safely, because any edit risks contaminating current production content
- An audit asking "what exactly did this user see on this date?" cannot be answered with certainty
None of these are edge cases in regulated software. They are exactly the questions that certification and compliance processes ask.
Why the obvious solutions fall short
"We'll just add an approval workflow to our CMS."
While restricting publication can solve the immediate problem of controlling which content reaches production, it cannot prevent future content revisions from overwriting existing published content. That makes it impossible to operate multiple precisely defined revisions in parallel as end users gradually switch to new procedures at their own pace.
"We'll use database snapshots or backups."
Snapshots are a disaster recovery mechanism, not an audit trail. They are coarse-grained, difficult to query selectively, and not designed to answer "show me exactly what field X contained for entry Y at timestamp T and who approved it."
"We'll version our database records."
You can build this. It requires significant custom engineering, and you will need to think carefully about referential integrity across versions, query complexity for fetching a consistent snapshot of related content at a given version, and how to expose all of this to non-technical content editors without causing confusion. Going down this path is a surefire way to sink development effort into a never-ending series of edge cases, effort that could have gone into building out your core product instead.
"We'll bake content into the application binary at build time."
This solves the "content changed at runtime" problem but introduces a different one: the only way to update content is to rebuild and redeploy the entire application. Iteration speed becomes painful. More importantly, you still need a way to manage and audit what goes into the build — the source of truth upstream of the binary is still mutable.
What a correct architecture actually requires
The properties you need are specific and worth stating precisely:
Immutability by reference. A given version of your content, once approved, must be permanently retrievable by a stable identifier. Not "the current state of approved content" but "the exact state of content as of approval event #4471."
Referential consistency. If your content model has relationships (e.g. a training module references a set of questions, which reference a set of answer options), fetching a specific version must return a consistent snapshot of the entire graph, not a mix of versions from different points in time.
A separation between working state and production state. Authors need to be able to work on future revisions without those changes being visible to production consumers. This is a branching problem, not a permissions problem.
An audit trail that is structural, not appended. The history of what changed, when, and as a result of what approval event should be intrinsic to the storage model, not a log table bolted on afterward.
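To make these properties concrete, here is a deliberately minimal in-memory sketch (all names are hypothetical, not any real storage engine): snapshots are immutable and retrievable by a stable, content-derived id, while refs such as "production" or "draft" merely point at snapshot ids, keeping working state and production state separate.

```typescript
import { createHash } from "node:crypto";

// A snapshot is a consistent state of the whole content graph at one point in time.
type Snapshot = Record<string, unknown>;

const snapshots = new Map<string, Snapshot>(); // id -> immutable snapshot
const refs = new Map<string, string>();        // ref name -> snapshot id

// Storing a snapshot returns a content-derived id; the stored object is frozen,
// so an approved state can never be silently overwritten.
function commit(content: Snapshot): string {
  const id = createHash("sha256").update(JSON.stringify(content)).digest("hex");
  snapshots.set(id, Object.freeze(structuredClone(content)));
  return id;
}

function getByRef(ref: string): Snapshot {
  const id = refs.get(ref);
  if (id === undefined) throw new Error(`unknown ref: ${ref}`);
  return snapshots.get(id)!;
}

// Approved v1 content is pinned for production; editors keep working on a draft ref.
const v1 = commit({ module: "Device training v1" });
refs.set("production", v1);
refs.set("draft", commit({ module: "Device training v2 (unreviewed)" }));

// Production consumers see only what the production ref points at.
console.log(getByRef("production").module); // "Device training v1"
```

The point of the sketch is the shape of the model, not the implementation: consumers address content by ref, and refs only ever move as an explicit operation.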
Git already solved this problem for code
If you step back, these are exactly the properties that Git provides for source code:
- Every commit is content-addressed and permanently immutable
- A commit captures a consistent snapshot of the entire tree at a point in time
- Branches allow parallel workstreams without interference
- The commit graph is a structural, tamper-evident audit trail
- A specific historical state is always retrievable by commit hash or tag
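The first and last of these properties follow from content addressing: a commit id is derived from the content it points to, so identical content always yields the identical id, and any change yields a different one. A toy illustration of the idea (this is not Git's actual object format):

```typescript
import { createHash } from "node:crypto";

// Toy content addressing (not Git's actual object format): the id of a
// content tree is a hash of its serialized form.
function contentId(tree: object): string {
  return createHash("sha256").update(JSON.stringify(tree)).digest("hex");
}

const approved = { module: "Cleaning procedure", questions: ["Q1", "Q2"] };
const tampered = { module: "Cleaning procedure", questions: ["Q1", "Q2 (edited)"] };

// Identical content always produces the identical id...
console.log(contentId(approved) === contentId({ ...approved })); // true
// ...and any change, however small, produces a different id.
console.log(contentId(approved) !== contentId(tampered));        // true
```

This is why the audit trail is tamper-evident: you cannot alter historical content without changing the ids that everything downstream refers to.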
The reason we do not instinctively apply this to content is partly historical (Git tooling was built for developers, not editors) and partly because the content model is usually relational in ways that flat files handle awkwardly.
But the core insight holds: if your content were stored in Git, you would get immutability, branching, and audit trail for free, because those are Git's foundational properties — not features you add on top.
Making this practical
The gap between "store content in Git" and "have a usable content pipeline" is real. A few things need to exist:
A schema that defines your content model formally. Ad hoc JSON or YAML files in a repository are not enough. You need a schema that defines types, relationships, and constraints, so that content can be validated and queried consistently.
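As an illustration, a schema for the training example might look like the following GraphQL-style SDL. This is a hypothetical sketch; concrete type names and schema conventions will vary by tooling.

```graphql
type TrainingModule {
  id: ID!
  title: String!
  questions: [Question!]!   # references to related entries
}

type Question {
  id: ID!
  text: String!
  options: [AnswerOption!]!
}

type AnswerOption {
  id: ID!
  text: String!
  correct: Boolean!
}
```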
An API layer that understands the content model. Content consumers (your application, your build pipeline) should not parse content files directly; that pushes content-handling code into every consumer. Instead, consumers should query a typed API that resolves references, enforces the schema, and lets them specify which version of the content they want by ref (a branch name, a tag, or a specific commit hash).
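On the consumer side, this can be as simple as a helper that builds a GraphQL request pinned to a ref. The query shape and field names below are hypothetical, chosen for the training example, and do not describe any specific product's API:

```typescript
// Hypothetical consumer-side helper: build a GraphQL request that pins the
// content version by ref (branch name, tag, or commit hash).
interface GraphQLRequest {
  query: string;
  variables: { ref: string; id: string };
}

function buildModuleQuery(ref: string, moduleId: string): GraphQLRequest {
  return {
    query: `
      query Module($ref: String!, $id: ID!) {
        trainingModule(ref: $ref, id: $id) {
          title
          questions { text options { text correct } }
        }
      }`,
    variables: { ref, id: moduleId },
  };
}

// Pinning the request to an approved tag means the response can only ever
// contain the content state of that approval event.
const req = buildModuleQuery("approved-2024-06-01", "module-cleaning");
console.log(req.variables.ref); // "approved-2024-06-01"
```

The design choice worth noting: the version is a parameter of every query, not ambient state, so "which content is this?" always has an explicit answer.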
A way to express approval in Git terms. The most natural model: content development happens on feature branches, a pull request represents the review and approval event, and merging to a production branch (or creating a tag) is the act of approving content for delivery. This is a well-understood workflow that countless engineering teams are already following.
An editing interface that shields complexity from non-technical editors. Authors should neither need to know what a commit is, nor have to worry about getting content to match the schema. They need a schema-conformant, form-based UI that saves their changes, and a clear way to submit content for review. The schema validation and Git operations happen underneath.
When this is in place, your application can query content by specifying a production tag or branch. What it receives is guaranteed to be exactly what was approved, not because of a runtime check, but because the storage model makes any other outcome structurally impossible.
The operational benefit beyond compliance
There is a practical benefit that has nothing to do with audits: you can safely develop next-version content in parallel with delivering current-version content to production, with zero risk of cross-contamination.
For the medical device training example: when a manufacturer releases a new revision of operating procedures, the training content for that revision can be developed, reviewed, and approved on a separate branch while the current certified training continues serving users unchanged. The two versions never interfere with each other. Switching production to the new version is a single operation (moving the tag or updating a ref) that is itself logged in the Git history.
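Sketched in toy TypeScript (names illustrative, not a real Git implementation), the switch is one atomic ref update, and the update itself becomes part of the record:

```typescript
// Toy sketch of a production switch: refs point at immutable revisions,
// and moving a ref is a single operation that is itself recorded.
const refs = new Map<string, string>([["production", "rev-1-commit"]]);
const auditLog: { ref: string; from: string | undefined; to: string; at: string }[] = [];

function moveRef(ref: string, to: string): void {
  auditLog.push({ ref, from: refs.get(ref), to, at: new Date().toISOString() });
  refs.set(ref, to);
}

// The new revision was developed and approved on its own branch; switching
// production to it is one ref update, with rollback being the same operation
// in reverse.
moveRef("production", "rev-2-commit");

console.log(refs.get("production")); // "rev-2-commit"
console.log(auditLog.length);        // 1
```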
This is not a compliance feature. It is just a sane way to manage content for software that has release cycles.
Where to go from here
If this problem pattern resonates with your context, the conceptual model I have described here is the foundation of Commitspark, a set of open-source tools I built that provides a GraphQL API over Git-backed, schema-defined content. It is one concrete implementation of these ideas, but the architectural principles apply regardless of what tooling you choose.
The more important takeaway is this: if your application delivers content that needs to be auditable, versioned, and approved — and you are currently managing that content in a mutable database or a CMS with no structural versioning — you have a gap worth closing before a compliance process asks you to close it for you.
Your feedback
If you have run into this problem of having to prove exactly which version of your content was in production, what were you building and how did you solve it?
Let me know in the comments.