Most apps that handle a lot of documents — client portals, legal and case systems, funding or insurance workflows, anything with uploads and approvals — quietly drift into the same anti-pattern. The files become the real state of the system. The "status" of a case is whatever the latest PDF in the folder says. Who's allowed to see what is decided by where a file lives. The database, if there is one, ends up as a thin index over a pile of documents.
It feels natural, and it's a trap. Here's the pattern we reach for instead, and why it pays off.
The rule: the database is the system of record; files are just managed assets
State lives in the database. Documents live in object storage. The database owns the truth — statuses, ownership, permissions, the activity history — and each file is represented by a row with metadata and access rules, not by its existence in a folder.
Concretely:
- Every domain object is a real record — case, request, task, agreement, billing state — with explicit status fields and transitions, not implied by which documents exist.
- Files are assets, referenced from the database. The blob sits in object storage; the row holds the metadata, the owner, the permissions and the link.
- Access is checked against the data model, by role and relationship — "can this user see this record," not just "is this user logged in."
- State changes are logged as events. Sensitive transitions write an activity record, so "who did what, when" is a first-class part of the system.
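Here is a minimal Python sketch of the first and last points, assuming a single case type with a linear review flow. The status names, the `ALLOWED_TRANSITIONS` table and the `transition` helper are hypothetical. The in-memory list stands in for an activity table. In a real service, the status update and the event row would be written in one database transaction.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class CaseStatus(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Hypothetical transition table: for each status, the statuses allowed to follow it.
ALLOWED_TRANSITIONS = {
    CaseStatus.DRAFT: {CaseStatus.SUBMITTED},
    CaseStatus.SUBMITTED: {CaseStatus.IN_REVIEW},
    CaseStatus.IN_REVIEW: {CaseStatus.APPROVED, CaseStatus.REJECTED},
    CaseStatus.APPROVED: set(),
    CaseStatus.REJECTED: set(),
}


@dataclass
class ActivityEvent:
    case_id: int
    actor_id: int
    from_status: CaseStatus
    to_status: CaseStatus
    at: datetime


@dataclass
class Case:
    id: int
    status: CaseStatus = CaseStatus.DRAFT


class InvalidTransition(Exception):
    pass


def transition(case: Case, to: CaseStatus, actor_id: int, log: list[ActivityEvent]) -> None:
    """Move a case to a new status, rejecting illegal moves and recording who made the change."""
    if to not in ALLOWED_TRANSITIONS[case.status]:
        raise InvalidTransition(f"{case.status.value} -> {to.value} is not allowed")
    # Record the event before mutating, so the log holds the old and new status.
    log.append(ActivityEvent(case.id, actor_id, case.status, to, datetime.now(timezone.utc)))
    case.status = to
```

Because the transition table is plain data, it can be reviewed and unit-tested on its own. That is hard to do when state is implied by which documents happen to exist.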
Why it's worth the extra work
Auditability. When the truth is in the database and transitions are logged, you can answer "what happened to this case and who touched it" precisely. A folder of files can't tell you that.
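Once every transition writes a record, the audit question becomes a filter over that log. The sketch below assumes a hypothetical `ActivityRecord` shape with free-text action labels. In Postgres, this would be a `WHERE case_id = ... ORDER BY` query against the activity table rather than a Python loop.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActivityRecord:
    case_id: int
    actor: str
    action: str  # hypothetical free-text label, e.g. "file:viewed"
    at: datetime


def case_history(log: list[ActivityRecord], case_id: int) -> list[ActivityRecord]:
    """Every recorded action on one case, oldest first: 'what happened to this case'."""
    return sorted((r for r in log if r.case_id == case_id), key=lambda r: r.at)


def who_touched(log: list[ActivityRecord], case_id: int) -> set[str]:
    """Distinct actors with any logged action on the case: 'who touched it'."""
    return {r.actor for r in log if r.case_id == case_id}
```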
Access control becomes possible. You can't reliably enforce "this paralegal sees only their assigned cases" when permission is a function of folder structure. When access is a check against records and relationships, you can.
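Here is a sketch of a role- and relationship-based check for the paralegal example. It assumes three hypothetical roles and an explicit assignment set on each case. The policy itself (admins see everything, paralegals see assigned cases, clients see their own) is an illustration, not a recommendation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    PARALEGAL = "paralegal"
    CLIENT = "client"


@dataclass(frozen=True)
class User:
    id: int
    role: Role


@dataclass
class CaseRecord:
    id: int
    client_id: int
    assignee_ids: set[int] = field(default_factory=set)


def can_view_case(user: User, case: CaseRecord) -> bool:
    """Decide access from the user's role AND their relationship to this specific record."""
    if user.role is Role.ADMIN:
        return True  # hypothetical policy: admins see every case
    if user.role is Role.PARALEGAL:
        return user.id in case.assignee_ids  # only cases they are assigned to
    if user.role is Role.CLIENT:
        return user.id == case.client_id  # only their own case
    return False  # deny anything the policy does not recognise
```

The check takes the record, not a folder path, so moving or renaming files cannot widen access.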
Integrity. Documents get renamed, re-uploaded and deleted. If they're your source of truth, every one of those operations can silently corrupt state. If they're assets referenced from authoritative records, you can replace them without losing the truth.
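Here is a sketch of a file as a managed asset. It assumes the bytes sit in S3-style storage under an opaque key, and it leaves out the storage upload itself. `FileAsset`, `rename`, `reupload` and `current_assets` are hypothetical names. Nothing here touches the case's status, which lives on the case record.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FileAsset:
    """Metadata row for one stored blob; the bytes live in object storage under storage_key."""
    case_id: int
    owner_id: int
    display_name: str
    doc_type: str
    # Opaque key, never derived from the user-facing filename.
    storage_key: str = field(default_factory=lambda: f"assets/{uuid.uuid4()}")
    uploaded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    superseded_by: Optional[str] = None


def rename(asset: FileAsset, new_name: str) -> None:
    # A rename is a metadata edit: the storage key and every reference to it stay valid.
    asset.display_name = new_name


def reupload(old: FileAsset, new_name: str) -> FileAsset:
    """Store a replacement as a new row and mark the old one superseded instead of deleting it."""
    new = FileAsset(old.case_id, old.owner_id, new_name, old.doc_type)
    old.superseded_by = new.storage_key
    return new


def current_assets(assets: list[FileAsset], case_id: int) -> list[FileAsset]:
    """The live (non-superseded) files attached to a case."""
    return [a for a in assets if a.case_id == case_id and a.superseded_by is None]
```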
You can actually reason about the system. Statuses, transitions and ownership in a schema are something you can query, validate and test. "The newest PDF wins" is not.
A minimal shape
You don't need anything exotic. A typical version:
- Postgres (or similar) as the single source of truth for records, status, permissions and events
- Object storage (S3-style) for the file bytes
- A metadata row per file: owner, related record, type, access scope, timestamps
- Role- and relationship-based access checks on every protected route
- An append-only activity log for sensitive state changes
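Here is the list above, sketched as a schema. It uses Python's built-in `sqlite3` as a stand-in for Postgres, so the example is self-contained. Table and column names are hypothetical. Postgres would use its own types and trigger syntax, or simply revoke `UPDATE` and `DELETE` on the log table. The two triggers make the activity log append-only at the database level rather than relying on application code.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cases (
    id        INTEGER PRIMARY KEY,
    status    TEXT NOT NULL
              CHECK (status IN ('draft', 'submitted', 'in_review', 'approved', 'rejected')),
    client_id INTEGER NOT NULL
);

-- Relationship table used by access checks ("is this user assigned to this case?").
CREATE TABLE case_assignments (
    case_id INTEGER NOT NULL REFERENCES cases(id),
    user_id INTEGER NOT NULL,
    PRIMARY KEY (case_id, user_id)
);

-- One metadata row per file; the bytes live in object storage under storage_key.
CREATE TABLE file_assets (
    id           INTEGER PRIMARY KEY,
    case_id      INTEGER NOT NULL REFERENCES cases(id),
    owner_id     INTEGER NOT NULL,
    doc_type     TEXT NOT NULL,
    access_scope TEXT NOT NULL,
    storage_key  TEXT NOT NULL UNIQUE,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE activity_log (
    id         INTEGER PRIMARY KEY,
    case_id    INTEGER NOT NULL REFERENCES cases(id),
    actor_id   INTEGER NOT NULL,
    action     TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Append-only: reject any attempt to rewrite or erase history.
CREATE TRIGGER activity_log_no_update BEFORE UPDATE ON activity_log
BEGIN SELECT RAISE(ABORT, 'activity_log is append-only'); END;

CREATE TRIGGER activity_log_no_delete BEFORE DELETE ON activity_log
BEGIN SELECT RAISE(ABORT, 'activity_log is append-only'); END;
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with foreign keys enforced and the schema created."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn
```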
The expensive-sounding parts — the event log, the per-record access checks — are cheap to add at the start and very painful to retrofit once files have already become your de-facto database.
The trade-offs (so this isn't a sales pitch)
This pattern is overkill for a simple file share or an internal tool three people use. If there are no real roles, no compliance surface and no need to audit, a folder and a spreadsheet are fine — don't build a state machine to store five PDFs.
It earns its keep the moment you have multiple roles, sensitive data, approvals, or anyone who will later ask "who saw this and when": legal, healthtech, fintech, B2B client portals, anything touching personal data under GDPR. That's exactly where the file-as-truth approach fails most expensively.
The one-line version
If a document determines what your system does — who can act, what state something is in, what happens next — that decision belongs in your database, with the file as a managed asset hanging off it. Treat files as the source of truth and you've built a filing cabinet. Treat the database as the source of truth and you've built a system.
I'm Anna Hartung, founder of H-Studio, an architecture-first engineering studio in Berlin. We build document- and workflow-heavy platforms where this distinction is usually the difference between a product and a mess.