Kerman

Posted on Oct 14

Why Model-First and where the Truth lives

#programming #database #architecture #software

Relational databases have one minor shortcoming. They do not render tabular data on the user’s screen. On one hand, that’s not really a problem. A database does database things and doesn’t wander into the UI, where it is not welcome. On the other hand, you must ferry data through another language, sometimes two or three in a web stack. And don’t forget the query language itself. It’s a language too.

All languages are tools to reduce digital chaos. They give you ways to collect scattered fields into one place and even name the resulting structure. In OOP languages that utility is part of the paradigm. Most server-side stacks between the DB and the UI are object oriented. Naturally developers reach for structural language tools to pack table fields into a class. That’s how DTOs, entities, POJOs, and other wrappers appear.
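
A minimal sketch of that habit in Python, using the standard `sqlite3` module. The `users` table, its columns, and the `UserDto` name are made up for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class UserDto:
    # Field names mirror the columns of the hypothetical "users" table.
    id: int
    email: str
    display_name: str


def load_users(conn: sqlite3.Connection) -> list[UserDto]:
    # The column order in the SELECT must match the field order of the class.
    rows = conn.execute("SELECT id, email, display_name FROM users ORDER BY id")
    return [UserDto(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, display_name TEXT)")
conn.execute("INSERT INTO users (email, display_name) VALUES ('ann@example.com', 'Ann')")
users = load_users(conn)
```

Note that the structure is now written down twice: once in `CREATE TABLE` and once in the class.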

It doesn’t sound like a big deal. We just build the structure in our favorite programming language, make it match the database, and stack our little blocks together, hoping they’ll spell “happiness.” It works for a while. Then the structures in code and in the database start to drift apart. And if they don’t, poke your project with a stick. It might be dead. A dead bird doesn’t sing, and a dead project doesn’t change its structure.
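
Here is a rough sketch of what drift looks like when you go looking for it. It compares the fields of a class with the columns SQLite reports through `PRAGMA table_info`. The `orders` table, the `Order` class, and the `find_drift` helper are hypothetical.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Order:
    id: int
    customer_id: int
    total_cents: int
    discount_cents: int  # added in code, never reached the database


def find_drift(conn, table, cls):
    """Return (fields missing in the DB, columns missing in the code)."""
    # Row layout of PRAGMA table_info: (cid, name, type, notnull, dflt_value, pk)
    db_columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    code_fields = {f.name for f in fields(cls)}
    return sorted(code_fields - db_columns), sorted(db_columns - code_fields)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total_cents INTEGER)"
)
# Meanwhile someone adds a column directly in the database.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT")

missing_in_db, missing_in_code = find_drift(conn, "orders", Order)
```

Both sides changed, each for a good reason, and neither knows about the other.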

It’s not that we haven’t learned how to solve this problem. We have, and quite well. For example, I once heard this approach: just make every change carefully in both the database and the code at the same time, and never make a mistake. Then nothing will drift out of sync. You laughed too, right?

Before we move on to practical solutions, I want to emphasize that this is a truly fundamental problem. It does not depend on the specific database, language, or framework. You can be the most advanced architect, the most pedantic senior, but if you work with a database, this problem will be there for you.

I am talking about relational data. Document databases have their own toys to play with. Schema drift there takes on truly twisted shapes. Objects in a collection gain fields over time. Consistency rules vary by ID ranges. Code accumulates workarounds and multiple deserializers for different vintage records because the format changed. And to top it off, teams sprinkle in on-the-fly migrations and versioned transformations.
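
The "multiple deserializers" part usually ends up looking something like the sketch below. The record shapes, the `schema_version` field, and the function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    full_name: str
    email: str


def from_v1(doc):
    # Oldest records: separate first/last name, no email at all.
    return Customer(full_name=f"{doc['first']} {doc['last']}", email="")


def from_v2(doc):
    # Newer records: a single name field plus email.
    return Customer(full_name=doc["full_name"], email=doc["email"])


DESERIALIZERS = {1: from_v1, 2: from_v2}


def load(doc):
    # Records written before versioning existed carry no version field.
    version = doc.get("schema_version", 1)
    return DESERIALIZERS[version](doc)


old = load({"first": "Ann", "last": "Lee"})
new = load({"schema_version": 2, "full_name": "Bob Ray", "email": "bob@example.com"})
```

Every format change adds one more entry to that dictionary, and none of them ever leaves.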

Relational databases behave differently. A schema change applies to all rows at once. If you add a column it appears for the earliest rows too, which forces compatibility with the new structure. Sometimes you still run a migration job to reshape historical data. But that single-version model spares you much of the multiversion headache. I treat this as static typing for data: it forces structure discipline. So let’s get back to relational systems.
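
A small demonstration with SQLite, assuming a made-up `articles` table: the row inserted before the schema change still comes back with the new column filled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO articles (title) VALUES ('First post')")

# The schema change applies to the whole table, including the existing row.
conn.execute("ALTER TABLE articles ADD COLUMN status TEXT NOT NULL DEFAULT 'draft'")
conn.execute("INSERT INTO articles (title, status) VALUES ('Second post', 'published')")

rows = conn.execute("SELECT id, title, status FROM articles ORDER BY id").fetchall()
```

There is no "old-format row" to special-case in the reading code. Every row has exactly the current shape.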

You have probably seen the interview question: “What’s the difference between code-first and db-first? Pros and cons?” It’s an interview favorite. Here’s my answer.

Approach	Advantage	Disadvantage
Code-first	Makes DB structure follow code	Makes DB structure follow code
DB-first	Makes code follow DB structure	Makes code follow DB structure

Just giving you a free hint.

I used to sit firmly in the DB-first camp. Data feels primary to me. Intuitively, it was weird to let a migration tool “poke” the database. What if someone accidentally deleted or renamed a field in the code? Will the column be dropped from the DB?
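
To make the fear concrete, here is a deliberately naive sync routine that diffs by column name only. It is not how any particular migration tool works, and many real tools ask for an explicit rename. The `users` table and `naive_migration` function are hypothetical.

```python
def naive_migration(db_columns, code_fields):
    """Plan a sync by comparing names only, with no notion of a rename."""
    statements = []
    for name in sorted(set(db_columns) - set(code_fields)):
        statements.append(f"ALTER TABLE users DROP COLUMN {name}")
    for name in sorted(set(code_fields) - set(db_columns)):
        statements.append(f"ALTER TABLE users ADD COLUMN {name} {code_fields[name]}")
    return statements


db_columns = {"id": "INTEGER", "email": "TEXT", "nick": "TEXT"}
# A developer renames the field in code from "nick" to "nickname".
code_fields = {"id": "INTEGER", "email": "TEXT", "nickname": "TEXT"}

plan = naive_migration(db_columns, code_fields)
```

A rename in code becomes a drop plus an add in the plan, and the data in the old column goes with it.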

If code generation from DB goes wrong you regenerate, fix, or restore from Git. When you generate DB structure from code you touch actual data. Data is valuable. You cannot restore yesterday’s user contributions from Git. Backups are older. Mirrors get overwritten. Data loss hurts.

Databases come first for another reason. In new projects data often already exists. Maybe not as a clean SQL schema but as Excel files or CSV dumps. You design the DB structure and then write code around it. You treat schema carefully because “garbage in, garbage out”. Changing DB structure is harder than changing code because data lives in the DB.

I used to think of code-first as a kind of training-wheels approach. Something you start with when you’re just getting familiar with databases and building your first pet projects. I believed that sooner or later it would cause trouble once the system grew and the data, the data people actually worked hard to create, became valuable.

But there is one thing code-first really gets right. It lives comfortably in Git, which means it branches by feature, merges back cleanly, and fits perfectly into existing developer workflows. That’s definitely a plus.

Both approaches work and have a place. So where’s the problem? Why the heated debates? Why do I treat it as a false dichotomy?

We must understand what these approaches actually do. Code generation and DB sync are tools. But each approach also carries an ideology bigger than running a script at startup.

It comes down to the source of the Truth.

Yes. That is the only real difference. One approach declares the Truth lives in code. The other declares the Truth lives in the database. Not “truth”. The Truth. When structures drift, both sides can be “correct”. The database doesn’t lie. Code doesn’t lie. The mismatch is a coordination problem.

If you prefer DB truth you assume code is the heretic that must be watched so it does not stray from the righteous path. If you see the Truth in code, you expect the database to follow its scripture. Either way, someone plays the role of the high priest.

But my beloved DB-first fails when you have multiple environments. Production, staging, dev, CI, feature locals, and so on. Each environment can claim it holds the Truth. Code-first has a symmetric flaw. It fails when multiple services share a single database. When multiple services run across multiple environments, good luck keeping anything consistent.

It’s not that we haven’t learned how to deal with these problems. We have, and quite well. As the saying goes, if a paradigm doesn’t work, you just haven’t added enough dirty hacks yet. In the DB-first world, that means complex environment synchronizers, migration histories, and a lot more. Not exactly simple.

The debate between code-first and DB-first supporters is fascinating. But there is a third option people forget exists. Pull the source of truth out of both places and place it in a separate entity. Not code. Not DB. Something else.

That is the model-first approach.

Think of it like relational modeling. For 1:N relationships the referencing ID goes in the “many” table. For M:N you create a join table. Separated. Similarly, to avoid the coordination problems above you separate the authoritative definition of structure into a standalone artifact. That artifact must be language and framework agnostic. Then multiple environments and multiple services can reference the same model without arguing about who is “right.”
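
A toy sketch of the idea, under loud assumptions: the JSON layout, the type names, and both generator functions are invented for this example and are not any tool's real format. The model is plain data, and both the DDL and the code structure are derived from it.

```python
import json
import sqlite3

# The standalone artifact: plain JSON, readable from any language.
MODEL_JSON = """
{
  "tables": [
    {"name": "author", "columns": [
      {"name": "id", "type": "integer", "primary_key": true},
      {"name": "name", "type": "text"}
    ]},
    {"name": "book", "columns": [
      {"name": "id", "type": "integer", "primary_key": true},
      {"name": "author_id", "type": "integer", "references": "author"},
      {"name": "title", "type": "text"}
    ]}
  ]
}
"""

SQL_TYPES = {"integer": "INTEGER", "text": "TEXT"}
PY_TYPES = {"integer": "int", "text": "str"}


def to_ddl(table):
    """Derive a CREATE TABLE statement from one table of the model."""
    parts = []
    for col in table["columns"]:
        line = f'{col["name"]} {SQL_TYPES[col["type"]]}'
        if col.get("primary_key"):
            line += " PRIMARY KEY"
        if "references" in col:
            line += f' REFERENCES {col["references"]}(id)'
        parts.append(line)
    return f'CREATE TABLE {table["name"]} ({", ".join(parts)})'


def to_dataclass(table):
    """Derive Python source for a dataclass from the same table."""
    lines = ["@dataclass", f'class {table["name"].capitalize()}:']
    lines += [f'    {c["name"]}: {PY_TYPES[c["type"]]}' for c in table["columns"]]
    return "\n".join(lines)


model = json.loads(MODEL_JSON)
ddl = [to_ddl(t) for t in model["tables"]]
classes = [to_dataclass(t) for t in model["tables"]]

# The generated DDL is valid SQL: apply it to an empty database.
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)
```

A Java or C# generator could read the same JSON and emit its own classes. Neither the database nor any one codebase has to be declared "right".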

Some .NET users recall older Entity Framework flavors with a model-first workflow using .edmx files. That is model-first in spirit. But the cake is a lie. That tooling lived inside Visual Studio. The same Visual Studio that was retired on macOS a year ago and never arrived on Linux. What if another project needs the same model? What if you want to use Dapper, a custom mapper, or a Java service that reads the same tables? What if a Django team touches the same DB? The .edmx model is not a shared contract. It is glued to a single stack.

Even if all your services are written in C# with EF, there is still no "galvanic decoupling" between the model and the code structure. All projects using this model are forced to include the full set of entities, even though they don’t need most of them.

A real model-first system must be independent of language and framework. Tools must be cross-platform. That’s the only way to share it across teams. The model must be a written contract of how data is shaped. Put it in Git. Version it. Diff it. Branch it. Merge it. Discuss it in PRs. Share it in meetings. That contract is the Truth you can point to.
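
"Diff it" can be quite literal. A minimal sketch, assuming the model is reduced to a dictionary of tables and column types. The model contents and `diff_models` are hypothetical.

```python
def diff_models(old, new):
    """List structural changes between two versions of a model."""
    changes = []
    for table in sorted(set(old) | set(new)):
        old_cols, new_cols = old.get(table, {}), new.get(table, {})
        for col in sorted(set(old_cols) | set(new_cols)):
            if col not in old_cols:
                changes.append(f"+ {table}.{col}: {new_cols[col]}")
            elif col not in new_cols:
                changes.append(f"- {table}.{col}: {old_cols[col]}")
            elif old_cols[col] != new_cols[col]:
                changes.append(f"~ {table}.{col}: {old_cols[col]} -> {new_cols[col]}")
    return changes


OLD_MODEL = {"users": {"id": "integer", "email": "text"}}
NEW_MODEL = {
    "users": {"id": "integer", "email": "varchar(320)", "locale": "text"},
    "sessions": {"id": "integer", "user_id": "integer"},
}

changes = diff_models(OLD_MODEL, NEW_MODEL)
```

That list is something a reviewer can read in a PR without opening a database or a code editor.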

Three years ago, I was looking for tools that would let teams work with a model without pain or suffering.

aaaand nothing

The paradigm existed, but the tools did not. As they say, the right tool simply wasn’t there. I started to understand why this approach wasn’t popular.

The closest match to my requirements was a piece of German software, but it was mostly geared toward ER diagrams rather than serious projects. I wanted more.

By the way, the software was quite expensive. Around $250 per developer per year, and it’s even more expensive now. The free version couldn’t handle model files. Just the entry ticket alone kills the model-first approach. And there’s nothing better. Please invent the wheel yourself. But make sure the GUI is there, RDBMS ideology is fully supported with all its nuances, database structure synchronization is advanced, and code generation works. Even at first glance, it doesn’t look simple.

Two years ago I started writing a database model editor. Not another quick ER toy. I wanted a tool that focuses on real project workflows. I wanted language-agnostic models, strong support for RDBMS nuances, advanced DB synchronization, and code generators that do not lock your choice of mapper or framework. It’s really simple. I could prototype something in a couple of months, couldn’t I? Slightly more complex than an AI-powered everything startup, but less profitable.

Prototyping was not trivial. What looked like a short project exposed a pile of subtle issues. After two years I reached the core ideology and a usable product. I call the result OrmFactory. The initial milestone is an MVP suitable for real development. I’m looking for advanced developers to try it in real projects and provide feedback to improve the tool.

A useful example from recent work is a feature I call "Named Rows". Named Rows bring semantically meaningful table data, not just structure, into the code generator. You can reference specific table rows by name in generated code. This solves a family of problems about environment differences and language differences. It also addresses the psychological hurdle: “Can I trust a third-party tool to change my DB safely?” My implementation includes a galvanic decoupling between the authoritative model and the code artifacts that represent it. That decoupling reduces surprise for DB admins and developers across stacks.
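
To give a feel for the general idea of referencing rows by name, here is a generic sketch. It is not OrmFactory's actual format or generated output. The `NAMED_ROWS` structure, the `order_status` table, and `build_enum` are assumptions made up for illustration.

```python
from enum import IntEnum

# Hypothetical fragment of a model: rows of a lookup table that carry a name.
NAMED_ROWS = {
    "order_status": [
        {"id": 1, "name": "NEW"},
        {"id": 2, "name": "PAID"},
        {"id": 3, "name": "SHIPPED"},
    ]
}


def build_enum(table):
    """Turn the named rows of one table into an IntEnum keyed by row name."""
    class_name = "".join(part.capitalize() for part in table.split("_"))
    members = {row["name"]: row["id"] for row in NAMED_ROWS[table]}
    return IntEnum(class_name, members)


OrderStatus = build_enum("order_status")

# Application code refers to the row by name instead of a magic number.
is_paid = OrderStatus.PAID == 2
```

The point is that `OrderStatus.PAID` means the same row in every environment and every language that reads the same model.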

The product story deserves a separate post. Today my aim was to argue for model-first as a sane alternative. I want model-first to be accessible, not a gated feature of expensive vendor tools. That is the goal behind OrmFactory.

May the Truth be with you.
