Data-Driven Architecture

#programming #architecture #objects #layers

The layering problem

Applications often accumulate many layers, such as repositories and ORMs, because of patterns like MVVM, MVC, and Hexagonal architecture. My main argument is that these architectural decisions, while intended for general use, often feel rigid or contextless, especially in microservices or even monorepos. The art of architecture is composing components into systems, and that process relies heavily on context and judgment. Patterns exist as recipes for recurring problems, but applied without context, they feel forced and out of place. Examples from books rarely match your real problem, and frameworks can lock you into abstractions that don’t fit your needs.

Here’s my thought: let problems find you first. If they don’t, either they don’t belong in your context or you don’t recognize them yet. Reading about patterns isn’t enough. It’s better to try building with them elsewhere, not at work, unless someone more experienced is guiding you. Practice is often the best teacher, and it becomes necessary when speed matters. Frameworks exist because they embed valid patterns for specific cases. Companies offer ready-made solutions like ORMs, ERPs, or Next.js; use them when their patterns fit your problem. Otherwise, question them and resist the urge to apply patterns by default. Avoid the Resume-Driven Development trap.

Data-Driven Design

While I find the discussion of high-level design and modules in John Ousterhout’s “A Philosophy of Software Design” interesting, the simplest and most effective driver I’ve found is the data itself. My argument leads to a key idea: everything points toward a data-driven approach. Rather than starting with high-level abstractions, focus on your data. Shape your system by following your data and encoding it in clear, simple structures. This approach usually simplifies both design and tool selection, letting your data’s characteristics guide architecture and modeling choices for your use case.
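
As a minimal sketch of “follow your data”, assuming a hypothetical payments feed: the record below is plain immutable data, the set of currencies is closed, and raw input is parsed once at the boundary so the rest of the code only ever sees valid values. `Payment`, `Currency`, and `parse_payment` are illustrative names, not part of any framework.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from enum import Enum


class Currency(Enum):
    # Closed set of values (hypothetical): anything else is rejected at the boundary.
    EUR = "EUR"
    USD = "USD"


@dataclass(frozen=True)
class Payment:
    # Hypothetical record: plain, immutable data with no hidden behavior.
    payment_id: str
    amount: Decimal
    currency: Currency


def parse_payment(raw: dict) -> Payment:
    """Turn untyped input (e.g. decoded JSON) into a Payment, or raise ValueError."""
    payment_id = raw.get("id")
    if not isinstance(payment_id, str) or not payment_id:
        raise ValueError("missing payment id")
    try:
        amount = Decimal(str(raw["amount"]))
    except (KeyError, InvalidOperation):
        raise ValueError("invalid amount") from None
    # Check finiteness first so NaN/Infinity never reach the comparison.
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive number")
    try:
        currency = Currency(raw.get("currency"))
    except ValueError:
        raise ValueError("unsupported currency") from None
    return Payment(payment_id, amount, currency)
```

Freezing the record means no later code can change a field in place; the trade-off is that calling `Payment(...)` directly still bypasses the checks, so `parse_payment` should stay the single entry point for untrusted input.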

This is Domain-Driven Design (DDD) at its core, and it also favors deep modules in the referenced book’s sense: simple interfaces hiding substantial functionality. Large boilerplate modules should fade away or be autogenerated. Focus on your domain and on API simplicity, both of which depend on how well you encode data and its relationships. Here, DDD’s aggregates help: when several objects need to interact, treat them as a single unit rather than as loosely related items. This encourages simple design and reduces race conditions, because changes to coupled parts stay atomic. Other concepts exist, and they are better explained elsewhere, but everything comes down to business, data, or algorithm invariants. Find these invariants, then encode them. Use an automaton. If you could learn only one thing in Computer Science, pick the DFA (deterministic finite automaton). It’s a first principle that helps you avoid over-engineering.
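
A small sketch of an aggregate, assuming a hypothetical order whose lines must never exceed a credit limit. `Order` is the aggregate root: lines change only through `add_lines`, which validates the whole batch before mutating anything, so a change is applied as a unit or not at all. The `version` counter is also an assumption; a persistence layer could use it for optimistic locking, but nothing in this snippet is thread-safe on its own.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OrderLine:
    sku: str
    quantity: int
    unit_price_cents: int


@dataclass
class Order:
    """Hypothetical aggregate root: the only entry point for changing its lines."""
    order_id: str
    credit_limit_cents: int
    _lines: list[OrderLine] = field(default_factory=list)
    version: int = 0  # bumped on every change; a store could use it for optimistic locking

    @property
    def lines(self) -> tuple[OrderLine, ...]:
        # Read-only view, so callers cannot bypass the invariants.
        return tuple(self._lines)

    def total_cents(self) -> int:
        return sum(line.quantity * line.unit_price_cents for line in self._lines)

    def add_lines(self, new_lines: list[OrderLine]) -> None:
        """Add several lines as one unit: either all are accepted or none are."""
        for line in new_lines:
            if line.quantity <= 0:
                raise ValueError(f"quantity must be positive for {line.sku}")
        added = sum(line.quantity * line.unit_price_cents for line in new_lines)
        if self.total_cents() + added > self.credit_limit_cents:
            raise ValueError("order would exceed credit limit")
        # Every check passed: apply the whole change at once.
        self._lines.extend(new_lines)
        self.version += 1
```

Because validation happens before the single `extend`, a rejected batch leaves the aggregate exactly as it was.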

Automata and the Power of Explicit State

Finite automata excel because they work directly on data: they read input one symbol at a time and either accept or reject it. They can’t recognize every computable language (only the regular ones), but a lot of real-world input, such as tokens, message formats, and protocol handshakes, fits within that limit. Bugs often stem from accepting too much input or omitting data that is needed. Protocols use this concept frequently, and although RFCs and their implementations differ, starting with a DFA is a wise move for such problems. It’s the simplest way to define data boundaries, and it runs in time linear in the length of the input. If your problem is data-related, think it through with an automaton first. This clarifies your approach and shows whether you need a different data structure. When starting new work, avoid jumping into polished solutions if you can.
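
A minimal table-driven DFA, assuming a made-up input format (optionally signed decimal numbers such as `42`, `-7`, or `3.14`). The names `TRANSITIONS`, `classify`, and `accepts` are hypothetical. The transition table is the entire specification: any (state, symbol class) pair missing from it is rejected, which targets the “accepting too much” class of bugs, and each character costs one dictionary lookup, so a run is linear in the input length.

```python
# States of a hypothetical DFA for numbers like "42", "-7", "3.14".
START, SIGN, INT, DOT, FRAC = "start", "sign", "int", "dot", "frac"
ACCEPTING = {INT, FRAC}

# Transition table: (state, symbol class) -> next state.
# Any pair not listed here means "reject".
TRANSITIONS = {
    (START, "sign"): SIGN,
    (START, "digit"): INT,
    (SIGN, "digit"): INT,
    (INT, "digit"): INT,
    (INT, "dot"): DOT,
    (DOT, "digit"): FRAC,
    (FRAC, "digit"): FRAC,
}


def classify(ch: str) -> str:
    # Explicit ASCII digits; str.isdigit() would also accept other Unicode digits.
    if ch in "0123456789":
        return "digit"
    if ch == "-":
        return "sign"
    if ch == ".":
        return "dot"
    return "other"


def accepts(text: str) -> bool:
    """Run the DFA: one table lookup per character, so O(len(text))."""
    state = START
    for ch in text:
        nxt = TRANSITIONS.get((state, classify(ch)))
        if nxt is None:
            return False  # no transition defined: reject immediately
        state = nxt
    return state in ACCEPTING
```

Supporting a new format means adding rows to the table rather than branches to the code, and the table doubles as documentation of exactly what is accepted.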

Many think you shouldn’t build from scratch, which makes sense given today’s abundance of libraries. But if data is core to your business, treat handing it to someone else’s tool as seriously as any outsourcing decision. Don’t outsource your core. Even without an MBA, I believe in this, and many smarter people agree. Books like Amazon’s “Working Backwards” or Netflix’s “Engineering Culture” strike me as well-meaning novels. Company realities are messier: every company is a ball of controlled technical chaos struggling to keep up with business scale. To fight this, prioritize your principles as a technical lead and let business leaders handle the rest. Simplify early, and re-check complexity with every addition to the code.

Object-Oriented Pitfalls and the Relational Model Trap

Object-oriented programming’s downside is its awkward fit with the relational model (the object-relational impedance mismatch). That isn’t always a problem, but large enterprise problems usually stem from data modeling. Academics start from relational algebra, which is a different foundation from category theory. Smalltalk comes from a tradition separate from relational algebra, yet its ideas are misapplied to enterprise systems built around relational data. Code was originally written to automate processes rather than to implement fixed algorithms, and that is a flawed outlook. Treat data as a first-class citizen from the start, not an afterthought. We describe code in terms of behavior and output, but we shouldn’t shape it solely around data or solely around processes. It’s tricky, but embracing this shift leads to simpler code.
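
A tiny illustration of the gap between objects and the relational model described above, using made-up tables: the relational side stores flat rows joined by a key, while the object side wants nested collections. Something, usually an ORM, has to translate between the two, and that translation is where a model shaped only around objects tends to leak. The table contents, `CustomerView`, and `to_object_graph` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Relational view: two flat tables linked by a foreign key (customer_id).
CUSTOMERS = [(1, "Ada"), (2, "Linus")]                  # (customer_id, name)
ORDERS = [(10, 1, 2500), (11, 1, 900), (12, 2, 4000)]   # (order_id, customer_id, total_cents)


@dataclass(frozen=True)
class CustomerView:
    # Object view: each customer owns a nested collection of order totals.
    customer_id: int
    name: str
    order_totals: tuple[int, ...]


def to_object_graph(customers, orders) -> list[CustomerView]:
    """The mapping an ORM hides: group child rows under their parent row."""
    by_customer = defaultdict(list)
    for _order_id, customer_id, total in orders:
        by_customer[customer_id].append(total)
    return [
        CustomerView(cid, name, tuple(by_customer[cid]))
        for cid, name in customers
    ]
```

Keeping the flat rows as the source of truth and deriving the nested view on demand is one way to treat data as a first-class citizen instead of letting the object graph dictate the schema.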

Final recommendations

Below are some starting points for exploring these ideas. I have plenty of confirmation bias, so I won’t recommend or link to anything beyond my own interpretation; other people may read the same sources differently. A discussion forum is probably better for debate than a blog post. If you’re interested, look into books and projects yourself rather than listening to someone who already shares your views, unless it’s just for fun and entertainment, as it is for me.

Finally, I leave you with some extra advice from a discussion on the subject with today’s victim: ChatGPT, or whichever LLM chat I happen to be using lately. It turns out that’s the best way to challenge my thoughts nowadays, and it’s good for summaries or tl;dr sections if you’re that kind of reader, which is exactly why I put it at the end.

I’m still refining these ideas and principles, and I’m aware the world is complex: you always have to deal with physics one way or another. Eventually you will face reality and need to talk to external systems and distributed networks, even if your only runtime is a database and an OS. At scale, you will need to think in other layers as well. But that’s the point: if you already have enough layers around your system, why shoot yourself in the foot by putting more inside your app?

The clean mental model (extracted with ChatGPT)

Think in three layers:

Layer 1: Domain (DDD core)
- Strong invariants
- Impossible invalid states
- Explicit transitions
- “What is allowed?”
Layer 2: Application / workflow
- Ordering of steps
- Retries
- Orchestration
- “What happens next?”
Layer 3: Infrastructure
- DBs
- Queues
- APIs

Most bugs happen when:

- Application logic (layer 2) leaks into the domain (layer 1)
- Invariants are not enforced in the domain (layer 1)
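
To make the three layers concrete, here is a minimal sketch with hypothetical names (`Order`, `Status`, `ALLOWED`, `transition`, `InMemoryOrders`, `pay_order`, and an injected `charge` callable standing in for a payment gateway). The domain only answers “what is allowed?” through an explicit transition table; the workflow owns ordering and retries; the in-memory store stands in for a real database. Retries never appear in the domain, and the domain check runs before any side effect.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable


# --- Layer 1: domain. Pure data and allowed transitions, no I/O. ---
class Status(Enum):
    PENDING = "pending"
    PAID = "paid"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


ALLOWED = {
    Status.PENDING: {Status.PAID, Status.CANCELLED},
    Status.PAID: {Status.SHIPPED},
    Status.SHIPPED: set(),
    Status.CANCELLED: set(),
}


@dataclass(frozen=True)
class Order:
    order_id: str
    status: Status


def transition(order: Order, target: Status) -> Order:
    """What is allowed? Reject any transition not listed in ALLOWED."""
    if target not in ALLOWED[order.status]:
        raise ValueError(f"{order.status.value} -> {target.value} is not allowed")
    return replace(order, status=target)


# --- Layer 3: infrastructure. Hypothetical in-memory store standing in for a DB. ---
class InMemoryOrders:
    def __init__(self) -> None:
        self._rows: dict[str, Order] = {}

    def get(self, order_id: str) -> Order:
        return self._rows[order_id]

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order


# --- Layer 2: application workflow. Ordering and retries, no business rules. ---
def pay_order(
    orders: InMemoryOrders,
    order_id: str,
    charge: Callable[[str], None],
    attempts: int = 3,
) -> Order:
    """What happens next? Check the domain, charge (with retries), then save."""
    order = orders.get(order_id)
    paid = transition(order, Status.PAID)  # fail fast before any side effect
    for attempt in range(1, attempts + 1):
        try:
            charge(order_id)
            break
        except ConnectionError:
            if attempt == attempts:
                raise
    orders.save(paid)
    return paid
```

Because the invariant lives in `transition`, calling `pay_order` a second time for the same order fails in the domain before `charge` runs, instead of billing the customer twice.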

I’m cross-posting this here because I’ve always wanted to give back to this community some of what I learned from it. Also, my Substack looks pretty lonely so far, so check it out if you found this article interesting (it will be published there next week). Early feedback here would help me add corrections and examples before then.