This chapter from Designing Data-Intensive Applications explores different database models and strategies for structuring application data. Understanding these concepts is crucial for making informed architectural decisions in modern software development.
by the way,Read chapter 1 here
How Applications Store Data
From an application developer's perspective, the data flow works like this: collect real-world data, model it using data structures and APIs, then store it in a database. The database might use JSON/XML documents, relational tables, or graph models, all ultimately stored as bytes on disk.
While we'll explore various data models, the relational model has proven most resilient and remains widely used. Modern applications often combine relational databases with non-relational databases (polyglot persistence) to meet specific requirements.
NoSQL vs SQL: Understanding the Trade-offs
SQL databases are familiar to most developers, but NoSQL offers compelling advantages:
- Higher scalability and write throughput for large datasets
- More open-source options available
- Support for queries not possible in relational databases
- Less restrictive schemas
- Better performance for certain use cases, matching application data structures more closely
The Object-Relational Impedance Mismatch
Most application code uses object-oriented programming, representing data as objects. However, relational databases store data in tables, rows, and columns. This creates an awkward translation layer between code and database the impedance mismatch.
Object-Relational Mappers (ORMs) help by eliminating manual SQL queries, but they mostly provide abstraction rather than truly solving the underlying problem.
When Document Models Excel
A document model is a NoSQL approach that stores data as flexible, semi-structured documents (JSON/XML) rather than rigid rows and columns. MongoDB is a popular example.
One-to-Many Relationships
Consider a LinkedIn profile where one user has multiple job positions. This one-to-many relationship was challenging for early SQL databases not designed to store multiple values per row. People created separate tables for positions, though modern SQL databases now support multi-valued fields.
Document models handle this elegantly because JSON provides better locality than multi-table schemas. Instead of multiple queries across tables, you retrieve all relevant data in a single query.
Many-to-One Relationships and Normalization
When storing user inputs like city, organization, or college, use ID references instead of raw strings (e.g., org_id = 5 instead of org = "Microsoft").
This approach:
- Models many-to-one relationships naturally (many users, one organization)
- Enables clean data validation through dropdowns
- Eliminates ambiguity from duplicate names
- Improves maintainability update once, reflect everywhere
- Allows centralized changes without touching thousands of records
For deeper exploration of normalization and schema design:
Read my detailed blog on database normalization
The JOIN Challenge
Joins combine fields from different tables in relational databases. While straightforward in SQL, document models have weak or nonexistent join support. Without database-level joins, you must emulate them in application code creating performance overhead and shifting complexity from the storage layer to your application.
Evolution of Database Models
Hierarchical Model
The first database model, used in IBM's IMS, employed a tree structure with single parents ideal for one-to-many relationships.
Limitations:
- No many-to-many relationship support
- Difficult denormalization decisions
Network Model
Created to solve hierarchical limitations, the network model allowed multiple parents, supporting many-to-one and many-to-many relationships.
The fatal flaw: Records connected like programming pointers through "access paths." These access paths made querying and updating extremely complex, negating the benefit of multiple relationship support.
Relational Model
The relational model succeeded because it:
- Stores data in simple tables (rows and columns)
- Eliminates complicated access paths
- Supports multiple relationships elegantly
- Allows inserting rows without foreign key concerns
The game-changer: query optimizers that make writing efficient queries straightforward.
Imperative vs Declarative Queries
Imperative code tells the computer every step: loop through lists, check conditions, push to arrays like giving turn-by-turn directions.
Declarative queries (like SQL) specify what you want, not how to get it:
SELECT * FROM animals WHERE family = 'Sharks'
The database determines optimal execution. This is powerful because query optimizers improve performance behind the scenes without code changes.
The surprising benefit: declarative languages are naturally parallelizable. Since you don't dictate execution order, databases can freely split work across multiple cores.
Key Takeaways
- Relational databases remain dominant but work well alongside NoSQL for specific use cases
- Document models excel at one-to-many relationships with better data locality
- Normalization using IDs prevents data duplication and eases maintenance
- Declarative queries enable database-level optimization and parallelization
- Understanding historical models (hierarchical, network) helps appreciate modern solutions
Top comments (0)