Faizan Firdousi

Posted on Jan 15

Understanding Database Models: DDIA Chapter 2

#programming #devops #distributedsystems #database

This chapter from Designing Data-Intensive Applications explores different database models and strategies for structuring application data. Understanding these concepts is crucial for making informed architectural decisions in modern software development.

by the way,Read chapter 1 here

How Applications Store Data

From an application developer's perspective, the data flow works like this: collect real-world data, model it using data structures and APIs, then store it in a database. The database might use JSON/XML documents, relational tables, or graph models, all ultimately stored as bytes on disk.

While we'll explore various data models, the relational model has proven most resilient and remains widely used. Modern applications often combine relational databases with non-relational databases (polyglot persistence) to meet specific requirements.

NoSQL vs SQL: Understanding the Trade-offs

SQL databases are familiar to most developers, but NoSQL offers compelling advantages:

Higher scalability and write throughput for large datasets
More open-source options available
Support for queries not possible in relational databases
Less restrictive schemas
Better performance for certain use cases, matching application data structures more closely

The Object-Relational Impedance Mismatch

Most application code uses object-oriented programming, representing data as objects. However, relational databases store data in tables, rows, and columns. This creates an awkward translation layer between code and database the impedance mismatch.

Object-Relational Mappers (ORMs) help by eliminating manual SQL queries, but they mostly provide abstraction rather than truly solving the underlying problem.

When Document Models Excel

A document model is a NoSQL approach that stores data as flexible, semi-structured documents (JSON/XML) rather than rigid rows and columns. MongoDB is a popular example.

One-to-Many Relationships

Consider a LinkedIn profile where one user has multiple job positions. This one-to-many relationship was challenging for early SQL databases not designed to store multiple values per row. People created separate tables for positions, though modern SQL databases now support multi-valued fields.

Document models handle this elegantly because JSON provides better locality than multi-table schemas. Instead of multiple queries across tables, you retrieve all relevant data in a single query.

Many-to-One Relationships and Normalization

When storing user inputs like city, organization, or college, use ID references instead of raw strings (e.g., org_id = 5 instead of org = "Microsoft").

This approach:

Models many-to-one relationships naturally (many users, one organization)
Enables clean data validation through dropdowns
Eliminates ambiguity from duplicate names
Improves maintainability update once, reflect everywhere
Allows centralized changes without touching thousands of records

For deeper exploration of normalization and schema design:
Read my detailed blog on database normalization

The JOIN Challenge

Joins combine fields from different tables in relational databases. While straightforward in SQL, document models have weak or nonexistent join support. Without database-level joins, you must emulate them in application code creating performance overhead and shifting complexity from the storage layer to your application.

Evolution of Database Models

Hierarchical Model

The first database model, used in IBM's IMS, employed a tree structure with single parents ideal for one-to-many relationships.

Limitations:

No many-to-many relationship support
Difficult denormalization decisions

Network Model

Created to solve hierarchical limitations, the network model allowed multiple parents, supporting many-to-one and many-to-many relationships.

The fatal flaw: Records connected like programming pointers through "access paths." These access paths made querying and updating extremely complex, negating the benefit of multiple relationship support.

Relational Model

The relational model succeeded because it:

Stores data in simple tables (rows and columns)
Eliminates complicated access paths
Supports multiple relationships elegantly
Allows inserting rows without foreign key concerns

The game-changer: query optimizers that make writing efficient queries straightforward.

Imperative vs Declarative Queries

Imperative code tells the computer every step: loop through lists, check conditions, push to arrays like giving turn-by-turn directions.

Declarative queries (like SQL) specify what you want, not how to get it:

SELECT * FROM animals WHERE family = 'Sharks'

The database determines optimal execution. This is powerful because query optimizers improve performance behind the scenes without code changes.

The surprising benefit: declarative languages are naturally parallelizable. Since you don't dictate execution order, databases can freely split work across multiple cores.

Key Takeaways

Relational databases remain dominant but work well alongside NoSQL for specific use cases
Document models excel at one-to-many relationships with better data locality
Normalization using IDs prevents data duplication and eases maintenance
Declarative queries enable database-level optimization and parallelization
Understanding historical models (hierarchical, network) helps appreciate modern solutions

DEV Community