In Week 2, I shifted focus from high-level scalability concepts to how data is actually stored, retrieved, and optimized in real systems.
The main goals this week were to:
- Build intuition around storage engines and access patterns
- Understand SQL vs NoSQL trade-offs
- Learn where indexes and caching fit into system design
- Apply these ideas by designing and implementing a User Profile Service
Compared to Week 1, this week went deeper into backend fundamentals—and required slowing down to truly understand what happens under the hood.
## Key Learnings from Designing Data-Intensive Applications — Chapter 3
Chapter 3 (Storage & Retrieval) was significantly more challenging than earlier chapters—but also much more rewarding.
### Storage Engines: Why Data Layout Matters
One of the biggest takeaways was that databases are not magic. They rely on concrete data structures that directly impact performance.
Two broad approaches stood out:
- Log-structured storage
  - Writes are fast (append-only sequential I/O)
  - Reads may touch several log segments, so background compaction and indexes keep them fast
- B-tree-based storage
  - Optimized for reads
  - Supports range queries efficiently
  - Common in relational databases (e.g., PostgreSQL)
This helped me understand why certain databases excel at specific workloads.
> The way data is written on disk determines what queries are fast—or painfully slow.
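To make this concrete, here is a minimal in-memory sketch (in Java, with illustrative names) of the log-structured approach: an append-only log, a hash index pointing at each key's latest entry, and a compaction pass that drops overwritten values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of a log-structured store: writes append to a log, and an
// in-memory hash index maps each key to its latest position in the log.
class LogStore {
    private final List<String[]> log = new ArrayList<>();       // append-only [key, value] entries
    private final Map<String, Integer> index = new HashMap<>(); // key -> latest offset

    void put(String key, String value) {
        log.add(new String[] { key, value }); // O(1) sequential append
        index.put(key, log.size() - 1);       // point the index at the new entry
    }

    String get(String key) {
        Integer offset = index.get(key);      // one lookup instead of a full log scan
        return offset == null ? null : log.get(offset)[1];
    }

    // Compaction: rewrite the log keeping only the latest entry per key.
    void compact() {
        List<String[]> compacted = new ArrayList<>();
        Map<String, Integer> newIndex = new HashMap<>();
        for (Map.Entry<String, Integer> e : index.entrySet()) {
            compacted.add(log.get(e.getValue()));
            newIndex.put(e.getKey(), compacted.size() - 1);
        }
        log.clear(); log.addAll(compacted);
        index.clear(); index.putAll(newIndex);
    }

    int logSize() { return log.size(); }
}
```

Note how an overwrite never touches old data in place; the stale entry simply becomes garbage for the next compaction, which is what makes writes so cheap.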
### Indexes: Trading Space for Speed
Indexes are one of the most important tools for performance, but they are not free.
Key insights:
- An index is a separate data structure (not just “extra metadata”)
- Indexes speed up reads but:
  - Slow down writes
  - Increase storage usage
- Indexes should reflect real query patterns, not hypothetical ones
This clarified a common misconception I had earlier:
> Indexes don’t automatically exist for every field—you must design them intentionally.
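A tiny sketch of that idea (illustrative names, Java): the index is literally a second data structure that every write must also maintain, in exchange for turning a full-table scan into a single lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: an index is a separate structure kept in sync with the data.
// A secondary index on email trades extra write work and extra memory
// for O(1) lookups that would otherwise scan every row.
class UserTable {
    private final Map<Long, String> rows = new HashMap<>();       // userId -> email (the "table")
    private final Map<String, Long> emailIndex = new HashMap<>(); // email -> userId (the index)

    void insert(long userId, String email) {
        rows.put(userId, email);
        emailIndex.put(email, userId); // every write now maintains the index too
    }

    // With the index: a single hash lookup.
    Long findByEmailIndexed(String email) {
        return emailIndex.get(email);
    }

    // Without the index: a full scan of every row.
    Long findByEmailScan(String email) {
        for (Map.Entry<Long, String> row : rows.entrySet()) {
            if (row.getValue().equals(email)) return row.getKey();
        }
        return null;
    }
}
```

If no real query ever looks up users by email, `emailIndex` is pure overhead—which is exactly why indexes should follow observed access patterns.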
### SQL vs NoSQL: It’s About Trade-offs, Not Trends
Instead of “SQL vs NoSQL”, the chapter reframed the question as:
> What guarantees does your system need?
Key comparisons:
| Aspect | SQL | NoSQL |
|---|---|---|
| Schema | Strong, enforced | Flexible |
| Transactions | Strong consistency | Often eventual |
| Scaling | Harder (sharding) | Easier horizontally |
| Querying | Powerful | Limited / specialized |
This made it clear why relational databases are still a strong default choice, especially for systems that require correctness and structured data.
## Applying the Concepts: User Profile Service
To apply these ideas, I designed and implemented a User Profile Service—a common real-world backend component.
### Functional Requirements
- Create user profiles
- Retrieve user information
- Update user details
- Delete users
### Non-Functional Requirements
- Low latency (read-heavy system)
- High availability
- Scalability (growing user base)
- Fault tolerance
- Optional analytics
### Scale Assumptions
To guide design decisions, I assumed:
- Users created per day: ~10,000
- Reads per day: ~1,000,000
- Read/write ratio: ~100:1
This clearly indicates a read-heavy system, making caching and indexing critical.
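Converting those daily figures into average rates is a quick back-of-the-envelope exercise (averages only; peak traffic would be several times higher):

```java
// Back-of-the-envelope: turn the daily assumptions into average rates.
class ScaleEstimate {
    static final long SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

    // Average queries per second for a given daily volume.
    static double qps(long perDay) {
        return (double) perDay / SECONDS_PER_DAY;
    }
}
```

With the assumed numbers, reads average roughly 11.6/sec and writes roughly 0.12/sec—modest absolute load, but the 100:1 skew is what justifies investing in caching and read-path indexing first.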
### High-Level Architecture
Figure 1: High-level flows for create, read, update, delete, and analytics.
The system consists of:
- Stateless API servers behind a load balancer
- PostgreSQL as the primary data store
- Redis for caching user profiles
- Optional analytics endpoints
Reads are optimized through caching, while writes remain strongly consistent.
### Data Modeling & Indexing
The core data model includes:
- `userId` (primary key)
- User profile fields (name, address, metadata)
- Timestamps for creation and updates
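As a sketch, that model maps naturally onto a plain value type. The field names below mirror the list above but are illustrative, not the service's exact schema (the real implementation was Kotlin; this is the Java equivalent):

```java
import java.time.Instant;

// Illustrative value type for the core data model: a primary key,
// profile fields, and creation/update timestamps.
record UserProfile(
    String userId,     // primary key
    String name,
    String address,
    String metadata,   // free-form profile metadata
    Instant createdAt, // set once on creation
    Instant updatedAt  // refreshed on every update
) {}
```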
#### Where Indexes Fit
- The primary key index enables fast lookup by user ID
- Additional indexes are unnecessary unless new access patterns emerge (e.g., search by email)
This aligns with the lesson from DDIA:
> Indexes should be driven by real queries—not assumptions.
### Caching Strategy
A key design decision was what to cache.
#### Chosen Approach: Cache the Entire User Object
- Cache key: `userId`
- Cache value: full user profile
Why this works well:
- User profiles are relatively small
- Read requests usually need the full object
- Simplifies cache logic
#### Cache Invalidation
- On update → invalidate cache
- On delete → invalidate cache
- Cache misses fall back to the database
This follows the cache-aside pattern, which is simple and robust.
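A minimal in-memory sketch of cache-aside (two maps stand in for PostgreSQL and Redis; the class name and fields are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Cache-aside sketch: reads try the cache first and fall back to the
// database on a miss; writes go to the database and invalidate the entry.
class CacheAsideProfiles {
    private final Map<String, String> db = new HashMap<>();    // stand-in for PostgreSQL
    private final Map<String, String> cache = new HashMap<>(); // stand-in for Redis
    int dbReads = 0; // exposed so the miss/hit behavior is observable

    String get(String userId) {
        String cached = cache.get(userId);
        if (cached != null) return cached;          // cache hit
        String fromDb = db.get(userId);             // miss: read the source of truth
        dbReads++;
        if (fromDb != null) cache.put(userId, fromDb); // populate for the next read
        return fromDb;
    }

    void upsert(String userId, String profile) {
        db.put(userId, profile); // write the source of truth first
        cache.remove(userId);    // then invalidate; never update the cache in place
    }

    void delete(String userId) {
        db.remove(userId);
        cache.remove(userId);
    }
}
```

Invalidating on write, rather than rewriting the cached value, keeps the logic simple and avoids races where a slow writer overwrites a newer cached value.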
### Failure Considerations
The system is designed so that:
- If Redis fails → database still serves reads (slower, but correct)
- If an API instance crashes → load balancer routes traffic elsewhere
- If traffic spikes → stateless services scale horizontally
This reinforced an important lesson:
> Cache is an optimization, not a dependency.
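That lesson translates directly into code: every cache call is wrapped so that a cache outage degrades to direct database reads instead of failing the request. The types below are illustrative, not the service's actual interfaces.

```java
import java.util.function.Function;

// Minimal cache abstraction; a real system would wrap a Redis client here.
interface SimpleCache {
    String get(String key);
    void put(String key, String value);
}

// Reads survive a cache outage: failures are caught and the request
// falls through to the database—slower, but still correct.
class ResilientReader {
    private final SimpleCache cache;
    private final Function<String, String> dbLookup; // stand-in for a PostgreSQL query

    ResilientReader(SimpleCache cache, Function<String, String> dbLookup) {
        this.cache = cache;
        this.dbLookup = dbLookup;
    }

    String get(String key) {
        String cached = null;
        try {
            cached = cache.get(key);          // cache errors are caught, not propagated
        } catch (RuntimeException cacheDown) {
            // fall through: serve from the database
        }
        if (cached != null) return cached;
        String value = dbLookup.apply(key);
        try {
            if (value != null) cache.put(key, value); // repopulation is best-effort
        } catch (RuntimeException cacheDown) {
            // ignore: the read already succeeded
        }
        return value;
    }
}
```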
### Implementation Notes
The service was implemented using:
- Kotlin + Spring Boot
- PostgreSQL
- Redis
- Stateless REST APIs
The focus was not on framework features, but on mapping system design decisions into real code.
## What’s Next — Week 3
- Replication & consistency
- Read replicas vs primary databases
- Designing systems that survive partial failures
The journey continues 🚀