A systematic guide to building robust, scalable database architectures
Introduction
When I review production systems, I consistently observe that the data persistence layer serves as both the foundation and potential bottleneck of application architecture. The difference between a well-architected data layer and a hastily constructed one becomes evident under load, during schema evolution, or when debugging transaction anomalies at 2 AM.
This guide documents proven patterns for transforming domain models into production-ready repository implementations. We'll examine the Repository pattern, CQRS application to database architecture, ORM mapping strategies, migration workflows, transaction handling, and connection pool configuration—all grounded in official documentation and battle-tested practices.
Architecture Overview
The data layer described here stacks four concerns—domain entities, repositories, ORM mapping, and connection management—where each layer depends only on the layers below it, enabling isolated testing and independent evolution.
The Repository Pattern: Mediating Between Domains and Data
Pattern Definition and Purpose
According to Martin Fowler's canonical definition in Patterns of Enterprise Application Architecture, a Repository "mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects." This abstraction serves three critical purposes:
- Isolation: Domain logic remains unaware of persistence mechanisms
- Testability: Repository interfaces enable straightforward mocking
- Flexibility: Implementation details can evolve without affecting consumers
The Repository pattern differs fundamentally from direct ORM usage. While an ORM provides entity-level CRUD operations, a Repository offers domain-centric query methods that express business intent.
TypeORM Repository Implementation
TypeORM supports both Active Record and Data Mapper patterns, with repositories naturally aligning with the Data Mapper approach. Each entity receives its own repository, handling operations specific to that entity type.
Basic Repository Structure
// src/domain/entities/User.ts
import { Entity, PrimaryGeneratedColumn, Column, Index } from 'typeorm';

@Entity('users')
@Index(['email'], { unique: true })
export class User {
  @PrimaryGeneratedColumn('uuid')
  id: string;

  @Column({ type: 'varchar', length: 255 })
  email: string;

  @Column({ type: 'varchar', length: 255 })
  name: string;

  @Column({ type: 'timestamp', default: () => 'CURRENT_TIMESTAMP' })
  createdAt: Date;

  @Column({ type: 'timestamp', nullable: true })
  lastLoginAt: Date | null;

  @Column({ type: 'boolean', default: true })
  isActive: boolean;
}

// src/infrastructure/repositories/UserRepository.ts
import { Injectable } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { User } from '../../domain/entities/User';

@Injectable()
export class UserRepository {
  constructor(
    @InjectRepository(User)
    private readonly repository: Repository<User>,
  ) {}

  async findByEmail(email: string): Promise<User | null> {
    return this.repository.findOne({
      where: { email },
    });
  }

  async findActiveUsers(): Promise<User[]> {
    return this.repository.find({
      where: { isActive: true },
      order: { createdAt: 'DESC' },
    });
  }

  async updateLastLogin(userId: string): Promise<void> {
    await this.repository.update(
      { id: userId },
      { lastLoginAt: new Date() },
    );
  }

  async save(user: User): Promise<User> {
    return this.repository.save(user);
  }

  async countActiveUsers(): Promise<number> {
    return this.repository.count({
      where: { isActive: true },
    });
  }
}
This implementation demonstrates several key principles:
- Domain-specific methods: findActiveUsers() and updateLastLogin() express business operations
- Type safety: TypeScript ensures compile-time validation of entity properties
- Separation of concerns: The repository encapsulates query logic away from domain entities
TypeORM's repository provides foundational methods (find, save, update, delete), while custom repository classes add domain-specific query methods. This dual-layer approach balances flexibility with convenience.
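The testability benefit is easy to demonstrate. Below is a minimal sketch of stubbing the repository in a unit test; the test framework (Jest) and the stub's shape are illustrative assumptions, not part of the original codebase:

// test/user-repository-consumer.spec.ts (illustrative)
import { User } from '../src/domain/entities/User';
import { UserRepository } from '../src/infrastructure/repositories/UserRepository';

describe('a consumer of UserRepository', () => {
  it('resolves users without touching a database', async () => {
    // Stub only the methods the code under test actually calls
    const stub = {
      findByEmail: jest.fn().mockResolvedValue(
        Object.assign(new User(), { email: 'known@example.com', isActive: true }),
      ),
    } as unknown as UserRepository;

    const user = await stub.findByEmail('known@example.com');
    expect(user?.email).toBe('known@example.com');
    expect(stub.findByEmail).toHaveBeenCalledWith('known@example.com');
  });
});

Because consumers depend on the repository's methods rather than on TypeORM APIs, unit tests need no database or ORM setup.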
CQRS: Segregating Read and Write Responsibilities
Pattern Overview and Applicability
Command Query Responsibility Segregation (CQRS) separates read operations from write operations using distinct models. This segregation enables independent optimization of each workload—a particularly valuable characteristic in systems with asymmetric read/write patterns.
Critical guidance from Martin Fowler: "CQRS should only be used on specific portions of a system (a BoundedContext in DDD terminology) and not the system as a whole. In particular, I've run into cases where CQRS has gotten a software system into serious difficulties."
Database-Level CQRS Implementation
Microsoft Azure's architecture documentation outlines several approaches to CQRS database separation:
- Single database with read replicas: PostgreSQL read replicas handle queries while the primary handles commands
- Separate logical databases: Different schema optimizations for read vs. write workloads
- Heterogeneous stores: Relational database for writes, document store for reads
The third approach proves particularly effective when read patterns differ substantially from write patterns. Consider an e-commerce system:
- Write model: Normalized PostgreSQL schema ensuring referential integrity
- Read model: Denormalized MongoDB documents optimized for product catalog queries
Synchronization Strategies
AWS Prescriptive Guidance identifies two primary synchronization approaches:
Synchronous (Strong Consistency):
- Database-level replication (PostgreSQL streaming replication)
- Dual writes within distributed transactions
- Trade-off: Lower availability, higher write latency
Asynchronous (Eventual Consistency):
- Event-driven synchronization via message queue
- Change Data Capture (CDC) using tools like Debezium
- Trade-off: Temporary inconsistency window, higher complexity
For most applications, eventual consistency with asynchronous synchronization provides optimal balance. The key implementation requirement: robust event publishing from the write model.
// src/application/commands/CreateOrderCommand.ts
import { Injectable } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { EventBus } from '../events/EventBus';
import { Order } from '../../domain/entities/Order';
import { OrderCreatedEvent } from '../events/OrderCreatedEvent';

// Plain, immutable data carrier for the write operation (defined here so
// the handler below is self-contained; the item shape is illustrative)
export class CreateOrderCommand {
  constructor(
    public readonly userId: string,
    public readonly items: Array<{ productId: string; quantity: number }>,
    public readonly totalAmount: number,
  ) {}
}

@Injectable()
export class CreateOrderCommandHandler {
  constructor(
    @InjectRepository(Order)
    private readonly orderRepository: Repository<Order>,
    private readonly eventBus: EventBus,
  ) {}

  async execute(command: CreateOrderCommand): Promise<void> {
    // Write to normalized command database
    const order = this.orderRepository.create({
      userId: command.userId,
      items: command.items,
      totalAmount: command.totalAmount,
      status: 'pending',
    });
    await this.orderRepository.save(order);

    // Publish event for read model synchronization
    await this.eventBus.publish(
      new OrderCreatedEvent(order.id, order.userId, order.totalAmount),
    );
  }
}
The EventBus handles asynchronous delivery to read model update handlers, enabling the query database to maintain its denormalized view of order data.
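To complete the picture, here is a hedged sketch of a read-side projection handler, assuming a MongoDB read store with an order_summaries collection; the class name, event field names, and DI wiring are illustrative assumptions:

// src/application/events/OrderCreatedHandler.ts (illustrative)
import { Injectable } from '@nestjs/common';
import { Collection } from 'mongodb';
import { OrderCreatedEvent } from './OrderCreatedEvent';

@Injectable()
export class OrderCreatedHandler {
  // Assumes the application's DI setup provides the denormalized read collection
  constructor(private readonly orderSummaries: Collection) {}

  async handle(event: OrderCreatedEvent): Promise<void> {
    // Upsert keeps the projection idempotent if the event is redelivered
    await this.orderSummaries.updateOne(
      { orderId: event.orderId },
      {
        $set: {
          orderId: event.orderId,
          userId: event.userId,
          totalAmount: event.totalAmount,
          status: 'pending',
        },
      },
      { upsert: true },
    );
  }
}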
ORM Mapping Strategies: Translating Inheritance to Tables
The Three Primary Strategies
When domain models utilize inheritance, ORMs must map class hierarchies to relational schemas. The official documentation for Hibernate, Doctrine, and SQLAlchemy all describe three fundamental strategies:
1. Single Table Inheritance (STI)
All classes in the hierarchy map to one table with a discriminator column indicating the concrete type.
Advantages:
- Simple schema with excellent query performance
- No joins required for polymorphic queries
- Simple to implement and understand
Disadvantages:
- Sparse columns for subclass-specific properties (NULL values)
- Table width grows with hierarchy complexity
- Potential for data integrity issues
2. Joined Table Inheritance (JTI)
Base class and each subclass receive separate tables. Subclass tables foreign-key reference the base table.
Advantages:
- Normalized schema minimizes redundancy
- Clear separation of base and subclass properties
- Type-safe schema enforcement
Disadvantages:
- Joins required for subclass queries (performance impact)
- More complex schema to maintain
- Insert operations span multiple tables
3. Table-Per-Concrete-Class (TPC)
Each concrete class receives its own table containing all properties, including inherited ones.
Advantages:
- No joins for concrete type queries
- Each table fully describes its entity
- Good performance for single-type queries
Disadvantages:
- Denormalized schema duplicates inherited columns
- Polymorphic queries require UNION operations
- Schema changes to base class ripple across all tables
TypeORM Implementation Example
TypeORM natively supports Single Table Inheritance (via the @TableInheritance and @ChildEntity decorators) and concrete table inheritance; Joined Table inheritance is not currently supported out of the box. Here's a Single Table implementation:
// src/domain/entities/Content.ts
import { ChildEntity, Column, Entity, PrimaryGeneratedColumn, TableInheritance } from 'typeorm';

@Entity()
@TableInheritance({ column: { type: 'varchar', name: 'type' } })
export abstract class Content {
  @PrimaryGeneratedColumn('uuid')
  id: string;

  @Column({ type: 'varchar', length: 500 })
  title: string;

  @Column({ type: 'text' })
  description: string;

  @Column({ type: 'timestamp', default: () => 'CURRENT_TIMESTAMP' })
  createdAt: Date;
}

@ChildEntity()
export class Article extends Content {
  @Column({ type: 'text' })
  body: string;

  @Column({ type: 'varchar', length: 255 })
  author: string;

  @Column({ type: 'int', default: 0 })
  readCount: number;
}

@ChildEntity()
export class Video extends Content {
  @Column({ type: 'varchar', length: 500 })
  videoUrl: string;

  @Column({ type: 'int' })
  durationSeconds: number;

  @Column({ type: 'varchar', length: 100, nullable: true })
  resolution: string | null;
}
This Single Table approach creates one content table containing:
- The base properties (id, title, description, createdAt) plus the discriminator column type
- The Article properties (body, author, readCount), left NULL on video rows
- The Video properties (videoUrl, durationSeconds, resolution), left NULL on article rows

The discriminator column 'type' records each row's concrete class and enables polymorphic queries against the base Content entity.
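With this mapping, a query against the base entity is polymorphic; a minimal sketch (assumes an initialized TypeORM DataSource):

import { DataSource } from 'typeorm';
import { Article, Content, Video } from './domain/entities/Content';

// TypeORM reads the discriminator column and hydrates the concrete subclass
export async function listContent(dataSource: DataSource): Promise<void> {
  const items: Content[] = await dataSource.getRepository(Content).find();
  for (const item of items) {
    if (item instanceof Article) {
      console.log(`Article "${item.title}" by ${item.author}`);
    } else if (item instanceof Video) {
      console.log(`Video "${item.title}" (${item.durationSeconds}s)`);
    }
  }
}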
Migration Best Practices: Schema Evolution Under Version Control
Why Migrations Over Synchronization
TypeORM's synchronize: true option automatically aligns database schemas with entity definitions—a convenient feature for development. However, as the official TypeORM documentation states: "It is unsafe to use synchronize: true for schema synchronization on production once you get data in your database."
Migrations provide version-controlled, auditable schema changes with rollback capability—essential characteristics for production systems.
The Migration Workflow
A 2025 guide to NestJS and TypeORM migrations documents this systematic workflow:
- Entity Definition: Define or modify TypeORM entities
- Migration Generation: Run npm run migration:generate -- src/migrations/AddUserLastLoginAt
- Review Generated SQL: Examine the UP and DOWN migration methods
- Version Control: Commit the migration file alongside entity changes
- Deployment: Execute migrations before deploying new code
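For the generation step to work, the TypeORM CLI needs a DataSource definition it can load; a minimal sketch, with connection values and paths as assumptions:

// src/data-source.ts — passed to the CLI with the -d flag, e.g.
//   npx typeorm-ts-node-commonjs migration:generate -d src/data-source.ts src/migrations/AddUserLastLoginAt
import { DataSource } from 'typeorm';

export default new DataSource({
  type: 'postgres',
  host: process.env.DB_HOST || 'localhost',
  port: parseInt(process.env.DB_PORT || '5432', 10),
  username: process.env.DB_USERNAME,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
  entities: ['src/domain/entities/*.ts'],
  migrations: ['src/migrations/*.ts'],
  synchronize: false, // schema changes flow through migrations only
});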
Generated Migration Example
// src/migrations/1696875432123-AddUserLastLoginAt.ts
import { MigrationInterface, QueryRunner } from 'typeorm';

export class AddUserLastLoginAt1696875432123 implements MigrationInterface {
  name = 'AddUserLastLoginAt1696875432123';

  public async up(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(`
      ALTER TABLE "users"
      ADD "last_login_at" TIMESTAMP
    `);
  }

  public async down(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(`
      ALTER TABLE "users"
      DROP COLUMN "last_login_at"
    `);
  }
}
Transaction Control in Migrations
TypeORM provides three transaction modes for migrations:
- Default: All migrations run in a single transaction (all-or-nothing deployment)
- --transaction each: Each migration runs in its own transaction (partial rollback possible)
- --transaction none: No transaction wrapping (for operations like CREATE INDEX CONCURRENTLY)

PostgreSQL's CREATE INDEX CONCURRENTLY operation cannot run within a transaction block, necessitating the --transaction none flag for such migrations.
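As a hedged sketch (the index and table names are illustrative), such a migration looks like any other but must be executed with --transaction none:

// src/migrations/1700000000000-AddUsersEmailIdx.ts (illustrative)
import { MigrationInterface, QueryRunner } from 'typeorm';

export class AddUsersEmailIdx1700000000000 implements MigrationInterface {
  // Run with: npm run migration:run -- --transaction none
  // CREATE INDEX CONCURRENTLY fails inside a transaction block
  public async up(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(
      `CREATE INDEX CONCURRENTLY IF NOT EXISTS "idx_users_email" ON "users" ("email")`,
    );
  }

  public async down(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(`DROP INDEX CONCURRENTLY IF EXISTS "idx_users_email"`);
  }
}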
Migration Tracking and State Management
TypeORM maintains a migrations table in your database, recording which migrations have executed. This table ensures:
- Idempotency : Migrations run exactly once
- Ordering : Migrations execute in chronological order
- Consistency : All environments converge to identical schemas
The migration table approach, used by Flyway, Liquibase, and most migration frameworks, provides reliable state tracking across environments.
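For a quick audit of applied migrations, the tracking table can be queried directly; a minimal sketch assuming TypeORM's default table name (configurable via migrationsTableName):

import { DataSource } from 'typeorm';

// Lists executed migrations in order; TypeORM's default tracking table
// stores (id, timestamp, name) rows
export async function listExecutedMigrations(
  dataSource: DataSource,
): Promise<Array<{ id: number; timestamp: string; name: string }>> {
  return dataSource.query(
    `SELECT "id", "timestamp", "name" FROM "migrations" ORDER BY "timestamp" ASC`,
  );
}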
Transaction Isolation and ACID Guarantees
PostgreSQL's ACID Implementation
PostgreSQL is ACID-compliant, providing Atomicity, Consistency, Isolation, and Durability guarantees for all transactions. Understanding these properties guides correct transaction usage:
- Atomicity: Transactions are all-or-nothing units of work
- Consistency: Database constraints are enforced across transaction boundaries
- Isolation: Concurrent transactions don't interfere (configurable level)
- Durability: Committed data persists through system failures (via WAL)
PostgreSQL implements durability through Write-Ahead Logging (WAL), where transaction records reach disk before the commit acknowledgment returns.
Isolation Levels and Their Trade-offs
The PostgreSQL official documentation defines four isolation levels, though PostgreSQL implements only three distinct behaviors (Read Uncommitted behaves identically to Read Committed):
Read Committed (Default)
Queries see only data committed before the query began. This level prevents dirty reads but allows non-repeatable reads and phantom reads.
Use case: General-purpose isolation for most application transactions
Repeatable Read
Queries see a consistent snapshot from transaction start. This level prevents dirty reads and non-repeatable reads but theoretically allows phantom reads (though PostgreSQL's implementation prevents phantoms as well).
Use case: Reports requiring consistent data across multiple queries
Serializable
Strictest isolation, emulating serial execution of transactions. Prevents all anomalies but may cause serialization failures requiring retry logic.
Use case: Financial transactions requiring absolute consistency
Practical Transaction Handling in TypeORM
// src/infrastructure/services/AccountService.ts
import { Injectable } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { DataSource, Repository } from 'typeorm';
import { Account } from '../../domain/entities/Account';

@Injectable()
export class AccountService {
  constructor(
    @InjectRepository(Account)
    private readonly accountRepository: Repository<Account>,
    private readonly dataSource: DataSource,
  ) {}

  async transferFunds(
    fromAccountId: string,
    toAccountId: string,
    amount: number,
  ): Promise<void> {
    await this.dataSource.transaction(
      'SERIALIZABLE', // Isolation level for financial transactions
      async (transactionalEntityManager) => {
        // Acquire locks by reading with SELECT FOR UPDATE
        const fromAccount = await transactionalEntityManager.findOne(Account, {
          where: { id: fromAccountId },
          lock: { mode: 'pessimistic_write' },
        });
        const toAccount = await transactionalEntityManager.findOne(Account, {
          where: { id: toAccountId },
          lock: { mode: 'pessimistic_write' },
        });

        if (!fromAccount || !toAccount) {
          throw new Error('Account not found');
        }
        if (fromAccount.balance < amount) {
          throw new Error('Insufficient funds');
        }

        // Perform balance updates
        fromAccount.balance -= amount;
        toAccount.balance += amount;
        await transactionalEntityManager.save(fromAccount);
        await transactionalEntityManager.save(toAccount);
      },
    );
  }
}
This implementation demonstrates critical transaction patterns:
- Explicit isolation level: SERIALIZABLE prevents concurrent transfer anomalies
- Pessimistic locking: SELECT FOR UPDATE prevents lost updates
- Atomic operations: All changes commit or roll back together
- Business validation: The insufficient-funds check occurs within the transaction
PostgreSQL's MVCC (Multi-Version Concurrency Control) system enables these isolation levels without reader-writer blocking in most cases.
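Because SERIALIZABLE transactions can abort with serialization failures (SQLSTATE 40001), callers need retry logic. A minimal sketch, assuming the PostgreSQL driver surfaces the SQLSTATE on the error's code property (the helper name and backoff values are illustrative):

// Retries a unit of work when PostgreSQL aborts it with a serialization failure
async function withSerializableRetry<T>(
  run: () => Promise<T>,
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await run();
    } catch (err: any) {
      // TypeORM wraps driver errors; check both locations for SQLSTATE 40001
      const code = err?.code ?? err?.driverError?.code;
      if (code !== '40001' || attempt >= maxAttempts) throw err;
      // Linear backoff before re-running the aborted transaction
      await new Promise((resolve) => setTimeout(resolve, 50 * attempt));
    }
  }
}

// Usage sketch:
// await withSerializableRetry(() => accountService.transferFunds(fromId, toId, 100));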
Connection Pooling: Scaling Database Access
Why Connection Pooling Matters
PostgreSQL's architecture forks a new process for each connection—an expensive operation for short transactions. Connection pooling amortizes this cost by reusing established connections.
Stack Overflow's engineering blog notes: "Connection pooling is a technique used to reuse database connections, reducing the overhead of establishing new connections for every query."
Pool Sizing: The Mathematical Approach
The authoritative formula for PostgreSQL connection pool sizing comes from the PostgreSQL community:
connections = ((core_count × 2) + effective_spindle_count)
For a 4-core database server with one SSD:
- (4 × 2) + 1 = 9 connections
This formula balances CPU utilization with disk I/O capacity. Setting pools too large leads to context switching overhead; too small causes queuing delays.
PgBouncer: Production-Grade Connection Pooling
PgBouncer serves as the industry-standard connection pooler for PostgreSQL, offering three pooling modes:
Transaction Mode (Recommended):
- Assigns connection for transaction duration
- Returns connection to pool after COMMIT/ROLLBACK
- Enables high connection reuse for short transactions
Session Mode:
- Assigns connection for client session duration
- Required for advisory locks and prepared statements
- Lower connection reuse, higher database load
Statement Mode:
- Assigns connection per statement
- Not compatible with multi-statement transactions
- Highest reuse, most restrictions
PgBouncer Configuration Example
# /etc/pgbouncer/pgbouncer.ini
[databases]
production_db = host=localhost port=5432 dbname=production_db
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
# Pool sizing based on 4-core database server
default_pool_size = 9
max_client_conn = 100
reserve_pool_size = 3
reserve_pool_timeout = 5
# Transaction-level pooling for optimal reuse
pool_mode = transaction
# Connection timeouts
server_idle_timeout = 600
server_lifetime = 3600
server_connect_timeout = 15
# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
Key parameters explained:
- default_pool_size = 9: Maximum server connections per user/database pair (based on the formula)
- max_client_conn = 100: Maximum client connections (excess clients queue for a pooled connection)
- reserve_pool_size = 3: Additional connections available when the default pool is exhausted
- pool_mode = transaction: Release each connection back to the pool after transaction completion
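On the application side, the only change is pointing the connection at PgBouncer; a hedged sketch matching the configuration above (environment variable names are assumptions):

// Connect through PgBouncer (port 6432) instead of PostgreSQL directly (5432)
import { DataSource } from 'typeorm';

export const pooledDataSource = new DataSource({
  type: 'postgres',
  host: '127.0.0.1', // PgBouncer's listen_addr
  port: 6432, // PgBouncer's listen_port
  username: process.env.DB_USERNAME,
  password: process.env.DB_PASSWORD,
  database: 'production_db',
  entities: ['dist/**/*.entity.js'],
});

In transaction pooling mode, avoid session-scoped features such as advisory locks and named prepared statements, since consecutive transactions may run on different server connections.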
Scaling Beyond Single PgBouncer
PgBouncer runs as a single-threaded process, utilizing only one CPU core. For high-throughput systems, Crunchy Data documents running multiple PgBouncer instances:
- Multiple PgBouncer processes behind a load balancer
- Each PgBouncer instance with its own pool
- Collective pool size still adheres to the core-count formula
Signs you need multiple PgBouncer instances:
- PgBouncer CPU at 100% while PostgreSQL is under-utilized
- Application query latency increases despite database headroom
Multi-instance PgBouncer setup provides horizontal scaling while respecting database connection limits.
Integration: Building Production-Ready Data Layers
Layered Architecture Pattern
Combining these patterns yields a layered architecture:
- Domain Layer: Pure business entities and interfaces
- Repository Layer: Domain-centric data access abstractions
- ORM Layer: TypeORM entities and migrations
- Connection Layer: PgBouncer pools and database clusters
Each layer depends only on layers below, enabling isolated testing and independent evolution.
Configuration Management
Production systems require environment-specific configuration:
// src/config/database.config.ts
import { TypeOrmModuleOptions } from '@nestjs/typeorm';

export const getDatabaseConfig = (): TypeOrmModuleOptions => {
  const isProduction = process.env.NODE_ENV === 'production';

  return {
    type: 'postgres',
    host: process.env.DB_HOST || 'localhost',
    port: parseInt(process.env.DB_PORT || '5432', 10),
    username: process.env.DB_USERNAME,
    password: process.env.DB_PASSWORD,
    database: process.env.DB_NAME,

    // Entity and migration paths
    entities: ['dist/**/*.entity.js'],
    migrations: ['dist/migrations/*.js'],

    // Production-specific settings
    synchronize: false, // NEVER use in production
    migrationsRun: false, // Run migrations explicitly via CLI
    logging: isProduction ? ['error', 'warn'] : true,

    // Connection pool settings (application-level, passed to node-postgres)
    extra: {
      max: 20, // Application pool size
      idleTimeoutMillis: 30000,
      connectionTimeoutMillis: 10000,
    },

    // SSL for production (note: rejectUnauthorized: false skips certificate
    // verification; supply a CA certificate for strict validation)
    ssl: isProduction ? { rejectUnauthorized: false } : false,
  };
};
This configuration demonstrates defense-in-depth:
- Explicit migration control: No automatic schema synchronization
- Connection pooling: Application-level pool before PgBouncer
- Environment-specific logging: Verbose in development, errors and warnings only in production
- SSL enforcement: Encrypted connections in production
Monitoring and Observability
Production data layers require monitoring at multiple levels:
Database Level:
- Query performance: pg_stat_statements extension
- Connection counts: pg_stat_activity view
- Replication lag: pg_stat_replication view
Connection Pool Level:
- Pool utilization: PgBouncer SHOW POOLS command
- Queue depth: SHOW CLIENTS output
- Connection wait times: Application-level metrics
Application Level:
- Repository method latencies
- Transaction duration histograms
- Serialization failure counts (for SERIALIZABLE isolation)
Prometheus exporters exist for PostgreSQL and PgBouncer, enabling comprehensive dashboards in Grafana.
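As a starting point for application-side visibility, connection states can be sampled directly from pg_stat_activity; a minimal sketch (the function name and wiring are illustrative):

import { DataSource } from 'typeorm';

// Samples connection counts by state for the current database
export async function connectionsByState(
  dataSource: DataSource,
): Promise<Array<{ state: string | null; connections: string }>> {
  return dataSource.query(`
    SELECT state, count(*) AS connections
    FROM pg_stat_activity
    WHERE datname = current_database()
    GROUP BY state
  `);
}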
Conclusion: Systematic Data Architecture
Building production-ready database architectures requires systematic application of documented patterns. The Repository pattern isolates domain logic from persistence concerns. CQRS enables independent read/write optimization when workload characteristics differ. ORM mapping strategies translate object hierarchies to relational schemas with understood trade-offs. Migrations provide version-controlled schema evolution. Transaction isolation levels balance consistency guarantees against concurrency. Connection pooling scales database access without resource exhaustion.
Each pattern addresses specific architectural concerns. Their combination, guided by official documentation and industry best practices, yields robust data layers that maintain integrity under load, evolve cleanly with requirements, and surface actionable operational metrics.
I recommend beginning with simple Repository implementations backed by TypeORM, adding CQRS only when read/write patterns diverge substantially, selecting mapping strategies based on query patterns, enforcing migration-driven schema changes from project inception, choosing isolation levels matching consistency requirements, and sizing connection pools according to database server resources.
These patterns are documented, tested, and proven. Implement them systematically.
References
[1] Fowler, M. (2002). "Repository." Patterns of Enterprise Application Architecture. Retrieved from https://martinfowler.com/eaaCatalog/repository.html
[2] TypeORM. (2024). "Working with Repository." TypeORM Documentation. Retrieved from https://typeorm.io/docs/working-with-entity-manager/working-with-repository/
[3] TypeORM. (2024). "Repository APIs." TypeORM Documentation. Retrieved from https://typeorm.io/docs/working-with-entity-manager/repository-api/
[4] Microsoft. (2024). "CQRS Pattern." Azure Architecture Center. Retrieved from https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs
[5] Fowler, M. (2011). "CQRS." Martin Fowler's Blog. Retrieved from https://martinfowler.com/bliki/CQRS.html
[6] AWS. (2024). "CQRS Pattern." AWS Prescriptive Guidance. Retrieved from https://docs.aws.amazon.com/prescriptive-guidance/latest/modernization-data-persistence/cqrs-pattern.html
[7] Doctrine Project. (2024). "Inheritance Mapping." Doctrine ORM Documentation. Retrieved from https://www.doctrine-project.org/projects/doctrine-orm/en/3.5/reference/inheritance-mapping.html
[8] SQLAlchemy. (2024). "Mapping Class Inheritance Hierarchies." SQLAlchemy 2.0 Documentation. Retrieved from https://docs.sqlalchemy.org/en/20/orm/inheritance.html
[9] TypeORM. (2024). "Migrations." TypeORM Documentation. Retrieved from https://typeorm.io/docs/advanced-topics/migrations/
[10] Gunawardena, B. (2025). "NestJS & TypeORM Migrations in 2025." JavaScript in Plain English. Retrieved from https://javascript.plainenglish.io/nestjs-typeorm-migrations-in-2025-50214275ec8d
[11] Aviator. (2024). "ACID Transactions and Implementation in a PostgreSQL Database." Retrieved from https://www.aviator.co/blog/acid-transactions-postgresql-database/
[12] PostgreSQL Global Development Group. (2024). "Transaction Isolation." PostgreSQL Documentation. Retrieved from https://www.postgresql.org/docs/current/transaction-iso.html
[13] ScaleGrid. (2024). "PostgreSQL Connection Pooling: Part 1 - Pros & Cons." Retrieved from https://scalegrid.io/blog/postgresql-connection-pooling-part-1-pros-and-cons/
[14] Stack Overflow. (2020). "Improve Database Performance with Connection Pooling." Stack Overflow Blog. Retrieved from https://stackoverflow.blog/2020/10/14/improve-database-performance-with-connection-pooling/
[15] ScaleGrid. (2024). "PostgreSQL Connection Pooling: Part 2 - PgBouncer." Retrieved from https://scalegrid.io/blog/postgresql-connection-pooling-part-2-pgbouncer/
[16] Crunchy Data. (2024). "Postgres at Scale: Running Multiple PgBouncers." Crunchy Data Blog. Retrieved from https://www.crunchydata.com/blog/postgres-at-scale-running-multiple-pgbouncers
Originally published at kanaeru.ai