
Harry Do

Part 1 - MySQL vs PostgreSQL: Connection Architecture


MySQL and PostgreSQL take fundamentally different approaches to handling client connections. MySQL uses a thread-based architecture where all connections share a single process, while PostgreSQL uses a process-based architecture where each connection gets its own dedicated process. This architectural difference has cascading effects on memory usage, connection scalability, and performance characteristics, and it explains why connection pooling is optional for MySQL but essential for PostgreSQL in production environments.

1. MySQL: The Thread-Based Approach

Reference: https://dev.mysql.com/blog-archive/the-new-mysql-thread-pool/

MySQL uses a single process with multiple threads to handle all connections. Think of it like one big house where everyone shares the same kitchen, living room, and resources. It's memory efficient because all threads share the same address space, so a single server can handle thousands of connections.
Here's how MySQL handles things:

  • The Flow: When clients (your app, CLI tools, or any API using the MySQL client-server protocol) send connection requests to MySQL, the system follows a sophisticated thread pooling mechanism:
    • A Receiver Thread acts as the gatekeeper, queuing up incoming connections and managing the initial handshake process
    • It processes them one by one, assigning each to a Thread Group in round-robin fashion to ensure balanced load distribution across available worker threads
    • Query Worker Threads inside that Thread Group actually execute your queries, handling everything from parsing to execution
    • Each connection gets its own THD (a thread context data structure that tracks connection state, session variables, transaction state, and query metadata)
  • One Big Process: There's a single main mysqld process running everything, which means all database operations, from query execution to buffer management, happen within this unified process space
  • Threads Everywhere: Each connection is just a thread in that process, making connection creation extremely lightweight since it doesn't require forking new processes or copying memory spaces
  • Shared Memory: All threads hang out in the same memory space and share resources like the buffer pool, query cache, and table metadata, enabling efficient resource utilization

The Good Stuff:

  • Memory Friendly: Threads share memory, so you can have thousands of connections without breaking the bank
  • Lightning Fast Connections: Creating a thread is super quick
  • Shared Cache Benefits: Everyone gets to use the same buffer pool and cache
  • Works on Tight Budgets: Perfect when you don't have tons of RAM
  • Handles the Crowd: Great for dealing with lots of concurrent connections

The Not-So-Good:

  • All Eggs in One Basket: If the main process goes down, everything dies
  • Not Much Separation: One bad connection can mess with the others
  • Shared Memory Risks: If memory gets corrupted, it affects everyone
  • Less Protection: Threads aren't as isolated as separate processes
  • Thread Contention: Under heavy load, thread synchronization overhead can impact performance

2. PostgreSQL: The Process-Based Approach

Reference: https://medium.com/@hnasr/postgresql-process-architecture-f21e16459907

PostgreSQL takes a completely different approach: a separate process for each connection. Think of it like a neighborhood where each family has their own house with their own resources. Each connection is completely isolated from the others, which provides better stability and security, but it comes at the cost of higher memory usage since each process needs its own space.

  • The Flow: When a client connects to PostgreSQL, the system follows a fork-based process creation model:

    • The Postmaster (main supervisor process) listens for incoming connections on the configured port (default 5432) and acts as the primary coordinator for all database activity
    • When a request arrives, Postmaster authenticates it by validating credentials and checking access permissions before allowing the connection to proceed
    • Once authenticated, it uses the Unix fork() system call to create a brand new Backend Process dedicated exclusively to that connection, complete with its own memory space and execution context
    • This Backend Process handles all queries from that client until disconnect, maintaining session state, transaction context, and query execution buffers independently from all other connections
  • Memory Architecture: PostgreSQL's memory model is divided into two distinct areas to balance isolation with efficiency:

    • Private Memory: Each Backend Process gets its own isolated memory space (~2-5MB base, can grow based on workload) for query execution, session state, temporary tables, sort operations, and connection-specific buffers, ensuring complete isolation from other connections
    • Shared Memory: All processes share a common area for caching data pages (shared_buffers), Write-Ahead Log (WAL) buffers, lock tables, and coordination structures, allowing efficient data sharing while maintaining process isolation
  • Background Processes: PostgreSQL also runs essential helper processes like WAL Writer (for transaction logging), Checkpointer (for flushing dirty buffers), Autovacuum Workers (for cleaning up dead tuples), and Stats Collector (replaced by a shared-memory statistics system in PostgreSQL 15) to keep things running smoothly without impacting user connections
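The memory cost of this model is easy to estimate with back-of-envelope arithmetic: total memory is roughly shared memory plus one private allocation per backend process. A tiny sketch, using the ~2-5MB per-backend figure from this article (real usage varies with work_mem, temp tables, and query complexity):

```go
package main

import "fmt"

// estimateMB gives a rough, illustrative memory estimate for
// process-per-connection: shared_buffers plus N backends times a
// per-backend base. The per-backend number is an approximation,
// not something PostgreSQL reports directly.
func estimateMB(connections, perBackendMB, sharedBuffersMB int) int {
	return sharedBuffersMB + connections*perBackendMB
}

func main() {
	// 500 direct connections at ~3MB each, with 1GB of shared_buffers:
	fmt.Printf("~%d MB total\n", estimateMB(500, 3, 1024)) // ~2524 MB
}
```

At 500 direct connections you're already committing ~1.5GB just to backend processes — which is exactly why the pooling discussion below matters.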

The Good Stuff:

  • Total Isolation: Each connection is its own thing, completely separate
  • More Stable: If one process crashes, the others keep chugging along
  • Protected Memory: Each process has its own memory sandbox
  • Extra Security: OS-level process isolation is pretty solid

The Not-So-Good:

  • Memory Hog: Each process uses ~2-5MB base memory per connection, which can grow based on workload complexity
  • Slower Startup: Forking a new process takes longer than spinning up a thread
  • Connection Limits: Can't handle as many connections at once
  • Memory Gets Messy: As processes grow, memory gets fragmented
  • Needs Connection Pooling: Pretty much required for production (more on this below)
  • IPC Overhead: Inter-process communication is slower than inter-thread communication

3. Connection Pooling: Why PostgreSQL Really Needs It

Here's the deal: Because of how PostgreSQL is built, you basically have to use connection pooling in production. MySQL can handle more direct connections than PostgreSQL thanks to its thread-based architecture, but connection pooling is still highly recommended for production environments to minimize connection overhead (authentication, handshake costs) and improve overall performance.

Why PostgreSQL Is Basically Begging for Connection Pooling:

  • Memory Gets Expensive: Each connection = 1 whole OS process using ~2-5MB base memory (can grow with workload)
  • Slow Connections: Forking processes is way slower than spinning up threads
  • You'll Hit a Wall: You're limited by how much memory you have and how many processes your system allows
  • Memory Gets Wasted: Over time, memory gets fragmented and inefficient

Two Ways to Pool: Proxy vs Application-Level

Let me break down the two main approaches:

Application-level pooling is baked right into your app's code. You use a library or framework feature that creates and manages a pool of connections when your app starts up. It's like having your own personal stash of database connections.

```go
// Grab the underlying *sql.DB that GORM manages (requires the "time" import).
sqlDB, err := db.DB()
if err != nil {
	panic(err)
}

// SetMaxIdleConns sets the maximum number of connections in the idle connection pool.
sqlDB.SetMaxIdleConns(10)
// SetMaxOpenConns sets the maximum number of open connections to the database.
sqlDB.SetMaxOpenConns(100)
// SetConnMaxLifetime sets the maximum amount of time a connection may be reused.
sqlDB.SetConnMaxLifetime(time.Hour)
```

On the other hand, Proxy-level pooling is like having a middleman. You set up a separate service (like PgBouncer) that sits between your app and the database. Your app talks to the proxy, and the proxy manages a pool of real database connections. When you need to do something, the proxy hands you a connection that's already warmed up and ready to go.

Proxy Pool Modes:

  • Session Mode: Safest and most compatible option that supports all PostgreSQL features including prepared statements, cursors, advisory locks, and session-level settings - a client connection is mapped to a server connection for the entire session duration, just like connecting directly to PostgreSQL
  • Transaction Mode: Recommended for most web applications and REST APIs because it releases the server connection back to the pool after each transaction commits or rolls back, providing excellent connection reuse while supporting most common use cases (note: older PgBouncer versions don't support prepared statements in this mode, but PgBouncer 1.21.0+ added support via max_prepared_statements parameter; still doesn't support cursors or session-level features across transactions)
  • Statement Mode: Highest throughput and most aggressive pooling that returns connections to the pool after every single SQL statement, maximizing connection reuse but with significant restrictions - doesn't support multi-statement transactions, prepared statements, or any session state, making it suitable only for very specific simple read-only workloads
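To make the proxy approach concrete, here's a minimal PgBouncer config sketch for transaction mode. Every name, path, port, and size below is a placeholder to adapt to your environment, not a recommended production value:

```ini
; pgbouncer.ini - illustrative sketch; names, paths, and sizes are placeholders
[databases]
; clients connect to "appdb" on PgBouncer; it forwards to the real server
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; release server connection after each transaction
default_pool_size = 20       ; server connections per database/user pair
max_client_conn = 1000       ; client connections PgBouncer will accept
```

Your app then points its connection string at port 6432 instead of 5432; a thousand client connections can share twenty real backend processes.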

Comparison Between the Two Approaches

| Factor | Proxy-Level (PgBouncer) | Application-Level (GORM) |
| --- | --- | --- |
| Where It Lives | Between app and database | Inside your application |
| Who It Helps | All your applications | Just one application |
| How You Set It Up | One central config | Configure each app |
| Memory Cost | Barely touches your app | Uses your app's memory |
| Connection Control | Controls everything globally | Each app sets its own limits |
| When Things Break | Built-in failover | You handle it yourself |
| Watching Metrics | See everything in one place | Per-app metrics |
| How You Deploy It | Separate service to run | Just part of your app |

The Smart Move: Use Both

Honestly? Do both for maximum efficiency and reliability:

  • Use PgBouncer to manage connections globally across all your services, providing a centralized connection pool that prevents any single application from overwhelming the database and allows you to monitor and control all database access from one place
  • Use GORM (or whatever your framework offers) for app-specific tuning, allowing each application to optimize its connection behavior based on its specific workload patterns, request rate, and performance requirements without affecting other services
  • This layered approach gives you redundancy and flexibility—best of both worlds—where PgBouncer provides the critical last line of defense against connection exhaustion while application-level pools optimize for each service's unique needs and can fail gracefully if PgBouncer encounters issues
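One practical wrinkle when layering the two: the app-level pools collectively sit in front of the proxy, so the sum of every service's MaxOpenConns should stay within what the proxy will accept (PgBouncer's max_client_conn). A hypothetical sanity-check helper — the function name and numbers are made up for illustration:

```go
package main

import "fmt"

// fitsProxy checks that the combined MaxOpenConns across services
// does not exceed the client connections the proxy will accept
// (e.g. PgBouncer's max_client_conn). Illustrative helper, not a
// real library function.
func fitsProxy(appMaxOpen []int, proxyMaxClientConn int) bool {
	total := 0
	for _, n := range appMaxOpen {
		total += n
	}
	return total <= proxyMaxClientConn
}

func main() {
	services := []int{100, 100, 50}        // MaxOpenConns per service
	fmt.Println(fitsProxy(services, 1000)) // 250 <= 1000, so this fits
}
```

Running this kind of check in CI or at startup catches the classic failure mode where one more deployed service quietly pushes the fleet past the proxy's limit.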

4. Key Takeaways: Side-by-Side Comparison

| Factor | MySQL (Thread-Based) | PostgreSQL (Process-Based) |
| --- | --- | --- |
| Memory per Connection | Very low (threads share memory) | High (each process needs own memory) |
| Max Connections | Very high with thread pool | Limited without pooling |
| Connection Speed | Fast (thread creation) | Slower (process forking) |
| Connection Pooling | Recommended (handles direct connections better) | Required in production |
| Crash Impact | Entire server goes down | Only affected connection fails |
| Process Isolation | Shared memory (lower isolation) | OS-level (strong isolation) |
| Best For | High connection count, simple queries, serverless | Complex queries, write-heavy, advanced features |
| Infrastructure | Simple, works out-of-the-box | Needs PgBouncer/PgPool setup |

Bottom Line: MySQL's thread model is more forgiving and handles high connection counts easily. PostgreSQL's process model provides better isolation but requires connection pooling in production. Both are excellent databases—choose based on your connection patterns and operational requirements.
