DEV Community: Pranay Ravi

Stop Running psql Commands by Hand — Build a REST API for PostgreSQL User Management

Pranay Ravi — Fri, 29 May 2026 21:45:12 +0000

If you manage PostgreSQL databases across multiple environments, you've probably done this:

SSH to the DB host (or connect via psql)
Run CREATE USER jsmith CONNECTION LIMIT 20 PASSWORD '...'
Slack the password to the developer
Forget to log it anywhere
Repeat for every environment, every onboarding, every access request

It's tedious, error-prone, and leaves zero audit trail. Here's a better way.

What I Built

pg-user-api is a lightweight Flask REST API that wraps PostgreSQL user provisioning in clean HTTP endpoints. You register your databases once in a SQLite inventory, then any tooling — CI pipelines, internal portals, Ansible playbooks, or a plain curl — can create and manage users across environments without ever touching psql.

GitHub: pcraavi/PostgreSQL-user-creation-API

The Problem It Solves

In teams that span dev, QA, UAT, and prod, you end up with different patterns of users:

App service accounts — named after the host/port combo (web01_8080)
Kubernetes workload accounts — named after env prefix + farm (dv_gearservice)
Individual dev/QA accounts — low connection limits, scoped to non-prod
Read-only analyst accounts — prod only, no DDL
DBA accounts — CREATEDB CREATEROLE LOGIN, rarely provisioned

Each type has different CONNECTION LIMIT values, privilege levels, and naming conventions. Encoding these patterns in an API means the rules are consistent, repeatable, and auditable.
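The naming conventions above could be captured as small pure functions, so that every caller derives the same role name. This is a minimal sketch rather than the project's actual code: the function names and the validation rule are assumptions made for illustration.

```python
import re

# Assumed rule: lowercase letters, digits and underscores, starting with a letter.
_SAFE_PART = re.compile(r"^[a-z][a-z0-9_]*$")


def _check(part: str) -> str:
    """Normalize a name fragment and reject anything outside the assumed rule."""
    part = part.strip().lower()
    if not _SAFE_PART.match(part):
        raise ValueError(f"invalid name part: {part!r}")
    return part


def app_username(servername: str, port: int) -> str:
    """VM/container service account: <servername>_<port>."""
    if not 1 <= int(port) <= 65535:
        raise ValueError(f"invalid port: {port!r}")
    return f"{_check(servername)}_{int(port)}"


def k8s_username(env_prefix: str, farmname: str) -> str:
    """Kubernetes workload account: <env_prefix>_<farmname>."""
    return f"{_check(env_prefix)}_{_check(farmname)}"


print(app_username("web01", 8080))        # web01_8080
print(k8s_username("dv", "gearservice"))  # dv_gearservice
```

Rejecting unexpected characters before the name reaches any SQL statement also keeps query-string input from being interpolated into DDL unchecked.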

Architecture

The project is intentionally small — five Python files and a requirements list:

pg_user_api/
├── app.py              # Flask app — all endpoints
├── auth.py             # HTTP Basic Auth (constant-time compare)
├── database.py         # SQLite registry + audit log
├── notifications.py    # Notification stubs (Webex / Slack / Email)
├── seed_db.py          # One-time setup: creates DB + sample records
└── requirements.txt

Two credential pairs, clearly separated:

PG_API_USER / PG_API_PASS — who can call this API (your team/tooling)
PG_ADMIN_USER / PG_ADMIN_PASS — the PostgreSQL DBA role that executes DDL

The DBA credentials never appear in API URLs or response bodies. Callers only need the API credentials plus env and database.
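The constant-time comparison mentioned for auth.py might look roughly like the following. It is a sketch under assumptions: the function name credentials_ok is hypothetical, and the real module may be structured differently. Only the environment variable names come from the article.

```python
import hmac
import os


def credentials_ok(username: str, password: str) -> bool:
    """Compare supplied Basic Auth credentials against PG_API_USER / PG_API_PASS."""
    expected_user = os.environ.get("PG_API_USER", "")
    expected_pass = os.environ.get("PG_API_PASS", "")
    if not expected_user or not expected_pass:
        return False  # refuse everything when the API credentials are not configured
    # compare_digest avoids leaking where the first mismatching byte occurs.
    user_match = hmac.compare_digest(username.encode(), expected_user.encode())
    pass_match = hmac.compare_digest(password.encode(), expected_pass.encode())
    # Both comparisons run before they are combined, so a wrong username
    # does not skip the password comparison.
    return user_match and pass_match
```

A plain == comparison can return as soon as the first differing character is found, which is the timing signal that hmac.compare_digest is designed to remove.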

SQLite as a config-free registry:

Rather than a static YAML or environment file listing every hostname, databases are registered once in a db_registry table:

Column	Description
`env`	`dev` / `qa` / `uat` / `prod`
`db_name`	PostgreSQL database name
`hostname`	FQDN or IP of the host
`port`	PostgreSQL port (default 5432)
`active`	`1` = active, `0` = skip

Every endpoint looks up the hostname dynamically from this registry. No hardcoded connection strings anywhere in application code.
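The registry lookup can be sketched with the standard sqlite3 module. The column names follow the table above. The column types, the lookup_target helper and the QA hostname are assumptions for illustration, and an in-memory database stands in for pg_registry.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for pg_registry.db
conn.execute(
    """
    CREATE TABLE db_registry (
        env      TEXT NOT NULL,
        db_name  TEXT NOT NULL,
        hostname TEXT NOT NULL,
        port     INTEGER NOT NULL DEFAULT 5432,
        active   INTEGER NOT NULL DEFAULT 1
    )
    """
)
conn.executemany(
    "INSERT INTO db_registry (env, db_name, hostname, port, active) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("dev", "myapp_dev", "pg-dev-01.example.com", 5432, 1),
        ("qa", "myapp_qa", "pg-qa-01.example.com", 5432, 0),  # inactive entry
    ],
)


def lookup_target(conn, env, db_name):
    """Return (hostname, port) for an active registry entry, or None."""
    return conn.execute(
        "SELECT hostname, port FROM db_registry "
        "WHERE env = ? AND db_name = ? AND active = 1",
        (env, db_name),
    ).fetchone()


print(lookup_target(conn, "dev", "myapp_dev"))  # ('pg-dev-01.example.com', 5432)
print(lookup_target(conn, "qa", "myapp_qa"))    # None
```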

The Endpoints

All endpoints are GET with query params (by design — simple to curl, simple to call from automation scripts).

GET /                            # Health check, no auth required
GET /api/v1/registry             # List registered databases
GET /api/v1/users/all            # List all PostgreSQL roles on a database
GET /api/v1/users/app            # Create VM/container service account
GET /api/v1/users/app-k8s        # Create Kubernetes workload account
GET /api/v1/users/devqa          # Create individual dev/QA user
GET /api/v1/users/devlead        # Create dev-lead user
GET /api/v1/users/readonly       # Create read-only user
GET /api/v1/users/dba            # Create DBA user (CREATEDB + CREATEROLE)
GET /api/v1/users/reset          # Reset a user's password
GET /api/v1/users/search-path    # Update search_path for a user
GET /api/v1/users/find           # Look up a specific user

Every user-creation endpoint returns the same structured response:

{
  "username": "web01_8080",
  "password": "generatedSecurePassword",
  "status": "user created",
  "hostname": "pg-dev-01.example.com",
  "database": "myapp_dev",
  "port": "5432",
  "env": "dev"
}

Password is returned once at creation time. The API uses secrets.token_urlsafe(16) for generation — no insecure random module.
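For reference, this is roughly what that generation step looks like in isolation. The wrapper name generate_password is an assumption; secrets.token_urlsafe is the standard-library call named above.

```python
import secrets


def generate_password(nbytes: int = 16) -> str:
    """Return a URL-safe random string backed by the operating system's CSPRNG."""
    return secrets.token_urlsafe(nbytes)


password = generate_password()
print(len(password))  # 22: 16 random bytes become 22 URL-safe Base64 characters
```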

Running It

1. Clone and install

git clone https://github.com/pcraavi/PostgreSQL-user-creation-API.git
cd PostgreSQL-user-creation-API
pip install -r requirements.txt

2. Seed the registry

python seed_db.py

This creates pg_registry.db with sample entries. Edit SAMPLE_RECORDS in seed_db.py to point at your real PostgreSQL hosts.

3. Set credentials via environment variables

# Windows PowerShell
$env:PG_API_USER   = "pgadmin"
$env:PG_API_PASS   = "Ch@ngeMe2024!"
$env:PG_ADMIN_USER = "role_create"
$env:PG_ADMIN_PASS = "your_pg_password"

# Linux / macOS
export PG_API_USER="pgadmin"
export PG_API_PASS="Ch@ngeMe2024!"
export PG_ADMIN_USER="role_create"
export PG_ADMIN_PASS="your_pg_password"

4. Start

python app.py
# Listening on http://localhost:5000

Example Calls

Create a service account for a VM running on port 8080:

curl -u pgadmin:Ch@ngeMe2024! \
  "http://localhost:5000/api/v1/users/app?env=dev&database=myapp_dev&servername=web01&port=8080"

Creates user web01_8080 with CONNECTION LIMIT 200.

Create a Kubernetes workload account:

curl -u pgadmin:Ch@ngeMe2024! \
  "http://localhost:5000/api/v1/users/app-k8s?env=dev&database=myapp_dev&env_prefix=dv&farmname=gearservice"

Creates user dv_gearservice.

Reset a forgotten password:

curl -u pgadmin:Ch@ngeMe2024! \
  "http://localhost:5000/api/v1/users/reset?env=prod&database=myapp_prod&username=analyst01"

Generates a new secrets.token_urlsafe(16) password and applies it immediately. Returns the new password in the response.

Idempotency

All create endpoints check pg_catalog.pg_roles before issuing CREATE USER. If the role already exists, the API returns "status": "user already exists" and exits cleanly. Safe to call from automation without worrying about duplicate creation errors.
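The check-then-create flow could be sketched as below. This is not the project's code: ensure_user and FakeCursor are hypothetical names, the cursor is assumed to behave like a psycopg2 DB-API cursor, and the stub exists only so the flow can be exercised without a PostgreSQL server.

```python
import re

_ROLE_NAME = re.compile(r"^[a-z_][a-z0-9_]{0,62}$")


def ensure_user(cursor, username, password, connection_limit):
    """Create the role only if pg_roles has no row for it; return a status string."""
    if not _ROLE_NAME.match(username):
        raise ValueError(f"unsafe role name: {username!r}")
    cursor.execute(
        "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = %s", (username,)
    )
    if cursor.fetchone() is not None:
        return "user already exists"
    # The role name was validated above, so quoting it as an identifier is safe.
    # Assumption: the driver interpolates %s on the client side (as psycopg2 does),
    # because CREATE USER does not accept server-side bind parameters.
    cursor.execute(
        f'CREATE USER "{username}" CONNECTION LIMIT {int(connection_limit)} PASSWORD %s',
        (password,),
    )
    return "user created"


class FakeCursor:
    """Stand-in for a DB-API cursor, used here only for demonstration."""

    def __init__(self):
        self.roles = set()
        self._result = None

    def execute(self, sql, params=()):
        if sql.startswith("SELECT"):
            self._result = (1,) if params[0] in self.roles else None
        else:  # CREATE USER "<name>" ...
            self.roles.add(sql.split('"')[1])
            self._result = None

    def fetchone(self):
        return self._result


cur = FakeCursor()
print(ensure_user(cur, "web01_8080", "not-a-real-password", 200))  # user created
print(ensure_user(cur, "web01_8080", "not-a-real-password", 200))  # user already exists
```

A check followed by a create can still race when two callers request the same role at the same moment, so a real implementation would probably also treat the duplicate-role error as "already exists".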

Audit Log

Every operation (create, reset, search_path change) is written to an audit_log table in the same pg_registry.db SQLite file:

sqlite3 pg_registry.db \
  "SELECT * FROM audit_log ORDER BY performed_at DESC LIMIT 10;"

You get a timestamped record of who called what, on which database, with what outcome. Useful for access reviews and incident investigations.
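A table that supports this kind of query might be shaped as follows. Only the table name audit_log and the performed_at column come from the article. Every other column, the record helper and the outcome strings are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for pg_registry.db
conn.execute(
    """
    CREATE TABLE audit_log (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        performed_at TEXT NOT NULL DEFAULT (datetime('now')),
        api_user     TEXT NOT NULL,
        action       TEXT NOT NULL,
        env          TEXT NOT NULL,
        db_name      TEXT NOT NULL,
        target_user  TEXT NOT NULL,
        outcome      TEXT NOT NULL
    )
    """
)


def record(conn, api_user, action, env, db_name, target_user, outcome):
    """Append one audit row; the generated password is deliberately not stored."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO audit_log "
            "(api_user, action, env, db_name, target_user, outcome) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (api_user, action, env, db_name, target_user, outcome),
        )


record(conn, "pgadmin", "create", "dev", "myapp_dev", "web01_8080", "user created")
record(conn, "pgadmin", "reset", "prod", "myapp_prod", "analyst01", "password reset")

# Newest first; id breaks ties between rows written within the same second.
for row in conn.execute(
    "SELECT action, target_user, outcome FROM audit_log ORDER BY id DESC LIMIT 10"
):
    print(row)
```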

Notification Hooks

notifications.py ships with ready-to-uncomment stubs for Webex Teams, Slack, and email. Wire in your webhook URL or SMTP config, then call send_notification() from any endpoint to push alerts to your team when accounts are created or passwords reset.

Security Notes

Intended for internal / intranet use — put it behind a VPN or API gateway, not on the open internet.
For internet-facing deployments, swap HTTP Basic Auth for JWT or an API key header.
pg_registry.db contains your real hostnames — it's in .gitignore and should stay off version control.
The DBA password never appears in URLs, query strings, or logs.

What I'd Add Next

A few things on the roadmap:

GRANT/REVOKE endpoints — privilege management beyond account creation
Schema-level grants — GRANT SELECT ON ALL TABLES IN SCHEMA patterns
Token-based auth — drop-in replacement for Basic Auth
Docker packaging — docker run for teams that don't want to manage Python deps
Structured audit export — JSON or CSV export of the audit log for compliance workflows

Wrapping Up

If your team provisions PostgreSQL users more than a few times a month, wrapping it in an HTTP interface pays for itself quickly. The audit trail alone is worth it.

The full source is at github.com/pcraavi/PostgreSQL-user-creation-API. It's MIT-licensed — fork it, adapt the user types to your org's naming conventions, and wire in your notification channels.

Questions or suggestions? Drop them in the comments or open an issue on GitHub.

PostgreSQL VACUUM Tuning: A Technical Deep Dive Into Autovacuum Configuration

Pranay Ravi — Sun, 24 May 2026 09:37:29 +0000

Author's Note: This article documents a production incident investigation and the technical findings that emerged from returning to foundational documentation. The fix was implemented by a colleague; this article captures the learning journey through proper documentation review.

The Incident: High CPU Due to Autovacuum Contention

A production Aurora PostgreSQL cluster experienced sustained CPU utilization between 85-90% over a 3-4 hour window. CloudWatch Performance Insights identified the primary wait event as CPU (not I/O or lock contention), and the top consuming operation was autovacuum VACUUM processes running on two large tables.

Observed State:

Table A (593 GB, 623M rows): 124 million dead tuples (16.6% dead ratio)
Table B (465M rows): 74 million dead tuples (13.7% dead ratio)
Autovacuum workers: 2 running concurrently
CPU utilization: 85-90%
Autovacuum frequency: One massive vacuum operation every 4-6 hours
Status of tables: Never manually vacuumed since instance creation

Root Cause: Autovacuum thresholds were configured at system defaults, which were inappropriate for high-churn tables receiving bulk updates every 2-3 hours via scheduled data loader processes.

Understanding MVCC: PostgreSQL vs. Oracle's Undo Architecture

Before tuning can be effective, the fundamental difference in how PostgreSQL and Oracle manage concurrent access must be understood.

Oracle's Approach: Automatic Undo Management (AUM)

In Oracle, Multi-Version Concurrency Control (MVCC) is implemented via undo tablespace segments:

Update operation: When a row is updated, the old version is written to undo tablespace (not to the table itself).
Undo retention: Oracle's Automatic Undo Management (AUM) manages undo tablespace as a circular buffer. Undo extents are recycled automatically based on the UNDO_RETENTION parameter and available tablespace.
Space reclamation: Undo space is automatically freed when either (a) the retention period expires, or (b) tablespace pressure forces Oracle to overwrite older undo data before its retention period has expired. The DBA's responsibility is limited to allocating sufficient undo tablespace upfront.

Key characteristic: The DBA tunes this mechanism once (setting retention period and tablespace size) and then relies on Oracle's background processes to manage undo lifecycle automatically.

[IMAGE 2: Oracle vs PostgreSQL MVCC Architecture Diagram]

PostgreSQL's Approach: Heap-Based MVCC with Explicit VACUUM

In PostgreSQL, MVCC is implemented at the table (heap) level:

Update operation: When a row is updated, a new version of the row is inserted into the same table. The old version is marked as "dead" but remains physically in the table.
Space reclamation: VACUUM must scan the table, identify dead tuples, and mark their space as reusable. Dead tuples are not automatically removed.
Autovacuum trigger: Autovacuum is a background process that decides when to vacuum based on tunable thresholds. Unlike Oracle's automatic undo recycling, PostgreSQL requires explicit configuration of when vacuum should trigger.

Key characteristic: The DBA must actively tune VACUUM parameters based on table churn patterns. There is no "set it and forget it" mechanism comparable to Oracle's AUM.

Implication: On Oracle, a table with high UPDATE volume simply generates more undo, which Oracle's AUM handles. On PostgreSQL, the same UPDATE volume generates more dead tuples, and if autovacuum thresholds are too conservative, dead tuples accumulate until autovacuum finally triggers—often at high volume, causing CPU spikes.

The Vacuum Threshold Formula: The Mathematical Foundation

When autovacuum decides to run, it evaluates the following formula for each table:

VACUUM_TRIGGER_THRESHOLD = autovacuum_vacuum_threshold + 
                           (autovacuum_vacuum_scale_factor × n_live_tup)

Where:

autovacuum_vacuum_threshold: Absolute minimum dead tuples (default: 50)
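autovacuum_vacuum_scale_factor: Fraction of the table's rows (0.1 = 10% on this cluster; stock PostgreSQL ships with 0.2)
n_live_tup: Current number of live tuples in the table (PostgreSQL takes this estimate from pg_class.reltuples)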

Pre-Investigation Configuration

System defaults on the cluster:

autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.1

Calculation for Table A (725M rows):

THRESHOLD = 50 + (0.1 × 725,000,000)
          = 50 + 72,500,000
          = 72.5 million dead tuples

Interpretation: Autovacuum would not trigger on Table A until 72.5 million dead tuples accumulated. This is the critical misconfiguration.

For comparison, Table A actually accumulated 124 million dead tuples before the vacuum completed—well beyond this threshold, indicating autovacuum had already triggered much earlier in the lifecycle but was running continuously against an accumulating workload.
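The threshold arithmetic is easy to reproduce for any table. The helper below is a small sketch of the formula as stated above; the function name is an assumption, and the row counts are the ones quoted in this article.

```python
def autovacuum_trigger(base_threshold: int, scale_factor: float, row_estimate: int) -> int:
    """Dead tuples that must accumulate before autovacuum vacuums the table."""
    return base_threshold + round(scale_factor * row_estimate)


# Cluster settings from the article: threshold 50, scale factor 0.1.
print(f"{autovacuum_trigger(50, 0.1, 725_000_000):,}")  # 72,500,050 (Table A)
print(f"{autovacuum_trigger(50, 0.1, 465_000_000):,}")  # 46,500,050 (Table B)
```

Because the scale-factor term dominates on large tables, the fixed threshold of 50 is effectively irrelevant at this size.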

The Root Cause: Loader Pattern and Threshold Mismatch

The data loader process (running every 2-3 hours with 4 parallel workers) updated rows using a COALESCE merge pattern:

UPDATE table_a t SET 
  column_1 = COALESCE(s.column_1, t.column_1),
  column_2 = COALESCE(s.column_2, t.column_2),
  column_3 = COALESCE(s.column_3, t.column_3)
FROM staging_table s
WHERE t.id = s.id

This pattern creates a dead tuple for every row touched, regardless of whether values actually changed, because PostgreSQL writes a new row version for every UPDATE it executes and does not compare the old and new values. Over a 2-3 hour window with millions of rows, the dead tuple generation rate significantly exceeded autovacuum's ability to reclaim space given the high threshold values.

The collision:

High dead tuple generation rate from bulk UPDATE operations
Autovacuum thresholds calibrated for the default use case (moderate churn on tables of typical size)
No table-level overrides to account for this specific workload pattern

Result: Dead tuples accumulated to 16.6% of table size before stabilizing.

Investigation: Diagnostic Queries and Findings

Three queries provided diagnostic clarity:

Query 1: Current dead tuple status

SELECT 
  relname, 
  n_live_tup,
  n_dead_tup,
  ROUND(100.0 * n_dead_tup / (n_live_tup + n_dead_tup), 2) AS dead_ratio_pct,
  last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('table_a', 'table_b')
ORDER BY n_dead_tup DESC;

Result:

Table A: 623M live, 124M dead (16.6%)
Table B: 465M live, 74M dead (13.7%)
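The dead-ratio column is plain arithmetic, and the reported percentages can be checked from the rounded counts. The helper name is an assumption; it mirrors the SQL expression and returns None for an empty table instead of dividing by zero.

```python
def dead_ratio_pct(n_live_tup: int, n_dead_tup: int):
    """Dead tuples as a percentage of all tuples, or None for an empty table."""
    total = n_live_tup + n_dead_tup
    if total == 0:
        return None
    return round(100.0 * n_dead_tup / total, 2)


print(dead_ratio_pct(623_000_000, 124_000_000))  # 16.6
print(dead_ratio_pct(465_000_000, 74_000_000))   # 13.73
```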

Query 2: Active autovacuum processes

SELECT 
  pid, 
  query, 
  query_start,
  EXTRACT(EPOCH FROM (NOW() - query_start))::INT AS runtime_seconds
FROM pg_stat_activity
WHERE query LIKE '%VACUUM%' 
  AND query NOT LIKE '%pg_stat%';

Result: Two autovacuum workers running concurrently, one on each table. At the time of investigation, one had been active for 55+ minutes and the other for 3+ minutes.

Query 3: Cumulative churn analysis

SELECT 
  relname,
  n_live_tup,
  n_dead_tup,
  n_tup_upd + n_tup_del AS total_modifications,
  ROUND(100.0 * (n_tup_upd + n_tup_del) / n_live_tup, 1) AS churn_ratio_pct
FROM pg_stat_user_tables
WHERE relname IN ('table_a', 'table_b');

Result:

Table A: 16 billion total modifications on 725M rows (2203% cumulative churn)
Table B: 11.9 billion total modifications on 465M rows (2566% cumulative churn)

Interpretation: These represent lifetime cumulative statistics since instance creation. The 2200%+ ratio indicates every row has been touched approximately 22 times on average over the instance lifetime.

The Configuration Adjustment: Formula-Based Approach

Rather than implementing ad-hoc changes, a formula-based approach was used to calculate appropriate thresholds.

The colleague conducting the fix referenced textbook formulas for maintenance memory allocation and threshold calculation, confirming that the system defaults were inappropriate for tables with this churn profile.

Applied changes (table-level only):

ALTER TABLE table_a SET (
  autovacuum_vacuum_scale_factor = 0.01,
  autovacuum_vacuum_threshold = 10000,
  autovacuum_analyze_scale_factor = 0.005,
  autovacuum_analyze_threshold = 5000,
  autovacuum_vacuum_cost_delay = 2,
  autovacuum_vacuum_cost_limit = 5000
);

ALTER TABLE table_b SET (
  autovacuum_vacuum_scale_factor = 0.01,
  autovacuum_vacuum_threshold = 10000,
  autovacuum_analyze_scale_factor = 0.005,
  autovacuum_analyze_threshold = 5000,
  autovacuum_vacuum_cost_delay = 2,
  autovacuum_vacuum_cost_limit = 5000
);

Rationale for each parameter:

Scale factor (0.1 → 0.01): Lowers the percentage-based trigger from 10% to 1% of table size, causing autovacuum to start at 7.25 million dead tuples instead of 72.5 million. This increases vacuum frequency but reduces per-operation work volume.
Threshold (50 → 10000): Sets an explicit minimum to prevent excessive vacuum triggering on very small tables, but still allows reasonable triggering on large tables.
Analyze scale factor (0.05 → 0.005): ANALYZE (which updates table statistics for the query planner) triggers more frequently, preventing stale statistics during high-churn periods.
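Cost parameters: cost_delay = 2ms and cost_limit = 5000 give each vacuum a larger work budget between pauses and a shorter pause than the previous 5ms / 1800 settings. Combined with the lower trigger threshold, each run has far fewer dead tuples to process and finishes sooner, which keeps the CPU impact of any single run small while still completing the work.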

New threshold calculation for Table A:

THRESHOLD = 10,000 + (0.01 × 725,000,000)
          = 10,000 + 7,250,000
          = 7.26 million dead tuples

This represents a 10× reduction in the threshold value, causing autovacuum to trigger 10× more frequently but with proportionally smaller work loads.
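Since both tables received identical settings, the two ALTER TABLE statements could be rendered from a single dictionary to keep them from drifting apart. This is only a sketch: the helper name and the identifier check are assumptions, and the values are copied from the statements above.

```python
import re

# Values copied from the ALTER TABLE statements in this article.
TUNED_SETTINGS = {
    "autovacuum_vacuum_scale_factor": 0.01,
    "autovacuum_vacuum_threshold": 10000,
    "autovacuum_analyze_scale_factor": 0.005,
    "autovacuum_analyze_threshold": 5000,
    "autovacuum_vacuum_cost_delay": 2,
    "autovacuum_vacuum_cost_limit": 5000,
}

_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def render_alter(table: str, settings: dict) -> str:
    """Render one table-level ALTER TABLE ... SET (...) statement."""
    if not _IDENT.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    body = ",\n".join(f"  {name} = {value}" for name, value in settings.items())
    return f"ALTER TABLE {table} SET (\n{body}\n);"


for table in ("table_a", "table_b"):
    print(render_alter(table, TUNED_SETTINGS))
```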

Critical Design Decision: Table-Level Configuration, Not Cluster-Level

A deliberate choice was made to apply all tuning parameters at the table level only, not at the system/cluster level.

Why table-level configuration is necessary:

Heterogeneous workloads: Not all tables in a cluster have the same churn pattern. Table A and Table B are high-churn bulk-update targets. Other tables in the same cluster may be mostly static or read-heavy.
Preventing cascade effects: A cluster-wide reduction in autovacuum_vacuum_scale_factor would cause autovacuum to trigger more frequently on all tables, including those with minimal churn. This could result in:
- Unnecessary vacuum operations consuming CPU and I/O
- More frequent ANALYZE operations on stable tables
- Increased lock contention if many vacuums run concurrently
Isolation of risk: By tuning only the affected tables, the change is isolated to the problem source and does not introduce unexpected side effects on other database objects or applications.

This is a deliberate engineering discipline: tune at the smallest scope that solves the problem.

Results: Before and After Metrics

Before Configuration (April 13, 09:51 AM):

Metric	Table A	Table B
Dead tuples	124M	74M
Dead ratio	16.6%	13.7%
Last autovacuum	N/A (first vacuum)	N/A (first vacuum)
CPU utilization	85-90%	85-90%
Vacuum frequency	1 per 4-6 hours	1 per 4-6 hours

After Configuration (April 13, 12:07 PM - 1:45 PM):

Metric	Table A	Table B
Dead tuples	0 (vacuum completed)	21M (declining)
Dead ratio	0%	7.8%
Last autovacuum	2026-04-13 12:07:08	2026-04-13 13:45:44
CPU utilization	30-40%	30-40%
Vacuum frequency	Multiple per loader cycle	Multiple per loader cycle

The vacuum operations completed naturally without manual intervention. Dead tuple levels stabilized well below the new thresholds. CPU alerts cleared.

Monitoring: Proactive Visibility

To prevent recurrence, a monitoring table was implemented to capture dead tuple trends:

CREATE TABLE pg_table_stats_history (
  captured_at TIMESTAMP DEFAULT NOW(),
  table_name TEXT,
  n_live_tup BIGINT,
  n_dead_tup BIGINT,
  dead_ratio_percent NUMERIC,
  n_tup_upd BIGINT,
  n_tup_del BIGINT,
  last_autovacuum TIMESTAMP
);

SELECT cron.schedule('capture_table_stats', '*/30 * * * *', $$
  INSERT INTO pg_table_stats_history
  SELECT 
    NOW(), relname, n_live_tup, n_dead_tup,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2),
    n_tup_upd, n_tup_del, last_autovacuum
  FROM pg_stat_user_tables
  WHERE relname IN ('table_a', 'table_b');
$$);

This captures a snapshot every 30 minutes, allowing observation of daily patterns: "During loader window (12:00-14:00), dead tuples climb to 5-8M, then autovacuum brings them back to 0.5M."

Key Concepts: What the Investigation Revealed

1. The Autovacuum Cost Mechanism

VACUUM has two cost-control parameters:

autovacuum_vacuum_cost_limit: Units of work allowed before autovacuum pauses (with default costs: a page found in shared buffers = 1 unit, a page read from disk = 2 units on PostgreSQL 14 and later, a page dirtied = 20 units)
autovacuum_vacuum_cost_delay: Milliseconds to pause when cost limit is reached

Lower cost_limit + higher delay = slower vacuum, less CPU spike

Higher cost_limit + lower delay = faster vacuum, more CPU spike

The pre-incident configuration had cost_delay = 5ms and cost_limit = 1800 (already aggressive). The post-incident configuration used cost_delay = 2ms and cost_limit = 5000, allocating higher work budgets but still enforcing frequent pauses for distribution.
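A rough way to compare the two configurations is the maximum number of cost units autovacuum could spend per second. The model below is a deliberate simplification and an assumption of this sketch: it ignores the time the work itself takes, so it is an upper bound rather than a prediction.

```python
def max_cost_per_second(cost_limit: int, cost_delay_ms: float) -> float:
    """Upper bound on cost units per second: one full budget per sleep interval."""
    rounds_per_second = 1000.0 / cost_delay_ms
    return cost_limit * rounds_per_second


before = max_cost_per_second(1800, 5)  # pre-incident settings
after = max_cost_per_second(5000, 2)   # post-incident settings

print(before)                    # 360000.0
print(after)                     # 2500000.0
print(round(after / before, 1))  # 6.9
```

Under this model the new settings allow each vacuum to work several times faster, which fits the goal of finishing many small vacuums quickly instead of one long one.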

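2. Vacuum Duration Under a Concurrent Write Load

When autovacuum triggers on a table, it has to scan every page that may contain dead tuples, which for a high-churn table is most of the table. If dead tuples are being created faster than they're being reclaimed (because the loader is still running), the vacuum operation takes longer and consumes sustained CPU.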

In the incident:

Table A vacuum started at 09:51:54 and completed at 12:07:08 (2h 15m)
Table B vacuum started at 08:59:39 and completed at 13:45:44 (4h 46m)
During this window, the loader was also running (starting 12:37:27), creating additional dead tuples

The concurrent activity created contention for CPU and I/O resources, explaining the 85-90% CPU utilization.

3. Wraparound Protection: A Critical Background Mechanism

While not the active problem in this incident, understanding wraparound protection is essential context:

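PostgreSQL uses 4-byte transaction IDs (2^32, about 4.3 billion possible values). For MVCC visibility comparison to work correctly, only a 2-billion value range is usable at any given time. If autovacuum falls so far behind that unfrozen tuples approach the 2-billion transaction boundary, PostgreSQL will:

Force an anti-wraparound autovacuum on any table whose oldest unfrozen transaction ID is older than autovacuum_freeze_max_age (200M transactions by default)
Log warnings once roughly 40M transactions remain (about 11M on PostgreSQL 13 and earlier)
Refuse to assign new transaction IDs, which blocks writes, once roughly 3M transactions remain (1M on PostgreSQL 13 and earlier)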

This is not a theoretical concern—it's a safety mechanism that has forced database shutdowns in under-monitored systems. Autovacuum must keep up, or the database fails.
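The headroom can be estimated from the value that age(datfrozenxid) returns. The sketch below assumes the default autovacuum_freeze_max_age of 200 million and treats 2^31 as the wraparound horizon; the function name and the sample age are illustrative.

```python
XID_HORIZON = 2**31  # about 2.1 billion transaction IDs are usable at any one time


def xid_headroom(oldest_xid_age: int, freeze_max_age: int = 200_000_000):
    """Return (transactions until forced autovacuum, transactions until wraparound)."""
    until_forced_vacuum = freeze_max_age - oldest_xid_age
    until_wraparound = XID_HORIZON - oldest_xid_age
    return until_forced_vacuum, until_wraparound


forced, wraparound = xid_headroom(150_000_000)
print(f"{forced:,}")      # 50,000,000
print(f"{wraparound:,}")  # 1,997,483,648
```

A negative first value means the table is already past the point where PostgreSQL forces an anti-wraparound vacuum.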

The Documentation Recovery: Why AI Alone Is Insufficient

When the incident was first encountered, AI-based diagnostic tools provided generic suggestions: "lower cost_delay," "increase cost_limit," "check maintenance_work_mem."

These suggestions were not wrong, but they were not calibrated to the specific situation. The threshold formula, the rationale for scale factor adjustment, and the risks of cluster-level configuration changes were not apparent from AI output alone.

The fix required returning to foundational documentation:

PostgreSQL VACUUM documentation (official): Explained the cost mechanism and threshold formula
AWS Aurora PostgreSQL tuning guide: Provided context-specific guidance for managed Aurora instances
Textbook references on MVCC: Clarified why dead tuples accumulate and how autovacuum prevents wraparound

The colleague's insistence on reading the documentation forced a deeper investigation that revealed:

The mathematical formula driving autovacuum trigger decisions
The specific interaction between bulk update workloads and default thresholds
The risks and benefits of table-level vs. cluster-level configuration

The lesson: Base concepts—MVCC, dead tuples, autovacuum thresholds, wraparound protection—are non-negotiable foundations. Understanding these requires reading documentation. Once the foundations are understood, AI tools can accelerate diagnosis and suggest configurations. But without foundations, suggestions are just knobs to turn without understanding consequences.

Combining both—foundational reading with AI-assisted diagnosis—yields better outcomes than either alone.

Diagnostic Queries for Future Incidents

When autovacuum CPU spikes occur, these queries provide immediate visibility:

Dead tuple status:

SELECT 
  relname, n_live_tup, n_dead_tup,
  ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
  last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Active autovacuum activity:

SELECT pid, query, query_start, 
  EXTRACT(EPOCH FROM (NOW() - query_start))::INT AS runtime_sec
FROM pg_stat_activity
WHERE query LIKE '%VACUUM%'
  AND pid <> pg_backend_pid();

Churn rate analysis:

SELECT 
  relname, n_live_tup, 
  n_tup_upd + n_tup_del AS total_mods,
  ROUND(100.0 * (n_tup_upd + n_tup_del) / NULLIF(n_live_tup, 0), 1) AS churn_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Transaction age (wraparound risk):

SELECT 
  datname,
  age(datfrozenxid) AS txid_age,
  (SELECT setting::int FROM pg_settings 
   WHERE name = 'autovacuum_freeze_max_age') - age(datfrozenxid) 
   AS txids_until_forced_autovacuum
FROM pg_database
WHERE datallowconn
ORDER BY txid_age DESC;

Conclusion

This incident demonstrated that PostgreSQL's VACUUM mechanism requires active tuning based on workload patterns. Unlike Oracle's Automatic Undo Management, which automatically recycles undo tablespace based on retention policies, PostgreSQL requires explicit configuration of autovacuum thresholds calibrated to specific table churn patterns.

The resolution came not from tweaking random parameters, but from understanding the mathematical formulas governing autovacuum behavior and applying them methodically at table scope.

The broader lesson is methodological: when facing production database performance issues, foundational understanding of mechanisms (documented in official sources) combined with diagnostic data yields better outcomes than parameter suggestions alone.

References

PostgreSQL Official: VACUUM
PostgreSQL Official: Autovacuum
AWS Database Blog: Understanding autovacuum in Amazon RDS for PostgreSQL
Percona: Overcoming VACUUM WRAPAROUND
CYBERTEC: Autovacuum wraparound protection

From Zombie Connections to XactSync: A Post-Mortem on Aurora Postgres CPU Spikes

Pranay Ravi — Sat, 23 May 2026 20:54:26 +0000

The alert fires at 1:28 PM. By the time you open CloudWatch, CPU is already back to normal. No application errors. No user complaints. Nothing obviously broken.

Do you close the ticket and move on, or dig in?

This post is about why you should always dig in — and exactly how we traced a "self-resolving" CPU spike on Aurora PostgreSQL 16 all the way back to a mismatch between pipeline commit logic and the database storage engine.

The Environment

Engine: Aurora PostgreSQL 16.1 on db.x2g.xlarge (4 vCPUs, 128GB RAM)
Workload: Kafka stream-based pipeline inserting ~365 rows/sec average into a daily-partitioned table, with burst peaks near 825/sec
Replication: Oracle GoldenGate
Observability: AWS CloudWatch Database Insights + AppDynamics Database Agent

What the Dashboard Showed

The first stop is always CloudWatch → Database Insights → Database Load. This chart shows Average Active Sessions (AAS) broken down by wait type:

Color	Wait Type	What It Means
🟢 Green	`CPU`	Active computation
🟠 Orange	`IO:AuroraStorageLogAllocate`	WAL/storage log writes
🟣 Purple	`LWLock:BufferContent`	Buffer manager contention
🟤 Brown	`Timeout:SpinDelay`	Spinlock spin waits
🩷 Pink	`LWLock:WALInsert`	WAL insert lock contention

Our spike was pure green—CPU only, no I/O or lock contention. The Top SQL tab showed a high-volume INSERT into a partitioned event log table running at ~825 calls/sec with 0.17ms average latency. Fast and normal individually. The volume was the question.

However, the Database Telemetry tab revealed two ominous signals that often catch a developer's attention:

Max time idle in transaction: Climbing linearly for 3.2 hours straight.
Vacuum: Max used transaction IDs: On a steady linear climb.

When you see both of those linear trends on a single chart, it usually points to a common suspect in relational databases: something has been holding an open transaction for a very long time, threatening your autovacuum health.

Background: The Core Mechanics of Postgres MVCC

To understand why long-running transactions are so closely tracked, it helps to look at how PostgreSQL manages Multi-Version Concurrency Control (MVCC).

Unlike database engines that keep old record data in a separate utility tablespace (like Oracle's Undo Tablespace), PostgreSQL relies on a Heap-Based Approach.

The Postgres Heap Reality: There is no separate undo space. When a row is updated or deleted, the old version (called a "dead tuple") remains physically stored inside the exact same table page. It stays there until the background VACUUM process scans the table and marks that dead space as reusable for future inserts.

If an application leaves a transaction open, it anchors the xmin (the baseline transaction ID floor for that snapshot). VACUUM cannot safely clean up any dead tuples that were generated after the oldest active transaction's `xmin`. This can trigger a problematic domino effect:

Session opens → acquires xmin snapshot
       ↓
Session goes idle (no explicit commits or rollbacks)
       ↓
Dead tuples accumulate → VACUUM cannot clear them out
       ↓
Autovacuum loops continuously, wasting CPU scanning bloated tables
       ↓
Transient CPU spikes alert operations

Because autovacuum periodically wakes up to evaluate tables, the resulting CPU spikes look completely intermittent. The alert clears, engineers close the ticket as a glitch, and the underlying architectural friction persists.

Step 1: Find the Stuck Sessions

To hunt down the culprit, we ran a quick diagnostic query against pg_stat_activity:

SELECT
  pid,
  usename,
  application_name,
  client_addr,
  state,
  query,
  now() - state_change AS idle_duration,
  now() - xact_start   AS transaction_age
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start ASC;

The results: 35 sessions, all originating from the same connection IP, belonging to the replication user. The oldest transaction had been open for 133 days.

The queries they were idling on:

SELECT count(*) from pg_settings where name='ansi_force_foreign_key_checks';
SELECT count(*) from pg_type where typtype='e' and oid in (...);

These are classic compatibility probe queries issued during connection initialization by integration tooling like GoldenGate. Over a four-month period, every time the integration platform restarted or reconnected, it opened a session, executed its basic discovery probes, and abandoned the session without explicitly closing it.

Step 2: Verify the Vacuum Blocker (The Twist)

Finding idle in transaction sessions is a major red flag, but we had to verify if they were genuinely blocking data cleanup. Not all idle sessions hold an active database snapshot.

We executed the source-of-truth query to find actual vacuum blockers:

SELECT
  pid,
  usename,
  backend_xmin,
  age(backend_xmin) AS xmin_age_in_transactions,
  now() - xact_start AS session_age,
  state
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 10;

⚠️ This is where our initial MVCC-bloat hypothesis collapsed.

The zombie integration sessions did not appear in this list; their backend_xmin fields were entirely NULL. They had only executed basic, read-only catalog queries during initialization, and under READ COMMITTED (PostgreSQL's default isolation level, presumably in effect here) a snapshot is released as soon as each statement finishes. They therefore held no snapshot and had never been assigned a transaction ID of their own, so they were not blocking vacuum. The actual oldest xmin holders were normal application sessions with transaction ages under 3,000, which is completely standard for a busy transaction engine.

While these zombie connections were a clear operational hazard (consuming connection slots and wasting system memory), they weren't driving our CPU alert.
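The distinction drawn in Steps 1 and 2 can be written down as a small rule over pg_stat_activity rows. This is a sketch with made-up sample rows; the function name and labels are assumptions, and a real check would also consider which session holds the oldest xmin.

```python
def classify_session(row: dict) -> str:
    """Label a pg_stat_activity row by whether it can hold back vacuum."""
    if row.get("backend_xmin") is not None:
        return "holds xmin"  # has a snapshot, so it can delay dead-tuple cleanup
    if row.get("state") == "idle in transaction":
        return "zombie"      # wastes a connection slot but holds no snapshot
    return "ok"


# Hypothetical rows, shaped like the columns selected in the queries above.
sessions = [
    {"pid": 101, "state": "idle in transaction", "backend_xmin": None},
    {"pid": 202, "state": "active", "backend_xmin": 123456},
    {"pid": 303, "state": "idle", "backend_xmin": None},
]

for session in sessions:
    print(session["pid"], classify_session(session))
```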

Step 3: Check the Partitions

With zombie connections ruled out as vacuum blockers, we turned our focus to the physical table health. Because our target event log table is a partitioned table, statistics are tracked natively at the individual child partition level, not the parent logical table. We targeted the table prefix:

SELECT schemaname, relname, n_live_tup, n_dead_tup,
  ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio_pct,
  last_autovacuum
FROM pg_stat_user_tables
WHERE relname LIKE 'event_log%'
ORDER BY n_dead_tup DESC;

Result: 0 dead tuples across all active partitions.

The partitioned strategy was working flawlessly. The application used an insert-only pattern on the active daily partition, and old partitions were dropped entirely rather than vacuumed. This meant VACUUM bloat was completely ruled out.

Step 4: Finding the Real Culprit via Wait States

With MVCC bloat off the table, we pivoted to the APM wait state drill-down for the intensive INSERT query during the exact minutes of the CPU spike:

ClientRead (54%): The database engine spent more than half its time completely idle, waiting for the application client to send the next query over the network wire.
Active (24.7%): Time spent physically executing the raw insert statements.
XactSync (17.9%): The engine waiting for the Write-Ahead Log (WAL) to be flushed and synchronized to durable storage upon a COMMIT.
AuroraStorageLogAllocate (2.4%): Storage block layer allocation.

Two metrics tell the final story here:

XactSync at 17.9%: This wait state occurs when an explicit COMMIT forces PostgreSQL to flush WAL records. At a peak volume of 825 inserts per second, the application was issuing 825 individual commits per second. This meant 825 distinct sync operations forcing the storage layer to acknowledge writes sequentially.
ClientRead at 54%: This is the classic signature of single-row, one-at-a-time application database calls. The application loop was executing: Send row → Execute → Commit → Wait for network Ack → Send next row.

The shape of the CPU spike closely mirrored a backlog flush pattern. Messages would pile up briefly in the upstream Kafka topic, and then the consumer would drain them all at once in a massive burst of single-row, high-frequency synchronous commits.

Root Cause Summary

Hypothesis	Verdict	Technical Proof
Dead Tuple Accumulation	❌ Ruled Out	`pg_stat_user_tables` showed a 0% dead tuple ratio across all database partitions.
Zombie Connections Blocking Vacuum	❌ Ruled Out	`backend_xmin` was explicitly `NULL` for all zombie replication sessions.
Burst Insert Flush + Per-Row Commits	Root Cause	High `XactSync` (17.9%) paired with a massive `ClientRead` (54%) network wait pattern.

The Action Plan

Fix 1: Terminate and Block the Zombie Connections

Even though they didn't block vacuum in this specific instance, leaving 35 abandoned sessions active for 133 days is risky. They leak connection resources and backend memory.

First, we safely terminated the dangling backends:

SELECT pg_terminate_backend(pid), pid, now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE usename = 'repl_agent' 
  AND state = 'idle in transaction';

Next, we implemented an automated defense line. We updated the cluster configuration parameters to enforce a strict timeout globally:

idle_in_transaction_session_timeout = 600000 # 10 minutes in milliseconds

If any session sits idle inside an uncommitted transaction block for more than 10 minutes, Postgres will now automatically drop the connection and free up system resources.

Fix 2: Implement Micro-Batching in the Consumer Pipeline

The true fix for the CPU spike required changing how the application interacted with the storage engine. Instead of committing every single row individually, we modified the consumer's JDBC write configuration to use a batch size of 500, so each commit (and each WAL flush) covers up to 500 rows.

-- Old Pattern (825 WAL Syncs/sec):
INSERT row 1 → COMMIT → WAL fsync → Wait
INSERT row 2 → COMMIT → WAL fsync → Wait

-- New Pattern (1.7 WAL Syncs/sec):
INSERT row 1, INSERT row 2 ... INSERT row 500 → COMMIT → 1 WAL fsync

Metric	Before Batching	After Batching (Est.)
Commits/sec (Burst)	~825	~1.7
WAL Fsyncs/sec	~825	~1.7
`XactSync` Wait Time	17.9%	< 1.0%
Transient CPU Spikes	Frequent	Eliminated
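To make the arithmetic concrete, here is a minimal Python sketch of the micro-batching idea. It is not the production consumer (that one writes through JDBC). As assumptions for illustration, sqlite3 stands in for PostgreSQL so the snippet runs anywhere, and the event_log layout and the insert_in_batches helper are hypothetical.

```python
import sqlite3
from itertools import islice

def batched(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows with one COMMIT per batch; return the number of commits."""
    commits = 0
    for chunk in batched(rows, batch_size):
        conn.executemany("INSERT INTO event_log (payload) VALUES (?)", chunk)
        conn.commit()  # one durable flush per batch instead of one per row
        commits += 1
    return commits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (id INTEGER PRIMARY KEY, payload TEXT)")
rows = [(f"event-{i}",) for i in range(825)]  # one second of peak traffic
commits = insert_in_batches(conn, rows)
total = conn.execute("SELECT COUNT(*) FROM event_log").fetchone()[0]
print(commits, total)  # 2 825
```

One second of peak traffic (825 rows) lands in two commits instead of 825. Sustained over time, that is 825 / 500 ≈ 1.7 commits per second, which is where the estimate in the table comes from.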

Essential Checklist Queries for Triage

Add these queries to your incident response runbook for analyzing database load:

1. Identify Long-Running Snapshot Blockers

SELECT pid, usename, backend_xmin,
  age(backend_xmin) AS xmin_age_transactions,
  now() - xact_start AS session_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC 
LIMIT 5;

2. Spot Dangerous `Idle in Transaction` Sessions

SELECT pid, usename, client_addr,
  now() - xact_start AS transaction_age,
  now() - state_change AS idle_duration,
  LEFT(query, 80) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start ASC;

3. Check Table Churn & Dead Tuple Ratios (Including Partitions)

SELECT schemaname, relname, n_live_tup, n_dead_tup,
  ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_ratio_pct,
  last_autovacuum
FROM pg_stat_user_tables
WHERE relname LIKE 'your_table_prefix%'  -- Handles partitioned child tables
ORDER BY n_dead_tup DESC;

💡 Tip: A dead_ratio_pct > 10% indicates a strong need for space recovery; > 20% means the table is critically bloated and query execution performance will likely degrade.
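If you export these statistics into a script or dashboard, the same thresholds can be applied in code. The sketch below mirrors the SQL expression and the 10% / 20% cut-offs from the tip. The function names and labels are illustrative, not part of any tool mentioned here.

```python
def dead_ratio_pct(n_live_tup, n_dead_tup):
    """Mirror the SQL expression; None for an empty table (the NULLIF guard)."""
    total = n_live_tup + n_dead_tup
    if total == 0:
        return None
    return round(100.0 * n_dead_tup / total, 2)

def classify(ratio_pct):
    """Apply the rule-of-thumb thresholds from the tip above."""
    if ratio_pct is None:
        return "no data"
    if ratio_pct > 20:
        return "critically bloated"
    if ratio_pct > 10:
        return "needs space recovery"
    return "ok"

print(classify(dead_ratio_pct(900, 100)))  # exactly 10.0 is not above 10 -> ok
print(classify(dead_ratio_pct(834, 166)))  # 16.6 -> needs space recovery
print(classify(dead_ratio_pct(0, 0)))      # empty partition -> no data
```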

What's Next?

If you found this breakdown helpful, keep an eye out for our next post. We will be diving into the opposite side of database optimization: When Autovacuum goes wrong. We will walk through a separate production incident where a high-churn table choked on over 124 million dead tuples (a brutal 16.6% dead ratio) due to misconfigured autovacuum cost thresholds. Stay tuned!

What It Actually Takes to Audit Aurora PostgreSQL on AWS

Pranay Ravi — Thu, 21 May 2026 23:28:33 +0000

Most operational infrastructure starts this way: a requirement appears before the architecture does.

One day the team needed a database audit solution. Not in a planning doc — someone asked, and I had to build something. I'd owned the Oracle audit pipeline already, so I knew what the destination looked like. The question was what it would take to get there on Aurora PostgreSQL and AWS.

The short answer: more than you'd expect. The longer answer is this article.

What We're Actually Auditing — and Why It Matters

The scope here is specific: individual human users running direct DML statements (SELECT, INSERT, UPDATE, DELETE) against application tables.

Application service accounts are expected to modify data — that's their function. The risk surface is direct human access. A developer connected via psql. A support engineer running an ad-hoc update. A credential that shouldn't have had access to begin with. Those are the actions that need an audit trail in a regulated environment.

This distinction shapes every architectural decision: we enable pgAudit per individual user account, not cluster-wide. Application accounts are excluded entirely.

The Oracle Baseline

Before getting into the build, it's worth framing the comparison.

On Oracle, Unified Auditing (an Enterprise-tier feature) handles this with a single policy definition at the intersection of user, action, and object:

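CREATE AUDIT POLICY user_dml_activity
  -- Each object action needs its own ON clause; a bare action name is audited system-wide
  ACTIONS SELECT ON app_schema.orders,
          INSERT ON app_schema.orders,
          UPDATE ON app_schema.orders,
          DELETE ON app_schema.orders
  WHEN 'SYS_CONTEXT(''USERENV'',''SESSION_USER'') != ''APP_SVC_ACCOUNT'''
  EVALUATE PER SESSION;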

AUDIT POLICY user_dml_activity;

Records land in UNIFIED_AUDIT_TRAIL — a structured, SQL-queryable audit view. From there, the data can be exported on a schedule to any downstream analytics platform such as Elasticsearch, Splunk, OpenSearch, or a dedicated audit store. Scheduled queries or alerting rules can then detect patterns like bulk deletes, after-hours access, or unexpected DDL changes and trigger notifications through PagerDuty, Opsgenie, or an observability platform via HTTP webhook. Once the audit policy is defined, the downstream detection and alerting pipeline is relatively small because the audit data is already structured and queryable.

One important operational caveat with Oracle's approach: if the audit tablespace fills up, Oracle halts the database rather than silently dropping audit records — it guarantees completeness at the cost of availability. Tablespace monitoring becomes a hard operational requirement. pgAudit writing to the log stream instead of a table sidesteps this entirely: if log delivery has issues, the database keeps running.

pgAudit is different. In the session audit logging used here, there's no policy object that combines user, action, and table. You set log classes (read, write, ddl, all, and a few others) per user. Records go to the PostgreSQL log stream; on Aurora, that means CloudWatch Logs. There's no queryable view. You're working with text:

AUDIT: SESSION,1,1,WRITE,INSERT,TABLE,public.orders,
"INSERT INTO orders (customer_id, amount) VALUES ($1, $2)"

It works. But extracting structured, alertable signal from a log stream requires deliberate engineering.
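As a small illustration of that engineering, the audit entry is a comma-separated record, so Python's csv module can split it even though the statement itself contains commas. The field names below follow the order documented in the pgAudit README. The parse_pgaudit helper is hypothetical, and it assumes the entry sits on a single line, which will not hold for multi-line statements.

```python
import csv

# Field order as documented by pgAudit for session/object log entries.
FIELDS = ["audit_type", "statement_id", "substatement_id", "class",
          "command", "object_type", "object_name", "statement", "parameter"]

def parse_pgaudit(message):
    """Return the audit fields as a dict, or None if this is not an audit line."""
    marker = "AUDIT: "
    start = message.find(marker)
    if start == -1:
        return None
    payload = message[start + len(marker):]
    values = next(csv.reader([payload]))  # handles the quoted statement field
    return dict(zip(FIELDS, values))

line = ('AUDIT: SESSION,1,1,WRITE,INSERT,TABLE,public.orders,'
        '"INSERT INTO orders (customer_id, amount) VALUES ($1, $2)"')
record = parse_pgaudit(line)
print(record["class"], record["command"], record["object_name"])
# WRITE INSERT public.orders
```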

The Architecture

Aurora PostgreSQL  (pgAudit per-user)
        │
        ▼
CloudWatch Logs
        │
EventBridge  rate(5 min)
        │
        ▼
Lambda  → runs Log Insights query → filters noise → publishes count metric
        │
        ▼
CloudWatch Alarm  (count > 0)
        │
        ▼
SNS → incident platform + email

The Lambda exists because CloudWatch Alarms cannot evaluate Log Insights query results directly — there's no native bridge. Lambda runs the query, counts filtered results, and publishes a standard CloudWatch metric. The alarm evaluates that metric. Lambda invocation logs also give you an independent timestamped record of every detection event — useful when compliance asks "when was this first noticed?"
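The bridge can be sketched in a few dozen lines. This is not the code from the linked repository, only a minimal outline under stated assumptions: the environment variable names (LOG_GROUP, AUDIT_QUERY), the metric namespace, and the metric name are all made up for illustration. The boto3 calls used are start_query and get_query_results on the logs client, and put_metric_data on the cloudwatch client.

```python
import os
import time

TERMINAL_FAILURES = {"Failed", "Cancelled", "Timeout", "Unknown"}

def count_matches(logs, log_group, query, window_seconds=300,
                  poll_seconds=1.0, max_polls=60):
    """Run a Logs Insights query over the last window and return the row count."""
    end = int(time.time())
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=end - window_seconds,
        endTime=end,
        queryString=query,
    )["queryId"]
    for _ in range(max_polls):
        response = logs.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            return len(response["results"])
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)  # still Scheduled or Running
    raise TimeoutError("query did not complete in time")

def publish_count(cloudwatch, count, namespace="Custom/PgAudit",
                  metric_name="AuditEventCount"):
    """Publish the count as a plain metric that a CloudWatch Alarm can evaluate."""
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[{"MetricName": metric_name, "Value": count, "Unit": "Count"}],
    )

def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers can be exercised without AWS
    count = count_matches(boto3.client("logs"),
                          os.environ["LOG_GROUP"], os.environ["AUDIT_QUERY"])
    publish_count(boto3.client("cloudwatch"), count)
    return {"audit_event_count": count}
```

Passing the clients in as arguments keeps the polling and publishing logic testable with simple fakes. The returned count is bounded by whatever row limit the query applies, which is fine for an alarm that only checks for count > 0.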

Full code, IAM policies, SNS configurations, and Terraform module: github.com/pcraavi/PostgreSQL-Audit

Gotcha 1: The Silent pgAudit Setup Failure

Enable pgaudit in shared_preload_libraries, reboot the cluster. Simple.

Except — shared_preload_libraries only loads the binary into shared memory. You also need:

CREATE EXTENSION pgaudit;

Without it: zero audit records. Zero error messages. Nothing in CloudWatch. The cluster had been rebooted, the parameter was confirmed, logs were being exported — and there was nothing to show for it.

The diagnostic:

SELECT * FROM pg_extension WHERE extname = 'pgaudit';
-- 0 rows returned

That was the entire problem. One missing statement.

Then enable per-user logging:

ALTER USER john_doe SET pgaudit.log TO 'all';

-- Verify
SELECT usename, useconfig FROM pg_user WHERE usename = 'john_doe';

Gotcha 2: The Noise Problem

The first time you look at raw audit logs from a production cluster, the signal-to-noise ratio is genuinely bad:

AUDIT: SESSION,...,READ,SELECT,,,"SELECT version()"
AUDIT: SESSION,...,READ,SELECT,,,"SELECT * FROM pg_shdescription..."
AUDIT: SESSION,...,READ,SELECT,,,"SET application_name='DBeaver 23.2.0'"
AUDIT: SESSION,...,READ,SELECT,,,"SELECT oid, typarray FROM pg_type WHERE typname=$1"

Every database tool (DBeaver, DataGrip, pgAdmin) fires a catalog query sequence on connect. Every JDBC driver runs initialization SELECTs. Every connection pool runs health checks. Alert on this raw stream and you're alerting on noise constantly — which means you stop paying attention, which defeats the audit entirely.

The filter query was built incrementally by watching actual production traffic:

fields @timestamp, @message, @logStream, @log
| filter @message like /AUDIT:/
| filter (
    @message like /SELECT/ or @message like /INSERT/ or
    @message like /UPDATE/ or @message like /DELETE/
  )
| filter @message not like /SELECT version\(\)/
| filter @message not like /pg_shdescription/
| filter @message not like /pg_catalog/
| filter @message not like /information_schema/
| filter @message not like /SET application_name/
| filter @message not like /DBeaver/
| filter @message not like /PostgreSQL JDBC Driver/
| filter @message not like /datname = \$1/
| sort @timestamp desc

Your exclusion list will grow. Every application stack generates its own connection init patterns.
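One way to iterate on the exclusion list without waiting for live traffic is to replay captured log lines through the same patterns locally. The sketch below ports the filter to Python regular expressions and runs it over the sample lines shown earlier. The is_signal helper is hypothetical, and Python's re is only an approximation of the Logs Insights regex engine.

```python
import re

INCLUDE = re.compile(r"SELECT|INSERT|UPDATE|DELETE")
EXCLUDE = [re.compile(p) for p in (
    r"SELECT version\(\)", r"pg_shdescription", r"pg_catalog",
    r"information_schema", r"SET application_name", r"DBeaver",
    r"PostgreSQL JDBC Driver", r"datname = \$1",
)]

def is_signal(message):
    """True if the line is an audit record that no exclusion pattern matches."""
    if "AUDIT:" not in message or not INCLUDE.search(message):
        return False
    return not any(p.search(message) for p in EXCLUDE)

samples = [
    'AUDIT: SESSION,...,READ,SELECT,,,"SELECT version()"',
    'AUDIT: SESSION,...,READ,SELECT,,,"SELECT * FROM pg_shdescription..."',
    'AUDIT: SESSION,...,READ,SELECT,,,"SET application_name=\'DBeaver 23.2.0\'"',
    'AUDIT: SESSION,...,READ,SELECT,,,"SELECT oid, typarray FROM pg_type WHERE typname=$1"',
    'AUDIT: SESSION,1,1,WRITE,INSERT,TABLE,public.orders,'
    '"INSERT INTO orders (customer_id, amount) VALUES ($1, $2)"',
]
survivors = [s for s in samples if is_signal(s)]
for s in survivors:
    print(s)
```

With this list, the INSERT on public.orders survives as intended, but so does the pg_type lookup, because it does not mention pg_catalog explicitly. Replaying samples like this is a cheap way to find the next pattern to add.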

Gotcha 3: The KMS + CloudWatch Alarms Incompatibility

Encrypt the SNS topic at rest. Straightforward — use alias/aws/sns, the AWS-managed SNS key. Done.

Except the alarm stopped delivering.

CloudWatch Alarms does not have authorization to access the SNS topic encryption key

AWS-managed keys have immutable key policies. You cannot grant CloudWatch kms:GenerateDataKey. The fix is a customer-managed KMS key with an explicit CloudWatch service grant in the key policy. The managed key works fine if Lambda is publishing directly to SNS — the limitation is specific to the CloudWatch Alarms → SNS delivery path.
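For orientation, the statement that closes this gap on a customer-managed key generally looks like the one below, built here as a Python dict so it can be merged into a larger key policy document. It follows the pattern AWS documents for CloudWatch alarms publishing to encrypted SNS topics and is not a copy of the policy in the linked repository. The Sid is an arbitrary label.

```python
import json

# Statement to append to the customer-managed key's policy.
cloudwatch_statement = {
    "Sid": "AllowCloudWatchAlarmsToUseKey",  # arbitrary label
    "Effect": "Allow",
    "Principal": {"Service": "cloudwatch.amazonaws.com"},
    "Action": ["kms:Decrypt", "kms:GenerateDataKey*"],
    "Resource": "*",  # inside a key policy this refers to the key itself
}

print(json.dumps(cloudwatch_statement, indent=2))
```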

Key policy and full SNS security configuration (HTTPS enforcement, cross-account lockdown): github.com/pcraavi/PostgreSQL-Audit/tree/main/sns

Gotcha 4: The Incident Platform Deduplication Problem

First alert: fired correctly.
Second alert, third, fourth: nothing.

The incident platform logs showed requests arriving and the "Create Alert" action starting — but no alert created. The issue: platforms like Opsgenie, PagerDuty, and VictorOps deduplicate by alarm name (used as the alert alias). If that alert is already open or acknowledged, subsequent triggers are suppressed.

Correct behavior for most alarms. For an audit alert that should fire every detection window, it needs handling.

Fix options:

Timestamp the alarm name at deploy time ($(date +%s) suffix) — unique alias per deployment
Configure auto-close when the alarm returns to OK — fresh alert on each ALARM transition
Override the alias template in the integration to include a timestamp from the alert payload
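If the alias is computed in code, for example in a small forwarding function between SNS and the incident platform, the third option could look like the sketch below. It assumes the standard CloudWatch alarm notification JSON, which carries AlarmName and StateChangeTime fields. The alert_alias helper and the alarm name are hypothetical.

```python
import json

def alert_alias(sns_message):
    """Build a per-transition alias so repeated ALARM states are not deduplicated."""
    alarm = json.loads(sns_message)
    return f"{alarm['AlarmName']}-{alarm['StateChangeTime']}"

example = json.dumps({
    "AlarmName": "pgaudit-user-dml",  # hypothetical alarm name
    "NewStateValue": "ALARM",
    "StateChangeTime": "2026-05-21T23:25:00.000+0000",
})
print(alert_alias(example))
# pgaudit-user-dml-2026-05-21T23:25:00.000+0000
```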

Why Not Database Activity Streams?

Experienced AWS engineers will immediately ask this. DAS provides near-real-time activity streaming to Kinesis with structured output. It's a real solution.

The reason we didn't use it: it introduces Kinesis as a required dependency, adds per-event cost, and requires a consumer layer for decryption and processing. For teams without an existing Kinesis pipeline, that's meaningful operational surface area.

pgAudit + CloudWatch uses infrastructure Aurora already depends on. Lambda and SNS are general-purpose services. There's no proprietary tooling, no specialized operational knowledge required. Any engineer with standard AWS experience can maintain it. That portability and low learning curve matters when you're deploying across multiple accounts.

GuardDuty RDS Protection is complementary — it handles threat detection, not structured audit trails. OpenSearch subscription filters would work but introduce an OpenSearch cluster as a dependency, which adds cost and operational overhead.

A Note on pgAudit Overhead

Overhead is minimal at this audit scope. Individual human users in a regulated production environment are not expected to generate high transaction volumes — the audit target is ad-hoc access, not application traffic. The per-session overhead of writing to the PostgreSQL log is negligible when audit scope is limited to non-application users.

For high-volume clusters where audit is a compliance requirement: account for log overhead in instance sizing. Set CloudWatch log retention to match your compliance window rather than keeping logs indefinitely — use aws logs put-retention-policy to enforce it.

What I'd Do Differently

Ship filtered audit records to a structured store. CloudWatch Log Insights is a query tool, not a data store. Extending the Lambda to write each filtered record to DynamoDB or an RDS audit table gives you the closest equivalent to Oracle's UNIFIED_AUDIT_TRAIL: indexed, queryable, long-term audit history. Kinesis Firehose → S3 → Athena is another path for columnar query performance over large windows. The Lambda architecture makes this extension natural — the query and filtering logic is already there.

Terraform from day one. The full stack — Lambda, EventBridge rule, SNS topic, IAM role, CloudWatch alarm, KMS key — fits in a single reusable module. Deploying to additional accounts should be terraform apply with environment variables, not a re-run of CLI commands.

Tag every resource. Environment, ClusterName, Owner. Untagged audit infrastructure across multiple accounts becomes unauditable infrastructure.

Final Thought

Each component here has a clean interface and behaves predictably in isolation. The engineering is in the integration contracts: Log Insights results don't flow to alarms without mediation; AWS-managed KMS keys don't work with CloudWatch delivery; incident platform deduplication suppresses repeated alarm transitions. None of these are documented prominently — they surface when components are wired together under production conditions.

Start with the Log Insights filter query. Validate the signal before building anything around it. The filtering logic is the foundation. The rest is plumbing.

Full implementation: Lambda code, IAM policies, SNS topic policy, KMS key policy, CloudWatch alarm, Terraform module → github.com/pcraavi/PostgreSQL-Audit

Drop a comment if you've hit different noise patterns in your environment, or if you've implemented the Lambda → structured audit table extension — curious how others have approached long-term audit retention on Aurora.

Automated 25 Minutes of My Morning With a Prompt (Not a Script)

Pranay Ravi — Thu, 21 May 2026 03:07:21 +0000

Every serious engineering org I've worked in has the same split personality.

One side: A modern observability stack. AppDynamics, Datadog, whatever the current favorite is. Real-time metrics, distributed traces, beautiful dashboards, alert routing. Years of investment. It works.

The other side: A long tail of legacy monitoring tools that predate the current stack by a decade. Tools that don't speak OpenTelemetry. Tools that can't push to a webhook. Tools whose output format is, and will remain, a 73-row HTML table emailed at 11:15 UTC every morning.

These two sides don't talk to each other. And in most orgs, someone fills that gap manually for 25–30 minutes every morning.

That someone was me.

The Gap Nobody Draws on the Architecture Diagram

The box that should exist on every enterprise architecture diagram but never does:

"Dave reads his email for 25 minutes"

This is what I call an observability seam — the boundary between your modern monitoring stack and legacy outputs it can't ingest. These seams exist because:

Legacy tools were built before modern observability conventions
Integration work is permanently lower priority than feature work
The cost is diffuse (distributed across many humans' mornings) rather than concentrated, so it never hits "fix it" priority

The insight I kept coming back to: this isn't a technology problem. It's an architectural surface area nobody drew. Once you see it that way, the solution becomes tractable.

What the Legacy Tail Actually Looked Like

My specific environment had four data sources, none of which fed the observability stack:

Source	Format	Frequency
Replication latency alerts	Email (body text)	Multiple times daily
DB backup status report	Email (73-row HTML table)	Daily at 11:15 UTC
Infrastructure/AWS notifications	Email (subject-line pattern)	20–50 per day
Deployment Review	Confluence page backed by Jira macros	Checked manually

None of them have APIs. None of them feed my SIEM or APM. The only integration surface available was: it arrives as email, or it lives at a known URL.

That constraint is also the opportunity.

The Architecture: Agent as Integration Layer

The core decision was to treat an LLM agent not as a productivity tool, but as a seam-closing integration layer — something that sits between heterogeneous, unstructured legacy outputs and a human who needs a consistent, actionable daily summary.

The agent does four things each morning:

1. Replication Latency — Signal extraction from noise

Find the latest alert, read the body, and report whether the body is actually populated.

This sounds trivial. It isn't. Replication alerts sometimes fire with empty bodies — the email arrived, but the content didn't. Manually, this is easy to miss because you see the email and assume it's fine. The agent makes "body is empty" an explicit flagged state. That's catching a monitoring failure, not just monitoring an alert.

2. Backup Report — Structured extraction from semi-structured HTML

Parse the 73-row HTML table, tally the BACKUP_STATUS column, surface only the rows that didn't complete.

Output: "69 completed, 0 failed, 3 with no backup — db01, db02, db05."

Reading time: three seconds. A human eyeballing 73 rows is slower and occasionally misses things.

3. Infrastructure Notifications — Classification at scale

Twenty to fifty emails daily, each following the pattern STATUS - Alert (Server Name : X and DB Name : Y).

The agent classifies every subject line, counts by status, and surfaces only the non-SUCCESS entries with a direct link to each email.

Output: "21 SUCCESS, 1 WARNING on SERVER04U / DB09."

That one line is the entire actionable output of fifty emails.
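For comparison, this is roughly what the same classification looks like as a deterministic script. It is a sketch of the regex baseline, not what the agent runs. The pattern assumes the subject format quoted above stays exactly as written, and the server and database names in the example are placeholders.

```python
import re
from collections import Counter

SUBJECT = re.compile(
    r"^(?P<status>[A-Z]+) - Alert "
    r"\(Server Name : (?P<server>.+?) and DB Name : (?P<db>.+?)\)$"
)

def summarize(subjects):
    """Count subjects by status and collect everything that is not SUCCESS."""
    counts = Counter()
    attention = []
    for subject in subjects:
        match = SUBJECT.match(subject.strip())
        if not match:
            counts["UNPARSED"] += 1
            attention.append(f"UNPARSED: {subject}")
            continue
        counts[match["status"]] += 1
        if match["status"] != "SUCCESS":
            attention.append(f"{match['status']} on {match['server']} / {match['db']}")
    return counts, attention

subjects = ["SUCCESS - Alert (Server Name : SERVER01 and DB Name : DB01)"] * 21
subjects.append("WARNING - Alert (Server Name : SERVER04U and DB Name : DB09)")
counts, attention = summarize(subjects)
print(dict(counts), attention)
# {'SUCCESS': 21, 'WARNING': 1} ['WARNING on SERVER04U / DB09']
```

The UNPARSED bucket is the honest part of the regex approach: any subject that drifts from the expected pattern is surfaced rather than silently dropped.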

4. Deployment Review — Live data from a known URL

Read the Confluence page, identify the Jira macro structure, query the underlying filters directly, and report live issue counts. Bridges the gap between "someone updated a Confluence page" and "here's the actual current state of production deployments."

Why an LLM Instead of a Python Script?

A reasonable engineer will ask: couldn't this be IMAP + regex + cron?

Yes — and in many cases that's the right answer. Large enterprises already run Airflow, Power Automate, Splunk SOAR, Logic Apps. This isn't automation entering a vacuum.

The honest comparison:

Approach	Strength	Breaks when…
Python + regex + cron	Deterministic, auditable, fast	Email format changes, new alert pattern appears
ETL / SOAR pipeline	Scalable, governed, integrated	Requires schema agreement upstream
LLM agent	Tolerates format variance, low setup cost	Output is nondeterministic, harder to audit

What the LLM reduces is integration friction — not the need for integration.

The backup report column order can shift. The AWS notification subject-line pattern can vary. A new data source can be added in a sentence of English rather than a week of schema work.

That flexibility has a cost: you trade determinism for adaptability. For a morning triage digest where a missed edge case surfaces in the next run, that tradeoff is acceptable. For a system that triggers automated remediation, it isn't.

Right mental model: LLM agents lower the cost of closing observability seams. They don't replace deterministic pipelines where correctness guarantees matter.

Design Principles Worth Keeping

Independent section isolation

Each source is handled independently. A connector outage in one section doesn't abort the others — the agent renders an "unavailable" note and continues. A partial report is vastly more valuable than a silent failure.
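In the agent, this isolation is expressed in the prompt. The same idea in ordinary code looks like the sketch below, where each section is a callable and a failure in one is rendered as a note instead of aborting the run. The build_report function and the section names are illustrative only.

```python
def build_report(sections):
    """Render every section; a failing section becomes a warning, not an abort."""
    parts = []
    for title, producer in sections:
        try:
            body = producer()
        except Exception as exc:  # isolate any failure to this one section
            body = f"⚠ connector unavailable ({type(exc).__name__})"
        parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts)

def replication():
    return "1 alert today, body populated"

def backups():
    raise ConnectionError("mail connector down")

report = build_report([
    ("Replication Latency", replication),
    ("DB Backup Report", backups),
])
print(report)
```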

Absence as a signal

Each section includes a sanity check that fires if no emails are found for that source.

"No replication alerts received today" is bolded as a warning, not silently skipped.

This paid off in month two, when the backup report email silently stopped arriving after a mail routing change. The sanity check flagged it immediately. Manual triage would have assumed a clean run.

Delta orientation

The report answers "what changed or failed since yesterday", not "what is the current state of everything."

Confirming sameness has no information value but costs the same attention as confirming change. A delta-oriented report routes attention only where it's needed.

Runtime date resolution

The prompt never hardcodes a date. "Today" is resolved against the local clock at execution time. Small thing. Prevents a class of subtle bugs where a stale prompt runs with yesterday's scope.
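In code terms, the equivalent discipline is to compute the window when the task runs rather than baking it into configuration. A minimal sketch, with a hypothetical today_window helper and an injectable clock so the behaviour can be checked:

```python
from datetime import datetime, time, timedelta, timezone

def today_window(now=None):
    """Return (start_of_today, now) in local time, resolved at call time."""
    now = now or datetime.now().astimezone()
    start = datetime.combine(now.date(), time.min, tzinfo=now.tzinfo)
    return start, now

# Fixed clock for demonstration: 9 AM in a UTC-5 zone.
fixed = datetime(2026, 5, 21, 9, 0, tzinfo=timezone(timedelta(hours=-5)))
start, end = today_window(fixed)
print(start.isoformat())  # 2026-05-21T00:00:00-05:00
```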

The Prompt (Sanitized)

The prompt itself is the deliverable worth preserving:

You are producing the daily 9 AM operations report. Run silently — this prompt is self-contained.

REQUIRED CONNECTORS
- Microsoft 365 / Outlook (email search + read_resource on mail:// URIs)
- Atlassian (getConfluencePage; optionally searchJiraIssuesUsingJql)

If any connector is missing or errors out, render that section with a clear
"⚠ connector unavailable" note and continue. Never abort the whole report.

SCOPE OF "TODAY"
"Today" = the calendar day the task fires, in local time.
Always pass afterDateTime: "today" to outlook_email_search.

====
SECTION 1 — Replication Latency Monitoring (latest alert today)
====
1. Search for latest alert today.
2. read_resource on its URI for the full body.
3. Report: count of alerts today, receivedDateTime, sender, and either the
   body text or "Body is empty" if it's only whitespace.

====
SECTION 2 — DB Backup Report (failure check)
====
1. Find today's report email.
2. Parse rows; tally BACKUP_STATUS column.
3. State explicitly: "X completed, Y failed, Z with no backup."
   **Bold** the failure count if > 0.

====
SECTION 3 — Infra notifications today
====
1. Fetch alerts from sender, limit 50.
2. Parse subject pattern: "STATUS - Alert (Server : X and DB : Y)"
3. Summarize counts by status. **Bold** anything not SUCCESS.
4. List non-SUCCESS items with a webLink to each.

====
SECTION 4 — Deployment Review (Confluence)
====
1. getConfluencePage, contentFormat="html".
2. If JIRA tool available, run each filter and include live issue counts.

====
OUTPUT FORMAT
====
- Plain markdown. One H1 with today's date. Four H2 sections in order.
- **Bold** anything requiring attention.
- End with "Sources:" — no duplicate links.
- Tone: factual. No filler. No apology lines.

====
SANITY CHECKS
====
- Section 1 zero results: **"No replication alerts today — monitoring may be down."**
- Section 2 zero results: **"Backup report email not received yet."**
- Section 3 zero results: **"No infra notifications today — verify alerting pipeline."**
- Section 4 failure: include the direct Confluence URL for manual access.

Three things worth noting:

Section independence is explicit. Each section is instructed to fail gracefully and continue. Fault isolation encoded directly in the prompt.

Sanity checks are the highest-value lines. Flagging "no email received today" catches the failure mode where the upstream monitoring system has broken silently — the exact failure mode that checklist-style human review tends to miss.

The prompt is a versioned artifact. As the environment changes — new email formats, new Confluence pages, connector upgrades — the prompt evolves. Treat it as a living document.

Honest Tradeoffs

This wouldn't be a useful post without naming the downsides.

LLM output is nondeterministic. Identical inputs can produce slightly different summaries across runs. I spot-check the raw email count against the agent's tally once a week.

Token truncation is real. A 73-row HTML table pushed through a context window can get silently trimmed. Defensive prompting mitigates but doesn't eliminate this.

The desktop-up dependency. In my setup, scheduled tasks fire when the desktop app is open. A laptop asleep at 9 AM means the report runs when you open it. Acceptable for a daily digest; not for time-critical monitoring.

Trust takes time to calibrate. The first week I ran this, I verified everything manually anyway. By week four, I'd stopped double-checking, and I had enough incident data to know the report was reliable. That calibration period is part of the deployment.
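For the weekly spot-check, and as a guard against silent truncation, a deterministic tally of the backup table is a useful second opinion: if the row count here disagrees with the agent's summary, something was dropped. The sketch below uses only the standard library. BACKUP_STATUS is the column named in the report, while the other column name and the status values are assumptions.

```python
from collections import Counter
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def tally_backup_status(html):
    """Count data rows by BACKUP_STATUS; the first row is taken as the header."""
    parser = TableRows()
    parser.feed(html)
    if not parser.rows:
        return Counter()
    header, *body = parser.rows
    col = header.index("BACKUP_STATUS")
    return Counter(row[col] for row in body if len(row) > col)

sample = """
<table>
  <tr><th>DB_NAME</th><th>BACKUP_STATUS</th></tr>
  <tr><td>db01</td><td>NO BACKUP</td></tr>
  <tr><td>db03</td><td>COMPLETED</td></tr>
  <tr><td>db04</td><td>COMPLETED</td></tr>
</table>
"""
tally = tally_backup_status(sample)
print(dict(tally), sum(tally.values()))
# {'NO BACKUP': 1, 'COMPLETED': 2} 3
```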

The Pattern Generalizes

Once you see the shape — agent reads heterogeneous sources at a known cadence, filters to deltas, surfaces only what requires human attention — it appears everywhere:

On-call handoff summaries (PagerDuty + Slack + Jira + APM, joined and summarized)
Sprint health reports (blocked tickets, aging tickets, no-owner tickets — weekly, automated)
Security alert triage (SOC mailers often emit 200 events to find 3 that matter)
Compliance evidence collection (SOC 2 renewals are largely a gathering problem, not an analysis problem)

The domain changes. The architecture doesn't.

The Mental Model Shift

The efficiency gain is real — roughly 25 minutes a day × 250 working days ≈ 100 hours annually. But that's not the most interesting outcome.

From push to pull. Most monitoring tools send you everything and you decide what matters. The cost of sending is zero for the system and enormous for the human. An agent flips that: I'll ask each morning for the delta.

From dashboards to digests. A dashboard is only valuable if someone looks at it. A daily digest that synthesizes five sources and filters to a single page is more operationally useful than five perfect dashboards, because it solves the actual constraint — attention — rather than the assumed one — visualization.

From checklists to deltas. The morning health check used to be: open these five places, confirm each is green. That's cognitive load spent confirming sameness. The right version is only tell me what changed.

What This Is, and Isn't

This is not an argument for replacing your observability infrastructure with LLM prompts. Your APM is doing things this agent will never do: real-time anomaly detection, distributed trace correlation, infrastructure topology mapping. The agent doesn't compete with that.

What the agent does is address the seam — the gap between the modern observability stack and the legacy outputs it can't ingest.

Making the seam visible is the first step. Closing it is the second. The agent is the bridge.

If your morning starts with "let me just check a few things in my email" — you have an observability seam. The tooling to close it is available now, and the implementation cost is a well-written prompt.

What observability seams do you have in your environment? Curious what data sources people are working around. Drop a comment.

Can AI Agents Replace Enterprise Workflow Orchestration? A Real-World Test: OpenClaw. n8n. Claude Dispatch. A side-by-side comparison.

Pranay Ravi — Sun, 17 May 2026 21:58:49 +0000

"A database administrator's honest investigation into whether the new wave of AI automation tools can handle enterprise-grade workflows — or whether the boring answer is still the right one."

The workflow that started the question — and ended up answering it

Everyone was talking about these tools. I got curious.

I work as a database administrator. I support hundreds of databases across multiple environments — dev, staging, non-prod, production — in a HIPAA-regulated organization. Access management is a constant, grinding operational burden. Developers need access. Analysts need access. Application owners need access. And every single request needs to be approved, documented, and auditable.

For a long time, the process looked like this: someone would message the DBA, the DBA would manually create a Jira ticket and a Word document, chase down approvals from a manager, a database manager, and the security team, manually create the account, and email the credentials back. Days could pass. Follow-ups stacked up. The security team had to trust that the paperwork was right.

Then I started hearing about Claude Dispatch. And OpenClaw. Both were being described as AI tools that could receive a message and take action — automate tasks, call APIs, connect to services. The demos looked impressive. The communities were excited.

And I thought: wait. Could one of these actually solve the problem I have been living with for years?

I had already built something with n8n — a workflow automation tool. But I genuinely wanted to know whether I had picked the right tool, or whether something newer and smarter had passed it by. So I did what any engineer would do: I used my actual problem as the test.

The problem I needed to solve

Before comparing any tools, let me describe the workflow precisely, because the details are what make the comparison meaningful.

A developer — or a data analyst, or a product manager — needs access to a specific database. The request needs to:

Go to their direct manager for approval
Then to the database manager for approval
Then to the security team for final approval
Create a full audit trail at every step
Call a pre-existing API to provision the account with the correct access level
Deliver the credentials back to the requester
Spin up a Jira ticket for the network team to open the firewall port

Every approval is sequential — no step fires unless the previous one passes. If anyone says no, the chain stops and the requester is notified. If security does not approve, no account gets created. Full stop.
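The gating rule above can be sketched in a few lines of Python. This is a minimal illustration, not the n8n workflow itself: the `AccessRequest` class, the `ask` callback, and the sample database name are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

# Approval order from the list above; the chain stops at the first rejection.
APPROVAL_ORDER = ["manager", "database_manager", "security"]

@dataclass
class AccessRequest:
    requester: str
    database: str
    access_level: str
    audit_log: list = field(default_factory=list)

def run_approval_chain(request: AccessRequest,
                       ask: Callable[[str, AccessRequest], bool]) -> bool:
    """Ask each approver in order, record every decision, stop on the first 'no'."""
    for role in APPROVAL_ORDER:
        approved = ask(role, request)
        request.audit_log.append({
            "role": role,
            "approved": approved,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        if not approved:
            return False  # later approvers are never asked
    return True  # only now may the provisioning API be called

# Example: the database manager rejects, so security is never consulted.
decisions = {"manager": True, "database_manager": False, "security": True}
req = AccessRequest("jsmith", "orders_prod", "read_only")
result = run_approval_chain(req, lambda role, _req: decisions[role])
print(result, [entry["role"] for entry in req.audit_log])
# False ['manager', 'database_manager']
```

The order lives in a constant rather than in anyone's judgment, which is the property the rest of this comparison keeps coming back to.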

This is not a hypothetical. It runs in a regulated environment where access to production data is governed by HIPAA. The audit trail is not nice-to-have. It is required.

The access levels themselves are structured — not free text. Read only. Read/write. Dev owner. Application owner. DBA. Each maps to a specific API endpoint. Each produces a deterministic result. The central database inventory that drives all of this is a relational database populated automatically when infrastructure is created through Terraform.
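As a rough sketch of what "structured, not free text" means in practice, the access levels can be modeled as an enum with a fixed lookup table. The endpoint paths below are hypothetical placeholders; the real provisioning API's routes are not given here.

```python
from enum import Enum

class AccessLevel(Enum):
    READ_ONLY = "read_only"
    READ_WRITE = "read_write"
    DEV_OWNER = "dev_owner"
    APP_OWNER = "app_owner"
    DBA = "dba"

# Hypothetical endpoint paths, used only for illustration.
ENDPOINTS = {
    AccessLevel.READ_ONLY: "/users/read-only",
    AccessLevel.READ_WRITE: "/users/read-write",
    AccessLevel.DEV_OWNER: "/users/dev-owner",
    AccessLevel.APP_OWNER: "/users/app-owner",
    AccessLevel.DBA: "/users/dba",
}

def endpoint_for(level_text: str) -> str:
    """Map a structured access level to its endpoint; reject anything else."""
    try:
        level = AccessLevel(level_text)
    except ValueError:
        raise ValueError(f"unknown access level: {level_text!r}") from None
    return ENDPOINTS[level]

print(endpoint_for("read_only"))  # /users/read-only
```

Anything outside the five known values raises an error instead of being interpreted, so the same request always reaches the same endpoint.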

Nobody should be able to bypass it.

Figure 1: The complete approval chain — from Webex message to provisioned credentials and dual Jira audit trail

First candidate: Claude Dispatch

Claude Dispatch is Anthropic's answer to the question: what if you could delegate tasks to your AI from your phone and come back to find them done? It lives inside Claude Cowork — a desktop agent product — and creates a persistent connection between your mobile app and the Claude Desktop app running on your computer.

The pitch is genuinely compelling. Send a message from your phone, Claude acts on your desktop: reads files, calls APIs, summarizes documents, delivers results. For personal productivity this is interesting. For ad-hoc delegation it is actually useful.

So I asked the obvious question: could Dispatch receive a Webex message, run an approval chain, call my database API, and write to Jira?

Here is where the investigation got honest quickly.

Dispatch requires the Claude Desktop app to be running on your computer. The moment the laptop sleeps, it stops.
There is no server. There is no always-on process. There is no execution log.
The workflow is driven by an LLM reasoning about what to do — not a deterministic set of rules. The same input could produce a different output on a different day.
There is no concept of a sequential approval gate. Claude does not wait for a human to respond before deciding the next step.
There is no audit trail. No timestamps. No record of who approved what and when.

A workflow that depends on a laptop staying awake is not an enterprise workflow. It is a personal convenience. There is nothing wrong with that — but it is a different category of tool entirely.

Cost-wise, Dispatch is bundled with Claude Pro at $20 per month or Max at $100 per month. Accessible pricing. But the architecture disqualifies it for this use case before the price even matters.

Second candidate: OpenClaw

OpenClaw is a different kind of tool. It is open-source, self-hostable, and designed as a personal AI assistant that runs on your own infrastructure. You can connect it to Webex, Telegram, Slack, WhatsApp — it listens on those channels and uses an LLM to decide what action to take in response to a message.

The self-hosted angle immediately made it more interesting for regulated environments. If you run it on a VPS rather than your laptop, it can operate 24/7. And because it is open-source under an MIT license, the software itself costs nothing. Your real costs are the VPS — roughly $5 to $15 per month — and the API tokens from whichever LLM provider you connect.

So OpenClaw gets past the first disqualification that knocked out Dispatch. It can stay on. Good start.

But then I pushed further:

OpenClaw has no concept of a deterministic approval chain. It reasons about what to do. If I ask it to get manager approval before proceeding, it will try — but there is no guarantee it handles every edge case the same way every time.
There is no built-in error handling or retry logic. If an API call fails, the agent may or may not handle it gracefully.
There are no execution logs in any structured, auditable format. The LLM's reasoning is not a HIPAA audit trail.
There is no native Jira integration. You can make API calls, but you are building that logic yourself in an environment with no visual workflow editor.
Setup requires real DevOps experience — Docker, VPS configuration, model routing. Not a weekend project for someone who just needs automation.

OpenClaw is genuinely impressive for what it is: a powerful, flexible personal AI assistant for technical users who want to automate their own workflows. It is not what I needed.

The core tension is this: OpenClaw lets an AI decide what to do. In a regulated environment, I need a system that does exactly what it is configured to do — every single time.

The one I already had: n8n

n8n is not the newest tool in this comparison. It is not the most talked-about. It does not have a viral GitHub repository or a growing community of people sharing AI agent demos. It is a workflow automation platform — visual, node-based, deterministic.

I had already built the database access workflow in n8n before I started this investigation. What the investigation forced me to do was articulate why it works where the others do not.

n8n runs on a server. Always on. No desktop dependency.
Every workflow execution is logged — step by step, with timestamps. If something fails, you know exactly where and why.
The approval chain is explicit: manager node fires, waits for a webhook response, branches on yes or no. Database manager node fires. Security node fires. No LLM is deciding the order. The order is the configuration.
Jira, Webex, and REST API integrations are native — no custom code required to connect them.
When security approves, a Jira ticket is created automatically with every approval documented: who approved, their role, the timestamp. A second ticket fires for the network team. The requester receives credentials via Webex.

The entire chain — from chat message to provisioned account — is deterministic, auditable, and server-side. It does not matter whether my laptop is on. It does not matter whether the LLM is having a creative day. The workflow does what it is built to do.

Putting them side by side

Here is what the investigation produced — not as opinion, but as a structured comparison against the actual requirements of the problem.

Figure 2: How the three tools compare across the criteria that actually matter for enterprise workflows

The green cells are not a coincidence. n8n was built for exactly this category of problem — structured, multi-step, multi-system, always-on automation with governance requirements. Dispatch and OpenClaw were built for a different problem: intelligent, flexible, personal task delegation.

Neither of those things is wrong. They are just different categories.

But is n8n actually accessible? Or is it just the enterprise answer no one wants to hear?

This is the question I kept coming back to. Because n8n has a reputation for requiring technical knowledge. Webhooks, JSON payloads, API authentication, conditional branching. It is not a no-code tool in the purest sense.

And yet — compared to OpenClaw, which requires DevOps expertise to host and configure, and compared to Dispatch, which requires you to trust an LLM to handle regulated processes — n8n is actually the most accessible path to a production-grade solution.

The visual builder is genuinely good. The template library covers most common patterns. The community is large. And the self-hosted Community Edition is free — you pay only for the server, which can be as little as $4 a month.

What n8n asks for is clarity of thought. You need to understand your process before you can automate it. That is not a technical barrier. That is just good engineering.

In 2008, I had my first exposure to BPM tools — business process management software. Back then, automating a multi-step approval workflow required consultants, enterprise licenses, and six-month implementation projects. n8n in 2026 is that same capability, accessible to a single engineer on a modest budget, in a weekend.

The AI-native tools will get there. The direction is right. But for workflows where consistency and auditability are non-negotiable, deterministic automation still wins. Not because AI is not impressive — it is. But because a HIPAA auditor does not accept "the AI usually does it right" as an answer.

What the real world added that no tool comparison captures

There is one thing I did not discover until the workflow was running: some users have managers on paper who are not actually the owners or decision-makers for the systems being requested.

Organizational charts say one thing. Actual accountability sits somewhere else. When the approval request went to the wrong person, the workflow stalled — not because the automation failed, but because the data it depended on was wrong.

No tool comparison surfaces this. You only find it when you run the thing on real people in a real organization.

This is one of the most useful things a well-designed workflow can do: make your organizational data gaps visible. The automation did not hide the problem. It exposed it. And that forced the fix.

So — can the new AI tools replace n8n for this?

No. Not yet. Not for this class of problem.

Claude Dispatch is a remote control for your desktop. It is well-designed and genuinely useful for personal delegation. But it has no server, no audit trail, and no deterministic logic — three things that are non-negotiable in a regulated environment.

OpenClaw is a powerful personal AI assistant that technical users can self-host and extend. It can call APIs and respond to messages. But it has no structured approval chain, no execution logging, and no enterprise governance features.

n8n is not the flashiest answer. It did not go viral. It does not use an LLM to decide what to do next. But it runs reliably, it logs everything, it integrates natively with Jira and Webex, and it does exactly what you configure it to do — every single time.

The right tool is not the newest tool. It is the tool that matches the shape of your problem. And some problems have a shape that requires determinism, not intelligence.

The question I started with — can these AI tools solve what n8n solves — turned out to be the wrong question. The better question is: what kind of automation are you building? If the answer involves regulated data, sequential human approvals, and a legal requirement to prove who did what and when, the answer is still n8n. If the answer involves personal productivity, intelligent delegation, and flexible task handling, Dispatch and OpenClaw are worth a serious look.

Both of those things can be true at the same time.

Principal Engineer | 16+ years in databases, cloud & observability | Oracle · PostgreSQL · Kafka · AWS · Splunk · AppD | Platform engineering & AI delivery

How I Built a Zero-Subscription Local AI Stack — Inspired by a 60-Second YouTube Short

Pranay Ravi — Sun, 17 May 2026 02:33:21 +0000

How I Built a Completely Free Local AI Stack — Inspired by a 60-Second YouTube Short

By Pranaychandra Ravi

It started with a YouTube Short. Someone on my feed casually demonstrated connecting a local AI model to Claude Code and I stopped mid-scroll. No API key. No subscription. No code leaving their machine. I had to know how it worked.

What followed was a deep dive into local AI — Ollama, Gemma4, Docker, Open WebUI, vector databases, context windows, and a Python script that made my local model generate an ASCII diagram of the Earth and Moon. This post documents everything I learned, every question I asked, and every mistake I made along the way. If you're curious about running AI entirely on your own hardware, this one is for you.

First Question: Wait, Is This Actually Free?

My first instinct was skepticism. Claude Code is Anthropic's product. Surely using it requires a Claude subscription?

The short answer is no — not when you pair it with Ollama and a local model.

Here's what I learned: Claude Code is the agent — the tool that reads your files, runs commands, edits code, and manages multi-step tasks in your terminal. By default it calls Anthropic's API, which costs money. But Claude Code exposes environment variables that let you redirect those API calls anywhere you want — including a local Ollama server running on your own machine.

Ollama added official support for Anthropic's Messages API format, meaning Claude Code can talk to it natively. No hacks, no middleware, no subscription. The only cost is your own electricity and hardware.

Claude Code  →  talks to  →  Ollama (local server)  →  runs  →  Your model
                              (no Anthropic servers involved)

So What Exactly Is Ollama?

Before I could set anything up I needed to understand what Ollama actually is, because "install Ollama" doesn't tell you much.

Think of Ollama as two things in one:

1. A model manager — it downloads, stores, and organizes AI models on your machine. Like a package manager but for AI brains.

2. A local API server — once running, it exposes an endpoint at http://localhost:11434 that any application can call. Your code, Claude Code, Open WebUI, VS Code extensions — anything that speaks the Anthropic or OpenAI API format can connect to it.

This is the key insight I kept coming back to: Ollama itself has no intelligence. It's an empty engine. You have to download a model — a large file containing all the AI's weights and knowledge — before anything useful happens.

Without a model:   Ollama = empty server, useless
With a model:      Ollama = fully local AI, free forever
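One way to see the "empty engine" state for yourself is to ask the server which models it has downloaded. The sketch below uses Ollama's `GET /api/tags` endpoint; the helper names are my own, and the sample payloads are hand-written to mirror the shape of that response.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]

def installed_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask a running Ollama server which models it has downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))

# Offline illustration with payloads shaped like the /api/tags response.
print(model_names({"models": []}))                           # []
print(model_names({"models": [{"name": "gemma4:latest"}]}))  # ['gemma4:latest']
```

Calling `installed_models()` against a fresh install should return an empty list until the first `ollama pull` completes.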

Downloading Your First Model — Which One?

This is where hardware matters. I have:

32GB RAM
NVIDIA GPU with ~11GB VRAM
Core i9 processor

With an NVIDIA card, Ollama automatically uses CUDA once the NVIDIA driver is installed, with no extra configuration. The GPU handles inference, and it is dramatically faster than CPU-only.

The key concept here is VRAM vs RAM:

Model fits in VRAM  →  GPU handles everything  →  Very fast ✅
Model too big for VRAM  →  spills into system RAM  →  Slower ⚠️

With 11GB VRAM I can fit most 7B–13B parameter models entirely in GPU memory at 4-bit quantization, which means fast, snappy responses.
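The arithmetic behind that estimate is simple enough to sketch. The weights take roughly parameters × bits per weight ÷ 8 bytes; the 1.5GB allowance for the context cache and runtime is my own rough assumption, not a measured figure.

```python
def weights_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone: parameters x bits / 8 bytes."""
    return params_billions * bits_per_weight / 8

def fits_in_vram(params_billions: float, bits_per_weight: int,
                 vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if weights plus a rough allowance for cache and runtime fit in VRAM."""
    return weights_gb(params_billions, bits_per_weight) + overhead_gb <= vram_gb

for size in (7, 13, 30):
    print(size, weights_gb(size, 4), fits_in_vram(size, 4, vram_gb=11))
# 7 3.5 True
# 13 6.5 True
# 30 15.0 False
```

The same 13B model at 16 bits per weight would need about 26GB, which is why quantization matters so much on consumer GPUs.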

After thinking through my use cases — coding help, image analysis, document review — I landed on Gemma4 (Google's multimodal model, ~12GB). Here's why it beat out alternatives like Qwen3.6 (28GB):

	Gemma4	Qwen3.6
Size	~12GB	~28GB
Fits in 11GB VRAM	Nearly (tiny RAM overflow)	Partial (big RAM spill)
Image understanding	✅ Yes (multimodal)	❌ No
Coding quality	Good	Better
Speed on my hardware	Fast	Slower

My use cases included image-to-text extraction and converting images to coloring pages — Qwen3.6 can't do either because it's text-only. Gemma4 won.

ollama pull gemma4

One command. It downloads, verifies, and stores the model. You can see progress in the terminal.

The Architecture in Plain English

Before going further, I want to share the mental model that made everything click for me:

┌─────────────────────────────────────────────────────┐
│                    YOUR COMPUTER                    │
│                                                     │
│  ┌─────────────┐    ┌──────────────┐               │
│  │ Claude Code │───▶│    Ollama    │               │
│  │  (terminal) │    │ :11434 (API) │               │
│  └─────────────┘    └──────┬───────┘               │
│                            │                        │
│  ┌─────────────┐    ┌──────▼───────┐               │
│  │  Open WebUI │───▶│   Gemma4    │               │
│  │  (browser)  │    │  (the brain) │               │
│  └─────────────┘    └─────────────┘               │
│                                                     │
│  ┌─────────────┐                                   │
│  │  Python API │───▶ http://localhost:11434        │
│  │   scripts   │                                   │
│  └─────────────┘                                   │
└─────────────────────────────────────────────────────┘
              Zero data leaves your machine

Three different interfaces. One local model. Everything private.

Context Windows — What Are They and Why Do They Matter?

One of the most important concepts I clarified was the context window — the model's working memory. It's the maximum amount of text a model can "see" at once in a conversation. Exceed it and it starts forgetting the beginning.

Here's the reality check comparison:

	Claude Sonnet 4.5	Gemma4 (local)
Context window	200,000 tokens	~8,000–32,000 tokens
Approximate words	~150,000 words	~6,000–24,000 words
6 years of tax docs	Handles comfortably	Would overflow
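The word counts in the table follow from a common rule of thumb of about 0.75 English words per token. That ratio is an approximation, and the helper names below are just for illustration.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text, not an exact figure

def tokens_to_words(tokens: int) -> int:
    """Convert a token budget into an approximate word count."""
    return int(tokens * WORDS_PER_TOKEN)

def fits_in_context(word_count: int, context_tokens: int) -> bool:
    """Rough check: does a document of this length fit in the window?"""
    return word_count <= tokens_to_words(context_tokens)

print(tokens_to_words(200_000))                        # 150000
print(tokens_to_words(8_000), tokens_to_words(32_000)) # 6000 24000
print(fits_in_context(50_000, 32_000))                 # False
```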

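Your VRAM also limits how large a context window you can use in practice. The context is held in a cache that sits in memory alongside the model weights, so whatever VRAM is left after loading the model determines how much context fits on the GPU before it spills into system RAM.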

You can manually increase it. Inside an interactive ollama run gemma4 session, set the num_ctx parameter:

/set parameter num_ctx 32768

For single documents, images, or focused coding tasks — perfectly fine. For analyzing six years of tax filings all at once? That's where Claude's 200k context is a genuine advantage local models can't match yet.
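When calling Ollama over HTTP, the context size can also be set per request through the `num_ctx` option. The sketch below only builds the request body; the function name is hypothetical, and 32768 is an example value rather than a recommendation for any particular GPU.

```python
import json

def generate_request(prompt: str, num_ctx: int = 32768, model: str = "gemma4") -> dict:
    """Build a body for POST /api/generate with an explicit context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context window in tokens
    }

body = generate_request("Summarize this contract in five bullet points.")
print(json.dumps(body["options"]))  # {"num_ctx": 32768}
```

The body can be posted to `http://localhost:11434/api/generate` with any HTTP client. A larger `num_ctx` uses more memory, so it is worth raising in steps.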

Can Local Models Search the Internet?

Short answer: No, not by default.

Local models are frozen at their training date. They have no internet connection during your conversation. This was an important distinction to understand.

Claude (claude.ai)  →  Has web search tool  →  Knows current events ✅
Gemma4 (local)     →  No internet          →  Knowledge frozen at training ❌

This raised an interesting follow-up question though. When I used Gemini to analyze my tax filing and it spotted mistakes — was it searching the internet to find them?

No. And this was a real misconception I had.

Gemini found tax errors because tax law, IRS rules, and common filing mistakes were baked into the model during training. It learned from millions of tax documents, accounting textbooks, and IRS publications. During your session it's not googling anything — it's applying trained knowledge to your specific document.

Think of it like a tax accountant. They studied tax law for years. When reviewing your return they're not searching Google — they're applying what they already know to what you show them.

Local models work the same way. The difference is:

Gemini/Claude: More recent training data, larger knowledge base, up-to-date tax law changes
Gemma4 local: Good foundational knowledge, may be slightly behind on very recent rule changes, but your documents never leave your machine

For sensitive financial documents, that privacy trade-off is significant.

Connecting Claude Code to Gemma4

This was surprisingly simple. Claude Code reads three environment variables:

export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434

Or using Ollama's built-in launcher:

ollama launch claude

When Claude Code started up I saw this at the bottom of the welcome screen:

gemma4 · API Usage Billing · pranayraavi@gmail.com's Organization

That confirms it's using Gemma4 through Ollama. No Anthropic billing. No subscription.
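The `export` lines above are shell syntax, so on Windows it can be easier to set the same three variables from a small launcher script. This is a sketch under assumptions: the helper name is mine, and the commented-out launch line assumes the Claude Code executable is on your PATH as `claude`.

```python
import os

def claude_code_env(base_url: str = "http://localhost:11434") -> dict:
    """Return a copy of the current environment with the three variables set."""
    env = dict(os.environ)
    env["ANTHROPIC_AUTH_TOKEN"] = "ollama"
    env["ANTHROPIC_API_KEY"] = ""
    env["ANTHROPIC_BASE_URL"] = base_url
    return env

env = claude_code_env()
print(env["ANTHROPIC_BASE_URL"])  # http://localhost:11434

# To launch with these settings (assumes `claude` is on PATH):
# import subprocess
# subprocess.run(["claude"], env=env)
```

Because the function copies `os.environ` instead of modifying it, the variables apply only to the process you launch with that `env`.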

What you get with this setup:

✅ File reading and editing across your project
✅ Terminal command execution
✅ Multi-step agentic coding tasks
✅ Git operations
✅ MCP connectors and plugins
✅ Project context awareness
⚠️ Intelligence capped at Gemma4's capability (weaker than Claude Sonnet/Opus)

The Python API Test

Before setting up a GUI I wanted to confirm the raw API worked. Here's the script I wrote:

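import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def chat(prompt):
    """Send one prompt to the local Ollama server and return the full reply."""
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": "gemma4",
            "prompt": prompt,
            "stream": False,  # wait for the complete answer instead of chunks
        },
        timeout=300,  # local generation can be slow while the model loads
    )
    response.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return response.json()["response"]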

print(chat("Write a hello world in ascii diagram of moon and earth"))

Output:

          (           )
         /              \
  ----(---O---)    (------)  <-- Orbit Path
 /  /   \    /  /   \
|   |     | | |     |   |

Gemma4, running entirely on my machine, responding to a Python script. No API key. No internet. Completely local. This was the moment it really clicked.
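With `"stream": True`, Ollama returns the answer as newline-delimited JSON chunks, each carrying a fragment in `response` and a `done` flag. Here is a minimal sketch of reassembling them; the function name and the sample lines are illustrative, not captured output.

```python
import json

def join_stream(lines) -> str:
    """Concatenate the 'response' fragments from newline-delimited JSON chunks."""
    parts = []
    for line in lines:
        if not line:
            continue  # skip blank lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of the answer
    return "".join(parts)

# Hand-written sample shaped like a streamed /api/generate response.
sample = [
    '{"response": "Hel", "done": false}',
    '{"response": "lo", "done": false}',
    '{"response": "", "done": true}',
]
print(join_stream(sample))  # Hello
```

With the requests library, the same function can consume `response.iter_lines()` from a call made with `stream=True`, since `json.loads` accepts the bytes it yields.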

Setting Up Open WebUI — The ChatGPT-Like Interface

For a proper GUI I went with Open WebUI — a beautiful, feature-rich interface that runs locally and connects to Ollama.

First attempt using pip failed because I had Python 3.13 and Open WebUI requires Python 3.11 or 3.12:

ERROR: Could not find a version that satisfies the requirement open-webui

So I went the Docker route instead.

Installing Docker Desktop

Docker Desktop is free for personal use. Download from docker.com/products/docker-desktop. During install, WSL 2 backend gets configured automatically on Windows.

Running Open WebUI

docker run -d `
  -p 127.0.0.1:3000:8080 `
  --name open-webui `
  -v open-webui:/app/backend/data `
  --add-host=host.docker.internal:host-gateway `
  ghcr.io/open-webui/open-webui:main

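I initially tried -p 3000:80, which did not work. Open WebUI listens on port 8080 inside the container, so the host port has to map to 8080 rather than 80 (another process was also using port 3000 on my machine). Switching to -p 127.0.0.1:3000:8080 fixed it, and binding to 127.0.0.1 keeps the UI reachable only from my own PC.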

Confirmed it was running:

netstat -ano | findstr :3000
# TCP  0.0.0.0:3000  LISTENING  ← Docker up and running

curl http://localhost:3000
# StatusCode: 200 OK  ← Server responding

Then opened http://localhost:3000 in Chrome and saw the Open WebUI interface with Gemma4 auto-detected.

First Real Test — Image to Text Extraction

One of the reasons I picked Gemma4 over Qwen3.6 was its multimodal capability — it can actually see images. I put this to the test immediately.

I had a photo of handwritten chess notes and uploaded it directly into the Open WebUI chat. The prompt was simple: "convert this image to text".

Gemma4 thought for 11 seconds and returned:

FORK/DOUBLE ATTACK

When we attack two or more pieces at the same time then it is known
as fork or double attack

Note- Knights are good at making fork.

That's a perfect transcription of handwritten text — extracted entirely locally, no cloud OCR service, no API key, nothing leaving my machine. It even generated a relevant follow-up suggestion: "Are there other kinds of tactical attacks besides forks, like pins or skewers?"

This is the multimodal capability in action:

✅ Handwritten text extracted accurately
✅ Context understood (chess notes)
✅ Intelligent follow-up suggested
✅ 100% local — image never left my PC
✅ Free

For anyone with scanned documents, handwritten notes, receipts, or any image containing text — this works out of the box with Gemma4 in Open WebUI.
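The same image-to-text request can be made without the browser. Ollama's `/api/generate` endpoint accepts base64-encoded images in an `images` list; the helper name and the placeholder bytes below are assumptions for illustration, and in real use you would read the bytes from your own photo.

```python
import base64

def image_request(image_bytes: bytes, prompt: str, model: str = "gemma4") -> dict:
    """Build a body for POST /api/generate that includes one image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

# Placeholder bytes stand in for a real file read with open(path, "rb").read().
payload = image_request(b"\x89PNG-placeholder", "convert this image to text")
print(len(payload["images"]), payload["prompt"])
# 1 convert this image to text
```

Posting `payload` to `http://localhost:11434/api/generate` keeps the image on your machine, just as the Open WebUI upload does.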

Document Upload and RAG — How It Actually Works

One of the most powerful features of Open WebUI is document upload with RAG (Retrieval Augmented Generation). This is how you can upload your AWS docs, tax returns, or any PDFs and chat with them.

Here's what happens under the hood:

You upload PDF
      ↓
Open WebUI splits it into chunks
      ↓
Converts chunks to embeddings (mathematical vectors)
      ↓
Stores in ChromaDB (local vector database)
      ↓
You ask a question
      ↓
ChromaDB finds the most relevant chunks
      ↓
Sends chunks to Gemma4 as context
      ↓
Gemma4 answers based on YOUR document

Everything is stored locally. With the Docker command above, the data lives in the open-webui named volume, mounted inside the container at:

/app/backend/data/
  📁 vector_db    ← document embeddings (ChromaDB)
  📁 uploads      ← original files
  📄 webui.db     ← chat history (SQLite)

Your documents never leave your machine. ChromaDB is completely free and open source.

One important limitation: RAG finds relevant chunks, not the entire document. If an answer spans many sections of a large document, it might miss some context. The workaround is to upload smaller, focused documents rather than one giant PDF.
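To make the retrieval step concrete, here is a toy version in pure Python. It is deliberately simplified: word counts stand in for real neural embeddings, a plain list stands in for ChromaDB, and all function names and sample sentences are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 12) -> list[str]:
    """Split text into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real systems use a neural model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_chunks(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

doc_chunks = [
    "Knights are good at making forks because they attack in an L shape.",
    "A pin freezes a piece that shields a more valuable piece behind it.",
    "Castle early to keep the king safe behind a wall of pawns.",
]
best = top_chunks("Which piece is good at making forks?", doc_chunks)
print(best[0])
# Knights are good at making forks because they attack in an L shape.
```

Only the top-scoring chunks are sent to the model, which is why an answer spread across many sections can lose context.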

The Full Stack — What I Now Have Running

✅ Ollama          — model manager and local API server
✅ Gemma4          — the AI model (multimodal, ~12GB)
✅ Claude Code     — agentic coding with local model
✅ Open WebUI      — browser-based chat interface with document upload
✅ Python API      — scripts calling the model directly

Total monthly cost: $0

When to Use What

After going through all of this, here's the practical split I settled on:

Task	Use
Coding with file editing	Claude Code + Gemma4
Image analysis / image to text	Open WebUI + Gemma4
Document Q&A (private)	Open WebUI + RAG + Gemma4
Web research / current events	Claude.ai or Perplexity
Complex reasoning / large context	Claude.ai (paid)
Tax doc analysis (all years)	Claude.ai or NotebookLM
Quick Python scripts calling AI	Direct Ollama API

Honest Reflections

What surprised me: How straightforward the setup actually was once I understood the mental model. Ollama is the server, the model is the brain, everything else just connects to it.

What I underestimated: The quality gap between local models and Claude Sonnet/Opus is real. For simple tasks Gemma4 is impressive. For complex multi-step reasoning, Claude's frontier models are noticeably stronger.

What I'd tell myself at the start: Local AI is not a replacement for cloud AI — it's a complement. Use local for private, repetitive, or experimental tasks. Use cloud AI for research, complex reasoning, and anything that benefits from a larger context window.

The privacy win is real: For sensitive documents — financial records, personal data, proprietary code — local AI is genuinely better from a privacy standpoint. Your data does not leave your machine. Full stop.

Resources

Ollama: ollama.com
Open WebUI: openwebui.com
Claude Code: claude.ai/code
Ollama + Claude Code docs: docs.ollama.com/integrations/claude-code
Docker Desktop (free): docker.com/products/docker-desktop

All of this runs on a Windows machine with 32GB RAM, an NVIDIA GPU with ~11GB VRAM, and a Core i9 processor. If you have similar hardware you can replicate this entire stack in an afternoon.