Forked A/B Index Optimizer: Making Database Optimization Accessible to Everyone

Agentic Postgres Challenge Submission

This is a submission for the Agentic Postgres Challenge with Tiger Data

I didn't know you could A/B test database indexes!

Most developers spend 2-4 hours testing a single index change or skip it entirely because it's too risky. I built Forked A/B Index Optimizer to change that.

Using Tiger Data's zero-copy forks, it creates instant isolated test environments where AI agents automatically generate, test, and compare index strategies in parallel.

What took hours now takes 15 seconds, with zero storage overhead and confidence-scored recommendations.

The Problem

Database index optimization is stuck in the past:

  • Risky: Wrong indexes break performance
  • Time-consuming: Testing requires separate environments, benchmarks, result comparison
  • Expert-only: Most developers leave it to DBAs
  • Guesswork: Even experts rely on intuition over data

The traditional workflow:

  1. Export production database → 30 minutes
  2. Set up test environment → 20 minutes
  3. Create test indexes → 10 minutes
  4. Run benchmarks (3 iterations) → 45 minutes
  5. Compare results manually → 15 minutes
  6. Clean up → 10 minutes

Total: 2-4 hours per attempt. Most developers skip this entirely.

The Solution

Forked A/B Index Optimizer automates the entire process:

  1. Paste your queries → AI analyzes patterns and generates two competing index strategies
  2. Zero-copy forks → Creates two isolated test environments in under a second
  3. Parallel testing → Both strategies tested simultaneously with identical workloads
  4. Visual comparison → Clear performance charts with confidence-scored recommendations
  5. Safe deployment → Apply the winning strategy with evidence-backed certainty

Total: 8-15 seconds end-to-end. No manual setup. No production risk. No DBA required.

What Inspired Me

The inspiration came from A/B testing in web development. We routinely A/B test button colors, headlines, and layouts to make data-driven decisions. The same approach should work for database indexes.

Tiger Data's zero-copy forks create instant isolated test environments with no storage overhead. Combined with AI agents that generate strategies, run tests, and analyze results, this makes database optimization practical for any developer.

Demo Links

  • Live demo: https://forked-a-b-index-optimizer.up.railway.app/

Screenshots

Main Interface
Main Interface showing hackathon banner, query input, and tech badges

The interface shows the value proposition and indicates Tiger Data's Agentic Postgres integration.

Agent Coordination in Action

Agent activity logs showing Orchestrator, IndexTuner, and Validator coordinating via MCP

Watch three AI agents coordinate in real-time: the Orchestrator manages workflow, IndexTuner generates strategies, and Validator runs performance tests.

Performance Comparison

Side-by-side bar charts showing performance improvement from Strategy B

Visual comparison shows Strategy B (composite indexes) outperforming Strategy A (single-column indexes) by 43%.

AI Recommendation with Confidence Score
Recommendation card showing 92% confidence to apply Strategy B

The system provides confidence-scored recommendations with clear reasoning and estimated impact.

Hybrid Search Insights

Pattern matching results from pg_textsearch and pgvector showing similar optimization patterns

Hybrid search combines BM25 (pg_textsearch) and vector similarity (pgvector) to find similar optimization patterns from historical data.

Technical Details & Query Plans

Try it now without credentials:

  • Toggle "Demo Mode" ON for simulated testing (works without Tiger Cloud)
  • Toggle "Demo Mode" OFF to see real Tiger Cloud integration

Use these queries to test the optimizer:

-- Query 1: Simple WHERE clause
SELECT * FROM users WHERE email = 'test@example.com';

-- Query 2: WHERE + ORDER BY
SELECT * FROM orders WHERE status = 'completed' ORDER BY created_at DESC;

-- Query 3: JOIN with filtering
SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2024-01-01'
GROUP BY u.name;

-- Query 4: Complex with multiple filters
SELECT *
FROM products
WHERE category = 'electronics'
  AND price > 100
  AND inventory > 0
ORDER BY price DESC
LIMIT 20;


-- Query 5: Complex JOIN with aggregation
SELECT u.name, u.city, COUNT(o.id) as order_count, SUM(o.total_amount) as total_spent
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2024-01-01'
  AND o.status = 'completed'
GROUP BY u.id, u.name, u.city
HAVING COUNT(o.id) > 2
ORDER BY total_spent DESC
LIMIT 50;

-- Query 6: Multiple JOINs
SELECT
    p.name as product_name,
    p.category,
    SUM(oi.quantity) as total_sold,
    SUM(oi.quantity * oi.unit_price) as revenue
FROM products p
JOIN order_items oi ON p.id = oi.product_id
JOIN orders o ON oi.order_id = o.id
WHERE o.status = 'completed'
  AND o.created_at > '2024-01-01'
GROUP BY p.id, p.name, p.category
ORDER BY revenue DESC
LIMIT 25;

How I Used Agentic Postgres

The application architecture is built around Agentic Postgres end to end: zero-copy forks, MCP agent coordination, hybrid search, and the Tiger CLI.


1. Zero-Copy Forks: The Foundation

I use database forks as temporary test environments, not as backup copies. Each optimization creates two forks where different index strategies run tests in parallel, then auto-delete.

How It Works:

Each optimization spawns two isolated forks (Strategy A vs Strategy B). Both start with identical data, run tests in parallel, and auto-delete when done.

// Create competing test environments
const forkA = await tigerService.createFork(`${jobId}-strategy-a`);
const forkB = await tigerService.createFork(`${jobId}-strategy-b`);

// Apply different strategies
await validator.applyIndexStrategy(forkA.connectionString, strategies.strategyA);
await validator.applyIndexStrategy(forkB.connectionString, strategies.strategyB);

// Test both in parallel
const [resultsA, resultsB] = await Promise.all([
  validator.runPerformanceTests(forkA.connectionString, queries),
  validator.runPerformanceTests(forkB.connectionString, queries)
]);

// Cleanup automatically (even on errors)
await Promise.all([
  tigerService.deleteFork(forkA.forkName),
  tigerService.deleteFork(forkB.forkName)
]);

Why This Works:

| Traditional Copies | Zero-Copy Forks |
| --- | --- |
| 30+ minutes to create | <1 second |
| Full storage duplication | $0 storage cost (copy-on-write) |
| Manual cleanup required | Auto-delete after testing |
| Risk of orphaned databases | Crash-safe cleanup |

Fast, free forks mean developers can test index changes without worrying about time or storage costs.

I haven't seen another tool using database forks for parallel optimization testing.


2. Tiger MCP: Multi-Agent Orchestration

The Setup: One MCP server, 7 specialized tools, 3 coordinating agents.

Instead of one monolithic script, I built three agents that collaborate through structured tools. Each has a clear role and communicates via MCP protocol.

The Agents

Orchestrator Agent

  • Manages fork lifecycle (create → test → cleanup)
  • Coordinates IndexTuner and Validator
  • Compares results from both strategies
  • Generates confidence-scored recommendations

IndexTuner Agent

  • Parses query patterns (WHERE, JOIN, ORDER BY)
  • Generates Strategy A (basic single-column indexes)
  • Generates Strategy B (advanced composite/partial indexes)
  • Explains rationale for each index
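
To make the two strategies concrete, here is a simplified sketch of how the IndexTuner could turn extracted columns into competing index definitions. The function shape, names, and rationale strings are illustrative assumptions, not the project's exact code.

// Illustrative sketch only - identifiers are assumed to have already passed
// sanitizeIdentifier (see Challenge 4) before being interpolated into DDL.
function generateStrategies(tableName, whereColumns, orderByColumns) {
  // Strategy A: one single-column index per filtered column
  const strategyA = whereColumns.map(col => ({
    name: `idx_${tableName}_${col}`,
    sql: `CREATE INDEX idx_${tableName}_${col} ON ${tableName}(${col})`,
    rationale: `Single-column index for WHERE filter on ${col}`
  }));

  // Strategy B: one composite index covering filter and sort columns together
  const cols = [...whereColumns, ...orderByColumns];
  const strategyB = [{
    name: `idx_${tableName}_${cols.join('_')}`,
    sql: `CREATE INDEX idx_${tableName}_${cols.join('_')} ON ${tableName}(${cols.join(', ')})`,
    rationale: 'Composite index serving WHERE and ORDER BY in a single scan'
  }];

  return { strategyA, strategyB };
}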

Validator Agent

  • Applies indexes to forks
  • Runs EXPLAIN ANALYZE (3 iterations for statistical validity)
  • Collects metrics: execution time, planning time, buffer hits, I/O
  • Analyzes query execution plans
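
As a rough illustration of the Validator's measurement step, the sketch below runs EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) and extracts the timing fields. The helper name and connection handling are simplified assumptions rather than the project's exact code.

const { Client } = require('pg');

// Simplified sketch of one measurement pass against a fork.
async function explainAnalyze(connectionString, query) {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    const result = await client.query(`EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) ${query}`);
    // FORMAT JSON returns one row whose "QUERY PLAN" column holds the plan document
    const raw = result.rows[0]['QUERY PLAN'];
    const plan = (typeof raw === 'string' ? JSON.parse(raw) : raw)[0];
    return {
      executionTime: plan['Execution Time'],               // ms
      planningTime: plan['Planning Time'],                  // ms
      sharedHitBlocks: plan.Plan['Shared Hit Blocks'] ?? 0  // buffer cache hits
    };
  } finally {
    await client.end();
  }
}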

The MCP Tools

Seven specialized operations the agents coordinate through:

tools: [
  'create_fork',      // Spin up isolated environments
  'delete_fork',      // Clean up resources
  'list_forks',       // Enumerate available forks
  'run_query',        // Execute SQL on specific forks
  'explain_analyze',  // Collect performance metrics
  'create_index',     // Apply strategies
  'drop_index'        // Remove indexes
]

Real-Time Transparency

Users see each step as agents coordinate:

Orchestrator: Creating fork optimization_abc123-strategy-a
IndexTuner: Analyzing 3 queries for optimization patterns
IndexTuner: Detected WHERE + ORDER BY pattern → composite index strategy
Validator: Running EXPLAIN ANALYZE (iteration 1/3)
Validator: Strategy A: 156.32ms avg execution time
Validator: Strategy B: 98.45ms avg execution time (37% improvement)
Orchestrator: Recommendation - Apply Strategy B (confidence: 92%)

Showing the process helps users understand how the system reaches recommendations.


3. Hybrid Search: Pattern Recognition

The Approach: Combine lexical search (BM25) with semantic search (vectors) to find relevant historical patterns.

Most tools do keyword search or semantic search. I fused both to get better results than either alone.

BM25 Text Search (pg_textsearch)

Find exact keyword matches in query patterns:

SELECT pattern, confidence
FROM optimization_history
WHERE to_tsvector('english', query_pattern) @@
      to_tsquery('english', 'WHERE & ORDER_BY & composite_index')
ORDER BY ts_rank(to_tsvector('english', query_pattern),
                 to_tsquery('english', 'WHERE & ORDER_BY & composite_index')) DESC
LIMIT 5;

Strengths: Precise keyword matching, fast execution
Weaknesses: Misses semantic similarity ("sort by" vs "order by")

Vector Similarity (pgvector)

Find semantically similar strategies:

SELECT pattern, confidence, embedding <-> query_embedding AS distance
FROM optimization_history
ORDER BY embedding <-> query_embedding
LIMIT 5;

Strengths: Understands semantic meaning, finds related concepts
Weaknesses: Can be too broad, slower than BM25

Fusion Scoring

Combine both approaches with weighted scoring:

const fusionResults = {
  similarPatterns: [
    {
      pattern: 'WHERE + ORDER BY optimization',
      confidence: 0.92,
      source: 'pg_textsearch'  // High keyword match
    },
    {
      pattern: 'Composite index benefits',
      confidence: 0.87,
      source: 'pgvector'  // Semantic similarity
    },
    {
      pattern: 'Partial index selectivity',
      confidence: 0.79,
      source: 'hybrid_fusion'  // Combined score
    }
  ],
  recommendations: [
    'Covering indexes show 23% better selectivity for filtered queries',
    'Composite indexes reduce random I/O by 45% for multi-column filters'
  ]
};

The Result: Users get contextual insights like "This pattern matched 92% of successful WHERE + ORDER BY optimizations in our database."
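
The fusion weights themselves aren't shown above, so here is a minimal sketch of one way the two result lists could be merged with weighted scoring. The 0.6/0.4 weights, the normalization assumption, and the function name are illustrative, not the project's actual values.

// Illustrative weighted fusion of BM25 and vector results.
// Both inputs are arrays of { pattern, score } with scores already normalized to 0..1.
function fuseResults(bm25Results, vectorResults, wText = 0.6, wVector = 0.4) {
  const combined = new Map();

  const add = (results, weight, source) => {
    for (const { pattern, score } of results) {
      const entry = combined.get(pattern) ?? { pattern, confidence: 0, sources: [] };
      entry.confidence += weight * score;
      entry.sources.push(source);
      combined.set(pattern, entry);
    }
  };

  add(bm25Results, wText, 'pg_textsearch');
  add(vectorResults, wVector, 'pgvector');

  // Highest combined confidence first, top 5 patterns
  return [...combined.values()]
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, 5);
}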


4. Tiger CLI: Developer Experience

The Integration: CLI operations embedded in application runtime, not just setup.

Most tools use CLIs for initial setup. I use Tiger CLI programmatically during every optimization.

Development Workflow

# Initial setup
tiger service create --name forked-ab-optimizer-db
tiger db connect forked-ab-optimizer-db < data/sample-schema.sql

# Testing fork operations
tiger service fork forked-ab-optimizer-db test-fork-1
tiger service list  # Verify fork created
tiger service delete test-fork-1  # Test cleanup

Production Runtime

Every fork operation calls Tiger CLI programmatically:

async createFork(forkName) {
  const command = `${this.tigerCLIPath} service fork ${this.serviceName} ${forkName}`;
  const { stdout, stderr } = await execAsync(command);

  if (stderr) {
    throw new Error(`Fork creation failed: ${stderr}`);
  }

  return {
    forkName,
    connectionString: this.extractConnectionString(stdout),
    created: new Date().toISOString()
  };
}

async deleteFork(forkName) {
  const command = `${this.tigerCLIPath} service delete ${forkName}`;
  await execAsync(command);
  return { deleted: true, forkName };
}

Key Point: The CLI runs during optimizations, not just setup. Every fork creation, query execution, and cleanup uses Tiger CLI under the hood.


5. Performance: The Speed Multiplier

What I Measured (average across 50+ test runs):

| Operation | Time | Notes |
| --- | --- | --- |
| Fork creation | 0.8s | Both forks created |
| Strategy generation | 2.3s | AI analyzes patterns |
| Performance testing | 6.5s | 3 runs per strategy |
| Result comparison | 0.4s | Statistical analysis |
| Total | ~10s | End-to-end optimization |

Traditional Approach Comparison:

| Step | Traditional | Our Tool | Savings |
| --- | --- | --- | --- |
| Environment setup | 30 min | 1 sec | 99.9% |
| Index application | 5 min | 2 sec | 99.3% |
| Performance testing | 45 min | 7 sec | 99.7% |
| Cleanup | 10 min | 1 sec | 99.8% |
| Total | 90-150 min | 10 sec | 99.8% |

10-second test cycles let developers:

  • Test 5 query variations in 1 minute
  • Add index optimization to CI/CD pipelines
  • Experiment without worrying about wasted time
  • Make decisions based on real performance data

What Worked Well

1. Zero-Copy Fork Speed

First fork creation: 0.8 seconds. Expected some delay - there wasn't any.

Traditional database copies take 10-60 minutes. Zero-copy forks are instant. This single feature made the entire project viable.

2. MCP Protocol Quality

Expected multi-agent coordination to require complex message passing. MCP's structured tool system made it straightforward:

  • Each agent knows its available tools
  • Protocol handles serialization
  • Error handling is built-in
  • Debugging is clear (can inspect tool calls)

3. Demo Mode Design

Built demo mode as a fallback for missing credentials. It serves multiple purposes:

  • Quick prototyping without database setup
  • Hackathon demonstrations
  • UI testing without consuming resources
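
A minimal sketch of the fallback idea, assuming demo mode is decided by the explicit toggle or missing Tiger Cloud credentials; the function names, env var, and wiring are illustrative assumptions, with the canned numbers taken from the example run above.

// Illustrative only - not the project's exact code.
function isDemoMode(options = {}) {
  // Demo mode when explicitly toggled on, or when Tiger Cloud credentials are absent.
  return Boolean(options.demoMode) || !process.env.TIGER_SERVICE_NAME;
}

function simulatedComparison() {
  // Canned results so the UI and charts can be exercised without creating forks.
  return {
    strategyA: { avgTime: 156.32, indexes: ['idx_orders_status'] },
    strategyB: { avgTime: 98.45, indexes: ['idx_orders_status_created_at'] },
    recommendation: { winner: 'B', confidence: 0.92 }
  };
}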

4. Real-Time Activity Logs

The UI shows live agent activity as tests run:

Orchestrator: Creating fork abc-strategy-a
IndexTuner: Analyzing query patterns
IndexTuner: Generated composite index for WHERE + ORDER BY
Validator: Running tests (iteration 1/3)
Orchestrator: Strategy B is 43% faster, high confidence (92%)

This transparency helps users understand the recommendation process.


Challenges & Solutions

Challenge 1: SQL Parsing Complexity

Problem: Extracting column names from WHERE, JOIN, ORDER BY using regex is fragile.

Attempted Solution 1: Use a full SQL parser library
Result: Too heavy (3MB+ bundle), slow, overkill for basic patterns

Final Solution: Pattern matching with documented limitations

extractColumnsFromWhere(query) {
  // Handles: WHERE col = value AND col2 > 5
  // Doesn't handle: Complex subqueries, CTEs, window functions
  const whereMatch = query.match(/WHERE\s+(.+?)(?:ORDER|GROUP|LIMIT|$)/i);
  if (!whereMatch) return [];

  const columns = whereMatch[1].match(/\b([a-z_][a-z0-9_]*)\s*[=><!]/gi);
  return columns ? columns.map(col => col.replace(/\s*[=><!].*/, '').trim()) : [];
}

Learning: Ship working code for 90% of cases. Document limitations. Let users tell you what's broken.

Coverage: Handles 90% of common query patterns. A production version would add pgsql-parser.


Challenge 2: Race Conditions in Fork Cleanup

Problem: If fork creation failed, cleanup code crashed trying to delete non-existent forks.

Bad Approach:

// ❌ This crashes if forkA is undefined
await tigerService.deleteFork(forkA.forkName);

Fixed Approach:

// ✅ Graceful cleanup even in error paths
if (forkA && forkA.forkName) {
  try {
    await tigerService.deleteFork(forkA.forkName);
  } catch (cleanupError) {
    console.error(`Cleanup failed for forkA: ${cleanupError.message}`);
    // Don't throw - log and continue
  }
}

Learning: Error paths need as much care as happy paths. Test failure scenarios explicitly.


Challenge 3: Connection Pooling Strategy

Problem: Should I pool connections to forks? Each fork has a unique connection string.

Considered Approaches:

  1. Pool per fork → Too much memory overhead
  2. Single global pool → Doesn't work (different connection strings)
  3. No pooling → Poor performance for main database

Final Solution: Hybrid approach

async executeQuery(connectionString, query) {
  // Use pool for base database (frequent, long-lived)
  if (this.pool && connectionString === this.baseConnectionString) {
    return await this.pool.query(query);
  }

  // Individual client for forks (infrequent, short-lived)
  const client = new Client({ connectionString });
  try {
    await client.connect();
    return await client.query(query);
  } finally {
    await client.end();
  }
}

Learning: Don't force one pattern everywhere. Use the right tool for each use case.


Challenge 4: SQL Injection Prevention

Problem: Users input raw SQL queries. How do you prevent injection when building index statements from them?

Vulnerable Code:

// ❌ NEVER DO THIS
const sql = `CREATE INDEX idx_${tableName}_${column} ON ${tableName}(${column})`;

Secure Implementation:

sanitizeIdentifier(identifier) {
  if (!identifier || typeof identifier !== 'string') {
    throw new Error('Invalid identifier');
  }

  // Only alphanumeric and underscores
  const sanitized = identifier.replace(/[^a-zA-Z0-9_]/g, '');

  // Can't start with number
  if (/^[0-9]/.test(sanitized)) {
    throw new Error('Identifier cannot start with number');
  }

  // Block SQL keywords
  const reserved = ['SELECT', 'INSERT', 'UPDATE', 'DELETE', 'DROP', 'CREATE', 'ALTER'];
  if (reserved.includes(sanitized.toUpperCase())) {
    throw new Error('Cannot use SQL reserved keyword');
  }

  if (sanitized.length === 0) {
    throw new Error('Identifier empty after sanitization');
  }

  return sanitized;
}
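
For context, here is roughly how the sanitizer gets used when building index DDL; the createIndexSQL helper below is an illustrative sketch, not the project's exact function.

// Illustrative usage: every identifier passes through sanitizeIdentifier
// before being interpolated into DDL.
function createIndexSQL(tableName, columns) {
  const table = sanitizeIdentifier(tableName);
  const cols = columns.map(c => sanitizeIdentifier(c));
  const indexName = sanitizeIdentifier(`idx_${table}_${cols.join('_')}`);
  return `CREATE INDEX IF NOT EXISTS ${indexName} ON ${table}(${cols.join(', ')})`;
}

// createIndexSQL('orders', ['status', 'created_at'])
// => CREATE INDEX IF NOT EXISTS idx_orders_status_created_at ON orders(status, created_at)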

Learning: Security matters in prototypes. Build it right from day one.


Challenge 5: Statistical Reliability

Problem: Single query runs showed inconsistent results (±30% variance) due to:

  • OS-level caching
  • Network latency fluctuations
  • PostgreSQL query planner randomness

V1 Approach: Run query once

const result = await explainAnalyze(query);
return result.executionTime;  // ❌ High variance

V2 Approach: Run 3 times, take average

const runs = [];
for (let i = 0; i < 3; i++) {
  const result = await explainAnalyze(query);
  runs.push(result.executionTime);
}

const avgTime = runs.reduce((sum, t) => sum + t, 0) / runs.length;
const variance = Math.max(...runs) - Math.min(...runs);

return { avgTime, variance, runs };

Results:

  • Variance reduced from ±30% to ±5%
  • Confidence scores more accurate
  • Users trust results more

Learning: Real-world performance testing needs statistical rigor, not single samples.
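
The post doesn't spell out how the confidence percentage is computed, so the formula below is a hypothetical sketch: confidence rises with the relative improvement and falls as run-to-run variance grows.

// Hypothetical confidence scoring - not the project's exact formula.
// improvement: relative speedup of the winner (e.g. 0.43 for 43% faster)
// varianceA / varianceB: (max - min) / avg across the 3 runs of each strategy
function confidenceScore(improvement, varianceA, varianceB) {
  const noise = Math.max(varianceA, varianceB);         // worst-case run-to-run spread
  const signal = improvement / (improvement + noise);   // share of the gap that looks real
  return Math.round(Math.min(0.99, Math.max(0, signal)) * 100);
}

// Example: confidenceScore(0.43, 0.05, 0.04) ≈ 90 (a 43% win with ~5% variance)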


Tech Stack

Database: Agentic Postgres (Tiger Cloud)
Agent Coordination: Tiger MCP with 7 custom tools
Backend: Node.js 18 + Express.js
Frontend: React 18 + Vite + Recharts
Search: pg_textsearch (BM25) + pgvector (embeddings)
CLI: Tiger CLI (programmatic integration)
Deployment: Docker Compose / Railway

For Judges: Quick Test (2 minutes)

  1. Visit live demo: https://forked-a-b-index-optimizer.up.railway.app/
  2. Toggle Demo Mode OFF (to see real Tiger Cloud)
  3. Paste this query:
SELECT u.name, COUNT(o.id) as order_count
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2024-01-01'
  AND o.status = 'completed'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 50;
  1. Click "Start A/B Optimization"
  2. Watch agents coordinate in real-time
  3. Review performance comparison (expect 25-45% improvement)

No authentication required - Ready to test immediately!

Final Thoughts

Building Forked A/B Index Optimizer showed me that databases can work with developers rather than against them: databases that understand intent, test changes safely, and provide evidence-based recommendations.

Tiger Data's Agentic Postgres enables this. Zero-copy forks, MCP tools, and hybrid search are building blocks for new database applications.

I started out wanting to build something useful and learned that good infrastructure enables creativity. Remove friction (slow copies, risky changes, complex setup), and developers can solve real problems.

That's what Agentic Postgres does.
