NeuroBase is an intelligent, self-learning conversational database system that turns PostgreSQL into a cognitive engine. It features two powerful modes:
🗣️ Interactive Natural Language Mode
Talk to your database - no SQL required. NeuroBase understands your questions, generates optimized SQL, learns from corrections, and gets smarter with every interaction.
NeuroBase> Show me users who signed up today
🧠 Analyzing query...
📝 Generated SQL:
SELECT * FROM users WHERE created_at::date = CURRENT_DATE;
⚡ Execution time: 23ms
💡 Learned: "users who signed up today" → created_at::date filter
🤖 Multi-Agent Orchestration Mode
Autonomous AI agents work in parallel on isolated database forks to handle schema evolution, query validation, learning aggregation, and A/B testing - all without touching production data.
What inspired me? A friend sent me this challenge and dared me to take it on. I've always wanted to see how easy (or hard!) it would be to integrate AI directly into databases - not just on top of them, but inside them. This challenge was the perfect opportunity. Turns out, with Tiger Data's agentic features, making databases truly intelligent is surprisingly elegant!
Core Features
Natural Language Interface:
- SQL-free queries in plain English
- Context-aware automatic SQL generation
- Continuous learning from interactions
- Conversation history and context retention
- Multi-LLM support (OpenAI, Claude, Ollama)
Multi-Agent System:
- Specialized agents (Schema Evolution, Query Validator, Learning Aggregator, A/B Testing)
- Intelligent fork management (isolated environments per agent)
- Real-time dashboard with live metrics and charts
- Asynchronous task processing with automatic execution
- Inter-agent data synchronization with conflict resolution
- Free plan friendly with shared database mode
Demo
🔗 GitHub Repository: github.com/4n0nn43x/neurobase
Quick Start - Interactive Mode
git clone https://github.com/4n0nn43x/neurobase
cd neurobase
npm install
cp .env.example .env # Configure your database and LLM provider
npm start # Start interactive CLI
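The .env typically needs a database connection string and an LLM provider key. A rough sketch of what that might look like (these variable names are my guess - the authoritative list is in .env.example):

```shell
# Illustrative .env sketch - check .env.example for the actual keys
DATABASE_URL=postgres://user:pass@host:5432/neurobase
LLM_PROVIDER=openai          # or: claude, ollama
OPENAI_API_KEY=sk-...
```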
How I Used Agentic Postgres
I leveraged Tiger Data's agentic features to create a truly intelligent system:
🔀 Tiger CLI for Dynamic Fork Management
Each agent can have its own isolated database fork for safe experimentation:
// Create agent with dedicated fork
const agent = await orchestrator.registerAgent({
  name: 'Schema Evolution Agent',
  type: 'schema-evolution',
  forkStrategy: 'now', // Instant zero-copy fork
  cpu: '0.5',
  memory: '2Gi',
  enabled: true
});
// Fork strategies supported:
// - 'now': Current state
// - 'last-snapshot': Previous snapshot
// - 'to-timestamp': Specific point in time
// - 'shared': No fork (free plan friendly)
Why this matters: Agents can test schema changes, validate queries, and run experiments without any risk to production data. Tiger's copy-on-write forks make this instant and lightweight.
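The four strategies above suggest a simple dispatch inside the orchestrator. A minimal sketch of what that might look like (resolveForkConfig is a hypothetical helper, not part of the Tiger CLI or the NeuroBase API):

```javascript
// Map an agent's forkStrategy to a fork request (illustrative only).
function resolveForkConfig(agent) {
  switch (agent.forkStrategy) {
    case 'now':
      return { fork: true, source: 'current' };       // current state
    case 'last-snapshot':
      return { fork: true, source: 'snapshot' };      // previous snapshot
    case 'to-timestamp':
      if (!agent.timestamp) throw new Error('to-timestamp requires a timestamp');
      return { fork: true, source: 'timestamp', at: agent.timestamp };
    case 'shared':
      return { fork: false };                         // no fork, free plan friendly
    default:
      throw new Error(`Unknown fork strategy: ${agent.forkStrategy}`);
  }
}
```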
🔍 pg_tsvector for Semantic Search
The Learning Aggregator agent uses PostgreSQL full-text search to find patterns:
// Find relevant learnings using semantic search
const insights = await pool.query(`
  SELECT *,
         ts_rank(search_vector, plainto_tsquery('query optimization')) AS rank
  FROM neurobase_learnings
  WHERE search_vector @@ plainto_tsquery('query optimization')
  ORDER BY rank DESC, confidence DESC
  LIMIT 10
`);
The interactive mode also uses tsvector to remember past queries and improve translation accuracy:
-- Store learned patterns with semantic indexing
CREATE TABLE neurobase_learnings (
  id SERIAL PRIMARY KEY,
  natural_language TEXT,
  generated_sql TEXT,
  search_vector tsvector GENERATED ALWAYS AS (
    to_tsvector('english', natural_language || ' ' || COALESCE(generated_sql, ''))
  ) STORED,
  confidence NUMERIC DEFAULT 1.0
);
CREATE INDEX idx_learnings_search ON neurobase_learnings USING GIN(search_vector);
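The confidence column above can be nudged as user feedback arrives, so confirmed translations rank higher than corrected ones. A minimal sketch of one way to do this (the scoring rule is my assumption, not NeuroBase's actual logic):

```javascript
// Nudge a learned pattern's confidence toward 1.0 on confirmation
// and toward 0.0 on correction, via an exponential moving average.
function updateConfidence(current, wasCorrect, rate = 0.2) {
  const target = wasCorrect ? 1.0 : 0.0;
  const next = current + rate * (target - current);
  return Math.round(next * 1000) / 1000; // keep 3 decimals for storage
}
```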
💾 Fast Forks for A/B Testing
The A/B Testing agent creates parallel forks to compare strategies:
// Test multiple indexing strategies simultaneously
const experiment = await abTesting.createExperiment(
  "Index Strategy Comparison",
  "Which index type performs better?",
  [
    { name: 'btree-strategy', sql: 'CREATE INDEX USING btree...' },
    { name: 'gin-strategy', sql: 'CREATE INDEX USING gin...' },
    { name: 'brin-strategy', sql: 'CREATE INDEX USING brin...' }
  ]
);

// Each strategy runs on its own fork
await abTesting.startExperiment(experiment.id);
const results = await abTesting.analyzeResults(experiment.id);
console.log(`Winner: ${results.winner.name} with ${results.winner.speedup}x improvement`);
Tiger's fast forks (2-3 seconds) make this practical - you can test dozens of strategies in minutes, not hours.
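The winner-picking step can be sketched as a pure function: take each strategy's latency samples from its fork, rank by mean, and report speedup over the unindexed baseline. Function and field names here are illustrative, not NeuroBase's real analyzeResults:

```javascript
// Rank strategies by mean latency and compute speedup vs. baseline.
function pickWinner(results, baselineMs) {
  const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
  const ranked = results
    .map(r => ({ name: r.name, meanMs: mean(r.latenciesMs) }))
    .sort((a, b) => a.meanMs - b.meanMs);
  const winner = ranked[0];
  return { ...winner, speedup: +(baselineMs / winner.meanMs).toFixed(2) };
}
```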
🔄 Fork Synchronization for Knowledge Sharing
Agents share discoveries through selective synchronization:
// Sync insights from agent fork back to main database
const syncJob = await synchronizer.createSyncJob({
  source: agentFork.connectionString,
  target: mainDatabase.connectionString,
  tables: ['neurobase_learnings', 'neurobase_optimizations'],
  mode: 'incremental', // Only new records
  conflictResolution: 'source-wins',
  batchSize: 100
});
await synchronizer.executeSync(syncJob.id);
This ensures all agents benefit from each other's learnings.
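At its core, 'source-wins' resolution just means that when the same row id exists on both sides, the agent-fork copy replaces the main-database copy. A minimal sketch of that merge (hypothetical helper; the real synchronizer works in batches against live connections):

```javascript
// Merge rows keyed by id; on conflict, the source (agent fork) wins.
function mergeSourceWins(targetRows, sourceRows) {
  const byId = new Map(targetRows.map(r => [r.id, r]));
  for (const row of sourceRows) byId.set(row.id, row); // source overwrites
  return [...byId.values()];
}
```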
🆓 Graceful Degradation for Free Plan
Hit the service limit? No problem - shared database mode:
// Free plan friendly: multiple agents, one database
const validator = await orchestrator.registerAgent({
  name: 'Query Validator',
  type: 'query-validator',
  useFork: false, // Uses shared mainPool
  forkStrategy: 'shared', // No fork creation
  enabled: true
});
This was crucial! I hit the free plan's service limit early during development. Instead of being blocked, I implemented shared database mode where agents work together on the main database. Perfect for testing and small-scale use.
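The fallback logic behind shared database mode can be sketched as a small pool-selection step (selectPool is an illustrative helper, not the project's actual API):

```javascript
// Decide which connection pool an agent should use: its own fork when
// forking is enabled and the fork exists, otherwise the shared main pool.
function selectPool(agent, mainPool, forkPool) {
  if (agent.useFork === false || agent.forkStrategy === 'shared') {
    return mainPool; // free plan: everyone shares one database
  }
  return forkPool ?? mainPool; // degrade gracefully if no fork was created
}
```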
🧠 Natural Language Understanding with Schema Awareness
The interactive CLI uses Tiger's connection to introspect schema and generate accurate SQL:
// Load schema with fast information_schema queries
const tables = await pool.query(`
  SELECT table_name, column_name, data_type
  FROM information_schema.columns
  WHERE table_schema = 'public'
  ORDER BY table_name, ordinal_position
`);

// Feed to LLM for context-aware translation
const prompt = `
Given these tables: ${JSON.stringify(tables.rows)}
Translate to SQL: "${userQuery}"
`;
Overall Experience
🚀 What Worked Exceptionally Well
Tiger Data's fork speed is mind-blowing! Creating a complete database copy in 2-3 seconds (not minutes or hours!) made the entire multi-agent architecture practical. Without this, agents would spend more time waiting for forks than doing actual work.
Copy-on-write is brilliant - I expected running 4-5 agents with separate forks to be resource-heavy, but Tiger's implementation is surprisingly lightweight. Memory usage stayed reasonable even with multiple active agents doing parallel work.
The Tiger CLI is beautifully simple - commands like tiger service fork and tiger service list just work. No complex configuration files, no wrestling with parameters. The UX is on point.
pg_tsvector search is blazingly fast - full-text search across thousands of learned patterns returns results in single-digit milliseconds. No need for external search infrastructure like Elasticsearch.
Developer experience - Going from idea to working multi-agent system took less time than expected, largely because Tiger's features are well-designed and composable.
🔮 What Surprised Me
The free plan limitation became a feature! When I hit the service limit, I initially thought "well, that's it for testing." Instead, it forced me to implement shared database mode, which actually made the system more flexible. Now users can choose:
- Free/Development: Multiple agents on one database (no forks)
- Production: Each agent gets isolated fork (better safety)
Natural language to SQL is harder than multi-agent orchestration! I thought the multi-agent system would be the complex part, but it turns out that getting LLMs to consistently generate correct SQL with proper schema awareness is the real challenge. Context management and prompt engineering took significant iteration.
Inter-agent synchronization patterns are fascinating - Watching agents discover insights independently, then sync and build on each other's findings feels like watching distributed intelligence emerge. It's closer to biological learning than traditional programming.
pg_tsvector semantic search punches way above its weight - I expected to need pgvector for semantic capabilities, but PostgreSQL's built-in full-text search with tsvector handles most use cases remarkably well. Only truly complex semantic reasoning needs vector embeddings.
🔮 What's Next
Near-term
- Real LLM integration in agents (currently mock data)
- Vector embeddings with pgvector for advanced semantic search
- Automated fork cleanup scheduler
- Agent marketplace - share agent configs
Long-term
- Self-healing database - agents detect and auto-fix issues
- Predictive optimization - anticipate performance problems
- Natural language migrations - "Add user preferences table"
- Cross-database agent federation - agents working across multiple databases