Harish Kotra (he/him)

DevMind - AI-Powered Developer Second Brain

Agentic Postgres Challenge Submission

Ever spent hours searching for a solution you know you've solved before?

DevMind is your intelligent knowledge companion that captures, organizes, and retrieves your technical learnings using natural language and AI-powered hybrid search—all powered by Agentic Postgres features on Tiger Cloud.

What I Built

DevMind is an AI-powered developer knowledge management system that solves a problem every developer faces: we keep solving the same problems over and over because our solutions get scattered across Slack threads, browser bookmarks, and forgotten notes.

The inspiration? After spending 2 hours debugging a PostgreSQL connection pool issue I'd already solved 3 months ago, I realized we need a "second brain" that actually understands what we're looking for—not just keyword matching.

Key capabilities:

  • 🔍 Hybrid Search: Combines semantic understanding (pgvector) with keyword precision (pg_textsearch BM25) using Reciprocal Rank Fusion
  • 🔱 Database Forks: Use Tiger Cloud's zero-copy forks to isolate knowledge by project, client, or team
  • 🤖 AI-Powered: Auto-tagging, smart summaries, and embeddings for intelligent organization
  • 🛠️ Multi-Interface: Web UI, CLI, REST API, and Tiger MCP for natural language queries via Claude

What makes it unique:
Instead of choosing between semantic OR keyword search, DevMind uses BOTH. When you search for "database timeout issues," it finds entries about "PostgreSQL connection pool exhausted" (semantic) AND exact error codes like "ECONNREFUSED" (keyword)—all ranked intelligently.

Demo

GitHub Repository: https://github.com/harishkotra/devmind

When you search for "fix postgres performance", DevMind:

  1. Generates embeddings for semantic understanding
  2. Performs BM25 keyword search
  3. Combines both using RRF algorithm
  4. Returns ranked results in <100ms

CLI - Quick Knowledge Capture

$ npm run cli add -- \
    --title "Fix PostgreSQL Connection Pool" \
    --content "Increase max_connections to 200 and use pgbouncer for transaction pooling..." \
    --type "bug-fix" \
    --tags "postgresql,performance"

✓ Entry created successfully!
  ID: 42
  Title: Fix PostgreSQL Connection Pool
  Type: bug-fix
  Auto-generated tags: postgresql, performance, database, connection, optimization
  AI Summary: Configure max_connections and implement connection pooling with pgbouncer...

Tiger MCP Integration

You: "Find all my PostgreSQL performance tips"

Claude (via Tiger MCP): I found 8 entries about PostgreSQL performance:

1. Fix Connection Pool Exhaustion (Score: 0.92)
   - Use pgbouncer with transaction pooling
   - Set max_connections appropriately

2. Index Optimization Strategy (Score: 0.89)
   - Use EXPLAIN ANALYZE for query planning
   - Create indexes on frequently queried columns
...

Architecture Diagram

┌──────────────────────────────────────┐
│    Your Developer Brain              │
│ (Notes, Snippets, Bug Fixes, etc.)   │
└─────────────┬────────────────────────┘
              │
    ┌─────────▼─────────┐
    │  Web UI           │
    │  CLI              │
    │  Tiger MCP        │
    │  REST API         │
    └─────────┬─────────┘
              │
    ┌─────────▼──────────┐
    │  Express + OpenAI  │
    └─────────┬──────────┘
              │
    ┌─────────▼──────────────────────┐
    │   Agentic Postgres             │
    │   (Tiger Cloud)                │
    │                                │
    │  ✅ pgvector (semantic)        │
    │  ✅ pg_textsearch (BM25)       │
    │  ✅ Hybrid Search RRF          │
    │  ✅ Fast Forks (zero-copy)     │
    │  ✅ Tiger MCP integration      │
    └────────────────────────────────┘

How I Used Agentic Postgres

I leveraged five key Agentic Postgres features creatively:

1. pg_textsearch (BM25) for Keyword Search

Modern full-text search with BM25 ranking built directly into Postgres:

-- Create BM25 index
CREATE INDEX idx_knowledge_bm25
ON knowledge_entries
USING bm25(id, search_vector);

-- Query with BM25 scoring
SELECT id, title,
       bm25_score(id, 'idx_knowledge_bm25') as score
FROM knowledge_entries
WHERE search_vector @@ plainto_tsquery('english', 'postgresql performance')
ORDER BY score DESC;

Why it's powerful: BM25 is superior to traditional tsvector ranking because it accounts for term frequency saturation and document length, producing more relevant results for technical searches.

2. pgvector for Semantic Search

Vector embeddings enable searching by meaning, not just keywords:

-- Create vector index
CREATE INDEX idx_knowledge_embedding
ON knowledge_entries
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Semantic similarity search
SELECT id, title,
       1 - (embedding <=> $1::vector) as similarity
FROM knowledge_entries
ORDER BY embedding <=> $1::vector
LIMIT 20;

Why it's powerful: You can search for "database timeout" and find entries about "connection pool exhausted"—even when exact keywords don't match!
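Under the hood, pgvector's `<=>` operator computes cosine distance, and similarity is just `1 - distance`. Here is a toy sketch of that math, with made-up 4-dimensional vectors standing in for the real 1536-dimensional embeddings:

```javascript
// Minimal sketch of the cosine-similarity math behind pgvector's <=> operator.
// The vectors are hypothetical 4-dimensional stand-ins for real embeddings.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// "database timeout" and "connection pool exhausted" embed to nearby points,
// so their similarity is high even with zero shared keywords.
const queryVec = [0.8, 0.1, 0.5, 0.2]; // hypothetical query embedding
const entryVec = [0.7, 0.2, 0.6, 0.1]; // hypothetical entry embedding

// pgvector's <=> returns cosine *distance*; similarity = 1 - distance.
console.log(cosineSimilarity(queryVec, entryVec).toFixed(2)); // 0.98
```

Identical vectors score 1.0, unrelated ones drift toward 0, which is why the SQL above sorts by `embedding <=> $1::vector` ascending.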

3. Hybrid Search Function (RRF Algorithm)

The magic happens when we combine BOTH approaches:

CREATE FUNCTION hybrid_search(
    query_text TEXT,
    query_embedding vector(1536),
    match_limit INT DEFAULT 10,
    rrf_k INT DEFAULT 60
)
RETURNS TABLE (
    entry_id INT,
    title TEXT,
    content TEXT,
    score FLOAT
) AS $$
BEGIN
    RETURN QUERY
    WITH semantic_search AS (
        SELECT id,
               ROW_NUMBER() OVER (ORDER BY embedding <=> query_embedding) AS rank
        FROM knowledge_entries
        ORDER BY embedding <=> query_embedding
        LIMIT 20
    ),
    keyword_search AS (
        SELECT id,
               ROW_NUMBER() OVER (ORDER BY bm25_score(id, 'idx_knowledge_bm25') DESC) AS rank
        FROM knowledge_entries
        WHERE search_vector @@ plainto_tsquery('english', query_text)
        ORDER BY bm25_score(id, 'idx_knowledge_bm25') DESC
        LIMIT 20
    ),
    combined AS (
        SELECT
            COALESCE(s.id, k.id) AS entry_id,
            COALESCE(1.0 / (rrf_k + s.rank), 0.0) +
            COALESCE(1.0 / (rrf_k + k.rank), 0.0) AS score
        FROM semantic_search s
        FULL OUTER JOIN keyword_search k ON s.id = k.id
        ORDER BY score DESC
        LIMIT match_limit
    )
    SELECT c.entry_id, e.title, e.content, c.score
    FROM combined c
    JOIN knowledge_entries e ON e.id = c.entry_id
    ORDER BY c.score DESC;
END;
$$ LANGUAGE plpgsql;

Why Reciprocal Rank Fusion (RRF)?

  • Combines rankings from multiple sources fairly
  • Doesn't require score normalization
  • Proven to outperform individual methods in information retrieval research
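Outside the database, the same fusion fits in a few lines of JavaScript. This sketch mirrors what the SQL function does — each ranked list contributes `1 / (k + rank)` and the sums are merged (the entry ids are made up for illustration):

```javascript
// Reciprocal Rank Fusion over two ranked id lists, as in the SQL above.
function rrfMerge(semanticIds, keywordIds, k = 60) {
  const scores = new Map();
  const addList = (ids) =>
    ids.forEach((id, i) => {
      const rank = i + 1; // ranks are 1-based, matching ROW_NUMBER()
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  addList(semanticIds);
  addList(keywordIds);
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// Entry 42 ranks 2nd semantically and 1st by keyword, so it wins overall,
// beating entry 7 which appears in both lists but at worse ranks.
const fused = rrfMerge([7, 42, 13], [42, 99, 7]);
console.log(fused.map(([id]) => id)); // [ 42, 7, 99, 13 ]
```

Note that an entry appearing in both lists gets two contributions, which is exactly why items found by both search modes float to the top.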

4. Fast Database Forks for Project Context Isolation

This is where it gets really creative! Tiger Cloud's zero-copy forks enable powerful use cases:

# Create a fork for a specific client project
tiger service fork main-service --name devmind-client-acme --last-snapshot

# Get connection string for the fork
tiger service get devmind-client-acme --with-password

# Update .env with fork's DATABASE_URL
# Now you have isolated knowledge for this client!

Real-world scenario: As a consultant working on multiple projects:

  • devmind-main - Personal general knowledge base
  • devmind-client-acme - ACME Corp project-specific notes
  • devmind-client-beta - Beta Inc project-specific notes
  • devmind-experiments - Testing and experiments

Why forks are perfect here:

  • Fast: Created in ~30 seconds (zero-copy)
  • Cheap: No data duplication
  • Independent: Changes don't affect other forks
  • Secure: Client knowledge stays isolated

5. Tiger MCP for Natural Language Interface

Tiger MCP enables querying your knowledge base conversationally through Claude:

# Install Tiger MCP
tiger mcp install

# Configure in Claude Desktop
# Now you can talk to your knowledge base!

Example conversation:

You: "What was that solution I found for React re-rendering issues?"

Claude: I found 3 relevant entries in your DevMind:

1. Fix React useEffect Infinite Loop (Score: 0.94)
   - Add dependency array to prevent infinite re-renders
   - Use useCallback for function dependencies

2. React.memo Optimization (Score: 0.87)
   - Wrap components with React.memo to prevent unnecessary renders

3. Virtual List Performance (Score: 0.72)
   - Use react-window for large lists

Why this matters: You can query your knowledge base in natural language while coding, without leaving Claude Desktop!

Overall Experience

What Worked Well

1. Tiger Cloud Setup Was Effortless

I was genuinely surprised by how easy it was to get started:

# Create service with AI addons
tiger service create --name devmind --ai-addons

# Get connection string
tiger service get devmind --with-password

That's it! pgvector and pg_textsearch were already enabled. No manual extension installation, no configuration headaches.

2. Hybrid Search Actually Works

I was skeptical about combining semantic + keyword search, but the results speak for themselves:

  • Test query: "fix docker build taking forever"
  • Semantic found: "Optimize Docker Image Size with Multi-stage Builds"
  • Keyword found: "Docker BuildKit Cache Configuration"
  • Hybrid result: Both entries, perfectly ranked!

The RRF algorithm elegantly combines different ranking systems without needing complex score normalization.

3. Database Forks Are a Game Changer

The ability to fork my database in 30 seconds opened up architectural patterns I hadn't considered:

  • Development vs. production knowledge
  • Per-project isolated contexts
  • Team vs. personal knowledge bases

This is similar to Git branches, but for your entire database. Brilliant!

4. pg_textsearch BM25 is Superior to Traditional FTS

Coming from tsvector and ts_rank, BM25 provides noticeably better results for technical content. It understands:

  • Term frequency saturation (prevents over-weighting common terms)
  • Document length normalization
  • Inverse document frequency
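A toy single-term BM25 scorer makes the first two properties concrete. This is the textbook formula with the standard `k1` and `b` constants, not pg_textsearch's actual implementation:

```javascript
// Textbook single-term BM25: idf * tf * (k1 + 1) / (tf + k1 * lengthNorm).
function bm25Term(tf, docLen, avgLen, idf, k1 = 1.2, b = 0.75) {
  const lengthNorm = 1 - b + b * (docLen / avgLen);
  return (idf * (tf * (k1 + 1))) / (tf + k1 * lengthNorm);
}

// Saturation: going from 1 to 10 occurrences helps far less than 10x.
const once = bm25Term(1, 100, 100, 2.0);
const tenTimes = bm25Term(10, 100, 100, 2.0);
console.log((tenTimes / once).toFixed(2)); // ~2x, not 10x

// Length normalization: the same term frequency scores higher in a
// short, focused document than in a long, rambling one.
console.log(bm25Term(2, 50, 100, 2.0) > bm25Term(2, 400, 100, 2.0)); // true
```

That saturation is what keeps a note that repeats "docker" fifty times from drowning out the one that actually explains the fix.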

What Surprised Me

1. Postgres Can Handle AI Workloads

I initially thought I'd need Elasticsearch or a dedicated vector database. Nope! Postgres with pgvector + pg_textsearch handles:

  • ✅ Vector similarity search (sub-100ms on 10K entries)
  • ✅ Full-text search with BM25
  • ✅ Complex hybrid queries
  • ✅ All in one database!

2. Zero-Copy Forks Are FAST

I expected forking to take minutes. It took 27 seconds for a 500MB database. The zero-copy technology is impressive—you're not duplicating data until changes are made.

Challenges & Learnings

Challenge 1: Vector Dimension Tuning

Initially used OpenAI's text-embedding-3-small (512 dimensions), but switched to text-embedding-ada-002 (1536 dimensions):

  • 512d: Faster, cheaper, but less accurate for technical jargon
  • 1536d: Slower, pricier, but significantly better for code and technical terms

Learning: For developer-focused content, the extra dimensions matter!
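The storage side of that tradeoff is easy to quantify. Per the pgvector docs, each vector takes 4 bytes per dimension plus an 8-byte header, so tripling the dimensions roughly triples the per-entry footprint (and the cost of every distance computation):

```javascript
// pgvector storage per vector: 4 bytes per float dimension + 8-byte header.
function embeddingBytes(dims) {
  return dims * 4 + 8;
}

console.log(embeddingBytes(512));  // 2056 bytes per entry
console.log(embeddingBytes(1536)); // 6152 bytes per entry
```

At knowledge-base scale (thousands of entries, not billions), the extra kilobytes are a cheap price for better recall on technical jargon.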

Challenge 2: RRF Constant (k) Optimization

The RRF formula uses a constant k:

score = 1/(k + semantic_rank) + 1/(k + keyword_rank)

I tested k values: 10, 30, 60, 100

  • k=10: Too much weight on top-ranked items
  • k=60: Sweet spot for balanced results ✅
  • k=100: Rankings became too flat

Learning: The default k=60 from research papers actually works great!
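You can see why k=60 is the sweet spot by comparing how much more a rank-1 hit contributes than a rank-10 hit at each k:

```javascript
// RRF weight of rank r is 1 / (k + r). Compare rank 1 vs rank 10
// to see how k controls how steeply the contribution falls off.
const weightRatio = (k) => (1 / (k + 1)) / (1 / (k + 10));

console.log(weightRatio(10).toFixed(2));  // 1.82 — top hits dominate
console.log(weightRatio(60).toFixed(2));  // 1.15 — balanced
console.log(weightRatio(100).toFixed(2)); // 1.09 — nearly flat
```

Small k lets one ranker's top result steamroll the fusion; large k makes rank 1 barely worth more than rank 10, so k=60 keeps both rankers' opinions meaningful.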

Challenge 3: Auto-tagging Quality

Early AI-generated tags were too generic ("programming", "coding", "software"). Improved by:

  1. Providing better prompts with technical context
  2. Using the content + title + entry type for tag generation
  3. Limiting to 5 tags maximum

Learning: Good AI prompts make a huge difference in output quality.
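A sketch of that clean-up step — the helper names, generic-tag list, and prompt wording here are illustrative, not DevMind's actual code:

```javascript
// Hypothetical blocklist of tags too generic to be useful.
const GENERIC_TAGS = new Set(['programming', 'coding', 'software', 'development']);

// Normalize, dedupe, drop generic tags, and enforce the 5-tag cap.
function cleanTags(rawTags) {
  return [...new Set(rawTags.map((t) => t.trim().toLowerCase()))]
    .filter((t) => t && !GENERIC_TAGS.has(t))
    .slice(0, 5);
}

// Feeding title + content + entry type gives the model technical context.
function buildTagPrompt(entry) {
  return `Generate up to 5 specific technical tags (avoid generic terms ` +
    `like "programming") for this ${entry.type} entry:\n` +
    `Title: ${entry.title}\nContent: ${entry.content}`;
}

console.log(cleanTags(['Programming', 'PostgreSQL', 'pgbouncer', 'coding', 'performance']));
// [ 'postgresql', 'pgbouncer', 'performance' ]
```

Doing the filtering in code rather than trusting the model to obey the prompt keeps the tag set consistent even when the model has an off day.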

Challenge 4: Index Selection for Performance

For pgvector, choosing between IVFFlat and HNSW:

  • IVFFlat: Faster inserts, good for frequently updated knowledge
  • HNSW: Faster searches, but slower builds

I chose IVFFlat since knowledge bases grow incrementally, with frequent inserts.

Learning: Understand your access patterns before choosing indexes!

Development Experience

Documentation: Tiger Cloud docs were comprehensive. The examples for pgvector + pg_textsearch saved me hours.

Support: The Tiger Data community on Discord was responsive when I hit a snag with fork permissions.

Developer Experience: The Tiger CLI is beautifully designed:

tiger service list                    # Clear, formatted output
tiger service fork --help            # Helpful command docs
tiger mcp install                    # Just works

Performance Notes

Final metrics after optimization:

  • Hybrid search: <100ms on 10K entries
  • Vector generation: ~200ms per entry (OpenAI API)
  • Fork creation: ~30s for 500MB database
  • Storage per entry: ~2KB with embeddings

🎁 Try It Yourself

Want to build your own developer second brain?

GitHub Repository: https://github.com/harishkotra/devmind

Quick Start:

# Clone and install
git clone https://github.com/harishkotra/devmind.git
cd devmind
npm install

# Configure (Tiger Cloud + OpenAI)
cp .env.example .env
# Edit .env with your credentials

# Setup database
npm run db:setup

# Start server
npm run dev
# Visit http://localhost:3000

Prerequisites:

  • Tiger Cloud account (free tier available)
  • OpenAI API key (or use Gaia nodes)
  • Node.js 18+

💭 Final Thoughts

Building DevMind taught me that Postgres is far more powerful than most developers realize. With pgvector and pg_textsearch, you get:

  • Semantic AI search
  • Modern BM25 ranking
  • ACID guarantees
  • Powerful SQL queries
  • All in one database!

Tiger Cloud's fast forks added another dimension—the ability to create isolated contexts in seconds opens up new architectural patterns we're just beginning to explore.

We developers spend countless hours re-solving problems we've already solved. DevMind ensures you never solve the same problem twice.

🙏 Acknowledgments

Special thanks to the Tiger Data team for creating such a developer-friendly platform!
