DEV Community

Junaid


CocoIndex Setup Guide

Real-Time Codebase Indexing for AI-Powered IDEs


Table of Contents

  1. What is CocoIndex?
  2. Why Use CocoIndex?
  3. Prerequisites
  4. Initial Setup
  5. Indexing Your First Codebase
  6. IDE Integration
  7. Indexing Additional Codebases
  8. Updates: Manual vs Automatic
  9. Common Issues & Troubleshooting
  10. Best Practices

What is CocoIndex?

CocoIndex is an open-source framework for building real-time codebase indexes using Tree-sitter for semantic code parsing and vector embeddings for semantic search [page:1]. It creates a searchable database of your codebase that:

  • Understands code semantically (functions, classes, methods) rather than treating it as plain text
  • Updates incrementally – only reprocesses changed files, not the entire codebase
  • Integrates with AI-powered IDEs via Model Context Protocol (MCP) to provide context-aware code assistance

Why Use CocoIndex?

For software engineers and open-source maintainers, CocoIndex enables:

  • AI coding assistants: Give Claude, Codex, or Gemini semantic understanding of your entire codebase [page:1]
  • Better IDE integrations: Works with VS Code, Kiro, Trae, Qoder, Cursor, Windsurf via MCP [page:1]
  • Semantic code search: Find code by what it does, not just keyword matching [page:1]
  • Automated workflows: Code review agents, refactoring tools, PR summarization [page:1]
  • Always up-to-date: Incremental processing means your index stays fresh without full rebuilds [page:1]

Prerequisites

Required Software

  1. Python 3.11+ (this guide was tested with Python 3.13.7)
  2. pip (tested with pip 25.2)
  3. pipx – For managing isolated Python applications
  4. PostgreSQL 14+ with pgvector extension
  5. Git – For cloning repositories

System Requirements

  • macOS (Homebrew-managed Python) or Linux
  • ~500MB disk space for dependencies
  • PostgreSQL database access (local or remote)

Initial Setup

Step 1: Install pipx

Since you're using Homebrew-managed Python on macOS, you must use pipx to avoid "externally-managed-environment" errors:

brew install pipx
pipx ensurepath

Restart your terminal after installation.


Step 2: Install PostgreSQL with pgvector

Install PostgreSQL

brew install postgresql@14
brew services start postgresql@14

Install pgvector extension

brew install pgvector

Create database and enable vector extension

# Create the database
createdb cocoindex

# Enable pgvector extension
psql cocoindex -c "CREATE EXTENSION IF NOT EXISTS vector;"

# Verify installation
psql cocoindex -c "\dx"

You should see vector listed in the extensions.


Step 3: Install CocoIndex

pipx install 'cocoindex[embeddings]'

Verify installation:

cocoindex --version

Step 4: Clone the Realtime Codebase Indexing Repository

This provides a ready-made main.py with proper flow configuration:

cd ~/Documents/workspace  # or your preferred directory
git clone https://github.com/cocoindex-io/realtime-codebase-indexing.git
cd realtime-codebase-indexing

Step 5: Install Project Dependencies

pipx runpip cocoindex install -e .

This installs the project alongside your pipx-managed cocoindex tool.


Step 6: Configure Environment Variables

Create a .env file in the realtime-codebase-indexing directory:

COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex

Important: Every project that uses CocoIndex needs this environment variable set. You can either:

  • Add it to each project's .env file, or
  • Export it globally in your shell profile (~/.zshrc or ~/.bashrc):
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
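To make the two options concrete, here is a minimal sketch of how a script could resolve the variable, assuming the process environment takes precedence over the project's .env file (which is how python-dotenv behaves by default). The helper names parse_env_file and resolve_database_url are hypothetical and not part of CocoIndex.

```python
import os
from pathlib import Path

KEY = "COCOINDEX_DATABASE_URL"


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_database_url(env_file=".env", environ=None):
    """Return the database URL, preferring the environment over the .env file."""
    environ = os.environ if environ is None else environ
    url = environ.get(KEY) or parse_env_file(env_file).get(KEY)
    if not url:
        raise RuntimeError(f"{KEY} is not set in the environment or in {env_file}")
    return url
```

Calling resolve_database_url() from a project directory returns the exported value if there is one and falls back to the .env entry otherwise.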

Indexing Your First Codebase

Step 1: Adjust the Source Configuration

Edit main.py in your project directory. Find the LocalFile source configuration:

data_scope["files"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(
        path=".",  # Current directory (recommended for single-repo indexing)
        included_patterns=["**/*.py", "**/*.rs", "**/*.toml", "**/*.md", "**/*.mdx"],
        excluded_patterns=[
            "**/.git/**",
            "**/venv/**", 
            "**/.venv/**", 
            "**/site-packages/**",
            "**/node_modules/**",
            "target"
        ],
    )
)

Key Parameters:

  • path="." – Index the current directory
  • included_patterns – File types to index (adjust for your stack)
  • excluded_patterns – Directories to skip (add any custom build/cache dirs)
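To preview which files a set of patterns would select, the sketch below approximates the include/exclude logic in plain Python. It is an illustration only: CocoIndex does its own glob matching, and its exact semantics (for example for bare names such as "target") may differ from this translation. The functions glob_to_regex and is_indexed are hypothetical helpers.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob to a regex: '**/' spans directories, '*' stays in one segment."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")         # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def is_indexed(path, included, excluded):
    """Exclusions win; otherwise the path must match at least one inclusion."""
    if any(glob_to_regex(p).match(path) for p in excluded):
        return False
    return any(glob_to_regex(p).match(path) for p in included)


if __name__ == "__main__":
    included = ["**/*.py", "**/*.md"]
    excluded = ["**/.venv/**", "**/node_modules/**"]
    for candidate in ["src/app.py", ".venv/lib/site.py", "src/app.js"]:
        print(candidate, is_indexed(candidate, included, excluded))
```

Running a handful of representative paths through a check like this before indexing is a cheap way to catch a pattern that is too broad or too narrow.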

Step 2: Run the Indexing Flow

From your project directory:

# Make sure environment variable is set
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"

# Run the indexing
cocoindex update main

On first run, you'll see:

[ TO CREATE ] CocoIndex Metadata Table
[ TO CREATE ] Flow: CodeEmbedding
[ TO CREATE ] Postgres table CodeEmbedding__code_embeddings

Changes need to be pushed. Continue? [yes/N]:

Type yes and press Enter. CocoIndex will:

  1. Create necessary tables in PostgreSQL [web:42]
  2. Enable pgvector extension [web:64]
  3. Parse your code with Tree-sitter [page:1]
  4. Generate embeddings using sentence-transformers/all-MiniLM-L6-v2 [page:1]
  5. Store everything in the code_embeddings table with HNSW vector index [web:64]

You'll see output like:

Updated index: files: ▕████████████▏1234/1234 source rows: 1234 added

Step 3: Test the Index

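Query the index directly, using the Python interpreter from the pipx-managed cocoindex environment. The path below assumes pipx keeps its virtual environments under ~/.local/pipx, as on the author's machine; adjust it if your pipx home differs:

"$HOME/.local/pipx/venvs/cocoindex/bin/python" main.py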

At the prompt, enter natural language queries:

Enter search query (or Enter to quit): how does authentication work?

It will return relevant code snippets with similarity scores and file locations [web:34][web:70].
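As a rough picture of what a similarity score means, the sketch below ranks a few toy vectors against a query vector, assuming cosine similarity is the metric. The three-dimensional vectors and file names are invented for illustration; real embeddings come from the embedding model and have several hundred dimensions, and in practice the ranking is done by pgvector inside PostgreSQL rather than in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=3):
    """Return the k (score, filename) pairs with the highest similarity."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in rows]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    # Toy "embeddings": hypothetical values, not produced by any model.
    rows = [
        ("auth.py", [1.0, 0.0, 0.0]),
        ("db.py", [0.0, 1.0, 0.0]),
        ("login.py", [0.9, 0.1, 0.0]),
    ]
    for score, name in top_k([1.0, 0.0, 0.0], rows, k=2):
        print(f"{score:.3f}  {name}")
```

A score close to 1 means the chunk points in nearly the same direction as the query in embedding space; scores near 0 indicate unrelated content.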


IDE Integration

CocoIndex integrates with AI-powered IDEs via Model Context Protocol (MCP) servers [web:10][web:7].

Supported IDEs

  • VS Code (with GitHub Copilot or compatible extensions)
  • Kiro IDE (the setup used throughout this guide)
  • Trae IDE (ByteDance AI-powered editor)
  • Qoder IDE (Agentic IDE with Quest Mode)
  • Cursor, Windsurf (VS Code forks with MCP support)

Setup: Kiro IDE

Add the following mcp.json configuration (this is the author's working setup):

Location: ~/.kiro/settings/mcp.json

{
  "mcpServers": {
    "cocoindex": {
      "command": "uvx",
      "args": [
        "--from",
        "cocoindex-mcp",
        "cocoindex-mcp"
      ],
      "env": {
        "COCOINDEX_DATABASE_URL": "postgresql://localhost:5432/cocoindex"
      }
    }
  }
}

Once Kiro picks up the configuration, the server status should read: Connected (2 tools) ✓

Using CocoIndex in Kiro

In Kiro's agent chat:

Use cocoindex to find where we initialize the database connection pool.

Search the codebase for JWT token validation logic.

The AI agent will call the MCP tools and return relevant code chunks from your indexed codebase [web:10][web:85].


Setup: VS Code

Location: .vscode/mcp.json (in your project root)

{
  "servers": {
    "cocoindex": {
      "command": "uvx",
      "args": [
        "--from",
        "cocoindex-mcp",
        "cocoindex-mcp"
      ],
      "env": {
        "COCOINDEX_DATABASE_URL": "postgresql://localhost:5432/cocoindex"
      }
    }
  }
}

Restart VS Code. In Copilot chat:

@cocoindex find error handling patterns in this project

[web:7][web:38]


Setup: Trae IDE

Similar to VS Code, add the MCP server configuration to Trae's settings [web:15][web:18].


Indexing Additional Codebases

To index a different project:

Option 1: Per-Project Setup (Recommended)

  1. Copy main.py to the new project:
cp ~/Documents/workspace/realtime-codebase-indexing/main.py ~/path/to/new-project/
  2. Create .env in the new project:
COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex
  3. Leave path in main.py pointing at the current directory:
path="."  # Keep this for current directory
  4. Customize file patterns for your stack:
included_patterns=["**/*.js", "**/*.ts", "**/*.jsx", "**/*.tsx", "**/*.json"],
  5. Run indexing:
cd ~/path/to/new-project
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main

Option 2: Index Multiple Repos into One Database

You can index multiple projects into the same cocoindex database. They'll all be searchable together:

# Multiple sources in main.py
data_scope["backend"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(path="/absolute/path/to/backend", ...)
)
data_scope["frontend"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(path="/absolute/path/to/frontend", ...)
)

[web:93][web:94]


Updates: Manual vs Automatic

🔄 Incremental Updates (Manual Trigger)

By default, CocoIndex requires you to manually trigger updates:

cd /path/to/your/project
cocoindex update main

What happens:

  • CocoIndex compares file timestamps and content hashes [web:26][web:34]
  • Only changed files are reprocessed (not the entire codebase)
  • New embeddings are generated only for modified chunks [web:26]
  • The index is updated incrementally in PostgreSQL
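The idea behind incremental processing can be sketched with content hashes: fingerprint every file, compare against the previous fingerprints, and reprocess only what differs. This is a simplified illustration under that assumption, not CocoIndex's actual implementation; fingerprint and diff_fingerprints are hypothetical names.

```python
import hashlib
from pathlib import Path


def fingerprint(root, suffixes=(".py",)):
    """Map each matching file's relative path to the SHA-256 of its contents."""
    result = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def diff_fingerprints(old, new):
    """Classify paths as added, changed, or removed between two fingerprints."""
    old_paths, new_paths = set(old), set(new)
    return {
        "added": sorted(new_paths - old_paths),
        "changed": sorted(p for p in old_paths & new_paths if old[p] != new[p]),
        "removed": sorted(old_paths - new_paths),
    }
```

Only the paths under "added" and "changed" would need new embeddings, and rows for "removed" paths can be deleted, which is why an update after a small edit finishes much faster than the first run.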

When to run:

  • After pulling new code from Git
  • After making significant code changes
  • Before starting a coding session where you want fresh context
  • As part of CI/CD pipelines (optional)

⚡ Live/Automatic Updates (Continuous Mode)

For real-time index updates while you code:

cocoindex update -L main

The -L flag enables live mode [web:26][web:32]:

  • Watches your filesystem for changes
  • Automatically reindexes modified files
  • Runs continuously until you stop it (Ctrl+C)

Use cases:

  • During active development sessions
  • When you want IDE assistants to always have the latest context
  • For long-running dev environments

⚠️ Note: This keeps the process running. For team environments, decide whether each developer runs this locally or you run one shared indexer.


🤖 CI/CD Integration (Optional)

Add to your CI pipeline to keep shared indexes fresh:

# .github/workflows/index-codebase.yml
name: Update Codebase Index
on:
  push:
    branches: [main]

jobs:
  index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Update CocoIndex
        env:
          COCOINDEX_DATABASE_URL: ${{ secrets.COCOINDEX_DATABASE_URL }}
        run: |
          pipx install 'cocoindex[embeddings]'
          pipx runpip cocoindex install -e .
          cocoindex update main

[web:5][web:70]


Common Issues & Troubleshooting

❌ "externally-managed-environment" error

Problem:

error: externally-managed-environment
× This environment is externally managed

Solution: Always use pipx on macOS with Homebrew Python:

pipx install 'cocoindex[embeddings]'
# NOT: pip install cocoindex

[web:1]


❌ "role 'cocoindex' does not exist"

Problem:

RuntimeError: Failed to connect to database postgres://cocoindex:cocoindex@localhost/cocoindex
error returned from database: role "cocoindex" does not exist

Solution: Your connection URL references a user that doesn't exist. Either:

Option A: Simplify the URL (use your default user):

COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex

Option B: Create the user:

psql postgres -c "CREATE USER cocoindex WITH PASSWORD 'cocoindex';"
psql postgres -c "GRANT ALL PRIVILEGES ON DATABASE cocoindex TO cocoindex;"

[web:25][web:46]
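When it is unclear which role a connection URL refers to, it helps to take the URL apart. The sketch below uses only the standard library; describe_database_url is a hypothetical helper, and the fallback to the OS user when the URL names no user is an assumption about typical PostgreSQL client behavior rather than something specific to CocoIndex.

```python
import getpass
from urllib.parse import urlsplit


def describe_database_url(url, default_user=None):
    """Break a PostgreSQL URL into the parts that matter for role errors."""
    parts = urlsplit(url)
    return {
        # Assumption: with no user in the URL, clients fall back to the OS user.
        "user": parts.username or default_user or getpass.getuser(),
        "explicit_user": parts.username is not None,
        "host": parts.hostname or "localhost",
        "port": parts.port or 5432,
        "database": parts.path.lstrip("/") or None,
    }


if __name__ == "__main__":
    for url in (
        "postgres://cocoindex:cocoindex@localhost/cocoindex",
        "postgresql://localhost:5432/cocoindex",
    ):
        print(describe_database_url(url))
```

For the failing URL above, the output shows an explicit user named cocoindex, which is the role PostgreSQL reports as missing; the simplified URL carries no user at all.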


❌ "extension 'vector' is not available"

Problem:

RuntimeError: error returned from database: extension "vector" is not available

Solution: Install pgvector:

brew install pgvector
psql cocoindex -c "CREATE EXTENSION IF NOT EXISTS vector;"

Verify:

psql cocoindex -c "\dx"

[web:62][web:58]


❌ "ModuleNotFoundError: No module named 'dotenv'"

Problem:

ModuleNotFoundError: No module named 'dotenv'

Solution: Use the pipx Python interpreter:

"$HOME/.local/pipx/venvs/cocoindex/bin/python" main.py
# NOT: python3 main.py

Or create an alias:

echo 'alias coco-py="$HOME/.local/pipx/venvs/cocoindex/bin/python"' >> ~/.zshrc
source ~/.zshrc
coco-py main.py

❌ "Database is required for this operation"

Problem:

ValueError: Invalid Request: Database is required for this operation.

Solution: Set the environment variable before running:

export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main

Or add it to .env in your project root [web:25].


❌ Search returns irrelevant results (venv/site-packages)

Problem: Query results show library code instead of your project code.

Solution: Tighten your excluded_patterns:

excluded_patterns=[
    "**/.git/**",
    "**/venv/**",
    "**/.venv/**",
    "**/__pycache__/**",
    "**/site-packages/**",
    "**/node_modules/**",
    "target",
    "build",
    "dist"
]

Then reindex:

cocoindex update main

[web:88][web:91]


❌ Kiro/IDE shows "MCP server disconnected"

Problem: IDE can't reach the MCP server.

Checklist:

  1. Is COCOINDEX_DATABASE_URL set in the MCP config's env section?
  2. Is PostgreSQL running? (brew services list | grep postgresql)
  3. Does the database have data? (psql cocoindex -c "SELECT COUNT(*) FROM \"CodeEmbedding__code_embeddings\";")
  4. Is cocoindex-mcp installed? (pipx list | grep cocoindex)

Debug: Check Kiro's MCP server logs (usually in IDE settings/output panel).
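The first checklist item can be automated with a few lines of standard-library Python. This is a minimal sketch that assumes the Kiro-style layout shown earlier (a top-level mcpServers object); check_mcp_config is a hypothetical helper, and it only inspects the file, it does not contact the server.

```python
import json

REQUIRED_ENV = "COCOINDEX_DATABASE_URL"


def check_mcp_config(text, server="cocoindex"):
    """Return a list of problems found in an mcp.json document (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top-level value is not a JSON object"]
    entry = config.get("mcpServers", {}).get(server)
    if entry is None:
        return [f"no '{server}' entry under mcpServers"]
    problems = []
    if not entry.get("command"):
        problems.append("missing 'command'")
    if not entry.get("env", {}).get(REQUIRED_ENV):
        problems.append(f"{REQUIRED_ENV} is not set in 'env'")
    return problems
```

Passing it the contents of ~/.kiro/settings/mcp.json, for example via pathlib.Path.read_text(), returns an empty list when the entry looks complete.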


Best Practices

1. One .env per Project

Keep environment variables project-specific:

my-project/
├── .env                    # COCOINDEX_DATABASE_URL here
├── main.py                 # CocoIndex flow definition
├── src/
└── ...

This makes it easy to switch between projects without conflicts.


2. Exclude Build Artifacts

Always exclude generated/downloaded code:

excluded_patterns=[
    "**/.git/**",
    "**/venv/**",
    "**/.venv/**",
    "**/node_modules/**",
    "**/build/**",
    "**/dist/**",
    "**/__pycache__/**",
    "**/target/**",          # Rust
    "**/.next/**",           # Next.js
    "**/vendor/**",          # Go/PHP
]

This keeps your index focused on actual source code [web:88][web:91].


3. Use Pattern Matching for Monorepos

For large monorepos, be selective:

# Index only backend services
included_patterns=["apps/backend/**/*.py", "services/**/*.py"]

# Or exclude specific apps
excluded_patterns=["apps/legacy/**", "apps/deprecated/**"]

[web:93]


4. Run Updates Before Coding Sessions

Establish a team habit:

# Morning routine
git pull
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main

Or use live mode during active development:

cocoindex update -L main  # Runs continuously

[web:26]


5. Share PostgreSQL for Team Environments (Optional)

Option A – Individual indexes (simpler):

  • Each dev runs PostgreSQL locally
  • Each maintains their own index
  • No coordination needed

Option B – Shared index (advanced):

  • Set up a shared PostgreSQL server
  • Point all team members' COCOINDEX_DATABASE_URL to it
  • One person (or CI) runs updates, everyone benefits
  • Requires network access and credentials management

For most teams, Option A is recommended initially [web:25].


6. Customize Embeddings for Your Domain

The default model (sentence-transformers/all-MiniLM-L6-v2) works well for general code [page:1]. For specialized domains, consider:

# Use a code-specific model
cocoindex.functions.SentenceTransformerEmbed(
    model="microsoft/codebert-base"
)

# Or use API-based embeddings (Gemini, OpenAI, Voyage)
cocoindex.functions.EmbedText(
    api_type=cocoindex.LlmApiType.GEMINI,
    model="text-embedding-004"
)

[web:70][web:76]


Quick Reference Commands

Setup (One-time per machine)

brew install pipx postgresql@14 pgvector
pipx ensurepath
brew services start postgresql@14
createdb cocoindex
psql cocoindex -c "CREATE EXTENSION IF NOT EXISTS vector;"
pipx install 'cocoindex[embeddings]'

Setup (One-time per project)

cd /path/to/project
cp ~/Documents/workspace/realtime-codebase-indexing/main.py .
echo 'COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex' > .env

Daily Usage

# Update index after code changes
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main

# Query directly
"$HOME/.local/pipx/venvs/cocoindex/bin/python" main.py

# Live mode (continuous updates)
cocoindex update -L main

IDE Usage

# In Kiro/VS Code agent chat:
Use cocoindex to find [what you're looking for]

# Examples:
Use cocoindex to find authentication middleware
Search for error handling patterns
Find database query optimization examples


Support & Questions

For team questions:

  1. Check this document first
  2. Review official docs at https://cocoindex.io
  3. Search GitHub issues: https://github.com/cocoindex-io/cocoindex/issues
  4. Ask in team chat with context (error messages, relevant main.py snippet)

Happy Indexing! 🚀

