Junaid

Posted on Feb 10

CocoIndex Setup Guide

#ai #opensource #tooling #tutorial

Real-Time Codebase Indexing for AI-Powered IDEs

What is CocoIndex?
Why Use CocoIndex?
Prerequisites
Initial Setup
Indexing Your First Codebase
IDE Integration
Indexing Additional Codebases
Updates: Manual vs Automatic
Common Issues & Troubleshooting
Best Practices

What is CocoIndex?

CocoIndex is an open-source framework for building real-time codebase indexes using Tree-sitter for semantic code parsing and vector embeddings for semantic search [page:1]. It creates a searchable database of your codebase that:

Understands code semantically (functions, classes, methods) rather than treating it as plain text
Updates incrementally – only reprocesses changed files, not the entire codebase
Integrates with AI-powered IDEs via Model Context Protocol (MCP) to provide context-aware code assistance

Why Use CocoIndex?

For software engineers and open-source maintainers, CocoIndex enables:

AI coding assistants: Give Claude, Codex, or Gemini semantic understanding of your entire codebase [page:1]
Better IDE integrations: Works with VS Code, Kiro, Trae, Qoder, Cursor, Windsurf via MCP [page:1]
Semantic code search: Find code by what it does, not just keyword matching [page:1]
Automated workflows: Code review agents, refactoring tools, PR summarization [page:1]
Always up-to-date: Incremental processing means your index stays fresh without full rebuilds [page:1]

Prerequisites

Required Software

Python 3 (tested here with Python 3.13.7; check the CocoIndex docs for the supported range, since Python 3.7 is past end-of-life)
pip (tested here with pip 25.2)
pipx – For managing isolated Python applications
PostgreSQL 14+ with pgvector extension
Git – For cloning repositories

System Requirements

macOS (Homebrew-managed Python) or Linux
~500MB disk space for dependencies
PostgreSQL database access (local or remote)

Initial Setup

Step 1: Install pipx

With Homebrew-managed Python on macOS, install Python command-line tools through pipx to avoid "externally-managed-environment" errors:

brew install pipx
pipx ensurepath

Restart your terminal after installation.

Step 2: Install PostgreSQL with pgvector

Install PostgreSQL

brew install postgresql@14
brew services start postgresql@14

Install pgvector extension

brew install pgvector

Create database and enable vector extension

# Create the database
createdb cocoindex

# Enable pgvector extension
psql cocoindex -c "CREATE EXTENSION IF NOT EXISTS vector;"

# Verify installation
psql cocoindex -c "\dx"

You should see vector listed in the extensions.

Step 3: Install CocoIndex

pipx install 'cocoindex[embeddings]'

Verify installation:

cocoindex --version

Step 4: Clone the Realtime Codebase Indexing Repository

This provides a ready-made main.py with proper flow configuration:

cd ~/Documents/workspace  # or your preferred directory
git clone https://github.com/cocoindex-io/realtime-codebase-indexing.git
cd realtime-codebase-indexing

Step 5: Install Project Dependencies

pipx runpip cocoindex install -e .

This installs the project alongside your pipx-managed cocoindex tool.

Step 6: Configure Environment Variables

Create a .env file in the realtime-codebase-indexing directory:

COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex

Important: Every project that uses CocoIndex needs this environment variable set. You can either:

Add it to each project's .env file, or
Export it globally in your shell profile (~/.zshrc or ~/.bashrc):

export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
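As a rough illustration of the two options above, the sketch below resolves the database URL by checking the exported variable first and falling back to the contents of a .env file. The helper names (parse_env_file, resolve_database_url) are hypothetical, and the parser handles only plain KEY=VALUE lines; it is not the loader CocoIndex or python-dotenv uses.

```python
def parse_env_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_database_url(environ: dict, env_file_text: str = "") -> str:
    """Assumption: an exported variable wins over the .env entry."""
    name = "COCOINDEX_DATABASE_URL"
    url = environ.get(name) or parse_env_file(env_file_text).get(name)
    if not url:
        raise RuntimeError(f"{name} is not set in the shell or in .env")
    return url
```

Calling resolve_database_url(dict(os.environ), open(".env").read()) before running an update is one way to catch a missing variable early.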

Indexing Your First Codebase

Step 1: Adjust the Source Configuration

Edit main.py in your project directory. Find the LocalFile source configuration:

data_scope["files"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(
        path=".",  # Current directory (recommended for single-repo indexing)
        included_patterns=["**/*.py", "**/*.rs", "**/*.toml", "**/*.md", "**/*.mdx"],
        excluded_patterns=[
            "**/.git/**",
            "**/venv/**", 
            "**/.venv/**", 
            "**/site-packages/**",
            "**/node_modules/**",
            "target"
        ],
    )
)

Key Parameters:

path="." – Index the current directory
included_patterns – File types to index (adjust for your stack)
excluded_patterns – Directories to skip (add any custom build/cache dirs)
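To preview which files a pattern set would pick up before running a full index, a small stand-alone check can help. This is a sketch under simplified glob rules (`**/` spans any number of directories, `*` stays within one path segment); it is not CocoIndex's own matcher, and the names glob_to_regex and is_selected are made up for illustration.

```python
import re
from pathlib import PurePosixPath


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a simplified glob into a regex (assumption: not CocoIndex's rules)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_selected(path: str, included: list, excluded: list) -> bool:
    """True if path matches an include pattern and neither it nor a parent dir is excluded."""
    p = PurePosixPath(path)
    candidates = [str(p)] + [str(a) for a in p.parents if str(a) != "."]
    for pattern in excluded:
        regex = glob_to_regex(pattern)
        if any(regex.fullmatch(c) for c in candidates):
            return False
    return any(glob_to_regex(pattern).fullmatch(str(p)) for pattern in included)
```

Checking parent directories as well as the file itself is what lets a bare pattern such as "target" prune everything beneath that directory in this sketch.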

Step 2: Run the Indexing Flow

From your project directory:

# Make sure environment variable is set
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"

# Run the indexing
cocoindex update main

On first run, you'll see:

[ TO CREATE ] CocoIndex Metadata Table
[ TO CREATE ] Flow: CodeEmbedding
[ TO CREATE ] Postgres table CodeEmbedding__code_embeddings

Changes need to be pushed. Continue? [yes/N]:

Type yes and press Enter. CocoIndex will:

Create necessary tables in PostgreSQL [web:42]
Enable pgvector extension [web:64]
Parse your code with Tree-sitter [page:1]
Generate embeddings using sentence-transformers/all-MiniLM-L6-v2 [page:1]
Store everything in the code_embeddings table with HNSW vector index [web:64]

You'll see output like:

Updated index: files: ▕████████████▏1234/1234 source rows: 1234 added

Step 3: Test the Index

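Query the index directly, using the Python interpreter from the pipx venv (adjust the path if pipx keeps its venvs elsewhere on your machine):

~/.local/pipx/venvs/cocoindex/bin/python main.py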

At the prompt, enter natural language queries:

Enter search query (or Enter to quit): how does authentication work?

It will return relevant code snippets with similarity scores and file locations [web:34][web:70].
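The similarity score attached to each result is easiest to understand with a toy example. The sketch below ranks a few hand-written three-dimensional vectors by cosine similarity; the file names and vectors are invented, real embeddings from all-MiniLM-L6-v2 have 384 dimensions, and in the actual setup the ranking happens inside PostgreSQL via pgvector rather than in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=3):
    """Return the k (score, location) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), location) for location, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical index rows: (file location, embedding)
rows = [
    ("auth/login.py", [0.9, 0.1, 0.0]),
    ("db/pool.py", [0.0, 0.2, 0.9]),
    ("auth/tokens.py", [0.7, 0.3, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], rows, k=2)
```

A query vector pointing in the same direction as a chunk's vector scores close to 1, which is why semantically related code can surface even when no keywords overlap.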

IDE Integration

CocoIndex integrates with AI-powered IDEs via Model Context Protocol (MCP) servers [web:10][web:7].

Supported IDEs

VS Code (with GitHub Copilot or compatible extensions)
Kiro IDE
Trae IDE (ByteDance AI-powered editor)
Qoder IDE (Agentic IDE with Quest Mode)
Cursor, Windsurf (VS Code forks with MCP support)

Setup: Kiro IDE

Add the server to Kiro's mcp.json:

Location: ~/.kiro/settings/mcp.json

{
  "mcpServers": {
    "cocoindex": {
      "command": "uvx",
      "args": [
        "--from",
        "cocoindex-mcp",
        "cocoindex-mcp"
      ],
      "env": {
        "COCOINDEX_DATABASE_URL": "postgresql://localhost:5432/cocoindex"
      }
    }
  }
}

Once Kiro loads the config, the server should show as connected (this setup reported 2 tools).

Using CocoIndex in Kiro

In Kiro's agent chat:

Use cocoindex to find where we initialize the database connection pool.

Search the codebase for JWT token validation logic.

The AI agent will call the MCP tools and return relevant code chunks from your indexed codebase [web:10][web:85].

Setup: VS Code

Location: .vscode/mcp.json (in your project root). VS Code's workspace file uses a top-level "servers" key rather than "mcpServers":

{
  "servers": {
    "cocoindex": {
      "command": "uvx",
      "args": [
        "--from",
        "cocoindex-mcp",
        "cocoindex-mcp"
      ],
      "env": {
        "COCOINDEX_DATABASE_URL": "postgresql://localhost:5432/cocoindex"
      }
    }
  }
}

Restart VS Code. In Copilot chat:

@cocoindex find error handling patterns in this project

[web:7][web:38]

Setup: Trae IDE

Similar to VS Code, add the MCP server configuration to Trae's settings [web:15][web:18].

Indexing Additional Codebases

To index a different project:

Option 1: Per-Project Setup (Recommended)

Copy main.py to the new project:

cp ~/Documents/workspace/realtime-codebase-indexing/main.py ~/path/to/new-project/

Create .env in the new project:

COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex

Adjust path in main.py:

path="."  # Keep this for current directory

Customize file patterns for your stack:

included_patterns=["**/*.js", "**/*.ts", "**/*.jsx", "**/*.tsx", "**/*.json"],

Run indexing:

cd ~/path/to/new-project
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main

Option 2: Index Multiple Repos into One Database

You can index multiple projects into the same cocoindex database. They'll all be searchable together:

# Multiple sources in main.py
data_scope["backend"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(path="/absolute/path/to/backend", ...)
)
data_scope["frontend"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(path="/absolute/path/to/frontend", ...)
)

[web:93][web:94]

Updates: Manual vs Automatic

🔄 Incremental Updates (Manual Trigger)

By default, CocoIndex requires you to manually trigger updates:

cd /path/to/your/project
cocoindex update main

What happens:

CocoIndex compares file timestamps and content hashes [web:26][web:34]
Only changed files are reprocessed (not the entire codebase)
New embeddings are generated only for modified chunks [web:26]
The index is updated incrementally in PostgreSQL
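The idea behind incremental processing can be sketched in a few lines: keep a fingerprint per file from the previous run, hash the current contents, and reprocess only the paths whose fingerprint differs. This is an illustration of the general technique, not CocoIndex's internal implementation; fingerprint and plan_update are hypothetical names.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Content hash used to decide whether a file changed."""
    return hashlib.sha256(content).hexdigest()


def plan_update(previous: dict, current_files: dict):
    """Compare stored fingerprints with current contents.

    previous:      {path: fingerprint} saved after the last run
    current_files: {path: bytes} as found on disk now
    Returns (paths to re-embed, paths to drop from the index, new fingerprints).
    """
    current = {path: fingerprint(data) for path, data in current_files.items()}
    to_process = sorted(p for p, h in current.items() if previous.get(p) != h)
    to_delete = sorted(p for p in previous if p not in current)
    return to_process, to_delete, current
```

Unchanged files never reach the embedding step, which is where most of the time goes on a full rebuild.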

When to run:

After pulling new code from Git
After making significant code changes
Before starting a coding session where you want fresh context
As part of CI/CD pipelines (optional)

⚡ Live/Automatic Updates (Continuous Mode)

For real-time index updates while you code:

cocoindex update -L main

The -L flag enables live mode [web:26][web:32]:

Watches your filesystem for changes
Automatically reindexes modified files
Runs continuously until you stop it (Ctrl+C)
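Conceptually, a watcher of this kind boils down to taking snapshots of the tree and diffing them. The sketch below does that with modification times and plain polling; it is an assumption-laden illustration (the function names are invented, and CocoIndex's live mode may well use a different mechanism such as OS file-change notifications).

```python
import os


def snapshot(root: str) -> dict:
    """Map each file under root to its modification time in nanoseconds."""
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            seen[os.path.relpath(full, root)] = os.stat(full).st_mtime_ns
    return seen


def diff_snapshots(old: dict, new: dict):
    """Return (added, modified, removed) paths between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, modified, removed
```

A polling loop would call snapshot every few seconds, pass the previous and current results to diff_snapshots, and hand any added or modified paths to the indexer.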

Use cases:

During active development sessions
When you want IDE assistants to always have the latest context
For long-running dev environments

⚠️ Note: This keeps the process running. For team environments, decide whether each developer runs this locally or you run one shared indexer.

🤖 CI/CD Integration (Optional)

Add to your CI pipeline to keep shared indexes fresh:

# .github/workflows/index-codebase.yml
name: Update Codebase Index
on:
  push:
    branches: [main]

jobs:
  index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Update CocoIndex
        env:
          COCOINDEX_DATABASE_URL: ${{ secrets.COCOINDEX_DATABASE_URL }}
        run: |
          pipx install 'cocoindex[embeddings]'
          pipx runpip cocoindex install -e .
          cocoindex update main

[web:5][web:70]

Common Issues & Troubleshooting

❌ "externally-managed-environment" error

Problem:

error: externally-managed-environment
× This environment is externally managed

Solution: Always use pipx on macOS with Homebrew Python:

pipx install 'cocoindex[embeddings]'
# NOT: pip install cocoindex

[web:1]

❌ "role 'cocoindex' does not exist"

Problem:

RuntimeError: Failed to connect to database postgres://cocoindex:cocoindex@localhost/cocoindex
error returned from database: role "cocoindex" does not exist

Solution: Your connection URL references a user that doesn't exist. Either:

Option A: Simplify the URL (use your default user):

COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex

Option B: Create the user:

psql postgres -c "CREATE USER cocoindex WITH PASSWORD 'cocoindex';"
psql postgres -c "GRANT ALL PRIVILEGES ON DATABASE cocoindex TO cocoindex;"

[web:25][web:46]
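When this error appears, it helps to see exactly which role the URL is asking for. The following sketch splits a connection URL into its parts with the standard library; describe_database_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Break a PostgreSQL connection URL into its components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,        # None means "use the default OS user"
        "host": parts.hostname,
        "port": parts.port,            # None means the server default
        "database": parts.path.lstrip("/"),
    }


failing = describe_database_url("postgres://cocoindex:cocoindex@localhost/cocoindex")
simplified = describe_database_url("postgresql://localhost:5432/cocoindex")
```

Here failing["user"] is "cocoindex", the role PostgreSQL could not find, while simplified["user"] is None, so the connection falls back to your default user as in Option A.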

❌ "extension 'vector' is not available"

Problem:

RuntimeError: error returned from database: extension "vector" is not available

Solution: Install pgvector:

brew install pgvector
psql cocoindex -c "CREATE EXTENSION IF NOT EXISTS vector;"

Verify:

psql cocoindex -c "\dx"

[web:62][web:58]

❌ "ModuleNotFoundError: No module named 'dotenv'"

Problem:

ModuleNotFoundError: No module named 'dotenv'

Solution: Use the pipx Python interpreter:

"$HOME/.local/pipx/venvs/cocoindex/bin/python" main.py
# NOT: python3 main.py
# Adjust the path if `pipx environment` reports a different venv location

Or create an alias:

echo 'alias coco-py="$HOME/.local/pipx/venvs/cocoindex/bin/python"' >> ~/.zshrc
source ~/.zshrc
coco-py main.py

❌ "Database is required for this operation"

Problem:

ValueError: Invalid Request: Database is required for this operation.

Solution: Set the environment variable before running:

export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main

Or add it to .env in your project root [web:25].

❌ Search returns irrelevant results (venv/site-packages)

Problem: Query results show library code instead of your project code.

Solution: Tighten your excluded_patterns:

excluded_patterns=[
    "**/.git/**",
    "**/venv/**",
    "**/.venv/**",
    "**/__pycache__/**",
    "**/site-packages/**",
    "**/node_modules/**",
    "target",
    "build",
    "dist"
]

Then reindex:

cocoindex update main

[web:88][web:91]

❌ Kiro/IDE shows "MCP server disconnected"

Problem: IDE can't reach the MCP server.

Checklist:

Is COCOINDEX_DATABASE_URL set in the MCP config's env section?
Is PostgreSQL running? (brew services list | grep postgresql)
Does the database have data? (psql cocoindex -c "SELECT COUNT(*) FROM \"CodeEmbedding__code_embeddings\";")
Can the MCP server be launched? The configs above start cocoindex-mcp through uvx, so confirm uvx is on your PATH (uvx --version)

Debug: Check Kiro's MCP server logs (usually in IDE settings/output panel).
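The first checklist item can be automated with a small script that reads the MCP config and reports what is missing. This is a minimal sketch: check_mcp_config is a hypothetical helper, and the top-level key is a parameter because editors do not all use the same name for it.

```python
import json


def check_mcp_config(text: str, server: str = "cocoindex", root_key: str = "mcpServers") -> list:
    """Return a list of problems found in an MCP config; empty means it looks fine."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top level of the config is not a JSON object"]
    entry = config.get(root_key, {}).get(server)
    if not isinstance(entry, dict):
        return [f"no '{server}' entry under '{root_key}'"]
    problems = []
    if not entry.get("command"):
        problems.append("missing 'command'")
    if not entry.get("env", {}).get("COCOINDEX_DATABASE_URL"):
        problems.append("COCOINDEX_DATABASE_URL missing from 'env'")
    return problems
```

For Kiro, the input would be the contents of ~/.kiro/settings/mcp.json; an empty list means the entry at least has a command and a database URL, though it says nothing about whether PostgreSQL is reachable.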

Best Practices

1. One `.env` per Project

Keep environment variables project-specific:

my-project/
├── .env                    # COCOINDEX_DATABASE_URL here
├── main.py                 # CocoIndex flow definition
├── src/
└── ...

This makes it easy to switch between projects without conflicts.

2. Exclude Build Artifacts

Always exclude generated/downloaded code:

excluded_patterns=[
    "**/.git/**",
    "**/venv/**",
    "**/.venv/**",
    "**/node_modules/**",
    "**/build/**",
    "**/dist/**",
    "**/__pycache__/**",
    "**/target/**",          # Rust
    "**/.next/**",           # Next.js
    "**/vendor/**",          # Go/PHP
]

This keeps your index focused on actual source code [web:88][web:91].

3. Use Pattern Matching for Monorepos

For large monorepos, be selective:

# Index only backend services
included_patterns=["apps/backend/**/*.py", "services/**/*.py"]

# Or exclude specific apps
excluded_patterns=["apps/legacy/**", "apps/deprecated/**"]

[web:93]

4. Run Updates Before Coding Sessions

Establish a team habit:

# Morning routine
git pull
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main

Or use live mode during active development:

cocoindex update -L main  # Runs continuously

[web:26]

5. Share PostgreSQL for Team Environments (Optional)

Option A – Individual indexes (simpler):

Each dev runs PostgreSQL locally
Each maintains their own index
No coordination needed

Option B – Shared index (advanced):

Set up a shared PostgreSQL server
Point all team members' COCOINDEX_DATABASE_URL to it
One person (or CI) runs updates, everyone benefits
Requires network access and credentials management

For most teams, Option A is recommended initially [web:25].

6. Customize Embeddings for Your Domain

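The default model (sentence-transformers/all-MiniLM-L6-v2) works well for general code [page:1]. For specialized domains, consider a different model. Models emit vectors of different sizes, so rebuild the index after switching and embed queries with the same model: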

# Use a code-specific model
cocoindex.functions.SentenceTransformerEmbed(
    model="microsoft/codebert-base"
)

# Or use API-based embeddings (Gemini, OpenAI, Voyage)
cocoindex.functions.EmbedText(
    api_type=cocoindex.LlmApiType.GEMINI,
    model="text-embedding-004"
)

[web:70][web:76]

Quick Reference Commands

Setup (One-time per machine)

brew install pipx postgresql@14 pgvector
pipx ensurepath
brew services start postgresql@14
createdb cocoindex
psql cocoindex -c "CREATE EXTENSION vector;"
pipx install 'cocoindex[embeddings]'

Setup (One-time per project)

cd /path/to/project
cp ~/Documents/workspace/realtime-codebase-indexing/main.py .
echo 'COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex' > .env

Daily Usage

# Update index after code changes
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main

# Query directly
~/.local/pipx/venvs/cocoindex/bin/python main.py

# Live mode (continuous updates)
cocoindex update -L main

IDE Usage

# In Kiro/VS Code agent chat:
Use cocoindex to find [what you're looking for]

# Examples:
Use cocoindex to find authentication middleware
Search for error handling patterns
Find database query optimization examples

Additional Resources

Official Docs: https://cocoindex.io/docs
GitHub Repo: https://github.com/cocoindex-io/cocoindex
Example Repo: https://github.com/cocoindex-io/realtime-codebase-indexing
MCP Documentation: https://code.visualstudio.com/docs/copilot/customization/mcp-servers
pgvector: https://github.com/pgvector/pgvector

Support & Questions

For team questions:

Check this document first
Review official docs at https://cocoindex.io
Search GitHub issues: https://github.com/cocoindex-io/cocoindex/issues
Ask in team chat with context (error messages, relevant main.py snippet)

Happy Indexing! 🚀

