Real-Time Codebase Indexing for AI-Powered IDEs
Table of Contents
- What is CocoIndex?
- Why Use CocoIndex?
- Prerequisites
- Initial Setup
- Indexing Your First Codebase
- IDE Integration
- Indexing Additional Codebases
- Updates: Manual vs Automatic
- Common Issues & Troubleshooting
- Best Practices
What is CocoIndex?
CocoIndex is an open-source framework for building real-time codebase indexes using Tree-sitter for semantic code parsing and vector embeddings for semantic search [page:1]. It creates a searchable database of your codebase that:
- Understands code semantically (functions, classes, methods) rather than treating it as plain text
- Updates incrementally – only reprocesses changed files, not the entire codebase
- Integrates with AI-powered IDEs via Model Context Protocol (MCP) to provide context-aware code assistance
Why Use CocoIndex?
For software engineers and open-source maintainers, CocoIndex enables:
- AI coding assistants: Give Claude, Codex, or Gemini semantic understanding of your entire codebase [page:1]
- Better IDE integrations: Works with VS Code, Kiro, Trae, Qoder, Cursor, Windsurf via MCP [page:1]
- Semantic code search: Find code by what it does, not just keyword matching [page:1]
- Automated workflows: Code review agents, refactoring tools, PR summarization [page:1]
- Always up-to-date: Incremental processing means your index stays fresh without full rebuilds [page:1]
Prerequisites
Required Software
- Python 3.11+ (recent CocoIndex releases declare 3.11 as the minimum; check the version listed on PyPI)
- pip
- pipx – For managing isolated Python applications
- PostgreSQL 14+ with pgvector extension
- Git – For cloning repositories
System Requirements
- macOS (Homebrew-managed Python) or Linux
- ~500MB disk space for dependencies
- PostgreSQL database access (local or remote)
Initial Setup
Step 1: Install pipx
On macOS with Homebrew-managed Python, install CocoIndex through pipx; a plain pip install into the Homebrew interpreter fails with an "externally-managed-environment" error:
brew install pipx
pipx ensurepath
Restart your terminal after installation.
Step 2: Install PostgreSQL with pgvector
Install PostgreSQL
brew install postgresql@14
brew services start postgresql@14
Install pgvector extension
brew install pgvector
Create database and enable vector extension
# Create the database
createdb cocoindex
# Enable pgvector extension
psql cocoindex -c "CREATE EXTENSION IF NOT EXISTS vector;"
# Verify installation
psql cocoindex -c "\dx"
You should see vector listed in the extensions.
Step 3: Install CocoIndex
pipx install 'cocoindex[embeddings]'
Verify installation:
cocoindex --version
Step 4: Clone the Realtime Codebase Indexing Repository
This provides a ready-made main.py with proper flow configuration:
cd ~/Documents/workspace # or your preferred directory
git clone https://github.com/cocoindex-io/realtime-codebase-indexing.git
cd realtime-codebase-indexing
Step 5: Install Project Dependencies
pipx runpip cocoindex install -e .
This installs the project alongside your pipx-managed cocoindex tool.
Step 6: Configure Environment Variables
Create a .env file in the realtime-codebase-indexing directory:
COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex
Important: Every project that uses CocoIndex needs this environment variable set. You can either:
- Add it to each project's .env file, or
- Export it globally in your shell profile (~/.zshrc or ~/.bashrc):
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
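Before running any CocoIndex command, it can help to confirm that the variable is actually visible and well-formed. The following is a minimal sketch using only the Python standard library; the helper name check_database_url is hypothetical and is not part of CocoIndex.

```python
import os
from urllib.parse import urlsplit


def check_database_url(env=os.environ):
    """Return host, port and database name from COCOINDEX_DATABASE_URL.

    Hypothetical helper for illustration only; not a CocoIndex API.
    Raises RuntimeError if the variable is missing or not a Postgres URL.
    """
    url = env.get("COCOINDEX_DATABASE_URL")
    if not url:
        raise RuntimeError("COCOINDEX_DATABASE_URL is not set")
    parts = urlsplit(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise RuntimeError(f"unexpected URL scheme: {parts.scheme!r}")
    return {
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
```

Calling check_database_url() with no arguments reads the current process environment, so it reports the same value that a command started from that shell would see.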
Indexing Your First Codebase
Step 1: Adjust the Source Configuration
Edit main.py in your project directory. Find the LocalFile source configuration:
data_scope["files"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(
        path=".",  # Current directory (recommended for single-repo indexing)
        included_patterns=["**/*.py", "**/*.rs", "**/*.toml", "**/*.md", "**/*.mdx"],
        excluded_patterns=[
            "**/.git/**",
            "**/venv/**",
            "**/.venv/**",
            "**/site-packages/**",
            "**/node_modules/**",
            "target",
        ],
    )
)
Key Parameters:
- path="." – Index the current directory
- included_patterns – File types to index (adjust for your stack)
- excluded_patterns – Directories to skip (add any custom build/cache dirs)
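To get a feel for how include and exclude patterns interact, here is a rough sketch of the filtering idea in plain Python. It is an approximation built on the standard library's fnmatch, not CocoIndex's own matcher, and the function names are hypothetical; edge cases may differ from what CocoIndex actually does.

```python
from fnmatch import fnmatchcase


def matches_any(path, patterns):
    """True if a relative, '/'-separated path matches any glob pattern.

    Approximation: fnmatch lets '*' cross '/' boundaries, and a leading
    '**/' is also allowed to match at the top level of the tree.
    """
    for pattern in patterns:
        if fnmatchcase(path, pattern):
            return True
        if pattern.startswith("**/") and fnmatchcase(path, pattern[3:]):
            return True
    return False


def should_index(path, included, excluded):
    """A file is indexed when it matches an include and no exclude."""
    return matches_any(path, included) and not matches_any(path, excluded)
```

For instance, with included=["**/*.py"] and excluded=["**/venv/**"], the path src/app.py is kept while venv/lib/site.py is skipped. Excludes take priority, which is why adding a cache directory to excluded_patterns is enough to drop it even when its files match an include.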
Step 2: Run the Indexing Flow
From your project directory:
# Make sure environment variable is set
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
# Run the indexing
cocoindex update main
On first run, you'll see:
[ TO CREATE ] CocoIndex Metadata Table
[ TO CREATE ] Flow: CodeEmbedding
[ TO CREATE ] Postgres table CodeEmbedding__code_embeddings
Changes need to be pushed. Continue? [yes/N]:
Type yes and press Enter. CocoIndex will:
- Create necessary tables in PostgreSQL [web:42]
- Enable pgvector extension [web:64]
- Parse your code with Tree-sitter [page:1]
- Generate embeddings using sentence-transformers/all-MiniLM-L6-v2 [page:1]
- Store everything in the code_embeddings table with HNSW vector index [web:64]
You'll see output like:
Updated index: files: ▕████████████▏1234/1234 source rows: 1234 added
Step 3: Test the Index
Query the index directly:
~/.local/pipx/venvs/cocoindex/bin/python main.py
At the prompt, enter natural language queries:
Enter search query (or Enter to quit): how does authentication work?
It will return relevant code snippets with similarity scores and file locations [web:34][web:70].
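The similarity scores come from comparing the query's embedding with each stored chunk embedding. As an illustration of that ranking step, the sketch below assumes cosine similarity (a common choice for sentence-transformer embeddings) and uses toy 2-dimensional vectors in place of real ones; the function names are hypothetical and the real query runs inside PostgreSQL via pgvector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=3):
    """Rank (name, vector) pairs by similarity to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunks]
    return sorted(scored, reverse=True)[:k]
```

A score near 1 means the chunk points in almost the same direction as the query in embedding space; a score near 0 means the two are unrelated.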
IDE Integration
CocoIndex integrates with AI-powered IDEs via Model Context Protocol (MCP) servers [web:10][web:7].
Supported IDEs
- VS Code (with GitHub Copilot or compatible extensions)
- Kiro IDE
- Trae IDE (ByteDance AI-powered editor)
- Qoder IDE (Agentic IDE with Quest Mode)
- Cursor, Windsurf (VS Code forks with MCP support)
Setup: Kiro IDE
Add the following to Kiro's mcp.json:
Location: ~/.kiro/settings/mcp.json
{
  "mcpServers": {
    "cocoindex": {
      "command": "uvx",
      "args": [
        "--from",
        "cocoindex-mcp",
        "cocoindex-mcp"
      ],
      "env": {
        "COCOINDEX_DATABASE_URL": "postgresql://localhost:5432/cocoindex"
      }
    }
  }
}
When the server starts correctly, Kiro lists it as connected; in this setup it reported 2 tools.
Using CocoIndex in Kiro
In Kiro's agent chat:
Use cocoindex to find where we initialize the database connection pool.
Search the codebase for JWT token validation logic.
The AI agent will call the MCP tools and return relevant code chunks from your indexed codebase [web:10][web:85].
Setup: VS Code
Location: .vscode/mcp.json (in your project root)
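{
  "servers": {
    "cocoindex": {
      "command": "uvx",
      "args": [
        "--from",
        "cocoindex-mcp",
        "cocoindex-mcp"
      ],
      "env": {
        "COCOINDEX_DATABASE_URL": "postgresql://localhost:5432/cocoindex"
      }
    }
  }
}
Note that VS Code's workspace mcp.json uses a top-level "servers" key, unlike the "mcpServers" key used by Kiro.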
Restart VS Code. In Copilot chat:
@cocoindex find error handling patterns in this project
[web:7][web:38]
Setup: Trae IDE
Similar to VS Code, add the MCP server configuration to Trae's settings [web:15][web:18].
Indexing Additional Codebases
To index a different project:
Option 1: Per-Project Setup (Recommended)
- Copy main.py to the new project:
cp ~/Documents/workspace/realtime-codebase-indexing/main.py ~/path/to/new-project/
- Create .env in the new project:
COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex
- Adjust path in main.py:
path="." # Keep this for current directory
- Customize file patterns for your stack:
included_patterns=["**/*.js", "**/*.ts", "**/*.jsx", "**/*.tsx", "**/*.json"],
- Run indexing:
cd ~/path/to/new-project
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main
Option 2: Index Multiple Repos into One Database
You can index multiple projects into the same cocoindex database. They'll all be searchable together:
# Multiple sources in main.py
data_scope["backend"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(path="/absolute/path/to/backend", ...)
)
data_scope["frontend"] = flow_builder.add_source(
    cocoindex.sources.LocalFile(path="/absolute/path/to/frontend", ...)
)
[web:93][web:94]
Updates: Manual vs Automatic
🔄 Incremental Updates (Manual Trigger)
By default, CocoIndex requires you to manually trigger updates:
cd /path/to/your/project
cocoindex update main
What happens:
- CocoIndex compares file timestamps and content hashes [web:26][web:34]
- Only changed files are reprocessed (not the entire codebase)
- New embeddings are generated only for modified chunks [web:26]
- The index is updated incrementally in PostgreSQL
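The change-detection step above can be pictured with a small sketch. It assumes a simple scheme in which each file's content hash is remembered from the previous run and compared on the next one; CocoIndex's internal bookkeeping is more involved, and the names below are hypothetical.

```python
import hashlib


def fingerprint(data):
    """Content hash of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def diff_index(files, previous):
    """Compare current file contents against hashes from the last run.

    files:    mapping of path -> file bytes as they are now
    previous: mapping of path -> hash recorded on the last run
    Returns (changed, removed, current), where changed holds new or
    modified paths and current is the hash map to store for next time.
    """
    current = {path: fingerprint(data) for path, data in files.items()}
    changed = sorted(p for p, digest in current.items() if previous.get(p) != digest)
    removed = sorted(p for p in previous if p not in current)
    return changed, removed, current
```

Only the paths in changed would need re-parsing and re-embedding, and the paths in removed would have their rows deleted; everything else is left untouched, which is what keeps an update cheap on a large repository.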
When to run:
- After pulling new code from Git
- After making significant code changes
- Before starting a coding session where you want fresh context
- As part of CI/CD pipelines (optional)
⚡ Live/Automatic Updates (Continuous Mode)
For real-time index updates while you code:
cocoindex update -L main
The -L flag enables live mode [web:26][web:32]:
- Watches your filesystem for changes
- Automatically reindexes modified files
- Runs continuously until you stop it (Ctrl+C)
Use cases:
- During active development sessions
- When you want IDE assistants to always have the latest context
- For long-running dev environments
⚠️ Note: This keeps the process running. For team environments, decide whether each developer runs this locally or you run one shared indexer.
🤖 CI/CD Integration (Optional)
Add to your CI pipeline to keep shared indexes fresh:
# .github/workflows/index-codebase.yml
name: Update Codebase Index
on:
  push:
    branches: [main]
jobs:
  index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Update CocoIndex
        env:
          COCOINDEX_DATABASE_URL: ${{ secrets.COCOINDEX_DATABASE_URL }}
        run: |
          pipx install 'cocoindex[embeddings]'
          pipx runpip cocoindex install -e .
          cocoindex update main
[web:5][web:70]
Common Issues & Troubleshooting
❌ "externally-managed-environment" error
Problem:
error: externally-managed-environment
× This environment is externally managed
Solution: Always use pipx on macOS with Homebrew Python:
pipx install 'cocoindex[embeddings]'
# NOT: pip install cocoindex
[web:1]
❌ "role 'cocoindex' does not exist"
Problem:
RuntimeError: Failed to connect to database postgres://cocoindex:cocoindex@localhost/cocoindex
error returned from database: role "cocoindex" does not exist
Solution: Your connection URL references a user that doesn't exist. Either:
Option A: Simplify the URL (use your default user):
COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex
Option B: Create the user:
psql postgres -c "CREATE USER cocoindex WITH PASSWORD 'cocoindex';"
psql postgres -c "GRANT ALL PRIVILEGES ON DATABASE cocoindex TO cocoindex;"
[web:25][web:46]
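The reason Option A works is that PostgreSQL clients generally fall back to the operating-system user name when the URL names no user. A minimal sketch of that rule, with a hypothetical helper name and the OS user passed in explicitly:

```python
from urllib.parse import urlsplit


def effective_role(database_url, os_user):
    """Role the client will try: the URL's user if given, else the OS user.

    Illustrative only; it mirrors the usual client default and ignores
    overrides such as the PGUSER environment variable.
    """
    return urlsplit(database_url).username or os_user
```

With the failing URL from the error message, effective_role returns "cocoindex", a role that a fresh Homebrew install does not create. With the simplified URL it returns your login name, which Homebrew's PostgreSQL sets up as a superuser by default. In practice you could pass getpass.getuser() as os_user.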
❌ "extension 'vector' is not available"
Problem:
RuntimeError: error returned from database: extension "vector" is not available
Solution: Install pgvector:
brew install pgvector
psql cocoindex -c "CREATE EXTENSION IF NOT EXISTS vector;"
Verify:
psql cocoindex -c "\dx"
[web:62][web:58]
❌ "ModuleNotFoundError: No module named 'dotenv'"
Problem:
ModuleNotFoundError: No module named 'dotenv'
Solution: Use the pipx Python interpreter:
~/.local/pipx/venvs/cocoindex/bin/python main.py
# NOT: python3 main.py
Or create an alias:
echo 'alias coco-py="$HOME/.local/pipx/venvs/cocoindex/bin/python"' >> ~/.zshrc
source ~/.zshrc
coco-py main.py
❌ "Database is required for this operation"
Problem:
ValueError: Invalid Request: Database is required for this operation.
Solution: Set the environment variable before running:
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main
Or add it to .env in your project root [web:25].
❌ Search returns irrelevant results (venv/site-packages)
Problem: Query results show library code instead of your project code.
Solution: Tighten your excluded_patterns:
excluded_patterns=[
    "**/.git/**",
    "**/venv/**",
    "**/.venv/**",
    "**/__pycache__/**",
    "**/site-packages/**",
    "**/node_modules/**",
    "target",
    "build",
    "dist",
]
Then reindex:
cocoindex update main
[web:88][web:91]
❌ Kiro/IDE shows "MCP server disconnected"
Problem: IDE can't reach the MCP server.
Checklist:
- Is COCOINDEX_DATABASE_URL set in the MCP config's env section?
- Is PostgreSQL running? (brew services list | grep postgresql)
- Does the database have data? (psql cocoindex -c "SELECT COUNT(*) FROM \"CodeEmbedding__code_embeddings\";")
- Is cocoindex-mcp installed? (pipx list | grep cocoindex)
Debug: Check Kiro's MCP server logs (usually in IDE settings/output panel).
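The first checklist item is easy to automate. The sketch below parses an MCP config and reports obvious gaps; it assumes the "mcpServers" layout shown in the Kiro example (pass a different root_key for editors that use another top-level key), and find_config_problems is a hypothetical helper, not part of any IDE or of CocoIndex.

```python
import json


def find_config_problems(config_text, server="cocoindex", root_key="mcpServers"):
    """Return a list of human-readable problems found in an MCP config."""
    config = json.loads(config_text)
    entry = config.get(root_key, {}).get(server)
    if entry is None:
        return [f"no '{server}' entry under '{root_key}'"]
    problems = []
    if not entry.get("command"):
        problems.append("missing 'command'")
    if "COCOINDEX_DATABASE_URL" not in entry.get("env", {}):
        problems.append("COCOINDEX_DATABASE_URL missing from 'env'")
    return problems
```

An empty list means the config at least has the expected shape; it does not prove the database is reachable, so the remaining checklist items still apply.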
Best Practices
1. One .env per Project
Keep environment variables project-specific:
my-project/
├── .env # COCOINDEX_DATABASE_URL here
├── main.py # CocoIndex flow definition
├── src/
└── ...
This makes it easy to switch between projects without conflicts.
2. Exclude Build Artifacts
Always exclude generated/downloaded code:
excluded_patterns=[
    "**/.git/**",
    "**/venv/**",
    "**/.venv/**",
    "**/node_modules/**",
    "**/build/**",
    "**/dist/**",
    "**/__pycache__/**",
    "**/target/**",  # Rust
    "**/.next/**",  # Next.js
    "**/vendor/**",  # Go/PHP
]
This keeps your index focused on actual source code [web:88][web:91].
3. Use Pattern Matching for Monorepos
For large monorepos, be selective:
# Index only backend services
included_patterns=["apps/backend/**/*.py", "services/**/*.py"]
# Or exclude specific apps
excluded_patterns=["apps/legacy/**", "apps/deprecated/**"]
[web:93]
4. Run Updates Before Coding Sessions
Establish a team habit:
# Morning routine
git pull
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main
Or use live mode during active development:
cocoindex update -L main # Runs continuously
[web:26]
5. Share PostgreSQL for Team Environments (Optional)
Option A – Individual indexes (simpler):
- Each dev runs PostgreSQL locally
- Each maintains their own index
- No coordination needed
Option B – Shared index (advanced):
- Set up a shared PostgreSQL server
- Point all team members' COCOINDEX_DATABASE_URL to it
- One person (or CI) runs updates, everyone benefits
- Requires network access and credentials management
For most teams, Option A is recommended initially [web:25].
6. Customize Embeddings for Your Domain
The default model (sentence-transformers/all-MiniLM-L6-v2) works well for general code [page:1]. For specialized domains, consider:
# Use a code-specific model
cocoindex.functions.SentenceTransformerEmbed(
    model="microsoft/codebert-base"
)
# Or use API-based embeddings (Gemini, OpenAI, Voyage)
cocoindex.functions.EmbedText(
    api_type=cocoindex.LlmApiType.GEMINI,
    model="text-embedding-004"
)
[web:70][web:76]
Quick Reference Commands
Setup (One-time per machine)
brew install pipx postgresql@14 pgvector
pipx ensurepath
brew services start postgresql@14
createdb cocoindex
psql cocoindex -c "CREATE EXTENSION vector;"
pipx install 'cocoindex[embeddings]'
Setup (One-time per project)
cd /path/to/project
cp ~/Documents/workspace/realtime-codebase-indexing/main.py .
echo 'COCOINDEX_DATABASE_URL=postgresql://localhost:5432/cocoindex' > .env
Daily Usage
# Update index after code changes
export COCOINDEX_DATABASE_URL="postgresql://localhost:5432/cocoindex"
cocoindex update main
# Query directly
~/.local/pipx/venvs/cocoindex/bin/python main.py
# Live mode (continuous updates)
cocoindex update -L main
IDE Usage
# In Kiro/VS Code agent chat:
Use cocoindex to find [what you're looking for]
# Examples:
Use cocoindex to find authentication middleware
Search for error handling patterns
Find database query optimization examples
Additional Resources
- Official Docs: https://cocoindex.io/docs
- GitHub Repo: https://github.com/cocoindex-io/cocoindex
- Example Repo: https://github.com/cocoindex-io/realtime-codebase-indexing
- MCP Documentation: https://code.visualstudio.com/docs/copilot/customization/mcp-servers
- pgvector: https://github.com/pgvector/pgvector
Support & Questions
For team questions:
- Check this document first
- Review official docs at https://cocoindex.io
- Search GitHub issues: https://github.com/cocoindex-io/cocoindex/issues
- Ask in team chat with context (error messages, relevant main.py snippet)
Happy Indexing! 🚀