# CodeSense AI: Building a Scalable RAG System for Repository Intelligence
The Problem: Navigating large, unfamiliar codebases is slow.
The Solution: CodeSense AI, a sophisticated RAG (Retrieval-Augmented Generation) engine that lets you "talk" to your code using AWS Bedrock and Pinecone.
## Overview
CodeSense AI isn't just a chatbot; it's a semantic code navigator. Most code search tools rely on keyword matching (Grepping). CodeSense AI uses Vector Embeddings to understand the intent and logic behind your code.
Core Value Props:
- Instant Architecture Mapping: Ask "How does the auth flow work?" and get a cross-file explanation.
- Contextual Debugging: Share an error and find exactly where that logic resides in a 100-file repo.
- Seamless Ingestion: Point to a GitHub URL, and the pipeline handles the rest, from cloning to vectorization.
## The Modern AI Stack
### The Frontend (The User Experience)
- React 18.3 & TypeScript: A type-safe foundation for handling complex UI states during long indexing processes.
- Tailwind CSS & shadcn/ui: For a high-fidelity, developer-centric aesthetic.
- TanStack Query: Manages the server state for real-time indexing progress updates.
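
Because indexing a large repository takes a while, the UI polls an indexing-status endpoint until the pipeline finishes. A minimal sketch with TanStack Query v5 (the endpoint path and response shape are assumptions, not the project's actual API):

```ts
import { useQuery } from '@tanstack/react-query';

// Hypothetical response shape for the indexing-status endpoint.
type IndexingStatus = {
  status: 'cloning' | 'chunking' | 'embedding' | 'ready';
  chunksIndexed: number;
};

export function useIndexingStatus(repoId: string) {
  return useQuery({
    queryKey: ['indexing-status', repoId],
    queryFn: async (): Promise<IndexingStatus> => {
      const res = await fetch(`/functions/v1/index-status?repo=${repoId}`);
      if (!res.ok) throw new Error('Failed to fetch indexing status');
      return res.json();
    },
    // Poll every 2 seconds until the pipeline reports completion.
    refetchInterval: (query) =>
      query.state.data?.status === 'ready' ? false : 2000,
  });
}
```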
### The Intelligence (The Reasoning)
- AWS Bedrock (Amazon Titan Text Express): Chosen for its high-throughput, low-latency reasoning capabilities.
- Titan Embeddings v2: Generates 1024-dimensional vectors, optimized for technical documentation and source code.
- Pinecone: A serverless vector database that provides sub-100ms similarity search using Cosine Similarity.
## Architecture & System Design
### High-Level Design (HLD)
The architecture follows a decoupled proxy pattern: the frontend never communicates directly with AWS or Pinecone. Every request is routed through a Supabase Edge Function that holds the credentials server-side.
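
In practice, the browser only ever calls the Edge Function. A minimal sketch with supabase-js (the function name `chat-with-repo` and the payload shape are illustrative assumptions):

```ts
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  'https://your-project.supabase.co', // project URL
  'your-public-anon-key',             // anon key only; never a service role key
);

// The Edge Function holds the AWS and Pinecone credentials; the browser only
// ever sees the answer it returns.
const { data, error } = await supabase.functions.invoke('chat-with-repo', {
  body: { repo: 'zumerlab/zumerbox', question: 'How does the auth flow work?' },
});
if (error) throw error;
console.log(data.answer);
```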
### Low-Level Design (LLD)
1. The Code-Aware Indexing Pipeline
Standard text chunking fails for code because it breaks logical blocks. CodeSense AI implements a Sliding Window Chunking strategy:
- Chunk Size: 1000 characters.
- Overlap: 200 characters (ensures variable declarations aren't cut off from their usage).
- Metadata Enrichment: Every vector is tagged with its `filePath`, `repoOwner`, and `lineRange` to ensure the AI can cite its sources.
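
A minimal sketch of that chunker (field names mirror the metadata above; the pipeline's actual implementation may differ):

```ts
interface CodeChunk {
  text: string;
  filePath: string;
  repoOwner: string;
  lineRange: [number, number];
}

// Slide a 1000-character window over the file with a 200-character overlap,
// recording the line range of each chunk so answers can cite their sources.
function chunkFile(
  source: string,
  filePath: string,
  repoOwner: string,
  chunkSize = 1000,
  overlap = 200,
): CodeChunk[] {
  const chunks: CodeChunk[] = [];
  for (let start = 0; start < source.length; start += chunkSize - overlap) {
    const text = source.slice(start, start + chunkSize);
    const firstLine = source.slice(0, start).split('\n').length;
    const lastLine = firstLine + text.split('\n').length - 1;
    chunks.push({ text, filePath, repoOwner, lineRange: [firstLine, lastLine] });
    if (start + chunkSize >= source.length) break; // last window reached the end
  }
  return chunks;
}
```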
2. Secure Edge Orchestration
Using Supabase Edge Functions as an orchestration layer allows us to implement AWS Signature V4.
```ts
// Example: the Signature V4 signing step ensures AWS_SECRET_ACCESS_KEY
// never leaves the server-side environment.
const headers = await signRequest(
  'POST',
  bedrockUrl,
  requestBody,
  Deno.env.get('AWS_ACCESS_KEY_ID'),
  Deno.env.get('AWS_SECRET_ACCESS_KEY')
);
```
3. Multi-Tenant Vector Isolation
To prevent data leakage between repositories, we utilize Pinecone Namespaces. Each repository is assigned a unique namespace derived from its GitHub path.
- Query Filtering: `namespace: "zumerlab-zumerbox"`
- Security: No user can query code outside of their current repository context.
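
A minimal sketch of a namespace-scoped query with the official `@pinecone-database/pinecone` client (in production this runs inside the Edge Function; the index name and namespace derivation are assumptions):

```ts
import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pc.index('codesense');

// Namespace derived from the GitHub path, e.g. "zumerlab/zumerbox" -> "zumerlab-zumerbox".
const namespace = 'zumerlab/zumerbox'.replace('/', '-');

// Placeholder: in the real flow this is the Titan Embeddings v2 vector of the question.
const queryEmbedding = new Array(1024).fill(0);

const { matches } = await index.namespace(namespace).query({
  vector: queryEmbedding,
  topK: 5,
  includeMetadata: true, // returns filePath / repoOwner / lineRange for citations
});
```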
## Data Flow: The Lifecycle of a Query
1. Input: The user types: "Where is the database connection initialized?"
2. Vectorization: The query is converted into a 1024-dim vector using AWS Bedrock.
3. Retrieval: Pinecone identifies the Top-5 most relevant code chunks within that specific repo's namespace.
4. Augmentation: The system builds a prompt: "System: You are an expert. Context: [Snippet 1], [Snippet 2]. Question: Where is the database...?"
5. Generation: Titan Express synthesizes the context and generates a markdown-formatted answer.
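
A condensed sketch of steps 2-5, shown with the AWS SDK for JavaScript for readability (the deployed Edge Function signs raw HTTPS requests with Signature V4 instead; storing the chunk text in Pinecone metadata is an assumption):

```ts
import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime';

const bedrock = new BedrockRuntimeClient({ region: 'us-east-1' });
const question = 'Where is the database connection initialized?';

// 2. Vectorization: embed the question with Titan Embeddings v2 (1024 dims).
const embedRes = await bedrock.send(new InvokeModelCommand({
  modelId: 'amazon.titan-embed-text-v2:0',
  contentType: 'application/json',
  body: JSON.stringify({ inputText: question }),
}));
const { embedding } = JSON.parse(new TextDecoder().decode(embedRes.body));

// 3. Retrieval: pass `embedding` to the namespace-scoped Pinecone query shown
//    earlier; `matches` below stands in for its Top-5 results.
declare const matches: { metadata: { text: string; filePath: string } }[];

// 4. Augmentation: splice the retrieved snippets into the prompt.
const context = matches
  .map((m, i) => `[Snippet ${i + 1}] (${m.metadata.filePath})\n${m.metadata.text}`)
  .join('\n\n');
const prompt = `System: You are an expert on this repository.\nContext:\n${context}\n\nQuestion: ${question}`;

// 5. Generation: Titan Text Express synthesizes a markdown-formatted answer.
const genRes = await bedrock.send(new InvokeModelCommand({
  modelId: 'amazon.titan-text-express-v1',
  contentType: 'application/json',
  body: JSON.stringify({
    inputText: prompt,
    textGenerationConfig: { maxTokenCount: 1024, temperature: 0.2 },
  }),
}));
const answer = JSON.parse(new TextDecoder().decode(genRes.body)).results[0].outputText;
```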
*(Screenshot: Lovable UI)*
*(Screenshot: Pinecone UI)*
## Engineering Setup
### Environment Prerequisites
- AWS Bedrock Model Access: Ensure `amazon.titan-text-express-v1` and `amazon.titan-embed-text-v2:0` are enabled.
- Pinecone Index: 1024 dimensions, cosine metric.
### Pinecone Index Setup
1. Create a new index in the Pinecone Console.
2. Set dimensions to 1024 (Titan Embed v2 output).
3. Use the cosine similarity metric.
4. Note the index URL for configuration.
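
The same setup can be scripted with the `@pinecone-database/pinecone` client; a minimal sketch (index name, cloud, and region are assumptions):

```ts
import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

// Serverless index sized for Titan Embeddings v2 output, queried with cosine similarity.
await pc.createIndex({
  name: 'codesense',
  dimension: 1024,
  metric: 'cosine',
  spec: { serverless: { cloud: 'aws', region: 'us-east-1' } },
});
```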
### AWS Bedrock Setup
- Enable Amazon Bedrock in your AWS account
- Request access to `amazon.titan-text-express-v1` (chat) and `amazon.titan-embed-text-v2:0` (embeddings)
- Create IAM credentials with Bedrock access
### Development Prerequisites
- Node.js 18+ and npm (the UI was built with Lovable)
- Supabase project (or Lovable Cloud)
### Edge Function Secrets
```bash
supabase secrets set AWS_ACCESS_KEY_ID=xxx
supabase secrets set AWS_SECRET_ACCESS_KEY=xxx
supabase secrets set PINECONE_API_KEY=xxx
supabase secrets set PINECONE_INDEX_URL=https://your-index.svc.pinecone.io
```
## Security Standards
- JWT-Locked APIs: All Edge Functions require a valid Supabase Auth header.
- Secret Management: Zero hardcoded keys. No client-side exposure.
- Rate Limiting: Implemented at the Edge Function layer to protect AWS Bedrock quotas.
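
The rate limiter itself isn't shown in this post; below is a minimal fixed-window sketch of the idea, applied per user inside the Edge Function before any Bedrock call (the limits are illustrative):

```ts
const WINDOW_MS = 60_000;  // 1-minute window
const MAX_REQUESTS = 20;   // per user, per window

const hits = new Map<string, { count: number; windowStart: number }>();

// Returns true if the request may proceed, false if the user has hit the limit.
export function allowRequest(userId: string, now = Date.now()): boolean {
  const entry = hits.get(userId);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(userId, { count: 1, windowStart: now });
    return true;
  }
  if (entry.count >= MAX_REQUESTS) return false;
  entry.count += 1;
  return true;
}
```

Note that in-memory state is scoped to a single function instance; a shared store (for example a Supabase table or Redis) would be needed to enforce a strict global limit.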
## Future Roadmap
- Language Support: Expanding AST-based parsing for better semantic chunking.
- Multi-Repo Chat: Aggregating context across microservices.
- Local LLM Support: Integrating Ollama for on-premise deployments.


