DEV Community

Midas126

Beyond the Hype: Building a Practical AI-Powered Codebase Assistant from Scratch

From Sci-Fi to Your IDE: The Real Power of AI in Code

Another week, another flood of AI articles. We've seen the demos: paste a GitHub URL, ask a question in plain English, and get an answer about the codebase. It feels like magic—or maybe just a well-executed API call to a large language model (LLM). But what's actually happening under the hood? How can you, as a developer, move from being a consumer of these AI tools to a builder who understands and customizes them?

This guide cuts through the hype. We won't just use an AI API; we'll build the core of a practical, local codebase assistant. We'll focus on the fundamental technical architecture that makes "asking your codebase a question" possible: Retrieval-Augmented Generation (RAG). By the end, you'll have a working Python prototype that can answer questions about a local project using open-source models.

Deconstructing the "Google Maps for Codebases" Analogy

The popular analogy is apt. A tool like this needs two core functions:

  1. Indexing (Mapping): Creating a searchable representation of your code's "terrain"—its files, functions, classes, and relationships.
  2. Querying (Asking for Directions): Finding the relevant parts of the map and generating a human-friendly answer to your question.

The secret sauce connecting these is RAG. Instead of asking an LLM a question directly (which would rely on its potentially outdated or generic training data), we first retrieve relevant context from our specific codebase and then augment the LLM's prompt with that context to generate a precise answer.
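
To make the retrieve, augment, generate flow concrete before any libraries come in, here is a toy sketch in plain Python. It is not the implementation built in the steps that follow: word overlap stands in for embedding similarity, `fake_llm` is a stub standing in for a real model, and every name in it (`retrieve`, `augment`, `answer`, the sample `chunks`) is invented for illustration.

```python
# Toy Retrieve -> Augment -> Generate pipeline (illustrative names only).
import re
from typing import Callable


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of word-like tokens."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def retrieve(question: str, chunks: dict, k: int = 2) -> list:
    """Rank chunks by how many tokens they share with the question."""
    q = tokenize(question)
    ranked = sorted(
        chunks.items(),
        key=lambda item: len(q & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:k]


def augment(question: str, hits: list) -> str:
    """Build a prompt that places the retrieved chunks ahead of the question."""
    context = "\n\n".join(f"From {source}:\n{text}" for source, text in hits)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer(question: str, chunks: dict, llm: Callable[[str], str], k: int = 2) -> str:
    """Run the three stages in order and return whatever the model returns."""
    return llm(augment(question, retrieve(question, chunks, k)))


# Hypothetical mini "codebase": file name -> chunk text.
chunks = {
    "auth.py": "def login(user, password): check the password hash and create a session",
    "db.py": "def connect(url): open a database connection pool",
    "README.md": "Run the server with make serve",
}


def fake_llm(prompt: str) -> str:
    """Stub model: reports the prompt size instead of generating text."""
    return f"(model saw {len(prompt)} characters of prompt)"


hits = retrieve("How does login check the password?", chunks, k=1)
print(hits[0][0])
print(answer("How does login check the password?", chunks, fake_llm))
```

The model only ever sees what `augment` puts in the prompt, which is why retrieval quality tends to matter as much as model quality in a setup like this.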

Building the Engine: A Step-by-Step Implementation

Let's build a minimal but functional system. We'll use LangChain for orchestration, sentence-transformers for embeddings, Chroma as the vector database, and Meta's openly available Llama 3.2 model served locally through Ollama.

Step 1: Setting Up the Project

mkdir codebase-assistant && cd codebase-assistant
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
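pip install langchain langchain-community langchain-text-splitters chromadb sentence-transformers
# Ollama itself is installed separately; the LangChain wrapper used below talks to its local server over HTTP.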

Step 2: The Indexer – Creating the Code Map

Our first job is to load code files, split them into meaningful chunks, and convert those chunks into numerical vectors (embeddings) that capture their semantic meaning.

# indexer.py
from pathlib import Path
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

class CodebaseIndexer:
    def __init__(self, source_dir, persist_dir="./chroma_db"):
        self.source_dir = Path(source_dir)
        self.persist_dir = persist_dir
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            separators=["\n\nfunction", "\n\nclass", "\n\ndef ", "\n\n//", "\n\n#", "\n\n", " ", ""]
        )
        # Using a lightweight, open-source embedding model
        self.embeddings = HuggingFaceEmbeddings(
            model_name="all-MiniLM-L6-v2"
        )

    def load_and_chunk_documents(self):
        """Recursively load .py, .js, .md, and .txt files, then split them into chunks."""
        documents = []
        for ext in ["*.py", "*.js", "*.md", "*.txt"]:
            for file_path in self.source_dir.rglob(ext):
                try:
                    loader = TextLoader(str(file_path), encoding='utf-8')
                    loaded_docs = loader.load()
                    for doc in loaded_docs:
                        doc.metadata["source"] = str(file_path.relative_to(self.source_dir))
                    documents.extend(loaded_docs)
                except Exception as e:
                    print(f"Error loading {file_path}: {e}")
        print(f"Loaded {len(documents)} raw documents.")
        # Split documents into chunks
        chunks = self.text_splitter.split_documents(documents)
        print(f"Split into {len(chunks)} chunks.")
        return chunks

    def create_vector_store(self, chunks):
        """Create and persist a Chroma vector database from document chunks."""
        vectordb = Chroma.from_documents(
            documents=chunks,
            embedding=self.embeddings,
            persist_directory=self.persist_dir
        )
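        # chromadb >= 0.4 writes to persist_directory automatically;
        # only older versions needed an explicit vectordb.persist() call.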
        print(f"Vector store created and persisted to {self.persist_dir}")
        return vectordb

if __name__ == "__main__":
    # Index your own project directory
    indexer = CodebaseIndexer(source_dir="../my_project")
    chunks = indexer.load_and_chunk_documents()
    vectordb = indexer.create_vector_store(chunks)
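
The `chunk_size=1000` and `chunk_overlap=200` settings are easier to reason about at a small scale. The sketch below is a deliberately simplified stand-in, not how `RecursiveCharacterTextSplitter` works internally: the real splitter first tries to break on the listed separators, while this hypothetical `chunk_with_overlap` helper just slides a fixed window over the text. It is only meant to show what overlap buys you, namely that text near a chunk boundary shows up in both neighbors.

```python
# Simplified fixed-window chunking to illustrate chunk_size and overlap.
# Assumption: this ignores separators entirely, unlike the LangChain splitter.

def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_with_overlap(sample, chunk_size=10, overlap=4)
for piece in pieces:
    print(piece)
# Prints:
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
```

A larger overlap lowers the chance that a function signature and its body end up in chunks that never appear together, at the cost of storing and embedding more duplicated text.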

Step 3: The Retriever – Finding Relevant Code

Once indexed, we need a way to find the chunks most relevant to a user's question.

# retriever.py
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings

class CodeRetriever:
    def __init__(self, persist_dir="./chroma_db"):
        self.embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
        self.vectordb = Chroma(
            persist_directory=persist_dir,
            embedding_function=self.embeddings
        )
        # Configure to retrieve top 4 most relevant chunks
        self.retriever = self.vectordb.as_retriever(search_kwargs={"k": 4})

    def get_relevant_context(self, query):
        """Retrieve code chunks semantically similar to the query."""
        # invoke() replaces get_relevant_documents(), which recent LangChain releases deprecate
        relevant_docs = self.retriever.invoke(query)
        context = "\n\n---\n\n".join([f"From {doc.metadata['source']}:\n{doc.page_content}" for doc in relevant_docs])
        return context
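
Under the hood, "most relevant" means "closest in vector space". Here is a rough sketch of that ranking step using cosine similarity on hand-made three-dimensional vectors. These numbers are invented for illustration (real `all-MiniLM-L6-v2` embeddings have 384 dimensions), `top_k` is a hypothetical helper, and the metric Chroma actually applies depends on how the collection is configured, so treat this as the general idea rather than a description of Chroma's internals.

```python
# Top-k retrieval by cosine similarity over toy vectors (all values are made up).
import math


def cosine_similarity(a: list, b: list) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec: list, index: dict, k: int = 4) -> list:
    """Return the k (score, source) pairs with the highest similarity to the query."""
    scored = [(cosine_similarity(query_vec, vec), source) for source, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical index: file name -> embedding of one chunk from that file.
index = {
    "auth.py": [0.9, 0.1, 0.0],
    "db.py": [0.1, 0.9, 0.1],
    "README.md": [0.3, 0.3, 0.9],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of an authentication question

results = top_k(query, index, k=2)
for score, source in results:
    print(f"{source}: {score:.3f}")
```

With `k=2` the two closest chunks are returned; the `k=4` in `CodeRetriever` plays the same role at full scale.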

Step 4: The Generator – Crafting the Answer with an LLM

This is where we augment the prompt with the retrieved context and generate the final answer. We'll use a local LLM via Ollama.

# First, pull the model locally (Ollama must already be installed and its server running)
ollama pull llama3.2

Then wire retrieval and generation together:

# generator.py
from langchain_community.llms import Ollama
from langchain_core.prompts import PromptTemplate
from retriever import CodeRetriever

class CodeQAGenerator:
    def __init__(self):
        self.llm = Ollama(model="llama3.2", temperature=0.1)
        self.retriever = CodeRetriever()

        # The critical RAG prompt template
        self.prompt_template = PromptTemplate(
            input_variables=["context", "question"],
            template="""
            You are an expert software engineer analyzing a codebase.
            Use the following retrieved code snippets to answer the question.
            If the context does not contain enough information, say so clearly.

            Context from the codebase:
            {context}

            Question: {question}

            Answer (be concise and reference file names):
            """
        )

    def answer_question(self, user_question):
        """The main RAG pipeline: Retrieve -> Augment -> Generate."""
        print("Retrieving relevant context...")
        context = self.retriever.get_relevant_context(user_question)

        print("Generating answer...")
        prompt = self.prompt_template.format(context=context, question=user_question)
        answer = self.llm.invoke(prompt)

        return answer, context  # Return context for transparency

if __name__ == "__main__":
    assistant = CodeQAGenerator()
    question = "How does the project handle user authentication?"
    answer, context = assistant.answer_question(question)

    print("\n" + "="*50)
    print(f"QUESTION: {question}")
    print("="*50)
    print("\nRETRIEVED CONTEXT:\n", context[:1000], "...")  # Truncated for display
    print("\n" + "="*50)
    print("GENERATED ANSWER:\n", answer)
    print("="*50)

Running Your Local Codebase Assistant

  1. Index your code: Run python indexer.py to create the vector database.
  2. Ask a question: Edit the question variable in the __main__ block of generator.py, then run python generator.py.
  3. Interact: Wrap the generator.py logic in a simple CLI or Gradio UI for continuous interaction.
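
For the "Interact" step, a question loop can be quite small. The sketch below is one possible shape, with hypothetical names (`run_cli`, `stdin_questions`). It assumes only that the function you pass in returns an `(answer, context)` pair, as `answer_question` does, and it takes questions from any iterable so the loop can be exercised without a terminal or a running model.

```python
# Minimal question loop; the answer function is injected so the loop is easy to test.
from typing import Callable, Iterable, Iterator, Tuple


def run_cli(
    answer_fn: Callable[[str], Tuple[str, str]],
    questions: Iterable[str],
    out: Callable[[str], None] = print,
) -> int:
    """Answer each question until 'quit' or 'exit'; return how many were answered."""
    answered = 0
    for raw in questions:
        question = raw.strip()
        if not question:
            continue  # ignore blank lines
        if question.lower() in {"quit", "exit"}:
            break
        answer, _context = answer_fn(question)
        out(answer)
        answered += 1
    return answered


def stdin_questions(prompt: str = "ask> ") -> Iterator[str]:
    """Yield lines typed by the user until Ctrl-D or Ctrl-C."""
    while True:
        try:
            yield input(prompt)
        except (EOFError, KeyboardInterrupt):
            return


# Dry run with a stand-in answer function instead of the real generator.
printed = []
count = run_cli(lambda q: (f"echo: {q}", ""), ["  hi ", "", "exit", "never"], out=printed.append)
print(count, printed)

# With the real pipeline this would become something like:
#   from generator import CodeQAGenerator
#   run_cli(CodeQAGenerator().answer_question, stdin_questions())
```

Creating `CodeQAGenerator` once outside the loop matters here, because loading the embedding model and opening the vector store on every question would dominate the response time.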

Leveling Up: Practical Enhancements

This basic RAG pipeline works, but production systems add several key layers:

  • Metadata Filtering: Allow queries like "Find all functions in auth.py." Enhance the retriever to filter by file path, language, or symbol type.
  • Hybrid Search: Combine semantic vector search with traditional keyword (BM25) search so that exact identifiers, which embeddings can blur, still match. The langchain-community package includes a BM25Retriever for this.
  • Code-Aware Chunking: Instead of splitting by characters, use Abstract Syntax Tree (AST) parsers to chunk by function or class definition, preserving logical boundaries.
  • Caching: Store embeddings and common query results to speed up repeated questions.
  • Agentic Workflow: Instead of a single query, let the AI decide to look up definitions, trace function calls, or read documentation, mimicking a developer's workflow.
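
To give the code-aware chunking idea some shape, here is a small sketch using Python's standard `ast` module. It is one possible approach under clear limits: it handles Python source only, keeps only top-level functions and classes (imports and other module-level code are dropped), and the `chunk_python_source` name and the metadata keys are made up for this example. The `symbol` and `kind` fields hint at how metadata filtering could build on the same pass.

```python
# AST-based chunking sketch: one chunk per top-level function or class.
import ast


def chunk_python_source(source: str, path: str = "<memory>") -> list:
    """Return a list of dicts, each holding one top-level definition and its metadata."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "source": path,                  # file the chunk came from
                "symbol": node.name,             # function or class name
                "kind": type(node).__name__,     # e.g. "FunctionDef" or "ClassDef"
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                # Exact source text of the definition (decorators are not included).
                "text": ast.get_source_segment(source, node),
            })
    return chunks


sample = '''import os

def login(user, password):
    return check(user, password)

class Session:
    def close(self):
        pass
'''

chunks = chunk_python_source(sample, "auth.py")
for chunk in chunks:
    print(chunk["kind"], chunk["symbol"], chunk["start_line"], chunk["end_line"])
```

Each dict could then be turned into a document whose metadata carries `symbol` and `kind`, so a query such as "functions in auth.py" can filter on those fields instead of relying on similarity alone. Other languages would need their own parser.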

The Takeaway: You Are the Architect

The true power isn't in any single model; it's in the architecture you design. By understanding and building the RAG pipeline, you gain the ability to:

  • Control your data: Embedding, retrieval, and generation all run locally, so once the models are downloaded your code never leaves your machine.
  • Customize for your stack: Tweak the chunking, embeddings, and prompts for Python, React, Go, or your specific framework.
  • Debug failures: When the AI gives a wrong answer, you can inspect the retrieved context and the prompt to understand why.

Don't just wait for the next AI coding tool to be released. Clone the accompanying repository for this guide, run it on your own project, and start experimenting. Break it, improve it, and make it yours. The frontier of AI-assisted development isn't just for big tech—it's in your terminal, waiting for you to build it.

What will you ask your codebase first?
