From generic embeddings to real-time, graph-powered codebase understanding.
Introduction
AI coding tools promise to understand a developer's codebase and deliver relevant suggestions. In reality, most systems rely on generic embedding APIs to index code snippets and documents. The result is often a disconnected experience: embeddings capture textual similarity but ignore structural relationships; indices refresh every few minutes, leaving developers without up-to-date context; and privacy is compromised when embeddings are sent to third-party APIs.
This article introduces our codebase-aware indexing system. It combines a server-side vector database with a code graph and a pre-indexed codebase-knowledge base (a.k.a. Repo Wiki) to deliver accurate, secure and real-time context for AI coding workflows. The following sections outline the challenges of generic retrieval, describe our hybrid architecture and explain how we scale, personalize and secure the system.
Challenges with Generic Code Search
Latency and Stale Context
Conventional retrieval pipelines call external APIs to compute embeddings and use remote vector databases to search for similar snippets. These pipelines suffer from multi-minute update intervals; when a developer switches branches or renames a function, the index lags behind and returns irrelevant context. Even when updated, large codebases produce so many embeddings that transferring and querying them introduces noticeable latency.
Lack of Structural Awareness
Generic embeddings measure textual similarity, but codebase queries often require understanding structural relationships. For example, a call-site and its function definition may share little lexical overlap; documentation might use terms not present in the code; cross-language implementations of the same algorithm look entirely different. Embeddings alone miss these relationships, leading to irrelevant results and wasted prompt space.
Hybrid Retrieval Architecture
Server-Side Vector Search
We deploy a high-performance vector database in our backend that stores embeddings for code snippets, documentation and codebase artifacts. Using custom AI models trained on code and domain knowledge, we generate embeddings that better capture semantic relationships and prioritize helpfulness over superficial similarity. The server processes indexing requests continuously, ingesting new or modified files within seconds.
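To make the indexing flow concrete, here is a minimal in-memory sketch of what the server-side store does: snippets are upserted with their embeddings (so a modified file replaces its stale entry), and queries return the top-N entries by cosine similarity. The class and method names are illustrative, not the actual backend API.

```python
import math

class VectorIndex:
    """Toy stand-in for the server-side vector database (illustrative only)."""

    def __init__(self):
        self.entries = {}  # snippet id -> (embedding, snippet text)

    def upsert(self, snippet_id, embedding, text):
        # Re-indexing a modified file overwrites the stale entry
        # instead of accumulating duplicates.
        self.entries[snippet_id] = (embedding, text)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def search(self, query_embedding, top_n=5):
        # Score every stored snippet against the query embedding
        # and return the N closest matches.
        scored = [
            (self._cosine(query_embedding, emb), sid, text)
            for sid, (emb, text) in self.entries.items()
        ]
        scored.sort(reverse=True)
        return scored[:top_n]
```

A production system would of course use an approximate-nearest-neighbor index rather than a linear scan, but the upsert-then-query contract is the same.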

Code Graph and Codebase-Knowledge Pre-Index
On the client side, we build a code graph representing functions, classes, modules and the relationships between them (e.g., call graphs, inheritance, cross-language links). We also pre-index codebase knowledge such as design documents, architecture diagrams and internal wiki pages. This pre-index allows us to perform graph traversals and concept-based lookups with ultra-low latency.
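The code graph can be pictured as a set of symbols connected by typed edges. The sketch below is a simplified model, assuming only that nodes are symbol names and edges carry a relation label such as "calls" or "documented_by"; the symbol names used are hypothetical examples.

```python
from collections import defaultdict

class CodeGraph:
    """Simplified code graph: nodes are symbols, edges are typed relations."""

    def __init__(self):
        # node -> list of (relation, neighbor) pairs
        self.edges = defaultdict(list)

    def add_edge(self, src, relation, dst):
        # e.g. add_edge("OrderService.place", "calls", "PaymentClient.charge")
        self.edges[src].append((relation, dst))

    def neighbors(self, node, relation=None):
        # Traverse one hop, optionally restricted to a single edge type.
        return [
            dst for rel, dst in self.edges[node]
            if relation is None or rel == relation
        ]
```

Traversing typed edges is what lets a query about a function pull in its callers, its documentation and its cross-language counterparts without any textual similarity at all.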
Combining Vector Search with Graph-Based Retrieval
When a user issues a query (via chat, completion or code search), the system:
Computes an embedding of the query using the same custom model.
Performs a vector search on the server to retrieve top-N similar snippets.
Uses the code graph to expand or refine the candidates based on structural relationships (e.g., include the function that calls the retrieved snippet or documentation that references it).
Ranks the final results by combining similarity scores with graph-based relevance signals.
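Steps 3 and 4 above can be sketched as a small re-ranking function. This is a minimal illustration, not the production ranking: the `graph_bonus` weight and the rule that a structural neighbor inherits a fraction of the score of the hit that reached it are assumptions made for the example.

```python
def hybrid_rank(vector_hits, graph, graph_bonus=0.5, top_k=5):
    """Expand vector-search hits along structural edges, then re-rank.

    vector_hits: list of (snippet_id, similarity) from the vector search.
    graph: dict mapping snippet_id -> structurally related snippet ids
           (callers, callees, referencing docs).
    """
    scores = dict(vector_hits)
    # Step 3: graph expansion. Each neighbor of a hit becomes a
    # candidate, inheriting a fraction of the hit's similarity score.
    for sid, sim in vector_hits:
        for neighbor in graph.get(sid, []):
            inherited = sim * graph_bonus
            scores[neighbor] = max(scores.get(neighbor, 0.0), inherited)
    # Step 4: rank by the combined relevance signal.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```

With this shape, a function definition that shares no text with the query still surfaces, because the call-site that does match pulls it in through the graph.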
This hybrid approach ensures that relevant but textually dissimilar code (such as a function definition referenced by a call-site) is surfaced alongside semantically similar snippets. It also allows the system to align retrieval with the developer's current branch and local changes.
RealâTime Updates and Personalization
Every developer has a personal index tied to their current working state. When you switch branches, edit files or perform search-and-replace operations, the client notifies the server of the changes, and the server updates the corresponding embeddings within seconds. The graph is updated simultaneously. This real-time synchronization ensures that suggestions always reflect the latest state of your codebase.
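On the client side, change notification can be pictured as a small batcher: repeated edits to the same file collapse into one pending update, and pending updates are flushed to the server as a single request. The class, the size-based flush trigger and the content-hash payload are illustrative assumptions; the real client also batches on a timer.

```python
class ChangeBatcher:
    """Illustrative client-side batcher for index-update notifications."""

    def __init__(self, flush_size=10):
        self.pending = {}   # file path -> latest content hash
        self.flushed = []   # batched requests sent to the server
        self.flush_size = flush_size

    def notify(self, path, content_hash):
        # Repeated edits to the same file collapse into one entry,
        # so a rapid-fire search-and-replace yields one update per file.
        self.pending[path] = content_hash
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self):
        # One batched request instead of a network call per keystroke.
        if self.pending:
            self.flushed.append(dict(self.pending))
            self.pending.clear()
```

Coalescing per-file and batching per-request is what keeps "updates within seconds" compatible with the edit rates of a busy working session.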

Scalability and Performance
Our backend is built to handle the high throughput of software development. It processes thousands of files per second and scales horizontally to accommodate large repositories. The client caches graphs to avoid redundant computation, and batched updates prevent network congestion.
Security and Privacy by Design
We never send raw code to third-party services; all embedding computation and vector search occur within our own infrastructure. Before retrieving any snippet, the client must prove possession of the file's content by sending a cryptographic hash, ensuring that only authorized users can access code. Embeddings are encrypted in transit and at rest.
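The proof-of-possession check can be sketched as follows, assuming (for illustration only) a SHA-256 hash of the file's bytes as the proof; the actual scheme and key material are not described here.

```python
import hashlib

def content_proof(file_bytes):
    """Client side: derive a proof from the file's content."""
    return hashlib.sha256(file_bytes).hexdigest()

class SnippetStore:
    """Illustrative server-side store that releases a snippet only to
    clients presenting the matching content hash."""

    def __init__(self):
        self.snippets = {}  # content proof -> snippet text

    def register(self, file_bytes, snippet):
        self.snippets[content_proof(file_bytes)] = snippet

    def retrieve(self, proof):
        # A client that does not possess the file cannot compute the
        # proof, so it cannot read the snippet.
        return self.snippets.get(proof)
```

The key property is that the server never needs the raw file to authorize access; it only compares hashes.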
Use Cases and Examples
Navigating Complex Codebases
When working on a large monorepo, Qoder may need to understand how a service interacts with downstream components. Thanks to graph traversal and knowledge pre-indexing, Qoder Agent searches the entire codebase: not only definitions with similar names, but also the call chain, configuration files, and design documents related to that function.
Incident Response and Debugging
During an incident, you need to quickly identify all code paths affected by a failing component. Our hybrid retrieval surfaces related code modules, tests and runbooks, allowing you to triage faster than with generic search.