Anush for OpenSauced

Moving from Typescript and Langchain to Rust and Loops

Embarking on your open-source journey can be like exploring uncharted territory. Picture this: you're a newcomer, eager to contribute but struggling to find your way around the codebase. That's where Repo-Query steps in. Let's delve into the exciting journey of Repo-Query and how it enables easier contributions to open-source projects.

The Evolution of Repo-Query

Repo-Query is a REST service that indexes public repositories and provides insightful answers to user queries, all within your browser through the OpenSauced AI browser extension.

Repo-Query in Action

Adding a Chat Window to the browser extension #226

From Inception to Prototype

In perhaps a day or two, the prototype of Repo-Query was hacked together. Leaning on the abstractions provided by modern web development, the initial version of the service was woven together using Typescript and Langchain. The sandbox for this initial experiment was the gh-answering-proto repository, and the semantic search surfaced code chunks close to what a human would pick when asked to find code relevant to a query. That accuracy produced some impressive results.

> example@1.0.0 start
> tsx ask.ts
What is your query?
How is the AI PR description being generated?
response {
text: 'The AI PR description is being generated by leveraging the getAiDescription function, which takes in a pull request URL and uses it to retrieve the pull request API URL. It then retrieves the description configuration and checks if it is empty. Finally, it uses the `getDescriptionContext` function to get the diff and commit messages based on the configuration source, and generates the description using AI. The generated description is then displayed using the `setContent` function.'
}
What is your query?
How are the project releases made?
response {
text: 'The project releases are made using their own configuration for semantic-release, which allows them to automatically generate changelogs and releases for their projects based on the commit messages. The beta branch is the default branch, and they squash & merge PRs to the beta branch. They never commit directly to `main`. A merge to `beta` will trigger a beta release, while a merge to `main` will trigger a full release.'
}
What is your query?
How is the project preventing duplicate button injections? 
response {
text: 'The project is preventing duplicate button injections by checking if the button already exists on the page before injecting it. This is done inside the injection script, where it checks if the button DOM element already exists using `document.getElementById("ai-description-button")`. If the element already exists, the injection script returns without injecting the button again.'
}
What is your query?
How is authentication being done in the project?
response {
text: 'Authentication in the project is being done through the `checkAuthentication` function imported from the `../utils/checkAuthentication` module. This function takes in several parameters such as `hasOptedLogOut`, `getCookie`, `checkTokenValidity`, `setAccessTokenInChromeStorage`, `removeAuthTokenFromStorage`, and `logError`. These parameters are used to check if the user has opted out, get the authentication cookie, check the validity of the token, set the access token in Chrome storage, remove the auth token from storage, and log any errors that may occur. The `checkAuthentication` function is triggered by the chrome.cookies.onChanged event listener.'
}

Unveiling Challenges

As the journey unfolded, so did the challenges. The first formidable hurdle came in the form of performance bottlenecks. Testing on the extensive insights.opensauced.pizza repository revealed that generating embeddings, a critical step, was taking an agonizing amount of time, far from ideal for any user seeking rapid answers. This was a breaking point; the need for a more performant solution was clear.

The Langchain Q&A retrieval system, although powerful, posed its own challenge: it operated in a one-shot manner, with no capability to take feedback from its answers and explore the knowledge base further. This limitation resulted in answers that fell short of being comprehensive.

The poor performance stemmed from several factors:

  • Utilizing OpenAI embeddings for codebases proved to be both inefficient and impractical, necessitating a local embedding solution. The embeddings for the insights.opensauced.pizza repository took upwards of 15 minutes to generate 🥲.
  • At the time of the prototype's development, the Langchain GitHub loader fetched the repository sequentially, sending one request per file, which led to prolonged download times: about 2 minutes for the insights.opensauced.pizza repository in our case. This was later resolved in hwchase17/langchainjs#2224, enabling parallel requests for faster retrieval.
  • The process of chunking the codebase using Langchain's recursive splitting strategy required optimization.

Performance Woes
No one would wait for such a long time to get an answer. I'd rather watch a YesTheory video and finally get off my desk.

Embracing Rust and the Art of Loops: A Transformative Shift

In the quest for more efficient solutions, the ONNX runtime emerged as a beacon of performance. The decision to transition from Typescript to Rust was an unconventional yet pivotal one. Driven by Rust's robust parallel processing capabilities using Rayon and seamless integration with ONNX through the ort crate, Repo-Query unlocked a realm of unparalleled efficiency. The result? A transformation from sluggish processing to, I have to say it, blazing-fast performance.
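To give a concrete feel for the Rayon side of that shift, here is a minimal, self-contained sketch (not Repo-Query's actual code) of how a sequential, CPU-bound map over files becomes parallel with essentially a one-line change:

use rayon::prelude::*;

// Hypothetical stand-in for an expensive, CPU-bound step, such as
// running a local embedding model over a file's contents.
fn embed(contents: &str) -> Vec<f32> {
    contents.bytes().map(|b| b as f32).collect()
}

fn main() {
    let files = vec!["fn main() {}", "pub struct Repository;", "# README"];

    // `into_par_iter` distributes the work across Rayon's thread pool;
    // the sequential version would simply use `into_iter`.
    let embeddings: Vec<Vec<f32>> = files
        .into_par_iter()
        .map(embed)
        .collect();

    println!("embedded {} files", embeddings.len());
}

Rayon schedules the closures across a work-stealing thread pool, which is why embarrassingly parallel work like per-file embedding scales with the number of cores.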

The Dual Acts:

Let's dissect Repo-Query's two key acts:

Act 1: /embed

The /embed endpoint is the engine that drives downloading GitHub repositories and generating embeddings for them. A simpler, more direct approach was adopted for fetching a repository: instead of making individual requests, Repo-Query taps into GitHub's /archive service, condensing the repository download into a single request, with no need to iterate through per-file retrieval requests the way Langchain's GitHub document loader did. The download time for the https://github.com/open-sauced/app repository was now down to about 5 seconds for me (on a 50 Mbps connection).

src/github/mod.rs

pub async fn fetch_repo_files(repository: &Repository) -> Result<Vec<File>> {
    let Repository {
        owner,
        name,
        branch,
    } = repository;

    // Download the entire repository as a single zip archive in one request.
    let url = format!("https://github.com/{owner}/{name}/archive/{branch}.zip");
    let response = reqwest::get(url).await?.bytes().await?;
    ...
}
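The elided part of fetch_repo_files presumably unpacks that zip into individual files. A rough sketch of how that could look with the zip crate is below; the File struct and the exact filtering are assumptions for illustration, not Repo-Query's actual implementation.

use std::io::{Cursor, Read};

use anyhow::Result;

// Hypothetical shape of the File type used elsewhere in the post.
pub struct File {
    pub path: String,
    pub content: String,
}

fn unpack_archive(bytes: &[u8]) -> Result<Vec<File>> {
    let mut archive = zip::ZipArchive::new(Cursor::new(bytes))?;
    let mut files = Vec::new();

    for i in 0..archive.len() {
        let mut entry = archive.by_index(i)?;
        if !entry.is_file() {
            continue;
        }
        let mut content = String::new();
        // Skip binary entries that aren't valid UTF-8.
        if entry.read_to_string(&mut content).is_ok() {
            files.push(File {
                path: entry.name().to_string(),
                content,
            });
        }
    }
    Ok(files)
}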

The true essence of this enhancement lies in the synergy between the ONNX Runtime and the ort and Rayon crates. This combination transformed the embedding process into a swift orchestration: the insights.opensauced.pizza repository could now be indexed in about 30 seconds, a huge improvement over the previous 15+ minute benchmark. Like I said, blazing-fast embeddings.

src/github/mod.rs

pub async fn embed_repo<M: EmbeddingsModel + Send + Sync>(
    repository: &Repository,
    files: Vec<File>,
    model: &M,
) -> Result<RepositoryEmbeddings> {
    // Embed every file in parallel across Rayon's thread pool.
    let file_embeddings: Vec<FileEmbeddings> = files
        .into_par_iter()
        .filter_map(|file| {
            let embed_content = file.to_string();
            // Skip any file the model fails to embed.
            let embeddings = model.embed(&embed_content).ok()?;
            Some(FileEmbeddings {
                path: file.path,
                embeddings,
            })
        })
        .collect();
    ...
}
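The generic bound on embed_repo hints at how the model is abstracted behind a trait. Here is a hedged, stripped-down guess at what such an EmbeddingsModel trait could look like, with a stub standing in for the real ONNX-backed model (the OnnxModel name and the 384-dimensional output are illustrative assumptions; Repo-Query's actual trait and model live in its source):

use anyhow::Result;

// Hypothetical reconstruction of the trait `embed_repo` is generic over.
pub trait EmbeddingsModel {
    fn embed(&self, text: &str) -> Result<Vec<f32>>;
}

// Stub in place of the real ONNX-backed model.
pub struct OnnxModel;

impl EmbeddingsModel for OnnxModel {
    fn embed(&self, text: &str) -> Result<Vec<f32>> {
        // A real implementation would tokenize `text` and run an ONNX
        // inference session; this stub just returns a dummy 384-dim vector.
        Ok(vec![text.len() as f32; 384])
    }
}

In practice the stub body would be replaced by tokenization plus an ONNX inference call, which is what the ort runtime (and, per the comments below, fastembed-rs) provides.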

Act 2: /query – An Unpredictable Symphony

At the epicenter of the /query endpoint is a feedback loop. While conventional loops follow a predetermined path, this one is a voyage into the unknown: a journey that responds and evolves based on context and interactions, as decided by GPT-3.5.

OpenAI's function calling entered the stage at an opportune moment. Within the heart of the loop, a number of semantic-search functions are exposed. GPT-3.5 leverages these functions to gather pertinent information from the codebase, aligning with the user's query and intent.

The magic happens when these functions interweave. GPT-3.5 dynamically chooses which function to invoke based on the evolving conversation. It's a narrative that unfolds based on the twists and turns of the user's questions. As GPT-3.5 traverses through the functions, the loop adapts, collecting more information, refining its understanding, and ultimately crafting responses that are not just relevant, but insightful.

src/conversation/mod.rs

pub async fn generate(&mut self) -> Result<()> {
    'conversation: loop {
        let request = generate_completion_request(self.messages.clone(), "auto");
        ...
        match parsed_function_call.name {
            Function::SearchCodebase => ...
            Function::SearchFile => ...
            Function::SearchPath => ...
            Function::Done => ...
        }
        ...
    }
}
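For the loop above to work, each of those Function variants has to be described to GPT-3.5 up front as a callable function. A hedged sketch of what registering one such semantic-search function could look like, built with serde_json (the name, description, and parameters here are illustrative, not Repo-Query's exact definitions):

use serde_json::{json, Value};

// Illustrative schema for one exposed semantic-search function.
// The exact names, descriptions, and parameters in Repo-Query may differ.
fn search_codebase_schema() -> Value {
    json!({
        "name": "search_codebase",
        "description": "Semantic search over the indexed repository for code relevant to a query",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Natural-language description of the code to look for"
                }
            },
            "required": ["query"]
        }
    })
}

The model then replies with a function call whose name and JSON arguments the loop parses, which is where parsed_function_call.name in the snippet above comes from.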

The Epilogue: Rust and GPT-3.5 – An Unconventional Symbiosis

Transitioning to Rust for crafting an AI application may not align with convention, sure. But hey, who said innovation was all about following the recipe?

You can try out the service at https://opensauced.ai/.
For the source-code, visit Repo-Query's GitHub repository.

Stay Saucy🍕

Top comments (6)

AbhishekBose

Great blog. Wanted to understand what is the ONNX runtime being used here for if embeddings are being created using OpenAI apis?

Anush

To generate the embeddings. OpenAI is used for completion and function calling.

Fadhli

So essentially, the embeddings are generated locally via ONNX inference (which is using OpenAI model) right?

Anush

Yes. Specifically using github.com/Anush008/fastembed-rs.
