Mukund Jha


I Built a Free Open-Source Alternative to Sourcegraph: Here's Why

It was 2 AM on a Tuesday, and I was staring at a function I had never seen before.

The codebase had 47,000 files across six microservices, written by people who had left the company years ago. I needed to understand how authentication flowed from the mobile app to the backend, through three different services, and back. I had grep open in one tab, the codebase in another, and ChatGPT in a third — copying and pasting files one at a time.

This isn't a workflow. It's a coping mechanism.

Every developer I know does some version of this. We read five files to understand one function. We trace imports manually. We build mental models that evaporate the moment Slack pings. We accept that understanding a new codebase takes weeks — sometimes months — as if that's just how software is made.

I don't think it has to be this way.


The Problem Nobody Solved

Sourcegraph exists. It's excellent at what it does — code search at enterprise scale. But it's built for organizations with dedicated infrastructure budgets. You need to self-host it, configure it, maintain it. For a solo developer, a small team, or an open-source project, the overhead doesn't make sense.

There are other tools too. But they all have the same gap: they search for text, not meaning.

When you type "where is auth handled," you don't want grep results. You want someone to tell you: "Auth starts in src/middleware.ts, the login form lives in src/auth/login.tsx, and the token exchange happens in src/api/auth/route.ts — here are the line numbers."

I wanted a search bar for codebases that understood how things connect. A tool that reads every file, builds a mental model of the dependency graph, and answers questions in plain English.

So I built one.


What Lexithm Does

Lexithm reads your entire repository — every file, every function, every import — and lets you ask questions about it in natural language. It extracts symbols, builds dependency graphs, detects API routes, and generates embeddings for semantic search.

Here's the flow:

  1. Sign in with GitHub — no CLI, no config, no setup.
  2. Pick a repository — public or private, works the same way.
  3. Wait for indexing — AST-level parsing for Python, JavaScript, TypeScript, Go, Java, and Rust. Most repos finish in 2-5 minutes.
  4. Ask questions — "what does this project do," "how does authentication work," "find the bug in the payment flow." Every answer cites the file paths and line numbers it references.

The key design decision was: no installation. You use it entirely through the browser, and the indexing happens on the backend. You don't install a plugin, set up a server, or configure anything. It's a web app that connects to your GitHub and starts working.


The Architecture

The backend is Python with FastAPI. The indexing pipeline has 16 stages. The main ones are:

  1. File discovery — categorize every file by type
  2. AST parsing — build structured representations of each source file
  3. Graph construction — connect symbols into dependency and call graphs
  4. Embedding generation — convert every code chunk into vector embeddings
  5. Storage — persist the indexed data for instant retrieval
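
To make the parsing and graph stages concrete, here is a minimal sketch of what they could look like for a single Python file. This is not Lexithm's actual code: it assumes Python's standard-library `ast` module as the parser (which only covers Python, not the other supported languages), and the function name `extract_symbols` and the `SAMPLE` file are made up for illustration.

```python
import ast
from collections import defaultdict


def extract_symbols(path: str, source: str) -> dict:
    """Parse one Python file and collect its functions, imports, and direct calls."""
    tree = ast.parse(source, filename=path)
    functions, imports = [], []
    calls = defaultdict(set)  # function name -> plain names it calls

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            imports.append(node.module or "")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Record where the symbol lives so answers can cite path and line.
            functions.append({"name": node.name, "path": path, "line": node.lineno})
            for inner in ast.walk(node):
                # Only direct calls like foo(); attribute calls like a.b() are skipped.
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls[node.name].add(inner.func.id)

    functions.sort(key=lambda f: f["line"])
    return {"functions": functions, "imports": imports, "calls": dict(calls)}


# Hypothetical input file, used only to exercise the sketch.
SAMPLE = '''
import hashlib
from tokens import issue_token

def hash_password(raw):
    return hashlib.sha256(raw.encode()).hexdigest()

def login(user, raw):
    if hash_password(raw) == user.password_hash:
        return issue_token(user)
    return None
'''

info = extract_symbols("auth.py", SAMPLE)
```

The `calls` mapping is the raw material for a call graph: each entry is an edge from a function to the names it invokes. A real pipeline would also have to resolve those names across files and handle method calls, which this sketch leaves out.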

When you ask a question, the system classifies your intent, retrieves the most relevant code chunks from the index, and feeds them to an LLM with a tailored prompt. Only the relevant context is sent — never your entire codebase.
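
The retrieval step described above can be sketched roughly as follows. Everything here is an assumption for illustration: the `Chunk` type, the example chunks, and especially `embed`, which is a toy bag-of-words vector standing in for a real embedding model. Intent classification is left out, and the prompt wording is invented.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    start_line: int
    end_line: int
    text: str


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    """Send only the retrieved chunks, each labeled with its path and line range."""
    context = "\n\n".join(
        f"[{c.path}:{c.start_line}-{c.end_line}]\n{c.text}" for c in hits
    )
    return (
        "Answer using only the context below and cite file paths and line numbers.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical indexed chunks.
CHUNKS = [
    Chunk("src/auth/login.py", 10, 24,
          "def login(user, password): verify password and issue auth token"),
    Chunk("src/billing/charge.py", 5, 30,
          "def charge(card, amount): create payment and send receipt"),
    Chunk("src/middleware.py", 1, 18,
          "def require_auth(request): check auth token on each request"),
]

question = "how does auth token login work"
hits = retrieve(question, CHUNKS)
prompt = build_prompt(question, hits)
```

Labeling each chunk with its path and line range in the prompt is what lets the model cite locations in its answer, and limiting the prompt to the top `k` hits is what keeps the rest of the repository out of the request.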

The frontend is Next.js with a terminal-inspired chat interface. It streams responses token by token so you see the answer build in real time.


Why It's Free

Lexithm runs on free tiers. NVIDIA NIM provides the LLM inference at no cost. OpenRouter serves as a fallback. The embedding model runs on NVIDIA's API. Supabase handles the database and authentication. Everything stays within the free usage limits.
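
A primary provider with a fallback can be wired up in a few lines. The sketch below is an assumption about the general pattern, not the project's code: `complete_with_fallback` is a hypothetical helper, and the two provider functions are stubs rather than real NVIDIA NIM or OpenRouter clients.

```python
from typing import Callable, Sequence

# Hypothetical provider signature: takes a prompt, returns a completion.
Provider = Callable[[str], str]


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, answer) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. a rate limit or timeout on a free tier
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers, used only to exercise the sketch.
def stub_primary(prompt: str) -> str:
    raise TimeoutError("rate limited")


def stub_backup(prompt: str) -> str:
    return "answer to: " + prompt


used, answer = complete_with_fallback(
    "what does this project do",
    [("primary", stub_primary), ("backup", stub_backup)],
)
```

Collecting the error from each failed provider makes it easier to tell a quota problem from an outage when every option fails.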

I built this because I wanted a tool like this to exist. Not everything needs to be a startup. Not everything needs an enterprise license. Some things can just be useful.


FAQ That People Actually Ask

How is this different from Sourcegraph?
Sourcegraph is built for enterprise code search at scale — think regex across thousands of repos. Lexithm is built for conversational code understanding — ask questions in English, get answers with citations. It's also free, requires no infrastructure, and takes zero setup.

Is my code safe?
Your repository is indexed on the backend, not stored by any LLM provider. Only the relevant context is sent for each question. You can delete the index at any time.

How long does indexing take?
Most repositories finish in 2-5 minutes. Larger projects with over 100,000 files can take up to 20 minutes.

Do I need to install anything?
No. Everything runs through your browser.


Try It

The site is live at lexithm.vercel.app. Sign in with GitHub, pick a repository, and ask a question.

The code is on GitHub at — well, it will be as soon as I push it. (I wanted to write this post first.)

If you've ever spent hours tracing code in an unfamiliar repository, you know exactly why I built this. I hope it saves you some of those hours.


Built with Next.js, FastAPI, NVIDIA NIM, Supabase, and a lot of late nights.
