DEV Community

Jenil Savani

Building the Full AST to Embedding Pipeline for Secrin

Today was one of those days where everything finally starts connecting like LEGO blocks. I worked on the full pipeline that will let Secrin take any code repository, break it down into tiny meaningful pieces, and store them in a graph database so we can do some seriously smart search later.

Think of it like teaching Secrin to read code the way a detective reads a case: spotting characters, events, relationships, and breadcrumbs.

Here’s what I tackled today:

1. Cloning the Repository

This is the “bring the book home” step. Before Secrin can understand anything, it needs a local copy of the code. I built the logic that cleanly clones a Git repo and prepares it for scanning.
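Here's a minimal sketch of what a clone step like this could look like, assuming the `git` CLI is on the PATH. The names `build_clone_command` and `clone_repo` are hypothetical and not taken from Secrin's codebase.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_clone_command(url: str, dest: Path, depth: Optional[int] = None) -> List[str]:
    """Build the argument list for `git clone` (hypothetical helper)."""
    cmd = ["git", "clone", "--quiet"]
    if depth is not None:
        # A shallow clone is faster but drops history, which matters
        # if commits are going to be stored later in the pipeline.
        cmd += ["--depth", str(depth)]
    cmd += [url, str(dest)]
    return cmd


def clone_repo(url: str, dest: Path, depth: Optional[int] = None) -> Path:
    """Clone `url` into `dest` and return the destination path."""
    if dest.exists() and any(dest.iterdir()):
        # Refuse to clone over existing content so each scan starts clean.
        raise FileExistsError(f"{dest} already exists and is not empty")
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(build_clone_command(url, dest, depth), check=True)
    return dest
```

Leaving `depth` unset keeps the full history around, which is probably what you want if the commit log is going to be read afterwards.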

2. Parsing the AST (Abstract Syntax Tree)

Imagine turning a story into a family tree of words, sentences, and meaning.
That’s what AST parsing does for code. I wired up the system so Secrin can take a file and break it down into all its little nodes: functions, classes, variables, conditions, loops, the whole cast of characters.
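To make the idea concrete, here's a small sketch using Python's built-in `ast` module on Python source. This is an assumption for illustration only: the post doesn't say which parser or languages Secrin uses, and `extract_nodes` is a made-up name.

```python
import ast
from typing import Dict, List


def extract_nodes(source: str) -> List[Dict[str, object]]:
    """Return the functions and classes found in a piece of Python source."""
    tree = ast.parse(source)
    nodes = []
    # ast.walk visits every node in the tree, including nested definitions.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        nodes.append({"kind": kind, "name": node.name, "line": node.lineno})
    # Sort by line number so the result follows the order in the file.
    return sorted(nodes, key=lambda n: n["line"])


SAMPLE = '''
class Greeter:
    def greet(self, name):
        return f"hi {name}"

def main():
    Greeter().greet("x")
'''

for item in extract_nodes(SAMPLE):
    print(item["kind"], item["name"], item["line"])
```

The same walk could be extended to variables, conditions, and loops by matching more node types such as `ast.Assign`, `ast.If`, and `ast.For`.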

3. Inserting Nodes into the Graph DB

Once we have these AST nodes, we drop them into the graph database.
A graph is a great fit for this because it lets us:

  • store relationships naturally
  • connect nodes like “function A calls function B”
  • track code structure like a real map instead of forcing everything into tables

Today I finished the core logic for inserting these nodes and connecting their relationships.
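As a rough illustration of "function A calls function B", here's a sketch that uses a tiny in-memory graph as a stand-in for the real graph database. The post doesn't name the database or its schema, so `CodeGraph`, `load_calls`, and the `CALLS` relationship label are all assumptions.

```python
import ast
from typing import Dict, Set, Tuple


class CodeGraph:
    """A toy in-memory graph standing in for a real graph database."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, str]] = {}
        self.edges: Set[Tuple[str, str, str]] = set()

    def add_node(self, node_id: str, **props: str) -> None:
        # Merge properties so inserting the same node twice is harmless.
        self.nodes.setdefault(node_id, {}).update(props)

    def add_edge(self, src: str, rel: str, dst: str) -> None:
        self.edges.add((src, rel, dst))


def load_calls(graph: CodeGraph, source: str) -> None:
    """Add one node per function and a CALLS edge per direct call."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    for fn in functions:
        graph.add_node(fn.name, kind="function")
    known = {fn.name for fn in functions}
    for fn in functions:
        for node in ast.walk(fn):
            # Only plain-name calls like b() are handled here;
            # method calls such as obj.b() are ignored in this sketch.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in known:
                    graph.add_edge(fn.name, "CALLS", node.func.id)


graph = CodeGraph()
load_calls(graph, "def b():\n    return 1\n\ndef a():\n    return b()\n")
print(sorted(graph.edges))
```

In a real graph database, `add_node` and `add_edge` would become write queries, ideally idempotent ones so that re-scanning a repository doesn't create duplicates.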

4. Storing Commits

Code changes over time.
Commits are like diary entries.
Now we store every commit as a node too, and link it to the files and AST structures it changed.
This makes the knowledge graph “time-aware,” which will help answer questions like:

  • “When did this function break?”
  • “Who last touched this part of the code?”
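Here's one way the commit side could be sketched: read `git log` with a custom format, turn each entry into a record, and keep the list of files it touched. The `Commit` class, the `@@` marker, and the helper names are hypothetical, and this links commits to file paths only, not to individual AST nodes.

```python
import subprocess
from dataclasses import dataclass, field
from typing import List, Optional

SEP = "\x1f"  # unit separator, unlikely to appear in author names
LOG_FORMAT = "@@%H%x1f%an%x1f%aI"  # marker, hash, author name, ISO 8601 date


@dataclass
class Commit:
    sha: str
    author: str
    date: str
    files: List[str] = field(default_factory=list)


def parse_git_log(output: str) -> List[Commit]:
    """Parse `git log --name-only` output produced with LOG_FORMAT."""
    commits: List[Commit] = []
    for line in output.split("\n"):
        if line.startswith("@@"):
            sha, author, date = line[2:].split(SEP)
            commits.append(Commit(sha, author, date))
        elif line.strip() and commits:
            # Any other non-blank line is a path changed by the current commit.
            commits[-1].files.append(line.strip())
    return commits


def read_commits(repo_path: str) -> List[Commit]:
    """Run git in `repo_path` and return its commits, newest first."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", f"--pretty=format:{LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return parse_git_log(result.stdout)


def last_commit_touching(commits: List[Commit], path: str) -> Optional[Commit]:
    # git log lists newest first, so the first match is the latest change.
    return next((c for c in commits if path in c.files), None)
```

With records like these, "who last touched this file?" becomes a lookup on the commit list, or an edge traversal once each commit is stored as a node linked to the files it changed.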

5. Adding Embeddings for Hybrid Search

This is where things get fun.
Embeddings turn text and code into vectors, so Secrin can measure how similar two pieces are in meaning.
We combine:

  • graph search (precise)
  • embedding search (fuzzy but smart)

This hybrid approach means the user can ask anything from “where is this variable used?” to “why does this function exist?” and Secrin can dig deep into both structure and meaning.
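A toy sketch of how the two signals might be combined: rank nodes by cosine similarity to the query embedding, then boost the ones that a graph query also returned. The scoring rule, the `graph_boost` value, and the hand-written vectors are assumptions for illustration; the post doesn't describe how Secrin actually merges the results.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(
    query_vec: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    graph_hits: Set[str],
    graph_boost: float = 0.5,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank nodes by embedding similarity, boosting exact graph matches."""
    scored = []
    for node_id, vec in embeddings.items():
        score = cosine(query_vec, vec)
        if node_id in graph_hits:
            # A node returned by the graph query is a precise structural
            # match, so it gets a fixed bonus on top of its similarity.
            score += graph_boost
        scored.append((node_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hand-made 2-D vectors; real embeddings would come from a model.
embeddings = {"parse_file": [1.0, 0.0], "clone_repo": [0.0, 1.0], "walk_tree": [0.8, 0.6]}
results = hybrid_search([1.0, 0.0], embeddings, graph_hits={"walk_tree"})
print([name for name, _ in results])
```

A fixed additive boost is the simplest possible choice. Other options include weighting the two scores or using graph results purely as a filter before ranking by similarity.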
