<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ishaan Mavinkurve</title>
    <description>The latest articles on DEV Community by Ishaan Mavinkurve (@idiotcoffee).</description>
    <link>https://dev.to/idiotcoffee</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3935689%2F05dbf0c1-43f1-40d3-a146-030a0662e429.png</url>
      <title>DEV Community: Ishaan Mavinkurve</title>
      <link>https://dev.to/idiotcoffee</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/idiotcoffee"/>
    <language>en</language>
    <item>
      <title>Building Dhrishti Part 2: Go-Lang Quirks</title>
      <dc:creator>Ishaan Mavinkurve</dc:creator>
      <pubDate>Sun, 31 May 2026 03:59:14 +0000</pubDate>
      <link>https://dev.to/idiotcoffee/building-dhrishti-part-2-go-lang-quirks-1hm4</link>
      <guid>https://dev.to/idiotcoffee/building-dhrishti-part-2-go-lang-quirks-1hm4</guid>
      <description>&lt;p&gt;&lt;em&gt;— written by a human!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now, my thinking about Dhrishti had evolved - I wanted to decouple the different steps of actually receiving telemetry which were originally bunched together into one single &lt;code&gt;loader.go&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;I made the following architecture:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;events.go&lt;/code&gt; - When my eBPF code ran, it would produce data as raw binary structs. Hence, my Go code, while reading the ring buffer, would get &lt;strong&gt;RAW BYTES&lt;/strong&gt;. In Go, I needed structs that would EXACTLY match the layout (field order, sizes and padding) of the structs written in my &lt;code&gt;bpf.c&lt;/code&gt; code. This agreement on binary layout is part of what is called the &lt;code&gt;Application Binary Interface&lt;/code&gt;, or ABI. It would allow my Go code to decode the bytes exactly and get the actual data in a readable format.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;receiver.go&lt;/code&gt; - This was the layer that would ingest my raw data by reading it continuously from the ring buffer. This needed some beautiful event-driven architecture, and this was actually the first time I had tried it out.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;normalize.go&lt;/code&gt; - Now, I had data in a machine-friendly form… my timestamps were in nanoseconds, my enums were numeric, my IP addresses were uint32 - this was useful to the machine, not so useful for me or other humans. I now needed to normalize the data and convert it to a human-readable form.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pipeline.go&lt;/code&gt; - This was the orchestrator, where different goroutines ran concurrently to receive the data emitted by my probes, normalize it and log it.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;attach.go&lt;/code&gt; - I needed this file to connect the probes to my receiver so I could start reading the events. It would load the object files, create the ring-buffer readers and attach the probes to the kernel.&lt;/li&gt;
&lt;/ol&gt;
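
&lt;p&gt;To make the &lt;code&gt;events.go&lt;/code&gt; and &lt;code&gt;normalize.go&lt;/code&gt; steps concrete, here is a minimal sketch of decoding one ring-buffer sample and turning it into readable values. It is not the actual Dhrishti code: the &lt;code&gt;RawEvent&lt;/code&gt; layout, the helper names, the little-endian host and the port already being in host byte order are all assumptions made for illustration.&lt;/p&gt;

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"net"
)

// RawEvent mirrors a hypothetical C struct written by the probe.
// The blank field is explicit padding so both sides agree on 28 bytes.
type RawEvent struct {
	Pid   uint32
	Comm  [16]byte
	Daddr uint32
	Dport uint16
	_     [2]byte
}

// decodeEvent turns one ring-buffer sample into a RawEvent.
// Little-endian is an assumption that holds on x86-64 and most arm64 hosts.
func decodeEvent(raw []byte) (RawEvent, error) {
	ev := new(RawEvent)
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, ev)
	return *ev, err
}

// commString cuts the fixed-size, NUL-padded command name at the first NUL.
func commString(comm [16]byte) string {
	n := bytes.IndexByte(comm[:], 0)
	if n == -1 {
		n = len(comm)
	}
	return string(comm[:n])
}

// ipString assumes the kernel stored the address in network byte order, so
// writing the integer back out in little-endian order restores the four octets.
func ipString(addr uint32) string {
	ip := make(net.IP, 4)
	binary.LittleEndian.PutUint32(ip, addr)
	return ip.String()
}

func main() {
	// Build a fake 28-byte sample the way the probe is assumed to emit it.
	raw := make([]byte, 28)
	binary.LittleEndian.PutUint32(raw[0:4], 1234)   // pid
	copy(raw[4:20], "curl")                         // comm
	copy(raw[20:24], []byte{10, 0, 0, 5})           // daddr, network order
	binary.LittleEndian.PutUint16(raw[24:26], 8080) // dport, assumed host order

	ev, err := decodeEvent(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("pid=%d comm=%s dst=%s:%d\n", ev.Pid, commString(ev.Comm), ipString(ev.Daddr), ev.Dport)
}
```

&lt;p&gt;Running it prints &lt;code&gt;pid=1234 comm=curl dst=10.0.0.5:8080&lt;/code&gt;. The explicit padding field matters because &lt;code&gt;binary.Read&lt;/code&gt; reads fields back to back, while a C compiler would normally pad a struct like this to a multiple of 4 bytes.&lt;/p&gt;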

&lt;p&gt;I thought this was clean enough architecture. Now, when I ran my basic server in docker, and ran the main.go program, I got:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3uqus6tijostuk5xpvf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3uqus6tijostuk5xpvf.png" alt="Connections with additional info" width="701" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beautiful. This did not look like much, but I was actually processing quite a few events. Now, I had to resolve the names of the docker containers, so I knew the actual connections rather than the IPs. I already had the functions to do this, and I just had to add them into the updated flow to get:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8yzqh7htc5i5xl4wcn8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd8yzqh7htc5i5xl4wcn8.png" alt="Connections after docker name resolution" width="771" height="233"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, it was time to take a bigger step. Until now, I was using a simple client-server architecture. This was good. However, I now wanted a real challenge for my project. &lt;br&gt;
So I made the following architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjm8fztdb2iy5nr7xdhn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbjm8fztdb2iy5nr7xdhn.png" alt="Architecture of microservices" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I built a micro-services architecture that was using this design. This would be a more complex, more real world test for Dhrishti. I dockerized the services, ran the containers, started Dhrishti.&lt;/p&gt;

&lt;p&gt;And the result?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6moticjmj42y6rwmvcs6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6moticjmj42y6rwmvcs6.png" alt="Lots and Lots of connections like in a production environment" width="799" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beautiful. All connections were seen correctly. &lt;/p&gt;

&lt;p&gt;Now, the next step was to actually make sense of all of these arrows. The raw telemetry I was getting was &lt;em&gt;stateless&lt;/em&gt;. That meant, it could only understand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;connect happened
close happened
accept happened
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But… who connected to whom? How long was the connection? How many connection attempts succeeded? &lt;/p&gt;

&lt;p&gt;To answer this, I decided to build a connection state. This would track a connection from open to close, and also track failed connections.&lt;/p&gt;

&lt;p&gt;I also had a separate problem - sometimes, I saw&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dot"&gt;&lt;code&gt;&lt;span class="nv"&gt;gateway&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;auth&lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;service&lt;/span&gt;
&lt;span class="nv"&gt;auth&lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;service&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;gateway&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was essentially a single request-response cycle, and I had to track it as such. So, I decided to construct a flow correlation engine. &lt;/p&gt;
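
&lt;p&gt;One simple way to treat both arrows as the same flow is to build a key that ignores direction. The sketch below only illustrates that idea and is not how Dhrishti necessarily does it: &lt;code&gt;FlowKey&lt;/code&gt;, &lt;code&gt;newFlowKey&lt;/code&gt; and &lt;code&gt;Correlator&lt;/code&gt; are hypothetical names, and matching on service names alone (rather than on addresses and ports) is a simplifying assumption.&lt;/p&gt;

```go
package main

import "fmt"

// FlowKey identifies a pair of services regardless of direction.
type FlowKey struct {
	A, B string
}

// newFlowKey orders the two names so that "gateway to auth-service" and
// "auth-service to gateway" map to the same key.
func newFlowKey(src, dst string) FlowKey {
	if src > dst {
		src, dst = dst, src
	}
	return FlowKey{A: src, B: dst}
}

// Correlator counts observed events per undirected flow.
type Correlator struct {
	events map[FlowKey]int
}

// NewCorrelator returns a Correlator with an initialized map.
func NewCorrelator() *Correlator {
	c := new(Correlator)
	c.events = make(map[FlowKey]int)
	return c
}

// Observe records one event between two services, in either direction.
func (c *Correlator) Observe(src, dst string) {
	c.events[newFlowKey(src, dst)]++
}

func main() {
	c := NewCorrelator()
	c.Observe("gateway", "auth-service")
	c.Observe("auth-service", "gateway")
	// One flow, seen twice.
	fmt.Println(len(c.events), c.events[newFlowKey("gateway", "auth-service")])
}
```

&lt;p&gt;A real engine would likely need more than names in the key (for example ports and a time window), otherwise unrelated requests between the same two services collapse into one flow.&lt;/p&gt;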

&lt;p&gt;The next problem I had to tackle was - if I saw a &lt;code&gt;closed=True&lt;/code&gt; with an &lt;code&gt;accept=False&lt;/code&gt; - that meant I was looking at a failed connection - it was never accepted by the server. I had to track these as well. I also had a problem with &lt;code&gt;short-lived connections&lt;/code&gt; - connections that were made and closed so fast that either I missed the connection entirely (which was okay, because I think &lt;strong&gt;telemetry services are lossy to some extent anyway&lt;/strong&gt;) or I recorded the connection open but not the connection close - which was a problem. Some graph edges remained open forever, which was not right. &lt;/p&gt;

&lt;p&gt;Hence, I added a cleaner - it would track connections that were open for more than 30 seconds (later reduced), close them and clean up memory. &lt;/p&gt;

&lt;p&gt;I also needed something that looked like real-time metrics. Currently, I was calculating the average latency between connections, for example. But when I observed my results, I saw that after a point, new connections barely moved the average, because a cumulative average weighs every old sample as much as a new one. I wanted to ensure that if something was failing, I knew it immediately - so I added calculations for&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- rolling window temporal calculations
- p95 latency (the latency that 95% of requests stay at or below)
- rolling averages (over a sliding window)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After adding these components, my metrics started to look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frkl6nhpi9v4bfky3ytyx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frkl6nhpi9v4bfky3ytyx.png" alt="Connections after adding a lot of information" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're thinking, "This is a LOT of information!" - yes, so was I. At this point, the client in my mock service was REALLY RAPIDLY sending requests to my API gateway, and it was becoming difficult to actually analyze my results.&lt;/p&gt;

&lt;p&gt;I even tried to add some time gaps between requests sent by the client in my mock service, and added a keep-alive time for my requests themselves… but the terminal logs were still going by too fast for me to understand anything. &lt;/p&gt;

&lt;p&gt;So, I decided to load up Cursor, and vibe-coded the entire front-end for my application. I just wanted a UI to view my metrics correctly. I was not concerned with UI polish for now. After a little bit of prompting, I decided to implement a Cytoscape.js graph (which would give me an interactive graph with a legend) as the front-end, fed over a WebSocket from my Go backend.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso4kmjyec2lebb0pn60b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fso4kmjyec2lebb0pn60b.png" alt="cytoscape.js front-end showing graph" width="800" height="684"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Okayy, this was looking pretty good! The connections that were active would be dotted lines, the colors of the connections represented the latencies, and hovering on a connection even gave me all the extra information - like connection life, p99 and p95 latencies, etc.&lt;/p&gt;

&lt;p&gt;It also exposed some Go-Lang related issues. This was the part where it got interesting. I had never worked with Go so heavily until now. I knew the concepts I was using and the documentation was &lt;strong&gt;VERY&lt;/strong&gt; comprehensive, but I still made some very interesting mistakes:&lt;/p&gt;

&lt;p&gt;I was using mutexes in a certain part of Dhrishti: basically, a Go listener would block, still holding its lock, until it heard a probe emit an event.&lt;br&gt;
This was directly messing with my server stats, because it caused deadlocks, with one goroutine waiting on the other to release its lock, and the other one waiting on the first - so I had to do some refactoring to prevent it.&lt;br&gt;
The next, more subtle issue was with Go’s own &lt;em&gt;Garbage Collector&lt;/em&gt;. This is the part of the Go runtime that periodically finds memory the program can no longer reach and frees it. This bug took me SO LONG to resolve, but when I finally had it, I was probably the happiest man alive for about 3 minutes. &lt;/p&gt;

&lt;p&gt;My app had &lt;strong&gt;4 “listeners”&lt;/strong&gt; plugged into Linux kernel events (like satellite dishes listening for TCP connect/accept/close activity from kernel space). Those listeners were created at startup and fed data into my Go pipelines. However, the GC only saw that these listener objects were created ONCE and then never used again - so it decided to clean them up, breaking my graph after around 20 to 30 seconds. I had to force these listener objects to stay alive for the full life of the app by storing them in the context of a goroutine that runs forever.&lt;/p&gt;

&lt;p&gt;In simple terms: I gave Go a permanent “don’t throw this away” reference. This was the first time I had run into problems with Go-Lang’s &lt;em&gt;quirks&lt;/em&gt;.&lt;/p&gt;
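
&lt;p&gt;A common way to hold such a permanent reference is to give the handles an explicit owner that lives as long as the program and closes them on shutdown. The sketch below shows that pattern in isolation. It assumes the listeners expose a &lt;code&gt;Close&lt;/code&gt; method and that the library releases the underlying kernel resource when a handle is collected (some Go eBPF libraries do this through finalizers); &lt;code&gt;Listeners&lt;/code&gt; and &lt;code&gt;fakeLink&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Listeners owns every long-lived handle (for example probe links and
// ring-buffer readers) so they stay reachable until Close is called.
type Listeners struct {
	handles []io.Closer
}

// Keep registers a handle with the owner.
func (l *Listeners) Keep(h io.Closer) {
	l.handles = append(l.handles, h)
}

// Close releases the handles in reverse order of registration.
func (l *Listeners) Close() error {
	var errs []error
	for i := len(l.handles) - 1; i >= 0; i-- {
		errs = append(errs, l.handles[i].Close())
	}
	l.handles = nil
	return errors.Join(errs...) // nil when every Close succeeded
}

// fakeLink stands in for a real probe handle in this sketch.
type fakeLink struct {
	name   string
	closed *[]string
}

func (f fakeLink) Close() error {
	*f.closed = append(*f.closed, f.name)
	return nil
}

// example registers four stand-in listeners and returns the order in
// which they were closed at shutdown.
func example() []string {
	closed := new([]string)
	var ls Listeners
	for _, name := range []string{"connect", "accept", "close", "state"} {
		ls.Keep(fakeLink{name: name, closed: closed})
	}
	// The event loop would run here; ls is still in use below, so the
	// handles it holds remain reachable for the whole run.
	if err := ls.Close(); err != nil {
		panic(err)
	}
	return *closed
}

func main() {
	fmt.Println(example())
}
```

&lt;p&gt;In a real program the owner would be created in &lt;code&gt;main&lt;/code&gt; with a deferred &lt;code&gt;Close&lt;/code&gt;, which both keeps the handles alive while the program runs and detaches the probes cleanly on exit.&lt;/p&gt;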

&lt;p&gt;Now, I had a working UI, a good amount of information from my Probes, some GREAT lessons by building the project in Go, and it was time to test out my project on something…. &lt;em&gt;bigger&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The next step was to set up and use a real, actual GitHub repo that replicated an application. I had options like Google’s Online Boutique, for example - which simulates a real e-commerce website with a lot of microservices. I also wanted to experiment with tools like hey and k6 to simulate production behaviour. But I am still building this phase out, and I will document it as I move forward. Let me know if you have some tips for this phase, please!&lt;/p&gt;

&lt;p&gt;Check out Dhrishti here: &lt;a href="https://github.com/IdiotCoffee/dhrishti" rel="noopener noreferrer"&gt;https://github.com/IdiotCoffee/dhrishti&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>devops</category>
      <category>architecture</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Dhrishti Part 1 - Building Runtime Observability for Distributed Systems</title>
      <dc:creator>Ishaan Mavinkurve</dc:creator>
      <pubDate>Thu, 28 May 2026 10:06:59 +0000</pubDate>
      <link>https://dev.to/idiotcoffee/dhrishti-part-1-building-runtime-observability-for-distributed-systems-5f2e</link>
      <guid>https://dev.to/idiotcoffee/dhrishti-part-1-building-runtime-observability-for-distributed-systems-5f2e</guid>
      <description>&lt;p&gt;— &lt;em&gt;written by a human!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Recently at work, I worked on a major project - Multitenancy.&lt;/p&gt;

&lt;p&gt;Initially, we used to provide one virtual machine to every customer that we acquired. This meant a lot of manual configuration, multiple deployments for a small hot-fix, and more importantly, a lot of time spent connecting to remote SSH sessions and debugging network issues. Multitenancy would fix this by basically allotting all customers to a single machine. This didn’t sound bad, but now think about the legacy code - all the MongoDB connections, for example, or my .env files - everything was customized to an individual instance, and I had to make it so that the application for each customer worked within the scope of their own organization. In short, I did not want data from one organization to be visible in another.&lt;/p&gt;

&lt;p&gt;The code itself was difficult to conceptualize, but not impossible. What I felt was harder were the migrations themselves. My team and I spent countless hours poring over connection errors, debugging Docker containerization issues, pointing our code to the correct env files - we almost gave up on this massive undertaking multiple times!&lt;/p&gt;

&lt;p&gt;Once we pulled through and this project was done, I began to wonder - what if there was some way to make this process easier?&lt;/p&gt;

&lt;p&gt;What if, through some coding magic, I could ACTUALLY make a graph to visualize all the network connections in an application? I could simply point my program to a docker container, and it would dive into the Kernel and reverse engineer its own architecture from system-calls to network events. &lt;/p&gt;

&lt;p&gt;I began doing some research, and I found the main character in this story - eBPF.&lt;/p&gt;

&lt;p&gt;What is eBPF?&lt;/p&gt;

&lt;p&gt;eBPF is a technology that would allow me to run sandboxed programs inside the Linux KERNEL. It would do so without modifying the kernel source or loading kernel modules, which can be unsafe.&lt;/p&gt;

&lt;p&gt;The Kernel in Linux handles all the cool stuff - TCP connections, when a process starts, how much memory is allocated, etc.&lt;/p&gt;

&lt;p&gt;eBPF would allow me to send a small “probe” into this Linux Kernel Space, and observe what happens around it. Then, any important or significant information would be emitted back to me.&lt;/p&gt;

&lt;p&gt;I like to think of it like &lt;em&gt;Voyager 1&lt;/em&gt;. (I love reading about space exploration!) This is a space probe that happens to be the FARTHEST human-made object from us - and we can still communicate with it!&lt;/p&gt;

&lt;p&gt;So, all I had to do was create a probe, send it out on an adventure into Kernel space, and have it emit events back to me. Simple. How would I capture the events it sent? Well, Claude suggested using a receiver, which I would write in Go, to collect these events. &lt;/p&gt;

&lt;p&gt;So I started. I opened up Zed and made 2 files - a server.py, and then a client.py. The client would simply send a request to the server every 3 seconds, and the server would return a Hello, world! response. &lt;/p&gt;

&lt;p&gt;Next, I put both of them into their own docker containers, with the client being dependent on the server container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frac29lsrttvpeewk5qxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frac29lsrttvpeewk5qxr.png" alt="Client-server file structure" width="229" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After that, I ran&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nx"&gt;docker&lt;/span&gt; &lt;span class="nx"&gt;compose&lt;/span&gt; &lt;span class="nx"&gt;up&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And boom, I had just created a sandbox environment wherein TCP connections were being made, and a real application was running.&lt;/p&gt;

&lt;p&gt;Now, I had to build a probe to venture out into the vast expanse of (Linux Kernel) space and emit discoveries! For this, I used the help of ChatGPT. I asked it to make me a probe that would run and collect TCP events. It made a probe using C, and also said:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faddltoiaas0rssjzc70o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faddltoiaas0rssjzc70o.png" alt="Chatgpt-message" width="613" height="115"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I always knew that space exploration could be dangerous, and I would never understand everything fully. But, at a high level, the code did the following:&lt;/p&gt;

&lt;p&gt;my probe would hook onto the Kernel, look for &lt;code&gt;tcp_connect&lt;/code&gt; events, extract the meta-data and emit it out.&lt;/p&gt;

&lt;p&gt;Also, to make sure I followed CO-RE principles (Compile Once, Run Everywhere), I had to make a vmlinux.h file with my &lt;strong&gt;kernel’s actual type definitions, extracted from BTF metadata, specifically for BPF programs.&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BTF (BPF Type Format) - this is the kernel’s type metadata.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those like me who didn’t understand a word of the above, basically, I knew my probes would run on MY kernel, but I could not guarantee that they would run on another Linux kernel, or that they would not break if the libraries I was using got updated. So, I had to make a file holding my kernel’s type definitions, which is what lets the probes be compiled once and still locate the right fields on other BTF-enabled kernels.&lt;/p&gt;

&lt;p&gt;So I ran this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nx"&gt;bpftool&lt;/span&gt; &lt;span class="nx"&gt;btf&lt;/span&gt; &lt;span class="nx"&gt;dump&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;sys&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;kernel&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;btf&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;vmlinux&lt;/span&gt; &lt;span class="nx"&gt;format&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;vmlinux&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I compiled the probe and loaded the resulting probe.o object file into the kernel. Now, I had a probe watching the docker sandbox application, and it was already emitting events. I now needed to make a receiver that would receive these events.&lt;/p&gt;

&lt;p&gt;For this, I wanted to collect the telemetry in a language that was fast, efficient and easily compiled, so that my activity of listening to the probe did not slow down the application that I was supposed to observe. &lt;br&gt;
Hence, I selected Go. &lt;br&gt;
Go is truly a beautiful language, and I really wanted to use it in a project after having learnt it a little while ago. I also came across some really cool quirks of Go which I had to work around (stay tuned, this is for Part 2!)&lt;/p&gt;

&lt;p&gt;In Go, I built a struct to collect the events that were being sent by the probe:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Event&lt;/span&gt; &lt;span class="nx"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;Pid&lt;/span&gt;   &lt;span class="nx"&gt;uint32&lt;/span&gt;
    &lt;span class="nx"&gt;Comm&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="nx"&gt;byte&lt;/span&gt;
    &lt;span class="nx"&gt;Daddr&lt;/span&gt; &lt;span class="nx"&gt;uint32&lt;/span&gt;
    &lt;span class="nx"&gt;Dport&lt;/span&gt; &lt;span class="nx"&gt;uint16&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also built a resolver that would create a Docker client and hold on to it for later lookups:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nx"&gt;func&lt;/span&gt; &lt;span class="nc"&gt;NewDockerResolver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;DockerResolver&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;cli&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NewClientWithOpts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;FromEnv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;WithAPIVersionNegotiation&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="nx"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;DockerResolver&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nl"&gt;cli&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cli&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, I was ready to connect to my probe and get some data! The concept was as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I would attach the satellite to the &lt;code&gt;tcp_connect&lt;/code&gt; probe, meaning, when the Linux Kernel made a new TCP connection, my code would run and gather telemetry&lt;/li&gt;
&lt;li&gt;Resolve the container name from the PID&lt;/li&gt;
&lt;li&gt;Based on the tcp_connect info, make a graph with 2 vertices to denote client and server (resolved from PID) and an edge denoting the dependency.&lt;/li&gt;
&lt;li&gt;Check every 5 seconds, collect telemetry, and also calculate number of requests that came in.&lt;/li&gt;
&lt;/ol&gt;
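
&lt;p&gt;Steps 3 and 4 boil down to a map from edges to request counts that gets read out on a timer. Here is a minimal sketch of that idea, assuming both endpoints have already been resolved to container names; &lt;code&gt;Edge&lt;/code&gt;, &lt;code&gt;Graph&lt;/code&gt;, &lt;code&gt;Observe&lt;/code&gt; and &lt;code&gt;Snapshot&lt;/code&gt; are illustrative names, not the ones used in &lt;code&gt;loader.go&lt;/code&gt;.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// Edge is one dependency: From opened a TCP connection to To.
type Edge struct {
	From, To string
}

// Graph counts how many connections were seen on each edge.
type Graph struct {
	counts map[Edge]int
}

func NewGraph() *Graph {
	g := new(Graph)
	g.counts = make(map[Edge]int)
	return g
}

// Observe records one tcp_connect event between two resolved containers.
func (g *Graph) Observe(from, to string) {
	g.counts[Edge{From: from, To: to}]++
}

// Snapshot returns one line per edge, sorted for stable output, and
// resets the counters so the next interval starts from zero.
func (g *Graph) Snapshot() []string {
	lines := make([]string, 0, len(g.counts))
	for e, n := range g.counts {
		lines = append(lines, fmt.Sprintf("%s -> %s (%d requests)", e.From, e.To, n))
	}
	sort.Strings(lines)
	g.counts = make(map[Edge]int)
	return lines
}

func main() {
	g := NewGraph()
	g.Observe("client", "server")
	g.Observe("client", "server")
	g.Observe("client", "server")
	for _, line := range g.Snapshot() {
		fmt.Println(line)
	}
}
```

&lt;p&gt;This prints &lt;code&gt;client -&amp;gt; server (3 requests)&lt;/code&gt;. To match the 5 second check described above, &lt;code&gt;Snapshot&lt;/code&gt; could be called from a loop driven by a &lt;code&gt;time.Ticker&lt;/code&gt;, with a mutex added if events arrive from another goroutine.&lt;/p&gt;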

&lt;p&gt;After a lot of experimentation, referring to docs and to ChatGPT, I managed to code out the steps exactly like this. My code was being orchestrated by a file called &lt;code&gt;loader.go&lt;/code&gt;, so I built it into an executable.&lt;/p&gt;

&lt;p&gt;Then, I ran my docker service and also my executable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldxzxuudu4m33lobbw0a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldxzxuudu4m33lobbw0a.png" alt="Results" width="708" height="230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Oh. My. God. I could talk to my PROBE!! I sat there for a good 15 minutes just looking at my telemetry logs. This was beautiful.&lt;/p&gt;

&lt;p&gt;It was also insufficient. This did not tell me EVERYTHING I wanted to know about my containers. But now, the basic idea was built. All I had to do was send out multiple probes that specialized in multiple types of data gathering, and make sure I collected ALL of that data.&lt;/p&gt;

&lt;p&gt;When this was done, all I had to do was make a beautiful (AI Generated) front-end to show this graph by polling an API repeatedly.&lt;/p&gt;

&lt;p&gt;With the help of ChatGPT, I built the following probes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;tcp_connect&lt;/code&gt; - allows me to find out which processes initiated an outbound TCP connection. This is the birth of a dependency graph. It was the core of my project.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tcp_close&lt;/code&gt; - tells me when a connection gets terminated. This would allow me to compute the lifetime of 1 connection.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tcp_accept&lt;/code&gt; - triggers when a server actually ACCEPTS a connection. This gives me server side visibility, whereas &lt;code&gt;tcp_connect&lt;/code&gt; gave me client side visibility. This would help detect failed connections, queue saturation, etc.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tcp_state&lt;/code&gt; - would tell me when the STATE of a connection changed. States like - established, fin_wait, time_wait, etc.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I wanted to start with this, but first, I needed to improve my coding architecture. I had a loader.go that was basically handling everything, and that would not be scalable as I added more probes. &lt;/p&gt;

&lt;p&gt;So I had to come up with a better architecture for my code, but the project wasn’t just an idea anymore!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Stay tuned for the second part, or feel free to check out my full project here!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/IdiotCoffee/dhrishti" rel="noopener noreferrer"&gt;https://github.com/IdiotCoffee/dhrishti&lt;/a&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>linux</category>
      <category>go</category>
      <category>distributedsystems</category>
    </item>
    <item>
      <title>Building KernelMind Part 3: Evaluation, Retrieval Ablations, RAGAS, and Turning The Project Into Something Measurable</title>
      <dc:creator>Ishaan Mavinkurve</dc:creator>
      <pubDate>Wed, 20 May 2026 01:29:24 +0000</pubDate>
      <link>https://dev.to/idiotcoffee/building-kernelmind-part-3-evaluation-retrieval-ablations-ragas-and-turning-the-project-into-334h</link>
      <guid>https://dev.to/idiotcoffee/building-kernelmind-part-3-evaluation-retrieval-ablations-ragas-and-turning-the-project-into-334h</guid>
      <description>&lt;p&gt;By this point, KernelMind had already evolved far beyond the original “embeddings over code” idea.&lt;/p&gt;

&lt;p&gt;The system now had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AST-aware chunking&lt;/li&gt;
&lt;li&gt;fully qualified symbol identities&lt;/li&gt;
&lt;li&gt;graph-aware retrieval&lt;/li&gt;
&lt;li&gt;hybrid BM25 + embedding search&lt;/li&gt;
&lt;li&gt;query-aware graph expansion&lt;/li&gt;
&lt;li&gt;cross-encoder reranking&lt;/li&gt;
&lt;li&gt;workflow reconstruction&lt;/li&gt;
&lt;li&gt;grounded answer synthesis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And honestly, the demos looked pretty convincing, which was kinda scary... because I knew from experience that retrieval systems are &lt;em&gt;extremely&lt;/em&gt; easy to overestimate when you only test them manually.&lt;/p&gt;

&lt;p&gt;If I asked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;How does login work?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The answer &lt;em&gt;sounded&lt;/em&gt; smart enough and my brain immediately started cooperating with the system.&lt;/p&gt;

&lt;p&gt;The issue was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“sounds correct” is not an evaluation strategy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At some point, I realized I had absolutely no reliable way to answer the question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Is KernelMind actually improving?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I needed the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;evaluation
↓
benchmarking
↓
retrieval ablations
↓
RAGAS scoring
↓
precision / recall analysis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Building A Retrieval Benchmark
&lt;/h2&gt;

&lt;p&gt;The first thing I needed was a benchmark suite grounded in the actual repository.&lt;/p&gt;

&lt;p&gt;Initially, I made the classic mistake:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"yeah I'll just manually write expected answers"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Terrible idea.&lt;/p&gt;

&lt;p&gt;Very quickly I realized that retrieval evaluation only works if the benchmark references:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;real indexed chunks&lt;/li&gt;
&lt;li&gt;real graph nodes&lt;/li&gt;
&lt;li&gt;real repository symbols&lt;/li&gt;
&lt;li&gt;real workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Otherwise you end up evaluating benchmark inaccuracies instead of retrieval quality. So I started inspecting the actual indexed graph and rebuilding benchmark questions around real repository functions.&lt;/p&gt;

&lt;p&gt;The benchmark suite eventually covered things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;authentication workflows&lt;/li&gt;
&lt;li&gt;password reset flows&lt;/li&gt;
&lt;li&gt;CRUD operations&lt;/li&gt;
&lt;li&gt;dependency injection&lt;/li&gt;
&lt;li&gt;database initialization&lt;/li&gt;
&lt;li&gt;middleware chains&lt;/li&gt;
&lt;li&gt;token generation&lt;/li&gt;
&lt;li&gt;API → CRUD traversal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, I finally had something measurable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Precision vs Recall
&lt;/h2&gt;

&lt;p&gt;Once the benchmark suite existed, the retrieval behavior became much clearer to reason about.&lt;/p&gt;

&lt;p&gt;And almost immediately, I noticed a pattern:&lt;/p&gt;

&lt;p&gt;KernelMind was actually very good at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;workflow reconstruction&lt;/li&gt;
&lt;li&gt;semantic neighborhoods&lt;/li&gt;
&lt;li&gt;execution flow retrieval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But precision was messy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Recall - Is my retriever actually getting all the required chunks for this answer?
Precision - How many of the retrieved chunks are relevant, and which ones are noise?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query:
How are users updated?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;might retrieve:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;create_user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;update_user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;delete_user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;read_users&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which sounds bad initially.&lt;/p&gt;

&lt;p&gt;But interestingly:&lt;br&gt;
the retriever clearly understood the &lt;em&gt;domain&lt;/em&gt; correctly.&lt;/p&gt;

&lt;p&gt;The remaining problem was: &lt;strong&gt;operation specificity&lt;/strong&gt;. That distinction became really important later.&lt;/p&gt;
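
&lt;p&gt;To make the two metrics concrete, here is a minimal sketch that scores the &lt;em&gt;How are users updated?&lt;/em&gt; example above. The &lt;code&gt;precision_recall&lt;/code&gt; helper and the single-item ground truth are assumptions for illustration, not KernelMind's actual evaluator.&lt;/p&gt;

```python
def precision_recall(retrieved, expected):
    """Return (precision, recall) for one benchmark question."""
    retrieved_set = set(retrieved)
    expected_set = set(expected)
    hits = retrieved_set.intersection(expected_set)
    # Precision: share of retrieved chunks that are relevant.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant chunks that were retrieved.
    recall = len(hits) / len(expected_set) if expected_set else 0.0
    return precision, recall


# The four chunks retrieved for "How are users updated?".
retrieved = ["create_user", "update_user", "delete_user", "read_users"]
expected = ["update_user"]  # assumed ground truth for this question
precision, recall = precision_recall(retrieved, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

&lt;p&gt;Under that assumption the retriever finds everything it needs (recall 1.00) while three of its four chunks are sibling-operation noise (precision 0.25), which is the operation-specificity problem in numbers.&lt;/p&gt;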
&lt;h2&gt;
  
  
  The First Ablation Test
&lt;/h2&gt;

&lt;p&gt;This was where I started learning about ablation testing.&lt;/p&gt;

&lt;p&gt;An ablation test is basically &lt;em&gt;remove one system component and observe what changes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The goal is to isolate whether a specific architectural layer is actually contributing measurable value or just making the pipeline look more complicated.&lt;/p&gt;

&lt;p&gt;So I started removing pieces of KernelMind individually and rerunning the evaluation benchmarks.&lt;/p&gt;
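
&lt;p&gt;A rough sketch of what such an ablation harness can look like. Everything here is hypothetical: the &lt;code&gt;PipelineConfig&lt;/code&gt; switches, the &lt;code&gt;run_ablation&lt;/code&gt; helper, and the toy retriever only stand in for the real pipeline, and the numbers it produces are not KernelMind's results.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # Hypothetical switches; the real project may wire these differently.
    graph_expansion: bool = True
    reranker: bool = True


def run_ablation(benchmark, retrieve, configs):
    """Score every config on the same benchmark and average the metrics."""
    report = {}
    for name, config in configs.items():
        precisions, recalls = [], []
        for case in benchmark:
            retrieved = set(retrieve(case["question"], config))
            expected = set(case["expected"])
            hits = len(retrieved.intersection(expected))
            precisions.append(hits / len(retrieved) if retrieved else 0.0)
            recalls.append(hits / len(expected) if expected else 0.0)
        report[name] = {
            "precision": sum(precisions) / max(len(precisions), 1),
            "recall": sum(recalls) / max(len(recalls), 1),
        }
    return report


# Toy stand-in for the real retriever, just to exercise the harness.
def toy_retrieve(question, config):
    chunks = ["login_access_token"]
    if config.graph_expansion:
        chunks += ["authenticate", "create_access_token", "test_token"]
    return chunks


benchmark = [{
    "question": "How does login work?",
    "expected": ["login_access_token", "authenticate", "create_access_token"],
}]
configs = {
    "full": PipelineConfig(),
    "no_graph_expansion": PipelineConfig(graph_expansion=False),
}
report = run_ablation(benchmark, toy_retrieve, configs)
for name, metrics in report.items():
    print(name, metrics)
```

&lt;p&gt;The important design choice is that every configuration runs against the same benchmark cases, so any change in the averages can be attributed to the component that was switched off.&lt;/p&gt;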

&lt;p&gt;The first major test: graph expansion.&lt;/p&gt;

&lt;p&gt;I disabled graph expansion entirely.&lt;/p&gt;
&lt;h2&gt;
  
  
  WITHOUT Graph Expansion
&lt;/h2&gt;

&lt;p&gt;KernelMind produced:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Precision: 0.267
Recall:    0.722
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The retrieval became cleaner.&lt;br&gt;
Less noisy.&lt;br&gt;
More focused.&lt;/p&gt;

&lt;p&gt;But:&lt;br&gt;
important workflow nodes started disappearing.&lt;/p&gt;

&lt;p&gt;Authentication flows became incomplete.&lt;br&gt;
Password reset chains broke apart.&lt;br&gt;
Execution flow reconstruction weakened significantly.&lt;/p&gt;

&lt;p&gt;Then I re-enabled graph expansion.&lt;/p&gt;
&lt;h2&gt;
  
  
  WITH Graph Expansion
&lt;/h2&gt;

&lt;p&gt;KernelMind produced:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Precision: 0.243
Recall:    1.000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That result gave me measurable evidence that &lt;strong&gt;graph traversal was actually improving workflow recovery.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The graph architecture was not decorative complexity anymore.&lt;/p&gt;

&lt;p&gt;It was contributing real retrieval value.&lt;/p&gt;

&lt;p&gt;And interestingly, the precision drop (0.267 to 0.243) was small compared to the recall improvement (0.722 to 1.000).&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Precision&lt;/th&gt;
&lt;th&gt;Recall&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No Graph Expansion&lt;/td&gt;
&lt;td&gt;0.267&lt;/td&gt;
&lt;td&gt;0.722&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Graph Expansion&lt;/td&gt;
&lt;td&gt;0.243&lt;/td&gt;
&lt;td&gt;1.000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That tradeoff actually makes sense for repository reasoning systems.&lt;/p&gt;

&lt;p&gt;Missing workflow-critical chunks is usually worse than retrieving a few extra neighboring functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cross Encoder Reranking
&lt;/h2&gt;

&lt;p&gt;The next ablation targeted the reranker.&lt;/p&gt;

&lt;p&gt;At this point, graph expansion was improving recall significantly, but it also widened the semantic neighborhood too aggressively.&lt;/p&gt;

&lt;p&gt;Authentication questions started retrieving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;password reset helpers&lt;/li&gt;
&lt;li&gt;email token utilities&lt;/li&gt;
&lt;li&gt;related middleware&lt;/li&gt;
&lt;li&gt;adjacent auth flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I disabled the cross-encoder reranker to isolate its effect.&lt;/p&gt;

&lt;p&gt;Almost immediately:&lt;br&gt;
precision degraded further.&lt;/p&gt;

&lt;p&gt;The reranker turned out to be extremely good at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;suppressing graph noise&lt;/li&gt;
&lt;li&gt;cleaning semantic drift&lt;/li&gt;
&lt;li&gt;removing unrelated neighboring chunks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That clarified something important for me. Each retrieval stage now had a very distinct responsibility:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;BM25&lt;/td&gt;
&lt;td&gt;lexical precision&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;embeddings&lt;/td&gt;
&lt;td&gt;semantic discovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;graph expansion&lt;/td&gt;
&lt;td&gt;workflow recovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;reranking&lt;/td&gt;
&lt;td&gt;precision cleanup&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That was the point where KernelMind stopped feeling like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"random retrieval layers stacked together"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and started feeling like an actual retrieval architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrieval Window Tuning
&lt;/h2&gt;

&lt;p&gt;Another interesting discovery appeared while evaluating precision - my retrieval window was too large. Initially, KernelMind retrieved around:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;8–10 chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;for many questions.&lt;/p&gt;

&lt;p&gt;That improved recall, but precision became diluted because the benchmarks usually expected only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1–4 relevant chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So I started experimenting with smaller retrieval windows.&lt;/p&gt;

&lt;h3&gt;
  
  
  K = 10
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Average Precision: ~0.175
Average Recall:    ~0.824
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  K = 5
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Average Precision: 0.276
Average Recall:    0.720
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  K = 4
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Average Precision: 0.339
Average Recall:    0.711
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This was one of the clearest retrieval tradeoffs in the entire project:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Retrieval Size&lt;/th&gt;
&lt;th&gt;Precision&lt;/th&gt;
&lt;th&gt;Recall&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;larger K&lt;/td&gt;
&lt;td&gt;lower precision&lt;/td&gt;
&lt;td&gt;higher recall&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;smaller K&lt;/td&gt;
&lt;td&gt;higher precision&lt;/td&gt;
&lt;td&gt;lower recall&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And honestly, seeing these tradeoffs emerge experimentally was incredibly satisfying because retrieval tuning stopped being &lt;em&gt;"vibes-based engineering"&lt;/em&gt; and became measurable system behavior.&lt;/p&gt;
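
&lt;p&gt;The same tradeoff can be reproduced on a toy example. This is only a sketch: the ranked list, the expected set, and the &lt;code&gt;precision_recall_at_k&lt;/code&gt; helper are made up for illustration and do not reproduce the benchmark numbers above.&lt;/p&gt;

```python
def precision_recall_at_k(ranked, expected, k):
    """Precision and recall when only the top-k ranked chunks are kept."""
    top_k = ranked[:k]
    hits = sum(1 for chunk in top_k if chunk in expected)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical ranking for one question, best candidate first.
ranked = [
    "update_user", "read_users", "get_current_user", "create_user",
    "delete_user", "get_db", "update_password", "send_email",
    "login_access_token", "init_db",
]
expected = {"update_user", "get_current_user", "update_password"}

for k in (10, 5, 4):
    precision, recall = precision_recall_at_k(ranked, expected, k)
    print(f"K={k:2d} precision={precision:.2f} recall={recall:.2f}")
```

&lt;p&gt;Shrinking K cuts off low-ranked noise, which lifts precision, but it also cuts off any relevant chunk that happened to rank low, which is where recall is lost.&lt;/p&gt;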

&lt;h2&gt;
  
  
  Integrating RAGAS
&lt;/h2&gt;

&lt;p&gt;Once retrieval stabilized, I finally moved into answer evaluation using RAGAS.&lt;/p&gt;

&lt;p&gt;This was another huge shift in mindset.&lt;/p&gt;

&lt;p&gt;Because retrieval quality alone does not necessarily guarantee:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;grounded explanations&lt;/li&gt;
&lt;li&gt;coherent synthesis&lt;/li&gt;
&lt;li&gt;faithful generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I started evaluating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;faithfulness&lt;/li&gt;
&lt;li&gt;answer relevancy&lt;/li&gt;
&lt;li&gt;context precision&lt;/li&gt;
&lt;li&gt;context recall&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I made a RAGAS evaluator file, but now I had a dilemma - RAGAS actually uses LLMs to evaluate other LLMs (crazy, I know!)&lt;br&gt;
So, I had to give it an API key - but which LLM should I evaluate with? I was on a budget with this side project, so I couldn't move directly to gpt-5.5, although it is considered the most precise evaluator.&lt;/p&gt;

&lt;p&gt;I also could not use Sarvam AI - because that was the LLM generating my answers, and I didn't really want any bias here (I don't know for sure if that's how it works, but I didn't want to take my chances!). So I decided to add two judges:&lt;br&gt;
an OpenAI judge with gpt-5-nano&lt;br&gt;
and a local Ollama model - Qwen2.5 7B&lt;/p&gt;

&lt;p&gt;I got my best scores when testing with Ollama, though that is probably partly because the small 7B-parameter model struggled to evaluate my large retrieved code contexts!&lt;/p&gt;

&lt;p&gt;Finally, KernelMind produced:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;faithfulness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.6080&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;answer_relevancy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7697&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_context_precision_without_reference&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5962&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context_recall&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5357&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Honestly, I was pretty happy with these results considering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;most things, except the synthesis step using Sarvam AI, ran locally&lt;/li&gt;
&lt;li&gt;the retrieval pipeline was graph-aware&lt;/li&gt;
&lt;li&gt;the system reconstructed workflows instead of isolated chunks&lt;/li&gt;
&lt;li&gt;the generation was grounded entirely in retrieved repository context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More importantly:&lt;br&gt;
The generated answers read like grounded, non-hallucinated, workflow-level answers rather than generic RAG output. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The login flow begins in login_access_token().
The route authenticates the user through crud.authenticate(),
then generates a JWT token using create_access_token(),
which downstream authenticated routes depend on through
FastAPI dependency injection.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was the moment KernelMind genuinely started feeling like: &lt;strong&gt;a repository reasoning assistant&lt;/strong&gt; instead of &lt;em&gt;vector search over code.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The TUI Phase
&lt;/h2&gt;

&lt;p&gt;And finally:&lt;br&gt;
once the retrieval and generation pipeline stabilized, I wanted a proper interface for interacting with the system.&lt;/p&gt;

&lt;p&gt;Could I have built a web app?&lt;/p&gt;

&lt;p&gt;Probably.&lt;/p&gt;

&lt;p&gt;Did I instead build a terminal UI because I use Linux and enjoy turning every side project into a cyberpunk terminal application?&lt;/p&gt;

&lt;p&gt;Absolutely.&lt;/p&gt;

&lt;p&gt;KernelMind now runs through a TUI built using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;textual&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rich&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The interface supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;conversational repository querying&lt;/li&gt;
&lt;li&gt;retrieval visualization&lt;/li&gt;
&lt;li&gt;grounded answer display&lt;/li&gt;
&lt;li&gt;live workflow exploration&lt;/li&gt;
&lt;li&gt;repository loading&lt;/li&gt;
&lt;li&gt;indexing feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And honestly, interacting with the system through the terminal felt surprisingly natural for this kind of project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7f4xtllomfuxj1thm0j2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7f4xtllomfuxj1thm0j2.jpg" alt="A tui interface showing a question, and the answer, along with the trace and expanded graph" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is something extremely satisfying about asking &lt;em&gt;How does authentication work?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;and watching a graph-aware retrieval engine reconstruct repository workflows directly inside the terminal.&lt;/p&gt;
&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;KernelMind started as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Repository → Embeddings → Search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It eventually evolved into:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query
↓
BM25 + Embedding Retrieval
↓
Hybrid Fusion
↓
Graph Expansion
↓
Graph-Aware Ranking
↓
Cross-Encoder Reranking
↓
Context Building
↓
Grounded Answer Generation
↓
Evaluation + RAGAS Benchmarking
↓
Conversational TUI Interface
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But honestly, I had never really planned any of these steps. Almost every architectural layer emerged because the previous one failed in some interesting way. And that was probably the most fun part of the project - exploring, engineering my way around problems and learning some new stuff along the way!&lt;/p&gt;

&lt;p&gt;GitHub Repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/IdiotCoffee/kernel-mind
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>rag</category>
      <category>showdev</category>
      <category>llm</category>
      <category>performance</category>
    </item>
    <item>
      <title>Building KernelMind Part 2: Hybrid Retrieval, Reranking, and Actually Retrieving Useful Code</title>
      <dc:creator>Ishaan Mavinkurve</dc:creator>
      <pubDate>Mon, 18 May 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/idiotcoffee/building-kernelmind-part-2-hybrid-retrieval-reranking-and-actually-retrieving-useful-code-46dl</link>
      <guid>https://dev.to/idiotcoffee/building-kernelmind-part-2-hybrid-retrieval-reranking-and-actually-retrieving-useful-code-46dl</guid>
      <description>&lt;p&gt;By the end of the first phase of KernelMind, the repository had stopped behaving like disconnected text. Functions now had identities and relationships attached to them. The graph architecture was finally stable enough to represent execution flow across the repository.&lt;/p&gt;

&lt;p&gt;The next challenge was obvious:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do I retrieve the &lt;em&gt;right&lt;/em&gt; parts of this graph efficiently?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That was where retrieval engineering began.&lt;/p&gt;

&lt;p&gt;Initially, I shifted the retrieval pipeline to operate directly on chunks retrieved from FAISS instead of querying raw documents from MongoDB. The idea was fairly simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use embeddings to retrieve likely entry points&lt;/li&gt;
&lt;li&gt;then use the graph to reconstruct surrounding execution context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination became the foundation of KernelMind’s retrieval pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Retrieval Pipeline
&lt;/h2&gt;

&lt;p&gt;The naive version of retrieval looked roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;all-MiniLM-L6-v2 + FAISS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I intentionally started lightweight because I wanted fast local experimentation while debugging retrieval behavior. At this stage, I was not trying to build the perfect retriever. I just wanted something fast enough to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;retrieve semantically relevant chunks&lt;/li&gt;
&lt;li&gt;test graph expansion&lt;/li&gt;
&lt;li&gt;debug execution flow reconstruction&lt;/li&gt;
&lt;li&gt;and iterate quickly without destroying my laptop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And honestly, embeddings worked reasonably well at first.&lt;/p&gt;

&lt;p&gt;Questions like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;How does authentication work?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;usually surfaced relevant code. But implementation-heavy queries struggled badly.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query: cookies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;might retrieve semantically similar request-handling logic instead of the actual cookie implementation.&lt;/p&gt;

&lt;p&gt;That was the first moment I realized something important:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;semantic similarity alone is not enough for repositories.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because repositories rely heavily on exact operational language, like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;* imports
* function names
* config values
* error strings
* middleware identifiers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Things embeddings sometimes blur together semantically.&lt;/p&gt;

&lt;h2&gt;
  
  
  BM25 vs Embeddings
&lt;/h2&gt;

&lt;p&gt;This was where BM25 entered the system. After reading more about BM25, my rough mental model became:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;embeddings understand meaning, BM25 understands exact language.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;BM25 is a lexical retrieval algorithm that ranks documents using exact token overlap, token rarity, and frequency instead of semantic similarity.&lt;/p&gt;

&lt;p&gt;That turned out to be extremely useful for repositories.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;create_user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;update_user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;delete_user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;all belong to the same semantic neighborhood. But operationally, they are completely different. Embeddings handled such conceptual understanding well.&lt;/p&gt;

&lt;p&gt;BM25 handled lexical precision much better.&lt;/p&gt;
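
&lt;p&gt;To show why, here is a small self-contained BM25 sketch in plain Python. It is not KernelMind's implementation: the tokenizer, the &lt;code&gt;k1&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt; defaults, the smoothed IDF variant, and the three stub chunks are all assumptions chosen for illustration.&lt;/p&gt;

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit,
    # so "update_user" becomes ["update", "user"].
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    docs = [tokenize(doc) for doc in documents]
    avg_len = sum(len(doc) for doc in docs) / len(docs)
    # Number of documents containing each term (drives token rarity).
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            n = doc_freq[term]
            # Smoothed IDF: rare terms weigh more, and it never goes negative.
            idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)
            tf = term_freq[term]
            # Length normalisation dampens long documents.
            norm = tf + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical chunk stubs from the same semantic neighborhood.
chunks = [
    "def create_user(session, user_create): ...",
    "def update_user(session, db_user, user_in): ...",
    "def delete_user(session, user_id): ...",
]
scores = bm25_scores("update user", chunks)
best = chunks[scores.index(max(scores))]
print(best)
```

&lt;p&gt;The token &lt;code&gt;user&lt;/code&gt; appears in every chunk, so it contributes very little, while &lt;code&gt;update&lt;/code&gt; appears in only one and dominates the score. That is the lexical precision embeddings tend to blur.&lt;/p&gt;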

&lt;p&gt;Neither alone was enough, so KernelMind evolved into hybrid retrieval. Instead of replacing embeddings entirely, I started combining both retrieval signals using Reciprocal Rank Fusion (a fancy term for merging two ranked result lists based on where each chunk ranks in both).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reciprocal Rank Fusion (RRF) helped combine both retrieval systems by
rewarding chunks that consistently appeared near the top across both FAISS
and BM25 results. 
That gave KernelMind a much more stable retrieval signal than relying on either retriever independently.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The retrieval pipeline slowly evolved into:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Embedding Retrieval + BM25 Retrieval + Reciprocal Rank Fusion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This improved retrieval quality almost immediately. The embedding retriever surfaced semantically relevant chunks. BM25 reinforced exact implementation-level details.&lt;/p&gt;

&lt;p&gt;And the fusion layer combined both into a much stronger retrieval baseline.&lt;/p&gt;
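
&lt;p&gt;Reciprocal Rank Fusion itself is only a few lines. The sketch below assumes the commonly used constant &lt;code&gt;k=60&lt;/code&gt; and two made-up rankings; the value KernelMind actually uses and the chunk IDs are not taken from the project.&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over the lists."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical top-3 results from each retriever, best first.
faiss_ranking = ["login_access_token", "authenticate", "get_current_user"]
bm25_ranking = ["authenticate", "create_access_token", "login_access_token"]

fused = reciprocal_rank_fusion([faiss_ranking, bm25_ranking])
print(fused)
```

&lt;p&gt;Because only ranks are used, the raw FAISS distances and BM25 scores never need to be put on a common scale. A chunk that sits near the top of both lists, like &lt;code&gt;authenticate&lt;/code&gt; here, beats one that tops a single list.&lt;/p&gt;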

&lt;h2&gt;
  
  
  Graph Expansion Over Retrieved Chunks
&lt;/h2&gt;

&lt;p&gt;Once hybrid retrieval stabilized, I started layering the graph architecture over the retrieved results themselves. This was one of the biggest shifts in the system.&lt;/p&gt;

&lt;p&gt;Initially, retrieval still operated mostly on isolated chunks returned from FAISS and BM25.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But repositories rarely store logic in one place.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Authentication systems, for example, are spread across routes, middleware, services, validators, token handlers, configuration, and dependency layers.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Retrieving one isolated chunk was often not enough to reconstruct execution flow.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So instead of treating retrieval results as final answers, I started treating them as entry points into the graph.&lt;/p&gt;

&lt;p&gt;The pipeline became:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Retrieve relevant chunks
↓
Expand neighboring execution context
↓
Rank expanded graph nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This improved workflow reconstruction dramatically.&lt;/p&gt;

&lt;p&gt;Questions like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;How does login create the access token?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;no longer returned disconnected helper functions. The graph expansion layer started surfacing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;* login routes
* auth middleware
* token creation
* validation flows
* session handling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;as connected execution context. This was the first time I started seeing actual repository-aware chunks being exposed in the pipeline.&lt;/p&gt;
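
&lt;p&gt;The retrieve, expand, rank steps in this section can be sketched as a bounded breadth-first walk over the call graph. This is a simplified assumption of how expansion might work: the adjacency dict, the &lt;code&gt;expand_seeds&lt;/code&gt; helper, the hop limit, and ranking purely by hop distance are all illustrative.&lt;/p&gt;

```python
from collections import deque


def expand_seeds(call_graph, seeds, max_hops=1):
    """Breadth-first expansion; returns {node: hops from the nearest seed}."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not walk past the hop limit
        for neighbor in call_graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Hypothetical call graph: caller mapped to the functions it calls.
call_graph = {
    "login_access_token": ["authenticate", "create_access_token"],
    "authenticate": ["get_user_by_email", "verify_password"],
    "create_access_token": [],
}

# Seeds are the chunks returned by hybrid retrieval.
expanded = expand_seeds(call_graph, ["login_access_token"], max_hops=1)
# Simplest possible ranking: closer to a seed ranks higher.
ranked = sorted(expanded, key=expanded.get)
print(ranked)
```

&lt;p&gt;The hop limit is the knob that trades workflow coverage against noise: each extra hop pulls in more of the execution flow, along with more neighbors that may not answer the question.&lt;/p&gt;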

&lt;h2&gt;
  
  
  Integrating the Cross Encoder
&lt;/h2&gt;

&lt;p&gt;Even hybrid retrieval and my powerful graph architecture (from the first post) still produced noisy candidates. Sibling-operation pollution became a recurring issue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;create_user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;update_user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;delete_user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;read_user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;would cluster together semantically even when only one of these actually answered the question. That was where cross encoder reranking entered the system. I started using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cross-encoder/ms-marco-MiniLM-L-6-v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initially, I didn't really know how a cross-encoder worked or whether it would be useful. So, I researched it. Basically, BM25 scores each chunk by its literal lexical overlap with the query (great for exact matches), whereas my cross-encoder would take both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(query + chunk)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;together and directly predict relevance using &lt;em&gt;neural relevance evaluations&lt;/em&gt;. That distinction mattered a lot. The reranker became really good at cleaning up semantically adjacent but incorrect retrievals, especially after graph expansion widened the context.&lt;/p&gt;

&lt;p&gt;Questions like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;How does login create the access token?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;started consistently surfacing the right chunks instead of unrelated utility code nearby in semantic space.&lt;/p&gt;

&lt;p&gt;The reranker essentially became a way to restore &lt;em&gt;precision&lt;/em&gt; after graph expansion.&lt;/p&gt;
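
&lt;p&gt;A minimal sketch of the reranking step. To keep it runnable without downloading a model, the scorer is passed in as a function and a toy word-overlap scorer stands in for the neural model. With the sentence-transformers library, the &lt;code&gt;predict&lt;/code&gt; method of &lt;code&gt;CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")&lt;/code&gt; could be passed instead. The &lt;code&gt;rerank&lt;/code&gt; helper, its &lt;code&gt;top_k&lt;/code&gt; default, and the candidate stubs are assumptions.&lt;/p&gt;

```python
def rerank(query, chunks, score_pairs, top_k=4):
    """Score every (query, chunk) pair jointly and keep the best top_k chunks."""
    pairs = [(query, chunk) for chunk in chunks]
    scores = score_pairs(pairs)
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in order[:top_k]]


# Toy scorer so the sketch runs offline: it counts query words that
# appear in the chunk. It only mimics the interface of a cross-encoder,
# not its quality.
def toy_score_pairs(pairs):
    return [
        sum(1 for word in query.lower().split() if word in chunk.lower())
        for query, chunk in pairs
    ]


# Hypothetical candidates after graph expansion.
candidates = [
    "def create_user(session, user_create): ...",
    "def update_user(session, db_user, user_in): ...",
    "def delete_user(session, user_id): ...",
]
top = rerank("update user", candidates, toy_score_pairs, top_k=1)
print(top)
```

&lt;p&gt;The key difference from the earlier stages is that the query and chunk are scored as one pair rather than embedded separately, which is what lets a real cross-encoder separate sibling operations that sit close together in embedding space.&lt;/p&gt;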

&lt;h2&gt;
  
  
  Choosing The Generation Model
&lt;/h2&gt;

&lt;p&gt;Once retrieval quality became stable enough, I finally started experimenting more seriously with answer generation. I had all these chunks, and all the metadata that came with them, but for a human to make sense of it, it had to be in a proper readable format. This is where LLMs came in.&lt;/p&gt;

&lt;p&gt;I tested several local and hosted models during development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT-4o-mini&lt;/li&gt;
&lt;li&gt;GPT-5-nano&lt;/li&gt;
&lt;li&gt;Qwen 2.5 Code&lt;/li&gt;
&lt;li&gt;and Sarvam’s absurdly generous free 105B model, which occasionally spoke enough sweet architectural encouragement into my ears for me to add another retrieval layer at 2 AM.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Eventually, Sarvam's 105b parameter model became the primary generation model because it gave me very good quality results FOR FREE and did not try to fry my GPU like the local models.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Architecture Changed
&lt;/h2&gt;

&lt;p&gt;Originally, KernelMind looked something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Embeddings → Retrieval → Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Eventually, it evolved into:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query
↓
BM25 Retrieval + Embedding Retrieval
↓
Reciprocal Rank Fusion (RRF)
↓
Query-Aware Seed Reranking
↓
Graph Expansion + Graph-Aware Ranking
↓
Cross-Encoder Reranking
↓
Context Building
↓
Answer Generation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But - none of this architecture was pre-planned. Almost every layer was built because I observed some failures in the previous layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;embeddings missed identifiers&lt;/li&gt;
&lt;li&gt;retrieval lost workflow context&lt;/li&gt;
&lt;li&gt;graph expansion introduced noise&lt;/li&gt;
&lt;li&gt;re-ranking restored precision&lt;/li&gt;
&lt;li&gt;orchestration improved grounding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After a little bit of fine-tuning and prompt engineering, my final answer started coming up looking like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Q. How is login handled in the fastapi library?
A. The login flow begins in `login_access_token()`
inside `backend/app/api/routes/login.py`.

When a POST request is sent to the login endpoint,
 FastAPI injects the submitted credentials through 
`OAuth2PasswordRequestForm`. The route then calls 
`crud.authenticate()` to validate the username and 
password against the database.


If authentication fails or the user is inactive, the
 API raises an HTTP 400 error. If authentication 
succeeds, the system generates a JWT access token 
using `security.create_access_token()`. The token 
includes the user ID and an expiration time 
configured through `ACCESS_TOKEN_EXPIRE_MINUTES`.


Finally, the endpoint returns a `Token` response
 containing the generated access token.

The retrieved workflow also shows that authenticated
 endpoints like `test_token()` depend on the 
validity of this token through FastAPI dependency 
injection, linking token generation directly to 
downstream protected routes.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My project evolved incrementally through debugging and experimentation rather than some giant architectural master plan. And once answer generation stabilized, a much harder question appeared:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;How do I actually KNOW whether the system is improving?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because retrieval systems are easy to overestimate when you only test them manually. That eventually led into the next phase of the project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;evaluation&lt;/li&gt;
&lt;li&gt;RAGAS benchmarking&lt;/li&gt;
&lt;li&gt;retrieval ablations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and figuring out whether the architecture changes were genuinely improving the system or just looking impressive during demos.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>showdev</category>
      <category>llm</category>
    </item>
    <item>
      <title>Building KernelMind, A Code-Aware Github Companion</title>
      <dc:creator>Ishaan Mavinkurve</dc:creator>
      <pubDate>Sun, 17 May 2026 14:00:00 +0000</pubDate>
      <link>https://dev.to/idiotcoffee/building-kernelmind-a-code-aware-github-companion-i3k</link>
      <guid>https://dev.to/idiotcoffee/building-kernelmind-a-code-aware-github-companion-i3k</guid>
      <description>&lt;p&gt;I have always wanted to contribute to Open Source Projects on GitHub. If you check out my profile, you will see that I have even tried to get into it. But once I went past the documentation changes and minor fixes, I realized that OSS contributions were &lt;strong&gt;HARD&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So, I decided to code a RAG project that would help me out. Of course, I could just use the inbuilt coding agents in the IDE, but where's the fun in that? &lt;br&gt;
The original version of KernelMind was pretty basic.&lt;br&gt;
I just wanted a way to ask questions about large repositories without manually opening forty files and mentally reconstructing execution flow.&lt;/p&gt;

&lt;p&gt;At the time, the plan looked straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Repository -&amp;gt; AST Parsing -&amp;gt; Chunk Extraction -&amp;gt; Embeddings -&amp;gt; Vector Search -&amp;gt; Answer Generation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was it. No fancy business. Just embeddings over code. But it broke immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The First Hurdles&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The first step was parsing. I made a basic AST parser and ran it against a deliberately small repository, storing my chunks in MongoDB for now. I wanted something predictable so debugging would be easier. I decided to use &lt;code&gt;full-stack-fastapi-template&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The indexing pipeline finished and printed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Inserted 1258 chunks.
Checked 57 files.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That made absolutely no sense. There was no way a small repository like that should explode into that many chunks. So I started tracing the parser output manually.&lt;/p&gt;

&lt;p&gt;The first issue was trivial. I was ingesting... everything. Tests, initializers, EVERYTHING. This was a small fix ... I added a simple &lt;code&gt;IGNORE_LIST&lt;/code&gt; that would skip the garbage files and only download the relevant Python files.&lt;/p&gt;

&lt;p&gt;The second issue was slightly more confusing: it turned out that methods inside classes were being extracted twice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;once correctly as methods&lt;/li&gt;
&lt;li&gt;once incorrectly as standalone functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This meant that no chunk in my system had a concept of unique identity.&lt;/p&gt;

&lt;p&gt;Everything was just “chunks.” &lt;em&gt;And chunks had repetitive content...&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Another related problem:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Originally, the parser stored function names like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;__init__&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which is technically valid. It is also practically useless.&lt;/p&gt;

&lt;p&gt;There could be dozens of &lt;code&gt;__init__&lt;/code&gt; methods across the repository.&lt;/p&gt;

&lt;p&gt;So I introduced this (totally cool and non-ChatGPT-researched) concept - &lt;strong&gt;Fully Qualified Names.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;__init__&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the system generated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;matplotlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Figure&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__init__&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single architectural change completely shifted the project. FQNs were now the atomic elements in the data - an FQN would be completely unique across the entire repo. Now, while parsing, I only had to construct each FQN once - if another function produced an FQN I had already seen, it had already been parsed, so I could ignore it.&lt;/p&gt;
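
&lt;p&gt;A rough sketch of that idea using Python's built-in &lt;code&gt;ast&lt;/code&gt; module. The helper name &lt;code&gt;collect_fqns&lt;/code&gt;, the module name &lt;code&gt;app.figure&lt;/code&gt;, and the sample source are all made up for illustration; the real parser does more than this.&lt;/p&gt;

```python
import ast


def collect_fqns(source, module_name):
    """Return {fqn: kind} for every function and method in one module."""
    symbols = {}

    def visit(node, prefix, in_class):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                # Everything defined inside the class gets the class in its prefix.
                visit(child, f"{prefix}.{child.name}", True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                fqn = f"{prefix}.{child.name}"
                # setdefault keeps the first record, so an FQN is stored only once.
                symbols.setdefault(fqn, "method" if in_class else "function")
                visit(child, fqn, False)
            else:
                # Keep looking inside if/try/with blocks and similar nodes.
                visit(child, prefix, in_class)

    visit(ast.parse(source), module_name, False)
    return symbols


sample = '''
class Figure:
    def __init__(self):
        pass

    def draw(self):
        pass

def draw():
    pass
'''

symbols = collect_fqns(sample, "app.figure")
for fqn, kind in symbols.items():
    print(kind, fqn)
```

&lt;p&gt;Because the class name is part of the prefix, the method &lt;code&gt;Figure.draw&lt;/code&gt; and the module-level &lt;code&gt;draw&lt;/code&gt; get different FQNs, and a method is never recorded a second time as a standalone function.&lt;/p&gt;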

&lt;p&gt;Now that symbols had stable identities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;imports could resolve properly&lt;/li&gt;
&lt;li&gt;dependencies became traceable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repository stopped behaving like disconnected text.&lt;/p&gt;

&lt;p&gt;It started behaving like a connected system.&lt;/p&gt;

&lt;h2&gt;
  
  
  The “self” Problem
&lt;/h2&gt;

&lt;p&gt;One of the &lt;strong&gt;MOST CONFUSING&lt;/strong&gt; bugs came from method calls.&lt;/p&gt;

&lt;p&gt;Initially, method relationships looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;self.get_host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which looks reasonable at first glance ... except &lt;code&gt;self&lt;/code&gt; means nothing globally.&lt;/p&gt;

&lt;p&gt;A graph cannot reason over:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;self.get_host
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;because it has no stable reference. So I had to build resolution logic that converted local method calls into globally addressable symbols.&lt;/p&gt;

&lt;p&gt;Eventually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src.requests.cookies.MockRequest.get_host&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;started appearing in the graph output. That was a huge leap for me - my system was no longer parsing syntax alone. It was starting to reconstruct semantic relationships.&lt;/p&gt;
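
&lt;p&gt;The core of that resolution step can be sketched like this. It is a simplified illustration, assuming the called method is defined on the same class (no inheritance lookup); &lt;code&gt;resolve_self_calls&lt;/code&gt; and the tiny sample class are hypothetical stand-ins, not the actual KernelMind resolver or the real &lt;code&gt;requests&lt;/code&gt; source.&lt;/p&gt;

```python
import ast


def resolve_self_calls(source, module_name):
    """Map each method FQN to the FQNs of the self.method() calls it makes."""
    calls = {}
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        class_fqn = f"{module_name}.{cls.name}"
        for method in cls.body:
            if not isinstance(method, ast.FunctionDef):
                continue
            resolved = []
            for node in ast.walk(method):
                # Match calls shaped like self.NAME(...)
                if (
                    isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and isinstance(node.func.value, ast.Name)
                    and node.func.value.id == "self"
                ):
                    # Swap the local "self" for the enclosing class FQN.
                    resolved.append(f"{class_fqn}.{node.func.attr}")
            calls[f"{class_fqn}.{method.name}"] = resolved
    return calls


sample = '''
class MockRequest:
    def get_host(self):
        return "example.org"

    def get_origin_req_host(self):
        return self.get_host()
'''

calls = resolve_self_calls(sample, "src.requests.cookies")
print(calls)
```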

&lt;p&gt;Once FQNs entered the system, something clicked for me almost immediately.&lt;/p&gt;

&lt;p&gt;I realized I was no longer dealing with isolated chunks of text. Every function now had identity, relationships, callers, callees, imports, and dependencies. The repository was starting to look far less like a document collection and much more like a &lt;em&gt;graph data structure&lt;/em&gt; describing &lt;em&gt;execution flow.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Building The Graph&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;And once I saw the repository that way, a lot of the later architecture decisions suddenly started making sense.&lt;/p&gt;

&lt;p&gt;The next obvious question became:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If functions are connected, could I retrieve them together?&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That question basically led to the entire graph architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Constructing Relationships
&lt;/h2&gt;

&lt;p&gt;The first step was building explicit call relationships. Whenever the parser encountered a function call, I attempted to resolve it into an FQN and create a directed edge:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;caller -&amp;gt; callee
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So if:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;login_user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;called:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;create_access_token&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;the graph stored that relationship directly.&lt;/p&gt;

&lt;p&gt;Initially, the graph nodes were fairly simple. Each node stored:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- the FQN
- file path
- source code
- outgoing calls
- incoming calls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Something roughly like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GraphNode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;called_by&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At first, this mainly helped with debugging. Then I realized the graph could fundamentally improve retrieval itself. Because codebases are not isolated files. They are execution systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Forward And Reverse Traversal&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Once the graph structure stabilized, I realized traversal needed to work in both directions. Forward traversal helped answer questions like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;“What does this function eventually call?”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;which was useful for reconstructing execution flow and understanding downstream behavior. Reverse traversal was equally important because it answered:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; “Who depends on this logic?”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That became extremely useful for tracing middleware usage, validation chains, service dependencies, and understanding how deeply certain functionality was integrated into the repository.&lt;/p&gt;

&lt;p&gt;I decided to implement naive BFS - semantic search (implemented later) would reveal the start node most similar to the query, and then BFS would reveal other function calls (and other "chunks") that were related to that node.&lt;/p&gt;

&lt;p&gt;Together, forward and reverse traversal made the graph feel much less like static metadata and much more like a navigable execution map of the repository.&lt;/p&gt;

&lt;p&gt;Once I switched traversal to BFS, retrieval immediately started feeling more coherent.&lt;/p&gt;
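
&lt;p&gt;Here is a minimal sketch of depth-limited BFS in both directions, assuming the graph is a plain dict of &lt;code&gt;caller -&amp;gt; callees&lt;/code&gt;. The edges below are invented for the example, and this &lt;code&gt;bfs_expand&lt;/code&gt; is a simplified stand-in with a smaller signature than the real one.&lt;/p&gt;

```python
from collections import deque


def bfs_expand(graph, seeds, depth=2):
    """Breadth-first expansion; returns {node: hops from the nearest seed}."""
    seen = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand past the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


def reverse(graph):
    """Flip caller -> callee edges into callee -> caller edges."""
    flipped = {}
    for caller, callees in graph.items():
        for callee in callees:
            flipped.setdefault(callee, []).append(caller)
    return flipped


# Hypothetical call graph.
calls = {
    "login_access_token": ["authenticate", "create_access_token"],
    "authenticate": ["get_user_by_email", "verify_password"],
    "test_token": ["get_current_user"],
    "get_current_user": ["decode_token"],
}

downstream = bfs_expand(calls, ["login_access_token"], depth=2)  # forward traversal
upstream = bfs_expand(reverse(calls), ["verify_password"], depth=2)  # reverse traversal
print(downstream)
print(upstream)
```

&lt;p&gt;Forward traversal is just BFS over the call edges, and reverse traversal is the same BFS over the flipped edges, so one function covers both questions.&lt;/p&gt;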

&lt;h2&gt;
  
  
  Query-Aware Expansion
&lt;/h2&gt;

&lt;p&gt;The next problem was the &lt;em&gt;naive&lt;/em&gt; BFS implementation. Naive graph expansion retrieves way too much context. If you blindly expand neighbors inside a large repository, the graph explodes into noise very quickly. Especially around highly connected framework code.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So graph expansion had to become query-aware.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Instead of expanding everything equally, the system started looking at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- symbol overlap
- semantic similarity
- auth-related terminology
- file roles
- query keywords
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;before deciding what to expand.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query = authentication
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;should prioritize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;token middleware&lt;/li&gt;
&lt;li&gt;JWT validation&lt;/li&gt;
&lt;li&gt;auth decorators&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;generic request logging&lt;/li&gt;
&lt;li&gt;unrelated utilities&lt;/li&gt;
&lt;li&gt;serialization helpers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once I managed to code this in, the graph was no longer purely structural. It was becoming semantic.&lt;/p&gt;
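
&lt;p&gt;A toy version of that filter, using only keyword overlap between the query terms and the pieces of each neighbor's FQN. The real system also weighs semantic similarity and file roles; the function names, FQNs, and query terms here are assumptions for illustration.&lt;/p&gt;

```python
import re


def tokens(text):
    """Lowercase word pieces, splitting on dots, underscores, and punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_neighbors(query_terms, neighbors, min_overlap=1):
    """Keep neighbors whose FQN shares at least min_overlap terms with the query."""
    scored = []
    for fqn in neighbors:
        overlap = len(tokens(fqn).intersection(query_terms))
        if overlap >= min_overlap:
            scored.append((overlap, fqn))
    # Highest overlap first; ties are broken alphabetically.
    ordered = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [fqn for overlap, fqn in ordered]


# Hypothetical query terms and candidate neighbors.
query_terms = {"auth", "token", "jwt", "login"}
neighbors = [
    "app.core.security.create_access_token",
    "app.utils.log_info",
    "app.api.deps.validate_jwt_token",
    "app.utils.serialize_response",
]

expanded = rank_neighbors(query_terms, neighbors)
print(expanded)
```

&lt;p&gt;The logging and serialization helpers never make it into the expansion, because nothing in their names overlaps with the query.&lt;/p&gt;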

&lt;h2&gt;
  
  
  The Utility Node Problem
&lt;/h2&gt;

&lt;p&gt;Another issue appeared during expansion. Highly connected utility functions started dominating retrieval.&lt;/p&gt;

&lt;p&gt;Things like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;log_info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;handle_error&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;serialize_response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;showed up everywhere. The graph accidentally rewarded centrality. Which sounds mathematically elegant until your retrieval system starts implying logging is the answer to everything, simply because that function appeared 1000 times...&lt;/p&gt;

&lt;p&gt;So I introduced penalties for high-degree nodes. Highly connected utility-heavy functions received lower expansion priority. This is similar to how TF-IDF down-weights terms that appear in almost every document, except applied to function calls.&lt;/p&gt;

&lt;p&gt;That cleanup improved retrieval quality far more than I expected ... because now the graph stopped constantly expanding into irrelevant framework plumbing.&lt;/p&gt;
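
&lt;p&gt;One way to express that penalty is an IDF-style weight based on how many callers a node has. This is a sketch under my own assumptions (the exact formula, the function name, and the numbers are illustrative), not the scoring KernelMind actually ships with.&lt;/p&gt;

```python
import math


def degree_penalty(in_degree, total_nodes):
    """IDF-style weight: 1.0 for a node nobody calls, 0.0 for one everybody calls."""
    return math.log((1 + total_nodes) / (1 + in_degree)) / math.log(1 + total_nodes)


# Hypothetical reverse edges: who calls each function.
called_by = {
    "log_info": ["a", "b", "c", "d", "e", "f"],
    "create_access_token": ["login_access_token"],
}
total_nodes = 7  # assumed size of this toy graph

weights = {
    fqn: degree_penalty(len(callers), total_nodes)
    for fqn, callers in called_by.items()
}
for fqn, weight in weights.items():
    print(fqn, round(weight, 3))
```

&lt;p&gt;Multiplying a neighbor's relevance score by this weight keeps a helper like &lt;code&gt;log_info&lt;/code&gt; from winning just because it is connected to everything.&lt;/p&gt;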

&lt;h2&gt;
  
  
  &lt;strong&gt;Semantic Graph Expansion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This was where the architecture started becoming much more interesting. Originally, graph relationships were purely structural:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A calls B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Eventually, I started combining graph relationships with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic similarity&lt;/li&gt;
&lt;li&gt;symbol relevance&lt;/li&gt;
&lt;li&gt;query intent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;so the traversal could prioritize execution paths actually related to the user’s question instead of blindly expanding every connected node.&lt;/p&gt;

&lt;p&gt;This made a huge difference for repository reasoning. Queries about authentication naturally began surfacing middleware chains, token validation logic, and request lifecycle flows instead of drifting into unrelated utility code and framework plumbing.&lt;/p&gt;

&lt;p&gt;The traversal pipeline slowly evolved into something closer to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;initial_retrieval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;expanded&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bfs_expand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query_aware&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;semantic_weighting&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, my retrieval architecture started feeling execution-aware.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Biggest Realization
&lt;/h2&gt;

&lt;p&gt;This entire phase fundamentally changed how I thought about retrieval systems. Originally, I assumed retrieval quality depended mostly on embeddings.&lt;/p&gt;

&lt;p&gt;Eventually I realized:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Retrieval quality depends heavily on structure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The graph was improving retrieval not because the model became smarter, but because the context became more coherent. The system stopped retrieving isolated functions. It started retrieving workflows.&lt;/p&gt;

&lt;p&gt;And finally, the graph structure had stabilized:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- symbol identity existed
- traversal worked
- execution flow became traceable
- relationships became meaningful
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All this time, I was working with MongoDB and storing the "chunks" in a collection. This was excellent for debugging, but now that my repository structure had stabilized and I was confident enough in my graph architecture, I was ready to move into embeddings and retrieval ranking properly.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 2 is coming up soon! Until then, you can check out my code &lt;a href="https://github.com/IdiotCoffee/kernel-mind" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>programming</category>
      <category>learning</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
