DEV Community

Derin Akay

Git Cluster RAG: Semantic Routing for Git History (Copilot CLI Challenge)

This is a submission for the GitHub Copilot CLI Challenge

Repository: https://github.com/dxa204/git-cluster-rag.git

What I Built

I built Git Cluster RAG, a command-line tool that uses K-Means Clustering to "route" questions about a repository's history to the correct context.

The Problem

Standard RAG (Retrieval-Augmented Generation) applications are "flat": every commit sits in one undifferentiated index. If you ask a question like "Why did we remove the notes file?", a standard vector search might retrieve unrelated commits just because they share keywords. A flat index has no way to distinguish between code refactoring, documentation updates, and one-off cleanups.

The Solution

I used the GitHub Copilot CLI to build a pipeline that:

  1. Ingests commit history (messages + file diffs).
  2. Embeds the changes using sentence-transformers.
  3. Clusters the commits using K-Means.
  4. Routes user queries to the specific semantic cluster (e.g., "Cluster 0: Maintenance") before retrieving answers.

This "Cluster-Guided" approach ensures that when I ask about a deleted file, the system prioritizes "Cleanup" commits over "Feature" commits.
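To make the clustering and routing steps concrete, here is a minimal sketch in plain Python. It is not the code from the repository: the function names (`kmeans`, `route`) are hypothetical, the 2-D vectors are toy stand-ins for real sentence-transformers embeddings, and the farthest-first seeding is a simplification chosen to keep the example deterministic (a library such as scikit-learn would normally handle the K-Means step).

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))


def kmeans(vectors, k, max_iter=50):
    """Lloyd's algorithm with deterministic farthest-first seeding (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from all chosen seeds.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def route(query_vec, vectors, centroids, labels, top_n=3):
    """Pick the query's cluster first, then rank only that cluster's commits."""
    cluster = nearest(query_vec, centroids)
    members = [i for i, lab in enumerate(labels) if lab == cluster]
    members.sort(key=lambda i: cosine(query_vec, vectors[i]), reverse=True)
    return cluster, members[:top_n]


# Toy 2-D vectors standing in for commit embeddings (three obvious groups).
vectors = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [-1, 0], [-0.9, -0.1]]
centroids, labels = kmeans(vectors, k=3)
cluster, hits = route([0.2, 0.8], vectors, centroids, labels)
```

The point of the two-stage lookup is that `route` never scores commits outside the chosen cluster, which is what separates this from a flat similarity search over every commit.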

Demo: Cluster-Guided Routing in Action

In this video, you can see the tool ingesting the git history, identifying the clusters, and then correctly routing a specific query about a deleted file to the "Maintenance" cluster.

https://youtu.be/FY4GY0uqMxI

My Experience with GitHub Copilot CLI

Building this project entirely with the Copilot CLI changed my workflow from "Stack Overflow searcher" to "Command Line architect."

  1. Scaffolding with Context
    I used the @workspace /new command to generate the entire project structure (ingest.py, cluster.py, chat.py) in one go. Instead of writing boilerplate, I could focus on the logic of the K-Means algorithm.

  2. The "Agent" Workflow
    The standout feature for me was the /init command. By running this, I was able to generate a .github/copilot-instructions.md file that taught Copilot the specific constraints of my project (e.g., "Always use 3 clusters", "Truncate diffs to 500 chars"). This effectively turned Copilot into a specialized teammate that knew my architecture, not just a generic code generator.

  3. Frictionless Debugging
    When I hit syntax errors or needed to generate dummy git data for testing, I didn't leave the terminal. I used gh copilot suggest to generate complex shell commands that created dummy commits, enabling me to test the clustering algorithm in seconds.
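As an illustration of how a constraint like "Truncate diffs to 500 chars" might show up in the ingest step, here is a hedged sketch. It assumes commits are read with `git log -p` and a custom `--pretty=format:` string using control characters as separators; the names `read_log`, `parse_log`, and `to_document` are hypothetical and do not describe the actual ingest.py in the repository.

```python
import subprocess

MAX_DIFF_CHARS = 500   # the truncation limit quoted in the instructions file
RECORD_SEP = "\x1e"    # marks the start of each commit in the log output
FIELD_SEP = "\x1f"     # separates the hash from the subject line


def read_log(repo_path="."):
    """Return raw `git log -p` output with one separator-prefixed header per commit."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--pretty=format:%x1e%H%x1f%s"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_log(raw):
    """Split raw log text into dicts with sha, message, and a truncated diff."""
    commits = []
    for record in raw.split(RECORD_SEP):
        if not record.strip():
            continue  # skip the empty chunk before the first separator
        header, _, rest = record.partition("\n")
        sha, _, subject = header.partition(FIELD_SEP)
        commits.append({
            "sha": sha,
            "message": subject,
            "diff": rest.strip()[:MAX_DIFF_CHARS],
        })
    return commits


def to_document(commit):
    """Text that would be handed to the embedding model: message plus diff."""
    return (commit["message"] + "\n" + commit["diff"]).strip()
```

Calling `parse_log(read_log("."))` inside a repository would yield one dict per commit. Capping the diff keeps a single large change from dominating the embedding input.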
