This is a submission for the GitHub Copilot CLI Challenge
Repository: https://github.com/dxa204/git-cluster-rag.git
What I Built
I built Git Cluster RAG, a command-line tool that uses K-Means Clustering to "route" questions about a repository's history to the correct context.
The Problem
Standard RAG (Retrieval-Augmented Generation) applications are "flat." If you ask a question like "Why did we remove the notes file?", a standard vector search might retrieve unrelated commits just because they share keywords. It struggles to distinguish between Code Refactoring, Documentation Updates, and One-off Cleanups.
The Solution
I used the GitHub Copilot CLI to build a pipeline that:
- Ingests commit history (messages + file diffs).
- Embeds the changes using sentence-transformers.
- Clusters the commits using K-Means.
- Routes user queries to the specific semantic cluster (e.g., "Cluster 0: Maintenance") before retrieving answers.
This "Cluster-Guided" approach ensures that when I ask about a deleted file, the system prioritizes "Cleanup" commits over "Feature" commits.
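To make the routing step concrete, here is a minimal sketch of cluster-guided retrieval, not the repository's actual code. It assumes the commit embeddings already exist; the 2-D vectors below are made-up stand-ins for what a sentence-transformers model would produce, and the function names (`kmeans`, `route_and_retrieve`) and the farthest-point seeding are illustrative choices of mine. The K-Means loop is written out in plain Python so the example runs without extra dependencies.

```python
import math

N_CLUSTERS = 3  # the post's stated project constraint: "Always use 3 clusters"

# Made-up commits and toy 2-D "embeddings" (assumption: a real pipeline
# would embed message + diff text with a sentence-transformers model).
COMMITS = [
    "Remove notes file", "Delete stale scripts",
    "Add login endpoint", "Add search feature",
    "Update README", "Fix typos in docs",
]
VECTORS = [
    [0.0, 0.1], [0.1, 0.0],
    [5.0, 5.0], [5.1, 4.9],
    [0.0, 5.0], [0.1, 5.1],
]


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(vector, centroids):
    """Index of the centroid closest to the vector."""
    return min(range(len(centroids)), key=lambda j: dist(vector, centroids[j]))


def init_centroids(vectors, k):
    """Deterministic farthest-point seeding (an illustrative assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k=N_CLUSTERS, iterations=20):
    """Plain K-Means: alternate assignment and centroid update."""
    centroids = init_centroids(vectors, k)
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(v, centroids) for v in vectors]  # final assignment
    return centroids, labels


def route_and_retrieve(query_vec, vectors, commits, centroids, labels, top_n=2):
    """Route the query to one cluster, then rank only that cluster's commits."""
    cluster = nearest(query_vec, centroids)
    candidates = [
        (dist(query_vec, v), c)
        for v, c, lab in zip(vectors, commits, labels)
        if lab == cluster
    ]
    candidates.sort(key=lambda pair: pair[0])
    return cluster, [c for _, c in candidates[:top_n]]


centroids, labels = kmeans(VECTORS)
# Toy embedding of a query such as "Why did we remove the notes file?"
cluster, hits = route_and_retrieve([0.0, 0.2], VECTORS, COMMITS, centroids, labels)
print(cluster, hits)
```

Because the search is restricted to the routed cluster, commits from other clusters cannot appear in the results even if they happen to sit close to the query in embedding space. The trade-off is that a query routed to the wrong cluster misses the right commits entirely.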
Demo: Cluster-Guided Routing in Action
In this video, you can see the tool ingesting the git history, identifying the clusters, and then correctly routing a specific query about a deleted file to the "Maintenance" cluster.
My Experience with GitHub Copilot CLI
Building this project entirely with the Copilot CLI changed my workflow from "Stack Overflow searcher" to "Command Line architect."
Scaffolding with Context
I used the @workspace /new command to generate the entire project structure (ingest.py, cluster.py, chat.py) in one go. Instead of writing boilerplate, I could focus on the logic of the K-Means algorithm.
The "Agent" Workflow
The standout feature for me was the /init command. By running this, I was able to generate a .github/copilot-instructions.md file that taught Copilot the specific constraints of my project (e.g., "Always use 3 clusters", "Truncate diffs to 500 chars"). This effectively turned Copilot into a specialized teammate that knew my architecture, not just a generic code generator.
Frictionless Debugging
When I hit syntax errors or needed to generate dummy git data for testing, I didn't leave the terminal. I used gh copilot suggest to generate complex shell commands that created dummy commits, enabling me to test the clustering algorithm in seconds.
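For readers who want to reproduce that kind of test data, the sketch below shows one way to script dummy commits. It is a Python stand-in, not the shell commands Copilot suggested: the commit messages, file names, demo identity, and the helper names (`commit_command`, `create_dummy_repo`) are all hypothetical. It assumes `git` is installed and on the PATH.

```python
import pathlib
import subprocess

# Made-up history for illustration: (message, file name, content).
# content=None means "delete the file in this commit".
DUMMY_COMMITS = [
    ("Add notes file", "notes.txt", "scratch notes\n"),
    ("Add login endpoint", "app.py", "def login():\n    return 'ok'\n"),
    ("Update README", "README.md", "# Demo\n"),
    ("Remove notes file", "notes.txt", None),
]

# Inline identity so the commits work even without a global git config.
GIT_IDENTITY = ["-c", "user.name=Demo", "-c", "user.email=demo@example.com"]


def commit_command(message):
    """Build the argv list for one commit with the demo identity."""
    return ["git", *GIT_IDENTITY, "commit", "--quiet", "-m", message]


def create_dummy_repo(repo_dir, commits=DUMMY_COMMITS):
    """Initialise a repo in repo_dir and replay the dummy commits in order."""
    repo = pathlib.Path(repo_dir)
    subprocess.run(["git", "init", "--quiet"], cwd=repo, check=True)
    for message, name, content in commits:
        target = repo / name
        if content is None:
            target.unlink()             # simulate a cleanup commit
        else:
            target.write_text(content)  # simulate a feature or docs commit
        subprocess.run(["git", "add", "-A"], cwd=repo, check=True)
        subprocess.run(commit_command(message), cwd=repo, check=True)
    return len(commits)
```

Calling `create_dummy_repo` on an empty directory (for example one created with `tempfile.mkdtemp()`) should leave a four-commit history mixing feature, documentation, and cleanup changes, which is enough variety to exercise a three-cluster setup.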