How I Turned My Messy Project into a Real AI Application (FILEBOT)
WHAT I BUILT
FileBot RAG Chatbot
One-liner: An advanced Retrieval-Augmented Generation (RAG) chatbot built for the Devpost Community GitHub Throne Challenge to streamline how developers interact with repository data.
DEMO
Source code:
https://github.com/sathvika1138/FILEBOT.git
SCREENSHOTS
Homescreen:
Session in progress:
Video walkthrough:
Limitations:
This project uses Ollama for local LLM inference, which keeps data on the user's machine and avoids API costs. However, it requires a locally installed model and sufficient hardware resources, and it cannot be deployed directly on Streamlit Community Cloud because Ollama must be running on the host machine.
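Because the app depends on a running Ollama server, a startup check can fail fast with a clear message instead of a confusing error mid-chat. The following is a minimal sketch, assuming Ollama's default address (`http://localhost:11434`) and its `/api/tags` endpoint for listing installed models. The function name `list_local_models` is hypothetical and not taken from the FILEBOT repository.

```python
import http.client
import json
import urllib.error
import urllib.request

# Assumption: Ollama's default local address; change it if the server is configured differently.
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def list_local_models(base_url: str = DEFAULT_OLLAMA_URL, timeout: float = 2.0):
    """Return the names of installed models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, bad URL, or a non-JSON reply all mean "not usable".
        return None
    if not isinstance(payload, dict):
        return None
    return [model.get("name", "") for model in payload.get("models", [])]
```

A Streamlit app could call this once at startup and show a message such as "Start Ollama and pull a model first" when the result is `None` or an empty list.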
💡 Inspiration
Navigating large GitHub repositories, reading through endless documentation, and understanding complex codebases takes too much time. We wanted to build an intelligent assistant that instantly answers specific questions about a repository using real-time codebase data, rather than relying on generic AI knowledge.
⚙️ What it does
~Our RAG chatbot lets users enter a GitHub repository link and a question, then returns precise, context-aware answers.
~By retrieving relevant code snippets, issues, or documentation before answering, the bot grounds its responses in the specific codebase instead of the model's general knowledge.
🛠️ How we built it
~Data Ingestion: We used GitHub APIs and web scrapers to extract repository data, markdown files, and code structure.
~Vector Embeddings: Code and text chunks were converted into vector embeddings using [Insert Embedding Model, e.g., OpenAI text-embedding-3 / Cohere].
~Vector Database: We stored and indexed these embeddings in [Insert Vector DB, e.g., Pinecone / Chroma / Milvus] for fast semantic search.
~LLM Integration: We used a local LLM served through Ollama to synthesize the retrieved context into a natural, helpful response.
~Frontend/UI: Built using [Insert UI Framework, e.g., Streamlit / Next.js / React].
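The retrieval steps above can be sketched end to end with the standard library alone. This is an illustration, not the project's code: `embed` is a toy bag-of-words stand-in for a real embedding model (so it matches keywords rather than meaning), `TinyIndex` is an in-memory stand-in for a vector database, and the file names and contents are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyIndex:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (source path, chunk text, vector)

    def add(self, path: str, chunk: str) -> None:
        self.items.append((path, chunk, embed(chunk)))

    def search(self, query: str, k: int = 2):
        """Return the k chunks most similar to the query, best first."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[2]), reverse=True)
        return [(path, chunk) for path, chunk, _ in ranked[:k]]


def build_prompt(question: str, hits) -> str:
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    context = "\n\n".join(f"# {path}\n{chunk}" for path, chunk in hits)
    return f"Answer using only this repository context:\n\n{context}\n\nQuestion: {question}"


# Hypothetical repository chunks, for illustration only.
index = TinyIndex()
index.add("README.md", "FileBot answers questions about a repository.")
index.add("app.py", "def load_repo(url): clone the repository and read files")
index.add("utils.py", "def slugify(name): return name.lower()")

hits = index.search("how does load_repo clone files", k=1)
prompt = build_prompt("How are files loaded?", hits)
```

Swapping `embed` for a real embedding model and `TinyIndex` for a vector database changes the quality of the matches but not the shape of the pipeline: chunk, embed, index, retrieve, then prompt the LLM with the retrieved context.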
🧠 Challenges we ran into
~Chunking Strategy: Code is highly structured, and standard paragraph chunking split functions and classes mid-body, which broke their logic. We had to implement syntax-aware code splitting.
~Context Window Limits: We had to keep the retrieved code snippets relevant and concise enough to fit into the LLM's context window without losing crucial debugging data.
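Both challenges can be illustrated with a short sketch. It shows one possible approach for Python files only, using the standard-library `ast` module to cut chunks at function and class boundaries, plus a greedy packer that keeps the best-ranked chunks within a budget. The function names are hypothetical, the word-count "token" estimate is an assumption, and the project's actual splitter may work differently.

```python
import ast


def split_python_source(source: str):
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays attached to its definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            chunks.append("\n".join(lines[start - 1 : node.end_lineno]))
    return chunks


def pack_context(ranked_chunks, max_tokens: int):
    """Keep the highest-ranked chunks that fit within a token budget.

    Token counts are approximated by whitespace-separated words (an assumption;
    a real tokenizer would give different numbers).
    """
    kept, used = [], 0
    for chunk in ranked_chunks:
        cost = len(chunk.split())
        if used + cost > max_tokens:
            continue  # skip chunks that would overflow; smaller ones may still fit
        kept.append(chunk)
        used += cost
    return kept


sample = '''import os

def add(a, b):
    return a + b

class Greeter:
    def hello(self):
        return "hi"
'''

chunks = split_python_source(sample)
```

Note that this sketch drops module-level statements such as imports; a fuller splitter would keep them as their own chunk or attach them to the definitions that use them.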
🏆 Accomplishments that we're proud of
~Successfully mapping complete repository structures into a vector space.
~Achieving low-latency retrieval for a seamless chat experience.
~Building a clean, intuitive user interface within the hackathon timeline.
📘 What we learned
~Deepened our understanding of semantic search versus keyword search.
~Learned how to clean and optimize raw source code data for better embedding quality.
🔮 What's next for FileBot RAG Chatbot
~Adding support for private GitHub repositories using secure OAuth.
~Integrating multi-turn conversation memory to track debugging steps over long chats.
~Enabling the bot to auto-generate pull request summaries and code fixes.
Team Attribution
This project was completed and submitted by me, @sathvika_138, and my teammate, @sandhiyaxxx.
No additional teammates contributed to this submission.

