Abubakersiddique771

I Built an AI That Learns Only From GitHub Bugs 🐛🤖

"Smart people learn from their mistakes. Geniuses learn from GitHub issues."

Imagine an AI that doesn’t just write perfect code — but actually studies thousands of real-world bugs, failures, and pull request debates to understand how things go wrong.

That’s what I tried to build.

This is the story of how I scraped GitHub issues, indexed them, and wired them to a local LLM that doesn’t just generate code: it warns me about bugs real devs have already hit. And it’s powered entirely by GitHub’s open issue tracker.


🔥 The Problem: Tutorials Don’t Teach You What Breaks

As developers, we spend hours on tutorials that show us:

  • “How to set up a REST API”
  • “How to train a basic LLM”
  • “How to deploy a project with Docker”

But these are happy-path instructions. They show you what works, not what breaks.

Meanwhile, on GitHub:

  • Issues filled with edge cases
  • Pull request discussions full of trade-offs
  • "Why was this line changed?" mysteries

That’s the real learning — the dark matter of dev education.

So I asked myself:

Can I make an AI that doesn’t learn from code examples, but from code mistakes?


🧠 The Idea: Train an LLM on GitHub Issues & Fixes

Instead of feeding an AI perfect Stack Overflow answers, I did this:

  1. Crawled open-source repos with high issue + PR activity
  2. Extracted:
     • Title + body of bug reports
     • Linked PRs that fixed them
     • Reviewer comments
  3. Chunked each (issue → fix) pair and embedded the chunks
  4. Indexed the embeddings in a vector DB
  5. Created a CLI and VS Code extension where I could ask:

“Has anyone fixed a bug like this before?”

And shockingly... it worked.
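
Here’s a minimal sketch of the "pair, then chunk" part of that list. Everything in it is an assumption for illustration: the BugRecord fields, the to_document and chunk_words helpers, and the window sizes are hypothetical, and the word-based chunking is a simple stand-in for whatever splitter you prefer.

```python
from dataclasses import dataclass


@dataclass
class BugRecord:
    """One (issue -> fix) pair. Field names are illustrative, not a fixed schema."""
    issue_number: int
    title: str
    body: str
    fix_pr_number: int
    fix_summary: str


def to_document(record: BugRecord) -> str:
    """Flatten a pair into one text blob so the issue and its fix are embedded together."""
    return (
        f"Issue #{record.issue_number}: {record.title}\n"
        f"{record.body}\n"
        f"Fixed by PR #{record.fix_pr_number}: {record.fix_summary}"
    )


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()  # note: this collapses newlines and repeated spaces
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Keeping the issue and the fix in the same document means a query that matches the symptom also pulls back the resolution, which is the whole point of indexing pairs rather than issues alone.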


🛠️ My Stack: Building the "Bug Sage"

| Component | Tool Used |
| --- | --- |
| Issue Scraping | GitHub API + GraphQL |
| Embedding | text-embedding-ada-002 via OpenAI, or Instructor-XL locally |
| Vector DB | ChromaDB |
| Retrieval | LangChain |
| UI | CLI + VS Code Sidebar |
| Local LLM | Phi-3 or Mistral via Ollama |

🐛 How It Works (Real Example)

Say I’m debugging a KeyError in my FastAPI app when deploying to AWS Lambda.

Instead of googling aimlessly or hitting Stack Overflow, I type:

bugsage "KeyError during AWS Lambda cold start in FastAPI app"

And it retrieves this issue from another repo:

📝 #328 - FastAPI app fails on cold start due to environment variables missing
🔧 Fixed by moving .env loading inside the handler in PR #329

Suddenly, I’m not just getting a fix.
I’m getting context, explanation, and real-world patterns.


🤖 The Coolest Part? The AI Learns With Every Crawl

Every weekend, a GitHub Action re-scrapes:

  • New issues from starred repos
  • PRs whose descriptions contain closing keywords such as “fixes”
  • Issues labeled bug, regression, or performance

It self-indexes all this into ChromaDB. Over time, the "Bug Sage" gets smarter — like a developer mentor who reads every project on GitHub for you.
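
A rough sketch of the "only pick up what’s new" filter such a weekly job needs. The dict shape (id, updated_at, labels), the select_new_issues name, and the seen_ids set are all assumptions made for this example; a real crawl would map GitHub API responses into something similar.

```python
from datetime import datetime, timezone

# Labels taken from the list above; extend as needed.
WATCHED_LABELS = {"bug", "regression", "performance"}


def select_new_issues(issues, last_crawl, seen_ids):
    """Keep issues updated after last_crawl that carry a watched label and are not indexed yet."""
    selected = []
    for issue in issues:
        if issue["id"] in seen_ids:
            continue  # already in the vector DB
        if issue["updated_at"] <= last_crawl:
            continue  # nothing changed since the previous run
        labels = {label.lower() for label in issue["labels"]}
        if labels & WATCHED_LABELS:
            selected.append(issue)
    return selected


# Hypothetical sample data, just to show the call shape.
last_crawl = datetime(2024, 1, 7, tzinfo=timezone.utc)
sample = [
    {"id": 1, "updated_at": datetime(2024, 1, 9, tzinfo=timezone.utc), "labels": ["Bug"]},
    {"id": 2, "updated_at": datetime(2024, 1, 9, tzinfo=timezone.utc), "labels": ["docs"]},
    {"id": 3, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc), "labels": ["regression"]},
]
fresh = select_new_issues(sample, last_crawl, seen_ids=set())
```

Note that this version skips anything already indexed, so later edits to an old issue are not re-embedded. If you want those, drop the seen_ids check and overwrite by ID instead.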



🧠 The Educational Value: Learning From Pain Points

This isn’t just a productivity tool.

It’s a learning engine.

You start to see:

  • How real teams debug
  • What kinds of mistakes repeat
  • Why certain decisions are controversial

You start coding defensively — with foresight.

It’s like pair programming with 1,000 senior devs whispering, “Hey… that didn’t work for us either.”


🤯 Unexpected Use Cases

  • Code review help: It suggests real PR debates for similar changes
  • 📉 Prevent regressions: Matches code diffs to past rollback issues
  • 🎓 Learning prompts: “Give me 3 bugs people faced with WebSockets + Django Channels”
  • 🕵️‍♂️ Open-source archaeology: “What were the most common bugs in X repo over 2 years?”

😂 Dev Humor (You Know It’s Coming)

  • 🧟 “I don’t make the same bug twice… I make it 10 times, slightly differently”
  • 🧙‍♂️ “Bug Sage, what’s the ancient wisdom on async DB calls?”
  • 👶 Me: "Why is my code crashing?" GitHub AI: "It has happened before… and it will happen again."

📚 How You Can Build Your Own “Bug Sage”

Want to try this at home?

Step 1: Crawl GitHub issues and PRs

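from github import Github

# Authenticate with a personal access token (placeholder value)
g = Github("your_token")
repo = g.get_repo("tiangolo/fastapi")
# Closed issues labeled "bug"; GitHub's issues endpoint also returns PRs
issues = repo.get_issues(state="closed", labels=["bug"])
bug_reports = (i for i in issues if i.pull_request is None)  # skip PRs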

Step 2: Pair issues to PRs

Look for text like "Fixes #123" in PR bodies.
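
One way to do that matching is a small regex over the PR body. This is a sketch, not a complete parser: the linked_issue_numbers helper is a name made up for this example, it only handles same-repo references like "#123", and it ignores cross-repo forms such as "owner/repo#123". The keyword list follows GitHub’s closing keywords (close, fix, resolve and their -s/-d forms).

```python
import re

# Matches "Fixes #123", "closed #4", "Resolves: #5", case-insensitively.
CLOSING_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\b:?\s+#(\d+)",
    re.IGNORECASE,
)


def linked_issue_numbers(pr_body):
    """Return issue numbers referenced with a closing keyword, in order, without duplicates."""
    numbers = []
    for match in CLOSING_REF.finditer(pr_body or ""):  # PR bodies can be empty/None
        number = int(match.group(1))
        if number not in numbers:
            numbers.append(number)
    return numbers
```

The leading word boundary matters: without it, a sentence like "this prefixes #9" would be read as a fix for issue 9.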

Step 3: Embed text

Use:

from langchain_openai import OpenAIEmbeddings  # older LangChain releases: from langchain.embeddings import OpenAIEmbeddings

Or local models like Instructor-XL via HuggingFace.

Step 4: Store + Query with Chroma

from langchain_chroma import Chroma  # older LangChain releases: from langchain.vectorstores import Chroma
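
To see what "store + query" boils down to without installing anything, here is a toy in-memory index. It is explicitly not Chroma and not a real embedding model: word counts stand in for embeddings, and ToyIndex, embed, and cosine are names invented for this sketch. The second document and its ID are made-up sample data.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Brute-force nearest-neighbour search over stored documents."""

    def __init__(self):
        self._rows = []  # (doc_id, text, vector)

    def add(self, doc_id: str, text: str) -> None:
        self._rows.append((doc_id, text, embed(text)))

    def query(self, question: str, k: int = 3):
        """Return up to k (score, doc_id, text) tuples, best match first."""
        q = embed(question)
        scored = [(cosine(q, vec), doc_id, text) for doc_id, text, vec in self._rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


index = ToyIndex()
index.add("issue-328", "FastAPI app fails on cold start due to environment variables missing")
index.add("issue-b", "WebSocket connection drops in Django Channels consumer")
top = index.query("KeyError during AWS Lambda cold start in FastAPI app", k=1)
```

A vector DB does the same job with real embeddings, persistence, and an approximate search that scales past a few thousand rows; the add/query shape stays recognisable.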

Step 5: Build a CLI or integrate it into VS Code!
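
A bare-bones CLI skeleton using argparse from the standard library. The flag names, the format_hit layout, and the pluggable search callable are assumptions for this sketch; search is where your own retrieval code would go, and here it defaults to returning nothing.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="bugsage",
        description="Search indexed GitHub issues for similar bugs.",
    )
    parser.add_argument("query", help="free-text description of the bug")
    parser.add_argument("-k", "--top-k", type=int, default=3, help="number of matches to show")
    return parser


def format_hit(issue_number: int, title: str, pr_number: int, fix_summary: str) -> str:
    """Render one match as an issue line followed by its fix."""
    return f"#{issue_number} - {title}\nFixed by {fix_summary} in PR #{pr_number}"


def main(argv=None, search=None) -> int:
    """Parse arguments, run the (pluggable) search, print matches. Returns an exit code."""
    args = build_parser().parse_args(argv)
    search = search or (lambda query, k: [])  # placeholder: plug in real retrieval here
    hits = search(args.query, args.top_k)
    if not hits:
        print("No similar issues found.")
        return 1
    for hit in hits:
        print(format_hit(*hit))
    return 0
```

To turn this into a script, call main() under an `if __name__ == "__main__":` guard and pass a search function that returns (issue_number, title, pr_number, fix_summary) tuples.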


🌍 Final Thought: Make GitHub Your Mistake Mentor

We often treat GitHub as a place to show perfect work.

But it’s really a museum of broken code, and if you mine it well, you’ll learn far faster than you would from any tutorial.

Don’t just write code.
Study how it breaks — and let AI help you never repeat it.

