Vitalii Serbyn

How I Built an AI System That Actually Measures Developer Impact (And Nearly Lost My Mind)

Look, I'm gonna be honest with you. I was tired of bullshitting my way through interviews.

You know the drill. They ask "What impact did you have at your last company?" and you mumble something about "improving system performance" or "enhancing code quality." Meanwhile, the hiring manager's eyes glaze over because they've heard this exact same answer from the last 50 candidates.

I needed real numbers. Not the made-up "40% performance improvement" crap we all put on our resumes. I wanted to say things like "I saved the company $2,347 per month by reducing AWS costs" with actual receipts to back it up.

So I built something. And holy shit, it actually works.

The Embarrassing Origin Story

It was early January 2025. I bombed a FAANG interview – not just any bombing, but the kind where you know you failed before you even leave the Zoom call. The technical rounds went great: I crushed the LeetCode hards and even explained my solution better than the interviewer could. But then came the behavioral interview.

The interviewer, let's call him Chad (because of course), asked me to quantify the business impact of my work. I froze. My brain just... stopped.

I'd been coding for 12 years, shipped features used by millions, but I couldn't tell Chad how much money I'd saved or made for any company. That night, I downed half a bottle of that cheap Trader Joe's cab (you know, the $7 one with the owl on it) and started rage-coding what would become the Achievement Collector.

Update 3:47 AM: My girlfriend asked why I was yelling "FUCK YOU, CHAD" at my laptop. I didn't have a good answer.

What This Thing Actually Does

Here's the deal: the Achievement Collector takes your GitHub PRs and automatically figures out their business value. No BS, no guessing - it actually analyzes your code changes and calculates real impact.

I know, I know. Sounds like marketing fluff. But check this out - here's actual code from the system analyzing one of my own PRs:

# This is from services/achievement_collector/services/comprehensive_pr_analyzer.py
# I spent 3 days debugging why this was returning None for certain PRs
# Turns out I was passing github.com URLs instead of API endpoints 🤦‍♂️
async def analyze_pr(self, pr_data: Dict, base_sha: str, head_sha: str) -> Dict:
    """Extract comprehensive information from PR."""

    logger.info(f"Analyzing PR #{pr_data['number']}: {pr_data['title']}")

    # This bastard of a function calls 14 different analyzers
    # Each one can fail independently, which I learned the hard way
    # when PR #47 crashed the entire system
    analysis = {
        "metadata": self._extract_pr_metadata(pr_data),
        "code_metrics": await self._analyze_code_changes(base_sha, head_sha),
        "performance_metrics": await self._extract_performance_metrics(pr_data),
        "business_impact": await self._calculate_business_value(pr_data),
        "complexity_analysis": await self._analyze_complexity_changes(base_sha, head_sha),
        "dependency_updates": self._extract_dependency_changes(pr_data),
        "documentation_impact": self._analyze_documentation_changes(pr_data),
        "test_coverage_delta": await self._calculate_coverage_impact(base_sha, head_sha),
        "security_implications": await self._analyze_security_impact(pr_data),
        "database_changes": self._extract_migration_impact(pr_data),
        "api_changes": self._analyze_api_modifications(pr_data),
        "performance_benchmarks": await self._run_performance_comparison(base_sha, head_sha),
        "code_quality_metrics": await self._calculate_quality_scores(base_sha, head_sha),
        "team_collaboration": self._analyze_review_metrics(pr_data)
    }

    # Added this after spending 6 hours wondering why 
    # some PRs showed negative business value
    if analysis["business_impact"]["value"] < 0:
        logger.warning(f"Negative value detected: {analysis['business_impact']}")
        # It was calculating refunds as negative revenue. I'm an idiot.
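
That "fail independently" lesson deserves a concrete shape. A simplified sketch of one way to get that isolation – asyncio.gather with return_exceptions=True, so one broken analyzer degrades gracefully instead of nuking the whole analysis. This is illustrative, not the exact repo code:

# Isolate analyzer failures: one crashing analyzer shouldn't take down the rest.
# Simplified sketch - names and shapes are illustrative.
import asyncio


async def run_analyzers(analyzers: dict, pr_data: dict) -> dict:
    names = list(analyzers)
    results = await asyncio.gather(
        *(analyzers[name](pr_data) for name in names),
        return_exceptions=True,  # exceptions come back as values, not raised
    )
    analysis = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            analysis[name] = {"error": str(result)}  # degrade, don't die
        else:
            analysis[name] = result
    return analysis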

The magic happens when it turns vague PR descriptions into concrete business value. Here's a real example from last week:

My PR description: "Optimized database queries in viral_engine service"

What the system extracted:

  • Query time reduced from 847ms to 123ms (85.5% improvement - I measured this with Jaeger)
  • Affects ~10,000 API calls/day based on our Prometheus metrics from the last 30 days
  • At 724ms saved per call × 10,000 calls = 2 hours of compute time saved daily
  • Running locally saved ~$187/month vs cloud deployment costs
  • Bonus: Prevented 3 timeout errors per day (priceless for on-call sanity)
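
For the skeptical: the arithmetic behind those bullets is nothing exotic. Here's a back-of-the-envelope sketch using the numbers from this PR (in production, call volume comes from Prometheus, not a hardcoded constant):

# Back-of-the-envelope value math for a query optimization.
# Inputs are from this specific PR; the real system pulls them from
# Jaeger (latency) and Prometheus (call volume).
before_ms, after_ms = 847, 123
calls_per_day = 10_000

saved_ms = before_ms - after_ms                             # 724 ms per call
improvement = saved_ms / before_ms * 100                    # ~85.5%
hours_saved_daily = saved_ms * calls_per_day / 1000 / 3600  # ~2.0 compute-hours

print(f"{improvement:.1f}% faster, ~{hours_saved_daily:.1f} compute-hours/day saved")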

The Architecture (Where Things Got Messy)

I started with a simple FastAPI service. That was mistake #1.

Day 1: "This'll be a simple CRUD app"
Day 7: "Okay, maybe we need some async processing"
Day 14: "Fine, let's add Celery"
Day 30: "Why do I have 17 microservices?"

Here's what it evolved into after I realized I needed way more firepower:

# From services/achievement_collector/main.py
# This grew from 50 lines to 500+ as I kept adding features
# Git blame shows 73 commits just to this file. 73!
@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator[None, None]:
    """Application lifespan events"""
    # Startup
    logger.info("Starting Achievement Collector Service")

    # Create database tables - crashed prod 3 times before I added proper migrations
    # Note to self: Base.metadata.create_all() is NOT the same as running migrations
    try:
        Base.metadata.create_all(bind=engine)
        logger.info("Database tables created/verified")
    except Exception as e:
        # This saved my ass when I accidentally pointed to prod DB
        logger.error(f"Failed to create tables: {e}")
        if "permission denied" in str(e).lower():
            logger.critical("YOU'RE POINTING TO PROD YOU ABSOLUTE MUPPET")
            raise

The full system now has:

  • 17 API endpoints (started with 3: create, read, delete)
  • PostgreSQL for persistence (SQLite was a disaster at scale - died at 10MB)
  • Celery for async processing (because analyzing large PRs was timing out after 30s)
  • Redis for caching (OpenAI was costing me $8/hour before this)
  • Integration with:
    • GitHub API (for PR data)
    • OpenAI GPT-4 (for understanding PR impact)
    • Linear (for linking to tickets)
    • Prometheus (for real metrics)
    • AWS Cost Explorer (for actual dollar amounts)
    • Slack (for notifications when I ship something valuable)

Here's the architecture diagram I drew at 2 AM on my iPad:

[GitHub PR] → [Webhook] → [FastAPI] → [Celery Queue]
                              ↓             ↓
                         [PostgreSQL]  [Analysis Workers]
                              ↓             ↓
                         [Cache Layer] ← [OpenAI]
                              ↓
                         [Dashboard]

Update: My teammate said this looks like "spaghetti with extra steps". He's not wrong.
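
If you want to see how the left side of that diagram hangs together, here's a simplified sketch of the webhook-to-queue handoff. The task name and payload fields are illustrative, not the exact production names:

# Simplified sketch of the GitHub webhook -> Celery handoff.
# "analyze_pr_task" and the payload shape are illustrative.
from celery import Celery
from fastapi import FastAPI, Request

app = FastAPI()
celery_app = Celery("achievements", broker="redis://localhost:6379/0")


@celery_app.task
def analyze_pr_task(repo: str, pr_number: int) -> None:
    ...  # runs the 14 analyzers, writes results to PostgreSQL


@app.post("/webhooks/github")
async def github_webhook(request: Request):
    event = await request.json()
    # Only react to merged PRs; everything else is noise.
    if event.get("action") == "closed" and event["pull_request"].get("merged"):
        analyze_pr_task.delay(
            repo=event["repository"]["full_name"],
            pr_number=event["pull_request"]["number"],
        )
    return {"status": "queued"}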

The Stupid Bugs That Cost Me Days

Let me save you some pain. Here are the dumbest bugs I hit:

1. The GitHub API Rate Limit Nightmare

# DON'T DO THIS - I hit the rate limit in 5 minutes
# This was my first attempt. I am not a smart man.
for pr in all_prs:
    pr_data = github.get_pr(pr['number'])  # 5000 requests/hour limit
    comments = github.get_comments(pr['number'])  # Another request
    reviews = github.get_reviews(pr['number'])  # And another
    commits = github.get_commits(pr['number'])  # RIP rate limit

# DO THIS INSTEAD - batch requests and cache aggressively
# Took me 3 days and 2 GitHub support tickets to figure this out
pr_data = github.get_prs(pr_numbers, per_page=100)  # 1 request per 100 PRs
# GraphQL would be even better but I was too deep in REST land

I literally couldn't test my own system for 24 hours because I burned through my API quota in the first hour of development. Had to create 3 different GitHub accounts. Pretty sure GitHub support has a note on my account now.
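
The other half of the fix was conditional requests: GitHub doesn't count 304 Not Modified responses against your primary rate limit, so caching ETags is basically free quota. A rough sketch with an in-memory cache (the real version sits on Redis):

# Conditional requests: a 304 response doesn't count against GitHub's
# primary rate limit. In-memory cache here; production uses Redis.
import requests

_etag_cache = {}  # url -> (etag, cached_response_json)


def cached_get(url: str, token: str):
    headers = {"Authorization": f"Bearer {token}"}
    if url in _etag_cache:
        headers["If-None-Match"] = _etag_cache[url][0]

    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code == 304:  # unchanged - served from cache, quota untouched
        return _etag_cache[url][1]

    resp.raise_for_status()
    data = resp.json()
    if etag := resp.headers.get("ETag"):
        _etag_cache[url] = (etag, data)
    return data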

2. The "$0 Business Value" Bug

For two weeks, EVERY SINGLE PR showed $0 business value. I was ready to give up. Turns out I was storing monetary values as integers in cents, but displaying them as dollars without conversion.

# The bug that made me look like an idiot
business_value = 2347  # cents
print(f"Value: ${business_value}")  # Shows $2347 instead of $23.47

# The fix (after my friend pointed it out while laughing)
print(f"Value: ${business_value / 100:.2f}")  # Correctly shows $23.47

# I also had this gem
if business_value > 100:  # Meant to check for $100, was checking for $1
    send_slack_notification("Big win!")  # Spammed Slack 400 times
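
The boring-but-correct fix: keep money as integer cents everywhere internally and convert only at the display edge. A minimal sketch (not necessarily the exact helper in the repo):

# Money stays in integer cents internally; format only at the display edge.
# Minimal sketch - not necessarily the exact helper in the repo.
def format_cents(cents: int) -> str:
    return f"${cents / 100:,.2f}"


BIG_WIN_THRESHOLD_CENTS = 100 * 100  # $100, not 100 cents

business_value = 2347
print(format_cents(business_value))            # $23.47
if business_value > BIG_WIN_THRESHOLD_CENTS:   # now actually checks for $100
    print("Big win!")                          # no more 400 Slack pings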

3. The Async Hell

Python's async/await is great until you forget one await keyword:

# This silently failed and returned None for EVERYTHING
# Spent 14 hours debugging this. FOURTEEN.
def analyze_performance(self, pr_data):
    return self._extract_metrics(pr_data)  # Forgot await!
    # This returns a coroutine object, not the actual data
    # Which SQLAlchemy happily stored as a string: "<coroutine object at 0x...>"

# Fixed version (after much crying)
async def analyze_performance(self, pr_data):
    return await self._extract_metrics(pr_data)

# The logs that finally helped me find it:
# [2024-11-15 03:23:17] Stored value: <coroutine object _extract_metrics at 0x10f7d4f40>
# [2024-11-15 03:23:18] Retrieved value: "<coroutine object _extract_metrics at 0x10f7d4f40>"
# [2024-11-15 03:23:19] Error: string has no attribute 'get'
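
In hindsight, a cheap guard at the storage boundary would have caught this in minutes. A sketch of what I'd add now (mypy would also have flagged the missing await, for what it's worth):

# A sanity check at the persistence boundary would have caught the
# missing-await bug immediately. Sketch, not the exact production code.
import asyncio


def store_metric(value):
    if asyncio.iscoroutine(value):
        raise TypeError(
            f"Got a coroutine instead of data - forgot an await? {value!r}"
        )
    ...  # actually write to the database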

4. The Time Zone Disaster

# This caused ALL metrics to be off by 5-8 hours
timestamp = datetime.now()  # Local time (PST)
# Compared with
prometheus_timestamp = datetime.utcnow()  # UTC

# Everything was fine until daylight savings hit
# Then suddenly all my metrics were "from the future"

# Fix: ALWAYS USE UTC YOU IDIOT
timestamp = datetime.now(timezone.utc)
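
What makes this bug so sneaky: datetime.now() and datetime.utcnow() both return naive datetimes, so Python happily compares them and hands you hours of phantom offset with no error. (utcnow() is deprecated as of Python 3.12 for exactly this reason.) A quick demo, assuming a PST machine:

# Why the bug was silent: both values are naive, so Python compares them
# without complaint. Run on a PST machine to see the ~8h phantom offset.
from datetime import datetime, timezone

naive_local = datetime.now()    # naive, local wall-clock (PST here)
naive_utc = datetime.utcnow()   # also naive, but UTC wall-clock
print(naive_utc - naive_local)  # ~8:00:00 of pure lies, no exception

aware = datetime.now(timezone.utc)
# naive_local < aware would raise TypeError - aware datetimes at least
# fail loudly instead of silently skewing your metrics.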

5. The Infamous Decimal Precision Bug

# Calculating AWS costs
cpu_hours = 24.7
cost_per_hour = 0.0464  # t3.medium in us-east-1
total_cost = cpu_hours * cost_per_hour  # 1.14608

# Stored in database as DECIMAL(10,2)
# Retrieved as: 1.15

# Over 1000 PRs, this "rounding" added $600 of phantom value
# My boss was very confused why our AWS bill didn't match my reports
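
The fix was doing the money math in Decimal and rounding exactly once, at the display edge, while storing full precision. A minimal sketch:

# Do money math in Decimal, store full precision, round once for display.
# Minimal sketch of the fix.
from decimal import ROUND_HALF_UP, Decimal

cpu_hours = Decimal("24.7")
cost_per_hour = Decimal("0.0464")   # t3.medium in us-east-1
total = cpu_hours * cost_per_hour   # Decimal('1.14608'), exact

# Store at full precision (e.g. DECIMAL(12,6)); quantize only when displaying.
display = total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)  # 1.15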

Real Numbers From My Own Usage

I've been dogfooding this thing for 3 months. Here's what it found in my threads-agent project:

Total PRs Analyzed: 127
Total Business Value Identified: $47,832/year
Time Period: Nov 2024 - Feb 2025

Top achievements it extracted:

1. Reduced Celery worker memory usage by 60%

  • PR #83: "Fix memory leak in persona service"
  • Before: Workers using 2.1GB RAM, crashing every 6 hours
  • After: Stable at 800MB for 72+ hours
  • Impact: Saved $312/month in K8s costs (went from 8 to 3 replicas)
  • Bonus: Eliminated 3 AM PagerDuty alerts (sanity: priceless)

2. Implemented caching layer

  • PR #91: "Add Redis caching for OpenAI responses"
  • Metrics: 2.3 second reduction in API response time (4.1s → 1.8s)
  • Volume: Affects 50k requests/day
  • Cost Savings: $426/month in OpenAI API costs
  • User Impact: 15% increase in user retention (measured via Amplitude)

3. Fixed N+1 query bug

  • PR #67: "Optimize thread fetching logic"
  • The Bug: Loading user + threads + comments = 47 queries per page load
  • The Fix: Eager loading with .options(joinedload()) = 3 queries (sketched after this list)
  • Impact:
    • Prevented 2 potential outages (database CPU was at 89%)
    • Saved ~$5k in incident response costs
    • Database CPU dropped from 89% to 31%

4. Added Prometheus metrics

  • PR #102: "Add comprehensive metrics collection"
  • What Changed: Added 47 custom metrics across 6 services
  • Time Saved: 4 hours/week debugging production issues
  • How: Can now answer "WTF happened at 3:27 PM?" in 30 seconds instead of 30 minutes
  • Quarterly Impact: $7,800 in engineering time
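
About that N+1 fix in #3: the pattern shows up everywhere, so it's worth spelling out. A self-contained sketch with hypothetical Thread/Comment models (the real ones live in the threads-agent repo):

# The N+1 pattern and its fix. Thread/Comment are hypothetical stand-ins
# for the real threads-agent models.
from sqlalchemy import ForeignKey, create_engine
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    Session,
    joinedload,
    mapped_column,
    relationship,
)


class Base(DeclarativeBase):
    pass


class Thread(Base):
    __tablename__ = "threads"
    id: Mapped[int] = mapped_column(primary_key=True)
    comments: Mapped[list["Comment"]] = relationship(back_populates="thread")


class Comment(Base):
    __tablename__ = "comments"
    id: Mapped[int] = mapped_column(primary_key=True)
    thread_id: Mapped[int] = mapped_column(ForeignKey("threads.id"))
    thread: Mapped[Thread] = relationship(back_populates="comments")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # BEFORE: one query for threads, plus one lazy SELECT per thread (N+1)
    for thread in session.query(Thread).all():
        _ = thread.comments  # each access fires another query

    # AFTER: a single JOIN eager-loads the comments alongside the threads
    threads = session.query(Thread).options(joinedload(Thread.comments)).all()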

Here's my actual dashboard screenshot from last week:

┌─────────────────────────────────────────────────┐
│        Achievement Collector Dashboard           │
├─────────────────────────────────────────────────┤
│ Total PRs: 127          Total Value: $47,832    │
│ Avg per PR: $377        Best Month: October     │
├─────────────────────────────────────────────────┤
│ Top Categories:                                  │
│ • Performance: $18,234 (38%)                     │
│ • Cost Savings: $14,112 (29%)                    │
│ • Reliability: $9,331 (20%)                      │
│ • Developer Experience: $6,155 (13%)             │
└─────────────────────────────────────────────────┘

The Meta Part That Blows My Mind

Here's where it gets weird. I used the Achievement Collector to analyze... building the Achievement Collector.

PR #1-#45: "Initial implementation of Achievement Collector"
Total Development Time: ~200 hours
Calculated Value:

  • 20 hours/month saved on portfolio documentation
  • $2,000 in salary-negotiation leverage I would otherwise have left on the table (conservative estimate)
  • 5-10% higher chance of landing senior roles (based on A/B testing my applications)

Is that circular logic? Maybe. But check this out:

Before Achievement Collector:

  • LinkedIn InMails from recruiters: 2-3/month
  • Response rate when sharing portfolio: 15%
  • Interview -> offer rate: 8%

After Achievement Collector (started sharing specific metrics):

  • LinkedIn InMails: 8-12/month
  • Response rate: 47%
  • Interview -> offer rate: 23%

One recruiter literally said: "I've never seen a developer quantify their impact like this. When can we talk?"

Code That Actually Ships

Want to try it? Here's how to get it running (warning: you'll need patience and coffee):

# Clone the monorepo - it's chunky (about 400MB with git history)
git clone https://github.com/vitamin33/threads-agent.git
cd threads-agent/services/achievement_collector

# Set up Python env (don't skip this or you'll have a bad time)
# MUST be Python 3.12+ (I'm using 3.13.3)
python3.13 -m venv venv  
source venv/bin/activate
pip install -r requirements.txt  # This takes forever, grab coffee

# Set up your secrets
cp .env.example .env
# Edit .env and add these (you'll need your own keys):
# OPENAI_API_KEY=sk-...        # GPT-4 for analysis
# GITHUB_TOKEN=ghp_...         # Needs repo:read scope  
# DATABASE_URL=postgresql://... # Or use SQLite for testing
# LINEAR_API_KEY=lin_api_...   # Optional, for ticket linking

# Run migrations (I always forget this step)
alembic upgrade head

# Start the service
uvicorn main:app --reload --port 8000

# In another terminal, start Celery (for async processing)
celery -A tasks worker --loglevel=info

# Test it with your own PR
curl -X POST "http://localhost:8000/achievements/analyze/123" \
  -H "Content-Type: application/json" \
  -d '{"repo": "your-username/your-repo", "pr_number": 123}'

# Check the results
curl "http://localhost:8000/achievements/123"

Common issues I hit setting this up:

  1. "Module not found" - You forgot to activate the venv
  2. "Connection refused" - PostgreSQL isn't running
  3. "Rate limit exceeded" - You're using my GitHub token (nice try)
  4. "Invalid API key" - OpenAI wants your credit card

Lessons I Learned The Hard Way

1. Start with real data, not test data

My test PRs were too clean. Real PRs are messy AF. They have:

  • Typos in commit messages
  • 500 line changes where 490 are formatting
  • Description: "fixes stuff" (helpful, right?)
  • Reviews that just say "LGTM"

2. Business value isn't always monetary

Sometimes it's:

  • "Prevented 2am wake-up calls" (worth its weight in gold)
  • "Made Sarah from QA happy" (she brings donuts when happy)
  • "Reduced WTF/minute rate during code reviews"

3. Cache everything, trust nothing

  • OpenAI API calls: $0.03 each × thousands = 💸💸💸
  • GitHub API: Rate limited to hell
  • Your own calculations: Cache for 24h minimum

I spent $237 in the first month before implementing proper caching. That hurt.
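
Here's the shape of the caching that stopped the bleeding. call_llm() is a stand-in for the real OpenAI client wrapper; key by a hash of the prompt and let Redis handle expiry:

# Cache LLM responses in Redis, keyed by a hash of the prompt.
# call_llm() is a stand-in for the real OpenAI client wrapper.
import hashlib
import json

import redis

r = redis.Redis(host="localhost", port=6379, db=0)
CACHE_TTL_SECONDS = 24 * 3600  # lesson learned: 24h minimum


def call_llm(prompt: str) -> dict:
    ...  # the expensive $0.03-a-pop part


def cached_analysis(prompt: str) -> dict:
    key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()
    if (hit := r.get(key)) is not None:
        return json.loads(hit)
    result = call_llm(prompt)
    r.setex(key, CACHE_TTL_SECONDS, json.dumps(result))
    return result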

4. Developers suck at self-promotion

We're trained to be humble. This tool forces you to acknowledge your actual impact. It's uncomfortable but necessary.

My favorite example: A junior dev used this and discovered they'd saved their company $100k/year by fixing a database index. They got promoted 2 months later.

5. The best documentation is working code

I have 82 working examples in the tests/ directory. They're better than any README:

# tests/test_real_world_prs.py
def test_memory_leak_fix_pr():
    """Test the actual PR that saved us $312/month"""
    pr_data = load_fixture("pr_83_memory_leak.json")
    result = analyzer.analyze_pr(pr_data)

    assert result["cost_savings"]["monthly"] == 312
    assert result["reliability_impact"]["alerts_prevented"] == 12
    assert "memory leak" in result["technical_summary"].lower()

What's Next?

I'm working on:

v2.0 Features

  • GitLab integration (GitHub-only is limiting)
  • Team dashboards (imagine seeing your whole team's impact)
  • Slack notifications ("🎉 You just shipped $500 of value!")
  • ML model to predict PR value BEFORE merging (trained on 10k+ PRs)

The Dream Features

  • Chrome extension that adds value badges to GitHub PRs
  • Integration with performance review tools
  • Automated weekly reports for managers
  • "Achievement NFTs" (just kidding... unless?)

But honestly? Right now I'm just happy it works and helps me not sound like a bullshitter in interviews.

The Real Talk

This project isn't perfect. Let me be completely transparent:

The Good:

  • Actually works 90% of the time
  • Has helped me and 47 others land better jobs
  • Makes performance reviews way easier
  • Forces you to think about impact while coding

The Bad:

  • AI sometimes hallucinates business value that doesn't exist
  • Calculations can be off by 20-30%
  • Some PRs are legitimately hard to quantify
  • Requires decent commit messages (garbage in, garbage out)

The Ugly:

  • The codebase is a mess (working on it)
  • Tests are flaky (especially the integration ones)
  • Documentation is... well, you're reading it

But you know what? Having rough numbers beats having no numbers. And after 3 months of using this, I can walk into any interview and say exactly how much value I've delivered. With receipts.

Last week, I told an interviewer I'd saved companies $143,291 over my career. He asked for proof. I pulled up my Achievement Collector dashboard. Got the offer 3 days later.


Join the Revolution (Or Don't, I'm Not Your Mom)

The repo is here: github.com/vitamin33/threads-agent

Fair warning:

  • The code is rough in places (PRs welcome)
  • You'll need patience to set it up
  • It might make you realize you've been undervaluing yourself

But if you're tired of vague performance reviews and want to know your actual impact, give it a shot.

First 5 people to submit a PR that the Achievement Collector values at >$100 get:

  • A shoutout in the next article
  • My eternal gratitude
  • Probably some bugs to fix (sorry)

I'm Vitalii, and I build systems that turn vague developer work into concrete business value. Currently working on threads-agent and looking for remote AI/MLOps roles where I can make measurable impact.

Want to argue about whether this is overengineering? Find me on LinkedIn or check out the repo. PRs welcome, especially if they have quantifiable business value ;)

P.S. - Used Claude Code to help debug that async nightmare at 3am, but the bugs are all mine. Chad, if you're reading this, I'm ready for round 2.


Comments Section Starter 🔥

What's the dumbest bug that cost you days? Mine was forgetting an await keyword that made everything return <coroutine object at 0x...>. I still have nightmares.

Also, anyone else lie about impact in interviews? No judgment here 😅 This tool is basically my penance for years of "improved system performance" BS.

Drop your horror stories below 👇
