Computer science is a science. How do you do science without observation?
That's the question that started this.
I'd been using AI to build features for months. It worked — sometimes brilliantly, sometimes frustratingly. But I had no idea
what was actually happening. Was AI getting better at my project over time, or was I just getting better at prompting around
its blind spots? Were the rules I added actually helping, or did I just get lucky that session?
I couldn't answer any of these questions. Not because the answers didn't exist — but because I had no data.
The real problem with vibe coding
It's not that AI makes mistakes. It's that you can't see the pattern.
You fix something. Next session, same mistake. You fix it again. At some point you wonder: is this the third time I've
corrected this, or the seventh? Is this a one-off or a systemic gap in my project rules? Without data, you're just guessing.
I started logging this stuff manually in my existing project — not as a tool, just as structured notes. Task started,
deviation recorded, rule added. After a few weeks I had something interesting: a record of exactly where AI kept going wrong,
what I did about it, and whether it helped.
That's when I realized this should be automatic.
What I actually built
AIDA is an MCP server that silently collects structured data as your AI works. One line to set up:
{
  "mcpServers": {
    "aida": {
      "command": "npx",
      "args": ["-y", "ai-dev-analytics", "mcp"]
    }
  }
}
Every task, deviation, bug, self-review, and file change gets recorded to a local JSON file. No cloud, no telemetry, 100% on
your machine.
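To make "structured data in a local JSON file" concrete, here's a minimal sketch of what an event log like this could look like. The file name, field names, and values are my assumptions for illustration, not AIDA's actual on-disk format:

```python
import json
from pathlib import Path

# Hypothetical log path and record shape -- illustrative only,
# not AIDA's real schema.
LOG = Path("aida-log.json")

def record_event(event: dict) -> None:
    """Append one structured event to a local JSON array on disk."""
    events = json.loads(LOG.read_text()) if LOG.exists() else []
    events.append(event)
    LOG.write_text(json.dumps(events, indent=2))

record_event({
    "type": "deviation",
    "category": "layout",
    "cause": "rule-missing",   # vs. "hallucination" or "context-gap"
    "file": "src/components/Nav.tsx",
    "task": "add responsive navbar",
})
```

The key property is that it's append-only and local: every task the AI runs leaves a row behind, and nothing leaves your machine.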
Then aida dashboard renders it:
- Where is AI deviating? Which categories — layout, components, API patterns?
- Why is it deviating — hallucination, missing rules, or context gap?
- After you add a rule, does that category of deviation actually go down?
- What's the bug rate this sprint vs last sprint?
- Which files keep getting touched? Where are the real pain points?
These aren't vanity metrics. They're the feedback loop that tells you whether what you're doing is working.
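Most of these questions reduce to simple aggregations over the event log. A sketch of the first two, assuming hypothetical records with `type`, `category`, and `cause` fields (my naming, not AIDA's):

```python
from collections import Counter

# Hypothetical deviation events -- field names are assumptions
# for illustration, not AIDA's actual schema.
events = [
    {"type": "deviation", "category": "layout", "cause": "rule-missing"},
    {"type": "deviation", "category": "layout", "cause": "rule-missing"},
    {"type": "deviation", "category": "api-patterns", "cause": "hallucination"},
    {"type": "bug", "category": "components"},
]

deviations = [e for e in events if e["type"] == "deviation"]

# "Where is AI deviating?" -- counts per category
by_category = Counter(e["category"] for e in deviations)

# "Why is it deviating?" -- counts per root cause
by_cause = Counter(e["cause"] for e in deviations)

print(by_category.most_common())
print(by_cause.most_common())
```

Nothing fancy: a dashboard over this data is mostly counters and rates, which is exactly why it's worth collecting the events in the first place.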
The part that matters: rules with evidence
Anyone using AI long enough has a collection of "rules" for their project — things you've told it, conventions you've
documented, patterns you've reinforced. But do they work? Which ones are actually changing AI behavior, and which ones are
just words in a file?
With observation data, you can answer that. Add a rule, watch the deviation rate in that category over the next few runs.
That's not "I think it's better" — that's a data-supported conclusion.
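"Watch the deviation rate" can be as simple as splitting the log at the point the rule was added and comparing per-run rates. A sketch with made-up numbers (the `run` and `category` fields are my assumptions):

```python
# Hypothetical: compare deviations-per-run for one category before and
# after a rule was added. All data here is invented for illustration.
events = [
    {"run": 1, "category": "layout"},
    {"run": 2, "category": "layout"},
    {"run": 2, "category": "layout"},
    {"run": 4, "category": "layout"},  # rule added after run 3
    {"run": 5, "category": "api-patterns"},
]

RULE_ADDED_AFTER_RUN = 3

def rate(evts: list[dict], runs: int, category: str = "layout") -> float:
    """Deviations per run for one category."""
    hits = sum(1 for e in evts if e["category"] == category)
    return hits / runs

before = [e for e in events if e["run"] <= RULE_ADDED_AFTER_RUN]
after = [e for e in events if e["run"] > RULE_ADDED_AFTER_RUN]

# 3 layout deviations over 3 runs vs. 1 over 2 runs
print(rate(before, 3), rate(after, 2))  # 1.0 0.5
```

If the after-rate doesn't drop, the rule is just words in a file, and now you know it.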
The rules system in AIDA reflects this. Rules are distilled from observed deviations, stored in rules.json, and auto-compiled into .md files that AI reads every session. When a rule stops being relevant, you deprecate it. The data tells you when.
aida rules build # compile rules.json → .md views AI reads
aida rules dedupe # find overlapping rules (>40% keyword similarity)
aida rules merge # resolve branch conflicts by fingerprint union
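The post says dedupe flags rules above 40% keyword similarity. I don't know how AIDA computes that internally, but Jaccard overlap between keyword sets is one plausible reading, and it's a few lines:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Keyword-set overlap: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Hypothetical rules with extracted keyword sets -- illustrative only,
# not AIDA's actual dedupe implementation.
rules = {
    "r1": {"layout", "grid", "spacing", "tailwind"},
    "r2": {"layout", "grid", "margin", "tailwind"},
    "r3": {"api", "fetch", "error"},
}

THRESHOLD = 0.40
names = list(rules)
overlapping = [
    (x, y)
    for i, x in enumerate(names)
    for y in names[i + 1:]
    if jaccard(rules[x], rules[y]) > THRESHOLD
]
print(overlapping)  # r1/r2 share 3 of 5 keywords -> 0.6 > 0.4
```

Two rules that keep tripping the threshold are usually one rule written twice, which is exactly the kind of drift a growing rules file accumulates.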
What I learned after running this on a real project
The deviation categories that kept showing up for me: component usage, layout conventions, API patterns. Not hallucination —
mostly rule-missing. AI wasn't confused about how to code; it just didn't know my project's specific conventions.
Once I had that data, I knew exactly where to focus. I distilled rules from the observed deviations, and those specific categories dropped significantly. Not because I believed harder in the rules, but because the next run's data showed fewer deviations in those areas.
That's the loop: observe → identify → add rule → measure → repeat.
No talent required. Just iteration.
Try it
npx ai-dev-analytics dashboard
Opens a local dashboard with anonymized demo data so you can see what it looks like before connecting your own project.
GitHub: https://github.com/LWTlong/ai-dev-analytics
Curious whether others have been tracking this kind of thing manually — and what patterns you've found in where AI actually
fails on your projects.


