
freerave

From 11 Failing Tests to a 110-Test "Bulletproof" AI: A Debugging War Story

I just finished building the "brain" for my VS Code extension, Break Bully. It's a complex AI stack with four ML-driven services:
One to generate personalized work-rest models.
One to analyze historical productivity.
One to monitor real-time fatigue from typing patterns.
One to adapt and learn from user feedback.
I'd written all the code, and I thought it worked. Then, I finally wrote the first massive test suite... and ran npm test.
11 FAILING TESTS.

My heart sank. But after digging in, I realized this was the best thing that could have happened. The tests had uncovered every single weakness in my design.
Here's the "behind-the-scenes" of the 3 types of bugs I found and fixed to get to 110 passing tests.

Bug Type 1: The "Obvious" Crash (TypeError)
The first set of bugs was the classic:
TypeError: Cannot read properties of undefined (reading 'length')
My test was correctly simulating a scenario where no activity data was available (null). My code, however, never expected this and tried to run .filter() on null.

The Fix: The simplest fix in the book. A guard clause at the top of every analytics function.

// Before
public static analyzeWorkPatterns(events: ActivityEvent[]) { ... }

// After
public static analyzeWorkPatterns(events: ActivityEvent[]) {
  if (!events || events.length === 0) {
    return { ...defaultAnalysis }; // safe, empty-data result
  }
  // ... rest of the code
}

Lesson: Your tests are your best defense against null. Test what happens when your functions get null, undefined, or [].
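To make that concrete, here's the shape of edge-case test I'm talking about (a minimal sketch assuming Mocha and Node's assert; "Analytics" and the import path are stand-ins for whatever class actually owns analyzeWorkPatterns in your codebase):

import * as assert from 'assert';
import { Analytics } from '../services/analytics'; // hypothetical import path

describe('analyzeWorkPatterns edge cases', function() {
  it('returns a default analysis instead of crashing on empty input', function() {
    for (const input of [null, undefined, []]) {
      // The guard clause should catch all three of these and return defaults
      const analysis = Analytics.analyzeWorkPatterns(input as any);
      assert.ok(analysis, `Should not throw for input: ${JSON.stringify(input)}`);
    }
  });
});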

Bug Type 2: The "Flaky" Bug (The Race Condition)
The next set of bugs was infuriating. Five tests would fail sometimes.
AssertionError: ... at Timeout._onTimeout
The Problem: My RealTimeSessionAnalyzer runs on a setInterval (a "real" clock). My test was using setTimeout to "wait" for the analysis to finish. They were racing, and the test was losing.
The Fix: Stop waiting for the clock; control it. I used sinon.useFakeTimers().
// In beforeEach:
clock = sinon.useFakeTimers();

// In the test:
it('should perform baseline session analysis', function() {
  analyzer.startSessionAnalysis(); // Starts the internal setInterval

  // Don't wait! Just fast-forward the clock:
  clock.tick(5000); // 5 seconds pass instantly

  const analysis = analyzer.getCurrentAnalysis();
  assert.ok(analysis, 'Should have analysis results');
});

Lesson: If you're testing code that involves setInterval or setTimeout, don't guess. Use fake timers to make your tests instant, deterministic, and 100% reliable.
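One caveat the snippet above glosses over: fake timers hijack the global setInterval/setTimeout, so forgetting to hand the real clock back can quietly break every test that runs afterwards. A minimal teardown sketch, assuming the same suite-level clock variable as above:

// In afterEach:
afterEach(function() {
  clock.restore(); // give the real timers back to the rest of the suite
});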
Bug Type 3: The "Aha!" Bug (The AI Was Smarter Than My Test)
This was my favorite bug. One test failed with:
AssertionError: expected 'nuclear' to equal 'moderate'
I was confused. The test was simple: check the default annoyanceLevel. Why did it expect moderate but get nuclear?

Then I remembered: I had fixed a different bug earlier that correctly set the default to nuclear! My test was outdated. The "failing" test was actually proof that my previous fix was working.
The Fix: Change the test to match the correct behavior.
// Before
assert.strictEqual(config.get('annoyanceLevel'), 'moderate');
// After
assert.strictEqual(config.get('annoyanceLevel'), 'nuclear');
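For anyone wondering where that default actually lives: if (as in most extensions) this setting goes through VS Code's workspace configuration, config.get() falls back to whatever default is declared under contributes.configuration in package.json when the user hasn't overridden it. A rough sketch of the read side ('breakBully' as the section name is my guess, not something from the post):

import * as vscode from 'vscode';

// get() returns the user's value if set, otherwise the package.json default
const config = vscode.workspace.getConfiguration('breakBully');
const level = config.get<string>('annoyanceLevel'); // 'nuclear' out of the box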
Lesson: Tests aren't just for finding bugs in your code; they're for finding bugs in your assumptions.
The Final Run
After fixing the guard clauses, controlling the clock, and aligning my tests with the AI's actual logic, I ran the full suite one last time.


The result?
110 PASSING (14s)
Exit code: 0
This isn't just a vanity number. This 110-test suite is a safety net: it validates the AI brain end to end, which means I can move on to building new features (like the CodeTune integration) with confidence, knowing that this "bulletproof" foundation won't break.
Thanks for reading my debugging story! What's the "flakiest" bug you've ever had to chase?
