The Feature Wasn’t the Point
Two AI systems successfully implemented the requested UI feature.
That part wasn’t interesting.
What mattered was what broke outside the requested scope.
The Regression
After the change:
- Select text in the editor
- Trigger the “Replace with” contextual action

Nothing happened.
No error.
No exception.
Just silent failure.
This is the most dangerous class of bug in production software.
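To make "silent failure" concrete, here is a hypothetical sketch (not the actual code from either system) of how a contextual action can no-op without ever throwing:

```typescript
// Hypothetical sketch of a "Replace with" handler that fails silently.
// The element lookup and the early return are assumptions for illustration,
// not the real implementation.
function handleReplaceWith(editor: HTMLElement, actionId: string): void {
  // If the selector doesn't match (e.g. because actionId was encoded
  // differently when the menu was rendered), target is null.
  const target = editor.querySelector<HTMLElement>(
    `[data-action-id="${actionId}"]`
  );

  // The guard clause swallows the mismatch: no error, no log, no replacement.
  if (!target) return;

  target.dispatchEvent(new CustomEvent("replace-with"));
}
```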
Why This Happens with AI-Generated Code
From an engineering perspective, this failure is predictable:
- The prompt emphasized new functionality
- Existing behavior was implicit, not asserted
- No test explicitly protected that interaction
AI systems optimize for local correctness, not global behavioral invariants.
Where the Implementations Differed
A closer look at the code revealed meaningful trade-offs:
- One approach prioritized UX richness and localization
- The other emphasized modular helpers, safer selectors, and test coverage
A concrete example:
encodeURIComponent(...) vs CSS.escape(...)
Both can appear to work, but only CSS.escape is designed for safely escaping strings used in DOM selectors.
These choices matter months later, not minutes after generation.
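As a sketch of why that distinction matters (hypothetical values, not the project's actual code), consider building an attribute selector from user-visible text:

```typescript
// Hypothetical value for illustration; not code from either implementation.
const value = 'note "draft" [v2]';

// encodeURIComponent is designed for URL components, so the selector ends up
// looking for a percent-encoded string that differs from the raw attribute:
// [data-label="note%20%22draft%22%20%5Bv2%5D"]
const urlEncoded = `[data-label="${encodeURIComponent(value)}"]`;

// CSS.escape is designed for CSS selector syntax, so quotes, spaces, and
// brackets are backslash-escaped while the value itself is preserved:
// [data-label="note\ \"draft\"\ \[v2\]"]
const cssEscaped = `[data-label="${CSS.escape(value)}"]`;

// Only matches if the attribute value was also stored percent-encoded.
document.querySelector(urlEncoded);

// Matches an element whose data-label is exactly the original value.
document.querySelector(cssEscaped);
```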
Why Unit Tests Made the Difference
Only one implementation introduced unit tests covering:
- State migration
- Regex escaping
- Replacement logic edge cases
Those tests didn’t just validate correctness; they made the regression visible.
Without them, the bug would likely ship.
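As an illustration of that kind of coverage, here is a minimal sketch in a Vitest-style API; the escapeRegExp and replaceSelection helpers are assumptions for the example, not the project's actual functions:

```typescript
// Hypothetical tests; the helpers below are assumed for illustration only.
import { describe, expect, it } from "vitest";

// Assumed helper: escapes regex metacharacters in a user-supplied query.
function escapeRegExp(input: string): string {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumed helper: replaces every occurrence of the query in the given text.
function replaceSelection(text: string, query: string, replacement: string): string {
  return text.replace(new RegExp(escapeRegExp(query), "g"), replacement);
}

describe("regex escaping", () => {
  it("treats metacharacters in the query literally", () => {
    expect(replaceSelection("price (USD)", "(USD)", "(EUR)")).toBe("price (EUR)");
  });
});

describe("replacement edge cases", () => {
  it("returns the input unchanged when the query is absent", () => {
    expect(replaceSelection("hello", "world", "x")).toBe("hello");
  });
});
```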
Takeaway
If you evaluate AI coding tools in real projects, ask:
- What existing behavior is protected?
- What assumptions remain implicit?
- What breaks quietly?
Demos won’t answer those questions.
Production code will.
Final Thought
The most valuable result wasn’t the feature.
It was identifying where AI coding systems still fail like junior engineers, and where they don’t.
That distinction is what matters in real-world software.
