Last week my AI wrote code that read a JSON file from disk, parsed it, did one lookup, and repeated this 900 times inside a for loop. Each iteration: open file, decode JSON, look up a value, throw it all away. Start over.
It's a mistake I teach my students not to make within their first month of programming.
What happened (straight to the point)
I'm building Tokamak, a macOS menu bar app that monitors Claude Max quota. Part of the functionality scans ~900 JSONL files from Claude Code sessions. For each file, it needs to know the byte offset where it left off last time (incremental reading — only process what's new).
The offsets are stored in a JSON file:
{
  "version": 1,
  "offsets": {
    "project-a/session-1.jsonl": 48231,
    "project-b/session-2.jsonl": 12044
  }
}
A Dictionary<String, UInt64>. 900 entries. ~55KB. Nothing fancy.
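For reference, the file maps onto a tiny Codable value. A minimal sketch (illustrative; in the real app it lives nested as SessionOffsetStore.OffsetData, as the test further down shows):

import Foundation

// Sketch of the on-disk model: one version field, one flat dictionary.
struct OffsetData: Codable {
    var version: Int = 1
    var offsets: [String: UInt64] = [:]
}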
And here's the detail that makes it even more absurd: the app itself created this file. It's not JSON from an external API. It doesn't come from Claude Code. It's an internal state file that Tokamak writes and reads to track where it left off in each session. The AI was reading from disk, 900 times, a file the app had generated itself.
"Why not use Core Data or SQLite, since you already have them in the app?" Good question. Because this file is a disposable progress cache. If it gets corrupted, you delete it and the next scan rebuilds all offsets by reading the entire files once. Zero data loss. Plus: I can cat session-offsets.json | jq . to debug (with Core Data I need sqlite3 and the sandbox path), it's Sendable without the background context dance, and if Core Data's SQLite gets corrupted it doesn't take down the offsets (or vice versa). For 55KB of a flat dictionary, the ceremony of an entity with schema migration isn't justified.
The format wasn't the problem. The access was.
The code the AI wrote for the scan loop:
for file in files { // 900 files
    let storedOffset = offsetStore.offset(for: file.relativePath)
    // ↑ THIS reads and parses the JSON from disk. Every. Time.

    if file.fileSize == storedOffset { continue }

    // ... read file, update offset ...
    offsetStore.setOffset(newOffset, for: file.relativePath)
    // ↑ And THIS reads it AGAIN, modifies it, and saves it.
}
Two disk calls per iteration. 900 iterations. 1,800 I/O operations where there should have been exactly 2: one read at the start, one write at the end.
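To make the cost of those two calls concrete, here's a plausible reconstruction of the store they sat on (a sketch from memory, not the verbatim generated code):

import Foundation

struct SessionOffsetStore {
    // The model sketched above, nested in the store.
    struct OffsetData: Codable {
        var version: Int = 1
        var offsets: [String: UInt64] = [:]
    }

    let url: URL

    // One full disk read + JSON decode per call.
    func load() -> OffsetData {
        guard let data = try? Data(contentsOf: url),
              let decoded = try? JSONDecoder().decode(OffsetData.self, from: data)
        else { return OffsetData() } // missing or corrupt file: next scan rebuilds it
        return decoded
    }

    // One full JSON encode + disk write per call.
    func save(_ data: OffsetData) {
        guard let encoded = try? JSONEncoder().encode(data) else { return }
        try? encoded.write(to: url, options: .atomic)
    }

    // The two accessors used inside the loop. Each looks innocent in
    // isolation; each pays the full round trip through disk.
    func offset(for path: String) -> UInt64 {
        load().offsets[path] ?? 0
    }

    func setOffset(_ offset: UInt64, for path: String) {
        var data = load()
        data.offsets[path] = offset
        save(data) // read, decode, mutate, encode, write. Per call.
    }
}

Accessors like these are fine for a settings value you touch once. Called 1,800 times in a hot loop, they become the entire profile.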
The numbers (xctrace doesn't lie)
I caught it with Instruments (Time Profiler). The data:
| Metric | Before | After |
|---|---|---|
| Total samples | 7,260 | 489 |
| Samples in OffsetStore.load() | 1,704 (88%) | 10 (2%) |
| Scan time | >20s | <0.5s |
| CPU | 81% | ~1.5% |
88% of the scan time was spent reading and parsing a 900-entry JSON file. Over and over. Like Sisyphus pushing his boulder, but with JSONDecoder.
The fix (that should make you cringe)
// BEFORE: I/O on every iteration
for file in files {
    let offset = offsetStore.offset(for: file.relativePath) // reads JSON
    // ...
    offsetStore.setOffset(newOffset, for: file.relativePath) // reads + writes JSON
}

// AFTER: load once, operate in memory, save once
var offsets = offsetStore.load() // ONCE
for file in files {
    let offset = offsets.offsets[file.relativePath] ?? 0 // O(1) in memory
    // ...
    offsets.offsets[file.relativePath] = newOffset
}
offsetStore.save(offsets) // ONCE
The data structure didn't change. It was still a Dictionary<String, UInt64>. The hash table was already optimal. What was suboptimal was rebuilding it from disk on every iteration.
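An extra layer I like for this particular call site (my own sketch, not necessarily what ships in Tokamak): shape the scan so the loop physically can't reach the store. It takes the dictionary by value and returns the updated copy, so the only possible I/O is the load/save pair at the boundary. SessionFile is a hypothetical stand-in for the app's file-listing type; OffsetData is the model sketched earlier.

// Hypothetical stand-in for the scanner's file metadata.
struct SessionFile {
    let relativePath: String
    let fileSize: UInt64
}

// The loop gets the offsets in memory and returns the updated copy.
// No store reference in scope means no way to re-read disk per iteration.
func scanFiles(_ files: [SessionFile], offsets: OffsetData) -> OffsetData {
    var updated = offsets
    for file in files {
        let storedOffset = updated.offsets[file.relativePath] ?? 0
        guard file.fileSize != storedOffset else { continue }
        // ... read file from storedOffset, process the new lines ...
        updated.offsets[file.relativePath] = file.fileSize // consumed to EOF
    }
    return updated
}

// Call site: exactly two I/O operations, by construction.
// offsetStore.save(scanFiles(files, offsets: offsetStore.load()))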
What doesn't work: adding "don't do this" to your CLAUDE.md
After the fix, I added this to the project's CLAUDE.md:
"NEVER do I/O (disk, network, decode JSON, Core Data fetch) inside a loop if it can be done before. Load data once before the loop, operate in memory, save once after."
And here's what I really want to tell you: it didn't help at all.
Weeks later, when adding a second service (Codex), the AI generated exactly the same pattern. With the instruction right there. It's like putting up a "keep off the grass" sign and expecting it to work.
Why? Because the LLM doesn't understand the rule; it has merely seen it. Statistically, most of the code it saw during training does one-off I/O, not I/O inside 900-iteration loops. The load → use → save pattern inside a single function is the most likely completion. That the function then gets called inside a 900-iteration for loop is a contextual detail the model has no incentive to track.
What also doesn't work: linters
No linter can catch this. Not SwiftLint, not ESLint, not Ruff, not Clippy. Think about it: the code is syntactically correct and semantically valid. Each individual call to offsetStore.offset(for:) is perfectly reasonable. The problem isn't in any single line — it's in the composition.
Looking at the layers of code meaning (an idea I use in my adversarial development course):
| Layer | Question | Fails here? |
|---|---|---|
| 1. Signal | Is this code? | No |
| 2. Language | Is it valid Swift? | No |
| 3. Syntax | Does it compile? | No |
| 4. Local semantics | Does the function do what it promises? | No |
| 5. System semantics | Does it respect contracts and performance? | Yes |
| 6. Architecture | Does it scale without degrading? | Yes |
The failure is in layers 5 and 6. Exactly where LLMs still fail in 2026. The syntax and local logic are impeccable. The problem is emergent: it appears when a correct function is used in a context that turns it into a bottleneck.
A linter operates in layers 2-4. It has no visibility into composition or performance. It's like asking Word's spell checker to detect a logical fallacy.
The only thing that works: performance tests after the fact
After the first fix, I wrote this test:
@Test("Scan performance does not degrade with file count")
func scanPerformanceDoesNotDegradeWithFileCount() async throws {
// Create 1000 JSONL files with minimal content
for i in 0..<1000 {
let content = "..." // one valid line
try content.write(to: dir.appendingPathComponent("session-\(i).jsonl"), ...)
}
// Pre-populate offset store (simulate re-scan)
var offsets = SessionOffsetStore.OffsetData()
for i in 0..<1000 {
offsets.offsets["session-\(i).jsonl"] = 100
}
offsetStore.save(offsets)
let start = ContinuousClock.now
await service.scan()
let elapsed = ContinuousClock.now - start
#expect(elapsed < .seconds(3)) // <3s for 1000 files
}
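For completeness: the test leans on fixtures (dir, offsetStore, service) that the suite sets up elsewhere. A minimal sketch of that setup, with hypothetical names and a fresh temp directory per run:

import Foundation
import Testing

struct ScanPerformanceTests {
    let dir: URL
    let offsetStore: SessionOffsetStore

    init() throws {
        // Swift Testing creates a new suite instance per test,
        // so each test gets its own scratch directory.
        dir = FileManager.default.temporaryDirectory
            .appendingPathComponent(UUID().uuidString, isDirectory: true)
        try FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
        offsetStore = SessionOffsetStore(url: dir.appendingPathComponent("session-offsets.json"))
        // `service` would be built here too, pointed at dir.
    }
}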
It's a brutally simple regression test. 1000 files, under 3 seconds, or the test fails. If anyone (human or AI) puts I/O back inside the loop, the test goes from taking 0.2 seconds to taking 30, and explodes.
And this is exactly what happened. When the AI generated the second service with the same bug, the first service's performance test kept passing (it was a different service). But when I wrote the equivalent test for the new service, it failed immediately. The test did its job: catch the regression that neither the CLAUDE.md nor any linter could see.
What this confirms
This bug is the perfect demonstration of the central thesis of what I call adversarial development: never trust, always verify.
You can't trust that AI won't make freshman-level mistakes. It will. Repeatedly. Even when you tell it not to.
You can't trust that linters will catch it. They can't. The error is above their abstraction level.
What you can do:
- Performance tests as an after-the-fact safety net
- Real profiling (xctrace, Instruments) to measure, not guess
- Defense in depth: multiple layers, because no single layer covers everything
The defense isn't a wall. It's an onion. Layers upon layers. And when one fails, the next one catches it.
For the skeptics
"But Fernando, wouldn't a human programmer make the same mistake?"
A junior, yes. A senior, probably not — because they have the pattern internalized. But even a senior would do code review and catch it. The problem with AI-generated code is volume: 50 files in 10 minutes. Nobody reviews 50 files line by line. Discriminator fatigue is real.
And that's why you need verification to be automatic, not human. The performance test doesn't get tired. It doesn't get distracted. It has no fatigue. It runs every time you run make test and tells you when something smells wrong.
It's the same principle I apply in the 5 defenses against hallucinations: the verification system must be external to the generator. If the AI writes the code, verification has to come from somewhere else. In this case, from a clock that measures how long it takes.