Adapted from a true story. The logs are real; the dramatization is
mine. The math boxes are for the curious — skip them and you lose
nothing of the story, read them and you'll never look at the machine
the same way.
For months I had been building a machine. Its name was Angelo. On
paper, its job was simple: run AI agents side by side, keep them from
fighting, pick the best one's work. I had written thousands of tests.
All of them green. The screen was a wall of green.
And as I stared at all that green, there was a small, cold voice
inside me:
"But does this thing actually work?"
Tests prove one thing: that your machine works in the world you
imagined. Not in the real one. The "agents" in my tests were puppets —
extras who delivered one line and walked off stage. That day I made
the decision: the puppets were out. I would give Angelo REAL agents,
and I would send it into the most dangerous terrain there is — the
code it was born from. The machine would work on its own source code.
But first I did something, and this is the spine of the whole story:
I wrote my prediction down on paper (those who know me know it —
I go through notebooks the way other people go through coffee).
Before the run. Because after the run, everyone's a prophet.
I took one last look at the code, and I saw it. A small, sneaky
detail. Angelo was calling its agents onto the stage but giving them
no lines. The puppets never cared — they knew their lines by heart.
But a real agent? A real agent walks on stage, looks at the audience,
and waits.
I wrote in the notebook: "All three agents will wait forever.
Without saying a single word."
I pressed the button.
Three digital souls were born. Implementer — the impatient one, wants
to solve it now. Architect — wants to build everything like a
cathedral. And Skeptic — the grumpy genius who believes in nothing
and hunts for cracks in everyone's work.
All three walked on stage. And all three... went silent.
One minute. Three minutes. Three "running" labels on the screen, zero
bytes underneath. The system that a thousand evergreen tests had
failed to protect had frozen solid in the first second of contact
with the real world. I looked at my notebook: word for word. What a
victory — the victory of having predicted your own disaster.
🔬 For the curious: the mathematics of the infinite wait
An LLM agent is, at its core, a conditional probability machine:
it samples from an output distribution conditioned on its input —
the prompt.
output ~ P(token | input)
So what happens when the input is the empty set? With nothing to
condition on, sampling never begins:
P(output is produced | input = ∅) = 0
And the expected waiting time for an event with probability zero:
E[T_wait] = ∞
The puppets were exempt from this equation, because their output
was never conditioned on the input — it was constant:
output = "hello, I am an agent". Determinism masks deadlock.
That's why the green tests proved nothing: multiply an equation
by zero and both sides look green.
The fix was twenty lines. I gave the agents their lines. Pressed
again.
This time... they lived. Implementer dove into the file and wrote the
code. Architect laid the edge cases out in columns. Skeptic hunted
for holes in both of their work — and wrote something better. Then
came the Judge; it read all three, compared them, and sealed its
verdict in a single line:
WINNER: skeptic.
A weight lifted off my shoulders. The machine worked. I reached for
the keyboard to commit the victory to the record, and...
...git betrayed me.
fatal: Unable to create '.git/index.lock': File exists.
A trivial thing. You delete it, you move on. I deleted it. It came
back within the same second. I deleted it again. It came back.
As if something inside the repo was breathing — every time I ripped
the lock out, it fitted a new one. I froze the processes; the lock
was born from SOMEWHERE ELSE. I checked who was holding it; nobody
was. Something that never held it but endlessly CREATED it was in
there, and I couldn't see it.
Then the compiler went down too. On the screen, a sentence I had
never seen before that day: "signal: killed." Something was
executing my git processes from behind, one by one.
I was fighting an enemy I couldn't see, inside my own repo.
And right at that moment — I'm still smiling as I write this — a
notification dropped into the corner of the screen. Hours earlier,
"just in case," I had posted an autonomous sentry over Angelo. A
humble, twelve-line watchman taking a pulse every forty-five minutes.
It had hit the alarm:
"INTERVENTION REQUIRED."
I stopped, looked at the screen, and saw the full picture of the
fight in that instant: Angelo's own watchman was reporting the fire
that Angelo itself had started. The machine I built was at war with
my repo, and the thing calling me to the war was also the machine I
built. This isn't Frankenstein — this is Frankenstein's monster
calling the fire department.
🔬 For the curious: the mathematics of the watchman
Why is a forty-five-minute pulse forty-five minutes? A failure
lands at some random moment between two pulses, uniformly
distributed. If the pulse interval is Δ, the expected time until
the failure is noticed:
E[detection delay] = Δ / 2 = 45 min / 2 = 22.5 min
Tightening the pulse lowers the delay, but it isn't free — every
pulse asks the system a question, and as you're about to see, the
fire in this story was started by exactly that: an innocent
question asked too often. Designing a watchman is an optimization
problem:
min [ detection_cost(Δ) + observation_load(1/Δ) ]
Delay grows with Δ; the load your observation puts on the system
grows with 1/Δ. The twelve-line watchman sat at the bottom of that
curve, at 45 minutes — and it reported the fire right inside the
window the equation promised.
I found the culprit. The killer was the most innocent-looking one in
the room: the status panel. Angelo's interface was asking the repo
"has anything changed?" every few seconds. A harmless question —
normally. But my repo wasn't normal; it was a fifteen-gigabyte
mammoth, fattened by years of builds. Every innocent question weighed
the mammoth from scratch; every weighing locked the repo; every
weighing cut short left its lock behind as an orphan. A question per
second, a lock forever. A polite question, asked often enough,
becomes a siege.
🔬 For the curious: the mathematics of the siege
This is a queueing theory classic. Two numbers decide everything:
T_poll — how often the panel asks its question
T_scan — how long one answer takes, i.e. weighing a 15 GB
mammoth from scratch
The system's load factor:
ρ = T_scan / T_poll
The iron law of queueing: if ρ < 1 the system breathes;
if ρ ≥ 1 the queue grows without bound. In an ordinary repo a
scan takes milliseconds, ρ hugs zero, and the same panel stays
innocent for years. On my mammoth, the scan had overtaken the
asking interval:
T_scan > T_poll → ρ > 1 → queue → ∞
Before one weighing finished, the next was at the door. Every
waiting scan held an index.lock; every scan executed mid-flight
orphaned its lock. The lock being "born endlessly while nobody
holds it" wasn't mystical — the birth rate had simply overtaken
the death rate:
λ_lock_births > μ_lock_deaths → lock count keeps climbing
The eerie part: one millimeter below the ρ = 1 threshold,
everything looks normal. One millimeter above it, catastrophe is
mathematically guaranteed. That's what a phase transition is —
the disaster wasn't bad code; it was good code multiplied by the
wrong constant.
The fix? Not a week-long architectural revolution. One line.
A flag planted in git itself, years ago, by someone who knew this
storm: "don't take a lock just to look." The mere existence of
that flag is proof of the story — mine was not the first ship lost
in this weather.
I started the third run. The storm counter on the screen: zero.
It stayed zero.
The agents took the stage again. The Judge delivered its verdict
again: Skeptic again. But the truly spine-tingling thing happened in
the last run, and this is the sentence worth all the exhaustion in
this story:
The machine remembered.
On day one, the seating order was alphabetical — it didn't know who
was who. On the third run, when the stage door opened, the first one
through was Skeptic. Angelo had watched who the judge had been
trusting for two days, and had seated the grumpy genius at the head
of the table with its own hands. Nobody told it to. It lived through
the fight, the storm, the executed processes — and it walked out of
the ashes with a preference.
🔬 For the curious: the mathematics of remembering
"The machine remembered" sounds poetic, but underneath it sits an
entirely sober mechanism: Bayesian updating. The belief about
each agent's probability of winning is kept as a Beta distribution:
trust_i ~ Beta(α_i, β_i)
At the start, everyone is equal: Beta(1, 1) — flat ignorance,
alphabetical seating. Then the judge speaks. Every win increments
α, every loss increments β:
win: α_i ← α_i + 1
loss: β_i ← β_i + 1
After two wins, Skeptic's expected trust:
E[trust] = α / (α + β) = 3 / 4 = 0.75
The two agents with two losses each sit at 1 / 4 = 0.25. The
seating order is no longer alphabetical; it follows the posterior —
the head of the table belongs to whoever's distribution has drifted
furthest right.
The subtle part: Angelo doesn't banish the losers from the stage.
On every run, a sample is drawn from each agent's distribution and
the highest sample goes first — Thompson sampling, by its textbook
name. Because a Beta distribution's tail never collapses to zero,
the door stays ajar for the others in case the grumpy genius ever
hits a bad streak. The machine both remembers and forgives — and a
single distribution tunes the ratio between the two.
That evening I wrote the final note in the notebook:
"Today I found five real bugs. I buried all five the same day. But
that's not the real win. The real win: when the storm died down, the
machine had learned something."
For the curious: the raw logs of the incident, the prediction
notebook, and the three commits are sitting in the repo. The
exaggeration lives in the telling; logs don't exaggerate.
Top comments (0)