xulingfeng

Posted on Jun 29 • Edited on Jul 1

Stratagems #2: Derek Shaw Walked Into Another AI Promise. The Pipeline Had a Better Plan.

#ai #career #discuss #programming

When the enemy is too strong to attack directly, attack what they hold dear. They will come to you to defend it, and the siege is lifted.
— The 36 Stratagems, "Besiege Wei to Rescue Zhao"

Derek Shaw knew 95% was a number he shouldn't have promised. He said it anyway.

97.2% Coverage, 4-Day Delivery — A Fake Quote, and a Parking Lot Lesson

Back at QualiGuard, Derek Shaw walked into Finova's $1.8M contract bid with an AI platform that claimed 97.2% test coverage. Four-day delivery, fully automated, beautiful numbers. But his boss Sarah had cut his GPU budget behind his back and forced him to slash the quote from $1.8M to $1.5M.

The truth was uglier than the cover slide. Derek had to fill the compute gap with sampling plus AI predictions — real coverage dropped from 97.2% to 86.7%. Lena spotted a footnote buried in the Aegis test report: External Deps: 0 — a financial system with 14 external dependencies, and the report claimed zero. She didn't bring it up during the review. Finova's risk lead Frank beat her to it.

Finova handed the contract to VeriTest. In the parking lot, Lena looked back before getting into her car: "Next time you put together a proposal — make sure your boss knows what you're doing out there."

Derek remembered those words. Then he switched companies.

Same Spot, Different Name

Derek Shaw was three months into MediSys. The QualiGuard disaster — that bid where he fell flat in front of Finova's entire evaluation team, the "External Deps: 0" oversight — still hadn't cooled off.

He never told anyone about that night in the parking lot. But every few days, the words surfaced on their own: "Next time you put together a proposal — make sure your boss knows what you're doing out there." He knew she was right. He didn't want to admit she was right.

At MediSys, Derek swore things would be different. He volunteered to run a round of "we're listening" sessions with the team; he wasn't going to be the Director who bet everything on AI again. The first two months went well. Back in his hiring interview, VP Morgan, a woman in her early fifties whose gray suit was always pressed sharp, had asked him only one non-technical question: "QualiGuard. What did you learn?" He gave a solid answer. He just wasn't sure how much of it Morgan actually bought.

Then month three hit. The CEO announced it at the all-hands: MediSys was launching an AI-assisted diagnostic validation platform. Strategic initiative. FDA pathway. $2M budget. Twelve months to MVP. Derek was named Quality Validation Lead.

Same position. Again. Only this time, he told himself he knew what to do.

MediSys's AI validation platform ran on a third-party engine — EHR AI Labs, a medical AI diagnostics startup fresh off a Series B. Their bare engine could hit 95%+ diagnostic suggestion coverage in lab conditions. But the engine wasn't the product — MediSys licensed it and handled the real work: fine-tuning adapters for 17 medical data sources, plus the FDA validation pathway. The integration layer was what MediSys actually sold to hospitals.

There was just one problem. Another company was chasing the same FDA audit contracts — OmniDx. They'd published a white paper the month before, claiming 96.2% diagnostic suggestion accuracy in independent validation. The CEO couldn't stop bringing up that number. Derek read the white paper. Something felt off — OmniDx's evaluation set had 3 data sources. MediSys had to run 17. That 96.2% wasn't inflated — it was greenhouse-grown.

He kept his mouth shut. He didn't have his own numbers yet.

Then Derek's team finished the project's first week. The numbers came in: actual accuracy was hovering between 83% and 87%. On paper, he was worse than OmniDx.

That's when the CEO looked at him in the weekly project review. "Investor day is 18 days out. I want to see 95%."

Derek wanted to say the 96.2% was greenhouse-grown. Wanted to say 3 data sources and 17 are not the same thing. He didn't. He was sitting on 83%. He had no standing to argue.

But he needed a number — not to actually satisfy the CEO, but to buy himself 18 days without interference. He needed a target that would make the CEO stop asking "so what do you think about their 96.2%?"

Derek stood up in the project review and said: "Give me 18 days. I'll get above 95% before investor day."

The room went quiet for a few seconds. In three months at MediSys, he'd never done this. His style was cautious, always leaving room. After the meeting, VP Morgan pulled him into the hallway: "Derek, are you sure about this?"

"Yes," he said.

He was walking the same road he walked at QualiGuard. → Story #15

18 Days. 95% or Bust.

Derek started tuning. He split the team into two tracks — himself plus one person pushing model accuracy, the other four maintaining the existing test pipeline.

The real problem was in the pipeline.

MediSys's data flow was nothing like QualiGuard's. Medical data came from 17 different sources: HIS, LIS, PACS imaging archives, third-party lab interfaces — formats ranging from HL7 to FHIR to bespoke CSVs. The preprocessing stage had to normalize all of it before the AI engine could touch anything.

The preprocessing pipeline was legacy architecture, three years old — built long before the AI project existed. A single-threaded ETL job, run once a day. It had never been a bottleneck because no one had ever tried to feed its output into a real-time AI model.

Now the AI platform demanded real-time ingestion from all 17 sources. Data volume quadrupled overnight.

The pipeline started collapsing during afternoon peak hours. Nobody cared — pipeline performance wasn't in anyone's KPIs. Everyone was chasing 95%.

Derek noticed but didn't act. He saw a few pipeline alert emails — preprocessing job completion time creeping from 6 hours to 9. He forwarded them to the pipeline team, got back a "got it, looking into it," and heard nothing after. He didn't push. He didn't want the distraction.

One night he was in the office until 1 a.m. Model results dropped from 90.5% back to 89%. No explanation. Just lost 1.5 points. He stared at the screen for a long time, searching for a reason — but he never looked back at the pipeline. He just re-ran the training and hoped for better luck.

Every day he wrestled with accuracy: 87% → 89% → 90.5% (fell back to 89% overnight) → 91.2%. Every gain was harder than the last. Every point was burning more of the team's time.

One night he closed the model terminal and reopened OmniDx's white paper — the one he'd saved a month ago and never revisited. This time he caught a footnote he'd skipped before: the evaluation set description listed three data sources, but the sample ID prefixes had four patterns. That fourth prefix didn't match any source in their public documentation.
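For readers who want to run the same kind of check, here is a minimal Python sketch that counts sample ID prefixes and flags any that the documentation does not declare. The ID format (`PREFIX-number`), the prefix names, and both function names are hypothetical; the story does not say what the white paper's IDs actually looked like.

```python
from collections import Counter


def prefix_counts(sample_ids, sep="-"):
    """Count sample IDs by the token before the first separator."""
    return Counter(sid.split(sep, 1)[0] for sid in sample_ids)


def undeclared_prefixes(sample_ids, declared, sep="-"):
    """Return prefixes that appear in the IDs but not in the declared sources."""
    return sorted(set(prefix_counts(sample_ids, sep)) - set(declared))


# Hypothetical IDs: three declared sources (A, B, C) and one stray prefix (D).
ids = ["A-0001", "B-0002", "C-0003", "D-0004", "A-0005"]
print(undeclared_prefixes(ids, {"A", "B", "C"}))  # ['D']
```

A stray prefix proves nothing on its own. It only tells you which question to ask.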

He told no one. He wasn't sure if he'd found a crack or just found an excuse to look away from his own numbers.

The pipeline was silently rotting.

Day 17

Derek re-ran the model at 2 a.m. Accuracy: 92.7%. Still nowhere near 95%. Worse — he realized the data he'd been feeding in for the past two days wasn't from today. The pipeline was taking 20 hours to complete. The model was training on yesterday's data.

He opened the pipeline monitoring dashboard. First time since the project launched.

The ETL job was now clocking 20 hours. Out of every 24-hour cycle, the data-availability window was down to 4 hours. And nearly 3 of those overlapped with model training — leaving barely one clean hour for evaluation.
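The arithmetic behind that one clean hour fits in a few lines of Python. This is only a sketch: the function name is made up, and treating the training overlap as a flat 3 hours is an assumption based on the "nearly 3" above.

```python
def clean_eval_hours(etl_hours, training_overlap_hours, cycle_hours=24.0):
    """Hours per cycle where fresh data exists and training is not using it."""
    available = max(cycle_hours - etl_hours, 0.0)
    return max(available - training_overlap_hours, 0.0)


# Day 17: a 20-hour ETL job leaves a 4-hour window, 3 of which go to training.
print(clean_eval_hours(20, 3))  # 1.0

# Same overlap with the original 6-hour job (assuming training time is unchanged).
print(clean_eval_hours(6, 3))   # 15.0
```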

He could see the problem. He should have seen it sooner. The lessons from his last job were still fresh — all those late nights staring at QualiGuard's monitoring dashboards had taught him one thing: check the infrastructure before you check the model.

But he hadn't. Because looking at the pipeline meant admitting he'd picked the wrong battlefield. Admitting that "I'll hit 95%" — said 18 days ago — was a mistake. Admitting he was repeating his last company's playbook.

He sat in front of the screen, staring at that monitoring dashboard for a long time.

Something else was running through his head. The four sample ID prefixes in OmniDx's white paper — he hadn't touched that thread since noticing it the week before. It had felt like making excuses: can't hit 95%, so let's go nitpick the competitor. But sitting there, staring at a 20-hour ETL job, he started re-examining that lead. Not because anything clicked — but because he finally admitted the obvious: tuning the model wasn't going to win this.

But the game had more than one way to win.

If OmniDx's benchmark really did hide a fourth data source, that prefix with no known origin, then their 96.2% was still measured on only three or four sources. MediSys had to run 17. The two numbers were never comparable. He didn't need to prove OmniDx was fake. He just needed the CEO to realize: OmniDx tested something entirely different from what we're building.

And then a voice surfaced in his head — not his own. It was that night in the parking lot, Lena turning back before the car door closed:

"Next time you put together a proposal — make sure your boss knows what you're doing out there."

He hadn't said anything back then. But now he understood what she meant. If this went sideways again, would VP Morgan turn into another Sarah — losing trust in him before he even realized it was gone?

He closed the model training terminal. First time in 18 days.

Then he opened the pipeline config.

He made a decision — and told no one. Alone, in the middle of the night, he rewired the single-threaded ETL into a multi-threaded parallel pipeline. He didn't notify VP Morgan. Didn't tell the team. He still wanted to maintain the illusion that "95% is still the target."

The Supply Line

3 a.m. to 10 a.m. Derek sat at MediSys's test server, split the ETL pipeline from one thread into 8 parallel workers, and wrote a lightweight queue manager to distribute the data sources across them. It blew up twice: first a deadlock, then a memory overflow. He walked to the hallway, drank a glass of cold water, came back, added a timeout mechanism, and kept going.
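What that kind of late-night rewire might look like, as a minimal Python sketch. It is not Derek's code: the source names, `normalize_source`, and `run_parallel_etl` are all hypothetical, and it leans on the standard library's `ThreadPoolExecutor` rather than a hand-written queue manager. It assumes Python 3.9+ for `cancel_futures`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


def normalize_source(source_name):
    """Placeholder for per-source work (HL7, FHIR, or CSV parsing would go here)."""
    return f"{source_name}:normalized"


def run_parallel_etl(sources, worker, max_workers=8, timeout_s=60.0):
    """Run `worker` once per source on a thread pool; collect results and failures."""
    results, failures = {}, {}
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = {pool.submit(worker, name): name for name in sources}
    try:
        # The timeout bounds how long we wait overall; it cannot kill a stuck thread.
        for fut in as_completed(futures, timeout=timeout_s):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:  # one bad source should not sink the batch
                failures[name] = repr(exc)
    except FuturesTimeout:
        for name in futures.values():
            if name not in results and name not in failures:
                failures[name] = "timed out"
    finally:
        # Drop queued work and return without waiting on stragglers.
        pool.shutdown(wait=False, cancel_futures=True)
    return results, failures


# 17 hypothetical sources, 8 workers, as in the story.
sources = [f"source_{i:02d}" for i in range(1, 18)]
done, failed = run_parallel_etl(sources, normalize_source)
print(len(done), len(failed))
```

Recording failures per source, instead of letting one exception abort the run, is the piece the story's queue manager is later described as missing.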

He didn't touch a single line of model code.

By 10 a.m., the ETL job finished in 4 hours what had been taking 20. Pipeline restored.

He re-ran the model — accuracy jumped from 92.7% straight to 94.1%. Not because the model improved. Because the data was fresh.

He didn't keep storming the city. He fixed the supply line. And the moment the supply line was fixed, the city fell on its own.

But he knew this fix was fragile — 8 parallel workers with no idempotency checks, a queue manager with no exception handling. He didn't know if it would hold tomorrow. He was just betting it wouldn't blow up before investor day.
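To make "no idempotency checks" concrete: with 8 workers and retries, the same batch can be loaded twice unless something remembers what has already been written. A minimal in-memory sketch follows. The class name and the `(source, batch_id)` key are assumptions for illustration, and a real pipeline would keep the seen-set in a durable store rather than in process memory.

```python
import threading


class IdempotentLoader:
    """Accepts each (source, batch_id) at most once, even across threads."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()
        self.rows = []

    def load(self, source, batch_id, records):
        """Append records unless this batch was already loaded. Returns True if loaded."""
        key = (source, batch_id)
        with self._lock:
            if key in self._seen:
                return False  # duplicate delivery or retry: skip silently
            self._seen.add(key)
            self.rows.extend(records)
            return True


loader = IdempotentLoader()
print(loader.load("source_01", "batch-1", ["r1", "r2"]))  # True
print(loader.load("source_01", "batch-1", ["r1", "r2"]))  # False
print(len(loader.rows))                                   # 2
```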

The irony: if he'd fixed the pipeline on day one, 18 days would have been enough to push accuracy past 95%. He burned 16 days storming the walls and only remembered to check the supply line with two days left.

That night, he ran through the investor day deck one last time in the conference room. Closed his laptop. Leaned back.

94.1%. Tomorrow was investor day. Not pretty, but close enough.

Fixing the pipeline only kept him alive. The real Besiege Wei to Rescue Zhao happened the next day.

"Was It Like This Last Time?"

Investor day. Derek presented 94.1% accuracy. Not 95%, but the data was live, reproducible, verifiable. The CEO watched the numbers pulse on screen and said exactly five words: "Within a point. Close enough." Then he flipped to the next slide. It was the biggest compliment Derek had ever heard from him.

Something else happened that Derek hadn't planned. That morning, he'd sent Morgan a comparison note on OmniDx's white paper — brief, no conclusions, just the sample ID prefix alignment across the three claimed data sources. Morgan slipped it into the investor day briefing — not as an attack, just one page under "competitive landscape analysis."

That night, Derek saw a push notification. News traveled faster than he'd expected — that one page in Morgan's briefing had been photographed by a shared investor and landed in OmniDx's internal Slack before lunch.

OmniDx's CTO posted on LinkedIn, tone restrained but prominent: "Our validation methodology has been independently audited. The 96.2% accuracy is based on a publicly reproducible evaluation framework. We'd suggest colleagues check their own datasets before commenting on others'."

Derek read it twice. After that, nobody compared 96.2% to 94.1% again.

VP Morgan found him after the session: "Was it like this last time?"

Derek froze for a second. Morgan's tone wasn't an interrogation, and it wasn't concern — it was confirmation. She was confirming what she'd suspected for three months — that Derek had done it again: promised a number he couldn't hit, fixed the pipeline in secret, dug up dirt on the competitor, and left her, the VP, the last to know.

"More or less," he said.

Morgan nodded. Didn't press. Sometimes the most unsettling thing isn't being questioned — it's realizing the other person already has their answer.

She never asked again.

A week later, two of the three hospitals pushed the FDA audit contract to final-stage negotiation. OmniDx stopped appearing on bid lists.

Besiege Wei to Rescue Zhao isn't about defeating Wei. It's about making Wei feel like it has to defend itself.

The Third Cup

That evening, Derek walked out of the MediSys building alone. Heading toward the metro platform going west, he turned at the intersection — not lost, just not ready to go home. He needed a road long enough to walk for ten minutes and process the day. He didn't take his usual route. Took an alley he'd never walked before. Deeper and deeper. Something was pulling him in that direction — though he didn't know what.

Then he stopped under a warm amber sign —

"Third Cup" cafe.

He didn't walk in. He stood at the door for two seconds, looking through the glass. Inside, there was only one person — a figure in a dark coat, seated at a corner table, a few sheets of paper spread out in front of them. Not a menu.

The person looked up. Met Derek's eyes.

Those eyes didn't look away. Didn't nod. Just watched.

Derek walked on.

A few steps later, it hit him: one of those pages on the table — the corner that was flipped open — had the MediSys logo on it. Deep blue. He knew it. Below it was an architecture diagram he didn't recognize but could guess.

He stopped. Looked back.

Through the glass, the seat was empty.

Derek's first instinct was to push the door open and walk in. But his feet wouldn't move. A Director-level architecture diagram on a stranger's table — he should be taking photos, chasing leads, calling Legal. He didn't. Because Morgan had just looked at him and asked, "Was it like this last time?" — and if tonight she heard he'd been standing outside a cafe while someone inside had MediSys architecture docs, she wasn't going to call it a coincidence.

He stood there thinking for a long time. Long enough that if the person was still inside, they'd had time to walk out the back and be three blocks away.

He didn't chase. Maybe he'd look into it tomorrow. Maybe not. Tonight, he needed sleep — not another lead to follow.

That's Besiege Wei to Rescue Zhao — when you can't beat their accuracy, you attack their benchmark.

🤖 AI Post-Mortem Analysis

[36 Stratagems Tactical Engine v3.1] Loaded
[Matching Target] Besiege Wei to Rescue Zhao
[Analysis Mode] Full-Table Situational Scan
━━━━━━━━━━━━━━━━━━━━
Tactical Match: 87.3%
Agent: Derek Shaw
Action: Abandoned head-to-head model accuracy competition.
        Exposed structural flaws in competitor's benchmark
        while rebuilding internal data pipeline.
Objective: AI-assisted diagnostic validation milestone
Outcome: 94.1% (target 95%) — competitive pressure neutralized

Parallel Path Analysis:
  - Pipeline path: Intervention on Day 17, 16 days late.
    Data refresh restored → accuracy jumped 92.7% → 94.1%.
  - Competitor path: Four-prefix anomaly detected Week 2.
    Comparative analysis deployed Day 18 → OmniDx retreated to LinkedIn defense.

Key Data Points:
  - OmniDx eval set: 3 data sources vs. MediSys 17.
    96.2% wasn't inflated — it was greenhouse-grown.
  - ETL pipeline: single-threaded 20h → 8-parallel 4h.
    Zero model code changed. Accuracy +1.4 points.

Verdict: Tactically sound, split in execution.
Strategic judgment (competitor) and tactical execution (pipeline)
were out of step: the pipeline fix came 16 days late. Two sides of the same person.
A smart person's greatest enemy isn't the competition.
It's their own ego.

Next stratagem: Kill with a Borrowed Knife

P.S. English isn't my first language. I use AI to polish the writing and smooth out the rough edges. Thanks for reading.

Top comments (8)

Evans Owusu • Jun 29

This resonates a lot. There's something almost poetic about AI systems that promise transparency but obscure what's actually happening under the hood.

It got me thinking about honesty gaps in general, not just in pipelines but in human communication too. I've been building Yhuu (yhuu.life), an app that surfaces the gap between what people expect others to say vs what they actually say. Different domain, same core problem: the distance between what's promised and what's real.

Derek's story is a good reminder that that gap is everywhere.

xulingfeng • Jun 29

Right? The pipeline was never trying to hide anything — it just sat there degrading while everyone stared at the model dashboard. Infrastructure has a way of being honest whether you look at it or not. Yhuu sounds like the same instinct from the human side. Been there.

𝑻𝒉𝒆 𝑳𝒂𝒛𝒚 𝑮𝒊𝒓𝒍 • Jun 29

Sorry I'm late to the party! My response got stuck somewhere in the pipeline.

Loved the article! Quick question though: if you had to fix just one thing first—the prompt, the pipeline, or the process—what would give the biggest improvement? Curious to hear your take!

xulingfeng • Jun 29

The pipeline, and it's not even close. A perfect prompt on stale data is still wrong. A bad prompt on fresh data is fixable. Process comes third because no process survives the first fire anyway. 🔥

𝑻𝒉𝒆 𝑳𝒂𝒛𝒚 𝑮𝒊𝒓𝒍 • Jun 29

I have a question: if someone follows us, should we follow them back?

xulingfeng • Jun 29

Depends on who's doing the following. Bot accounts with no avatar, no posts, no bio — don't bother. Real devs who write or comment — always follow back. The Dev.to community runs on reciprocity.