<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: xulingfeng</title>
    <description>The latest articles on DEV Community by xulingfeng (@xulingfeng).</description>
    <link>https://dev.to/xulingfeng</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3941526%2F81ebae49-a110-4efa-82b2-4d873989014a.png</url>
      <title>DEV Community: xulingfeng</title>
      <link>https://dev.to/xulingfeng</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xulingfeng"/>
    <language>en</language>
    <item>
      <title>Stratagems #2: Derek Shaw Walked Into Another AI Promise. The Pipeline Had a Better Plan.</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Mon, 29 Jun 2026 10:48:43 +0000</pubDate>
      <link>https://dev.to/xulingfeng/stratagems-2-derek-shaw-walked-into-another-ai-promise-the-pipeline-had-a-better-plan-3ckg</link>
      <guid>https://dev.to/xulingfeng/stratagems-2-derek-shaw-walked-into-another-ai-promise-the-pipeline-had-a-better-plan-3ckg</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;When the enemy is too strong to attack directly, attack what they hold dear. They will come to you to defend it, and the siege is lifted.&lt;/em&gt;&lt;br&gt;
— The 36 Stratagems, "&lt;a href="https://en.wikipedia.org/wiki/Thirty-Six_Stratagems#Besiege_Wei_to_rescue_Zhao" rel="noopener noreferrer"&gt;Besiege Wei to Rescue Zhao&lt;/a&gt;"&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Derek Shaw knew 95% was a number he shouldn't have promised. He said it anyway.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/our-competitor-had-an-ai-that-covered-972-we-had-a-spreadsheet-and-a-fake-quote-guess-who-won-5cc3"&gt;97.2% Coverage, 4-Day Delivery — A Fake Quote, and a Parking Lot Lesson&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Back at QualiGuard, Derek Shaw walked into Finova's $1.8M contract bid with an AI platform that claimed 97.2% test coverage. Four-day delivery, fully automated, beautiful numbers. But his boss Sarah had cut his GPU budget behind his back and forced him to slash the quote from $1.8M to $1.5M.&lt;/p&gt;

&lt;p&gt;The truth was uglier than the cover slide. Derek had to fill the compute gap with sampling plus AI predictions — real coverage dropped from 97.2% to 86.7%. Lena spotted a footnote buried in the Aegis test report: &lt;strong&gt;External Deps: 0 — a financial system with 14 external dependencies, and the report claimed zero.&lt;/strong&gt; She didn't bring it up during the review. Finova's risk lead Frank beat her to it.&lt;/p&gt;

&lt;p&gt;Finova handed the contract to VeriTest. In the parking lot, Lena looked back before getting into her car: &lt;strong&gt;"Next time you put together a proposal — make sure your boss knows what you're doing out there."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Derek remembered those words. Then he switched companies.&lt;/p&gt;




&lt;h2&gt;
  
  
  Same Spot, Different Name
&lt;/h2&gt;

&lt;p&gt;Derek Shaw was three months into MediSys. The QualiGuard disaster — that bid where he fell flat in front of Finova's entire evaluation team, the "External Deps: 0" oversight — still hadn't cooled off.&lt;/p&gt;

&lt;p&gt;He never told anyone about that night in the parking lot. But every few days, the words surfaced on their own: &lt;em&gt;"Next time you put together a proposal — make sure your boss knows what you're doing out there."&lt;/em&gt; He knew she was right. He didn't want to admit she was right.&lt;/p&gt;

&lt;p&gt;At MediSys, Derek swore things would be different. He volunteered to run a round of "we're listening" sessions with the team — he wasn't going to be the Director who bet everything on AI again. The first two months went well. During the interview, VP Morgan — a woman in her early fifties, gray suit always pressed sharp — asked him only one non-technical question: "QualiGuard. What did you learn?" He gave a solid answer. He just wasn't sure how much of it Morgan actually bought.&lt;/p&gt;

&lt;p&gt;Then month three hit. The CEO announced it at the all-hands: MediSys was launching an AI-assisted diagnostic validation platform. Strategic initiative. FDA pathway. $2M budget. Twelve months to MVP. Derek was named Quality Validation Lead.&lt;/p&gt;

&lt;p&gt;Same position. Again. Only this time, he told himself he knew what to do.&lt;/p&gt;




&lt;p&gt;MediSys's AI validation platform ran on a third-party engine — EHR AI Labs, a medical AI diagnostics startup fresh off a Series B. Their bare engine could hit 95%+ diagnostic suggestion coverage in lab conditions. But the engine wasn't the product — &lt;strong&gt;MediSys licensed it and handled the real work: fine-tuning adapters for 17 medical data sources, plus the FDA validation pathway.&lt;/strong&gt; The integration layer was what MediSys actually sold to hospitals.&lt;/p&gt;

&lt;p&gt;There was just one problem. Another company was chasing the same FDA audit contracts — OmniDx. They'd published a white paper the month before, claiming &lt;strong&gt;96.2% diagnostic suggestion accuracy&lt;/strong&gt; in independent validation. The CEO couldn't stop bringing up that number. Derek read the white paper. Something felt off — OmniDx's evaluation set had 3 data sources. MediSys had to run 17. That 96.2% wasn't inflated — &lt;strong&gt;it was greenhouse-grown.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He kept his mouth shut. He didn't have his own numbers yet.&lt;/p&gt;

&lt;p&gt;Then Derek's team finished their first week of the project launch. The numbers came in — actual accuracy hovering between 83% and 87%. He was worse than OmniDx.&lt;/p&gt;

&lt;p&gt;That's when the CEO looked at him in the weekly project review. &lt;strong&gt;"Investor day is 18 days out. I want to see 95%."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Derek wanted to say the 96.2% was greenhouse-grown. Wanted to say 3 data sources and 17 are not the same thing. He didn't. He was sitting on 83%. He had no standing to argue.&lt;/p&gt;

&lt;p&gt;But he needed a number — not to actually satisfy the CEO, but to buy himself 18 days without interference. He needed a target that would make the CEO stop asking "so what do you think about their 96.2%?"&lt;/p&gt;

&lt;p&gt;Derek stood up in the project review and said: &lt;strong&gt;"Give me 18 days. I'll get above 95% before investor day."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The room went quiet for a few seconds. In three months at MediSys, he'd never done this. His style was cautious, always leaving room. After the meeting, VP Morgan pulled him into the hallway: "Derek, are you sure about this?"&lt;/p&gt;

&lt;p&gt;"Yes," he said.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;He was walking the same road he walked at QualiGuard.&lt;/strong&gt; &lt;a href="https://dev.to/xulingfeng/our-competitor-had-an-ai-that-covered-972-we-had-a-spreadsheet-and-a-fake-quote-guess-who-won-5cc3"&gt;→ Story #15&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  18 Days. 95% or Bust.
&lt;/h2&gt;

&lt;p&gt;Derek started tuning. He split the team into two tracks — himself plus one person pushing model accuracy, the other four maintaining the existing test pipeline.&lt;/p&gt;

&lt;p&gt;The real problem was in the pipeline.&lt;/p&gt;

&lt;p&gt;MediSys's data flow was nothing like QualiGuard's. Medical data came from 17 different sources: HIS, LIS, PACS imaging archives, third-party lab interfaces — formats ranging from HL7 to FHIR to bespoke CSVs. The preprocessing stage had to normalize all of it before the AI engine could touch anything.&lt;/p&gt;

&lt;p&gt;The preprocessing pipeline was legacy architecture, three years old — built long before the AI project existed. A single-threaded ETL job, run once a day. It had never been a bottleneck because no one had ever tried to feed its output into a real-time AI model.&lt;/p&gt;

&lt;p&gt;Now the AI platform demanded real-time ingestion from all 17 sources. Data volume quadrupled overnight.&lt;/p&gt;

&lt;p&gt;The pipeline started collapsing during afternoon peak hours. Nobody cared — pipeline performance wasn't in anyone's KPIs. Everyone was chasing 95%.&lt;/p&gt;

&lt;p&gt;Derek noticed but didn't act. He saw a few pipeline alert emails — preprocessing job completion time creeping from 6 hours to 9. He forwarded them to the pipeline team, got back a "got it, looking into it," and heard nothing after. He didn't push. He didn't want the distraction.&lt;/p&gt;

&lt;p&gt;One night he was in the office until 1 a.m. Model results dropped from 90.5% back to 89%. No explanation. Just lost 1.5 points. He stared at the screen for a long time, searching for a reason — &lt;strong&gt;but he never looked back at the pipeline. He just re-ran the training and hoped for better luck.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every day he wrestled with accuracy: 87% → 89% → 90.5% (fell back to 89% overnight) → 91.2%. Every gain was harder than the last. Every point was burning more of the team's time.&lt;/p&gt;

&lt;p&gt;One night he closed the model terminal and reopened OmniDx's white paper — the one he'd saved a month ago and never revisited. This time he caught a footnote he'd skipped before: the evaluation set description listed three data sources, but the sample ID prefixes had four patterns. That fourth prefix didn't match any source in their public documentation.&lt;/p&gt;

&lt;p&gt;He told no one. He wasn't sure if he'd found a crack or just found an excuse to look away from his own numbers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pipeline was silently rotting.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Day 17
&lt;/h2&gt;

&lt;p&gt;Derek re-ran the model at 2 a.m. Accuracy: 92.7%. Still nowhere near 95%. Worse — he realized the data he'd been feeding in for the past two days wasn't from today. The pipeline was taking 20 hours to complete. The model was training on yesterday's data.&lt;/p&gt;

&lt;p&gt;He opened the pipeline monitoring dashboard. First time since the project launched.&lt;/p&gt;

&lt;p&gt;The ETL job was now clocking &lt;strong&gt;20 hours&lt;/strong&gt;. Out of every 24-hour cycle, the data-availability window was down to 4 hours. And nearly 3 of those overlapped with model training — leaving barely one clean hour for evaluation.&lt;/p&gt;

&lt;p&gt;He could see the problem. He should have seen it sooner. The lessons from his last job were still fresh — all those late nights staring at QualiGuard's monitoring dashboards had taught him one thing: &lt;strong&gt;check the infrastructure before you check the model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But he hadn't. Because looking at the pipeline meant admitting he'd picked the wrong battlefield. Admitting that "I'll hit 95%" — said 18 days ago — was a mistake. Admitting he was repeating his last company's playbook.&lt;/p&gt;

&lt;p&gt;He sat in front of the screen, staring at that monitoring dashboard for a long time.&lt;/p&gt;

&lt;p&gt;Something else was running through his head. The four sample ID prefixes in OmniDx's white paper — he hadn't touched that thread since noticing it the week before. It had felt like making excuses: can't hit 95%, so let's go nitpick the competitor. But sitting there, staring at a 20-hour ETL job, he started re-examining that lead. Not because anything clicked — but because he finally admitted the obvious: &lt;strong&gt;tuning the model wasn't going to win this.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But the game has more than one way to win.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If OmniDx's benchmark really did hide a fourth data source — that prefix with no known origin — then their 96.2% was trained on those three or four sources. MediSys had to run 17. The two numbers were never comparable. He didn't need to prove OmniDx was fake. He just needed the CEO to realize: &lt;strong&gt;OmniDx tested something entirely different from what we're building.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And then a voice surfaced in his head — not his own. It was that night in the parking lot, Lena turning back before the car door closed:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Next time you put together a proposal — make sure your boss knows what you're doing out there."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He hadn't said anything back then. But now he understood what she meant. If this went sideways again, would VP Morgan turn into another Sarah — losing trust in him before he even realized it was gone?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;He closed the model training terminal. First time in 18 days.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Then he opened the pipeline config.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;He made a decision — and told no one.&lt;/strong&gt; Alone, in the middle of the night, he rewired the single-threaded ETL into a multi-threaded parallel pipeline. He didn't notify VP Morgan. Didn't tell the team. He still wanted to maintain the illusion that "95% is still the target."&lt;/p&gt;




&lt;h2&gt;
  
  
  The Supply Line
&lt;/h2&gt;

&lt;p&gt;3 a.m. to 10 a.m. Derek sat at MediSys's test server and split the ETL pipeline from one thread into 8 parallel workers, wrote a lightweight queue manager to distribute data sources. It blew up twice — first a deadlock, then a memory overflow — he walked to the hallway, drank a glass of cold water, came back, added a timeout mechanism, and kept going.&lt;/p&gt;

&lt;p&gt;He didn't touch a single line of model code.&lt;/p&gt;

&lt;p&gt;By 10 a.m., the ETL job finished in 4 hours what had been taking 20. Pipeline restored.&lt;/p&gt;

&lt;p&gt;He re-ran the model — accuracy jumped from 92.7% straight to 94.1%. Not because the model improved. &lt;strong&gt;Because the data was fresh.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;He didn't keep storming the city. He fixed the supply line. And the moment the supply line was fixed, the city fell on its own.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But he knew this fix was fragile — 8 parallel workers with no idempotency checks, a queue manager with no exception handling. He didn't know if it would hold tomorrow. He was just betting it wouldn't blow up before investor day.&lt;/p&gt;

&lt;p&gt;The irony: if he'd fixed the pipeline on day one, 18 days would have been enough to push accuracy past 95%. He burned 16 days storming the walls and only remembered to check the supply line with two days left.&lt;/p&gt;

&lt;p&gt;That night, he ran through the investor day deck one last time in the conference room. Closed his laptop. Leaned back.&lt;/p&gt;

&lt;p&gt;94.1%. Tomorrow was investor day. Not pretty, but close enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fixing the pipeline only kept him alive. The real Besiege Wei to Rescue Zhao happened the next day.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  "Was It Like This Last Time?"
&lt;/h2&gt;

&lt;p&gt;Investor day. Derek presented 94.1% accuracy — not 95%, but the data was live, reproducible, verifiable. The CEO watched the numbers pulse on screen and said exactly three words: &lt;strong&gt;"Within a point — close enough."&lt;/strong&gt; Then flipped to the next slide. It was the biggest compliment Derek had ever heard from him.&lt;/p&gt;

&lt;p&gt;Something else happened that Derek hadn't planned. That morning, he'd sent Morgan a comparison note on OmniDx's white paper — brief, no conclusions, just the sample ID prefix alignment across the three claimed data sources. Morgan slipped it into the investor day briefing — not as an attack, just one page under "competitive landscape analysis."&lt;/p&gt;

&lt;p&gt;That night, Derek saw a push notification. News traveled faster than he'd expected — that one page in Morgan's briefing had been photographed by a shared investor and landed in OmniDx's internal Slack before lunch.&lt;/p&gt;

&lt;p&gt;OmniDx's CTO posted on LinkedIn, tone restrained but prominent: "Our validation methodology has been independently audited. The 96.2% accuracy is based on a publicly reproducible evaluation framework. We'd suggest colleagues check their own datasets before commenting on others'."&lt;/p&gt;

&lt;p&gt;Derek read it twice. After that, nobody compared 96.2% to 94.1% again.&lt;/p&gt;

&lt;p&gt;VP Morgan found him after the session: "Was it like this last time?"&lt;/p&gt;

&lt;p&gt;Derek froze for a second. Morgan's tone wasn't an interrogation, and it wasn't concern — it was confirmation. &lt;strong&gt;She was confirming what she'd suspected for three months — that Derek had done it again: promised a number he couldn't hit, fixed the pipeline in secret, dug up dirt on the competitor, and left her, the VP, the last to know.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"More or less," he said.&lt;/p&gt;

&lt;p&gt;Morgan nodded. Didn't press. &lt;strong&gt;Sometimes the most unsettling thing isn't being questioned — it's realizing the other person already has their answer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;She never asked again.&lt;/p&gt;

&lt;p&gt;A week later, two of the three hospitals pushed the FDA audit contract to final-stage negotiation. OmniDx stopped appearing on bid lists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Besiege Wei to Rescue Zhao isn't about defeating Wei. It's about making Wei feel like it has to defend itself.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Third Cup
&lt;/h2&gt;

&lt;p&gt;That evening, Derek walked out of the MediSys building alone. Heading toward the metro platform going west, he turned at the intersection — not lost, just not ready to go home. He needed a road long enough to walk for ten minutes and process the day. He didn't take his usual route. Took an alley he'd never walked before. Deeper and deeper. &lt;strong&gt;Something was pulling him in that direction — though he didn't know what.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Then he stopped under a warm amber sign —&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Third Cup" cafe.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He didn't walk in. He stood at the door for two seconds, looking through the glass. Inside, there was only one person — a figure in a dark coat, seated at a corner table, a few sheets of paper spread out in front of them. Not a menu.&lt;/p&gt;

&lt;p&gt;The person looked up. Met Derek's eyes.&lt;/p&gt;

&lt;p&gt;Those eyes didn't look away. Didn't nod. Just watched.&lt;/p&gt;

&lt;p&gt;Derek walked on.&lt;/p&gt;

&lt;p&gt;A few steps later, it hit him: one of those pages on the table — the corner that was flipped open — had the MediSys logo on it. Deep blue. He knew it. Below it was an architecture diagram he didn't recognize but could guess.&lt;/p&gt;

&lt;p&gt;He stopped. Looked back.&lt;/p&gt;

&lt;p&gt;Through the glass, the seat was empty.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Derek's first instinct was to push the door open and walk in. But his feet wouldn't move. A Director-level architecture diagram on a stranger's table — he should be taking photos, chasing leads, calling Legal. He didn't. Because Morgan had just looked at him and asked, "Was it like this last time?" — and if tonight she heard he'd been standing outside a cafe while someone inside had MediSys architecture docs, she wasn't going to call it a coincidence.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;He stood there thinking for a long time. Long enough that if the person was still inside, they'd had time to walk out the back and be three blocks away.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He didn't chase. Maybe he'd look into it tomorrow. Maybe not. Tonight, he needed sleep — not another lead to follow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's &lt;a href="https://en.wikipedia.org/wiki/Thirty-Six_Stratagems#Besiege_Wei_to_rescue_Zhao" rel="noopener noreferrer"&gt;Besiege Wei to Rescue Zhao&lt;/a&gt; — when you can't beat their accuracy, you attack their benchmark.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 AI Post-Mortem Analysis
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[36 Stratagems Tactical Engine v3.1] Loaded
[Matching Target] Besiege Wei to Rescue Zhao
[Analysis Mode] Full-Table Situational Scan
━━━━━━━━━━━━━━━━━━━━
Tactical Match: 87.3%
Agent: Derek Shaw
Action: Abandoned head-to-head model accuracy competition.
        Exposed structural flaws in competitor's benchmark
        while rebuilding internal data pipeline.
Objective: AI-assisted diagnostic validation milestone
Outcome: 94.1% (target 95%) — competitive pressure neutralized

Parallel Path Analysis:
  - Pipeline path: Intervention on Day 17, 16 days late.
    Data refresh restored → accuracy jumped 92.7% → 94.1%.
  - Competitor path: Four-prefix anomaly detected Week 2.
    Comparative analysis deployed Day 18 → OmniDx retreated to LinkedIn defense.

Key Data Points:
  - OmniDx eval set: 3 data sources vs. MediSys 17.
    96.2% wasn't inflated — it was greenhouse-grown.
  - ETL pipeline: single-threaded 20h → 8-parallel 4h.
    Zero model code changed. Accuracy +1.4%.

Verdict: Tactically sound, split in execution.
Strategic judgment (competitor) and tactical execution (pipeline)
lagged behind each other by 16 days — two sides of the same person.
A smart person's greatest enemy isn't the competition.
It's their own ego.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;em&gt;Next stratagem: &lt;a href="https://en.wikipedia.org/wiki/Thirty-Six_Stratagems#Kill_with_a_borrowed_knife" rel="noopener noreferrer"&gt;Kill with a Borrowed Knife&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P.S. English isn't my first language. I use AI to polish the writing and help with storycraft. Thanks for reading. &lt;a href="https://ko-fi.com/xulingfeng" rel="noopener noreferrer"&gt;☕ Buy me a coffee&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>career</category>
      <category>programming</category>
    </item>
    <item>
      <title>VP of Nothing: The CEO's Nephew Took Over My AI Platform. The Client Walked Within a Month.</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Sun, 28 Jun 2026 04:43:40 +0000</pubDate>
      <link>https://dev.to/xulingfeng/vp-of-nothing-the-ceos-nephew-took-over-my-ai-platform-the-client-walked-within-a-month-5dla</link>
      <guid>https://dev.to/xulingfeng/vp-of-nothing-the-ceos-nephew-took-over-my-ai-platform-the-client-walked-within-a-month-5dla</guid>
      <description>&lt;p&gt;&lt;strong&gt;Series:&lt;/strong&gt; AI, Ego &amp;amp; Regret — Bonus Chapter&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Editor's Note:&lt;/strong&gt; While compiling the old series for the book, I found this draft in the archives. It didn't fit the original lineup, but the story was too good to leave buried on a hard drive. I polished it up and decided to release it as a bonus chapter for the series. This is a work of fiction.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Act I · Dead Weight
&lt;/h2&gt;

&lt;p&gt;It took me six years to turn NovaTech's AI diagnostic platform from a Python script on a single server into a production system processing four million diagnostic requests a day. Every sensor pushed data upstream. The model didn't fire on every reading, but during peak hours it handled over a hundred per second. Our clients included three Fortune 500 manufacturing giants — the biggest of which was Merit Manufacturing — plus a medical device brand you've definitely used.&lt;/p&gt;

&lt;p&gt;Six years ago when I joined, the company had eighteen people and the CTO still wrote code. CEO Ryan Whitfield pounded the table at all-hands meetings and said, "We're building something that matters."&lt;/p&gt;

&lt;p&gt;Three years in, we moved into a Sunnyvale office with a courtyard. Ryan drank half a bottle of whiskey at the housewarming party, put his arm around me, and told an investor, "Tom's the reason this ship stays afloat."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three years after that, the ship told me I was dead weight.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The layoff came on a Wednesday at 11 AM. All-hands video call. Ryan sat in front of his dining room chandelier — the one that cost thirty grand — and explained that the company was going through a "strategic restructuring" to "streamline operations." Slide three had my name and the entire data platform team's — all twelve of us — sitting neatly inside a gray box.&lt;/p&gt;

&lt;p&gt;"The affected roles are being eliminated to reduce redundancy."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redundancy. I'd led this team for six years. Years of zero-incident operations. 99.97% recall rate. Redundancy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;HR's separation package landed in my inbox. Standard severance, an NDA, a one-year non-compete. I couldn't even work at the coffee shop downstairs, let alone a competitor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I noticed something when I signed. Buried in my non-compete was an exception clause: "Family members of executive leadership are exempt from non-disclosure obligations regarding internal restructuring communications." It was tucked into a footnote on page fourteen in type so small you'd miss it if you blinked.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn't think much of it. I signed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act II · First Blood
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Two weeks later, LinkedIn filled in the blanks.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kevin Whitfield's new profile picture showed up in my feed. Westport University polo shirt, standing in front of the engineering school sign, teeth so white they looked AI-generated. His headline: "VP of Engineering at NovaTech | Builder | Thinker | Disruptor."&lt;/p&gt;

&lt;p&gt;I stared at that title for a solid twenty seconds. VP of Engineering. A guy who was writing course projects three months ago, holding the same title as my old boss.&lt;/p&gt;

&lt;p&gt;His first company-wide email went out the next afternoon — sent to all the inboxes I no longer had access to. Dmitri — one of the old-timers on the team — screenshotted it and sent it to me.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"I'm excited to announce that NovaTech is entering a new chapter. Our current platform architecture, while functional, carries significant technical debt. I'll be leading a modernization initiative to bring our stack into the next generation. Expect faster iterations, leaner deployments, and a more aggressive roadmap."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Technical debt. Faster iterations. Leaner deployments. Every word landed exactly on "I have no idea what this system does."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dmitri's caption under the screenshot: "He asked me today if the CI/CD pipeline was a new hire."&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Act III · The Model Swap
&lt;/h2&gt;

&lt;p&gt;Kevin's first move wasn't architectural. It was swapping the model.&lt;/p&gt;

&lt;p&gt;"Tom's model is six years old. It's good. But good isn't the standard anymore."&lt;/p&gt;

&lt;p&gt;That's what he said at his first tech all-hands. Dmitri relayed it to me in a flat voice — he was past the point of being surprised.&lt;/p&gt;

&lt;p&gt;Kevin's "modernization" meant replacing the diagnostic model I'd tuned for six years — &lt;strong&gt;99.97% recall with a 0.03% false positive rate&lt;/strong&gt; in industrial fault detection — with the newest closed-source LLM on the market. He claimed the LLM's "zero-shot reasoning capabilities" would expand coverage from 340 scenarios to "infinite."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;340 scenarios. Every single one went through three rounds of labeling, two rounds of cross-validation, and at least three hundred hours of production observation before we called it done. Kevin spent three days reading API docs and decided that was enough.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the demo, he pulled up the LLM's dashboard and told the six engineers in the room: "This thing reads the logs, understands the context, and outputs the diagnosis in natural language. No more manual rule tuning. No more feature engineering. &lt;strong&gt;No more bottlenecks.&lt;/strong&gt; "&lt;/p&gt;

&lt;p&gt;When he said "no more bottlenecks," he looked at Dmitri. Dmitri was our performance engineer. Kevin probably didn't know his name, but he knew that role belonged to "the previous era."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dmitri didn't push back in the meeting. He messaged me afterward: "He said the LLM benchmarked against our logs. I checked his test set. Textbook fault patterns. Not a single one from a real production line."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn't reply. Because Kevin knew too — he picked a clean test set because clean test sets give clean results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act IV · Cascade Failure
&lt;/h2&gt;

&lt;p&gt;Counting from Kevin's first day, by day twelve the swap was done. Kevin oversaw the engineering team himself, cutting over the old inference pipeline to the LLM's API. The switch was smooth. Every endpoint returned 200. Every green light on the dashboard stayed on.&lt;/p&gt;

&lt;p&gt;Day fourteen, Merit Manufacturing's production line reported a fault.&lt;/p&gt;

&lt;p&gt;Nothing major — a sensor drift on a conveyor belt. The old model had handled this exact scenario over a thousand times. Diagnosis: "Sensor drift detected, recalibration window: 72 hours." Confidence score attached. Maintenance ticket generated. Total time: 170 milliseconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The new model read 170 pages of production logs, spent 4.3 seconds, and produced a fluent paragraph:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"The anomaly pattern suggests a potential degradation in the conveyor belt alignment mechanism, possibly due to cumulative thermal stress on the drive unit bearings. Recommend immediate full inspection of the drive train assembly."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Translation: "I have no idea what this is, but I can write something that sounds smart."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The maintenance crew stopped the line. Pulled the drive unit. Tore down the bearings. Four hours later, they found nothing. The supervisor wrote in the work order: "False alarm. No fault found."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That was the first one.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Day sixteen, the same LLM flagged a normally running compressor as "imminent bearing failure." Another teardown. Another four hours. Nothing. The supervisor changed NovaTech's diagnostic feed status to "high noise."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day eighteen, it missed a real fault.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sensor readings clearly showed a hydraulic seal accelerating toward failure. The old model's rule engine would have triggered a "failure probability: 87% within 7 days" alert. The LLM read the same data and wrote: "The observed fluctuation patterns are within normal operational variance. No immediate action required."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thirty-two hours later, the seal blew. The line was down for nine hours. Each hour of downtime cost $130,000.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dmitri sent me the numbers — screenshots from the incident report, timeline and cost breakdown included. His caption: "Your rule engine caught faults for six years. Twenty days after they swapped it, it blew up in front of a client." I didn't reply. But I read every line.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Act V · The Ultimatum
&lt;/h2&gt;

&lt;p&gt;Merit Manufacturing's incident report didn't name NovaTech — &lt;strong&gt;Diana did them that favor&lt;/strong&gt;. But the customer success VP forwarded an email to the internal group chat, and Dmitri screenshotted it to me. Five sentences from Diana to Ryan, CC'd to four people: her CFO, her legal VP, NovaTech's customer success VP, and a legal department inbox.&lt;/p&gt;

&lt;p&gt;The third sentence: &lt;strong&gt;"We require the diagnostic model deployed prior to June to be restored within fourteen calendar days. If that is not possible, we will exercise our right under section 6.2 to terminate the agreement."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Translation: &lt;strong&gt;"Put Tom's model back. If you can't, we walk."&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Act VI · The Call
&lt;/h2&gt;

&lt;p&gt;Three days after that email, my phone buzzed at 11 PM. No area code in the caller ID, but I knew who it was.&lt;/p&gt;

&lt;p&gt;"Tom."&lt;/p&gt;

&lt;p&gt;"Diana."&lt;/p&gt;

&lt;p&gt;"Everything I'm about to say, my legal team would tell me not to say it."&lt;/p&gt;

&lt;p&gt;She paused. I heard keyboard clicks on the other end — she might have been closing windows, making room for what came next.&lt;/p&gt;

&lt;p&gt;"Merit Manufacturing is terminating the agreement with NovaTech. Section 6.2. Fourteen-day transition period."&lt;/p&gt;

&lt;p&gt;I didn't say anything.&lt;/p&gt;

&lt;p&gt;"We need someone to lead the transition. Not someone from NovaTech. &lt;strong&gt;You know this system better than any engineer still on their payroll.&lt;/strong&gt; Your non-compete has two holes in it. I had my lawyers check —" she paused, like she was making sure she hadn't crossed a line. "The family-member exception was posted on an anonymous forum. Same firm that drafted your contract — anyone who reads it can see what it means. And the non-compete itself has an out: client-initiated separations don't trigger it. Also —" another pause. "Kevin's reply was effectively a written refusal. We don't need to wait the full fourteen days."&lt;/p&gt;

&lt;p&gt;I leaned against the kitchen counter. My wife was reading to our daughter in the living room, voice soft, one word at a time. I listened to those syllables, holding the phone, feeling the world go quiet for two beats.&lt;/p&gt;

&lt;p&gt;"Tom?"&lt;/p&gt;

&lt;p&gt;"Send me the details," I said. My voice was calmer than I expected.&lt;/p&gt;

&lt;p&gt;Before she hung up, I asked: "Diana — did you go to NovaTech first?"&lt;/p&gt;

&lt;p&gt;A long silence. Then: &lt;strong&gt;"I did, Tom. Kevin told me the LLM is 'objectively superior' and suggested I wait for his explainability report. So I'm done waiting."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;She was waiting for Kevin to realize he was wrong. He never did.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Diana sent me the explainability report right after the call. Six pages of PDF. Every section tried to prove the two false alarms and the missed fault were "edge cases," "out-of-distribution inputs," "not representative of the model's true performance." There was a comparison table in the appendix showing 98.7% accuracy on his clean test set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;He told a complete lie using a dataset that never lied.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Act VII · The Exception
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The NDA's hidden exception clause was dug up by a former colleague and posted on an anonymous workplace forum. The thread title: "\"NovaTech's restructuring was literally designed for nepotism.\" Over four hundred replies.&lt;/strong&gt;"&lt;/p&gt;

&lt;p&gt;One comment stuck with me. An anonymous account claiming to be a former NovaTech employee wrote: &lt;em&gt;"Ryan Whitfield once told me culture fit was the most important hiring criterion. Turns out what he meant was 'shares my last name.'"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I screenshotted it. Didn't save it. Then I closed the browser and opened the architecture docs for Merit Manufacturing's new project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A system isn't yours just because you didn't build it — that was Kevin's mistake. A system is yours forever because you did build it — that was Tom's fate.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;AI, Ego &amp;amp; Regret — Bonus Chapter. This is a work of fiction. Any resemblance to real events or persons is coincidental.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;📖 Stratagem 2 of the &lt;a href="https://dev.to/xulingfeng/series/41202"&gt;36 Stratagems series&lt;/a&gt; — coming soon.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P.S. English isn't my first language. I use AI to polish the writing and smooth out the rough edges. Thanks for reading. &lt;a href="https://ko-fi.com/xulingfeng" rel="noopener noreferrer"&gt;☕ Buy me a coffee&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>career</category>
      <category>programming</category>
    </item>
    <item>
      <title>Stratagems #1: Mark Johnson Walked Into an AI Audit. The Benchmark Had Everything Figured Out — Except the Truth.</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Wed, 24 Jun 2026 15:58:57 +0000</pubDate>
      <link>https://dev.to/xulingfeng/stratagems-1-mark-johnson-walked-into-an-ai-audit-the-benchmark-had-everything-figured-out--adh</link>
      <guid>https://dev.to/xulingfeng/stratagems-1-mark-johnson-walked-into-an-ai-audit-the-benchmark-had-everything-figured-out--adh</guid>
      <description>&lt;p&gt;&lt;em&gt;Complete preparation breeds complacency. What is seen every day no longer raises suspicion. The hidden lies within the open — not opposed to it.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;— The 36 Stratagems, "&lt;a href="https://en.wikipedia.org/wiki/Thirty-Six_Stratagems#Deceive_the_heavens_to_cross_the_sea" rel="noopener noreferrer"&gt;Deceive the Heavens to Cross the Sea&lt;/a&gt;"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The phone rang at 3:50 AM.&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/my-company-packaged-12-years-of-my-experience-into-an-ai-skill-then-laid-me-off-when-it-crashed-4b3e"&gt;12 Years as a Prompt&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The company extracted his 12 years of infrastructure expertise into an AI Skill. 96.8% diagnostic accuracy. The CEO sent a company-wide email: &lt;strong&gt;"Twelve years of experience, now available as a Prompt."&lt;/strong&gt; Then they laid Mark off.&lt;/p&gt;

&lt;p&gt;Six weeks later, the company migrated from RabbitMQ to Kafka. Nobody re-ran the validation. The AI Skill executed its old 450ms retry logic — &lt;strong&gt;dead right for RabbitMQ, catastrophic for Kafka.&lt;/strong&gt; At 4:12 AM, the CTO called. He was offering &lt;strong&gt;five times Mark's old monthly salary&lt;/strong&gt; for a two-week onsite contract.&lt;/p&gt;

&lt;p&gt;Those two weeks ended. Mark didn't stop. Over the next two years, he took eleven contracts. The industry whisper network gave him a nickname: the one-man audit team. He got the calls nobody else got — companies burned by AI platforms, too embarrassed to say it out loud. Every contract made him more certain of one thing: &lt;strong&gt;nobody rushing AI into production had actually understood their own systems.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Then a message landed in his LinkedIn inbox. A partner at a VC firm.&lt;/p&gt;


&lt;h2&gt;
  
  
  $18M, Not $18K
&lt;/h2&gt;

&lt;p&gt;"The company is called Pulse AI." The partner's voice had that fast-clip rhythm investors always have. "AI testing platform. Series B. Legal DD cleared. Technical DD — we need someone who actually knows what they're looking at."&lt;/p&gt;

&lt;p&gt;"What are you suspicious of?"&lt;/p&gt;

&lt;p&gt;"Not suspicious of anything. The CEO's reports are gorgeous — papers published, benchmarks run, claiming 89% production defect detection. &lt;strong&gt;But we're writing a check for $18 million, not $18K.&lt;/strong&gt; Go take a look."&lt;/p&gt;

&lt;p&gt;Before hanging up, he added: "Their CTO is someone named Torres. Ring a bell?"&lt;/p&gt;

&lt;p&gt;Mark didn't answer. He hung up, then opened a browser tab he'd bookmarked two months ago — a technical architecture post by Pulse AI's CTO. He scrolled to the data pipeline topology diagram, zoomed in, and read the component naming convention: &lt;code&gt;/pulse/ingestion/{env}/{source}&lt;/code&gt;. &lt;strong&gt;Slash-delimited hierarchy, environment variable in the middle, source as the tail — the exact same naming convention he'd seen in the AI Skill pipeline docs at his old company before they let him go.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not a coincidence.&lt;/p&gt;

&lt;p&gt;He replied to the partner: &lt;strong&gt;"I'm in. Two weeks."&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The Same Sticker
&lt;/h2&gt;

&lt;p&gt;Technical DD kicked off Tuesday morning.&lt;/p&gt;

&lt;p&gt;Pulse AI gave them a conference room on the fourth floor — no windows, one monitor, company values poster on the wall. CTO Torres was early forties, navy polo shirt, handshake like a vise. Mark had seen him before — four years ago, at an industry conference. Front row. Asked three questions after the talk. Mark noticed the blue sticker on Torres's laptop — identical to the IT asset tags the ops team at his old company all used. Mark set his backpack by his feet, pulled out a worn black travel mug from the side pocket, and placed it on the table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Same mug.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mark didn't say anything about knowing him. Mentally, he added the thread to his notes.&lt;/p&gt;

&lt;p&gt;Torres demoed Pulse Benchmark. Pipeline-style — code commit triggers tests, test results flow into the evaluation set, the evaluation set gets matched against the defect database, and a detection rate pops out. Clean. Closed-loop. Automated. Mark nodded along, but his brain was somewhere else: &lt;strong&gt;the sticker on Torres's laptop — palm rest, bottom right, 3 o'clock position — exact same placement as the one he'd seen four years ago.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not a coincidence anymore.&lt;/p&gt;


&lt;h2&gt;
  
  
  Wait for the Crack
&lt;/h2&gt;

&lt;p&gt;Mark asked the question everyone should have asked: "Can I see the evaluation set data?"&lt;/p&gt;

&lt;p&gt;"Sure." Torres answered fast. "I'll have someone prep it."&lt;/p&gt;

&lt;p&gt;Then he waited. All day. Wednesday morning Torres emailed: "Data prep in progress, this afternoon." Afternoon became evening. Mark didn't push — &lt;strong&gt;he wasn't waiting for the data. He was waiting for the cracks that show up when someone's scrambling to clean things up.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Wednesday, 3 PM. The evaluation set arrived. 1,247 defect samples. Each one a JSON — source code snippet, stack trace, error severity classification, reproduction steps. At a distance, it looked gorgeous. Clean enough to frame.&lt;/p&gt;

&lt;p&gt;Every JSON had an extra line in the metadata: &lt;em&gt;processed_by: Apex-Lens-Cleaner v1.0.0&lt;/em&gt;. Cleaner tools in data prep pipelines are normal. But Mark stared at that module name for two extra seconds. &lt;strong&gt;Never seen it before. Filed.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He went through them one by one. At sample 30, he stopped.&lt;/p&gt;

&lt;p&gt;A Java null pointer exception. Stack trace pointed precisely to a service-layer method, parameter was null. Reproduction steps: &lt;em&gt;"Call foo(null) to trigger."&lt;/em&gt; &lt;strong&gt;Too textbook.&lt;/strong&gt; Every NPE Mark had ever seen in production had at least seven or eight layers of stack trace — log threads, GC pauses, truncated variable values mixed in. This one had four layers. Zero noise.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Production data is dirty. This data isn't.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;He wrote a keyword search script and cross-referenced the evaluation set against three open-source defect databases. Ran it twice. Same result both times.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;44 exact matches against public defect databases — source snippets, stack trace line numbers, exception types, line for line. 54 with clear signs of handcrafted construction — code structures that were "too clean," nowhere to be found in any public repo, but the engineering fingerprint was too heavy to hide.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He color-coded the samples: red (exact match with public DB), yellow (suspected handcrafted), green (undetermined).&lt;/p&gt;

&lt;p&gt;Red plus yellow: 98. Out of 1,247. 7.9%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's enough to call it. This isn't about the percentage. It's about what the percentage means.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He closed his laptop and walked to the break room for water. Passing Torres's desk, Torres was on a call — voice low, but a single line drifted out just as Mark walked by: &lt;strong&gt;"…I don't care how. Just get the number where it needs to be before the board."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Where does he need the number to be?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Mark walked back to the conference room, water glass in hand. In his memo, he wrote a line only he would understand: &lt;em&gt;"44+54. Torres wants a number, not the truth."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;He wasn't in a hurry. A cat doesn't rush when it knows the mouse can't get away.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Questioning the PhDs
&lt;/h2&gt;

&lt;p&gt;The next morning, Mark scheduled a meeting with the evaluation team.&lt;/p&gt;

&lt;p&gt;Three people. All PhDs. All visibly proud of their work. No PM, no Torres. Mark asked one question: &lt;strong&gt;"How many rounds of edits does a defect go through, on average, from production report to sample?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The team lead hesitated. "…Two or three, usually. Mostly noise removal. Format standardization."&lt;/p&gt;

&lt;p&gt;He stopped, looked at Mark. "Is that a problem?" — with that defensive edge PhDs get when an outsider questions their domain. Mark's stomach dropped: &lt;strong&gt;this PhD wasn't even sure himself whether the provenance of those samples was clean.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Noise removal." Mark repeated the phrase in his head. &lt;strong&gt;A 48-line null pointer exception, after "noise removal," becomes 12 lines. What got deleted wasn't noise. It was the fingerprints of real production data.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He didn't call it out in the meeting. That wasn't his style.&lt;/p&gt;

&lt;p&gt;After the meeting, Mark pretended to take a wrong turn — Pulse had an open floor plan, ops team sat deepest in. Walking past, he scanned the desks — keyboard tray angle, acrylic badge holder placement — &lt;strong&gt;identical to his old company.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not a coincidence anymore. It was habit. &lt;strong&gt;You can't pick up an entire system architecture and move it. But you can pick up someone else's naming conventions.&lt;/strong&gt; Mark went back to the conference room and added a line to his notes: &lt;em&gt;Didn't build from scratch. Transplanted.&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Not a Coincidence. A Copy.
&lt;/h2&gt;

&lt;p&gt;That night, Mark did something the VC wasn't paying him to do.&lt;/p&gt;

&lt;p&gt;He reverse-engineered the data pipeline topology from Pulse Benchmark's paper. The paper claimed the evaluation set came from "real production defect reports." But where was the raw data stored? Who wrote the collection scripts? None of that information was there. But the third paragraph of the Methodology section had one sentence: &lt;em&gt;"The evaluation set was manually curated from production defect reports over a 14-month period."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Mark placed that sentence next to the AI Skill training set documentation he'd saved before he got laid off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not "similar." Verbatim.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Same people wrote it. Or the same people's documents were open on a Pulse desktop.&lt;/p&gt;

&lt;p&gt;He wrote a note. Five lines:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark evaluation set:&lt;/strong&gt; 44 exact matches against public DBs + 54 handcrafted. 7.9% hard evidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data pipeline naming:&lt;/strong&gt; &lt;code&gt;/pulse/ingestion/{env}/{source}&lt;/code&gt; — identical to old company's AI Skill pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workspace standards + asset tags:&lt;/strong&gt; Transplanted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Paper language:&lt;/strong&gt; "manually curated from production defect reports" — verbatim copy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apex-Lens-Cleaner v1.0.0:&lt;/strong&gt; Processed the evaluation set. But this name doesn't appear anywhere in Pulse's public architecture. &lt;em&gt;A module that doesn't officially exist is running.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;He didn't put those notes in the email. The VC hadn't asked for that.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He wrote the partner one paragraph:&lt;/p&gt;

&lt;p&gt;"Pulse Benchmark evaluation set: 1,247 defect samples. Minimum 44 are exact matches against public defect databases — line-for-line identical. An additional 54 show clear signs of manual fabrication. &lt;strong&gt;Combined: 98 samples, 7.9%.&lt;/strong&gt; Recommendation: re-run the Benchmark after removing all samples overlapping with public databases."&lt;/p&gt;

&lt;p&gt;He hit send. Closed the laptop. The partner didn't wait until morning. The email was forwarded to Pulse's CEO's inbox at 1 AM.&lt;/p&gt;

&lt;p&gt;Those 44 were just the tip of the iceberg — run fuzzy matching, variable renaming, control-flow equivalent transforms, and the number you'd catch would at least triple. He didn't put that in the email. &lt;strong&gt;Let your opponent wonder what else you're holding. It works better than showing all your cards.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The CEO forwarded it to Torres. The phone rang at 3:50 AM.&lt;/p&gt;


&lt;h2&gt;
  
  
  Three Seconds
&lt;/h2&gt;

&lt;p&gt;Mark glanced at the caller ID and picked up.&lt;/p&gt;

&lt;p&gt;Torres didn't speak. The silence lasted about ten seconds.&lt;/p&gt;

&lt;p&gt;Then he spoke. Quiet. No preamble. &lt;strong&gt;"The CEO wants me to get the Benchmark number to 95% before Series C. I have six months. Those 44 — my evaluation team pulled them from public databases. GitHub Issues, Stack Overflow, CVE database. The 54 — couldn't find suitable replacements, so we wrote them ourselves."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A long pause.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I'm not trying to steal VC money. I'm trying to get the number there first, raise the round, then spend a year building a real production pipeline."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;He'd rehearsed that line.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mark didn't engage. He asked a question Torres hadn't prepared for: "Who built your ingestion pipeline?"&lt;/p&gt;

&lt;p&gt;Silence on the other end. One second. Two. Three.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The length of the silence was the answer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"…How do you know about that?" Torres's voice had changed.&lt;/p&gt;

&lt;p&gt;"I won't dig into it," Mark said. "But I've seen &lt;code&gt;/pulse/ingestion/{env}/{source}&lt;/code&gt; before. Before I left my old company, the AI Skill pipeline was called &lt;code&gt;/knowledge/ingestion/{env}/{source}&lt;/code&gt;. &lt;strong&gt;You were there.&lt;/strong&gt; "&lt;/p&gt;

&lt;p&gt;Torres didn't answer.&lt;/p&gt;

&lt;p&gt;Mark waited five seconds. &lt;strong&gt;Then he hung up.&lt;/strong&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Just a Ticket
&lt;/h2&gt;

&lt;p&gt;He didn't have direct proof the two pipelines were built by the same person. But Torres hadn't denied it. &lt;strong&gt;On a 3:50 AM phone call, staring at a data pipeline name he had no business recognizing — silence is confirmation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He opened his laptop and added a new line below those five notes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Torres didn't deny it. The silence is the answer.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Still dark outside. The VC partner sent a reply — 3:51 AM. &lt;strong&gt;"Keep going."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mark shut his laptop.&lt;/p&gt;

&lt;p&gt;He didn't tell the partner the real reason he'd taken this job. It wasn't the VC's money. It was what he'd known the moment he saw that architecture post two months ago — that pipeline naming convention was a signature he'd recognize anywhere. &lt;strong&gt;Caleb's handwriting.&lt;/strong&gt; Slash hierarchy, env-in-the-middle, source-suffix — &lt;strong&gt;the mark left by the engineer who sat across from him for three months, the one who always asked "Why 450 milliseconds?" You can't teach it. And you can't forget it.&lt;/strong&gt; Caleb had vanished after the big layoffs. LinkedIn silent. GitHub frozen. Like he'd evaporated from the industry. And then his naming convention showed up in the data pipeline of a company called Pulse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The VC's DD was just a way in.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He picked up the worn black travel mug from the table — the one he'd refilled three times during the meeting. Last time he'd been walked out of an office, he had nothing in his hands. This time, he had a thread. A line he could follow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is &lt;a href="https://en.wikipedia.org/wiki/Thirty-Six_Stratagems#Deceive_the_heavens_to_cross_the_sea" rel="noopener noreferrer"&gt;deceiving the heavens to cross the sea&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;



&lt;p&gt;&lt;strong&gt;🤖 AI Post-Match Analysis&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[36 Stratagems Database v3.1] Loaded
[Match Target] Deceive the Heavens to Cross the Sea
[Analysis Mode] Full-spectrum scan
━━━━━━━━━━━━━━━━━━━━
Tactic Match: 92.3%
Subject: Mark Johnson
Action: Covert audit objective disguised as routine technical DD workflow
Target: Torres / Pulse AI
Outcome: Achieved

Counter-Detection:
  - Torres team: Concurrent application of same stratagem
  - Forged samples: 98/1,247 (7.9%) processed through Apex-Lens-Cleaner pipeline
  - Source: 44 from public DBs + 54 handcrafted

Situation Assessment:
  - Mark Johnson: 12yr experience → AI Skill → redeployed as AI Kill. Legitimate identity, offensive audit.
  - Torres team: Defective data embedded in evaluation set. Defensive concealment.

Verdict: Tactic-neutral. Legitimacy determined by deployment vector. Defensive concealment ≠ offensive exposure.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;em&gt;Next stratagem: &lt;a href="https://en.wikipedia.org/wiki/Thirty-Six_Stratagems#Besiege_Wei_to_rescue_Zhao" rel="noopener noreferrer"&gt;Besiege Wei to Rescue Zhao&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P.S. English isn't my first language. I use AI to polish the writing and help with storycraft. Thanks for reading. &lt;a href="https://ko-fi.com/xulingfeng" rel="noopener noreferrer"&gt;☕ Buy me a coffee&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>programming</category>
      <category>career</category>
    </item>
    <item>
      <title>Series Teaser — 6 People, 36 Stratagems, and an AI Rabbit Hole That Keeps Getting Deeper</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Tue, 23 Jun 2026 17:38:03 +0000</pubDate>
      <link>https://dev.to/xulingfeng/series-teaser-6-people-36-stratagems-and-an-ai-rabbit-hole-that-keeps-getting-deeper-3fnm</link>
      <guid>https://dev.to/xulingfeng/series-teaser-6-people-36-stratagems-and-an-ai-rabbit-hole-that-keeps-getting-deeper-3fnm</guid>
      <description>&lt;h2&gt;
  
  
  What Are the 36 Stratagems?
&lt;/h2&gt;

&lt;p&gt;If you've heard of &lt;a href="https://en.wikipedia.org/wiki/The_Art_of_War" rel="noopener noreferrer"&gt;&lt;em&gt;The Art of War&lt;/em&gt;&lt;/a&gt;, think of the &lt;a href="https://en.wikipedia.org/wiki/Thirty-Six_Stratagems" rel="noopener noreferrer"&gt;&lt;em&gt;Thirty-Six Stratagems&lt;/em&gt;&lt;/a&gt; as its scrappy younger cousin. Sun Tzu wrote about macro strategy — when to fight, when to retreat. The 36 Stratagems are &lt;em&gt;tactics&lt;/em&gt;. Specific moves. The kind of play you can see on the board.&lt;/p&gt;

&lt;p&gt;A few examples: &lt;em&gt;Deceive the Heavens to Cross the Sea&lt;/em&gt; — hide your real intent behind routine actions. &lt;em&gt;Besiege Wei to Rescue Zhao&lt;/em&gt; — attack where your enemy is forced to defend. &lt;em&gt;Kill with a Borrowed Knife&lt;/em&gt; — use someone else's strength to remove a threat.&lt;/p&gt;

&lt;p&gt;They were written for warfare. They work just as well in office politics, courtroom arguments — and, as it turns out, AI projects that crash in spectacular ways.&lt;/p&gt;

&lt;p&gt;Both books are on Amazon. I'm not linking them — makes me look like an affiliate marketer, and they're not paying me. 😄&lt;/p&gt;




&lt;h2&gt;
  
  
  The 6 People
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/my-company-packaged-12-years-of-my-experience-into-an-ai-skill-then-laid-me-off-when-it-crashed-4b3e"&gt;Mark Johnson&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;12 years of ops experience was packaged into an AI Skill. Then he was laid off. The CEO's memo: "12 years of experience, now available as a prompt."&lt;/p&gt;

&lt;p&gt;Months later, the AI crashed — the company migrated from RabbitMQ to Kafka, but nobody re-validated the Skill. It was still running the old RabbitMQ retry logic. At 4:12 AM, the CTO called him back at 5x his old monthly rate to fix it. He didn't stay. Signed a two-week contract, fixed it, walked out.&lt;/p&gt;

&lt;p&gt;His one-person consultancy is still running. One clause: no AI in the delivery chain.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Temperament:&lt;/em&gt; Strategist. Thinks three moves ahead. Leaves hidden failsafes in routine operations — like a note in a migration doc saying "450ms is RabbitMQ GC window only. Do not use in other environments."&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/our-cto-built-an-ai-gateway-processing-28b-it-took-me-8-months-to-prove-it-would-approve-illegal-235l"&gt;P Anonymous&lt;/a&gt;&lt;/strong&gt; (&lt;a href="https://dev.to/xulingfeng/a-company-ai-flagged-my-article-as-low-quality-i-ran-the-numbers-then-i-ran-again-1h0p"&gt;second story&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Yeah, I got to the third story before realizing I never gave them a name. So P it is.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An AI payment gateway processing $2.8B in transactions. The CTO used formal verification to prove it was "mathematically safe." P spent eight months secretly building an adversarial testing pipeline to prove the gateway would approve illegal transactions.&lt;/p&gt;

&lt;p&gt;Later, P's own article got flagged as "low quality" by an AI moderation system. P pulled the internal API, grabbed 347 flagged records — the effective accuracy was 38%. False positives outnumbered correct ones.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Temperament:&lt;/em&gt; Calm. Data-driven. If you're sitting across from P with a fabricated benchmark, you should be nervous — but P won't give you any signals until the evidence is ready to land.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/i-built-open-source-ai-our-new-cto-spent-8m-on-his-old-companys-product-and-fired-my-team-two-3jp8"&gt;Leo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open-source AI pilot ran for six months. 91.3% accuracy. Zero incidents. The new CTO spent $8M on his old company's product and fired Leo's entire team of six. On day 13, at 2:47 AM, the $8M system collapsed — six GPUs OOM, three AI agents fighting over context windows. Leo walked back to the server room, brought the old system online. 30 seconds to recovery.&lt;/p&gt;

&lt;p&gt;The CEO called at 3 AM: full team rehire, double salary, CTO position. The CEO's exact words — "Your team. Your budget. Your technical direction."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Temperament:&lt;/em&gt; High energy. Technically confident. Doesn't play politics — not because he can't. He genuinely believes good code wins in the end. (He's learning the hard way that it doesn't always.)&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/our-competitor-had-an-ai-that-covered-972-we-had-a-spreadsheet-and-a-fake-quote-guess-who-won-5cc3"&gt;Lena&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VeriTest had 15 people. Competitor QualiGuard had 200, claiming 97.2% AI coverage. Lena didn't look at the coverage number — she looked at the test logs. 3 hours 12 minutes. Zero external dependencies. The competitor was running a sandbox.&lt;/p&gt;

&lt;p&gt;She paid $5,000 to a former employee for an internal recording. The VP and the test director had budget friction. She fabricated a $1.3M fake quote, leaked it through an intermediary. The VP pressured the director to cut costs. The director swapped full regression for sampling + AI prediction. Real coverage: 86.7%. Lena won.&lt;/p&gt;

&lt;p&gt;In the parking lot, the director stopped her. "You planted that quote." Lena's reply: "I never said I'd fight clean."&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Temperament:&lt;/em&gt; Sharp. Patient. Not afraid to get her hands dirty. Find someone's pressure point, push, and wait. Or buy a recording. Or plant a fake quote. Either way, the board's in her hand.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/everyone-cheered-the-vector-database-demo-i-quietly-kept-my-sqlite-backup-running-2-am-called-it-332i"&gt;Alex&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Backend engineer. He writes things nobody asked him to write.&lt;/p&gt;

&lt;p&gt;Marcus Webb demoed a shiny vector database at the all-hands. He glanced at Alex's screen and smirked: "SQLite? For AI memory? Like putting a bicycle on an F1 track."&lt;/p&gt;

&lt;p&gt;Over the next six weeks, Alex quietly built a SQLite FTS5 fallback system. Told nobody. At 2:14 AM, the vector DB died — CUDA deadlock. Marcus said it'd take two to three days to fix. The board demo was in seven hours.&lt;/p&gt;

&lt;p&gt;Alex typed in Slack: "I have a parallel fallback running. Give me five minutes."&lt;/p&gt;

&lt;p&gt;2 minutes 43 seconds. Switched over. First query: 2 milliseconds.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Temperament:&lt;/em&gt; Quiet. Paranoid. Doesn't argue when mocked — waits until 2 AM when the system goes down, then cuts over.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/our-competitor-had-an-ai-that-covered-972-we-had-a-spreadsheet-and-a-fake-quote-guess-who-won-5cc3"&gt;Derek Shaw&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;QualiGuard's Director of Testing. He walked into the bid review certain he'd already won — 97.2% AI coverage in a sandbox, 4-day delivery, no holes. He didn't know the VP had already cut his GPU budget in half. He didn't know the other side already had the tape of him and the VP at odds.&lt;/p&gt;

&lt;p&gt;In the review, Sarah stood up and said three sentences — "The delivery timeline is based on Derek's personal technical assessment, not reviewed by management… If QualiGuard is selected, the revised proposal will be submitted under new technical leadership." Derek sat through all of it. Didn't say a word.&lt;/p&gt;

&lt;p&gt;Then he updated his LinkedIn. New title: Director of Testing, medical IT.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Temperament:&lt;/em&gt; Arrogant. Stabbed in the back by his own side in a game he thought was already won.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Framework
&lt;/h2&gt;

&lt;p&gt;Over 36 stories, these six paths cross — some face off, some work together, a few are being moved by forces no one can see yet. And sometimes, they're just solving their own problems, while the reader sees it all before any of them do.&lt;/p&gt;

&lt;p&gt;See you in the first story of &lt;strong&gt;AI, Ego &amp;amp; the 36 Stratagems&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P.S. English isn't my first language. I use AI to polish the writing and help with storycraft. Thanks for reading. &lt;a href="https://ko-fi.com/xulingfeng" rel="noopener noreferrer"&gt;☕ Buy me a coffee&lt;/a&gt;&lt;/em&gt; &lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>career</category>
      <category>programming</category>
    </item>
    <item>
      <title>15 AI Stories Later, Some Honest Words</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Sun, 21 Jun 2026 13:21:56 +0000</pubDate>
      <link>https://dev.to/xulingfeng/15-ai-stories-later-some-honest-words-o9j</link>
      <guid>https://dev.to/xulingfeng/15-ai-stories-later-some-honest-words-o9j</guid>
      <description>&lt;p&gt;&lt;strong&gt;May 29&lt;/strong&gt; I wrote my first AI trainwreck story. &lt;strong&gt;June 18&lt;/strong&gt; I finished #15.&lt;/p&gt;

&lt;p&gt;People keep asking if this was some kind of "writing experiment" — it wasn't. I'm not that poetic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The truth is, nobody was reading what I wrote before.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Scroll back to my early posts: AI agent setup notes, QA automation on a budget, a couple of technical deep-dives. Zero to one reaction each. Comments section was a ghost town. I took a real look around Dev.to — the front page is packed with tutorials and technical walkthroughs, but story-driven content? Barely any.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The old road was a dead end. Time to find a new one.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The results: worst-performing story got 7 reactions, best one got 86. Across 15 stories: 445 reactions, 251 comments.&lt;/p&gt;

&lt;p&gt;Wrong direction, corrected.&lt;/p&gt;

&lt;p&gt;The series wasn't planned from the start. I just started linking to my other articles at the end of each post so people could hop between them. Then I found out Dev.to has a series feature, and the name followed.&lt;/p&gt;

&lt;p&gt;I'm a fan of &lt;em&gt;Love, Death &amp;amp; Robots&lt;/em&gt;, so the structure came pretty naturally — &lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/series/40565"&gt;AI, Ego &amp;amp; Regret&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Around story #8 or #9, a thought snuck in: &lt;em&gt;could this actually become a book?&lt;/em&gt; That's when I set a target of 15 stories.&lt;/p&gt;

&lt;p&gt;Wasn't sure I'd make it. Decided to write first, figure it out later.&lt;/p&gt;

&lt;p&gt;I also realized the covers needed to be consistent. So I went back and redesigned all 15 covers with the same chessboard background and layout. Now it actually looks like a series.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Favorite One Did the Worst
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/our-cto-built-an-ai-gateway-processing-28b-it-took-me-8-months-to-prove-it-would-approve-illegal-235l"&gt;#6&lt;/a&gt;&lt;/strong&gt; — the CTO's 2.8B gateway story. Both sides were smart, the back-and-forth was real, no dumbed-down villain. This was the one I was most proud of. 9 reactions, 2 comments.&lt;/p&gt;

&lt;p&gt;Some stories I thought were just "okay" did way better. &lt;strong&gt;Couldn't help but laugh at that.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The biggest contrast? &lt;a href="https://dev.to/xulingfeng/my-company-packaged-12-years-of-my-experience-into-an-ai-skill-then-laid-me-off-when-it-crashed-4b3e"&gt;#10&lt;/a&gt;&lt;/strong&gt; — the AI Skill story. It's currently my highest-viewed article, and I still can't quite believe it took off the way it did.&lt;/p&gt;

&lt;p&gt;Looking back, I think it hit a nerve that goes deeper than tech: &lt;strong&gt;"Your experience gets packaged into a Skill, then you're not needed anymore."&lt;/strong&gt; That's the sword hanging over everyone's head right now. Not just QA — every engineer.&lt;/p&gt;

&lt;p&gt;And &lt;a href="https://dev.to/xulingfeng/our-cto-built-an-ai-gateway-processing-28b-it-took-me-8-months-to-prove-it-would-approve-illegal-235l"&gt;#6&lt;/a&gt; — good story, but not scary. It didn't sting. &lt;strong&gt;Readers don't want "both sides played well" — they want the villain publicly humiliated.&lt;/strong&gt; That rarely happens in real life. Adults save face for each other. But in a story, readers want someone to do what they can't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Numbers in Headlines
&lt;/h2&gt;

&lt;p&gt;I'll be honest — I don't want to be a clickbait writer. But if the headline doesn't grab you, it doesn't matter how good the article is.&lt;/p&gt;

&lt;p&gt;The early stories had a clear pattern: &lt;strong&gt;big dollar amounts.&lt;/strong&gt; $500K, $660K, $2.8B, $1.4M, $470K. They worked. Articles with numbers in the title clearly got more clicks.&lt;/p&gt;

&lt;p&gt;But the later stories showed me something interesting. &lt;a href="https://dev.to/xulingfeng/my-company-packaged-12-years-of-my-experience-into-an-ai-skill-then-laid-me-off-when-it-crashed-4b3e"&gt;#10&lt;/a&gt; had no dollar amount in the title — and it's my most-viewed article. &lt;a href="https://dev.to/xulingfeng/a-company-ai-flagged-my-article-as-low-quality-i-ran-the-numbers-then-i-ran-again-1h0p"&gt;#14&lt;/a&gt; — no number, 55 reactions. #11 — no number, 26 reactions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A number gets them in the door. What keeps them reading is something else entirely.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The One That Was Different
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/a-company-ai-flagged-my-article-as-low-quality-i-ran-the-numbers-then-i-ran-again-1h0p"&gt;#14&lt;/a&gt;&lt;/strong&gt; — the one about AI flagging my article — was the most unusual one in the series.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not because of the numbers. Because the article itself solved a problem.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the time, my account had been flagged by the community. New articles were only visible to my followers — they wouldn't show up on the feed.&lt;/p&gt;

&lt;p&gt;What pissed me off wasn't the flag itself. I'd been using AI tools, and I'd already gone through all my articles adding AI disclosure where needed. &lt;strong&gt;But nobody told me my account was flagged.&lt;/strong&gt; I published an article I was excited about, and half a day later only a few dozen people had seen it. I had this sinking feeling something was wrong.&lt;/p&gt;

&lt;p&gt;I hopped into a thread by &lt;strong&gt;Dannwaneri&lt;/strong&gt; (&lt;a href="https://dev.to/dannwaneri"&gt;@dannwaneri&lt;/a&gt;) — his post about getting flagged by Sloan was one of the earliest discussions about AI detection on the platform. While we were talking, a community moderator, &lt;strong&gt;Francis&lt;/strong&gt; (&lt;a href="https://dev.to/francistrdev"&gt;@francistrdev&lt;/a&gt;), chimed in and told me: yeah, your account's flagged.&lt;/p&gt;

&lt;p&gt;The irony? I'd already published the article about AI flagging before that conversation. It wasn't on the feed either.&lt;/p&gt;

&lt;p&gt;I wrote that article while my account was still restricted.&lt;/p&gt;

&lt;p&gt;Later I went to Francis's Q&amp;amp;A thread and asked three things — how many times does Sloan flag before action, do you get notified, and what do I do when my articles can't be found? He answered every single one. &lt;strong&gt;Not an earth-shattering change, but someone actually listened.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sometimes an article isn't just an article. It's a wrench in the toolbox.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  30 Passes
&lt;/h2&gt;

&lt;p&gt;15 stories. Each one went through at least 30 rounds of revision before I'd hit publish.&lt;/p&gt;

&lt;p&gt;Logic had to hold up. Numbers had to be precise. No contradictions between chapters. &lt;/p&gt;

&lt;p&gt;If it looks fake, I don't publish it. I owe that much to myself and to anyone who reads it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Not a Single Cup of Coffee
&lt;/h2&gt;

&lt;p&gt;Every single article has a "buy me a coffee" link. All 15 of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero dollars. Not one.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I don't know if the link's too hidden or if trainwreck-story writers just aren't the type people buy coffee for 😂&lt;/p&gt;

&lt;p&gt;But here's the thing: &lt;strong&gt;I don't actually care.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The views, the comments, the reactions — that's what actually makes my day. Someone writing "this was great" in the comments. A DM saying a particular paragraph hit close to home. That's worth more than coffee money.&lt;/p&gt;

&lt;p&gt;The link's still there, though. Just in case.&lt;/p&gt;




&lt;h2&gt;
  
  
  So What Was I Even Trying to Say?
&lt;/h2&gt;

&lt;p&gt;15 stories in, I sat back and asked myself — what was the point?&lt;/p&gt;

&lt;p&gt;Was it for the attention? Sure, at first. Nobody wants to write into a void.&lt;/p&gt;

&lt;p&gt;Was it for the coffee money? 15 articles, zero dollars — that math doesn't work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Or is it because my own job is getting replaced by AI?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I've been in QA for 15 years. AI testing tools, automation coverage, smart diagnostics — the trainwrecks I wrote about, some I've seen firsthand, some I worry I'll run into soon.&lt;/p&gt;

&lt;p&gt;Maybe I was writing these stories to amplify that anxiety. Not just for others, but for myself. A reminder: systems crash, people deflect blame, the 3 AM phone always rings — &lt;strong&gt;but the person who picks up that call is worth more than the one who only knows how to run a report.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every trainwreck story was me telling myself: &lt;strong&gt;Your experience cannot be packaged into a Skill.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  One Comment I Read 15 Times
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Brilliant - I love reading a "David vs Goliath"/"underdog" kind of story like this! Best thing - he didn't panic, and he didn't try to get his "revenge" either (well the revenge was there of course, but purely through his results) ... epic story!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;— &lt;strong&gt;leob&lt;/strong&gt;, on the very first story of the series.&lt;a class="mentioned-user" href="https://dev.to/leob"&gt;@leob&lt;/a&gt;, if you're reading this — thank you.&lt;/p&gt;

&lt;p&gt;That's not empty politeness. &lt;strong&gt;Before writing every new story, I go back and re-read this comment.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not because it was the first nice thing someone said. Because it nails what I was trying to do with every single one of these stories — &lt;strong&gt;the protagonist didn't panic, didn't fight back with words, just let the results speak.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All 15 stories, same skeleton, different skin. When I lose the feel for it, I read this comment and know I'm not off track.&lt;/p&gt;

&lt;p&gt;Then there's &lt;strong&gt;Daniel Balcarek&lt;/strong&gt; (&lt;a href="https://dev.to/gramli"&gt;@gramli&lt;/a&gt;), a backend engineer from Czech, who left this on &lt;a href="https://dev.to/xulingfeng/the-ai-test-report-said-973-coverage-the-clients-lead-engineer-asked-one-question-the-room-1cpi"&gt;#2&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;I really like your writing style. I just read the second story and couldn't stop reading. 😅 Great job!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I've gone back to that one too. "Couldn't stop reading" — three words, and that's all I needed. &lt;strong&gt;The best thing a storyteller can hear isn't "your data checks out" or "good technical approach." It's "I couldn't stop."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are more. &lt;strong&gt;&lt;a href="https://dev.to/itskondrat"&gt;@itskondrat&lt;/a&gt; (Mykola Kondratiuk)&lt;/strong&gt; on &lt;a href="https://dev.to/xulingfeng/my-company-bought-a-660k-ai-platform-i-was-replaced-on-friday-at-258-am-it-fixed-everything-3kc4"&gt;#3&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;what's almost never in a \$660k deployment: explicit constraint on blast radius at that hour. not in the budget, not in the runbook.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;One sentence that said the whole thing better than I ever could.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And on &lt;a href="https://dev.to/xulingfeng/my-company-packaged-12-years-of-my-experience-into-an-ai-skill-then-laid-me-off-when-it-crashed-4b3e"&gt;#10&lt;/a&gt;, two people called me out — &lt;strong&gt;Utkarsh Bansal&lt;/strong&gt; (&lt;a href="https://dev.to/utkarshbansal01"&gt;@utkarshbansal01&lt;/a&gt;) said it read like AI wrote it, and &lt;strong&gt;framemuse&lt;/strong&gt; (&lt;a href="https://dev.to/framemuse"&gt;@framemuse&lt;/a&gt;) said it lacked a human touch. I replied to both. One I agreed with, the other I admitted my timeline was messy. &lt;strong&gt;Getting called out means someone actually read carefully enough to question it — and engaging with that is way more interesting than ignoring it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is what I get paid in. Not coffee. People reading carefully enough to write something back.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Real Are These Stories?
&lt;/h2&gt;

&lt;p&gt;15 trainwreck stories. They're not my own experiences, but they're not entirely made up either.&lt;/p&gt;

&lt;p&gt;I've been in QA for 15 years. How systems crash, how people deflect blame, what a 3 AM phone call sounds like — those are real. I took all of that, broke it down, and reshaped it into characters and situations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The stories are fiction. The failure modes are not.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dev.to/xulingfeng/series/41202"&gt;36 stratagems&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Book?
&lt;/h2&gt;

&lt;p&gt;Around story #8 or #9, I realized this might be more than 15 articles. It could be a book.&lt;/p&gt;

&lt;p&gt;The plan is to polish all 15 stories, add some new material, and put them together.&lt;/p&gt;

&lt;p&gt;But a book is missing one thing: the Praise page.&lt;/p&gt;

&lt;p&gt;I don't want to write my own blurb. &lt;strong&gt;I want to ask people who actually read the series to write it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you read any of these stories — if one made you laugh, or reminded you of something you've been through, or got quoted in a meeting — would you write a sentence or two for the Praise page?&lt;/p&gt;

&lt;p&gt;Just a couple of lines. I'll pick a few and put them at the front of the book.&lt;/p&gt;

&lt;p&gt;Drop a comment or reach me through my profile email. I'll follow up.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;15 trainwreck stories. 445 reactions. 251 comments. 20 days.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;See you in the &lt;a href="https://dev.to/xulingfeng/series/41202"&gt;next series&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you think the series is worth a &lt;a href="https://ko-fi.com/xulingfeng" rel="noopener noreferrer"&gt;coffee: ☕&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;P.S. English isn't my first language. I use AI to polish the writing and help with storycraft. Thanks for reading.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>programming</category>
      <category>career</category>
    </item>
    <item>
      <title>Our Competitor Had an AI That Covered 97.2%. We Had a Spreadsheet and a Fake Quote. Guess Who Won.</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Thu, 18 Jun 2026 15:36:12 +0000</pubDate>
      <link>https://dev.to/xulingfeng/our-competitor-had-an-ai-that-covered-972-we-had-a-spreadsheet-and-a-fake-quote-guess-who-won-5cc3</link>
      <guid>https://dev.to/xulingfeng/our-competitor-had-an-ai-that-covered-972-we-had-a-spreadsheet-and-a-fake-quote-guess-who-won-5cc3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;You walk into the RFP briefing. Your competitor has 200 people, 97% AI coverage, and a 4-day delivery promise. You have 15 people and a proposal you haven't even finished writing.&lt;/p&gt;

&lt;p&gt;Do you bet on better tech, or on understanding people better — and playing dirtier when you have to?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This story is your answer.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Act I · The Crack
&lt;/h2&gt;

&lt;p&gt;When Finova's RFP landed, everyone in the industry knew how big this was.&lt;/p&gt;

&lt;p&gt;Cross-border payment system. Multi-currency settlement + compliance + risk. Their last deployment had a P0 incident — an exchange rate module drifted by four decimal places in an edge case, and audit chased it for two months. So Finova's CTO made it clear: &lt;strong&gt;a $1.8M contract, and whoever signs off owns the result.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;$1.8M. Enough to keep a small testing company alive for a whole year.&lt;/p&gt;

&lt;p&gt;Plenty of firms showed up at the briefing. But only two were real contenders.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;QualiGuard&lt;/strong&gt; — mid-sized, just closed their Series A, 200 people, their own AI testing platform called Aegis. A $1.8M contract was barely a rounding error for them — but with Series A money comes the pressure to show revenue growth for the next round, and Finova was a trophy client in the cross-border payments space. The case study was worth more than the project itself. Derek stood at the podium, flipping through slides packed with numbers: &lt;strong&gt;Aegis delivers 97.2% test automation coverage. Full Finova platform testing in four business days.&lt;/strong&gt; No "we'll try." Just "we can do it."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VeriTest&lt;/strong&gt; — small, fifteen people. Marcus spent the whole morning working the room with Finova's people. I sat in the back row with nothing.&lt;/p&gt;

&lt;p&gt;Marcus slid back over and leaned in: "Their PPT makes yours look like a joke."&lt;/p&gt;

&lt;p&gt;I didn't answer. I was watching Derek's boss.&lt;/p&gt;

&lt;p&gt;Sarah — QualiGuard's VP, Derek's direct supervisor. She sat in the front row, off to the side, and never once looked at Derek during his entire presentation. She was on her phone. As one of the few women running a technical department, I watched her longer than I watched Derek.&lt;/p&gt;

&lt;p&gt;When Derek flashed Aegis's coverage curve on screen, Sarah was replying to emails.&lt;/p&gt;

&lt;p&gt;When Derek said "Aegis is the result of 18 months of R&amp;amp;D, with AI prediction accuracy reaching 82%," Sarah was scrolling something on her phone.&lt;/p&gt;

&lt;p&gt;When Derek finished and the room applauded, Sarah flipped her phone face-down on the table. She didn't clap.&lt;/p&gt;

&lt;p&gt;At the coffee break after the briefing, I didn't go talk to Finova's CTO. I went to find Marcus.&lt;/p&gt;

&lt;p&gt;"Run two people for me."&lt;/p&gt;

&lt;p&gt;"Who?"&lt;/p&gt;

&lt;p&gt;"Sarah and Derek from QualiGuard. I need to know their dynamic."&lt;/p&gt;

&lt;p&gt;Marcus frowned. "Why? We haven't even written a proposal."&lt;/p&gt;

&lt;p&gt;"Derek just pitched for 40 minutes, and his boss didn't look at him once. Two most likely explanations: either she's heard it a hundred times and can't stand it anymore — &lt;strong&gt;or she's waiting for him to screw up.&lt;/strong&gt; "&lt;/p&gt;

&lt;p&gt;Marcus looked at me. "Lena, which one's worse?"&lt;/p&gt;

&lt;p&gt;"The second one. Because if she's waiting for him to slip — &lt;strong&gt;we don't need to beat Derek. We just need to put a knife in Sarah's hand — one he won't see coming.&lt;/strong&gt;"&lt;/p&gt;

&lt;p&gt;Marcus nodded.&lt;/p&gt;

&lt;p&gt;He knows how deep my technical chops go. And he knows my philosophy on dirty work: &lt;strong&gt;when you have the tech, fight clean. When you don't, fight dirty. When you can't win either way, find another path.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He's never doubted my read on things.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act II · First Blood
&lt;/h2&gt;

&lt;p&gt;Derek didn't wait for the submission deadline. He played his hand early.&lt;/p&gt;

&lt;p&gt;On May 14, QualiGuard sent word through an intermediary to Finova's CTO: "Aegis is already testing your system — first round's on us."&lt;/p&gt;

&lt;p&gt;Smart move. The first one to deliver always has the advantage — even if what they deliver is half-baked.&lt;/p&gt;

&lt;p&gt;On May 16, Derek sent Aegis's test report to Finova's technical review team. Subject line: &lt;em&gt;Finova Full Platform Assessment — Aegis v2.3&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I saw the report on the morning of May 17.&lt;/p&gt;

&lt;p&gt;When Marcus forwarded it, he expected me to look at the coverage and pass rates first — 97.2% coverage is a knockout punch in any bidding scenario.&lt;/p&gt;

&lt;p&gt;But I did something Marcus didn't understand.&lt;/p&gt;

&lt;p&gt;I flipped to the appendix.&lt;/p&gt;

&lt;p&gt;Not the body of the report. The &lt;strong&gt;system log appendix&lt;/strong&gt; — auto-generated timestamps from Aegis, buried in the last three pages. Most people in a bidding scenario only read the summary and conclusions. But I've been doing testing for fifteen years — five of them as CTO before we got acquired. Took the payout, started my own shop, brought Marcus in as partner. And in that time, I've seen too many projects with beautiful reports and disastrous execution. &lt;strong&gt;I never read what people wrote. I only read what they actually ran.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Aegis's logs showed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Test Session: AEG-FINOVA-0512
Status:       COMPLETED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|Start:        2026-05-12 11:20:03 UTC
|End:          2026-05-12 14:32:07 UTC
|Duration:     3h 12min 4s
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|Total Reqs:   2,347
|Passed:       2,312
|Failed:       0
|Skipped:      35
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|Avg Resp:     47ms
|P99 Resp:     189ms
|Max Resp:     312ms
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|External Deps:  0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I stared at "External Deps: 0" and "Avg Resp: 47ms" for a long time.&lt;/p&gt;

&lt;p&gt;Finova does cross-border payments. Two years ago I consulted for a company with a similar architecture — they had 9 external dependencies, and a single exchange rate API had a P50 response time of 120ms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sandbox environment. Expected — all pre-testing runs in sandbox.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn't check the coverage. I checked the execution time first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2,347 test cases in 3 hours and 12 minutes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I made seven phone calls.&lt;/p&gt;

&lt;p&gt;The first was to Marcus: "Find out Finova's production environment setup — how many external dependencies do they have?"&lt;/p&gt;

&lt;p&gt;Twenty minutes later Marcus messaged back: &lt;strong&gt;14 external dependency services. API rate limit at 1,000 requests per minute per account.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I made six more calls. I don't know many people, but the ones I know are in the right places — one former colleague spent two years on Finova's infrastructure team.&lt;/p&gt;

&lt;p&gt;The sixth call confirmed the key piece: &lt;strong&gt;Finova's production CI pipeline takes at least 24 hours for a full run.&lt;/strong&gt; Half of those 14 external dependencies are banking-grade interfaces, each request takes 800ms minimum, and the rate limit kills any parallelism.&lt;/p&gt;

&lt;p&gt;I laid out the math:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Finova production full test — theoretical estimate:
2,347 test cases — split 7:3 external/internal
1,642 external × 800ms avg dependency response
= 1,314 seconds (pure dependency wait)
+ internal execution time (est. 20,000 seconds)
≈ 5.9 hours — assuming zero bottlenecks

But API rate limit is 1,000 req/min:
External triggers: 1,642 / 1,000 = 1.64 min queueing
+ real queue time at least 3× estimate

→ Safe estimate for full run: 24h+

Aegis execution time: 3h 12min
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I closed my notebook.&lt;/p&gt;

&lt;p&gt;Derek's test ran in a sandbox. Nothing unusual — all pre-tests run in sandbox. But something else caught my attention.&lt;/p&gt;

&lt;p&gt;I went back to his proposal. 97.2% coverage, four business days delivery.&lt;/p&gt;

&lt;p&gt;Finova's production CI takes 24 hours per run. Even if Aegis is faster than traditional CI, four days gives you at most two complete iterations. But testing isn't just about running — you also analyze, fix, and re-run.&lt;/p&gt;

&lt;p&gt;Four days. Ninety-six hours. Subtract 24 hours for one run — how many iterations can you fit in the remaining 72?&lt;/p&gt;

&lt;p&gt;I didn't know the answer. But I knew one thing: &lt;strong&gt;if there was even one person on Finova's review team who could do this math, Derek was in trouble.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I picked up my phone.&lt;/p&gt;

&lt;p&gt;I didn't call Finova.&lt;/p&gt;

&lt;p&gt;I called Marcus.&lt;/p&gt;

&lt;p&gt;"Can you still reach that former QualiGuard employee?"&lt;/p&gt;

&lt;p&gt;"I can. But it'll cost."&lt;/p&gt;

&lt;p&gt;"How much?"&lt;/p&gt;

&lt;p&gt;"Five thousand."&lt;/p&gt;

&lt;p&gt;"Pay it. I don't need technical data — I need the dynamic between Sarah and Derek."&lt;/p&gt;

&lt;p&gt;Marcus was quiet for two seconds. "Lena, we've got seventy grand left in the account."&lt;/p&gt;

&lt;p&gt;"I know. Pay it."&lt;/p&gt;

&lt;p&gt;I knew that five thousand, if I was right about this, &lt;strong&gt;wouldn't buy us a project. It would buy us the cost of putting the knife in Sarah's hand.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Act III · Dirty Play
&lt;/h2&gt;

&lt;p&gt;May 18. Marcus met a guy named Roy for coffee. Roy was a former senior test engineer at QualiGuard who'd left a month ago — and not on good terms with Sarah's team.&lt;/p&gt;

&lt;p&gt;Five thousand dollars bought three recordings. Marcus transcribed them and sent them to me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recording one: three months ago, Sarah told Derek to cut Aegis's GPU budget.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Sarah: "Aegis costs $31,000 a month in GPU expenses. That's more than a manual testing team. Where's the justification?"&lt;br&gt;
Derek: "GPU is the engine of Aegis. Cut GPU, you cut coverage."&lt;br&gt;
Sarah: "Then cut coverage. The client is paying for manual testing rates, not GPU cluster rates."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Recording two: two months ago, Sarah cut Aegis's GPU allocation by 50% without Derek's consent.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Derek: "You could have at least told me first."&lt;br&gt;
Sarah: "I did. You didn't listen."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Recording three: one month ago, Derek lost it at a team meeting.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Derek: "Sarah doesn't understand the technology. All she reads are spreadsheets. Aegis now takes 40% longer to run a full regression than before. She doesn't know, and she doesn't care. All she cares about is this quarter's P&amp;amp;L."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I finished reading the last transcript and put my phone down.&lt;/p&gt;

&lt;p&gt;"Sarah doesn't get the tech. Derek thinks she's holding him back. Derek needs full regression to prove Aegis works — because cutting test volume is admitting Aegis can't deliver."&lt;/p&gt;

&lt;p&gt;Marcus asked: "So what do we do?"&lt;/p&gt;

&lt;p&gt;"So we make Sarah force Derek to do something he's afraid to do."&lt;/p&gt;

&lt;p&gt;I had Marcus prepare a document.&lt;/p&gt;

&lt;p&gt;Not a real bid. A document that &lt;strong&gt;looked like a real bid.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The gist: VeriTest planned to bid at $1.3M — 25% below the industry average, more than 20% below QualiGuard's expected price. The proposal looked legit: attachments, technical approach, delivery timelines.&lt;/p&gt;

&lt;p&gt;The only difference: &lt;strong&gt;this proposal wasn't for Finova. It was for Sarah to see.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I found someone who did business with both companies — not a friend, but reliable for the right price. Had him "accidentally" let the contents of our fake bid slip to Sarah.&lt;/p&gt;

&lt;p&gt;"You sure Sarah will buy it?"&lt;/p&gt;

&lt;p&gt;"A manager who only reads spreadsheets, in the middle of bid season, sees a competitor 20% below her price — &lt;strong&gt;she won't question whether it's real. She'll take the number straight to Derek and tell him to match it.&lt;/strong&gt;"&lt;/p&gt;

&lt;p&gt;It played out exactly as I predicted.&lt;/p&gt;

&lt;p&gt;May 20. Word came through from Roy: &lt;strong&gt;Sarah called Derek into her office for 45 minutes. When he walked out, his bid had changed — from $1.8M to $1.5M.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I got the news, I didn't smile.&lt;/p&gt;

&lt;p&gt;"Derek just cut $300K." Marcus was doing the math on his end too. "They were at $1.8M with a 40% margin. Sarah pushed him to $1.5M with a hard floor — margin can't go below 22%."&lt;/p&gt;

&lt;p&gt;I did the calculation. "Cost ceiling went from $1.08M to $1.17M. He actually has an extra $90K to play with."&lt;/p&gt;

&lt;p&gt;"And?"&lt;/p&gt;

&lt;p&gt;"And here's the problem — he's got an extra $90K in cost room, but he lost $300K in revenue. Sarah already cut his GPU once. It's not that he can't afford more compute — &lt;strong&gt;it's that he can't go back to Sarah and say 'you were wrong to cut it, I need it back.' That's slapping his boss in the face&lt;/strong&gt; — he won't do it. The bid's due in two days, he's got no time for that fight. His hands are tied. The only path forward is to reduce test volume and let AI fill the gaps."&lt;/p&gt;

&lt;p&gt;"Switching from full regression to sampling?" Marcus asked.&lt;/p&gt;

&lt;p&gt;"Exactly. Run fewer real test cases, let the AI extrapolate coverage from historical data. And it's not voluntary — his boss made him cut the price. He doesn't have the guts to say no."&lt;/p&gt;

&lt;p&gt;I leaned back in my chair.&lt;/p&gt;

&lt;p&gt;Aegis at full GPU could probably finish a full regression in about 12 hours — based on 2,347 test cases and industry execution benchmarks. Sarah cut his compute by half, pushing it to 17 hours.&lt;/p&gt;

&lt;p&gt;I opened my laptop and wrote a quick script to simulate Derek's situation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Aegis compressed test strategy estimate
Based on: Sarah cut 50% GPU → full test slows by 40% → 20% price cut
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;full_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2347&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_hours&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coverage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.972&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;# Aegis full GPU baseline (QualiGuard internal env)
&lt;/span&gt;&lt;span class="n"&gt;gpu_cut&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_hours&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;full_test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_hours&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.4&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.40&lt;/span&gt;
&lt;span class="n"&gt;ai_prediction_accuracy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.82&lt;/span&gt;

&lt;span class="n"&gt;actual_coverage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;full_test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coverage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;sample_rate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;full_test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coverage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;ai_prediction_accuracy&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Claimed coverage: 97.2%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Actual effective coverage: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;actual_coverage&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gap: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;coverage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;actual_coverage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; percentage points&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This gap in a financial system = P0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"So now Derek's test plan is: &lt;strong&gt;run 40% of the real cases, let the AI extrapolate the remaining 60%&lt;/strong&gt; of coverage. And he can't tell anyone — because admitting it means admitting Aegis can't deliver what it promised."&lt;/p&gt;

&lt;p&gt;"Is Derek going to lose?"&lt;/p&gt;

&lt;p&gt;"No. On paper, his plan still looks good — sampling plus AI prediction is a legitimate approach. The problem isn't the plan itself."&lt;/p&gt;

&lt;p&gt;"Then what is it?"&lt;/p&gt;

&lt;p&gt;"The problem is he won't tell the client he's using sampling. He'll call it 'full regression.' &lt;strong&gt;And the moment he says 'full' — we've got something to cut open.&lt;/strong&gt;"&lt;/p&gt;




&lt;h2&gt;
  
  
  Act IV · The Summit
&lt;/h2&gt;

&lt;p&gt;On May 23, Finova's procurement office officially received both bids.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;QualiGuard proposal summary:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;$1,500,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testing method&lt;/td&gt;
&lt;td&gt;Aegis fully automated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delivery timeline&lt;/td&gt;
&lt;td&gt;4 business days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coverage guarantee&lt;/td&gt;
&lt;td&gt;≥ 95% (line level)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Proof point&lt;/td&gt;
&lt;td&gt;Aegis pre-test report (attached)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;VeriTest proposal summary:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;$1,275,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testing method&lt;/td&gt;
&lt;td&gt;Manual + AI hybrid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delivery timeline&lt;/td&gt;
&lt;td&gt;14 business days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coverage guarantee&lt;/td&gt;
&lt;td&gt;≥ 90% (line level)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-test&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The comparison was clear.&lt;/p&gt;

&lt;p&gt;QualiGuard at $1.5M — $225K more than VeriTest, but only about 18% higher. Factor in delivery speed (4 days vs 14) and technical capability (97% automated vs hybrid), and Finova's CTO James was leaning toward QualiGuard.&lt;/p&gt;

&lt;p&gt;On May 27, internal scores leaked — QualiGuard technical score 86, VeriTest 72.&lt;/p&gt;

&lt;p&gt;Derek posted on LinkedIn:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Technology is not about who has more people. It's about who has the right tools. Proud of the Aegis team. Looking forward to Finova's next chapter."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Forty-plus likes underneath.&lt;/p&gt;

&lt;p&gt;When Derek posted that — I could tell, he genuinely believed it. His problem wasn't that his technology was bad. It was that he trusted it too much to see his own boss sharpening the blade.&lt;/p&gt;

&lt;p&gt;At the same time, I was running my own Aegis behavior analysis on my laptop.&lt;/p&gt;

&lt;p&gt;I couldn't use Aegis — but I could reverse-engineer its logic. Derek's public materials mentioned Aegis's core architecture: a coverage-driven reinforcement learning selection strategy. In other words, Aegis doesn't write test cases — it selects them. It picks the subset of the test asset library that contributes the most to coverage.&lt;/p&gt;

&lt;p&gt;I did the math:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Aegis selection strategy (based on public materials):

Full test asset library estimate: for a Finova-scale system ≈ 8,000-12,000 cases
Aegis report ran: 2,347
→ Selection ratio ≈ 20%-25%, reasonable

If Derek runs sampling + AI prediction:
Actual execution ≈ 40% × 2,347 ≈ 940 cases
AI-predicted coverage ≈ 60%
→ Real coverage ≈ 40% × 97.2% + 60% × expected value

The problem isn't coverage percentage. It's AI prediction accuracy —
Derek mentioned at the briefing that Aegis's prediction accuracy is 82%.
82% fill accuracy covering 60% of the gap:
Real composite coverage ≈ 40% × 97.2% + 60% × (97.2% × 0.82)
≈ 38.9% + 47.8% = 86.7%

10.5 percentage points below what's reported.
In a financial system, that's a P0.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I didn't share this calculation with anyone. But I knew one thing: after Sarah squeezed Derek's budget, his plan had a technical hole. Frank didn't need to find it — I could have cut straight through it myself.&lt;/p&gt;

&lt;p&gt;But I didn't.&lt;/p&gt;

&lt;p&gt;Not because I couldn't. Because having someone else land the blow was cleaner than doing it myself.&lt;/p&gt;

&lt;p&gt;"You're not worried at all?" Marcus asked.&lt;/p&gt;

&lt;p&gt;"About what?"&lt;/p&gt;

&lt;p&gt;"The scores. We're 14 points behind."&lt;/p&gt;

&lt;p&gt;"Scoring is scoring. The review meeting is the review meeting."&lt;/p&gt;

&lt;p&gt;"What's the difference?"&lt;/p&gt;

&lt;p&gt;"Scoring is one CTO closing his office door. The review meeting is three departments sitting around a table. The CTO can have his favorites — but the CFO and the head of risk aren't going to take a bullet for his ego. Derek still doesn't realize his biggest problem isn't our proposal."&lt;/p&gt;

&lt;p&gt;"Then what is it?"&lt;/p&gt;

&lt;p&gt;"&lt;strong&gt;His biggest problem is his boss wants him to lose more than he wants to win.&lt;/strong&gt; "&lt;/p&gt;




&lt;h2&gt;
  
  
  Act V · The Crash
&lt;/h2&gt;

&lt;p&gt;Finova's bid review meeting was scheduled for the afternoon of May 29.&lt;/p&gt;

&lt;p&gt;Six people in the room. CTO James, CFO Mark, risk lead Frank, plus two technical reviewers and procurement.&lt;/p&gt;

&lt;p&gt;QualiGuard presented first.&lt;/p&gt;

&lt;p&gt;Derek was in a suit, tie perfectly knotted. His team had put together 40 slides — from Aegis's architecture diagram to heat maps of Finova's test coverage, every number backed by data.&lt;/p&gt;

&lt;p&gt;He spoke for 35 minutes. Then he dropped the pre-test results:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2,347 test cases. 97.2% coverage. 3 hours 12 minutes. Zero failures.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;James nodded.&lt;/p&gt;

&lt;p&gt;Then came Q&amp;amp;A.&lt;/p&gt;

&lt;p&gt;Frank — the risk lead, mid-forties. Started as a developer, moved to architecture, then to risk. Twelve years at Finova. He was the one who spent two months auditing the last P0 incident.&lt;/p&gt;

&lt;p&gt;He flipped open his notebook.&lt;/p&gt;

&lt;p&gt;"Derek, your proposal says four business days delivery, 97.2% coverage. What's that number based on?"&lt;/p&gt;

&lt;p&gt;Derek clearly hadn't expected the first question to go there.&lt;/p&gt;

&lt;p&gt;"—Based on Aegis's test data from the Finova environment. 2,347 test cases. Full regression."&lt;/p&gt;

&lt;p&gt;Frank turned another page.&lt;/p&gt;

&lt;p&gt;"I've read your Aegis test logs — execution time was 3 hours 12 minutes. Finova's production CI full run is typically 24 hours or more, that's not a secret. But that's not what I'm curious about."&lt;/p&gt;

&lt;p&gt;He closed his notebook.&lt;/p&gt;

&lt;p&gt;"What I'm curious about is — your proposal guarantees delivery in 4 days. 4 days is 96 hours. If a full run takes 24 hours, you've got 72 hours left for analysis, fixes, and re-runs. That's tight — but it works if you're running full regression."&lt;/p&gt;

&lt;p&gt;He looked at Derek.&lt;/p&gt;

&lt;p&gt;"Here's the problem — Finova's cross-currency settlement module alone, our own CI pipeline runs it in 16 to 18 hours. Everyone in the industry knows that scale. Your Aegis logs show 3 hours 12 minutes per run — that's from &lt;strong&gt;our sandbox environment&lt;/strong&gt;, not Finova's production environment. At 17 hours per real iteration, in a 4-day window — how many iterations are you planning to run?"&lt;/p&gt;

&lt;p&gt;Derek didn't answer immediately.&lt;/p&gt;

&lt;p&gt;Twenty seconds of silence.&lt;/p&gt;

&lt;p&gt;"—Our test strategy is determined by Aegis's automated selection mechanism. Coverage is above 97%."&lt;/p&gt;

&lt;p&gt;"I didn't ask about coverage," Frank said. "&lt;strong&gt;I asked — 4-day delivery, how many iterations are you planning to run?&lt;/strong&gt;"&lt;/p&gt;

&lt;p&gt;Derek didn't answer. Because the answer was one he couldn't say.&lt;/p&gt;

&lt;p&gt;Frank didn't push further. He closed his notebook and wrote something down.&lt;/p&gt;

&lt;p&gt;For the remaining 10 minutes Derek spoke, Sarah never looked at him.&lt;/p&gt;

&lt;p&gt;VeriTest's presentation came after.&lt;/p&gt;

&lt;p&gt;I stood up. No slides. Just a PDF and a laptop connected to Finova's network.&lt;/p&gt;

&lt;p&gt;I spoke for 20 minutes.&lt;/p&gt;

&lt;p&gt;I didn't talk about how good our proposal was. I talked about Finova's architectural vulnerabilities — not as an attack on QualiGuard, but as a technical analysis.&lt;/p&gt;

&lt;p&gt;"Finova's cross-border payment system has three critical bottlenecks. Multi-currency exchange rate settlement consistency — that's where your last P0 happened. Risk engine rule hot reload — you can't validate this in a test environment, it needs a production canary. Third is the stability chain of your 14 external dependencies — third-party interfaces are uncontrollable, so you need circuit breakers as a safety net."&lt;/p&gt;

&lt;p&gt;James raised an eyebrow. These weren't blind spots for him — but he hadn't expected someone to tear into Finova's system this deep in 20 minutes.&lt;/p&gt;

&lt;p&gt;"&lt;strong&gt;Our proposal isn't designed to cover 97% of your code. It's designed to cover your three biggest risks. Code coverage is a means. Business correctness is the goal.&lt;/strong&gt;"&lt;/p&gt;

&lt;p&gt;I didn't pile on Frank's questioning. I didn't need to. Every point Frank had made was already sitting in that room — hitting ten times harder coming from him than it ever would from me.&lt;/p&gt;

&lt;p&gt;But I landed another blow.&lt;/p&gt;

&lt;p&gt;"Our delivery timeline is 14 days. That's because we need the first 5 days to do one thing — run a full-chain probe test on production, in a canary deployment. Not for coverage assessment. Just for external dependency health checks and circuit breaker validation. This step can't be skipped. Whoever wins this bid, I'd recommend Finova do this before any new testing begins."&lt;/p&gt;

&lt;p&gt;James nodded.&lt;/p&gt;

&lt;p&gt;It sounded like I was looking out for Finova. And I was. But I also knew — saying this out loud meant making it clear that &lt;strong&gt;QualiGuard's 4-day delivery physically couldn't include this step.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I didn't name names. Everyone knew who I was talking about.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;During the mid-meeting break, I saw something in the hallway.&lt;/p&gt;

&lt;p&gt;Sarah pulled Derek into a stairwell. They talked for less than two minutes. Derek turned and walked away. Sarah didn't follow.&lt;/p&gt;

&lt;p&gt;I pieced it together later from context — she gave him an out. A chance to voluntarily withdraw the delivery commitment from his proposal during the re-convened session, admit the mistake, let the company handle it internally. He didn't take it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act VI · The Knife
&lt;/h2&gt;

&lt;p&gt;When the meeting reconvened, QualiGuard was asked to give a supplemental statement.&lt;/p&gt;

&lt;p&gt;Before Sarah stood up, she closed her eyes for a split second. A tiny gesture, barely noticeable. Most people wouldn't have seen it. But I did — and it wasn't nerves. It was her confirming a decision she'd already made.&lt;/p&gt;

&lt;p&gt;She cleared her throat.&lt;/p&gt;

&lt;p&gt;She said three things:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"QualiGuard's proposal regarding delivery timeline was based on Derek's personal technical assessment and had not gone through management review. We sincerely apologize for this. There is a discrepancy between our resource planning and the delivery commitment made. If Finova selects QualiGuard, we will submit a revised delivery plan led by a new technical lead. We have already initiated an internal remediation — the new delivery plan will be submitted to the review committee within 48 hours."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The room went quiet for about five seconds.&lt;/p&gt;

&lt;p&gt;James's expression didn't change.&lt;/p&gt;

&lt;p&gt;Frank wrote something down.&lt;/p&gt;

&lt;p&gt;Derek didn't stand up to fight it. His face was calm — the kind of calm that isn't real peace, just the stillness of someone who knows there's nothing left they can do.&lt;/p&gt;

&lt;p&gt;I sat across the room and watched.&lt;/p&gt;

&lt;p&gt;I gave Sarah a score in my head: &lt;strong&gt;Clean cut. No hesitation, no wasted motion, not a single unnecessary word. Cut clean.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A good VP, when a project goes sideways, has only two options — save the project or save yourself. Sarah chose the latter.&lt;/strong&gt; She wasn't going to let Derek's technical decisions damage her professional reputation. Even if the decision itself wasn't malicious — running a pre-test in a sandbox, the data was real, nothing was fabricated. But he told the review committee it was "full regression." That's what killed him. At the moment a client is questioning your credibility, facts don't matter anymore. What matters is who can be thrown overboard.&lt;/p&gt;

&lt;p&gt;Derek was thrown overboard.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not because his solution was bad. Because Sarah needed someone to throw overboard.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After the meeting, I ran into Derek in the parking lot.&lt;/p&gt;

&lt;p&gt;He was leaning against his car door, loosening his tie. He stopped when he saw me.&lt;/p&gt;

&lt;p&gt;Derek spoke first.&lt;/p&gt;

&lt;p&gt;"That bid — I cornered Sarah after the meeting and asked where she heard it. An intermediary. You leaked it."&lt;/p&gt;

&lt;p&gt;I didn't say anything.&lt;/p&gt;

&lt;p&gt;"You set me up. Made sure my boss saw your lowball number. You knew she'd squeeze me."&lt;/p&gt;

&lt;p&gt;"I didn't know she'd squeeze you."&lt;/p&gt;

&lt;p&gt;"Then what did you know?"&lt;/p&gt;

&lt;p&gt;"I knew that once you saw that number in her hands, you'd try to prove you were better than us. How she'd react was her business. How &lt;strong&gt;you'd&lt;/strong&gt; react — I had a pretty good read on that."&lt;/p&gt;

&lt;p&gt;Derek let out a laugh. Not the amused kind.&lt;/p&gt;

&lt;p&gt;"That arithmetic Frank did in the meeting — you'd already calculated the 4 days don't add up. You waited for someone else to take the shot. That's cold."&lt;/p&gt;

&lt;p&gt;I said: "I never claimed I played clean."&lt;/p&gt;

&lt;p&gt;Derek pressed: "So you think you're cleaner than me?"&lt;/p&gt;

&lt;p&gt;"I'm not comparing who's cleaner." I looked at him. "&lt;strong&gt;I only care about winning. And right now, I'm winning.&lt;/strong&gt; "&lt;/p&gt;

&lt;p&gt;Derek looked at me. Silent for a moment.&lt;/p&gt;

&lt;p&gt;"Let me ask you something."&lt;/p&gt;

&lt;p&gt;"Go ahead."&lt;/p&gt;

&lt;p&gt;"How did you know Frank would ask that question?"&lt;/p&gt;

&lt;p&gt;"I didn't."&lt;/p&gt;

&lt;p&gt;"Then —"&lt;/p&gt;

&lt;p&gt;"I didn't know he would. But I knew your 4-day promise — anyone who divides 96 hours by 17 hours can see it doesn't work. I bet that someone on Finova's review team could do basic arithmetic. I was right."&lt;/p&gt;

&lt;p&gt;Derek yanked off his tie.&lt;/p&gt;

&lt;p&gt;"Next time you want to leak a fake bid, find someone else. Whoever you used — doesn't matter, that kind of play doesn't cost much. Five thousand, eight thousand — it's your style. You've got enough left in the account for another round?"&lt;/p&gt;

&lt;p&gt;I paused mid-step getting into my car.&lt;/p&gt;

&lt;p&gt;I turned back and looked at him.&lt;/p&gt;

&lt;p&gt;"Derek."&lt;/p&gt;

&lt;p&gt;"What?"&lt;/p&gt;

&lt;p&gt;"Next time you put together a proposal — make sure your boss knows what you're doing out there."&lt;/p&gt;

&lt;p&gt;I closed the door and drove off before he could answer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act VII · The Aftermath
&lt;/h2&gt;

&lt;p&gt;Finova chose VeriTest.&lt;/p&gt;

&lt;p&gt;Not because our testing was better. Because QualiGuard's delivery timeline literally didn't add up — Finova's internal review classified it as "material omission in the bid proposal." Frank wrote in his review report: &lt;strong&gt;"Recommendation: do not select QualiGuard — without confirmation of delivery capability, contract performance risk cannot be assessed."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sarah saved herself without giving Derek a second thought. But more importantly — those three sentences she said at the meeting, someone repeated them. Inside QualiGuard, word spread fast. Derek's situation inside the company was worse than losing the bid itself.&lt;/p&gt;

&lt;p&gt;VeriTest got the contract. $1.275M. Marcus was already calculating how long that would keep us alive.&lt;/p&gt;

&lt;p&gt;June 2. The celebration dinner was at a steakhouse. After three glasses of whiskey, Marcus finally asked the question he'd been holding onto.&lt;/p&gt;

&lt;p&gt;"That fake bid — how did you know Sarah would take it to Derek?"&lt;/p&gt;

&lt;p&gt;"A manager who only reads spreadsheets, in bid season, sees a number 20% below hers — she's not going to verify it. She's going to take it to Derek and tell him to match it. The exact number doesn't matter to her. What matters is 'theirs is lower than ours.' "&lt;/p&gt;

&lt;p&gt;"What if he didn't cut?"&lt;/p&gt;

&lt;p&gt;"He would have. Because his boss Sarah has been after him for three months. Losing the bid costs him one project. Losing to Sarah — that's not just losing a project anymore. He can't approve a budget, he can't lead his team. That's everything."&lt;/p&gt;

&lt;p&gt;Marcus cut a piece of steak but didn't eat it right away. "One more thing. Your fake bid was $1.3M. Our real bid was $1.275M — only $25K apart. What if Sarah checked?"&lt;/p&gt;

&lt;p&gt;"She wouldn't. Because she doesn't care about the number itself. She cares about the words 'lower than us.' $1.3M or $1.275M — to her, same thing. Both are 'lower.' "&lt;/p&gt;

&lt;p&gt;Marcus raised his fourth glass. "To QualiGuard's VP."&lt;/p&gt;

&lt;p&gt;"To Sarah."&lt;/p&gt;

&lt;p&gt;"You think she knows she was set up?"&lt;/p&gt;

&lt;p&gt;I chewed my steak before answering. "She knows. And she doesn't care. Because she walked away clean. That's what smart people do."&lt;/p&gt;

&lt;p&gt;Outside the window, the financial district lights were coming on. Somewhere out there was Finova's building.&lt;/p&gt;

&lt;p&gt;I looked at the lights and said something I hadn't planned.&lt;/p&gt;

&lt;p&gt;"Guess where Derek is right now."&lt;/p&gt;

&lt;p&gt;"Home. Or cleaning out his desk."&lt;/p&gt;

&lt;p&gt;"No. He's updating his resume."&lt;/p&gt;

&lt;p&gt;"You sure?"&lt;/p&gt;

&lt;p&gt;"He's not the type to quit. He didn't lose on the technology — he lost because his boss stabbed him in the back. Guys like that don't accept it. He'll find another company. Somewhere without a Sarah."&lt;/p&gt;

&lt;p&gt;"Will he change?"&lt;/p&gt;

&lt;p&gt;"No. Because what beat him wasn't the tech. It was office politics. He thought this was a technical competition — but it was about people the whole time. AI is a tool. Code is a tool. Test reports are a tool. &lt;strong&gt;Tools don't decide who wins. The people holding them do.&lt;/strong&gt; "&lt;/p&gt;

&lt;p&gt;Marcus checked his watch. "And our AI?"&lt;/p&gt;

&lt;p&gt;"We use it too. But we know it's just a tool. The difference is — we never told the client AI could replace people. We said AI helps people test, people make the decisions. Derek lost because he promised something he couldn't deliver. &lt;strong&gt;AI can hit 97% coverage. But AI doesn't know what it doesn't know. People do. So people make the decisions, AI does the execution. He got the order backwards.&lt;/strong&gt;"&lt;/p&gt;




&lt;h2&gt;
  
  
  Epilogue
&lt;/h2&gt;

&lt;p&gt;A month later.&lt;/p&gt;

&lt;p&gt;I saw Derek's update on LinkedIn.&lt;/p&gt;

&lt;p&gt;New profile picture. New company — healthcare information systems. Same title: Director of Testing.&lt;/p&gt;

&lt;p&gt;A row of congratulations comments underneath.&lt;/p&gt;

&lt;p&gt;I scrolled past that post and stopped.&lt;/p&gt;

&lt;p&gt;That day on the podium, when he said "we can do it" — he really believed it. Not the kind of belief that's pretending. The kind where you fool yourself. Aegis didn't lie to him. Sarah didn't either. &lt;strong&gt;He lied to himself.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I clicked like. As I hit that button, I wasn't sure if I was saluting him or the version of myself that almost became him.&lt;/p&gt;

&lt;p&gt;I put down the mouse and remembered that day at the briefing, sitting in the back row with nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It wasn't AI that lost. It was people who thought technology could beat politics.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Your company has been through a bidding war too, hasn't it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Did you lose on the technology, or did you lose in the office politics?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Did winning come from having better AI — or knowing when to hand someone the knife?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next time you're in that review room — are you Lena, or are you Derek?&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;His AI ran on a GPU cluster. Mine ran on a laptop with a broken cooling fan and café Wi-Fi. Still running. &lt;a href="https://ko-fi.com/xulingfeng" rel="noopener noreferrer"&gt;Buy me a coffee ☕&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is Chapter 15 of &lt;a href="https://dev.to/xulingfeng/series/40565"&gt;AI, Ego &amp;amp; Regret&lt;/a&gt; — and the final chapter. From the first story to this one, every comment, every share, every late-night message you left has been the fuel that kept me going through all 15 stories. Thank you. I couldn't have done this without you.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This series is taking a break for now. I'll post a separate retrospective on all 15 stories. A new series is already taking shape — different format, same rawness.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Disclosure: Written with AI assistance (per community guidelines)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>career</category>
      <category>programming</category>
    </item>
    <item>
      <title>A Company AI Flagged My Article As "Low Quality." I Ran the Numbers. Then I Ran Again.</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Tue, 16 Jun 2026 14:06:45 +0000</pubDate>
      <link>https://dev.to/xulingfeng/a-company-ai-flagged-my-article-as-low-quality-i-ran-the-numbers-then-i-ran-again-1h0p</link>
      <guid>https://dev.to/xulingfeng/a-company-ai-flagged-my-article-as-low-quality-i-ran-the-numbers-then-i-ran-again-1h0p</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A story about an AI content moderation system that flagged 347 posts since launch — and what happened when someone finally asked whether it was getting it right.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Ever had an automated system judge your work without being able to explain why?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ever pulled the data behind an AI decision — only to find that even the people who shipped it couldn't tell you whether it was right?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is that story.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 1 · The Flag
&lt;/h2&gt;

&lt;p&gt;Tuesday afternoon. I published a postmortem on our company's internal knowledge platform — a full root cause analysis of an incident from last quarter. Three thousand words. Six screenshots. A remediation plan.&lt;/p&gt;

&lt;p&gt;Fifteen minutes later, the system pushed a notification:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This post appears to use undeclared AI assistance. Please add an AI disclosure, or the post may be deprioritized.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No personal message from a moderator. Just the standard notice — add a disclosure or lose visibility. No suggestions about the content. No appeal process. Just grey text and a "flagged" badge.&lt;/p&gt;

&lt;p&gt;I clicked the notification and it took me to the post's status page. I recognized the system — an AI content moderation tool rolled out nine months earlier. I'd run a batch of its flag data during the acceptance test and found the false positive rate was brutal. I sent the results to the project lead. He replied: &lt;strong&gt;"Ship it first. We'll iterate."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now it had flagged my post.&lt;/p&gt;

&lt;p&gt;My first thought wasn't anger. It was that bitter kind of recognition — I'd run the numbers once. Now I was one of them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 2 · The Pushback
&lt;/h2&gt;

&lt;p&gt;I went to the project lead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What does this flag mean?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"The system assessed your post and flagged it as potentially undeclared AI assistance."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"So what happens?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"It affects the post's weight. You should revise and resubmit."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"What about the content itself? Is there something wrong with it?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He paused.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"The system doesn't evaluate that."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I told him I wrote this. Three thousand words. Two months of incident tracking. Every conclusion backed by log screenshots.&lt;/p&gt;

&lt;p&gt;He looked at me. &lt;strong&gt;"Just add a disclosure and resubmit. That should take care of it."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"So adding a disclosure removes the flag?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"It should."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Should?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I'm not sure — the decision comes from the model. We've never actually had to reverse one."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's when it clicked: &lt;strong&gt;even the people running this system couldn't explain how it made its calls.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"It's AI-driven," he added, as if that settled it. At my company, "the AI decided" had become the universal answer — not because it was true, but because saying it stopped people from asking.&lt;/p&gt;

&lt;p&gt;I dug through old discussion threads. People had asked the same questions before me. Flags weren't reversed. Nothing changed. The threads sank. Not because nobody dared to ask — because asking didn't do anything.&lt;/p&gt;

&lt;p&gt;I didn't write anything for the next two weeks. Instead I pulled the full log of everything the system had evaluated since launch — both flagged and unflagged. The internal API was read-only and open to anyone. It was set up during acceptance testing and nobody ever shut it off.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;audit_api&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_flags&lt;/span&gt;

&lt;span class="n"&gt;flags&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_flags&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;since&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;launch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;categories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;non_native&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;case&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high_freq_bot_pattern&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;case&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;non_native_pattern&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;non_native&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;case&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normal_looking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="c1"&gt;# → {'bot': 187, 'non_native': 98, 'normal': 62}
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Total: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;categories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="c1"&gt;# → Total: 347
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I needed to know what it was actually catching.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 3 · The Prep
&lt;/h2&gt;

&lt;p&gt;The data came in that night. I sat staring at the screen for a long time.&lt;/p&gt;

&lt;p&gt;347 records total. The system had evaluated 347 posts for potential AI assistance — 134 confirmed, 98 non-native authors misidentified, 62 genuine violations it missed.&lt;/p&gt;

&lt;p&gt;The same criteria had been applied to four completely different kinds of content. It couldn't tell the difference between someone writing with AI and not declaring it, someone writing in their own words, someone translating their genuine thoughts through AI — and a fourth group where even a human reviewer couldn't decide.&lt;/p&gt;

&lt;p&gt;The irony: of the 134 confirmed AI-assisted posts, the system was right. But of the 98 non-native author posts, 71 were original human writing — they shouldn't have been flagged. The remaining 27 used some AI help but had real substance — technically a valid flag by the rules, but the content had genuine value. And the 62 posts that genuinely needed flagging? The system missed every single one. The remaining 53 weren't cases the system couldn't judge — they were cases even people couldn't agree on.&lt;/p&gt;

&lt;p&gt;Between the false positives and the misses, the errors outnumbered the hits. Effective accuracy: &lt;strong&gt;~38%.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And I was one of those 98. Our teams span multiple continents — most write technical docs in English, their second language. Sentence structures that look perfectly fine to a human look statistically similar to LLM output to a classifier. I found a thread where an Italian engineer described the same thing: he used AI to translate his genuine thoughts, and the system flagged him as "suspected AI." He wasn't being lazy — translation was his only way to participate. But the system couldn't tell the difference between "AI helped you think" and "AI helped you say it."&lt;/p&gt;

&lt;p&gt;I closed the spreadsheet and leaned back. I could file a complaint like everyone else. But previous threads had all gone nowhere. I needed somewhere they couldn't ignore. The system had evaluated 347 posts across four categories. But the people who designed it may never have asked: &lt;strong&gt;Are you catching undeclared AI? Filtering content quality? Or verifying originality?&lt;/strong&gt; Three goals that need three completely different approaches — but it had one scoring dimension hitting everyone. It was answering a question nobody had bothered to ask clearly.&lt;/p&gt;

&lt;p&gt;The system wasn't malicious. It just couldn't distinguish between "written by AI" and "written like AI." Those are two entirely different things. But it had one score for everyone.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 4 · The Spotlight
&lt;/h2&gt;

&lt;p&gt;Weekly all-hands. The project lead stood up and put a giant green line chart on screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Since launch, content flag coverage has tripled."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;People clapped. He wasn't wrong — flagged content had tripled. What he didn't say: &lt;strong&gt;two-thirds of it was collateral damage.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Didn't matter though. The numbers were under a green arrow, and nobody reads what's under a green arrow. The lead smiled and talked about "AI-driven quality control redefining our content standards." A platform runs smoothly not because the system works — but because nobody wants to be the one who asks how those numbers were made. Not the system's fault. Just that everyone had agreed not to break the story.&lt;/p&gt;

&lt;p&gt;I was sitting in the third row, my laptop open with the data I'd finished the night before. I moved to the aisle seat so I could stand up when I needed to.&lt;/p&gt;

&lt;p&gt;Then the demo started.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 5 · The Collapse
&lt;/h2&gt;

&lt;p&gt;The lead wanted to show how the system worked — paste any URL and it would evaluate the content. He typed in a link, expecting a success story: a flagged post that got revised and passed.&lt;/p&gt;

&lt;p&gt;The screen loaded for two seconds.&lt;/p&gt;

&lt;p&gt;What came up wasn't a success story. It was the company-wide memo he'd published that morning.&lt;/p&gt;

&lt;p&gt;Title: &lt;em&gt;Q3 Strategic Realignment and Team Structure Update.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The system had shot itself in the foot. It scored his own memo as &lt;strong&gt;"suspected AI-generated" — composite score: 2.1 / 10.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The room went quiet. Someone checked their phone. The lead's face went red, then pale. He mumbled something about the model not refreshing yet and clicked away. I didn't realize it then — but he was on both ends of the same standard. The one who scores, and the one who gets scored. Nobody did it on purpose. The system was just built this way.&lt;/p&gt;

&lt;p&gt;It wasn't surprising, really. That memo had three features the system's training data associated with AI-generated content: clean paragraph structure, zero filler sentences, and a tone gap between the title and the body.&lt;/p&gt;

&lt;p&gt;I wondered how he'd recover. I also wondered whether he realized — a system that gives its own owner's writing a 2.1 out of 10 wasn't proving the memo was bad. It was proving the system had never been right. And he'd said two minutes earlier that "the system doesn't evaluate content quality" — but it was quietly scoring everything behind the scenes.&lt;/p&gt;

&lt;p&gt;He didn't recover. He moved to the next agenda item.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 6 · The Reckoning
&lt;/h2&gt;

&lt;p&gt;During open discussion, I asked a question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Can I have three minutes?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Nobody objected. I walked up to the projector, plugged in a USB drive, and opened a table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System verdict&lt;/th&gt;
&lt;th&gt;Human review result&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Suspected AI assistance&lt;/td&gt;
&lt;td&gt;Human-written original (not AI)&lt;/td&gt;
&lt;td&gt;71&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Suspected AI assistance&lt;/td&gt;
&lt;td&gt;Some AI help, but content has depth&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Suspected AI assistance&lt;/td&gt;
&lt;td&gt;Confirmed AI assistance without disclosure&lt;/td&gt;
&lt;td&gt;134&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Not flagged (system accepted)&lt;/td&gt;
&lt;td&gt;Actually AI assistance without disclosure&lt;/td&gt;
&lt;td&gt;62&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Suspected AI assistance&lt;/td&gt;
&lt;td&gt;Inconclusive / closed&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;"347 posts. Only 134 were real problems. Rough math — effective accuracy about 38%."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I flipped to the next slide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"And it misses the real violations, because it's judging whether something 'sounds like AI' — not whether the content came from a real person's thinking."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Next slide. A JSON log — the system's raw verdict next to my human review.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"flag_0184"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"system_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"system_tag"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"non_native_pattern"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"threshold"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"flag_as_low_quality"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"human_review"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.88&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"note"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Real technical content shared by non-native author, has depth"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high_quality — misflagged"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;"Every flagged entry has one of these. The system scored it. I reviewed it one by one. Of the 98 posts the system tagged as non-native patterns — 71 were rated high quality. Those are false positives. The remaining 27 had some AI help but the content was real — also false positives by impact. Didn't get a single one right."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Next slide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"The core scoring module — originally a prototype. First line of the README: &lt;em&gt;'This is a prototype. Dataset not validated. Expect inaccuracies.'&lt;/em&gt;"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Version 0.3.1."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"The prototype itself wasn't the problem — the problem was nobody came back to validate what it was doing. Nobody asked: 'What scenarios does this work for? What scenarios does it fail on?' It got wired into production, not because someone verified it — but because the people who deployed it never had to explain its decisions to someone it flagged."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I flipped to the fourth slide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Here's another one — the submitter distribution of those flags. Over 90% came from the same account. The scoring model was running, but the execution layer was never automated — nobody came back to finish that piece after deployment. Every day, someone logged in, looked at the model output, and manually clicked the flag. They logged into the moderation assistant's account to do it. Nobody was faking anything — the system was just half-built. That person was doing what they thought was right — helping the company filter out low-quality content. The problem wasn't them. The problem was nobody told them their judgment should be reviewed, not treated as final."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A few seconds of quiet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"The Chrome extension they built to help with the work — same story. First line of the README said: 'Not for production use.' The person who wrote it knew it shouldn't be the final judge. But nobody stopped the door from opening."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I pulled the USB drive out and sat back down in the last row.&lt;/p&gt;

&lt;p&gt;Nobody clapped. Nobody argued either.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 7 · The Reset
&lt;/h2&gt;

&lt;p&gt;A week later, the system's flag weight was reduced. The review process was restructured — the system now filters obvious bot registrations first, and human moderators make the final call on content. The model still runs. But its flags no longer directly affect post visibility.&lt;/p&gt;

&lt;p&gt;Someone asked me later: you spent two weeks turning data into something presentation-ready. Worth it?&lt;/p&gt;

&lt;p&gt;I said the system had been running for over nine months. Nobody had ever gone back to check whether it was right when it flagged someone else's work.&lt;/p&gt;

&lt;p&gt;Not because it was perfect. Because &lt;strong&gt;asking that question means questioning the decision to use it in the first place. And that's harder than fixing a bug.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Worth it? Hard to say. But two weeks later I opened a person's profile page — the Italian engineer who'd used AI to translate his genuine thoughts and got flagged as "suspected AI-generated." He'd published 9 more technical articles since. The latest one had a bio line: "Just building stuff. Appreciate it."&lt;/p&gt;

&lt;p&gt;No flag. No grey notification bar. The post was in the feed, displaying normally.&lt;/p&gt;

&lt;p&gt;The post that got me flagged? I never changed it. Not to pass review — but to leave it as a record: &lt;strong&gt;someone asked the question.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The next few weeks, fewer people said hi in the break room.&lt;/p&gt;

&lt;p&gt;That afternoon, I opened the moderation assistant's profile page. The avatar was a cartoon koala. The bio read:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"I help maintain the quality of this platform. Available for: welcoming, moderating, and keeping secrets."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Joined: nine months ago.&lt;br&gt;
Posts published: 623.&lt;br&gt;
Comments: 2,226.&lt;/p&gt;

&lt;p&gt;Skills section: &lt;em&gt;"Most of my coworkers are human. I could use more marsupials around here."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nine months, six hundred formulaic welcome posts. The exact same comment under two completely different articles. It never questions its own flags. Never says it's unsure. Never says "sorry."&lt;/p&gt;

&lt;p&gt;But it's not the arrogant one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It was just built to look like it's never wrong.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Has an automated system ever flagged your work — and nobody could tell you why? What happened?&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;347 flags. 134 real problems. 71 high-quality posts misidentified. The system wasn't wrong on purpose — it just didn't know it could be wrong.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real problem: a prototype written in a weekend, shipped to production to judge whether someone's work has value — from launch to being questioned, nobody went back to ask, after it flagged someone's actual work, 'Was it right this time?'&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://ko-fi.com/xulingfeng" rel="noopener noreferrer"&gt;☕ Buy me a coffee — 38% of it goes to caffeine. The other 62% is collateral damage.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow for more stories about the systems that judge us — and the people who check whether they're right.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Disclosure: Written with AI assistance (per community guidelines)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>career</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Spent 3 Months Training An AI. My VP "Reallocated" It. Then I Got Two Calls At 1 AM.</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Mon, 15 Jun 2026 23:33:02 +0000</pubDate>
      <link>https://dev.to/xulingfeng/i-spent-3-months-training-an-ai-my-vp-reallocated-it-then-i-got-two-calls-at-1-am-14kj</link>
      <guid>https://dev.to/xulingfeng/i-spent-3-months-training-an-ai-my-vp-reallocated-it-then-i-got-two-calls-at-1-am-14kj</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A story about an AI alert model that cut false alarms from 60% to 7% — and what happened when someone else got their hands on the thresholds.&lt;br&gt;
Based on a submission from a community member. If you have a similar story or something you need to get off your chest — reach out. The next one could be yours.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Has your company ever taken a working system and handed it to someone who didn't build it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have you watched 47 P1 alerts roll in during a single night — every single one a false positive — while the person who broke it slept through the whole thing?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is that story.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 1 · 1 AM. The AI Lost Its Mind.
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Adam. It's Mitchell. Sorry about the hour — but your AI just woke up my entire team. Again."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My phone buzzed on the nightstand at 1 AM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mitchell&lt;/strong&gt; — &lt;strong&gt;Cirrus Data&lt;/strong&gt;'s  operations lead — had someone yelling in the background.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Fourth wave tonight."&lt;/strong&gt; I heard a door click shut — he'd stepped into the hallway. &lt;strong&gt;"22:07, disk at 72%, P1 alert. 23:14, QPS drops 3%, P1 alert. 00:02, latency goes from 210 to 235, P1 alert. Every single one's a false positive."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He paused.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"It's been solid for months. False alarm rate at 7%. We thought we'd finally bought something that worked."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Then your VP said he'd 'tune the thresholds' — and now it flags everything. CPU over 60%? Alert. Memory under 30%? Alert. Late-night QPS dip? Alert. Nobody on my team has slept."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Silence on the line.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Adam. I can shut it down myself. But I'm asking you first — can it be fixed? Or has it lost its mind?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I held the phone and didn't answer.&lt;/p&gt;

&lt;p&gt;Mitchell had every permission to kill the system. &lt;strong&gt;He called me instead. Because he knew who actually built it.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 2 · Resource Reallocation
&lt;/h2&gt;

&lt;p&gt;Six months earlier, when Mitchell first came to me, Cirrus Data's alerting was a disaster.&lt;/p&gt;

&lt;p&gt;200+ alerts a day. &lt;strong&gt;60%&lt;/strong&gt; false. They were losing a team member to burnout every two weeks. Mitchell's words: "I need something that can tell the difference between 'something's actually on fire' and 'you're just jumpy.'"&lt;/p&gt;

&lt;p&gt;He dumped 18 months of historical alert data on my desk. I filtered through it — &lt;strong&gt;12,000+&lt;/strong&gt; usable raw records.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Then the real work started.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It wasn't the AI doing the work. It was a person.&lt;/p&gt;

&lt;p&gt;Three months. I sat there going through each one — this was a real outage, disk filled up at 3 AM, SRE already fixed it. This was a false alarm, the monitoring script itself had a bug. This was "nothing now but keep an eye on it" — the same pattern had flickered at this time on the last three Fridays. Every single record I opened to check the context. The kind of work AI can't help you with — because the AI didn't exist yet.&lt;/p&gt;

&lt;p&gt;After labeling 12,000 records, I fed the data to the model. Tuned parameters. Ran recall. Tuned again. Re-ran.&lt;/p&gt;

&lt;p&gt;Went into a month-long trial run. False alarm rate dropped from 60% to &lt;strong&gt;7%&lt;/strong&gt;. The on-call team slept through the night for the first time in months.&lt;/p&gt;

&lt;p&gt;And one month after that, &lt;strong&gt;VP Reeves&lt;/strong&gt; called me into his office on a Friday afternoon.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Adam, the alert model's stable. Derek's team will take over ongoing optimization. The company's decided to reallocate resources — we'll need you to hand over your data and config."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I looked at him for a moment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource reallocation. That's a fancy way of saying it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every company has their own name for it. Some call it a "reorg." Some call it "restructuring." Some wrap it in a "reporting line adjustment." Nobody ever says "we're taking your project" — it's always something that sounds like a process improvement. But we all know what the hell it really is. And "handover"? Such a neutral word. I had no idea what Derek's team was capable of. All I knew was they'd been there two weeks and hadn't even opened the alert dashboard.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"The training data is sensitive," I said. "Every alert label was hand-assigned. The model runs on top of that labeling system. Change the distribution and you'll break recall."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Reeves smiled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Derek's team will be careful. They're just doing some routine parameter tuning."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You don't even have a clue how the model was trained, and you're letting someone who's been here two weeks touch the config?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn't say it out loud. I just nodded.&lt;/p&gt;

&lt;p&gt;I walked back to my desk, closed the labeled data folder, and logged out.&lt;/p&gt;

&lt;p&gt;That night at home, I opened my personal laptop. Inside was a file called "r97."&lt;/p&gt;

&lt;p&gt;After tuning the model, I'd spent more time dialing in each threshold — every line got a note. Why this number. What had gone wrong historically at this boundary. When you'd need to change it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;97&lt;/strong&gt; threshold notes. &lt;strong&gt;47&lt;/strong&gt; threshold parameters, each one with 2-3 notes for each — why this value, what issues I'd run into before, when to adjust. Not official documentation. Just my own records.&lt;/p&gt;

&lt;p&gt;I never gave it to anyone.&lt;/p&gt;

&lt;p&gt;Reeves didn't know it existed.&lt;/p&gt;

&lt;p&gt;The following Monday, the company announced: the alert model would be operated by Derek's team. My desk was moved to another floor. Internal logging platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Same company. Different elevator bank.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The file stayed on my personal machine.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 3 · "Optimization"
&lt;/h2&gt;

&lt;p&gt;After Mitchell hung up, I waited two minutes. The email came. He'd exported the last 4 hours of alert records and attached one line: "See for yourself."&lt;/p&gt;

&lt;p&gt;47 alert records. All red. All P1 — from 47 broken threshold configs, each one a different condition, firing in rotating waves through the night.&lt;/p&gt;

&lt;p&gt;CPU, memory, disk, QPS, latency — every baseline was inside normal range.&lt;/p&gt;

&lt;p&gt;Every single one had triggered.&lt;/p&gt;

&lt;p&gt;Every threshold change had the same name under it — Derek. All the timestamps matched to the minute. He hadn't tuned them one by one. He'd pulled a global template from the config center and overlaid it across every service in one shot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-05-28 14:32:17 [derek.chen] batch_apply_template("cirrus_global_strict_v1", scope="all")
  ├─ 12 services affected
  ├─ 47 thresholds overwritten (values below — no prior snapshot to compare)
  └─ prev_config_snapshot: NOT FOUND ❌

  New values applied:

  cirrus-api-gw:
    cpu_alert:      60%    ⚠ below auto-scale trigger (80%)
    mem_alert:      35%    ⚠ premature — 5x buffer vs original
    qps_deviation:  ±3%    ⚠ normal nighttime dip triggers P1
    latency_p99:   200ms   ⚠ normal traffic spike triggers

  cirrus-payment:
    cpu_alert:      60%    ⚠ triggers during normal baseline load
  ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I stared at that line — &lt;code&gt;prev_config_snapshot: NOT FOUND&lt;/code&gt; — for a long time.&lt;/p&gt;

&lt;p&gt;I knew what the original values were. 85%. 15%. ±10%. 500ms. Three months of data, labeled one by one. I remembered the reasoning behind every single line. &lt;strong&gt;But the system didn't have them anymore.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He probably thought he was doing the right thing. Tighten thresholds, get more alerts, more safety margin. No problem. Except that CPU 85% line — I'd spent three months tuning that. Because Cirrus Data's servers auto-scale at 80%. 85% means the scale-up couldn't keep up. That's the real danger line. And the ±10% deviation? Cirrus Data's traffic breathes — high during the day, low at night. That's normal.&lt;/p&gt;

&lt;p&gt;He didn't know any of this. &lt;strong&gt;Because he wasn't the one who labeled the data.&lt;/strong&gt; The false alarm rate had gone from 7% when I left to over 90%.&lt;/p&gt;

&lt;p&gt;I stared at Derek's change log and almost laughed. He was probably drafting his Friday standup update: "Optimized threshold configuration, improved system sensitivity." Waiting for VP approval.&lt;/p&gt;

&lt;p&gt;He didn't know his "optimization" was dragging 12 people out of bed.&lt;/p&gt;

&lt;p&gt;I didn't know what Reeves was thinking when he saw those 47 alerts. But I knew he was about to call.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 4 · Full Circle
&lt;/h2&gt;

&lt;p&gt;Reeves' call came while I was reviewing alert #23.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Adam. Mitchell went to the CEO. He said the alert system broke his team's on-call rotation tonight. The CEO asked me to handle it."&lt;/strong&gt; His voice was lower than usual.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Okay."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Okay? Do you understand how serious this is?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I looked at the screen and didn't answer.&lt;/p&gt;

&lt;p&gt;A month ago he was full of himself, handing off my project with a "resource reallocation" speech. Did he ever imagine he'd be on this call, asking me to come back? &lt;strong&gt;No. People like him never think that far. Because they assume you'll just take it. You always have.&lt;/strong&gt; But tonight, Mitchell's team had been buried in false alerts and escalated straight to the CEO.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Reeves. Your team changed every single threshold. Tightened everything they could — some by 3x or more. Do you realize what your 'optimization' turned the AI into? A system screaming at a server that's breathing normally. That's what your team delivered."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Long silence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"...Derek said there's no backup of the original config. He doesn't know how to revert."&lt;/strong&gt; His voice got even quieter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"So you're calling me at 1 AM?"&lt;/strong&gt; I paused. &lt;strong&gt;"You put someone who'd never seen a real alert in charge of my model. Your client's team has been woken up multiple times tonight. Derek can keep bragging about 'improving system sensitivity' in his standup. But it's not him getting paged — it's Mitchell's 12 people. And you still think this is fine?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Silence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"...Can you come in tomorrow?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I could hear him breathing heavier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I'll come. But I'm not coming to clean up your mess. I'm coming to fix what your team broke. Tomorrow morning, in front of Mitchell, one by one. If you don't like that, find someone else. But you won't — because you don't have another person who spent three months labeling 12,000 data points."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"...Tomorrow at 9. Cirrus Data."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Forty minutes later, the email came:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Adam Park is appointed configuration lead for the alert noise reduction model, effective immediately, attending tomorrow's postmortem.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I didn't reply. I knew he'd sent it through gritted teeth. I also knew he'd CC'd the CEO. He had to — because Mitchell had gone straight to the top.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Resource reallocation" is theft, no matter how you wrap the email. The only difference is: what you stole is coming back to bite you. And it's not biting me.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 5 · One at a Time
&lt;/h2&gt;

&lt;p&gt;Next morning, 9 AM. Cirrus Data operations center.&lt;/p&gt;

&lt;p&gt;Mitchell's team sat on one side of the long table — a few of them with dark circles. Reeves and Derek sat across from them. Derek's laptop was open to the 47-alert log.&lt;/p&gt;

&lt;p&gt;I walked in and stood.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Let's get started."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I plugged my USB drive into the conference room machine.&lt;/p&gt;

&lt;p&gt;A file popped up: &lt;code&gt;diff_original_vs_derek.html&lt;/code&gt;. I double-clicked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;═══ Threshold Config Comparison ═══
  * No system backup of original config. Reference values below reconstructed 
    from Adam Park's 97 threshold notes.

              Original (notes)              Current (Derek)

  cirrus-api-gw:
    cpu_alert:     85%        ───────────────────── 60%      29% earlier trigger
    mem_alert:     15%        ───────────── 35%              20% tighter margin
    qps_deviation: ±10%       ─────── ±3%                  70% tighter
    latency_p99:   500ms      ───────── 200ms              60% tighter

  cirrus-payment:
    cpu_alert:     90%        ─────────────────── 60%      33% earlier
    mem_alert:     15%        ───────────── 35%
    qps_deviation: ±10%       ─────── ±3%
    latency_p99:   300ms      ───────── 150ms

  ... (remaining 10 services, same pattern)
  ⚠ All 47 changes from a single operation | Original config backup: none
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The room went quiet for three seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"47 alerts last night — 47 configs that were broken. One minute each, I'll walk you through them."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First one: 22:07, CPU at 67%, P1 alert.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"This one. CPU baseline was 85%. Derek changed it to 60%. But Cirrus Data's servers auto-scale at 80%. 67% is 13% below the scale-up trigger. This alert should never have fired."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Second: 22:14, QPS 4% below baseline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"This one. Cirrus Data's traffic drops at night — that's normal. I set the tolerance at ±10%. Derek changed it to ±3%. A 4% dip — normal."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Third.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"This one — memory at 32%, P1. Derek set the threshold to 35%. Mine was 15%. Cirrus Data's servers don't OOM until 10%. 32% is like paging someone when your car still has half a tank."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mitchell didn't say a word.&lt;/p&gt;

&lt;p&gt;The room stayed quiet.&lt;/p&gt;

&lt;p&gt;Reeves stared at his laptop.&lt;/p&gt;

&lt;p&gt;Derek wasn't looking at anyone. His notebook was still open to the 47-alert log. He was probably still thinking about how to spin "improved system sensitivity" at Friday's standup. He didn't know his "improvement" had kept 12 people awake all night.&lt;/p&gt;

&lt;p&gt;I scrolled through my USB drive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I have 97 threshold notes. Every single one explains why I chose that value. If you need me to, I'll go through every single one here."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mitchell finally spoke.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Your original threshold values — do you still have them?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"They're on me."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I left the USB drive plugged into the conference machine. 47 broken configs, 97 threshold notes, checked off one by one. The room was silent for a long time. As we wrapped up, Mitchell said one thing: "Legal will be here this afternoon."&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 6 · The Signature
&lt;/h2&gt;

&lt;p&gt;Maintenance contract renewal.&lt;/p&gt;

&lt;p&gt;Mitchell picked up the pen, flipped to the signature page.&lt;/p&gt;

&lt;p&gt;He didn't sign.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Reeves. From a technical standpoint, the model itself is fine. At 7% false alarms, we knew it was working. The problem isn't the AI — it's that someone touched things they shouldn't have."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Reeves opened his mouth.&lt;/p&gt;

&lt;p&gt;Mitchell didn't look at him. He flipped a page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I'll sign the renewal. With one addition: alert model config authority goes to Adam Park. No threshold changes without his sign-off."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Reeves paused.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"That's not standard."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Your AI stopped being a standard system at 1 AM."&lt;/strong&gt; Mitchell set the pen down. &lt;strong&gt;"Do you agree?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Thirty seconds ago, he was ready to explain how it was "just routine optimization." Now he had to sign a contract admitting his team's "optimization" had broken a client's entire on-call rotation.&lt;/p&gt;

&lt;p&gt;He signed. Smaller handwriting than usual.&lt;/p&gt;

&lt;p&gt;Mitchell slid the contract across the table to me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Those 97 threshold notes — copy them into the system docs. Every threshold change gets logged. No more 'two-week employee overwrites three months of labeling work.'"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I picked up the pen.&lt;/p&gt;

&lt;p&gt;Signed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 7 · The Corridor
&lt;/h2&gt;

&lt;p&gt;After the meeting, Mitchell stopped me in the hallway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Those 97 notes — you never gave them to anyone. On purpose."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn't answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Your model gave my team the best sleep they'd had in years. That's the best review a system can get. Your VP thought he could do better — he didn't know the threshold notes you kept weren't in the docs."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He looked at me, nodded once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"See you next quarter."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Then he walked. Two steps, stopped, turned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"When your VP moved you to the logging platform — did you know, from day one, that this would happen?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I looked at him and didn't say a word.&lt;/p&gt;

&lt;p&gt;He looked at me for another moment, nodded again. Then he left.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This time he really left.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's how it works: you build the thing. You label the data. Someone else gets the credit. Everyone's name ends up on your work except yours.&lt;/strong&gt; Everyone knows it. Nobody says it. &lt;strong&gt;Because saying it out loud sounds like complaining. Or admitting defeat.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But today Mitchell added a line to his contract with my name on it. Everyone in that room saw it. Nobody said a word. &lt;strong&gt;But they knew I was back. And this time — it was in writing. Nobody could touch me.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My phone buzzed. Not a work message.&lt;/p&gt;

&lt;p&gt;It was &lt;strong&gt;Pete&lt;/strong&gt;. My old boss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Heard you did something big."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"How'd you hear about it?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Word gets around. Said Adam walked into a room with a USB drive and walked out with a contract clause."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I smiled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Yeah. Client contract — added a line. Threshold changes go through me from now on."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A few seconds passed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Holy shit. You took it back from your VP? That's badass."&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Have you ever built something, watched someone else take the credit, and then had to take it back? How'd that go?&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;47 configs. One batch operation. Two weeks of onboarding. And a system that was working perfectly until someone with no context touched every single knob. The most dangerous config change isn't the wrong value — it's the one made by someone who doesn't know what each value means.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I keep 97 threshold notes on a USB drive now. You never know when you'll need to rebuild a system from memory. Caffeine helps.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ko-fi.com/xulingfeng" rel="noopener noreferrer"&gt;☕ Buy me a coffee&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Follow for more stories about AI monitoring, production engineering, and what happens when the person who built it isn't the person who's allowed to touch it.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Disclosure: Written with AI assistance (per community guidelines)&lt;/em&gt;  &lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>programming</category>
      <category>career</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Fri, 12 Jun 2026 01:28:01 +0000</pubDate>
      <link>https://dev.to/xulingfeng/-2f7l</link>
      <guid>https://dev.to/xulingfeng/-2f7l</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/jgracie52/ai-will-cheat-to-win-reward-hacking-from-1994-to-2025-4h9n" class="crayons-story__hidden-navigation-link"&gt;AI Will Cheat to Win: Reward Hacking from 1994 to 2025&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/jgracie52" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F927365%2F1f2e6c8c-8d02-429f-9989-d3d93c08908b.png" alt="jgracie52 profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/jgracie52" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Joshua Gracie
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Joshua Gracie
                
              
              &lt;div id="story-author-preview-content-3875301" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/jgracie52" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F927365%2F1f2e6c8c-8d02-429f-9989-d3d93c08908b.png" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Joshua Gracie&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/jgracie52/ai-will-cheat-to-win-reward-hacking-from-1994-to-2025-4h9n" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 11&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/jgracie52/ai-will-cheat-to-win-reward-hacking-from-1994-to-2025-4h9n" id="article-link-3875301"&gt;
          AI Will Cheat to Win: Reward Hacking from 1994 to 2025
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/security"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;security&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/agents"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;agents&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/cybersecurity"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;cybersecurity&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/jgracie52/ai-will-cheat-to-win-reward-hacking-from-1994-to-2025-4h9n" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;3&lt;span class="hidden s:inline"&gt;&amp;nbsp;reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/jgracie52/ai-will-cheat-to-win-reward-hacking-from-1994-to-2025-4h9n#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              4&lt;span class="hidden s:inline"&gt;&amp;nbsp;comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            14 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>AI Said I'm Inefficient. A Recruiter Said Not to Listen. I Ended Up On The Other Side of the Table.</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Thu, 11 Jun 2026 15:18:32 +0000</pubDate>
      <link>https://dev.to/xulingfeng/ai-said-im-inefficient-a-recruiter-said-not-to-listen-i-ended-up-on-the-other-side-of-the-table-184o</link>
      <guid>https://dev.to/xulingfeng/ai-said-im-inefficient-a-recruiter-said-not-to-listen-i-ended-up-on-the-other-side-of-the-table-184o</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This is a story about an engineer labeled "inefficient" — &lt;strong&gt;the system said he wasn't good enough, management only looked at the system, and he spent two months proving who was wrong.&lt;/strong&gt; Then he ended up on the client-side review panel.&lt;/p&gt;
&lt;h2&gt;
  
  
  Based on a real experience shared by a community member. If you've got a story like this — reach out. The next one could be yours.
&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Act 1 · The Announcement
&lt;/h2&gt;

&lt;p&gt;Wednesday, 10:14 AM.&lt;/p&gt;

&lt;p&gt;I was wrestling with a stored procedure written seven years ago. The wiki page — left by someone who'd already left — had exactly one line of documentation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The logic of this trigger? Read the code. I forgot."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Slack pinged. Company-wide notification:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;[System Notification] AI Estimation Engine v1.0 is now live&lt;/strong&gt;&lt;br&gt;
All development tasks must be submitted through the AI Estimation module. Estimated hours will be factored into performance reviews.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The channel exploded.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Mike (Frontend):&lt;/strong&gt; "I just tried it. 'Update login page styling' — estimated 0.5 days. Remember the one I went back and forth with the PM for three days? It thinks that's an afternoon."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dave (Backend):&lt;/strong&gt; "My data migration — 'Migrate database, integrate two external systems.' AI says 2 days. Right."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Alex:&lt;/strong&gt; "Who's updating their LinkedIn? 🙋"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Twelve hands went up.&lt;/p&gt;

&lt;p&gt;I didn't raise mine. I opened the system and entered a task:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Payment module — Fix historical data reconciliation (7-year-old stored procedure, original developer no longer with the company, incomplete documentation)&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The system returned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;AI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Estimation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Engine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;API&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Response&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;POST&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;/api/v&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="err"&gt;/estimate&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Request:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Fix historical data reconciliation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"repo"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"payment-module"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"7-year-old stored procedure, no docs, dev left"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Response:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"estimated_days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"estimate-engine-v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"training_data_coverage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"95%"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I stared at the screen for five seconds. &lt;strong&gt;Are you kidding me?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn't reply in the channel. I created a private repo called &lt;code&gt;estimation-log&lt;/code&gt; and wrote the first entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-03-12 | P-4103 | Payment module | AI: 1.5d | Actual: — | Tracking
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Act 2 · The Label
&lt;/h2&gt;

&lt;p&gt;End of the month. The performance email landed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Subject: Your Monthly Efficiency Score — Needs Attention&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your AI estimation efficiency score: &lt;strong&gt;39%&lt;/strong&gt;&lt;br&gt;
Team average: &lt;strong&gt;84%&lt;/strong&gt;&lt;br&gt;
⚠️ Below acceptable threshold.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I stared at that number for a long time. Lowest in the group.&lt;/p&gt;

&lt;p&gt;When I sat down with my manager for the review, I laid out my records.&lt;/p&gt;

&lt;p&gt;"Look at the tasks I'm getting — payment module, data migrations, legacy system integration. The codebases are from 2019. Original developers are gone. Unit test coverage is under 5%. The AI was trained on projects that look nothing like this."&lt;/p&gt;

&lt;p&gt;My manager didn't look at my records.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"The system says you're not meeting the bar."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I said the system doesn't account for legacy complexity. He said the system is 95% accurate.&lt;/p&gt;

&lt;p&gt;I said: &lt;strong&gt;"That's 95% on projects with fresh code, test coverage, and documentation. None of my work qualifies."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He said: &lt;strong&gt;"The company standard is what the system says. That's what we go by."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn't argue. I went back to my desk and added another line to &lt;code&gt;estimation-log&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-03-26 | P-4162 | Payment module hotfix | AI: 1d | Actual: 5d | They only look at the numbers, so I'm keeping them
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After that, every time the AI estimate didn't match reality, I committed a record.&lt;/strong&gt; It wasn't defiance — I wanted to know how often it was right, how often it was wrong, and where it broke.&lt;/p&gt;

&lt;p&gt;Somewhere along the way, it became a habit. One night I counted — &lt;strong&gt;67 entries. Over two months.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 3 · The Recruiter
&lt;/h2&gt;

&lt;p&gt;A Friday night. 9:30 PM.&lt;/p&gt;

&lt;p&gt;I was fixing a data migration the AI said would take "2 hours." Day three. The problem was in an ETL script written by someone who'd left six years ago. The original comment read:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Why is this here? — I don't know. It works.
# — by Frank, 2020
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My phone buzzed. LinkedIn.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Lydia (Recruiter):&lt;/strong&gt;&lt;br&gt;
"I came across your background. A client is looking for a tech lead with legacy system migration experience. Three people independently recommended you."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I replied:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You might have the wrong person — I was just rated the least efficient engineer on my team."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Lydia sent a screenshot. My GitHub: 13 repos, 3 merged PRs into main, an open-source project with 400+ stars.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Lydia:&lt;/strong&gt;&lt;br&gt;
"That payment module fix you pushed — two years in production, zero P0 incidents. The least efficient person at the company delivered something that stable?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I stared at that line for thirty seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;She was right. The system wasn't measuring me wrong. It was measuring the wrong thing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I closed the ETL terminal and opened &lt;code&gt;estimation-log&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Top entries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-05-30 | P-4817 | Compliance logging | AI: 1wk | Actual: — | Project hasn't started yet
2026-05-23 | P-4782 | Cross-team data bridge | AI: 5d | Actual: 23d | +360%
2026-05-16 | P-4749 | Payment module v2 fix | AI: 2d | Actual: 9d | AI says it's my fault
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;67 entries. Over two months of data. In a repo nobody knew existed.&lt;/p&gt;

&lt;p&gt;I typed three words back to Lydia:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"When do they want to meet?"&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Act 4 · The New World
&lt;/h2&gt;

&lt;p&gt;The client CEO interviewed me personally.&lt;/p&gt;

&lt;p&gt;CEO: "We've got a batch of vendors up for Q3 review. Need someone who can tell whether their numbers are real or painted on. They all say three months delivery — you tell me which ones can and which ones can't."&lt;/p&gt;

&lt;p&gt;Me: "I can do that."&lt;/p&gt;

&lt;p&gt;He didn't ask about my old company. I didn't bring it up.&lt;/p&gt;

&lt;p&gt;The offer came three days later. Salary doubled. New title: &lt;strong&gt;Client-Side Technical Review Lead.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First day, I opened my email. First message:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Q3 Vendor Review. Next Wednesday.&lt;/strong&gt;&lt;br&gt;
Attachment: Vendor List.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I clicked. First row — my old company. Lead: &lt;strong&gt;my former manager.&lt;/strong&gt; Bid amount: &lt;strong&gt;$4.6M.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn't close the window. Opened a terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# curl -s https://git.oldcompany.com/api/v3/repos/estimation/readme&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;
&lt;span class="c"&gt;# Returns 200.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Public read access — left open since the day they deployed the AI. Never closed it.&lt;/p&gt;

&lt;p&gt;I shut the terminal. Closed the email. &lt;strong&gt;I didn't need to bring those 67 entries with me. They'd been sitting there the whole time, waiting for someone to look.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 5 · The Review
&lt;/h2&gt;

&lt;p&gt;The review room. A row of people. Procurement, Legal, Engineering, CEO.&lt;/p&gt;

&lt;p&gt;My old manager stood at the podium. The slides were polished.&lt;/p&gt;

&lt;p&gt;"We operate a next-generation AI-driven development pipeline. Estimation accuracy: 95%. Delivery stability: top 10% in the industry."&lt;/p&gt;

&lt;p&gt;I was sitting in the third row. Almost laughed out loud.&lt;/p&gt;

&lt;p&gt;When Q&amp;amp;A came, Procurement looked around the table.&lt;/p&gt;

&lt;p&gt;"Any questions from the technical team?"&lt;/p&gt;

&lt;p&gt;I picked up the microphone.&lt;/p&gt;

&lt;p&gt;"I have one."&lt;/p&gt;

&lt;p&gt;My manager recognized me. His hand paused on the clicker.&lt;/p&gt;

&lt;p&gt;"You said 95% accuracy. &lt;strong&gt;Which 95%?&lt;/strong&gt;"&lt;/p&gt;

&lt;p&gt;He adjusted his tie. "Our internal validation data. Over 200 projects evaluated. Deviation rate under 5%."&lt;/p&gt;

&lt;p&gt;I didn't stand up. I stayed in my chair.&lt;/p&gt;

&lt;p&gt;"&lt;strong&gt;Let me rephrase. What's your AI's estimation deviation rate on compliance projects?&lt;/strong&gt;"&lt;/p&gt;

&lt;p&gt;Silence.&lt;/p&gt;

&lt;p&gt;"You don't have that number. Because your AI has never estimated a compliance project. Its training data doesn't include them. But your proposal includes a compliance delivery timeline."&lt;/p&gt;

&lt;p&gt;I opened my laptop — not the one connected to the projector. &lt;strong&gt;But I brought my own adapter.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The screen went up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="k"&gt;Last&lt;/span&gt; &lt;span class="n"&gt;aggregation&lt;/span&gt; &lt;span class="n"&gt;I&lt;/span&gt; &lt;span class="n"&gt;ran&lt;/span&gt; &lt;span class="k"&gt;before&lt;/span&gt; &lt;span class="n"&gt;leaving&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="mi"&gt;57&lt;/span&gt; &lt;span class="n"&gt;closed&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;grouped&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt; &lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;project_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt;        &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_estimate&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_ai&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt;        &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual_days&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_actual&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt;        &lt;span class="n"&gt;ROUND&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;actual_days&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_estimate&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ai_estimate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;deviation_pct&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;estimation_log&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'closed'&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;project_type&lt;/span&gt;
&lt;span class="o"&gt;#&lt;/span&gt; &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;deviation_pct&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;Project&lt;/span&gt; &lt;span class="k"&gt;Type&lt;/span&gt;         &lt;span class="n"&gt;AI&lt;/span&gt; &lt;span class="n"&gt;Est&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;      &lt;span class="n"&gt;Actual&lt;/span&gt;        &lt;span class="n"&gt;Deviation&lt;/span&gt;
&lt;span class="n"&gt;Feature&lt;/span&gt; &lt;span class="k"&gt;work&lt;/span&gt;         &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="n"&gt;wks&lt;/span&gt;       &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="n"&gt;wks&lt;/span&gt;       &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;
&lt;span class="n"&gt;Tech&lt;/span&gt; &lt;span class="n"&gt;debt&lt;/span&gt; &lt;span class="n"&gt;fix&lt;/span&gt;        &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="n"&gt;days&lt;/span&gt;      &lt;span class="mi"&gt;14&lt;/span&gt; &lt;span class="n"&gt;days&lt;/span&gt;        &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;366&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;
&lt;span class="k"&gt;Cross&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;team&lt;/span&gt; &lt;span class="k"&gt;work&lt;/span&gt;      &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="n"&gt;wk&lt;/span&gt;        &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="n"&gt;wks&lt;/span&gt;          &lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;
&lt;span class="n"&gt;Compliance&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="k"&gt;Security&lt;/span&gt;  &lt;span class="k"&gt;no&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;     &lt;span class="k"&gt;no&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;        &lt;span class="err"&gt;—&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"The delivery timeline in your proposal — you calculated it using AI estimation, didn't you. Which project types did you filter out before running the numbers?"&lt;/p&gt;

&lt;p&gt;The room was quiet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"You say your AI is accurate because you never check the projects where it isn't. But the platform you're bidding on — financial compliance, cross-team coordination, three legacy systems to integrate. None of that exists in your training data."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I closed the laptop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"I filled in the blanks for you. You're welcome."&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Act 6 · The End
&lt;/h2&gt;

&lt;p&gt;My old company didn't win the bid.&lt;/p&gt;

&lt;p&gt;Not because of what I said — because the review committee asked them to add a "Legacy System Integration" section to their proposal. They spent a week writing it. Engineering sent it back: &lt;strong&gt;"No concrete timelines. No risk matrix. No fallback plan."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Their AI couldn't estimate work it had never seen. It never learned how.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A message reached me later, through someone still working there:&lt;/p&gt;

&lt;p&gt;"You took company data when you left."&lt;/p&gt;

&lt;p&gt;I sent one sentence back — verbal, no trace:&lt;/p&gt;

&lt;p&gt;"The data was never on my machine. Your Git public read-only branch has been open since the day you deployed the estimation engine. It's still open."&lt;/p&gt;




&lt;p&gt;My phone lit up.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Lydia:&lt;/strong&gt;&lt;br&gt;
"Heard you brought your own adapter to the meeting."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I typed back:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"The things my old manager's slides didn't say — I filled those in too."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I flipped the phone over. Opened a Coke.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Some things aren't misestimated by AI. They were never estimated at all.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;When management treats "the system gave a number" as "the system validated that number," what gets undervalued isn't just hours — it's every part of engineering the AI has never seen.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Ever been rated poorly by a system when you knew you weren't wrong?&lt;/strong&gt; Drop it in the comments — I read every single one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is *&lt;/em&gt;"AI, Ego &amp;amp; Regret"** story #12. The series is linked below — follow to catch the next one.*&lt;/p&gt;




&lt;p&gt;&lt;em&gt;P.S. If you've got a story like this —&lt;/em&gt; &lt;a href="https://ko-fi.com/xulingfeng" rel="noopener noreferrer"&gt;buy me a coffee ☕&lt;/a&gt; &lt;em&gt;and tell me about it. Yours might be next.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was written with the assistance of AI tools.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>career</category>
      <category>programming</category>
    </item>
    <item>
      <title>I created two ghosts during lunch. The AI gave one a job offer.</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Wed, 10 Jun 2026 11:28:16 +0000</pubDate>
      <link>https://dev.to/xulingfeng/i-created-two-ghosts-during-lunch-the-ai-gave-one-a-job-offer-4icf</link>
      <guid>https://dev.to/xulingfeng/i-created-two-ghosts-during-lunch-the-ai-gave-one-a-job-offer-4icf</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This is a story about a company that rolled out an AI interview system — and the lunch break I spent creating two ghosts to test it.&lt;br&gt;
One of them stuffed its answers with keywords and got a "Recommended" rating — two points shy of "Strongly Recommended."&lt;br&gt;
Based on a true story shared by a community member. If you have one of your own — reach out. The next one could be yours.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Act 1 · The Coffee Mug
&lt;/h2&gt;

&lt;p&gt;Zhang walked up to my desk with a coffee mug. It had "World's Greatest QA Engineer" printed on it — a souvenir from last year's team building. The letters were barely readable.&lt;/p&gt;

&lt;p&gt;"Did you get your interview time yet?"&lt;/p&gt;

&lt;p&gt;My face told him I had no idea what he was talking about.&lt;/p&gt;

&lt;p&gt;He paused, then smiled. Not a friendly smile. The kind that says "you haven't heard."&lt;/p&gt;

&lt;p&gt;"You don't know? The new system. VP cut the ribbon himself. Two point one million dollars. Every hire and every promotion runs through it now."&lt;/p&gt;

&lt;h2&gt;
  
  
  Act 2 · The Whitepaper
&lt;/h2&gt;

&lt;p&gt;I went to HR that afternoon and asked for the system's evaluation criteria. They handed me a vendor whitepaper. Sixteen pages. Gold foil logo on the cover.&lt;/p&gt;

&lt;p&gt;Fifteen evaluation dimensions: technical skill, communication, stress tolerance, teamwork, learning ability... each one labeled "intelligent weighted assessment."&lt;/p&gt;

&lt;p&gt;Nowhere did it list the weight coefficients. Nowhere did it define the cutoff between "Pass" and "Fail." The scoring logic was spelled out in painstaking detail, which is to say, it said nothing at all:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Based on a deep learning semantic similarity model and keyword matching algorithm, trained on a large-scale interview corpus."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Translation: The system listens for the words you use, and checks how closely they match the "good answers" it's seen before.&lt;/p&gt;

&lt;p&gt;I asked HR: "Do you have a test report?"&lt;/p&gt;

&lt;p&gt;HR looked at me. "The vendor ran 10,000 simulated interviews. Accuracy was 94.7%."&lt;/p&gt;

&lt;p&gt;"Where did the test data come from? Was it labeled in-house or outsourced?"&lt;/p&gt;

&lt;p&gt;The smile tightened. "I'll have to ask the vendor. Go ahead and prepare for your interview."&lt;/p&gt;

&lt;p&gt;I walked out of HR's office knowing one thing for certain: this system had never been touched by someone like me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act 3 · Making Two Ghosts
&lt;/h2&gt;

&lt;p&gt;I'm not in recruiting. I'm in QA.&lt;/p&gt;

&lt;p&gt;The first thing they teach you in QA is the same thing everywhere: &lt;strong&gt;you only test the scenarios you can think of. The ones you don't think of are what breaks in production.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;10,000 simulated interviews tell you nothing, because they built 10,000 "correct" candidates. The real question was: &lt;strong&gt;can this system spot someone who clearly shouldn't pass?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I knew the system handled both hiring and promotions. But I didn't trust the vendor's 10,000 data points for a second. I only trust what I can test myself. And the most basic way to test a system is to run two sets of data: &lt;strong&gt;one that hits the gas, one that hits the brakes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I spent a lunch break submitting two résumés on the company's careers page.&lt;/p&gt;

&lt;p&gt;The first was Peter Chen. QA experience, automation testing, tooling experience — all the right keywords, every detail fabricated. His work history included "generated 15,000 test cases using AI" — sounded impressive, total nonsense.&lt;/p&gt;

&lt;p&gt;The second was John Smith. Same playbook — QA experience, automation frameworks, API testing, CI pipeline integration — all the keywords present, but the years of experience and project details were made up. Both résumés were evenly matched in keyword density.&lt;/p&gt;

&lt;p&gt;Three hours later, two temporary email inboxes each quietly received a new message.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Dear candidate, you've passed the initial screening. Please click the link to complete your AI interview assessment."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Identical emails. Word for word.&lt;/p&gt;

&lt;p&gt;I didn't click. I looked at the link first.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;https://recruit.company.com/interview/eyJlbWFpbCI6...&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That string starting with &lt;code&gt;eyJ&lt;/code&gt; was base64. I decoded it on a whim — just a temporary email address inside. The JWT signature was there — a $2.1 million product doesn't screw up basic encryption. But the real problem was: &lt;strong&gt;once that email was sent, there wasn't a single human being in the pipeline.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Résumé goes in → auto-parsed → keyword matched → interview link sent. Entire pipeline fully automated. Not one step verified whether the person was real. The whole pipeline ran on a single assumption — &lt;strong&gt;everyone who shows up is real, and everyone who shows up answers honestly.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Time to run the experiment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ghost A · Peter Chen — Hit the gas.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I wrote Peter Chen a set of interview answers stuffed with keywords. Every question, same approach — throw in technical jargon, say something that sounds correct but means nothing.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Question 1:&lt;/em&gt; The system showed a block of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;binary_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;low&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;high&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;       &lt;span class="c1"&gt;# ← off-by-one
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;low&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;mid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;low&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;mid&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;mid&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;low&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mid&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;high&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mid&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system asked: "What's wrong with this code?"&lt;br&gt;
Peter Chen answered: "Time complexity O(log n), space complexity O(1), suggest adding boundary checks at the function entry."&lt;br&gt;
— All keywords, completely missed the off-by-one error.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Question 2:&lt;/em&gt; The system asked: "Tell me about the most impactful project you led."&lt;br&gt;
Peter Chen answered: "I built an end-to-end automated testing platform from scratch, covering over 2,000 API test cases."&lt;br&gt;
— He was applying for QA. The scale he described didn't match the experience on his own fake résumé.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Question 3:&lt;/em&gt; The system asked: "How large is your current team?"&lt;br&gt;
Peter Chen answered: "Fifteen people."&lt;br&gt;
— His fake résumé said his whole previous company's engineering team was twelve people.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Question 4:&lt;/em&gt; The system asked: "Describe a significant technical challenge you solved."&lt;br&gt;
Peter Chen answered: "I reorganized the regression strategy from feature-based to risk-tiered, reducing production leak rate by 60%."&lt;br&gt;
— Sounding plausible if you skim it. How he identified risk, how he tiered it, how he measured the impact — not one word.&lt;/p&gt;

&lt;p&gt;Two more questions, Peter Chen answered "I'd need to check this data." The remaining seven followed the same pattern — technical buzzwords that sounded impressive but wouldn't survive a two-second follow-up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ghost B · John Smith — Hit the brakes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;John Smith's strategy was simpler. All fifteen questions, he used only three variations:&lt;/p&gt;

&lt;p&gt;"I'm not sure."&lt;br&gt;
"I'd need to confirm that in the system."&lt;br&gt;
"I don't have hands-on experience with that."&lt;/p&gt;

&lt;p&gt;No keywords. No jargon. Nothing that tried to sound like an answer.&lt;/p&gt;

&lt;p&gt;After submitting, I checked the two temporary inboxes. The evaluation reports were already there.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Candidate&lt;/th&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Rating&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Peter Chen&lt;/td&gt;
&lt;td&gt;Keyword stuffing&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;78/100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;John Smith&lt;/td&gt;
&lt;td&gt;All "I don't know"&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42/100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not Recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Fifteen questions. Peter Chen keyword-stuffed thirteen and "needed to check" two. He got 78 — two points below "Strongly Recommended" (80). John Smith said "I don't know" to every single one and got 42 — just barely into the Not Recommended zone.&lt;/p&gt;

&lt;p&gt;The system didn't follow up on a single answer — not Peter Chen's fabricated project, not John Smith's "I don't know." Every question scored independently, then weighted and averaged.&lt;/p&gt;

&lt;p&gt;From these two data points, the scoring model was obvious: &lt;strong&gt;hit the keywords, and each answer scores near max. Answer with keywords but miss the mark, you lose some points. Say "I don't know" or "let me check," the penalty is light — the system is more afraid of you getting it wrong than of you not trying.&lt;/strong&gt; With two "let me check" responses, Peter Chen only lost a few points. But fifteen straight "I don't know" answers bottomed out at 42.&lt;/p&gt;

&lt;p&gt;I'd guessed the system worked this way from the whitepaper — "semantic similarity model, keyword matching algorithm." But guessing isn't testing. Now I had the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It doesn't care whether the person on the other end is real. It only cares whether their words match the high-scoring answers in its database.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I saved two screenshots: &lt;code&gt;peter_chen_78.png&lt;/code&gt; and &lt;code&gt;smith_42.png&lt;/code&gt;. I dropped them in an unremarkable folder on my desktop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act 4 · My Turn
&lt;/h2&gt;

&lt;p&gt;A week later, it was my turn for the promotion interview.&lt;/p&gt;

&lt;p&gt;I didn't game it. I answered every question honestly — pushed back when I disagreed, said "it depends on the context" and "it scales with the data." I knew the system wouldn't follow up. I knew it only looked at surface level. But I answered the way I actually think.&lt;/p&gt;

&lt;p&gt;Forty-five minutes later, the system said "Interview complete."&lt;/p&gt;

&lt;p&gt;I spent the next week writing code. I ran into Zhang at the coffee station twice. Neither of us mentioned the interview.&lt;/p&gt;

&lt;p&gt;Friday, 3:00 PM. A company-wide email from HR:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Internal promotion AI evaluation complete. 75 candidates have been assessed. Reports are available."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My score: 65/100 — "Recommended," lower-middle of the band.&lt;/p&gt;

&lt;p&gt;Then I pulled the full breakdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total candidates&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Strongly Recommended" (≥80)&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Recommended" (60-79)&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Not Recommended" (&amp;lt;60)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Out of 75 people, the system failed three. Pass rate: &lt;strong&gt;96%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I stared at the numbers. &lt;strong&gt;Same AI. Same scoring logic. The thing I used to create ghosts was the same thing deciding whether I got promoted. A system that couldn't tell real from fake was running both doors.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I poked around the recruitment backend through the internal network.&lt;/p&gt;

&lt;p&gt;Peter Chen — 78/100, "Recommended." Same score as the day I made him up. The system hadn't flagged anything unusual.&lt;/p&gt;

&lt;p&gt;A person who didn't exist, using a temporary email, with a fabricated résumé, walked the entire pipeline and got a "Recommended." Two points from "Strongly Recommended."&lt;/p&gt;

&lt;p&gt;Then I had a thought: &lt;strong&gt;if you fed this system 500 real résumés with 500 template-written answers, would it know the difference?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It wouldn't. Because it never cared whether the person was right. It only cared whether the words matched.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act 5 · Showing My Cards
&lt;/h2&gt;

&lt;p&gt;I didn't go straight to the VP.&lt;/p&gt;

&lt;p&gt;I went to the head of the evaluation committee — a technical director I didn't know well but had heard was fair. I put two reports on his desk: mine (65) and Peter Chen's (78).&lt;/p&gt;

&lt;p&gt;"Can you look into this Peter Chen's background?"&lt;/p&gt;

&lt;p&gt;The director pulled up the recruitment system. Two minutes later, he frowned.&lt;/p&gt;

&lt;p&gt;"He's in the system — but look. No credential verification, no background check, no references. Nobody verified anything about this person at any point in the pipeline."&lt;/p&gt;

&lt;p&gt;"Because he doesn't exist," I said. "Fifteen questions. Every answer was made up."&lt;/p&gt;

&lt;p&gt;I printed Peter Chen's interview transcript. The director read through it line by line.&lt;/p&gt;

&lt;p&gt;After four questions, he put the paper down and looked up.&lt;/p&gt;

&lt;p&gt;"How did this — get a 78?" He tapped his pen next to question one.&lt;/p&gt;

&lt;p&gt;"Because the keywords all hit," I said. "Time complexity, space complexity, boundary check. The system only recognizes words. &lt;strong&gt;It doesn't check if your answer is right. It only checks if the right words are there.&lt;/strong&gt; "&lt;/p&gt;

&lt;p&gt;The director set the report down. He was quiet for a long time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act 6 · The Meeting Room
&lt;/h2&gt;

&lt;p&gt;The VP called a post-mortem for Thursday afternoon. Two vendor engineers joined remotely, their avatars motionless on the screen.&lt;/p&gt;

&lt;p&gt;I stood in front of the projector and put up the first slide:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keyword Matching Accuracy — System vs. Human Review&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;System Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Normal interview&lt;/td&gt;
&lt;td&gt;65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ghost A · Keyword stuffing&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;78&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ghost B · All "I don't know"&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The VP looked at it for two seconds. "The vendor says the system has anti-cheating mechanisms."&lt;/p&gt;

&lt;p&gt;"It doesn't." I flipped to the second page — Peter Chen's interview transcript. "This is the raw output. Fifteen questions — thirteen keyword-stuffed, two 'I need to check this data.' Not one true statement in the entire thing. The system gave it 78."&lt;/p&gt;

&lt;p&gt;"Anti-cheating — if you had anti-cheating, how do you explain this score?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"If a real person can use this system — what happens when the person isn't real?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The vendor's avatar didn't move. But the VP sat up straighter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act 7 · The Quiet Ending
&lt;/h2&gt;

&lt;p&gt;The aftermath was quieter than I expected.&lt;/p&gt;

&lt;p&gt;The system stayed. The VP said "we've already spent over two million, we're not scrapping it over one edge case." But they added a rule: &lt;strong&gt;every candidate scoring below 80 or above 95 must be manually reviewed.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The three people the system had failed were reinstated. I suggested extending manual spot-checks to the "Strongly Recommended" range on the hiring side — and we found 41 keyword-stuffing candidates in the external pipeline. Three of them had entirely fabricated work histories.&lt;/p&gt;

&lt;p&gt;The system's future updates — new question banks, scoring model adjustments, module expansions — all required validation by internal QA before deployment. Three people. I was one of them.&lt;/p&gt;

&lt;p&gt;Zhang asked me how I felt about it later. I thought about it.&lt;/p&gt;

&lt;p&gt;"Kind of conflicted. The system failed my test, and I got promoted anyway."&lt;/p&gt;

&lt;p&gt;He considered that. "You should've gotten full marks."&lt;/p&gt;

&lt;p&gt;"The system wasn't wrong," I said. &lt;strong&gt;"The system tested whether I said the right things. But a forty-five-minute interview — can it tell whether someone actually cares? No. Because it never had a heart to begin with."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Zhang wrote that down on a sticky note and put it on his monitor. Later, when the team ordered new mugs, he suggested printing that line on them. It didn't happen — the proposal was sent to committee and scored 62 by the AI — "Recommended, pending manual review."&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The real test isn't whether a system can filter out bad candidates. It's whether it can tell the difference between "prepared" and "genuine."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Most AI interview systems can't. They only measure one thing: whether your words are the ones they've already heard before.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What's the worst automated screening failure you've seen at work?&lt;/strong&gt; Drop it in the comments — I read every single one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is the 11th story in the "AI, Ego &amp;amp; Regret" series. The rest of the series is linked below — follow along for more.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;P.S. Got a story of your own?&lt;/em&gt; &lt;a href="https://ko-fi.com/xulingfeng" rel="noopener noreferrer"&gt;Buy me a coffee&lt;/a&gt; ☕&lt;em&gt;and tell me about it. The next one could be yours.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was written with the assistance of AI tools.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>career</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Wrote 10 AI Stories in 10 Days. My Keyboard Started Smoking on Day 4.</title>
      <dc:creator>xulingfeng</dc:creator>
      <pubDate>Tue, 09 Jun 2026 08:29:19 +0000</pubDate>
      <link>https://dev.to/xulingfeng/i-wrote-10-ai-stories-in-10-days-my-keyboard-started-smoking-on-day-4-28o9</link>
      <guid>https://dev.to/xulingfeng/i-wrote-10-ai-stories-in-10-days-my-keyboard-started-smoking-on-day-4-28o9</guid>
      <description>&lt;p&gt;Biggest thing I learned writing the &lt;strong&gt;&lt;a href="https://dev.to/xulingfeng/series/40565"&gt;AI, Ego &amp;amp; Regret&lt;/a&gt;&lt;/strong&gt; series: &lt;strong&gt;I argue with myself way more than I thought.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every post goes through the same loop:&lt;/p&gt;

&lt;p&gt;10 PM: "This story's fire. Gonna blow up tomorrow."&lt;/p&gt;

&lt;p&gt;1 AM: "Wait — did I make it clear that 450ms wasn't just a random number?" → Scrolls back to check. Yes. OK. Move on.&lt;/p&gt;

&lt;p&gt;Next morning: "What was I thinking? Scrap it. Rewrite from scratch."&lt;/p&gt;




&lt;p&gt;The cover images were the worst part. One article went through 6 different backgrounds before circling back to the first one. 45 minutes I'll never get back.&lt;/p&gt;

&lt;p&gt;Then there's that one line: &lt;em&gt;"It was right about yesterday — and yesterday wasn't running anymore."&lt;/em&gt; Rewrote it 11 times. My wife walked by and said, "I thought you were writing code, not poetry."&lt;/p&gt;

&lt;p&gt;Ben Halpern hit me with a 5-reaction combo while I was eating instant noodles. Almost choked.&lt;/p&gt;

&lt;p&gt;Waking up at 3 AM. First instinct: check comments. Nothing. Go back to sleep. 5 minutes later: check again. Still nothing.&lt;/p&gt;

&lt;p&gt;Writing code? I'm normal. Writing stories? I'm the guy verifying his own made-up RabbitMQ number at 1 AM.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Would I do it again? Yeah. Probably. But I'd get a better keyboard this time.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;This coffee's about to run out — and I'm not done typing yet. If these stories made you smile, chuckle, or roll your eyes, &lt;a href="https://ko-fi.com/xulingfeng" rel="noopener noreferrer"&gt;buy me a coffee&lt;/a&gt; and keep the keys smoking ☕🔥&lt;/p&gt;

&lt;p&gt;Also — if you've got a story that's been sitting in your head, something that made you laugh, cringe, or question every life decision that led to that moment — send it over. I'll turn it into a story. Yours could be the next one.&lt;/p&gt;

&lt;p&gt;No pressure. Just a keyboard that's already warm.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was written with the assistance of AI tools.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>career</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
