How We Sourced a 12-Part Tech Investigation So Every Claim Survives a Hostile Fact-Check

#journalism #documentation #cloud #writing

I just shipped a 12-part video series on cloud economics - depreciation schedules, lock-in, licensing disputes, bundling cases. The kind of material where one sloppy sentence gets you a lawyer's letter, and one unsourced number gets you dismissed as a crank.

So before writing a single script, we built a sourcing pipeline. This post is about that pipeline - treating an investigation like an engineering project, with schemas, gates, and review passes - because I think the discipline transfers to anyone writing technical content that makes claims about real companies.

The core artifact: claims.json

Every episode started not as a script but as a structured claims file. Each claim is a record with a few mandatory fields:

{
  "id": "EP01-C03",
  "text": "Alphabet's 2023 change to server useful life (4 to 6 years), together with a smaller change for network equipment, reduced that year's depreciation expense by roughly $3.9B.",
  "type": "fact",
  "sources": [
    { "kind": "SEC 10-K", "ref": "Alphabet 2023 10-K, accounting estimates note" }
  ],
  "status": "verified"
}

The type field is the heart of it. Three values, strictly enforced:

fact - documented in primary public record: SEC filings, court dockets, regulator publications (EU Commission, UK CMA, FTC). A fact claim with no primary source fails the build. Press coverage alone doesn't qualify; an article about a 10-K is a pointer, not a source.
allegation - something a party claims in a live dispute. The CISPE complaint about Microsoft licensing is an allegation by CISPE; we say so, every time, with attribution in the sentence itself, not in a footnote.
opinion - our interpretation. "Extending server useful life is a lever that flatters margins" is opinion built on facts. It gets labeled as analysis in the script, out loud.
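
To make "fails the build" concrete, here is a minimal sketch of how such a check could look in Python. Treat it as an illustration under assumptions, not the pipeline's real code: the `PRIMARY_KINDS` labels and the `attributed_to` field are hypothetical names introduced for this example, and only `id`, `type`, and `sources` come from the record shown above.

```python
# Sketch of a per-claim build check. PRIMARY_KINDS and the "attributed_to"
# field are assumptions made for this example, not part of the schema above.
VALID_TYPES = {"fact", "allegation", "opinion"}
PRIMARY_KINDS = {"SEC 10-K", "court docket", "regulator publication"}


def validate_claim(claim: dict) -> list[str]:
    """Return build errors for one claim record; an empty list means it passes."""
    cid = claim.get("id", "<no id>")
    ctype = claim.get("type")

    # Only the three labels are allowed; anything else stops here.
    if ctype not in VALID_TYPES:
        return [f"{cid}: unknown type {ctype!r}"]

    errors = []
    kinds = {source.get("kind") for source in claim.get("sources", [])}

    # A fact needs at least one primary source; press coverage alone fails.
    if ctype == "fact" and not (kinds & PRIMARY_KINDS):
        errors.append(f"{cid}: fact claim has no primary source")

    # An allegation must name the party making it (hypothetical field).
    if ctype == "allegation" and not claim.get("attributed_to"):
        errors.append(f"{cid}: allegation has no attributed_to party")

    return errors


def validate_all(claims: list[dict]) -> list[str]:
    """Collect errors across a whole claims file; any error fails the build."""
    return [error for claim in claims for error in validate_claim(claim)]
```

Running `validate_all` over a parsed claims file and exiting non-zero when the list is non-empty is enough to wire this into CI.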

A script line that doesn't trace back to a claim ID doesn't ship. That sounds bureaucratic until you're on episode 9 and you can no longer remember whether you read that Amazon shortened some server lifespans in 2025 citing AI, or whether you inferred it. (We read it. It's filed. That's the point.)
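
The traceability rule can be checked mechanically too. The sketch below assumes a convention that is purely hypothetical here: each script line carries its claim ID in square brackets, such as `[EP01-C03]`, and IDs follow the `EPnn-Cnn` pattern from the example record.

```python
import re

# Assumed tagging convention: claim IDs appear inline as [EP01-C03].
CLAIM_TAG = re.compile(r"\[(EP\d{2}-C\d{2})\]")


def untraced_lines(script: str, known_ids: set[str]) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for lines that cannot be traced."""
    problems = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no claims
        tags = CLAIM_TAG.findall(line)
        if not tags:
            problems.append((lineno, "no claim ID"))
        for tag in tags:
            if tag not in known_ids:
                problems.append((lineno, f"unknown claim ID {tag}"))
    return problems
```

An empty result means every non-blank line points at a claim that exists in the file. It says nothing about whether the line fairly represents that claim, which is still a job for a human reader.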

The refutation pass

Verification is the easy half. The pass that actually saved us is the refutation pass: for every fact claim, someone takes the adversarial seat and tries to kill it.

Concretely, that means asking:

Is this the most recent number? (Microsoft didn't disclose a revenue dollar figure for Azure until July 2025, when it reported more than $75B for the fiscal year. Anything written before that citing "Azure revenue" was citing analyst estimates, and we had to label those as estimates or cut them.)
Is the claim narrower than the sentence implies? All four hyperscalers - Microsoft, Amazon, Google, Meta - extended server life estimates within a few years of each other. That's documented. "They coordinated" is not documented, and the refutation pass deletes any sentence that smuggles it in.
Would the subject's lawyer agree this is an accurate description of the record? Not agree with our framing - agree it describes what the document says.

About a fifth of our draft claims didn't survive this pass. Most weren't wrong; they were stronger than the source. That's the failure mode the pass exists to catch: drift between what the record supports and what the sentence sounds like.
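
One way to record the outcome of this pass is a per-claim checklist in which an unanswered question counts as a failure. This is a sketch under assumptions: the question keys and the `RefutationResult` class are names invented for the example, and the judgment behind each answer stays with the reviewer.

```python
from dataclasses import dataclass, field

# Hypothetical keys for the three questions listed above.
REFUTATION_QUESTIONS = (
    "most_recent_number",
    "sentence_no_broader_than_source",
    "subject_lawyer_would_accept_description",
)


@dataclass
class RefutationResult:
    claim_id: str
    # Reviewer's answers, keyed by question; True means the claim held up.
    answers: dict[str, bool] = field(default_factory=dict)

    def failed_questions(self) -> list[str]:
        """Questions answered 'no' or never answered at all."""
        return [q for q in REFUTATION_QUESTIONS if not self.answers.get(q, False)]

    def survives(self) -> bool:
        """A claim survives only if every question was answered 'yes'."""
        return not self.failed_questions()
```

Treating a missing answer as a failure is the deliberate design choice: a claim nobody tried to kill has not passed the refutation pass.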

The legal and ethics gate

Separate from sourcing, every script passed a gate with two hard rules.

Rule 1: never accuse anyone of a crime. Not hedged, not implied, not "raises questions about whether laws were broken." If a regulator made a finding, we report the finding in the regulator's terms. The EU closed its Teams bundling investigation with commitments from Microsoft and no fine - so that's what we say. Not "got away with it." The commitments and the closure are the story; the editorializing is a liability and it's lazier journalism.

Rule 2: link the right of reply. Where a company has publicly responded - Broadcom on VMware's move to subscription bundles, Automattic in the WP Engine litigation (the N.D. Cal. dockets are public) - we link the response. If the audience only hears the complaint side, we've built a prosecution, not an investigation.

This gate had veto power over the refutation pass, the script, everything. A claim could be perfectly sourced and still fail here on framing.
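
The veto relationship is simple to express in code, even though the gate itself is a human judgment. The sketch below only catches the mechanical cases, and its inputs are assumptions: the phrase list is seeded from the examples above, and `public_response_exists` and `reply_links` are hypothetical fields, not part of the claim record shown earlier.

```python
# Phrases taken from the examples in this section; a real list would be
# longer, and no phrase list can replace a human read for framing.
BANNED_FRAMINGS = (
    "got away with it",
    "raises questions about whether laws were broken",
)


def gate_violations(script_text: str, claims: list[dict]) -> list[str]:
    """Return mechanical violations of the two gate rules."""
    violations = []
    lowered = script_text.lower()

    # Rule 1: no framing that implies a crime.
    for phrase in BANNED_FRAMINGS:
        if phrase in lowered:
            violations.append(f"rule 1: banned framing {phrase!r}")

    # Rule 2: if the subject has responded publicly, the response is linked.
    for claim in claims:
        if claim.get("public_response_exists") and not claim.get("reply_links"):
            cid = claim.get("id", "<no id>")
            violations.append(f"rule 2: {cid} has a public response but no right-of-reply link")

    return violations


def can_ship(script_text: str, claims: list[dict]) -> bool:
    """Sourcing is necessary but not sufficient: any gate violation is a veto."""
    all_verified = all(c.get("status") == "verified" for c in claims)
    return all_verified and not gate_violations(script_text, claims)
```

Note that `can_ship` returns `False` for a script whose claims are all verified if the gate finds a single violation, which is the veto described above.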

Why "bad systems, not bad people" is the operative frame

This isn't a tone choice. It's what the evidence actually supports.

A CFO who extends server depreciation when the hardware genuinely lasts longer is doing their job. A vendor that prices egress high is responding to incentives every competitor shares - which is precisely why the UK CMA ran a full market investigation citing egress fees and licensing rather than prosecuting individuals. The documented pattern is structural: incentives, accounting levers, contract terms that compound into lock-in.

The systems frame also keeps you honest as a writer. The moment you cast a villain, you start selecting evidence to fit the character. When the subject is a system, contrary evidence is just more data about the system - you can include it without weakening your story, because the story is the mechanism, not the morality play.

And practically: it's the frame that survives a hostile fact-check, because it never asserts intent you can't document.

Check our work - please

All of this would be theater if readers had to take our word for it. So the claims database is public and browsable: the Evidence Explorer lets you pick any claim from any episode and see its label, its sources, and the underlying documents. If you find a claim that doesn't hold up, that's a bug report, and we treat it like one.

The series itself is on YouTube - the English playlist runs from depreciation mechanics through to a practical field guide for buyers. I also published a companion piece today on the lock-in economics specifically: The Lock-In Economy: How Cloud Pricing Quietly Traps Customers. And the full series announcement with all 12 parts is here on dev.to.

The takeaway for technical writers

If you write about real companies - postmortems, vendor comparisons, cost analyses - steal the cheap parts of this:

Separate your claims from your prose. A list of claims with sources is greppable; a 2,000-word draft is not.
Label fact vs. allegation vs. opinion before you write, not after.
Run one adversarial pass where your only job is to kill your own claims.
Never assert intent you can't document. Describe the mechanism instead.
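
As a small illustration of what "greppable" buys you, here is a sketch of two queries over a list of claim records shaped like the example earlier in this post. The function names are made up for the example.

```python
from collections import Counter


def count_by_type(claims: list[dict]) -> Counter:
    """How many claims carry each label; missing labels show up as 'unlabeled'."""
    return Counter(claim.get("type", "unlabeled") for claim in claims)


def claims_citing(claims: list[dict], kind: str) -> list[str]:
    """IDs of claims with at least one source of the given kind."""
    return [
        claim.get("id", "<no id>")
        for claim in claims
        if any(source.get("kind") == kind for source in claim.get("sources", []))
    ]
```

Neither question can be answered quickly from a prose draft, which is the practical argument for keeping the claims list separate.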

I spent 35 years in IT operations - Y2K at Allstate, eight years inside Microsoft's cloud delivery org, co-authoring MOF 4.0 - and the habit that transferred best to publishing is the same one that works in ops: assume your output will be audited by someone who wants it to fail, and build so it doesn't.