connerlambden

Detecting AI-authored news at corpus scale from a single MCP call

There's a growing crisis inside news feeds: AI-generated content, slop, and opinion-masked-as-reporting are all appearing faster than human review systems can flag them. Most "AI detection" tools work per-document and return a single binary probability with no supporting evidence. That's not enough for someone who has to actually decide what to read, publish, or cite.

I took the opposite approach and put it behind an MCP server: a continuous, corpus-scale, per-article ai_authorship_probability score, plus 30 other framing dimensions, all queryable in plain English from Claude or Cursor.

The core dimension

Helium MCP scores every article it ingests - 3.2M+ articles from 5,000+ sources - on:

  • ai_authorship_probability - explicit model estimate that the article was LLM-generated
  • credibility - sourcing density, named-source ratio, evidence-citation patterns
  • sensationalism - headline-vs-body amplification, superlative density
  • overconfidence - hedge-language vs declarative-certainty ratio
  • opinion_vs_fact - opinion language vs declarative-fact language ratio
  • oversimplification - single-cause reduction of complex causation
  • begging_the_question - conclusion assumed in the framing
  • scapegoating - actor-blaming vs structural-explanation patterns
  • covering_responses - whether the criticized parties get space to respond
  • ...22 more

The point is not that any one score is a verdict. The point is that you can now triangulate. A high AI-authorship probability paired with low sourcing density and high sensationalism is a very different signal from high AI-authorship in a meticulously-sourced explainer - and a scoring pipeline that only returns one number cannot tell them apart.
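To make the triangulation concrete, here's a minimal sketch of the kind of rule you can only write once you have multiple axes per article. The field names mirror the dimensions listed above; the thresholds are purely illustrative, not taken from the schema.

```python
# Hedged sketch: articles are assumed to be dicts carrying the per-axis
# scores described above; the cutoffs below are illustrative only.

def triage(article: dict) -> str:
    """Classify an article by combining three axes instead of one."""
    ai = article["ai_authorship_probability"]
    cred = article["credibility"]
    sens = article["sensationalism"]

    if ai > 0.8 and cred < 0.3 and sens > 0.7:
        return "likely-slop"        # high-AI + weak sourcing + hype
    if ai > 0.8 and cred > 0.7:
        return "edited-ai-draft"    # high-AI but meticulously sourced
    if ai < 0.2:
        return "likely-human"
    return "needs-review"
```

A one-number detector collapses "likely-slop" and "edited-ai-draft" into the same bucket; the whole argument of this post is that those two cases deserve different handling.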

Setup

```shell
# In Cursor or Claude Desktop, register the remote MCP server with:
npx mcp-remote https://heliumtrades.com/mcp
```

Free. No signup. No API key. Remote server.

Asking the question

In Claude, I asked:

Using Helium, show me the most AI-suspicious recent articles across the corpus, and cross-reference against their credibility and sensationalism scores.

Claude called search_articles, filtered by the top decile of ai_authorship_probability, joined against per-source metadata, and returned a ranked list with the four relevant scores side-by-side. The top of the list was dominated by low-credibility, high-sensationalism sources - which is what you'd expect. But the more interesting result was a small cohort in the middle of the pack: high AI-authorship, high credibility, moderate sensationalism. Those are almost certainly human-edited AI drafts - the category that a single-axis detector would miss entirely.
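If you want the equivalent logic outside Claude, the filter-and-join step corresponds roughly to the sketch below. The record shapes are made up for illustration - this is not the actual search_articles response format.

```python
import statistics

def top_decile_by_ai(articles: list[dict], source_meta: dict) -> list[dict]:
    """Keep the top decile of ai_authorship_probability, attach
    per-source metadata, and rank so the scores sit side by side."""
    probs = [a["ai_authorship_probability"] for a in articles]
    cutoff = statistics.quantiles(probs, n=10)[-1]  # 90th-percentile cut point
    flagged = [a for a in articles if a["ai_authorship_probability"] >= cutoff]
    joined = [{**a, **source_meta.get(a["source"], {})} for a in flagged]
    return sorted(joined,
                  key=lambda a: a["ai_authorship_probability"],
                  reverse=True)
```

The join is what surfaces the interesting middle cohort: once per-source credibility rides along with each flagged article, "high-AI but well-sourced" stops looking like every other flag.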

Why 31 dimensions, not one

A one-number detector fails in two ways:

  1. False negatives from human editing. A human editor can smooth an LLM draft enough to drop a binary detector below threshold, but framing artifacts (the overconfidence pattern, the opinion-vs-fact ratio, coverage of responses) survive. A multi-dimensional signal catches them.
  2. False positives from LLM-like human writing. Academic-style prose is often flagged as AI-generated by single-axis detectors. But the sourcing-density and citation-evidence axes in a 31-dim score are the difference between a grad student and an LLM - and they show up cleanly in the schema.
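Both failure modes fall out of a tiny comparison. Here's a sketch with two mock articles - one human-edited AI draft, one academic-style human piece - run through a single-axis rule versus a multi-axis rule. Field names and thresholds are illustrative.

```python
THRESHOLD = 0.5  # what a single-axis detector would use

def binary_detector(a: dict) -> bool:
    return a["ai_authorship_probability"] > THRESHOLD

def multi_axis_detector(a: dict) -> bool:
    # Framing artifacts survive human editing; sourcing density
    # separates academic prose from LLM output.
    high_ai = a["ai_authorship_probability"] > THRESHOLD
    framing = (a["overconfidence"] + a["opinion_vs_fact"]) / 2
    weak_sourcing = a["credibility"] < 0.4
    # Fire on the explicit estimate OR surviving framing artifacts,
    # but require weak sourcing to confirm either signal:
    return (high_ai or framing > 0.7) and weak_sourcing

edited_ai_draft = {"ai_authorship_probability": 0.35,  # editor smoothed it
                   "overconfidence": 0.8, "opinion_vs_fact": 0.75,
                   "credibility": 0.2}
academic_prose = {"ai_authorship_probability": 0.7,    # prose reads LLM-like
                  "overconfidence": 0.2, "opinion_vs_fact": 0.15,
                  "credibility": 0.9}
```

The binary rule misses the edited draft and flags the grad student; the multi-axis rule gets both right - which is the entire case for the wider schema.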

Example use cases

Newsroom standards editors - a daily cron job that flags high-AI-authorship articles in your freelance submissions bucket, weighted by credibility score, before an editor ever opens them.
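The ranking step of that cron job could look like the sketch below - a score that weights AI-authorship by the absence of credibility, so a well-sourced AI-assisted piece sinks and a poorly-sourced one floats to the top of the editor's queue. Field names are hypothetical.

```python
def freelance_flag_queue(submissions: list[dict]) -> list[dict]:
    """Rank a freelance bucket for editor attention: high AI-authorship
    discounted by credibility floats to the top."""
    def priority(a: dict) -> float:
        return a["ai_authorship_probability"] * (1.0 - a["credibility"])
    return sorted(submissions, key=priority, reverse=True)
```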

Fact-checkers - when triaging a viral claim, pull the source's recent-window scores on ai_authorship_probability, credibility, and overconfidence. A source that has drifted toward AI authorship and away from sourced evidence is a different trust situation than one that has been stable.
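The "drifted" comparison is just recent-window means against a baseline window, per axis. A sketch, again with illustrative record shapes:

```python
from statistics import mean

AXES = ("ai_authorship_probability", "credibility", "overconfidence")

def drift(recent: list[dict], baseline: list[dict]) -> dict:
    """Per-axis shift of a source's recent window vs its baseline.
    Positive ai/overconfidence drift + negative credibility drift
    is the trust-erosion pattern described above."""
    return {k: round(mean(a[k] for a in recent) -
                     mean(a[k] for a in baseline), 3)
            for k in AXES}
```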

Journalism-school instructors - assign students to pull 10 articles from a single publication across a decade, graph the ai_authorship_probability and credibility trend lines, and write a piece on what changed.

AI-safety researchers - the full 31-dimension scored corpus is a ready-made dataset for studying how LLM-generated news content is spreading through mainstream feeds.

Live example

Here's a real query I ran in Claude:

Helium: Show me how AI-authorship probability has moved for tech-news sources over the last year, correlated with credibility.

Claude used the MCP tools, ran the query across the Helium corpus, and returned a tidy summary showing that several mid-tier aggregator sources have seen a meaningful upward shift in ai_authorship_probability over the last 12 months, while their credibility score drifted down. That's a reportable trend. The reporter didn't have to build a scraper, didn't have to maintain a classifier, didn't have to write SQL - they asked a question in English.

What to do

If you work anywhere near news - as a reader, a writer, an editor, a researcher, or someone building AI-news workflows - try it:

```shell
npx mcp-remote https://heliumtrades.com/mcp
```

Then ask Claude the question you've been asking Google and failing to get a structured answer from. The 31-dim schema is there, the corpus is populated, and the tool calls are free.

Full tool list, full schema, full source: github.com/connerlambden/helium-mcp.

If you find the schema missing something important, open an issue or reach out. The axes were picked empirically across 3.2M articles, but the space of things-worth-measuring about a news article is larger than what's in the schema today - and I'd rather know.
