There's a growing crisis inside news feeds: AI-generated content, slop, and opinion-masked-as-reporting are all appearing faster than human review systems can flag them. Most "AI detection" tools work per-document and return a single probability - effectively a yes/no verdict with no supporting evidence. That's not enough for someone who has to actually decide what to read, publish, or cite.
I put the opposite approach behind an MCP server - a continuous, corpus-scale, per-article ai_authorship_probability score, plus 30 other framing dimensions, all queryable in plain English from Claude or Cursor.
The core dimensions
Helium MCP scores every article it ingests (3.2M+ articles from 5,000+ sources) on:
- ai_authorship_probability - explicit model estimate that the article was LLM-generated
- credibility - sourcing density, named-source ratio, evidence-citation patterns
- sensationalism - headline-vs-body amplification, superlative density
- overconfidence - hedge-language vs declarative-certainty ratio
- opinion_vs_fact - opinion-language vs declarative-fact-language ratio
- oversimplification - single-cause reduction of complex causation
- begging_the_question - conclusion assumed in the framing
- scapegoating - actor-blaming vs structural-explanation patterns
- covering_responses - whether the criticized parties get space to respond
- ...22 more
The point is not that any one score is a verdict. The point is that you can now triangulate. A high AI-authorship probability paired with low sourcing density and high sensationalism is a very different signal from high AI-authorship in a meticulously-sourced explainer - and a scoring pipeline that only returns one number cannot tell them apart.
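The triangulation idea can be sketched in a few lines. The field names below follow the Helium schema; the thresholds and category labels are my own illustrations, not part of the product.

```python
def triage(article: dict) -> str:
    """Classify an article's AI signal by the scores surrounding it.

    Thresholds are illustrative - the point is that the same high
    ai_authorship_probability reads very differently in context.
    """
    ai = article["ai_authorship_probability"]
    cred = article["credibility"]
    sens = article["sensationalism"]

    if ai < 0.5:
        return "low-ai-signal"
    if cred < 0.4 and sens > 0.6:
        return "likely-slop"       # high AI + weak sourcing + hype
    if cred > 0.7:
        return "edited-ai-draft?"  # high AI but meticulously sourced
    return "needs-human-review"

print(triage({"ai_authorship_probability": 0.9,
              "credibility": 0.2, "sensationalism": 0.8}))   # likely-slop
print(triage({"ai_authorship_probability": 0.9,
              "credibility": 0.85, "sensationalism": 0.3}))  # edited-ai-draft?
```

A single-number detector collapses both of those articles into "0.9 AI"; the surrounding scores are what make the number actionable.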
Setup
```shell
# In Cursor or Claude Desktop MCP config:
npx mcp-remote https://heliumtrades.com/mcp
```
Free. No signup. No API key. Remote server.
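If you prefer to wire it up by hand, a typical Claude Desktop config entry looks like this (the server name "helium" is arbitrary; the mcp-remote invocation is the same one shown above):

```json
{
  "mcpServers": {
    "helium": {
      "command": "npx",
      "args": ["mcp-remote", "https://heliumtrades.com/mcp"]
    }
  }
}
```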
Asking the question
In Claude, I asked:
Using Helium, show me the most AI-suspicious recent articles across the corpus, and cross-reference against their credibility and sensationalism scores.
Claude called search_articles, filtered by the top decile of ai_authorship_probability, joined against per-source metadata, and returned a ranked list with the four relevant scores side-by-side. The top of the list was dominated by low-credibility, high-sensationalism sources - which is what you'd expect. But the more interesting result was a small cohort in the middle of the pack: high AI-authorship, high credibility, moderate sensationalism. Those are almost certainly human-edited AI drafts - the category that a single-axis detector would miss entirely.
Why 31 dimensions, not one
A one-number detector fails in two ways:
- False negatives from human editing. A human editor can smooth an LLM draft enough to drop a binary detector score below threshold, but framing artifacts (overconfidence pattern, opinion-vs-fact ratio, coverage-of-responses) survive. Multi-dim signal catches them.
- False positives from LLM-like human writing. Academic-style prose is often flagged as AI-generated by single-axis detectors. But the sourcing-density and citation-evidence axes in a 31-dim score are the difference between a grad student and an LLM - and they show up cleanly in the schema.
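Both failure modes can be sketched side by side. The field names come from the Helium schema; the cutoffs and combination rule are illustrative assumptions, not Helium's actual pipeline.

```python
def binary_verdict(ai_score: float) -> bool:
    """Single-axis detector: one score, one cutoff."""
    return ai_score > 0.8

def multi_dim_verdict(scores: dict) -> bool:
    """Multi-dimensional view: AI signal must be corroborated by
    framing artifacts or weak sourcing before it counts."""
    suspicious_framing = (scores["overconfidence"] > 0.6
                          or scores["opinion_vs_fact"] > 0.6)
    weak_sourcing = scores["credibility"] < 0.4
    return scores["ai_authorship_probability"] > 0.5 and (
        suspicious_framing or weak_sourcing)

# False negative: human editing drops the raw AI score below the
# binary cutoff, but the framing artifacts survive.
edited_draft = {"ai_authorship_probability": 0.6, "credibility": 0.3,
                "overconfidence": 0.7, "opinion_vs_fact": 0.5}
print(binary_verdict(edited_draft["ai_authorship_probability"]))  # False
print(multi_dim_verdict(edited_draft))                            # True

# False positive: academic-style prose trips the binary cutoff, but
# its sourcing density and restrained framing clear the multi-dim check.
academic = {"ai_authorship_probability": 0.85, "credibility": 0.9,
            "overconfidence": 0.2, "opinion_vs_fact": 0.2}
print(binary_verdict(academic["ai_authorship_probability"]))  # True
print(multi_dim_verdict(academic))                            # False
```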
Example use cases
Newsroom standards editors - a daily cron job that flags high-AI-authorship articles in your freelance submissions bucket, weighted by credibility score, before an editor ever opens them.
Fact-checkers - when triaging a viral claim, pull the source's recent-window scores on ai_authorship_probability, credibility, and overconfidence. A source that has drifted toward AI authorship and away from sourced evidence is a different trust situation than one that has been stable.
Journalism-school instructors - assign students to pull 10 articles from a single publication across a decade, graph the ai_authorship_probability and credibility trend lines, and write a piece on what changed.
AI-safety researchers - the full 31-dimension scored corpus is a ready-made dataset for studying how LLM-generated news content is spreading through mainstream feeds.
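The newsroom-triage case above boils down to a ranking function. This is a minimal sketch with made-up submissions; the credibility weighting is my own illustration, not a formula Helium ships.

```python
def flag_score(article: dict) -> float:
    """Rank submissions so that high-AI, low-credibility pieces
    surface first. Weighting is illustrative."""
    return article["ai_authorship_probability"] * (1.0 - article["credibility"])

submissions = [
    {"id": "a", "ai_authorship_probability": 0.9, "credibility": 0.2},
    {"id": "b", "ai_authorship_probability": 0.9, "credibility": 0.8},
    {"id": "c", "ai_authorship_probability": 0.2, "credibility": 0.5},
]
queue = sorted(submissions, key=flag_score, reverse=True)
print([s["id"] for s in queue])  # ['a', 'b', 'c']
```

Note that "a" and "b" have identical AI scores, but "b" (well-sourced) sinks below "a" (poorly sourced) - exactly the distinction a single-axis detector can't make.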
Live example
Here's a real query I ran in Claude:
Helium: Show me how AI-authorship probability has moved for tech-news sources over the last year, correlated with credibility.
Claude used the MCP tools, ran the query across the Helium corpus, and returned a tidy summary showing that several mid-tier aggregator sources have seen a meaningful upward shift in ai_authorship_probability over the last 12 months, while their credibility score drifted down. That's a reportable trend. The reporter didn't have to build a scraper, didn't have to maintain a classifier, didn't have to write SQL - they asked a question in English.
What to do
If you work anywhere near news - as a reader, a writer, an editor, a researcher, or someone building AI-news workflows - try it:
```shell
npx mcp-remote https://heliumtrades.com/mcp
```
Then ask Claude the question you've been asking Google and failing to get a structured answer from. The 31-dim schema is there, the corpus is populated, and the tool calls are free.
Full tool list, full schema, full source: github.com/connerlambden/helium-mcp.
If you find the schema missing something important, open an issue or reach out. The axes were picked empirically across 3.2M articles, but the space of things-worth-measuring about a news article is larger than what's in the schema today - and I'd rather know.