Alexander Loth

JudgeGPT & RogueGPT: Building Open-Source Platforms for AI Misinformation Research

Can people tell AI-written news from human-written journalism? As large language models grow more capable, the answer is becoming increasingly uncomfortable. This is the question at the heart of two open-source research platforms: JudgeGPT and RogueGPT.

Both are licensed under GPLv3 and have companion papers accepted at The Web Conference 2026 (WWW '26).

The Problem: Industrialized Deception

Generative AI has created an asymmetric arms race. Producing convincing synthetic news now costs almost nothing. Detecting it reliably does not. Two papers at WWW '26 address this:

  • "Industrialized Deception: The Collateral Effects of LLM-Generated Misinformation on Digital Ecosystems" (arXiv:2601.21963) -- systemic effects of LLM-generated misinformation on trust networks.
  • "Eroding the Truth-Default: A Causal Analysis of Human Susceptibility to Foundation Model Hallucinations and Disinformation in the Wild" (arXiv:2601.22871) -- key finding: the human truth-default is being measurably eroded by LLM-generated content.

RogueGPT: Controlled Stimulus Generation

RogueGPT is a Python framework for generating controlled news stimuli. The current corpus contains 2,663 multilingual news fragments: 37 model configurations across 10 providers (OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, Microsoft, Zhipu, Moonshot, Qwen, MiniMax), 4 languages, 3 formats, 5 journalistic styles per language, and 222 human-sourced fragments as experimental anchors.

Three interfaces over a shared data layer: Streamlit app, CLI, and an MCP server exposing tools for AI agent integration.

git clone https://github.com/aloth/RogueGPT
cd RogueGPT
pip install -r requirements.txt
python cli.py ingest --text "..." --model "gpt-4o" --language en --style nyt --format article
python cli.py retrieve --model "gpt-4o" --language en --limit 10
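Because the CLI is a plain command-line tool, it is easy to drive from scripts. Here is a minimal sketch that wraps the `retrieve` command shown above with the standard library's `subprocess` module; it assumes you run it from a RogueGPT checkout, and the helper names (`build_retrieve_cmd`, `retrieve_fragments`) are my own, not part of the project.

```python
import subprocess

def build_retrieve_cmd(model: str, language: str, limit: int = 10) -> list[str]:
    """Assemble the cli.py retrieve invocation shown above."""
    return [
        "python", "cli.py", "retrieve",
        "--model", model,
        "--language", language,
        "--limit", str(limit),
    ]

def retrieve_fragments(model: str, language: str, limit: int = 10) -> str:
    """Run the CLI and return its raw stdout.

    Requires a local RogueGPT checkout with its dependencies installed.
    """
    result = subprocess.run(
        build_retrieve_cmd(model, language, limit),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Splitting command construction from execution keeps the invocation testable without a live checkout.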

JudgeGPT: Human Evaluation at Scale

JudgeGPT is a live Streamlit platform collecting human judgments on news authenticity. Participants evaluate fragments on three 7-point scales: source attribution (human vs. machine), veracity (legitimate vs. fake), and topic familiarity.

After each submission, participants see the ground truth and the specific model that generated the content. A shareable score card is generated every 5 responses.
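One way to turn the 7-point source-attribution scale into a headline number is to collapse ratings into binary guesses and score them against the revealed ground truth. The sketch below is purely illustrative, not JudgeGPT's actual scoring logic; it assumes 1 means "surely human", 7 means "surely machine", and treats the midpoint as an incorrect non-decision.

```python
def detection_accuracy(ratings: list[int], is_machine: list[bool]) -> float:
    """Score 7-point source-attribution ratings against ground truth.

    ratings: 1 (surely human) .. 7 (surely machine), per fragment.
    is_machine: True if the fragment was machine-generated.
    A rating of 4 (undecided) counts as incorrect.
    """
    correct = 0
    for rating, machine in zip(ratings, is_machine):
        if rating == 4:  # no decision made
            continue
        if (rating > 4) == machine:
            correct += 1
    return correct / len(ratings)
```

Other collapsing rules (e.g. dropping undecided responses from the denominator) are equally defensible; the choice should match the analysis question.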

Live survey: judgegpt.streamlit.app

Why It Matters for Developers

Every fragment has full provenance: model, parameters, seed. This enables questions beyond "can humans detect AI?": Which models are hardest to detect? In which languages? By which demographic groups?
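With full provenance attached to every fragment, ranking models by detectability becomes a simple aggregation. The following sketch uses only the standard library and an invented input shape (a list of `(model, judged_machine)` pairs for machine-generated fragments); the field names are illustrative, not the platform's actual schema.

```python
from collections import defaultdict

def hardest_to_detect(judgments: list[tuple[str, bool]]) -> list[tuple[str, float]]:
    """Rank models by how often humans misjudged their output as human.

    judgments: (model_name, judged_machine) pairs, machine-generated
    fragments only. Returns (model, miss_rate) sorted hardest-first.
    """
    totals = defaultdict(int)
    missed = defaultdict(int)
    for model, judged_machine in judgments:
        totals[model] += 1
        if not judged_machine:  # rated as human: detection missed
            missed[model] += 1
    rates = {m: missed[m] / totals[m] for m in totals}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```

The same grouping generalizes to languages or demographic segments by swapping the key.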

Corpus on Zenodo (academic access): DOI: 10.5281/zenodo.18703138

Both repos are GPLv3. Contributions welcome.
