<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 徐振鹏</title>
    <description>The latest articles on DEV Community by 徐振鹏 (@_37e77171e5c2e50210f28).</description>
    <link>https://dev.to/_37e77171e5c2e50210f28</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3921465%2F3251d010-958a-455a-99ad-897b1662689a.png</url>
      <title>DEV Community: 徐振鹏</title>
      <link>https://dev.to/_37e77171e5c2e50210f28</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/_37e77171e5c2e50210f28"/>
    <language>en</language>
    <item>
      <title>AI Text Detector – CPU-first approach to AI-generated text detection</title>
      <dc:creator>徐振鹏</dc:creator>
      <pubDate>Sat, 09 May 2026 09:24:15 +0000</pubDate>
      <link>https://dev.to/_37e77171e5c2e50210f28/ai-text-detector-cpu-first-approach-to-ai-generated-text-detection-142n</link>
      <guid>https://dev.to/_37e77171e5c2e50210f28/ai-text-detector-cpu-first-approach-to-ai-generated-text-detection-142n</guid>
      <description>&lt;p&gt;I built an open-source AI text detector that ranks #14 on the RAID benchmark and #1 among CPU-only solutions. No GPU required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/xuzhenpeng263/ai-text-detector" rel="noopener noreferrer"&gt;https://github.com/xuzhenpeng263/ai-text-detector&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Why this matters&lt;/h2&gt;

&lt;p&gt;Big Tech is approaching AI detection all wrong. They're throwing massive compute at the problem – training billion-parameter models that require expensive GPUs just to detect whether a text was written by AI.&lt;/p&gt;

&lt;p&gt;This is fundamentally misguided. Using brute-force compute to fight brute-force compute is a losing game. It's an arms race that only the largest companies can afford, and it's not scalable.&lt;/p&gt;

&lt;h2&gt;The CPU-first philosophy&lt;/h2&gt;

&lt;p&gt;I believe the solution isn't MORE compute – it's SMARTER compute. This detector uses only 60 handcrafted statistical features and a lightweight XGBoost model. It runs on any CPU, instantly, with no specialized hardware required.&lt;/p&gt;

&lt;p&gt;The features capture the subtle fingerprints of AI-generated text:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compression ratio patterns&lt;/li&gt;
&lt;li&gt;Entropy distributions&lt;/li&gt;
&lt;li&gt;Burstiness metrics&lt;/li&gt;
&lt;li&gt;Lexical diversity signals&lt;/li&gt;
&lt;li&gt;And 56 other statistical markers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These signals don't require deep learning to detect. They just require knowing what to look for.&lt;/p&gt;
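&lt;p&gt;To make that concrete, here's a minimal sketch of four such features using only the Python standard library. These are illustrative implementations, not the project's actual extractors, and the function names are my own:&lt;/p&gt;

```python
import math
import zlib
from collections import Counter


def compression_ratio(text: str) -> float:
    """Compressed-to-raw byte ratio; repetitive text compresses further."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def burstiness(text: str) -> float:
    """Variance-to-mean ratio of sentence lengths; human writing
    tends to mix short and long sentences more than AI output does."""
    lengths = [len(s.split()) for s in text.split(".") if s.strip()]
    mean = sum(lengths) / len(lengths)
    var = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return var / mean if mean else 0.0


def type_token_ratio(text: str) -> float:
    """Lexical diversity: unique words over total words."""
    words = text.lower().split()
    return len(set(words)) / len(words)
```

&lt;p&gt;Each function is a few lines of arithmetic over the raw string; stacking dozens of such signals into one vector is what gives a tree model enough to work with.&lt;/p&gt;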

&lt;h2&gt;Performance&lt;/h2&gt;

&lt;p&gt;On the RAID benchmark (the standard academic evaluation for AI text detection):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Overall ranking: #14&lt;/li&gt;
&lt;li&gt;Among CPU-only methods: #1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This shows you don't need a GPU to compete with the strongest detectors. You just need the right features.&lt;/p&gt;

&lt;h2&gt;Usage&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ai_detector&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AITextDetector&lt;/span&gt;

&lt;span class="n"&gt;detector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AITextDetector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;detect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your text here...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AI Probability: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ai_probability&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Label: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Why this approach wins&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Accessible&lt;/strong&gt;: Anyone can run it, no GPU needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast&lt;/strong&gt;: Inference takes milliseconds on any modern CPU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transparent&lt;/strong&gt;: Features are interpretable, not a black box&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sustainable&lt;/strong&gt;: No massive energy consumption&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy-first&lt;/strong&gt;: Everything runs locally&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;The bigger point&lt;/h2&gt;

&lt;p&gt;The AI industry is obsessed with scale. Bigger models, more data, more compute. But sometimes the best solution is the elegant one.&lt;/p&gt;

&lt;p&gt;AI text detection doesn't require a foundation model. It requires understanding the statistical properties that distinguish AI writing from human writing. That's something a lightweight, well-designed system can do perfectly well.&lt;/p&gt;

&lt;p&gt;Open source means everyone can benefit from this approach – not just companies with massive GPU clusters.&lt;/p&gt;

&lt;p&gt;Would love to hear your thoughts on the compute vs. algorithm design tradeoff in AI detection.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/xuzhenpeng263/ai-text-detector" rel="noopener noreferrer"&gt;https://github.com/xuzhenpeng263/ai-text-detector&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tech stack&lt;/strong&gt;: Python, XGBoost, numpy, scipy&lt;br&gt;
&lt;strong&gt;Language&lt;/strong&gt;: English text detection&lt;br&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
  </channel>
</rss>
