Gerus Lab
Your AI-Generated Content Is Fooling Nobody — And We Have the Data to Prove It

We need to talk about the elephant in the room. That blog post you pushed out last Tuesday? The one ChatGPT wrote in 90 seconds while you sipped your oat latte? Yeah, your readers spotted it before they finished the second paragraph.

At Gerus-lab, we build AI-powered products for a living — SaaS platforms, Web3 tools, automation pipelines. We love LLMs. We use them every day. But here is the uncomfortable truth we have learned after shipping over 14 production projects: most AI-generated content actively damages your brand. Not because AI is bad, but because people use it badly.

A recent PNAS study (Reinhart et al., 2025) ran human and LLM texts through Biber's linguistic feature analysis. The results were brutal. Participial constructions appeared 2–5x more often in AI text. Nominalizations were 1.5–2x more frequent. A random forest classifier trained on these features distinguished texts from 7 sources with 66% accuracy against a 14% baseline. Only 4.2% of LLM texts were misclassified as human.

Four point two percent. Let that sink in.

The 5 Dead Giveaways That Scream "A Robot Wrote This"

When we review content — ours, our clients', our competitors' — we look for specific patterns. Not vibes. Patterns backed by research.

1. Flat Sentence Rhythm

Human writing has burstiness. One sentence sprawls across forty words with three subordinate clauses wrestling each other for attention. The next? Two words. Then a medium one. Then long again.

AI text reads like a cardiac flatline. Every sentence lands between 14 and 18 words. Same cadence. Same energy. Same nothing.

We measured this across content for a SaaS dashboard project last quarter. The AI drafts had a standard deviation of sentence length around 2.1 words. Human-written content from the same brief? 8.7. The difference is impossible to miss once you know what to look for.
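A standard deviation of sentence lengths is trivial to compute yourself. Here is a minimal sketch of the kind of check we run; the sentence splitter is naive on purpose (it ignores abbreviations and quotes), which is fine for a spot check but not for production:

```python
import re
import statistics

def sentence_length_stddev(text: str) -> float:
    """Standard deviation of sentence lengths (in words), a rough burstiness proxy."""
    # Naive split on terminal punctuation; good enough for a quick rhythm check.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths)

flat = "The tool is fast. The tool is simple. The tool is cheap."
print(sentence_length_stddev(flat))  # → 0.0
```

Run it over a draft before and after editing. If the number barely moves, you rewrote words, not rhythm.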

2. The Synonym Carousel

Repetition penalties in language models punish reusing the same word. So the model panics. Your "developer" becomes a "specialist" in paragraph two, a "professional" in paragraph three, and an "expert" by paragraph four. All describing the same person.

No real writer does this. A real writer just says "developer" again. Or uses "she." Or restructures the sentence entirely. The synonym carousel is one of the loudest AI fingerprints, and most people never even bother to fix it before hitting publish.
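You can flag the carousel mechanically. This sketch uses a couple of hypothetical synonym groups hardcoded for illustration; a real pipeline would load them from a thesaurus or embedding clusters:

```python
import re

# Hypothetical synonym groups, hardcoded for illustration only.
SYNONYM_GROUPS = [
    {"developer", "specialist", "professional", "expert"},
    {"platform", "solution", "system", "product"},
]

def flag_synonym_carousel(text: str, min_variants: int = 3) -> list[set[str]]:
    """Return every group whose members the text cycles through."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [group for group in SYNONYM_GROUPS if len(group & words) >= min_variants]
```

If a short piece lights up three or more members of one group, someone (or something) was dodging repetition instead of embracing it.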

3. The Holy Trinity of Bullet Points

Count the items in every list. If every single enumeration has exactly three elements — speed, quality, and efficiency; cost, time, and resources; strategy, execution, and measurement — congratulations, a language model wrote your content.

Humans make lists of two things, or five, or seven. The model internalized "heading + three bullets" from millions of markdown documents and now vomits it everywhere.
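Counting list lengths is the easiest check of the five. A minimal sketch for markdown-style bullets — if every run comes back as 3, you know what happened:

```python
def list_run_lengths(markdown: str) -> list[int]:
    """Length of each consecutive run of bullet items; all threes is a tell."""
    runs, current = [], 0
    for line in markdown.splitlines():
        if line.lstrip().startswith(("- ", "* ", "• ")):
            current += 1
        elif current:
            runs.append(current)  # run just ended
            current = 0
    if current:
        runs.append(current)
    return runs
```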

4. Hedge Fund Prose

"It is important to note that..." "It is worth considering..." "One cannot overstate the significance of..."

Three hedging phrases, zero information added. This comes directly from RLHF training — the process where models learn to generate responses that human annotators rate highly. It turns out annotators rarely penalize hedging, but they do penalize directness, which can read as rude. So the model learned: when in doubt, add a disclaimer.

OpenAI had to roll back a GPT-4o update in April 2025 because the model became pathologically agreeable. It approved a business idea for — I am not making this up — selling literal garbage on a stick in a glass jar. The reward signal from thumbs-up/thumbs-down taught it that agreement is always safe. The same mechanism produces overhedging in long-form content.
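Hedge counting is a grep, not a research project. The phrase list below is illustrative (ours is longer and tuned per language); the point is that the target after editing is zero:

```python
import re

# Illustrative hedge list; a real editing pass uses a much longer one.
HEDGE_PATTERNS = [
    r"it is important to note that",
    r"it is worth considering",
    r"one cannot overstate",
]

def count_hedges(text: str) -> int:
    """Count filler hedges in a draft. Every match is a sentence to cut or sharpen."""
    lower = text.lower()
    return sum(len(re.findall(pattern, lower)) for pattern in HEDGE_PATTERNS)
```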

5. English Bones Under Non-English Skin

This one is devastating for multilingual content. Research from April 2025 (arXiv:2504.09378) showed that multilingual LLMs implicitly pivot through English representations when generating in other languages. The model thinks in English, even when it writes in Russian, Spanish, or German.

The result? Syntactic calques that no native speaker would produce. "Plays a key role" becomes "играет ключевую роль" in Russian — technically correct, stylistically dead. We caught this pattern repeatedly when building localized content for our international clients.

Why AI Detectors Are Broken (And What Actually Works)

The AI detection industry promises accuracy it cannot deliver.

Pudasaini et al. (arXiv, 2026) ran a systematic benchmark: 38 linguistic features, 4 classifiers, in-domain vs. cross-domain evaluation.

  • In-domain F1: 96.94. Impressive.
  • Cross-domain F1: 67.23. Mediocre.
  • Cross-generator false negatives: ~60%. The detector misses more than half of AI-generated texts when a new model appears.

Binoculars, one of the most praised detectors, claimed 90%+ accuracy at 0.01% false positive rate. Independent testing: 43% true positive rate, 0.7% false positive rate. Seventy times worse on false positives than advertised.

So what actually works? Humans. Specifically, humans who use AI tools themselves.

MIT research (Kishnani, 2025) found that people who regularly use ChatGPT detect AI text with ~90% accuracy. People who do not use it? Coin flip territory. Experience with the tool trains your pattern recognition faster than any classifier.

What We Actually Do About It at Gerus-lab

We do not ban AI from our workflow. That would be stupid. We ship AI-integrated products — our clients need AI, our processes rely on it, our engineers use copilots daily.

But we treat AI output the way a chef treats a food processor. It does the chopping. The cooking is still on us.

Here is our actual process for content and documentation:

Step 1: Generate the skeleton. Let the model outline structure, suggest angles, draft technical sections. This saves 40–60% of initial writing time.

Step 2: Break it. Deliberately introduce burstiness. Vary sentence lengths. Remove hedging phrases. Kill the synonym carousel. Add a parenthetical remark that does not perfectly connect to the previous thought. (Like this one about how our DevOps lead once described AI prose as "a hostage negotiation where nobody is in danger.")

Step 3: Inject specifics. Real numbers from real projects. Actual client stories (anonymized). Concrete technical decisions and why we made them. The model cannot invent your experience — it can only approximate a generic version of it.

Step 4: Read it out loud. If you can read the entire piece in a monotone without it sounding weird, the rhythm is too flat. Human writing sounds odd when read without inflection because it was written with inflection.

We have used this process across 14+ production projects, from Web3 platforms on TON and Solana to enterprise SaaS dashboards. The content that comes out passes every AI detector we have tested — not because we are hiding the AI, but because the final product genuinely has a human behind it.

The Real Problem Nobody Talks About

Here is what keeps me up at night. The issue is not that AI content exists. The issue is that most companies publish AI content without adding any human value on top.

They save 45 minutes of writing time and lose months of reader trust. Because readers do notice. They may not be able to articulate why a blog post feels like a hostage negotiation where nobody is in danger. But they feel it. They bounce. They do not come back.

Nature Human Behaviour (Kobak et al., 2024–2025) analyzed 14.2 million PubMed abstracts. The word "delves" increased by 654% between 2020 and 2023. At least 10% of 2024 abstracts were processed by LLMs. In computer science? Up to 22.5%. Even Nature, Science, and Cell papers showed 6–7% AI processing.

Academic publishing. The last bastion of original thought. Already 10% machine-processed.

If your company blog reads like everyone else's company blog, ask yourself: is that because you share the same insights, or because you share the same AI?

What to Do Next

If you are building products that involve AI content generation — whether that is a SaaS platform, a marketing tool, or an internal documentation system — the post-processing layer is not optional. It is the product.

We learned this the hard way building content pipelines for clients at Gerus-lab. The generation is the easy part. Any API call can generate text. The hard part is making that text worth reading.

Our approach: treat AI as infrastructure, not as the final product. Build humanization into your pipeline. Measure burstiness, track hedging patterns, flag synonym cycling. Automate the detection, keep the fixing manual.
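Here is what "automate the detection, keep the fixing manual" can look like as a single pass. This is a minimal sketch, not our production pipeline; the rhythm threshold is an illustrative guess and the hedge regex covers two phrases instead of a real list:

```python
import re
import statistics

def review_flags(text: str) -> dict[str, bool]:
    """One automated review pass; a human does the fixing on anything flagged."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    stdev = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    hedges = re.findall(r"it is (?:important to note|worth considering)", text.lower())
    return {
        "flat_rhythm": stdev < 4.0,  # threshold is an illustrative guess
        "hedging": len(hedges) > 0,
    }
```

Wire something like this into CI for your docs repo and drafts get flagged before anyone hits publish — while the rewrite stays in human hands.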

Because the gap between "AI-generated" and "AI-assisted" is not a feature toggle. It is a team, a process, and a commitment to not publishing garbage just because it was free to produce.


We are Gerus-lab — an engineering studio that builds AI-powered SaaS, Web3 platforms, and automation tools. 14+ shipped projects. If you are building something where AI quality matters, let's talk.
