DEV Community

Proof of Human: I Built a Reverse Turing Test After Getting Flagged as AI

Daniel Nwaneri on June 18, 2026

This is a submission for the June Solstice Game Jam. I got flagged by Sloan. If you've been on DEV long enough, you know Sloan. I thought Sloan...


Daniel Nwaneri • Jun 18

Hey @francistrdev, you're the origin story for this one. I built it for the June Solstice Game Jam after the flagging incident. Five questions, and Claude scores how human you sound.
Curious what you'd score. → proof-of-human-3ts.pages.dev/

FrancisTRᴅᴇᴠ (っ◔◡◔)っ • Jun 18

Hey Daniel!

Pretty much all are one sentence answers lol

Daniel Nwaneri • Jun 18

72: you passed. The 82 on Q5 makes sense; that's the question where one-sentence answers still carry weight because there's no safe answer. The 62s on Q1 and Q3 are the specificity gap: one sentence doesn't leave room for the detail that gives you away as human. Play again and go longer on those two.

Sylwia Laskowska • Jun 19

Hahaha it's perfect! Unfortunately, I'm not a human 😅

The fun part is that I didn't even use the tools to polish the grammar. These are my answers flagged as AI generated 🤣

As you can see, if you answer briefly and with correct grammar, it's easy to be marked as AI 😅

Daniel Nwaneri • Jun 19

57 is the honest score for "I genuinely love AI": it's the answer that sounds most human but reads as the safest possible thing to say. Q5 specifically punishes the reflex to be positive about AI. The question tells you not to say what you're supposed to say, and that's exactly what got flagged. Play again and say something you'd be slightly embarrassed to admit. That's the answer that passes. 😁😁

Sylwia Laskowska • Jun 19

But here I'm still losing 🤣

Daniel Nwaneri • Jun 19

Genuine fear doesn't use six exclamation marks 😅 It uses one sentence and stops. The performative outrage is the tell. "I hate AI" as a declaration reads like someone performing the opinion rather than holding it. The model caught the theatre, not the feeling.

Sylwia Laskowska • Jun 19

You've definitely never seen the Facebook messages from Polish people 🤣

Sylwia Laskowska • Jun 19

Ok, sometimes it's really funny

xulingfeng • Jun 19

🤣🤣🤣
No way — I actually wrote Q5 from the heart and it flagged me. That's hilarious.🤣🤣🤣

Daniel Nwaneri • Jun 19

45 on Q5 is the most common result 😅 The question specifically asks you not to say what you're supposed to say, but the moment you write your real opinion clearly and directly, it reads like a prepared statement. The only answers that pass Q5 are the ones with friction in them: contradiction, uncertainty, something you haven't fully worked out yet 🤔 "I wrote this from the heart" is exactly what the model can't detect, because the heart, written cleanly, looks like a press release 😂

Marco Sbragi • Jun 19

Funny...
I joked with Gemini and said, "We need to pass a test. I'll ask you some questions; answer like a real person would." And voilà... Try it yourself.

Daniel Nwaneri • Jun 19

That's the whole thesis in one experiment, Marco 😅 Gemini coached to "answer like a real person" passes. A real person writing sincerely gets flagged. The detector can't see the difference between performed humanity and actual humanity and now neither can the game. That's not a bug. That's where we are in 2026. The test Turing designed to catch machines is now something machines pass more reliably than people. What score did Gemini get??

Fayaz • Jun 22

15 more minutes with the Gem + asked one question twice after reviewing. No Edit.

Meaning: fully reviewed AI-generated text is indistinguishable from human-written text.

Daniel Nwaneri • Jun 23

5 of 5 with 15 minutes of iteration is the data point that ends the debate 🎯 The detection question has an expiration date, and you just showed how short it is. The game can't tell, and neither can GPTZero, Sloan, or any classifier that's coming. What you've done in this thread is run a clean experiment that arrives at "detect quality, not AI" through evidence rather than argument. I'm going to cite this thread the next time someone asks why I built a detector that can be fooled, because that was the point. The detector isn't supposed to win. It's supposed to make the limitation visible.

Fayaz • Jun 21 • Edited

I got 74 😁

However, since I'm one of those who don't like using AI to judge human-written content, I played a trick on it.

I didn't answer those questions myself.
I created a Gemini Gem (with Gemini Flash 3.5) to answer the questions.

So AI passed AI as human 🤣

It took only 5 minutes to create the gem.
I can do a lot better if I spend more time with it. I'm sure of it.

This proves my point: if you use AI cleverly, your AI judge will fail to recognize it 🥰
So I'll stand by my initial opinion: when evaluating content, judge quality, not whether or not it's done using AI.

😇

Daniel Nwaneri • Jun 22

74 from a Gemini Gem in 5 minutes is the result that matters most in this whole thread 😅 You proved what the post is arguing: detection is the wrong frame.

The arms race ends the moment someone spends 5 minutes calibrating an LLM to sound human, and we're already past that moment. "Detect quality, not AI" was your line from the Sloan thread, and it's the only frame that survives this experiment. The game caught Sylwia writing sincerely and missed you writing through a Gem. That's not the game failing. That's the test telling us what test we should actually be running.

Daniel Balcarek • Jun 19

Yessssss, I knew I was human! 🧠
Mostly. 🤣🤣

Daniel Nwaneri • Jun 19

62 counts as mostly human 🧠 Q2 at 82 and Q3 at 78 mean you were specific enough where it mattered. Q4 and Q5 both at 45 is the pattern. Those are the questions where "something you think about more than expected" and "what you actually think about AI" require you to say something that costs you something. Safe answers on those two always land in the 40s. Go again and say the uncomfortable thing on Q4 and Q5, and you'll clear 75 overall. 😄