A day ago, I came across a LinkedIn post from Tyler Richards showcasing an experimental CAPTCHA called StoryCaptcha.
The concept was simple but unusual.
Instead of asking users to identify traffic lights or solve image puzzles, StoryCaptcha asks users to write a short story based on a random prompt and then evaluates the interaction using behavioral signals.
The goal wasn't to build a production-ready CAPTCHA.
It was an experiment exploring behavioral biometrics and user interaction patterns.
As someone working in web scraping and anti-bot research, I immediately became curious.
What happens when an AI agent attempts the challenge?
More importantly:
Can an AI-controlled browser generate interaction patterns that a behavioral CAPTCHA considers human?
I spent the next 12 hours trying to answer that question.
The Setup
For this experiment, I used:
- Playwright MCP
- VS Code
- GitHub Copilot
- Chromium
The objective wasn't to bypass the CAPTCHA.
The objective was to understand how a behavioral scoring system evaluates AI-driven interactions.
First Attempt: 56/100
My first run scored 56/100 and failed.
The reason quickly became obvious.
The AI agent was behaving exactly as an automation system would:
- Copying and pasting content
- Completing actions immediately
- Following deterministic patterns
- Showing almost no hesitation
Efficient.
But not very human.
The Interesting Part
Unlike many behavioral systems, StoryCaptcha actually exposes a large portion of the signals it evaluates.
The dashboard displayed metrics such as:
Typing Signals
- Typed vs Pasted
- Keystrokes per character
- Key-hold (dwell) profile
- Key-overlap (rollover)
- Rhythm variability
- Non-repeating intervals
Behavioral Signals
- Cognitive pauses
- Inter-interaction timing
- Correction behavior
- Backspace usage
Mouse Signals
- Mouse path curvature
- Straightness
- Teleport detection
Content Signals
- Reads like language
- On-topic for prompt
This transformed the experiment from simple testing into a feedback-driven behavioral analysis exercise.
Instead of guessing blindly, I could observe which signals were being evaluated and adjust the agent's behavior accordingly.
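StoryCaptcha doesn't publish how it computes any of these signals. As a rough illustration only, here is one plausible way to derive two of the typing signals from a log of keydown/keyup events: a key-hold (dwell) profile, and a rhythm-variability score defined as the coefficient of variation of keydown-to-keydown intervals. The `KeyEvent` type, the function names, and both formulas are assumptions for this sketch, not StoryCaptcha's implementation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class KeyEvent:
    key: str     # e.g. "a", "Shift", "Backspace"
    kind: str    # "down" or "up"
    t_ms: float  # timestamp in milliseconds


def dwell_times(events: list[KeyEvent]) -> list[float]:
    """Key-hold (dwell) duration for each matched keydown/keyup pair."""
    pressed: dict[str, float] = {}
    dwells: list[float] = []
    for ev in sorted(events, key=lambda ev: ev.t_ms):
        if ev.kind == "down":
            pressed[ev.key] = ev.t_ms
        elif ev.kind == "up" and ev.key in pressed:
            dwells.append(ev.t_ms - pressed.pop(ev.key))
    return dwells


def rhythm_variability(events: list[KeyEvent]) -> float:
    """Coefficient of variation (stdev / mean) of keydown-to-keydown intervals.

    0.0 means perfectly metronomic typing; larger values mean a less regular rhythm.
    """
    downs = sorted(ev.t_ms for ev in events if ev.kind == "down")
    intervals = [b - a for a, b in zip(downs, downs[1:])]
    if len(intervals) < 2 or mean(intervals) == 0:
        return 0.0
    return pstdev(intervals) / mean(intervals)


# A scripted typist: one key every 100 ms, each held for exactly 50 ms.
bot: list[KeyEvent] = []
for i, k in enumerate("abcd"):
    bot += [KeyEvent(k, "down", i * 100), KeyEvent(k, "up", i * 100 + 50)]

print(dwell_times(bot))         # [50, 50, 50, 50]
print(rhythm_variability(bot))  # 0.0
```

Identical dwell times and a variability of exactly 0.0 are the kind of pattern a behavioral scorer would plausibly treat as machine-like.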
Observation #1: Copy-Paste Was a Dead Giveaway
Initially, the AI agent preferred to copy and paste the story.
StoryCaptcha immediately detected this.
The first optimization was simple:
Instead of pasting content, I instructed the agent to type the response character by character.
The score improved.
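One plausible reason pasting is so easy to spot shows up in the "Typed vs Pasted" and "Keystrokes per character" signals: a paste (or a one-shot `fill()` in Playwright, which sets the field value without per-character key events) delivers many characters with few or no keystrokes. The sketch below computes both ratios from a simple input log. The event format and the `typing_profile` helper are hypothetical, chosen only to illustrate the idea.

```python
def typing_profile(events: list[tuple[str, str]]) -> dict[str, float]:
    """Summarize how text entered a field.

    events: ("key", key_name) for one keystroke, or ("paste", text) for a paste.
    Returns keystrokes per final character and the share of characters that were pasted.
    """
    keystrokes = 0
    pasted_chars = 0
    buffer: list[str] = []
    for kind, payload in events:
        if kind == "key":
            keystrokes += 1
            if payload == "Backspace":
                if buffer:
                    buffer.pop()
            else:
                buffer.append(payload)
        elif kind == "paste":
            pasted_chars += len(payload)
            buffer.extend(payload)
    n = len(buffer)
    if n == 0:
        return {"keystrokes_per_char": 0.0, "pasted_fraction": 0.0}
    return {
        "keystrokes_per_char": keystrokes / n,
        "pasted_fraction": min(1.0, pasted_chars / n),
    }


pasted = typing_profile([("paste", "hello")])
typed = typing_profile(
    [("key", c) for c in "helx"] + [("key", "Backspace"), ("key", "l"), ("key", "o")]
)
print(pasted)  # {'keystrokes_per_char': 0.0, 'pasted_fraction': 1.0}
print(typed)   # {'keystrokes_per_char': 1.4, 'pasted_fraction': 0.0}
```

Note that a typed answer with a correction ends up above 1.0 keystrokes per character, which foreshadows why corrections help rather than hurt.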
Observation #2: Human Typing Isn't Uniform
The next issue was typing cadence.
Humans don't type with perfectly consistent timing.
Sometimes we pause.
Sometimes we think.
Sometimes we speed up.
I instructed the agent to:
- Use random keystroke delays
- Avoid identical intervals
- Pause naturally between thoughts
The score improved again.
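The three instructions above can be expressed as a delay schedule. This is a minimal sketch under assumed parameters (a log-normal per-key delay around 120 ms, extra pauses after spaces and sentence-ending punctuation, and a rule that no delay exactly repeats the previous one); the function name and every number in it are illustrative, not the values the agent actually used.

```python
import math
import random
from typing import Optional


def keystroke_delays(text: str, seed: Optional[int] = None, base_ms: float = 120.0,
                     sigma: float = 0.35,
                     word_pause_ms: tuple[float, float] = (80.0, 250.0),
                     sentence_pause_ms: tuple[float, float] = (400.0, 1200.0)) -> list[int]:
    """Delay (ms) to wait before typing each character of `text`."""
    rng = random.Random(seed)
    delays: list[int] = []
    prev: Optional[int] = None
    for i, ch in enumerate(text):
        # Log-normal jitter: usually near base_ms, occasionally much slower.
        d = rng.lognormvariate(math.log(base_ms), sigma)
        before = text[i - 1] if i > 0 else ""
        if before == " ":
            d += rng.uniform(*word_pause_ms)      # short pause at a word boundary
        elif before in {".", "!", "?"}:
            d += rng.uniform(*sentence_pause_ms)  # longer "thinking" pause
        delay = max(1, round(d))
        if delay == prev:
            delay += 1  # never repeat the previous interval exactly
        delays.append(delay)
        prev = delay
    return delays


text = "The fox ran. It hid."
for ch, d in zip(text, keystroke_delays(text, seed=42)):
    # With Playwright's sync API this step might be something like:
    #   page.wait_for_timeout(d); page.keyboard.type(ch)
    pass
```

A per-character loop is needed here because Playwright's `keyboard.type()` accepts only a single fixed `delay` for the whole string.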
One metric I paid particular attention to was:
Non-Repeating Intervals
StoryCaptcha was actively measuring how repetitive the timing patterns were.
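How "repetitive" is measured isn't shown on the dashboard. One plausible definition, assumed here, is the share of consecutive keystroke intervals that fall into different timing buckets. Quantizing to a few milliseconds matters because a fixed delay (such as the `delay` option of Playwright's `keyboard.type()`) still picks up small timer jitter, and that jitter alone shouldn't count as variation. The function name and the 5 ms bucket size are illustrative.

```python
def non_repeating_fraction(intervals_ms: list[float], bucket_ms: float = 5.0) -> float:
    """Share of consecutive interval pairs that land in different timing buckets.

    0.0 = metronomic (every interval repeats the last), 1.0 = no interval repeats.
    Note: values straddling a bucket edge (e.g. 99 vs 100 ms) count as different.
    """
    if len(intervals_ms) < 2:
        return 1.0
    buckets = [int(x // bucket_ms) for x in intervals_ms]
    changes = sum(1 for a, b in zip(buckets, buckets[1:]) if a != b)
    return changes / (len(buckets) - 1)


print(non_repeating_fraction([100, 101, 102, 100, 103]))  # 0.0  (fixed delay + jitter)
print(non_repeating_fraction([130, 95, 210, 160]))        # 1.0  (irregular rhythm)
```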
Observation #3: Humans Make Mistakes
Humans aren't perfect typists.
We:
- Misspell words
- Hit incorrect keys
- Use backspace
- Correct ourselves
Automation rarely does.
So I instructed the agent to:
- Occasionally introduce spelling mistakes
- Use backspace corrections
- Continue naturally after correction
The dashboard reflected these behaviors through correction metrics and the overall score improved.
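A minimal sketch of the typo-and-correct behavior: occasionally press a neighboring key, follow it with `Backspace`, then type the intended character. As a simplification, "neighbor" here means only the key directly left or right on the same QWERTY row, and the 3% default rate is an arbitrary assumption. The helper names are hypothetical. `Backspace` is a valid key name for Playwright's `keyboard.press()`, so each entry in the output could be replayed as a single key press.

```python
import random
from typing import Optional

# Simplification: neighbors are only the keys directly left/right on a QWERTY row.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
NEIGHBORS = {
    c: [row[j] for j in (i - 1, i + 1) if 0 <= j < len(row)]
    for row in ROWS
    for i, c in enumerate(row)
}


def with_typos(text: str, rate: float = 0.03, seed: Optional[int] = None) -> list[str]:
    """Key sequence for `text` in which some letters are mistyped, then fixed with Backspace."""
    rng = random.Random(seed)
    keys: list[str] = []
    for ch in text:
        options = NEIGHBORS.get(ch.lower())
        if options and rng.random() < rate:
            wrong = rng.choice(options)
            keys += [wrong.upper() if ch.isupper() else wrong, "Backspace"]
        keys.append(ch)
    return keys


def replay(keys: list[str]) -> str:
    """Apply a key sequence to an empty text box (Backspace deletes the last character)."""
    out: list[str] = []
    for k in keys:
        if k == "Backspace":
            if out:
                out.pop()
        else:
            out.append(k)
    return "".join(out)


keys = with_typos("Hello world", rate=0.2, seed=7)
assert replay(keys) == "Hello world"  # corrections always restore the intended text
```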
Observation #4: Humans Don't Instantly Click Everything
The agent was still too efficient.
Humans typically:
- Read content
- Hover over elements
- Pause before actions
- Explore pages
I encouraged more natural cursor movement and hovering behavior.
StoryCaptcha evaluates:
- Mouse path curvature
- Teleport detection
- Interaction timing
So this adjustment had a measurable impact.
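To make "curvature" and "teleport" concrete, here is a sketch that generates a curved cursor path (a quadratic Bézier curve whose control point is pushed sideways by a random amount) and scores it with two assumed metrics: straightness as chord length divided by travelled length, and a teleport flag for any single step longer than a threshold. The function names, the 150 px threshold, and both definitions are assumptions, not StoryCaptcha's formulas. For context, Playwright's `mouse.move(x, y, steps=...)` interpolates in a straight line, and with the default `steps=1` it reaches the target in one event, so a curved path has to be fed in point by point.

```python
import math
import random
from typing import Optional

Point = tuple[float, float]


def curved_path(start: Point, end: Point, steps: int = 30,
                bend: float = 0.25, seed: Optional[int] = None) -> list[Point]:
    """Points along a quadratic Bezier curve from start to end.

    The control point is the midpoint shifted along the perpendicular by up to
    `bend` times the start-to-end distance, which gives the path its curvature.
    """
    rng = random.Random(seed)
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0
    off = rng.uniform(-bend, bend)
    cx, cy = (x0 + x1) / 2 - dy * off, (y0 + y1) / 2 + dx * off
    pts: list[Point] = []
    for i in range(steps + 1):
        t = i / steps
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        pts.append((a * x0 + b * cx + c * x1, a * y0 + b * cy + c * y1))
    return pts


def straightness(points: list[Point]) -> float:
    """Chord length / travelled length: 1.0 for a perfectly straight path."""
    travelled = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    return math.dist(points[0], points[-1]) / travelled if travelled else 1.0


def has_teleport(points: list[Point], max_step_px: float = 150.0) -> bool:
    """True if the cursor ever jumps farther than max_step_px between samples."""
    return any(math.dist(p, q) > max_step_px for p, q in zip(points, points[1:]))


path = curved_path((0, 0), (800, 600), steps=40, seed=1)
# With Playwright's sync API, each point could be sent as: page.mouse.move(x, y)
print(round(straightness(path), 3), has_teleport(path))
print(has_teleport([(0, 0), (800, 600)]))  # True: one 1000 px jump
```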
Observation #5: One Signal Refused To Cooperate
The most fascinating metric was:
Key Overlap (Rollover)
StoryCaptcha reported:
Human ≈ 25%–50% overlap
My agent consistently scored:
0%
Even after improving almost every other metric.
This was particularly interesting because it exposed a difference between simulated typing and real human keyboard behavior.
Humans frequently begin pressing the next key before fully releasing the previous key.
Many automation frameworks generate perfectly sequential key events.
The CAPTCHA was successfully identifying that distinction.
Even though the agent scored well overall, key overlap remained one of the strongest indicators that the interaction was not genuinely human.
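StoryCaptcha doesn't say exactly how it computes key overlap. A plausible definition, assumed here, is the share of consecutive keystrokes in which the next key goes down before the previous key comes back up. Under that definition, any driver that finishes each key's down/up pair before starting the next one can only score 0%. Playwright's `keyboard.press()` is documented as a `down()` followed by an `up()`, and `keyboard.type()` sends keydown, input, and keyup for one character at a time, which fits the constant 0% observed above. The function name and event format are hypothetical.

```python
def rollover_fraction(events: list[tuple[str, str, float]]) -> float:
    """Share of consecutive keystrokes where the next keydown precedes the previous keyup.

    events: (key, "down" | "up", t_ms) tuples, in any order.
    """
    presses: list[list] = []       # [down_t, up_t] per keystroke, in press order
    open_keys: dict[str, int] = {}  # key currently held -> index into presses
    for key, kind, t in sorted(events, key=lambda e: e[2]):
        if kind == "down":
            open_keys[key] = len(presses)
            presses.append([t, None])
        elif kind == "up" and key in open_keys:
            presses[open_keys.pop(key)][1] = t
    done = [p for p in presses if p[1] is not None]
    if len(done) < 2:
        return 0.0
    overlaps = sum(1 for prev, nxt in zip(done, done[1:]) if nxt[0] < prev[1])
    return overlaps / (len(done) - 1)


# Strictly sequential: every key is released before the next is pressed.
sequential = [("t", "down", 0), ("t", "up", 60), ("h", "down", 110),
              ("h", "up", 170), ("e", "down", 230), ("e", "up", 290)]
# "h" goes down while "t" is still held; "e" starts after "h" is released.
rolled = [("t", "down", 0), ("h", "down", 90), ("t", "up", 110),
          ("h", "up", 170), ("e", "down", 200), ("e", "up", 260)]

print(rollover_fraction(sequential))  # 0.0
print(rollover_fraction(rolled))      # 0.5
```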
Final Result
After roughly 10 experimental runs:
| Stage | Score (out of 100) |
|---|---|
| Initial | 56 |
| Intermediate | 60–70 |
| Optimized | 76–77 |
Eventually, the agent passed the challenge consistently.
However, the score wasn't the most valuable outcome.
The real value was understanding how behavioral features influenced the evaluation.
What I Learned
Behavioral Biometrics Are More Than Mouse Movement
Before this experiment, most discussions I encountered focused on:
- Browser fingerprints
- TLS fingerprints
- Device identification
- Network reputation
This experiment reminded me that behavior itself can become a powerful signal.
Not just what actions occur.
But how they occur.
AI Agents Create New Challenges
Traditional automation focuses on:
- Speed
- Efficiency
- Determinism
AI agents introduce:
- Exploration
- Context awareness
- Adaptive behavior
As AI agents become more common, behavioral detection systems will likely become increasingly important.
Reverse Engineering Doesn't Always Require Source Code
I never saw StoryCaptcha's implementation.
I never saw its scoring algorithm.
But by observing outputs, forming hypotheses, and iteratively adjusting behavior, I was still able to learn a surprising amount about what the system valued.
That's one of the things I enjoy most about reverse engineering:
Observe.
Hypothesize.
Test.
Repeat.
Final Thoughts
I started this experiment asking:
Can an AI agent behave like a human?
Twelve hours later, I think the more interesting question is:
Which parts of human behavior are hardest for machines to reproduce?
The answer, at least from this experiment, appears to be much more nuanced than simply moving a mouse or typing text.
And that's exactly what made the exercise worth exploring.