Shahraan Hussain

Posted on Jun 18 • Edited on Jun 21

Can an AI Agent Behave Like a Human? A 12-Hour Experiment with StoryCaptcha

#ai #security #automation #mcp

A day ago, I came across a LinkedIn post from Tyler Richards showcasing an experimental CAPTCHA called StoryCaptcha.

The concept was simple but unusual.

Instead of asking users to identify traffic lights or solve image puzzles, StoryCaptcha asks users to write a short story based on a random prompt and then evaluates the interaction using behavioral signals.

The goal wasn't to build a production-ready CAPTCHA.

It was an experiment exploring behavioral biometrics and user interaction patterns.

As someone working in web scraping and anti-bot research, I immediately became curious.

What happens when an AI agent attempts the challenge?

More importantly:

Can an AI-controlled browser generate interaction patterns that a behavioral CAPTCHA considers human?

I spent the next 12 hours trying to answer that question.

The Setup

For this experiment I used:

Playwright MCP
VS Code
GitHub Copilot
Chromium

The objective wasn't to bypass the CAPTCHA.

The objective was to understand how a behavioral scoring system evaluates AI-driven interactions.

First Attempt: 56/100

My first run scored 56/100 and failed.

The reason quickly became obvious.

The AI agent was behaving exactly as an automation system would:

Copying and pasting content
Completing actions immediately
Following deterministic patterns
Showing almost no hesitation

Efficient.

But not very human.

The Interesting Part

Unlike many behavioral systems, StoryCaptcha actually exposes a large portion of the signals it evaluates.

The dashboard displayed metrics such as:

Typing Signals

Typed vs Pasted
Keystrokes per character
Key-hold (dwell) profile
Key-overlap (rollover)
Rhythm variability
Non-repeating intervals

Behavioral Signals

Cognitive pauses
Inter-interaction timing
Correction behavior
Backspace usage

Mouse Signals

Mouse path curvature
Straightness
Teleport detection

Content Signals

Reads like language
On-topic for prompt

This transformed the experiment from simple testing into a feedback-driven behavioral analysis exercise.

Instead of guessing blindly, I could observe which signals were being evaluated and adjust the agent's behavior accordingly.

Observation #1: Copy-Paste Was a Dead Giveaway

Initially the AI agent preferred copying and pasting the story.

StoryCaptcha immediately detected this.

The first optimization was simple:

Instead of pasting content, I instructed the agent to type the response character by character.

The score improved.
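To make the "Typed vs Pasted" distinction concrete, here is a minimal sketch of how such a signal could be computed from an input log. StoryCaptcha's real implementation is not public, so the `InputEvent` type and the `typed_fraction` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    """Hypothetical log entry; not StoryCaptcha's actual data model."""
    kind: str  # "key" for a single keystroke, "paste" for a clipboard insert
    text: str  # characters this event added to the field


def typed_fraction(events):
    """Share of the final characters that arrived through individual keystrokes."""
    typed = sum(len(e.text) for e in events if e.kind == "key")
    total = sum(len(e.text) for e in events)
    return typed / total if total else 0.0


story = "The fox ran."
pasted_log = [InputEvent("paste", story)]            # one event, many characters
typed_log = [InputEvent("key", ch) for ch in story]  # one event per character

print(typed_fraction(pasted_log))  # 0.0
print(typed_fraction(typed_log))   # 1.0
```

A pasted story arrives as one event carrying every character, while a typed story arrives as one event per character, which is why the switch to character-by-character typing shows up so clearly.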

Observation #2: Human Typing Isn't Uniform

The next issue was typing cadence.

Humans don't type with perfectly consistent timing.

Sometimes we pause.

Sometimes we think.

Sometimes we speed up.

I instructed the agent to:

Use random keystroke delays
Avoid identical intervals
Pause naturally between thoughts

The score improved again.
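The agent in this experiment was steered through natural-language instructions, so no code was written for this step. As a rough sketch of what "random delays with pauses between thoughts" could look like if scripted directly, the function below produces one delay per character. The name `keystroke_delays` and every timing value are assumptions, not measurements from the experiment.

```python
import random


def keystroke_delays(text, rng, base_ms=120.0, jitter_ms=60.0,
                     pause_ms=(400.0, 1200.0)):
    """Return one delay in milliseconds to wait after typing each character.

    All timing values are illustrative assumptions.
    """
    delays = []
    for ch in text:
        # Gaussian jitter around a base delay, floored so it never goes negative.
        delay = max(20.0, rng.gauss(base_ms, jitter_ms))
        # A longer "thinking" pause after punctuation.
        if ch in ".,!?":
            delay += rng.uniform(*pause_ms)
        delays.append(delay)
    return delays


sample_text = "Once upon a time, it rained."
delays = keystroke_delays(sample_text, random.Random(7))
```

With Playwright's Python API, each character could then be sent with `page.keyboard.type(ch)` followed by `page.wait_for_timeout(delay)`. A single `page.keyboard.type(text, delay=...)` call applies one fixed delay to every key, which is exactly the uniform cadence this step tries to avoid.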

One metric I paid particular attention to was:

Non-Repeating Intervals

StoryCaptcha was actively measuring how repetitive the timing patterns were.
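How StoryCaptcha computes this metric is not documented. One plausible way to quantify "repetitiveness", shown here purely as an assumption, is to count how many inter-key intervals fall into distinct time buckets and to measure their spread relative to the mean. Both function names are hypothetical.

```python
from statistics import mean, pstdev


def unique_interval_ratio(intervals_ms, bucket_ms=5):
    """Fraction of intervals that land in a distinct time bucket (1.0 = none repeat)."""
    if not intervals_ms:
        return 0.0
    buckets = [round(i / bucket_ms) for i in intervals_ms]
    return len(set(buckets)) / len(buckets)


def rhythm_variability(intervals_ms):
    """Coefficient of variation: standard deviation divided by the mean interval."""
    m = mean(intervals_ms)
    return pstdev(intervals_ms) / m if m else 0.0


fixed = [100] * 20                               # scripted typing with a constant delay
varied = [92, 140, 75, 310, 118, 101, 87, 260]   # made-up, human-looking intervals

print(unique_interval_ratio(fixed))   # 0.05
print(unique_interval_ratio(varied))  # 1.0
print(rhythm_variability(fixed))      # 0.0
```

Under a measure like this, a constant delay scores close to zero on both counts, which matches how the fixed-interval typing was flagged.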

Observation #3: Humans Make Mistakes

Humans aren't perfect typists.

We:

Misspell words
Hit incorrect keys
Use backspace
Correct ourselves

Automation rarely does.

So I instructed the agent to:

Occasionally introduce spelling mistakes
Use backspace corrections
Continue naturally after correction

The dashboard reflected these behaviors through correction metrics and the overall score improved.
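The correction behavior can be sketched as a keystroke plan: occasionally press a wrong letter, press Backspace, then press the intended letter. This is an illustrative sketch, not the instructions given to the agent. `plan_keystrokes`, `replay`, and the `typo_rate` value are hypothetical.

```python
import random
import string

BACKSPACE = "Backspace"  # the key name Playwright uses for the backspace key


def plan_keystrokes(text, rng, typo_rate=0.05):
    """Return a key sequence that types `text` with occasional corrected typos."""
    keys = []
    for ch in text:
        if ch.isalpha() and rng.random() < typo_rate:
            # Press a wrong letter, then delete it before typing the right one.
            wrong = rng.choice(
                [c for c in string.ascii_lowercase if c != ch.lower()]
            )
            keys.extend([wrong, BACKSPACE])
        keys.append(ch)
    return keys


def replay(keys):
    """Apply a key sequence to an empty buffer and return the resulting text."""
    out = []
    for key in keys:
        if key == BACKSPACE:
            if out:
                out.pop()
        else:
            out.append(key)
    return "".join(out)


prompt_text = "The lighthouse keeper found a map."
plan = plan_keystrokes(prompt_text, random.Random(3), typo_rate=0.3)
```

Replaying the plan always yields the original text, so the mistakes only appear in the keystroke log and never in the submitted story.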

Observation #4: Humans Don't Instantly Click Everything

The agent was still too efficient.

Humans typically:

Read content
Hover over elements
Pause before actions
Explore pages

I instructed the agent to move the cursor along more natural paths and to hover over elements before clicking.

StoryCaptcha evaluates:

Mouse path curvature
Teleport detection
Interaction timing

So this adjustment had a measurable impact.
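To illustrate what curvature and straightness might mean here, the sketch below generates a curved cursor path with a quadratic Bézier curve and scores it as the direct distance divided by the distance travelled. The scoring formula is an assumption, since StoryCaptcha's own definition is not public, and `curved_path` and `straightness` are hypothetical names.

```python
import math
import random


def curved_path(start, end, rng, steps=25, bend=0.2):
    """Points along a quadratic Bezier curve from `start` to `end`."""
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0
    dist = math.hypot(dx, dy)
    if dist == 0:
        return [(float(x0), float(y0))]
    # Control point: the midpoint pushed sideways, perpendicular to the line.
    offset = rng.uniform(-bend, bend) * dist
    cx = (x0 + x1) / 2 - dy / dist * offset
    cy = (y0 + y1) / 2 + dx / dist * offset
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        points.append((x, y))
    return points


def straightness(points):
    """Direct distance divided by travelled distance (1.0 = perfectly straight)."""
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    direct = math.dist(points[0], points[-1])
    return direct / travelled if travelled else 1.0


path = curved_path((100, 200), (700, 420), random.Random(1))
```

Each point could be passed to Playwright's `page.mouse.move(x, y)` with a short wait in between. A cursor that jumps straight to the target has no intermediate points at all, which is presumably what "teleport detection" looks for.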

Observation #5: One Signal Refused To Cooperate

The most fascinating metric was:

Key Overlap (Rollover)

StoryCaptcha reported:

Human ≈ 25%–50% overlap

My agent consistently scored:

0%

Even after improving almost every other metric.

This was particularly interesting because it exposed a difference between simulated typing and real human keyboard behavior.

Humans frequently begin pressing the next key before fully releasing the previous key.

Many automation frameworks generate perfectly sequential key events.

The CAPTCHA was successfully identifying that distinction.

Despite scoring well overall, this remained one of the strongest indicators that the interaction was not genuinely human.
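A small sketch makes the rollover idea concrete. It assumes overlap is measured as the share of consecutive key pairs where the next key goes down before the previous one is released. That definition, the `KeyPress` type, and the timestamps are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class KeyPress:
    """Hypothetical record of one key press with down and up timestamps."""
    key: str
    down_ms: float
    up_ms: float


def overlap_ratio(presses):
    """Fraction of consecutive pairs where the next key goes down before the previous is up."""
    pairs = list(zip(presses, presses[1:]))
    if not pairs:
        return 0.0
    overlapping = sum(1 for prev, nxt in pairs if nxt.down_ms < prev.up_ms)
    return overlapping / len(pairs)


# Strictly sequential events, as many automation frameworks emit them.
sequential = [KeyPress("t", 0, 80), KeyPress("h", 120, 200), KeyPress("e", 240, 320)]
# "h" goes down at 90 ms, before "t" is released at 110 ms.
rolled = [KeyPress("t", 0, 110), KeyPress("h", 90, 190), KeyPress("e", 230, 310)]

print(overlap_ratio(sequential))  # 0.0
print(overlap_ratio(rolled))      # 0.5
```

Playwright does expose separate `keyboard.down` and `keyboard.up` calls, so interleaved key events are possible in principle. Whether they would move this metric was not tested in the experiment.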

Final Result

After roughly 10 experimental runs:

Attempt	Score
Initial	56
Intermediate	60–70
Optimized	76–77

Eventually, the agent passed the challenge consistently.

However, the score wasn't the most valuable outcome.

The real value was understanding how behavioral features influenced the evaluation.

What I Learned

Behavioral Biometrics Are More Than Mouse Movement

Before this experiment, most discussions I encountered focused on:

Browser fingerprints
TLS fingerprints
Device identification
Network reputation

This experiment reminded me that behavior itself can become a powerful signal.

Not just what actions occur.

But how they occur.

AI Agents Create New Challenges

Traditional automation focuses on:

Speed
Efficiency
Determinism

AI agents introduce:

Exploration
Context awareness
Adaptive behavior

As AI agents become more common, behavioral detection systems will likely become increasingly important.

Reverse Engineering Doesn't Always Require Source Code

I never saw StoryCaptcha's implementation.

I never saw its scoring algorithm.

But by observing outputs, forming hypotheses, and iteratively adjusting behavior, I was still able to learn a surprising amount about what the system valued.

That's one of the things I enjoy most about reverse engineering:

Observe.

Hypothesize.

Test.

Repeat.

Final Thoughts

I started this experiment asking:

Can an AI agent behave like a human?

Twelve hours later, I think the more interesting question is:

Which parts of human behavior are hardest for machines to reproduce?

The answer, at least from this experiment, appears to be much more nuanced than simply moving a mouse or typing text.

And that's exactly what made the exercise worth exploring.

#antibotbypass #antibot #cybersecurity #webscraping

Top comments (1)

Lucas Him • Jul 10

I went through the exact same rabbit hole last year, tweaking typing cadence, adding random mouse curves, faking backspace patterns. Eventually realized I was solving the wrong problem: the browser fingerprint itself was what flagged me before any behavioral signals came into play.

Switched to browser-act CLI's stealth mode which handles both layers. It masks the fingerprint (canvas, WebGL, font enumeration) and the stealth-extract command gets you clean data from Cloudflare-protected pages without the behavioral gymnastics. For agent workflows: npx skills add browser-act/skills --skill browser-act drops it into Claude Code as a skill.

The key-overlap finding here is fascinating though. I've been wondering why my simulated typing never felt right, and that 0% rollover metric explains it perfectly.