<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shahraan Hussain</title>
    <description>The latest articles on DEV Community by Shahraan Hussain (@shahraan_hussain_b42640e7).</description>
    <link>https://dev.to/shahraan_hussain_b42640e7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2558281%2F74643e2a-7c6d-489c-b67a-6eed2448489f.png</url>
      <title>DEV Community: Shahraan Hussain</title>
      <link>https://dev.to/shahraan_hussain_b42640e7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shahraan_hussain_b42640e7"/>
    <language>en</language>
    <item>
      <title>Can an AI Agent Behave Like a Human? A 12-Hour Experiment with StoryCaptcha</title>
      <dc:creator>Shahraan Hussain</dc:creator>
      <pubDate>Thu, 18 Jun 2026 13:33:46 +0000</pubDate>
      <link>https://dev.to/shahraan_hussain_b42640e7/can-an-ai-agent-behave-like-a-human-a-12-hour-experiment-with-storycaptcha-1661</link>
      <guid>https://dev.to/shahraan_hussain_b42640e7/can-an-ai-agent-behave-like-a-human-a-12-hour-experiment-with-storycaptcha-1661</guid>
      <description>&lt;p&gt;A day ago, I came across a LinkedIn post from Tyler Richards showcasing an experimental CAPTCHA called StoryCaptcha.&lt;/p&gt;

&lt;p&gt;The concept was simple but unusual.&lt;/p&gt;

&lt;p&gt;Instead of asking users to identify traffic lights or solve image puzzles, StoryCaptcha asks users to write a short story based on a random prompt and then evaluates the interaction using behavioral signals.&lt;/p&gt;

&lt;p&gt;The goal wasn't to build a production-ready CAPTCHA.&lt;/p&gt;

&lt;p&gt;It was an experiment exploring behavioral biometrics and user interaction patterns.&lt;/p&gt;

&lt;p&gt;As someone working in web scraping and anti-bot research, I immediately became curious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when an AI agent attempts the challenge?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;More importantly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can an AI-controlled browser generate interaction patterns that a behavioral CAPTCHA considers human?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I spent the next 12 hours trying to answer that question.&lt;/p&gt;




&lt;h2&gt;The Setup&lt;/h2&gt;

&lt;p&gt;For this experiment I used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Playwright MCP&lt;/li&gt;
&lt;li&gt;VS Code&lt;/li&gt;
&lt;li&gt;GitHub Copilot&lt;/li&gt;
&lt;li&gt;Chromium&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The objective wasn't to bypass the CAPTCHA.&lt;/p&gt;

&lt;p&gt;The objective was to understand how a behavioral scoring system evaluates AI-driven interactions.&lt;/p&gt;




&lt;h2&gt;First Attempt: 56/100&lt;/h2&gt;

&lt;p&gt;My first run scored 56/100 and failed.&lt;/p&gt;

&lt;p&gt;The reason quickly became obvious.&lt;/p&gt;

&lt;p&gt;The AI agent was behaving exactly as an automation script would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copying and pasting content&lt;/li&gt;
&lt;li&gt;Completing actions immediately&lt;/li&gt;
&lt;li&gt;Following deterministic patterns&lt;/li&gt;
&lt;li&gt;Showing almost no hesitation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Efficient.&lt;/p&gt;

&lt;p&gt;But not very human.&lt;/p&gt;




&lt;h2&gt;The Interesting Part&lt;/h2&gt;

&lt;p&gt;Unlike many behavioral systems, StoryCaptcha actually exposes a large portion of the signals it evaluates.&lt;/p&gt;

&lt;p&gt;The dashboard displayed metrics such as:&lt;/p&gt;

&lt;h3&gt;Typing Signals&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Typed vs Pasted&lt;/li&gt;
&lt;li&gt;Keystrokes per character&lt;/li&gt;
&lt;li&gt;Key-hold (dwell) profile&lt;/li&gt;
&lt;li&gt;Key-overlap (rollover)&lt;/li&gt;
&lt;li&gt;Rhythm variability&lt;/li&gt;
&lt;li&gt;Non-repeating intervals&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Behavioral Signals&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Cognitive pauses&lt;/li&gt;
&lt;li&gt;Inter-interaction timing&lt;/li&gt;
&lt;li&gt;Correction behavior&lt;/li&gt;
&lt;li&gt;Backspace usage&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Mouse Signals&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Mouse path curvature&lt;/li&gt;
&lt;li&gt;Straightness&lt;/li&gt;
&lt;li&gt;Teleport detection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Content Signals&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Reads like language&lt;/li&gt;
&lt;li&gt;On-topic for prompt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This transformed the experiment from simple testing into a feedback-driven behavioral analysis exercise.&lt;/p&gt;

&lt;p&gt;Instead of guessing blindly, I could observe which signals were being evaluated and adjust the agent's behavior accordingly.&lt;/p&gt;




&lt;h2&gt;Observation #1: Copy-Paste Was a Dead Giveaway&lt;/h2&gt;

&lt;p&gt;Initially, the AI agent defaulted to copying and pasting the story.&lt;/p&gt;

&lt;p&gt;StoryCaptcha immediately detected this.&lt;/p&gt;

&lt;p&gt;The first optimization was simple:&lt;/p&gt;

&lt;p&gt;Instead of pasting content, I instructed the agent to type the response character by character.&lt;/p&gt;

&lt;p&gt;The score improved.&lt;/p&gt;




&lt;h2&gt;Observation #2: Human Typing Isn't Uniform&lt;/h2&gt;

&lt;p&gt;The next issue was typing cadence.&lt;/p&gt;

&lt;p&gt;Humans don't type with perfectly consistent timing.&lt;/p&gt;

&lt;p&gt;Sometimes we pause.&lt;/p&gt;

&lt;p&gt;Sometimes we think.&lt;/p&gt;

&lt;p&gt;Sometimes we speed up.&lt;/p&gt;

&lt;p&gt;I instructed the agent to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use random keystroke delays&lt;/li&gt;
&lt;li&gt;Avoid identical intervals&lt;/li&gt;
&lt;li&gt;Pause naturally between thoughts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The score improved again.&lt;/p&gt;

&lt;p&gt;One metric I paid particular attention to was:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Non-Repeating Intervals&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;StoryCaptcha was actively measuring how repetitive the timing patterns were.&lt;/p&gt;
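
&lt;p&gt;To make that metric concrete, here is a minimal Python sketch of how timing repetitiveness could be measured from key-down timestamps. I never saw StoryCaptcha's scoring code, so the function names, the coefficient-of-variation measure, and the sample timestamps are all assumptions for illustration, not the system's actual formula.&lt;/p&gt;

```python
from statistics import mean, pstdev


def inter_key_intervals(press_times_ms):
    """Gaps between consecutive key-down timestamps, in milliseconds."""
    return [b - a for a, b in zip(press_times_ms, press_times_ms[1:])]


def rhythm_variability(press_times_ms):
    """Coefficient of variation (std / mean) of the inter-key gaps."""
    gaps = inter_key_intervals(press_times_ms)
    if not gaps or mean(gaps) == 0:
        return 0.0
    return pstdev(gaps) / mean(gaps)


def non_repeating_ratio(press_times_ms):
    """Share of inter-key gaps that are distinct values."""
    gaps = inter_key_intervals(press_times_ms)
    if not gaps:
        return 0.0
    return len(set(gaps)) / len(gaps)


# Hypothetical samples: a fixed-delay script and a more irregular typist.
scripted = [0, 100, 200, 300, 400]
varied = [0, 140, 230, 610, 700]

for name, sample in (("scripted", scripted), ("varied", varied)):
    print(name, round(rhythm_variability(sample), 2), non_repeating_ratio(sample))
```

&lt;p&gt;With these samples, the evenly spaced sequence has a variability of 0 and only a quarter of its gaps are distinct, while the varied sequence scores about 0.69 with three quarters of its gaps distinct.&lt;/p&gt;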




&lt;h2&gt;Observation #3: Humans Make Mistakes&lt;/h2&gt;

&lt;p&gt;Humans aren't perfect typists.&lt;/p&gt;

&lt;p&gt;We:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Misspell words&lt;/li&gt;
&lt;li&gt;Hit incorrect keys&lt;/li&gt;
&lt;li&gt;Use backspace&lt;/li&gt;
&lt;li&gt;Correct ourselves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Automation rarely does.&lt;/p&gt;

&lt;p&gt;So I instructed the agent to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Occasionally introduce spelling mistakes&lt;/li&gt;
&lt;li&gt;Use backspace corrections&lt;/li&gt;
&lt;li&gt;Continue naturally after correction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The dashboard reflected these behaviors in its correction metrics, and the overall score improved.&lt;/p&gt;




&lt;h2&gt;Observation #4: Humans Don't Instantly Click Everything&lt;/h2&gt;

&lt;p&gt;The agent was still too efficient.&lt;/p&gt;

&lt;p&gt;Humans typically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read content&lt;/li&gt;
&lt;li&gt;Hover over elements&lt;/li&gt;
&lt;li&gt;Pause before actions&lt;/li&gt;
&lt;li&gt;Explore pages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I instructed the agent to move the cursor along more natural paths and to hover over elements before acting on them.&lt;/p&gt;

&lt;p&gt;StoryCaptcha evaluates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mouse path curvature&lt;/li&gt;
&lt;li&gt;Teleport detection&lt;/li&gt;
&lt;li&gt;Inter-interaction timing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So this adjustment had a measurable impact.&lt;/p&gt;
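
&lt;p&gt;The mouse signals can be approximated in a few lines of Python. This is only a sketch under my own assumptions: the dashboard did not show how StoryCaptcha computes straightness or detects teleports, so the ratio used here, the helper names, and the sample coordinates are hypothetical.&lt;/p&gt;

```python
import math


def path_length(points):
    """Total distance travelled along consecutive (x, y) samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def straightness(points):
    """Start-to-end distance divided by distance travelled.

    1.0 means a perfectly straight path; lower values mean more curvature.
    """
    travelled = path_length(points)
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled


def largest_jump(points):
    """Longest single step between two samples; a very large value
    suggests the cursor teleported instead of moving."""
    steps = (math.dist(a, b) for a, b in zip(points, points[1:]))
    return max(steps, default=0.0)


# Hypothetical cursor samples in pixels.
straight = [(0, 0), (50, 0), (100, 0)]
bent = [(0, 0), (0, 30), (40, 30)]
teleport = [(0, 0), (300, 400)]

print(straightness(straight), round(straightness(bent), 3), largest_jump(teleport))
```

&lt;p&gt;The straight sample scores 1.0, the bent sample about 0.714, and the teleport sample covers 500 pixels in a single step, which is the pattern a scripted click without intermediate movement tends to produce.&lt;/p&gt;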




&lt;h2&gt;Observation #5: One Signal Refused To Cooperate&lt;/h2&gt;

&lt;p&gt;The most fascinating metric was:&lt;/p&gt;

&lt;h3&gt;Key Overlap (Rollover)&lt;/h3&gt;

&lt;p&gt;StoryCaptcha reported:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Human ≈ 25%–50% overlap&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My agent consistently scored:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;0%&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That did not change even after I improved almost every other metric.&lt;/p&gt;

&lt;p&gt;This was particularly interesting because it exposed a difference between simulated typing and real human keyboard behavior.&lt;/p&gt;

&lt;p&gt;Humans frequently begin pressing the next key before fully releasing the previous key.&lt;/p&gt;

&lt;p&gt;Many automation frameworks generate strictly sequential key events: each key is released before the next one is pressed.&lt;/p&gt;

&lt;p&gt;The CAPTCHA was successfully identifying that distinction.&lt;/p&gt;

&lt;p&gt;Even though the agent scored well overall, this signal remained one of the strongest indicators that the interaction was not genuinely human.&lt;/p&gt;
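
&lt;p&gt;A rough way to compute an overlap metric from key-down and key-up timestamps is sketched below in Python. The event format (key, down time, up time), the function names, and the sample data are my own assumptions; StoryCaptcha's real calculation may differ.&lt;/p&gt;

```python
def overlap_durations(events):
    """For each consecutive pair of keys, the milliseconds the earlier key
    is still held after the later key goes down (0 if it was released first).

    Each event is a (key, down_ms, up_ms) tuple.
    """
    ordered = sorted(events, key=lambda event: event[1])
    return [max(0, prev[2] - nxt[1]) for prev, nxt in zip(ordered, ordered[1:])]


def rollover_ratio(events):
    """Share of consecutive key pairs whose hold times overlap."""
    durations = overlap_durations(events)
    if not durations:
        return 0.0
    return sum(1 for duration in durations if duration) / len(durations)


# Hypothetical recordings: strictly sequential keys and keys with rollover.
sequential = [("h", 0, 80), ("e", 120, 200), ("y", 240, 320)]
rolled = [("s", 0, 110), ("t", 90, 180), ("o", 200, 260),
          ("r", 250, 330), ("y", 400, 470)]

print(rollover_ratio(sequential), rollover_ratio(rolled))
```

&lt;p&gt;The strictly sequential sample yields 0.0, the same value my agent kept scoring, while the sample in which two of the four key pairs overlap yields 0.5.&lt;/p&gt;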




&lt;h2&gt;Final Result&lt;/h2&gt;

&lt;p&gt;After roughly 10 experimental runs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attempt&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Initial&lt;/td&gt;
&lt;td&gt;56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intermediate&lt;/td&gt;
&lt;td&gt;60–70&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimized&lt;/td&gt;
&lt;td&gt;76–77&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Eventually, the agent passed the challenge consistently.&lt;/p&gt;

&lt;p&gt;However, the score wasn't the most valuable outcome.&lt;/p&gt;

&lt;p&gt;The real value was understanding how behavioral features influenced the evaluation.&lt;/p&gt;




&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;h3&gt;Behavioral Biometrics Are More Than Mouse Movement&lt;/h3&gt;

&lt;p&gt;Before this experiment, most discussions I encountered focused on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser fingerprints&lt;/li&gt;
&lt;li&gt;TLS fingerprints&lt;/li&gt;
&lt;li&gt;Device identification&lt;/li&gt;
&lt;li&gt;Network reputation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This experiment reminded me that behavior itself can become a powerful signal.&lt;/p&gt;

&lt;p&gt;Not just what actions occur.&lt;/p&gt;

&lt;p&gt;But how they occur.&lt;/p&gt;




&lt;h3&gt;AI Agents Create New Challenges&lt;/h3&gt;

&lt;p&gt;Traditional automation focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Speed&lt;/li&gt;
&lt;li&gt;Efficiency&lt;/li&gt;
&lt;li&gt;Determinism&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI agents introduce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exploration&lt;/li&gt;
&lt;li&gt;Context awareness&lt;/li&gt;
&lt;li&gt;Adaptive behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As AI agents become more common, behavioral detection systems will likely become increasingly important.&lt;/p&gt;




&lt;h3&gt;Reverse Engineering Doesn't Always Require Source Code&lt;/h3&gt;

&lt;p&gt;I never saw StoryCaptcha's implementation.&lt;/p&gt;

&lt;p&gt;I never saw its scoring algorithm.&lt;/p&gt;

&lt;p&gt;But by observing outputs, forming hypotheses, and iteratively adjusting behavior, I was still able to learn a surprising amount about what the system valued.&lt;/p&gt;

&lt;p&gt;That's one of the things I enjoy most about reverse engineering:&lt;/p&gt;

&lt;p&gt;Observe.&lt;/p&gt;

&lt;p&gt;Hypothesize.&lt;/p&gt;

&lt;p&gt;Test.&lt;/p&gt;

&lt;p&gt;Repeat.&lt;/p&gt;




&lt;h2&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;I started this experiment asking:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can an AI agent behave like a human?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Twelve hours later, I think the more interesting question is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which parts of human behavior are hardest for machines to reproduce?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer, at least from this experiment, appears to be much more nuanced than simply moving a mouse or typing text.&lt;/p&gt;

&lt;p&gt;And that's exactly what made the exercise worth exploring.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>automation</category>
      <category>mcp</category>
    </item>
  </channel>
</rss>
