<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: reallizhi</title>
    <description>The latest articles on DEV Community by reallizhi (@reallizhi).</description>
    <link>https://dev.to/reallizhi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3813347%2F649e054b-86f2-467d-a588-9b6b1929a97c.png</url>
      <title>DEV Community: reallizhi</title>
      <link>https://dev.to/reallizhi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/reallizhi"/>
    <language>en</language>
    <item>
      <title>I built an open-source tool that lets AI agents hire real humans to test your product</title>
      <dc:creator>reallizhi</dc:creator>
      <pubDate>Sun, 08 Mar 2026 19:17:58 +0000</pubDate>
      <link>https://dev.to/reallizhi/i-built-an-open-source-tool-that-lets-ai-agents-hire-real-humans-to-test-your-product-m9a</link>
      <guid>https://dev.to/reallizhi/i-built-an-open-source-tool-that-lets-ai-agents-hire-real-humans-to-test-your-product-m9a</guid>
      <description>&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;I use AI coding tools (Claude Code, Cursor) for almost everything. Development speed is incredible — what used to take a week now takes an afternoon.&lt;/p&gt;

&lt;p&gt;But there's a gap nobody talks about: "it compiles" ≠ "users can use it".&lt;/p&gt;

&lt;p&gt;I kept shipping features that worked perfectly in my terminal but confused real users. And manual QA was eating all the time I saved from AI coding. The irony was painful.&lt;/p&gt;

&lt;p&gt;Traditional UX testing tools like UserTesting or Maze didn't help either — they're built for product managers reviewing dashboards, not for AI agents writing code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;human_test()&lt;/strong&gt; is an open-source platform that closes the loop between AI coding and real user experience.&lt;/p&gt;

&lt;p&gt;Tell your agent: "Test my app at localhost:3000, focus on the signup flow"&lt;/p&gt;

&lt;p&gt;Then your agent calls human_test(). Five real humans test your product with screen recording and audio narration. AI analyzes the recordings, generates a structured report, finds 3 critical issues, auto-generates fixes, and opens a PR.&lt;br&gt;
  You do nothing.&lt;/p&gt;

&lt;h3&gt;
  
  
  The full workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create a task&lt;/strong&gt; — provide a URL (or description for mobile/desktop apps) and what to focus on&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real humans test&lt;/strong&gt; — testers claim the task, record their screen and microphone, go through a guided feedback flow (first impression, task steps, NPS rating)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI generates a report&lt;/strong&gt; — extracts key frames from recordings, uses vision AI to analyze usability issues, aggregates everything into a structured report&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-fix&lt;/strong&gt; — if you provide a repo URL, it clones your code, generates file-level diffs, and creates a PR&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The entire loop runs without you touching anything after the initial call.&lt;/p&gt;
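&lt;p&gt;As a rough sketch, the task an agent hands to human_test() might carry fields like these. All names here (target_url, focus, repo_url) are illustrative assumptions, not the documented API:&lt;/p&gt;

```python
# Hypothetical sketch of the payload an agent could assemble for human_test().
# Every field name is an illustrative assumption, not the real API schema.
def build_task(url, focus, testers=5, repo_url=None):
    """Assemble a usability-test request (illustrative fields only)."""
    task = {
        "target_url": url,    # app under test, or a description for native apps
        "focus": focus,       # what testers should concentrate on
        "testers": testers,   # how many humans to recruit
    }
    if repo_url:              # optional: enables the auto-fix / PR step
        task["repo_url"] = repo_url
    return task

task = build_task("http://localhost:3000", "signup flow",
                  repo_url="https://github.com/you/your-app")
print(task)
```

&lt;p&gt;The point is only that every field is something an agent can fill in programmatically from the one-line instruction you typed.&lt;/p&gt;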

&lt;h2&gt;
  
  
  What makes it different
&lt;/h2&gt;

&lt;p&gt;The key insight: this is not a dashboard for humans to interpret. It's a &lt;strong&gt;structured API that AI agents can call, parse, and act on directly&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The report is designed for machines. Each issue has a severity tag like CRITICAL, MAJOR, or MINOR, plus three fields:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;[CRITICAL] Signup button unresponsive on mobile&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Evidence:&lt;/strong&gt; 3/5 testers couldn't complete registration on iPhone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact:&lt;/strong&gt; 60% of mobile users will abandon signup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recommendation:&lt;/strong&gt; Fix touch target size, minimum 44x44px&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Your agent reads the severity, reads the Recommendation, and writes a targeted fix. No human interpretation needed.&lt;/p&gt;
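&lt;p&gt;In code, that consumption step could look like this minimal sketch, assuming the report arrives as JSON with severity, evidence, impact, and recommendation fields (the exact schema is an assumption mirroring the example issue above):&lt;/p&gt;

```python
# Hypothetical sketch of how an agent could consume the structured report.
# The JSON shape (severity / evidence / impact / recommendation) mirrors the
# example issue above but is an assumption, not the documented schema.
import json

report_json = """
{"issues": [
  {"severity": "CRITICAL",
   "title": "Signup button unresponsive on mobile",
   "evidence": "3/5 testers could not complete registration on iPhone",
   "impact": "60% of mobile users will abandon signup",
   "recommendation": "Fix touch target size, minimum 44x44px"},
  {"severity": "MINOR",
   "title": "Footer link color has low contrast",
   "evidence": "1/5 testers mentioned it",
   "impact": "Cosmetic",
   "recommendation": "Darken the link color"}
]}
"""

report = json.loads(report_json)
# An agent would act on CRITICAL issues first, feeding each
# recommendation straight into a code-fix prompt.
critical = [i for i in report["issues"] if i["severity"] == "CRITICAL"]
for issue in critical:
    print(issue["title"] + ": " + issue["recommendation"])
```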

&lt;p&gt;Other differences from traditional UX tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Webhook-driven&lt;/strong&gt; — async notifications when reports and code fixes are ready&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-PR&lt;/strong&gt; — from usability issue to pull request, fully automated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-hostable&lt;/strong&gt; — runs locally with SQLite, your data stays on your machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open source&lt;/strong&gt; — MIT licensed&lt;/li&gt;
&lt;/ul&gt;
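&lt;p&gt;For the webhook-driven part, a receiver might dispatch on an event type roughly like this. The event names ("report.ready", "fix.ready") and payload fields are assumptions for illustration, not the documented webhook contract:&lt;/p&gt;

```python
# Hypothetical sketch of a webhook receiver for the async notifications.
# Event names and payload fields are illustrative assumptions, not the
# documented webhook contract.
def handle_webhook(event):
    """Dispatch on the event type an agent might receive."""
    kind = event.get("type")
    if kind == "report.ready":
        # fetch and parse the structured report, then plan fixes
        return "fetch report " + event["report_id"]
    if kind == "fix.ready":
        # a PR was opened; hand the link to the developer or CI
        return "review PR " + event["pr_url"]
    return "ignore"

print(handle_webhook({"type": "report.ready", "report_id": "abc123"}))
```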

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Option 1: Self-host (3 commands)&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;npm i -g humantest-app
humantest init
humantest start
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Local SQLite database, zero external dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 2: AI agent skill (1 command)&lt;/strong&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;npx skills add avivahe326/human-test-skill&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Works with Claude Code, Cursor, Windsurf. Then just ask your agent in natural language: "Run a usability test on my checkout flow with 3 testers"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 3: Hosted version&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use &lt;a href="https://human-test.work" rel="noopener noreferrer"&gt;human-test.work&lt;/a&gt; — zero setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tech stack
&lt;/h2&gt;

&lt;p&gt;Next.js 16, Prisma, NextAuth, Tailwind CSS. Supports Anthropic and OpenAI for report generation. SQLite for local dev, MySQL for production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/avivahe326/humantest" rel="noopener noreferrer"&gt;github.com/avivahe326/humantest&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live&lt;/strong&gt;: &lt;a href="https://human-test.work" rel="noopener noreferrer"&gt;human-test.work&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I'd love to hear from you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;If you use AI coding tools&lt;/strong&gt; — how do you handle usability testing today? Do you just ship and hope for the best?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you try human_test()&lt;/strong&gt; — what's your first impression? What's missing?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If you've built similar tools&lt;/strong&gt; — what did you learn?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drop a comment below or open an issue on &lt;a href="https://github.com/avivahe326/humantest" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>testing</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
