reallizhi
I built an open-source tool that lets AI agents hire real humans to test your product

The problem

I use AI coding tools (Claude Code, Cursor) for almost everything. Development speed is incredible — what used to take a week now takes an afternoon.

But there's a gap nobody talks about: "it compiles" ≠ "users can use it".

I kept shipping features that worked perfectly in my terminal but confused real users. And manual QA was eating all the time I saved from AI coding. The irony was painful.

Traditional UX testing tools like UserTesting or Maze didn't help either — they're built for product managers reviewing dashboards, not for AI agents writing code.

What I built

human_test() is an open-source platform that closes the loop between AI coding and real user experience.

Tell your agent: "Test my app at localhost:3000, focus on the signup flow"

Then your agent calls human_test(). Five real humans test your product with screen recording and audio narration; AI analyzes the recordings and generates a structured report, finds 3 critical issues, auto-generates fixes, and creates a PR.
You do nothing.
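To make that concrete, here is a rough sketch of what the agent-side call could send. The actual humantest-app API isn't shown in this post, so the field names and defaults below are assumptions, not the tool's real contract:

```typescript
// Hypothetical shape of a human_test() request. Field names are
// illustrative assumptions; the real humantest-app API may differ.
interface HumanTestRequest {
  url: string;         // app under test, e.g. a tunneled localhost URL
  focus: string;       // what testers should concentrate on
  testers: number;     // how many humans to recruit (post mentions 5)
  repoUrl?: string;    // optional: enables the auto-fix PR step
  webhookUrl?: string; // optional: async notification target
}

function buildHumanTestRequest(
  url: string,
  focus: string,
  opts: Partial<Omit<HumanTestRequest, "url" | "focus">> = {}
): HumanTestRequest {
  if (!/^https?:\/\//.test(url)) {
    throw new Error(`expected an http(s) URL, got: ${url}`);
  }
  // Defaults first so explicit options win; url/focus last so they
  // can't be overridden by opts.
  return { testers: 5, ...opts, url, focus };
}
```

The point is only that the request is plain structured data an agent can assemble programmatically, not something a human fills into a dashboard.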

The full workflow

  1. Create a task — provide a URL (or description for mobile/desktop apps) and what to focus on
  2. Real humans test — testers claim the task, record their screen and microphone, go through a guided feedback flow (first impression, task steps, NPS rating)
  3. AI generates a report — extracts key frames from recordings, uses vision AI to analyze usability issues, aggregates everything into a structured report
  4. Auto-fix — if you provide a repo URL, it clones your code, generates file-level diffs, and creates a PR

The entire loop runs without you touching anything after the initial call.

What makes it different

The key insight: this is not a dashboard for humans to interpret. It's a structured API that AI agents can call, parse, and act on directly.

The report is designed for machines. Each issue carries a severity tag (CRITICAL, MAJOR, or MINOR) plus three fields:

[CRITICAL] Signup button unresponsive on mobile

  • Evidence: 3/5 testers couldn't complete registration on iPhone
  • Impact: 60% of mobile users will abandon signup
  • Recommendation: Fix touch target size, minimum 44x44px

Your agent reads the severity, reads the Recommendation, and writes a targeted fix. No human interpretation needed.
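The issue format above can be modeled in a few lines. The types and field names here are inferred from the example report, not taken from the tool's actual schema:

```typescript
// Hypothetical model of the machine-readable report format shown above.
// Field names mirror the example; the real schema may differ.
type Severity = "CRITICAL" | "MAJOR" | "MINOR";

interface Issue {
  severity: Severity;
  title: string;
  evidence: string;       // e.g. "3/5 testers couldn't complete registration"
  impact: string;
  recommendation: string; // what the agent should actually change
}

// An agent would typically work through issues in severity order.
const RANK: Record<Severity, number> = { CRITICAL: 0, MAJOR: 1, MINOR: 2 };

function triage(issues: Issue[]): Issue[] {
  return [...issues].sort((a, b) => RANK[a.severity] - RANK[b.severity]);
}
```

Because severity is an enum rather than free text, sorting and filtering need no LLM call at all; the agent only spends tokens on writing the fix itself.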

Other differences from traditional UX tools:

  • Webhook-driven — async notifications when reports and code fixes are ready
  • Auto-PR — from usability issue to pull request, fully automated
  • Self-hostable — runs locally with SQLite, your data stays on your machine
  • Open source — MIT licensed

Try it

Option 1: Self-host (3 commands)

npm i -g humantest-app
humantest init
humantest start

Local SQLite database, zero external dependencies.

Option 2: AI agent skill (1 command)

npx skills add avivahe326/human-test-skill

Works with Claude Code, Cursor, Windsurf. Then just ask your agent in natural language: "Run a usability test on my checkout flow with 3 testers"

Option 3: Hosted version

Use human-test.work — zero setup.

Tech stack

Next.js 16, Prisma, NextAuth, Tailwind CSS. Supports Anthropic and OpenAI for report generation. SQLite for local dev, MySQL for production.
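Supporting both Anthropic and OpenAI for report generation usually reduces to a small selection step at startup. A sketch under assumed env-style config keys; the project's actual configuration isn't shown here:

```typescript
// Hypothetical provider selection for vision-based report generation.
// The env key names are conventional assumptions, not the project's config.
interface ProviderConfig {
  provider: "anthropic" | "openai";
  apiKey: string;
}

function pickProvider(env: Record<string, string | undefined>): ProviderConfig {
  if (env.ANTHROPIC_API_KEY) {
    return { provider: "anthropic", apiKey: env.ANTHROPIC_API_KEY };
  }
  if (env.OPENAI_API_KEY) {
    return { provider: "openai", apiKey: env.OPENAI_API_KEY };
  }
  throw new Error("no report-generation provider configured");
}
```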

I'd love to hear from you:

  • If you use AI coding tools — how do you handle usability testing today? Do you just ship and hope for the best?
  • If you try human_test() — what's your first impression? What's missing?
  • If you've built similar tools — what did you learn?

Drop a comment below or open an issue on GitHub.
