## The problem
I use AI coding tools (Claude Code, Cursor) for almost everything. Development speed is incredible — what used to take a week now takes an afternoon.
But there's a gap nobody talks about: "it compiles" ≠ "users can use it".
I kept shipping features that worked perfectly in my terminal but confused real users. And manual QA was eating all the time I saved from AI coding. The irony was painful.
Traditional UX testing tools like UserTesting or Maze didn't help either — they're built for product managers reviewing dashboards, not for AI agents writing code.
## What I built
human_test() is an open-source platform that closes the loop between AI coding and real user experience.
Tell your agent: "Test my app at localhost:3000, focus on the signup flow"
Your agent calls human_test(). Five real humans test your product with screen recording and audio narration; AI analyzes the recordings and generates a structured report (say it surfaces 3 critical issues), auto-generates fixes, and opens a PR.
You do nothing.
## The full workflow
- Create a task — provide a URL (or description for mobile/desktop apps) and what to focus on
- Real humans test — testers claim the task, record their screen and microphone, go through a guided feedback flow (first impression, task steps, NPS rating)
- AI generates a report — extracts key frames from recordings, uses vision AI to analyze usability issues, aggregates everything into a structured report
- Auto-fix — if you provide a repo URL, it clones your code, generates file-level diffs, and creates a PR
The entire loop runs without you touching anything after the initial call.
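To make the shape of that loop concrete, here is a sketch of the agent-side call. The endpoint path (`POST /api/tasks`), the field names, and the `buildTask`/`createTask` helpers are illustrative assumptions, not the documented API:

```typescript
// Hypothetical request shape for creating a usability-test task.
// Field names are assumptions based on the workflow described above.
interface TaskRequest {
  url: string;          // app under test, e.g. http://localhost:3000
  focus: string;        // what testers should concentrate on
  testers: number;      // how many humans to recruit
  repoUrl?: string;     // optional: enables auto-fix + PR creation
  webhookUrl?: string;  // optional: where async notifications land
}

// Pure helper so the payload can be validated before it leaves the agent.
function buildTask(
  url: string,
  focus: string,
  testers = 5,
  extras: Partial<TaskRequest> = {}
): TaskRequest {
  if (testers < 1) throw new Error("need at least one tester");
  return { ...extras, url, focus, testers };
}

// Hypothetical HTTP call; the real endpoint and response may differ.
async function createTask(baseUrl: string, task: TaskRequest): Promise<string> {
  const res = await fetch(`${baseUrl}/api/tasks`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(task),
  });
  if (!res.ok) throw new Error(`task creation failed: ${res.status}`);
  return (await res.json()).taskId; // poll or await the webhook with this id
}
```

After this call returns a task id, the agent has nothing left to do but wait for the report notification.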
## What makes it different
The key insight: this is not a dashboard for humans to interpret. It's a structured API that AI agents can call, parse, and act on directly.
The report is designed for machines. Each issue has a severity tag like CRITICAL, MAJOR, or MINOR, plus three fields:
```
[CRITICAL] Signup button unresponsive on mobile
- Evidence: 3/5 testers couldn't complete registration on iPhone
- Impact: 60% of mobile users will abandon signup
- Recommendation: Fix touch target size, minimum 44x44px
```
Your agent reads the severity and the Recommendation, then writes a targeted fix. No human interpretation needed.
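As a sketch of what "machine-readable" means in practice, here is a minimal parser for the issue format shown above. The exact wire format is an assumption; this only illustrates how an agent could turn the text into structured issues it can act on:

```typescript
// Parses issues of the form:
//   [SEVERITY] Title
//   - Evidence: ...
//   - Impact: ...
//   - Recommendation: ...
type Severity = "CRITICAL" | "MAJOR" | "MINOR";

interface Issue {
  severity: Severity;
  title: string;
  evidence?: string;
  impact?: string;
  recommendation?: string;
}

function parseReport(report: string): Issue[] {
  const issues: Issue[] = [];
  let current: Issue | null = null;
  for (const raw of report.split("\n")) {
    const line = raw.trim();
    // New issue header, e.g. "[CRITICAL] Signup button unresponsive on mobile"
    const head = line.match(/^\[(CRITICAL|MAJOR|MINOR)\]\s+(.+)$/);
    if (head) {
      current = { severity: head[1] as Severity, title: head[2] };
      issues.push(current);
      continue;
    }
    // Field line belonging to the current issue
    const field = line.match(/^-\s*(Evidence|Impact|Recommendation):\s*(.+)$/);
    if (field && current) {
      const key = field[1].toLowerCase() as "evidence" | "impact" | "recommendation";
      current[key] = field[2];
    }
  }
  return issues;
}
```

An agent can then filter for `severity === "CRITICAL"` and feed each `recommendation` straight into its fix-generation step.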
Other differences from traditional UX tools:
- Webhook-driven — async notifications when reports and code fixes are ready
- Auto-PR — from usability issue to pull request, fully automated
- Self-hostable — runs locally with SQLite, your data stays on your machine
- Open source — MIT licensed
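The webhook-driven point is the piece that keeps the loop hands-off. The event names and payload fields below are hypothetical; this only sketches how an agent-side service might route the async notifications:

```typescript
// Hypothetical webhook payloads — the real event names and fields may differ.
type WebhookEvent =
  | { event: "report.ready"; taskId: string; reportUrl: string }
  | { event: "fix.ready"; taskId: string; prUrl: string };

// Decide what the agent should do next for each notification.
function routeWebhook(payload: WebhookEvent): string {
  switch (payload.event) {
    case "report.ready":
      // Hand the report URL to the agent for parsing and triage.
      return `fetch-report:${payload.reportUrl}`;
    case "fix.ready":
      // A PR was opened; queue it for review / CI.
      return `review-pr:${payload.prUrl}`;
    default:
      throw new Error("unknown webhook event");
  }
}
```

In a real deployment this function would sit behind an HTTP handler; the routing logic itself stays a pure, testable function.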
## Try it
### Option 1: Self-host (3 commands)

```shell
npm i -g humantest-app
humantest init
humantest start
```

Local SQLite database, zero external dependencies.
### Option 2: AI agent skill (1 command)

```shell
npx skills add avivahe326/human-test-skill
```

Works with Claude Code, Cursor, and Windsurf. Then just ask your agent in natural language: "Run a usability test on my checkout flow with 3 testers"
### Option 3: Hosted version

Use human-test.work. Zero setup.
## Tech stack
Next.js 16, Prisma, NextAuth, Tailwind CSS. Supports Anthropic and OpenAI for report generation. SQLite for local dev, MySQL for production.
## Links
- GitHub: github.com/avivahe326/humantest
- Live: human-test.work
I'd love to hear from you:
- If you use AI coding tools — how do you handle usability testing today? Do you just ship and hope for the best?
- If you try human_test() — what's your first impression? What's missing?
- If you've built similar tools — what did you learn?
Drop a comment below or open an issue on GitHub.