## The problem
I use AI coding tools (Claude Code, Cursor) for almost everything. Development speed is incredible — what used to take a week now takes an afternoon.
But there's a gap nobody talks about: "it compiles" ≠ "users can use it".
I kept shipping features that worked perfectly in my terminal but confused real users. And manual QA was eating all the time I saved from AI coding. The irony was painful.
Traditional UX testing tools like UserTesting or Maze didn't help either — they're built for product managers reviewing dashboards, not for AI agents writing code.
## What I built
human_test() is an open-source platform that closes the loop between AI coding and real user experience.
Tell your agent: "Test my app at localhost:3000, focus on the signup flow"
Your agent calls human_test(). Five real humans test your product with screen recording and audio narration; AI analyzes the recordings and generates a structured report (say it surfaces 3 critical issues), auto-generates fixes, and opens a PR.
You do nothing.
## The full workflow
- Create a task — provide a URL (or description for mobile/desktop apps) and what to focus on
- Real humans test — testers claim the task, record their screen and microphone, go through a guided feedback flow (first impression, task steps, NPS rating)
- AI generates a report — extracts key frames from recordings, uses vision AI to analyze usability issues, aggregates everything into a structured report
- Auto-fix — if you provide a repo URL, it clones your code, generates file-level diffs, and creates a PR
The entire loop runs without you touching anything after the initial call.
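To make the shape of that loop concrete, here is a sketch of the agent-side call. The endpoint path (`POST /api/tasks`), the field names, and the `buildTask`/`createTask` helpers are illustrative assumptions, not the documented API:

```typescript
// Hypothetical request shape for creating a usability-test task.
// Field names are assumptions based on the workflow described above.
interface TaskRequest {
  url: string;          // app under test, e.g. http://localhost:3000
  focus: string;        // what testers should concentrate on
  testers: number;      // how many humans to recruit
  repoUrl?: string;     // optional: enables auto-fix + PR creation
  webhookUrl?: string;  // optional: where async notifications land
}

// Pure helper so the payload can be validated before it leaves the agent.
function buildTask(
  url: string,
  focus: string,
  testers = 5,
  extras: Partial<TaskRequest> = {}
): TaskRequest {
  if (testers < 1) throw new Error("need at least one tester");
  return { ...extras, url, focus, testers };
}

// Hypothetical HTTP call; the real endpoint and response may differ.
async function createTask(baseUrl: string, task: TaskRequest): Promise<string> {
  const res = await fetch(`${baseUrl}/api/tasks`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(task),
  });
  if (!res.ok) throw new Error(`task creation failed: ${res.status}`);
  return (await res.json()).taskId; // poll or await the webhook with this id
}
```

After this call returns a task id, the agent has nothing left to do but wait for the report notification.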
## What makes it different
The key insight: this is not a dashboard for humans to interpret. It's a structured API that AI agents can call, parse, and act on directly.
The report is designed for machines. Each issue has a severity tag like CRITICAL, MAJOR, or MINOR, plus three fields:
```
[CRITICAL] Signup button unresponsive on mobile
- Evidence: 3/5 testers couldn't complete registration on iPhone
- Impact: 60% of mobile users will abandon signup
- Recommendation: Fix touch target size, minimum 44x44px
```
Your agent reads the severity and the Recommendation, then writes a targeted fix. No human interpretation needed.
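As a sketch of what "machine-readable" means in practice, here is a minimal parser for the issue format shown above. The exact wire format is an assumption; this only illustrates how an agent could turn the text into structured issues it can act on:

```typescript
// Parses issues of the form:
//   [SEVERITY] Title
//   - Evidence: ...
//   - Impact: ...
//   - Recommendation: ...
type Severity = "CRITICAL" | "MAJOR" | "MINOR";

interface Issue {
  severity: Severity;
  title: string;
  evidence?: string;
  impact?: string;
  recommendation?: string;
}

function parseReport(report: string): Issue[] {
  const issues: Issue[] = [];
  let current: Issue | null = null;
  for (const raw of report.split("\n")) {
    const line = raw.trim();
    // New issue header, e.g. "[CRITICAL] Signup button unresponsive on mobile"
    const head = line.match(/^\[(CRITICAL|MAJOR|MINOR)\]\s+(.+)$/);
    if (head) {
      current = { severity: head[1] as Severity, title: head[2] };
      issues.push(current);
      continue;
    }
    // Field line belonging to the current issue
    const field = line.match(/^-\s*(Evidence|Impact|Recommendation):\s*(.+)$/);
    if (field && current) {
      const key = field[1].toLowerCase() as "evidence" | "impact" | "recommendation";
      current[key] = field[2];
    }
  }
  return issues;
}
```

An agent can then filter for `severity === "CRITICAL"` and feed each `recommendation` straight into its fix-generation step.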
Other differences from traditional UX tools:
- Webhook-driven — async notifications when reports and code fixes are ready
- Auto-PR — from usability issue to pull request, fully automated
- Self-hostable — runs locally with SQLite, your data stays on your machine
- Open source — MIT licensed
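The webhook-driven point is the piece that keeps the loop hands-off. The event names and payload fields below are hypothetical; this only sketches how an agent-side service might route the async notifications:

```typescript
// Hypothetical webhook payloads — the real event names and fields may differ.
type WebhookEvent =
  | { event: "report.ready"; taskId: string; reportUrl: string }
  | { event: "fix.ready"; taskId: string; prUrl: string };

// Decide what the agent should do next for each notification.
function routeWebhook(payload: WebhookEvent): string {
  switch (payload.event) {
    case "report.ready":
      // Hand the report URL to the agent for parsing and triage.
      return `fetch-report:${payload.reportUrl}`;
    case "fix.ready":
      // A PR was opened; queue it for review / CI.
      return `review-pr:${payload.prUrl}`;
    default:
      throw new Error("unknown webhook event");
  }
}
```

In a real deployment this function would sit behind an HTTP handler; the routing logic itself stays a pure, testable function.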
## Try it
### Option 1: Self-host (3 commands)

```shell
npm i -g humantest-app
humantest init
humantest start
```

Local SQLite database, zero external dependencies.
### Option 2: AI agent skill (1 command)

```shell
npx skills add avivahe326/human-test-skill
```

Works with Claude Code, Cursor, and Windsurf. Then just ask your agent in natural language: "Run a usability test on my checkout flow with 3 testers"
### Option 3: Hosted version

Use human-test.work. Zero setup.
## Tech stack
Next.js 16, Prisma, NextAuth, Tailwind CSS. Supports Anthropic and OpenAI for report generation. SQLite for local dev, MySQL for production.
## Links
- GitHub: github.com/avivahe326/humantest
- Live: human-test.work
I'd love to hear from you:
- If you use AI coding tools — how do you handle usability testing today? Do you just ship and hope for the best?
- If you try human_test() — what's your first impression? What's missing?
- If you've built similar tools — what did you learn?
Drop a comment below or open an issue on GitHub.