TL;DR — mk-qa-master is an open-source MCP server that lets Claude / Cursor / Codex / Gemini drive your real test suite — pytest, Jest, Cypress, Go test, and Maestro for mobile. 16 tools, 5 categories, a three-layer QA knowledge architecture.
uvx-installable. MIT.
## The moment I stopped blaming the model
The 5th time Claude wrote `# TODO: add real selector here` in a generated test, I tried a smarter prompt. The 20th time, I switched models. The 100th time, I stopped blaming the LLM.
I'm a QA engineer. I've watched LLMs write beautiful-looking test scaffolds for two years now, and every one of them collapses at the same place:
The model can read your code. It cannot see your live DOM, your mobile view hierarchy, your last 10 test runs, or that `checkout-flow.spec.ts` has been red 7 times in 14 days.
So it guesses. Guesses are how you get `# TODO`.
The fix isn't a smarter prompt. It's giving the LLM access to the things it's currently guessing about.
That's what the Model Context Protocol (MCP) is for. And that's why I built mk-qa-master.
## What "AI for QA" usually means
Most AI-for-testing products today fall into one of three buckets:
- IDE plugins that emit test files — Copilot Tests, Cursor's test generator. Great in a screenshot. They write the file, you fix the selectors.
- "Just prompt ChatGPT" tutorials — works for one test, falls apart at ten. No persistence, no awareness of what's actually flaky, no runtime feedback.
- End-to-end AI testing SaaS — record-and-playback wrappers. They own your test infrastructure, charge per seat, and you're locked in.
What's missing from all three: the AI never touches the runner. It writes code; you run; you debug; you tell the AI what broke. It's a chatbot pretending to be an engineer.
The reframe: stop asking AI to write tests. Make it drive your test runner.
## What MCP changes
MCP (introduced by Anthropic in late 2024, now adopted by Cursor, Codex CLI, Gemini CLI, Zed, Cline and others) lets an AI client call tools — not just see text, but trigger actions, read structured responses, chain them.
An MCP server is just a process that exposes tools. Drop it into your client config:
```json
{
  "mcpServers": {
    "mk-qa-master": {
      "command": "uvx",
      "args": ["mk-qa-master"],
      "env": {
        "QA_RUNNER": "pytest",
        "QA_PROJECT_ROOT": "/path/to/your/project"
      }
    }
  }
}
```
…and now Claude has 16 new things it can do in your project: probe the DOM of a live URL, list your existing tests, generate new ones with real selectors, run them, read JUnit XML, write an optimization plan based on the last N runs.
Your runner just became part of the AI's tool surface.
## mk-qa-master in 60 seconds
16 tools across 5 categories. You don't need to memorize names; the README has a cookbook of natural-language prompts that map to each chain.
| Category | Tools | What it does |
|---|---|---|
| Discover | `get_runner_info` · `list_tests` · `analyze_url` · `analyze_screen` | Which framework is active. What tests exist. Probe a URL or a live mobile screen for form / nav / CTA modules with real selectors. |
| Generate | `generate_test` · `auto_generate_tests` · `codegen` · `init_qa_knowledge` · `get_qa_context` | Emit runnable pytest `.py` or Maestro `.yaml`. Not `# TODO` placeholders. |
| Run | `run_tests` · `run_failed` | Drive pytest / Jest / Cypress / Go test / Maestro. Auto-retry, JUnit XML, screenshots, Playwright `trace.zip`, Maestro recordings. |
| Report | `get_test_report` · `get_failure_details` · `generate_html_report` · `get_test_history` | Outcome history, error signatures, per-test flake scores. |
| Advise | `get_optimization_plan` | Three lenses: suite quality (flaky vs broken vs slow), MCP usability, AI effectiveness. Output is a ranked action list — what to fix next, with evidence. |
Switch frameworks with a single env var: `QA_RUNNER=pytest | jest | cypress | go | maestro`. Web and mobile share the same MCP surface — `analyze_screen` works on iOS Simulator, Android Emulator, real devices, and (yes) BlueStacks via `adb connect`.
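Under the hood, that switch is just configuration. Here is a minimal sketch of how an env-var dispatch like this could work; the command lists below are my own illustration, not mk-qa-master's actual internals:

```python
import os

# Hypothetical mapping from QA_RUNNER values to test commands.
# The real commands mk-qa-master invokes may differ.
RUNNERS = {
    "pytest":  ["pytest", "--junitxml=report.xml"],
    "jest":    ["npx", "jest", "--ci"],
    "cypress": ["npx", "cypress", "run"],
    "go":      ["go", "test", "./..."],
    "maestro": ["maestro", "test", "flows/"],
}

def runner_command() -> list[str]:
    """Pick the test command based on the QA_RUNNER env var."""
    name = os.environ.get("QA_RUNNER", "pytest")
    if name not in RUNNERS:
        raise ValueError(f"unsupported QA_RUNNER: {name!r}")
    return RUNNERS[name]
```

One table lookup per call, so changing frameworks never touches the tool surface the AI sees.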
## The part nobody else builds: a three-layer QA knowledge architecture
This is what makes mk-qa-master not monkey-testing.
A DOM-only analyzer produces "empty field should error" for every form on the internet. That's not testing, it's noise. To produce a test that means anything, the generator needs domain context. So I layered three:
### Layer 1 — Built-in
ISTQB's seven principles, equivalence partitioning, decision tables, state transitions, the test pyramid, shift-left, mobile testing checklists, QA metrics — baked into the server. The AI gets methodology by default, not by accident.
### Layer 2 — Your project's `qa-knowledge.md`
Drop a file at your project root with your business rules, historical bugs, standard assertion copy, user-journey snippets, and technical constraints. `init_qa_knowledge` scaffolds one. The MCP loads it on every relevant tool call. This is where the "AI doesn't know my business" problem actually gets solved.
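The file itself is plain markdown. A hypothetical slice of what yours might contain (the section names here are my sketch, not a required schema):

```markdown
## Business rules
- Coupons apply to the subtotal only, never to shipping.
- Guest checkout is disabled for orders over $500.

## Historical bugs
- Discount once rendered as "NaN" when a coupon was applied twice.

## Standard assertion copy
- Success toast: "Discount applied: $X.XX"
```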
### Layer 3 — Per-test inline
Pass a `business_context` slice into `generate_test`. It gets printed as a `# Business context:` block inside the generated test, so the next reviewer sees why this test exists without leaving the file.
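A hedged sketch of the shape such an emitted file could take: the `# Business context:` block plus one case. `read_discount_banner` is my stand-in stub for the real page interaction the generator would emit.

```python
# Business context: SAVE5 applies a flat $5.00 discount to the subtotal;
# the banner must show formatted currency, never "NaN".

def read_discount_banner(amount: float) -> str:
    # Stub standing in for the real DOM read a generated test performs.
    return f"Discount applied: ${amount:.2f}"

def test_coupon_banner_shows_formatted_discount():
    banner = read_discount_banner(5.0)
    assert banner == "Discount applied: $5.00"
    assert "NaN" not in banner
```

The comment block costs nothing at runtime but keeps the "why" next to the "what" for the reviewer.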
Three layers of context. One MCP. Pile them up and the AI stops producing "click the button, see something happen" garbage.
## A real session
Here's what a Monday morning with this looks like:
```text
you ▸ Test https://your-site/login — one runnable case per module
→ analyze_url ✓ 4 modules · 12 endpoints · 18 candidate cases
→ generate_test ✓ tests/test_login.py (4 cases)
→ run_tests ⚠ 3 passed, 1 failed
→ get_optimization_plan ✓ next priorities:
   🔴 broken · checkout-coupon-rule (same signature × 3 runs = real bug)
   🟡 flaky · login-with-2fa (PFPFP outcome string, 60% flake score)
   🟢 stable · all 12 nav-menu cases

you ▸ Fix the broken one first. Show me the failure.
→ get_failure_details ✓ checkout-coupon-rule:
   Expected: "Discount applied: $5.00"
   Got: "Discount applied: NaN"
   First failed: 3 runs ago, on PR #142
```
Notice what's happening here:
- The AI doesn't ask which test is flaky — it pulls flake history from `tests-history/`.
- The AI doesn't guess selectors — `analyze_url` gave it real selectors from the live page.
- The AI doesn't just run tests — it returns a ranked action list. "This is broken, this is flaky, this is stable." Evidence, not gut feel.
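The post doesn't spell out the flake-score formula, so here is one plausible heuristic under my own assumptions: score an outcome string like the `PFPFP` above by how often consecutive runs flip.

```python
def flake_score(outcomes: str) -> float:
    """Fraction of consecutive run pairs whose outcome flipped.

    `outcomes` is one 'P' or 'F' per run, oldest first (the shape of
    the PFPFP string in the report). This is an illustrative metric,
    not necessarily the formula mk-qa-master uses.
    """
    if len(outcomes) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    return flips / (len(outcomes) - 1)
```

A consistently failing test scores 0.0 on this metric, which is exactly the point: broken and flaky are different diagnoses with different fixes.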
This isn't AI writing tests. This is AI doing QA.
## What this deliberately is not
| Not | Use this instead |
|---|---|
| A test framework | You bring pytest / Jest / Cypress / Go test / Maestro — mk-qa-master drives them |
| An LLM | Your AI client (Claude / Cursor / Codex / Gemini) does the reasoning |
| A CI runner | Runs locally, produces JUnit XML; pipe to GitHub Actions / Jenkins as usual |
| A source-code analyzer | Looks at live DOM and view hierarchy, not your repo's source |
| A SaaS dashboard | MCP-native, lives in your AI client. HTML reports are self-contained .html files |
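Because the Run tools emit standard JUnit XML, any CI step can post-process results with nothing but the standard library. A minimal sketch (the report's exact path and schema details are assumptions on my part):

```python
import xml.etree.ElementTree as ET

def summarize_junit(xml_text: str) -> dict:
    """Count passed / failed / errored cases in a JUnit XML report."""
    root = ET.fromstring(xml_text)
    summary = {"passed": 0, "failed": 0, "errors": 0}
    for case in root.iter("testcase"):
        if case.find("failure") is not None:
            summary["failed"] += 1
        elif case.find("error") is not None:
            summary["errors"] += 1
        else:
            summary["passed"] += 1
    return summary
```

Pipe that into a PR comment or a Slack hook and the "CI runner" row above stays true: mk-qa-master produces the artifact, your pipeline consumes it.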
Knowing what a tool isn't is half of trust.
## Quick start
```bash
uvx mk-qa-master
# or: pip install mk-qa-master
```
Claude Desktop config lives at:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`
```json
{
  "mcpServers": {
    "mk-qa-master": {
      "command": "uvx",
      "args": ["mk-qa-master"],
      "env": {
        "QA_RUNNER": "pytest",
        "QA_PROJECT_ROOT": "/path/to/your/project"
      }
    }
  }
}
```
Restart your client. Then in any AI session, say:
"Test `https://your-site/login` — one runnable case per module, then tell me which existing test is most likely flaky."
That's the whole UX. No menus. No buttons. The AI chains the tools.
## This is one of three
mk-qa-master is the execution end of a family I'm building solo:
- mk-plan-master — turns a pile of 30–200 raw ideas into RICE-scored, spec-draft-ready initiatives. Hands off to ↓
- mk-spec-master — parses specs into scenarios, keeps a live spec ↔ test coverage matrix, grades the specs themselves. Hands off to ↓
- mk-qa-master — drives the runner, generates tests, advises on what's broken vs flaky vs slow.
Together they form an end-to-end AI dev pipeline:
```text
Idea → Plan  → Spec  → Code (your IDE) → Test  → Coverage → Coach
     mk-plan  mk-spec    your IDE        mk-qa   mk-spec    both
```
The family wraps the rails; code-writing stays in your IDE (Claude Code / Cursor / Copilot). I deliberately don't try to rebuild what your IDE already does well.
The other two MCPs get their own posts. Follow if that pipeline sounds useful.
## Links
- GitHub: https://github.com/kao273183/mk-qa-master
- PyPI: https://pypi.org/project/mk-qa-master/
- Family site: https://mcp.chenjundigital.com
- License: MIT
- Family: `mk-qa-master` · `mk-spec-master` · `mk-plan-master`
If your team is QA-heavy and you've been frustrated by AI tools that write `# TODO` instead of real tests — give it a try. If you've found a better way to do this, I'd genuinely love to hear about it in the comments. This is an opinionated tool and I'm still iterating.
A star helps the algorithm find people like you. Feedback helps more.
— Jack Kao, building solo.