TL;DR — Today I shipped mk-qa-master v0.6.0 (Schemathesis) in the morning and v0.6.1 (Newman / Postman) in the afternoon. Same MCP tool surface (still 16 tools), same report.json / history / flake / coach pipeline, and two new ways to drive API tests from Claude / Cursor / Codex. Total code: ~300 lines across two runners. Total elapsed: about 6 hours. This post is the architecture story.
The setup
I'm a QA engineer building mk-*, an open-source family of MCP servers for the AI dev pipeline. Last week I shipped v0.5.1 of mk-qa-master with five runners — pytest / Jest / Cypress / Go test / Maestro for mobile.
Two days ago, while updating the family-site copy, I added a line that said "mk-qa-master tests web + mobile + API". The first two were honest. The third was a stretch — yes, your existing pytest-with-httpx tests would run, but there was no dedicated API runner. A QA reader could install it expecting OpenAPI ingestion or Postman support, and find neither.
I had two options:
- Walk the marketing copy back to "we drive web + mobile, your existing API tests ride along"
- Make the copy true
I picked option 2. Two API runners, same day.
This post is how that played out.
Why MCP makes "ship two runners in one day" plausible
The mk-qa-master architecture has a runner abstraction that already shipped with five frameworks. Each runner is a Python class implementing the same interface:
```python
class TestRunner:
    name: str

    def list_tests(self) -> str: ...
    def run_tests(self, filter: str | None = None, **kwargs) -> dict: ...
    def run_failed(self) -> dict: ...
    def get_report_summary(self) -> dict: ...
```
Whatever framework the runner wraps, the MCP tool surface is the same 16 tools. The AI client (Claude, Cursor, Codex, Gemini) calls run_tests / get_optimization_plan / get_failure_details the same way regardless of whether you're testing a React app, a Go service, an iOS Simulator, or an API. The runner translates.
Adding a new runner = ~150 lines of Python + register in REGISTRY + write a sample + bump version. That's it.
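To make that concrete, here's a minimal sketch of what a runner looks like against the interface above (the `EchoRunner`, the report dict fields, and the exact `REGISTRY` shape are illustrative assumptions, not mk-qa-master's actual source):

```python
import subprocess

# Hypothetical runner implementing the TestRunner interface sketched above.
# Real runners wrap a test framework CLI; this one wraps `echo` purely to
# show the shape of the contract.
class EchoRunner:
    name = "echo"

    def list_tests(self) -> str:
        return "smoke :: says hello"

    def run_tests(self, filter: str | None = None, **kwargs) -> dict:
        proc = subprocess.run(["echo", "hello"], capture_output=True, text=True)
        passed = proc.returncode == 0
        # Normalize into the shared report shape the advisor consumes
        # (field names here are assumptions).
        return {"passed": int(passed), "failed": int(not passed), "errors": 0}

    def run_failed(self) -> dict:
        return self.run_tests()

    def get_report_summary(self) -> dict:
        return {"runner": self.name, "total": 1}

# Registration: one dict entry, and the 16 MCP tools pick it up.
REGISTRY = {"echo": EchoRunner}
```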
This is the MCP-level value claim: the AI doesn't relearn your stack. You add a runner; the AI's tool surface inherits the new capability automatically.
So shipping API testing was less "design a new product" and more "fill in the runner slot the abstraction was waiting for."
v0.6.0 — Schemathesis (OpenAPI / Swagger)
Schemathesis reads an OpenAPI 3.x or Swagger 2.0 schema and fuzzes every operation with property-based tests — response schema conformance, status-code conformance, server-error detection. Hand it a URL or file path, it spits out coverage in 30–60 seconds.
The runner wraps the schemathesis run CLI. User-facing config:
```json
{
  "mcpServers": {
    "mk-qa-master": {
      "command": "uvx",
      "args": ["mk-qa-master[api]"],
      "env": {
        "QA_RUNNER": "schemathesis",
        "QA_OPENAPI_URL": "https://api.example.com/openapi.json"
      }
    }
  }
}
```
That's it. Restart your client. Then in any session:
"Test the API at https://api.example.com/openapi.json — find anything broken, then give me a prioritized action plan."
Real-feeling session transcript:
```
you ▸ Test https://api.example.com/openapi.json — find anything broken
      and give me a prioritized action plan.

→ get_runner_info        ✓ schemathesis · OpenAPI 3.0.3 detected
→ list_tests             ✓ 24 endpoints × 5 checks = 120 cases
→ run_tests              ⚠ 112 passed, 6 failed, 2 errored (47s)
→ get_optimization_plan  ✓ next priorities:

  🔴 broken · POST /users :: response_schema_conformance
     Same Schemathesis signature × 3 → "status 500, expected 201|400"
     Action: response schema doesn't allow 500; either fix the
     validation bug or add 500 to the schema's responses block

  🔴 broken · GET /search :: not_a_server_error
     Crashes under ?q=null and ?limit=-1
     Action: missing input validation on the search handler

  🟡 warn · DELETE /users/{id} returned 204 when schema says 200
     Likely safe to update the schema; verify with PM

  🟢 stable · 18 endpoints, no findings
```
The advisor's classification is the same logic the suite uses for UI tests — 3 consecutive failures with the same error signature = broken. A test that's red-green-red across runs = flaky. mk-qa-master doesn't differentiate "the API is broken" from "the UI is broken" — same flake-score, same broken classification, same advisor.
That's the abstraction paying off.
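In code, that rule is small. A sketch of the classification (the thresholds come from this post; the history shape, function name, and the residual "failing" bucket are my assumptions):

```python
# history: newest-last list of (outcome, error_signature) for one test.
def classify(history: list[tuple[str, str | None]]) -> str:
    last3 = history[-3:]
    # broken: 3 consecutive failures sharing one error signature
    if len(last3) == 3 and all(o == "failed" for o, _ in last3) \
            and len({sig for _, sig in last3}) == 1:
        return "broken"
    # flaky: outcome keeps flipping across runs (red-green-red)
    outcomes = [o for o, _ in history]
    flips = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    if flips >= 2:
        return "flaky"
    return "stable" if outcomes and outcomes[-1] == "passed" else "failing"

assert classify([("failed", "E500")] * 3) == "broken"
assert classify([("failed", "E500"), ("passed", None), ("failed", "E500")]) == "flaky"
```

Whether the failing test came from Schemathesis, Newman, or Cypress never enters the function.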
The one CLI-flag mistake that cost me 20 minutes
Here's the part that was not smooth.
The PRD I wrote in the morning said the runner would invoke schemathesis like this:
```
schemathesis run \
  --checks all \
  --report-json /tmp/report.json \   # ⚠ this flag does not exist
  --junit-xml /tmp/junit.xml \
  --hypothesis-database=none \
  $URL
```
The subagent implementing the runner followed the spec faithfully. CI choked instantly:
```
Error: No such option '--report-json'. Did you mean '--report'?
```
Schemathesis 3.x has no JSON-report flag. The PRD assumed one based on... I'm not sure what. Maybe an older version, maybe wishful thinking, maybe just a hallucination in my own design doc.
Fix: rewrite _normalize_report to parse --junit-xml output instead — JUnit XML is stdlib-parseable (xml.etree.ElementTree) and standard across every test runner I've ever touched. Took 20 minutes.
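For flavor, here's roughly what that normalization looks like with the stdlib; the output dict shape is an assumption, not mk-qa-master's exact report.json schema:

```python
import xml.etree.ElementTree as ET

def normalize_junit(path: str) -> dict:
    root = ET.parse(path).getroot()
    # Some emitters wrap a single <testsuite> in <testsuites>.
    suite = root
    if root.tag == "testsuites" and root.find("testsuite") is not None:
        suite = root.find("testsuite")
    cases = []
    for case in suite.iter("testcase"):
        failure = case.find("failure")
        error = case.find("error")
        node = failure if failure is not None else error
        cases.append({
            "nodeid": f'{case.get("classname", "")} :: {case.get("name", "")}',
            "outcome": "failed" if failure is not None
                       else "error" if error is not None else "passed",
            "message": node.get("message") if node is not None else None,
        })
    return {"total": len(cases), "cases": cases}
```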
Lesson: when writing a PRD that hardcodes CLI flags, run <tool> --help on the actual installed version before committing. The spec is only worth what the underlying tool actually supports.
I'll be repeating this to myself for v0.7.
v0.6.1 — Newman (Postman collections)
After lunch I shipped the second runner.
Newman is the official CLI for running Postman collections. Postman has ~30M users; a huge chunk of them have collections in version control already. Newman + that collection JSON = headless replay of every request and pm.test(...) assertion.
Runner shape, same as Schemathesis but for Postman:
```json
{
  "mcpServers": {
    "mk-qa-master": {
      "command": "uvx",
      "args": ["mk-qa-master"],
      "env": {
        "QA_RUNNER": "newman",
        "QA_POSTMAN_COLLECTION": "/path/to/your-api.postman_collection.json",
        "QA_POSTMAN_ENVIRONMENT": "/path/to/staging.postman_environment.json"
      }
    }
  }
}
```
Newman is npm-side, not pip-side, so it's a system prerequisite rather than a Python optional dep:
```
npm install -g newman
```
This was a small choice that took 2 minutes to settle: do you bundle Newman into the Python optional dep group somehow? You can't — pyproject.toml only knows about Python. So Newman gets the npm install -g treatment, the runner does shutil.which("newman"), and if it's missing the user sees a clear ImportError pointing at the install command.
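The whole preflight is a few lines. A sketch (the error text is illustrative, not the runner's exact wording):

```python
import shutil

def require_newman() -> str:
    """Fail fast with an actionable message if the npm-side dep is absent."""
    path = shutil.which("newman")
    if path is None:
        raise ImportError(
            "newman not found on PATH. Install it with: npm install -g newman"
        )
    return path
```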
The runner translates Newman's JSON report (run.executions[] + run.failures[]) into mk-qa-master's report.json shape. One nodeid per assertion:
```
GET {{baseUrl}}/books :: Books :: List books
POST {{baseUrl}}/books :: Books :: Create book
GET {{baseUrl}}/books/{{bookId}} :: Books :: Get book by id
```
Same history / flake / coach pipeline as before.
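If you're curious what the translation step looks like, here's a sketch against Newman's JSON reporter shape (`run.executions[]`, each carrying an `assertions[]` array); the nodeid rendering is simplified relative to the examples above:

```python
import json

def newman_nodeids(report_path: str) -> list[dict]:
    with open(report_path) as f:
        run = json.load(f)["run"]
    results = []
    for execution in run.get("executions", []):
        item_name = execution["item"]["name"]        # e.g. "List books"
        method = execution["request"]["method"]      # e.g. "GET"
        # One result per pm.test(...) assertion, not per request.
        for assertion in execution.get("assertions", []):
            results.append({
                "nodeid": f'{method} :: {item_name} :: {assertion["assertion"]}',
                "outcome": "failed" if assertion.get("error") else "passed",
            })
    return results
```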
No CLI-flag mistake this time — I ran newman run --help first, sketched the flag list, then started implementation. Lesson learned from the morning.
Schemathesis vs Newman — when to use which
I get asked this every time I show the two runners. Here's the call I make:
| You have… | Use… |
|---|---|
| An OpenAPI 3.x / Swagger 2.0 schema and you want generated tests across the whole surface | Schemathesis — fuzz-driven, finds bugs you didn't think to write tests for |
| A Postman collection your team already curates by hand | Newman — re-uses your existing investment, runs the assertions you already wrote |
| Both (a schema for breadth + a collection for happy paths) | Run both in the same session (config sketch below) — Schemathesis catches schema drift, Newman catches business-logic regressions |
| Neither, but you have pytest tests hitting your API | Stay on QA_RUNNER=pytest, no migration needed — your existing tests already ride the same pipeline |
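One way to wire the "run both" row, for what it's worth: two entries in the same client config, one per runner. This is my sketch (the server names are arbitrary, and the post doesn't say whether a single server instance can host two runners at once, so two entries is the safe pattern):

```json
{
  "mcpServers": {
    "qa-schemathesis": {
      "command": "uvx",
      "args": ["mk-qa-master[api]"],
      "env": {
        "QA_RUNNER": "schemathesis",
        "QA_OPENAPI_URL": "https://api.example.com/openapi.json"
      }
    },
    "qa-newman": {
      "command": "uvx",
      "args": ["mk-qa-master"],
      "env": {
        "QA_RUNNER": "newman",
        "QA_POSTMAN_COLLECTION": "/path/to/your-api.postman_collection.json"
      }
    }
  }
}
```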
The point of having both isn't to replace either ecosystem. It's that the AI doesn't need to know which one is active. From Claude's perspective, run_tests returns the same shape. The runner does the translation.
What I'd do differently
Things I'd change on a redo:
- Run `--help` first on every CLI before writing the PRD. (See above.)
- Single PRD covering Phase 1 + Phase 2 instead of writing Phase 2 up as an appendix. Mid-sized features deserve a single design doc, not a doc + amendment.
- Bundle the sample Postman collection with a Prism mock script so users can `prism mock openapi.yaml &` and immediately have something live to point Newman at. Right now the sample is correct but a bit lonely until the user provides a target.
Things I'd keep:
- Optional deps for the Python side, system prereq for the npm side. Forcing schemathesis onto every install would bloat it; forcing newman as a pip dep doesn't even work.
- `--junit-xml` as the normalization source for Schemathesis. Standard format, stdlib-parseable, future-proof.
- Per-assertion nodeids for Newman, per-check nodeids for Schemathesis. Finer granularity than "this endpoint passed" — the flake-score logic needs to know which assertion within an endpoint is unstable.
Quick start
If you want to try it right now:
```
# Schemathesis path (OpenAPI / Swagger)
pip install 'mk-qa-master[api]'

# Newman path (Postman)
npm install -g newman
pip install mk-qa-master
```
Then drop the matching config snippet from above into your Claude Desktop / Claude Code / Cursor / Codex config. Restart your client. Ask Claude to test your API. That's the whole UX.
The bundled sample at examples/sample_api_project/ has both an openapi.yaml and a postman-collection.json for the same fictional Library API — same 3 endpoints, two different runner paths, identical AI-side workflow. Drop a mock server (Prism, Mockoon, whatever) in front and you can dogfood the whole loop in ~5 minutes.
What's next
v0.7.0 adds Pact provider verification + an analyze_api tool (OpenAPI introspection → candidate test scenarios). Whether it ships depends on whether v0.6.0 / 0.6.1 produce real adoption signal. If 6 weeks from now nobody's filed an issue about Pact, I'll skip it and focus on something the community is actually asking for.
This is the discipline I'm trying to learn — ship two runners on the same day the architecture allows it; don't speculate a third just because the abstraction would still hold.
Family
mk-qa-master is one of three open-source MCP servers I'm building:
- mk-plan-master — idea triage + RICE scoring + spec-draft bridge
- mk-spec-master — specs → scenarios + coverage matrix
- mk-qa-master (this) — drives the test runner across web / mobile / API
Together they form: Idea → Plan → Spec → Code (your IDE) → Test → Coverage → Coach.
Family site: mcp.chenjundigital.com
If your team is QA-heavy and you've been frustrated by AI tools that either write # TODO for API tests or charge $50k/year to run them — give the v0.6 line a try. If you find anything weird, the issue tracker is the right place.
A star helps the algorithm find people like you. Feedback helps more.
— Jack Kao, building solo.