TL;DR — Today I shipped mk-qa-master v0.6.0 (Schemathesis) in the morning and v0.6.1 (Newman / Postman) in the afternoon. Same MCP tool surface (still 16 tools), same report.json / history / flake / coach pipeline, and two new ways to drive API tests from Claude / Cursor / Codex. Total code: ~300 lines across two runners. Total elapsed: about 6 hours. This post is the architecture story.
The setup
I'm a QA engineer building mk-*, an open-source family of MCP servers for the AI dev pipeline. Last week I shipped v0.5.1 of mk-qa-master with five runners — pytest / Jest / Cypress / Go test / Maestro for mobile.
Two days ago, while updating the family-site copy, I added a line that said "mk-qa-master tests web + mobile + API". The first two were honest. The third was a stretch — yes, your existing pytest-with-httpx tests would run, but there was no dedicated API runner. A QA reader could install it expecting OpenAPI ingestion or Postman support, and find neither.
I had two options:
- Walk the marketing copy back to "we drive web + mobile, your existing API tests ride along"
- Make the copy true
I picked option 2. Two API runners, same day.
This post is how that played out.
Why MCP makes "ship two runners in one day" plausible
The mk-qa-master architecture has a runner abstraction that already shipped with five frameworks. Each runner is a Python class implementing the same interface:
```python
class TestRunner:
    name: str

    def list_tests(self) -> str: ...
    def run_tests(self, filter: str | None = None, **kwargs) -> dict: ...
    def run_failed(self) -> dict: ...
    def get_report_summary(self) -> dict: ...
```
Whatever framework the runner wraps, the MCP tool surface is the same 16 tools. The AI client (Claude, Cursor, Codex, Gemini) calls run_tests / get_optimization_plan / get_failure_details the same way regardless of whether you're testing a React app, a Go service, an iOS Simulator, or an API. The runner translates.
Adding a new runner = ~150 lines of Python + register in REGISTRY + write a sample + bump version. That's it.
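To make that concrete, here's a minimal sketch of what a runner looks like against the interface above (the `EchoRunner`, the report dict fields, and the exact `REGISTRY` shape are illustrative assumptions, not mk-qa-master's actual source):

```python
import subprocess

# Hypothetical runner implementing the TestRunner interface sketched above.
# Real runners wrap a test framework CLI; this one wraps `echo` purely to
# show the shape of the contract.
class EchoRunner:
    name = "echo"

    def list_tests(self) -> str:
        return "smoke :: says hello"

    def run_tests(self, filter: str | None = None, **kwargs) -> dict:
        proc = subprocess.run(["echo", "hello"], capture_output=True, text=True)
        passed = proc.returncode == 0
        # Normalize into the shared report shape the advisor consumes
        # (field names here are assumptions).
        return {"passed": int(passed), "failed": int(not passed), "errors": 0}

    def run_failed(self) -> dict:
        return self.run_tests()

    def get_report_summary(self) -> dict:
        return {"runner": self.name, "total": 1}

# Registration: one dict entry, and the 16 MCP tools pick it up.
REGISTRY = {"echo": EchoRunner}
```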
This is the MCP-level value claim: the AI doesn't relearn your stack. You add a runner; the AI's tool surface inherits the new capability automatically.
So shipping API testing was less "design a new product" and more "fill in the runner slot the abstraction was waiting for."
v0.6.0 — Schemathesis (OpenAPI / Swagger)
Schemathesis reads an OpenAPI 3.x or Swagger 2.0 schema and fuzzes every operation with property-based tests — response schema conformance, status-code conformance, server-error detection. Hand it a URL or file path, it spits out coverage in 30–60 seconds.
The runner wraps the schemathesis run CLI. User-facing config:
```json
{
  "mcpServers": {
    "mk-qa-master": {
      "command": "uvx",
      "args": ["mk-qa-master[api]"],
      "env": {
        "QA_RUNNER": "schemathesis",
        "QA_OPENAPI_URL": "https://api.example.com/openapi.json"
      }
    }
  }
}
```
That's it. Restart your client. Then in any session:
"Test the API at https://api.example.com/openapi.json — find anything broken, then give me a prioritized action plan."
Real-feeling session transcript:
```
you ▸ Test https://api.example.com/openapi.json — find anything broken
      and give me a prioritized action plan.

→ get_runner_info        ✓ schemathesis · OpenAPI 3.0.3 detected
→ list_tests             ✓ 24 endpoints × 5 checks = 120 cases
→ run_tests              ⚠ 112 passed, 6 failed, 2 errored (47s)
→ get_optimization_plan  ✓ next priorities:

  🔴 broken · POST /users :: response_schema_conformance
     Same Schemathesis signature × 3 → "status 500, expected 201|400"
     Action: response schema doesn't allow 500; either fix the
     validation bug or add 500 to the schema's responses block

  🔴 broken · GET /search :: not_a_server_error
     Crashes under ?q=null and ?limit=-1
     Action: missing input validation on the search handler

  🟡 warn · DELETE /users/{id} returned 204 when schema says 200
     Likely safe to update the schema; verify with PM

  🟢 stable · 18 endpoints, no findings
```
The advisor's classification is the same logic the suite uses for UI tests — 3 consecutive failures with the same error signature = broken. A test that's red-green-red across runs = flaky. mk-qa-master doesn't differentiate "the API is broken" from "the UI is broken" — same flake-score, same broken classification, same advisor.
That's the abstraction paying off.
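In code, that rule is small. A sketch of the classification (the thresholds come from this post; the history shape, function name, and the residual "failing" bucket are my assumptions):

```python
# history: newest-last list of (outcome, error_signature) for one test.
def classify(history: list[tuple[str, str | None]]) -> str:
    last3 = history[-3:]
    # broken: 3 consecutive failures sharing one error signature
    if len(last3) == 3 and all(o == "failed" for o, _ in last3) \
            and len({sig for _, sig in last3}) == 1:
        return "broken"
    # flaky: outcome keeps flipping across runs (red-green-red)
    outcomes = [o for o, _ in history]
    flips = sum(a != b for a, b in zip(outcomes, outcomes[1:]))
    if flips >= 2:
        return "flaky"
    return "stable" if outcomes and outcomes[-1] == "passed" else "failing"

assert classify([("failed", "E500")] * 3) == "broken"
assert classify([("failed", "E500"), ("passed", None), ("failed", "E500")]) == "flaky"
```

Whether the failing test came from Schemathesis, Newman, or Cypress never enters the function.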
The one CLI-flag mistake that cost me 20 minutes
Here's the part that was not smooth.
The PRD I wrote in the morning said the runner would invoke schemathesis like this:
```
schemathesis run \
  --checks all \
  --report-json /tmp/report.json \   # ⚠ this flag does not exist
  --junit-xml /tmp/junit.xml \
  --hypothesis-database=none \
  $URL
```
The subagent implementing the runner followed the spec faithfully. CI choked instantly:
```
Error: No such option '--report-json'. Did you mean '--report'?
```
Schemathesis 3.x has no JSON-report flag. The PRD assumed one based on... I'm not sure what. Maybe an older version, maybe wishful thinking, maybe just a hallucination in my own design doc.
Fix: rewrite _normalize_report to parse --junit-xml output instead — JUnit XML is stdlib-parseable (xml.etree.ElementTree) and standard across every test runner I've ever touched. Took 20 minutes.
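For flavor, here's roughly what that normalization looks like with the stdlib; the output dict shape is an assumption, not mk-qa-master's exact report.json schema:

```python
import xml.etree.ElementTree as ET

def normalize_junit(path: str) -> dict:
    root = ET.parse(path).getroot()
    # Some emitters wrap a single <testsuite> in <testsuites>.
    suite = root
    if root.tag == "testsuites" and root.find("testsuite") is not None:
        suite = root.find("testsuite")
    cases = []
    for case in suite.iter("testcase"):
        failure = case.find("failure")
        error = case.find("error")
        node = failure if failure is not None else error
        cases.append({
            "nodeid": f'{case.get("classname", "")} :: {case.get("name", "")}',
            "outcome": "failed" if failure is not None
                       else "error" if error is not None else "passed",
            "message": node.get("message") if node is not None else None,
        })
    return {"total": len(cases), "cases": cases}
```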
Lesson: when writing a PRD that hardcodes CLI flags, run <tool> --help on the actual installed version before committing. The spec is only worth what the underlying tool actually supports.
I'll be repeating this to myself for v0.7.
v0.6.1 — Newman (Postman collections)
After lunch I shipped the second runner.
Newman is the official CLI for running Postman collections. Postman has ~30M users; a huge chunk of them have collections in version control already. Newman + that collection JSON = headless replay of every request and pm.test(...) assertion.
Runner shape, same as Schemathesis but for Postman:
```json
{
  "mcpServers": {
    "mk-qa-master": {
      "command": "uvx",
      "args": ["mk-qa-master"],
      "env": {
        "QA_RUNNER": "newman",
        "QA_POSTMAN_COLLECTION": "/path/to/your-api.postman_collection.json",
        "QA_POSTMAN_ENVIRONMENT": "/path/to/staging.postman_environment.json"
      }
    }
  }
}
```
Newman is npm-side, not pip-side, so it's a system prerequisite rather than a Python optional dep:
```
npm install -g newman
```
This was a small choice that took 2 minutes to settle: do you bundle Newman into the Python optional dep group somehow? You can't — pyproject.toml only knows about Python. So Newman gets the npm install -g treatment, the runner does shutil.which("newman"), and if it's missing the user sees a clear ImportError pointing at the install command.
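The whole preflight is a few lines. A sketch (the error text is illustrative, not the runner's exact wording):

```python
import shutil

def require_newman() -> str:
    """Fail fast with an actionable message if the npm-side dep is absent."""
    path = shutil.which("newman")
    if path is None:
        raise ImportError(
            "newman not found on PATH. Install it with: npm install -g newman"
        )
    return path
```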
The runner translates Newman's JSON report (run.executions[] + run.failures[]) into mk-qa-master's report.json shape. One nodeid per assertion:
```
GET {{baseUrl}}/books :: Books :: List books
POST {{baseUrl}}/books :: Books :: Create book
GET {{baseUrl}}/books/{{bookId}} :: Books :: Get book by id
```
Same history / flake / coach pipeline as before.
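If you're curious what the translation step looks like, here's a sketch against Newman's JSON reporter shape (`run.executions[]`, each carrying an `assertions[]` array); the nodeid rendering is simplified relative to the examples above:

```python
import json

def newman_nodeids(report_path: str) -> list[dict]:
    with open(report_path) as f:
        run = json.load(f)["run"]
    results = []
    for execution in run.get("executions", []):
        item_name = execution["item"]["name"]        # e.g. "List books"
        method = execution["request"]["method"]      # e.g. "GET"
        # One result per pm.test(...) assertion, not per request.
        for assertion in execution.get("assertions", []):
            results.append({
                "nodeid": f'{method} :: {item_name} :: {assertion["assertion"]}',
                "outcome": "failed" if assertion.get("error") else "passed",
            })
    return results
```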
No CLI-flag mistake this time — I ran newman run --help first, sketched the flag list, then started implementation. Lesson learned from the morning.
Schemathesis vs Newman — when to use which
I get asked this every time I show the two runners. Here's the call I make:
| You have… | Use… |
|---|---|
| An OpenAPI 3.x / Swagger 2.0 schema and you want generated tests across the whole surface | Schemathesis — fuzz-driven, finds bugs you didn't think to write tests for |
| A Postman collection your team already curates by hand | Newman — re-uses your existing investment, runs the assertions you already wrote |
| Both (a schema for breadth + a collection for happy paths) | Run both in the same session (config sketch below) — Schemathesis catches schema drift, Newman catches business-logic regressions |
| Neither, but you have pytest tests hitting your API | Stay on QA_RUNNER=pytest, no migration needed — your existing tests already ride the same pipeline |
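One way to wire the "run both" row, for what it's worth: two entries in the same client config, one per runner. This is my sketch (the server names are arbitrary, and the post doesn't say whether a single server instance can host two runners at once, so two entries is the safe pattern):

```json
{
  "mcpServers": {
    "qa-schemathesis": {
      "command": "uvx",
      "args": ["mk-qa-master[api]"],
      "env": {
        "QA_RUNNER": "schemathesis",
        "QA_OPENAPI_URL": "https://api.example.com/openapi.json"
      }
    },
    "qa-newman": {
      "command": "uvx",
      "args": ["mk-qa-master"],
      "env": {
        "QA_RUNNER": "newman",
        "QA_POSTMAN_COLLECTION": "/path/to/your-api.postman_collection.json"
      }
    }
  }
}
```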
The point of having both isn't to replace either ecosystem. It's that the AI doesn't need to know which one is active. From Claude's perspective, run_tests returns the same shape. The runner does the translation.
What I'd do differently
Things I'd change on a redo:
- Run `--help` first on every CLI before writing the PRD. (See above.)
- Single PRD covering Phase 1 + Phase 2 instead of writing Phase 2 up as an appendix. Mid-sized features deserve a single design doc, not a doc + amendment.
- Bundle the sample Postman collection with a Prism mock script so users can `prism mock openapi.yaml &` and immediately have something live to point Newman at. Right now the sample is correct but a bit lonely until the user provides a target.
Things I'd keep:
- Optional deps for the Python side, system prereq for the npm side. Forcing schemathesis onto every install would bloat it; forcing newman as a pip dep doesn't even work.
- `--junit-xml` as the normalization source for Schemathesis. Standard format, stdlib-parseable, future-proof.
- Per-assertion nodeids for Newman, per-check nodeids for Schemathesis. Finer granularity than "this endpoint passed" — the flake-score logic needs to know which assertion within an endpoint is unstable.
Quick start
If you want to try it right now:
```
# Schemathesis path (OpenAPI / Swagger)
pip install 'mk-qa-master[api]'

# Newman path (Postman)
npm install -g newman
pip install mk-qa-master
```
Then drop the matching config snippet from above into your Claude Desktop / Claude Code / Cursor / Codex config. Restart your client. Ask Claude to test your API. That's the whole UX.
The bundled sample at examples/sample_api_project/ has both an openapi.yaml and a postman-collection.json for the same fictional Library API — same 3 endpoints, two different runner paths, identical AI-side workflow. Drop a mock server (Prism, Mockoon, whatever) in front and you can dogfood the whole loop in ~5 minutes.
What's next
v0.7.0 adds Pact provider verification + an analyze_api tool (OpenAPI introspection → candidate test scenarios). Whether it ships depends on whether v0.6.0 / 0.6.1 produce real adoption signal. If 6 weeks from now nobody's filed an issue about Pact, I'll skip it and focus on something the community is actually asking for.
This is the discipline I'm trying to learn — ship two runners on the same day the architecture allows it; don't speculate a third just because the abstraction would still hold.
Family
mk-qa-master is one of three open-source MCP servers I'm building:
- mk-plan-master — idea triage + RICE scoring + spec-draft bridge
- mk-spec-master — specs → scenarios + coverage matrix
- mk-qa-master (this) — drives the test runner across web / mobile / API
Together they form: Idea → Plan → Spec → Code (your IDE) → Test → Coverage → Coach.
Family site: mcp.chenjundigital.com
If your team is QA-heavy and you've been frustrated by AI tools that either write # TODO for API tests or charge $50k/year to run them — give the v0.6 line a try. If you find anything weird, the issue tracker is the right place.
A star helps the algorithm find people like you. Feedback helps more.
— Jack Kao, building solo.