Manual API tests check what you expect to happen. The bugs that take down production are the ones nobody thought to write a test for: an empty string where the code assumed a value, a negative number in a field that only ever saw positives, an emoji in a date.
You can sit and brainstorm broken inputs by hand. You will write ten of them and get bored. The three-hundredth one, the one that actually crashes the server, you will never reach.
Schemathesis writes those for you. You give it your OpenAPI schema, it generates thousands of requests on its own, and it hammers every endpoint looking for the ones that break. Here is how I run it, what it actually finds, and how I deal with the flood of results it produces.
What Schemathesis actually is
Schemathesis is a Python tool built on top of Hypothesis, the property-based testing library. Property-based means you do not write example inputs. You describe the shape of valid data, and the engine generates hundreds of cases that fit (and deliberately break) that shape.
Your OpenAPI schema already describes that shape: every endpoint, every parameter, every type. Schemathesis reads it and turns it into a test generator. No assertions to maintain by hand. When the schema changes, the tests change with it.
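To make "fit (and deliberately break) that shape" concrete, here is a toy sketch in plain Python. It is not how Schemathesis or Hypothesis work internally. The IntegerField type and both generator functions are hypothetical names invented for illustration, and the 1 to 30 range is an assumed constraint.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegerField:
    """Hypothetical, stripped-down stand-in for one integer property in a schema."""
    minimum: int
    maximum: int


def valid_values(field: IntegerField, count: int, rng: random.Random) -> list[int]:
    """Values that fit the shape: both boundaries first, then random picks in range."""
    values = [field.minimum, field.maximum]
    while len(values) < count:
        values.append(rng.randint(field.minimum, field.maximum))
    return values[:count]


def invalid_values(field: IntegerField) -> list[object]:
    """Values that deliberately break the shape: out of range or the wrong type."""
    return [field.minimum - 1, field.maximum + 1, "", None, "3", 1.5]


nights = IntegerField(minimum=1, maximum=30)  # assumed constraint, for illustration
rng = random.Random(0)  # seeded so the sketch behaves the same on every run
print(valid_values(nights, 5, rng))
print(invalid_values(nights))
```

A real engine does far more (shrinking failing cases, combining fields, covering formats), but the split is the same: one generator that respects the schema and one that violates it on purpose.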
Install it:
pip install schemathesis
The first run
One command, pointed at your schema:
st run http://localhost:8000/openapi.json
That is it. It reads the schema, then fires thousands of requests at every endpoint: valid ones, and ones deliberately malformed. By default it runs with --mode=all, mixing valid and invalid data. You can push it harder with negative-only generation:
st run --mode=negative http://localhost:8000/openapi.json
What it catches
Out of the box, Schemathesis runs a set of checks against every response. Three of them find the bugs that matter most.
The silent 2xx. You send a malformed payload expecting a 400 Bad Request. Instead the server replies 200 OK and quietly accepts the garbage. The bad data either got saved to your database or vanished without a trace. Both are bugs, and neither shows up in a happy-path test.
The unexpected 5xx. Fuzzing boundary values (huge Unicode strings, null bytes, integers at the edge of their range) drags raw stack traces and unhandled exceptions out of code that unit tests walk right past.
The contract violation. The server returns a status code, or a response body, that its own OpenAPI schema never documented. The status_code_conformance and response_schema_conformance checks catch the gap between what the docs promise and what the API does.
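The three categories above can be sketched as one small decision function. This is a simplification for illustration, not the implementation of the Schemathesis checks: classify_response and its labels are hypothetical, and validating the response body against the schema is left out.

```python
def classify_response(request_was_valid: bool, status: int, documented: set[int]) -> list[str]:
    """Return the problem categories a single response falls into (possibly none)."""
    findings = []
    if status >= 500:
        findings.append("unexpected 5xx")
    if not request_was_valid and 200 <= status < 300:
        findings.append("silent 2xx")
    if status not in documented:
        findings.append("contract violation: undocumented status")
    return findings


documented = {200, 422}
print(classify_response(False, 200, documented))  # malformed request accepted
print(classify_response(True, 500, documented))   # crash that is also undocumented
print(classify_response(False, 422, documented))  # malformed request rejected: fine
```

Note that a single response can land in more than one category, which is why one crash often shows up as two findings in the report.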
You can run a focused subset:
st run http://localhost:8000/openapi.json --checks not_a_server_error,response_schema_conformance
Reading a failure
This is the part that makes Schemathesis worth it. Every failure comes with the exact curl command to reproduce it. No guessing which of the thousand requests broke things.
A real example from a booking API:
- Server error
- Undocumented HTTP status code
Received: 500
Documented: 200, 422
[500] Internal Server Error
Reproduce with:
curl -X POST -H 'Content-Type: application/json' \
-d '{"guest_name": "00", "nights": 1, "room_type": ""}' \
http://127.0.0.1:8080/bookings
A guest name of "00" and an empty room_type crash POST /bookings with a 500. Copy the command, paste it in a terminal, and the bug reproduces every time. Hand that line straight to the developer who owns the endpoint.
The noise problem
Run Schemathesis against a real API for the first time and it can easily surface a hundred findings. Do not panic, and do not trust all of them.
The negative_data_rejection check is a known source of false positives. Schemathesis sends a malformed request (say, an empty query parameter) and expects a 4xx. But many frameworks, FastAPI among them, simply ignore the empty parameter and return 200 OK. Schemathesis flags that as a failure, even though nothing is actually broken.
So the first pass is mostly triage: separate the real crashes from the protocol nitpicks. Doing that by hand across a hundred findings is an afternoon gone.
Letting Claude triage the flood
This is where I plug in an AI layer. I take the full Schemathesis report and hand it to Claude with a simple instruction: group the findings, drop the false positives like the empty-parameter ones, and rank what is left by severity.
It collapses the duplicates (one root cause often shows up in twenty different request shapes), strips the framework-level noise, and hands back a short list: these three are real server crashes, fix them first; the rest is the negative_data_rejection pattern, safe to ignore.
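The mechanical part of that triage (collapsing duplicates and setting the known noisy check aside) can also be done before any model sees the report. The sketch below is a plain deterministic pre-sort, not the Claude step itself. The Finding record is a hypothetical shape, not the Schemathesis report format, and the sample entries are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical, flattened view of one reported failure."""
    check: str
    method: str
    path: str
    status: int


def triage(findings: list[Finding]) -> dict[str, list[tuple[Finding, int]]]:
    """Collapse identical findings and sort them into three buckets."""
    counts = Counter(findings)  # identical findings collapse into one key
    buckets = {"fix_first": [], "likely_noise": [], "needs_review": []}
    for finding, count in counts.most_common():
        if finding.status >= 500:
            buckets["fix_first"].append((finding, count))
        elif finding.check == "negative_data_rejection":
            buckets["likely_noise"].append((finding, count))
        else:
            buckets["needs_review"].append((finding, count))
    return buckets


report = (
    [Finding("not_a_server_error", "POST", "/bookings", 500)] * 3
    + [Finding("negative_data_rejection", "GET", "/bookings", 200)] * 2
    + [Finding("status_code_conformance", "GET", "/bookings", 404)]
)
for name, entries in triage(report).items():
    print(name, [(f.method, f.path, f.status, n) for f, n in entries])
```

A pre-sort like this only catches exact duplicates. Spotting that twenty different request shapes share one root cause still takes judgment, which is the part worth handing to the model.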
What was an afternoon of manual sorting becomes a couple of minutes. The machine generates the attacks, the AI clears the noise, and I am left fixing actual bugs instead of reading stack traces.
Wiring it into CI
A fuzzer in CI is dangerous if you let it stay random. A pipeline that passes today and fails tomorrow, for no reason other than the generator stumbling onto a new case, trains everyone to ignore it.
Turn on deterministic generation so runs are reproducible:
uvx schemathesis run --generation-deterministic http://localhost:8000/openapi.json
uvx runs it in an isolated, throwaway environment with no dependency setup, which is exactly what you want in CI. The deterministic flag means the same code gives the same result every time. Save the wild, fully-random run for a nightly cron job, where a new failure is a signal to investigate rather than a broken build.
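One way to keep the two kinds of run apart is a tiny wrapper that builds the command for each. This is a sketch under assumptions: the wrapper, its function names, and the nightly switch are hypothetical. Only the uvx invocation and the --generation-deterministic flag come from the command above.

```python
import subprocess

SCHEMA_URL = "http://localhost:8000/openapi.json"


def build_command(nightly: bool) -> list[str]:
    """Deterministic generation for per-commit builds, unrestricted for nightly runs."""
    command = ["uvx", "schemathesis", "run"]
    if not nightly:
        command.append("--generation-deterministic")
    command.append(SCHEMA_URL)
    return command


def run(nightly: bool) -> int:
    """Execute the run and hand the exit code back to the CI job."""
    return subprocess.run(build_command(nightly)).returncode


print(" ".join(build_command(nightly=False)))
print(" ".join(build_command(nightly=True)))
```

The per-commit job would call run(nightly=False) and fail the build on a non-zero exit code. The scheduled job would call run(nightly=True) and file the result for investigation instead.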
Schemathesis also emits JUnit XML, so the results drop straight into the dashboards your team already reads.
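If you want a quick summary without opening a dashboard, JUnit XML is easy to read with the standard library. The sample report below is hand-written in the generic JUnit layout. The exact names and attributes Schemathesis emits may differ, so treat the structure as an assumption.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the generic JUnit layout, not real Schemathesis output.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="schemathesis" tests="2" failures="1">
    <testcase name="POST /bookings">
      <failure message="Server error">[500] Internal Server Error</failure>
    </testcase>
    <testcase name="GET /bookings"/>
  </testsuite>
</testsuites>"""


def failed_cases(junit_xml: str) -> list[tuple[str, str]]:
    """Return (test case name, message) for every failure or error in the report."""
    root = ET.fromstring(junit_xml)
    failed = []
    for case in root.iter("testcase"):
        for problem in case.findall("failure") + case.findall("error"):
            failed.append((case.get("name", ""), problem.get("message", "")))
    return failed


print(failed_cases(SAMPLE_REPORT))
```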
Where it fits
If you have used Dredd, Schemathesis is the step past it. Dredd validates that your API matches the examples in your docs. It does not generate negative tests or fuzz. Schemathesis actively hunts for inputs that break things.
Compared to writing assertions by hand in pytest or Postman, the trade is clear: those tools make you write and maintain every check. Schemathesis treats the OpenAPI spec as the single source of truth. Update the spec, and the test surface updates with it.
The short version
Manual tests cover the inputs you imagined. Schemathesis covers the ones you did not and generates a reproducible curl command for every bug it finds; an AI pass then turns the noisy output into a clean fix list. One command to find the holes, deterministic generation to keep CI honest, and a nightly chaos run to keep looking.
If you test APIs, point it at your schema once and see what falls out. The first run is usually uncomfortable, in a useful way.
Drop your stack in the comments and I will tell you where to start.