Green unit tests are a comfort blanket

#testing #pytes #hypothesis #python

My converter has a test suite. It was green. Every example I could think of went in, the right pytest file came out, and I shipped it to PyPI as 1.0. Weeks later it sits there marked Production/Stable.

Then I pointed a fuzzer at it, and it stopped being so quiet.

The blind spot in example tests

Unit tests check the inputs you thought of. That is their whole nature: you sit down, you imagine how the code gets used, you write an example for each case. The cases you imagine tend to be the cases the code already handles, because you wrote the code and the test in the same afternoon, with the same blind spots.

A converter makes that worse. postman2pytest takes a Postman collection, a JSON file, and turns it into a runnable pytest suite. The input is not something I control. It is a file a stranger exports from a tool, edits by hand, truncates over a flaky download, or generates from a script that had a bug. The real test suite for a parser is the set of files you did not write.

So I stopped writing examples and let a machine write them for me.

Letting Hypothesis do the imagining

Hypothesis is property-based testing for Python. Instead of one example, you describe the shape of the input and assert a property that must hold for all of them. It then generates hundreds of cases, including the mean ones you would never type out.

For a parser the property is simple to state. Feed it any well-formed JSON, whatever its shape. It should do exactly one of two things: return a list of parsed tests, or raise a clear error that tells the user what is wrong. It must never fall over with a raw stack trace.

import json
from hypothesis import given, strategies as st

json_values = st.recursive(
    st.none() | st.booleans() | st.integers() | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
)

@given(value=json_values)
def test_parser_is_graceful_on_any_json(value, tmp_path):
    path = tmp_path / "in.json"
    path.write_text(json.dumps(value))
    try:
        result = parse_collection(path)
    except ValueError:
        return  # the one allowed failure: a clear, typed error
    assert isinstance(result, list)

Then I ran it.

What it found

The parser handled every real Postman collection the same as before. The fuzzer never touched valid input. It broke my handling of garbage, in five distinct shapes.

A collection whose top level was a list instead of an object. A collection whose info block was a string. An item that was a bare number where the code expected a dict. A folder nested deep enough that the recursion bottomed out. And a payload so deeply nested that json.loads itself blew the stack before my code ran at all.

In each case the user would have seen an AttributeError, a TypeError, or a RecursionError, with an internal stack trace and no idea what to fix. None of those is a useful message. The tool knew the input was wrong. It just had no manners about saying so.

The fix is a contract, not a patch

Plugging the five holes was the boring part. The contract underneath them was the real work: valid collection in, pytest suite out; anything else, one clear ValueError that names the problem. Degrade, do not crash.

That turned into a few isinstance checks at the boundaries, a depth limit on folder recursion, and wrapping the top-level json.loads so a stack-busting document becomes a readable "this file is nested too deeply to parse" instead of a traceback. The same review on my other parser, secure-log2test, which reads Kibana log exports, turned up the same family of gaps in the same afternoon. Same shape of bug, same shape of fix.

What I actually learned

"Production/Stable" is a statement about the API, not about the input. It never promised to survive a corrupt file, and I had let people assume it did.

And finding bugs with a fuzzer is the fuzzer doing its job. Point Hypothesis at a parser and it finds nothing? You probably did not let it generate anything nasty enough. The bugs are the evidence the method works.

If your tool reads a file someone else made, your unit tests are a comfort blanket. Spend twenty lines on a property test before you trust them. The input you did not write is the one that finds you in production.

postman2pytest and secure-log2test are MIT-licensed on PyPI. The fuzz tests are in each repo if you want a template to copy.