Every sprint we export a JSON dump from Kibana, scroll through hundreds of log entries, and tell ourselves we will turn them into test cases later.
Later never comes.
The logs contain real API calls. Real endpoints, real payloads, real status codes from production. It is the closest thing to a specification of how the system actually behaves. And almost none of it ever becomes an automated test, because converting it manually takes longer than the sprint.
I got tired of later. I wrote secure-log2test, a CLI that reads a Kibana JSON export and generates a ready-to-run pytest file. One command. Working tests.
There is one constraint that shaped the whole design: no data leaves your machine. No LLM API calls. No cloud. Everything runs locally.
Why the privacy constraint matters
The obvious alternative to building a tool is asking an LLM to write the tests from your logs. It would probably work. Right up until someone on the security team noticed that production logs full of PII and internal API structures were being sent to an external service.
In enterprise environments that conversation ends badly. So I made it impossible: the core has no network calls at all.
But the bigger privacy story is what happens to secrets inside the log entries themselves. Production logs leak Authorization: Bearer ... headers all the time. They leak Cookie values, X-API-Key values, and increasingly they leak request bodies that contain password, refresh_token, client_secret, or whatever the team called their auth field this week. If the generated tests carry those values, the regression suite becomes a credential dump on disk, ready to be accidentally committed.
secure-log2test scrubs three layers before the test file is written:
- A static list of well-known auth headers: authorization, proxy-authorization, proxy-authenticate, cookie, set-cookie, x-api-key, x-auth-token, x-csrf-token, x-access-token, refresh-token, id-token, x-amz-security-token, authentication.
- A regex pattern (auth|token|secret|key|session|cookie|credential|bearer|password|passwd) that catches custom header names project teams invent.
- The same logic walks JSON request bodies recursively, so {"password": "..."}, {"client_secret": "..."}, and OAuth {"refresh_token": "..."} all get replaced with ***REDACTED*** at parse time.
The marker is a placeholder. You inject real credentials at run time through environment variables or fixtures.
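To make the idea concrete, here is a minimal sketch of a recursive redaction walker in the spirit of the description above. The names (SENSITIVE, MARKER, redact) are illustrative, not secure-log2test's actual internals:

```python
import re

# Pattern from the article; matches anywhere in a key name, case-insensitively.
SENSITIVE = re.compile(
    r"auth|token|secret|key|session|cookie|credential|bearer|password|passwd",
    re.IGNORECASE,
)
MARKER = "***REDACTED***"

def redact(value):
    """Return a copy of `value` with sensitive fields replaced by MARKER.

    Dicts are walked key by key, lists element by element; scalars pass
    through untouched, so the original structure is preserved.
    """
    if isinstance(value, dict):
        return {
            k: MARKER if SENSITIVE.search(k) else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value

redact({"body": {"password": "hunter2", "items": [{"api_key": "x"}]}})
# {'body': {'password': '***REDACTED***', 'items': [{'api_key': '***REDACTED***'}]}}
```

Because the walker returns a new structure instead of mutating in place, the original log dict stays intact.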
How it works
pip install secure-log2test
secure-log2test data/sample_kibana_export.json --output tests_generated.py
pytest tests_generated.py -v
A sample export ships with the repo (data/sample_kibana_export.json), so you can see real output without setting up a Kibana instance first.
Input is the Kibana JSON hits.hits[*]._source format you get from any saved search:
{
  "hits": {
    "hits": [
      {
        "_source": {
          "url": "/api/v1/orders",
          "method": "POST",
          "status": 201,
          "headers": {"Authorization": "Bearer abc.xyz", "Content-Type": "application/json"},
          "body": {"item_id": 42, "password": "hunter2"}
        }
      }
    ]
  }
}
Output is a pytest file with one test function per log entry:
def test_post_api_v1_orders_1():
    response = requests.request(
        method="POST",
        url=BASE_URL + "/api/v1/orders",
        headers={"Authorization": "***REDACTED***", "Content-Type": "application/json"},
        json={"item_id": 42, "password": "***REDACTED***"},
        timeout=10,
    )
    assert response.status_code == 201
Both the Authorization header value and the password field inside the body are redacted. The original log dict is not mutated.
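One way to inject real credentials back at run time, as a hypothetical sketch (the API_TOKEN variable name and with_real_credentials helper are mine, not something the tool prescribes):

```python
import os

MARKER = "***REDACTED***"

def with_real_credentials(headers):
    """Swap the redaction marker in an Authorization header for a real
    bearer token read from the environment. API_TOKEN is an illustrative
    variable name; use whatever your team's convention is."""
    token = os.environ.get("API_TOKEN", "")
    return {
        k: f"Bearer {token}" if v == MARKER and k.lower() == "authorization" else v
        for k, v in headers.items()
    }

# In a pytest fixture or at the top of the generated file:
# headers = with_real_credentials({"Authorization": "***REDACTED***"})
```

Keeping the swap at run time means the committed test file never contains a live token.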
The architecture: two stages
Parse (secure_log2test/core/parser.py). Pydantic v2 validates and normalises each entry. Records with missing fields fall back to safe defaults rather than crashing. Malformed entries are dropped with a warning, not silently swallowed. Redaction runs as a Pydantic field_validator so it cannot be skipped accidentally.
Generate (secure_log2test/core/generator.py). The validated list goes through a Jinja2 template (templates/test_module.py.j2) and lands as a .py file. The template is the only place that knows what pytest looks like. Want a different output (httpx instead of requests, unittest instead of pytest, k6 scenarios)? You replace the template. The parser stays untouched.
This split is the part I am most happy with. The redaction logic is unit-testable in isolation. The output format is a config file. Anyone can fork the template and emit something else from the same parsed entries.
The user-feedback loop that improved v1.0.1
The first PyPI release shipped at v1.0.0 on May 11th. Within hours an external user fed it a Grafana Loki export with Cyrillic content from a Russian backend. The parser opened the file without an explicit encoding argument. On Linux this works (utf-8 by default). On Windows the same call raises UnicodeDecodeError because Windows defaults to cp1252.
Bug filed, reproduced, fixed within a day. v1.0.1 went out on the 13th with explicit encoding="utf-8-sig" on the file open. I added a regression test that simulates the cp1252 environment so the same bug cannot come back.
What I learned: every framework that handles user-provided input needs an adversarial encoding test. The happy path is easy. The bug lived in the gap between "what my dev machine does by default" and "what a Windows shell does by default."
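The fix itself is small. A sketch of why encoding="utf-8-sig" is a safe default for user-supplied exports (the file path and payload here are just for demonstration):

```python
import json
import os
import tempfile

# Simulate a Windows-produced export: UTF-8 with a BOM and Cyrillic content,
# like the file in the original bug report.
payload = {"message": "Ошибка авторизации"}
path = os.path.join(tempfile.mkdtemp(), "export.json")
with open(path, "w", encoding="utf-8-sig") as f:
    json.dump(payload, f, ensure_ascii=False)

# "utf-8-sig" strips a leading BOM if present and behaves like plain UTF-8
# otherwise, so the read no longer depends on the OS locale default.
with open(path, encoding="utf-8-sig") as f:
    data = json.load(f)
```

Opening the same file without an explicit encoding is what made the behaviour differ between a Linux dev machine and a Windows shell.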
What the tests cover
59 tests as of v1.0.1, across parser and generator:
- Valid input, malformed records, missing fields, empty exports.
- Header redaction with the static list, header redaction by regex pattern, and custom team-specific headers like X-Custom-Token caught by the pattern.
- Body redaction walker: password fields, OAuth refresh tokens, nested dicts, lists of dicts, non-dict pass-through.
- Float duration coercion (Kibana sometimes outputs 134.0 instead of 134).
- Template rendering, payload serialisation, test naming.
- CLI argument handling, file output path creation.
- A CI smoke test that runs the CLI end-to-end on the sample export and parses the generated Python with ast.parse.
CI runs on Python 3.10, 3.11, 3.12, and 3.13 via GitHub Actions.
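The ast.parse smoke check is cheap insurance against template bugs. A minimal sketch of the idea, with a stand-in string where CI would read the generated file:

```python
import ast

# `generated` stands in for the file secure-log2test writes in CI.
generated = '''
def test_post_api_v1_orders_1():
    assert True
'''

# ast.parse raises SyntaxError on anything that is not valid Python,
# so a template regression that emits broken code fails the build
# without the suite ever hitting a network.
try:
    tree = ast.parse(generated)
    ok = True
except SyntaxError:
    ok = False
```

It does not prove the tests are meaningful, only that they are runnable Python, which is exactly the failure mode a templating bug would introduce.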
Honest limitations
- Kibana / Elasticsearch JSON export shape only. Grafana Loki Explore exports are tracked in issue #4.
- Single-file input. Multi-file batch mode is on the roadmap.
- Output format: pytest only. JSON / CSV for downstream pipelines are tracked in issue #5.
- Loads the full file into memory. Not designed for multi-GB exports.
- Does not infer sequences or dependencies between requests.
- Does not replace manual test design. It accelerates the first pass.
The generated tests are a starting point. You review them, set the base URL via environment variable, add setup or teardown where needed. But you start from working, runnable code rather than a blank file and a pile of log entries.
Where it goes next
v1.1 will add response body assertions and optional schema matching (issue #1). v1.2 will allow custom redaction rules via a config file (issue #2). Two good-first-issue slots are open right now if you want to grab one.
Get it
pip install secure-log2test
Repo (MIT licence): github.com/golikovichev/secure-log2test
If your Kibana export shape is different from what the parser expects, open an issue with a redacted sample. The parser is the easy part to extend.