DEV Community: angufibo lincoln

Introducing FlameIQ — Deterministic Performance Regression Detection for Python

angufibo lincoln — Sat, 07 Mar 2026 22:06:07 +0000

The Problem

Performance regressions are invisible in code review.

A careless refactor that recompiles a regex on every function call. A new dependency that adds 40ms to your p95 latency. A database query that wasn't indexed. None of these show up in a diff. They accumulate silently across hundreds of commits — a 3ms latency increase here, a 2% throughput drop there — until they become expensive production incidents.

Type checkers enforce correctness automatically. Linters enforce style automatically. Nothing enforces performance — until now.

Introducing FlameIQ

Today we are releasing FlameIQ v1.0.0 — an open-source, deterministic, CI-native performance regression engine for Python.

pip install flameiq-core

FlameIQ compares your current benchmark results against a stored baseline and fails your CI pipeline if any metric exceeds its configured threshold — the same way a type checker fails your build on a type error.

Quick Start

Step 1 — Initialise

cd my-project
flameiq init

Step 2 — Run your benchmarks and produce a metrics file

Save the output as benchmark.json, the file name Step 3 reads. A metrics file using schema version 1 looks like this:

{
  "schema_version": 1,
  "metadata": {
    "commit": "abc123",
    "branch": "main",
    "environment": "ci"
  },
  "metrics": {
    "latency": {
      "mean": 120.5,
      "p95": 180.0,
      "p99": 240.0
    },
    "throughput": 950.2,
    "memory_mb": 512.0
  }
}
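
If you do not already have a benchmark harness, a metrics file in this shape can be produced with the standard library alone. The following is a minimal sketch, not FlameIQ's own tooling: the helper names (percentile, measure_ms, build_metrics) are hypothetical, and it assumes latency is reported in milliseconds and throughput in calls per second, which the schema example does not state. It leaves out memory_mb.

```python
import json
import math
import statistics
import time


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def measure_ms(func, runs=200):
    """Call func repeatedly and return sorted wall-clock timings in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sorted(samples)


def build_metrics(samples_ms, commit, branch, environment):
    """Arrange timings into the structure shown in the example above."""
    total_s = sum(samples_ms) / 1000.0
    return {
        "schema_version": 1,
        "metadata": {
            "commit": commit,
            "branch": branch,
            "environment": environment,
        },
        "metrics": {
            "latency": {
                "mean": statistics.fmean(samples_ms),
                "p95": percentile(samples_ms, 95),
                "p99": percentile(samples_ms, 99),
            },
            # Assumed unit: calls per second.
            "throughput": len(samples_ms) / total_s if total_s > 0 else 0.0,
        },
    }


if __name__ == "__main__":
    # Placeholder workload; swap in the function you want to track.
    timings = measure_ms(lambda: sum(range(10_000)))
    print(json.dumps(build_metrics(timings, "abc123", "main", "ci"), indent=2))
```

Because the script prints JSON to stdout, it can be redirected straight into a file, for example python run_benchmarks.py > benchmark.json.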

Step 3 — Set a baseline

flameiq baseline set --metrics benchmark.json

Step 4 — Compare on every PR

flameiq compare --metrics current.json --fail-on-regression

Output:

  Metric           Baseline    Current      Change   Threshold  Status
  ────────────────────────────────────────────────────────────────────
  latency.p95       2.45 ms     4.51 ms     +84.08%    ±10.0%  REGRESSION
  throughput        412.30      231.50      -43.84%    ±10.0%  REGRESSION

  ✗ REGRESSION — 2 metric(s) exceeded threshold.

Exit code 1. Pipeline fails. Regression caught before merge.
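
The Change column is a relative difference against the baseline. As a quick sanity check, the arithmetic for the latency.p95 row can be reproduced in a few lines. This is a sketch of the usual percent-change formula, assumed here rather than taken from the FlameIQ source, and percent_change is a hypothetical helper name.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change of current versus baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


# latency.p95 row from the table above: 2.45 ms -> 4.51 ms
print(f"{percent_change(2.45, 4.51):+.2f}%")  # +84.08%
```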

A Real Example: Catching a Regex Regression

Here is the kind of bug FlameIQ is designed to catch. A developer refactors a text processing function and accidentally recompiles the regex on every call:

import re

# FAST: original implementation
def clean(text: str) -> str:
    text = re.sub(r"[^\w\s]", "", text)   # re.sub uses the re module's pattern cache
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()

# SLOW: regressed implementation
def clean(text: str) -> str:
    punct_re = re.compile(r"[^\w\s]")     # re.compile() runs on every call!
    space_re = re.compile(r"\s+")          # re.compile() runs on every call!
    text = punct_re.sub("", text)
    text = space_re.sub(" ", text).strip()
    return text.lower()

This is invisible in code review. The logic is identical. The diff looks clean.
FlameIQ catches it with an 84% p95 latency increase — well above the 10% threshold.
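
Once the regression is flagged, one conventional fix is to compile the patterns once at import time and reuse them. The sketch below shows that approach; the constant names _PUNCT_RE and _SPACE_RE are illustrative and do not come from the demo project.

```python
import re

# Compiled once when the module is imported, then reused by every call.
_PUNCT_RE = re.compile(r"[^\w\s]")
_SPACE_RE = re.compile(r"\s+")


def clean(text: str) -> str:
    """Strip punctuation, collapse whitespace, and lowercase the text."""
    text = _PUNCT_RE.sub("", text)
    text = _SPACE_RE.sub(" ", text).strip()
    return text.lower()


print(clean("Hello,   World!"))  # hello world
```

The behaviour is the same as both versions above; only the place where re.compile() runs has moved.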

GitHub Actions Integration

- name: Install FlameIQ
  run: pip install flameiq-core

- name: Restore baseline cache
  uses: actions/cache@v4
  with:
    path: .flameiq/
    key: flameiq-${{ github.base_ref }}

- name: Run benchmarks
  run: python run_benchmarks.py > metrics.json

- name: Check for regressions
  run: flameiq compare --metrics metrics.json --fail-on-regression

Key Design Decisions

Deterministic by design
Given identical inputs, FlameIQ always produces identical outputs. No randomness, no network calls, no datetime.now(). Safe for any CI environment including air-gapped infrastructure.

No vendor dependency
Baselines are local JSON files. No SaaS account. No API keys. No telemetry. Your performance data stays on your infrastructure.

Direction-aware thresholds
FlameIQ treats a latency increase and a throughput decrease as regressions. Thresholds are sign-aware per metric type, so known metrics need no manual direction configuration.
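
To make the idea concrete, here is a rough sketch of what a direction-aware check could look like. It is an illustration under assumptions, not FlameIQ's actual algorithm (that is documented in /specs): the metric sets, the is_regression name, and the fallback for unknown metrics are all hypothetical.

```python
# Assumed direction table: for these metrics a higher value is worse...
HIGHER_IS_WORSE = {"latency.mean", "latency.p95", "latency.p99", "memory_mb"}
# ...and for these a lower value is worse.
LOWER_IS_WORSE = {"throughput"}


def is_regression(metric: str, change_pct: float, threshold_pct: float) -> bool:
    """Flag a change only when it moves past the limit in the bad direction."""
    limit = abs(threshold_pct)
    if metric in HIGHER_IS_WORSE:
        return change_pct > limit
    if metric in LOWER_IS_WORSE:
        return change_pct < -limit
    # Unknown metric: assume a large move in either direction is suspicious.
    return abs(change_pct) > limit


print(is_regression("latency.p95", 84.08, 10.0))   # True
print(is_regression("latency.p95", -20.0, 10.0))   # False, latency improved
print(is_regression("throughput", -43.84, 10.0))   # True
```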

Statistical mode
For noisy benchmark environments, FlameIQ can apply the Mann-Whitney U test alongside threshold comparison. A regression is only declared if both the threshold is exceeded and the result is statistically significant.
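
The two-condition rule can be sketched with a small pure-Python Mann-Whitney U test. Treat this as an approximation for illustration only: it uses the normal approximation without tie or continuity corrections, tests one-sided for "current is larger than baseline" (suited to metrics where higher is worse), and all function names are hypothetical. Which variant FlameIQ applies is described in its /specs, and a library routine such as scipy.stats.mannwhitneyu would be the usual choice in practice.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def mann_whitney_greater(baseline, current):
    """Return (U, p) for the one-sided hypothesis that current > baseline."""
    n1, n2 = len(baseline), len(current)
    ranks = average_ranks(list(baseline) + list(current))
    u = sum(ranks[n1:]) - n2 * (n2 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Upper-tail probability of a standard normal at z.
    return u, 0.5 * math.erfc(z / math.sqrt(2))


def is_significant_regression(baseline, current, change_pct,
                              threshold_pct, confidence=0.95):
    """Regression only if the threshold is exceeded AND p is below alpha."""
    _, p_value = mann_whitney_greater(baseline, current)
    return change_pct > threshold_pct and p_value < (1 - confidence)
```

With eight baseline samples around 2.45 ms and eight current samples around 4.5 ms, every current sample outranks every baseline sample, so both conditions hold. If the two sample sets are identical, the p-value is 0.5 and no regression is declared even when a single noisy run crosses the threshold.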

Versioned schema
The metrics schema is versioned (currently v1) with a formal specification. The threshold algorithm and statistical methodology are both fully documented in /specs.

HTML Reports

flameiq report --metrics current.json --output report.html

Generates a self-contained HTML report with a full metric diff table, regression highlights, and trend analysis. No external assets — works offline.

Configuration

flameiq.yaml (created by flameiq init):

thresholds:
  latency.p95:   10%    # Allow up to 10% latency increase
  latency.p99:   15%
  throughput:    -5%    # Allow up to 5% throughput decrease
  memory_mb:      8%

baseline:
  strategy: rolling_median
  rolling_window: 5

statistics:
  enabled: false
  confidence: 0.95

provider: json
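
Two of these settings are easy to picture in code. The sketch below shows one plausible reading of them: a percentage string such as "10%" or "-5%" parsed into a number, and a rolling_median baseline taken as the median of the last rolling_window stored values. Both helpers are hypothetical and may differ from how FlameIQ implements them.

```python
import statistics


def parse_threshold(value: str) -> float:
    """Turn a threshold string such as '10%' or '-5%' into a float."""
    return float(value.strip().rstrip("%"))


def rolling_median_baseline(history, window=5):
    """Median of the most recent `window` values for a single metric."""
    if not history:
        raise ValueError("history must contain at least one value")
    return statistics.median(history[-window:])


# Hypothetical p95 history, oldest first; one outlier run at 250 ms.
p95_history = [100.0, 180.0, 182.0, 179.0, 250.0, 181.0]
print(parse_threshold("-5%"))                # -5.0
print(rolling_median_baseline(p95_history))  # 181.0
```

A median over a short window keeps a single outlier run (the 250 ms entry here) from dragging the baseline up, which is a common reason to prefer it over a mean.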

Try the Demo

We built a demo project — flameiq-demo — that walks through the full regression detection workflow using a real Python library:

👉 https://github.com/flameiq/demo-flameiq
