Most B2B HVAC catalogs publish fan performance curves as scanned PDFs. Engineers do the interpolation by eye when they need to know "will this fan deliver 5,000 m³/h at 350 Pa?" — they overlay the operating point on the curve image and squint.
I wanted a catalog where the curves compute. Here's what that took.
TL;DR
I parsed 17,341 fan curves from manufacturer PDFs into 3rd-degree polynomial coefficients, stored them as JSONB arrays in Postgres, and built a selection engine that ranks all 17,341 curves against a duty point (flow + static pressure) in under 100ms p95. The data shape:
```javascript
// One row per fan-motor variant
{
  fan_id: 'vr-80-75-no5-r4-0-55kw',
  rpm_curves: [
    { rpm: 1450, coefficients: [a0, a1, a2, a3] }, // P(Q) = a0 + a1*Q + a2*Q^2 + a3*Q^3
    { rpm: 2900, coefficients: [a0, a1, a2, a3] }
  ],
  bep_flow: 4200,
  efficiency_at_bep: 0.78
}
```
That's the entire data model for the selector. The math is in the polynomials.
The problem
For one fan, evaluating a polynomial at a flow rate is trivial — 4 multiplications. The problem is that engineers don't pick one fan; they want to compare every reasonable fan against their duty point. So:
- 17,341 curves total in the catalog
- For each duty point query: evaluate each curve at the user's flow rate, check whether the resulting pressure is "close enough" to the user's target, rank surviving fans by efficiency at that operating point.
Naive in-memory pass: load 17k coefficient arrays, evaluate each, sort. Easy. But:
- I want this in a Next.js API route, so memory pressure matters.
- I don't want to pull 17k rows from Postgres into Node every request — that's the slow part, not the math.
- I want filtering by category / manufacturer / size BEFORE the polynomial pass, otherwise I'm wasting CPU.
The approach
Two-phase query.
Phase 1: SQL-side filter. I store fan metadata (manufacturer, type, size, max airflow, max pressure) as regular columns and filter to candidates whose max airflow and max pressure cover the user's duty point. This typically reduces 17,341 fans to 200-800 candidates.
```sql
SELECT f.id, pc.rpm_curves
FROM performance_curves pc
JOIN fans f ON f.id = pc.fan_id
WHERE f.max_airflow >= $1
  AND f.max_pressure >= $2
  AND f.category_id = $3
LIMIT 1000;
```
Index on (category_id, max_airflow, max_pressure) makes this <10ms.
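For completeness, a sketch of what that DDL might look like (index names are mine, not from the actual schema):

```sql
-- Composite b-tree backing the Phase 1 filter.
CREATE INDEX idx_fans_duty_filter
  ON fans (category_id, max_airflow, max_pressure);

-- GIN index on the JSONB curves, so they stay inspectable from psql.
CREATE INDEX idx_curves_rpm_gin
  ON performance_curves USING GIN (rpm_curves);
```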
Phase 2: in-memory polynomial scoring. For each candidate, score each of its RPM curves at the user's flow, find the closest match to the target pressure, compute efficiency at that point.
```javascript
function evaluatePolynomial(coefficients, x) {
  // Horner's method: a0 + x*(a1 + x*(a2 + x*a3))
  let result = 0;
  for (let i = coefficients.length - 1; i >= 0; i--) {
    result = result * x + coefficients[i];
  }
  return result;
}

function rankCandidates(candidates, targetFlow, targetPressure) {
  return candidates
    .map(c => {
      const matches = c.rpm_curves.map(curve => {
        const pressure = evaluatePolynomial(curve.coefficients, targetFlow);
        return { curve, pressure, delta: Math.abs(pressure - targetPressure) };
      });
      const best = matches.reduce((a, b) => (a.delta < b.delta ? a : b));
      return { fan: c, best };
    })
    // Keep fans whose best curve lands within ±5% of the target pressure
    .filter(r => r.best.delta < targetPressure * 0.05)
    .sort((a, b) => a.best.delta - b.best.delta);
}
```
This polynomial pass takes ~3ms for 800 candidates on an M1-class host running Postgres and Node side by side.
End-to-end: SQL filter (~5-10ms) + polynomial pass (~3-5ms) + JSON serialization (~5ms) — comfortably under the 100ms p95 budget.
What didn't work
Three things I tried first that turned out wrong:
- Storing curves as binary blobs. Saved ~30% storage but made SQL inspection impossible — you can't query inside an opaque blob from psql. Switched to JSONB with a GIN index: slightly larger, but I can introspect the curves directly.
- Cubic spline with a natural boundary at flow = 0. Looked clean for most fans, but for fans with steep characteristics near stall (~12% of the catalog) the natural boundary produced negative pressure at zero flow, which renders as ugly artifacts on the chart. Switched to a clamped boundary (derivative at flow = 0 constrained so the curve points at P_max) for those fans.
- Bezier curves instead of explicit P(Q) polynomials. Visually smoother, but the inverse problem (given a duty point, find the intersection on each curve) gets ugly because a Bezier is parametric — you need a numeric solver. Stuck with explicit polynomials; the selector code is 4x simpler.
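To illustrate what the parametric inverse problem costs: with a Bezier you get (Q(t), P(t)), so reading off pressure at a given flow means first solving Q(t) = Q* numerically. A minimal bisection sketch — function names and control points are mine, and it assumes Q(t) is monotonically increasing in t:

```javascript
// Scalar cubic Bezier: one coordinate as a function of the parameter t in [0, 1].
function cubicBezier(p0, p1, p2, p3, t) {
  const u = 1 - t;
  return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
}

// Pressure at a target flow: bisect on t until Q(t) hits qTarget, then evaluate P(t).
function pressureAtFlowBezier(qPts, pPts, qTarget, tol = 1e-6) {
  let lo = 0, hi = 1;
  while (hi - lo > tol) {
    const mid = (lo + hi) / 2;
    if (cubicBezier(qPts[0], qPts[1], qPts[2], qPts[3], mid) < qTarget) lo = mid;
    else hi = mid;
  }
  const t = (lo + hi) / 2;
  return cubicBezier(pPts[0], pPts[1], pPts[2], pPts[3], t);
}

// Illustrative control points: flow 0..6000 m³/h, pressure falling 600 → 100 Pa.
const p = pressureAtFlowBezier([0, 2000, 4000, 6000], [600, 500, 300, 100], 3000);
console.log(p); // ≈ 387.5 Pa after ~20 bisection iterations
```

With an explicit P(Q) polynomial, all of that collapses to a single Horner evaluation per curve — that's the 4x simplicity win.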
Validation
For each fan I scrape the catalog PDF, fit polynomials, then re-check at the manufacturer's published test points and compute deviation. Across the current dataset:
- Median absolute pressure error vs published test point: 1.4%
- 95th percentile: 4.2%
- Worst case: 8.1% (one fan with very steep characteristic near stall)
The 4.2% p95 is comfortably under the 7-8% tolerance band manufacturers quote on every test certificate I've seen. For the worst-case fans I flag fit quality visually in the UI.
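The deviation check itself is a few lines: evaluate the fitted polynomial at each published test point, take the relative error, then summarize with percentiles. A sketch — helper names are mine, and it uses a nearest-rank percentile:

```javascript
// Relative pressure error of a fitted curve at each published test point.
function fitErrors(coefficients, testPoints) {
  return testPoints.map(({ flow, pressure }) => {
    let fitted = 0;
    for (let i = coefficients.length - 1; i >= 0; i--) fitted = fitted * flow + coefficients[i];
    return Math.abs(fitted - pressure) / pressure;
  });
}

// Nearest-rank percentile over a list of error ratios, e.g. percentile(errs, 95).
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
}

// Illustrative check: one exact point, one point 10 Pa off a published 410 Pa.
const errs = fitErrors([600, -0.05, 0, 0], [
  { flow: 2000, pressure: 500 },
  { flow: 4000, pressure: 410 }
]);
```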
What I'd do differently
If I were starting over I'd consider:
- Storing the polynomial fit residuals, not just the coefficients, so the runtime can show error bars on the operating point — engineers love error bars.
- Pre-computing the BEP (best-efficiency-point) flow per RPM curve at fit time and storing it. I derive it at query time currently; no perf issue but it's wasteful work.
- Per-manufacturer fit profiles instead of one polynomial-degree choice for everyone. Some manufacturers publish noisier test data and benefit from degree-2; some publish very clean data and degree-5 helps.
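On the BEP pre-computation point: if efficiency is fitted as a cubic η(Q) the same way the pressure curves are (an assumption on my part — the post only stores bep_flow and efficiency_at_bep), the BEP falls out analytically as a root of the derivative, so it can be computed once at fit time. A sketch:

```javascript
// BEP flow for a cubic efficiency curve eta(Q) = e0 + e1*Q + e2*Q^2 + e3*Q^3.
// The maximum sits where eta'(Q) = e1 + 2*e2*Q + 3*e3*Q^2 = 0 (e0 drops out).
function bepFlow([e0, e1, e2, e3]) {
  if (e3 === 0) return e2 < 0 ? -e1 / (2 * e2) : NaN; // degree-2 fit: simple parabola vertex
  const a = 3 * e3, b = 2 * e2, c = e1;
  const disc = b * b - 4 * a * c;
  if (disc < 0) return NaN; // no stationary point
  const r1 = (-b + Math.sqrt(disc)) / (2 * a);
  const r2 = (-b - Math.sqrt(disc)) / (2 * a);
  // Keep the positive root where the second derivative is negative (a maximum).
  const second = q => 2 * e2 + 6 * e3 * q;
  if (r1 > 0 && second(r1) < 0) return r1;
  if (r2 > 0 && second(r2) < 0) return r2;
  return NaN;
}
```

Storing this one number per RPM curve at fit time removes the derivative solve from the request path entirely.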
Live demo
The live selector runs at https://wentmarket.ru/?lang=en — the homepage form takes flow + pressure and returns ranked fans with overlay graphs. Site language defaults to Russian; the URL param above forces English.
The repo isn't public (it's a commercial B2B catalog) but happy to share specific snippets if anyone is doing similar work — the polynomial-fitting Python script in particular saves a lot of time vs hand-extracting from PDF table cells.