If you work in building HVAC, you know the ritual. You need a fan that will push 5000 m³/h at 400 Pa. Your manufacturer rep emails you a 400 MB installer — usually Windows-only, often Delphi, almost always from 2007 — and you click through dialog boxes until the program picks a fan for you. The selection engine is a black box. You cannot embed it. You cannot query it. You cannot even link to a result.
The math underneath is not complicated. It has been public for decades. It is just locked behind a desktop binary.
I have been building a web-native alternative for a Russian HVAC marketplace. The selection engine now indexes 18,141 real manufacturer pressure–flow curves across 13 fan families, and picks a matching fan for a given duty point in about 4 ms: one Postgres range query plus in-memory evaluation. This post walks through the storage model, the evaluation math, the matching loop, and the three gotchas that each cost me roughly a week.
Full working code: github.com/goncharovart/polynomial-fan-matcher. Everything below runs in production on wentmarket.ru.
The problem, stated precisely
A fan has a characteristic curve: for every volumetric flow rate Q (in m³/h) the fan is capable of producing, there is a corresponding static pressure P(Q) (in Pa) it can develop. As Q goes up, P goes down. The curve is smooth and roughly quadratic.
Given a target duty point — say Q_target = 5000 m³/h at P_target = 400 Pa, with a tolerance band of ±15% on pressure — I want to return, from a catalog of ~18,000 curves, the subset of fans that:
- Physically cover the target flow (the curve is defined at Q = 5000).
- Produce a pressure at that flow inside [340, 460] Pa.
- Are ranked by efficiency η(Q_target), because two fans that both "work" are not the same fan — the one consuming 2.2 kW instead of 4.0 kW pays for itself in 18 months.
Straightforward. Except that the input data does not arrive as a clean polynomial. It arrives as PDFs, scraped HTML tables, Delphi data files, and Excel sheets with merged cells.
Why a lookup table is the wrong answer
The naive move is to store each curve as a list of sampled (Q, P) points — say 100 evenly spaced samples between Q_min and Q_max. Matching then becomes "find the row, interpolate linearly between the two nearest samples."
For 18,141 curves at 100 samples each, that is 1.8 million rows. Postgres can chew through that, but you are paying three costs you did not have to pay:
- Storage is inflated. Each curve that actually fits in 5 floats is now 200 floats.
- Every read is an interpolation, which introduces piecewise-linear error between sample points. Fan curves are smooth; sampling them discards that smoothness.
- Scaling the curve becomes awkward. Fans run at variable speeds via VFDs, and affinity laws say Q ∝ n and P ∝ n². Applied to a polynomial, that is a one-line coefficient transform. Applied to 100 samples, it is 100 multiplications plus a full re-sampling grid.
Every manufacturer's own selection software stores curves as coefficients. There is a reason.
Storage: a tiny array of coefficients
A fan curve is well-approximated by a polynomial of modest degree:
P(Q) = a_0 + a_1·Q + a_2·Q² + a_3·Q³ + ... + a_n·Qⁿ
In practice, degree 3–5 is enough for a smooth curve over the working range. The catalogs I imported use degree 6 at the high end. So each curve is stored as an array of 7 floats.
The table schema is boring on purpose:
CREATE TABLE fan_curves (
id bigserial PRIMARY KEY,
fan_id bigint NOT NULL,
label text NOT NULL,
coeffs double precision[] NOT NULL,
q_min double precision NOT NULL,
q_max double precision NOT NULL,
eta_xpoints double precision[] NOT NULL,
eta_values double precision[] NOT NULL,
n_nominal double precision NOT NULL,
n_max double precision NOT NULL,
p_max double precision NOT NULL,
fan_type text NOT NULL,
motor_min_kw double precision,
motor_max_kw double precision
);
CREATE INDEX fan_curves_q_range_idx ON fan_curves (q_min, q_max);
CREATE INDEX fan_curves_fan_id_idx ON fan_curves (fan_id);
Two things worth noting. First, coeffs is a double precision[] — Postgres arrays, not jsonb. I benchmarked both; arrays are ~2× faster for this access pattern because there is no JSON parse. Second, efficiency η is stored as separately measured sample points, not as a polynomial. I will come back to why in the gotchas section.
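For context, here is a minimal sketch of how such a row might be mapped into the in-memory shape the matching loop uses. The rowToCurve helper is hypothetical, not code from the repo; the comments about driver behavior assume node-postgres defaults (bigserial comes back as a string, double precision[] as a number array):

```typescript
// Hypothetical helper (not in the repo): map a fan_curves row, as returned
// by node-postgres, into the Curve shape the matching loop consumes.
interface Curve {
  id: number;
  label: string;
  coeffs: number[];
  qMin: number;
  qMax: number;
  etaXpoints: number[];
  etaValues: number[];
}

function rowToCurve(row: Record<string, any>): Curve {
  return {
    id: Number(row.id),          // bigserial arrives as a string
    label: row.label,
    coeffs: row.coeffs,          // double precision[] arrives as number[]
    qMin: row.q_min,
    qMax: row.q_max,
    etaXpoints: row.eta_xpoints,
    etaValues: row.eta_values,
  };
}
```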
Evaluation: Horner, not Math.pow
The wrong way to evaluate a polynomial is the way it is written in textbooks:
// DON'T do this
let p = 0;
for (let i = 0; i < coeffs.length; i++) {
p += coeffs[i] * Math.pow(q, i);
}
At degree 6 this does seven Math.pow calls, seven multiplications, and seven additions. Math.pow is also a generic float-exponent routine; for integer powers it is slow and accumulates rounding error differently at each step.
Horner's method rewrites the same polynomial as a nested multiply:
a_0 + a_1·Q + a_2·Q² + a_3·Q³ = a_0 + Q·(a_1 + Q·(a_2 + Q·a_3))
Same mathematical value, but you compute it right-to-left: one multiply and one add per coefficient. No pow. The rounding is also better-behaved — floating-point error grows like the condition number of the polynomial rather than accumulating across independent Q^i computations.
The implementation is about as small as code gets:
export function evaluatePolynomial(coeffs: number[], q: number): number {
if (coeffs.length === 0 || !Number.isFinite(q)) return 0;
let result = 0;
for (let i = coeffs.length - 1; i >= 0; i--) {
result = result * q + coeffs[i];
if (!Number.isFinite(result)) return 0;
}
return result;
}
On an M2 laptop this evaluates a degree-6 polynomial in ~30 ns. For 18,141 curves the full batch is ~540 µs. The rest of the 4 ms budget is Postgres I/O and JS object allocation.
The isFinite guard is not theoretical. I learned the hard way that if a scraped coefficient file has a stray NaN anywhere in the array, it will silently turn every subsequent evaluation into NaN, and your matching loop will return zero results with no exception. Fail fast, return 0, log the offending row, move on.
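To make the failure mode concrete, here is the same Horner loop with the guard removed — a demonstration sketch, not repo code, with made-up coefficients:

```typescript
// Horner WITHOUT the isFinite guard — for demonstration only.
function hornerUnguarded(coeffs: number[], q: number): number {
  let result = 0;
  for (let i = coeffs.length - 1; i >= 0; i--) {
    result = result * q + coeffs[i];
  }
  return result;
}

const clean   = [450, -0.02, -0.000003]; // plausible scraped curve
const corrupt = [450, NaN, -0.000003];   // one bad value in the import

hornerUnguarded(clean, 5000);   // a finite pressure
hornerUnguarded(corrupt, 5000); // NaN — and no exception anywhere
```

Every multiply-add touches the running result, so a single NaN coefficient contaminates the final value no matter where it sits in the array.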
The matching loop
Given a catalog of curves and a duty point, pick the ones that fit. The filter is a coarse range check against the stored q_min/q_max domain, followed by a pressure check at Q_target, followed by an efficiency ranking:
import { evaluatePolynomial } from './horner';
import { interpolateEta } from './efficiency';
export interface Curve {
id: number;
label: string;
coeffs: number[];
qMin: number;
qMax: number;
etaXpoints: number[];
etaValues: number[];
}
export interface Duty {
qTarget: number;
pTarget: number;
tolerance: number;
}
export interface Match {
curve: Curve;
pressureAtQ: number;
deviation: number;
eta: number;
}
export function matchDuty(curves: Curve[], duty: Duty): Match[] {
const { qTarget, pTarget, tolerance } = duty;
const pMin = pTarget * (1 - tolerance);
const pMax = pTarget * (1 + tolerance);
const results: Match[] = [];
for (const c of curves) {
if (qTarget < c.qMin || qTarget > c.qMax) continue;
const p = evaluatePolynomial(c.coeffs, qTarget);
if (p < pMin || p > pMax) continue;
const eta = interpolateEta(c.etaXpoints, c.etaValues, qTarget);
results.push({ curve: c, pressureAtQ: p, deviation: (p - pTarget) / pTarget, eta });
}
results.sort((a, b) => (b.eta - a.eta) || (Math.abs(a.deviation) - Math.abs(b.deviation)));
return results;
}
The index on (q_min, q_max) prunes most of the catalog before we evaluate anything — for a mid-range duty point we typically touch 1500–3000 rows, not 18,000. After Postgres hands back the rows, the JS matching loop is a tight CPU-bound pass over a pre-allocated array.
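For reference, the prune is a plain parameterized range query; the exact SELECT below is a sketch (the production query may differ), and coversFlow restates the same condition in TypeScript:

```typescript
// Sketch of the server-side prune. A curve survives iff its fit domain
// contains the duty-point flow; the (q_min, q_max) index serves this check.
const RANGE_QUERY = `
  SELECT id, label, coeffs, q_min, q_max, eta_xpoints, eta_values
    FROM fan_curves
   WHERE q_min <= $1
     AND q_max >= $1`;

// The same condition in memory, as used by the matching loop's domain check.
function coversFlow(qMin: number, qMax: number, qTarget: number): boolean {
  return qMin <= qTarget && qTarget <= qMax;
}
```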
On my laptop the end-to-end number looks like this:
| Stage | Mean time |
|---|---|
| Postgres range-filter + row transfer | 2.8 ms |
| Horner evaluation, 2,100 rows mean | 0.6 ms |
| η interpolation and sort | 0.5 ms |
| Total (warm cache) | 3.9 ms |
The three gotchas that each cost a week
Gotcha 1 — Pressure and efficiency are separate functions
The single costliest mistake I made: assuming η(Q) could be derived from P(Q) via a shaft-power formula and fan affinity. It cannot, because the efficiency curve has a different shape from the pressure curve. Pressure drops roughly quadratically as flow increases; efficiency has a bell shape with a peak somewhere around 70% of Q_max. Two curves with similar pressure output at a duty point can have drastically different efficiencies there.
Every naive polynomial-fan tutorial online quietly assumes a single polynomial captures both. The resulting efficiency numbers are off by 5–15%, which is enough to recommend the wrong fan and to break the life-cycle cost calculation that comes next.
Store η as its own thing. In my data it ships as a vector of sampled measurement points (Q_i, η_i), and I interpolate linearly. Fitting an independent polynomial to η also works; the important property is that it is not algebraically tied to P(Q).
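The interpolateEta used by the matching loop can be sketched as a plain piecewise-linear lookup that clamps at the ends instead of extrapolating — the repo's implementation may differ in details:

```typescript
// Sketch: linear interpolation over sampled efficiency points (Q_i, η_i).
// xs must be ascending; out-of-range flows are clamped, never extrapolated.
export function interpolateEta(xs: number[], ys: number[], q: number): number {
  if (xs.length === 0 || xs.length !== ys.length) return 0;
  if (q <= xs[0]) return ys[0];                       // clamp left
  if (q >= xs[xs.length - 1]) return ys[ys.length - 1]; // clamp right
  for (let i = 1; i < xs.length; i++) {
    if (q <= xs[i]) {
      // interpolate linearly inside the bracketing segment
      const t = (q - xs[i - 1]) / (xs[i] - xs[i - 1]);
      return ys[i - 1] + t * (ys[i] - ys[i - 1]);
    }
  }
  return ys[ys.length - 1];
}
```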
Gotcha 2 — Polynomial degree varies per manufacturer
One catalog fits degree-3 curves. Another fits degree-5. A third went with degree-6. If you store "the coefficient array" as a variable-length column and your hot loop does coeffs.length reads and branches, you leave performance on the table.
The fix is to zero-pad every curve to the max degree in the catalog (6 in my case), so every row is a fixed-length array of 7 floats. A degree-3 polynomial with coefficients [a0, a1, a2, a3] and its padded form [a0, a1, a2, a3, 0, 0, 0] are mathematically identical when you evaluate with Horner, since the higher terms just fold 0·q into the running result. The win is that the matching loop now has a fixed shape — the JS engine can keep the inner loop monomorphic and the array in a packed representation.
Measured: ~30% faster on the full batch after zero-padding.
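The padding itself is a few lines; this sketch assumes the catalog-wide max degree of 6 described above (padCoeffs is an illustrative name, not necessarily the repo's):

```typescript
// Zero-pad a coefficient array to a fixed length so every row in the hot
// loop has the same shape. MAX_DEGREE matches the highest degree in the
// catalog (6 here), giving maxDegree + 1 = 7 floats per curve.
const MAX_DEGREE = 6;

function padCoeffs(coeffs: number[], maxDegree: number = MAX_DEGREE): number[] {
  const padded = coeffs.slice(0, maxDegree + 1);
  while (padded.length < maxDegree + 1) padded.push(0);
  return padded;
}
```

Evaluating the padded array with Horner gives the same value as the original, since each trailing zero contributes 0·q to the running result.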
Gotcha 3 — Extrapolation is silently wrong
A polynomial evaluated outside its fit domain does not return an error. It returns a number. That number is physically meaningless — often negative pressure, or a wildly wrong efficiency, or a value that implies the fan produces more air at a higher static pressure, which is not how fans work.
The domain check at the top of the matching loop is not a defensive optimization, it is a correctness guarantee.
I originally skipped this, because the results "looked reasonable" on the data I had tested. They were reasonable because I had only queried points inside every fan's working range. As soon as a real user asked for 200 m³/h on a fan whose curve started at 800 m³/h, the engine cheerfully reported that the fan would produce 1200 Pa at that flow — a pure polynomial extrapolation artifact. The fan would stall in reality.
Store the domain. Enforce the domain. Do not trust the polynomial outside its fit.
Scaling across RPM — the affinity law bonus
A small reward for storing things as coefficients: variable-speed drives change the operating RPM, and fan affinity laws map that to a change on the curve:
Q' = Q · (n'/n)
P' = P · (n'/n)²
Given a polynomial P(Q) = Σ aᵢ Qⁱ at base RPM n, the curve at RPM n' has coefficients:
export function scaleByRpm(coeffs: number[], nBase: number, nTarget: number): number[] {
const r = nTarget / nBase;
// P_new(Q) = r² · P_old(Q/r) → coefficient i scales by r² / r^i = r^(2-i)
return coeffs.map((a, i) => a * Math.pow(r, 2 - i));
}
That is the whole transformation. No re-sampling, no table rebuild. One pass over the array. The stored curve stays unmodified; we build scaled variants in memory per-query when the user asks for VFD-controlled selection.
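As a sanity check, the transform can be verified against the invariant it encodes, P'(r·Q) = r²·P(Q). This self-contained snippet reproduces the two functions from this post; the sample coefficients are made up:

```typescript
// Minimal reproductions of the two functions from this post, for a
// self-contained check of the affinity-law coefficient transform.
function evaluatePolynomial(coeffs: number[], q: number): number {
  let result = 0;
  for (let i = coeffs.length - 1; i >= 0; i--) result = result * q + coeffs[i];
  return result;
}

function scaleByRpm(coeffs: number[], nBase: number, nTarget: number): number[] {
  const r = nTarget / nBase;
  return coeffs.map((a, i) => a * Math.pow(r, 2 - i));
}

const base = [500, -0.01, -0.000004];        // made-up P(Q) at 1400 rpm
const scaled = scaleByRpm(base, 1400, 1120); // same fan at 80% speed

const q = 4000;
const lhs = evaluatePolynomial(scaled, 0.8 * q); // P'(0.8·Q)
const rhs = 0.64 * evaluatePolynomial(base, q);  // 0.8² · P(Q)
// lhs and rhs agree to floating-point precision
```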
Benchmarks, cold and hot
Ran against the full production catalog (18,141 curves) on a cold Postgres with shared_buffers at 128 MB, and again with the index in buffer cache:
| Scenario | p50 | p95 | p99 |
|---|---|---|---|
| Cold cache, first query | 38 ms | 62 ms | 94 ms |
| Warm cache | 4.2 ms | 6.1 ms | 9.8 ms |
| Warm + query cache hit | 0.9 ms | 1.4 ms | 2.3 ms |
The cold number is dominated by disk reads on the btree index. Warm, the bottleneck shifts to row transfer and JS evaluation. With a small per-query LRU in front (keyed on (qTarget, pTarget, tolerance) at 100 Pa / 100 m³/h resolution), most repeat selections serve out of memory.
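The LRU itself can be sketched in a few lines on top of a Map, which iterates in insertion order. The DutyCache name and 512-entry capacity are assumptions; the 100 Pa / 100 m³/h quantization is the resolution stated above:

```typescript
// Sketch of the per-query LRU. Keys are quantized to 100 m³/h / 100 Pa so
// nearby duty points share a cache entry.
class DutyCache<V> {
  private map = new Map<string, V>();
  constructor(private capacity = 512) {}

  key(qTarget: number, pTarget: number, tolerance: number): string {
    // e.g. 5010 m³/h and 5040 m³/h quantize to the same bucket
    return `${Math.round(qTarget / 100)}:${Math.round(pTarget / 100)}:${tolerance}`;
  }

  get(k: string): V | undefined {
    const v = this.map.get(k);
    if (v !== undefined) {
      // refresh recency: delete + re-insert moves the key to the end
      this.map.delete(k);
      this.map.set(k, v);
    }
    return v;
  }

  set(k: string, v: V): void {
    if (this.map.has(k)) this.map.delete(k);
    else if (this.map.size >= this.capacity) {
      // evict least-recently-used = first key in insertion order
      this.map.delete(this.map.keys().next().value as string);
    }
    this.map.set(k, v);
  }
}
```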
What is in the repo, and what is next
The polynomial-fan-matcher repo extracts the evaluation and matching core from the production code at wentmarket.ru. It ships:
- evaluatePolynomial(coeffs, q) — Horner in 10 lines, tested against SciPy
- scaleByRpm(coeffs, nBase, nTarget) — affinity-law transform on coefficients
- matchDuty(curves, duty) — the matching loop from this post
- Sample data: 200 curves from the ВР 80-75 and ВО 06-300 families, hand-verified against manufacturer PDFs
- A tiny CLI: npx polynomial-fan-matcher --q 5000 --p 400 --tol 0.15
On the roadmap:
- A Rust/WASM port of the hot loop. The JS version is fast enough for web; the WASM version is fast enough to run batch duty-point sweeps in a CAD plugin.
- More reference catalogs. Every curve set I extract from a PDF is a week of alignment work; contributions very welcome if you have digitized manufacturer data sitting on a drive somewhere.
- A minimal HTTP wrapper so the engine can be called from Revit/ArchiCAD plugins without shipping Node.
The full engine — with VFD selection, life-cycle cost calculations, and the acoustic side of the problem — powers fan selection at wentmarket.ru. If you work on HVAC tooling in the West and want to talk about collaboration, licensing, or engineering work, I am reachable at goncharov.artur.02@gmail.com.
Summary
- Store fan curves as polynomial coefficient arrays in Postgres, not sampled lookup tables — it is smaller, smoother, and makes affinity-law RPM scaling a one-liner.
- Evaluate with Horner's method; 10 lines, no Math.pow, stable floating-point.
- Index on (q_min, q_max) and domain-check every query, because polynomial extrapolation silently returns physically wrong numbers.
- Keep pressure and efficiency as independent curves — deriving one from the other breaks recommendations in the 5–15% range.
- An extract of the production engine is open source at github.com/goncharovart/polynomial-fan-matcher; it matches 18k curves in ~4 ms and is the math you will not find inside a 400 MB Windows installer.