You get a CSV from a vendor. The columns are fname, lname, tel, addr1. Your database expects first_name, last_name, phone, street_address. You write a mapping dict, ship it, and move on — until the next vendor sends givenName, surname, mobile, address_line_1.
infermap solves this by inferring column mappings automatically. You hand it a source schema and a target schema, and it returns the best 1:1 field assignment with confidence scores and per-scorer reasoning. You don't hardcode synonyms or maintain manual mapping files. It shipped on PyPI in March 2026 and has been running in production Python pipelines since.
Today it's available on npm as a full TypeScript port — feature parity with the Python version, same algorithm, same accuracy. This post covers what changed, what stayed the same, and why a TypeScript infermap opens doors that Python alone couldn't.
What infermap Does in 30 Seconds
infermap runs a multi-scorer pipeline to match source fields to target fields:
- Schema extraction: reads columns from CSVs, JSON, database tables, or in-memory records. Infers dtypes, null rates, and cardinality, and samples values.
- Common-prefix stripping: if every source column starts with prospect_, it strips the prefix before matching, so prospect_city matches city.
- Score matrix: for each (source, target) pair, seven independent scorers produce a score and a reasoning string. Results are combined via weighted average (minimum 2 contributors).
- Optimal assignment: the Hungarian algorithm finds the minimum-cost 1:1 matching across the full M x N matrix.
- Confidence filtering: assignments below the threshold (default 0.2) are dropped.
The result: a list of mappings, each with a confidence score and explainable reasoning from every scorer that contributed.
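To make the prefix-stripping and score-combination steps concrete, here is a minimal sketch. The type names and the functions `stripCommonPrefix` and `combineScores` are hypothetical, and the sketch assumes that abstaining scorers are left out of both the numerator and the denominator of the weighted average; infermap's actual implementation may differ.

```typescript
// Illustrative types; these are assumptions, not infermap's published API.
type ScorerResult = { score: number; reasoning: string } | null; // null means "abstain"

interface Contribution {
  weight: number;
  result: ScorerResult;
}

// Remove a prefix shared by every name, cut back to the last "_" boundary.
function stripCommonPrefix(names: string[]): string[] {
  if (names.length < 2) return names;
  let prefix = names[0] ?? "";
  for (const n of names) {
    while (!n.startsWith(prefix)) prefix = prefix.slice(0, -1);
  }
  const cut = prefix.lastIndexOf("_") + 1; // 0 when there is no shared "_" boundary
  const stripped = names.map((n) => n.slice(cut));
  // Keep the original names if stripping would leave any name empty.
  return stripped.some((n) => n === "") ? names : stripped;
}

// Weighted average over the scorers that did not abstain.
// Returns null when fewer than minContributors scorers produced a score.
function combineScores(contributions: Contribution[], minContributors = 2): number | null {
  let weighted = 0;
  let totalWeight = 0;
  let count = 0;
  for (const c of contributions) {
    if (c.result === null) continue;
    weighted += c.weight * c.result.score;
    totalWeight += c.weight;
    count += 1;
  }
  if (count < minContributors || totalWeight === 0) return null;
  return weighted / totalWeight;
}
```

Under these assumptions, a pair scored by only one scorer yields no combined score at all, which is one way to read the "minimum 2 contributors" rule.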
The Seven Scorers
| Scorer | Weight | What It Does |
|---|---|---|
| ExactScorer | 1.0 | Case-insensitive exact name match |
| AliasScorer | 0.95 | Known synonyms — fname to first_name, extensible via config and domain dictionaries |
| LLMScorer | 0.8 | Pluggable LLM-backed scoring (stubbed by default) |
| InitialismScorer | 0.75 | Matches abbreviations via dynamic programming: ASSI to assay_id, CONSC to confidence_score |
| PatternTypeScorer | 0.7 | Semantic type detection from sample values — email, UUID, ISO date, phone, URL, IP, ZIP, currency |
| ProfileScorer | 0.5 | Statistical similarity — dtype, null rate, unique rate, value length, cardinality |
| FuzzyNameScorer | 0.4 | Jaro-Winkler string similarity on normalized names |
The combination of name-based, value-based, and statistical scorers means infermap handles cases where any single approach fails. A column named col_7 won't match on name — but if every value is an email address, PatternTypeScorer catches it. A column named patient_identifier won't fuzzy-match mrn — but AliasScorer with the healthcare domain dictionary will.
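The col_7 case above can be illustrated with a small value-based detector. This is a sketch only: the regular expressions, the 90% agreement threshold, and the names `detectType` and `patternScore` are assumptions for illustration, not the rules PatternTypeScorer actually uses.

```typescript
type SemanticType = "email" | "uuid" | "iso_date" | "ipv4";

// Deliberately simple patterns; a real detector would be stricter.
const PATTERNS: Record<SemanticType, RegExp> = {
  email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
  uuid: /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i,
  iso_date: /^\d{4}-\d{2}-\d{2}$/,
  ipv4: /^(\d{1,3}\.){3}\d{1,3}$/,
};

// Return the first type that at least minShare of the non-empty samples match.
function detectType(samples: string[], minShare = 0.9): SemanticType | null {
  const values = samples.map((v) => v.trim()).filter((v) => v !== "");
  if (values.length === 0) return null;
  for (const type of Object.keys(PATTERNS) as SemanticType[]) {
    const hits = values.filter((v) => PATTERNS[type].test(v)).length;
    if (hits / values.length >= minShare) return type;
  }
  return null;
}

// Score 1 when both columns carry the same detected type, 0 when the types
// differ, and null (abstain) when either column has no detectable type.
function patternScore(sourceSamples: string[], targetSamples: string[]): number | null {
  const s = detectType(sourceSamples);
  const t = detectType(targetSamples);
  if (s === null || t === null) return null;
  return s === t ? 1 : 0;
}
```

The column name never enters the calculation, which is why a value-based scorer can rescue a column whose name carries no signal.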
What the TypeScript Port Looks Like
import { map } from "infermap";
const result = map(
  { records: vendorData, sourceName: "vendor" },
  { records: internalSchema, sourceName: "internal" }
);
for (const m of result.mappings) {
  console.log(`${m.source} -> ${m.target} (${(m.confidence * 100).toFixed(0)}%)`);
}
// fname -> first_name (98%)
// lname -> last_name (98%)
// tel -> phone (91%)
// addr1 -> street_address (76%)
The API mirrors Python's. If you've used infermap in a Python pipeline, the TypeScript version feels identical — same function names, same config structure, same output shape.
Database support
import { mapFromDb } from "infermap/node";
// Map columns between two Postgres tables
const result = await mapFromDb(
  { uri: "postgresql://localhost/warehouse", table: "raw_imports" },
  { uri: "postgresql://localhost/warehouse", table: "canonical_customers" }
);
SQLite, PostgreSQL, and DuckDB are supported as optional peer dependencies — install only what you need.
Custom scorers
import { defineScorer, makeScorerResult, MapEngine, defaultScorers } from "infermap";
const domainScorer = defineScorer(
  "DomainScorer",
  (source, target) => {
    if (source.name.includes("price") && target.name.includes("amount")) {
      return makeScorerResult(0.85, "price/amount semantic match");
    }
    return null; // abstain
  },
  0.7
);
const engine = new MapEngine({
  scorers: [...defaultScorers(), domainScorer],
});
Config persistence
// Save a mapping for reuse — no re-inference needed
result.saveConfig("vendor_to_internal.json");
// Later: load and apply
import { applyConfig } from "infermap";
const renamed = applyConfig(newData, "vendor_to_internal.json");
Same Algorithm, Same Accuracy
The Python and TypeScript versions share a 162-case benchmark corpus — 82 cases from the Valentine schema matching benchmark plus 80 synthetic cases. Both implementations produce results within 0.0005 F1 of each other on the shared corpus.
| Metric | Python | TypeScript |
|---|---|---|
| Overall F1 | 0.840 | 0.840 |
| Valentine corpus (82 cases) | 0.794 | 0.794 |
| ChEMBL subset (25 cases) | 0.819 | 0.819 |
| Calibrated ECE | 0.005 | 0.005 |
The Hungarian algorithm implementation is vendored in TypeScript (O(n³) — no scipy dependency), and every scorer was ported line-by-line with matching test cases. 186 TypeScript tests verify parity.
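For readers who have not seen the assignment step written out, here is a compact O(n³) Hungarian implementation in the widely used potentials formulation. It is a sketch, not infermap's vendored code: the function names are hypothetical, and converting scores to costs as 1 - score is an assumption. This version requires at least as many targets as sources.

```typescript
// Minimum-cost assignment for an n x m cost matrix with n <= m.
// Returns, for each row, the index of the column assigned to it.
function hungarian(cost: number[][]): number[] {
  const n = cost.length;
  const m = n > 0 ? cost[0].length : 0;
  if (n > m) throw new Error("this sketch requires rows <= columns");
  const u = new Array<number>(n + 1).fill(0); // row potentials
  const v = new Array<number>(m + 1).fill(0); // column potentials
  const p = new Array<number>(m + 1).fill(0); // p[j] = row matched to column j (1-based, 0 = none)
  const way = new Array<number>(m + 1).fill(0);
  for (let i = 1; i <= n; i++) {
    p[0] = i;
    let j0 = 0;
    const minv = new Array<number>(m + 1).fill(Infinity);
    const used = new Array<boolean>(m + 1).fill(false);
    do {
      used[j0] = true;
      const i0 = p[j0];
      let delta = Infinity;
      let j1 = 0;
      for (let j = 1; j <= m; j++) {
        if (used[j]) continue;
        const cur = cost[i0 - 1][j - 1] - u[i0] - v[j];
        if (cur < minv[j]) { minv[j] = cur; way[j] = j0; }
        if (minv[j] < delta) { delta = minv[j]; j1 = j; }
      }
      for (let j = 0; j <= m; j++) {
        if (used[j]) { u[p[j]] += delta; v[j] -= delta; }
        else minv[j] -= delta;
      }
      j0 = j1;
    } while (p[j0] !== 0);
    // Walk back along the augmenting path and flip the matching.
    do {
      const j1 = way[j0];
      p[j0] = p[j1];
      j0 = j1;
    } while (j0 !== 0);
  }
  const assignment = new Array<number>(n).fill(-1);
  for (let j = 1; j <= m; j++) {
    if (p[j] !== 0) assignment[p[j] - 1] = j - 1;
  }
  return assignment;
}

// Maximize total score by minimizing (1 - score). The conversion is an assumption.
function bestAssignment(scores: number[][]): number[] {
  return hungarian(scores.map((row) => row.map((s) => 1 - s)));
}
```

A greedy pick of the highest score per source can lock in a poor global result; for the matrix [[0.9, 0.8], [0.85, 0.1]] greedy totals 1.0, while the optimal assignment pairs source 0 with target 1 and source 1 with target 0 for a total of 1.65.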
Zero Runtime Dependencies in Core
The TypeScript core has zero runtime dependencies. No lodash, no heavy string libraries, no Node.js built-ins. The Jaro-Winkler implementation, Hungarian algorithm, and pattern matchers are all self-contained.
This matters because it means infermap runs anywhere JavaScript runs:
- Next.js Edge Runtime — map schemas in middleware or edge API routes
- Cloudflare Workers — schema mapping at the edge, sub-50ms cold starts
- Vercel Edge Functions — inline mapping in serverless functions
- Browser — map schemas client-side in a data import wizard
- Deno / Bun — no Node.js-specific APIs in the critical path
Node.js-specific features (file system access, database providers) are isolated in the infermap/node entrypoint. The core infermap import is edge-safe.
Doors This Opens
A Python-only infermap was useful for batch ETL jobs and data pipelines. A TypeScript infermap changes what's architecturally possible.
Upload-time schema resolution
When a user uploads a CSV to your web app, you can map their columns to your schema before the data ever hits your backend. Run infermap client-side or in an edge function, show the user the proposed mapping with confidence scores, let them confirm or override, then send the already-mapped data to your API.
// In a Next.js API route or edge function
import { map } from "infermap";
export async function POST(req: Request) {
  const { headers, sample } = await req.json();
  const result = map(
    { records: sample, sourceName: "upload" },
    { fields: TARGET_SCHEMA, sourceName: "canonical" }
  );
  // Return proposed mapping for user confirmation
  return Response.json({
    mappings: result.mappings,
    unmapped: result.unmapped_source,
    confidence: result.mappings.map((m) => m.confidence),
  });
}
No round-trip to a Python service. No cold-starting a container. The mapping happens at the edge in milliseconds.
Full-stack type safety
With TypeScript, the mapping result is fully typed. Your IDE autocompletes field names and catches typos at compile time, and your CI pipeline can verify that the mapping config matches your schema types. Python's infermap returns dicts; TypeScript's returns typed objects that integrate with your existing type system.
Monorepo workflows
If your backend is Node.js or TypeScript, infermap slots into your existing build pipeline. No Python runtime to install in CI. No virtualenv to manage. One npm install infermap and you're done.
Browser-based mapping UIs
The score matrix that infermap computes — the full M x N grid of (source, target, confidence) tuples — is exposed via the API. You can render it as an interactive mapping UI where users see every candidate match, the confidence score, and the per-scorer reasoning. The zero-dependency core means this runs in the browser without bundling a runtime.
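As a rough idea of how a mapping UI might consume such a matrix, the sketch below turns an M x N grid into a ranked candidate list per source field. The input shape (parallel `sources`, `targets`, and `scores[i][j]` arrays) and the name `rankCandidates` are assumptions for illustration; the shape infermap actually exposes may differ.

```typescript
interface Candidate {
  target: string;
  confidence: number;
}

interface SourceRow {
  source: string;
  candidates: Candidate[];
}

// For each source field, list the topK targets ordered by descending confidence.
function rankCandidates(
  sources: string[],
  targets: string[],
  scores: number[][],
  topK = 3
): SourceRow[] {
  return sources.map((source, i) => {
    const row = scores[i] ?? [];
    const candidates = targets
      .map((target, j) => ({ target, confidence: row[j] ?? 0 }))
      .sort((a, b) => b.confidence - a.confidence)
      .slice(0, topK);
    return { source, candidates };
  });
}
```

Each `SourceRow` maps naturally onto one dropdown in an import wizard, with the first candidate preselected and the rest offered as overrides.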
Shared config between Python and TypeScript
infermap configs are serialized as JSON (TypeScript) or YAML (Python). A mapping you infer in a Python notebook can be loaded and applied in a TypeScript API route, and vice versa. Teams that use Python for data science and TypeScript for production APIs can share mapping definitions without translation.
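Applying a saved mapping is conceptually just a key rename. The sketch below assumes a hypothetical JSON shape, `{ "mappings": [{ "source": ..., "target": ... }] }`, and a hypothetical helper `applyMapping`; the real config schema may contain more fields or use different names.

```typescript
// Hypothetical config shape, for illustration only.
interface MappingConfig {
  mappings: { source: string; target: string }[];
}

type DataRow = Record<string, unknown>;

// Rename mapped keys; keys without a mapping are passed through unchanged.
function applyMapping(rows: DataRow[], config: MappingConfig): DataRow[] {
  const rename = new Map(config.mappings.map((m) => [m.source, m.target] as const));
  return rows.map((row) => {
    const out: DataRow = {};
    for (const [key, value] of Object.entries(row)) {
      out[rename.get(key) ?? key] = value;
    }
    return out;
  });
}

// A config string as it might be read from disk or fetched from a shared store.
const exampleConfigJson =
  '{"mappings":[{"source":"fname","target":"first_name"},{"source":"tel","target":"phone"}]}';
const exampleConfig = JSON.parse(exampleConfigJson) as MappingConfig;
const exampleRenamed = applyMapping([{ fname: "Ada", tel: "555-0100", note: "vip" }], exampleConfig);
```

Passing unmapped keys through unchanged is a design choice made for this sketch; dropping them instead would be equally reasonable when the target schema is strict.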
What Stayed the Same
Everything that matters:
- Seven scorers with the same weights and logic
- Hungarian algorithm for optimal 1:1 assignment
- Common-prefix canonicalization before matching
- Domain dictionaries for healthcare, finance, and ecommerce aliases
- Confidence calibration via Isotonic (PAV) and Platt (Nelder-Mead) calibrators
- CLI with map, apply, inspect, and validate subcommands
- Config persistence: save once, apply forever
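The isotonic (PAV) calibrator in the list above can be sketched in a few lines. This is a generic pool-adjacent-violators fit, not infermap's code: the names `fitIsotonic` and `calibrate` are hypothetical, the output is a simple step function, and tied raw scores get no special handling.

```typescript
interface IsoBlock {
  lo: number;    // smallest raw score pooled into this block
  hi: number;    // largest raw score pooled into this block
  sum: number;   // sum of labels (1 = mapping was correct, 0 = incorrect)
  count: number;
}

// Pool Adjacent Violators: merge neighbouring blocks until block means are non-decreasing.
function fitIsotonic(points: { score: number; label: number }[]): IsoBlock[] {
  const sorted = [...points].sort((a, b) => a.score - b.score);
  const blocks: IsoBlock[] = [];
  for (const pt of sorted) {
    blocks.push({ lo: pt.score, hi: pt.score, sum: pt.label, count: 1 });
    while (blocks.length >= 2) {
      const last = blocks[blocks.length - 1];
      const prev = blocks[blocks.length - 2];
      if (prev.sum / prev.count <= last.sum / last.count) break;
      // Violation: the earlier block has a higher mean, so pool the two.
      blocks.splice(blocks.length - 2, 2, {
        lo: prev.lo,
        hi: last.hi,
        sum: prev.sum + last.sum,
        count: prev.count + last.count,
      });
    }
  }
  return blocks;
}

// Step-function lookup: use the mean of the last block whose lower bound is <= score.
function calibrate(blocks: IsoBlock[], score: number): number {
  if (blocks.length === 0) return score;
  let value = blocks[0].sum / blocks[0].count;
  for (const b of blocks) {
    if (score >= b.lo) value = b.sum / b.count;
  }
  return value;
}
```

The fit only ever merges blocks, so the calibrated output is guaranteed to be non-decreasing in the raw score, which keeps the ranking of candidate mappings intact.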
Install and Try It
npm install infermap
Or with database support:
npm install infermap better-sqlite3 # SQLite
npm install infermap pg # PostgreSQL
npm install infermap @duckdb/node-api # DuckDB
The source lives in the benzsevern/infermap repository (built by Ben Severn). The Python version is still on PyPI (pip install infermap) and both are maintained in the same monorepo with shared golden tests.
Key Takeaways
- infermap's seven-scorer schema mapping engine is now on npm with full feature parity to the Python version
- The TypeScript core has zero runtime dependencies — it runs in Edge Functions, Workers, browsers, and Node.js
- Both versions score F1 0.840 on the shared 162-case benchmark (82 Valentine cases plus 80 synthetic cases), agreeing to within 0.0005 F1
- Upload-time schema resolution, browser mapping UIs, and full-stack type safety are now architecturally possible
- Mapping configs are portable between Python and TypeScript — same team, two runtimes, one source of truth
Schema mapping shouldn't require a Python service, a container, and a five-second cold start. Now it doesn't. npm install infermap and map your first schema in under a minute.