benzsevern

Posted on Apr 13 • Originally published at goldenmatch.ai

infermap Now Runs in TypeScript: Schema Mapping on the Edge

#typescript #opensource #npm #dataengineering

You get a CSV from a vendor. The columns are fname, lname, tel, addr1. Your database expects first_name, last_name, phone, street_address. You write a mapping dict, ship it, and move on — until the next vendor sends givenName, surname, mobile, address_line_1.

infermap solves this by inferring column mappings automatically. You hand it a source schema and a target schema, and it returns the best 1:1 field assignment with confidence scores and per-scorer reasoning. No hardcoded synonyms. No manual mapping files. It shipped on PyPI in March 2026 and has been running in production Python pipelines since.

Today it's available on npm as a full TypeScript port — feature parity with the Python version, same algorithm, same accuracy. This post covers what changed, what stayed the same, and why a TypeScript infermap opens doors that Python alone couldn't.

What infermap Does in 30 Seconds

infermap runs a multi-scorer pipeline to match source fields to target fields:

Schema extraction — reads columns from CSVs, JSON, database tables, or in-memory records. Infers dtypes, null rates, cardinality, and samples values.
Common-prefix stripping — if every source column starts with prospect_, it strips the prefix before matching so prospect_city matches city.
Score matrix — for each (source, target) pair, each of the seven scorers either returns a score and a reasoning string or abstains. Contributing scores are combined via a weighted average, and a pair needs at least two contributors.
Optimal assignment — the Hungarian algorithm finds the 1:1 matching that maximizes the total score (equivalently, minimizes cost) across the full M × N matrix.
Confidence filtering — assignments below the threshold (default 0.2) are dropped.
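The prefix-stripping, combination, and filtering steps above can be sketched in a few lines. This is an illustration under assumptions, not infermap's source:

- `stripCommonPrefix`, `combineVotes`, and `dropLowConfidence` are hypothetical names.
- Cutting the shared prefix at its last underscore is a guess at how "prospect_" is detected.
- Scoring a pair 0 when fewer than two scorers contribute is one reading of "minimum 2 contributors".

```typescript
// Hypothetical shape of one scorer's opinion; null in the input means the scorer abstained.
interface ScoreVote {
  score: number; // in [0, 1]
  weight: number; // the scorer's weight, e.g. 1.0 for an exact match
}

// Strip a prefix shared by every column name, cutting at the last "_" inside it
// so that ["prospect_city", "prospect_country"] loses "prospect_", not "prospect_c".
function stripCommonPrefix(names: string[]): string[] {
  if (names.length < 2) return names.slice();
  let prefix = names[0];
  for (const n of names.slice(1)) {
    while (!n.startsWith(prefix)) prefix = prefix.slice(0, -1);
  }
  const cut = prefix.lastIndexOf("_") + 1; // 0 when the shared prefix has no "_"
  // Leave names alone if there is nothing to cut or a name would become empty.
  if (cut === 0 || names.some((n) => n.length <= cut)) return names.slice();
  return names.map((n) => n.slice(cut));
}

// Weighted average over non-abstaining scorers.
// Assumption: fewer than `minContributors` votes yields 0.
function combineVotes(votes: Array<ScoreVote | null>, minContributors = 2): number {
  const active = votes.filter((v): v is ScoreVote => v !== null);
  if (active.length < minContributors) return 0;
  let weighted = 0;
  let totalWeight = 0;
  for (const v of active) {
    weighted += v.score * v.weight;
    totalWeight += v.weight;
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}

// Drop assignments whose confidence falls below the threshold (default 0.2).
function dropLowConfidence<T extends { confidence: number }>(mappings: T[], threshold = 0.2): T[] {
  return mappings.filter((m) => m.confidence >= threshold);
}
```

For example, an exact-name vote of 1.0 (weight 1.0) plus a fuzzy vote of 0.5 (weight 0.4) combine to 1.2 / 1.4 ≈ 0.857. A lone vote is treated as too little evidence.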

The result: a list of mappings, each with a confidence score and explainable reasoning from every scorer that contributed.

The Seven Scorers

Scorer	Weight	What It Does
ExactScorer	1.0	Case-insensitive exact name match
AliasScorer	0.95	Known synonyms — `fname` to `first_name`, extensible via config and domain dictionaries
LLMScorer	0.8	Pluggable LLM-backed scoring (stubbed by default)
InitialismScorer	0.75	Matches abbreviations via dynamic programming — `ASSI` to `assay_id`, `CONSC` to `confidence_score`
PatternTypeScorer	0.7	Semantic type detection from sample values — email, UUID, ISO date, phone, URL, IP, ZIP, currency
ProfileScorer	0.5	Statistical similarity — dtype, null rate, unique rate, value length, cardinality
FuzzyNameScorer	0.4	Jaro-Winkler string similarity on normalized names
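The table says only that InitialismScorer uses dynamic programming. Below is one plausible formulation, sketched under stated assumptions:

- An abbreviation matches a name if it can be split, in order, into non-empty prefixes of the name's tokens, using every token.
- `tokenize` and `isInitialism` are hypothetical names.
- The real scorer presumably returns a graded score and may allow skipped tokens.

```typescript
// Split "assay_id", "confidenceScore", or "confidence-score" into lowercase tokens.
function tokenize(name: string): string[] {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// dp[i][j] is true when the first i letters of the abbreviation
// are spelled by non-empty prefixes of the first j tokens.
function isInitialism(abbr: string, name: string): boolean {
  const a = abbr.toLowerCase();
  const tokens = tokenize(name);
  if (a.length === 0 || tokens.length === 0) return false;
  const dp: boolean[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<boolean>(tokens.length + 1).fill(false),
  );
  dp[0][0] = true;
  for (let j = 0; j < tokens.length; j++) {
    const tok = tokens[j];
    for (let i = 0; i <= a.length; i++) {
      if (!dp[i][j]) continue;
      // Consume k >= 1 leading characters of token j while they match the abbreviation.
      for (let k = 1; k <= tok.length && i + k <= a.length; k++) {
        if (a[i + k - 1] !== tok[k - 1]) break;
        dp[i + k][j + 1] = true;
      }
    }
  }
  return dp[a.length][tokens.length];
}
```

Under this formulation, `ASSI` splits as "ass" + "i" against `assay_id`, and `CONSC` splits as "con" + "sc" against `confidence_score`.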

The combination of name-based, value-based, and statistical scorers means infermap handles cases where any single approach fails. A column named col_7 won't match on name — but if every value is an email address, PatternTypeScorer catches it. A column named patient_identifier won't fuzzy-match mrn — but AliasScorer with the healthcare domain dictionary will.
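To make the `col_7` example concrete, here is a minimal sketch of value-based type detection. It is illustrative, not infermap's implementation:

- The regexes and the 80% agreement threshold are assumptions.
- `detectType` and `patternTypeScore` are hypothetical names.
- Only four of the listed types are covered. Phone, IP, ZIP, and currency are left out.

```typescript
type SemanticType = "email" | "uuid" | "iso_date" | "url";

// Deliberately simple patterns; a production detector would be stricter.
const PATTERNS: Record<SemanticType, RegExp> = {
  email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
  uuid: /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i,
  iso_date: /^\d{4}-\d{2}-\d{2}$/,
  url: /^https?:\/\/\S+$/,
};

// Return the first type matched by at least `minShare` of non-empty samples, else null.
function detectType(samples: string[], minShare = 0.8): SemanticType | null {
  const values = samples.map((s) => s.trim()).filter((s) => s.length > 0);
  if (values.length === 0) return null;
  for (const type of Object.keys(PATTERNS) as SemanticType[]) {
    const hits = values.filter((v) => PATTERNS[type].test(v)).length;
    if (hits / values.length >= minShare) return type;
  }
  return null;
}

// 1 when both sides look like the same semantic type, 0 when they differ,
// null (abstain) when either side is unrecognized.
function patternTypeScore(sourceSamples: string[], targetSamples: string[]): number | null {
  const s = detectType(sourceSamples);
  const t = detectType(targetSamples);
  if (s === null || t === null) return null;
  return s === t ? 1 : 0;
}
```

Abstaining on unrecognized values matters here. A column of free text should not drag down a name-based match, which is why the combination step only averages scorers that actually voted.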

What the TypeScript Port Looks Like

import { map } from "infermap";

const result = map(
  { records: vendorData, sourceName: "vendor" },
  { records: internalSchema, sourceName: "internal" }
);

for (const m of result.mappings) {
  console.log(`${m.source} -> ${m.target}  (${(m.confidence * 100).toFixed(0)}%)`);
}
// Example output:
// fname -> first_name  (98%)
// lname -> last_name  (98%)
// tel -> phone  (91%)
// addr1 -> street_address  (76%)

The API mirrors Python's. If you've used infermap in a Python pipeline, the TypeScript version feels identical — same function names, same config structure, same output shape.

Database support

import { mapFromDb } from "infermap/node";

// Map columns between two Postgres tables
const result = await mapFromDb(
  { uri: "postgresql://localhost/warehouse", table: "raw_imports" },
  { uri: "postgresql://localhost/warehouse", table: "canonical_customers" }
);

SQLite, PostgreSQL, and DuckDB are supported as optional peer dependencies — install only what you need.

Custom scorers

import { defineScorer, makeScorerResult, MapEngine, defaultScorers } from "infermap";

const domainScorer = defineScorer(
  "DomainScorer",
  (source, target) => {
    if (source.name.includes("price") && target.name.includes("amount")) {
      return makeScorerResult(0.85, "price/amount semantic match");
    }
    return null; // abstain
  },
  0.7
);

const engine = new MapEngine({
  scorers: [...defaultScorers(), domainScorer],
});

Config persistence

import { applyConfig } from "infermap";

// Save a mapping for reuse; no re-inference needed
result.saveConfig("vendor_to_internal.json");

// Later: load the saved config and apply it to new data
const renamed = applyConfig(newData, "vendor_to_internal.json");

Same Algorithm, Same Accuracy

The Python and TypeScript versions share a 162-case benchmark corpus — 82 cases from the Valentine schema matching benchmark plus 80 synthetic cases. Both implementations produce results within 0.0005 F1 of each other on the shared corpus.

Metric	Python	TypeScript
Overall F1	0.840	0.840
Valentine corpus (82 cases)	0.794	0.794
ChEMBL subset (25 cases)	0.819	0.819
Calibrated ECE	0.005	0.005
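For readers reproducing numbers like these, F1 on a mapping task is usually computed over predicted (source, target) pairs against a gold mapping. The sketch below assumes that definition for a single case. The post does not say whether infermap's benchmark averages per-case F1 or pools pairs across the corpus, so treat the hypothetical `mappingF1` as illustration only.

```typescript
type Pair = readonly [source: string, target: string];

// F1 of predicted pairs against gold pairs for one benchmark case.
function mappingF1(predicted: Pair[], expected: Pair[]): number {
  const key = (p: Pair) => `${p[0]}\u0000${p[1]}`; // NUL separator avoids name collisions
  const truth = new Set(expected.map(key));
  const truePositives = predicted.filter((p) => truth.has(key(p))).length;
  if (truePositives === 0) return 0;
  const precision = truePositives / predicted.length;
  const recall = truePositives / truth.size;
  return (2 * precision * recall) / (precision + recall);
}
```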

The Hungarian algorithm is implemented directly in TypeScript (O(n³), no scipy dependency), and every scorer was ported line-by-line with matching test cases. 186 TypeScript tests verify parity.
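For a sense of what a dependency-free Hungarian solver involves, below is a compact textbook variant: shortest augmenting paths with row and column potentials. It runs in O(n²·m) for n rows and m ≥ n columns, which is O(n³) on square inputs. It is a sketch, not infermap's implementation, and both function names are hypothetical.

The wrapper `assignMaxScore` does two things:

- It negates scores so the minimizer maximizes total confidence.
- It transposes when there are more sources than targets, so the extra sources stay unassigned.

```typescript
// Minimum-cost assignment for an n x m cost matrix with n <= m.
// Returns assignment[row] = column.
function hungarianMin(cost: number[][]): number[] {
  const n = cost.length;
  const m = n === 0 ? 0 : cost[0].length;
  const INF = Number.POSITIVE_INFINITY;
  const u = new Array<number>(n + 1).fill(0); // row potentials (1-based)
  const v = new Array<number>(m + 1).fill(0); // column potentials (1-based)
  const p = new Array<number>(m + 1).fill(0); // p[j] = row matched to column j, 0 = free
  const way = new Array<number>(m + 1).fill(0); // previous column on the augmenting path
  for (let i = 1; i <= n; i++) {
    p[0] = i;
    let j0 = 0;
    const minv = new Array<number>(m + 1).fill(INF);
    const used = new Array<boolean>(m + 1).fill(false);
    do {
      used[j0] = true;
      const i0 = p[j0];
      let delta = INF;
      let j1 = 0;
      for (let j = 1; j <= m; j++) {
        if (used[j]) continue;
        const cur = cost[i0 - 1][j - 1] - u[i0] - v[j];
        if (cur < minv[j]) { minv[j] = cur; way[j] = j0; }
        if (minv[j] < delta) { delta = minv[j]; j1 = j; }
      }
      // Shift potentials so at least one new column becomes reachable at zero reduced cost.
      for (let j = 0; j <= m; j++) {
        if (used[j]) { u[p[j]] += delta; v[j] -= delta; } else minv[j] -= delta;
      }
      j0 = j1;
    } while (p[j0] !== 0);
    // Flip matched/unmatched edges along the augmenting path.
    do {
      const prev = way[j0];
      p[j0] = p[prev];
      j0 = prev;
    } while (j0 !== 0);
  }
  const assignment = new Array<number>(n).fill(-1);
  for (let j = 1; j <= m; j++) if (p[j] !== 0) assignment[p[j] - 1] = j - 1;
  return assignment;
}

// Maximize total score over a (possibly rectangular) source x target score matrix.
function assignMaxScore(scores: number[][]): Array<[source: number, target: number]> {
  const n = scores.length;
  const m = n === 0 ? 0 : scores[0].length;
  if (n === 0 || m === 0) return [];
  const transpose = n > m; // solver needs rows <= columns
  const base = transpose ? scores[0].map((_, j) => scores.map((row) => row[j])) : scores;
  const cost = base.map((row) => row.map((s) => -s)); // max score == min negated score
  return hungarianMin(cost).map((c, r) => (transpose ? [c, r] : [r, c]) as [number, number]);
}
```

This shows why an optimal solver matters. With scores [[0.9, 0.8], [0.85, 0.1]], greedily taking the best single pair (0.9) forces a total of 1.0. The optimal assignment crosses over instead, for a total of 0.8 + 0.85 = 1.65. Confidence filtering then runs on the assigned pairs.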

Zero Runtime Dependencies in Core

The TypeScript core has zero runtime dependencies. No lodash, no heavy string libraries, no Node.js built-ins. The Jaro-Winkler implementation, Hungarian algorithm, and pattern matchers are all self-contained.
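As an example of what "self-contained" means here, Jaro-Winkler fits in a few dozen lines of plain TypeScript with no imports. The sketch uses the standard textbook definition:

- a match window of ⌊max(len)/2⌋ − 1,
- a prefix scale of 0.1,
- a shared prefix capped at 4 characters.

It is not infermap's source. `normalizeName` is a hypothetical stand-in for whatever normalization FuzzyNameScorer actually applies.

```typescript
// Hypothetical normalization: lowercase and drop separators ("Street_Address" -> "streetaddress").
function normalizeName(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Jaro similarity: characters that match within a window, penalized by transpositions.
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const window = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array<boolean>(a.length).fill(false);
  const bMatched = new Array<boolean>(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - window);
    const hi = Math.min(b.length - 1, i + window);
    for (let j = lo; j <= hi; j++) {
      if (bMatched[j] || a[i] !== b[j]) continue;
      aMatched[i] = true;
      bMatched[j] = true;
      matches++;
      break;
    }
  }
  if (matches === 0) return 0;
  // Walk matched characters in order; each mismatch is half a transposition.
  let k = 0;
  let halfTranspositions = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) halfTranspositions++;
    k++;
  }
  const t = halfTranspositions / 2;
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// Winkler boost: reward a shared prefix of up to 4 characters.
function jaroWinkler(a: string, b: string, p = 0.1): number {
  const j = jaro(a, b);
  let prefix = 0;
  while (prefix < 4 && prefix < a.length && prefix < b.length && a[prefix] === b[prefix]) prefix++;
  return j + prefix * p * (1 - j);
}
```

The classic check is "MARTHA" vs "MARHTA": Jaro ≈ 0.944, and the three-letter shared prefix lifts Jaro-Winkler to ≈ 0.961.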

This matters because it means infermap runs anywhere JavaScript runs:

Next.js Edge Runtime — map schemas in middleware or edge API routes
Cloudflare Workers — schema mapping at the edge, sub-50ms cold starts
Vercel Edge Functions — inline mapping in serverless functions
Browser — map schemas client-side in a data import wizard
Deno / Bun — no Node.js-specific APIs in the critical path

Node.js-specific features (file system access, database providers) are isolated in the infermap/node entrypoint. The core infermap import is edge-safe.

Doors This Opens

A Python-only infermap was useful for batch ETL jobs and data pipelines. A TypeScript infermap changes what's architecturally possible.

Upload-time schema resolution

When a user uploads a CSV to your web app, you can map their columns to your schema before the data ever hits your backend. Run infermap client-side or in an edge function, show the user the proposed mapping with confidence scores, let them confirm or override, then send the already-mapped data to your API.

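// In a Next.js route handler (e.g. app/api/map/route.ts) or another edge function
import { map } from "infermap";
import { TARGET_SCHEMA } from "./schema"; // hypothetical module exporting your canonical target fields

// Opt this Next.js route into the Edge Runtime
export const runtime = "edge";

export async function POST(req: Request) {
  // The client sends a small sample of parsed rows rather than the whole file
  const { sample } = await req.json();

  const result = map(
    { records: sample, sourceName: "upload" },
    { fields: TARGET_SCHEMA, sourceName: "canonical" }
  );

  // Return the proposed mapping so the user can confirm or override it
  return Response.json({
    mappings: result.mappings,
    unmapped: result.unmapped_source,
    confidence: result.mappings.map((m) => m.confidence),
  });
}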

No round-trip to a Python service. No cold-starting a container. The mapping happens at the edge in milliseconds.

Full-stack type safety

With TypeScript, the mapping result is fully typed. Your IDE autocompletes properties such as source, target, and confidence, and the compiler catches typos in code that reads the result. Type checking in CI then flags any code that drifts from the result's shape. Python's infermap returns dicts; TypeScript's returns typed objects that integrate with your existing type system.

Monorepo workflows

If your backend is Node.js or TypeScript, infermap slots into your existing build pipeline. No Python runtime to install in CI. No virtualenv to manage. One npm install infermap and you're done.

Browser-based mapping UIs

The score matrix that infermap computes — the full M x N grid of (source, target, confidence) tuples — is exposed via the API. You can render it as an interactive mapping UI where users see every candidate match, the confidence score, and the per-scorer reasoning. The zero-dependency core means this runs in the browser without bundling a runtime.

Shared config between Python and TypeScript

infermap configs are serialized as JSON (TypeScript) or YAML (Python). A mapping you infer in a Python notebook can be loaded and applied in a TypeScript API route, and vice versa. Teams that use Python for data science and TypeScript for production APIs can share mapping definitions without translation.

What Stayed the Same

Everything that matters:

Seven scorers with the same weights and logic
Hungarian algorithm for optimal 1:1 assignment
Common-prefix canonicalization before matching
Domain dictionaries for healthcare, finance, and ecommerce aliases
Confidence calibration via Isotonic (PAV) and Platt (Nelder-Mead) calibrators
CLI with map, apply, inspect, and validate subcommands
Config persistence — save once, apply forever
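Isotonic calibration learns a monotone map from raw confidence to observed accuracy. The pool-adjacent-violators (PAV) algorithm fits it in one pass over sorted (score, correct) pairs, merging neighbouring blocks whenever their means would decrease.

Below is a minimal sketch under two assumptions: labels are binary (1 = correct mapping), and predictions follow a step function. `fitIsotonic` is a hypothetical name, and infermap's calibrator may interpolate between blocks instead.

```typescript
// A run of consecutive points pooled into one block with a shared calibrated value.
interface Block {
  lo: number; // smallest raw score in the block
  hi: number; // largest raw score in the block
  sum: number; // sum of 0/1 labels
  count: number; // number of points
}

function fitIsotonic(points: Array<[score: number, label: number]>): (x: number) => number {
  const sorted = [...points].sort((a, b) => a[0] - b[0]);
  const blocks: Block[] = [];
  for (const [x, y] of sorted) {
    blocks.push({ lo: x, hi: x, sum: y, count: 1 });
    // Pool adjacent violators: merge while the previous block's mean exceeds the newest one's.
    while (blocks.length > 1) {
      const last = blocks[blocks.length - 1];
      const prev = blocks[blocks.length - 2];
      if (prev.sum / prev.count <= last.sum / last.count) break;
      blocks.splice(blocks.length - 2, 2, {
        lo: prev.lo,
        hi: last.hi,
        sum: prev.sum + last.sum,
        count: prev.count + last.count,
      });
    }
  }
  // Step-function prediction: mean of the first block whose range reaches x,
  // clamped to the last block above the fitted range.
  return (x: number) => {
    if (blocks.length === 0) return x; // no data: leave scores uncalibrated
    for (const b of blocks) if (x <= b.hi) return b.sum / b.count;
    const last = blocks[blocks.length - 1];
    return last.sum / last.count;
  };
}
```

Because the fitted function is non-decreasing, calibration never reorders candidates. It only makes a reported 0.8 mean "right about 80% of the time", which is what a low ECE measures.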

Install and Try It

npm install infermap

Or with database support:

npm install infermap better-sqlite3  # SQLite
npm install infermap pg              # PostgreSQL
npm install infermap @duckdb/node-api # DuckDB

benzsevern / infermap

Inference-driven schema mapping engine for Python and TypeScript. 7 built-in scorers, domain dictionaries (healthcare/finance/ecommerce), confidence calibration, cross-language accuracy benchmark (F1 0.84), and full Python↔TypeScript parity.

infermap

Inference-driven schema mapping engine.
Map messy source columns to a known target schema — accurately, explainably, with zero config.
Built by Ben Severn.


infermap is a schema-mapping engine. Give it any two field collections (CSVs, DataFrames, database tables, in-memory records) and it figures out which source field corresponds to which target field, with confidence scores and human-readable reasoning. Available as a Python package on PyPI and a TypeScript package on npm, with mapping decisions verified bit-for-bit by a shared golden-test parity suite.

Install

Python

pip install infermap

Optional database extras:

pip install "infermap[postgres]"   # psycopg2-binary
pip install "infermap[mysql]"      # mysql-connector-python
pip install "infermap[duckdb]"     # duckdb
pip install "infermap[all]"        # all

…


The Python version is still on PyPI (pip install infermap) and both are maintained in the same monorepo with shared golden tests.

Key Takeaways

infermap's seven-scorer schema mapping engine is now on npm with full feature parity to the Python version
The TypeScript core has zero runtime dependencies — it runs in Edge Functions, Workers, browsers, and Node.js
Both versions score F1 0.840 on the 162-case benchmark corpus (82 Valentine cases plus 80 synthetic cases), verified to within 0.0005 F1 of each other
Upload-time schema resolution, browser mapping UIs, and full-stack type safety are now architecturally possible
Mapping configs are portable between Python and TypeScript — same team, two runtimes, one source of truth

Schema mapping shouldn't require a Python service, a container, and a five-second cold start. Now it doesn't. npm install infermap and map your first schema in under a minute.

DEV Community