<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arshav</title>
    <description>The latest articles on DEV Community by Arshav (@r_j_multischema).</description>
    <link>https://dev.to/r_j_multischema</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3776091%2F154179dd-2bf9-4d5e-a357-ee532cc92390.png</url>
      <title>DEV Community: Arshav</title>
      <link>https://dev.to/r_j_multischema</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/r_j_multischema"/>
    <language>en</language>
    <item>
      <title>How We Built a Deterministic File Import Pipeline in TypeScript (CSV, XLSX, ZIP)</title>
      <dc:creator>Arshav</dc:creator>
      <pubDate>Mon, 16 Feb 2026 16:34:07 +0000</pubDate>
      <link>https://dev.to/r_j_multischema/how-we-built-a-deterministic-file-import-pipeline-in-typescript-csv-xlsx-zip-23pe</link>
      <guid>https://dev.to/r_j_multischema/how-we-built-a-deterministic-file-import-pipeline-in-typescript-csv-xlsx-zip-23pe</guid>
      <description>&lt;h1&gt;
  
  
  How We Built a Deterministic File Import Pipeline in TypeScript (CSV, XLSX, ZIP)
&lt;/h1&gt;

&lt;p&gt;Most file importers look good in a demo.&lt;/p&gt;

&lt;p&gt;Production is different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;headers are inconsistent&lt;/li&gt;
&lt;li&gt;users upload many files at once&lt;/li&gt;
&lt;li&gt;ZIP files include random extra files&lt;/li&gt;
&lt;li&gt;retries create duplicates&lt;/li&gt;
&lt;li&gt;support gets flooded with “why did this fail?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While building &lt;a href="https://www.multischema.com" rel="noopener noreferrer"&gt;multischema.com&lt;/a&gt;, we made one rule non-negotiable:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Same input + same schema version = same output. Every time.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1) Determinism first
&lt;/h2&gt;

&lt;p&gt;Our import flow behaves like a pure function:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;normalize + sort files in a stable order&lt;/li&gt;
&lt;li&gt;build a deterministic &lt;code&gt;runKey&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;reuse previous result for retries instead of reprocessing&lt;/li&gt;
&lt;/ol&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
const filesInOrder = files
  .map((f) =&amp;gt; ({ path: f.path, size: f.size, sha256: f.sha256 }))
  .sort((a, b) =&amp;gt; a.path.localeCompare(b.path));

const runKey = sha256(
  filesInOrder.map((f) =&amp;gt; `${f.path}:${f.size}:${f.sha256}`).join("|") +
    `|schema:${schemaVersion}`
);

const existing = await getImportRunByKey(runKey);
if (existing) return existing.result;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This single change removed most duplicate-import chaos.&lt;/p&gt;
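&lt;p&gt;To make the snippet above self-contained: a minimal, runnable sketch of the same idea, with an in-memory map standing in for &lt;code&gt;getImportRunByKey&lt;/code&gt; and an assumed file shape (the real pipeline's types are not shown in the post):&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

// Assumed shape of an uploaded file's metadata.
interface ImportFile { path: string; size: number; sha256: string; }

const sha256 = (input: string): string =>
  createHash("sha256").update(input).digest("hex");

// Same file set + same schema version -> same runKey, regardless of upload order.
function buildRunKey(files: ImportFile[], schemaVersion: number): string {
  const filesInOrder = [...files].sort((a, b) => a.path.localeCompare(b.path));
  return sha256(
    filesInOrder.map((f) => `${f.path}:${f.size}:${f.sha256}`).join("|") +
      `|schema:${schemaVersion}`
  );
}

// Hypothetical in-memory stand-in for the persisted import-run lookup.
const runCache = new Map();

function runImport(files: ImportFile[], schemaVersion: number) {
  const key = buildRunKey(files, schemaVersion);
  const existing = runCache.get(key);
  if (existing) return existing; // retry: reuse previous result instead of reprocessing
  const result = { key, fileCount: files.length };
  runCache.set(key, result);
  return result;
}
```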

&lt;h2&gt;
  2) ZIP support without UX pain
&lt;/h2&gt;

&lt;p&gt;ZIP uploads are common, but they often contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hidden system files&lt;/li&gt;
&lt;li&gt;screenshots/PDFs&lt;/li&gt;
&lt;li&gt;unrelated exports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We don’t fail the whole run. We process the valid files and return a skipped-file summary with reasons.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const supported = new Set([".csv", ".xlsx", ".xls"]);

for (const entry of zipEntries) {
  const ext = extname(entry.name).toLowerCase();

  if (!supported.has(ext)) {
    skipped.push({ file: entry.name, reason: "Unsupported file type" });
    continue;
  }

  accepted.push(entry);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;p&gt;User-facing result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Imported: 4 files&lt;/li&gt;
&lt;li&gt;Skipped: 3 files&lt;/li&gt;
&lt;li&gt;Reasons: unsupported type, empty file, invalid format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Clear summary = fewer support tickets.&lt;/p&gt;
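&lt;p&gt;A runnable sketch of that accept/skip partition; the &lt;code&gt;ZipEntry&lt;/code&gt; shape and the extra hidden-file and empty-file checks are assumptions for illustration, not the exact production code:&lt;/p&gt;

```typescript
import { extname } from "node:path";

// Assumed entry shape; real entries come from whatever unzip library is used.
interface ZipEntry { name: string; size: number; }
interface Skipped { file: string; reason: string; }

const supported = new Set([".csv", ".xlsx", ".xls"]);

// Partition entries instead of failing the whole run; every skip carries a reason.
function partitionEntries(entries: ZipEntry[]) {
  const accepted: ZipEntry[] = [];
  const skipped: Skipped[] = [];
  for (const entry of entries) {
    const ext = extname(entry.name).toLowerCase();
    if (entry.name.startsWith("__MACOSX/") || entry.name.startsWith(".")) {
      skipped.push({ file: entry.name, reason: "Hidden system file" });
    } else if (!supported.has(ext)) {
      skipped.push({ file: entry.name, reason: "Unsupported file type" });
    } else if (entry.size === 0) {
      skipped.push({ file: entry.name, reason: "Empty file" });
    } else {
      accepted.push(entry);
    }
  }
  return { accepted, skipped };
}
```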

&lt;h2&gt;
  3) Schema mapping is a real pipeline step
&lt;/h2&gt;

&lt;p&gt;Most failures happen before business logic, during column mapping.&lt;/p&gt;

&lt;p&gt;We treat mapping as a first-class stage:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;parse file&lt;/li&gt;
&lt;li&gt;normalize headers&lt;/li&gt;
&lt;li&gt;map to schema fields&lt;/li&gt;
&lt;li&gt;validate rows&lt;/li&gt;
&lt;li&gt;upsert records&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each error is structured: &lt;code&gt;code&lt;/code&gt;, &lt;code&gt;row&lt;/code&gt;, &lt;code&gt;column&lt;/code&gt;, &lt;code&gt;message&lt;/code&gt;. Users can fix their data quickly instead of guessing.&lt;/p&gt;
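&lt;p&gt;A minimal sketch of such a mapping stage; the &lt;code&gt;normalizeHeader&lt;/code&gt; rules and the schema shape are illustrative assumptions, but the errors carry the structured fields named above:&lt;/p&gt;

```typescript
// Structured error, as described in the post: code, row, column, message.
interface RowError { code: string; row: number; column: string; message: string; }

// Assumed normalization: "Amount (USD)" -> "amount_usd", " Invoice # " -> "invoice", etc.
function normalizeHeader(h: string): string {
  return h.trim().toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
}

// schema maps a normalized header to a canonical field name (assumed shape).
function mapRows(
  headers: string[],
  rows: string[][],
  schema: { [header: string]: string }
) {
  const normalized = headers.map(normalizeHeader);
  const mapped: { [field: string]: string }[] = [];
  const errors: RowError[] = [];

  rows.forEach((cells, i) => {
    const record: { [field: string]: string } = {};
    normalized.forEach((header, col) => {
      const field = schema[header];
      if (!field) return; // unmapped columns are ignored in this sketch
      const value = (cells[col] ?? "").trim();
      if (value === "") {
        errors.push({
          code: "EMPTY_VALUE",
          row: i + 1,
          column: headers[col],
          message: `Missing value for field "${field}"`,
        });
        return;
      }
      record[field] = value;
    });
    mapped.push(record);
  });

  return { mapped, errors };
}
```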

&lt;h2&gt;
  4) Idempotent writes (not just idempotent processing)
&lt;/h2&gt;

&lt;p&gt;Deterministic processing still fails if writes duplicate rows.&lt;/p&gt;

&lt;p&gt;Use stable keys for upserts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;invoice_number&lt;/code&gt; + &lt;code&gt;vendor_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;external_id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;deterministic record hash (only if no natural key exists)&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INSERT INTO invoices (vendor_id, invoice_number, amount, due_date)
VALUES (?, ?, ?, ?)
ON CONFLICT (vendor_id, invoice_number)
DO UPDATE SET
  amount = excluded.amount,
  due_date = excluded.due_date,
  updated_at = CURRENT_TIMESTAMP;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;No stable key = no reliable import.&lt;/p&gt;
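&lt;p&gt;For the fallback case, a deterministic record hash only works if the serialization is canonical: the same fields must hash the same regardless of key order. A sketch, assuming flat string/number records:&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

// Fallback upsert key when no natural key exists: hash a canonical
// serialization of the record. Sorting the keys makes the hash
// independent of object property order.
function recordHash(record: { [key: string]: string | number }): string {
  const canonical = Object.keys(record)
    .sort()
    .map((k) => `${k}=${String(record[k])}`)
    .join("|");
  return createHash("sha256").update(canonical).digest("hex");
}
```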

&lt;h2&gt;
  5) UX quality matters as much as parser quality
&lt;/h2&gt;

&lt;p&gt;Reliability is invisible if the UX is vague.&lt;/p&gt;

&lt;p&gt;For every run we show:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;accepted files&lt;/li&gt;
&lt;li&gt;skipped files + reasons&lt;/li&gt;
&lt;li&gt;row-level validation errors&lt;/li&gt;
&lt;li&gt;inserted / updated / failed counts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Developers care about correctness. Operators care about clarity. You need both.&lt;/p&gt;
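&lt;p&gt;One way to sketch that per-run report as a type plus a formatter; the &lt;code&gt;RunSummary&lt;/code&gt; shape is an assumption that mirrors the lists above, not the product's actual API:&lt;/p&gt;

```typescript
// Hypothetical run-summary shape covering the counts listed above.
interface RunSummary {
  acceptedFiles: string[];
  skippedFiles: { file: string; reason: string }[];
  rowErrors: { code: string; row: number; column: string; message: string }[];
  inserted: number;
  updated: number;
  failed: number;
}

// Render the operator-facing summary as plain text lines.
function formatSummary(s: RunSummary): string {
  const lines = [
    `Imported: ${s.acceptedFiles.length} files`,
    `Skipped: ${s.skippedFiles.length} files`,
  ];
  if (s.skippedFiles.length > 0) {
    const reasons = [...new Set(s.skippedFiles.map((f) => f.reason))].join(", ");
    lines.push(`Reasons: ${reasons}`);
  }
  lines.push(`Rows: ${s.inserted} inserted, ${s.updated} updated, ${s.failed} failed`);
  return lines.join("\n");
}
```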

&lt;h2&gt;
  Final takeaway
&lt;/h2&gt;

&lt;p&gt;A production-grade importer is not “parse CSV and hope.”&lt;/p&gt;

&lt;p&gt;It should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deterministic&lt;/li&gt;
&lt;li&gt;idempotent&lt;/li&gt;
&lt;li&gt;schema-aware&lt;/li&gt;
&lt;li&gt;explicit about skipped/failed inputs&lt;/li&gt;
&lt;li&gt;predictable across retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re building import flows, start with determinism. Everything else gets easier after that.&lt;/p&gt;

&lt;p&gt;I’m happy to share a follow-up on queue architecture (upload API, worker pipeline, retries, and backpressure) if there’s interest.&lt;/p&gt;

</description>
      <category>backend</category>
      <category>dataengineering</category>
      <category>saas</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
