Most CSV import tutorials end the same way:
- read the file
- loop the rows
- `db.add(...)`
- done
That looks fine in a demo.
In production, it is how you wake up to corrupted customer data, duplicate records, broken status values, and a support thread that starts with “we only uploaded a CSV.”
I did not want a CSV uploader.
I wanted a safe ingestion layer.
So I built a FastAPI-based CSV import engine that treats external CSV files as hostile input, forces a preview-first workflow, records an import run, prevents duplicate commit accidents, and connects imported records to follow-up operational tasks.
This article is about building that system properly.
The real problem is not file upload
The hard part of CSV import is not parsing a file.
The hard part is this:
- somebody exports from Excel in a weird encoding
- headers do not match your schema
- one row has extra commas
- another row is missing a value
- status values are `ACTIVE`, `有効`, `new`, `archived`, or something worse
- someone uploads the same file twice
- a “simple import” silently creates junk in production
- nobody can explain what happened afterward
That is not an edge case.
That is the normal shape of business CSV.
So I stopped thinking in terms of “CSV upload” and started thinking in terms of this instead:
How do I create a safe entry point for external business data?
That framing changed everything.
What I built
This project is a Client Ops Workspace built with FastAPI and Next.js.
At the center is a CSV import engine for client records with this flow:
- Sign in
- Download the correct CSV template
- Upload a CSV
- Run Preview
- Inspect:
  - total rows
  - valid rows
  - error rows
  - create candidates
  - update candidates
  - row-level errors
  - normalization suggestions
  - row decisions
- Run Commit
- Store an import run record and audit trail
- Continue into project / task follow-up if needed
This is not “import and pray.”
It is preview first, commit later.
The workspace below shows the operator flow once real data is loaded: preview summary, row-level intent, filters, search, pagination, errors, and normalization feedback.
The core design choice: preview and commit must be separate
This is the part most tutorials skip.
I split CSV ingestion into two phases.
Preview
The system reads the uploaded CSV, validates it, normalizes values, detects malformed rows, checks for duplicates, determines create/update intent, and returns a structured preview.
Nothing is written to the database yet.
Commit
The system accepts an import_run_id from a preview that already happened and applies only the rows that survived validation.
That gives me several things at once:
- safer UX
- clearer operator intent
- auditable import history
- duplicate commit protection
- better error reporting
- a future path for approvals, manual mapping, and role-based operations
Once I committed to that architecture, the rest of the design got much cleaner.
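As a rough sketch of that two-phase flow (framework-free and with assumed names — in the real system these functions sit behind FastAPI routes and the run store is a database table, not a dict):

```python
# Two-phase ingestion sketch: preview writes nothing to the client table,
# commit only applies a previously previewed run, and a repeat commit is refused.
import uuid

RUNS: dict[str, dict] = {}  # import_run_id -> run state (assumed in-memory store)

def preview(filename: str, raw_bytes: bytes) -> dict:
    """Validate, normalize, and classify rows; persist only a preview record."""
    # ... decode, validate, normalize, decide create/update per row ...
    run_id = str(uuid.uuid4())
    RUNS[run_id] = {"status": "previewed", "filename": filename}
    return {"import_run_id": run_id, "status": "previewed"}

def commit(import_run_id: str) -> dict:
    """Apply only the rows that survived the preview; refuse repeat commits."""
    run = RUNS.get(import_run_id)
    if run is None:
        raise KeyError("unknown import run")
    if run["status"] == "committed":
        raise RuntimeError("run already committed")  # duplicate-commit guard
    # ... insert/update the surviving rows here ...
    run["status"] = "committed"
    return {"import_run_id": import_run_id, "status": "committed"}
```

The important property is that `commit` takes an `import_run_id`, never a file: there is no code path that writes client data without a preview having happened first.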
The API shape
For the first version, I focused on clients.
The expected columns are:
`contact_name`, `company_name`, `email`, `phone`, `status`
Allowed statuses:
`lead`, `active`, `inactive`
The preview endpoint returns:
- import run id
- filename
- summary
- errors
- suggestions
- row decisions
- header mapping info
- timestamps
That means the frontend can render a real operator-facing review screen instead of a binary “success / failed” message.
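The response shape can be sketched like this (field names are my illustration of the list above, not the actual schema; the real endpoint returns the equivalent JSON):

```python
# Illustrative preview-response structure using stdlib dataclasses.
from dataclasses import asdict, dataclass, field

@dataclass
class PreviewSummary:
    total_rows: int = 0
    valid_rows: int = 0
    error_rows: int = 0
    create_candidates: int = 0
    update_candidates: int = 0

@dataclass
class PreviewResponse:
    import_run_id: str
    filename: str
    summary: PreviewSummary
    errors: list = field(default_factory=list)         # row-level errors
    suggestions: list = field(default_factory=list)    # normalization hints
    row_decisions: list = field(default_factory=list)  # per-row create/update intent
    header_mapping: dict = field(default_factory=dict)
    created_at: str = ""
    expires_at: str = ""
```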
Why the import run matters
The ImportRun model turned this from a toy into a system.
Each preview stores:
- resource type
- filename
- file hash
- preview status
- total / valid / error row counts
- create / update candidate counts
- header mapping JSON
- mapping confidence JSON
- mapping notes JSON
- preview payload JSON
- actor user id
- created / expires / committed / failed timestamps
That gives me:
1. Workflow-level idempotency
Commit is tied to a specific preview result through import_run_id.
If someone tries to commit the same run twice, the system refuses it.
2. Auditable behavior
I can inspect exactly what the operator previewed, what the system inferred, and when the commit happened.
3. Expiration
Previews are not forever. They have an expiration window.
That matters because preview payloads can be large, and stale previews are dangerous.
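Idempotency and expiration together reduce to one guard on the run record. A minimal sketch, assuming a simplified `ImportRun` with only the fields needed here (the real model carries the full list above):

```python
# Commit guard sketch: a run can be applied only once, and only while fresh.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class ImportRun:
    id: str
    resource_type: str
    filename: str
    file_hash: str
    status: str                 # "previewed" | "committed" | "failed"
    created_at: datetime
    expires_at: datetime
    committed_at: Optional[datetime] = None

    def check_commitable(self, now: datetime) -> None:
        """Raise if this run must not be committed (already applied or stale)."""
        if self.status == "committed":
            raise RuntimeError("import run already committed")
        if now >= self.expires_at:
            raise RuntimeError("preview expired; run preview again")
```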
CSV hell is real, so the parser has to be suspicious
The parser is designed around distrust.
Encoding support
One detail that matters a lot in practice, especially in Japanese business environments, is CSV compatibility beyond clean UTF-8 exports.
That is why the parser also accounts for UTF-8 with BOM, CP932, and Shift-JIS patterns, along with header alias mapping for Japanese business labels.
I added decoding fallback for:
- UTF-8 with BOM
- UTF-8
- CP932
- Shift-JIS
That alone removes a huge class of “works on my machine” CSV problems for Japanese business data.
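One common way to implement that fallback (a sketch, not the exact code) is to try each candidate encoding in order and keep the first one that decodes cleanly:

```python
# Decode fallback: utf-8-sig strips a BOM if present; cp932 is the superset
# of Shift-JIS that Excel on Japanese Windows typically produces.
def decode_csv_bytes(raw: bytes) -> str:
    for encoding in ("utf-8-sig", "utf-8", "cp932", "shift_jis"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode CSV with supported encodings")
```

The ordering matters: UTF-8 variants must come first, because CP932 will happily "decode" many UTF-8 byte sequences into mojibake.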
Header alias mapping
Real users do not always send your perfect schema.
So I added alias mapping for cases like:
- `会社名` → `company_name`
- `担当者名` → `contact_name`
- `連絡先メール` → `email`
- `電話番号` → `phone`
- `状態` → `status`
This gives the engine the ability to meet users where they are, not where my model wishes they were.
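The core of that mapping is just a lookup table plus a normalization pass over the header row (the table below is the subset from this article; the real one also records confidence and notes):

```python
# Header alias mapping sketch: translate raw CSV headers to canonical
# field names, leaving unknown headers untouched for later review.
HEADER_ALIASES = {
    "会社名": "company_name",
    "担当者名": "contact_name",
    "連絡先メール": "email",
    "電話番号": "phone",
    "状態": "status",
}

def map_headers(headers: list) -> dict:
    """Return {raw_header: canonical_name} for one CSV header row."""
    return {h: HEADER_ALIASES.get(h.strip(), h.strip()) for h in headers}
```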
Malformed row detection
I use csv.DictReader with restkey and row-level parse checks to catch:
- rows with more columns than the header
- rows with fewer columns than the header
- likely comma mismatch
- likely broken quoting
That means malformed rows become explicit review items instead of silent garbage.
Broken rows are surfaced before commit instead of silently poisoning production data.
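The `restkey` trick is worth showing concretely. A minimal sketch of the row scan (the real checks are richer, but the mechanism is the same):

```python
# Suspicious row scan: csv.DictReader puts overflow columns under restkey,
# and short rows fill missing fields with restval (None here).
import csv
import io

def scan_rows(text: str):
    reader = csv.DictReader(io.StringIO(text), restkey="_extra", restval=None)
    for lineno, row in enumerate(reader, start=2):  # line 1 is the header
        if "_extra" in row:
            yield lineno, "error", "extra columns (comma or quoting issue?)"
        elif None in row.values():
            yield lineno, "error", "missing columns"
        else:
            yield lineno, "ok", ""
```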
Validation is not enough. Normalization matters too.
The engine does not only reject bad data. It also tries to make valid intent usable.
Status normalization
Examples:
- `ACTIVE` → `active`
- `new` → `lead`
- `有効` → `active`
- `見込み` → `lead`
- `archived` → `inactive`
Phone normalization
Examples:
- `+1-415-555-0182` → `+14155550182`
- `090-9999-8888` → `09099998888`
Contact name fallback
If contact_name is empty but company_name exists, the preview can suggest filling it from the company.
But this is important:
Suggestions are not the same as blind auto-correction.
The operator gets a readable preview of what the system inferred.
That is a very different philosophy from silently mutating everything.
Real-world CSV means alias mapping, not idealized schemas
If a user sends company_name, that is easy.
Real files are rarely that polite.
Business CSV files tend to arrive with mixed conventions, Japanese labels, and whatever the exporting system decided to call a field that week.
That is why alias mapping is part of the engine, not an afterthought.
The goal is not to force every operator to manually reshape a spreadsheet before import.
The goal is to let the system absorb common real-world variation safely.
Row decisions are the real product
The most useful part of the preview is not the summary card.
It is the row decision list.
For each valid row, the system tells the operator:
- which row it is
- whether it will create or update
- the normalized values
- the reason
For example:
- `create` because the email does not exist in the database
- `update` because the email already exists in the database
In the UI, operators do not see a generic “import successful” message.
They see row-level intent.
That sounds simple, but it changes the operator’s experience from this:
“I uploaded a file and hoped for the best.”
to this:
“I can see exactly what this system is about to do.”
That is the difference between a utility and a trustworthy business tool.
The frontend is not decoration. It is part of the safety model.
I built a dedicated Next.js page for the workflow.
The page includes:
- sign in
- auth status
- token show / hide / copy
- template download
- CSV upload
- preview
- commit
- run status
- preview snapshot
- row decisions
- errors
- suggestions
- task follow-up
The frontend evolved a lot during implementation because I stopped treating it as a debug console and started treating it like an operator workspace.
Key choices that mattered:
Preview snapshot stays compact
The top section is intentionally simple.
Analysis panels get wider
Once the CSV is analyzed, the page expands because row-level review needs horizontal space.
Color meaning stays disciplined
I avoided noisy UI coloring.
- analyzed business data uses one calm accent tone
- errors are red
- summaries stay readable
- the interface does not scream unless something is actually wrong
Search, filtering, and pagination
For row decisions, I added:
- All / Create / Update filters
- search
- rows-per-page controls
- pagination
This became necessary once I tested with 500+ and 1000+ row files.
If your “safe CSV import” falls apart when the dataset gets bigger, it is not safe.
It is theatrical.
Import does not end at commit
This is the part I care about most from a business perspective.
Most CSV demos stop at “data inserted successfully.”
Real work begins after that.
So I connected the import workflow to downstream operational actions:
- match imported rows to existing clients
- surface match quality
- load related projects
- load related tasks
- create a project
- create a task
- change task status directly in the same workspace
This is where the engine stops being a parser and starts becoming operational infrastructure.
It turns the CSV engine from a dead-end admin tool into a real operational bridge.
Why this matters
Businesses do not buy “CSV parsing.”
They buy shorter time from:
spreadsheet arrives
to
the right work starts
That is a much more valuable product story.
Match quality matters more than people admit
I added two match types in the follow-up UI:
- Matched by email
- Matched by company
And I do not treat them equally.
A company-name match is shown as a provisional link with an explicit warning.
That is not a cosmetic detail.
It prevents operators from over-trusting a weak match and creating follow-up work against the wrong record.
This is the kind of boring detail that makes systems usable in the real world.
Activity logging is part of the contract
CSV import without auditability is reckless.
I record activity log events such as:
- `csv_previewed`
- `csv_import_committed`
- `client_created_from_csv`
- `client_updated_from_csv`
This gives the system an operational memory.
When a team asks, “Who imported this?” or “Why did this client change?” the answer should not be “I think someone uploaded a spreadsheet last week.”
It should be queryable.
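The shape of such an event log can be as simple as append-only records with an actor, an action, and a target (a sketch with assumed field names; the real log lives in the database):

```python
# Activity log sketch: every import-related action becomes a queryable
# event instead of tribal memory.
from datetime import datetime, timezone

ACTIVITY_LOG: list = []

def log_event(actor_id: str, action: str, target: str, **details) -> dict:
    event = {
        "at": datetime.now(timezone.utc).isoformat(),
        "actor_id": actor_id,
        "action": action,   # e.g. "csv_import_committed"
        "target": target,   # e.g. an import_run_id or a client id
        "details": details,
    }
    ACTIVITY_LOG.append(event)
    return event
```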
What I tested
I did not stop at one happy-path CSV.
I tested with multiple categories of files.
Core cases
- UTF-8 normal CSV
- CP932 Japanese CSV
- normalization-focused CSV
Broken cases
- extra columns
- missing columns
- malformed rows
Scale cases
- 500-row mixed CSV
- 1000-row CSV
- update-like datasets
The result was important:
- large previews stayed usable
- broken rows surfaced as errors instead of poisoning the preview
- malformed extra-column rows stopped producing misleading suggestions
- create/update behavior stayed understandable under load
That last point matters.
A system like this does not earn trust because it works once.
It earns trust because it behaves clearly when the input is ugly.
What makes this commercially interesting
I do not think the interesting part is “I built a FastAPI app.”
The interesting part is this:
I built a reusable ingestion layer for messy business CSV.
That can be used in:
- client imports
- CRM migration workflows
- lead intake pipelines
- operations dashboards
- internal admin tools
- SaaS backends that need safe CSV entry points
And it is much more sellable than a generic CRUD sample because it solves a pain that companies already have.
Companies do not wake up asking for “a cool FastAPI demo.”
They wake up asking why their imported data is broken.
What I would add next
This version already does a lot, but it also leaves room for stronger commercial versions.
1. Manual header remapping UI
The schema already leaves space for a future manual override flow.
2. AI-assisted column inference
If a user uploads headers like Contact Info or Primary Reach, the system could propose likely mappings and record the reason.
3. Stronger import policies
Per-client rules, per-column exceptions, protected fields, partial-safe commit policies, and custom merge logic.
4. Background cleanup / retention policies
Especially important once preview payloads start accumulating in production.
That is where this stops being a project and becomes a product line.
Why I built it this way
Because I am tired of tutorials pretending CSV import is trivial.
It is not.
The “read file and insert rows” version is easy to write and expensive to own.
I wanted something closer to the real standard teams need:
- suspicious parser
- explicit validation
- normalization with human-readable suggestions
- preview-first review
- idempotent commit flow
- audit trail
- downstream operational handoff
That is a better way to treat imported business data.
Final thought
Most CSV import examples are upload features.
I wanted to build something else.
I wanted to build a safe intake layer for operational data.
Something that assumes the CSV is messy.
Something that shows its reasoning.
Something that can say “not yet” before it says “done.”
Something that keeps humans in control without making them do everything by hand.
Because in production, the real feature is not import.
The real feature is not corrupting the business while you import.
Final stress test: 10,000 rows with 800 broken cases
As a final stress test, I ran this workflow against a 10,000-row CSV that included 800 intentionally broken cases.
Those broken rows included invalid emails, missing required fields, duplicate values, invalid status values, extra-column rows, and missing-column rows.
The result:
- 10,000 total rows
- 9,200 valid rows
- 800 error rows
On the lowest-spec machine I had available for testing, the preview completed in about 20 seconds and still returned a readable decision table, validation output, and suggestions.

I would not treat this as a raw CSV parsing benchmark.
A 10,000-row import can look “fast” or “slow” depending on what the system is actually doing. In this case, the workflow was not just reading rows. It was validating fields, classifying row decisions, isolating broken records, generating suggestions, and returning a reviewable preview for operations.
So the result that mattered to me was not just the elapsed time. It was the fact that the workflow remained usable at 10,000 rows with 800 broken cases.
Interested in a private implementation?
This project is not published as a full public repository.
If your team needs a similar workflow for internal operations, CSV validation, safe commit flows, or post-import task handling, I’m available for private implementation discussions based on your use case and scope.
Email: fastapienne@gmail.com