Your CSV Import Is Fine... Until Real Data Arrives

Most CSV import tutorials end the same way:

  • read the file
  • loop the rows
  • db.add(...)
  • done

That looks fine in a demo.

In production, it is how you wake up to corrupted customer data, duplicate records, broken status values, and a support thread that starts with “we only uploaded a CSV.”

I did not want a CSV uploader.

I wanted a safe ingestion layer.

So I built a FastAPI-based CSV import engine that treats external CSV files as hostile input, forces a preview-first workflow, records an import run, prevents duplicate commit accidents, and connects imported records to follow-up operational tasks.

This article is about building that system properly.

This is the operator workspace used to authenticate and start a safe CSV import.


The real problem is not file upload

The hard part of CSV import is not parsing a file.

The hard part is this:

  • somebody exports from Excel in a weird encoding
  • headers do not match your schema
  • one row has extra commas
  • another row is missing a value
  • status values are ACTIVE, 有効, new, archived, or something worse
  • someone uploads the same file twice
  • a “simple import” silently creates junk in production
  • nobody can explain what happened afterward

That is not an edge case.

That is the normal shape of business CSV.

So I stopped thinking in terms of “CSV upload” and started thinking in terms of this instead:

How do I create a safe entry point for external business data?

That framing changed everything.


What I built

This project is a Client Ops Workspace built with FastAPI and Next.js.

At the center is a CSV import engine for client records with this flow:

  1. Sign in
  2. Download the correct CSV template
  3. Upload a CSV
  4. Run Preview
  5. Inspect:
    • total rows
    • valid rows
    • error rows
    • create candidates
    • update candidates
    • row-level errors
    • normalization suggestions
    • row decisions
  6. Run Commit
  7. Store an import run record and audit trail
  8. Continue into project / task follow-up if needed

This is not “import and pray.”

It is preview first, commit later.

The workspace below shows the operator flow once real data is loaded: preview summary, row-level intent, filters, search, pagination, errors, and normalization feedback.

The preview step separates valid rows, errors, and import decisions before anything touches the database.


The core design choice: preview and commit must be separate

This is the part most tutorials skip.

I split CSV ingestion into two phases.

Preview

The system reads the uploaded CSV, validates it, normalizes values, detects malformed rows, checks for duplicates, determines create/update intent, and returns a structured preview.

Nothing is written to the database yet.

Commit

The system accepts an import_run_id from a preview that already happened and applies only the rows that survived validation.

That gives me several things at once:

  • safer UX
  • clearer operator intent
  • auditable import history
  • duplicate commit protection
  • better error reporting
  • a future path for approvals, manual mapping, and role-based operations

Once I committed to that architecture, the rest of the design got much cleaner.
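The two-phase flow above can be sketched as plain functions (the real system wires these into FastAPI endpoints; the function names, the in-memory run store, and the email-based validity check here are all illustrative assumptions, not the actual implementation):

```python
# Sketch of the preview/commit split. RUNS stands in for the ImportRun
# table; validation is reduced to "has an email" for brevity.
import uuid

RUNS: dict[str, dict] = {}

def preview(rows: list[dict]) -> dict:
    """Validate rows and store a preview. Nothing touches the main DB."""
    valid = [r for r in rows if r.get("email")]
    errors = [r for r in rows if not r.get("email")]
    run_id = str(uuid.uuid4())
    RUNS[run_id] = {"status": "previewed", "valid": valid}
    return {
        "import_run_id": run_id,
        "summary": {"total": len(rows), "valid": len(valid), "errors": len(errors)},
    }

def commit(run_id: str) -> dict:
    """Apply only the rows that survived a specific preview."""
    run = RUNS.get(run_id)
    if run is None:
        raise ValueError("unknown import run")
    if run["status"] == "committed":
        raise ValueError("already committed")  # duplicate-commit guard
    # ... write run["valid"] to the database here ...
    run["status"] = "committed"
    return {"committed": len(run["valid"])}
```

The key property: commit takes an `import_run_id`, never a file, so it can only apply something an operator already reviewed.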


The API shape

For the first version, I focused on clients.

The expected columns are:

  • contact_name
  • company_name
  • email
  • phone
  • status

Allowed statuses:

  • lead
  • active
  • inactive

The preview endpoint returns:

  • import run id
  • filename
  • summary
  • errors
  • suggestions
  • row decisions
  • header mapping info
  • timestamps

That means the frontend can render a real operator-facing review screen instead of a binary “success / failed” message.


Why the import run matters

The ImportRun model turned this from a toy into a system.

Each preview stores:

  • resource type
  • filename
  • file hash
  • preview status
  • total / valid / error row counts
  • create / update candidate counts
  • header mapping JSON
  • mapping confidence JSON
  • mapping notes JSON
  • preview payload JSON
  • actor user id
  • created / expires / committed / failed timestamps

That gives me:

1. Workflow-level idempotency

Commit is tied to a specific preview result through import_run_id.

If someone tries to commit the same run twice, the system refuses it.

2. Auditable behavior

I can inspect exactly what the operator previewed, what the system inferred, and when the commit happened.

3. Expiration

Previews are not forever. They have an expiration window.

That matters because preview payloads can be large, and stale previews are dangerous.
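The guards an ImportRun enables can be sketched like this (field names follow the list above; the dict-based run record and the exact checks are assumptions about shape, not the actual model):

```python
# Sketch of the duplicate-commit and expiry guards backed by ImportRun.
import hashlib
from datetime import datetime, timedelta, timezone

def file_hash(raw: bytes) -> str:
    """Content hash stored on the run, usable to flag re-uploads of the same file."""
    return hashlib.sha256(raw).hexdigest()

def can_commit(run: dict, now: datetime) -> tuple[bool, str]:
    """Refuse commits against already-committed or stale previews."""
    if run["status"] == "committed":
        return False, "already committed"
    if now > run["expires_at"]:
        return False, "preview expired"
    return True, "ok"
```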


CSV hell is real, so the parser has to be suspicious

The parser is designed around distrust.

Encoding support

One detail that matters a lot in practice, especially in Japanese business environments, is CSV compatibility beyond clean UTF-8 exports.

That is why the parser also accounts for UTF-8 with BOM, CP932, and Shift-JIS patterns, along with header alias mapping for Japanese business labels.

I added decoding fallback for:

  • UTF-8 with BOM
  • UTF-8
  • CP932
  • Shift-JIS

That alone removes a huge class of “works on my machine” CSV problems for Japanese business data.
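A decoding fallback in that spirit is small (the ordering below mirrors the list above; whether the real engine tries exactly these codecs in exactly this order is an assumption):

```python
# Try encodings from most to least specific; utf-8-sig strips a BOM if present.
def decode_csv_bytes(raw: bytes) -> str:
    for enc in ("utf-8-sig", "utf-8", "cp932", "shift_jis"):
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("unsupported CSV encoding")
```

Note that CP932 is Python's superset of Shift-JIS, so placing it first catches most Excel-exported Japanese CSVs.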

Header alias mapping

Real users do not always send your perfect schema.

So I added alias mapping for cases like:

  • 会社名 → company_name
  • 担当者名 → contact_name
  • 連絡先メール → email
  • 電話番号 → phone
  • 状態 → status

This gives the engine the ability to meet users where they are, not where my model wishes they were.
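A minimal version of that mapping (the alias table here contains only the pairs listed above; the real table would be larger and could carry per-column confidence):

```python
# Map incoming headers to canonical column names where an alias is known;
# unknown headers pass through unchanged so they can be flagged later.
HEADER_ALIASES = {
    "会社名": "company_name",
    "担当者名": "contact_name",
    "連絡先メール": "email",
    "電話番号": "phone",
    "状態": "status",
}

def map_headers(headers: list[str]) -> dict[str, str]:
    return {h: HEADER_ALIASES.get(h.strip(), h.strip()) for h in headers}
```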

Malformed row detection

I use csv.DictReader with restkey and row-level parse checks to catch:

  • rows with more columns than the header
  • rows with fewer columns than the header
  • likely comma mismatch
  • likely broken quoting

That means malformed rows become explicit review items instead of silent garbage.

Broken rows are surfaced before commit instead of silently poisoning production data.

Invalid rows remain visible with human-readable validation output instead of silently failing.
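Here is roughly how `csv.DictReader` with `restkey` surfaces both cases (the error wording and row-numbering convention are my assumptions; the column list is the article's schema):

```python
# Extra fields land under restkey; missing fields get restval (None).
import csv
import io

EXPECTED = ["contact_name", "company_name", "email", "phone", "status"]

def detect_malformed(text: str) -> list[dict]:
    reader = csv.DictReader(io.StringIO(text), restkey="_extra", restval=None)
    findings = []
    for lineno, row in enumerate(reader, start=2):  # line 1 is the header
        if "_extra" in row:
            findings.append({"row": lineno, "error": "extra columns (likely comma mismatch)"})
        elif None in row.values():
            findings.append({"row": lineno, "error": "missing columns"})
    return findings
```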


Validation is not enough. Normalization matters too.

The engine does not only reject bad data. It also tries to make valid intent usable.

Status normalization

Examples:

  • ACTIVE → active
  • new → lead
  • 有効 → active
  • 見込み → lead
  • archived → inactive

Phone normalization

Examples:

  • +1-415-555-0182 → +14155550182
  • 090-9999-8888 → 09099998888

Contact name fallback

If contact_name is empty but company_name exists, the preview can suggest filling it from the company.

But this is important:

Suggestions are not the same as blind auto-correction.

The operator gets a readable preview of what the system inferred.

That is a very different philosophy from silently mutating everything.
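The three normalizers above can be sketched like this (the alias table contains only the examples listed; the digit-stripping rule and the suggestion helper are illustrative assumptions):

```python
# Normalization helpers: map aliases to canonical statuses, strip phone
# separators, and suggest (not apply) a contact-name fallback.
import re

STATUS_ALIASES = {
    "lead": "lead", "active": "active", "inactive": "inactive",
    "new": "lead", "archived": "inactive",
    "有効": "active", "見込み": "lead",
}

def normalize_status(value: str):
    """Return the canonical status, or None if no alias matches."""
    return STATUS_ALIASES.get(value.strip().lower())

def normalize_phone(value: str) -> str:
    """Keep digits and a leading '+', drop hyphens and other separators."""
    return re.sub(r"[^\d+]", "", value.strip())

def suggest_contact_name(row: dict):
    """Suggestion only: surfaced in the preview, never silently applied."""
    if not row.get("contact_name") and row.get("company_name"):
        return row["company_name"]
    return None
```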


Real-world CSV means alias mapping, not idealized schemas

If a user sends company_name, that is easy.

Real files are rarely that polite.

Business CSV files tend to arrive with mixed conventions, Japanese labels, and whatever the exporting system decided to call a field that week.

That is why alias mapping is part of the engine, not an afterthought.

The goal is not to force every operator to manually reshape a spreadsheet before import.

The goal is to let the system absorb common real-world variation safely.

Matched clients can immediately flow into follow-up work such as projects and tasks.


Row decisions are the real product

The most useful part of the preview is not the summary card.

It is the row decision list.

For each valid row, the system tells the operator:

  • which row it is
  • whether it will create or update
  • the normalized values
  • the reason

For example:

  • create because the email does not exist in the database
  • update because the email already exists in the database
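That decision logic is essentially a keyed lookup (here `existing_emails` stands in for a database query; the exact reason strings and case-folding are assumptions):

```python
# Classify create/update intent for one row, keyed on email.
def decide_row(row: dict, existing_emails: set[str]) -> dict:
    email = row["email"].strip().lower()
    if email in existing_emails:
        return {"action": "update", "reason": "email already exists in the database"}
    return {"action": "create", "reason": "email does not exist in the database"}
```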

In the UI, operators do not see a generic “import successful” message.

They see row-level intent.

That sounds simple, but it changes the operator’s experience from this:

“I uploaded a file and hoped for the best.”

to this:

“I can see exactly what this system is about to do.”

That is the difference between a utility and a trustworthy business tool.


The frontend is not decoration. It is part of the safety model.

I built a dedicated Next.js page for the workflow.

The page includes:

  • sign in
  • auth status
  • token show / hide / copy
  • template download
  • CSV upload
  • preview
  • commit
  • run status
  • preview snapshot
  • row decisions
  • errors
  • suggestions
  • task follow-up

The frontend evolved a lot during implementation because I stopped treating it as a debug console and started treating it like an operator workspace.

Key choices that mattered:

Preview snapshot stays compact

The top section is intentionally simple.

Analysis panels get wider

Once the CSV is analyzed, the page expands because row-level review needs horizontal space.

Color meaning stays disciplined

I avoided noisy UI coloring.

  • analyzed business data uses one calm accent tone
  • errors are red
  • summaries stay readable
  • the interface does not scream unless something is actually wrong

Search, filtering, and pagination

For row decisions, I added:

  • All / Create / Update filters
  • search
  • rows-per-page controls
  • pagination

This became necessary once I tested with 500+ and 1000+ row files.

If your “safe CSV import” falls apart when the dataset gets bigger, it is not safe.

It is theatrical.


Import does not end at commit

This is the part I care about most from a business perspective.

Most CSV demos stop at “data inserted successfully.”

Real work begins after that.

So I connected the import workflow to downstream operational actions:

  • match imported rows to existing clients
  • surface match quality
  • load related projects
  • load related tasks
  • create a project
  • create a task
  • change task status directly in the same workspace

This is where the engine stops being a parser and starts becoming operational infrastructure.

It turns the CSV engine from a dead-end admin tool into a real operational bridge.

The follow-up area turns imported rows into operational work, not just parsed data.

Why this matters

Businesses do not buy “CSV parsing.”

They buy shorter time from:

spreadsheet arrives

to

the right work starts

That is a much more valuable product story.


Match quality matters more than people admit

I added two match types in the follow-up UI:

  • Matched by email
  • Matched by company

And I do not treat them equally.

A company-name match is shown as a provisional link with an explicit warning.

That is not a cosmetic detail.

It prevents operators from over-trusting a weak match and creating follow-up work against the wrong record.

This is the kind of boring detail that makes systems usable in the real world.


Activity logging is part of the contract

CSV import without auditability is reckless.

I record activity log events such as:

  • csv_previewed
  • csv_import_committed
  • client_created_from_csv
  • client_updated_from_csv

This gives the system an operational memory.

When a team asks, “Who imported this?” or “Why did this client change?” the answer should not be “I think someone uploaded a spreadsheet last week.”

It should be queryable.
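A minimal append-only log in that spirit (event names come from the list above; the in-memory list stands in for a database table, and the record shape is an assumption):

```python
# Append one structured event per meaningful action, with actor and timestamp.
from datetime import datetime, timezone

ACTIVITY_LOG: list[dict] = []

def log_event(event: str, actor_id: int, detail: dict) -> None:
    ACTIVITY_LOG.append({
        "event": event,
        "actor_id": actor_id,
        "detail": detail,
        "at": datetime.now(timezone.utc).isoformat(),
    })
```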


What I tested

I did not stop at one happy-path CSV.

I tested with multiple categories of files.

Core cases

  • UTF-8 normal CSV
  • CP932 Japanese CSV
  • normalization-focused CSV

Broken cases

  • extra columns
  • missing columns
  • malformed rows

Scale cases

  • 500-row mixed CSV
  • 1000-row CSV
  • update-like datasets

The result was important:

  • large previews stayed usable
  • broken rows surfaced as errors instead of poisoning the preview
  • malformed extra-column rows stopped producing misleading suggestions
  • create/update behavior stayed understandable under load

That last point matters.

A system like this does not earn trust because it works once.

It earns trust because it behaves clearly when the input is ugly.


What makes this commercially interesting

I do not think the interesting part is “I built a FastAPI app.”

The interesting part is this:

I built a reusable ingestion layer for messy business CSV.

That can be used in:

  • client imports
  • CRM migration workflows
  • lead intake pipelines
  • operations dashboards
  • internal admin tools
  • SaaS backends that need safe CSV entry points

And it is much more sellable than a generic CRUD sample because it solves a pain that companies already have.

Companies do not wake up asking for “a cool FastAPI demo.”

They wake up asking why their imported data is broken.


What I would add next

This version already does a lot, but it also leaves room for stronger commercial versions.

1. Manual header remapping UI

The schema already leaves space for a future manual override flow.

2. AI-assisted column inference

If a user uploads headers like Contact Info or Primary Reach, the system could propose likely mappings and record the reason.

3. Stronger import policies

Per-client rules, per-column exceptions, protected fields, partial-safe commit policies, and custom merge logic.

4. Background cleanup / retention policies

Especially important once preview payloads start accumulating in production.

That is where this stops being a project and becomes a product line.


Why I built it this way

Because I am tired of tutorials pretending CSV import is trivial.

It is not.

The “read file and insert rows” version is easy to write and expensive to own.

I wanted something closer to the real standard teams need:

  • suspicious parser
  • explicit validation
  • normalization with human-readable suggestions
  • preview-first review
  • idempotent commit flow
  • audit trail
  • downstream operational handoff

That is a better way to treat imported business data.


Final thought

Most CSV import examples are upload features.

I wanted to build something else.

I wanted to build a safe intake layer for operational data.

Something that assumes the CSV is messy.

Something that shows its reasoning.

Something that can say “not yet” before it says “done.”

Something that keeps humans in control without making them do everything by hand.

Because in production, the real feature is not import.

The real feature is not corrupting the business while you import.


Final stress test: 10,000 rows with 800 broken cases

As a final stress test, I ran this workflow against a 10,000-row CSV that included 800 intentionally broken cases.

Those broken rows included invalid emails, missing required fields, duplicate values, invalid status values, extra-column rows, and missing-column rows.

The result:

  • 10,000 total rows
  • 9,200 valid rows
  • 800 error rows

On the lowest-spec machine I had available for testing, the preview completed in about 20 seconds and still returned a readable decision table, validation output, and suggestions.

A dedicated operator workspace for safe CSV ingestion.

I would not treat this as a raw CSV parsing benchmark.

A 10,000-row import can look “fast” or “slow” depending on what the system is actually doing. In this case, the workflow was not just reading rows. It was validating fields, classifying row decisions, isolating broken records, generating suggestions, and returning a reviewable preview for operations.

So the result that mattered to me was not just the elapsed time. It was the fact that the workflow remained usable at 10,000 rows with 800 broken cases.


Interested in a private implementation?

This project is not published as a full public repository.

If your team needs a similar workflow for internal operations, CSV validation, safe commit flows, or post-import task handling, I’m available for private implementation discussions based on your use case and scope.

Email: fastapienne@gmail.com
