DEV Community

Yuki


How a Non-Engineer Built a 1-Million-Row CSV Analyzer with Claude Code and DuckDB-WASM

Introduction

I built a tool called LeapRows: a browser-based CSV analyzer that handles 1 million rows without breaking a sweat. 🎉

The key feature is that everything runs entirely inside the browser using DuckDB-WASM and OPFS. Your data never leaves your machine.

I'm not an engineer. I wrote a tiny bit of code for minor tweaks and debugging, but 95%+ of the codebase was written by Claude Code.

I can't claim to understand every single line, and that's exactly why I didn't want to just blindly ship AI-generated code. I put quality controls in place as best I could as a non-engineer: defining implementation rules in CLAUDE.md, building security audit Skills based on the OWASP Top 10, and setting up pre-commit hooks for lightweight checks.

It may not be perfect, but I at least wanted to avoid "pray and deploy."

The tool is still in Beta, but I want to document what it took for a non-engineer to ship a real product in collaboration with Claude Code.

Why I Built It: "The Python Sharing Problem" and Server Costs

My day job is SEO. I regularly deal with CSVs containing hundreds of thousands of rows: exports from Ahrefs, Google Search Console, BigQuery, and similar tools.

For heavy data work, I'd reach for Python (Polars) to transform and aggregate data. But Python has a high barrier to entry: environment setup, code adjustments. It's just not something you can easily hand off to non-engineer teammates.

Even for myself, I'd often think, "Do I really have to write Python just for this small transformation?" And then there were frustrating moments like: "Why is the type inference different for the same CSV from the same tool?!" (causing join errors).

I'd wanted a tool that made handling hundreds of thousands to millions of rows as easy as using a spreadsheet.

The pain points:

  • Excel and Google Sheets struggle badly with CSVs over ~100k rows
  • Python is hard to share with non-technical teammates
  • Writing Python for small one-off tasks feels like overkill

My First Attempt: Running Polars Server-Side (Quickly Abandoned)

My first idea was: "What if I run Polars on the server? It'd be blazing fast for aggregation."

I started building that, but reality hit quickly:

  • Uploading and downloading large CSVs was just too slow to be usable
  • If user numbers grew, server costs could spiral out of control

I gave up on the server-side Polars approach almost immediately.

The Turning Point: DuckDB-WASM × OPFS

Just as I was about to abandon the whole idea, I came across an article by Shiguredo about handling 1TB of log data offline using DuckDB-WASM and OPFS.

I'm embarrassed to admit this was my first time hearing about the Parquet format, but the query speed shown in their demo blew me away.

That's when it clicked: "What if I instantly convert uploaded CSVs to Parquet and store them in OPFS? I could build a blazing-fast data processing tool with zero server involvement."
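For reference, the CSV-to-Parquet hop is a single DuckDB statement. Here's a minimal sketch of how that SQL might be assembled; the helper function and file paths are my own illustration, not LeapRows' actual code:

```typescript
// Hypothetical helper: build the DuckDB SQL that converts an uploaded CSV
// into a Parquet file. read_csv_auto and COPY ... (FORMAT PARQUET) are
// standard DuckDB syntax; in the browser the paths would point at files
// registered with the DuckDB-WASM virtual filesystem / OPFS.
function csvToParquetSql(csvPath: string, parquetPath: string): string {
  return (
    `COPY (SELECT * FROM read_csv_auto('${csvPath}')) ` +
    `TO '${parquetPath}' (FORMAT PARQUET);`
  );
}
```

After that one-time conversion, every subsequent query hits the compact, columnar Parquet file instead of re-parsing the CSV.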

Looking at the documentation, I felt like I could probably write the basic code to load and display data with DuckDB-WASM myself.

But if I wanted something anyone could use, it would need a polished GUI, and building a proper GUI while raising two kids was simply not realistic.

That's when I decided to try Claude Code, which was getting a lot of buzz at the time (this was around June 2025).

Is It Really "Zero Network Traffic"?

As a side note: I know some people might be skeptical when a non-engineer claims their tool doesn't send data to a server.

That's exactly why I was committed to an architecture where data physically cannot leave the browser.

Here's a screenshot of the browser's Network tab while LeapRows processes a large CSV. The only POST request is to Vercel Analytics (page view tracking), and that's only enabled on the landing page.

Even inspecting the payload, you'll see that no file names or data contents are being sent anywhere.

This gave me the best of both worlds: zero server costs, and a tool users can trust with sensitive data.

The Battle with a Runaway Claude Code

I installed Claude Code, brimming with excitement, and typed: "Using DuckDB-WASM, build a tool that lets users upload a CSV, convert it to Parquet, store it in OPFS, and view the data."

Code poured out almost instantly. A minimal working tool took shape.

Energized, I kept adding features: pivot tables, filters, column operations... whatever came to mind.

Bugs Multiplied → Time to Start Over

After adding a few features, things started to fall apart. Claude Code fell into a loop: "Fixed it!" → I check → still broken → I report → "Fixed!" → still broken.

The number of back-and-forth exchanges to implement a single feature shot up, and bugs started appearing in parts I hadn't even touched.

At that point I was using a CLAUDE.md with basic principles I'd picked up from posts on X about improving Claude Code's output. But without a solid spec for the tool and with ad-hoc requests flying in randomly, CLAUDE.md wasn't doing much. Eventually, nothing worked reliably and bugs could appear anywhere. Pure chaos.

So after about a month of development, I made a hard call: scrap everything and start over.

This time, I consulted with Gemini to structure the project properly: design philosophy, DuckDB and OPFS connection conventions, shared UI rules. After that, implementation became dramatically smoother.

What's in My CLAUDE.md Now

Here's a condensed excerpt of the rules I've accumulated (it's grown long over time):

```markdown
# Development Philosophy
* Incremental progress: small, composable changes
* Learning from existing code: study patterns before implementing
* Pragmatic over dogmatic: adapt to reality
* Clear intent over clever code: always prioritize clarity

# Bug Fix Methodology
* Follow investigate → test → fix order (never guess)
* Write a failing reproduction test before attempting a fix
* Cap fix attempts at 3 iterations; escalate if not resolved
* Audit the full impact scope (grep all usages)
* Enforce immutable patterns

# Skills (Reusable Implementation Guides)
* Rules must call the relevant Skill before writing any code
* Coverage: DuckDB operations, Zustand state management, security audits, E2E tests, UI patterns
* Plan agents must also reference Skills (read SKILL.md and cite it in the plan)

# Architecture Principles
* DuckDB Singleton: centralized connection management, close() is forbidden
* Zustand state: selective subscriptions via useShallow to prevent over-rendering
* SQL escaping: all queries go through a dedicated utility function
* Single source of truth for fileId: managed exclusively in file-context-store
* Query cancellation: AbortController + debounce for logical cancellation
* HTML sanitization: two-layer defense, escapeHtml + DOMPurify
* Input validation: file size limits, ReDoS prevention, regex pattern validation
* API rate limiting: IP-based brute-force protection

# Troubleshooting
* 30+ error patterns with documented resolutions
* Serves as a knowledge base to prevent recurring issues
```
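As an illustration of the "SQL escaping" rule above, the dedicated utility can be surprisingly small. This is a hedged sketch; the helper names are mine, not LeapRows' actual code:

```typescript
// Hypothetical escaping helpers following standard SQL quoting rules.
// Identifiers (column/table names) are wrapped in double quotes, with
// embedded double quotes doubled; string literals use single quotes,
// with embedded single quotes doubled.
function escapeIdentifier(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

function escapeLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}
```

Routing every query through helpers like these means user-supplied column names and filter values can't break out of their quoted context.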

Hitting the "1 Million Row Wall"

Even with a cleaner architecture, development was full of challenges.

DuckDB Crashes (Multiple Connection Problem)

Since I was new to DuckDB-WASM, I didn't know you can't open multiple connections to a single instance. Claude Code, of course, had no idea either and happily generated code that did exactly that.

Frequent errors included:

  • Queries running before DuckDB finished initializing
  • SQL executing before CSVโ†’Parquet conversion was complete
  • New operations launching before previous queries had finished

For engineers, this is probably basic stuff, but for me, it was here that I first learned what a Singleton pattern is, after consulting with Gemini.

Once I added DuckDB instance management through Zustand and explicitly documented in CLAUDE.md that all DuckDB connections must use the singleton instance, the error rate dropped dramatically.
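The core of that rule is a classic singleton. A minimal sketch with hypothetical names; the real app wires this through Zustand and an async DuckDB-WASM instantiation, which I've reduced to placeholders here:

```typescript
// Hedged sketch: one shared handle to the DuckDB instance, so no code
// path can accidentally open a second connection.
class DuckDBManager {
  private static instance: DuckDBManager | null = null;
  private ready = false;

  // Private constructor: nothing outside this class can `new` a second copy.
  private constructor() {}

  static getInstance(): DuckDBManager {
    if (!DuckDBManager.instance) {
      DuckDBManager.instance = new DuckDBManager();
    }
    return DuckDBManager.instance;
  }

  // Placeholder for the real (async) DuckDB-WASM initialization.
  init(): void {
    this.ready = true;
  }

  // Placeholder for conn.query(sql); guards against premature use.
  query(sql: string): string {
    if (!this.ready) throw new Error("DuckDB not initialized yet");
    return `ran: ${sql}`;
  }
}
```

The readiness guard is what catches the "query before initialization finished" class of bugs at the source instead of deep inside WASM.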

The `_setThrew is not defined` Error Storm

Working with DuckDB-WASM, I ran into `_setThrew is not defined` an absurd number of times.

I had no idea what it meant. Neither Gemini, Claude, nor Google searches gave me a clear answer at first. Eventually I realized it was a WASM-level error, and once I had Claude Code build a mechanism to catch and log those errors to the console, debugging finally became possible.
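That catch-and-log mechanism can be as simple as a wrapper around every DuckDB call. A hedged sketch with hypothetical names, not LeapRows' actual implementation:

```typescript
// Wrap low-level calls so WASM errors like "_setThrew is not defined"
// get logged with enough context to debug, instead of vanishing.
const errorLog: string[] = [];

function runWithDiagnostics<T>(label: string, fn: () => T): T {
  try {
    return fn();
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    errorLog.push(`[${label}] ${msg}`);
    console.error(`[${label}] DuckDB-WASM error:`, msg);
    throw err; // rethrow so callers still see the failure
  }
}
```

Tagging each call site with a label (`"parquet-convert"`, `"pivot-query"`, and so on) turns an opaque WASM stack trace into "it failed here, during this operation".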

Most of the root causes turned out to be the same issues as before: multiple connections, premature initialization, and data consistency mismatches. All pretty fundamental mistakes.

Zero Wait Time: "Dynamic CTEs" and the Birth of the Recipe Feature

Early on, every column operation or filter would execute a query and overwrite the Parquet file in OPFS.

DuckDB-WASM is fast: sub-second writes. But for users expecting a spreadsheet-like experience, even a one-second wait per action is poor UX.

Before: Every action triggered a physical write, creating a noticeable delay.

Then I had an idea: "What if I stop writing to disk after every action, and instead store the operation history as JSON? Then, right before rendering, chain everything together with SQL CTEs and execute it in one shot."

```sql
-- Conceptual image of the dynamic CTE built internally
WITH step1 AS (SELECT * FROM source_data WHERE category = 'A'),
     step2 AS (SELECT * FROM step1 WHERE price > 1000)
SELECT * FROM step2;
```
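The JSON-to-SQL chaining can be sketched like this. It's a deliberate simplification that treats each recorded operation as a plain WHERE clause; LeapRows' real steps are richer JSON objects, but the chaining idea is the same:

```typescript
// Hedged sketch: turn an ordered operation history into one query where
// each step is a CTE reading from the previous step.
function buildDynamicQuery(source: string, steps: string[]): string {
  if (steps.length === 0) return `SELECT * FROM ${source};`;
  const ctes = steps.map((where, i) => {
    const prev = i === 0 ? source : `step${i}`;
    return `step${i + 1} AS (SELECT * FROM ${prev} WHERE ${where})`;
  });
  return `WITH ${ctes.join(",\n     ")}\nSELECT * FROM step${steps.length};`;
}
```

Replaying a saved workflow is then just feeding the stored history back through the same builder.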

This eliminated the per-action wait entirely.

And then I realized: "If the operation history is cleanly stored as JSON, I can save it and replay entire workflows automatically." That insight became the foundation of LeapRows' Recipe feature.

After:

  • Wait time only occurs on initial file load
  • Entire workflows can be re-executed from saved JSON

For complex cases (heavy regex, nested calculations), the dynamic CTE itself could get slow. Since I can't do advanced query tuning by reading EXPLAIN output, I implemented a caching layer that physically saves intermediate results to keep the UI responsive.
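The caching idea itself is simple: key the materialized result by the operation history that produced it. A hedged sketch with hypothetical names (the real layer writes intermediate Parquet files, which I've abstracted into a generic `execute` callback):

```typescript
// One recorded operation in the history (shape is illustrative).
type Step = { op: string; args: Record<string, unknown> };

const resultCache = new Map<string, unknown>();

// Run `execute` only if this exact history hasn't been materialized yet.
function runWithCache<T>(steps: Step[], execute: (steps: Step[]) => T): T {
  const key = JSON.stringify(steps); // histories are append-only, so this is stable
  if (resultCache.has(key)) return resultCache.get(key) as T;
  const result = execute(steps);
  resultCache.set(key, result);
  return result;
}
```

Because each user action only appends a step, any prefix of the history that was already materialized can be reused, so only the new tail has to be computed.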

The Ultimate Debugging Weapon: Lots of console.log

Even with all of this, there were still cases where Claude Code would insist "It's fixed!" while the error kept looping.

My final weapon: flooding the suspicious code with console.log statements. This let me trace things like "The data becomes undefined here" or "The schema changes between these two lines." Then I could tell Claude Code specifically: "The behavior here seems wrong because X." That dramatically improved the odds of getting a correct fix.

It might sound like vibe-coding, but for a non-engineer, being able to isolate where the problem is before handing it off to AI turned out to be a genuine winning strategy.

What I Learned from Building with Claude Code

You Can Build as a Parent with Limited Time

Before Claude Code, I'd start a side project, run out of time, drop it, start again months later having forgotten everything... and never actually ship anything. The cycle repeated many times.

With Claude Code, as long as I have a clear enough mental image of what I want to build and can write it up as a spec, the implementation moves forward on its own.

Including the rebuild phase, it took 8 months, but getting to Beta with just 90 minutes a day of work was a huge deal for me personally. For years I'd convinced myself that "there's nothing I can do during the parenting years; I just have to accept falling behind." This project proved that wrong.

Non-Engineers Still Need to Understand Development Fundamentals

The main reason it took 8 months despite using Claude Code was that I lacked the foundational knowledge engineers take for granted: things like application architecture, design patterns, and the conceptual building blocks behind good software.

(Singleton patterns, storing operation steps as state, etc.)

Claude Code is genuinely impressive, and I do believe we're in an era where non-engineers can ship real tools. But having even a basic understanding of architecture and design patterns saves enormous amounts of time and leads to far better outcomes.

Skills and Hooks Are Worth the Investment

I underutilized Skills early on, but gradually building them out made a real difference:

| Skill | Purpose |
| --- | --- |
| build-in-public | Generate X posts (#BuildInPublic) from git commits |
| claude-md-organizer | Prevent CLAUDE.md bloat (move completed specs to docs/) |
| documentation-update | Update docs after implementation changes (explicit trigger only) |
| duckdb-singleton-safe | DuckDB connection operations, `_setThrew` error prevention |
| duckdb-sql-standards | SQL query construction, column name escaping |
| e2e-scenario-creator | E2E test scenario generation |
| e2e-test-fixer | Structured E2E failure diagnosis and repair (4 phases, max 3 iterations) |
| security-audit-api-security | API auth, rate limiting, CSRF audit |
| security-audit-data-exposure | Data leakage audit (logs, error messages) |
| security-audit-dependency | Dependency vulnerability audit |
| security-audit-headers | Security header audit |
| security-audit-input-validation | Input validation vulnerability detection (ReDoS, file size) |
| security-audit-sql-injection | SQL injection vulnerability detection |
| security-audit-xss | XSS vulnerability detection |
| security-vulnerability-checker | Full app security audit (OWASP Top 10) |
| tailwind-ui-patterns | UI component creation (buttons, tables, etc.) |
| test-first-bug-fix | TDD bug fixing (reproduce → fix → verify loop) |

These aren't a substitute for a professional security audit, but they're guardrails that reduce the risk of AI-introduced vulnerabilities, non-engineer style.

Closing Thoughts: AI Isn't Magic, But It Does Expand What's Possible

When I started, I wasn't sure how production-ready something built this way could actually be. Claude Code turned out to be more capable than I imagined, and it genuinely expanded what I thought I could build.

The biggest benefit is the ability to move forward on things I "kind of knew about but couldn't actually do myself." The more clearly you can picture what you want, the better the output.

I've barely reached Beta, and there's still a lot of marketing work ahead. But if any of this resonates with you, I'd love for you to try LeapRows and send me feedback. It would genuinely make my day. 🙏
