The DuckDB Pattern When SQLite Is Not Enough but Postgres Is Overkill

#ai #productivity #claudecode #automation

DuckDB scans 10M rows in under a second on a laptop
SQLite for transactions, DuckDB for analytics, no server needed
Query CSV and Parquet files directly without an import step
I do not use it for live writes or concurrent users

I had a 4GB CSV of order data and SQLite choked on the aggregation query for 40 seconds. I loaded the same file into DuckDB and got the answer in 0.8 seconds. That was the moment I stopped treating database choice as a two-option problem. There is a missing middle, and it is DuckDB.

The Gap Between SQLite And Postgres Nobody Talks About

Most solo developers run the same mental script. Small project, use SQLite. App gets bigger, move to Postgres. The jump feels natural because those are the two databases everyone names.

The problem is that the trigger for moving is usually wrong. People assume they need Postgres when their app gets slow. But slow often means one specific thing: an analytical query running on a row-based store. SQLite stores data row by row. When you ask it to scan 5 million rows and sum one column, it reads every byte of every row even though you only wanted one field. That is the bottleneck, not the engine being weak.

Postgres handles this better but brings a server, a connection pool, a user table, a backup strategy, and a process you have to keep alive. For a solo app that runs on one box, that is a lot of operational weight to carry for what is essentially a reporting feature.

DuckDB fills the gap. It is columnar, so it reads only the columns your query touches. It is embedded like SQLite, so there is no server to run. It lives in a single file or in memory. You install it with one line and import it like any library. No daemon, no port, no credentials.

Here are the numbers from my own work. A dashboard query that grouped 8 million event rows by day took 31 seconds in SQLite. The same query in DuckDB returned in 1.2 seconds. I did not change my schema. I did not add indexes. I just used a database built for the shape of that question.

The mental model I settled on is simple. SQLite is for the state of my app: orders, users, sessions, the rows I write one at a time. DuckDB is for the questions I ask about that state: totals, trends, joins across big tables, exports. They are not competitors. They sit side by side, and the file format makes that easy.

If you want the deeper context on when row stores are genuinely fine, see SQLite Is Enough. I leaned on that thinking before DuckDB even entered the picture.

Reading Files Directly Is The Killer Feature

The thing that made DuckDB stick for me was not raw speed. It was that I can query a file without importing it first.

In SQLite, working with a CSV means a CREATE TABLE statement, a column type guess, an import step, and then a query. Three actions before I learn anything. In DuckDB I write one line:


SELECT country, count(*) FROM 'orders.csv' GROUP BY country;

That runs against the file on disk. No table, no schema, no import. DuckDB sniffs the types automatically and gives me the answer. For a Parquet file it is the same pattern, and Parquet is where the speed gets ridiculous because the format is already columnar and compressed.

I use this constantly for one-off questions. A vendor sends me a 200MB export. I do not want a permanent table for it. I want to know how many rows have a missing field, what the date range is, and whether two files share the same IDs. DuckDB answers all three before I would have finished writing the import script.

It reads multiple files at once too. Point a query at a glob like 'logs/*.parquet' and DuckDB treats the whole folder as one table. I had 90 daily export files and ran a single GROUP BY across all of them. It read the lot in under 4 seconds. No concatenation step, no loop, no temp table.

The join story is the same. I can join a Parquet file to a CSV to a table inside the DuckDB file, all in one query. The optimizer handles the heavy lifting. I once joined a 12 million row event file to a 50,000 row lookup CSV and got results in 2 seconds. Doing that in a scripting language with dictionaries would have meant loading both into memory and writing the merge by hand.

This file-first behavior changed how I prototype. I skip the import phase entirely during exploration. When a query proves useful, I save its result as a Parquet file and it becomes the input for the next step. The whole pipeline is just files and SQL.

For image and asset metadata work this matters more than you would expect. When I generate batches of assets with Magnific, the run logs land as CSVs and I query them straight from disk to find which settings produced the keepers.

Where DuckDB Earns Its Place In My Stack

I reach for DuckDB in four situations now, and each one used to be either slow or annoying.

The first is dashboards. Any internal page that shows totals, trends, or breakdowns runs against DuckDB. I keep the live app data in SQLite, then on a schedule I dump the relevant tables to a Parquet file. The dashboard queries hit Parquet. Page loads that used to take 6 seconds now take under one. Users writing to the app never touch the analytics path, so neither side blocks the other.

The second is data cleaning. Before any data goes into a permanent table I run it through DuckDB to check for duplicates, null clusters, and bad date formats. The window functions and samples I need are all there. A typical check across a 3 million row file finishes in well under 2 seconds, which means I actually run it instead of skipping it.

The third is exports. When someone needs a report as CSV or Excel, I write the query in DuckDB and use COPY to write the file. One statement produces the output. No pandas, no manual writing loop, no encoding surprises.

The fourth is joining data that lives in different formats. I keep API responses as JSON, logs as Parquet, and reference data in CSV. DuckDB queries all three together. It reads JSON files natively, including nested structures, so I can pull a field three levels deep without a parsing step.

What ties these together is that none of them need a write-heavy database. They are all read questions over data that already exists. That is exactly the workload columnar engines are built for.

I should be honest about the operational cost, which is close to zero. DuckDB is a single dependency. There is no process to monitor, no port to secure, no upgrade migration to fear. When I deploy to a fresh server it just works because it is a library, not a service. That keeps my Shopify backend reporting jobs simple, since I do not need a second database server alongside the app.

The discipline here is to keep DuckDB read-only in production. I treat the Parquet files as a refreshed snapshot, not as live state. That single rule prevents most of the trouble people hit.

Where I Do Not Use It And Why That Matters

DuckDB is not a general purpose application database, and pretending it is will hurt you. I learned the edges by hitting them.

The first edge is concurrent writes. DuckDB is built for one writer at a time. If your app has many users writing rows constantly, DuckDB is the wrong tool. Two processes trying to write the same file will fight, and the loser gets locked out. This is the opposite of what an OLTP database is designed for. For that workload I stay on SQLite, which handles many readers and serialized writes cleanly, or I move up to Postgres if I genuinely need many concurrent writers across machines.

The second edge is small frequent transactions. If your app inserts one row every time a user clicks something, DuckDB is not built for that rhythm. Columnar stores want bulk loads, not single-row drips. I keep that traffic in SQLite and batch it into DuckDB later if I need analytics on it.

The third edge is anything needing strict transactional guarantees across multiple connected clients. Order processing, inventory locking, payment state. That belongs in a real transactional store. DuckDB will not give you the isolation guarantees you want when money is involved.

The fourth edge is when the data is genuinely small. If your whole table is 20,000 rows, SQLite answers every query instantly and DuckDB buys you nothing. The columnar advantage only shows up at scale. Below a million rows or so the difference is noise. Do not add a dependency to solve a problem you do not have.

The honest summary is that DuckDB is an analytics engine that happens to be embedded. The moment your need is "store the truth of my application and let many people change it safely," you are outside its lane. The moment your need is "answer hard questions about a large pile of data on one machine," you are right in the middle of it.

This is the same judgement call I apply to every tool. Match the engine to the shape of the question, not to the size of the project. If you want the framework I use for deciding which tool goes where, the Claude Blueprint lays out how I keep a solo stack small without painting myself into a corner.

Bottom Line

DuckDB solved a problem I did not know was a problem. I had been treating slow analytics as a signal to move to Postgres, when the real signal was that I was asking columnar questions of a row store. The fix was a library, not a server.

My rule now is three databases for three jobs. SQLite holds the live state my app writes to. DuckDB answers the analytical questions over snapshots of that state. Postgres only enters when I truly need concurrent writers across machines, which is rarer than I used to assume.

The file-first design is what makes it worth adopting. Querying a CSV or Parquet file directly removes the import step that kills exploration. One line of SQL against a folder of files beats a loading script every time.

Start small. Take your slowest dashboard query, dump the table to Parquet, and run the same query in DuckDB. If it goes from seconds to milliseconds, you found your missing middle. If it does not, your data was small enough that SQLite was already fine, and you just saved yourself a dependency.

This article contains affiliate links. If you sign up through them, I may earn a small commission at no extra cost to you. (Ad)