DuckDB vs SQLite: Which one is better?

#duckdb #sqlite #iot #dataengineering

I spend most of my time somewhere between microcontrollers and dashboards. One week I'm squeezing firmware onto a device with barely any memory to spare, the next I'm staring at a few million sensor readings trying to work out why a gateway in the field keeps misbehaving. So when people line up "DuckDB vs SQLite" as a fight to the death, I always want to gently jump in.

They're both small. They both run inside your own program instead of off on some server. They both let you work with a single file using SQL. On paper that makes them sound like rivals. But the more I've used them — across embedded work, edge devices, and plain old data crunching — the more they feel like two neighbours who happen to do completely different jobs.

Let me walk through what I mean, starting from the very beginning.

First, what "embedded" actually means here

When people hear the word "database," they often picture something big and separate: a program running on its own server somewhere that your app has to connect to over the network, with a username and password, before it can read or write anything. PostgreSQL and MySQL work like that. There's nothing wrong with it — it's how most large web apps run — but it's a lot of moving parts.

SQLite and DuckDB throw all of that out. There's no separate program to start. Nothing to log into. No server quietly running in the background. The whole database is just an ordinary file sitting on your disk, and the database engine is a small library your code loads in directly. You point it at the file, you run your queries, and that's the whole setup. Nothing else to install or babysit.

That shared simplicity is the lovely part, and it's exactly why you find these two in places a big server-based database could never go — phones, web browsers, tiny sensors, a quick script on your laptop. But the moment you look at how each one stores your data and works through it, they head off in opposite directions. That difference is the whole story, so let's go there next.

SQLite: the one that's already everywhere

SQLite stores data in rows. Picture a spreadsheet where each row is one complete record, and all the values for that record sit together: the id, the name, the temperature reading, all in one place. That layout is perfect when you're constantly poking at individual records — add this new reading, update that setting, grab the latest entry for device 42. You want the whole record at once, and SQLite hands it to you fast.

A few things I love about it:

It's tiny. The whole engine is around a megabyte. On a small device, that matters enormously.
It's everywhere, and I mean everywhere. There are tens of billions of SQLite files in active use — it's running on basically every phone, and your web browser uses it right now to store your history and settings. It has earned the right to be boring.
It's astonishingly reliable. It's one of the most thoroughly tested pieces of software on the planet. The US Library of Congress even recommends SQLite as a format for preserving digital files long-term, because they trust it'll still open decades from now. When you're shipping a device that has to run untouched in a cabinet for years, that track record buys you a lot of peace of mind.

In my embedded work, SQLite is the default for anything that holds state — the current situation the device needs to remember. Its settings. A queue of readings waiting to be uploaded. A small log of recent events. It handles a steady stream of small writes gracefully and it doesn't ask for resources the hardware doesn't have.

Where it starts to struggle is heavy number-crunching across huge piles of data. Ask SQLite to add up and average ten million rows and it'll get there — but slowly, because it reads through the data row by row, dragging along every column even when your question only touches one of them. That's not a flaw. It simply wasn't built for that job.

DuckDB: the one that makes big analysis feel easy

DuckDB flips the storage layout around. Instead of keeping each row together, it keeps each column together — all the temperatures in one place, all the device names in another. So when you ask "what's the average temperature across ten million readings," it reads just the temperature column and skips everything else. On top of that it processes data in big batches rather than one row at a time, which modern processors are very good at. The result is that the heavy questions that make SQLite sweat come back from DuckDB before you've finished a sip of coffee.

There's another part that genuinely changed how I work, and it has nothing to do with speed. DuckDB will read your data files directly, right where they sit:

SELECT device_id, avg(temperature)
FROM 'readings/*.parquet'
GROUP BY device_id;

No loading step. No importing anything into a table first. You point it at a folder of files — CSV, JSON, or Parquet (a compact file format that's common for this kind of data) — and it just reads them. For someone who regularly gets handed a pile of sensor dumps, that's the difference between "give me an afternoon to set up a pipeline" and "give me thirty seconds."

It also handles data bigger than your computer's memory by spilling the overflow to disk, so a dataset that would crash a normal in-memory tool just... works.

It's newer than SQLite and the engine is a bit chunkier, but it's still the same idea at heart: a file and a small library, nothing to run separately.

Are these actually used in the real world?

Both, heavily — this isn't a case of betting on something obscure.

SQLite is the most widely deployed database engine that exists, full stop. The tens of billions of copies in daily use make it more common than every other database combined. It's not going anywhere.

DuckDB is younger but its adoption has shot up. It's pulling in around 37 million downloads a month on Python's package index, it's MIT licensed and free, and it has real commercial backing behind it (a company called MotherDuck builds a cloud service on top while keeping the core engine open and free), which answers the usual worry about whether an open-source tool will still be maintained in five years. It also fits neatly with where the industry is heading: open file formats, modern multi-core processors, and even AI coding assistants, which tend to be good at writing SQL and so reach for DuckDB naturally.

And SQLite isn't standing still either. Newer spin-offs like Turso/libSQL are adding things like replication and edge support on top of the classic engine. Both of these tools are safe bets.

So when does each one actually win?

Here's how it shakes out across the three places I work.

On small, constrained hardware, SQLite, almost every time. DuckDB's appetite for memory and processing power during a big query is more than a tiny chip wants to give. On a larger edge device running Linux it's a different story, but down at the small end, SQLite's tiny size and long history are hard to argue with.

At the edge — meaning the gateway boxes that sit between your sensors and the cloud — they actually team up. A pattern I keep coming back to: let SQLite handle the incoming readings on the device, quietly buffering them as they arrive, then let DuckDB do the local number-crunching before anything gets sent upstream. You ship neat summaries instead of the raw firehose, which is kinder to both your bandwidth bill and your cloud costs. They're not competing here. They're a relay team.

For sitting down and analysing data, DuckDB is the one I reach for. Being able to throw SQL at a folder of files without setting up any heavy machinery is exactly the kind of low-fuss tool the job usually calls for. It has quietly become a go-to for local analysis, and for good reason.

The one line I keep in my head

If I had to shrink all of this down to a fridge magnet:

SQLite is for managing data while it's being created and changed. DuckDB is for analysing it once it's all piled up.

Same small, no-server spirit — opposite ends of the data's life. SQLite looks after the data as it's coming in and changing. DuckDB shows up later to make sense of the whole pile. Once that clicked for me, the "versus" framing kind of fell apart.

So if you're choosing between them, the honest answer is often both, for different things. Work out whether the job in front of you is about handling data as it changes or making sense of a big pile of it, and the choice mostly makes itself.

If you've wired these two together in your own projects, I'd love to hear how you split the work. I'm always tinkering with my own setup, and the edge folks always seem to have the most interesting war stories.