A year ago, I wrote about our tech stack and how it helped us run a lean cloud computing startup. Since then, we've scaled over 20x. That kind of g...
Why not have multiple Postgres databases, a hot one and a cold one for a start? Over time the cold database could be split up by period. That is what you are doing with the Parquet files, isn't it?
Then the queries will probably stay the same for the most part; there only needs to be a mechanism to connect to the right database.
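Something like this is what I have in mind, just as a rough sketch (the connection strings and the cutoff date are made up, of course):

```python
from datetime import date
import psycopg2

# Made-up connection strings: one hot database, cold databases split by year.
HOT_DSN = "dbname=app_hot"
COLD_DSNS = {2023: "dbname=app_cold_2023", 2024: "dbname=app_cold_2024"}
HOT_CUTOFF = date(2025, 1, 1)  # everything older than this lives in a cold database

def connect_for(query_date: date):
    """Pick the hot database, or the cold database for the right period."""
    if query_date >= HOT_CUTOFF:
        return psycopg2.connect(HOT_DSN)
    return psycopg2.connect(COLD_DSNS[query_date.year])
```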
If you are used to handling Postgres, it seems riskier to convert the data from Postgres to Parquet files and query them with a database engine that is less familiar than Postgres.
It also introduces the risk of a Parquet file not being fully transferred from S3 in case of a service outage.
Wouldn't suspending one or more cold database servers achieve a similar cost reduction?
Thanks for the comment, though I'm not 100% sure I understand what you're saying. Part of the point is that I pay a lot for storing cold data: fast disks are expensive, S3 is cheap.
The risk of not transporting the data doesn't really exist, I think. I have retry mechanisms for every error scenario I could think of, and data is not deleted before I am 100% sure it's stored in S3. Even partial failures are reconciled!
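Very simplified, the core idea looks roughly like this (table, bucket, and DSN are placeholders, and the real pipeline has a lot more retry and reconciliation logic around it):

```python
import boto3
import pandas as pd
import psycopg2

BUCKET = "cold-storage-example"   # placeholder
PG_DSN = "dbname=app"             # placeholder

def flush_day_to_s3(day: str) -> None:
    """Export one day of rows to Parquet on S3; delete from Postgres
    only after S3 confirms the object really exists."""
    s3 = boto3.client("s3")
    key = f"events/day={day}/data.parquet"
    local_path = f"/tmp/events-{day}.parquet"

    with psycopg2.connect(PG_DSN) as conn:
        # 1. Pull the day's rows and write them as a Parquet file.
        df = pd.read_sql(
            "SELECT * FROM events WHERE created_at::date = %(day)s",
            conn, params={"day": day},
        )
        df.to_parquet(local_path)  # needs pyarrow installed

        # 2. Upload, then verify the object exists before touching anything.
        s3.upload_file(local_path, BUCKET, key)
        s3.head_object(Bucket=BUCKET, Key=key)  # raises if the upload didn't land

        # 3. Only now is it safe to drop the rows from the hot database.
        with conn.cursor() as cur:
            cur.execute(
                "DELETE FROM events WHERE created_at::date = %(day)s",
                {"day": day},
            )
```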
If you suspend database servers you still pay for the disks, which isn't exactly what I'd like to do! :D
I assumed data was the main cost, but I wasn't sure.
When I made the server suspension remark, I didn't think you needed to keep paying for the storage, but it makes sense from a hosting company's standpoint.
The thing I'm still wondering is why you chose DuckDB over Postgres as the cold-storage database. Even if you want to keep Parquet files, Postgres can import them too.
I assume not all S3 data will be on the cold-storage database server; that would just move the cost of the storage from one server to another.
From the DuckDB documentation I gather it works best with large datasets, so how does the performance for the loaded data compare with Postgres?
The main goal of my comment is just trying to understand what made you pick that setup.
So it is a mix of trying to gather my thoughts and having questions.
Postgres might be able to import it with an extension, but DuckDB (in-memory btw, no extra hosted instance) is literally made for that exact use case. It takes a lot of work away: caching, reading only what is really required, etc.
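For the curious, querying the archived Parquet straight from S3 is just a few lines with an in-memory DuckDB connection (bucket, paths, and columns here are placeholders, and S3 credentials are assumed to be configured via DuckDB's S3 settings):

```python
import duckdb

con = duckdb.connect()                       # in-memory, no server to run
con.execute("INSTALL httpfs; LOAD httpfs;")  # lets DuckDB read s3:// URLs
con.execute("SET s3_region = 'eu-central-1';")

# DuckDB only fetches the columns and row groups the query actually needs,
# so scanning a month of Parquet files stays cheap.
rows = con.execute("""
    SELECT customer_id, sum(usage_seconds) AS total_usage
    FROM read_parquet('s3://cold-storage-example/events/day=2024-06-*/data.parquet')
    GROUP BY customer_id
""").fetchall()
```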
So DuckDB runs on an instance with something else. Wouldn't that interfere with the capacity for the other application?
In my experience memory is more expensive than storage, so that should result in a smaller cost reduction, I assume?
Also isn't server memory capped? You can add storage, but you can't add more memory.
Great write-up. Loved the honesty around “simple beats clever” — flushing to S3 + DuckDB instead of forcing Postgres to be something it’s not is a solid call.
The EU-first pressure point is also very real once enterprise customers enter the picture.
Super practical lessons, not the usual resume-driven stack flex 👍
Nice AI response
Fair 😄 I used AI to help structure it, but the lessons are from real scaling pain. Simple beats clever for a reason.
Excellent lessons, and thanks for writing this, Jonas!
I particularly resonated with the idea that stack choices should help scale people, not just traffic.
In my experience, prioritizing team readability and cognitive load has prevented far more headaches than focusing solely on performance metrics.
If scaling 20× revealed anything, it’s that people and processes matter just as much as technology.
Do you have any go-to practices for evaluating team cognitive load when choosing new tools?
Nice AI response
How do you manage networking? Do you use ZFS for backups?
A lot of WireGuard and ZFS!
Very nice.
Indeed sir.
Thanks!
Solid retrospective. It's always helpful to see real-world examples of scaling pains and solutions.
Nice work!
Thanks!