DEV Community

Amichai Berger


Why I Built DataForge — and How I Automated Excel-to-PDF Generation with Python

A few months ago, I had to build a custom PDF filler for a client project.
Users needed to upload an Excel sheet, fill in hundreds of PDFs, and download the results.

I looked everywhere for a tool that could just do this (upload Excel → map fields → generate PDFs), and somehow nothing really fit. Don't get me wrong, there are options out there, from mail merge to add-ons, but none of them solved all my issues.
So… I built one myself.

That’s how DataForge was born.

Side note: this is my very first project end to end, covering development, buying a domain, hosting (Google Cloud), and the database (Neon).

🚀 What DataForge Does

DataForge takes an Excel sheet and maps its columns to fields in a PDF form.
Once you define the mapping, it can generate:
🧾 One continuous merged PDF, or
📂 Separate PDFs for each row
You can even set custom file naming rules and send the generated PDFs by email directly — all from your browser.
No scripts, no Zapier chains, no “copy this Python snippet” tutorials.
Just upload → map → done.
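
Custom naming rules map naturally onto a per-row template. Here is a minimal sketch, assuming a hypothetical rule syntax in which `{Column}` placeholders name Excel columns (DataForge's real syntax may differ). `render_filename` and `unique_names` are illustrative names, not part of any library.

```python
import re
import string
from typing import Any, Iterable, Iterator, Mapping

# Characters that are unsafe in file names on common filesystems.
_UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def render_filename(template: str, row: Mapping[str, Any]) -> str:
    """Fill {Column} placeholders in a (hypothetical) naming rule from one row."""
    # Collect placeholder names so a typo in the rule fails with a clear message.
    placeholders = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = placeholders.difference(row)
    if missing:
        raise KeyError(f"naming rule refers to unknown columns: {sorted(missing)}")
    name = template.format_map(row)
    # Swap path separators and other unsafe characters for underscores.
    return _UNSAFE_CHARS.sub("_", name).strip() or "untitled.pdf"


def unique_names(names: Iterable[str]) -> Iterator[str]:
    """Yield each name, adding -2, -3, ... before the extension on collisions."""
    used: set[str] = set()
    for name in names:
        stem, dot, ext = name.rpartition(".")
        if not dot:  # the name has no extension at all
            stem, ext = name, ""
        candidate, n = name, 1
        while candidate in used:
            n += 1
            candidate = f"{stem}-{n}{dot}{ext}"
        used.add(candidate)
        yield candidate
```

For example, the rule `{Last}_{First}.pdf` turns a row with `First="Ada"` and `Last="Lovelace"` into `Lovelace_Ada.pdf`, and passing every rendered name through `unique_names` keeps two identical rows from overwriting each other's file.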

⚙️ The Tech Behind It

The stack is simple, fast, and fully serverless:
Frontend: Next.js on Vercel
Backend: FastAPI on Google Cloud Run
Storage: Amazon S3 (really, this should be Google Cloud Storage too; I'll get to that eventually)
PDF handling: Python’s pypdf and reportlab for mapping and rendering

Mapping Excel data to a PDF was trickier than I expected.
So I built a visual mapper: when you upload your PDF, the app renders it in the browser and lists all of your Excel columns on the side. All you need to do is drag a column onto the PDF, and the app assigns that Excel column there.
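
Whatever the UI looks like, a drag-and-drop mapper like this ultimately produces something like a dictionary from PDF field name to Excel column. Below is a minimal sketch of validating such a mapping before any PDFs are generated, assuming that dictionary shape; `validate_mapping`, `pdf_fields`, and `excel_columns` are hypothetical names. In practice the field names could come from pypdf's `PdfReader.get_fields()` and the columns from a pandas `DataFrame.columns`.

```python
from typing import Iterable, Mapping


def validate_mapping(
    mapping: Mapping[str, str],
    pdf_fields: Iterable[str],
    excel_columns: Iterable[str],
) -> tuple[list[str], list[str]]:
    """Check a {pdf_field: excel_column} mapping; return (errors, warnings)."""
    fields, columns = set(pdf_fields), set(excel_columns)
    errors: list[str] = []
    for field, column in sorted(mapping.items()):
        if field not in fields:
            errors.append(f"PDF has no field named {field!r}")
        if column not in columns:
            errors.append(f"Excel sheet has no column named {column!r}")
    # Unmapped fields are allowed, but they will come out blank in every PDF.
    warnings = [
        f"PDF field {field!r} is not mapped and will stay empty"
        for field in sorted(fields.difference(mapping))
    ]
    return errors, warnings
```

Splitting the result into errors and warnings lets a UI block generation on a renamed column while still allowing users to leave optional fields blank on purpose.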

On the backend, DataForge parses the Excel file into a pandas DataFrame, iterates through rows, fills the corresponding fields, and streams the output back to the user — either merged or separated.
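
The row loop described above can be sketched roughly as follows. This is a minimal illustration, not DataForge's actual code: it assumes the template is a fillable (AcroForm) PDF and uses pypdf's form-filling calls, while `cell_to_text`, `rows_to_field_values`, and `fill_template` are hypothetical names. Templates without form fields would need a different approach, such as drawing text with reportlab.

```python
import io
import math
from typing import Any, Iterable, Iterator, Mapping


def cell_to_text(value: Any) -> str:
    """Turn one spreadsheet cell into the text written into a PDF field."""
    if value is None:
        return ""
    if isinstance(value, float):
        if math.isnan(value):  # pandas reads empty Excel cells as NaN
            return ""
        if value.is_integer():  # 42.0 -> "42" for numeric columns that contain gaps
            return str(int(value))
    return str(value)


def rows_to_field_values(
    rows: Iterable[Mapping[str, Any]], mapping: Mapping[str, str]
) -> Iterator[dict[str, str]]:
    """Yield one {pdf_field: text} dict per row, using a {pdf_field: column} mapping."""
    for row in rows:
        yield {field: cell_to_text(row.get(column)) for field, column in mapping.items()}


def fill_template(template_path: str, values: Mapping[str, str]) -> bytes:
    """Fill one copy of a fillable PDF template and return it as bytes."""
    # Imported here so the pure helpers above work without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    writer.append(PdfReader(template_path))
    for page in writer.pages:
        # Only fields whose widgets sit on this page are updated.
        writer.update_page_form_field_values(page, dict(values), auto_regenerate=False)
    buffer = io.BytesIO()
    writer.write(buffer)
    return buffer.getvalue()
```

With pandas, `pd.read_excel("data.xlsx").to_dict(orient="records")` produces the list of row dicts that `rows_to_field_values` expects (reading `.xlsx` also requires openpyxl). For the separate-files option, the per-row bytes could be bundled into a ZIP with `zipfile`. Merged output needs more care, since every copy of the form repeats the same field names, which viewers may treat as a single field.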

Cloud Run handles the scaling, and with some smart caching, cold starts are almost invisible.

🧩 The Real Challenges

To be honest, I didn’t even know where to start — there were so many challenges.
The biggest question of all was hosting: where, why, and how much?

I actually started on an Amazon EC2 instance, but that quickly ended up costing around $7/month — not a lot, but too much for something idle most of the time. I debated outsourcing it, but scaling would’ve become a problem down the line.

Eventually, I Dockerized the whole thing and deployed it on Google Cloud Run. It scales to zero when unused, spins up to 10 containers when needed, and just feels clean. But that came after weeks of trying out Render, Railway, and a few others before finally landing on what worked. Now it costs a couple of pennies and fits within the free tier.

For the database, it was a no-brainer — I wanted a free tier and something lightweight. No Elastic or heavy managed services. Neon seemed like the obvious choice. It scales down to zero, and if we ever need more, it can scale up seamlessly (and then we’ll gladly pay).

Then came cold starts — the inevitable tradeoff when you choose “scale-to-zero” for cost reasons. Since my frontend is static and behind a CDN, it loads instantly. So as soon as the page loads, it pings the backend like an alarm clock — waking it up while the user is still looking at the homepage. By the time they log in, everything’s ready (most of the time 😅).
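
The real ping lives in the Next.js frontend. Purely to illustrate the same warm-up idea in Python (the language used for the other sketches here), this is a minimal retrying ping; the `/health` path and the retry parameters are assumptions, not DataForge's actual values.

```python
import time
import urllib.request


def wake_backend(url: str, attempts: int = 5, timeout: float = 10.0, backoff: float = 1.0) -> bool:
    """Ping `url` until it returns a 2xx response; True if it woke up, False otherwise."""
    for attempt in range(attempts):
        try:
            # A cold start can take seconds, so each attempt gets a generous timeout.
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if 200 <= response.status < 300:
                    return True
        except OSError:
            # URLError, connection refused, and timeouts are all OSError subclasses.
            pass
        if attempt < attempts - 1:
            time.sleep(backoff * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    return False
```

In the browser the result can simply be ignored: the point is only that the first request reaches Cloud Run and starts a container before the user finishes logging in, e.g. `wake_backend("https://api.example.com/health")` with a hypothetical URL.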

More cold starts... When I first began, I discovered my Docker registry was in one region and my Cloud Run service in another — which made the cold starts even colder. Lesson learned.
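
One way to catch this mistake before it costs a slow start is a deploy-time check that compares the image's registry location with the service region. This is a minimal sketch, assuming the images live in Artifact Registry, whose Docker hosts are named `LOCATION-docker.pkg.dev`; the function names and the example image paths are hypothetical.

```python
from typing import Optional

_AR_SUFFIX = "-docker.pkg.dev"


def registry_location(image: str) -> Optional[str]:
    """Return the Artifact Registry location encoded in an image reference, if any."""
    host = image.split("/", 1)[0]
    if host.endswith(_AR_SUFFIX):
        return host[: -len(_AR_SUFFIX)]
    return None  # other registries (Docker Hub, gcr.io, ...) would need their own rules


def check_colocated(image: str, service_region: str) -> None:
    """Raise if the image is stored somewhere other than the service's region.

    Multi-region locations such as "us" are treated as a mismatch by this simple check.
    """
    location = registry_location(image)
    if location is not None and location != service_region:
        raise ValueError(
            f"image is stored in {location!r} but the service runs in {service_region!r}; "
            "cross-region pulls can make cold starts slower"
        )
```

Running a check like this in a deploy script before the Cloud Run deploy step turns a silent latency problem into a loud failure.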

And then there was payment processing. Sadly, Stripe doesn’t support my region, so I had to go with a local provider. I cycled through a few before finding one that was good enough to actually work reliably.

I'd love to get some honest feedback (well, maybe not too honest). Thanks if you made it this far, and check it out: https://www.forgetp.com/
