raphiki for Technology at Worldline

Vibe Coding One Page at a Time

Building a Smart Magazine Archiver

I’m starting a new series called "Vibe Coding One Step at a Time." The goal? To document the raw, messy, and surprisingly efficient process of building software in the age of AI. We’re not here to write perfect specs or obsess over UML diagrams (well, not yet). We’re here to vibe with the code, iterating on pure intent until the machine does exactly what we want.

In this first edition, I’m sharing how I used the Gemini CLI to build a tool I actually needed, learning some pretty cool image processing tricks along the way.

What is "Vibe Coding"?

I’m going to claim this term right here: Vibe Coding.

It’s not "lazy coding." It’s intent-driven development. In the old days, if you wanted to build a script, you had to know the syntax, the libraries, and the edge cases before you even opened your editor. You had to think in code.

Vibe Coding flips that. You think in outcomes. You describe the behavior, the "vibe" of the feature, and the AI handles the implementation details. You act less like a bricklayer and more like a conductor. The feedback loop isn't "Write -> Compile -> Error," it's "Ask -> Observe -> Tweak."

The Use Case: "I Just Want to Read Offline"

Here’s the situation: I subscribe to a fantastic niche magazine (which shall remain nameless to protect the innocent). It’s great, but their "digital reader" is a nightmare. It’s one of those web-based page-turners that requires an active internet connection.

I wanted to read it on my tablet, offline, on a plane, without waiting for high-res JPEGs to buffer.

The Problem: There was no "Download PDF" button.
The Clue: Inspecting the network traffic revealed that the magazine was just serving a sequence of high-quality images, one URL per page.

The Mission: Write a script to fetch these pages and stitch them into a single, high-quality, searchable PDF.

The Process: Galloping Toward Complexity

We didn't sit down and architect a solution. We started small and let the script evolve.

Step 1: The Naive Loop

We started with a simple hypothesis: "The URLs probably just have a page number in them."
I asked Gemini to write a script using requests to hit the URL for page 1, then page 2, and so on.
Boom. It worked. We had a directory full of 100 separate JPGs.
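
For the curious, here's a minimal sketch of that first loop. The URL pattern and file names below are placeholders (the magazine stays nameless); the real pattern came straight from the network tab.

```python
import os
import requests

# Placeholder URL pattern -- the real one came from the magazine's network traffic
BASE_URL = "https://example-magazine.test/issue/page-{page}.jpg"

def download_pages(num_pages: int, out_dir: str = "pages") -> None:
    """Fetch each page image by bumping the page number in the URL."""
    os.makedirs(out_dir, exist_ok=True)
    for page in range(1, num_pages + 1):
        resp = requests.get(BASE_URL.format(page=page), timeout=30)
        resp.raise_for_status()
        with open(os.path.join(out_dir, f"page_{page:03d}.jpg"), "wb") as f:
            f.write(resp.content)

download_pages(100)
```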

Step 2: The Picture Book

Having 100 files is annoying. I wanted a book.
We asked Gemini to "glue these together." It pulled in the PIL (Pillow) library.
Result: A massive PDF. It looked great, but it was dumb. It was just a container of pictures. You couldn't highlight text, search for keywords, or copy-paste quotes.
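
Here's roughly what that "glue" step looks like, assuming the downloads landed in the pages/ folder from the previous sketch. Pillow can write a list of images straight into a single multi-page PDF.

```python
from pathlib import Path
from PIL import Image

# Load the downloaded pages in order and write them out as one multi-page PDF
page_files = sorted(Path("pages").glob("page_*.jpg"))
pages = [Image.open(p).convert("RGB") for p in page_files]

first, *rest = pages
first.save("magazine.pdf", save_all=True, append_images=rest)
```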

Step 3: The Search for Meaning (OCR)

This is where the "vibe" got technical. I realized a "picture book" wasn't enough. I needed Optical Character Recognition (OCR).
We decided to use Tesseract. But here’s the catch we discovered:

  • Human Eyes like soft colors and smooth anti-aliasing.
  • OCR Engines like harsh contrast, jagged edges, and black-and-white binary inputs.

If we optimized the images for the machine, the magazine looked ugly. If we kept them pretty, the machine couldn't read the text.

The Technical Deep Dive: The "PDF Sandwich"

This is where the magic happened. We ended up building a PDF Sandwich.

(Screenshot: me asking the Gemini CLI for a sandwich)

Instead of choosing between beauty and brains, we chose both.

  1. The Visual Layer: We keep the original high-res color JPEGs. This is what you see.
  2. The Data Layer: Behind the scenes, we create a "Frankenstein" version of the page—converted to grayscale, contrast cranked up to 2.0, and upscaled 2x using LANCZOS resampling (a fancy algorithm that keeps edges sharp).
  3. The Merge: We feed the Frankenstein images to Tesseract to generate an invisible text layer, then use pypdf to overlay that text exactly on top of the pretty images.
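
In Pillow terms, the "Frankenstein" prep behind the data layer is only a few lines. This is a sketch using the same settings (grayscale, 2.0 contrast, 2x LANCZOS); the helper name prep_for_ocr is just mine.

```python
from PIL import Image, ImageEnhance

def prep_for_ocr(path: str) -> Image.Image:
    """Make a page easier for Tesseract to read (and uglier for humans)."""
    img = Image.open(path).convert("L")             # grayscale
    img = ImageEnhance.Contrast(img).enhance(2.0)   # crank the contrast
    # Upscale 2x with LANCZOS so small magazine fonts keep sharp edges
    return img.resize((img.width * 2, img.height * 2), Image.Resampling.LANCZOS)
```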

The trickiest part? Math.
Because we upscaled the OCR images by 2x to help Tesseract read small fonts, the invisible text layer was twice as big as the visual page. We had to calculate scale factors to shrink the text back down so that when you highlight a sentence, the highlight actually lines up with the words.
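
Here's how the whole sandwich can come together. Tesseract (called through pytesseract) can emit a PDF containing only an invisible text layer when you pass its textonly_pdf option, and pypdf can scale that layer down and stack it onto the pretty page. The sketch below reuses the prep_for_ocr helper from above and computes the scale factors from the page sizes; it captures the idea of the script, not a verbatim copy of it.

```python
import io
from pathlib import Path

import pytesseract
from PIL import Image
from pypdf import PdfReader, PdfWriter, Transformation

def sandwich_page(visual_path: str, writer: PdfWriter) -> None:
    """Add one page: pretty image on top, invisible OCR text aligned behind it."""
    # Visual layer: the original high-res page as a one-page PDF
    buf = io.BytesIO()
    Image.open(visual_path).convert("RGB").save(buf, format="PDF")
    visual_page = PdfReader(buf).pages[0]

    # Data layer: Tesseract reads the "Frankenstein" image and returns text-only PDF bytes
    text_pdf = pytesseract.image_to_pdf_or_hocr(
        prep_for_ocr(visual_path), extension="pdf", config="-c textonly_pdf=1"
    )
    text_page = PdfReader(io.BytesIO(text_pdf)).pages[0]

    # The OCR page is roughly twice the size of the visual page,
    # so compute scale factors and shrink the text layer back down
    sx = float(visual_page.mediabox.width) / float(text_page.mediabox.width)
    sy = float(visual_page.mediabox.height) / float(text_page.mediabox.height)
    visual_page.merge_transformed_page(text_page, Transformation().scale(sx=sx, sy=sy))

    writer.add_page(visual_page)

# Sandwich every downloaded page into a single searchable PDF
writer = PdfWriter()
for path in sorted(Path("pages").glob("page_*.jpg")):
    sandwich_page(str(path), writer)
with open("magazine_searchable.pdf", "wb") as f:
    writer.write(f)
```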

What I Learned

Vibe coding this script taught me more in an hour than I’d usually learn in a weekend of reading docs:

  • Image Optimization: OCR is picky. Simply resizing an image isn't enough; the method of resizing (resampling filter) matters.
  • Library Specialization: PIL is for pixels; pypdf is for structure. Trying to do everything in one library is a trap.
  • The Power of the CLI: Using the Gemini CLI meant I didn't have to context-switch. I stayed in my terminal, describing what I wanted, and the code appeared.

(Demo: running the script on two pages)

Conclusion

We ended up with a ~100-line Python script that solves a genuine daily frustration. I didn't have to memorize the pypdf documentation or look up the Tesseract CLI flags. I just focused on the goal: "Make it searchable, make it pretty."

That’s Vibe Coding. You bring the vision, the AI brings the syntax, and together you build something cool.

We'll find out in the next episode whether this still holds for a more complex use case and a GUI.
