WizSebastian

How I Automated the Transcription of 91 Instagram Poem Screenshots Using Python + LLM Vision (Real Case Study)

✨ The Story Begins

There is always that one message from a friend that starts like this:

“Hey… I do not think I can make it today because something came up.”

My friend Mariana, a community manager for a local Instagram poet, had a problem.

Her client publishes several poems every night as images. These images have handwritten text, stylized filters, shadows, messy backgrounds and artistic layouts. After a few months of doing this, all the poems were saved inside a Highlight titled “Poetry Vault.”

It looks cute on Instagram.
It is a nightmare if you have to transcribe everything manually.

Mariana had 91 images waiting for her.
Ninety-one.

Her brain in that moment looked like this:

Cartoon of an overwhelmed brain surrounded by chaotic icons, representing mental overload

She told me she could not help me with something we had planned. She had to spend the entire day typing.

So I said the sentence every developer eventually says:

“We can automate this.”

🚀 Why Automate This?

This situation is the perfect reminder of something developers often forget.
• We have tools that can save humans hours or even days.
• LLMs can extract text from images with surprisingly high accuracy.
• A simple script can eliminate repetitive manual work.
• Some of the most meaningful automations start with helping a friend.

She thought she would lose an entire day.
Instead, she finished everything in minutes and still joined me later.

🧠 System Overview

Diagram using Mermaid

🛠️ The Python Script

This is a simplified version of the script I used.
The API key has been removed for safety.
The script scans a folder, sends each image to OpenAI's gpt-4o model through the Chat Completions API, reads back the extracted text, and saves it as one Markdown file per image.

import os
import base64
import requests
from tqdm import tqdm

OPENAI_API_KEY = "YOUR_API_KEY_HERE"

BASE_FOLDER = "/path/to/folder"
IMAGES_FOLDER = os.path.join(BASE_FOLDER, "images")
MD_FOLDER = os.path.join(BASE_FOLDER, "md_results")

os.makedirs(MD_FOLDER, exist_ok=True)

def encode_image(path):
    with open(path, "rb") as img:
        return base64.b64encode(img.read()).decode("utf-8")

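def extract_text(path):
    b64 = encode_image(path)
    # run() accepts both .jpg and .png, so match the MIME type to the file.
    mime = "image/png" if path.lower().endswith(".png") else "image/jpeg"
    payload = {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Extract the visible text from this image. Only return the text."},
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}
                ],
            }
        ],
        "max_tokens": 1000,
    }

    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
        json=payload,
        timeout=120,  # do not hang forever on a stalled connection
    )
    # Fail with a clear HTTP error instead of a KeyError on the line below.
    response.raise_for_status()

    return response.json()["choices"][0]["message"]["content"]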

def save_md(filename, text):
    # Swap whatever extension the image has (.jpg, .PNG, ...) for .md.
    md_name = os.path.splitext(filename)[0] + ".md"
    with open(os.path.join(MD_FOLDER, md_name), "w", encoding="utf-8") as f:
        f.write(f"# Extracted Text from {filename}\n\n{text}")

def run():
    images = [f for f in os.listdir(IMAGES_FOLDER) if f.lower().endswith((".jpg", ".png"))]

    for img in tqdm(images, desc="Processing images"):
        text = extract_text(os.path.join(IMAGES_FOLDER, img))
        save_md(img, text)

if __name__ == "__main__":
    run()

The complete code is here:

https://github.com/wizsebastian/gpt4o_batch_image_text_extractor
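
With 91 images, a run can die halfway through (network hiccup, quota, laptop lid). Here is a minimal sketch of a helper that lists only the images still missing a result, so a second run does not pay for the same images again. The name `pending_images` is hypothetical and not part of the repository above, and it assumes each result is saved as `<image name without extension>.md`.

```python
import os

# Same extensions the script's run() function accepts.
IMAGE_EXTENSIONS = (".jpg", ".png")


def pending_images(images_folder, md_folder):
    """Return image filenames that do not yet have a matching .md result."""
    done = set()
    if os.path.isdir(md_folder):
        # Collect the base names of results that already exist.
        done = {
            os.path.splitext(name)[0]
            for name in os.listdir(md_folder)
            if name.lower().endswith(".md")
        }
    return sorted(
        name
        for name in os.listdir(images_folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
        and os.path.splitext(name)[0] not in done
    )
```

In `run()`, the list of images could then come from `pending_images(IMAGES_FOLDER, MD_FOLDER)` instead of a plain `os.listdir` filter.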

📈 Results

What would have taken hours by hand was completed in minutes.

Mariana delivered all 91 transcriptions.
Her client was happy.
She saved her entire day.
We still went to our event.

She now thinks I am some kind of wizard.

This is how the outputs look. And yes, all the poems are in Spanish.

Illustration of a wizard using a laptop, symbolizing a developer automating work with code

🎯 What You Can Build From This

Once you understand this workflow, you can automate many tasks.
• Invoice OCR
• Extracting text from screenshots
• Digitizing handwritten notebooks
• Archiving social media text
• Creating searchable databases
• Processing large folders from clients
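
As one example of the "searchable database" idea, here is a rough sketch that loads a folder of Markdown results into SQLite using only the standard library. The table name `poems` and the functions `build_index` and `search` are assumptions made for illustration, not something from the original script.

```python
import os
import sqlite3


def build_index(md_folder, db_path):
    """Load every .md file in md_folder into a one-table SQLite database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS poems (filename TEXT PRIMARY KEY, body TEXT)"
    )
    for name in sorted(os.listdir(md_folder)):
        if not name.lower().endswith(".md"):
            continue
        with open(os.path.join(md_folder, name), encoding="utf-8") as f:
            # INSERT OR REPLACE keeps re-runs from failing on duplicate names.
            conn.execute(
                "INSERT OR REPLACE INTO poems VALUES (?, ?)", (name, f.read())
            )
    conn.commit()
    return conn


def search(conn, term):
    """Return filenames whose text contains term.

    LIKE ignores case for ASCII letters only, and % or _ inside term
    behave as wildcards.
    """
    rows = conn.execute(
        "SELECT filename FROM poems WHERE body LIKE ? ORDER BY filename",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

For example, `search(build_index("md_results", "poems.db"), "luna")` would list every transcription that mentions "luna". A plain `LIKE` scan is fine for a few hundred files; a larger archive would probably want a real full-text index.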

Automation does not have to be huge or complex.
Helping one person is already a meaningful win.

📚 Want a Part 2?

I can prepare a follow-up covering:
• Rate limit handling
• Retry logic
• Better extraction for low quality images
• Async batch processing for higher speed
• Exporting to PDF, CSV, JSON, TXT
• Building a small web interface
• Combining Vision LLM with Tesseract OCR
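
To give a taste of the retry topic, here is a minimal sketch of exponential backoff around any function that raises when a request fails. `call_with_retries` and its parameters are hypothetical names, and retrying on every exception is a deliberately blunt choice; a real version would likely look at the HTTP status code first.

```python
import time


def call_with_retries(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); if it raises, wait and try again with exponential backoff.

    Waits base_delay, then 2x, then 4x, and so on between attempts.
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Inside the processing loop this could be used as `text = call_with_retries(lambda: extract_text(path))`. The `sleep` parameter exists only so the waiting can be swapped out in tests.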

❤️ Final Thought

AI does not replace people. AI frees people.

Sometimes the best use of your skills is saving a friend from manually typing 91 poems.

Yes, in the end we were able to go out and have fun all together, thanks to AI.

Illustration of friends having fun together, representing the free time gained thanks to automation
