DEV Community

Dylan Parker

How I Automated Invoice Processing Instead of Copy-Pasting Data

A few months ago, I had to process dozens of invoices from different vendors.

The problem wasn't the volume—it was the formats.

Some invoices arrived as PDFs, others as Excel spreadsheets, and a few were exported as HTML tables. Getting everything into a single CSV file for analysis became a repetitive and error-prone task.

My first approach was manual:

Open invoice
Copy values
Paste into spreadsheet
Repeat

It worked, but it didn't scale.

So I started experimenting with automation. The workflow I ended up using looked something like this:

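from pathlib import Path
import pandas as pd

invoice_dir = Path("invoices")

# Collect one DataFrame per CSV invoice found in the folder.
all_data = []

for file in invoice_dir.iterdir():
    if file.suffix.lower() == ".csv":
        df = pd.read_csv(file)
        all_data.append(df)

# pd.concat raises ValueError on an empty list, so check first.
if all_data:
    combined = pd.concat(all_data, ignore_index=True)
    combined.to_csv("combined_invoices.csv", index=False)
    print("Done!")
else:
    print("No CSV invoices found.")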
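The script only picks up CSV files, while some of the invoices described earlier arrived as HTML tables. As a rough sketch (not the exact workflow from this post), here is one way to pull rows out of an HTML table using only Python's standard library. The names `TableRows` and `html_table_to_records` and the sample markup are made up for illustration, and the sketch assumes a single, non-nested table whose first row is the header.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that had no cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_records(html):
    """Treat the first row as the header and return one dict per later row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical invoice export, for illustration only.
sample = """
<table>
  <tr><th>vendor</th><th>amount</th></tr>
  <tr><td>Acme</td><td>120.50</td></tr>
  <tr><td>Globex</td><td>75.00</td></tr>
</table>
"""
records = html_table_to_records(sample)
print(records)
```

Each record is a plain dict, so a list of them can be passed to `pd.DataFrame(...)` and appended to `all_data` next to the CSV frames. pandas also has `pd.read_html`, which does the parsing for you if an HTML parser library such as lxml is installed.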

The merge script above is intentionally simple: it only combines files that are already in CSV form. Still, the lessons were valuable:

Eliminate repetitive data entry whenever possible.
Standardize input formats early.
Build small automation tools before buying large software solutions.
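To make "standardize input formats early" a bit more concrete, here is a minimal sketch of renaming vendor-specific column headers to one shared schema before anything gets merged. The header variants in `COLUMN_ALIASES`, the function name `standardize_record`, and the sample rows are all hypothetical; real vendors will need their own mapping.

```python
# Hypothetical vendor-specific header names mapped to one shared schema.
COLUMN_ALIASES = {
    "invoice no": "invoice_id",
    "invoice number": "invoice_id",
    "inv #": "invoice_id",
    "supplier": "vendor",
    "vendor name": "vendor",
    "total": "amount",
    "amount due": "amount",
}


def standardize_record(record):
    """Return a copy of record with known header variants renamed."""
    clean = {}
    for key, value in record.items():
        # Lowercase and collapse whitespace so "Invoice  No " matches "invoice no".
        normalized = " ".join(key.strip().lower().split())
        clean[COLUMN_ALIASES.get(normalized, normalized)] = value
    return clean


# Two rows as they might arrive from two different vendors.
rows = [
    {"Invoice No": "A-17", "Supplier": "Acme", "Total": "120.50"},
    {"invoice number": "B-03", "Vendor Name": "Globex", "Amount Due": "75.00"},
]
standardized = [standardize_record(r) for r in rows]
print(standardized)
```

Headers that are not in the mapping are kept (lowercased) rather than dropped, so nothing disappears silently. With pandas, the same idea can be applied to a whole frame through `df.rename(columns=...)`.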

What started as a frustrating administrative task turned into a workflow that now saves hours every month.

How are you handling invoice processing or document extraction in your projects? Do you use OCR, custom scripts, or third-party tools?

Top comments (2)

Carllowman

Nice script! I've been using a similar approach with Screaming Frog, but this API method looks way cleaner for ongoing monitoring. One thing I'd add is error handling for cases when the API times out on deep crawls—have you hit any rate limits with higher crawl depths?

Sophia

Nice script! I've been using Semrush for years, but their pricing is getting steep. Does SERPspur handle JavaScript-rendered content well, or is it more for static sites?