DEV Community

Julian Hayes

Automating Invoice PDF to CSV Conversion with Python

Managing invoices for freelance SEO projects gets repetitive fast, especially when you copy the data into spreadsheets by hand. I built a small Python script that converts invoice PDFs into CSV files for quick bookkeeping and reporting.

Here’s the script:

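import csv

import pdfplumber


def extract_invoice_data(pdf_path):
    """Return the 'key: value' fields found in the PDF's text as a dict."""
    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may yield None or '' for pages without extractable
        # text, so fall back to '' and keep a line break between pages
        text = '\n'.join(page.extract_text() or '' for page in pdf.pages)

    # Simple parsing logic for common invoice fields
    data = {}
    for line in text.split('\n'):
        if ':' in line:
            # Split on the first colon only, so values may contain colons
            key, value = line.split(':', 1)
            data[key.strip()] = value.strip()

    return data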

def save_to_csv(data, csv_path):
    """Write one invoice's fields as a header row plus a single data row."""
    with open(csv_path, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=list(data.keys()))
        writer.writeheader()
        writer.writerow(data)


# Example usage
invoice_data = extract_invoice_data('invoice.pdf')
save_to_csv(invoice_data, 'invoice.csv')

print('Conversion complete!')

What it does

- Extracts text from invoice PDFs using pdfplumber
- Detects simple key:value invoice fields
- Saves structured data into a CSV file
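
To make the key:value detection concrete, here is a minimal sketch that applies the same split-on-first-colon rule to a hard-coded string instead of a real PDF. The sample text and the parse_fields name are made up for illustration.

```python
def parse_fields(text):
    """Split each line on its first colon and collect key/value pairs."""
    data = {}
    for line in text.split("\n"):
        if ":" in line:
            key, value = line.split(":", 1)
            data[key.strip()] = value.strip()
    return data


# Hypothetical text, roughly what a simple invoice PDF might yield
sample_text = """Invoice Number: INV-001
Date: 2024-03-15
Client: Example Co
SEO audit and keyword research
Total: $450.00"""

fields = parse_fields(sample_text)
print(fields)
```

Lines without a colon, such as the line-item description in this sample, are skipped. That is why this approach only suits invoices that print each field as its own labelled line.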

Best for

- Freelancers
- SEO agencies
- Small businesses
- Quick invoice processing tasks

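This works well for text-based PDFs where each field sits on its own line in a label: value form. Invoices with tables, line items, or labels without colons usually need regex patterns, and scanned invoices with no text layer need an OCR tool first.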
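
As a rough sketch of the regex route, the snippet below pulls a few named fields out of text where the labels have no colon. The patterns, field names, and sample text are assumptions for illustration; real invoices will need patterns tuned to their own wording.

```python
import re

# Hypothetical patterns; adjust the labels and formats to your own invoices
PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Za-z0-9-]+)", re.IGNORECASE
    ),
    # \b stops "Total" from matching inside "Subtotal"
    "total": re.compile(
        r"\bTotal(?:\s+Due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
    "date": re.compile(r"\bDate\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
}


def extract_with_regex(text):
    """Return the first match for each pattern; unmatched fields are omitted."""
    data = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            data[field] = match.group(1)
    return data


# Made-up text with no colons after the labels
sample_text = "Invoice # INV-2024-007   Date 2024-03-15\nSubtotal $400.00\nTotal Due $450.00"

result = extract_with_regex(sample_text)
print(result)
```

Named patterns also give every invoice the same column names, which the colon-splitting approach cannot guarantee when labels vary between clients.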

If you’re processing large batches regularly, dedicated tools can save time, but for lightweight workflows this script has been pretty useful.
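
For a modest batch, one option is to loop over a folder and write a single CSV with one row per invoice. The sketch below is an assumption about how that could look, not part of the original script: convert_folder, save_many_to_csv, collect_fieldnames, and the source_file column are hypothetical names, and the extractor is passed in as a function so the loop does not depend on any particular parser.

```python
import csv
from pathlib import Path


def collect_fieldnames(rows):
    """Union of keys across all rows, in first-seen order."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    return fieldnames


def save_many_to_csv(rows, csv_path):
    """Write one CSV row per invoice; fields missing from a row stay blank."""
    with open(csv_path, "w", newline="") as csvfile:
        writer = csv.DictWriter(
            csvfile, fieldnames=collect_fieldnames(rows), restval=""
        )
        writer.writeheader()
        writer.writerows(rows)


def convert_folder(folder, csv_path, extract):
    """Run `extract` on every *.pdf in `folder` and write the results to one CSV."""
    rows = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        row = {"source_file": pdf_path.name}  # hypothetical bookkeeping column
        row.update(extract(pdf_path))
        rows.append(row)
    save_many_to_csv(rows, csv_path)
    return len(rows)
```

With the script above, a call might look like convert_folder('invoices', 'invoices.csv', extract_invoice_data), where 'invoices' is a placeholder folder name. Because different invoices can yield different keys, the header is built from the union of all fields rather than from the first invoice alone.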

What’s your preferred method for invoice data extraction?
