DEV Community

Zaylee

Python Script to Convert Invoice PDFs to CSV Automatically

I've been working on a small Python script to automate converting invoice PDFs to CSV for my freelance SEO projects. Sharing it here in case it helps others.

```python
import pdfplumber
import csv


def extract_invoice_data(pdf_path):
    """Return a dict of 'Key: value' fields found in the PDF's text."""
    with pdfplumber.open(pdf_path) as pdf:
        text = ""
        for page in pdf.pages:
            # extract_text() may return None or an empty string for pages
            # without a text layer, so fall back to "" and keep pages separated
            text += (page.extract_text() or "") + "\n"
    # Simple parsing logic for common invoice fields
    lines = text.split('\n')
    data = {}
    for line in lines:
        if ':' in line:
            key, value = line.split(':', 1)
            data[key.strip()] = value.strip()
    return data


def save_to_csv(data, csv_path):
    """Write one invoice's fields as a header row plus a single data row."""
    with open(csv_path, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=data.keys())
        writer.writeheader()
        writer.writerow(data)


# Example usage
invoice_data = extract_invoice_data('invoice.pdf')
save_to_csv(invoice_data, 'invoice.csv')
print('Conversion complete!')
```
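
The script above writes one invoice per CSV file. For a small batch, one option is to collect the dict from each invoice and write them all to a single file. The sketch below is only an illustration: `save_many_to_csv` is a hypothetical helper, not part of the original script, and it assumes each invoice has already been parsed into a dict. Because different invoices can expose different keys, it builds the header from every key it sees and leaves missing fields blank.

```python
import csv


def save_many_to_csv(rows, csv_path):
    """Write a list of per-invoice dicts to one CSV, one row per invoice."""
    # Collect every key seen across all invoices, in first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as csvfile:
        # restval fills in fields that a given invoice does not have.
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```

With the functions from the script, this could be called as `save_many_to_csv([extract_invoice_data(p) for p in pdf_paths], 'invoices.csv')`, where `pdf_paths` is whatever list of files you want to process.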

This handles basic invoices whose fields appear as "Key: value" lines. Lines without a colon, such as line-item table rows, are skipped, and a repeated key overwrites the earlier value. For more complex formats, you might need regex or a dedicated parser. If you're dealing with tons of invoices, tools like SERPSpur's converter can save time, but this script works for small batches. What's your go-to method for invoice data extraction?
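
As a rough idea of what the regex route could look like, here is a minimal sketch that works on the extracted text instead of relying on colons. The field names and patterns (`invoice_number`, `total`, `date`) are assumptions made up for illustration; real invoices will need patterns matched to their own layout.

```python
import re

# Hypothetical patterns for illustration; adjust them to your invoice layout.
PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Za-z0-9-]+)", re.IGNORECASE
    ),
    # \b keeps "Subtotal" from matching as "Total"
    "total": re.compile(
        r"\bTotal\b\s*:?\s*[$€£]?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
    # Only ISO-style dates (YYYY-MM-DD) are recognised in this sketch
    "date": re.compile(r"\bDate\b\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
}


def extract_fields(text):
    """Return one value per pattern, or an empty string if nothing matched."""
    data = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        data[name] = match.group(1) if match else ""
    return data
```

Because every field is always present in the result (empty when not found), the output has the same columns for every invoice, which keeps the CSV header stable.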
