DEV Community

Eleanor Brooks

How to Convert PDF, XLS, and HTML Invoices to CSV Using Python and SERPSpur API

Dealing with invoice data from different formats can be a pain. I've been using the SERPSpur Invoice to CSV Converter to handle PDF, XLS, and HTML invoices in bulk. Here's a Python wrapper I built around it:

```python
import io

import pandas as pd
import requests

API_KEY = "your_api_key_here"
# Placeholder: put the converter's endpoint URL here
API_URL = "YOUR_SERPSPUR_ENDPOINT"


def convert_invoices(file_paths):
    """Upload each invoice and return (path, csv_text) for every successful conversion."""
    results = []
    for path in file_paths:
        with open(path, "rb") as f:
            response = requests.post(
                API_URL,
                headers={"Authorization": f"Bearer {API_KEY}"},
                files={"file": f},
                params={"output_format": "csv"},
                timeout=60,
            )
        if response.status_code == 200:
            results.append((path, response.text))
    return results


# Example usage
csv_data = convert_invoices(["invoice1.pdf", "invoice2.xlsx"])
for path, csv_text in csv_data:
    print(f"{path} converted successfully")
    # Optional: parse the CSV text with pandas
    df = pd.read_csv(io.StringIO(csv_text))
    print(df.head())
```
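The wrapper above keeps only the files that come back with HTTP 200 and gives no reason for the rest. A more defensive variant could record every failure along with its cause, and also flag conversions whose CSV header lacks fields you depend on. This is a minimal sketch under a few assumptions: `API_URL` is a placeholder for the converter endpoint, the service is assumed to signal a rejected upload (such as a corrupted PDF) with a non-2xx status, and `REQUIRED_COLUMNS` uses hypothetical column names that you would replace with whatever headers the converter actually emits.

```python
import csv
import io
from pathlib import Path

import requests

API_KEY = "your_api_key_here"
# Placeholder: put the converter's endpoint URL here
API_URL = "YOUR_SERPSPUR_ENDPOINT"

# Hypothetical column names; replace with the headers the converter actually emits
REQUIRED_COLUMNS = {"invoice_number", "date", "total"}


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header row, sorted."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty response yields no header at all
    present = {name.strip().lower() for name in header}
    return sorted(required - present)


def convert_invoices_safely(file_paths):
    """Convert each file, returning (converted, failed) dicts keyed by path."""
    converted, failed = {}, {}
    for path in map(Path, file_paths):
        try:
            with path.open("rb") as f:
                response = requests.post(
                    API_URL,
                    headers={"Authorization": f"Bearer {API_KEY}"},
                    files={"file": f},
                    params={"output_format": "csv"},
                    timeout=60,
                )
            # Assumption: the service rejects bad uploads with a non-2xx status
            response.raise_for_status()
        except (OSError, requests.RequestException) as exc:
            # Unreadable file, network/URL error, or HTTP error status
            failed[str(path)] = str(exc)
            continue
        gaps = missing_columns(response.text)
        if gaps:
            failed[str(path)] = f"missing columns: {', '.join(gaps)}"
        else:
            converted[str(path)] = response.text
    return converted, failed
```

Calling `converted, failed = convert_invoices_safely(["invoice1.pdf", "invoice2.xlsx"])` gives you two dicts keyed by file path, so the `failed` entries can be logged or retried without rerunning the whole batch. The header check uses the standard-library `csv` module and reads only the first row, so it stays cheap even when pandas parsing is deferred.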

This has streamlined my accounting workflow significantly. What's your go-to method for processing invoice data? https://serpspur.com

Top comments

Sophia

Nice approach! I've been handling invoice parsing with a mix of OCR and regex, but the bulk conversion with this API is tempting. Do you find the HTML invoices preserve all the table structures accurately, or do you need to tweak the CSV output often?

Michel Jee

Great script! I've been manually exporting to CSV through various tools, but this bulk approach is much smarter. Do you handle error cases like corrupted PDFs or missing fields gracefully in your conversion pipeline?
