How to Convert PDF, XLS, and HTML Invoices to CSV Using Python and SERPSpur API

Dealing with invoice data from different formats can be a pain. I've been using the SERPSpur Invoice to CSV Converter to handle PDF, XLS, and HTML invoices in bulk. Here's a Python wrapper I built around it:

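```python
import io

import pandas as pd
import requests

API_KEY = "your_api_key_here"
# Placeholder only: the endpoint URL did not survive in this snippet.
# Replace it with the converter endpoint from your SERPSpur account docs.
API_URL = "<converter endpoint URL>"


def convert_invoices(file_paths):
    """Upload each invoice and collect the CSV text returned for it."""
    results = []
    for path in file_paths:
        with open(path, "rb") as f:
            response = requests.post(
                API_URL,
                headers={"Authorization": f"Bearer {API_KEY}"},
                files={"file": f},
                params={"output_format": "csv"},
            )
        # Only successful conversions are kept; failures are skipped silently.
        if response.status_code == 200:
            results.append(response.text)
    return results


# Example usage
csv_data = convert_invoices(["invoice1.pdf", "invoice2.xlsx"])
for i, csv_text in enumerate(csv_data):
    print(f"Invoice {i+1} converted successfully")
    # Optional: parse with pandas (StringIO lives in io, not pandas)
    df = pd.read_csv(io.StringIO(csv_text))
    print(df.head())
```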
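One thing the wrapper above glosses over is failures: it keeps only the 200 responses, so a corrupted PDF or a missing file either disappears from the results or crashes the loop. Here is a minimal sketch of how I might track that per file. The function name `convert_with_report` and the injected `post` callable are my own hypothetical names, not part of the SERPSpur API, and I'm assuming the API signals a failed conversion with a non-200 status code.

```python
def convert_with_report(file_paths, post):
    """Convert each file and return (converted, failed) dicts keyed by path.

    `post` is a hypothetical callable that takes an open binary file and
    returns an object with `status_code` and `text` attributes, for example
    a thin wrapper around `requests.post`. Injecting it lets the loop be
    tried without touching the network.
    """
    converted, failed = {}, {}
    for path in file_paths:
        try:
            with open(path, "rb") as f:
                response = post(f)
        except OSError as exc:
            # Missing or unreadable file: record it and move on.
            failed[path] = f"could not read file: {exc}"
            continue
        if response.status_code == 200:
            converted[path] = response.text
        else:
            # Assumption: any non-200 status means the conversion failed.
            failed[path] = f"HTTP {response.status_code}"
    return converted, failed
```

With `requests`, `post` could be a small lambda that calls `requests.post` with your endpoint, headers, and `files={"file": f}`. You would probably also want to catch `requests.RequestException` around that call so a timeout on one invoice doesn't stop the whole batch.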

This has streamlined my accounting workflow significantly. What's your go-to method for processing invoice data? https://serpspur.com

Top comments (3)

Sophia • Jun 12

Nice approach! I've been handling invoice parsing with a mix of OCR and regex, but the bulk conversion with this API is tempting. Do you find the HTML invoices preserve all the table structures accurately, or do you need to tweak the CSV output often?

Michel Jee • Jun 13

Great script! I've been manually exporting to CSV through various tools, but this bulk approach is much smarter. Do you handle error cases like corrupted PDFs or missing fields gracefully in your conversion pipeline?

Dylan Parker • Jun 13

Great script! I've been manually exporting to CSV through various tools, but this bulk approach is much smarter. Do you handle error cases like corrupted PDFs or missing fields gracefully in your conversion pipeline?