Your data pipeline gets CSV from one team, JSON from another, and XML from a legacy system. Here's how to handle all three without writing a single parser.
## The Conversion Matrix
Most data pipelines need some subset of these transformations:

- CSV → JSON (the most common: spreadsheet data into APIs)
- JSON → CSV (exports for Excel/Sheets users)
- XML → JSON (legacy systems, SOAP services)
- JSON → XML (enterprise integrations)
- CSV → XML (rare, but it exists)
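For a baseline before reaching for an API: the CSV → JSON case is the one the Python standard library covers on its own (`csv` plus `json`), no hand-rolled parser required. A minimal sketch (`csv_to_json` is an illustrative helper, not part of any library):

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert header-row CSV text into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)

csv_data = "name,age\nAlice,30\nBob,25"
print(csv_to_json(csv_data))
# [{"name": "Alice", "age": "30"}, {"name": "Bob", "25" is a string too}]
```

Note the catch: every value comes out as a string. Type inference (turning `"30"` into `30`) is exactly the kind of work a conversion service does for you.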
## API-First Approach

```python
import requests

# CSV to JSON
csv_data = """name,age,email
Alice,30,alice@example.com
Bob,25,bob@example.com"""

resp = requests.post("https://api.lazy-mac.com/data-transform/convert", json={
    "from": "csv",
    "to": "json",
    "data": csv_data,
    "options": {
        "header_row": True,
        "infer_types": True  # age becomes an integer, not a string
    }
})
result = resp.json()
# {"data": [{"name": "Alice", "age": 30, "email": "alice@example.com"}, ...]}
```
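The JSON → CSV direction from the matrix above isn't shown, but for flat records the standard library handles it locally too. A minimal sketch (`json_to_csv` is an illustrative helper, and it assumes every record has the same keys):

```python
import csv
import io
import json

def json_to_csv(json_text: str) -> str:
    """Flatten a JSON array of flat objects into header-row CSV."""
    records = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

data = '[{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]'
print(json_to_csv(data))
```

Nested objects are where this breaks down and where a conversion service (or a flattening step) earns its keep.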
## Handling XML from Legacy Systems

```python
xml_data = """<?xml version="1.0"?>
<users>
  <user id="1">
    <name>Alice</name>
    <email>alice@example.com</email>
  </user>
</users>"""

resp = requests.post("https://api.lazy-mac.com/data-transform/convert", json={
    "from": "xml",
    "to": "json",
    "data": xml_data,
    "options": {
        "flatten_attributes": True,    # the id attribute becomes a regular field
        "array_paths": ["users.user"]  # always treat as an array, even with one item
    }
})
result = resp.json()
# {"users": {"user": [{"id": "1", "name": "Alice", "email": "alice@example.com"}]}}
```
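The `array_paths` option guards against a classic XML-to-JSON pitfall: naive converters turn one `<user>` into an object but two into a list, so downstream code breaks on single-item responses. If you ever need to do that normalization yourself, here's a minimal sketch with the standard library's `xml.etree.ElementTree` (`users_to_list` is a hypothetical helper, not part of the API):

```python
import xml.etree.ElementTree as ET

def users_to_list(xml_text: str) -> list:
    """Parse <users><user id="..">...</user></users> into a list of dicts,
    always returning a list, even when there is only one <user>."""
    root = ET.fromstring(xml_text)
    out = []
    for user in root.findall("user"):
        record = dict(user.attrib)      # flatten attributes into regular fields
        for child in user:
            record[child.tag] = child.text
        out.append(record)
    return out

one = '<users><user id="1"><name>Alice</name></user></users>'
print(users_to_list(one))  # [{'id': '1', 'name': 'Alice'}]
```

Because `findall` always returns a list, one user and ten users come out with the same shape.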
## ETL Pipeline Example

```python
def etl_pipeline(source_url: str, target_format: str):
    # Fetch data (could be a CSV export from Salesforce, XML from an ERP, etc.)
    raw = requests.get(source_url).text
    source_format = detect_format(raw)  # returns "csv", "json", or "xml"

    # Transform
    converted = requests.post("https://api.lazy-mac.com/data-transform/convert", json={
        "from": source_format,
        "to": target_format,
        "data": raw
    }).json()
    return converted["data"]
```
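The pipeline above leans on a `detect_format` helper that the snippet doesn't define. A rough sniffing heuristic (an assumption for illustration, not a production-grade detector) might look like:

```python
import json

def detect_format(raw: str) -> str:
    """Best-effort format sniffing: a rough heuristic, not bulletproof."""
    text = raw.lstrip()
    if text.startswith("<"):
        return "xml"   # XML declarations and root elements start with "<"
    try:
        json.loads(text)
        return "json"  # parsed cleanly as JSON
    except json.JSONDecodeError:
        return "csv"   # fall through: assume delimited text

print(detect_format('{"a": 1}'))  # json
```

Real-world detection is messier (BOMs, JSON Lines, TSV), so treat the source format as explicit configuration whenever you can.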