DEV Community

agenthustler


How to Build a Shipping Rate Aggregator with Carrier API Scraping

Shipping costs can vary wildly between carriers. Building a rate aggregator that scrapes UPS, FedEx, USPS, and DHL pricing pages helps e-commerce sellers find the cheapest option instantly. In this guide, we'll build one with Python.

Why Aggregate Shipping Rates?

E-commerce businesses lose thousands of dollars annually by defaulting to a single carrier. A rate aggregator compares real-time prices across carriers for every shipment, factoring in weight, dimensions, and destination, and automatically selects the cheapest option.

Architecture Overview

Our aggregator has three components:

  1. Rate scrapers for each carrier's public rate calculator
  2. A normalization layer to standardize pricing formats
  3. A comparison API that returns the best rate
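Before writing any scrapers, it helps to pin down the normalized rate format that component 2 will emit. Here's a minimal sketch — the field names and `cheapest` helper are my own illustration, not part of any carrier API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rate:
    """One shipping rate, in the same shape regardless of carrier."""
    carrier: str               # e.g. "USPS"
    service: str               # e.g. "Priority Mail"
    price: float               # USD
    est_days: Optional[int] = None  # delivery estimate, if the carrier shows one

def cheapest(rates):
    """Return the lowest-priced Rate across all carriers."""
    return min(rates, key=lambda r: r.price)
```

Every scraper converges on this one shape, so the comparison layer never needs carrier-specific logic.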

Setting Up the Scraper

First, install dependencies:

```bash
pip install requests beautifulsoup4 pandas
```

We'll use ScraperAPI to handle JavaScript rendering and anti-bot protections on carrier websites:

```python
import requests
from bs4 import BeautifulSoup

SCRAPER_API_KEY = "YOUR_SCRAPERAPI_KEY"

def get_page(url):
    """Fetch page content through the ScraperAPI proxy."""
    params = {
        "api_key": SCRAPER_API_KEY,
        "url": url,
        "render": "true"  # render JavaScript before returning HTML
    }
    response = requests.get(
        "http://api.scraperapi.com",
        params=params,
        timeout=60
    )
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")

def scrape_usps_rates(weight_oz, origin_zip, dest_zip):
    """Scrape the USPS retail rate calculator."""
    url = (
        f"https://postcalc.usps.com/Calculator/"
        f"GetMailServices?oz={weight_oz}"
        f"&origin={origin_zip}&destination={dest_zip}"
    )
    soup = get_page(url)
    rates = []
    for row in soup.select(".mail-service-row"):
        service = row.select_one(".service-name").text.strip()
        price = row.select_one(".rate").text.strip()
        rates.append({
            "carrier": "USPS",
            "service": service,
            "price": float(price.replace("$", ""))
        })
    return rates
```

Building the Comparison Engine

Now let's normalize rates from multiple carriers and find the best deal:

```python
import pandas as pd
from concurrent.futures import ThreadPoolExecutor, as_completed

def get_all_rates(weight, origin, destination):
    """Fetch rates from all carriers concurrently."""
    scrapers = [
        ("USPS", scrape_usps_rates),
        ("UPS", scrape_ups_rates),
        ("FedEx", scrape_fedex_rates),
    ]

    all_rates = []
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = {
            executor.submit(fn, weight, origin, destination): name
            for name, fn in scrapers
        }
        # Process results as they finish instead of in submission order
        for future in as_completed(futures, timeout=30):
            try:
                all_rates.extend(future.result())
            except Exception as e:
                print(f"Error with {futures[future]}: {e}")

    df = pd.DataFrame(all_rates)
    if not df.empty:
        df = df.sort_values("price")
    return df

# Example usage: a 16 oz package from NYC to Beverly Hills
rates = get_all_rates(16, "10001", "90210")
print(rates.head(10))
```

Adding a REST API

Wrap it in a FastAPI endpoint for your e-commerce platform:

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/rates")
def compare_rates(weight: float, origin: str, dest: str):
    rates = get_all_rates(weight, origin, dest)
    return {
        "cheapest": rates.iloc[0].to_dict(),
        "all_rates": rates.to_dict(orient="records")
    }
```

Handling Anti-Bot Protections

Carrier websites use aggressive bot detection. Using a proxy service like ScraperAPI or ThorData handles IP rotation and CAPTCHA solving automatically, so your aggregator runs reliably.
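Even behind a proxy service, individual requests still fail occasionally, so it's worth wrapping each scrape in a retry. A minimal sketch with exponential backoff — the `fetch_with_retry` helper and its retry schedule are my own defaults, not a ScraperAPI recommendation:

```python
import time
import random

def fetch_with_retry(fetch, url, max_attempts=4, base_delay=2.0):
    """Call fetch(url), retrying failures with exponential backoff.

    `fetch` is whatever request function you use (e.g. a get_page
    helper). Delays double each attempt, with jitter so concurrent
    workers don't retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** attempt
                       + random.uniform(0, base_delay))
```

Backoff matters here: hammering a carrier site immediately after a block only makes the block stickier.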

Rate Caching Strategy

Cache rates for 15-30 minutes to reduce API calls:

```python
from functools import lru_cache
from datetime import datetime

@lru_cache(maxsize=1000)
def cached_rates(weight, origin, dest, cache_key):
    return get_all_rates(weight, origin, dest)

# The cache key changes every 15 minutes, so entries
# expire naturally as the key rotates
now = datetime.now()
cache_key = now.strftime("%Y%m%d%H") + str(now.minute // 15)
rates = cached_rates(16, "10001", "90210", cache_key)
```

Production Tips

  • Monitor rate accuracy by spot-checking against carrier websites weekly
  • Set up alerts for when scraping fails (carrier site redesigns)
  • Track savings per shipment to quantify the aggregator's ROI
  • Use ScrapeOps to monitor your scraper health and uptime
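The two most common failure modes after a carrier-site redesign are a scraper silently returning zero rates and selectors matching the wrong element, producing absurd prices. A simple sanity check catches both; the function name and thresholds below are my own illustration:

```python
def sanity_check(rates, expected_carriers=("USPS", "UPS", "FedEx"),
                 max_price=500.0):
    """Return a list of warnings for a scraped batch of rate dicts."""
    warnings = []
    seen = {r["carrier"] for r in rates}
    for carrier in expected_carriers:
        if carrier not in seen:
            warnings.append(f"no rates returned for {carrier}")
    for r in rates:
        if not (0 < r["price"] <= max_price):
            warnings.append(
                f"suspicious price {r['price']} for "
                f"{r['carrier']} {r['service']}"
            )
    return warnings
```

Run it on every batch and route non-empty results to whatever alerting you already have (Slack webhook, email, etc.).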

Conclusion

A shipping rate aggregator pays for itself quickly. Even a 10% savings on shipping across thousands of orders adds up to significant money. The Python scraping approach gives you full control over which carriers and services to compare.

Start with USPS (easiest to scrape), add UPS and FedEx, then expand to regional carriers for even better rates.
