API pagination broke differently on each endpoint
Started building a dashboard that pulls data from a client's API. Three endpoints, all paginated. Should be simple, right?
Nope.
First endpoint worked fine
Their /users endpoint used offset pagination. Standard stuff. Pass ?offset=0&limit=100, get 100 results, increment offset by 100, and repeat until you get fewer than 100 back.
import requests

def fetch_users(api_key):
    offset = 0
    limit = 100
    all_users = []
    while True:
        response = requests.get(
            "https://api.example.com/users",
            params={"offset": offset, "limit": limit},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30,
        )
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
        data = response.json()
        all_users.extend(data)
        # A short (or empty) page means there is nothing left to fetch
        if len(data) < limit:
            break
        offset += limit
    return all_users
Worked first try. Got 847 users. Moved on.
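To check that stop condition without hitting the real API, the loop can be pulled out into a helper that takes the page fetch as a function. This is a minimal sketch, not the code the dashboard runs: `paginate_offset`, `fake_users` and the 847-item in-memory list are hypothetical stand-ins for the /users endpoint.

```python
from typing import Callable, List

def paginate_offset(fetch_page: Callable[[int, int], List[dict]], limit: int = 100) -> List[dict]:
    """Collect every item from an offset/limit API.

    fetch_page(offset, limit) stands in for one HTTP request and is assumed
    to return a list of at most `limit` items.
    """
    items: List[dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items.extend(page)
        # A short (or empty) page means there is nothing after it
        if len(page) < limit:
            return items
        offset += limit


# Hypothetical in-memory stand-in for /users, so the loop can run offline
FAKE_USERS = [{"id": i} for i in range(847)]

def fake_users(offset: int, limit: int) -> List[dict]:
    return FAKE_USERS[offset:offset + limit]

users = paginate_offset(fake_users)
print(len(users))  # 847
```

When the total is an exact multiple of the limit, this loop spends one extra request on an empty page before it stops.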
Second endpoint used cursor tokens
Their /orders endpoint didn't use offsets. Used cursor tokens instead. You get a next_cursor in the response and pass it back in the next request.
import requests

def fetch_orders(api_key):
    cursor = None
    all_orders = []
    while True:
        params = {"limit": 100}
        # Only send a cursor after the first page
        if cursor:
            params["cursor"] = cursor
        response = requests.get(
            "https://api.example.com/orders",
            params=params,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30,
        )
        response.raise_for_status()
        data = response.json()
        all_orders.extend(data["orders"])
        # A missing or null next_cursor means this was the last page
        cursor = data.get("next_cursor")
        if not cursor:
            break
    return all_orders
Fine. Different pattern, but whatever. The documentation mentioned it, so I adjusted.
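Same idea for the cursor version. A minimal sketch, assuming every response has the `{"orders": [...], "next_cursor": ...}` shape the orders code reads; `paginate_cursor` and the three fake pages are hypothetical. The repeated-cursor check is an addition the original loop doesn't have, in case a server ever hands back a cursor it already gave out.

```python
from typing import Callable, Dict, List, Optional

def paginate_cursor(fetch_page: Callable[[Optional[str]], dict]) -> List[dict]:
    """Follow next_cursor until a response comes back without one."""
    items: List[dict] = []
    cursor: Optional[str] = None
    used_cursors = set()
    while True:
        page = fetch_page(cursor)
        items.extend(page["orders"])
        cursor = page.get("next_cursor")
        if not cursor:
            return items
        # Extra guard (not in the original loop): a repeated cursor would loop forever
        if cursor in used_cursors:
            raise RuntimeError(f"cursor {cursor!r} came back twice")
        used_cursors.add(cursor)


# Hypothetical three-page chain, keyed by the cursor that requests each page
FAKE_ORDER_PAGES: Dict[Optional[str], dict] = {
    None: {"orders": [{"id": 1}, {"id": 2}], "next_cursor": "c1"},
    "c1": {"orders": [{"id": 3}, {"id": 4}], "next_cursor": "c2"},
    "c2": {"orders": [{"id": 5}]},  # no next_cursor key: last page
}

orders = paginate_cursor(lambda cursor: FAKE_ORDER_PAGES[cursor])
print([o["id"] for o in orders])  # [1, 2, 3, 4, 5]
```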
Products endpoint was broken
Their /products endpoint looked like it used offset pagination. The documentation said it did.
Liar.
First 200 products came back fine with ?offset=0&limit=100 and ?offset=100&limit=100.
Then ?offset=200&limit=100 returned duplicates: the products from offsets 150 through 199 again, plus 50 new ones. Made zero sense.
Thought it was caching. Waited 10 minutes. Same duplicates.
Tried cursor tokens like the orders endpoint. No next_cursor field.
Tried page numbers instead of offsets. Got different products, but with duplicates scattered throughout.
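A quick way to see exactly where pages start overlapping is to record which offset each ID first showed up at. A sketch, assuming each product has an `"id"` key; `find_overlaps` is a hypothetical helper, and the page contents are made-up IDs arranged to mimic the offset-200 symptom, not real API data.

```python
from typing import Dict, List, Tuple

def find_overlaps(pages: Dict[int, List[dict]]) -> Dict[int, List[Tuple[object, int]]]:
    """For each requested offset, list (id, first_offset) for every item
    whose ID was already seen earlier in the scan (lower offsets first)."""
    first_seen: Dict[object, int] = {}
    overlaps: Dict[int, List[Tuple[object, int]]] = {}
    for offset in sorted(pages):
        repeats = []
        for item in pages[offset]:
            pid = item["id"]
            if pid in first_seen:
                repeats.append((pid, first_seen[pid]))
            else:
                first_seen[pid] = offset
        if repeats:
            overlaps[offset] = repeats
    return overlaps


# Made-up IDs arranged like the symptom above: the offset-200 page
# repeats the last 50 items of the offset-100 page, then adds 50 new ones
pages = {
    0: [{"id": i} for i in range(0, 100)],
    100: [{"id": i} for i in range(100, 200)],
    200: [{"id": i} for i in range(150, 250)],
}
report = find_overlaps(pages)
for offset, repeats in report.items():
    origins = sorted({first for _, first in repeats})
    print(f"offset {offset}: {len(repeats)} repeats, first seen at offsets {origins}")
# offset 200: 50 repeats, first seen at offsets [100]
```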
Emailed support.
"Use offset pagination it works fine."
Cool thanks.
What I ended up doing
Tracked product IDs myself and filtered duplicates:
import requests

def fetch_products(api_key):
    offset = 0
    limit = 100
    seen_ids = set()
    all_products = []
    empty_responses = 0
    while empty_responses < 3:  # Stop after 3 consecutive empty/duplicate batches
        response = requests.get(
            "https://api.example.com/products",
            params={"offset": offset, "limit": limit},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30,
        )
        response.raise_for_status()
        data = response.json()
        new_products = []
        for p in data:
            # Check one at a time so a duplicate inside the same batch is dropped too
            if p["id"] not in seen_ids:
                seen_ids.add(p["id"])
                new_products.append(p)
        if not new_products:
            empty_responses += 1
        else:
            empty_responses = 0
            all_products.extend(new_products)
        offset += limit
    return all_products
Got 1,243 unique products. Filtered out around 180 duplicates.
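The same dedupe-and-stop logic, with the HTTP call swapped for a function so it can run offline. A minimal sketch: `paginate_dedup`, `buggy_products` and the 400-item catalog are hypothetical, and the fake endpoint is just one made-up way to reproduce overlapping pages, not a claim about what the client's API actually does.

```python
from typing import Callable, List

def paginate_dedup(
    fetch_page: Callable[[int, int], List[dict]],
    limit: int = 100,
    max_stale: int = 3,
) -> List[dict]:
    """Offset pagination that drops repeated IDs and stops after
    `max_stale` consecutive pages that add nothing new."""
    seen_ids = set()
    items: List[dict] = []
    stale = 0
    offset = 0
    while stale < max_stale:
        page = fetch_page(offset, limit)
        added = 0
        for item in page:
            # Checked per item, so duplicates inside one page are dropped too
            if item["id"] not in seen_ids:
                seen_ids.add(item["id"])
                items.append(item)
                added += 1
        # Reset the counter whenever a page contributes anything new
        stale = stale + 1 if added == 0 else 0
        offset += limit
    return items


# Hypothetical 400-item catalog behind a fake endpoint that, from offset 200 on,
# starts its window 50 items early, so the offset-200 page repeats 50 items
CATALOG = [{"id": i} for i in range(400)]

def buggy_products(offset: int, limit: int) -> List[dict]:
    start = offset if offset < 200 else offset - 50
    return CATALOG[start:start + limit]

products = paginate_dedup(buggy_products)
print(len(products))  # 400
```

The `max_stale` rule is a trade-off: an endpoint that returned three all-duplicate pages in a row in the middle of the catalog would end the loop early, so it's worth comparing the final count against a total from the API, if it exposes one.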
Still don't know why their pagination breaks. The dashboard works now, though, so I stopped asking.