I've been working on a project that pulls environmental data from federal agencies — EPA, FEMA, USGS, CDC, Census, and about 45 others.
Some things I ran into that might save you time:
Federal APIs are wild
No two agencies use the same format. EPA gives you XML. USGS gives you tab-separated files that look unchanged since the '90s. FEMA has a decent REST API, but it's paginated at 1,000 records per request. Census has its own query language.
I ended up writing a separate parser for each source. No universal adapter worked.
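The pattern that worked for me was a registry of small per-source parsers behind one dispatch function, each normalizing its format into plain record objects. A minimal sketch — the source names, record shapes, and the `DisasterDeclarationsSummaries` key are illustrative assumptions, not my actual code:

```javascript
// Each parser takes a raw response body and returns an array of plain records.
const parsers = {
  // USGS-style tab-separated text: one header row, then data rows.
  usgs_tsv(raw) {
    const [header, ...rows] = raw.trim().split("\n").map((l) => l.split("\t"));
    return rows.map((r) => Object.fromEntries(header.map((h, i) => [h, r[i]])));
  },
  // FEMA-style JSON: records nested under a top-level dataset key
  // (key name here is an assumption for illustration).
  fema_json(raw) {
    return JSON.parse(raw).DisasterDeclarationsSummaries ?? [];
  },
};

function parse(source, raw) {
  const parser = parsers[source];
  if (!parser) throw new Error(`no parser registered for ${source}`);
  return parser(raw);
}
```

The win isn't the parsers themselves — it's that everything downstream only ever sees one record shape.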
Streaming EJS at scale
I needed to render ~280K static HTML pages from templates. The first approach (one EJS render per file, sequential) took 14 hours. Switched to streaming writes with a worker pool — got it down to ~5 minutes. The bottleneck was always disk I/O, not rendering.
Cloudflare R2 is underrated for static sites
Hosting 280K HTML files on traditional hosting is painful. R2 + Workers turned out to be perfect — no file count limits, edge-cached globally, and the free tier covers a lot.
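The Worker side is tiny. A minimal sketch, assuming an R2 binding named `SITE_BUCKET` configured in `wrangler.toml` — the key-mapping logic here is illustrative, not my actual code:

```javascript
// Serve static HTML from an R2 bucket. In a real Worker this object
// would be the module's default export.
const worker = {
  async fetch(request, env) {
    const url = new URL(request.url);
    // Map "/" and trailing-slash paths to index documents.
    let key = url.pathname.replace(/^\//, "");
    if (key === "" || key.endsWith("/")) key += "index.html";
    const object = await env.SITE_BUCKET.get(key);
    if (object === null) return new Response("Not found", { status: 404 });
    return new Response(object.body, {
      headers: { "content-type": "text/html; charset=utf-8" },
    });
  },
};
```

With edge caching in front, most requests never even hit the bucket.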
Rate limits will find you
Every federal API has different rate limits, and most don't document them well. EPA ECHO silently returns empty results after ~300 requests/minute. USGS returns 503s. I ended up building a generic retry queue with exponential backoff that handles all of them.
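The core of that retry queue is just backoff with jitter plus a pluggable "should I retry this?" check, since "failure" means a 503 for one agency and a silently empty payload for another. A stripped-down sketch (parameter names and defaults are my own assumptions, not any agency's documented limits):

```javascript
// Retry a request with exponential backoff and full jitter.
// `shouldRetry` lets callers treat silently-empty responses as failures too.
async function fetchWithBackoff(doFetch, {
  retries = 5,
  baseDelayMs = 500,
  shouldRetry = (res) => res.status >= 500,
} = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch();
    if (!shouldRetry(res) || attempt >= retries) return res;
    // Full jitter: sleep a random time in [0, base * 2^attempt).
    const delay = Math.random() * baseDelayMs * 2 ** attempt;
    await new Promise((r) => setTimeout(r, delay));
  }
}
```

For the EPA-style silent failures, the caller passes a `shouldRetry` that also checks for an empty result array, which is what makes one queue work across all the sources.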
Would love to hear from others who've worked with government data APIs. What's the worst format you've had to parse?