DEV Community

Sam Sullivan
Sam Sullivan

Posted on

Optimizing a 40-Year-Old Government Database: From 3s to 30ms

Corgi Logo

The National Highway Traffic Safety Administration maintains the most comprehensive vehicle database in existence. Every car sold in America since 1981, complete technical specifications, all publicly available.

There's just one problem: their API is painfully slow.

The Problem
At Cardog, we process millions of vehicle listings daily. Each one needs VIN decoding—make, model, year, engine specs, manufacturing details. The NHTSA's official API? 3+ second response times on average.

When you're handling millions of lookups, this isn't just inconvenient. It's impossible.

Inside the 40-Year-Old Database
We downloaded the VPIC (Vehicle Product Information Catalog) database to understand the bottleneck. What we found was fascinating: a 1.5GB SQL database that hasn't been meaningfully optimized since the 1990s.
The structure tells the story:

Textbook third normal form normalization (appropriate for 1995, overkill for 2025)

Separate tables for makes, models, body styles, engine types, manufacturing plants, fuel systems

Each VIN lookup requires 10-20 table joins
Millions of rows for edge cases from decades past

This wasn't incompetence—the NHTSA correctly prioritized data integrity over performance. They're regulators, not application developers. But it meant this incredible public dataset was performing far worse than it should.

Standard Optimization, Dramatic Results
We applied basic database cleanup:

Stripped regulatory metadata applications don't need
Removed tables for obsolete vehicle types
Applied modern SQLite optimization (proper indexes, query planning, compression)

No machine learning. No revolutionary algorithms. Just maintenance that should have been routine for 40 years.
The numbers:

Database size: 1.5GB → 21MB compressed
Response time: 3200ms → 30ms
Network dependency: Required → Optional
Rate limits: ~10/second → Unlimited

Why This Matters
Every automotive application needs VIN decoding:

  • Car marketplaces analyzing inventory
  • Insurance companies assessing risk
  • Lenders underwriting auto loans
  • Parts suppliers matching components
  • Fleet management tracking vehicles
  • Mobile apps providing vehicle info

The government provided the data. We just made it usable.
Making It Open Source
We built this for Cardog's platform, but the problem isn't unique to us. Rather than keep a competitive advantage, we open-sourced it.
Why? Because the government already did the hard work—40 years of comprehensive data collection. Making it performant should be table stakes for the industry.

Try it yourself:
npm install @cardog/corgi

The library auto-updates monthly with fresh NHTSA data, works universally (Node.js, browsers, edge workers), and requires zero configuration.

The Real Lesson
This isn't a story about brilliant engineering. It's about what happens when critical infrastructure goes unmaintained.

The NHTSA VPIC database is an incredible public resource—detailed specifications for every vehicle sold in America for four decades. But packaged for government compliance systems, it performs far worse than necessary for everyday applications.

Sometimes the best optimization is just deleting what you don't need and making what remains fast.

The government built the foundation. The rest of us just need to maintain it.

Top comments (0)