Building a fast H1B search engine with DuckDB
Hey devs,
I'm excited to share a project I've been working on recently - a fast and clean H1B search engine called OpenH1B.org. As someone who's dealt with slow government databases and ad-heavy sites, I wanted to create a better alternative for finding honest H1B data.
To make this project a reality, I used a combination of Python and DuckDB, an in-process columnar database built for fast analytical queries over large datasets. Here's a high-level overview of the tech stack and architecture:
- Data Ingestion: I used Python scripts to process over 546,000 certified H1B records.
- Data Storage: I used DuckDB and Parquet for fast analytical processing, with SQLite for transactional metadata.
- Performance: Optimized for 200+ QPS on a 4-core ARM VPS.
- Frontend: I used a minimal HTML/CSS/JS frontend to keep it lightweight and fast.
Some key features of the project include:
- Wage Levels by ZIP: Users can search by ZIP code to find the official DOL Wage Levels (I-IV) for their specific role.
- Real-time Market Pulse: The homepage features a real-time feed of the latest 100 certifications, providing insight into current market trends.
- Dedicated Rankings: I built dedicated rankings for Top Hospitals and Law Firms, where median salaries often land in the $200k-$500k+ range.
In terms of performance, I'm proud to say that the site loads in under a second, even on mobile devices. I achieved this by optimizing the database queries, using caching, and minimizing the number of requests to the server.
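The caching tactic mentioned above can be as simple as an in-process TTL cache in front of hot, read-only lookups. This is a minimal sketch with hypothetical names (the post doesn't show the site's actual caching layer), not the production implementation:

```python
import time

_cache: dict = {}

def cached_query(key, run_query, ttl=60.0):
    """Return a cached result if younger than ttl seconds,
    otherwise run the query and cache the fresh result."""
    now = time.monotonic()
    hit = _cache.get(key)
    if hit is not None and now - hit[0] < ttl:
        return hit[1]  # cache hit: skip the database entirely
    result = run_query()
    _cache[key] = (now, result)
    return result

# Demo: the underlying query runs only once for repeated lookups.
calls = 0
def expensive_lookup():
    global calls
    calls += 1
    return [("94105", "II", 145000)]  # stand-in for a DB result

cached_query("zip:94105", expensive_lookup)
cached_query("zip:94105", expensive_lookup)  # served from cache
print(calls)  # 1
```

For a mostly-static dataset like certified H1B records, even a short TTL absorbs the vast majority of repeated homepage and ZIP queries.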
I'm still fine-tuning the search functionality, so I'd love to hear your feedback. Does the ZIP code lookup accurately match your area? Check out the site at https://openh1b.org and let me know what you think!