I've been using a Korean hot deal aggregator called Hotgusa for a while now, and it got me thinking about the technical challenges behind building something like this.
The problem it solves
Korean shoppers check multiple community forums (Ppomppu, Clien, Ruliweb, FM Korea) for deals. The same product often shows up at different prices across platforms, and checking every forum by hand is a time sink.
Technical challenges I imagine they faced
1. Multi-source crawling
Each community has its own HTML structure. Some are SPAs that need a headless browser such as Puppeteer or Playwright to render.
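One way I could picture handling the per-site differences is a small registry that maps each source to its own parsing function. This is only a sketch: the source names (`site_a`, `site_b`) and the CSS class names are made up for illustration, and the real forums' markup will look different. It uses only the standard library's `html.parser`.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Deal:
    source: str
    title: str
    url: str


class LinkCollector(HTMLParser):
    """Collects (href, text) for <a> tags that carry a given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._href = attr.get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a matching <a> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def make_parser(source: str, css_class: str) -> Callable[[str], List[Deal]]:
    """Builds a parser for one source; each site gets its own selector."""

    def parse(html: str) -> List[Deal]:
        collector = LinkCollector(css_class)
        collector.feed(html)
        return [Deal(source, text, href) for href, text in collector.links]

    return parse


# Hypothetical sources and class names, not taken from any real site.
PARSERS: Dict[str, Callable[[str], List[Deal]]] = {
    "site_a": make_parser("site_a", "deal-title"),
    "site_b": make_parser("site_b", "post-link"),
}


def parse_page(source: str, html: str) -> List[Deal]:
    return PARSERS[source](html)
```

When a site changes its markup, only that site's entry in `PARSERS` needs to change. Pages rendered client-side would still need a headless browser to produce the HTML before it reaches `parse_page`.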
2. Real-time updates
Hot deals sell out in minutes, so the crawl interval needs to be short, but not so short that the crawler gets rate-limited or blocked.
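A common way to balance those two goals is a short base interval with random jitter, plus exponential backoff whenever the site starts returning errors. The sketch below only computes the wait time. The function name, the 30-second base, and the 10-minute cap are assumptions of mine, not anything Hotgusa has published.

```python
import random


def next_delay(
    base: float,
    consecutive_errors: int,
    max_delay: float = 600.0,
    jitter: float = 0.2,
    rng=random,
) -> float:
    """Seconds to wait before the next request to one source.

    The delay doubles for every consecutive error (for example an
    HTTP 429 or 403), is capped at max_delay, and then gets +/- jitter
    applied so requests do not land on a fixed schedule.
    """
    delay = min(base * (2 ** consecutive_errors), max_delay)
    spread = delay * jitter
    return delay + rng.uniform(-spread, spread)


# Example: a healthy source polled roughly every 30 seconds,
# and the same source after three failed requests in a row.
healthy_wait = next_delay(30.0, consecutive_errors=0)
backed_off_wait = next_delay(30.0, consecutive_errors=3)
```

The crawler would reset `consecutive_errors` to zero after a successful response, so a source that recovers goes back to the short interval.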
3. Deduplication
The same deal gets posted across multiple communities, so it needs similarity matching on title, price, and URL.
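As a rough sketch of what that matching could look like: normalize the URL and title first, treat identical URLs as a match, and otherwise require the same price plus a fuzzy title match. The tracking parameter list, the 0.8 threshold, and the dict shape of a deal are all assumptions for illustration. `difflib.SequenceMatcher` stands in for whatever similarity measure a real system would use.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of query parameters that do not change the target page.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}


def normalize_url(url: str) -> str:
    """Lowercases the host, drops tracking parameters and the fragment."""
    parts = urlsplit(url)
    query = [
        (k, v) for k, v in parse_qsl(parts.query)
        if k.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/"),
        urlencode(sorted(query)),
        "",
    ))


def normalize_title(title: str) -> str:
    """Drops bracketed tags like [ShopA], punctuation, and extra spaces."""
    title = re.sub(r"[\[\(].*?[\]\)]", " ", title)
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def is_duplicate(a: dict, b: dict, threshold: float = 0.8) -> bool:
    # Same product page after normalization: treat as the same deal.
    if normalize_url(a["url"]) == normalize_url(b["url"]):
        return True
    # Different pages at different prices are kept as separate deals.
    if a["price"] != b["price"]:
        return False
    ratio = SequenceMatcher(
        None, normalize_title(a["title"]), normalize_title(b["title"])
    ).ratio()
    return ratio >= threshold
```

Comparing every new post against every stored post is quadratic, so in practice this check would probably run only against deals from the last few hours.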
4. Ranking algorithm
They seem to use views, upvotes, and recency to surface the best deals first.
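I don't know their actual formula, but a time-decayed score in the style of Hacker News-type rankings is one plausible way to combine those three signals. In this sketch the weights, the `gravity` exponent, and the log damping are all guesses of mine.

```python
import math
from datetime import datetime, timedelta, timezone


def hot_score(
    views: int,
    upvotes: int,
    posted_at: datetime,
    now: datetime,
    gravity: float = 1.5,
    upvote_weight: float = 10.0,
) -> float:
    """Engagement divided by a power of the post's age in hours.

    log1p damps large counts so one viral post does not dominate,
    and the +2 keeps brand-new posts from dividing by almost zero.
    """
    age_hours = max((now - posted_at).total_seconds() / 3600.0, 0.0)
    engagement = math.log1p(views) + upvote_weight * math.log1p(upvotes)
    return engagement / (age_hours + 2.0) ** gravity


# Example: rank a small batch of deals, best first.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
deals = [
    {"title": "older deal", "views": 5000, "upvotes": 40,
     "posted_at": now - timedelta(hours=10)},
    {"title": "newer deal", "views": 800, "upvotes": 15,
     "posted_at": now - timedelta(minutes=30)},
]
ranked = sorted(
    deals,
    key=lambda d: hot_score(d["views"], d["upvotes"], d["posted_at"], now),
    reverse=True,
)
```

A higher `gravity` makes old deals drop off faster, which seems like the right knob to tune for deals that sell out in minutes.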
5. Push notifications
Keyword-based alerts via FCM for their mobile app. Users set keywords and get notified instantly.
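The matching side of this could be as simple as a map from keyword to subscribed users, checked against each new deal title. The sketch below covers only that matching step and leaves out the actual FCM send. The class and method names are hypothetical. It uses substring matching rather than word splitting, on the assumption that this behaves better for Korean titles where particles attach to words.

```python
from collections import defaultdict
from typing import Dict, Set


class KeywordAlerts:
    """Maps keywords to the users who want to hear about them."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, user_id: str, keyword: str) -> None:
        key = keyword.casefold().strip()
        if key:  # an empty keyword would match every title
            self._subscribers[key].add(user_id)

    def match(self, title: str) -> Set[str]:
        """Returns the users whose keywords appear in the title."""
        haystack = title.casefold()
        users: Set[str] = set()
        for keyword, ids in self._subscribers.items():
            if keyword in haystack:
                users |= ids
        return users


alerts = KeywordAlerts()
alerts.subscribe("user-1", "SSD")
alerts.subscribe("user-2", "키보드")
to_notify = alerts.match("NVMe ssd 2TB 특가")
```

Each user ID in `to_notify` would then be handed to whatever push service the app uses. Scanning every keyword per title is fine at small scale; with many keywords, something like an Aho-Corasick automaton would be the usual upgrade.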
Thoughts
Building an aggregator like this seems straightforward at first, but the edge cases are brutal. Rate limiting, changing HTML structures, duplicate detection...
If anyone's built something similar, I'd love to hear about your architecture choices. What worked? What didn't?