DEV Community

핫구사

Building a deal aggregator: Lessons from crawling Korean e-commerce sites

I've been using a Korean hot deal aggregator called Hotgusa for a while now, and it got me thinking about the technical challenges behind building something like this.

The problem it solves

Korean shoppers check multiple community forums (Ppomppu, Clien, Ruliweb, FM Korea) for deals. The same product often shows up at different prices on different platforms, so checking every forum by hand is a time sink.

Technical challenges I imagine they faced

1. Multi-source crawling

Each community has its own HTML structure, so every source needs its own parser. Sites that render client-side (SPAs) also need a headless browser such as Puppeteer or Playwright.
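To make the "one parser per source" idea concrete, here's a minimal sketch using only Python's standard library. The source names (`site_a`, `site_b`) and CSS class names are made up for illustration. I don't know what markup the real forums use or how Hotgusa actually parses them.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Deal:
    source: str
    title: str
    url: str


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for <a> tags that carry a given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._href = attr.get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only buffer text while inside a matching <a> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def make_parser(source: str, css_class: str) -> Callable[[str], List[Deal]]:
    """Builds a parser for one source; only the selector differs per site."""
    def parse(html: str) -> List[Deal]:
        collector = LinkCollector(css_class)
        collector.feed(html)
        return [Deal(source, text, href) for href, text in collector.links]
    return parse


# Hypothetical selectors: each real site would need its own.
PARSERS: Dict[str, Callable[[str], List[Deal]]] = {
    "site_a": make_parser("site_a", "deal-title"),
    "site_b": make_parser("site_b", "list_subject"),
}
```

When a site changes its markup, only that site's entry in `PARSERS` has to change. A source that needs a headless browser could plug in at the fetch step and hand the rendered HTML to the same parser.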

2. Real-time updates

Hot deals sell out in minutes. The crawl interval has to be short enough to catch deals while they're still live, but polite enough that the source sites don't rate-limit or block the crawler.
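One common way to balance "fast" and "polite" is a base interval with random jitter, plus exponential backoff whenever the source pushes back. The sketch below is my guess at the shape of it, not how Hotgusa does it. The 30-second base, the 10-minute cap, and the `fetch` callback (which returns `True` on success) are all assumptions.

```python
import random
import time
from typing import Callable


def next_delay(base: float, failures: int, max_delay: float = 600.0,
               jitter: float = 0.2) -> float:
    """Seconds to wait before the next request to one source.

    The delay doubles per consecutive failure (capped at max_delay) and is
    randomized by +/- jitter so requests don't arrive on a fixed beat.
    """
    backoff = min(base * (2 ** failures), max_delay)
    spread = backoff * jitter
    return backoff + random.uniform(-spread, spread)


def poll(fetch: Callable[[], bool], base: float = 30.0, rounds: int = 3,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Calls fetch() repeatedly, slowing down while it keeps failing."""
    failures = 0
    for _ in range(rounds):
        ok = fetch()  # hypothetical: False on a 429/503 or a timeout
        failures = 0 if ok else failures + 1
        sleep(next_delay(base, failures))
```

The `sleep` parameter is injectable so the loop can be tested without actually waiting.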

3. Deduplication

The same deal often gets posted in several communities, so posts need to be matched on title, price, and URL and merged into one entry.
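Here's roughly how I'd approach the matching with the standard library: canonicalize the URL first, and fall back to fuzzy title comparison when the price is the same. The tracking-parameter list, the 0.8 threshold, and the dict shape of a deal are assumptions on my part.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of query parameters that don't identify the product.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}


def canonical_url(url: str) -> str:
    """Drops tracking parameters, fragments, and trailing slashes."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path.rstrip("/"), urlencode(sorted(query)), ""))


def normalize_title(title: str) -> str:
    """Lowercases, strips [bracket tags] and punctuation, collapses spaces."""
    title = re.sub(r"\[[^\]]*\]", " ", title.lower())
    return " ".join(re.sub(r"[^\w\s]", " ", title).split())


def is_duplicate(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """True if two deals (dicts with title, price, url) look like one deal."""
    if canonical_url(a["url"]) == canonical_url(b["url"]):
        return True
    if a["price"] != b["price"]:
        return False
    ratio = SequenceMatcher(None, normalize_title(a["title"]),
                            normalize_title(b["title"])).ratio()
    return ratio >= threshold
```

Comparing every pair is quadratic, so at any real volume you'd probably bucket deals by price or canonical URL first and only fuzzy-match within a bucket.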

4. Ranking algorithm

They seem to use views, upvotes, and recency to surface the best deals first.
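I have no idea what their actual formula is, but a time-decayed engagement score is a typical way to combine those three signals. In this sketch the weights, the log scaling, and the 6-hour half-life are all numbers I picked for illustration.

```python
import math
from datetime import datetime, timedelta, timezone


def hot_score(views: int, upvotes: int, posted_at: datetime, now: datetime,
              half_life_hours: float = 6.0) -> float:
    """Engagement score that halves every half_life_hours.

    log1p dampens huge view counts so one viral post doesn't dominate;
    upvotes get a higher (assumed) weight than views.
    """
    engagement = math.log1p(views) + 3.0 * math.log1p(upvotes)
    age_hours = max((now - posted_at).total_seconds() / 3600.0, 0.0)
    return engagement * 0.5 ** (age_hours / half_life_hours)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    deals = [
        ("old but popular", 5000, 120, now - timedelta(hours=20)),
        ("fresh and rising", 300, 15, now - timedelta(minutes=30)),
    ]
    ranked = sorted(deals, key=lambda d: hot_score(d[1], d[2], d[3], now),
                    reverse=True)
    for title, *_ in ranked:
        print(title)
```

A shorter half-life favors brand-new posts, which probably matters when deals sell out in minutes.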

5. Push notifications

Keyword-based alerts go out through Firebase Cloud Messaging (FCM) to their mobile app. Users register keywords and get a push as soon as a matching deal shows up.
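The matching half of this is easy to sketch. Below, `KeywordAlerts` and `notify` are hypothetical names, and `send` is a stand-in for whatever actually delivers the push (an FCM call in their case, which I've left out). I use substring matching rather than word splitting because Korean titles don't always break cleanly on whitespace.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class KeywordAlerts:
    """Maps keywords to the users who subscribed to them."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, user_id: str, keyword: str) -> None:
        self._subscribers[keyword.casefold()].add(user_id)

    def match(self, title: str) -> Dict[str, List[str]]:
        """Returns {user_id: [matched keywords]} for one deal title."""
        haystack = title.casefold()
        hits: Dict[str, List[str]] = defaultdict(list)
        for keyword, users in self._subscribers.items():
            if keyword in haystack:
                for user in users:
                    hits[user].append(keyword)
        return dict(hits)


def notify(alerts: KeywordAlerts, title: str,
           send: Callable[[str, str], None]) -> None:
    """Sends one message per matched user, even if several keywords hit."""
    for user_id in alerts.match(title):
        send(user_id, title)
```

Scanning every keyword per deal is fine for a sketch. With many users you'd likely want a multi-pattern matcher instead of a loop.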

Thoughts

Building an aggregator like this seems straightforward at first, but the edge cases are brutal. Rate limiting, changing HTML structures, duplicate detection...

If anyone's built something similar, I'd love to hear about your architecture choices. What worked? What didn't?
