How I Built a Desktop SEO Crawler That Handles 100k+ Pages (React + Electron + SQLite)


As an indie dev, I used to pay $200/month for cloud-based crawlers. Desktop alternatives? Mostly Java apps with UIs from 2005.

So I decided to build my own: Spider Pro.

Spider Pro Dashboard

The Stack

Frontend

  • React 19 with Vite for fast dev experience
  • TanStack Table for virtualized tables (100k+ rows without lag)
  • Zustand for lightweight state management
  • Recharts for analytics dashboards
  • react-force-graph for 3D link visualization

Backend

  • Node.js with Fastify
  • Crawlee (from the Apify team) for the crawl engine
  • Playwright for JavaScript rendering
  • better-sqlite3 for local storage

Desktop

  • Electron for cross-platform distribution (Win/Mac/Linux)

The Interesting Technical Challenges

1. Real-Time Updates with Socket.io

When crawling, users want to see results immediately. I used Socket.io to stream each crawled page to the frontend:

// Backend emits each page as it's crawled
socket.emit('crawl_update', { url, status, title, issues });

// Frontend updates the table in real-time
useSocket('crawl_update', (data) => {
  addPageToTable(data);
});
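
For reference, a minimal useSocket hook (assuming a shared socket.io-client instance exported from ./socket) can be as small as this:

// useSocket.js: subscribe to a Socket.io event for the lifetime of a component
import { useEffect } from 'react';
import { socket } from './socket'; // assumed: a shared socket.io-client instance

export function useSocket(event, handler) {
  useEffect(() => {
    socket.on(event, handler);
    return () => socket.off(event, handler); // unsubscribe on unmount
  }, [event, handler]);
}

Passing a stable handler (e.g. wrapped in useCallback) keeps the effect from re-subscribing on every render.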

2. Handling 100k+ Rows in a Table

Traditional React tables would choke on 100k rows. With TanStack Table plus row virtualization, only the rows currently in view are rendered:

<TanStackTable
  data={pages} // 100k+ items
  columns={columns}
  enableVirtualization
/>
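
TanStack Table itself is headless, so a component like the one above is a wrapper. Under the hood, wiring TanStack Table to TanStack Virtual looks roughly like this (a sketch, with placeholder row height and styling):

// Headless table + row virtualization: only the rows in view ever hit the DOM
import { useRef } from 'react';
import { useReactTable, getCoreRowModel, flexRender } from '@tanstack/react-table';
import { useVirtualizer } from '@tanstack/react-virtual';

function PagesTable({ pages, columns }) {
  const table = useReactTable({ data: pages, columns, getCoreRowModel: getCoreRowModel() });
  const rows = table.getRowModel().rows;

  const scrollRef = useRef(null);
  const virtualizer = useVirtualizer({
    count: rows.length,
    getScrollElement: () => scrollRef.current,
    estimateSize: () => 36, // approximate row height in px
  });

  return (
    <div ref={scrollRef} style={{ height: 600, overflow: 'auto' }}>
      <div style={{ height: virtualizer.getTotalSize(), position: 'relative' }}>
        {virtualizer.getVirtualItems().map((vRow) => (
          <div
            key={rows[vRow.index].id}
            style={{ position: 'absolute', top: 0, width: '100%', transform: `translateY(${vRow.start}px)` }}
          >
            {rows[vRow.index].getVisibleCells().map((cell) => (
              <span key={cell.id}>{flexRender(cell.column.columnDef.cell, cell.getContext())}</span>
            ))}
          </div>
        ))}
      </div>
    </div>
  );
}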

3. JavaScript Rendering with Playwright

SPAs don't render content in the initial HTML. I added a toggle to switch between:

  • CheerioCrawler (fast, HTTP-only)
  • PlaywrightCrawler (slower, full JS rendering)

const crawler = jsRenderingEnabled
  ? new PlaywrightCrawler(options)
  : new CheerioCrawler(options);
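
Fleshed out a little, each branch just gets its own requestHandler. This is a sketch (emitPage and the concurrency numbers are illustrative):

import { CheerioCrawler, PlaywrightCrawler } from 'crawlee';

// emitPage stands in for whatever pushes a result to the UI (e.g. the Socket.io emit above)
function buildCrawler(jsRenderingEnabled, emitPage) {
  if (jsRenderingEnabled) {
    // Full browser: slower, but sees content injected by JavaScript
    return new PlaywrightCrawler({
      maxConcurrency: 5,
      async requestHandler({ request, page, enqueueLinks }) {
        emitPage({ url: request.url, title: await page.title() });
        await enqueueLinks(); // follows links on the same host by default
      },
    });
  }
  // Plain HTTP + Cheerio parsing: much faster, no JS execution
  return new CheerioCrawler({
    maxConcurrency: 20,
    async requestHandler({ request, $, enqueueLinks }) {
      emitPage({ url: request.url, title: $('title').text() });
      await enqueueLinks();
    },
  });
}

// Usage: await buildCrawler(true, emitPage).run(['https://example.com']);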

4. SQLite for Portable Storage

Each project is a single .db file. No PostgreSQL server needed. Users can back up, share, or move projects easily.
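
A minimal sketch of that setup with better-sqlite3 (the table and column names here are illustrative, not the real schema):

import Database from 'better-sqlite3';

// One project = one file; opening the path creates the database if it doesn't exist
const db = new Database('my-project.db');
db.pragma('journal_mode = WAL'); // better write throughput for a steady stream of inserts

db.exec(`CREATE TABLE IF NOT EXISTS pages (
  url TEXT PRIMARY KEY,
  status INTEGER,
  title TEXT,
  issues TEXT
)`);

// Prepared once, reused for every crawled page: synchronous, no server round-trip
const insertPage = db.prepare(
  'INSERT OR REPLACE INTO pages (url, status, title, issues) VALUES (?, ?, ?, ?)'
);
insertPage.run('https://example.com/', 200, 'Example Domain', JSON.stringify([]));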

What I Learned

  1. Electron apps can be fast if you're careful with IPC (sketch below)
  2. SQLite is underrated for desktop apps
  3. Virtualization is essential for large datasets
  4. Socket.io makes real-time UX trivial
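
On the first point: one pattern that helps is batching updates and exposing a narrow preload API via contextBridge instead of firing one IPC message per row. A rough sketch (the channel and global names are made up for the example):

// preload.js: expose a small, explicit surface to the renderer
const { contextBridge, ipcRenderer } = require('electron');

contextBridge.exposeInMainWorld('spider', {
  // One round-trip for a whole batch of pages instead of one message per page
  savePages: (pages) => ipcRenderer.invoke('pages:save', pages),
});

// main.js: handle the batch in the main process
const { ipcMain } = require('electron');

ipcMain.handle('pages:save', async (_event, pages) => {
  // e.g. write the whole batch to SQLite in a single transaction
  return { saved: pages.length };
});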

Try It Out

Spider Pro is launching on Product Hunt soon. Try it free for 14 days:

👉 https://spiderpro.app/
