DEV Community

Antoine Ross

Supacrawler: a lightweight, ultra-fast web scraping API

Supacrawler is an open-source web scraping API engine written in Go. Out of the box it comes with three endpoints: Scrape, Crawl, and Screenshots.

It's a light wrapper around Playwright, with Dockerfiles for both local development and production. It's also ultra-fast thanks to Go's concurrency and channels. I have a write-up of the benchmarks in the documentation, under Supacrawler benchmarks.
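To make the "Go concurrency and channels" point concrete, here is a minimal sketch of a channel-based worker pool, the general pattern that lets several pages be fetched at once. It is not taken from Supacrawler's code: `fetchAll`, `result`, and the `fetch` callback are hypothetical names used only for illustration, and `fetch` stands in for real page retrieval.

```go
package main

import (
	"fmt"
	"sync"
)

// result pairs a URL with whatever the fetch function returned for it.
type result struct {
	url  string
	body string
}

// fetchAll fans the URLs out to a fixed number of worker goroutines and
// collects the results over a channel. fetch is a stand-in for real page
// retrieval (hypothetical; not Supacrawler's actual code).
func fetchAll(urls []string, workers int, fetch func(string) string) map[string]string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker pulls URLs until the jobs channel is closed.
			for u := range jobs {
				results <- result{url: u, body: fetch(u)}
			}
		}()
	}

	// Feed the jobs, then close the channel so the workers exit their loops.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make(map[string]string, len(urls))
	for r := range results {
		out[r.url] = r.body
	}
	return out
}

func main() {
	pages := fetchAll(
		[]string{"https://example.com/a", "https://example.com/b"},
		2,
		func(u string) string { return "fetched " + u },
	)
	fmt.Println(len(pages), "pages fetched")
}
```

The number of workers caps how many fetches run at the same time, which matters when each one drives a headless browser.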

Going through the endpoints, we have the following:

Scrape: This endpoint lets you scrape a page using a headless browser and receive the output automatically cleaned up as Markdown.

Scrape Dashboard

Crawl: This endpoint uses a headless browser to systematically crawl an entire website and returns the pages in both Markdown and HTML.

Crawl Dashboard

Screenshots: This endpoint renders JavaScript pages and captures full-page and mobile screenshots, all through a single API endpoint.

Screenshots Dashboard

Watch (app exclusive): This endpoint monitors the contents of a website for changes. You can schedule a job with cron that sends you an email notification if anything changes. Works like a charm!

Watch changes email notification
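The post doesn't say how Watch detects changes, so the following is only a sketch of one common approach: store a hash of the page content and compare it on the next scheduled run. The `watcher` type and its `changed` method are hypothetical names, and scheduling and email delivery are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint returns a hex-encoded SHA-256 digest of the page content.
func fingerprint(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// watcher remembers the last fingerprint seen for each URL.
// Hypothetical type for illustration; not Supacrawler's implementation.
type watcher struct {
	seen map[string]string
}

func newWatcher() *watcher {
	return &watcher{seen: make(map[string]string)}
}

// changed records the new content and reports whether it differs from the
// previous check. The first observation only sets the baseline, so it is
// not reported as a change.
func (w *watcher) changed(url, content string) bool {
	fp := fingerprint(content)
	prev, ok := w.seen[url]
	w.seen[url] = fp
	return ok && prev != fp
}

func main() {
	w := newWatcher()
	for _, content := range []string{"price: 10", "price: 10", "price: 12"} {
		if w.changed("https://example.com", content) {
			fmt.Println("content changed, a notification would be sent here")
		}
	}
}
```

Hashing the cleaned Markdown rather than the raw HTML would likely cut down on false alarms from markup that changes on every load, though that is a design guess rather than something the post states.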

The best part about Supacrawler is that it works out of the box with just a couple of commands:

curl -O https://raw.githubusercontent.com/supacrawler/supacrawler/main/docker-compose.yml
docker compose up
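Once the containers are up, a Scrape call is an HTTP request. The post doesn't give the request shape, so everything specific in this sketch is an assumption to check against the documentation: the `localhost:8081` address, the `/v1/scrape` path, and the `url` and `format` query parameters. `scrapeURL` is a hypothetical helper that only builds the request URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// scrapeURL builds the URL for a scrape request.
// ASSUMPTIONS: the "/v1/scrape" path and the "url" and "format" query
// parameters are illustrative guesses, not confirmed by the post.
func scrapeURL(base, target string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/scrape"

	q := url.Values{}
	q.Set("url", target)        // page to scrape (assumed parameter name)
	q.Set("format", "markdown") // requested output format (assumed parameter name)
	u.RawQuery = q.Encode()

	return u.String(), nil
}

func main() {
	// The host and port are also assumptions about the local Docker setup.
	s, err := scrapeURL("http://localhost:8081", "https://example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

The resulting string can be passed to `http.Get` from `net/http`; the target URL is percent-encoded so it survives as a single query value.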

I'm always keen to know more about how people will use tools like this. Let me know if you find this useful or if you have any questions!

If you're interested in seeing more, you can visit the following:
Website
GitHub
