DEV Community

Karl Eriksson

Scraping HackerNews with GPT-4

I wanted to share a project I recently created: an automated scraper that can scrape any website and store its text content as JSON using GPT. I thought it might be helpful for anyone interested in scraping data or working with APIs.

Step 1: Tech Stack

After exploring various ways to achieve this, I opted for a no-code solution: the platform Clevis, which I used to chain the required steps together and automate the process by running it on a daily schedule.

Step 2: Scraping the content

Using an HTTP Request step in Clevis, I make a GET request to the target website and capture its text content.

[Screenshot: the HTTP Request step in Clevis]
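For readers who want to reproduce Step 2 outside Clevis, here is a minimal Python sketch using only the standard library. It assumes a plain GET with `urllib` and uses `html.parser` to drop tags and the contents of `<script>` and `<style>` elements; the function names and the User-Agent string are illustrative, not part of Clevis.

```python
# Minimal sketch of Step 2: fetch a page and keep only its visible text.
# Assumption: urllib + html.parser stand in for Clevis's HTTP Request step.
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def scrape(url: str, timeout: float = 10.0) -> str:
    """GET the URL and return its visible text (hypothetical User-Agent)."""
    req = Request(url, headers={"User-Agent": "hn-scraper-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_text(html)
```

Calling `scrape("https://news.ycombinator.com/")` would return the front page's visible text, one text node per line, ready to embed in a prompt. Keeping extraction separate from fetching makes it easy to test against saved HTML.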

Step 3: Using ChatGPT to parse the content as JSON

Next, I pass the scraped HackerNews text to ChatGPT and prompt it to return a JSON object that follows a schema I provide in the prompt. In this screenshot, the scraped text is referenced as steps.scrape.output.

[Screenshot: the ChatGPT step, with the prompt referencing steps.scrape.output]
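The prompt-and-parse idea can be sketched in Python as follows. The schema (`rank`, `title`, `points`) is a hypothetical example, and the actual model call, which Clevis performs in this workflow, is omitted. The sketch only shows how to embed a schema in the prompt and validate the JSON that comes back, since model output is not guaranteed to match the schema.

```python
# Sketch of Step 3: ask the model for JSON matching a schema, then validate the reply.
# Assumptions: the schema is hypothetical and the model call itself is left out.
import json

# Hypothetical schema describing the top HackerNews posts.
SCHEMA = {
    "posts": [
        {"rank": "integer", "title": "string", "points": "integer"}
    ]
}

# Python types each post field must have after parsing.
REQUIRED_KEYS = {"rank": int, "title": str, "points": int}


def build_prompt(scraped_text: str) -> str:
    """Embed the schema and the scraped page text in a single prompt."""
    return (
        "Extract the posts from the text below and reply with JSON only, "
        "matching this schema:\n"
        f"{json.dumps(SCHEMA, indent=2)}\n\n"
        f"Text:\n{scraped_text}"
    )


def parse_reply(reply: str) -> list[dict]:
    """Parse the model's reply and check each post against REQUIRED_KEYS."""
    text = reply.strip()
    fence = "`" * 3
    # Models sometimes wrap JSON in a Markdown code fence; strip it if present.
    if text.startswith(fence):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(fence, 1)[0]
    data = json.loads(text)
    posts = data["posts"]
    for post in posts:
        for key, expected_type in REQUIRED_KEYS.items():
            if not isinstance(post.get(key), expected_type):
                raise ValueError(f"post {post!r} has bad or missing {key!r}")
    return posts
```

Validating the reply before storing it means a malformed answer fails loudly at this step instead of reaching the database.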

Step 4: Storing the result

Once ChatGPT has returned the result, I store it in my own database with another HTTP Request step that calls an API I built.

[Screenshot: the complete workflow]
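A rough Python equivalent of the storage step, assuming a hypothetical endpoint `https://example.com/api/hn-posts` that accepts a JSON body of the form `{"posts": [...]}`. The post does not describe the real API, so the URL and payload shape are placeholders to adapt.

```python
# Sketch of Step 4: send the parsed posts to your own API as JSON.
# Assumption: the endpoint URL and payload shape are hypothetical.
import json
from urllib.request import Request, urlopen

API_URL = "https://example.com/api/hn-posts"  # hypothetical endpoint


def build_store_request(posts: list[dict], url: str = API_URL) -> Request:
    """Build (but do not send) a JSON POST request carrying the posts."""
    body = json.dumps({"posts": posts}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def store(posts: list[dict], url: str = API_URL, timeout: float = 10.0) -> int:
    """Send the posts and return the HTTP status code."""
    with urlopen(build_store_request(posts, url), timeout=timeout) as resp:
        return resp.status
```

Splitting request construction from sending keeps the payload easy to inspect; `urlopen` raises `HTTPError` on 4xx/5xx responses, so a failed store does not pass silently.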

Step 5: Scheduling

By enabling a schedule in Clevis, I have the workflow run daily and store the top HackerNews posts for later curation.
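Outside Clevis, the same daily schedule could be approximated with a small loop that sleeps until the next run time. This sketch assumes a fixed 08:00 run time using naive local datetimes (no time-zone or DST handling), and `run_daily` is a hypothetical helper; a cron job or a hosted scheduler is usually the more robust choice.

```python
# Sketch of Step 5 without Clevis: sleep until the next daily run, then run the job.
# Assumption: a fixed 08:00 run time with naive local datetimes.
import time as clock
from datetime import datetime, time, timedelta
from typing import Callable


def next_daily_run(now: datetime, at: time = time(8, 0)) -> datetime:
    """Return today's run time if it is still ahead of `now`, else tomorrow's."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def seconds_until_next_run(now: datetime, at: time = time(8, 0)) -> float:
    """Seconds to wait from `now` until the next run."""
    return (next_daily_run(now, at) - now).total_seconds()


def run_daily(job: Callable[[], None], at: time = time(8, 0)) -> None:
    """Run `job` once a day at `at`; blocks forever."""
    while True:
        clock.sleep(seconds_until_next_run(datetime.now(), at))
        job()
```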
