DEV Community

Karl Eriksson
Karl Eriksson

Posted on • Edited on

Scraping HackerNews with GPT-4

I wanted to share a project I recently created - An automated scraper for that can scrape any website and store the text content as JSON using GPT. I thought this might be helpful for anyone interested in scraping data or working with APIs.

Step 1: Tech Stack

After exploring various ways to achieve this, I opted for a no-code solution. In the end I chose the no-code platform Clevis to cobble together the required steps and automate the process by running it on a daily schedule.

Step 2: Scraping the content

By using an HTTP Request step in Clevis, I can make a GET request to any website and scrape the text content.

HTTP Request

Step 3: Using ChatGPT to parse the content as JSON

Next, I use the scraped text from HackerNews and prompt ChatGPT to create a JSON object with a schema that I provide in the prompt. In this screenshot, the scraped text is referenced as steps.scrape.output.

ChatGPT

Step 4: Storing the result

Now that ChatGPT has provided me the result, I can store it in my own database with another HTTP Request step that calls an API I built.

Workflow

Step 5: Scheduling

By enabling a schedule in Clevis, I can have this run daily to store the top HackerNews posts for later curation.

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more

Top comments (0)

A Workflow Copilot. Tailored to You.

Pieces.app image

Our desktop app, with its intelligent copilot, streamlines coding by generating snippets, extracting code from screenshots, and accelerating problem-solving.

Read the docs

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay