I wanted to share a project I recently created: an automated scraper that can scrape any website and store the text content as JSON using GPT. I thought this might be helpful for anyone interested in scraping data or working with APIs.
Step 1: Tech Stack
After exploring various ways to achieve this, I opted for a no-code solution. In the end I chose the no-code platform Clevis to wire together the required steps and automate the process on a daily schedule.
Step 2: Scraping the content
By using an HTTP Request step in Clevis, I can make a GET request to any website and scrape the text content.
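For anyone who prefers code over a no-code step, here's a rough Python equivalent of that HTTP Request step: fetch a page and strip the markup down to visible text. The HackerNews URL and the text-extraction approach are my own assumptions, not what Clevis does internally.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def scrape_text(url):
    """GET a page and return its visible text, one fragment per line."""
    html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)


if __name__ == "__main__":
    # URL is an assumption; any page works.
    print(scrape_text("https://news.ycombinator.com")[:500])
```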
Step 3: Using ChatGPT to parse the content as JSON
Next, I take the scraped text from HackerNews and prompt ChatGPT to produce a JSON object matching a schema that I provide in the prompt. Within the Clevis step, the scraped text is referenced as steps.scrape.output.
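As a sketch of what this step does under the hood, here's the same idea expressed as a direct call to OpenAI's chat completions HTTP endpoint. The schema, prompt wording, and model choice are all assumptions for illustration; in Clevis you'd reference steps.scrape.output instead of passing a variable.

```python
import json
import os
from urllib.request import Request, urlopen

# Hypothetical output schema, described to the model inside the prompt.
SCHEMA = {"posts": [{"title": "string", "points": "number", "url": "string"}]}


def build_payload(scraped_text):
    """Assemble the chat completion request body for the parsing prompt."""
    prompt = (
        "Extract the top posts from the text below and respond with only a "
        f"JSON object matching this schema: {json.dumps(SCHEMA)}\n\n"
        f"{scraped_text}"
    )
    return {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    }


def parse_with_gpt(scraped_text):
    """Send the scraped text to the OpenAI API and decode the JSON reply."""
    req = Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(build_payload(scraped_text)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    body = json.loads(urlopen(req).read())
    return json.loads(body["choices"][0]["message"]["content"])
```

Note that the model can occasionally wrap its answer in extra prose, so production code would want a retry or a more defensive JSON extraction than the bare json.loads above.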
Step 4: Storing the result
Now that ChatGPT has provided me the result, I can store it in my own database with another HTTP Request step that calls an API I built.
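In code, that storage step is just another POST request carrying the parsed JSON. The endpoint URL below is a hypothetical placeholder for the API I built; substitute your own.

```python
import json
from urllib.request import Request, urlopen

# Placeholder endpoint; your own API's URL and auth scheme go here.
STORE_ENDPOINT = "https://example.com/api/posts"


def build_store_request(parsed_posts, endpoint=STORE_ENDPOINT):
    """Build the POST request that ships the parsed JSON to the database API."""
    return Request(
        endpoint,
        data=json.dumps(parsed_posts).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def store(parsed_posts):
    """Send the request and return the HTTP status code."""
    with urlopen(build_store_request(parsed_posts)) as resp:
        return resp.status
```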
Step 5: Scheduling
By enabling a schedule in Clevis, I can have this run daily to store the top HackerNews posts for later curation.
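If you were running the pipeline yourself instead of relying on Clevis's scheduler, the daily trigger boils down to computing how long to sleep until the next run. A minimal sketch, with the 09:00 run time as an arbitrary assumption:

```python
import time
from datetime import datetime, timedelta


def seconds_until_next_run(now, hour=9):
    """Seconds from `now` until the next daily run at `hour`:00."""
    next_run = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if next_run <= now:
        next_run += timedelta(days=1)
    return (next_run - now).total_seconds()


def run_daily(job, hour=9):
    """Call `job()` once a day at `hour`:00, forever."""
    while True:
        time.sleep(seconds_until_next_run(datetime.now(), hour))
        job()
```

In practice a cron entry or the platform's built-in schedule is more robust than a long-lived loop, since it survives restarts.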