Building a Recipe Scraping Tool in Python: What I learned

#python #programming #coding

The Problem..

We've all been there: you want to learn how to cook a new meal, so you Google the recipe. Then you get hit with ads, the page scrolls on its own, and it's a pain just to get to the ingredient list or the instructions. I always thought there should be an easier way, and then it hit me: why don't I just make it easier?

I wanted to build a Python tool that scrapes recipe websites and saves the title, ingredient list, and instruction list to a .txt file on your computer.

The Journey..

Tools used:

Python 3.13
Requests for fetching web pages
Beautiful Soup (bs4) for HTML parsing
argparse for the command-line interface

The basic code flow:

Receive the recipe URL from the user
Request the web page using 'requests'
Parse the HTML for the 'application/ld+json' data using Beautiful Soup (bs4)
Load the JSON and extract the title, ingredients, and instructions
Save the data to a list and write it to a .txt file
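The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the repo: the function names `extract_recipe` and `write_recipe_txt` are hypothetical, and the sketch assumes the first 'application/ld+json' script on the page is a single schema.org Recipe object whose `name`, `recipeIngredient`, and `recipeInstructions` fields hold the data. Fetching is left out so the parsing can be tried on any HTML string.

```python
import json
from pathlib import Path

from bs4 import BeautifulSoup


def extract_recipe(html: str) -> dict:
    """Hypothetical helper: read title, ingredients, and instructions from JSON-LD."""
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script", type="application/ld+json")
    if script is None or not script.string:
        raise ValueError("no application/ld+json script found")
    # Assumption: the script holds one schema.org Recipe object (not a list or @graph).
    data = json.loads(script.string)
    instructions = []
    for step in data.get("recipeInstructions", []):
        # Steps can be plain strings or HowToStep objects with a "text" field.
        instructions.append(step["text"] if isinstance(step, dict) else step)
    return {
        "title": data.get("name", ""),
        "ingredients": list(data.get("recipeIngredient", [])),
        "instructions": instructions,
    }


def write_recipe_txt(recipe: dict, path: Path) -> None:
    """Hypothetical helper: write the recipe as a simple plain-text file."""
    lines = [recipe["title"], "", "Ingredients:"]
    lines += [f"- {item}" for item in recipe["ingredients"]]
    lines += ["", "Instructions:"]
    lines += [f"{i}. {step}" for i, step in enumerate(recipe["instructions"], start=1)]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

On a live page, the HTML would come from something like `requests.get(url, timeout=10)`, calling `raise_for_status()` on the response before reading `.text`, and the result could then be saved with `write_recipe_txt(extract_recipe(html), Path("recipe.txt"))`.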

Challenges and What I learned:

This was my first web scraping project, so I wasn't sure how to get the same data from different websites.
At first, my code was very rigid: it used bs4 to pull elements by hard-coded class names, which only works on the site those class names came from.
After some research, I learned that many recipe websites include a script tag with type='application/ld+json' containing structured metadata (typically a schema.org Recipe object) with the title, ingredients, and instructions.
I had also never published my own PyPI package; at first, the tool was just a Python script the user had to run directly.
I learned how to package the tool so others can install it and run it with the recipe URL as the argument.

Final Txt file:

This is what using the package looks like:
This is the final txt file:

If you want to use the package:

pip install recipescraper-cli-tool-er
recipescraper (recipe url)

Next Steps:

I want to build a website where people can go to download the file.
I want to save the data to a PDF file instead of a .txt file.
Some websites still don't work. That's acceptable for a quick project, but eventually I want fallback methods for getting the data when the JSON-LD approach fails.
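One possible fallback, sketched here as an assumption rather than anything the tool currently does: some recipe pages mark up their content with schema.org microdata (`itemprop` attributes in the HTML) instead of, or in addition to, JSON-LD. The hypothetical `extract_microdata` function below reads those attributes with Beautiful Soup, limiting the search to the element whose `itemtype` mentions `schema.org/Recipe` when one exists.

```python
import re

from bs4 import BeautifulSoup


def extract_microdata(html: str) -> dict:
    """Hypothetical fallback: read schema.org microdata itemprop attributes."""
    soup = BeautifulSoup(html, "html.parser")
    # Restrict the search to the Recipe-typed element if the page declares one.
    scope = soup.find(itemtype=re.compile(r"schema\.org/Recipe")) or soup
    title_tag = scope.find(itemprop="name")
    return {
        "title": title_tag.get_text(strip=True) if title_tag else "",
        "ingredients": [
            el.get_text(" ", strip=True) for el in scope.find_all(itemprop="recipeIngredient")
        ],
        "instructions": [
            el.get_text(" ", strip=True) for el in scope.find_all(itemprop="recipeInstructions")
        ],
    }
```

One caveat: `itemprop="name"` is also used by nested items such as an author, so the first match is not guaranteed to be the recipe title on every site.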

Conclusion:

This was a fun, quick project that taught me about JSON-LD metadata on websites, parsing HTML structure, and creating Python packages. I do want to come back and improve it, but for now, on to the next one.

Here's the GitHub repo if you're interested in the full code:
[https://github.com/eduardoreyes007351208/recipeScraper]

Thank you for reading, leave me your thoughts and ideas, and hopefully this makes cooking a little easier!