The Problem..
We've all been there: you want to learn how to cook a new meal, so you Google the recipe. Then you get hit with all the ads and a page that scrolls on its own, and it's a pain just to get to the ingredient list or the instructions. I always thought there should be an easier way, and then it hit me: why don't I just make it easier?
I wanted to make a tool in Python that scrapes a recipe website and returns the title, ingredient list, and instructions list in a txt file that's saved to your computer.
The Journey..
Tools used:
- Python (3.13)
- Requests for requesting webpages
- Beautiful Soup (bs4) for HTML parsing
- argparse for the CLI implementation
The basic code flow:
- Request the recipe page with Requests
- Parse the HTML for 'application/ld+json' data using Beautiful Soup (bs4)
- Load the JSON and extract the title, ingredients, and instructions
- Write the result to a txt file
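To make the flow above concrete, here is a minimal sketch of the JSON-LD extraction step. It is not the code from the package: to keep it dependency-free it swaps Beautiful Soup for the standard library's `html.parser`, and the names `JsonLdCollector`, `find_recipe`, and `extract_recipe` are made up for illustration. It assumes the page follows the schema.org Recipe layout (`name`, `recipeIngredient`, `recipeInstructions`), which not every site does.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        # Script content can arrive in several chunks, so append to the open block.
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def find_recipe(node):
    """Recursively search decoded JSON-LD for an object whose @type is Recipe."""
    if isinstance(node, list):
        for item in node:
            found = find_recipe(item)
            if found:
                return found
    elif isinstance(node, dict):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "Recipe" in types:
            return node
        # Some sites wrap their objects in a top-level "@graph" list.
        return find_recipe(node.get("@graph", []))
    return None


def extract_recipe(html):
    """Return title, ingredients, and instructions, or None if no Recipe is found."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            recipe = find_recipe(json.loads(block))
        except json.JSONDecodeError:
            continue  # skip malformed metadata and try the next block
        if recipe:
            steps = recipe.get("recipeInstructions", [])
            if isinstance(steps, str):
                steps = [steps]
            return {
                "title": recipe.get("name", ""),
                "ingredients": recipe.get("recipeIngredient", []),
                # Steps may be plain strings or objects with a "text" field.
                "instructions": [
                    s.get("text", "") if isinstance(s, dict) else str(s)
                    for s in steps
                ],
            }
    return None
```

With Requests, the page text could be passed in as `extract_recipe(requests.get(url).text)`. With Beautiful Soup, the collector class would be replaced by `soup.find_all("script", type="application/ld+json")`.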
Challenges and What I learned:
- This was my first web scraping project, so I wasn't sure how to get the same data from different websites.
- At first, my code was very static: it used bs4 to pull things from the page using hard-coded class names.
- After some research, I learned that most recipe websites include a script tag with type='application/ld+json' that contains metadata such as the title, ingredients, and instructions.
- I had also never created my own PyPI package. At first, the tool was just a Python script that the user would run.
- I learned how to package the tool so others can install it and run it with the URL as the parameter.
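As a rough illustration of the CLI side, here is a sketch of how argparse can take the URL as a positional parameter and how the extracted data could be written to a txt file. The function names, the file layout, and the file-naming rule are assumptions for this example, not what the package actually does.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with one positional argument: the recipe URL."""
    parser = argparse.ArgumentParser(
        prog="recipescraper",
        description="Save a recipe's title, ingredients, and instructions to a txt file.",
    )
    parser.add_argument("url", help="URL of the recipe page to scrape")
    return parser


def format_recipe(recipe):
    """Turn a dict with title, ingredients, and instructions into plain text."""
    lines = [recipe["title"], "", "Ingredients:"]
    lines += [f"- {item}" for item in recipe["ingredients"]]
    lines += ["", "Instructions:"]
    lines += [f"{n}. {step}" for n, step in enumerate(recipe["instructions"], start=1)]
    return "\n".join(lines) + "\n"


def save_recipe(recipe, directory="."):
    """Write the formatted recipe to <directory>/<title>.txt and return the path."""
    # Assumed naming rule: replace anything that is not a letter or digit with "_".
    safe_name = "".join(c if c.isalnum() else "_" for c in recipe["title"])
    path = Path(directory) / f"{safe_name}.txt"
    path.write_text(format_recipe(recipe), encoding="utf-8")
    return path
```

A `main()` function would call `build_parser().parse_args()`, fetch and extract the recipe from `args.url`, and pass the result to `save_recipe`. Packaging tools can then expose that function as a terminal command through the `[project.scripts]` table in `pyproject.toml`.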
Final Txt file:
- This is what using the package looks like:
- This is the final txt file:
If you want to use the package:
- pip install recipescraper-cli-tool-er
- recipescraper (recipe url)
Next Steps:
- I want to make a website where people can paste a recipe URL and download the file.
- I want it to save the data to a PDF file instead of a txt file.
- Some websites still don't work. For a quick project that's okay, but I eventually want to add other ways to get the data when my current method fails.
Conclusion:
This was a fun, quick project that taught me about website JSON metadata, parsing HTML structure, and creating Python packages. I do want to return to this project to improve it, but for now, it's on to the next one.
Here's the GitHub repo if you're interested in the full code:
[https://github.com/eduardoreyes007351208/recipeScraper]
Thank you for reading, leave me your thoughts and ideas, and hopefully this makes cooking a little easier!