
Eduardo Reyes


Building a Recipe Scraping Tool in Python: What I learned

The Problem..

We've all been there: you want to learn how to cook a new meal, so you Google the recipe. Then you get hit with all the ads, the website randomly scrolls on its own, and it's a pain just to get to the ingredient list or the instructions. I always thought there should be an easier way, and then it hit me: why don't I just make it easier?

I wanted to make a tool in Python that scrapes a recipe website and returns the title, ingredient list, and instructions list in a txt file that's saved to your computer.

The Journey..

Tools used:

  • Python (3.13)
  • Requests for requesting webpages
  • Beautiful Soup for HTML parsing
  • argparse for the CLI tool implementation

The basic code flow:

  1. Receive URL from user input

  2. Request the webpage using 'requests'

  3. Parse the HTML for 'application/ld+json' data using Beautiful Soup (bs4)

  4. Load and extract the title, ingredients, and instructions from JSON

  5. Save data to an array and write the data to a txt file
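
The flow above could be sketched roughly like this. It is a minimal sketch under stated assumptions, not the code from the repo: every function name (fetch_html, ld_json_blocks, find_recipe, recipe_lines, save_recipe, scrape_to_txt) and the default recipe.txt path are hypothetical. The JSON keys (name, recipeIngredient, recipeInstructions) are the schema.org Recipe property names, which is what 'application/ld+json' recipe data normally uses.

```python
import json
from pathlib import Path


def fetch_html(url: str) -> str:
    """Step 2: download the page (needs `pip install requests`)."""
    # Third-party import kept inside the function so the pure helpers
    # below can be used without requests installed.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def ld_json_blocks(html: str) -> list:
    """Step 3: parse every <script type="application/ld+json"> block."""
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            blocks.append(json.loads(tag.string or ""))
        except json.JSONDecodeError:
            continue  # skip empty or malformed blocks
    return blocks


def find_recipe(data):
    """Step 4a: return the first JSON-LD object whose @type is Recipe."""
    if isinstance(data, list):
        for item in data:
            found = find_recipe(item)
            if found:
                return found
    elif isinstance(data, dict):
        types = data.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "Recipe" in types:
            return data
        # Some sites wrap their objects in an "@graph" list.
        return find_recipe(data.get("@graph", []))
    return None


def recipe_lines(recipe: dict) -> list:
    """Step 4b: collect title, ingredients, and instructions as lines."""
    lines = [recipe.get("name", "Untitled recipe"), "", "Ingredients:"]
    lines += [f"- {item}" for item in recipe.get("recipeIngredient", [])]
    lines += ["", "Instructions:"]
    steps = recipe.get("recipeInstructions", [])
    if isinstance(steps, str):
        steps = [steps]
    for number, step in enumerate(steps, start=1):
        # A step is either a plain string or an object with a "text" field.
        text = step.get("text", "") if isinstance(step, dict) else str(step)
        lines.append(f"{number}. {text}")
    return lines


def save_recipe(lines: list, path: str) -> None:
    """Step 5: write the collected lines to a txt file."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def scrape_to_txt(url: str, path: str = "recipe.txt") -> bool:
    """Run steps 2-5 for one URL; return False if no Recipe data is found."""
    recipe = find_recipe(ld_json_blocks(fetch_html(url)))
    if recipe is None:
        return False
    save_recipe(recipe_lines(recipe), path)
    return True
```

Step 1 would then be something like scrape_to_txt(input("Recipe URL: ")). Searching recursively through lists and "@graph" wrappers is a design choice here, since sites do not all nest the Recipe object the same way.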

Challenges and What I learned:

  • This was my first web scraping project, so I wasn't sure how to get the same data from different websites.
  • At first, my code was very static: it used bs4 to pull data from one website through hard-coded class names.
  • After some research, I learned that most recipe websites include a script tag with type='application/ld+json' that contains metadata such as the title, ingredients, and instructions.
  • I had also never created my own PyPI package. At first, the tool was just a Python script that the user would run.
  • I learned how to package the tool so others can install it and run it with the URL as the parameter.
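
For the command-line side, here is a minimal argparse sketch of a command that takes the URL as its parameter. The function names (build_parser, main) and the -o/--output flag are hypothetical and may not match the real package; the scraping itself is left out so the example stays focused on argument handling.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Describe the command line: one required URL, one optional output path."""
    parser = argparse.ArgumentParser(
        prog="recipescraper",
        description="Save a recipe's title, ingredients, and instructions to a txt file.",
    )
    parser.add_argument("url", help="URL of the recipe page to scrape")
    # Hypothetical extra flag, shown only to illustrate an optional argument.
    parser.add_argument(
        "-o", "--output", default="recipe.txt", help="where to write the txt file"
    )
    return parser


def main(argv=None) -> int:
    """Entry point; argv=None makes argparse read sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    # The real tool would fetch and parse the page here.
    print(f"Would scrape {args.url} into {args.output}")
    return 0
```

When packaging, a function like main can be exposed as a console command through the [project.scripts] table in pyproject.toml, which is how an installed package can provide a command such as recipescraper. The exact module path depends on how the package is laid out.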

Final Txt file:

  • This is what using the package looks like:
  • This is the final txt file:

If you want to use the package:

  • pip install recipescraper-cli-tool-er
  • recipescraper (recipe url)

Next Steps:

  • I want to make a website where people can go and download the file
  • I want to have it save the data to a PDF file instead of a txt file
  • Some websites still don't work. For a quick project that's okay, but I eventually want to add other ways to get the data when my current method fails
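
One possible shape for the "other ways to get the data" idea is a fallback chain: try each extraction strategy in order and keep the first result. This is only a sketch of the pattern, not something the tool does today; first_successful and the Extractor type are hypothetical names, and each extractor is assumed to take the page HTML and return a dict or None.

```python
from typing import Callable, Iterable, Optional

# An extractor takes the page HTML and returns recipe data, or None.
Extractor = Callable[[str], Optional[dict]]


def first_successful(html: str, extractors: Iterable[Extractor]) -> Optional[dict]:
    """Return the first non-empty result from the extractors, tried in order."""
    for extract in extractors:
        try:
            recipe = extract(html)
        except Exception:
            # A failing strategy should not stop the remaining ones.
            continue
        if recipe:
            return recipe
    return None
```

The list could start with the ld+json method and then fall back to other strategies, for example site-specific class names, with the cheapest and most general strategy first.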

Conclusion:

This was a fun, quick project that taught me about website JSON metadata, parsing HTML structure, and creating Python packages. I do want to return to this project to improve it, but for now, onto the next one.

Here's the GitHub repo if you're interested in the full code:
[https://github.com/eduardoreyes007351208/recipeScraper]

Thank you for reading, leave me your thoughts and ideas, and hopefully this makes cooking a little easier!
