DEV Community

Gavi Schneider

Originally published at Medium

IsraBrew: A Craft Beer Scraper

We may be small geographically, but the craft beer scene in Israel is booming. New beers are appearing on shelves on a weekly (if not daily) basis, and I’m not just talking about beers brewed locally. New beers from countries all over the world, some of them landing in Israel for the very first time, are also popping up left and right. Last month (January ’21) was a particularly historic time for Israeli craft beer enthusiasts, with over 15 new beers from BrewDog (and their experimental brewery, OverWorks) making their debut in Israeli stores all on the same day.

As someone who likes to stay up to date with all the latest beers making their way to our country, I was looking for a way to see everything that’s available from all the different stores and suppliers in one place. I could have just visited every store’s website and scrolled through their inventory, but I’d essentially be doing the same thing over and over again, only on different sites (assuming I even know where to look, which might pose a challenge to new Israeli beer enthusiasts).

When you want to repeat some process over and over again, automation is your best friend. In my case, that meant building a web scraper. By writing a program that would visit all the websites for me, pull in their inventories and aggregate them in one location, I wouldn’t need to bounce from site to site to know which beers are currently available.

This is how IsraBrew was born: out of the need for a website that shows which craft beers are currently available from all the major Israeli suppliers. If you’re as obsessive as me and want to try every beer that you can get your hands on, you’re going to love this.

The IsraBrew project is made up of four main components:

  1. The Web Scraper: The app that visits the websites and populates the database.
  2. The API: Responds to requests with beers it retrieves from the database.
  3. The Client: Web interface to view the beers.
  4. Deployment: Infrastructure and services that aided in deploying the app.

Web Scraper

Although I really like using Puppeteer when building projects that utilize web scraping in JavaScript / Node.js, I decided that this time I’d go with Python for my web scraping needs. Selenium would have been an obvious choice, and I’ve even used it previously in both Python and Java. However, I decided to go with Beautiful Soup: its simplicity and speed (relative to Selenium) were what won me over. I was looking to build a lightweight scraper that I could deploy on a server and run a few times a day, and Beautiful Soup did that perfectly. Well, almost perfectly. There was one thing it couldn’t do: scrape content that was generated via JavaScript after the page loaded. This was a problem at first, as a few of the suppliers used JS to load their content. After countless Stack Overflow and Reddit threads, I finally discovered ChompJS, which solved this exact problem.

With ChompJS, I was able to target script tags in the DOM and parse the JavaScript objects embedded in them, which hold the data the page would eventually render. Problem solved. Using both Beautiful Soup and ChompJS, I was able to scrape all the desired websites without actually having to open a browser (headless or otherwise).

Once scraped, the beers were then stored in an in-memory SQLite database. The last time I worked with a Flask backend, I wrote all the SQL commands out by hand inside my Python code, but this time I decided to use the SQLAlchemy ORM package, which let me do CRUD operations in Python without writing any raw SQL.
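As a minimal sketch of what storing scraped beers through the ORM might look like (the `Beer` model, its columns and both helper functions are assumptions for illustration, not the project's real schema):

```python
from sqlalchemy import Column, Float, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Beer(Base):
    """One scraped product. The columns are hypothetical."""

    __tablename__ = "beers"

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    supplier = Column(String, nullable=False)
    price = Column(Float)


# In-memory SQLite: the data lives only as long as this process does.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)


def save_beers(session, scraped):
    """Insert a list of scraped rows (dicts) without writing any SQL."""
    session.add_all([Beer(**row) for row in scraped])
    session.commit()


def names_for_supplier(session, supplier):
    """Return the beer names one supplier carries, sorted alphabetically."""
    query = select(Beer.name).where(Beer.supplier == supplier).order_by(Beer.name)
    return list(session.execute(query).scalars())


with Session(engine) as session:
    save_beers(
        session,
        [
            {"name": "Sample Stout", "supplier": "Supplier A", "price": 16.9},
            {"name": "Sample IPA", "supplier": "Supplier A", "price": 14.9},
            {"name": "Sample Lager", "supplier": "Supplier B", "price": 12.0},
        ],
    )
    supplier_a_names = names_for_supplier(session, "Supplier A")
```

The inserts and the filtered query are both expressed as Python objects and method calls; SQLAlchemy generates the SQL behind the scenes.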

The web scraper is deployed with the Flask server and is scheduled to run multiple times throughout the day, so whenever you’re browsing the site the products will be up to date.

API

This part turned out to be the shortest and easiest to implement. I created a barebones Flask server that had a ‘/beers’ route and would retrieve beers from the database and send them to the requesting client. In my opinion, Flask’s simplicity is both its strength and its weakness: for smaller projects that require a web server, Flask is super easy to get started with and very easy to understand. For larger projects, however, its lack of structure may cause uncertainty and confusion when choosing how to properly build out your application (similar to Express in the Node.js ecosystem, it’s not very opinionated, leaving the developer with many decisions to make). I personally prefer using Django when building larger web applications with Python, but for this particular project Django seemed like overkill.
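A barebones server of this kind could look something like the sketch below. Only the ‘/beers’ route comes from the description above; the `BEERS` list is a made-up stand-in for the database query, and the field names are assumptions.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Stand-in for the database: a real handler would query the beers table here.
BEERS = [
    {"name": "Sample IPA", "supplier": "Supplier A", "price": 14.9},
    {"name": "Sample Lager", "supplier": "Supplier B", "price": 12.0},
]


@app.route("/beers")
def list_beers():
    """Send every known beer to the requesting client as JSON."""
    return jsonify(BEERS)
```

Flask's built-in test client is enough to try the route without starting a server, for example `app.test_client().get("/beers").get_json()`.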

Client

A simple frontend built with my usual stack: React, TypeScript and TailwindCSS. I discovered this npm package, which made implementing different tabs for different suppliers super easy. You all know about React, so I’m not going to go into much detail here. What I will say is this: if you’re using React but not using Tailwind to style your components, you should be. I still use old school CSS (or SASS) for specific things, but for me Tailwind has become almost as invaluable as React. If I’m building a React project, I’ll be styling it with TailwindCSS.

Deployment

If you’re even remotely interested in the Flask world, you’ve probably heard of Miguel Grinberg, whose books I consider to be our Flask Bible. His blog is just as informative, and I found this article that really helped me understand how to properly deploy a Flask project on a server.

Essentially, Nginx is what holds everything together: it proxies all traffic starting with ‘/api’ to the backend (which is running on a Gunicorn server), and everything else to the React frontend. Not only does Nginx make this really easy to set up, but because my frontend and backend are deployed on the same machine, I never had to deal with any CORS issues in this project, which was… new. The server itself is a DigitalOcean Droplet, which is super easy to deploy and configure.

Serving my app over HTTPS turned out to be a breeze thanks to Cloudflare. I’m not entirely sure how they’re staying in business with their ridiculously good free tier, but I’m very thankful for it.

One feature that I really wanted to add was search: the ability to search for a specific beer and have results from all the different suppliers appear. Unfortunately, there was a slight language barrier: some of the suppliers name their beers in English while others name them in Hebrew. It’s possible that three suppliers are all carrying the same beer, but if two of them list it in Hebrew and you search in English, you’d only get one result. I could implement the search with both English and Hebrew options and make it clear that searching in a certain language will only yield partial results, but I’d really like to find a way to overcome the language obstacle so that I can implement a proper search bar that really shows you all your options when searching for a particular beer.
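To make the problem concrete, here is a small sketch with made-up listings. It first shows the partial result a plain substring search gives, then one possible direction: a hand-maintained alias table that links the English and Hebrew names of the same beer. The `ALIASES` table and both functions are hypothetical and are not part of IsraBrew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    supplier: str
    name: str


# Made-up inventory: three suppliers carry the same beer, two list it in Hebrew.
INVENTORY = [
    Listing("Supplier A", "Sample Stout"),
    Listing("Supplier B", "סטאוט לדוגמה"),
    Listing("Supplier C", "סטאוט לדוגמה"),
    Listing("Supplier A", "Other Lager"),
]

# Hypothetical hand-maintained table: each set holds the names of one beer.
ALIASES = [
    {"sample stout", "סטאוט לדוגמה"},
]


def naive_search(query, inventory):
    """Plain substring match: only finds names in the query's language."""
    q = query.casefold()
    return [item for item in inventory if q in item.name.casefold()]


def alias_search(query, inventory):
    """Expand the query with known aliases before matching."""
    q = query.casefold()
    terms = {q}
    for group in ALIASES:
        if any(q in alias for alias in group):
            terms |= group
    return [
        item
        for item in inventory
        if any(term in item.name.casefold() for term in terms)
    ]


print(len(naive_search("sample stout", INVENTORY)))
print(len(alias_search("sample stout", INVENTORY)))
```

The obvious cost is that someone has to keep the alias table up to date as new beers arrive, so this only moves the language problem rather than solving it.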

If you’re into programming, you can check out the repo here. If you’re in Israel and looking for some good beer, you can check out IsraBrew here.

Cheers 🍻
