DEV Community

Shilleh


Raspberry Pi + Squid: Building a Proxy Server with your Raspberry Pi for Web-scraping

Setting up a proxy server on your Raspberry Pi is simple with Squid, a user-friendly, open-source proxy. This tutorial walks through the setup step by step and demonstrates it with a web-scraping example. A proxy server is useful beyond scraping as well: it can improve security, cache responses so that repeated requests return faster, and centralize connection management.

Before reading the remainder, be sure to subscribe and support the channel if you have not!

Subscribe:
YouTube

Support:
https://www.buymeacoffee.com/mmshilleh

Hire me at UpWork to build your IoT projects:
https://www.upwork.com/freelancers/~017060e77e9d8a1157

Part 1: Setting up the Raspberry Pi

Initial Setup:

  • Ensure your Raspberry Pi 4 is set up with Raspberry Pi OS (formerly Raspbian) or another compatible OS, and that it is connected to the internet.
  • Access your Raspberry Pi terminal through SSH or directly using a monitor and keyboard.

Update and Upgrade Packages:

  • Run sudo apt-get update and sudo apt-get upgrade to ensure all packages are up to date.

Install Squid:

  • Execute sudo apt-get install squid.
  • Once installed, the Squid service should start automatically.

Configure Squid:

  • Back up the original configuration file: sudo cp /etc/squid/squid.conf /etc/squid/squid.conf.backup.
  • Edit the configuration file: sudo nano /etc/squid/squid.conf.


Make sure the following lines are present and uncommented:

acl localnet src 192.168.1.0/24
http_access allow localnet

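These two lines allow only devices on your local home network to connect to the proxy. The range 192.168.1.0/24 covers the addresses 192.168.1.0 through 192.168.1.255, so change it if your network uses a different subnet. The http_access allow localnet line must appear before the final http_access deny all rule, because Squid applies the first rule that matches. This is a very simple setting; you can edit the rules in this file to control which IPs may connect and to tighten other network security parameters. For the sake of this video, we keep it simple.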
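To see which client addresses the acl line above would cover, here is a minimal sketch using Python's standard ipaddress module. It only mirrors the source-address match of the localnet range; it does not reproduce how Squid evaluates its full rule list. The sample client addresses are made up for illustration.

```python
import ipaddress

# The subnet from the acl line; adjust it to match your own network.
LOCALNET = ipaddress.ip_network("192.168.1.0/24")


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside the localnet range."""
    return ipaddress.ip_address(client_ip) in LOCALNET


if __name__ == "__main__":
    # Hypothetical client addresses, chosen only to show both outcomes.
    for ip in ("192.168.1.42", "192.168.2.42", "10.0.0.5"):
        print(ip, "allowed" if is_allowed(ip) else "denied")
```

If your computer's address comes back as denied, widen or change the range in squid.conf to match your router's subnet.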

Restart Squid:

  • Restart the Squid service to apply the changes: sudo systemctl restart squid.

Verify Squid is Running:

  • Check the status of Squid: sudo systemctl status squid.
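The status command confirms the service is active on the Pi itself. To confirm it is also reachable from your local computer, a rough check is to try opening a TCP connection to the proxy port. The sketch below assumes Squid is still on its default port 3128, and the Pi address shown is a placeholder.

```python
import socket

SQUID_PORT = 3128  # Squid's default http_port; change it if you edited squid.conf


def proxy_is_listening(host: str, port: int = SQUID_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    pi_ip = "192.168.1.50"  # placeholder; substitute your Pi's IP address
    print("Squid reachable:", proxy_is_listening(pi_ip))
```

A False result usually points to a wrong IP address, a firewall, or Squid not running; it says nothing about whether your http_access rules will accept the request.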

Get Raspberry Pi IP Address:
To get the Raspberry Pi's IP address, run ifconfig in the terminal and look at the inet address (hostname -I also prints it). You will need this address in the Python script on your local computer, which sends each scraping request to the Pi first so that the Pi acts as the proxy. Squid listens on port 3128 by default, so the proxy address takes the form http://<Pi IP>:3128.
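If you would rather look up the address from Python on the Pi, one common trick is to open a UDP socket toward an outside address and read back which local address the system picked. This is only a sketch: the 8.8.8.8 target is an arbitrary choice used to select a route, and no packets are sent to it.

```python
import socket


def primary_ip():
    """Return the local address used for outbound traffic, or None if no route exists."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            # connect() on a UDP socket sends nothing; it only selects a route.
            s.connect(("8.8.8.8", 80))
            return s.getsockname()[0]
    except OSError:
        return None


if __name__ == "__main__":
    print("Pi IP address:", primary_ip())
```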

Part 2: Local Computer Code

Create a Python script on your local computer using the code linked below and run it. Make sure you substitute your Pi's IP address.

https://github.com/shillehbean/youtube-p2/blob/main/test_proxy.py

This script is designed to scrape web content from a specified URL (in this case, a search page on eBay for laptops) using the Python requests library for making HTTP requests and the BeautifulSoup library from bs4 for parsing HTML content. The script uses a proxy server to make the request.
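For readers who want to see the shape of such a request without opening the repository, here is a minimal sketch. It is not the linked script: it uses only the standard library's urllib in place of requests so that it runs without extra installs, and the Pi address and target URL are placeholders. Port 3128 is Squid's default.

```python
import urllib.request

PI_IP = "192.168.1.50"               # placeholder; substitute your Pi's IP address
SQUID_PORT = 3128                    # Squid's default port
TARGET_URL = "https://example.com/"  # placeholder; the linked script targets an eBay search page


def build_proxies(pi_ip, port=SQUID_PORT):
    """Return a scheme-to-proxy mapping that routes HTTP and HTTPS through the Pi."""
    proxy_url = f"http://{pi_ip}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(url, proxies, timeout=10.0):
    """Fetch url through the given proxies and return the body as text."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    html = fetch_via_proxy(TARGET_URL, build_proxies(PI_IP))
    print(f"Fetched {len(html)} characters through the proxy")
```

The same mapping works with the requests library: requests.get(url, proxies=build_proxies(PI_IP), timeout=10).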

In a real-world scenario, you'd likely want to add more functionality after parsing the HTML to extract and process the specific data you're interested in.
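As one illustration of that extra processing, the sketch below collects the text and target of every link in a page. It uses the standard library's html.parser as a stand-in for BeautifulSoup (where soup.find_all("a") does a similar job), and the sample HTML is invented for the example rather than taken from eBay.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return a list of (text, href) tuples found in html."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Invented sample markup, only to demonstrate the parser.
    sample = '<ul><li><a href="/item/1">Laptop A</a></li><li><a href="/item/2">Laptop B</a></li></ul>'
    for text, href in extract_links(sample):
        print(text, "->", href)
```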

Conclusion:

I hope you enjoyed this quick tutorial. If the Python script ran successfully, your requests are going through your Pi's IP address, which can help with web scraping. Let me know if you have any questions.
If you enjoyed the video, please subscribe to my channel, Shilleh, on YouTube; your support would be appreciated. Ping me with any questions. Thanks, everyone!
