
Attention Web Scrapers and Pen Testers: Slither is now a PyPI package! 🎉

kaelscion ・2 min read

Hey data science, web automation, web scraping, and data aggregation folks. Are you tired of purchasing proxy IP addresses that get blocked by your target site within a couple of days at most? Do you not yet have your own solution for cycling IP addresses and/or user agents? Do you like super salesy pitches like this one and tend to buy things from QVC after being asked stupid rhetorical questions!?! Well then, have I got great news for you!

All kidding aside, I've finally gotten around to uploading my proxy IP and user-agent cycling library, Slither, to PyPI! To check out the GitHub repo, go here; for the PyPI page, head here.

Only Python 3 is supported, and no support for Python 2 is planned. This is my small way of doing my part to encourage Python 3 use over Python 2. To install it in your next project in a Python-3-only environment:

pip install slitherlib

For an environment that has both Python 2 and Python 3 installed:

pip3 install slitherlib

To actually use the library in your scraping projects:


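from slitherlib.slither import Snake
from random import choice

import requests

# Snake exposes the proxy list as .ips and the User-Agent list as .uas
s = Snake()
ip_address = choice(s.ips)   # random proxy, an "ip:port" string
user_agent = choice(s.uas)   # random User-Agent string

headers = {
    "User-Agent": user_agent
}

# Route both HTTP and HTTPS traffic through the chosen proxy
r = requests.get('https://www.google.com',
                 proxies={'https': ip_address,
                          'http': ip_address},
                 headers=headers)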

At this time, Slither pulls IP addresses and User-Agents from free sources around the web and dumps them into two variables, ips and uas. We add new proxy ip:port sources as we find them and verify, to the best of our ability, that they are not run by hackers looking to steal IP address information.
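Since free proxies tend to die or get blocked without warning, one way to use the two lists is to try several proxies in turn until one works. The following is only a sketch under stated assumptions, not part of Slither: fetch_with_rotation, default_fetch, and their parameters are hypothetical names made up for this example, and a status code below 400 is assumed to mean success.

```python
import random


def default_fetch(url, proxy, user_agent, timeout=10.0):
    """Fetch url through proxy with requests and return the HTTP status code."""
    import requests  # imported lazily so the rotation helper has no hard dependency

    response = requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        headers={"User-Agent": user_agent},
        timeout=timeout,
    )
    return response.status_code


def fetch_with_rotation(url, ips, uas, fetch=default_fetch, max_attempts=5, rng=None):
    """Try up to max_attempts proxies in random order.

    ips and uas are lists of strings, like Snake().ips and Snake().uas.
    Returns (proxy, user_agent, status) for the first attempt whose status
    is below 400, or None if every attempt failed.
    """
    rng = rng or random.Random()
    candidates = list(ips)
    rng.shuffle(candidates)
    user_agents = list(uas)

    for proxy in candidates[:max_attempts]:
        user_agent = rng.choice(user_agents)
        try:
            status = fetch(url, proxy, user_agent)
        except OSError:
            # requests' exceptions derive from OSError; treat this proxy as dead
            continue
        if status < 400:
            return proxy, user_agent, status
    return None
```

With Slither installed, this could be called as fetch_with_rotation('https://www.google.com', Snake().ips, Snake().uas). The fetch parameter is injectable so the rotation logic can be exercised without touching the network.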

As this project grows, we hope to build it into a full web-scraping suite with easy concurrency and multiprocessing, robots.txt support, webdriver browser automation, dynamic mouse-moves, and other goodies that will keep the data-collection enthusiast collecting more data and fighting 403 and 404 codes less!

If you like it, please give us a star on GitHub! I welcome bug reports, feature requests, and any comments or concerns you have so that I can make this library the best it can be! And, as always, I LOVE to collaborate so feel free to open a PR if you have improvements or ideas!

Discussion

Mpho Mphego

Does this package work with Selenium firefox proxy authentication?

kaelscion Author

The framework returns proxy ip:port combinations as a list of strings. Yes, it can be used with Selenium Firefox, since the proxy IP and User-Agent overrides accept string arguments.

Basically, treat the Snake() object as a curated list of IP and UA choices that can be used anywhere a string object is accepted as an IP and/or UA argument.
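As a rough illustration of that idea, here is one way an ip:port string and a User-Agent string could be turned into Firefox preferences for Selenium. This is a sketch, not something Slither ships: firefox_proxy_prefs and make_firefox_driver are hypothetical helper names, and it only covers unauthenticated proxies (no username/password handling).

```python
def firefox_proxy_prefs(proxy, user_agent):
    """Map an 'ip:port' string and a User-Agent string to Firefox preferences."""
    host, _, port = proxy.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'ip:port', got {proxy!r}")
    port_number = int(port)
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port_number,
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": port_number,
        "general.useragent.override": user_agent,
    }


def make_firefox_driver(proxy, user_agent):
    """Start Firefox through Selenium with the proxy and User-Agent applied."""
    from selenium import webdriver  # imported lazily; needs selenium and geckodriver

    options = webdriver.FirefoxOptions()
    for name, value in firefox_proxy_prefs(proxy, user_agent).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

Usage would look like make_firefox_driver(choice(s.ips), choice(s.uas)), with s being the Snake() object from the post.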

Were you running into a particular issue using Slither in your Selenium Firefox project?

Mpho Mphego

Beautiful. I haven't used it yet, but I'm looking forward to it. I stumbled upon your YouTube video on Reddit.

Great content, keep it up.

kaelscion Author

Thanks so much! I've got a few videos up and always love to hear when people enjoy my content!