
Introducing Slither: Adding anonymous proxy IPs and randomized User Agents to Scraping Projects

kaelscion ・1 min read

Hey there Devvers! Long time, no see! I have been on vacation for a while and thought I would return with my first open-source project, Slither. Slither is a basic anonymizing framework for adding elite, anonymous HTTPS proxy IPs and pseudo-random User Agents to your web scraping and/or pen-testing projects! It is finally up and running, and it is based heavily on the Bye Bye 403 series that I've been writing. Slither is my first foray into OSS, something I care deeply about, and I hope folks find it useful!

The GitHub repo can be found here and I really hope you all enjoy it! A lot of requests and questions have come from around the web about this topic and how to make it easier to avoid the dreaded 403 (or worse, 503) when you're trying to scrape and/or aggregate data! The framework is dead simple and supports concurrent scraping as well as parallel processes.

Whenever an instance of the Slither class is created, lists of IPs and User-Agents are pulled from proxy sites around the web and assigned to the Slither().ip and Slither().ua variables. Simply plug those two variables into your project's headers and you're off and running!
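Here is a rough sketch of what that plugging-in might look like. It does not import Slither itself, since the exact import path and the format of the values are not shown in this post. The helper `build_request_settings` is a hypothetical name, and the sketch assumes the IP arrives as a "host:port" string that is used as a proxy setting, while the User-Agent goes into the request headers. The example values are placeholders, not real Slither output.

```python
from typing import Dict, Tuple


def build_request_settings(ip: str, ua: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Turn a proxy address and a User-Agent string into request settings.

    Hypothetical helper: `ip` is assumed to look like "host:port"
    (optionally with a scheme), and `ua` is a full User-Agent string.
    """
    # The User-Agent travels in the HTTP headers.
    headers = {"User-Agent": ua}
    # The proxy is configured separately, keyed by URL scheme.
    proxy_url = ip if "://" in ip else f"http://{ip}"
    proxies = {"http": proxy_url, "https": proxy_url}
    return headers, proxies


# Placeholder values standing in for Slither().ip and Slither().ua.
example_ip = "203.0.113.10:8080"  # documentation-only address range
example_ua = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"

headers, proxies = build_request_settings(example_ip, example_ua)
print(headers)
print(proxies)
```

If your project uses the third-party `requests` library, the two dictionaries can be passed straight through, for example `requests.get(url, headers=headers, proxies=proxies, timeout=10)`.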

I really hope this helps some newcomers to web scraping and the emerging field of collecting data for data scientists and ML engineers to use! Please give it a try and leave a comment here or on the repo. Be gentle as this is my first OSS project despite years of working in software 😝. Enjoy and happy scraping!

kaelscion (@kaelscion)

I'm Jake Cahill. Lifetime Pythonista, web scraping and automation expert. Enjoy books. Love my wife, dog, and cat, and think AI and Julia are pretty nifty.

Discussion


The automation tool must be compatible and can provide you not only with anonymity, but also with security since it has a feature called Anytime IP Refresh. With this advanced feature, your IP address can be changed anytime, either by demand or on a pre-set schedule.

 

Great! Yet another tool to train my IP blacklisting app with. A machine learning model against a spam generator... Perfect!

 

Spam generator, pen testing tool, toolset for developing the data aggregators that run most of the web. Who really knows these days? 😜😜

Either way, I'm glad this has gotten the attention of somebody developing a counter to it (because I'm building one of those too 😁😁). The only way to make our tools better is through the friendly competition these types of things inspire toward each other!

Specifically in my corner, Slither is open source so that it can be made better by other automation and security experts. Our "keep bots out" system, Sentinel, is a freemium service that helps me analyze how folks are defeating filters (whether they use Slither or not) so we can better keep them out.

Maybe we can compare notes sometime? I'd love to see a like-minded person's approach to this. I'm always up for learning something new. Anybody who knows me can attest that being wrong is one of my favorite places to be because that means I get to learn from the person who is right. Thanks so much for commenting and I would love to see your blacklisting model in action when it's up and running and pit Slither against it!