Kev the bur

Setting up proxies with Octoparse is a straightforward process

How to Set Up Proxies in Octoparse for Efficient Web Scraping

Octoparse is a no-code tool for extracting data from websites. It includes features like automatic IP rotation and extended session times, which help keep your request patterns within a site's traffic limits. Octoparse also uses machine learning to work through complex site structures, so it can capture text, links, images, and raw HTML content.

Proxies are a key part of keeping scraping workflows smooth and anonymous. This article walks through configuring proxies in Octoparse step by step, using DataImpulse as the example provider.

Why Use Proxies with Octoparse?

When scraping websites, your IP address may get blocked if many requests come from the same source. Proxies help distribute requests across multiple IPs to avoid being throttled or banned. Octoparse’s proxy integration supports both rotating and sticky sessions, making it easier to mimic real user behavior.
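To make the idea of spreading requests across IPs concrete, here is a minimal Python sketch of round-robin proxy assignment. It only illustrates the concept and is not how Octoparse works internally. The proxy addresses are hypothetical placeholders from a documentation address range, and the page URLs are illustrative.

```python
from itertools import cycle

# Hypothetical proxy endpoints (documentation-range addresses, not real proxies).
PROXIES = ["203.0.113.10:8000", "203.0.113.11:8000", "203.0.113.12:8000"]


def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy in round-robin order."""
    pool = cycle(proxies)
    return [(url, next(pool)) for url in urls]


# Illustrative page URLs for the example site used later in this article.
urls = [f"https://books.toscrape.com/catalogue/page-{n}.html" for n in range(1, 7)]
plan = assign_proxies(urls, PROXIES)

for url, proxy in plan:
    print(proxy, "->", url)
```

With six URLs and three proxies, each proxy carries two requests instead of one address carrying all six.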


Step 1: Whitelist Your IP Address

Before configuring proxies in Octoparse, you need to add your IP to the proxy provider’s whitelist. This allows you to connect to their proxy servers without needing to enter login credentials every time.

Here’s how to do it with DataImpulse:

  • Choose your proxy plan from DataImpulse.
  • Go to the “Manage Whitelist IPs” section on their dashboard.
  • Click “Detect my IP” or manually enter your current IP address.
  • Press the “Add new IP” button.

Once your IP is authorized, the proxy servers accept connections from it without asking for a username and password.
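If you enter your IP manually, it has to be the public address the provider sees, not a private LAN address such as 192.168.x.x. The sketch below uses Python's standard ipaddress module to check that before you paste the value into the dashboard. The function name is hypothetical, and whether DataImpulse accepts IPv6 entries is not covered here.

```python
import ipaddress


def is_whitelistable(address):
    """Return True if `address` is a valid, publicly routable IP address."""
    try:
        ip = ipaddress.ip_address(address.strip())
    except ValueError:
        # Not a well-formed IPv4 or IPv6 address.
        return False
    # is_global is False for private, loopback, and other reserved ranges.
    return ip.is_global


for candidate in ("8.8.8.8", "192.168.1.5", "127.0.0.1", "not-an-ip"):
    print(candidate, is_whitelistable(candidate))
```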


Step 2: Install and Launch Octoparse

If you haven’t done so already:

  • Download Octoparse from the official website.
  • Install it and open the app.

Step 3: Create a New Scraping Task

  • Click the +New button at the top-left corner.
  • Select Custom Task.
  • Enter the URL you want to scrape, for example books.toscrape.com.
  • Hit Save.

Step 4: Enable Proxy Settings in Octoparse

  • Once the page loads, click the Settings button in the top-right corner.
  • Scroll down to the Anti-blocking Settings section.
  • Check the box labeled Access websites via proxies.
  • Click the Configure button to open the proxy configuration window.


Step 5: Add DataImpulse Proxies

Now, add your proxy IPs provided by DataImpulse:

  • Paste the proxy addresses into the field in IP:PORT format.
  • For rotating residential proxies, enter the address and port that DataImpulse provides for that proxy type, for example 148.251.5.30:823.
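A malformed entry is easy to miss when pasting a long list. As a rough sketch, the following Python helper checks that an entry matches the IP:PORT format before you paste it. The function name parse_proxy is hypothetical, and the check assumes IPv4 addresses, as in the example above.

```python
import ipaddress
from typing import Tuple


def parse_proxy(entry: str) -> Tuple[str, int]:
    """Split an 'IP:PORT' string into (ip, port), raising ValueError if invalid."""
    host, sep, port_text = entry.strip().rpartition(":")
    if not sep:
        raise ValueError(f"missing ':' in {entry!r}")
    # Raises a ValueError subclass if the host is not a well-formed IPv4 address.
    ipaddress.IPv4Address(host)
    if not port_text.isdigit():
        raise ValueError(f"port is not a number in {entry!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range in {entry!r}")
    return host, port


print(parse_proxy("148.251.5.30:823"))
```

Running each line of your proxy list through a check like this catches missing ports and typos in the address before Octoparse ever tries to connect.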


Step 6: Configure Proxy Rotation

  • Adjust the Switch interval to control how often your proxy IP changes. A short interval gives rotating behavior, while a long interval keeps the same IP for longer, closer to a sticky session.
  • Click Confirm to save your proxy configuration.
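To see how a switch interval maps to rotating versus sticky behavior, here is a small Python model of time-based rotation. It is an assumption about how such a setting could behave, not Octoparse's actual implementation. The function name and proxy addresses are hypothetical.

```python
# Hypothetical proxy endpoints (documentation-range addresses, not real proxies).
PROXIES = ["203.0.113.10:8000", "203.0.113.11:8000", "203.0.113.12:8000"]


def proxy_at(elapsed_seconds, proxies, switch_interval):
    """Return the proxy in use `elapsed_seconds` after the run starts."""
    if switch_interval <= 0:
        raise ValueError("switch_interval must be positive")
    if not proxies:
        raise ValueError("proxies must not be empty")
    # Each interval is one slot; slots wrap around the proxy list.
    slot = int(elapsed_seconds // switch_interval)
    return proxies[slot % len(proxies)]


# With a 60-second interval the proxy changes once per minute.
for t in (0, 30, 60, 125):
    print(t, proxy_at(t, PROXIES, switch_interval=60))
```

In this model, a very large interval means every request in a run goes through the first proxy, which is the sticky case.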


Step 7: Finalize Settings and Create Workflow

  • Check that a checkmark appears next to the Configure button to confirm your proxies are active.
  • Click Save to apply your settings.
  • You will be returned to the main page view.


Step 8: Build Your Scraping Workflow

  • Click the lightbulb icon to open workflow options, including pagination or scrolling.
  • After making your choice, press Create Workflow.
  • Click on a page element you want to extract (e.g., a category like “Mystery”).
  • Choose Extract text of the selected element.


Step 9: Run and Monitor Your Scraping Task

  • Save the extraction, then click Run.
  • Select Run on your device and Standard mode unless you have specific needs requiring other options.
  • The scraping process begins; you can pause and resume as necessary.
  • When finished, stop the run.

Step 10: Export Your Data

  • Review the scrape statistics.
  • Choose to export the data immediately or save for later.
  • Select the desired export format based on your needs.
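After exporting, a quick check of the file can reveal whether blocked requests left gaps in the data. The sketch below assumes you chose a CSV export with a header row. The column names and sample rows are hypothetical, and an in-memory string stands in for the exported file.

```python
import csv
import io
from collections import Counter


def summarize_export(handle):
    """Return (row_count, blanks_per_column) for a CSV file object with a header."""
    reader = csv.DictReader(handle)
    rows = 0
    blanks = Counter()
    for row in reader:
        rows += 1
        for column, value in row.items():
            if column is None:
                # Extra fields beyond the header; not counted here.
                continue
            if not (value or "").strip():
                blanks[column] += 1
    return rows, dict(blanks)


# Hypothetical sample standing in for an exported file.
sample = io.StringIO(
    "category,url\n"
    "Mystery,https://example.com/mystery\n"
    ",https://example.com/unknown\n"
    "Travel,\n"
)
row_count, blank_counts = summarize_export(sample)
print(row_count, blank_counts)
```

For a real export, pass a file opened with open(path, newline="", encoding="utf-8") instead of the in-memory sample.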

Wrapping Up

You’ve now set up proxies in Octoparse, which makes web scraping more reliable and easier to scale. A proxy provider such as DataImpulse supplies the IP rotation and anonymity that large scraping projects depend on.

Start leveraging proxies in your Octoparse workflows to improve performance and reduce the risk of blocking during your next data extraction project.
