How to Set Up Proxies in Octoparse for Efficient Web Scraping
Octoparse is a powerful yet user-friendly tool that lets you extract data from websites without needing to write any code. It includes features like automatic IP rotation and extended session times to help you stay within website traffic rules. With its advanced machine learning, Octoparse can handle complex site structures to capture text, links, images, and even HTML content reliably.
Proxies are a key part of keeping scraping workflows smooth and anonymous. In this article, we'll walk through how to configure proxies in Octoparse step by step, using DataImpulse as an example provider.
Why Use Proxies with Octoparse?
When scraping websites, your IP address may get blocked if many requests come from the same source. Proxies help distribute requests across multiple IPs to avoid being throttled or banned. Octoparse’s proxy integration supports both rotating and sticky sessions, making it easier to mimic real user behavior.
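To make the difference between rotating and sticky sessions concrete, here is a minimal Python sketch. It does not reflect how Octoparse works internally; the proxy addresses are placeholders from a documentation-only range, the page URLs are illustrative, and the function names are made up for this example.

```python
from itertools import cycle

# Hypothetical proxy endpoints (documentation-range addresses, not real proxies).
PROXIES = ["203.0.113.10:8000", "203.0.113.11:8000", "203.0.113.12:8000"]


def rotating_assignment(urls, proxies):
    """Pair each URL with the next proxy in round-robin order."""
    pool = cycle(proxies)
    return [(url, next(pool)) for url in urls]


def sticky_assignment(urls, proxies, session_size):
    """Keep the same proxy for `session_size` consecutive URLs, then move on."""
    return [
        (url, proxies[(i // session_size) % len(proxies)])
        for i, url in enumerate(urls)
    ]


# Illustrative page URLs on the demo site used later in this article.
urls = [f"https://books.toscrape.com/catalogue/page-{n}.html" for n in range(1, 7)]

print("Rotating:")
for url, proxy in rotating_assignment(urls, PROXIES):
    print(" ", proxy, url)

print("Sticky (2 requests per proxy):")
for url, proxy in sticky_assignment(urls, PROXIES, session_size=2):
    print(" ", proxy, url)
```

Rotating spreads consecutive requests across different IPs, while sticky keeps several requests on one IP, which tends to suit multi-page flows that depend on a consistent session.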
Step 1: Whitelist Your IP Address
Before configuring proxies in Octoparse, you need to add your IP to the proxy provider’s whitelist. This allows you to connect to their proxy servers without needing to enter login credentials every time.
Here’s how to do it with DataImpulse:
- Choose your proxy plan from DataImpulse.
- Go to the “Manage Whitelist IPs” section on their dashboard.
- Click “Detect my IP” or manually enter your current IP address.
- Press the “Add new IP” button.
This step ensures your IP is authorized, setting the stage for hassle-free proxy usage.
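If you enter the address manually, it has to be your public IP, not a local network address such as 192.168.x.x. The sketch below is one way to sanity-check a value before pasting it into the whitelist form; `is_whitelistable` is a hypothetical helper name, and the check only uses Python's standard `ipaddress` module.

```python
import ipaddress


def is_whitelistable(ip_text):
    """Return True if ip_text is a valid, publicly routable IP address."""
    try:
        ip = ipaddress.ip_address(ip_text.strip())
    except ValueError:
        # Not a syntactically valid IPv4 or IPv6 address.
        return False
    # Private, loopback, and reserved ranges are not visible to the provider.
    return ip.is_global


for candidate in ["8.8.8.8", "192.168.1.20", "999.1.1.1"]:
    print(candidate, "->", is_whitelistable(candidate))
```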
Step 2: Install and Launch Octoparse
If you haven’t done so already:
- Download Octoparse from the official website.
- Install it and open the app.
Step 3: Create a New Scraping Task
- Click the +New button at the top-left corner.
- Select Custom Task.
- Enter the URL you want to scrape, for example books.toscrape.com.
- Hit Save.
Step 4: Enable Proxy Settings in Octoparse
- Once the page loads, click the Settings button in the top-right corner.
- Scroll down to the Anti-blocking Settings section.
- Check the box labeled Access websites via proxies.
- Click the Configure button to open the proxy configuration window.
Step 5: Add DataImpulse Proxies
Now, add your proxy IPs provided by DataImpulse:
- Paste the proxy addresses into the field in IP:PORT format.
- For rotating residential proxies, specify the corresponding address and port, for example 148.251.5.30:823.
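A malformed entry is an easy mistake to make when pasting a long list. As a rough sketch, the IP:PORT format can be checked in Python before pasting; `parse_proxy` is a hypothetical helper written for this article, and it assumes IPv4-style entries like the example above.

```python
import ipaddress


def parse_proxy(entry):
    """Split an 'IP:PORT' string into (ip, port); raise ValueError if malformed."""
    host, sep, port_text = entry.strip().rpartition(":")
    if not sep:
        raise ValueError(f"missing ':' separator in {entry!r}")
    ip = ipaddress.ip_address(host)  # raises ValueError for an invalid IP
    port = int(port_text)            # raises ValueError for a non-numeric port
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range in {entry!r}")
    return str(ip), port


# The example address from this step, plus two deliberately broken entries.
for entry in ["148.251.5.30:823", "148.251.5.30", "148.251.5.30:99999"]:
    try:
        print(entry, "->", parse_proxy(entry))
    except ValueError as err:
        print(entry, "-> invalid:", err)
```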
Step 6: Configure Proxy Rotation
- Adjust the Switch interval to control how often your proxy IP changes. This depends on whether you prefer rotating or sticky sessions.
- Click Confirm to save your proxy configuration.
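The effect of a switch interval can be pictured with a small sketch. This is only an illustration of the idea, not Octoparse's actual implementation: it assumes the interval is expressed in seconds and that proxies are used in list order, and the names and addresses are placeholders.

```python
def proxy_for_time(elapsed_seconds, switch_interval, proxies):
    """Pick the proxy that would be active after `elapsed_seconds` of running."""
    if switch_interval <= 0:
        raise ValueError("switch_interval must be positive")
    # Each interval is one "slot"; slots wrap around the proxy list.
    slot = int(elapsed_seconds // switch_interval)
    return proxies[slot % len(proxies)]


# Hypothetical proxy endpoints (documentation-range addresses).
PROXIES = ["203.0.113.10:8000", "203.0.113.11:8000", "203.0.113.12:8000"]

# With a 30-second interval, the active proxy changes every 30 seconds.
for t in (0, 29, 30, 75, 95):
    print(f"t={t:>3}s ->", proxy_for_time(t, 30, PROXIES))
```

A short interval behaves like a rotating session, and a long one approximates a sticky session, since the same IP stays in use across many requests.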
Step 7: Finalize Settings and Create Workflow
- Check that a checkmark appears next to the Configure button to confirm your proxies are active.
- Click Save to apply your settings.
- You will be returned to the main page view.
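If you want to confirm outside Octoparse that a whitelisted proxy accepts connections, a standard-library sketch along these lines may help. The helper names are hypothetical, the proxy address is a placeholder, and it assumes an HTTP proxy that needs no credentials because your IP was whitelisted in Step 1.

```python
import urllib.request


def build_proxy_opener(proxy):
    """Return a urllib opener that routes HTTP and HTTPS traffic via `proxy`."""
    handler = urllib.request.ProxyHandler({
        "http": f"http://{proxy}",
        "https": f"http://{proxy}",
    })
    return urllib.request.build_opener(handler)


def fetch_status(url, proxy, timeout=10):
    """Fetch `url` through `proxy` and return the HTTP status code."""
    opener = build_proxy_opener(proxy)
    with opener.open(url, timeout=timeout) as response:
        return response.status


# Building the opener does not touch the network; only fetch_status does.
opener = build_proxy_opener("203.0.113.10:8000")  # placeholder address
print(type(opener).__name__)
```

Calling `fetch_status("https://books.toscrape.com", "IP:PORT")` with one of your real proxy entries should return 200 when the proxy is reachable and your IP is authorized; a timeout or connection error suggests the whitelist or address needs another look.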
Step 8: Build Your Scraping Workflow
- Click the lightbulb icon to open workflow options, including pagination or scrolling.
- After making your choice, press Create Workflow.
- Click on a page element you want to extract (e.g., a category like “Mystery”).
- Choose Extract text of the selected element.
Step 9: Run and Monitor Your Scraping Task
- Save the extraction, then click Run.
- Select Run on your device and Standard mode unless you have specific needs requiring other options.
- The scraping process begins; you can pause and resume as necessary.
- When finished, stop the run.
Step 10: Export Your Data
- Review the scrape statistics.
- Choose to export the data immediately or save for later.
- Select the desired export format based on your needs.
Wrapping Up
You’ve now set up proxies with Octoparse, which makes web scraping more reliable and easier to scale. A proxy provider such as DataImpulse helps keep your IPs rotating and your own address hidden, both of which matter for large scraping projects.
Start leveraging proxies in your Octoparse workflows to improve performance and reduce the risk of blocking during your next data extraction project.





