I am currently working on a project that involves scraping public data from Facebook and researching data extraction strategies.
Solutions such as Apify have actors that do all the heavy lifting for you, but the cost of the platform subscription grows quickly with the amount of data you need.
I'm trying to decide whether to just build a custom, cheaper alternative using open source tools.
Before I go ahead and write the codebase, I wanted to open this up to the community for your hands-on advice.
The Stack I'm Considering
If I go the open source route, I'm thinking of using programmatic browser automation with Playwright (C#/.NET) or Puppeteer + puppeteer-extra-plugin-stealth (Node.js), with rotating residential proxies to get past anti-bot blocks.
My Questions to the Community:
- Can an open source build really win the long game against Facebook's anti-scraping defenses? Or will I end up spending 90% of my time fixing broken scripts whenever selectors or security algorithms change?
- Is it really cheaper to build it yourself? Does it still beat Apify pricing at scale once you factor in the monthly cost of premium rotating residential proxies and cloud infrastructure?
- What is your favorite stealth setup? If you've managed to build a custom scraper for heavily protected sites, what open source tools or configuration did you use to avoid getting blocked immediately?

Would love to hear about your experiences, horror stories, or success metrics on build vs buy for social media scraping!
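
To make the cost question concrete, here is a rough sketch of how I'm framing the break-even point. It is only a back-of-the-envelope model: every field name and input (`proxy_usd_per_gb`, `mb_per_record`, `usd_per_1k_records`, and so on) is my own assumption, and it treats the managed platform as a flat per-record price, which may not match how Apify or any other provider actually bills.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiyCosts:
    """Monthly cost inputs for a self-built scraper (all hypothetical)."""
    proxy_usd_per_gb: float      # residential proxy price per GB of traffic
    mb_per_record: float         # average traffic consumed per scraped record
    infra_usd_per_month: float   # servers, storage, monitoring
    maintenance_hours: float     # hours per month spent fixing broken scripts
    hourly_rate_usd: float       # what an hour of that time is worth


def diy_fixed_cost(c: DiyCosts) -> float:
    """Costs that do not depend on volume: infrastructure plus maintenance time."""
    return c.infra_usd_per_month + c.maintenance_hours * c.hourly_rate_usd


def diy_cost_per_record(c: DiyCosts) -> float:
    """Proxy traffic cost attributable to one record."""
    return c.mb_per_record / 1024 * c.proxy_usd_per_gb


def diy_monthly_cost(records: int, c: DiyCosts) -> float:
    """Total monthly cost of the self-built route at a given volume."""
    return diy_fixed_cost(c) + records * diy_cost_per_record(c)


def managed_monthly_cost(records: int, usd_per_1k_records: float) -> float:
    """Total monthly cost of a managed platform, assuming flat per-record pricing."""
    return records / 1000 * usd_per_1k_records


def break_even_records(c: DiyCosts, usd_per_1k_records: float) -> Optional[float]:
    """Monthly volume above which DIY is cheaper, or None if it never is."""
    saving_per_record = usd_per_1k_records / 1000 - diy_cost_per_record(c)
    if saving_per_record <= 0:
        return None  # proxies alone cost more per record than the platform
    return diy_fixed_cost(c) / saving_per_record
```

If you have real numbers for proxy traffic per record or for how many hours a month you spend on maintenance, plugging them into `break_even_records` would tell me a lot more than my guesses. The maintenance term is the one I trust least, since it depends on how often the site changes.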