conjurer

Web scraping - Interesting!

A cool term:
cron = a job scheduler on Unix-like systems that runs tasks automatically at specified times or intervals (handy for re-running a scraper on a schedule)
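To get a feel for "run this task at specified intervals", here is a minimal Python sketch using the standard library's `sched` module. It is not cron itself, just the same idea in miniature. The function name `run_at_intervals` and the stand-in task are made up for this illustration.

```python
import sched
import time


def run_at_intervals(task, interval_seconds, repeats):
    """Run `task` every `interval_seconds`, `repeats` times in total."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    # Queue every run up front: the first after one interval, the second after two, etc.
    for i in range(1, repeats + 1):
        scheduler.enter(interval_seconds * i, 1, task)
    # Blocks until all queued runs have executed.
    scheduler.run()


runs = []
# Hypothetical task: a real one would call your scraper instead.
run_at_intervals(lambda: runs.append("scraped"), interval_seconds=0.01, repeats=3)
print(len(runs), "runs completed")
```

With real cron you would not keep a Python process alive like this. You would add an entry to your crontab, and the system would start the script for you.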

Web what?

When researching for a project, we usually copy info from various sites into a diary, an Excel sheet, a doc, etc.
In other words, we are scraping the web and extracting data manually.

Web scraping is automating this.
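As a rough idea of what "automating this" looks like, here is a small sketch that pulls product names and prices out of a page using only Python's built-in `html.parser`. The HTML snippet, the class names `product`, `name` and `price`, and the helper `scrape_products` are all made up for this example. A real site will have its own markup, and you would download the page first.

```python
from html.parser import HTMLParser

# Hypothetical page markup, invented for this sketch.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Runner X</span><span class="price">$59.99</span></li>
  <li class="product"><span class="name">Court Classic</span><span class="price">$74.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects one dict per <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # which field ("name" or "price") we are currently inside

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self.products.append({})
        elif tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a name/price span.
        if self._field and self.products:
            self.products[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_products(html):
    parser = ProductParser()
    parser.feed(html)
    return parser.products


for product in scrape_products(SAMPLE_HTML):
    print(product["name"], product["price"])
```

This is exactly the diary/Excel job from above, except the code does the copying.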


Example

When you google, say, sneakers online, you get a list of websites with products and prices, and the Shopping tab shows a more detailed listing, right?
Google has collected that data from the websites for you, so it can show sneakers from different sites in one place.
This technique is used by many big companies for their businesses, since the amount of data online keeps growing quickly.

Web Crawler

Crawling also fetches information, but it differs from scraping: a crawler follows links across many websites to discover and index pages, whereas a scraper extracts specific data from a single website.

Crawling is used for things like SEO analysis, while scraping is used for gathering data.
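To make the difference concrete, here is a minimal crawler sketch: it starts at one URL, follows links breadth-first, and records every page it discovers. To keep it runnable without a network, the "web" is a made-up dictionary (`FAKE_WEB`) mapping each URL to the links on that page. The `crawl` function and its `get_links` parameter are names chosen for this illustration.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: URL -> links found on that page.
FAKE_WEB = {
    "https://a.example/": ["https://a.example/shoes", "https://b.example/"],
    "https://a.example/shoes": ["https://a.example/"],
    "https://b.example/": ["https://b.example/deals"],
    "https://b.example/deals": [],
}


def crawl(start_url, get_links, max_pages=100):
    """Visit pages breadth-first and return them in the order they were indexed."""
    index = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(index) < max_pages:
        url = queue.popleft()
        index.append(url)
        for link in get_links(url):
            if link not in seen:  # skip pages that are already queued or visited
                seen.add(link)
                queue.append(link)
    return index


pages = crawl("https://a.example/", lambda url: FAKE_WEB.get(url, []))
print(pages)
```

Notice the crawler never looks at prices or product names. It only cares about which pages exist and how they link to each other. A scraper would then be pointed at one of those pages to extract the data.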

Famous web scraping technologies:

Issues!

Notice that it's not a user making the requests to get info from the site, it's the code we wrote! If a website detects that the traffic is automated, it may quickly block the IP address.
This has given rise to hurdles such as:

  1. Captchas
  2. Rate limiting
  3. Dynamic content

Goal: simulate how humans work!
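One small piece of "working like a human" is simply not firing requests as fast as the machine can. Below is a sketch that waits a random amount of time between requests. The function `fetch_politely` and its parameters are made up for this example, and the `fetch` argument is whatever function you use to download a page (a stand-in is used here so the sketch runs offline). Pacing alone does not guarantee a site won't block you.

```python
import random
import time


def fetch_politely(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing a random time between requests."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Random pause so requests don't arrive at a fixed, machine-like rhythm.
            sleep(random.uniform(min_delay, max_delay))
        results[url] = fetch(url)
    return results


# Stand-in fetch function and tiny delays, only so the example finishes quickly.
demo = fetch_politely(
    ["https://a.example/1", "https://a.example/2"],
    fetch=lambda url: f"<html>page for {url}</html>",
    min_delay=0.01,
    max_delay=0.02,
)
print(len(demo), "pages fetched")
```

Passing `sleep` in as a parameter is just a convenience: it lets you swap in a fake sleep when testing.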

Bright Data automates this job. It even rotates IPs so that requests can't easily be tied to one user, and it unblocks sites for the user (paid version!).

Shoutout to JSM for the wonderful explanation.
Ps:
captcha
Lol!
