What is Web Scraping?
Ans: Try Google Search lol.
As the name suggests, it's just scraping data off websites using the HTTP request-response cycle.
The request-response cycle is just two-way communication between computers: the servers are like the users in a chat, and the browser is just an interface, like a messenger app.
in your native terminal.
That's also web scraping of sorts.
But let's just cut to the chase.
So today we are gonna be scraping all the images from udemy.com. Sounds fun :o!!!
To get started, you first need to install a couple of Python modules:
requests, bs4 and lxml.
The requests package is used to make requests to the server,
bs4 is a tool that makes stuff look pretty and easy to work with,
and lxml, well, that's a parser for the HTML markup in the response, and it goes along with bs4.BeautifulSoup() like a sauce.
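To see how bs4 and lxml fit together, here is a minimal sketch. The HTML string is made up and just stands in for the text of a real server response, and the example.com URLs are placeholders, so treat it as an illustration rather than the exact code for this tutorial.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a real response body (response.text).
html = """
<html><body>
  <img src="https://example.com/a.png" alt="A">
  <img src="/static/b.jpg">
  <img alt="no source here">
</body></html>
"""

# "lxml" names the parser BeautifulSoup should use; it is installed separately.
soup = BeautifulSoup(html, "lxml")

# Collect the src attribute of every <img> tag that actually has one.
image_sources = [img["src"] for img in soup.find_all("img", src=True)]
print(image_sources)
```

Passing src=True to find_all() skips the img tags without a source, so the list only contains values you can actually request later.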
We're ready and good to go.
Now just follow along.
Step 7: Make a request to the corresponding server to receive image data.
So now we have the image source, which can be passed to requests.get() to ask some other server for "The Image Data". But we've got to check the type first, because in HTML5 an SVG can also be used as an image source, and it may be represented differently from a regular image file. In short, we can certainly save SVGs with this method too, but I just haven't added that functionality yet!
In fact, you can fetch any type of image data: GIFs, PNGs, JPGs, you name it.
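Here is one way this step could look. It's a sketch, not the exact script from this post: the helper names is_svg and download_image are made up, the SVG check is a simple guess based on the URL, and it assumes the source is a full http(s) URL. The get parameter is only there so the function can be tried without hitting a real server.

```python
import os
from urllib.parse import urlparse

import requests


def is_svg(src):
    """Guess whether a source is an SVG (by data URI prefix or file extension)."""
    if src.startswith("data:image/svg+xml"):
        return True
    return urlparse(src).path.lower().endswith(".svg")


def download_image(src, dest_dir=".", get=requests.get):
    """Fetch src and save the raw bytes in dest_dir; return the path, or None for SVGs."""
    if is_svg(src):
        return None  # SVGs are skipped here, just like in the text above
    response = get(src, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    # Use the last path segment as the file name; fall back to a generic one.
    filename = os.path.basename(urlparse(src).path) or "image"
    path = os.path.join(dest_dir, filename)
    with open(path, "wb") as fh:  # "wb" because image data is bytes, not text
        fh.write(response.content)
    return path
```

Inside your loop you would call something like download_image(src, "images") for each source and ignore the ones that come back as None.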
Step 10: Bruh, you did it! Congrats!!!
Let the loop complete its job.
Now you have a new script to show off.
You did a great job. Now you can scrape any type of data you like from a normal website.
Before you go, there is one more thing.
You can also fetch image file names from those source URLs.
I prefer to do that with a regex, and believe me, it's weird to see those funny-looking patterns at first, so please don't judge.
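For the curious, a pattern along these lines could do the job. This is just a sketch: the name filename_from_url, the list of extensions and the example URL are my own assumptions here, so adjust them to whatever your sources look like.

```python
import re

# Capture the last path segment ending in a known image extension,
# then allow an optional query string (?...) or fragment (#...) before the end.
FILENAME_PATTERN = re.compile(
    r"([^/?#]+\.(?:png|jpe?g|gif|webp))(?:[?#].*)?$",
    re.IGNORECASE,
)


def filename_from_url(src):
    """Return the image file name found in src, or None if there isn't one."""
    match = FILENAME_PATTERN.search(src)
    return match.group(1) if match else None


print(filename_from_url("https://example.com/images/course-banner.jpg?w=240"))
```

The [^/?#]+ part is what keeps the match inside a single path segment, so slashes, query strings and fragments never end up in the file name.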
Go to some site like the one displayed above and find out the number of words on the page.
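If you get stuck, here is one possible starting point. It's only a sketch: count_words is a made-up helper, the sample markup is invented, and "a word" here simply means anything separated by whitespace. For a real page you would pass in the text of your response instead.

```python
from bs4 import BeautifulSoup


def count_words(html):
    """Count whitespace-separated words in the visible text of an HTML document."""
    soup = BeautifulSoup(html, "lxml")
    # Drop script and style contents, which are not text a visitor reads.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # split() with no arguments splits on any run of whitespace.
    return len(soup.get_text(separator=" ").split())


sample = (
    "<html><body><h1>Hello scraping world</h1>"
    "<p>Four more words here.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(count_words(sample))
```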
I'd love to hear about your progress in the comments section.
Have a beautiful day. ✨