Baby steps to learning browser automation and web scraping (Intro)
1's Self Solutions Feb 04, 2018
What is browser automation?
Before we talk about anything else, let’s get some background on what it is all about.
Browser automation means you write a program that emulates what a human will do in a browser. Are you familiar with this:
This is a mechanism to fight against bots. Browser automation scripts are bots, and they can do far more in a short time than a human, or even a group of people.
Why browser automation?
Testing web applications
One of the most legitimate uses of browser automation is automated testing. Testing can be boring, and there are times you’ll have to test a system by entering thousands of rows of data. Do you do it manually? Or hire 500 more SQA guys?
Another common use of browser automation is web scraping. I will explain this with an example: say you have built a mobile app that lets users place food orders, and you need data on all the restaurants in your country. Some agency has it readily available, but they won’t give it to you for free; it costs $1000. So you create a script that visits their website and scrapes all the data, page by page.
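To make the "page by page" idea a bit more concrete, here is a minimal sketch of the parsing half of a scraper, using only Python 3's standard library. The markup in `SAMPLE_PAGE`, the `restaurant`, `name` and `city` class names, and the sample rows are all made up for illustration; a real site will look different, and fetching the pages is left out entirely.

```python
from html.parser import HTMLParser

# Hypothetical markup: a real listing page will use different tags and classes.
SAMPLE_PAGE = """
<ul>
  <li class="restaurant"><span class="name">Mama Kitchen</span> <span class="city">Lagos</span></li>
  <li class="restaurant"><span class="name">Suya Spot</span> <span class="city">Abuja</span></li>
</ul>
"""


class RestaurantParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="city">."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per restaurant
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "restaurant":
            self.rows.append({})  # start a new restaurant record
        elif tag == "span" and css_class in ("name", "city") and self.rows:
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape(pages):
    """Parse a sequence of HTML strings and return all restaurant rows."""
    parser = RestaurantParser()
    for html in pages:
        parser.feed(html)
    return parser.rows
```

Calling `scrape([SAMPLE_PAGE])` returns one dictionary per restaurant. With browser automation, the HTML strings would come from pages the script visits rather than from a hard-coded sample.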
From account creation to sending emails, to filling survey checkboxes, to tweeting… (There might be roadblocks and mechanisms to stop bots, but that doesn’t mean it’s not doable). You can automate anything you do with a browser. In a time when the internet is our home, this means a lot.
Python: You can use either Python 2 or Python 3; we’ll try to write our code so it runs on both. Check our Python tutorials if you have never written Python code.
Obviously you need a computer (I know you want to hit me right now)
Chromedriver for Chrome, geckodriver for Firefox. Download and extract them, then put them in your Python folder so they’re available in PATH, or put them anywhere and add that directory to your PATH. Don’t know how? Check this out. If you didn’t start Python with us, your Python folder may not be in your PATH, so you may have to add it.
HTML: You should be familiar with HTML tags.
For Linux users, put chromedriver and geckodriver somewhere that’s in your PATH variable. Use the command echo $PATH to check the list of directories, or just put them in any directory and add that folder to your PATH.
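Whichever way you set it up, it helps to confirm that the drivers can actually be found. The sketch below is one way to check from Python 3 using the standard library's `shutil.which`; the function name `find_drivers` and its `extra_dir` parameter are hypothetical names chosen for this example.

```python
import os
import shutil


def find_drivers(names=("chromedriver", "geckodriver"), extra_dir=None):
    """Return {driver name: full path, or None if it was not found}."""
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        # Also look in a directory that is not (yet) part of PATH.
        search_path = extra_dir + os.pathsep + search_path
    return {name: shutil.which(name, path=search_path) for name in names}


if __name__ == "__main__":
    for name, location in find_drivers().items():
        print(name, "->", location or "not found on PATH")
```

If a driver is reported as not found, the directory you put it in is probably not in your PATH yet.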
A little HTML
In our next lesson, we’ll talk just a little about HTML tags before we begin.
This is the framework we will be using. There is so much to this framework, especially when combined with XPath (we will talk about this in depth as we go on).
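As a small taste of what is coming, here is a rough sketch of a browser automation script. It assumes the framework is Selenium WebDriver for Python (the library that chromedriver and geckodriver are normally used with), and that Selenium, Chrome and a matching chromedriver are installed. The function name `fetch_title_and_links` is made up for this example.

```python
def fetch_title_and_links(url):
    """Open url in Chrome and return (page title, list of link URLs)."""
    # Assumption: Selenium is installed. It is imported here so that
    # defining this function does not fail on machines without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # needs Chrome plus a matching chromedriver
    try:
        driver.get(url)
        # The XPath "//a[@href]" selects every <a> element with an href.
        links = [a.get_attribute("href")
                 for a in driver.find_elements(By.XPATH, "//a[@href]")]
        return driver.title, links
    finally:
        driver.quit()  # always close the browser, even if something fails
```

You would call it as `title, links = fetch_title_and_links("https://example.com")`; a real browser window opens, loads the page, and closes again once the data has been collected.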
I hope I have lit something within you that makes you anticipate the next post. 😀 Stay with us.