Retrieving anchor tags using BeautifulSoup

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

The examples in this post use urllib.request, which is part of the Python 3 standard library, so run them with Python 3.

You need to install Beautiful Soup first, for example with pip install beautifulsoup4.

We will import urllib, ssl, and Beautiful Soup.

Urllib is a package that collects several modules for working with URLs:
urllib.request for opening and reading URLs
urllib.error containing the exceptions raised by urllib.request
urllib.parse for parsing URLs
urllib.robotparser for parsing robots.txt files
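
As a quick illustration of one of these modules, here is a minimal sketch that uses urllib.parse to split a URL into its parts. The URL is a made-up example and not one used elsewhere in this post.

```python
from urllib.parse import urlparse

# Hypothetical URL, used only for illustration.
parts = urlparse("https://www.example.com/blog/post?id=7")

print(parts.scheme)  # https
print(parts.netloc)  # www.example.com
print(parts.path)    # /blog/post
print(parts.query)   # id=7
```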

Secure Sockets Layer (SSL) is a networking protocol designed for securing connections between web clients and web servers over an insecure network, such as the internet. After being formally introduced in 1995, SSL made it possible for a web server to securely enable online transactions between consumers and businesses.

Note: Don't worry about the SSL setup lines in the code. They are just a way to ignore errors if the site you fetch has SSL certificate problems.

So, let's start.
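
A minimal sketch of the setup could look like this, assuming the standard ssl module is used to skip certificate verification. The variable name ctx is our own choice, and turning off certificate checks is only reasonable for practice scripts like this one.

```python
import ssl
import urllib.request
from bs4 import BeautifulSoup

# Ignore SSL certificate errors (fine for practice, not for production code).
ctx = ssl.create_default_context()
ctx.check_hostname = False       # must be disabled before changing verify_mode
ctx.verify_mode = ssl.CERT_NONE  # do not validate the server certificate
```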

Now we ask the user to enter the URL. We use urllib.request.urlopen().read() to read the raw data from the web page. Then we pass this data to Beautiful Soup, which deals with the messy markup and decodes the bytes (usually UTF-8) into Unicode, the text representation Python uses for strings.
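
This step can be sketched as a small helper. The function name fetch_soup and the choice of the built-in "html.parser" are assumptions made for this example, so swap in another parser if you prefer one.

```python
import ssl
import urllib.request
from bs4 import BeautifulSoup

def fetch_soup(url):
    """Download the page at `url` and return it as a BeautifulSoup object."""
    # Context that ignores SSL certificate errors, as described in the note.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    # read() returns raw bytes; Beautiful Soup decodes them to Unicode text.
    html = urllib.request.urlopen(url, context=ctx).read()
    return BeautifulSoup(html, "html.parser")
```

To ask the user for the address, we can call it as soup = fetch_soup(input("Enter URL: ")).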

Now, we retrieve all the anchor tags.
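
Here is a short sketch of that call. To keep it runnable without a network connection, it parses a small hypothetical HTML string instead of a downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used here instead of a downloaded one.
html = """
<p>Links:</p>
<a href="https://example.com/one">One</a>
<a href="https://example.com/two">Two</a>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all("a") returns every <a> tag in the document, in order.
tags = soup.find_all("a")
print(len(tags))  # 2
```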

Then we loop through the tags and pull out the value of each href attribute.
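
The loop might look like the sketch below. The sample HTML is hypothetical, and it includes one anchor without an href to show what get() returns in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second anchor has no href attribute.
html = '<a href="https://example.com/one">One</a> <a name="top">No link</a>'
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for tag in soup.find_all("a"):
    # get() returns None when the tag has no href attribute.
    hrefs.append(tag.get("href"))

print(hrefs)  # ['https://example.com/one', None]
```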

So, let's try it out.

That's it. If you have any doubts, feel free to comment.
