DEV Community

Raman Bansal

Extracting Wikipedia data in Python 2022

Wikipedia is the world's largest free encyclopedia. Its data is easily accessible through a Python library called wikipedia.
This tutorial shows how to use it.

Installation

Before using this API, we first have to install it manually, because it is not part of Python's standard library. Just type the following command in your command prompt.

$ pip install wikipedia

Searching Wikipedia

First, let's see how to run a search query against Wikipedia with this API.

Search Method

The search method returns the list of search results for our query. Just like Google, Wikipedia has its own search engine.
Here is the code.

import wikipedia
result = wikipedia.search("Tesla Inc.")
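
If we only want the first few titles, search() also accepts a results argument. Here is a minimal sketch of that; top_titles is a hypothetical helper name, not part of the library, and its search parameter exists only so the function can be tried with a stand-in instead of a live request.

```python
def top_titles(query, limit=3, search=None):
    """Return at most `limit` article titles matching `query`."""
    if search is None:
        # Imported lazily so the helper can be exercised without the package
        import wikipedia  # third-party: pip install wikipedia
        search = wikipedia.search
    # `results` caps how many titles the search returns
    return search(query, results=limit)
```

Calling top_titles("Tesla Inc.") without the search argument sends the query to Wikipedia itself.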

Suggestions

Wikipedia can also suggest an alternative query when we use the search() method. If we pass suggestion=True to this method, it returns a suggested query along with the results (if there is one).

result = wikipedia.search("Tesla Inc.", suggestion=True)

Result

(["Tesla Inc.", ....], None)

This returns a tuple that contains our search results and a suggestion. There is no suggestion for our query, which is why the second element is None.
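
Because the result is a tuple, it is convenient to unpack it into two names straight away. The sketch below does that and builds a short report; search_report is a hypothetical helper, and the search parameter is an assumption added so the function can run against a stand-in.

```python
def search_report(query, search=None):
    """Return a one-line report built from search(query, suggestion=True)."""
    if search is None:
        import wikipedia  # third-party: pip install wikipedia
        search = wikipedia.search
    # With suggestion=True the call returns a (titles, suggestion) tuple
    titles, suggestion = search(query, suggestion=True)
    if suggestion is None:
        return f"{len(titles)} result(s), no suggestion"
    return f"{len(titles)} result(s), did you mean {suggestion!r}?"
```

A misspelled query is the typical case where the second element is a string rather than None.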

Getting Summary

With this API, we can get the summary of any article published on Wikipedia by its title. For this, we just have to run the following code:

wikipedia.summary("Google maps")

You have to pass the title of an article published on Wikipedia.
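
A title can be ambiguous or may not exist at all, and in those cases the library raises DisambiguationError or PageError from wikipedia.exceptions. The following is only a sketch of one way to handle that; safe_summary and its wiki parameter are hypothetical names introduced here, and the fallback messages are my own wording.

```python
def safe_summary(title, sentences=2, wiki=None):
    """Return a short summary, or a fallback message if the lookup fails."""
    if wiki is None:
        import wikipedia as wiki  # third-party: pip install wikipedia
    try:
        # `sentences` limits the length of the returned summary
        return wiki.summary(title, sentences=sentences)
    except wiki.exceptions.DisambiguationError as err:
        # err.options holds the candidate article titles
        return "Ambiguous title, options include: " + ", ".join(err.options[:3])
    except wiki.exceptions.PageError:
        return f"No page found for {title!r}"
```

With no wiki argument, safe_summary("Google maps") talks to Wikipedia directly.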

Languages

The wikipedia module lets us change the language in which articles are retrieved.

wikipedia.set_lang("fr")

In the above code, fr is the language code for French.
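
Note that set_lang() changes the language for every later call, not just the next one. Here is a rough sketch that fetches one summary in another language and then switches back; summary_in is a hypothetical helper, and switching back to "en" assumes English (the library default) was active before.

```python
def summary_in(lang, title, wiki=None):
    """Fetch a one-sentence summary of `title` from the `lang` Wikipedia."""
    if wiki is None:
        import wikipedia as wiki  # third-party: pip install wikipedia
    wiki.set_lang(lang)
    try:
        return wiki.summary(title, sentences=1)
    finally:
        # Assumption: English was the active language before this call
        wiki.set_lang("en")
```

The title has to exist on the target Wikipedia, so for French we would pass the French article title.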

Supported Languages

To get the list of supported languages, run this code:

wikipedia.languages()

The languages() method returns a dictionary that maps each supported language code to the name of that language.
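
Since the return value of languages() is a dictionary keyed by language code, checking a code before passing it to set_lang() is a simple lookup. A small sketch, where language_name is a hypothetical helper and the languages parameter is only there so a ready-made dictionary can be passed in:

```python
def language_name(code, languages=None):
    """Return the language name for `code`, or None if it is not supported."""
    if languages is None:
        import wikipedia  # third-party: pip install wikipedia
        languages = wikipedia.languages()
    # dict.get returns None for codes that are not in the mapping
    return languages.get(code)
```

For example, we could call language_name("fr") and only run wikipedia.set_lang("fr") when the result is not None.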

Page Details

This wikipedia API also lets us access any page hosted on the Wikipedia website. To access the page details, first run this code:

India = wikipedia.page("India")

Now, we will use this India variable (a WikipediaPage instance) to get the page details.

Title and URL

To get the title of the page, just enter the following code:

India.title

To get the URL of the page, we can just enter the following code:

India.url

Content

To get the whole content of the page / article, you just have to run the following code:

India.content
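
The content of a full article can be very long, so it often helps to look at the title, the URL and a short preview together. This is just a sketch under my own assumptions: page_overview, its wiki parameter and the 200-character preview length are hypothetical choices, not something the library provides.

```python
def page_overview(title, preview_chars=200, wiki=None):
    """Collect the title, URL and a content preview of a Wikipedia page."""
    if wiki is None:
        import wikipedia as wiki  # third-party: pip install wikipedia
    page = wiki.page(title)
    return {
        "title": page.title,
        "url": page.url,
        # Slice the plain-text content instead of printing all of it
        "preview": page.content[:preview_chars],
        "length": len(page.content),
    }
```

Running page_overview("India") without the wiki argument loads the same page as the India variable above.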

Thanks
Read the full article on TECHWITHPIE.
