
Shreyas Soni

Posted on • Originally published at shreyassoni.netlify.app

Explore Geopolitical data from GDELT

In this post, we explore geopolitical data from GDELT and see how it can be used for analysis.

What is GDELT?

The GDELT Project, created by Kalev H. Leetaru, monitors the world's news from every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images, and events driving our global society.

Here we focus on GDELT's Event Database and how its data can be used for analysis.

Event Database

The GDELT Event Database catalogs events in 20 main categories and more than 300 subcategories. Each category is identified by a CAMEO (Conflict and Mediator Event Observations) code. We will look at the 20 main CAMEO codes, which are:

  • Make Public Statement
  • Appeal
  • Express intent to cooperate
  • Consult
  • Engage in diplomatic cooperation
  • Engage in material cooperation
  • Provide aid
  • Yield
  • Investigate
  • Demand
  • Disapprove
  • Reject
  • Threaten
  • Protest
  • Exhibit military posture
  • Reduce relations
  • Coerce
  • Assault
  • Fight
  • Use unconventional mass violence
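
In the Events table these 20 categories appear in the EventRootCode column as two-digit codes. Below is a minimal sketch of a lookup table, assuming the codes run from "01" to "20" in the order listed above (the CAMEO codebook order). The names CAMEO_ROOT_CODES and root_code_label are illustrative and not part of GDELT or any library.

```python
# Assumed mapping: CAMEO root codes "01".."20", in the order listed above.
CAMEO_ROOT_CODES = {
    "01": "Make Public Statement",
    "02": "Appeal",
    "03": "Express intent to cooperate",
    "04": "Consult",
    "05": "Engage in diplomatic cooperation",
    "06": "Engage in material cooperation",
    "07": "Provide aid",
    "08": "Yield",
    "09": "Investigate",
    "10": "Demand",
    "11": "Disapprove",
    "12": "Reject",
    "13": "Threaten",
    "14": "Protest",
    "15": "Exhibit military posture",
    "16": "Reduce relations",
    "17": "Coerce",
    "18": "Assault",
    "19": "Fight",
    "20": "Use unconventional mass violence",
}


def root_code_label(code):
    """Return the category name for a root code given as a str or an int."""
    # CSV readers often turn "01" into the integer 1, so restore the padding.
    key = str(code).strip().zfill(2)
    return CAMEO_ROOT_CODES.get(key, "Unknown")


print(root_code_label(14))
print(root_code_label("01"))
```

A helper like this is handy later for labeling charts with category names instead of raw codes.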

Let's see how we can get the data for these events for all countries.

How to get the data?

  • BigQuery: you can query exactly the data you need. Here is an example query against the public GDELT dataset.
select SQLDATE,EventRootCode,Actor1CountryCode,NumMentions from `gdelt-bq.gdeltv2.events`;
  • Using the gdelt Python package

    • Installation: pip install gdelt
    • Create a client for version 2 of the GDELT database.
import gdelt

gd2 = gdelt.gdelt(version=2)
    • Use the gd2 object to fetch the data for a given date, with table set to events.
results = gd2.Search(['2020-01-01'],table='events',coverage=True)
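
The BigQuery example above has no filter, so it reads the selected columns for every event in the table. A date filter keeps the result (and the bytes scanned) manageable. The sketch below only builds the SQL text. It assumes the public table is addressable as gdelt-bq.gdeltv2.events and that SQLDATE is stored as an integer in YYYYMMDD form. build_events_query is a hypothetical helper, not part of any client library.

```python
from datetime import date

# Assumption: fully qualified name of the public GDELT v2 events table.
EVENTS_TABLE = "gdelt-bq.gdeltv2.events"


def build_events_query(start, end):
    """Return SQL selecting the four columns used in this post for [start, end]."""
    # Assumption: SQLDATE is an integer such as 20200101.
    lo = int(start.strftime("%Y%m%d"))
    hi = int(end.strftime("%Y%m%d"))
    if lo > hi:
        raise ValueError("start must not be after end")
    return (
        "SELECT SQLDATE, EventRootCode, Actor1CountryCode, NumMentions\n"
        f"FROM `{EVENTS_TABLE}`\n"
        f"WHERE SQLDATE BETWEEN {lo} AND {hi}"
    )


print(build_events_query(date(2020, 1, 1), date(2020, 1, 7)))
```

The resulting string can be pasted into the BigQuery console or passed to whichever client you use to run queries.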

Processing the data to get Timeseries data for all countries

  • Load the data into the notebook (needed only if you saved it to a CSV file).
import pandas as pd

results = pd.read_csv("gdelt.csv")
  • The output of the gdelt object contains every column of the events database. Keep only the columns we need: SQLDATE, EventRootCode, Actor1CountryCode, and NumMentions.
results = results[['SQLDATE','EventRootCode','NumMentions','Actor1CountryCode']]
  • Convert SQLDATE from its 'YYYYMMDD' form to a datetime, which displays as 'YYYY-MM-DD'. The format string must describe the input, so it is '%Y%m%d'.
results['SQLDATE'] = pd.to_datetime(results['SQLDATE'].astype(str), format='%Y%m%d')
  • Aggregate the data based on SQLDATE, EventRootCode, and Actor1CountryCode.
results = results.groupby(['SQLDATE','EventRootCode','Actor1CountryCode']).agg('sum').reset_index()
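
The steps above can be sketched end to end on a tiny hand-made table, which makes it easy to check the result by eye. The sample rows are invented for illustration and are not real GDELT data. to_timeseries is a hypothetical helper name.

```python
import pandas as pd


def to_timeseries(events):
    """Reduce raw event rows to daily mention totals per root code and country."""
    cols = ["SQLDATE", "EventRootCode", "Actor1CountryCode", "NumMentions"]
    out = events[cols].copy()
    # SQLDATE arrives as YYYYMMDD (int or str); parse it into a datetime.
    out["SQLDATE"] = pd.to_datetime(out["SQLDATE"].astype(str), format="%Y%m%d")
    # groupby drops rows with a missing key by default; make that explicit.
    out = out.dropna(subset=["Actor1CountryCode"])
    keys = ["SQLDATE", "EventRootCode", "Actor1CountryCode"]
    return out.groupby(keys, as_index=False)["NumMentions"].sum()


# Invented sample rows, for illustration only.
sample = pd.DataFrame({
    "SQLDATE": [20200101, 20200101, 20200101, 20200102],
    "EventRootCode": ["14", "14", "04", "14"],
    "Actor1CountryCode": ["USA", "USA", "IND", None],
    "NumMentions": [3, 5, 2, 7],
    "GoldsteinScale": [-6.5, -6.5, 1.0, -6.5],  # extra column, filtered out
})

daily = to_timeseries(sample)
print(daily)
```

The two USA protest rows collapse into a single row with 8 mentions, and the row without an Actor1 country is dropped.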

Data Analysis and Visualization

  • Plotting a line chart of a particular CAMEO code for a country over time.
    Example: Protest in USA (Aggregated to Weekly basis)

  • Ranking the top CAMEO codes in a country by the number of mentions of each code.
    Example: Top Trends in USA (Last Week)


  • Ranking the top countries for a particular CAMEO code by the number of mentions of that code in each country.
    Example: Top Countries in Protest (Last Week)

  • Plot a choropleth map for a particular cameo code.
    Example: Protest (Today)
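
The data behind the first three charts can be prepared with a few pandas helpers. This is a minimal sketch, assuming a DataFrame with the four columns produced in the processing step and SQLDATE already parsed as a datetime. The function names and the sample rows are made up for illustration.

```python
import pandas as pd


def weekly_series(daily, root_code, country):
    """Weekly NumMentions totals for one root code in one country."""
    mask = (daily["EventRootCode"] == root_code) & (daily["Actor1CountryCode"] == country)
    # "W" bins by calendar week, labeled with the Sunday that ends the week.
    return daily.loc[mask].set_index("SQLDATE")["NumMentions"].resample("W").sum()


def top_codes(daily, country, n=5):
    """Root codes with the most mentions in one country."""
    subset = daily[daily["Actor1CountryCode"] == country]
    return subset.groupby("EventRootCode")["NumMentions"].sum().nlargest(n)


def top_countries(daily, root_code, n=5):
    """Countries with the most mentions of one root code."""
    subset = daily[daily["EventRootCode"] == root_code]
    return subset.groupby("Actor1CountryCode")["NumMentions"].sum().nlargest(n)


# Invented sample rows, for illustration only.
daily = pd.DataFrame({
    "SQLDATE": pd.to_datetime(
        ["2020-01-01", "2020-01-03", "2020-01-08", "2020-01-08", "2020-01-08"]
    ),
    "EventRootCode": ["14", "14", "14", "04", "14"],
    "Actor1CountryCode": ["USA", "USA", "USA", "USA", "IND"],
    "NumMentions": [10, 5, 7, 30, 4],
})

print(weekly_series(daily, "14", "USA"))
print(top_codes(daily, "USA"))
print(top_countries(daily, "14"))
```

For the "Last Week" and "Today" views, filter the DataFrame on SQLDATE before calling these helpers. The resulting Series can then be handed to Plotly. For the choropleth, the per-country totals would need country codes that Plotly recognizes, so it is worth checking that the Actor1CountryCode values match the codes your map expects.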

Technology Used

  • Python
  • Pandas
  • Plotly

Code: Link

Co-author: @ashishsalunkhe
