# How to Build an Interactive Bubble Map in Python Using Plotly

Kedar Ghule
I love to code and travel. They're both filled with a lot of exploration and adventure.

In this tutorial, we will be creating a county-level geographic bubble map of the active COVID-19 cases in the United States. First of all, let us understand what a Bubble Map is!

# What is a Bubble Map?

Bubble maps are a kind of geographic visualization that draws their roots from the bubble charts. In bubble charts, the bubbles are plotted on a Cartesian plane. In the case of bubble maps, these bubbles are plotted on geographic regions. The size of the bubble over the geographic area is proportional to the value of a particular variable. Bubble maps are important as they are one of the best ways to compare proportions over a geographic region.

# Building a Bubble Map Using Plotly

Let us dive straight into the tutorial now. Throughout this tutorial, we will also do some basic exploratory data analysis and data cleaning.

1. Importing Libraries

The first step is to import the libraries we will need throughout this tutorial: the popular Python data analysis library pandas and our data visualization library, Plotly. From Plotly, we specifically need the `graph_objects` module.

``````
import pandas as pd
import plotly.graph_objects as go
``````

2. Importing the Dataset

Next, we import our dataset and store it in a DataFrame. The dataset I am using is published by Johns Hopkins University and can be found here. When this code was written, the daily report for 6th March 2021 was the last one that included the active COVID-19 case count. It seems that Johns Hopkins removed the active and recovered case data from the reports after 6th March 2021.

``````
df = pd.read_csv(
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-06-2021.csv",
    dtype={"FIPS": str},
)
``````

This is what our DataFrame looks like -
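A side note on the `dtype={"FIPS": str}` argument passed to `read_csv` above: FIPS county codes can begin with a zero, and that zero is lost if pandas parses the column as a number. Here is a minimal sketch of the difference. The two-row CSV text and its `Active` values are invented for illustration; only the column names mirror the real file.

```python
import io
import pandas as pd

# Hypothetical two-row CSV; the counts are made up for this demo.
csv_text = "FIPS,Admin2,Active\n01001,Autauga,120\n06037,Los Angeles,4500\n"

# Default parsing treats FIPS as an integer column.
as_number = pd.read_csv(io.StringIO(csv_text))

# Forcing FIPS to str keeps the codes exactly as written.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"FIPS": str})

print(as_number["FIPS"].tolist())
print(as_text["FIPS"].tolist())
```

The first list contains integers without the leading zero, while the second keeps the five-character codes.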

3. Exploratory Data Analysis and Data Cleaning

Now you can see that this DataFrame has data for other countries as well. Since we are focusing only on the United States data for this tutorial, let's filter to only the US data and update our DataFrame.

``````
df = df[df.Country_Region == "US"]
``````

Now, our updated DataFrame looks like this -

Let us now explore the data further. First, let us find the length of the DataFrame. Since we are going to make a bubble map for the active COVID-19 cases in the US, let us check the maximum and minimum values in the `Active` column. The `Active` column contains the data for the active COVID-19 cases.

``````
len(df)
df.Active.max()
df.Active.min()
``````

The exact values you get depend on which daily report you load. For this one, the minimum value in the `Active` column turns out to be negative.

Wait, how can active cases be a negative number? Surely there must be something wrong. Let us see which row in the DataFrame has this data. Furthermore, let us also check what other rows have their active cases values less than 0.

``````
df[df.Active == df.Active.min()]
df[df.Active < 0]
``````

Ah! You can see that some unassigned rows have these values. These rows serve no purpose for us, so they need to be cleaned out of our DataFrame. We will keep only the rows whose `Active` value is greater than 0, which removes the negative rows and also any rows with zero active cases. We will then take another look at the length of the DataFrame and at the minimum and maximum values in the `Active` column.

``````
df = df[df.Active > 0]
len(df)
df.Active.max(), df.Active.min()
``````
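To see exactly which rows a `> 0` filter removes, here is a small sketch on a toy DataFrame. The rows are invented for illustration and are not taken from the Johns Hopkins data.

```python
import pandas as pd

# Toy data: one negative count, one zero count, two positive counts.
toy = pd.DataFrame({
    "Admin2": ["Unassigned", "A", "B", "C"],
    "Active": [-50, 0, 10, 200],
})

positive_only = toy[toy.Active > 0]   # drops the negative row and the zero row
non_negative = toy[toy.Active >= 0]   # drops only the negative row

print(len(positive_only), len(non_negative))
```

Using `> 0` fits this tutorial because a county with zero active cases would not get a visible bubble anyway, but `>= 0` is the filter to reach for if you only want to discard the invalid negative values.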

Let us check for missing values in other columns before moving ahead, specifically the `Admin2`, `Lat`, and `Long_` columns. The `Admin2` column specifies the county name. The `Lat` and `Long_` columns specify the latitude and longitude values for these counties. These columns will feature heavily while we work on the code for the bubble map.

``````
df.isna().sum()
``````

This gives us the number of missing values in each column of our DataFrame. Our `Admin2` column has 5 missing values, while the `Lat` and `Long_` columns have 36 missing values. Let us remove the rows with missing values in the `Admin2`, `Lat`, and `Long_` columns, since they would not serve any purpose while plotting our bubble map anyway. We will also verify that these values have been removed.

``````
# Reassigning (rather than inplace=True) avoids a SettingWithCopyWarning on the filtered DataFrame
df = df.dropna(subset=['Lat', 'Long_', 'Admin2'])
df.isna().sum()
``````

Fantastic! Our three main columns - `Admin2`, `Lat`, and `Long_` do not have any missing values.
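The `subset` argument is what limits the check to the three columns we care about. As a rough sketch on invented data (the `Other` column is hypothetical and stands in for any column we do not plot):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Admin2": ["A", None, "C"],
    "Lat": [32.5, 40.1, np.nan],
    "Long_": [-86.6, -75.2, np.nan],
    "Other": [np.nan, np.nan, np.nan],  # missing everywhere, but not in subset
})

# Only rows with a gap in Lat, Long_ or Admin2 are dropped.
cleaned = toy.dropna(subset=["Lat", "Long_", "Admin2"])
print(cleaned)
```

Row "A" survives even though its `Other` value is missing, because `Other` is not listed in `subset`. Calling `dropna()` with no arguments would have removed every row here.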

4. Sorting And Rearranging Data

Next, let us sort our DataFrame in descending order of active cases. Since the sorting rearranges the indexes of the DataFrame, we will also reset the indexes of our newly sorted DataFrame.

``````
df = df.sort_values(by=["Active"], ascending=False)
df.reset_index(drop=True, inplace=True)
``````
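Resetting the index matters here because the next step relies on index labels matching row positions. A minimal sketch with invented values shows what happens to the labels:

```python
import pandas as pd

toy = pd.DataFrame({"Admin2": ["A", "B", "C"], "Active": [10, 300, 25]})

# Sorting moves the rows but each row keeps its original label.
sorted_toy = toy.sort_values(by=["Active"], ascending=False)
print(sorted_toy.index.tolist())

# reset_index(drop=True) renumbers the labels from 0 and discards the old ones.
renumbered = sorted_toy.reset_index(drop=True)
print(renumbered.index.tolist())
```

After the reset, label 0 belongs to the row with the most active cases, label 1 to the next one, and so on.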

5. Setting Value Limit Intervals

We need to set some levels or limits to group the range of COVID-19 cases by specifying an upper bound and a lower bound of active COVID-19 cases. For this, we create a list called `stages`, which will be used for our bubble map's legend. 1-100 cases will be one range, 101-1000 cases will be another range, and so on.

After that, we will store the start and end row positions of each of these ranges as a list of tuples called `limits`.

``````
stages = ["400000+", "300001-400000", "200001-300000", "100001-200000", "50001-100000", "10001-50000",
          "1001-10000", "101-1000", "1-100"]

# Create tuples of row indexes for the above ranges
tuple1 = (0, df[df.Active > 400000].index[-1]+1)
tuple2 = (tuple1[1], df[(df.Active > 300000) & (df.Active <=400000)].index[-1]+1)
tuple3 = (tuple2[1], df[(df.Active > 200000) & (df.Active <=300000)].index[-1]+1)
tuple4 = (tuple3[1], df[(df.Active > 100000) & (df.Active <=200000)].index[-1]+1)
tuple5 = (tuple4[1], df[(df.Active > 50000) & (df.Active <=100000)].index[-1]+1)
tuple6 = (tuple5[1], df[(df.Active > 10000) & (df.Active <=50000)].index[-1]+1)
tuple7 = (tuple6[1], df[(df.Active > 1000) & (df.Active <=10000)].index[-1]+1)
tuple8 = (tuple7[1], df[(df.Active > 100) & (df.Active <=1000)].index[-1]+1)
tuple9 = (tuple8[1], df[df.Active <=100].index[-1]+1)

limits = [tuple1, tuple2, tuple3, tuple4, tuple5, tuple6, tuple7, tuple8, tuple9]
limits
``````

So, `tuple1` holds the start and end positions of the rows whose active cases value is greater than 400,000. `tuple2` covers the rows with more than 300,000 but at most 400,000 active cases, and so on. This works because the DataFrame is sorted in descending order, so each range occupies one contiguous block of rows.
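One caveat with the code above: `.index[-1]` raises an `IndexError` whenever a range contains no rows, which can happen with a different daily report. The following is only a sketch of an alternative, not the method used in this tutorial. It counts the rows in each range and accumulates the counts into `(start, end)` pairs, so an empty range simply produces an empty slice. The `toy` DataFrame and the `bounds` list are assumptions made for the demo.

```python
import pandas as pd

# Toy frame, already sorted by Active in descending order with a fresh index.
toy = pd.DataFrame({"Active": [450000, 120000, 800, 500, 40]})

# Lower bound of each range in `stages`, highest range first.
bounds = [400000, 300000, 200000, 100000, 50000, 10000, 1000, 100, 0]

limits, start, upper = [], 0, float("inf")
for lower in bounds:
    # Number of rows with lower < Active <= upper.
    n = int(((toy.Active > lower) & (toy.Active <= upper)).sum())
    limits.append((start, start + n))
    start, upper = start + n, lower

print(limits)
```

Ranges without any rows come out as pairs such as `(1, 1)`, and slicing a DataFrame with them gives an empty result instead of an error.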

6. Time to Plot our Bubble Map!

Since bubble maps show a bubble size proportional to the variable's value, it is also essential to set the right colour for the bubble. Aesthetics make a lot of difference in data visualizations. We will set a list of colours. I chose shades of red from the following link - http://www.workwithcolor.com/red-color-hue-range-01.htm. Note that the number of colours should be equal to the number of tuples we have in the `limits` variable.

``````
colors = ["#CC0000","#CE1620","#E34234","#CD5C5C","#FF0000", "#FF1C00", "#FF6961", "#F4C2C2", "#FFFAFA"]
``````

Note that if you are using a Jupyter notebook, the below code should be in one cell. I have split it up in this blog post for explaining the code easily.

``````
fig = go.Figure()
stage_counter = 0
for i in range(len(limits)):
    lim = limits[i]
    # Rows belonging to the current range of active cases
    df_sub = df[lim[0]:lim[1]]
    # Add one trace (one legend entry) per range
    fig.add_trace(go.Scattergeo(
        locationmode = 'USA-states',
        lon = df_sub['Long_'],
        lat = df_sub['Lat'],
        text = df_sub['Admin2'],
        marker = dict(
            size = df_sub['Active']*0.002,
            color = colors[i],
            line_color='rgb(40,40,40)',
            line_width=0.5,
            sizemode = 'area'
        ),
        name = '{}'.format(stages[stage_counter])))
    stage_counter = stage_counter+1
``````

Okay, here starts the complex part.

First, we set our `stage_counter` (the variable that tracks which `stage` we are on) to 0.

Next comes the for loop, which runs 9 times, once for every tuple in the `limits` variable. During each iteration, we extract a part of our original DataFrame into `df_sub`, which contains the rows whose index falls in the range specified by that tuple. During our first iteration, `df_sub` will contain the rows with indexes 0, 1, 2 and 3.

In the same iteration, we plot the bubbles for those rows using the latitude and longitude values given for each county in the `Lat` and `Long_` columns. We set the 'text' parameter to the county's name (the value in the `Admin2` column) so that, once the visualization is ready, we can hover over a bubble to see the name of the county. Next, we make the size of the bubble proportional to the active COVID-19 cases by multiplying the value in the `Active` column by 0.002. You may use a different value; this one seemed apt to me for my visualization. We also specify the colour of the bubble.

The 'name' parameter specifies the trace name, which appears as the legend item and on hover. For the first iteration, this value will be the first item in the `stages` list, i.e., "400000+". Finally, before we move to the next iteration, we increment `stage_counter` by 1.

If you are confused by the parameters in the above code snippet, check out this documentation.
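On the bubble sizing in the loop above: the 0.002 multiplier was chosen by eye. Plotly's bubble chart documentation suggests a rule of thumb for `sizemode = 'area'` that derives a `sizeref` value from the largest data value and the largest marker you want on screen. The sketch below applies that rule. The helper name `area_sizeref`, the sample counts and the 40-pixel target are all assumptions made for illustration.

```python
import math

def area_sizeref(sizes, max_marker_px):
    """Rule of thumb for sizemode='area': 2 * max(size) / (desired max size ** 2)."""
    return 2.0 * max(sizes) / (max_marker_px ** 2)

active = [400000, 100000, 1000]   # invented counts, not from the dataset
sizeref = area_sizeref(active, max_marker_px=40)

# Under area scaling, a bubble's width grows with the square root of its value,
# so a quarter of the cases gives half the width.
relative_width = [math.sqrt(v / max(active)) for v in active]

print(sizeref, relative_width)
```

To try it, you could pass the raw `df_sub['Active']` values as `size` and add `sizeref = sizeref` to the `marker` dict, computing `sizeref` once from the whole `Active` column so that all traces share the same scale.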

``````
fig.update_layout(
    title_text = 'Active Covid-19 Cases In The United States By Geography',
    title_x=0.5,
    showlegend = True,
    legend_title = 'Range Of Active Cases',
    geo = dict(
        scope = 'usa',
        landcolor = 'rgb(217, 217, 217)',
        projection=go.layout.geo.Projection(type = 'albers usa'),
    )
)
``````

Next, we focus on the aesthetics of our bubble map visualization. We set the title of the bubble map and its position (title_x=0.5 means center aligned) and the title of the legend. Since we are making a bubble map about the US COVID-19 Active cases, we specify the bubble map scope as 'usa'. For aesthetics, I changed the US landmass colour to grey using the 'landcolor' parameter.

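Finally, we save our graph to our local machine and then display it in our Jupyter notebook. Note that `fig.write_image` needs a static image export engine, such as the `kaleido` package, to be installed.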

``````
fig.write_image("Active-Covid19-Cases-US-bubblemap.png", scale=2)
fig.show()
``````

And our bubble map is ready!

# Conclusion

You can find the code for this tutorial on my GitHub.

Thanks a lot for reading my tutorial! If you have any questions, feel free to ask me! You can also follow me on Twitter or connect with me on LinkedIn. I would also love to get some feedback on my code and my post!