Abstract
Air quality has become a significant concern in many cities globally due to increased pollution levels. In this short article, we'll see how to calculate the Air Quality Index (AQI), store air quality data in SingleStoreDB and quickly visualise the AQI.
Introduction
In this article, we'll obtain air quality data from the Department for Environment, Food and Rural Affairs (Defra) in the UK and store it in SingleStoreDB. SingleStoreDB integrates with many external tools and libraries. We'll utilise one of these integrations in this example.
Create a SingleStoreDB Cloud account
A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use AQI Demo Group as our Workspace Group Name and aqi-demo as our Workspace Name. We'll make a note of our password and host name.
We'll use the SQL Editor to create a new database, as follows:
CREATE DATABASE IF NOT EXISTS aqi_db;
Create a Deepnote account
We'll create a free account on the Deepnote website. Once logged in, we'll create a new Deepnote project to give us a new notebook. We'll also create a new directory called data
.
Obtain Air Quality Data
Courtesy of Tony Hirst, we'll use the Python downloader for UK Air Quality data notebook to obtain sample data. We'll use the same site code of KC1 (London N. Kensington) and select 2022 for the year. In the final step of the notebook, we'll assign the results to a Pandas Dataframe and write it to a CSV file, as follows:
import pandas as pd
...
aqi_df = pyreadr.read_r(fn)[fn.split(".")[0]]
aqi_df.to_csv("data/KC1_2022.csv", index = False)
At this point, we need to acknowledge Defra and uk-air.defra.gov.uk as the source of information as well as the licence:
© Crown 2023 copyright Defra via uk-air.defra.gov.uk, licenced under the Open Government Licence (OGL).
Let's take a quick look at the data:
aqi_df.head()
The output should be similar to the following:
date O3 NO NO2 NOXasNO2 SO2 CO PM10 PM2.5 wd ws temp site code
0 2022-01-01 00:00:00 39.11572 0.24946 9.94500 10.51875 0.26609 0.209556 14.9 12.076 212.0 3.0 10.4 London N. Kensington KC1
1 2022-01-01 01:00:00 38.71658 0.12473 7.84125 8.22375 -0.53218 0.197914 16.3 12.642 214.8 3.7 10.5 London N. Kensington KC1
2 2022-01-01 02:00:00 41.51056 0.12473 6.88500 7.07625 0.26609 0.221198 15.4 11.604 212.9 3.6 10.2 London N. Kensington KC1
3 2022-01-01 03:00:00 45.70153 0.12473 4.97250 5.16375 0.26609 0.186272 13.6 10.189 208.8 3.9 10.6 London N. Kensington KC1
4 2022-01-01 04:00:00 48.49551 0.12473 4.59000 4.59000 0.26609 0.186272 10.9 8.302 206.7 4.0 11.3 London N. Kensington KC1
We can see the range of values:
aqi_df.describe()
The output should be similar to the following:
O3 NO NO2 NOXasNO2 SO2 CO PM10 PM2.5 wd ws temp
count 8617.000000 8593.000000 8593.000000 8593.000000 7923.000000 7737.000000 8592.000000 8622.000000 8664.000000 8664.000000 8664.000000
mean 49.920524 4.817513 18.449004 25.724694 0.907117 0.165013 14.606646 8.869804 193.805794 3.286034 10.889578
std 26.435005 18.278836 16.204639 39.501914 1.111520 0.139987 10.512747 8.222229 98.460865 1.567511 6.410335
min -0.199570 -0.124730 0.000000 0.000000 -1.330450 0.000000 0.700000 0.377000 0.100000 0.000000 -5.900000
25% 32.929050 0.374190 7.458750 8.415000 0.266090 0.093136 8.200000 4.245000 107.475000 2.200000 6.500000
50% 51.089920 0.748380 12.813750 14.152500 0.532180 0.128062 11.900000 6.226000 219.100000 2.900000 10.700000
75% 66.456810 1.995680 23.906250 26.392500 1.064360 0.186272 17.500000 10.283000 265.000000 4.100000 15.300000
max 166.441380 368.701880 108.438750 652.927500 10.377510 1.641522 110.400000 84.623000 360.000000 12.900000 31.900000
We can see data for the polluting gases:
- Ozone (O3)
- Nitrogen monoxide (NO)
- Nitrogen dioxide (NO2)
- Sulphur dioxide (SO2)
- Carbon monoxide (CO)
- Particulates (PM10 and PM2.5)
We'll check for missing data:
aqi_df.isna().sum()
The output should be similar to the following:
date 0
O3 143
NO 167
NO2 167
NOXasNO2 167
SO2 837
CO 1023
PM10 168
PM2.5 138
wd 96
ws 96
temp 96
site 0
code 0
For our initial analysis, we'll remove the rows with missing data:
aqi_df.dropna(inplace = True)
Calculate Air Quality Index (AQI)
Next, we'll use the python-aqi
library to calculate the AQI. This library uses algorithms from:
- The United States Environmental Protection Agency (EPA)
- The China Ministry of Environmental Protection (MEP)
First, we need to install it, as follows:
!pip install python-aqi
We'll initially use the particulates (PM10 and PM2.5) to calculate the AQI using the EPA algorithm and create a new column in the Pandas Dataframe to store the result:
import aqi
aqi_df["aqi"] = aqi_df.apply(
lambda row: aqi.to_aqi([
(aqi.POLLUTANT_PM25, row["PM2.5"]),
(aqi.POLLUTANT_PM10, row["PM10"])
]),
axis = 1
)
Now we'll store the Pandas Dataframe in SingleStoreDB.
Store data in SingleStoreDB
We'll use Ibis to create the table and store the data in SingleStoreDB.
First, we'll install the package:
!pip install ibis-singlestoredb
Next, we'll import:
import ibis
ibis.options.interactive = True
We'll now create a connection to SingleStoreDB:
conn = ibis.singlestoredb.connect(
"admin:<password>@<host>:3306/aqi_db"
)
We'll replace the <password>
and <host>
with the values from our SingleStoreDB Cloud account.
Now we'll create a new table in SingleStoreDB using the Pandas Dataframe:
aqi_tbl = conn.create_table("aqi", aqi_df, force = True)
We can check the table using several methods. Let's use the SQL Editor in SingleStoreDB:
USE aqi_db;
SELECT * FROM aqi LIMIT 5;
The output should be similar to the following:
+---------------------+---------+---------+---------+----------+---------+----------+------+--------+-------+------+------+----------------------+------+------+
| date | O3 | NO | NO2 | NOXasNO2 | SO2 | CO | PM10 | PM2.5 | wd | ws | temp | site | code | aqi |
+---------------------+---------+---------+---------+----------+---------+----------+------+--------+-------+------+------+----------------------+------+------+
| 2022-01-01 03:00:00 | 45.7015 | 0.12473 | 4.9725 | 5.16375 | 0.26609 | 0.186272 | 13.6 | 10.189 | 208.8 | 3.9 | 10.6 | London N. Kensington | KC1 | 42 |
| 2022-01-01 04:00:00 | 48.4955 | 0.12473 | 4.59 | 4.59 | 0.26609 | 0.186272 | 10.9 | 8.302 | 206.7 | 4 | 11.3 | London N. Kensington | KC1 | 35 |
| 2022-01-01 17:00:00 | 46.3002 | 0.12473 | 8.22375 | 8.415 | 0.26609 | 0.197914 | 23 | 12.925 | 229.7 | 3.8 | 10.2 | London N. Kensington | KC1 | 53 |
| 2022-01-01 19:00:00 | 41.7101 | 0.12473 | 12.0487 | 12.24 | 0.26609 | 0.221198 | 23.4 | 14.057 | 231.8 | 5 | 11 | London N. Kensington | KC1 | 55 |
| 2022-01-01 21:00:00 | 44.7037 | 0.12473 | 9.75375 | 9.75375 | 0.26609 | 0.221198 | 24.3 | 13.679 | 231.8 | 5 | 11.3 | London N. Kensington | KC1 | 54 |
+---------------------+---------+---------+---------+----------+---------+----------+------+--------+-------+------+------+----------------------+------+------+
A variety of tools could now be used with SingleStoreDB to perform analytics on the data. For example, we have previously discussed the ease with which we can use analytics and visualisation tools with SingleStoreDB, such as Apache Superset, Metabase and Looker.
Bonus: Create visualisation
From our Python environment, we can also create a quick visualisation using Plotly:
import pandas as pd
import plotly.graph_objs as go
# Define the ranges for each AQI category
aqi_bounds = [0, 51, 101, 151, 201, 301, 500]
# Define the names of each AQI category
aqi_categories = [
"Good",
"Moderate",
"Unhealthy for Sensitive Groups",
"Unhealthy",
"Very Unhealthy",
"Hazardous",
"Very Hazardous"
]
# Define the colours for each AQI category
aqi_colors = [
[0.0, "green" ],
[0.1, "green" ],
[0.102, "yellow"],
[0.2, "yellow"],
[0.202, "orange"],
[0.3, "orange"],
[0.302, "red" ],
[0.4, "red" ],
[0.402, "purple"],
[0.6, "purple"],
[0.602, "maroon"],
[1.0, "maroon"]
]
# Define the tick values and labels for the colorbar
tickvals = [(aqi_bounds[i] + aqi_bounds[i + 1]) / 2 for i in range(len(aqi_bounds)-1)]
ticktext = aqi_categories[:-1]
# Create a new figure object with a heatmap trace
fig = go.Figure(data = go.Heatmap(
x = aqi_df["date"], y = aqi_df["site"], z = aqi_df["aqi"],
colorscale = aqi_colors, zmin = aqi_bounds[0], zmax = aqi_bounds[-1],
colorbar = dict(title = "Air Quality Index",
tickmode = "array",
ticktext = ticktext,
tickvals = tickvals
)
)
)
# Customise the layout of the figure
fig.update_layout(
xaxis_title = "Date",
yaxis_title = "Location",
title = "Air Quality Index by Location and Date",
width = 1000,
height = 600
)
# Show the figure
fig.show()
This produces the heatmap shown in Figure 1.
The overall quality of air throughout 2022 was Good. The next steps would be to drill down and further analyse the data, check the data for the other polluting gases, and so on. We could also make the heatmap more interesting by adding additional weather data.
Summary
In this short article, we have quickly analysed some air quality data from Defra in the UK, using the US EPA algorithm with data for particulates (PM10 and PM2.5). However, we note that many countries use different algorithms to calculate their AQI, although the approach described in this article could be easily adapted for other countries.
Top comments (0)