MarineCadastre.gov AIS vessel density maps with PySpark and h3 | by Jordan Bell | Feb, 2024 | Medium
Office for Coastal Management, 2023: Nationwide Automatic Identification System 2022, https://www.fisheries.noaa.gov/inport/item/67336
See Leveraging ESG Data to Operationalize Sustainability. November 11, 2020. Antoine Amend. Databricks
We ingest AIS message data for June 2022 into a Databricks table. (See https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-01-Data-Loading.ipynb and https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-10-Statistical-Tests.ipynb)
We get the following:
df_ais_june = spark.read.table("ais.june")
df_ais_june.printSchema()
root
|-- MMSI: string (nullable = true)
|-- BaseDateTime: timestamp (nullable = true)
|-- LAT: double (nullable = true)
|-- LON: double (nullable = true)
|-- SOG: double (nullable = true)
|-- COG: double (nullable = true)
|-- Heading: double (nullable = true)
|-- VesselName: string (nullable = true)
|-- IMO: string (nullable = true)
|-- CallSign: string (nullable = true)
|-- VesselType: integer (nullable = true)
|-- Status: integer (nullable = true)
|-- Length: double (nullable = true)
|-- Width: double (nullable = true)
|-- Draft: double (nullable = true)
|-- Cargo: string (nullable = true)
|-- TranscieverClass: string (nullable = true)
|-- filename: string (nullable = true)
In https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-11-Time-Filtering.ipynb we make a plot of vessel density made using pyspark and h3 on a Databricks cluster, for June 4, 2022.
We make similar plots for each day in June in the following, also from https://github.com/jordanbell2357/uscg-nais-data/blob/main/Databricks/Databricks-AIS-11-Time-Filtering.ipynb:
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon
import h3
import h3_pyspark
from pyspark.sql import functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
import matplotlib.colors as colors
# Define the lat/lon boundaries
min_lat, max_lat, min_lon, max_lon = 10, 60, -140, -50
# Define the hexagon resolution
resolution = 4
# Iterate over the days of the month
for day_of_month in range(1, 31):
# Read the spark dataframe
df_ais_june = spark.read.table("ais.june")
# Filter the dataframe for the specific day of the month
df_ais_june = df_ais_june.filter(F.dayofmonth('BaseDateTime') == day_of_month)
# Add a 'resolution' column
df_ais_june = df_ais_june.withColumn('resolution', F.lit(resolution))
# Create an h3 index for each location using the UDF
df_ais_june = df_ais_june.withColumn('hex_id', h3_pyspark.geo_to_h3(df_ais_june['LAT'], df_ais_june['LON'], df_ais_june['resolution']))
# Count the occurrences of each hex_id to get the density
df_density = df_ais_june.groupBy('hex_id').count()
# Convert the density dataframe to Pandas
df_density_pd = df_density.toPandas()
# Create polygons for each unique hex_id, and filter them based on the bounding box
hex_polygons = []
hex_ids = []
for hex_id in df_density_pd['hex_id'].unique():
hex_boundary = h3.h3_to_geo_boundary(hex_id)
polygon = Polygon([(lon, lat) for lat, lon in hex_boundary])
# create a point based on the polygon's centroid and check if it lies within the bounding box
centroid = polygon.centroid
if min_lat <= centroid.y <= max_lat and min_lon <= centroid.x <= max_lon:
hex_polygons.append(polygon)
hex_ids.append(hex_id)
# Filter the density dataframe to only include hexagons within the bounding box
df_density_pd = df_density_pd[df_density_pd['hex_id'].isin(hex_ids)]
# Replace inf with max non-inf value and NaN with 1
df_density_pd.loc[df_density_pd['count'] == np.inf, 'count'] = df_density_pd.loc[df_density_pd['count'] != np.inf, 'count'].max()
df_density_pd['count'] = df_density_pd['count'].replace({0:1}).fillna(1)
# Create a GeoDataFrame
gdf = gpd.GeoDataFrame(df_density_pd, geometry=hex_polygons)
# Save the choropleth map to a file
output_file = f"/dbfs/tmp/ais_june_choropleth_{day_of_month}.png"
fig, ax = plt.subplots(1, 1, figsize=(20, 20))
gdf.plot(column='count', cmap='YlOrRd', norm=colors.LogNorm(vmin=gdf['count'].min(), vmax=gdf['count'].max()), missing_kwds={'color': 'white'}, ax=ax)
plt.xlim(min_lon, max_lon)
plt.ylim(min_lat, max_lat)
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.savefig(output_file)
plt.close(fig)
print(f"Choropleth map for day {day_of_month} saved to: {output_file}")
Choropleth map for day 1 saved to: /dbfs/tmp/ais_june_choropleth_1.png
Choropleth map for day 2 saved to: /dbfs/tmp/ais_june_choropleth_2.png
Choropleth map for day 3 saved to: /dbfs/tmp/ais_june_choropleth_3.png
Choropleth map for day 4 saved to: /dbfs/tmp/ais_june_choropleth_4.png
Choropleth map for day 5 saved to: /dbfs/tmp/ais_june_choropleth_5.png
Choropleth map for day 6 saved to: /dbfs/tmp/ais_june_choropleth_6.png
Choropleth map for day 7 saved to: /dbfs/tmp/ais_june_choropleth_7.png
Choropleth map for day 8 saved to: /dbfs/tmp/ais_june_choropleth_8.png
Choropleth map for day 9 saved to: /dbfs/tmp/ais_june_choropleth_9.png
Choropleth map for day 10 saved to: /dbfs/tmp/ais_june_choropleth_10.png
Choropleth map for day 11 saved to: /dbfs/tmp/ais_june_choropleth_11.png
Choropleth map for day 12 saved to: /dbfs/tmp/ais_june_choropleth_12.png
Choropleth map for day 13 saved to: /dbfs/tmp/ais_june_choropleth_13.png
Choropleth map for day 14 saved to: /dbfs/tmp/ais_june_choropleth_14.png
Choropleth map for day 15 saved to: /dbfs/tmp/ais_june_choropleth_15.png
Choropleth map for day 16 saved to: /dbfs/tmp/ais_june_choropleth_16.png
Choropleth map for day 17 saved to: /dbfs/tmp/ais_june_choropleth_17.png
Choropleth map for day 18 saved to: /dbfs/tmp/ais_june_choropleth_18.png
Choropleth map for day 19 saved to: /dbfs/tmp/ais_june_choropleth_19.png
Choropleth map for day 20 saved to: /dbfs/tmp/ais_june_choropleth_20.png
Choropleth map for day 21 saved to: /dbfs/tmp/ais_june_choropleth_21.png
Choropleth map for day 22 saved to: /dbfs/tmp/ais_june_choropleth_22.png
Choropleth map for day 23 saved to: /dbfs/tmp/ais_june_choropleth_23.png
Choropleth map for day 24 saved to: /dbfs/tmp/ais_june_choropleth_24.png
Choropleth map for day 25 saved to: /dbfs/tmp/ais_june_choropleth_25.png
Choropleth map for day 26 saved to: /dbfs/tmp/ais_june_choropleth_26.png
Choropleth map for day 27 saved to: /dbfs/tmp/ais_june_choropleth_27.png
Choropleth map for day 28 saved to: /dbfs/tmp/ais_june_choropleth_28.png
Choropleth map for day 29 saved to: /dbfs/tmp/ais_june_choropleth_29.png
Choropleth map for day 30 saved to: /dbfs/tmp/ais_june_choropleth_30.png
We then download these daily plot files locally. Next we use ffmpeg. ffmpeg Documentation
The following bash script uses ImageMagick convert to make daily progress bars and overlay those with the daily plots. ImageMagick convert documentation
#!/bin/bash
# ffmpeg.sh
# Create 30 images with increasing filled progress bars
for i in {1..30}; do
progress=$((i*1000/30)) # increase size 10 times
convert -size 1000x200 xc:grey50 -fill "rgb(100,149,237)" -draw "rectangle 0,0 $progress,200" progress_$(printf "%02d" $i).png
done
width=1440 # width of original images
# Overlay progress bar onto each image, create new image
for i in {1..30}; do
# Calculate progress bar position
x_offset=$(( (width-1000)/2 )) # center the progress bar
# Overlay progress bar and save as new image
convert ais_june_choropleth_$i.png progress_$(printf "%02d" $i).png -geometry +$x_offset+10 -composite ais_june_choropleth_progress_$(printf "%02d" $i).png
done
# Create video using ffmpeg
ffmpeg -framerate 5 -pattern_type glob -i 'ais_june_choropleth_progress_*.png' -c:v libx264 -r 30 -pix_fmt yuv420p ais_june_choropleth.mp4
# Clean up progress bar and composite images
rm progress_*.png
rm ais_june_choropleth_progress_*.png
This creates ais_june_choropleth.mp4. This video file is hosted on YouTube:
We remark in passing that the Liquid syntax used for embedding the above YouTube video is
{% youtube Rt0RTpmlaK0 %}
Finally, we use ffmpeg to convert ais_june_choropleth.mp4 to ais_june_choropleth.gif.
ffmpeg -i ais_june_choropleth.mp4 -vf "fps=3,scale=1000:-1:flags=lanczos" -c:v gif ais_june_choropleth.gif
The file ais_june_choropleth.gif is shown below.


Top comments (0)