DEV Community

Cover image for Plotting ZIP Codes using Kepler.gl
vejendla
vejendla

Posted on • Updated on

Plotting ZIP Codes using Kepler.gl

Do you have a set of data that contains ZIP Codes, and want to display that data using a free, open-source tool?

Then read on to learn how I did exactly that to help a non-profit in my hometown understand their visitor data over a three year period.

This article contains three sections. First, I'll generate a list of vists by ZIP Codes using Pivot Tables.

Second, I'll describe how I used a free dataset and Python to convert the ZIP Codes into geographic coordinates.

Finally, I'll upload the data to Kepler.gl.

Here's what the final data looked like:

A visualization of data points

1. Creating a pivot table of Zip Codes

First, I previewed my data using Google Sheets:

A spreadsheet in Google Sheets

Then, I used pivot tables to display the count of visitors by ZIP Code. See the result (left) and settings (right) below. You can access this view by highlighting your data range and clicking "Insert > Pivot Tables".

A screenshot of Google Sheets displaying a pivot table and its configuration settings

2. Converting the Zip Codes to latitudes and longitudes

Since Kepler.gl requires coordinates to plot data, I had to find a way to convert ZIP Codes into latitudes and longitudes (As an aside, ZIP Codes aren't always the best datatype for business purposes. See this post for more.)

There are a number of ways to convert ZIP Codes to coordinates. One free way is to download this dataset from Kaggle, and use a VLOOKUP style operation in Pandas. That's exactly what I did. Below is the code that I used. You will need to have pandas and geopy installed prior to using this code.

import pandas as pd
import geopy
import math

# replace this with your input csv
df = pd.read_csv('20-21/20-21pv.csv')

# this is the name of the column in your input CSV that contains the ZIP Codes
df['Zip Code'] = df['Zip Code'].astype('int')

# replace this with the URL to the Kaggle csv
zipdf = pd.read_csv('zip_lat_long.csv')

def geocode_lat(row):
    if math.isnan(row['Zip Code']):
        return None

    searchZip = int(row['Zip Code'])
    tempRow = zipdf.loc[zipdf.ZIP == searchZip, "LAT"]
    if tempRow.empty:
        return None
    else:
        return tempRow.values[0]

def geocode_long(row):
    if math.isnan(row['Zip Code']):
        return None

    searchZip = int(row['Zip Code'])
    tempRow = zipdf.loc[zipdf.ZIP == searchZip, "LNG"]
    if tempRow.empty:
        return None
    else:
        return tempRow.values[0]

df['lat'] = df.apply(lambda row: geocode_lat(row), axis=1)
df['long'] = df.apply(lambda row: geocode_long(row), axis=1)

# replace this with the name of your output file
df.to_csv('20-21pv_with_latlong.csv')
Enter fullscreen mode Exit fullscreen mode

and here's the result (see the coordinate columns):

A table containing ZIP Codes and coordinates

Note 1: I did some mild data cleaning between step 1 and 2 that involved removing invalid ZIP Codes from the dataset. This included foreign ZIP Codes, and improperly formatted US ZIP Codes.

Note 2: The free dataset I used only contains ZIP Codes up to 2010. The US Postal Service has added a small number since then that weren't included in my final visualization.

3. Plotting visits in Kepler.gl

The final step is the easiest. With our .csv, I simply uploaded it into Kepler.gl.

The user interface should be self-explanatory, but there's a helpful guide if you get stuck.

Top comments (0)