DEV Community

ahmad-khan-97

Posted on • Originally published at github.com

Building a US choropleth in Python with plotly express, using a real fragrance dataset

Plotly Express's choropleth is one of the cleanest "low-code, high-output" data viz tools in the Python ecosystem, but the documentation skips a few non-obvious gotchas that bite you the first time you build a US-state-level map. This walkthrough builds a publication-grade state map end-to-end, using a real dataset I open-sourced last week.

The dataset: the most-searched fragrance in every US state over the last 12 months, pulled from Google Trends across 30 of the most-talked-about fragrances of 2024-2026. Full dataset on GitHub under CC BY 4.0, full written analysis at perfumem.com.

What you'll build

A US state choropleth where each state is colored by its category winner (a discrete, not continuous, value), with a clean legend and a high-DPI export that's safe to drop into a blog post or report.

[Figure: US state map]

Setup

pip install plotly pandas kaleido

kaleido is the static image export engine; you need it if you want PNG output instead of just an interactive HTML chart.
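Because a missing kaleido usually only surfaces at the `write_image` step, a quick preflight check can save a confusing traceback later. This is a minimal sketch using only the standard library. `missing_packages` is a hypothetical helper name, not part of plotly or pandas.

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # kaleido is only needed for static export via fig.write_image
    missing = missing_packages(["plotly", "pandas", "kaleido"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All set for PNG export.")
```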

Step 1: get the data into a clean DataFrame

If you want to follow along with the actual fragrance dataset:

import pandas as pd

df = pd.read_csv(
    "https://raw.githubusercontent.com/ahmad-khan-97/us-fragrance-trends-2026/main/data/winning_fragrance_per_state.csv"
)
print(df.head())
#         state         winning_fragrance  score      second_place  second_score
# 0     Alabama                 Old Spice     78         Polo Blue            61
# 1      Alaska         Coco Mademoiselle     71         Old Spice            66
# 2     Arizona                 Old Spice     82  Coco Mademoiselle            58

The data is in wide format: one row per state, with a categorical winner column. That is the shape Plotly Express's choropleth expects.
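The published CSV already has this shape. If your own pull ends up in long format instead (one row per state and fragrance), here is a sketch of collapsing it to one winner per state. The long-format column names (`fragrance`, `score`) and the `winners_per_state` helper are assumptions for illustration; the four scores are taken from the sample rows above.

```python
import pandas as pd

# Hypothetical long-format input: one row per (state, fragrance) pair
long_df = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "fragrance": ["Old Spice", "Polo Blue", "Old Spice", "Coco Mademoiselle"],
    "score": [78, 61, 66, 71],
})


def winners_per_state(long_df):
    """Keep only the top-scoring fragrance for each state."""
    # idxmax gives the row label of the highest score within each state
    # (on ties it keeps the first occurrence)
    top_idx = long_df.groupby("state")["score"].idxmax()
    wide = long_df.loc[top_idx, ["state", "fragrance", "score"]]
    return wide.rename(columns={"fragrance": "winning_fragrance"}).reset_index(drop=True)


wide = winners_per_state(long_df)
assert wide["state"].is_unique  # one row per state, as the choropleth expects
print(wide)
```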

Step 2: map full state names to USPS codes (the gotcha)

plotly.express.choropleth with locationmode="USA-states" requires 2-letter USPS codes, not full state names. The most common silent failure mode is shipping a DataFrame with full names and getting an empty map with no error.
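Since plotly won't raise on full names, one option is to fail loudly yourself before plotting. A minimal sketch; `check_usps_codes` is a hypothetical helper, and it only checks the shape of each value (two uppercase letters), so a typo like "ZZ" would still pass.

```python
import re

_USPS_CODE = re.compile(r"[A-Z]{2}")


def check_usps_codes(codes):
    """Raise ValueError if any value is not a two-letter uppercase code."""
    # Non-strings (e.g. NaN left over from a failed lookup) also count as bad
    bad = [c for c in codes if not isinstance(c, str) or not _USPS_CODE.fullmatch(c)]
    if bad:
        raise ValueError(f"Not USPS-style codes (first few): {bad[:5]}")


check_usps_codes(["AL", "AK", "AZ"])  # passes silently
```

Calling it on `df["code"]` right before `px.choropleth` turns the empty-map failure into an explicit error.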

US_STATE_TO_CODE = {
    "Alabama": "AL", "Alaska": "AK", "Arizona": "AZ", "Arkansas": "AR",
    "California": "CA", "Colorado": "CO", "Connecticut": "CT", "Delaware": "DE",
    "District of Columbia": "DC", "Florida": "FL", "Georgia": "GA", "Hawaii": "HI",
    "Idaho": "ID", "Illinois": "IL", "Indiana": "IN", "Iowa": "IA",
    "Kansas": "KS", "Kentucky": "KY", "Louisiana": "LA", "Maine": "ME",
    "Maryland": "MD", "Massachusetts": "MA", "Michigan": "MI", "Minnesota": "MN",
    "Mississippi": "MS", "Missouri": "MO", "Montana": "MT", "Nebraska": "NE",
    "Nevada": "NV", "New Hampshire": "NH", "New Jersey": "NJ", "New Mexico": "NM",
    "New York": "NY", "North Carolina": "NC", "North Dakota": "ND", "Ohio": "OH",
    "Oklahoma": "OK", "Oregon": "OR", "Pennsylvania": "PA", "Rhode Island": "RI",
    "South Carolina": "SC", "South Dakota": "SD", "Tennessee": "TN", "Texas": "TX",
    "Utah": "UT", "Vermont": "VT", "Virginia": "VA", "Washington": "WA",
    "West Virginia": "WV", "Wisconsin": "WI", "Wyoming": "WY",
}

df["code"] = df["state"].map(US_STATE_TO_CODE)
missing = df[df["code"].isna()]
if not missing.empty:
    print("WARNING: unmapped states:", missing["state"].tolist())
df = df.dropna(subset=["code"])

Always check for unmapped rows. Google Trends sometimes returns territories like "Puerto Rico" or oddly capitalized "district of columbia" that won't match a strict dict lookup.
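To absorb the capitalization and whitespace variants without loosening anything else, you can normalize both sides of the lookup. A sketch, assuming the same `US_STATE_TO_CODE` dict; `map_state_codes` is a hypothetical helper. Territories that are not in the dict still come back as NaN, so the unmapped-row warning keeps working.

```python
import pandas as pd


def map_state_codes(names, state_to_code):
    """Map state names to USPS codes, ignoring case and surrounding whitespace."""
    # Build a lowercase lookup once rather than lowercasing keys per row
    lookup = {name.lower(): code for name, code in state_to_code.items()}
    return names.str.strip().str.lower().map(lookup)


# A two-entry excerpt of US_STATE_TO_CODE so the example runs on its own
state_to_code = {"Alabama": "AL", "District of Columbia": "DC"}
names = pd.Series(["Alabama", "district of columbia ", "Puerto Rico"])
print(map_state_codes(names, state_to_code).tolist())
```

In the tutorial flow this would replace `df["code"] = df["state"].map(US_STATE_TO_CODE)` with `df["code"] = map_state_codes(df["state"], US_STATE_TO_CODE)`.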

Step 3: build the choropleth

import plotly.express as px

fig = px.choropleth(
    df,
    locations="code",
    locationmode="USA-states",
    color="winning_fragrance",
    scope="usa",
    title="Most-searched fragrance by US state (Google Trends, 12mo)",
    color_discrete_sequence=px.colors.qualitative.Set3,
    hover_data={
        "state": True,
        "winning_fragrance": True,
        "score": True,
        "code": False,
    },
)

Three things that matter at this step:

  1. color_discrete_sequence (not color_continuous_scale). When your fill variable is categorical (a fragrance name, a winning candidate, a product category), you want discrete colors. px.colors.qualitative.Set3 gives you 12 visually distinct colors that work well on choropleths. Other good options: Bold, Pastel, Vivid, Safe.

  2. hover_data. By default, plotly shows the location code in hover, which reads as "AL" or "TX" and not as "Alabama" or "Texas". Override it explicitly so users see the full state name + the actual data point.

  3. scope="usa". Without this, plotly draws the whole world map and shoves the US in one corner. With it, plotly auto-zooms and projects to USA Albers, which is the standard publication projection for US choropleths.
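One follow-up on point 1: Set3 has 12 colors, and if you have more categories than that, the colors cycle and two winners can end up sharing a fill. A sketch of building an explicit category-to-color map instead, most frequent category first, and failing if the palette is too short. `build_color_map` is a hypothetical helper; the palette below is a stand-in so the example runs without plotly.

```python
from collections import Counter


def build_color_map(categories, palette):
    """Assign one palette color per category, most frequent category first."""
    counts = Counter(categories)
    # Descending frequency, then name, so the mapping is stable between runs
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    if len(ordered) > len(palette):
        raise ValueError(f"{len(ordered)} categories but only {len(palette)} colors")
    return dict(zip(ordered, palette))


# Stand-in palette; in the tutorial you would pass px.colors.qualitative.Set3
palette = ["#8dd3c7", "#ffffb3", "#bebada"]
winners = ["Old Spice", "Old Spice", "Polo Blue", "Glossier You", "Old Spice"]
print(build_color_map(winners, palette))
```

The result can go to `px.choropleth(..., color_discrete_map=color_map)`. Passing `category_orders={"winning_fragrance": list(color_map)}` as well should make the legend list the most common fragrance first.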

Step 4: clean up the layout

Default plotly titles and margins are conference-talk sized, which is too big for a blog post.

fig.update_layout(
    title=dict(
        text="Most-searched fragrance by US state",
        font=dict(size=22, family="Inter, system-ui, sans-serif"),
        x=0.5, xanchor="center",
    ),
    legend=dict(
        title="Top fragrance",
        font=dict(size=11),
        orientation="v",
        yanchor="middle", y=0.5,
        xanchor="left", x=1.02,
    ),
    geo=dict(
        bgcolor="rgba(0,0,0,0)",
        lakecolor="rgba(0,0,0,0)",
    ),
    margin=dict(l=10, r=10, t=80, b=20),
    paper_bgcolor="white",
)

Step 5: export at publication DPI

fig.write_image("us-state-map.png", width=1600, height=900, scale=2)

scale=2 doubles the pixel dimensions of the 1600x900 layout, giving you a sharp 3200x1800 raster. Anything below scale=2 tends to look soft on high-DPI (Retina) displays.
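If you want to confirm the export really came out at 3200x1800 without opening an image editor, the width and height sit in the PNG header. A standard-library sketch; `png_size` is a hypothetical helper that only reads the first 24 bytes of the file.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) in pixels from a PNG file's IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR"
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    # Width and height are big-endian unsigned 32-bit integers at bytes 16..24
    return struct.unpack(">II", header[16:24])


# After fig.write_image("us-state-map.png", width=1600, height=900, scale=2):
# print(png_size("us-state-map.png"))  # expected (3200, 1800) per the arithmetic above
```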

For interactive HTML embed (Substack, Notion, blog post):

fig.write_html("us-state-map.html", include_plotlyjs="cdn", full_html=False)

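full_html=False writes just the chart div rather than a complete HTML document, so you can embed it in an existing page. include_plotlyjs="cdn" loads plotly.js from a CDN instead of inlining it, which keeps the file under 100 KB instead of the roughly 4 MB you get with inline plotly.js.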
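If you generate pages yourself rather than pasting into a CMS, you can drop the fragment into a page shell directly. A minimal sketch with the standard library's `string.Template`; the page layout and the `embed_chart` helper are assumptions for illustration. The fragment is inserted as-is (it is trusted HTML from plotly), while the title is escaped.

```python
import html
from string import Template

PAGE = Template(
    "<!doctype html>\n"
    "<html>\n"
    "<head><meta charset=\"utf-8\"><title>$title</title></head>\n"
    "<body>\n"
    "$chart\n"
    "</body>\n"
    "</html>\n"
)


def embed_chart(fragment_html, title):
    """Wrap a full_html=False plotly fragment in a minimal HTML page."""
    return PAGE.substitute(title=html.escape(title), chart=fragment_html)


# With plotly, the fragment can come from memory instead of a file:
#   fragment = fig.to_html(include_plotlyjs="cdn", full_html=False)
fragment = '<div id="chart">placeholder</div>'
page = embed_chart(fragment, "Most-searched fragrance by US state")
```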

Bonus: the matplotlib fallback for static-only PDF/print pipelines

If your downstream uses LaTeX or matplotlib-only figures (academic publishing, print magazines), here's the equivalent in pure matplotlib + geopandas:

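# Extra dependencies: pip install geopandas matplotlib
import geopandas as gpd
import matplotlib.pyplot as plt

# US states cartographic boundary file (Census Bureau, public domain)
url = "https://www2.census.gov/geo/tiger/GENZ2021/shp/cb_2021_us_state_20m.zip"
states = gpd.read_file(url)
# Inner merge: shapes with no matching row in df (e.g. territories) are dropped
states = states.merge(df, left_on="STUSPS", right_on="code")

# Named fig_mpl so it doesn't overwrite the plotly `fig` from Steps 3 to 5
fig_mpl, ax = plt.subplots(figsize=(16, 9), dpi=200)
states.plot(
    column="winning_fragrance",  # string column, so geopandas colors it categorically
    ax=ax,
    legend=True,
    cmap="Set3",
    edgecolor="white",
    linewidth=0.5,
)
ax.set_xlim(-130, -65)  # crop to the continental US (lon/lat degrees); AK and HI fall outside
ax.set_ylim(23, 50)
ax.axis("off")
fig_mpl.tight_layout()
fig_mpl.savefig("us-state-map-mpl.png", dpi=200, bbox_inches="tight")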

Slightly more code than plotly, but the output is print-ready and embeds cleanly into any LaTeX document.
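The `merge` above is an inner join, so any state whose code doesn't match simply disappears from the map with no warning: the same silent failure as in Step 2, just in a different library. A sketch of surfacing those rows with pandas' `indicator=True`; `merge_report` is a hypothetical helper, the frames are small stand-ins, and it should behave the same on a GeoDataFrame since `merge` is inherited from pandas.

```python
import pandas as pd


def merge_report(left, right, left_on, right_on):
    """Outer-merge two frames and return the keys an inner merge would drop."""
    merged = left.merge(right, left_on=left_on, right_on=right_on,
                        how="outer", indicator=True)
    # _merge marks each row as "both", "left_only" or "right_only"
    only_left = merged.loc[merged["_merge"] == "left_only", left_on].tolist()
    only_right = merged.loc[merged["_merge"] == "right_only", right_on].tolist()
    return only_left, only_right


# Stand-ins: the Census state file also covers territories such as Puerto Rico
shapes = pd.DataFrame({"STUSPS": ["AL", "AK", "PR"]})
data = pd.DataFrame({"code": ["AL", "AK"],
                     "winning_fragrance": ["Old Spice", "Coco Mademoiselle"]})
print(merge_report(shapes, data, "STUSPS", "code"))
```

Anything in the second list (rows in your data with no shape) is worth investigating; the first list is often just territories you never meant to plot.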

What the data actually showed

Quick worked example using the dataset we just visualized:

  • Old Spice ranks #1 in 43 of 51 regions (50 states plus DC)
  • 8 outlier states pick something completely different:
    • Alaska + South Dakota: Coco Mademoiselle (Chanel)
    • Louisiana + Mississippi: Polo Blue (Ralph Lauren)
    • Montana: Marc Jacobs Daisy
    • New Mexico: Ariana Grande Cloud (the only state where Cloud ranks #1)
    • North Dakota + Vermont: Glossier You

Two of those clusters (LA/MS, ND/VT) line up with broader regional consumer-brand patterns visible in other state-level datasets. The others are more interesting puzzles. Full interpretation is at perfumem.com.
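The headline numbers come straight from the winner column. A sketch of deriving the national leader, its win count, and the outlier list with pandas; `summarize_winners` is a hypothetical helper, and the toy frame below only mimics the dataset's shape. Run on the real CSV from Step 1, it should reproduce the 43-versus-8 split above if the file matches the post.

```python
import pandas as pd


def summarize_winners(df, column="winning_fragrance", key="state"):
    """Return the most common winner, how many rows it wins, and the other rows."""
    counts = df[column].value_counts()  # sorted by descending frequency
    leader = counts.index[0]
    outliers = df.loc[df[column] != leader, [key, column]]
    return leader, int(counts.iloc[0]), outliers


toy = pd.DataFrame({
    "state": ["Alabama", "Alaska", "Arizona", "Montana"],
    "winning_fragrance": ["Old Spice", "Coco Mademoiselle", "Old Spice", "Marc Jacobs Daisy"],
})
leader, wins, outliers = summarize_winners(toy)
print(leader, wins, outliers["state"].tolist())
```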

Use this dataset for your own viz tutorials

The CSVs in this GitHub repo are CC BY 4.0. Drop them into any state-level viz tutorial you write, with attribution. Particularly useful for:

  • Choropleth tutorials (most state-level demo datasets are election or population data; this gives you something different)
  • Categorical fill examples (winners, not magnitudes)
  • Cluster analysis on state-level consumer behavior
  • plotly.express vs geopandas head-to-head comparisons

What I'd improve in v2

  1. Add a base map underlay. Choropleth-only maps lose context. A faint underlay of major cities + state borders would help readers locate Vermont vs New Hampshire vs Maine without squinting.

  2. Animated time series version. plotly.express.choropleth with animation_frame= gives you a play button if you have monthly data. The fragrance dataset covers a single 12-month window, but with a monthly re-pull I could show whether, and when, Glossier You overtook Old Spice in Vermont.

  3. Highlight the outliers. Instead of equal coloring per state, dim the 43-state Old Spice block and highlight only the 8 outliers in vivid colors. This is the chart that should accompany the headline "8 states reject the national #1."
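For idea 3, most of the work is in the color map rather than the plot call. A sketch that greys out the most common category and gives every other winner a vivid color; `highlight_outliers` and the grey value are my own choices, not from the original analysis, and the palette below is a stand-in.

```python
from collections import Counter

MUTED = "#d9d9d9"  # light grey for the dominant category (an arbitrary choice)


def highlight_outliers(categories, vivid_palette, muted=MUTED):
    """Grey out the most common category; give each remaining category a vivid color."""
    counts = Counter(categories)
    leader = max(counts, key=counts.get)
    others = sorted(c for c in counts if c != leader)
    if len(others) > len(vivid_palette):
        raise ValueError("not enough vivid colors for the outlier categories")
    color_map = {leader: muted}
    color_map.update(zip(others, vivid_palette))
    return color_map


# With plotly (not run here):
#   px.choropleth(..., color_discrete_map=highlight_outliers(
#       df["winning_fragrance"], px.colors.qualitative.Vivid))
example = highlight_outliers(
    ["Old Spice", "Old Spice", "Polo Blue", "Glossier You"],
    ["#e41a1c", "#377eb8"],
)
```

Vivid should have more than enough colors for the five non-Old-Spice winners listed above.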

If you build any of these, share a link in the comments. The dataset is open under CC BY 4.0, so you only need to credit PerfumeM for the source data.


Source data + analysis: perfumem.com. Open dataset: github.com/ahmad-khan-97/us-fragrance-trends-2026 (CC BY 4.0).
