If you've used pytrends for the typical "trending this week" overview, you've probably hit the same wall I did: the default interest_over_time and trending_searches give you national-level signal, which is exactly what every other analyst already has. The interesting story is almost always at the regional level, and pytrends has a quietly underused method for that: interest_by_region(resolution="REGION").
I used it to answer a question that had been bugging me: when 30 of the most-talked-about fragrances of 2024-2026 are matched up across all 50 US states + DC, does every state pick the same #1, or do some go their own way?
Result: 43 states pick the same fragrance. 8 states pick something completely different, and the outliers cluster in ways that turned out to be defensible (more on this below).
The full analysis went up at perfumem.com; the raw CSVs and choropleth PNGs are on GitHub under CC BY 4.0. This post is the technical walkthrough, with the gotchas I had to work around.
The core pattern: pytrends batched at 5 keywords with a state-level resolution
pytrends accepts at most 5 keywords per request. To analyze 30 fragrances across all 50 states + DC, you batch the keywords and stitch the wide matrix together yourself.
```python
from pytrends.request import TrendReq
import time

FRAGRANCES = [
    "Dior Sauvage", "Bleu de Chanel", "Polo Blue", "Old Spice", "Glossier You",
    "Chanel No 5", "Coco Mademoiselle", "Marc Jacobs Daisy",
    "Ariana Grande Cloud", "Sol de Janeiro Cheirosa 62",
    # ... 20 more
]
BATCH_SIZE = 5
TIMEFRAME = "today 12-m"
GEO = "US"

pytrends = TrendReq(hl="en-US", tz=300, retries=3, backoff_factor=2)
rows = {}  # state -> {fragrance: score}

for i in range(0, len(FRAGRANCES), BATCH_SIZE):
    batch = FRAGRANCES[i:i + BATCH_SIZE]
    pytrends.build_payload(batch, timeframe=TIMEFRAME, geo=GEO)
    df = pytrends.interest_by_region(
        resolution="REGION",
        inc_low_vol=True,
        inc_geo_code=False,
    )
    for state, row in df.iterrows():
        rows.setdefault(state, {})
        for frag in batch:
            rows[state][frag] = int(row[frag])
    time.sleep(8)  # be polite; Google rate-limits aggressive callers
```
The output rows dict gives you a state x fragrance matrix you can pivot into anything.
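If you'd rather hold it as a DataFrame, here's a minimal sketch, assuming the `rows` dict built above:

```python
# Pivot the nested dict into a state x fragrance DataFrame.
import pandas as pd

matrix = pd.DataFrame(rows).T          # index = state, columns = fragrance
matrix = matrix.fillna(0).astype(int)  # a failed batch leaves NaN holes
print(matrix.idxmax(axis=1).head())    # per-state winner in one line
```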
Gotcha 1: Google's scores are relative, not absolute
This trips up almost everyone the first time. The 0-100 score is normalized within the keyword set you submitted in that batch. A score of 87 in batch A is not directly comparable to 87 in batch B unless you re-normalize.
The cleanest fix is to include a "calibration" keyword in every batch (something with stable, broad search volume) and renormalize relative to it. For my use case I skipped this because I wasn't comparing across batches: the per-state winner is just the highest score within each state's row, regardless of cross-batch absolute magnitude. If you need cross-batch comparability, factor in a calibration term.
```python
# example: rebase each batch to a calibration keyword
CALIBRATION = "perfume"
batch_with_cal = [CALIBRATION] + batch[:4]  # 4 keywords + 1 calibration
# after fetching, divide every score in the batch by its calibration
# row score, then multiply by 100
```
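A slightly fuller sketch of that rebase, assuming `df` is the `interest_by_region` result for a batch that included the calibration keyword (the `rebase` helper is an illustrative name, not part of pytrends):

```python
import pandas as pd

def rebase(df: pd.DataFrame, calibration: str = "perfume") -> pd.DataFrame:
    """Rescale a batch so the calibration keyword reads 100 in every state."""
    cal = df[calibration].mask(df[calibration] == 0)  # NaN out zero-volume rows
    keywords = [c for c in df.columns if c != calibration]
    return df[keywords].div(cal, axis=0) * 100
```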
Gotcha 2: inc_low_vol=True matters at the state level
Default is False, which means low-volume regions get filtered out and your matrix has holes. For a state-level analysis where some states (Wyoming, Alaska, Vermont) have low aggregate search activity, you want inc_low_vol=True or your sparse states disappear from the matrix entirely. Tradeoff: low-volume scores are noisier, so the bottom of the distribution is less trustworthy than the top.
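A quick sanity check, as a sketch: run the same query both ways and diff the indexes. This reuses the `pytrends` session from the main script and assumes filtered regions are dropped from the frame, as described above:

```python
# Compare the same query with and without low-volume regions.
pytrends.build_payload(["Dior Sauvage"], timeframe="today 12-m", geo="US")
full = pytrends.interest_by_region(resolution="REGION", inc_low_vol=True)
filtered = pytrends.interest_by_region(resolution="REGION", inc_low_vol=False)
print(sorted(set(full.index) - set(filtered.index)))  # states that would vanish
```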
Gotcha 3: pytrends is unmaintained-ish; expect occasional 429s
The project is community-maintained and Google occasionally changes the unofficial endpoint. Build retries with exponential backoff into your TrendReq constructor (retries=3, backoff_factor=2) and accept that some batches will need a re-run. I had 1 of my 6 batches fail on the first pass and had to retry an hour later.
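If you'd rather not babysit re-runs, something like this works; `fetch_batch` is a name I'm making up for illustration, and `ResponseError` is pytrends' catch-all exception for bad responses from Google (429s included):

```python
# A manual re-run wrapper on top of TrendReq's built-in backoff (sketch).
import time
from pytrends.exceptions import ResponseError

def fetch_batch(pytrends, batch, attempts=3, wait=120):
    for attempt in range(attempts):
        try:
            pytrends.build_payload(batch, timeframe="today 12-m", geo="US")
            return pytrends.interest_by_region(resolution="REGION", inc_low_vol=True)
        except ResponseError:
            if attempt == attempts - 1:
                raise
            time.sleep(wait * (attempt + 1))  # wait longer after each failure
```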
Computing the per-state winner
Once you have the matrix, ranking each state's row is a one-liner; writing the winners and runners-up to CSV takes only a few more lines:
```python
import csv

with open("winners.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["state", "winning_fragrance", "score", "second_place", "second_score"])
    for state in sorted(rows.keys()):
        ranked = sorted(rows[state].items(), key=lambda kv: -kv[1])
        winner, score = ranked[0]
        second, second_score = ranked[1] if len(ranked) > 1 else ("", 0)
        w.writerow([state, winner, score, second, second_score])
```
Visualizing it: choropleth in 30 lines with plotly
Plotly Express handles US state choropleths cleanly. The trick is mapping full state names to USPS 2-letter codes (which `locationmode="USA-states"` requires).
```python
import plotly.express as px
import pandas as pd

US_STATE_TO_CODE = {
    "Alabama": "AL", "Alaska": "AK", "Arizona": "AZ",  # ... (full dict)
}

df = pd.read_csv("winners.csv")
df["code"] = df["state"].map(US_STATE_TO_CODE)
df = df.dropna(subset=["code"])

fig = px.choropleth(
    df,
    locations="code",
    locationmode="USA-states",
    color="winning_fragrance",
    scope="usa",
    title="Most-searched fragrance by US state, last 12 months",
    color_discrete_sequence=px.colors.qualitative.Set3,
)
fig.write_image("us-state-map.png", width=1600, height=900, scale=2)
```
You'll need kaleido installed for `write_image` to work: `pip install kaleido`.
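If an interactive artifact is enough, `write_html` is built into plotly and sidesteps the kaleido dependency:

```python
# Interactive HTML export, no kaleido required.
fig.write_html("us-state-map.html", include_plotlyjs="cdn")
```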
What the data actually showed
Old Spice ranks #1 in 43 states. Of the 8 outliers:
| State | Winner | Category |
|---|---|---|
| Alaska, South Dakota | Coco Mademoiselle (Chanel) | Designer luxury women's |
| Louisiana, Mississippi | Polo Blue (Ralph Lauren) | Designer men's |
| Montana | Marc Jacobs Daisy | Designer women's |
| New Mexico | Ariana Grande Cloud | Celebrity / viral |
| North Dakota, Vermont | Glossier You | Niche / clean beauty |
The clusters are interesting in ways I didn't expect going in. Louisiana + Mississippi sharing a #1 (Polo Blue) tracks with cultural Gulf Coast preference signals you can find in other consumer-goods data. North Dakota + Vermont sharing Glossier You was the strangest finding to me; both are low-population states with strong direct-to-consumer ecommerce penetration, and Glossier's brand voice plays well in both demographics, but I wouldn't have predicted them as a pair.
New Mexico is the only US state where Ariana Grande Cloud ranks #1, which is a "single state where a celebrity scent dominates" pattern that I'd love to see replicated for other celebrity fragrance launches.
Open data + reproducibility
Everything is on GitHub: ahmad-khan-97/us-fragrance-trends-2026. The repo includes:
- `data/raw_interest_by_state.csv`: the full 30 x 51 matrix
- `data/winning_fragrance_per_state.csv`: state, winner, score, runner-up, runner-up score
- `charts/`: the three matplotlib + plotly outputs
- `LICENSE`: CC BY 4.0, free to remix with attribution
The full written analysis with the cluster interpretation is at perfumem.com.
What I'd build next
If I were extending this:
- Time series per state for the top 5 outlier picks: did Glossier You always win Vermont, or is this a 2026 phenomenon? `interest_over_time` per state would answer that (see the sketch after this list).
- Calibrated cross-state magnitude: with a calibration keyword in every batch, you could rank "intensity of fragrance interest" per state, not just the within-state winner.
- Compare to actual purchase data: Google Trends measures search intent, not purchases. Anyone with a national fragrance retailer's POS data has a great cross-validation opportunity here.
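For the first item, a minimal sketch, using the Vermont / Glossier You pair from the table above:

```python
# Per-state time series: geo accepts state-level codes like "US-VT".
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=300)
pytrends.build_payload(["Glossier You"], timeframe="today 5-y", geo="US-VT")
ts = pytrends.interest_over_time()  # weekly 0-100 interest, Vermont only
print(ts.tail())
```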
If you build any of those, I'd love to see the result. The dataset is intentionally permissive (CC BY 4.0) so derivative analyses are encouraged.
Quick reference: the full minimal script
```python
# us_state_fragrance_trends.py
from pytrends.request import TrendReq
import csv, time

FRAGRANCES = [...]  # your 5-30 keywords
BATCH_SIZE = 5

pytrends = TrendReq(hl="en-US", tz=300, retries=3, backoff_factor=2)
rows = {}

for i in range(0, len(FRAGRANCES), BATCH_SIZE):
    batch = FRAGRANCES[i:i + BATCH_SIZE]
    pytrends.build_payload(batch, timeframe="today 12-m", geo="US")
    df = pytrends.interest_by_region(resolution="REGION", inc_low_vol=True)
    for state, row in df.iterrows():
        rows.setdefault(state, {})
        for frag in batch:
            rows[state][frag] = int(row[frag])
    time.sleep(8)

with open("winners.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["state", "winner", "score"])
    for state, frag_scores in rows.items():
        winner, score = max(frag_scores.items(), key=lambda kv: kv[1])
        w.writerow([state, winner, score])
```
That's it. ~30 lines for a state-level Google Trends analysis you can drop any keyword set into.
If you build something with this pattern, drop a link in the comments. Particularly interested in non-fragrance domains: fast food, pickup trucks, streaming shows. The state-level segmentation usually reveals at least one cluster that the national ranking hides.