I've been building BabyNamePick.com — a baby name database that now covers 1,500+ names from over 40 cultural origins. Here's what I learned about data modeling, cultural sensitivity, and scaling a name database.
The Data Model
Each name entry looks like this:
{
  "name": "Saoirse",
  "meaning": "Freedom",
  "origin": "irish",
  "gender": "girl",
  "styles": ["strong", "cultural"]
}
Simple, but the decisions behind each field were not.
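For readers who want the same shape in code, here is a minimal TypeScript sketch of the entry. The type names (`NameEntry`, `Gender`) and the `isNameEntry` guard are hypothetical and not taken from the site's codebase; the field names and the four gender values are the ones described in this post.

```typescript
// Hypothetical type names; only the field names and values come from the post.
type Gender = "boy" | "girl" | "unisex" | "neutral";

interface NameEntry {
  name: string;
  meaning: string;
  origin: string;   // one lowercase origin key, e.g. "irish"
  gender: Gender;
  styles: string[]; // freeform lowercase tags, e.g. ["strong", "cultural"]
}

const GENDERS: readonly string[] = ["boy", "girl", "unisex", "neutral"];

// Runtime check for data loaded from JSON, where static types cannot help.
function isNameEntry(value: unknown): value is NameEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.name === "string" &&
    typeof v.meaning === "string" &&
    typeof v.origin === "string" &&
    typeof v.gender === "string" &&
    GENDERS.includes(v.gender) &&
    Array.isArray(v.styles) &&
    v.styles.every((s) => typeof s === "string")
  );
}

const saoirse: NameEntry = {
  name: "Saoirse",
  meaning: "Freedom",
  origin: "irish",
  gender: "girl",
  styles: ["strong", "cultural"],
};
```

A guard like this is one way to catch a malformed entry at load time instead of on a rendered page.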
Origin: One String, Many Debates
Names don't respect borders. Is "Jasmine" Persian, Arabic, or English? We went with the earliest traceable origin (Persian), but added style tags to capture cross-cultural usage.
Some origin categories we use:
- Geographic: japanese, korean, irish, welsh
- Cultural: biblical, native-american, polynesian
- Language-family: slavic, nordic
The key insight: origin is about etymology, styles are about usage. A name can be Irish in origin but popular in America.
Gender: Beyond Binary
We use four values: boy, girl, unisex, neutral. The difference between unisex and neutral:
- Unisex: Actively given to children of any gender (Riley, Kai, Rowan)
- Neutral: Carries no gender marking, either by design or in its source culture (some Chinese names)
Styles: The Secret Sauce
Styles are freeform tags that capture vibes:
- nature, royal, biblical, mythological
- vintage, modern, short, elegant
- whimsical, strong, literary, space
This lets users filter by feeling, not just facts. "Show me nature names for girls" is a more natural query than "show me names of Latin origin."
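A query like "nature names for girls" can be sketched as a plain array filter. This is an illustration, not the site's actual search code: `filterNames` and `NameFilter` are hypothetical names, gender is matched exactly (whether a "girl" query should also return unisex names is a product decision this sketch does not make), and every sample entry except Saoirse is made up for the example.

```typescript
type Gender = "boy" | "girl" | "unisex" | "neutral";

interface NameEntry {
  name: string;
  meaning: string;
  origin: string;
  gender: Gender;
  styles: string[];
}

// Hypothetical filter shape: every field is optional and all given fields must match.
interface NameFilter {
  style?: string;
  gender?: Gender;
  origin?: string;
}

function filterNames(names: NameEntry[], filter: NameFilter): NameEntry[] {
  return names.filter(
    (entry) =>
      (filter.style === undefined || entry.styles.includes(filter.style)) &&
      (filter.gender === undefined || entry.gender === filter.gender) &&
      (filter.origin === undefined || entry.origin === filter.origin)
  );
}

// Illustrative sample data; the tags on Jasmine and Rowan are assumptions.
const sample: NameEntry[] = [
  { name: "Saoirse", meaning: "Freedom", origin: "irish", gender: "girl", styles: ["strong", "cultural"] },
  { name: "Jasmine", meaning: "Jasmine flower", origin: "persian", gender: "girl", styles: ["nature", "elegant"] },
  { name: "Rowan", meaning: "Little red one", origin: "irish", gender: "unisex", styles: ["nature", "short"] },
];

const natureGirls = filterNames(sample, { style: "nature", gender: "girl" });
```

Because origin and styles are separate fields, the same function answers both the "feeling" query and the etymology query.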
Cultural Representation
Our origin distribution after 1,500 names:
| Region | Origins | Count |
|---|---|---|
| European | Irish, British, French, German, Nordic, etc. | ~600 |
| Asian | Japanese, Chinese, Korean, Indian, etc. | ~250 |
| Middle Eastern | Arabic, Hebrew, Persian, Turkish | ~200 |
| African | Various regions | ~80 |
| Americas | American, Native American, Hawaiian | ~80 |
| Other | Polynesian, Indonesian, Tibetan, etc. | ~100 |
European names are overrepresented because English-language search volume skews that way. But we're actively expanding Asian, African, and indigenous names — they're underserved in existing databases and represent real search demand.
Technical Decisions
Static Generation
Every name gets its own page, pre-rendered at build time with Next.js generateStaticParams. At 1,500 names + category pages + blog posts, we're generating 3,500+ static pages.
Build time: ~90 seconds. Worth it for the SEO benefits.
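For anyone who hasn't used it, a rough sketch of what `generateStaticParams` can look like for a name page follows. The route path (`app/names/[slug]/page.tsx`), the `toSlug` helper, and the inline `names` array are assumptions for illustration; the post does not show how the site stores its data or builds its URLs.

```typescript
// Hypothetical file: app/names/[slug]/page.tsx (the route layout is an assumption).

interface NameEntry {
  name: string;
}

// Stand-in for the real data source, which this post does not show.
const names: NameEntry[] = [{ name: "Saoirse" }, { name: "Zoë" }, { name: "Mary Ann" }];

// Turn a display name into a URL-safe slug.
function toSlug(name: string): string {
  return name
    .normalize("NFD")                // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // collapse any other run of characters into one hyphen
    .replace(/^-+|-+$/g, "");        // trim leading and trailing hyphens
}

// Next.js calls this at build time and pre-renders one page per returned params object.
export function generateStaticParams(): { slug: string }[] {
  return names.map((entry) => ({ slug: toSlug(entry.name) }));
}
```

One caveat with this particular slug helper: a name written entirely in a non-Latin script would produce an empty slug, so a real database would need a romanized or explicit slug field for those entries.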
Normalization
We normalize style tags to lowercase and deduplicate entries on name.toLowerCase(). Sounds obvious, but we caught 63 style inconsistencies in one audit ("Nature" vs "nature").
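A minimal sketch of that normalization pass, under a few assumptions that go beyond what is described here: the first occurrence of a duplicate name wins, and repeated tags within one entry are collapsed. The function name `normalizeEntries` is hypothetical.

```typescript
interface NameEntry {
  name: string;
  meaning: string;
  origin: string;
  gender: string;
  styles: string[];
}

function normalizeEntries(entries: NameEntry[]): NameEntry[] {
  const seen = new Set<string>();
  const result: NameEntry[] = [];

  for (const entry of entries) {
    // Deduplicate on the lowercased name, as described above.
    const key = entry.name.toLowerCase();
    if (seen.has(key)) continue; // assumption: keep the first occurrence
    seen.add(key);

    // Lowercase every tag, then drop repeats such as "Nature" and "nature".
    const styles = Array.from(new Set(entry.styles.map((s) => s.toLowerCase())));
    result.push({ ...entry, styles });
  }
  return result;
}

// Illustrative input with one duplicate name and one inconsistent tag.
const cleaned = normalizeEntries([
  { name: "Luna", meaning: "Moon", origin: "latin", gender: "girl", styles: ["Nature", "nature", "Short"] },
  { name: "luna", meaning: "Moon", origin: "latin", gender: "girl", styles: ["modern"] },
]);
```

Running a pass like this in the build step, rather than by hand, keeps an audit from having to find the same inconsistency twice.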
Sitemap
Auto-generated from the name list. Every name page, category page, and blog post gets an entry. Google Search Console picks them up within days.
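The generation step can be as small as mapping slugs to URLs. In the sketch below, `buildSitemap`, the `SitemapEntry` type, and the `/names/`, `/category/`, and `/blog/` paths are all assumptions; the entry shape is a local stand-in for what a Next.js `app/sitemap.ts` file returns, so the function could be called from a default export there.

```typescript
// Local stand-in for a sitemap entry (a URL plus a last-modified date).
interface SitemapEntry {
  url: string;
  lastModified: Date;
}

const BASE_URL = "https://babynamepick.com";

// The URL paths below are hypothetical; swap in the site's real routes.
function buildSitemap(
  nameSlugs: string[],
  categorySlugs: string[],
  postSlugs: string[],
  now: Date = new Date()
): SitemapEntry[] {
  const toEntry = (path: string): SitemapEntry => ({
    url: `${BASE_URL}${path}`,
    lastModified: now,
  });

  return [
    ...nameSlugs.map((slug) => toEntry(`/names/${slug}`)),
    ...categorySlugs.map((slug) => toEntry(`/category/${slug}`)),
    ...postSlugs.map((slug) => toEntry(`/blog/${slug}`)),
  ];
}

const sitemap = buildSitemap(["saoirse", "jasmine"], ["irish"], ["data-modeling"]);
```

Since the sitemap is derived from the same lists that drive page generation, a new name cannot get a page without also getting a sitemap entry.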
What's Next
- 2,000 names by end of month
- Pronunciation guides for non-English names
- Name popularity trends using SSA data
- "Similar names" recommendations based on style overlap
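The "similar names" item is still on the roadmap, so the following is only one possible approach, not a description of anything shipped: score each pair of names by the Jaccard similarity of their style tags (shared tags divided by total distinct tags) and return the top matches. The function names and the sample tags for Freya, Rowan, and Nova are hypothetical.

```typescript
interface NameEntry {
  name: string;
  styles: string[];
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, with 0 for two empty tag lists.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  setA.forEach((tag) => {
    if (setB.has(tag)) shared++;
  });
  return shared / (setA.size + setB.size - shared);
}

// Rank every other name by tag overlap with the target; drop names with no overlap.
function similarNames(
  target: NameEntry,
  all: NameEntry[],
  limit = 5
): { name: string; score: number }[] {
  return all
    .filter((entry) => entry.name !== target.name)
    .map((entry) => ({ name: entry.name, score: jaccard(target.styles, entry.styles) }))
    .filter((result) => result.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Illustrative data; only Saoirse's tags come from the post.
const catalog: NameEntry[] = [
  { name: "Saoirse", styles: ["strong", "cultural"] },
  { name: "Freya", styles: ["strong", "cultural"] },
  { name: "Rowan", styles: ["strong", "nature"] },
  { name: "Nova", styles: ["modern"] },
];

const matches = similarNames(catalog[0], catalog);
```

Jaccard is a reasonable starting point because it needs nothing beyond the existing `styles` field, though it treats every tag as equally important.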
Key Takeaways
- Start with a simple data model and extend it. We've barely changed the schema since day one.
- Cultural origin is harder than it looks. When in doubt, trace the etymology.
- Style tags > rigid categories. Let users search by feeling.
- Static generation scales surprisingly well for this kind of content.
- Representation matters. The gaps in your database are the gaps in your audience.
Check it out at babynamepick.com — 1,500+ names, 40+ origins, completely free.
Building something similar? I'd love to hear about your data modeling challenges in the comments.