Yunhan

Building a Multilingual Baby Name Database: Lessons from 40+ Origins and 1,500 Names

I've been building BabyNamePick.com — a baby name database that now covers 1,500+ names from over 40 cultural origins. Here's what I learned about data modeling, cultural sensitivity, and scaling a name database.

The Data Model

Each name entry looks like this:

{
  "name": "Saoirse",
  "meaning": "Freedom",
  "origin": "irish",
  "gender": "girl",
  "styles": ["strong", "cultural"]
}

Simple, but the decisions behind each field were not.
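
For readers working in TypeScript, the entry above could be typed roughly like this. It is a sketch, not the site's actual code: the names `NameEntry`, `Gender`, and `isNameEntry` are hypothetical, and the four gender values come from the Gender section of this post.

```typescript
// Hypothetical types; field names mirror the JSON entry shown above.
type Gender = "boy" | "girl" | "unisex" | "neutral";

interface NameEntry {
  name: string;
  meaning: string;
  origin: string;   // a single lowercase origin key, e.g. "irish"
  gender: Gender;
  styles: string[]; // freeform tags
}

const GENDERS = new Set<string>(["boy", "girl", "unisex", "neutral"]);

// Runtime check for data loaded from JSON, where static types give no guarantee.
function isNameEntry(value: unknown): value is NameEntry {
  if (typeof value !== "object" || value === null) return false;
  const { name, meaning, origin, gender, styles } = value as Record<string, unknown>;
  return (
    typeof name === "string" &&
    typeof meaning === "string" &&
    typeof origin === "string" &&
    typeof gender === "string" &&
    GENDERS.has(gender) &&
    Array.isArray(styles) &&
    styles.every((s) => typeof s === "string")
  );
}

const saoirse: unknown = JSON.parse(
  '{"name":"Saoirse","meaning":"Freedom","origin":"irish","gender":"girl","styles":["strong","cultural"]}'
);
console.log(isNameEntry(saoirse)); // true
```

A guard like this is mostly useful at the boundary where the JSON file is read, so that a typo in a gender value fails loudly at build time instead of silently producing an unfilterable entry.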

Origin: One String, Many Debates

Names don't respect borders. Is "Jasmine" Persian, Arabic, or English? We went with the earliest traceable origin (Persian), but added style tags to capture cross-cultural usage.

Some origin categories we use:

  • Geographic: japanese, korean, irish, welsh
  • Cultural: biblical, native-american, polynesian
  • Language-family: slavic, nordic

The key insight: origin is about etymology; styles are about usage. A name can be Irish in origin but popular in America.

Gender: Beyond Binary

We use four values: boy, girl, unisex, neutral. The difference between unisex and neutral:

  • Unisex: Actively used for all genders (Riley, Kai, Rowan)
  • Neutral: Carries no gender association by design or culture (for example, some Chinese names)
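
One way the four values might feed into search is sketched below. The post does not say how unisex and neutral names are handled when someone searches for "boy" or "girl", so the rule here (both kinds show up in either search) is purely an assumption, and `matchesGender` is a hypothetical helper.

```typescript
type Gender = "boy" | "girl" | "unisex" | "neutral";

// Assumption: unisex and neutral names appear in both "boy" and "girl" searches.
function matchesGender(entryGender: Gender, wanted: "boy" | "girl"): boolean {
  return (
    entryGender === wanted ||
    entryGender === "unisex" ||
    entryGender === "neutral"
  );
}

console.log(matchesGender("unisex", "girl")); // true
console.log(matchesGender("boy", "girl"));    // false
```

Keeping unisex and neutral as separate values still pays off even if they match the same searches, because the name page can explain why a name is listed for everyone.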

Styles: The Secret Sauce

Styles are freeform tags that capture vibes:

  • nature, royal, biblical, mythological
  • vintage, modern, short, elegant
  • whimsical, strong, literary, space

This lets users filter by feeling, not just facts. "Show me nature names for girls" is a more natural query than "show me names of Latin origin."
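
A query like "nature names for girls" then becomes a plain filter over tags. The sketch below uses a trimmed-down entry shape and made-up sample data (the style tags on Willow and Rowan are illustrative, not taken from the database), and `filterNames` is a hypothetical helper.

```typescript
// Trimmed-down entry: only the fields this filter needs.
interface NameEntry {
  name: string;
  gender: string;
  styles: string[];
}

interface Query {
  gender?: string;
  style?: string;
}

// Keeps entries that satisfy every criterion present in the query.
function filterNames(names: NameEntry[], query: Query): NameEntry[] {
  return names.filter(
    (n) =>
      (query.gender === undefined || n.gender === query.gender) &&
      (query.style === undefined || n.styles.some((s) => s === query.style))
  );
}

// Illustrative sample data; tags for Willow and Rowan are assumptions.
const sample: NameEntry[] = [
  { name: "Saoirse", gender: "girl", styles: ["strong", "cultural"] },
  { name: "Willow", gender: "girl", styles: ["nature"] },
  { name: "Rowan", gender: "unisex", styles: ["nature"] },
];

const natureGirls = filterNames(sample, { gender: "girl", style: "nature" });
console.log(natureGirls.map((n) => n.name)); // logs the single match, Willow
```

Adding an `origin` criterion would follow the same pattern, which is part of why tags and origin can live side by side without complicating the schema.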

Cultural Representation

Our origin distribution after 1,500 names:

Region         | Origins                                       | Count
European       | Irish, British, French, German, Nordic, etc.  | ~600
Asian          | Japanese, Chinese, Korean, Indian, etc.       | ~250
Middle Eastern | Arabic, Hebrew, Persian, Turkish              | ~200
African        | Various regions                               | ~80
Americas       | American, Native American, Hawaiian           | ~80
Other          | Polynesian, Indonesian, Tibetan, etc.         | ~100

European names are overrepresented because English-language search volume skews that way. But we're actively expanding Asian, African, and indigenous names — they're underserved in existing databases and represent real search demand.

Technical Decisions

Static Generation

Every name gets its own page, pre-rendered at build time with Next.js generateStaticParams. At 1,500 names + category pages + blog posts, we're generating 3,500+ static pages.
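
Here is roughly what that looks like in the App Router. Treat it as a sketch under assumptions: the route path, the `slug` parameter name, the `slugify` helper, and the inline `allNames` array are all hypothetical, and the helper assumes names are stored in Latin script.

```typescript
interface NameEntry {
  name: string;
}

// Hypothetical slug helper: strips diacritics, lowercases, hyphenates.
// Assumes romanized names; non-Latin scripts would need a different scheme.
function slugify(name: string): string {
  return name
    .normalize("NFD")                // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // collapse anything else into a hyphen
    .replace(/^-+|-+$/g, "");        // trim leading and trailing hyphens
}

// Pure helper so the mapping can be tested without Next.js.
function buildStaticParams(names: NameEntry[]): { slug: string }[] {
  return names.map((n) => ({ slug: slugify(n.name) }));
}

// Stand-in for the real name list, which would be loaded from the data file.
const allNames: NameEntry[] = [{ name: "Saoirse" }, { name: "Riley" }];

// In a route file such as app/name/[slug]/page.tsx (path assumed), Next.js
// calls this export at build time and pre-renders one page per returned object.
export async function generateStaticParams(): Promise<{ slug: string }[]> {
  return buildStaticParams(allNames);
}
```

Splitting out `buildStaticParams` is a design choice rather than a requirement; it keeps the slug logic unit-testable without booting the framework.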

Build time: ~90 seconds. Worth it for the SEO benefits.

Normalization

We normalize styles to lowercase and deduplicate on name.toLowerCase(). Sounds obvious, but we caught 63 style inconsistencies in one audit ("Nature" vs "nature").
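
A minimal sketch of that normalization pass follows. Two details are assumptions rather than something the post states: when two entries share a name, the first one wins, and repeated tags inside a single entry are collapsed. `normalizeEntries` is a hypothetical name.

```typescript
interface NameEntry {
  name: string;
  styles: string[];
}

function normalizeEntries(entries: NameEntry[]): NameEntry[] {
  const seen = new Set<string>();
  const result: NameEntry[] = [];
  for (const entry of entries) {
    // Deduplicate on the lowercased name, as described above.
    const key = entry.name.toLowerCase();
    if (seen.has(key)) continue; // assumption: keep the first occurrence
    seen.add(key);
    // Lowercase every tag, then drop repeats within the entry (assumption).
    const styles = Array.from(new Set(entry.styles.map((s) => s.toLowerCase())));
    result.push({ ...entry, styles });
  }
  return result;
}

const cleaned = normalizeEntries([
  { name: "Saoirse", styles: ["Strong", "strong", "Cultural"] },
  { name: "saoirse", styles: ["nature"] },
  { name: "Kai", styles: ["Nature"] },
]);
console.log(cleaned.length); // 2
```

Running a pass like this as a build step, rather than trusting hand-edited data, is what turns "Nature" vs "nature" from a recurring audit finding into a non-issue.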

Sitemap

Auto-generated from the name list. Every name page, category page, and blog post gets an entry. New URLs show up in Google Search Console within days.
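
Generating the sitemap can be as small as the sketch below, which emits XML in the sitemaps.org format. The URL paths (`/name/...`, `/category/...`, `/blog/...`) and the helper names are assumptions for illustration; only the domain comes from this post.

```typescript
const BASE_URL = "https://babynamepick.com"; // domain from the post; paths are assumed

// Escapes the five characters that are special in XML text and attributes.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds a sitemap document with one <url> entry per site-relative path.
function buildSitemap(paths: string[]): string {
  const urls = paths
    .map((p) => `  <url><loc>${escapeXml(BASE_URL + p)}</loc></url>`)
    .join("\n");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    urls +
    "\n</urlset>\n"
  );
}

// Hypothetical paths for a name page, a category page, and a blog post.
const xml = buildSitemap(["/name/saoirse", "/category/nature", "/blog/example-post"]);
console.log(xml);
```

Because the path list is derived from the same data that drives page generation, a new name cannot end up with a page but no sitemap entry.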

What's Next

  • 2,000 names by end of month
  • Pronunciation guides for non-English names
  • Name popularity trends using SSA data
  • "Similar names" recommendations based on style overlap
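
The "similar names" idea is not built yet, but one plausible way to score style overlap is Jaccard similarity: shared tags divided by total distinct tags. The sketch below is a guess at an approach, not a description of the eventual feature, and the helper names and sample tags are hypothetical.

```typescript
interface NameEntry {
  name: string;
  styles: string[];
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, or 0 when both tag sets are empty.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  setA.forEach((s) => {
    if (setB.has(s)) shared++;
  });
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Returns up to `limit` other names, best style overlap first.
function similarNames(target: NameEntry, all: NameEntry[], limit = 5): NameEntry[] {
  return all
    .filter((n) => n.name !== target.name)
    .map((n) => ({ entry: n, score: jaccard(target.styles, n.styles) }))
    .filter((x) => x.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((x) => x.entry);
}

// Illustrative data; only Saoirse's tags come from the post.
const pool: NameEntry[] = [
  { name: "Saoirse", styles: ["strong", "cultural"] },
  { name: "NameA", styles: ["strong", "cultural"] },
  { name: "NameB", styles: ["strong", "nature"] },
  { name: "NameC", styles: ["nature"] },
];
console.log(similarNames(pool[0], pool).map((n) => n.name)); // NameA first, then NameB
```

At 1,500 to 2,000 names, comparing every pair at build time is cheap, so the results could be precomputed and baked into each static page.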

Key Takeaways

  1. Start with a simple data model and extend it. We've barely changed the schema since day one.
  2. Cultural origin is harder than it looks. When in doubt, trace the etymology.
  3. Style tags > rigid categories. Let users search by feeling.
  4. Static generation scales surprisingly well for this kind of content.
  5. Representation matters. The gaps in your database are the gaps in your audience.

Check it out at babynamepick.com — 1,500+ names, 40+ origins, completely free.

Building something similar? I'd love to hear about your data modeling challenges in the comments.