DEV Community

Yunhan

The Surprising Complexity of Baby Name Data

I thought building a baby name database would be straightforward. Names, meanings, origins — how hard could it be? Turns out, quite hard.

Problem 1: Names Aren't Unique

"Kai" means "sea" in Hawaiian, "forgiveness" in Japanese, and "food" in Māori. The same spelling, completely different names.

Our solution: origin is part of a composite key. The pair (name, origin) must be unique, not the name alone.
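A minimal Python sketch of the composite-key idea, assuming records are plain dicts shaped like our JSON entries. The `build_index` helper and the case-folding of the key are illustrative assumptions, not our production code.

```python
from typing import Dict, List, Tuple

NameKey = Tuple[str, str]  # (name, origin)


def build_index(records: List[dict]) -> Dict[NameKey, dict]:
    """Index records by (name, origin) and reject duplicate keys."""
    index: Dict[NameKey, dict] = {}
    for record in records:
        # Lowercase both parts so "Kai" and "kai" cannot become two entries.
        key = (record["name"].lower(), record["origin"].lower())
        if key in index:
            raise ValueError(f"duplicate entry for {key}")
        index[key] = record
    return index


records = [
    {"name": "Kai", "origin": "hawaiian", "meaning": "Sea"},
    {"name": "Kai", "origin": "japanese", "meaning": "Forgiveness"},
]
index = build_index(records)

# Same spelling, two separate entries.
print(index[("kai", "hawaiian")]["meaning"])  # Sea
print(index[("kai", "japanese")]["meaning"])  # Forgiveness
```

Keying on the pair means a second "Kai" is only an error when its origin is also already present.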

Problem 2: Gender Is Cultural

Some names are strictly gendered in one culture but unisex in another. "Andrea" is a boy's name in Italy but a girl's name in English-speaking countries. "Kai" is used for all genders in Hawaii.

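We use three gender values: boy, girl, unisex. But "unisex" means different things in different contexts: a name can be used for all genders within one culture (like "Kai" in Hawaii), or be gendered differently from one culture to the next (like "Andrea").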
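One way to handle this in a filter, sketched in Python. It assumes gender is stored per (name, origin) entry, so "Andrea" can be a boy's name in one entry and a girl's name in another. The `matches_gender` helper, its `include_unisex` flag, and the "english" origin label are hypothetical, used here only for illustration.

```python
GENDERS = {"boy", "girl", "unisex"}


def matches_gender(record: dict, wanted: str, include_unisex: bool = True) -> bool:
    """Return True if the record should appear in a search for `wanted`."""
    if wanted not in GENDERS:
        raise ValueError(f"unknown gender: {wanted!r}")
    if record["gender"] == wanted:
        return True
    # Optionally surface unisex names in boy/girl searches as well.
    return include_unisex and record["gender"] == "unisex"


names = [
    {"name": "Andrea", "origin": "italian", "gender": "boy"},
    {"name": "Andrea", "origin": "english", "gender": "girl"},  # hypothetical entry
    {"name": "Kai", "origin": "hawaiian", "gender": "unisex"},
]

boys = [(r["name"], r["origin"]) for r in names if matches_gender(r, "boy")]
strict_boys = [
    (r["name"], r["origin"])
    for r in names
    if matches_gender(r, "boy", include_unisex=False)
]
print(boys)         # [('Andrea', 'italian'), ('Kai', 'hawaiian')]
print(strict_boys)  # [('Andrea', 'italian')]
```

Whether unisex names belong in a "boy" or "girl" search is a product decision, which is why the sketch leaves it as a flag.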

Problem 3: Meanings Are Contested

Name etymology is surprisingly controversial. Different sources give different meanings for the same name. "Kennedy" might mean "helmeted chief" or "misshapen head" depending on which etymologist you ask.

We cite the most widely accepted meaning and note alternatives where significant.

Problem 4: Popularity Is Relative

A name that's "popular" in Korea might be completely unknown in the US. Our popularity field (popular, trending, classic, rare) is relative to the name's primary cultural context, not global usage.

Problem 5: Transliteration Varies

Chinese names can be romanized multiple ways: "Mingyu" vs "Ming-Yu" vs "Ming Yu". Arabic names have similar issues: "Muhammad" vs "Mohammed" vs "Mohamed".

We pick one canonical romanization and note common variants.
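A rough Python sketch of resolving variants to a canonical spelling. The `VARIANTS` table, the `fold` and `canonicalize` helpers, and the choice of which spelling counts as canonical are all assumptions made for the example.

```python
import re
from typing import Dict, List, Optional

# Hypothetical variant table: canonical spelling -> common alternates.
VARIANTS: Dict[str, List[str]] = {
    "Mingyu": ["Ming-Yu", "Ming Yu"],
    "Muhammad": ["Mohammed", "Mohamed"],
}


def fold(name: str) -> str:
    """Lowercase and drop spaces and hyphens so spacing variants compare equal."""
    return re.sub(r"[\s\-]+", "", name).lower()


def build_lookup(variants: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every folded spelling (canonical or alternate) to its canonical form."""
    lookup: Dict[str, str] = {}
    for canonical, alternates in variants.items():
        for spelling in [canonical, *alternates]:
            lookup[fold(spelling)] = canonical
    return lookup


LOOKUP = build_lookup(VARIANTS)


def canonicalize(name: str) -> Optional[str]:
    """Return the canonical spelling, or None if the name is not in the table."""
    return LOOKUP.get(fold(name))


print(canonicalize("Ming Yu"))  # Mingyu
print(canonicalize("mohamed"))  # Muhammad
print(canonicalize("Sakura"))   # None
```

Folding handles the mechanical differences (case, spaces, hyphens). Spellings that differ in letters, like "Mohamed", still need an explicit entry in the table.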

Problem 6: Style Tags Are Subjective

What makes a name "elegant" vs "classic" vs "modern"? These are inherently subjective categories. We use them because they're useful for discovery, but we're transparent that they reflect editorial judgment.

Our Data Model

After iterating, we settled on this:

{
  "name": "Sakura",
  "meaning": "Cherry blossom",
  "gender": "girl",
  "origin": "japanese",
  "style": ["nature", "elegant"],
  "popularity": "popular",
  "startLetter": "S"
}

Seven fields. Simple enough to maintain manually, rich enough to power filtering, search, and discovery across 46 cultural origins.
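With manually maintained data, a small validation pass helps catch typos. Here is a hedged Python sketch that checks one record against the seven fields and the gender and popularity values described above. The `validate` function and its specific rules (non-empty style list, `startLetter` matching the first letter of the name) are assumptions for illustration.

```python
from typing import List

GENDERS = {"boy", "girl", "unisex"}
POPULARITY = {"popular", "trending", "classic", "rare"}
REQUIRED = ("name", "meaning", "gender", "origin", "style", "popularity", "startLetter")


def validate(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    missing = [field for field in REQUIRED if field not in record]
    if missing:
        return [f"missing field: {field}" for field in missing]

    problems: List[str] = []
    if record["gender"] not in GENDERS:
        problems.append(f"unknown gender: {record['gender']!r}")
    if record["popularity"] not in POPULARITY:
        problems.append(f"unknown popularity: {record['popularity']!r}")
    if not isinstance(record["style"], list) or not record["style"]:
        problems.append("style must be a non-empty list")
    # Assumed rule: startLetter is derived from the name, so the two must agree.
    if record["startLetter"] != record["name"][:1].upper():
        problems.append("startLetter does not match name")
    return problems


sakura = {
    "name": "Sakura",
    "meaning": "Cherry blossom",
    "gender": "girl",
    "origin": "japanese",
    "style": ["nature", "elegant"],
    "popularity": "popular",
    "startLetter": "S",
}
print(validate(sakura))  # []
print(validate({**sakura, "gender": "female", "startLetter": "K"}))
```

Returning a list of problems rather than raising on the first one makes it easier to fix a whole file in a single pass.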

The Takeaway

"Simple" data is rarely simple. The complexity lives in the edge cases, the cultural context, and the decisions you make about normalization. For name data specifically, respecting cultural nuance is more important than achieving perfect consistency.


BabyNamePick — explore 2,000+ baby names from 46 cultures.
