Yunhan

Posted on Mar 22

Building a Multilingual Name Database: 2000+ Names Across 46 Cultures

#webdev #javascript #beginners #database

When I started BabyNamePick, the name database had maybe 200 entries. Today it has over 2,000 names spanning 46 cultural origins. Here's what I learned scaling a multilingual name dataset.

The Data Model

Each name entry is surprisingly simple:

{
  "name": "Sarangerel",
  "meaning": "Moonlight",
  "gender": "girl",
  "origin": "mongolian",
  "style": ["nature", "elegant"],
  "popularity": "rare",
  "startLetter": "S"
}

Seven fields. That's it. But getting those seven fields right across 46 cultures is where the complexity lives.
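One way to keep those seven fields honest is a small validation pass before each commit. The sketch below is an assumption about how that could look, not the project's actual tooling: `validateEntry` is a hypothetical helper, and the allowed gender values beyond "girl" are guesses, since the post only shows one entry.

```javascript
// Hypothetical allowed values; the post only shows "girl".
const GENDERS = new Set(["girl", "boy", "unisex"]);

// Returns a list of human-readable problems; an empty list means the entry passed.
function validateEntry(entry) {
  const errors = [];
  const stringFields = ["name", "meaning", "gender", "origin", "popularity", "startLetter"];

  for (const field of stringFields) {
    if (typeof entry[field] !== "string" || entry[field].trim() === "") {
      errors.push(`${field} must be a non-empty string`);
    }
  }

  if (!Array.isArray(entry.style) || entry.style.length === 0) {
    errors.push("style must be a non-empty array of tags");
  }

  if (typeof entry.gender === "string" && !GENDERS.has(entry.gender)) {
    errors.push(`unknown gender: ${entry.gender}`);
  }

  // startLetter is derivable from name, so check that the two agree.
  if (
    typeof entry.name === "string" &&
    typeof entry.startLetter === "string" &&
    entry.name.charAt(0).toUpperCase() !== entry.startLetter
  ) {
    errors.push("startLetter does not match the first letter of name");
  }

  return errors;
}

// The entry shown above.
const sample = {
  name: "Sarangerel",
  meaning: "Moonlight",
  gender: "girl",
  origin: "mongolian",
  style: ["nature", "elegant"],
  popularity: "rare",
  startLetter: "S",
};

console.log(validateEntry(sample));
```

Because `startLetter` duplicates information already in `name`, a check like this catches the two drifting apart after a manual edit.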

Challenge 1: Cultural Accuracy

Names carry deep cultural significance. A name's meaning in one culture might be completely different in another. "Kai" means "sea" in Hawaiian, while in Japanese the meaning depends on the kanji used to write it (it is often glossed as "forgiveness").

We handle this by treating the combination of name and origin as a composite key, rather than the name alone. The same spelling can exist under multiple origins with different meanings.
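A minimal sketch of that name-plus-origin lookup is below. The helper names `keyOf` and `buildIndex` are hypothetical, and so are the origin values "hawaiian" and "japanese" (guessed from the lowercase "mongolian" in the sample entry). The meanings are the ones given in this post.

```javascript
// Two entries that share a spelling but belong to different origins.
const entries = [
  { name: "Kai", origin: "hawaiian", meaning: "Sea" },
  { name: "Kai", origin: "japanese", meaning: "Forgiveness" },
];

// Build one lookup key from both fields, normalised to lowercase.
function keyOf(name, origin) {
  return `${name.toLowerCase()}|${origin.toLowerCase()}`;
}

// Index entries by (name, origin) and reject true duplicates.
function buildIndex(list) {
  const index = new Map();
  for (const entry of list) {
    const key = keyOf(entry.name, entry.origin);
    if (index.has(key)) {
      throw new Error(`Duplicate entry: ${entry.name} (${entry.origin})`);
    }
    index.set(key, entry);
  }
  return index;
}

const index = buildIndex(entries);
console.log(index.get(keyOf("Kai", "hawaiian")).meaning);
console.log(index.get(keyOf("Kai", "japanese")).meaning);
```

Keying on both fields lets the two "Kai" entries coexist while still flagging an accidental second "Kai" under the same origin.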

Challenge 2: Balanced Representation

Early on, our database was heavily skewed — 100+ American names, 100+ British names, but only 19 Native American names and 20 Dutch names. That's not a useful tool for parents exploring diverse cultural options.

We set a minimum target of 25-30 names per culture and systematically expanded underrepresented categories. The key was finding authentic names with verified meanings, not just padding numbers.

Current distribution ranges from 30 (Thai, Native American) to 116 (British), with most cultures having 40-60 entries.
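A balance check like this can be automated. The following is a rough sketch with hypothetical helpers (`countByOrigin`, `underTarget`) and placeholder data. It uses a tiny minimum of 3 so the example stays short, where the real target would be 25 to 30.

```javascript
// Count how many entries each origin has.
function countByOrigin(names) {
  const counts = new Map();
  for (const { origin } of names) {
    counts.set(origin, (counts.get(origin) ?? 0) + 1);
  }
  return counts;
}

// List origins below the minimum, most underrepresented first.
function underTarget(names, minimum) {
  return [...countByOrigin(names)]
    .filter(([, count]) => count < minimum)
    .sort((a, b) => a[1] - b[1])
    .map(([origin, count]) => ({ origin, count, missing: minimum - count }));
}

// Placeholder entries: only the origin field matters for this check.
const names = [
  { name: "A", origin: "british" },
  { name: "B", origin: "british" },
  { name: "C", origin: "british" },
  { name: "D", origin: "dutch" },
  { name: "E", origin: "dutch" },
  { name: "F", origin: "thai" },
];

console.log(underTarget(names, 3));
```

Running a report like this before each batch of additions shows which origins to research next.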

Challenge 3: Gender and Style Tags

Not every culture categorizes names the same way. Some names are truly unisex. Some cultures don't have the same "modern vs. classic" distinction that English-speaking countries do.

We kept the style tags flexible — arrays rather than single values — and used culturally appropriate descriptors. A Tibetan name tagged "classic" means something different from a French name tagged "classic."
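Storing style as an array makes multi-tag filtering straightforward. Here is a sketch under a few assumptions: `filterNames` is a hypothetical helper, "unisex" is an assumed gender value, and the tags and gender on the "Kai" entry are illustrative rather than taken from the real dataset.

```javascript
const names = [
  // From the post.
  { name: "Sarangerel", gender: "girl", origin: "mongolian", style: ["nature", "elegant"] },
  // Illustrative gender and tags, not real data.
  { name: "Kai", gender: "unisex", origin: "hawaiian", style: ["nature", "modern"] },
];

// Keep entries that match the requested gender (unisex names match any
// requested gender) and that carry every requested style tag.
function filterNames(list, { gender, styles = [] } = {}) {
  return list.filter((entry) => {
    const genderOk = !gender || entry.gender === gender || entry.gender === "unisex";
    const stylesOk = styles.every((tag) => entry.style.includes(tag));
    return genderOk && stylesOk;
  });
}

console.log(filterNames(names, { styles: ["nature"] }).map((e) => e.name));
console.log(filterNames(names, { gender: "boy" }).map((e) => e.name));
```

With a single style value per name, an entry could not be both "nature" and "elegant", and a query for either tag would miss it half the time.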

Challenge 4: Search and Discovery

With 2,000+ names, browsing isn't enough. We built multiple discovery paths:

By origin: 46 category pages for cultural browsing
By letter: 26 letter index pages for alphabetical exploration
By meaning: Thematic collections like names meaning light or names meaning hope
AI-powered: A generator that combines preferences to suggest personalized matches
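The first three paths can all be derived from the same array. The sketch below shows one possible approach, with hypothetical helpers `groupBy` and `searchByMeaning`. The real thematic collections may well be hand-curated rather than substring matches.

```javascript
const names = [
  { name: "Sarangerel", meaning: "Moonlight", origin: "mongolian", startLetter: "S" },
  { name: "Kai", meaning: "Sea", origin: "hawaiian", startLetter: "K" },
];

// Group entries into lists under a key computed by keyFn.
function groupBy(list, keyFn) {
  const groups = new Map();
  for (const entry of list) {
    const key = keyFn(entry);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(entry);
  }
  return groups;
}

// Naive thematic search: case-insensitive substring match on the meaning.
function searchByMeaning(list, term) {
  const needle = term.trim().toLowerCase();
  return list.filter((entry) => entry.meaning.toLowerCase().includes(needle));
}

const byOrigin = groupBy(names, (e) => e.origin);      // one group per origin page
const byLetter = groupBy(names, (e) => e.startLetter); // one group per letter page

console.log([...byLetter.keys()]);
console.log(searchByMeaning(names, "light").map((e) => e.name));
```

Note that a substring match treats "Moonlight" as a hit for "light", which is convenient here but would also match unrelated words such as "delight".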

The JSON-Over-Database Decision

At 2,000 entries, we still use a flat JSON file. No database. Why?

Build-time generation: Next.js reads the JSON at build time and generates all static pages. Zero runtime queries.
Version control: Every name addition is a git commit. We can track exactly when and why names were added.
Simplicity: No connection strings, no migrations, no ORM. import names from '@/data/names.json' is our entire data layer.
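To make the build-time idea concrete, here is a rough sketch of how the origin pages could be enumerated. It assumes the Next.js App Router, where a dynamic route file can export a `generateStaticParams` function. The post does not say which router the site uses, and the `origin` route segment and the `getNamesForOrigin` helper are hypothetical. The data is inlined so the snippet runs on its own.

```javascript
// In the real project this would be: import names from '@/data/names.json'
const names = [
  { name: "Sarangerel", meaning: "Moonlight", origin: "mongolian" },
  { name: "Kai", meaning: "Sea", origin: "hawaiian" },
  { name: "Kai", meaning: "Forgiveness", origin: "japanese" },
];

// In a Next.js route file this function would be exported. It returns one
// params object per static page, keyed by the dynamic segment name.
function generateStaticParams() {
  const origins = new Set(names.map((entry) => entry.origin));
  return [...origins].sort().map((origin) => ({ origin }));
}

// Hypothetical helper a page component could call to get its data.
function getNamesForOrigin(origin) {
  return names.filter((entry) => entry.origin === origin);
}

console.log(generateStaticParams());
console.log(getNamesForOrigin("hawaiian").map((e) => e.name));
```

Everything here runs once during the build, so the deployed pages need no data access at request time.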

The tradeoff is obvious — this won't scale to 100,000 names. But for a curated database where quality matters more than quantity, it's perfect.

What's Next

We're approaching the point where the JSON file approach might need rethinking. But honestly, 2,000 carefully curated names with accurate cultural data is more valuable than 50,000 scraped entries with questionable meanings.

Quality over quantity. Every time.

BabyNamePick is a free AI baby name generator covering 46+ cultural origins. No signup required.
