DEV Community

Kaustubh Memane

I Built a Test Data Generator for 38 Countries — Here's What I Learned

Every Android dev has typed "John Doe" into a form at some point.

Then "test@test.com". Then "+1234567890". Then some fake address that you made up on the spot and that somehow ends up in production logs six months later.

I was building a payments feature at work: multi-country, with multiple ID formats and different phone number patterns across regions. The QA process involved someone manually creating test profiles for each country, one at a time. It took most of a sprint just to set up the data.

At some point I thought: this is a solved problem. Surely there's an app that does this.

There wasn't. Not really. There are web tools, but nothing that works offline, nothing that generates mathematically valid national IDs, and nothing that handles locale-aware formatting across a serious list of countries. So I built it into Test Nexus, and Data-Hub became the first feature I shipped.

Here's what I learned doing it.


The Problem Is More Than "Random Names"

When I started, I assumed the hard part would be finding name data for different countries. That took maybe a day. The actual hard parts were:

Phone number formats are not what you think. Brazilian mobile numbers have 9 digits and start with 9, a standard rolled out starting in 2012. Chile moved to a closed 9-digit numbering plan for mobiles: every number starts with 9 and no longer requires an area code. Australian mobile numbers start with 04. If your app validates phone numbers at all (and it should), fake numbers like "555-1234" will either fail validation or quietly pass and create junk data that's hard to clean up later.
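Here is a minimal Python sketch of the mobile rules described above. The rule table, function names, and the decision to leave out country codes and Brazil's area code are my assumptions for illustration; this is not Data-Hub's actual code.

```python
import random
import re

# Hypothetical rule table built only from the three examples in the text.
# "length" is the national mobile number length (Brazil: without area code).
MOBILE_RULES = {
    "BR": {"prefix": "9", "length": 9},
    "CL": {"prefix": "9", "length": 9},
    "AU": {"prefix": "04", "length": 10},
}


def generate_mobile(country, rng=random):
    """Return a random digit string with the country's prefix and length."""
    rule = MOBILE_RULES[country]
    tail_len = rule["length"] - len(rule["prefix"])
    tail = "".join(str(rng.randrange(10)) for _ in range(tail_len))
    return rule["prefix"] + tail


def is_valid_mobile(country, number):
    """Check prefix and length only; real numbering plans have more rules."""
    rule = MOBILE_RULES[country]
    tail_len = rule["length"] - len(rule["prefix"])
    pattern = re.escape(rule["prefix"]) + r"\d{" + str(tail_len) + "}"
    return re.fullmatch(pattern, number) is not None
```

With these rules, a made-up value like "5551234" fails the check for all three countries, which is the behaviour a validating backend would show too.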

National IDs have check digits. India's Aadhaar uses the Verhoeff algorithm. Brazil's CPF has two check digits, each computed with a weighted Modulo-11 sum. Chile's RUT also uses Mod-11, multiplying digits from right to left by the repeating series 2 to 7. Spain has DNI, NIF, and NIE, each with a different format. If you generate a random string and call it an ID, any backend that validates check digits will reject it immediately. You need to actually implement the algorithm, not just fake the shape.
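To make the two Mod-11 schemes concrete, here is a short Python sketch of the CPF and RUT check digit calculations as they are commonly documented. The function names are hypothetical and this is an illustration, not the code that ships in Data-Hub.

```python
def cpf_check_digits(base9):
    """Return the two CPF check digits for a 9-digit base string."""
    digits = [int(ch) for ch in base9]
    for _ in range(2):
        # Weights count down to 2: 10..2 for the first digit, 11..2 for the second.
        weights = range(len(digits) + 1, 1, -1)
        remainder = sum(d * w for d, w in zip(digits, weights)) % 11
        digits.append(0 if remainder < 2 else 11 - remainder)
    return "".join(str(d) for d in digits[-2:])


def rut_check_digit(body):
    """Return the RUT verifier ("0"-"9" or "K") for the numeric body."""
    total = 0
    for i, ch in enumerate(reversed(body)):
        # Weights 2, 3, 4, 5, 6, 7 repeat, starting from the rightmost digit.
        total += int(ch) * (2 + i % 6)
    value = 11 - total % 11
    return {11: "0", 10: "K"}.get(value, str(value))
```

For example, `cpf_check_digits("529982247")` gives `"25"` and `rut_check_digit("12345678")` gives `"5"`, so a generator can append these to a random base to get a number that passes checksum validation.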

"Address" means different things in different countries. The UK has postcodes like SW1A 1AA. The Netherlands uses 4 digits followed by 2 letters. Japan uses a 7-digit format preceded by the 〒 symbol. If you generate US-style addresses for every locale, your app looks broken to any non-US user testing it.

Data-Hub handles all of this. The generation is entirely algorithmic — no network call needed. Every profile is built from bundled locale data combined with validated generation logic per country.
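As a rough illustration of the postal code shapes mentioned above, the patterns below check format only. They are simplified assumptions of mine (the UK pattern in particular ignores several special cases) and the names are hypothetical.

```python
import re

# Simplified shape checks; none of these confirm that a code really exists.
POSTAL_PATTERNS = {
    "GB": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}"),  # e.g. SW1A 1AA
    "NL": re.compile(r"[1-9]\d{3} ?[A-Z]{2}"),              # 4 digits + 2 letters
    "JP": re.compile(r"〒?\d{3}-\d{4}"),                    # 7 digits, optional 〒
}


def matches_postal_format(country, code):
    """Return True if the whole string fits the country's simplified pattern."""
    return POSTAL_PATTERNS[country].fullmatch(code) is not None
```

A US-style ZIP such as "90210" fails all three patterns, which is exactly why US-shaped addresses look broken in other locales.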


What Gets Generated

For each of the 38 supported countries, a complete profile includes:

  • Full name (locale-aware first/middle/last, sourced from country-specific name lists)
  • Email derived from the name
  • Phone number in country format with valid area codes
  • Full address (street, city, state/province, postal code) in local format
  • Date of birth (age 18–65)
  • National ID(s) — all types available for that country, mathematically valid
  • Payment card — correct BIN prefix, Luhn-valid number, real CVV format
  • Banking ID where applicable (IBAN, routing numbers, BSB, IFSC, etc.)
  • Password at selectable strength (8/12/16 chars)

For US profiles that means SSN + EIN + Visa + Mastercard + Amex + ABA Routing. For Brazilian profiles: CPF + CNPJ + Visa + Pix/bank routing format. For Indian profiles: Aadhaar + Visa + IFSC.

39 national ID types across 38 countries. The number matters because if your app handles international users, testing with US-only data is how bugs stay hidden until a real user from Germany finds them.
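For the payment card part, a Luhn-valid number can be produced by picking a prefix, filling in random digits, and computing the final check digit. The sketch below assumes a plain string prefix and total length as inputs; the function names and the prefixes used in the examples are illustrative, not Data-Hub's BIN tables.

```python
import random


def luhn_check_digit(payload):
    """Return the Luhn check digit for a digit string without its check digit."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:  # double every second digit, starting at the rightmost
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_is_valid(number):
    """Check that the last digit matches the Luhn digit of the rest."""
    return luhn_check_digit(number[:-1]) == number[-1]


def generate_card(prefix, length, rng=random):
    """Build prefix + random digits + Luhn check digit, `length` digits total."""
    body_len = length - len(prefix) - 1
    payload = prefix + "".join(str(rng.randrange(10)) for _ in range(body_len))
    return payload + luhn_check_digit(payload)
```

Calling `generate_card("4", 16)` yields a 16-digit number starting with 4 that passes `luhn_is_valid`; a 15-digit number with prefix "37" works the same way.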


The Batch Generation Flow

When I thought about how a developer actually uses test data, the flow isn't "generate one profile, copy it, close the app." It's:

  1. You're about to test a new registration flow
  2. You need 20–50 realistic accounts to populate a demo environment
  3. You want to export them directly to whatever format your backend or test suite expects

So batch generation was the core use case. Here's how it works under the hood:

```mermaid
graph TD
    A[Select Country + Count] --> B[Load Bundled Locale Data]
    B --> C[Generate Names, Address, Phone]
    C --> D[Apply Local Format Rules]
    D --> E[Compute ID Check Digits]
    E --> F[Generate Luhn-Valid Card]
    F --> G[Assemble Complete Profile]
    G --> H{Export Format}
    H --> I[CSV]
    H --> J[JSON]
    H --> K[XML]
    H --> L[TSV]
```

You pick a country, pick a count (10, 20, or 50), tap generate, and get a list of complete profiles. Export goes to CSV, JSON, XML, or TSV depending on what your pipeline needs.

The CSV export is formatted for direct SQL import. The JSON structure maps cleanly to REST API payloads. XML is there for the enterprise folks. TSV is surprisingly useful for shell scripts.
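To show what the four export targets amount to, here is a small Python sketch that serializes a list of flat profile dictionaries. The field names, the XML element names, and the function itself are assumptions for illustration; the app's real export schema may differ.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def export_profiles(profiles, fmt):
    """Serialize flat profile dicts (all with the same keys) to a string."""
    if not profiles:
        raise ValueError("no profiles to export")
    if fmt == "json":
        return json.dumps(profiles, ensure_ascii=False, indent=2)
    if fmt in ("csv", "tsv"):
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer,
            fieldnames=list(profiles[0].keys()),
            delimiter="," if fmt == "csv" else "\t",
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(profiles)
        return buffer.getvalue()
    if fmt == "xml":
        root = ET.Element("profiles")
        for profile in profiles:
            node = ET.SubElement(root, "profile")
            for key, value in profile.items():
                # Keys must be valid XML element names in this sketch.
                ET.SubElement(node, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")
```

The same list of dictionaries feeds every format, so a batch only has to be generated once and can then be written out in whichever shape the pipeline expects.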

One constraint I chose deliberately: batch generation always creates complete profiles. There's no "just give me names and emails" option. This felt like the right call because partial data is how you miss bugs — if you test registration with email only, you don't find out that your IDs module crashes on a Verhoeff-invalid Aadhaar until much later.
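For reference, a Verhoeff check of the kind mentioned above can be sketched in Python as follows. The tables are the standard published ones for the algorithm; the function names are hypothetical, and this is not the implementation used in the app.

```python
# Multiplication table of the dihedral group D5.
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
# Position-dependent permutation table.
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
# Inverse of each element in D5.
_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_check_digit(payload):
    """Return the check digit to append to a digit string."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = _D[c][_P[(i + 1) % 8][int(ch)]]
    return str(_INV[c])


def verhoeff_is_valid(number):
    """Return True if the digit string, check digit included, validates."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0
```

For the textbook input "236" the check digit is "3", so "2363" validates and "2364" does not. A 12-digit Aadhaar-style value would be 11 payload digits plus the digit this function returns.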


The Storage Decision

All generated data is stored in an AES-256 encrypted local database (SQLCipher). Nothing goes to a server. No analytics on what countries you generate, no cloud sync.

This was a deliberate choice. Test data sometimes looks embarrassingly close to real data — even when it's not, it can include realistic-looking ID numbers, card numbers, and addresses. That data belongs on the device that generated it.

The downside: you can't access your test profiles from multiple devices. This might change later, but right now I'd rather have the privacy guarantee than the convenience.


What I Got Wrong the First Time

Ethnicity fields. Data-Hub generates ethnicity as part of the demographic profile. Currently it uses a US-centric list for all 38 countries, which is clearly wrong. A Brazilian user generating Brazilian profiles shouldn't get US ethnicity categories. This is on the roadmap to fix — per-country demographic options — but it shipped this way because I ran out of time before launch.

Company name variety. The B2B company generator has 20 predefined company names. For most testing purposes that's fine, but if you're generating 50 profiles in batch, you'll see duplicates. I'll expand this with a proper randomized name generator.

The Romania situation. Romania is in the supported country list, but there was no national ID generator for it at first. You got Visa and IBAN, but no CNP — the 13-digit Romanian personal numeric code that requires its own Modulo-11 checksum. I built the country support before I had time to implement the algorithm. It's in there now, clearly labeled, but it's the kind of "95% done" problem that bothers me.

I mention these not to undermine the feature but because I've read enough "I built a thing" posts that glossed over the rough edges, and then you download the tool and immediately hit them. Better to know up front.
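For anyone curious about the CNP checksum mentioned above, the commonly documented rule is a weighted sum of the first 12 digits against the fixed key 279146358279, taken modulo 11, with a remainder of 10 mapped to 1. A minimal Python sketch, with hypothetical function names and no validation of the date or county fields:

```python
CNP_WEIGHTS = "279146358279"


def cnp_check_digit(first12):
    """Return the 13th (control) digit for the first 12 digits of a CNP."""
    total = sum(int(d) * int(w) for d, w in zip(first12, CNP_WEIGHTS))
    remainder = total % 11
    # A remainder of 10 cannot be written as one digit, so the rule uses 1.
    return "1" if remainder == 10 else str(remainder)


def cnp_is_valid(cnp):
    """Checksum-only validation of a 13-digit string."""
    return len(cnp) == 13 and cnp.isdigit() and cnp_check_digit(cnp[:12]) == cnp[12]
```

This only covers the control digit. A full generator would also need to produce a plausible sex/century digit, birth date, and county code for the first 12 positions.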


A Real Example: Testing a Multi-Country Payments Flow

Before Data-Hub, our QA setup at work involved:

  • A shared spreadsheet with hardcoded test profiles
  • Profiles going stale when countries changed their ID formats
  • Brazilian CPF numbers that looked valid but failed backend validation because they were typed by hand

With Data-Hub, the flow is:

  1. Select Brazil → generate 20 profiles → export to JSON
  2. Select Chile → generate 20 profiles → append to the same export
  3. Load the JSON into the test environment

The CPF numbers pass because they're generated with the correct algorithm. The phone numbers pass format validation. The names look like Brazilian and Chilean names, not "User123" and "Test Account".

This is the thing I was trying to fix. I think it works.


What's Coming

  • Per-country demographic options (fixing the ethnicity problem)
  • More companies in the B2B generator
  • Romania CNP support fully live
  • A "save template" feature so you can reuse country + count configurations you use frequently

If there's a specific country, ID type, or export format you need that's missing, I want to know. The country list was built around what I personally needed and what I had time to implement — not a comprehensive survey of what developers worldwide actually test against.


Try It

Test Nexus is free on Google Play. Data-Hub is the first tab you see after sign-in. Google sign-in is required — this is how I keep the generation features usable without a hard usage cap.

Play Store: https://play.google.com/store/apps/details?id=us.twocan.testnexus

Website: twocan.us

If you've dealt with the "John Doe" problem and have thoughts on what the right solution actually looks like, I'm reading the comments.
