BIP39 in 180 Lines of Vanilla JS — Mnemonic Generation, Validation, Seed Derivation, and the Japanese Wordlist Trap

#javascript #security #crypto #webdev

The 12-to-24 word "seed phrase" inside every hardware wallet and software wallet is BIP39. The spec is short — one file. Implement it from scratch and you discover that the only place where independent implementations actually disagree is the Japanese wordlist edge case, which catches even seasoned crypto libraries.

Here's a zero-dependency, SubtleCrypto-only, 180-line BIP39 implementation, verified against both the Trezor and bip32JP test vectors.

🔐 Demo: https://sen.ltd/portfolio/bip39-tool/
📦 GitHub: https://github.com/sen-ltd/bip39-tool

⚠️ Don't paste a real wallet's mnemonic into any web tool, including this one. Read the source first or — better — clone it and run it offline.

What BIP39 actually defines (and what it doesn't)

BIP39 specifies two things and stops:

A two-way encoding between an entropy buffer and a checksum-protected list of words.
A function that turns the word list into a 64-byte seed.

Everything past that — seed → master key, master key → addresses, HD derivation paths — is BIP32 / BIP44. BIP39 ends at the seed.

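That's actually significant: BIP39 is currency-agnostic. Bitcoin, Ethereum, Solana, or any other chain whose wallets follow the BIP39 seed step all get the same seed from the same phrase and passphrase. A few ecosystems reuse the word encoding but derive keys differently: most Cardano wallets, for example, work from the entropy rather than the PBKDF2 seed.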

The encoding, in four steps

1. Take ENT bits of cryptographically secure entropy (ENT ∈ {128, 160, 192, 224, 256})
2. Compute CS = ENT / 32   (= 4, 5, 6, 7, 8 bits)
3. Take the first CS bits of SHA-256(entropy) and append them to entropy
4. Slice the (ENT + CS)-bit string into 11-bit groups; each is an index into a 2048-word wordlist

11-bit groups because 2¹¹ = 2048. Each word carries exactly 11 bits, so a 12-word phrase holds 132 bits: 128 of entropy plus 4 of checksum.
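The arithmetic in steps 2 and 4 is easy to check for every allowed entropy size. A small sketch, where mnemonicShape is a hypothetical helper name used only for illustration, not something from the repository:

```javascript
// Hypothetical helper: derive checksum size and word count from the entropy size.
const VALID_ENT = [128, 160, 192, 224, 256];

function mnemonicShape(entBits) {
  if (!VALID_ENT.includes(entBits)) {
    throw new RangeError(`ENT must be one of ${VALID_ENT.join(", ")}`);
  }
  const checksumBits = entBits / 32;         // CS = ENT / 32
  const totalBits = entBits + checksumBits;  // always a multiple of 11
  return { checksumBits, totalBits, wordCount: totalBits / 11 };
}

for (const ent of VALID_ENT) {
  const { checksumBits, wordCount } = mnemonicShape(ent);
  console.log(`${ent} + ${checksumBits} bits -> ${wordCount} words`);
}
// 128 + 4 bits -> 12 words
// 160 + 5 bits -> 15 words
// 192 + 6 bits -> 18 words
// 224 + 7 bits -> 21 words
// 256 + 8 bits -> 24 words
```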

export async function entropyToMnemonic(entropy, wordlist) {
  const bits = entropy.length * 8;           // 128/160/192/224/256
  const checksumBits = bits / 32;            // 4/5/6/7/8
  const hash = new Uint8Array(await crypto.subtle.digest("SHA-256", entropy));
  const checksumByte = hash[0] >> (8 - checksumBits);

  // Pack entropy || checksum_bits into one buffer
  const totalBits = bits + checksumBits;
  const buf = new Uint8Array(Math.ceil(totalBits / 8));
  buf.set(entropy, 0);
  buf[entropy.length] = checksumByte << (8 - checksumBits);

  // Slice into 11-bit indices
  const wordCount = totalBits / 11;
  const words = [];
  for (let w = 0; w < wordCount; w++) {
    let idx = 0;
    for (let b = 0; b < 11; b++) {
      const bitPos = w * 11 + b;
      const bytePos = bitPos >> 3;
      const bitInByte = 7 - (bitPos & 7);
      idx = (idx << 1) | ((buf[bytePos] >> bitInByte) & 1);
    }
    words.push(wordlist[idx]);
  }
  return words;
}

SHA-256 is crypto.subtle.digest("SHA-256", entropy). Entropy is crypto.getRandomValues(new Uint8Array(bits / 8)). No third-party crypto libraries needed, including for the random source.

Validation is just the same code in reverse

Map each word back to its index, concatenate the bits, peel off the checksum bits at the end, and recompute:

// buf holds the repacked (ENT + CS) bits; entropy is its first ENT / 8 bytes
const givenChecksum = buf[entropy.length] >> (8 - checksumBits);
const hash = new Uint8Array(await crypto.subtle.digest("SHA-256", entropy));
const expectedChecksum = hash[0] >> (8 - checksumBits);
if (givenChecksum !== expectedChecksum) throw new Error("invalid checksum");

For a 12-word phrase, only the low 4 bits of the last word's index are checksum. Pick 12 words at random from the English list and the result has a 1-in-16 chance of being valid. That's enough for typo detection but carries no security weight: security lives in the entropy, not the checksum.

Seed derivation — the fixed PBKDF2 parameters that you can never change

Mnemonic to 64-byte seed is one PBKDF2 call:

seed = PBKDF2-HMAC-SHA512(
  password = NFKD(mnemonic_phrase),
  salt     = NFKD("mnemonic" + passphrase),
  iter     = 2048,
  keyLen   = 64 bytes
)

In SubtleCrypto:

export async function mnemonicToSeed(words, passphrase = "", separator = " ") {
  const enc = new TextEncoder();
  const phrase = words.join(separator).normalize("NFKD");
  const salt   = ("mnemonic" + (passphrase || "")).normalize("NFKD");

  const key = await crypto.subtle.importKey(
    "raw", enc.encode(phrase), { name: "PBKDF2" }, false, ["deriveBits"]
  );
  const bits = await crypto.subtle.deriveBits(
    { name: "PBKDF2", salt: enc.encode(salt), iterations: 2048, hash: "SHA-512" },
    key, 512    // 64 bytes
  );
  return new Uint8Array(bits);
}

Three things are baked into the spec and you can't move them:

iterations = 2048: fixed when the spec was written in 2013 and weak by modern PBKDF2 standards (where you'd want hundreds of thousands). It cannot be changed, because that would break compatibility with every existing wallet. The spec leans on the entropy itself for security.
NFKD normalization: Unicode-normalizes the phrase so that a precomposed character like é (U+00E9) and its decomposed form e + ◌́ (U+0065 + U+0301) produce the same seed.
salt = "mnemonic" + passphrase: even with no passphrase, the salt is the constant string "mnemonic". Every no-passphrase mnemonic therefore shares one salt, so any precomputed work applies to all of them at once. What keeps that impractical is the 128 to 256 bits of entropy, not the salt.

The third point is why BIP39 implementations sometimes refer to the passphrase as "the 25th word" — without one, your security is just the 128-256 bits of entropy.
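The normalization point is easy to see in isolation. A short sketch using only String.prototype.normalize and TextEncoder, showing the bytes that would reach PBKDF2 with and without NFKD:

```javascript
const utf8 = new TextEncoder();
const composed = "\u00E9";     // é as one precomposed code point
const decomposed = "e\u0301";  // e followed by a combining acute accent

console.log(composed === decomposed);                                      // false
console.log(composed.normalize("NFKD") === decomposed.normalize("NFKD"));  // true

// UTF-8 bytes before and after normalization
console.log([...utf8.encode(composed)]);                    // [ 195, 169 ]
console.log([...utf8.encode(composed.normalize("NFKD"))]);  // [ 101, 204, 129 ]
```

Two strings that render identically differ at the byte level until both are normalized, which is why the spec normalizes the phrase and the salt before hashing.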

The Japanese wordlist trap — `U+3000` versus `U+0020`

Here's where independent implementations get caught. BIP39 ships a 2048-word Japanese list alongside the English one, and the wordlist notes call for a different separator when a phrase is generated or displayed:

English: ASCII space U+0020
Japanese: ideographic space U+3000

U+3000 has a compatibility decomposition to U+0020, so NFKD collapses the ideographic space into an ASCII space. A Japanese phrase joined with U+3000 therefore yields the expected seed only if the whole joined string is normalized before it reaches PBKDF2. Feed the raw U+3000 bytes in and you get different bytes, hence a different seed.

The bip32JP test vector that catches this:

entropy:    00000000000000000000000000000000
mnemonic:   あいこくしん あいこくしん あいこくしん あいこくしん
            あいこくしん あいこくしん あいこくしん あいこくしん
            あいこくしん あいこくしん あいこくしん あおぞら
            (joined with U+3000)
passphrase: ㍍ガバヴァぱばぐゞちぢ十人十色
seed:       a262d6fb6122ecf45be09c50492b31f92e9beb7d9a845987a02cefda57a15f9c
            467a17872029a9e92299b5cbdf306e3a0ee620245cbd508959b6cb7ca637bd55

This single vector trips an implementation that does any of:

Joins with U+3000 but normalizes only the individual words, so the raw separator reaches PBKDF2
Skips NFKD on the mnemonic
Skips NFKD on the passphrase
Normalizes the mnemonic but assumes "the passphrase is ASCII anyway"

That last one is the most insidious. The passphrase contains ㍍ (U+334D, SQUARE MEETORU), which NFKD decomposes into the four characters メートル, and ガ (U+30AC), which decomposes into カ (U+30AB) plus the combining voiced sound mark U+3099. Pass the user's literal bytes into PBKDF2 and the seed will not match a wallet that did normalize.
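What NFKD does to the characters in this vector can be inspected directly. A small sketch that prints code points after normalization; codePointsAfterNFKD is a hypothetical helper written for this example:

```javascript
// Hypothetical helper: list the code points of a string after NFKD, as U+XXXX.
const codePointsAfterNFKD = (s) =>
  [...s.normalize("NFKD")].map(
    (c) => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );

console.log(codePointsAfterNFKD("\u334D")); // ㍍ becomes U+30E1 U+30FC U+30C8 U+30EB (メートル)
console.log(codePointsAfterNFKD("\u30AC")); // ガ becomes U+30AB U+3099
console.log(codePointsAfterNFKD("\u3000")); // ideographic space becomes U+0020
```

The last line is the one that matters for the separator: after NFKD, an ideographic space and an ASCII space are the same code point.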

Lock the implementation down with both vector sets

For English coverage, Trezor's vectors.json ships 24 [entropy, mnemonic, seed, xprv] tuples covering 128-, 192-, and 256-bit entropy, all with the passphrase "TREZOR". For Japanese, bip32JP/bip32JP.github.io ships vectors with the ㍍-containing passphrase, so you can catch all four pitfalls above in one assertion.

This implementation runs both — 34 tests total:

test("japanese vector: seed uses U+3000 separator", async () => {
  const words = await entropyToMnemonic(
    hexToBytes("00000000000000000000000000000000"), JAPANESE,
  );
  const seed = await mnemonicToSeed(
    words, "㍍ガバヴァぱばぐゞちぢ十人十色", "　"
  );
  assert.equal(bytesToHex(seed),
    "a262d6fb6122ecf45be09c50492b31f92e9beb7d9a845987a02cefda57a15f9c" +
    "467a17872029a9e92299b5cbdf306e3a0ee620245cbd508959b6cb7ca637bd55");
});

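mnemonicToSeed takes a separator argument so the same code path handles both languages. Because NFKD runs after the join, U+3000 and U+0020 end up as identical PBKDF2 input; the argument matters for the string you display, not for the seed.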
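The input side has the mirror-image concern: a pasted Japanese phrase arrives with U+3000 between words. A minimal sketch of how splitting and display could be handled, assuming the wordlist file is stored in NFKD form as the BIP39 spec requires; both function names are hypothetical:

```javascript
// Hypothetical helper: turn pasted text into a word array.
function splitMnemonic(input) {
  // NFKD first, so U+3000 becomes U+0020 and typed kana match an NFKD wordlist.
  return input.normalize("NFKD").trim().split(/\s+/);
}

// Hypothetical helper: choose the separator only for what the user sees.
function joinForDisplay(words, language) {
  return words.join(language === "japanese" ? "\u3000" : " ");
}

console.log(splitMnemonic("  alpha\u3000beta  gamma "));     // [ 'alpha', 'beta', 'gamma' ]
console.log(joinForDisplay(["alpha", "beta"], "japanese"));  // alpha　beta
```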

Browser-only deployment notes

The end product is a static HTML + one ES module. A few things to know:

Wordlists ship as static files: fetched from ./wordlists/english.txt at runtime. No npm dependency, no CDN, no JSON.
Web Crypto requires a secure context: https:// or localhost. On an http:// origin that isn't loopback, crypto.subtle is undefined, so the first call throws a TypeError rather than a helpful message.
Same code runs in Node: globalThis.crypto is the Web Crypto implementation from node:crypto (webcrypto), so crypto.subtle.digest and deriveBits work identically. The global is unflagged from Node 19 onward; Node 18 needs --experimental-global-webcrypto. Tests run under node:test without booting a browser.
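A guard at startup turns the missing-crypto.subtle case into a readable error. This is a sketch under the assumption that failing early is acceptable for the page; requireSubtle is a hypothetical name, and the parameter exists only to make the function testable:

```javascript
// Hypothetical guard: fail early with a readable message instead of a TypeError later.
function requireSubtle(cryptoImpl = globalThis.crypto) {
  if (!cryptoImpl || !cryptoImpl.subtle) {
    throw new Error(
      "Web Crypto is unavailable: serve the page over https:// or from localhost"
    );
  }
  return cryptoImpl.subtle;
}
```

Calling requireSubtle() once before any hashing keeps the failure close to its cause.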

$ npm test
✔ english wordlist has 2048 words, sorted, lowercase
✔ japanese wordlist has 2048 words
✔ vector 00000000…: entropy → mnemonic
✔ vector 00000000…: mnemonic → entropy
✔ vector 00000000…: mnemonic → seed (passphrase=TREZOR)
…
ℹ tests 34
ℹ pass 34
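The first check in that output could look roughly like the following. It is a sketch, not the repository's test code; parseWordlist and isSortedLowercase are hypothetical names:

```javascript
// Hypothetical helper: split a wordlist file into exactly 2048 entries.
function parseWordlist(text) {
  const words = text
    .split("\n")
    .map((w) => w.trim())
    .filter((w) => w.length > 0);
  if (words.length !== 2048) {
    throw new Error(`expected 2048 words, got ${words.length}`);
  }
  return words;
}

// Hypothetical helper: every word is lowercase and strictly greater than the previous one.
function isSortedLowercase(words) {
  return words.every(
    (w, i) => w === w.toLowerCase() && (i === 0 || words[i - 1] < w)
  );
}
```

In the browser the input would be the text of a fetch of ./wordlists/english.txt. The output above asserts ordering only for the English list, so isSortedLowercase is meant for that list alone.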

Takeaways

BIP39 is small: entropy ↔ mnemonic with a SHA-256 checksum, plus a PBKDF2 seed derivation. ~180 lines.
No external crypto needed: SubtleCrypto's digest("SHA-256", …) and deriveBits({ name: "PBKDF2", hash: "SHA-512", … }, …) cover everything.
The Japanese list separator is U+3000, which NFKD collapses to U+0020, so the joined phrase has to be normalized as a whole. The ㍍-containing passphrase from bip32JP catches every form of normalization-skipping in a single test.
iterations = 2048 is weak by modern standards but locked by the spec — the security model leans on entropy and passphrase ("the 25th word"), not on key stretching.

Full source on GitHub — bip39.js is the implementation, tests/bip39.test.js runs all 34 vectors, wordlists/ holds the official English and Japanese lists. MIT licensed.

Live demo — and again, please don't paste a real wallet's mnemonic into it (or any other web tool).