My road to ML-KEM-768 over X25519 for my messaging app

#cryptography #security #kotlin #pqc

Eight months ago I started working on a messaging app as a hobby, to see how difficult it is. One thing led to another, and I became obsessed with the idea of making it post-quantum ready. Signal is well known for its work in that area, but from my perspective it isn't full E2EE. This post boils it down to one decision: why I chose ML-KEM-768 instead of X25519.

The "harvest now, decrypt later" problem

X25519 is elliptic-curve Diffie-Hellman on Curve25519. Its security rests on the elliptic-curve discrete logarithm problem being hard. It is, today, on classical hardware.

A sufficiently large quantum computer running Shor's algorithm breaks it. Nobody has one yet, but the alarm bells are ringing. An adversary who captures and stores your encrypted traffic today can decrypt it the day such a machine exists. This is not a theoretical problem for a messaging app - messages sent today are expected to stay private for years, sometimes decades.

Post-quantum key exchange is the hedge. And as of August 2024, NIST has a standard for it: FIPS 203, which specifies ML-KEM (Module-Lattice-Based Key-Encapsulation Mechanism), the standardised form of CRYSTALS-Kyber.

Why 768, not 512 or 1024?

ML-KEM ships in three parameter sets:

Parameter set	Claimed security category	Rough classical equivalent
ML-KEM-512	1	AES-128
ML-KEM-768	3	AES-192
ML-KEM-1024	5	AES-256

768 is the sweet spot most deployed post-quantum systems have converged on. Cloudflare and Chrome's hybrid X25519Kyber768 both use the 768 tier; Signal's PQXDH is the notable exception, using Kyber-1024. It's the default "safe modern choice" - strong enough to sidestep the argument over whether 512 is sufficient, and light enough that 1024's extra bytes aren't worth the hit unless you're protecting state secrets. Either way, you can still leave room in the protocol to move to a stronger parameter set if need be.
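
One way to leave that room is to tag every stored key and envelope with its parameter set instead of hard-coding 768. A minimal sketch, using the encapsulation-key and ciphertext sizes from FIPS 203; the `KemParameterSet` enum, its `wireId` values and the `hasExpectedLength` helper are hypothetical names for illustration, not part of any library:

```kotlin
// Hypothetical enum: names and wireId values are illustrative only.
// Byte sizes are the encapsulation-key and ciphertext lengths from FIPS 203.
enum class KemParameterSet(
    val wireId: Int,
    val securityCategory: Int,
    val publicKeyBytes: Int,
    val ciphertextBytes: Int,
) {
    ML_KEM_512(wireId = 1, securityCategory = 1, publicKeyBytes = 800, ciphertextBytes = 768),
    ML_KEM_768(wireId = 2, securityCategory = 3, publicKeyBytes = 1184, ciphertextBytes = 1088),
    ML_KEM_1024(wireId = 3, securityCategory = 5, publicKeyBytes = 1568, ciphertextBytes = 1568);

    companion object {
        // Look up the parameter set named by a tag read off the wire.
        fun fromWireId(id: Int): KemParameterSet? = values().firstOrNull { it.wireId == id }
    }
}

// Reject a key whose length does not match the parameter set it claims to use.
fun hasExpectedLength(set: KemParameterSet, encodedPublicKey: ByteArray): Boolean =
    encodedPublicKey.size == set.publicKeyBytes

fun main() {
    val set = KemParameterSet.fromWireId(2)
    println(set)                                        // ML_KEM_768
    println(hasExpectedLength(set!!, ByteArray(1184)))  // true
}
```

An unknown tag comes back as `null`, so a client that predates a new parameter set can fail cleanly instead of guessing.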

The numbers that actually matter

Here's the honest size comparison:

	X25519	ML-KEM-768
Public key	32 bytes	1,184 bytes
Ciphertext / encapsulation	32 bytes	1,088 bytes
Shared secret	32 bytes	32 bytes
Keygen (ms, M1)	~0.02	~0.05
Encap (ms, M1)	~0.04	~0.06
Decap (ms, M1)	~0.04	~0.07

The speed penalty is basically noise on a modern device. The size penalty is real - every session key exchange ships ~2 KB more on the wire than it used to. For a messaging app where most messages are smaller than that, it's a meaningful bump in bandwidth for the first message in the conversation.
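
The "~2 KB" figure is just the byte counts from the table added up, assuming one public key and one ciphertext cross the wire per key exchange. As a quick sketch (the `handshakeBytes` name is mine, for illustration):

```kotlin
// Bytes on the wire for one key exchange: one public key plus one ciphertext.
fun handshakeBytes(publicKeyBytes: Int, ciphertextBytes: Int): Int =
    publicKeyBytes + ciphertextBytes

fun main() {
    val x25519 = handshakeBytes(publicKeyBytes = 32, ciphertextBytes = 32)
    val mlKem768 = handshakeBytes(publicKeyBytes = 1184, ciphertextBytes = 1088)
    println("X25519: $x25519 bytes")               // X25519: 64 bytes
    println("ML-KEM-768: $mlKem768 bytes")         // ML-KEM-768: 2272 bytes
    println("extra: ${mlKem768 - x25519} bytes")   // extra: 2208 bytes
}
```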

I decided I could eat it. A 2 KB one-time handshake cost per conversation is fine. A protocol that breaks in ten years is not.

The actual flow

I'm not using ML-KEM to encrypt messages directly - it's a KEM, not a cipher. It gives you a shared secret, and you feed that shared secret into something that can actually encrypt bulk data. In my case, ChaCha20-Poly1305.

sender                                   recipient
  |                                          |
  |  fetch recipient's ML-KEM public key     |
  |----------------------------------------->|
  |                                          |
  |  (encapsulate)                           |
  |    ciphertext, sharedSecret = Encap(pk)  |
  |                                          |
  |  HKDF(sharedSecret, salt=nonce)          |
  |     → 32-byte room key                   |
  |                                          |
  |  ChaCha20-Poly1305(key, nonce, plaintext)|
  |                                          |
  |  send {ciphertext, nonce, AEAD payload}  |
  |----------------------------------------->|
  |                                          |
  |                       sharedSecret = Decap(sk, ciphertext)
  |                       HKDF(...) → same 32-byte key
  |                       ChaCha20-Poly1305 decrypt → plaintext
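
The Encap/Decap half of the diagram can be tried on a desktop JVM without any extra dependency. This is a sketch that assumes JDK 24 or later, where ML-KEM is exposed through the standard `javax.crypto.KEM` API. It is not the Bouncy Castle path the app itself uses, the `kemRoundTrip` name is made up for the example, and it stops before the HKDF step:

```kotlin
import java.security.KeyPairGenerator
import javax.crypto.KEM

// Assumption: running on JDK 24+, which ships ML-KEM in the default provider.
fun kemRoundTrip(): Pair<ByteArray, ByteArray> {
    // Recipient: long-term ML-KEM-768 key pair.
    val recipient = KeyPairGenerator.getInstance("ML-KEM-768").generateKeyPair()
    val kem = KEM.getInstance("ML-KEM")

    // Sender: encapsulate against the recipient's public key.
    val encapsulated = kem.newEncapsulator(recipient.public).encapsulate()
    val kemCiphertext = encapsulated.encapsulation()   // goes on the wire
    val senderSecret = encapsulated.key().encoded      // stays local

    // Recipient: recover the same secret from the ciphertext.
    val recipientSecret = kem.newDecapsulator(recipient.private)
        .decapsulate(kemCiphertext)
        .encoded

    check(kemCiphertext.size == 1088) { "unexpected ML-KEM-768 ciphertext size" }
    return senderSecret to recipientSecret
}

fun main() {
    val (senderSecret, recipientSecret) = kemRoundTrip()
    // Both sides should now hold the same 32-byte secret.
    println(senderSecret.contentEquals(recipientSecret))
}
```

Only `kemCiphertext` ever leaves the sender; the shared secret is never transmitted, which is why the envelope in the diagram carries the ciphertext and not the key.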

The KEM output is 32 bytes of raw shared secret. Feeding it straight into ChaCha20 as a key would work but is brittle: you'd be binding the key to the KEM output directly, with no domain separation and no per-message salt. HKDF with a per-message nonce as salt gives you:

Domain separation (the same KEM shared secret can produce different keys for different purposes).
A key that rotates every message, even when the underlying KEM secret is reused for a session.
A clean audit story - the cipher sees a fresh 32-byte key each time.
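
A minimal sketch of those properties, assuming `hkdfSha256` is plain RFC 5869 HKDF over HMAC-SHA-256. The post does not show the helper's body, so this implementation is a stand-in built only on `javax.crypto.Mac`. The `quldra/v4/msg` label and the all-zero secret are hypothetical and exist only to show that changing the salt or the `info` string changes the derived key:

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// RFC 5869 HKDF with HMAC-SHA-256: extract a PRK from (salt, ikm), then expand it.
fun hkdfSha256(ikm: ByteArray, salt: ByteArray, info: ByteArray, length: Int): ByteArray {
    require(length in 1..255 * 32) { "HKDF-SHA256 output is limited to 8160 bytes" }
    val mac = Mac.getInstance("HmacSHA256")

    // Extract: PRK = HMAC(salt, ikm). An empty salt is replaced by 32 zero bytes.
    mac.init(SecretKeySpec(if (salt.isEmpty()) ByteArray(32) else salt, "HmacSHA256"))
    val prk = mac.doFinal(ikm)

    // Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated until long enough.
    mac.init(SecretKeySpec(prk, "HmacSHA256"))
    var block = ByteArray(0)
    var okm = ByteArray(0)
    var counter = 1
    while (okm.size < length) {
        mac.update(block)
        mac.update(info)
        mac.update(counter.toByte())
        block = mac.doFinal()
        okm += block
        counter++
    }
    return okm.copyOf(length)
}

fun main() {
    val sharedSecret = ByteArray(32)          // stand-in for a real KEM output
    val nonceA = ByteArray(12) { 1 }
    val nonceB = ByteArray(12) { 2 }
    val v3 = "quldra/v3/msg".toByteArray()
    val v4 = "quldra/v4/msg".toByteArray()    // hypothetical future label

    val key = hkdfSha256(sharedSecret, nonceA, v3, 32)
    println(key.contentEquals(hkdfSha256(sharedSecret, nonceA, v3, 32))) // true: deterministic
    println(key.contentEquals(hkdfSha256(sharedSecret, nonceB, v3, 32))) // false: new salt
    println(key.contentEquals(hkdfSha256(sharedSecret, nonceA, v4, 32))) // false: new info
}
```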

Here's what that looks like in Kotlin, using Bouncy Castle's ML-KEM provider:

fun encryptMessage(
    recipientPublicKey: MLKEMPublicKey,
    plaintext: ByteArray,
    associatedData: ByteArray
): EncryptedEnvelope {
    // 1. Encapsulate to get a ciphertext + shared secret
    val encap = MLKEMEncapsulator(recipientPublicKey).encapsulate()

    // 2. Derive a per-message key via HKDF
    val nonce = ByteArray(12).also { SecureRandom().nextBytes(it) }
    val messageKey = hkdfSha256(
        ikm = encap.sharedSecret,
        salt = nonce,
        info = "quldra/v3/msg".toByteArray(),
        length = 32
    )

    // 3. Encrypt with ChaCha20-Poly1305
    val cipher = ChaCha20Poly1305()
    cipher.init(true, AEADParameters(KeyParameter(messageKey), 128, nonce, associatedData))
    val output = ByteArray(cipher.getOutputSize(plaintext.size))
    val written = cipher.processBytes(plaintext, 0, plaintext.size, output, 0)
    cipher.doFinal(output, written)

    return EncryptedEnvelope(
        kemCiphertext = encap.ciphertext,  // 1088 bytes, sent once per session
        nonce = nonce,                      // 12 bytes
        aead = output                       // plaintext.size + 16 bytes
    )
}

A few notes on the code above:

'kemCiphertext' is the big one - 1088 bytes. In my protocol I only send it on session establishment, not with every message. Within a session I use a cached room secret derived from that initial exchange.
'info = "quldra/v3/msg"' is the HKDF domain separator. The 'v3' bit matters - when I eventually rotate this (post-quantum standards will evolve), old messages stay decryptable under 'v2'/'v3' code paths and new ones use 'v4'.
Don't reuse nonces. ChaCha20-Poly1305 breaks catastrophically on nonce reuse. I use a 12-byte random nonce, which is safe for around 2^32 messages per key before the birthday bound becomes relevant - I rotate the key before that anyway.
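
The 2^32 figure comes from the usual birthday approximation: with n random nonces drawn from 2^b possible values, the chance that any two collide is roughly n^2 / 2^(b+1). A quick sketch of that arithmetic for a 96-bit nonce, working in exponents so the numbers stay readable (the function name is mine, and the approximation only holds while the result is small):

```kotlin
// Birthday approximation: P(collision) ≈ n^2 / 2^(bits + 1), valid while P is small.
// Takes log2(n) and returns log2(P).
fun log2CollisionProbability(log2Messages: Int, nonceBits: Int): Int =
    2 * log2Messages - (nonceBits + 1)

fun main() {
    val nonceBits = 12 * 8  // 12-byte random nonce
    println(log2CollisionProbability(32, nonceBits))  // -33, i.e. about 2^-33
    println(log2CollisionProbability(40, nonceBits))  // -17, i.e. about 2^-17
}
```

At 2^32 messages under one key the collision chance is around 2^-33, and it climbs quickly after that, which is the reason to rotate well before the limit.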

The migration gotchas I hit

A few things I didn't expect when I did this swap:

Bouncy Castle's ML-KEM API changed between bcprov-jdk18on 1.78 and 1.79. If you pin one, pin both. I lost an afternoon to a deserialisation mismatch.
Keys don't round-trip through toByteArray() on iOS Kotlin/Native. I had to go through the raw encoded format manually. If you're using Kotlin Multiplatform, test serialisation on both platforms early.
Don't hybridise unless you have to. Chrome and Cloudflare use hybrid X25519+ML-KEM-768 as a belt-and-braces move while the post-quantum algorithms are still young. For a new app with no legacy decryption path to maintain, pure ML-KEM-768 is simpler and I'd rather have one thing to audit than two.

Was it worth it?

Honestly - ask me in five years. The whole point of post-quantum is that you can't know today whether the hedge paid off. The question is whether the cost today is acceptable for the protection tomorrow.

For my app, a 2 KB handshake bump and a slightly larger public key registry were acceptable. For yours, they might not be. But if you're starting fresh and "messages I send today should still be private when my daughter is forty" sounds reasonable, ML-KEM-768 is the move.

I'm building Quldra, a post-quantum, single-device messaging app in Kotlin Multiplatform. This is post 1 of a short series on the tech behind it.