Abraham Arellano Tavara

Posted on Nov 9 • Originally published at myitbasics.com on Nov 9

Choosing Between ML-KEM and ML-DSA for Your Post-Quantum Migration [Part 2]

#security #cryptography #devops #architecture

Post-Quantum Cryptography Migration Series:

Part 1: The Quantum Threat

Part 2: ML-KEM vs ML-DSA (You are here)

Quick Recap from Part 1

In Part 1, we established that the quantum threat is already here in the form of harvest-now-decrypt-later attacks. Adversaries are collecting encrypted data today so they can decrypt it once quantum computers mature, expected around 2030-2035.

The urgency: If your data retention period + migration time > time until quantum computers, you've already run out of time to wait.

Now comes the critical question: What do we actually migrate TO?

The Question Every Architect Is Asking

"Should we use ML-KEM or ML-DSA?"

I've seen this question come up repeatedly in architecture discussions, and honestly, the confusion is understandable. The acronyms are overwhelming, and most documentation assumes you already know the difference.

Here's the reality: you need both. They solve fundamentally different problems.

What Actually Happened in August 2024

NIST finalized three post-quantum cryptography standards after 8 years of global scrutiny:

FIPS 203 (ML-KEM): Key encapsulation for establishing shared secrets
FIPS 204 (ML-DSA): Digital signatures for authentication
FIPS 205 (SLH-DSA): Stateless hash-based signatures, a conservative backup to ML-DSA

These aren't experimental. They're production-ready and already deployed by AWS, Google, and Cloudflare.

ML-KEM: Your TLS Handshakes

What it replaces: RSA/ECDH key exchange

Where you'll use it:

TLS 1.3 connections (HTTPS, APIs)
VPN tunnels (IPsec, OpenVPN)
SSH sessions
Any key establishment protocol

Key sizes:

ML-KEM-768 (recommended):
  Public key: 1,184 bytes (vs. 32 bytes for X25519)
  Ciphertext: 1,088 bytes
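
To put those sizes in context, here is a back-of-the-envelope sketch of what a hybrid X25519 + ML-KEM-768 key share adds to a handshake. It assumes the hybrid share is simply the two values concatenated and it ignores extension framing bytes. The constant and function names are mine, for illustration only.

```python
# Sizes in bytes, taken from the figures quoted in this section.
X25519_PUBLIC_KEY = 32
MLKEM768_PUBLIC_KEY = 1184
MLKEM768_CIPHERTEXT = 1088


def hybrid_key_share_sizes():
    """Return (client_share, server_share) sizes for a concatenated hybrid share."""
    # The client sends an X25519 public key plus an ML-KEM-768 public key.
    client = X25519_PUBLIC_KEY + MLKEM768_PUBLIC_KEY
    # The server answers with its X25519 public key plus an ML-KEM-768 ciphertext.
    server = X25519_PUBLIC_KEY + MLKEM768_CIPHERTEXT
    return client, server


client, server = hybrid_key_share_sizes()
print(f"client key share: {client} bytes (+{client - X25519_PUBLIC_KEY} vs. X25519 alone)")
print(f"server key share: {server} bytes (+{server - X25519_PUBLIC_KEY} vs. X25519 alone)")
```

That works out to roughly 1.2 KB extra in each direction, paid once per full handshake.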

Performance overhead: ~150 microseconds per handshake
With connection reuse: effectively 0%

Real-world example (AWS KMS):

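import software.amazon.awssdk.http.crt.AwsCrtHttpClient;
import software.amazon.awssdk.services.kms.KmsClient;

// Requires the AWS CRT HTTP client dependency (aws-crt-client) on the classpath.
KmsClient kms = KmsClient.builder()
    .httpClient(AwsCrtHttpClient.builder()
        .postQuantumTlsEnabled(true)  // opt in to hybrid post-quantum TLS (ML-KEM-768)
        .build())
    .build();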

Benchmark results:

0.05% throughput reduction with proper connection pooling
0.3% latency increase on initial handshake
Negligible impact with TLS reuse

ML-DSA: Your Code Signing

What it replaces: RSA/ECDSA signatures

Where you'll use it:

Software distribution (binaries, containers)
JWT/OAuth tokens
Document signing
Future TLS certificates (when CAs support it)

Signature sizes:

ML-DSA-65 (recommended):
  Public key: 1,952 bytes
  Signature: 3,309 bytes (vs. 64 bytes for ECDSA P-256)
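
Since JWTs are on the list above, it helps to see what that signature size does to a token. This is a rough sketch that only measures the base64url-encoded signature segment, assuming the raw signature bytes are encoded without padding as JWTs do. The helper name is hypothetical.

```python
import base64

ECDSA_P256_SIG = 64   # bytes
MLDSA65_SIG = 3309    # bytes


def b64url_len(n_bytes: int) -> int:
    """Length of the unpadded base64url encoding of n_bytes of data."""
    encoded = base64.urlsafe_b64encode(bytes(n_bytes))
    return len(encoded.rstrip(b"="))


print(b64url_len(ECDSA_P256_SIG))  # 86
print(b64url_len(MLDSA65_SIG))     # 4412
```

A signature segment of about 4.4 KB is worth checking against any header or cookie size limits in your proxies and gateways before you switch token signing over.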

The surprising part:
By the figures below, ML-DSA is at least 10x faster than RSA-2048 for signing operations.

RSA-2048 signing: 2-5 milliseconds
ML-DSA-65 signing: 100-200 microseconds

ML-DSA verification is similarly fast, in the same range as its signing.

The Critical Part: Hybrid Mode

Don't deploy pure post-quantum yet.

Hybrid combines classical + PQC:

TLS handshake:
  1. ECDH key exchange (classical)
  2. ML-KEM-768 key exchange (PQC)
  3. Combined: KDF(ECDH_secret || ML-KEM_secret)

Security: Attacker must break BOTH to compromise
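
The combining step in the handshake outline above can be sketched with nothing but the standard library. This is a minimal illustration, not what any particular TLS stack does: it assumes HKDF with SHA-256 as the KDF, uses a made-up `info` label, and stands in random bytes for the two shared secrets, which in a real handshake come from X25519 and ML-KEM-768.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand to `length` bytes."""
    if not salt:
        salt = bytes(hashlib.sha256().digest_size)  # RFC 5869 default: zero-filled salt
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def combine_secrets(ecdh_secret: bytes, mlkem_secret: bytes) -> bytes:
    """Derive one session key from both secrets: concatenate, then run the KDF."""
    return hkdf_sha256(ecdh_secret + mlkem_secret, info=b"hybrid-kex-demo")


# Placeholder secrets for the demo only (both real secrets are 32 bytes long).
ecdh_secret = os.urandom(32)
mlkem_secret = os.urandom(32)
session_key = combine_secrets(ecdh_secret, mlkem_secret)
print(len(session_key))  # 32
```

Because both secrets feed the KDF, the derived key changes if either input changes. That is the property behind "attacker must break BOTH".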

Why this matters:

ML-KEM was finalized only in 2024
Implementation vulnerabilities might emerge
Side-channel attacks could be discovered
Hybrid provides insurance

Industry consensus:

NIST accommodates hybrid key establishment during the transition
NSA CNSA 2.0 allows hybrid through 2030
The IETF is standardizing hybrid key exchange for TLS
AWS/Azure/GCP implement hybrid by default

Decision Framework for Developers

For key exchange:

95% of use cases → ML-KEM-768 (hybrid with X25519)
Government/NSS → ML-KEM-1024 (CNSA 2.0 requirement)
Future diversity → Plan for HQC (code-based, finalizing 2027)

For signatures:

General purpose → ML-DSA-65
Long-term archival → SLH-DSA at the 256-bit level, e.g. SLH-DSA-SHA2-256s (hash-based, conservative)
Embedded/IoT → FN-DSA-512 (compact, when FIPS 206 finalizes)
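
If you want this framework in a form you can drop into an internal tool or a design-review checklist, a lookup table is enough. The sketch below just encodes the rules listed above. The function name and the `purpose`/`profile` labels are my own, not part of any standard.

```python
# (purpose, profile) -> recommendation, mirroring the decision framework above.
RECOMMENDATIONS = {
    ("key_exchange", "general"): "ML-KEM-768 (hybrid with X25519)",
    ("key_exchange", "government"): "ML-KEM-1024",
    ("signature", "general"): "ML-DSA-65",
    ("signature", "archival"): "SLH-DSA (256-bit parameter set)",
    ("signature", "embedded"): "FN-DSA-512 (once FIPS 206 is final)",
}


def recommend(purpose: str, profile: str = "general") -> str:
    """Look up the suggested algorithm for a purpose and deployment profile."""
    try:
        return RECOMMENDATIONS[(purpose, profile)]
    except KeyError:
        raise ValueError(f"no recommendation for {purpose!r} / {profile!r}") from None


print(recommend("key_exchange"))           # ML-KEM-768 (hybrid with X25519)
print(recommend("signature", "archival"))  # SLH-DSA (256-bit parameter set)
```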

Performance comparison: [figure not preserved]

The Migration Timeline

Why this is urgent:

"Harvest now, decrypt later" attacks are active today. Adversaries are collecting encrypted data to decrypt when quantum computers arrive (~2030-2035).

Mosca's Theorem:

If: data_shelf_life + migration_time > time_until_quantum
Then: Start migration NOW

For most enterprises with sensitive data, that equation already fails.
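
The inequality is simple enough to check for your own systems. Here is a tiny sketch. The numbers in the example call are hypothetical placeholders, so plug in your own retention period, migration estimate, and threat horizon.

```python
def must_start_now(shelf_life_years: float, migration_years: float,
                   years_until_quantum: float) -> bool:
    """Mosca's inequality: True when data outlives the time left to migrate."""
    return shelf_life_years + migration_years > years_until_quantum


# Hypothetical example: 10-year retention, 3-year migration, 8 years of runway.
print(must_start_now(10, 3, 8))  # True
```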

Common Pitfalls

❌ "We'll wait for better algorithms"

These algorithms survived 8 years of attempted breaks. This IS the mature version.

❌ "We don't need hybrid"

AWS and Google deploy hybrid, and NIST accommodates it for the transition. Don't skip it.

❌ "Key sizes will break our protocols"

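1,184 bytes is manageable for modern networks. A larger ClientHello may span more than one packet, but TCP segmentation handles that transparently. Do test against middleboxes that mishandle large handshakes.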

❌ "Performance will be terrible"

With connection reuse, overhead is negligible. We've benchmarked it.

Getting Started

Step 1: Update your TLS libraries

OpenSSL 3.5+ (native ML-KEM, ML-DSA, and SLH-DSA support; older 3.x releases need the OQS provider)
BoringSSL (Google's fork, deployed in Chrome)
AWS-LC (FIPS validated, production-ready)

Step 2: Test in non-production

# OpenSSH 9.9+ with ML-KEM
ssh -o KexAlgorithms=mlkem768x25519-sha256 user@host
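
Before running that against a fleet, you may want to check which clients actually support the hybrid algorithm. The sketch below parses the output of `ssh -Q kex`, which lists the key exchange algorithms the local client supports. The helper names are my own, and the sample listing is a shortened made-up example, not real output from any host.

```python
import subprocess

HYBRID_KEX = "mlkem768x25519-sha256"


def supports_hybrid_kex(kex_listing: str, name: str = HYBRID_KEX) -> bool:
    """True if `name` appears as its own line in `ssh -Q kex` style output."""
    return name in {line.strip() for line in kex_listing.splitlines()}


def local_kex_listing() -> str:
    """Ask the local OpenSSH client which key exchange algorithms it supports."""
    result = subprocess.run(["ssh", "-Q", "kex"], capture_output=True,
                            text=True, check=True)
    return result.stdout


# Shortened, illustrative listing (not captured from a real machine).
sample = "curve25519-sha256\nsntrup761x25519-sha512\nmlkem768x25519-sha256\n"
print(supports_hybrid_kex(sample))  # True
```

On a real machine, `supports_hybrid_kex(local_kex_listing())` gives the answer for the installed client.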

Step 3: Monitor performance

Measure baseline (classical only)
Enable hybrid mode
Compare P50/P95/P99 latency
Check for regressions
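
A minimal sketch of that comparison, assuming you have already collected handshake or request latencies in milliseconds for both configurations. The 5% tolerance and the function names are arbitrary choices for illustration, and the sample data is synthetic.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return P50/P95/P99 for a list of latency samples (milliseconds)."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def regressions(baseline_ms, hybrid_ms, tolerance_pct=5.0):
    """List the percentiles where hybrid exceeds baseline by more than tolerance_pct."""
    base = latency_percentiles(baseline_ms)
    hyb = latency_percentiles(hybrid_ms)
    limit = 1 + tolerance_pct / 100
    return [name for name in base if hyb[name] > base[name] * limit]


# Synthetic data: the hybrid run is 1% slower across the board.
baseline = [float(ms) for ms in range(1, 101)]
hybrid = [ms * 1.01 for ms in baseline]
print(latency_percentiles(baseline)["p50"])
print(regressions(baseline, hybrid))
```

An empty list means every percentile stayed within tolerance. Anything else names the percentiles to investigate.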

Step 4: Gradual rollout

Canary deployment (5%)
Monitor for 1 week
Expand to 25%, 50%, 100%
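
One way to implement the percentage stages is deterministic bucketing: hash a stable identifier into a bucket from 0 to 99 and enable hybrid mode when the bucket falls under the current rollout percentage. This is a sketch under that assumption. The function name and the identifier format are hypothetical.

```python
import hashlib


def in_rollout(client_id: str, percent: int) -> bool:
    """Deterministically map client_id to a bucket 0-99 and compare to percent."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


# Hypothetical fleet of 10,000 hosts at the 5% canary stage.
hosts = [f"host-{i}" for i in range(10_000)]
canary = [h for h in hosts if in_rollout(h, 5)]
print(f"{len(canary)} of {len(hosts)} hosts in the canary group")
```

Because the bucket depends only on the identifier, every host in the 5% group stays enabled as you expand to 25%, 50%, and 100%.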

Resources

I wrote a comprehensive deep-dive covering:

Complete algorithm comparison matrix
Performance benchmarks
Algorithm selection flowchart
Hybrid deployment strategies

📖 Read the full guide: Post-Quantum Cryptography Algorithms Explained

Also includes:

Downloadable algorithm selection flowchart (PDF)
PQC migration checklist
Links to NIST standards and AWS documentation

Bottom Line

The standards are ready. The implementations exist. Cloud providers are deploying.

Your move:

Identify systems using RSA/ECDH today
Update SDKs to versions supporting ML-KEM
Test hybrid mode in staging
Plan production rollout

The quantum threat isn't theoretical—it's operational today through harvest-now-decrypt-later attacks.

What's your migration strategy? Already testing ML-KEM, or still evaluating? 👇

What's Next in This Series

You now understand the threat (Part 1) and which algorithms to use (Part 2). In Part 3, we'll get hands-on with AWS implementation—real code, performance benchmarks, and operational guidance for deploying ML-KEM in production.

This is Part 2 of a 6-part series on post-quantum cryptography migration. Read Part 1: The Quantum Threat if you haven't already.
