SEN LLC

Posted on Apr 15

TOTP From First Principles: Building an RFC 4226 / 6238 CLI in Rust

#rust #cli #security #tutorial

A small Rust CLI that generates TOTP / HOTP codes, parses otpauth:// URIs, and verifies user-supplied codes against a clock-drift window. No dependency on totp-rs or oath-toolkit. Everything — base32, HMAC wiring, dynamic truncation, URI format — is written out so you can read it.

Source: https://github.com/sen-ltd/otp-cli

Every 2FA backend on earth — your GitHub, your AWS console, your bank, your employer's SSO — runs the same twenty-year-old algorithm to decide whether the six digits you just typed are the six digits it was expecting. That algorithm is TOTP (RFC 6238), which is itself a two-line wrapper around HOTP (RFC 4226). It is embarrassingly small. You can hold the whole thing in your head, and once you do, a lot of 2FA mystery dissolves.

I built otp-cli because every time I debug a broken TOTP integration I end up either (a) installing oathtool (C, unmaintained binary packages on macOS), (b) running pip install pyotp (Python, not always available in the container I'm debugging), or (c) pulling in a Rust crate whose source I haven't read. What I actually wanted was a single static binary that I trust because I wrote it, that prints the RFC 4226 Appendix D test vectors on demand so I can prove to myself the math is right, and that speaks the same otpauth:// URI format a QR code encodes.

This post walks through what TOTP actually is, the three non-obvious bits of it, and the Rust code that implements them.

The problem

You scan a QR code with Google Authenticator. A six-digit code appears and changes every 30 seconds. On the server side, the same six-digit code must appear — without any network round-trip — using only a shared secret that was handed over at enrollment time. How?

The answer is HMAC, a counter derived from the clock, and a 1,300-line RFC you can read in an afternoon. No elliptic curves, no challenge-response, no clever ratcheting. Just:

counter = floor(unix_time / 30)
digest  = HMAC-SHA-1(secret, counter_as_8_bytes_big_endian)
code    = last_6_digits_of(dynamic_truncate(digest))

The only subtlety is dynamic_truncate, which we'll get to. Otherwise that's the whole spec.

Design: the three interesting bits

1. HOTP's dynamic truncation

HMAC-SHA-1 returns 20 bytes. A six-digit code carries just under 20 bits of information (2^20 is a bit over a million), so most of the digest has to be thrown away. RFC 4226 keeps four bytes out of the 20, but not simply the first four: it uses the low nibble of the last byte of the digest to choose the offset. This is the interesting bit.

Why? Because if you always used bytes 0..4, an attacker who saw a lot of codes would effectively see a lot of HMAC prefixes for sequential counters, and that's a mildly useful side channel. Randomizing the offset per digest makes every code bleed four bytes from a different place in the output. It's not cryptographically meaningful — HMAC's output is already uniform — but it's nice. And it's cheap.

Here is the dynamic-truncation step in src/hotp.rs, verbatim:

/// RFC 4226 §5.3 dynamic truncation.
fn dynamic_truncate(digest: &[u8]) -> u32 {
    let offset = (digest[digest.len() - 1] & 0x0F) as usize;
    ((digest[offset]     as u32 & 0x7F) << 24)
        | ((digest[offset + 1] as u32 & 0xFF) << 16)
        | ((digest[offset + 2] as u32 & 0xFF) <<  8)
        | ( digest[offset + 3] as u32 & 0xFF)
}

Three things to notice:

digest.len() - 1, not 19. Works for SHA-1 (20 bytes), SHA-256 (32 bytes) and SHA-512 (64 bytes) the same way. The RFC 6238 reference implementation reads the offset from the last byte regardless of hash length.
& 0x7F on the high byte. That masks off the sign bit so the result is a positive 31-bit integer. This is a leftover from 2005 when everyone was writing HOTP in Java, and Java's int is signed — the mask means the reference implementation and a Java port give the same decimal value. You inherit it even in Rust.
Four bytes starting at offset. Because offset ∈ [0, 15] and SHA-1 gives 20 bytes, there is always enough room: offset + 3 ≤ 18, and the last valid index of a 20-byte buffer is 19. Longer digests only add slack.
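
To make the offset selection concrete, here is a small sketch that runs the same truncation logic on the example digest from RFC 4226 §5.4 and then applies the modulo and zero-padding step. The helper name to_code is made up for this illustration and is not taken from the otp-cli source.

```rust
/// Same logic as the dynamic_truncate shown above.
fn dynamic_truncate(digest: &[u8]) -> u32 {
    let offset = (digest[digest.len() - 1] & 0x0F) as usize;
    ((digest[offset] as u32 & 0x7F) << 24)
        | ((digest[offset + 1] as u32) << 16)
        | ((digest[offset + 2] as u32) << 8)
        | (digest[offset + 3] as u32)
}

/// Hypothetical helper: reduce to `digits` decimal digits, keeping leading zeros.
fn to_code(value: u32, digits: u32) -> String {
    format!("{:0width$}", value % 10u32.pow(digits), width = digits as usize)
}

fn main() {
    // Example HMAC-SHA-1 output from RFC 4226 section 5.4.
    let digest: [u8; 20] = [
        0x1f, 0x86, 0x98, 0x69, 0x0e, 0x02, 0xca, 0x16, 0x61, 0x85,
        0x50, 0xef, 0x7f, 0x19, 0xda, 0x8e, 0x94, 0x5b, 0x55, 0x5a,
    ];
    // The last byte is 0x5a, so the low nibble (0xa) selects offset 10.
    let value = dynamic_truncate(&digest);
    println!("{:#010x} -> {}", value, to_code(value, 6)); // 0x50ef7f19 -> 872921

    // A small value shows why the code is a string rather than a number.
    println!("{}", to_code(5924, 6)); // 005924
}
```

The second line of output is the reason codes are compared as strings: a numeric comparison would silently drop the leading zeros.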

The rest of compute is mechanical:

pub fn compute(secret: &[u8], counter: u64, digits: u32, algorithm: Algorithm) -> String {
    let counter_bytes = counter.to_be_bytes();              // 8-byte big-endian
    let digest        = hmac_digest(secret, &counter_bytes, algorithm);
    let code          = dynamic_truncate(&digest);
    let modulus       = 10u32.pow(digits);
    let truncated     = code % modulus;
    format!("{:0width$}", truncated, width = digits as usize)
}

And yes — it passes every RFC 4226 Appendix D vector (755224, 287082, 359152, ...) and every RFC 6238 Appendix B vector for SHA-1, SHA-256, and SHA-512. The test module has them inline. Any deviation is a bug, not flakiness.
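
hmac_digest is not shown in this post. As a rough sketch of what sits behind it for the SHA-1 case, the block below implements the HMAC construction from RFC 2104 on top of a hand-written SHA-1. The names sha1, hmac_sha1, and hex are assumptions for illustration; the real otp-cli source may wire this differently (for example through a hash crate), so read this as a way to see the construction rather than as the project's code.

```rust
/// Plain SHA-1 (FIPS 180-4), written out for illustration only.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length (big-endian u64).
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    for block in msg.chunks(64) {
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for i in 0..80 {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        for (slot, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *slot = slot.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// HMAC per RFC 2104: H((K ^ opad) || H((K ^ ipad) || message)).
fn hmac_sha1(key: &[u8], message: &[u8]) -> [u8; 20] {
    let mut k = [0u8; 64]; // SHA-1 block size
    if key.len() > 64 {
        k[..20].copy_from_slice(&sha1(key)); // long keys are hashed first
    } else {
        k[..key.len()].copy_from_slice(key);
    }
    let mut inner: Vec<u8> = k.iter().map(|b| b ^ 0x36).collect();
    inner.extend_from_slice(message);
    let mut outer: Vec<u8> = k.iter().map(|b| b ^ 0x5c).collect();
    outer.extend_from_slice(&sha1(&inner));
    sha1(&outer)
}

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    // RFC 4226 Appendix D: secret "12345678901234567890", counter 0.
    let digest = hmac_sha1(b"12345678901234567890", &0u64.to_be_bytes());
    println!("{}", hex(&digest));
}
```

The last byte of that digest has a low nibble of 0, so truncation starts at offset 0 and the six-digit result is the 755224 quoted above.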

2. Base32, the deliberately ugly alphabet

Authenticator secrets are encoded in base32, not hex, not base64, not base58. Why? Because RFC 4648 base32 uses only the 26 uppercase letters and the digits 2–7. The digits 0 and 1 are deliberately excluded because they get confused with O and I (or a lowercase l) when written by hand or read aloud over a phone, and since 26 letters plus six digits already make 32 symbols, 8 and 9 are dropped as well. A shared secret is something humans historically read out loud during enrollment (and some enrollment flows still display a string instead of a QR code, as a fallback for when the QR reader is broken), so the alphabet is optimized for error-free transcription, not density.

There's no crate needed. The whole decoder is forty lines of pure Rust:

pub fn decode(input: &str) -> Result<Vec<u8>, String> {
    let mut cleaned = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            ' ' | '-' | '\t' | '\n' | '\r' | '=' => continue, // strip noise + padding
            _ => cleaned.push(c.to_ascii_uppercase()),
        }
    }

    let mut out = Vec::with_capacity(cleaned.len() * 5 / 8);
    let mut buffer: u32 = 0;
    let mut bits:   u32 = 0;

    for c in cleaned.chars() {
        let v = char_to_value(c)
            .ok_or_else(|| format!("invalid base32 character: {:?}", c))?;
        buffer = (buffer << 5) | v as u32;
        bits += 5;
        if bits >= 8 {
            bits -= 8;
            out.push(((buffer >> bits) & 0xFF) as u8);
        }
    }
    if bits > 0 && (buffer & ((1 << bits) - 1)) != 0 {
        return Err("base32 input has trailing non-zero bits".into());
    }
    Ok(out)
}

fn char_to_value(c: char) -> Option<u8> {
    match c {
        'A'..='Z' => Some(c as u8 - b'A'),
        '2'..='7' => Some(c as u8 - b'2' + 26),
        _         => None,
    }
}

It's a bitstream buffer. You pump five bits in per character and pump eight bits out per byte. The strict check at the end rejects "trailing non-zero bits": RFC 4648 requires encoders to set the unused bits after the last real byte to zero and allows decoders to reject input that doesn't, and oathtool is strict about it. We match that.

One nuance worth pointing out: we strip spaces and dashes. Real-world secrets are often displayed in groups of four (JBSW Y3DP EHPK 3PXP) because that's easier to type in from a screen. Copy-pasting that verbatim should work. So should lowercase — Google Authenticator shows uppercase but Android clipboards sometimes lowercase things.
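
The same bit-pumping works in the other direction. The sketch below is a hypothetical encoder (the post only shows the decoder, and this function is not claimed to exist in otp-cli): eight bits go in per byte and five bits come out per character, with the final partial group left-aligned so the pad bits are zero. It emits no = padding, which is how secrets usually appear in otpauth:// URIs.

```rust
const ALPHABET: &[u8; 32] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

/// Hypothetical inverse of `decode`: bytes in, unpadded base32 out.
fn encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() * 8 + 4) / 5);
    let mut buffer: u32 = 0;
    let mut bits: u32 = 0;

    for &byte in input {
        buffer = (buffer << 8) | byte as u32;
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(ALPHABET[((buffer >> bits) & 0x1F) as usize] as char);
        }
    }
    if bits > 0 {
        // Left-align the leftover bits; the low pad bits stay zero.
        out.push(ALPHABET[((buffer << (5 - bits)) & 0x1F) as usize] as char);
    }
    out
}

fn main() {
    // The ASCII secret used by the RFC 4226 / 6238 test vectors.
    println!("{}", encode(b"12345678901234567890"));
    // "Hello!" followed by DE AD BE EF, the usual demo secret.
    println!("{}", encode(b"Hello!\xDE\xAD\xBE\xEF")); // JBSWY3DPEHPK3PXP
}
```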

3. TOTP: HOTP with a clock

This is the part that's genuinely a two-liner:

pub fn compute_at(
    secret: &[u8],
    unix_time: u64,
    period: u64,
    digits: u32,
    algorithm: Algorithm,
) -> String {
    let counter = unix_time / period;
    hotp::compute(secret, counter, digits, algorithm)
}

That's it. period is 30 in nearly every real-world deployment. T0 = 0 (the unix epoch) by spec. So at unix time 59, counter = 1. At unix time 60, counter = 2. At unix time 1,700,000,000, counter = 56,666,666. You HMAC that with the secret, run dynamic truncation, take the low six digits, and that's your code.
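
The arithmetic above is easy to check by hand. This sketch prints the counter, its 8-byte big-endian form (what actually gets fed to HMAC), and the seconds left before the code rolls over. counter_at and seconds_remaining are illustrative names, not functions from the otp-cli source.

```rust
/// Time-step counter as in RFC 6238 with T0 = 0.
fn counter_at(unix_time: u64, period: u64) -> u64 {
    unix_time / period
}

/// Hypothetical helper: seconds until the current code stops being current.
fn seconds_remaining(unix_time: u64, period: u64) -> u64 {
    period - unix_time % period
}

fn main() {
    for t in [59u64, 60, 1_234_567_890, 1_700_000_000] {
        let counter = counter_at(t, 30);
        println!(
            "t={} counter={} bytes={:02x?} rolls over in {}s",
            t,
            counter,
            counter.to_be_bytes(),
            seconds_remaining(t, 30)
        );
    }
}
```

For t = 1234567890 the bytes end in 02 73 ef 07, matching the T value listed for that timestamp in the RFC 6238 test-vector table.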

Two gotchas in verify():

Clock skew. Clients and servers don't have perfectly synced clocks. If a user's phone is 15 seconds ahead of the server, the phone shows the "next" code 15 seconds before the server expects it. Most production TOTP implementations accept the previous and next period too, so the effective acceptance window is 90 seconds (three 30 s buckets). otp-cli verify --window 1 is exactly that:

pub fn verify(
    secret: &[u8], code: &str, unix_time: u64,
    period: u64, digits: u32, algorithm: Algorithm,
    window: i64,
) -> Option<i64> {
    for offset in -window..=window {
        let shifted = if offset >= 0 {
            unix_time.checked_add(offset as u64 * period)
        } else {
            unix_time.checked_sub((-offset) as u64 * period)
        };
        let Some(t) = shifted else { continue };
        let candidate = compute_at(secret, t, period, digits, algorithm);
        if constant_time_eq(candidate.as_bytes(), code.as_bytes()) {
            return Some(offset);
        }
    }
    None
}

Widen the window and you weaken the one-time-ness of the password linearly. Narrow it and legitimate users start failing on days the NTP gods are angry. ±1 is the universal default.
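
"Linearly" can be put into numbers. The sketch below uses made-up helper names and assumes a six-digit code with every candidate distinct, so the probability is an upper bound on what a single blind guess can achieve.

```rust
/// Number of time steps a verifier with `window` accepts (past, current, future).
fn accepted_steps(window: u64) -> u64 {
    2 * window + 1
}

/// Total span of wall-clock time covered by those steps.
fn acceptance_seconds(window: u64, period: u64) -> u64 {
    accepted_steps(window) * period
}

/// Upper bound on the chance that one random guess matches some accepted code.
fn guess_probability(window: u64, digits: u32) -> f64 {
    accepted_steps(window) as f64 / 10f64.powi(digits as i32)
}

fn main() {
    for window in [0u64, 1, 2, 10] {
        println!(
            "window={} steps={} span={}s p(guess)<={:.1e}",
            window,
            accepted_steps(window),
            acceptance_seconds(window, 30),
            guess_probability(window, 6)
        );
    }
}
```

Going from window 1 to window 10 takes a guess from about 3 in a million to about 21 in a million, which is why rate limiting matters more as the window grows.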

Constant-time comparison. a == b on &[u8] short-circuits on the first byte mismatch. For a 6-digit code that's probably not a real side channel — remote timing attacks at sub-microsecond resolution are hard — but the habit is free and the inline function is five lines. So we do it.
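
The post refers to constant_time_eq without showing it. One common way to write such a function is sketched below; this is an assumption about its shape, and the version in the otp-cli repository may differ. The early return on a length mismatch only reveals the length, which is not secret here because the number of digits is public.

```rust
/// Compare two byte slices without stopping at the first difference.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; any mismatch leaves a non-zero bit.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

fn main() {
    println!("{}", constant_time_eq(b"287082", b"287082")); // true
    println!("{}", constant_time_eq(b"287082", b"287083")); // false
    println!("{}", constant_time_eq(b"287082", b"28708"));  // false
}
```

An optimizing compiler is in principle free to reintroduce an early exit, so code that really depends on this property tends to rely on a vetted crate rather than a hand-rolled loop.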

A note on HMAC-SHA-1

RFC 4226 mandates HMAC-SHA-1. You will see the word "SHA-1" and flinch. Don't. SHA-1's collision weakness is a hash weakness — attackers can construct two different inputs that hash to the same value. HMAC doesn't care about collisions; it cares about existential forgery under chosen message attack, which depends on the hash's PRF properties, which SHA-1 still has. NIST SP 800-107 still allows HMAC-SHA-1 for authentication. Every OTP library in existence still ships SHA-1 as the default. RFC 6238 adds SHA-256 and SHA-512 variants, and otp-cli supports them, but interoperability with real authenticator apps basically demands SHA-1.

Tradeoffs and honest non-goals

Not a vault. otp-cli reads the secret from --secret or --uri, which means it lands in ps, in your shell history, and in any log that records command lines. This is a debugging tool, not a 2FA app. For real codes, use a hardware key or a password manager.
No Steam Guard variant. Steam uses a five-character code over a custom alphabet (for historical reasons). We could implement it — it's 30 lines — but it felt like scope creep.
No RFC 6287 (OCRA). Challenge-response OTP is a related but different algorithm. Out of scope.
--window is linear. For each offset in -w..=w we run a full HMAC. That's fine, since a single HMAC-SHA-1 over eight bytes costs roughly a microsecond or less on current hardware, but don't expose it to attackers as --window 1000000. A real auth backend would also track "already used" codes inside the current period to prevent replay; that's a stateful concern that doesn't belong in a stateless CLI.
otpauth extensions ignored. The otpauth:// format has a few non-standard parameters (image, lock) that different vendors add. We parse the standard parameters (secret, issuer, algorithm, digits, period, counter) and forward-ignore the rest, so parsing never breaks on unknown fields.
No QR decoding. If you have a .png of a QR code, run it through zbarimg and pipe the output into otp-cli parse. A QR decoder is twenty times bigger than the rest of the tool put together and has nothing to do with the OTP math.
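
The replay tracking mentioned in the list above is small once you have somewhere to keep state. The sketch below is hypothetical (ReplayGuard is not part of otp-cli): a server stores, per enrolled secret, the highest counter it has accepted and refuses any code whose counter is not strictly newer.

```rust
/// Hypothetical server-side state for one enrolled secret.
struct ReplayGuard {
    last_accepted: Option<u64>,
}

impl ReplayGuard {
    fn new() -> Self {
        ReplayGuard { last_accepted: None }
    }

    /// `matched_counter` is the time-step counter whose code matched the
    /// user's input, i.e. the current counter plus the offset from verify().
    fn accept(&mut self, matched_counter: u64) -> bool {
        match self.last_accepted {
            // Same or older step than one already used: treat as a replay.
            Some(last) if matched_counter <= last => false,
            _ => {
                self.last_accepted = Some(matched_counter);
                true
            }
        }
    }
}

fn main() {
    let mut guard = ReplayGuard::new();
    println!("{}", guard.accept(41_152_263)); // true: first use
    println!("{}", guard.accept(41_152_263)); // false: same code replayed
    println!("{}", guard.accept(41_152_262)); // false: older step
    println!("{}", guard.accept(41_152_264)); // true: next step
}
```

The cost of this design is that a user cannot log in twice within the same 30-second step, which most deployments accept.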

Try it in 30 seconds

The whole thing runs in a 9.7 MB Alpine image:

# Build
git clone https://github.com/sen-ltd/otp-cli
cd otp-cli
docker build -t otp-cli .

# RFC 6238 Appendix B vector — SHA-1, t=59, 8 digits → 94287082
docker run --rm otp-cli gen \
  --secret GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ \
  --time 59 --digits 8
# 94287082

# RFC 4226 Appendix D vector — counter=1 → 287082
docker run --rm otp-cli hotp \
  --secret GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ \
  --counter 1
# 287082

# Build a QR-compatible URI
docker run --rm otp-cli uri \
  --secret JBSWY3DPEHPK3PXP \
  --issuer Acme --account alice@acme.com
# otpauth://totp/Acme:alice@acme.com?secret=JBSWY3DPEHPK3PXP&algorithm=SHA1&digits=6&period=30&issuer=Acme

# Parse one as JSON
docker run --rm otp-cli --format json parse \
  --uri "otpauth://totp/Acme:alice@acme.com?secret=JBSWY3DPEHPK3PXP&issuer=Acme"

# Verify a code at a fixed time with ±1 period of slack
docker run --rm otp-cli verify \
  --secret GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ \
  --code 005924 --time 1234567890 --window 1
# ok (offset 0)

Those outputs are deterministic. Any drift means the implementation is wrong, not flaky.
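
The otpauth:// URI used in the commands above can be picked apart with plain string operations. The sketch below is an illustration under stated assumptions, not the parser from the repository: the names OtpAuth and parse_otpauth are invented, percent-decoding is skipped entirely, and unknown parameters are simply kept in the map and never consulted.

```rust
use std::collections::HashMap;

/// Hypothetical parsed form of an otpauth:// URI.
#[derive(Debug)]
struct OtpAuth {
    kind: String,  // "totp" or "hotp"
    label: String, // e.g. "Acme:alice@acme.com" (not percent-decoded here)
    params: HashMap<String, String>,
}

impl OtpAuth {
    /// digits defaults to 6 when the parameter is absent or unparsable.
    fn digits(&self) -> u32 {
        self.params.get("digits").and_then(|d| d.parse().ok()).unwrap_or(6)
    }
}

fn parse_otpauth(uri: &str) -> Result<OtpAuth, String> {
    let rest = uri.strip_prefix("otpauth://").ok_or("missing otpauth:// scheme")?;
    let (kind, rest) = rest.split_once('/').ok_or("missing label")?;
    if kind != "totp" && kind != "hotp" {
        return Err(format!("unknown type: {}", kind));
    }
    let (label, query) = rest.split_once('?').unwrap_or((rest, ""));

    let mut params = HashMap::new();
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        params.insert(key.to_string(), value.to_string());
    }
    if !params.contains_key("secret") {
        return Err("missing secret parameter".into());
    }
    Ok(OtpAuth { kind: kind.to_string(), label: label.to_string(), params })
}

fn main() {
    let uri = "otpauth://totp/Acme:alice@acme.com?secret=JBSWY3DPEHPK3PXP&issuer=Acme&image=x";
    match parse_otpauth(uri) {
        Ok(parsed) => println!(
            "{} {} secret={} digits={}",
            parsed.kind, parsed.label, parsed.params["secret"], parsed.digits()
        ),
        Err(e) => eprintln!("error: {}", e),
    }
}
```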

What I learned

RFC 4226 is short. Under forty pages, most of it test vectors and security analysis. You can read the whole thing in an hour. The actual algorithm is three pages.
Base32 is a human-error-tolerant protocol, not a compression choice. Once you see that, the exclusion of 0189 stops looking weird.
Dynamic truncation isn't cryptographic, it's cosmetic. The uniformity of HMAC's output guarantees the six digits are uniform regardless of offset choice. The offset exists because someone in 2005 wanted to avoid always truncating from the same position as a defense-in-depth habit. That's fine. It's a free habit.
"SHA-1 is broken" is a collision statement, not an HMAC statement. Reading RFC 2104 clarifies this more than any blog post.
A CLI is the right shape for this. You don't want this in a library you link into your auth server — you want it in a terminal where you can pipe it, script it, and prove to yourself the codes are right before you hook it into anything that matters.

The code is at https://github.com/sen-ltd/otp-cli under MIT. Forty lines of base32, a hundred of HOTP/TOTP, some clap glue, and a Dockerfile. If you want to understand TOTP, read the source; that was the whole point.

Top comments (1)

mote • Apr 21

Building from first principles is underrated for this kind of protocol. I spent a while debugging a TOTP integration that was almost correct — the timing window math was slightly off because I'd assumed the counter was floored to 30-second intervals in UTC when one of our embedded devices was computing it in local time with a drift. Adding RFC Appendix D test vectors would have caught it immediately.

The "I trust it because I wrote it" point resonates. There's something specifically valuable about a minimal static binary for security-adjacent tooling. The auditability tradeoff is real — fewer dependencies means a smaller blast radius if a crate in the supply chain does something unexpected.

The clock drift window management is the subtle part most implementations get wrong. RFC 6238 recommends ±1 step tolerance but doesn't specify how to handle persistent drift in resource-constrained environments where NTP isn't always available. Have you thought about how this would behave on an air-gapped embedded device that accumulates drift over days without a time sync?