I Almost Hashed a Document Number That Needed to Be Read Again

#python #backend #security #cybersecurity

What building KYC verification taught me about the difference between encryption and hashing - and why getting that choice wrong would have broken the entire feature

VaultPay is a wallet microservice I built on top of AuthShield.
Previous parts:
Part 1 is here: I Built AuthShield and Immediately Knew It Wasn't Enough
Part 2 is here: The Silent Failure I Never Saw Coming: What VaultPay Taught Me About Consistency Under Failure
Part 3 is here: I Started With a Blocklist. That Was the Wrong Instinct and VaultPay Taught Me Why.
Part 4 is here: I Watched Money Move Twice From the Same Request. That's When I Understood Idempotency.

When I started designing KYC verification for VaultPay, my first instinct was to hash the document number before storing it.

Hashing felt like the secure choice. It's what you do with passwords. One-way, irreversible, can't be decrypted even if the database leaks. I'd internalised "never store sensitive data in plaintext, always hash it" as a general security rule.

Then I tried to write the admin review endpoint, and the rule fell apart.

An admin needs to look at the actual document number to verify it against the physical ID a user submitted. "Does 1234 5678 9012 match what's on this person's Aadhaar card" is a question that requires reading the original value back. A hash can't answer that. SHA-256 of a document number gives you a fixed-length digest that's useless for that kind of review - you can check whether two hashes match, but you can't read the original value back out of one.

That's the moment the difference between hashing and encryption stopped being abstract. Hashing answers "is this the same value I saw before." Encryption answers "what was the original value, and who's allowed to see it." KYC needed the second question answered, not the first.
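To make the first half of that distinction concrete, here is a minimal sketch using only the standard library. The function name `hash_doc_number` is made up for illustration and is not part of VaultPay; the point is only that a digest supports equality checks and nothing else.

```python
import hashlib


def hash_doc_number(doc_number: str) -> str:
    """One-way digest: fine for equality checks, useless for display."""
    return hashlib.sha256(doc_number.encode()).hexdigest()


stored = hash_doc_number("1234 5678 9012")

# The question hashing can answer: "is this the same value I saw before?"
same_value = hash_doc_number("1234 5678 9012") == stored
different_value = hash_doc_number("1234 5678 9013") == stored

# The question it cannot answer: "what was the original value?"
# The digest is always 64 hex characters regardless of the input,
# and hashlib has no inverse operation an admin endpoint could call.
digest_length = len(stored)
```

An admin review screen backed by `stored` would have nothing readable to show, which is the problem the encryption approach solves.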

What VaultPay's KYC Flow Actually Does

The flow is intentionally simple right now - submit a document type and number, get it encrypted at rest, have an admin review and approve or reject it.

User submits doc_type + doc_number
    ↓
Check: does this wallet already have a KYC submission?
    ↓ no
Encrypt doc_number with AES-256-GCM
    ↓
Store ciphertext, status = "pending"
    ↓
Notify admin: new submission waiting
    ↓
[Admin reviews]
    ↓
Decrypt doc_number for admin to view
Log the decrypt as an audit event
    ↓
Admin approves or rejects
    ↓ approved
wallet.kyc_verified = true

There's no document file upload in this version - no scanned ID image, no photo verification. It's a text field: the document type (Aadhaar, PAN, passport) and the document number itself. That number is what gets encrypted.
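The first steps of the flow above (duplicate check, encrypt, store as pending) can be sketched roughly like this. Everything here is an assumption for illustration: the in-memory `store`, the `Submission` and `KYCStatus` names, the `"rejected"` label, and the injected `encrypt` callable are hypothetical stand-ins, not VaultPay's actual models.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class KYCStatus(str, Enum):
    PENDING = "pending"
    VERIFIED = "verified"
    REJECTED = "rejected"  # assumed label; the post only names "pending" and "verified"


@dataclass
class Submission:
    wallet_id: str
    doc_type: str
    doc_number_encrypted: str
    status: KYCStatus = KYCStatus.PENDING


class DuplicateSubmission(Exception):
    """Raised when a wallet already has a KYC submission."""


def submit_kyc(store: Dict[str, Submission], wallet_id: str, doc_type: str,
               doc_number: str, encrypt: Callable[[str], str]) -> Submission:
    # Step 1: one submission per wallet
    if wallet_id in store:
        raise DuplicateSubmission(wallet_id)
    # Steps 2-3: encrypt, then store only the ciphertext with status "pending"
    submission = Submission(wallet_id, doc_type, encrypt(doc_number))
    store[wallet_id] = submission
    return submission


# Demo with a placeholder transform. This is NOT encryption; a real caller
# would pass an AES-256-GCM function here instead.
store: Dict[str, Submission] = {}
first = submit_kyc(store, "wallet-1", "PAN", "ABCDE1234F",
                   encrypt=lambda s: "enc:" + s[::-1])
```

Passing the encryption function in as a parameter keeps the sketch free of crypto dependencies and makes it obvious that the plaintext never reaches the store.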

This is deliberately scoped. The roadmap includes RAG-based document extraction and duplicate detection across accounts, but neither is built yet. What's shipped is the part that actually has to be correct before any of that matters - encrypting and controlling access to identity numbers people are trusting you with.

Why AES-256-GCM and Not Hashing

Once the hashing approach was ruled out, the next question was which encryption scheme.

AES-256-GCM specifically, not just AES-256 in some generic mode, because GCM gives you authenticated encryption. It doesn't just encrypt the data - it also generates a tag that lets you verify the ciphertext hasn't been tampered with. If someone modifies the encrypted value in the database, decryption fails loudly instead of silently returning garbage that looks like a valid document number.

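import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_SIZE = 12  # 96-bit nonce, the standard size for GCM


def encrypt_doc_number(doc_number: str, key: bytes) -> str:
    aesgcm = AESGCM(key)  # key must be 32 bytes for AES-256
    nonce = os.urandom(NONCE_SIZE)  # fresh random nonce for every encryption
    # encrypt() returns the ciphertext with the 16-byte auth tag appended
    ciphertext = aesgcm.encrypt(nonce, doc_number.encode(), None)
    # Store the nonce alongside the ciphertext; it is needed for decryption
    return base64.b64encode(nonce + ciphertext).decode()


def decrypt_doc_number(stored_value: str, key: bytes) -> str:
    raw = base64.b64decode(stored_value)
    nonce, ciphertext = raw[:NONCE_SIZE], raw[NONCE_SIZE:]
    aesgcm = AESGCM(key)
    # Raises cryptography.exceptions.InvalidTag if the value was tampered with
    return aesgcm.decrypt(nonce, ciphertext, None).decode()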
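The "fails loudly" behaviour is easy to check for yourself. This is a small standalone sketch, separate from VaultPay's code, that flips a single bit in a ciphertext and confirms that decryption raises instead of returning garbage. The throwaway key and the sample number are for the demo only.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # throwaway key, demo only
aesgcm = AESGCM(key)
nonce = os.urandom(12)
ciphertext = aesgcm.encrypt(nonce, b"1234 5678 9012", None)

# Flip one bit of the stored value, as a tampered database row would look
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]

try:
    aesgcm.decrypt(nonce, tampered, None)
    tamper_detected = False
except InvalidTag:
    # The authentication tag no longer matches, so nothing is returned
    tamper_detected = True

# The untouched ciphertext still round-trips
original = aesgcm.decrypt(nonce, ciphertext, None)
```

With an unauthenticated mode, the same bit flip would typically decrypt to a slightly different plaintext with no error at all, which is the failure GCM rules out.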

The encryption key itself lives in an environment variable, never in the database alongside the data it protects. If the database is ever compromised on its own, the ciphertext is useless without the key living separately.
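As a rough sketch of what loading that key might look like: the variable name `KYC_ENCRYPTION_KEY` and the choice of base64 encoding are assumptions made for this example, not details taken from VaultPay. The useful part is failing at startup if the key is missing or the wrong length, rather than at the first encrypt call.

```python
import base64
import os
from typing import Mapping

KEY_ENV_VAR = "KYC_ENCRYPTION_KEY"  # hypothetical variable name


def load_kyc_key(environ: Mapping[str, str] = os.environ) -> bytes:
    """Read a base64-encoded AES-256 key from the environment."""
    encoded = environ.get(KEY_ENV_VAR)
    if not encoded:
        raise RuntimeError(f"{KEY_ENV_VAR} is not set")
    key = base64.b64decode(encoded)
    if len(key) != 32:
        # AES-256 needs exactly 32 bytes of key material
        raise RuntimeError(f"{KEY_ENV_VAR} must decode to 32 bytes, got {len(key)}")
    return key


# Demo with an explicit mapping so the real environment is left untouched.
# An all-zero key is used only to keep the example deterministic.
demo_env = {KEY_ENV_VAR: base64.b64encode(bytes(32)).decode()}
demo_key = load_kyc_key(demo_env)
```

Taking the environment as a parameter makes the loader easy to test without mutating `os.environ`.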

Logging Every Decrypt, From the Start

The part of this design I'm most glad I got right early is the audit logging on admin access.

Every time an admin decrypts a document number to review a KYC submission, that access gets written to the audit log as VIEWED_KYC_DOCUMENT, tied to the admin's actor_id and the submission's target_id.

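from datetime import datetime, timezone
from uuid import UUID

from sqlalchemy import insert  # assumes SQLAlchemy, as the insert(...).values(...) call suggests

# KYCSubmission and AuditLog are the app's ORM models (not shown here)


async def get_kyc_submission(submission_id: UUID, admin_id: UUID, db, kyc_crypto):
    submission = await db.get(KYCSubmission, submission_id)
    if submission is None:
        # Nothing to decrypt, so fail before touching the crypto layer
        raise LookupError(f"KYC submission {submission_id} not found")

    decrypted = kyc_crypto.decrypt(submission.doc_number_encrypted)

    # Logged on every successful decrypt, before the plaintext is returned
    await db.execute(
        insert(AuditLog).values(
            actor_id=admin_id,
            action="VIEWED_KYC_DOCUMENT",
            target_type="kyc_submission",
            target_id=submission_id,
            # utcnow() is deprecated since Python 3.12; this assumes a timezone-aware column
            created_at=datetime.now(timezone.utc),
        )
    )

    return {"doc_type": submission.doc_type, "doc_number": decrypted}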

This wasn't something I bolted on later to satisfy compliance requirements. It was part of the design from the first version of this endpoint, because decrypting someone's government ID number is the kind of action that should never happen invisibly - not even when the person doing it is authorised.

The reasoning is simple: encryption protects the data from people who shouldn't see it. Audit logging protects the data from the people who can see it, by making every access accountable. Without the log, an admin viewing a document leaves no trace. With it, there's a permanent record of who looked at what and when — useful for the user's protection and the admin's, since it proves nothing was accessed without reason.

What Approval Actually Unlocks

Before KYC approval, a wallet exists but can't do much. Looking at VaultPay's guard conditions: viewing your balance requires nothing. But topping up, sending money, and withdrawing all require kyc_verified = true on the wallet.

This means a newly created wallet is functionally inert until KYC clears. You can see it. You can't put money in it or move money out of it. That's intentional - there's no scenario where a financial wallet should be able to move real money before the identity behind it has been verified at all.
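Those guard conditions could be expressed along these lines. This is a sketch under assumptions: the action names, the `Wallet` dataclass, and the `KYCRequired` exception are invented for the example, and only the rule itself (balance is open, money movement needs `kyc_verified`) comes from the description above.

```python
from dataclasses import dataclass


class KYCRequired(Exception):
    """Raised when a money-moving action is attempted before KYC clears."""


# Hypothetical action names; viewing the balance is deliberately absent
KYC_GATED_ACTIONS = frozenset({"top_up", "send", "withdraw"})


@dataclass
class Wallet:
    balance: int = 0
    kyc_verified: bool = False


def check_guard(wallet: Wallet, action: str) -> None:
    """Raise KYCRequired if this action needs KYC and the wallet lacks it."""
    if action in KYC_GATED_ACTIONS and not wallet.kyc_verified:
        raise KYCRequired(f"'{action}' requires a KYC-verified wallet")


new_wallet = Wallet()
check_guard(new_wallet, "view_balance")  # allowed: requires nothing

try:
    check_guard(new_wallet, "top_up")
    blocked = False
except KYCRequired:
    blocked = True
```

Keeping the gated actions in one set means a new money-moving endpoint has to be added to the list on purpose, rather than being open by default.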

The approval itself is a small atomic transaction - update the submission status, flip the wallet's kyc_verified flag, write the audit log, notify the user. All inside one commit, for the same reason the transfer engine from Post 1 wraps its writes atomically. You don't want a submission marked "verified" while the wallet itself is still locked out, or vice versa.

async with db.begin():
    submission.status = "verified"
    submission.reviewed_by = admin_id
    wallet.kyc_verified = True

    db.add(AuditLog(
        actor_id=admin_id,
        action="APPROVED_KYC",
        target_id=submission.id,
        before_state={"status": "pending"},
        after_state={"status": "verified"},
    ))

    db.add(Notification(user_id=submission.user_id, message="KYC verified!"))
    # COMMIT — submission status and wallet flag change together

If this weren't atomic, you could end up with a submission marked verified while the wallet still has kyc_verified = False - a state where the audit trail says one thing and the actual permission the user cares about says another. Small inconsistency, but exactly the kind of thing that erodes trust in a financial system if a user ever hits it.
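The failure mode is easy to reproduce in miniature. The following sketch uses the standard library's `sqlite3` rather than VaultPay's actual database layer, with a made-up two-table schema, to show that a crash between the two writes leaves neither of them applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submissions (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE wallets (id INTEGER PRIMARY KEY, kyc_verified INTEGER NOT NULL);
    INSERT INTO submissions VALUES (1, 'pending');
    INSERT INTO wallets VALUES (1, 0);
""")


def approve(conn: sqlite3.Connection, fail_midway: bool = False) -> None:
    with conn:  # commits on success, rolls back if the block raises
        conn.execute("UPDATE submissions SET status = 'verified' WHERE id = 1")
        if fail_midway:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute("UPDATE wallets SET kyc_verified = 1 WHERE id = 1")


def current_state(conn: sqlite3.Connection) -> tuple:
    status = conn.execute("SELECT status FROM submissions WHERE id = 1").fetchone()[0]
    flag = conn.execute("SELECT kyc_verified FROM wallets WHERE id = 1").fetchone()[0]
    return status, flag


try:
    approve(conn, fail_midway=True)
except RuntimeError:
    pass
state_after_failure = current_state(conn)  # the status update was rolled back too

approve(conn)
state_after_success = current_state(conn)  # both writes landed together
```

Without the transaction, the first run would have left the submission marked verified and the wallet still locked, which is exactly the mismatch described above.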

What I Actually Learned

Going into this, I thought of encryption and hashing as roughly interchangeable security tools - both ways of not storing sensitive data in the clear. Building KYC made the distinction concrete instead of abstract.

Hashing is for proving you already know a value - checking a password against its hash, detecting if two values are identical without storing either one. Encryption is for protecting a value you need to read again later, under controlled conditions. KYC is squarely the second case. An admin has to be able to see the actual document number to do their job. No amount of clever hashing makes that possible.

The audit logging taught me something separate: encryption alone protects data from outsiders, but a financial system also needs to protect data from its own insiders having unaccountable access. Those are different threats and they need different controls. Encryption answers "who can see this at all." Audit logs answer "who did see this, and can we prove it."

What's shipped here is intentionally narrow - text-field encryption, manual admin review, nothing more. Document image uploads, duplicate detection across accounts, and RAG-assisted extraction are still on the roadmap. But the part that's live had to be correct before any of that complexity gets added on top, because every future feature inherits whatever assumptions this layer makes about how identity data is protected.

Next up: the last post in this series - how VaultPay enforces four permission tiers across two services that share no database, and where dual-mode JWT validation becomes a correctness question rather than a performance one.

Engineering docs + code samples: Vaultpay-Engineering