Introduction
A few days ago, I hit a frustrating issue while integrating a custom Nextcloud application with a Django REST Framework backend.
Everything looked correct:
- shared HMAC secret ✔️
- canonical request string ✔️
- HMAC-SHA256 ✔️
- timestamps synchronized ✔️
Yet every authenticated request failed with:
invalid nextcloud signature
The interesting part?
Both implementations were technically correct.
The failure came from something much smaller, and much more dangerous in distributed systems:
Different string encodings of the exact same HMAC digest.
This article walks through the full debugging process, the root cause, and the engineering lessons I took away about cryptographic interoperability between PHP and Python services.
System Architecture
The integration architecture looked like this:
┌──────────────────────┐
│ Nextcloud App (PHP)  │
│ Generates HMAC       │
└──────────┬───────────┘
           │
           │ Signed HTTP Request
           ▼
┌──────────────────────┐
│ Django DRF Backend   │
│ Verifies Signature   │
└──────────────────────┘
The request flow:
- Nextcloud generates a canonical request string
- PHP computes an HMAC-SHA256 signature
- Signature is attached to request headers
- Django reconstructs the canonical string
- Django recomputes the HMAC
- Signatures are compared
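The flow above can be sketched end to end in Python. This is a minimal illustration rather than the actual integration code: the header names, the function names, and the canonical string layout (method, path, timestamp, and body hash joined by newlines) are all assumptions, since the real ones are not shown here.

```python
import base64
import hashlib
import hmac

# Hypothetical header names; the real integration may use different ones.
SIGNATURE_HEADER = "X-Signature"
TIMESTAMP_HEADER = "X-Timestamp"


def canonical_string(method: str, path: str, timestamp: str, body: bytes) -> str:
    """Assumed layout: method, path, timestamp, and body hash joined by newlines."""
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([method.upper(), path, timestamp, body_hash])


def _signature(secret: bytes, canonical: str) -> str:
    """HMAC-SHA256 over the canonical string, Base64 encoded."""
    digest = hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def sign_request(secret: bytes, method: str, path: str, timestamp: int, body: bytes) -> dict:
    """Client side: build the canonical string, sign it, attach headers."""
    ts = str(timestamp)
    canonical = canonical_string(method, path, ts, body)
    return {TIMESTAMP_HEADER: ts, SIGNATURE_HEADER: _signature(secret, canonical)}


def verify_request(secret: bytes, method: str, path: str, body: bytes, headers: dict) -> bool:
    """Server side: rebuild the canonical string, recompute, compare."""
    canonical = canonical_string(method, path, headers[TIMESTAMP_HEADER], body)
    expected = _signature(secret, canonical)
    received = headers[SIGNATURE_HEADER]
    # Constant-time comparison on bytes avoids leaking where the strings differ.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Both sides must agree on every detail of this sketch, including how the digest is turned into text. That last detail is where this integration broke.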
Simple in theory.
Except it kept failing.
Initial Symptoms
The backend logs showed repeated authorization failures:
nextcloud_hmac.denied
code=invalid_signature
Even more confusing:
- the integration had worked before
- secrets matched
- clocks matched
- payloads matched
At first glance, it looked like a replay issue, timestamp skew problem, or cache corruption.
It turned out to be none of those.
The Root Cause
The issue came from a mismatch in how the HMAC digest was encoded.
Nextcloud (PHP)
The PHP client generated the signature like this:
base64_encode(
    hash_hmac('sha256', $canonical, $secret, true)
);
Notice the important detail:
true
Passing true as the fourth argument makes hash_hmac() return the raw digest bytes instead of its default lowercase hexadecimal string.
Those raw bytes were then encoded as Base64.
Django (Python)
Meanwhile, Django verified signatures like this:
hmac.new(
    secret,
    canonical.encode(),
    hashlib.sha256,
).hexdigest()
hexdigest() returns the digest as a hexadecimal string, 64 characters long for SHA-256.
So both systems produced:
- the same HMAC bytes
- using the same algorithm
- using the same secret
But converted those bytes into different string formats.
The Hidden Interoperability Bug
This was the breakthrough moment.
The exact same digest bytes produced:
Hex:
44c39c4ecc7268547ca51db72c6f27125251e6ea8ce3c659d918a9542522b612
vs
Base64:
RMOcTsxyaFR8pR23LG8nElJR5uqM48ZZ2RipVCUithI=
Both values represent the same underlying bytes.
But string comparison obviously fails.
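The two values above can be checked against each other with a few lines of Python. This is only a sanity check using the standard library: decode each string back to raw bytes and compare the bytes rather than the text.

```python
import base64

hex_signature = "44c39c4ecc7268547ca51db72c6f27125251e6ea8ce3c659d918a9542522b612"
b64_signature = "RMOcTsxyaFR8pR23LG8nElJR5uqM48ZZ2RipVCUithI="

# Decode each string back to the raw digest bytes it represents.
raw_from_hex = bytes.fromhex(hex_signature)
raw_from_b64 = base64.b64decode(b64_signature)

same_bytes = raw_from_hex == raw_from_b64      # the digests are identical
same_string = hex_signature == b64_signature   # the text encodings are not

print(len(raw_from_hex), same_bytes, same_string)  # prints: 32 True False
```

A check like this is a quick way to tell an encoding mismatch apart from a genuinely different digest when two services disagree.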
The Second Bug
While investigating, I found another subtle issue.
The Django verifier lowercased the incoming signature before comparison:
signature = signature.lower()
That may appear harmless for hexadecimal values.
But Base64 is case-sensitive.
Meaning:
ABC != abc
So even after fixing the encoding mismatch, lowercasing would still break verification.
This was a protocol normalization bug hiding inside the verification pipeline.
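A short sketch makes the asymmetry concrete, reusing the two sample values from the previous section. Changing the case of a hex string does not change the bytes it decodes to, while lowercasing a Base64 string produces a different string that can no longer match the expected signature.

```python
hex_signature = "44c39c4ecc7268547ca51db72c6f27125251e6ea8ce3c659d918a9542522b612"
b64_signature = "RMOcTsxyaFR8pR23LG8nElJR5uqM48ZZ2RipVCUithI="

# Hex digits are case-insensitive: both spellings decode to the same bytes.
hex_survives = bytes.fromhex(hex_signature.upper()) == bytes.fromhex(hex_signature)

# Base64 treats "R" and "r" as different symbols, so lowercasing alters the value.
b64_survives = b64_signature.lower() == b64_signature

print(hex_survives, b64_survives)  # prints: True False
```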
The Fix
I updated Django to verify signatures using Base64 instead of hexadecimal.
New Verification Function
import base64
import hashlib
import hmac


def compute_hmac_signature_b64(
    *,
    secret: bytes,
    canonical_string: str,
) -> str:
    """Compute Base64 encoded HMAC-SHA256 signature."""
    digest = hmac.new(
        secret,
        canonical_string.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode()
Then all verification calls were updated to use:
compute_hmac_signature_b64()
instead of:
.hexdigest()
Finally, I removed:
.lower()
from the verification flow.
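To show the effect of both changes side by side, here is a before-and-after sketch. The function names are hypothetical, and old_verify is a reconstruction from the snippets shown earlier, not the original Django code. The constant-time comparison via hmac.compare_digest is an addition of this sketch; it is a common choice for comparing signatures.

```python
import base64
import hashlib
import hmac


def php_style_signature(secret: bytes, canonical: str) -> str:
    """Python equivalent of base64_encode(hash_hmac('sha256', ..., true))."""
    digest = hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def old_verify(secret: bytes, canonical: str, signature: str) -> bool:
    """Reconstructed pre-fix behavior: hex digest vs. lowercased header value."""
    expected = hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("utf-8"), signature.lower().encode("utf-8"))


def new_verify(secret: bytes, canonical: str, signature: str) -> bool:
    """Post-fix behavior: Base64 digest vs. the header value exactly as received."""
    expected = php_style_signature(secret, canonical)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


# Hypothetical secret and canonical string, for illustration only.
incoming = php_style_signature(b"shared-secret", "GET\n/ping/\n1778841776")
print(old_verify(b"shared-secret", "GET\n/ping/\n1778841776", incoming))  # prints: False
print(new_verify(b"shared-secret", "GET\n/ping/\n1778841776", incoming))  # prints: True
```

The old verifier can never succeed against a Base64 signature: the expected value is 64 hex characters, while the received value is 44 Base64 characters.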
Verification Results
After deploying the fix:
Ping Endpoint
GET /api/v1/integrations/nextcloud/ping/
200 OK
Token Issuance
POST /api/v1/integrations/token/
200 OK
Authentication immediately started working again.
Secondary Investigation Findings
While debugging, I validated several other production concerns.
1. Time Drift
I suspected clock skew initially.
Both services were checked:
Nextcloud epoch: 1778841776
Django epoch: 1778841776
Drift: 0 seconds
Time synchronization was perfect.
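For reference, a drift check of this kind can be sketched as follows. The function names and the 300-second tolerance are assumptions for illustration; the tolerance the real backend uses is not stated here.

```python
import time
from typing import Optional

MAX_SKEW_SECONDS = 300  # hypothetical tolerance, not taken from the real backend


def clock_drift(client_epoch: int, server_epoch: int) -> int:
    """Absolute difference between two epoch timestamps, in seconds."""
    return abs(server_epoch - client_epoch)


def is_timestamp_fresh(
    request_epoch: int,
    now: Optional[int] = None,
    max_skew: int = MAX_SKEW_SECONDS,
) -> bool:
    """Accept a request only if its timestamp is within max_skew of the server clock."""
    if now is None:
        now = int(time.time())
    return abs(now - request_epoch) <= max_skew


print(clock_drift(1778841776, 1778841776))  # prints: 0
```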
2. Shared Secrets
Client IDs and secrets matched correctly across both systems.
This eliminated:
- environment mismatch
- stale secrets
- config drift
3. Redis and Cache State
I flushed:
- Redis
- Django cache
- integration token caches
This ruled out stale tokens and inconsistent replay state as causes.
4. Infrastructure Validation
I also verified:
- loopback networking
- gunicorn binding
- uvicorn workers
- allowlists
- HTTP dev mode configuration
At this point the investigation became less about cryptography and more about systematic elimination of variables.
Why It “Worked Before”
This was the most interesting systems question.
I had not changed the signing logic recently.
So why did the failure suddenly appear?
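The likely answer is:
Infrastructure state had been masking a latent protocol incompatibility. A hex string can never equal the Base64 encoding of the same digest, so any request that actually reached the signature comparison would have been rejected. Earlier successes most likely came from paths that skipped that comparison.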
Possible contributors:
- cached tokens
- stale replay windows
- inactive code paths
- existing sessions bypassing verification
- Redis persistence behavior
This is an important engineering lesson:
A system can contain dormant interoperability bugs for weeks before infrastructure conditions expose them.
Engineering Lessons Learned
1. Cryptographic Bytes ≠ String Representation
HMAC output is binary data.
Hexadecimal and Base64 are merely different textual encodings of the same bytes.
As strings, they are not interchangeable: a verifier expecting one will always reject the other.
2. Cross-Language Integrations Need Explicit Contracts
Never assume:
- encoding format
- canonicalization rules
- normalization behavior
Define them explicitly.
Especially across:
- PHP
- Python
- Go
- Node.js
- Java
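One way to make such a contract explicit is to write it down as data and reject anything that violates it with a descriptive error. The sketch below is hypothetical: SignatureContract and decode_signature are illustrative names, and the fields reflect the choices described in this article (HMAC-SHA256, Base64, UTF-8, no case folding).

```python
import base64
import binascii
from dataclasses import dataclass


@dataclass(frozen=True)
class SignatureContract:
    """Hypothetical written-down contract shared by client and server."""
    algorithm: str = "HMAC-SHA256"
    digest_size: int = 32         # bytes in a SHA-256 digest
    encoding: str = "base64"      # standard alphabet, with padding
    charset: str = "utf-8"        # encoding of the canonical string
    case_sensitive: bool = True   # never change the case of a signature


def decode_signature(value: str, contract: SignatureContract = SignatureContract()) -> bytes:
    """Decode a received signature, rejecting anything that violates the contract."""
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("signature is not valid Base64") from exc
    if len(raw) != contract.digest_size:
        # A 64-character hex digest is also valid Base64 text, but it decodes
        # to 48 bytes, so the length check catches the hex/Base64 mix-up.
        raise ValueError(
            f"expected {contract.digest_size} digest bytes, got {len(raw)}; "
            "the sender may be using a different encoding such as hex"
        )
    return raw
```

With a check like this in the verifier, a hex-encoded signature would produce an error that points at the encoding instead of a generic invalid signature response.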
3. Normalization Can Break Security
Lowercasing signatures looked harmless.
It was not.
Cryptographic values should only be normalized if the protocol explicitly defines normalization behavior.
4. Infrastructure State Can Hide Bugs
Cache layers and token persistence can temporarily conceal protocol inconsistencies.
Sometimes:
- restarts
- cache flushes
- clock resets
suddenly expose issues that already existed.
5. Production Debugging Requires Elimination Discipline
The investigation involved validating:
- clocks
- secrets
- caches
- workers
- networking
- encoding
- replay protection
- request canonicalization
Good debugging is often less about guessing and more about systematically removing uncertainty.
Final Thoughts
The most dangerous bugs are not always algorithm failures.
Sometimes:
- the crypto is correct
- the infrastructure is healthy
- the logic is valid
…but the protocol contract between systems is inconsistent.
In this case:
The cryptography was correct on both sides. The protocol contract was not.
And that single mismatch was enough to break the entire authentication flow.