DEV Community

Sean Bailey

ArchivioMD: A Deep Dive into the Cryptographic Stack

This is an update to my earlier post on ArchivioMD. Since that writeup, the plugin has landed on WordPress.org, gained a full algorithm suite, HMAC integrity mode, RFC 3161 trusted timestamping, and external anchoring to GitHub/GitLab. There's a lot to unpack.


What the plugin actually does

ArchivioMD started as a way to manage meta-documentation files — your security.txt, privacy-policy.md, robots.txt and similar — from inside the WordPress admin. Every document gets a UUID and a checksum, every edit appends to an immutable changelog, and HTML rendering from Markdown is handled automatically.

That's still the core. What's grown significantly is the cryptographic layer on top of it: how hashes are computed, how they're bound to identity, how they're externally verifiable, and how you can timestamp them against a trusted third party. That's what this post is about.


The hash helper: algorithm-agnostic from the start

The entire hashing surface of the plugin runs through a single class, MDSM_Hash_Helper. Every hash — post content, document checksums, anchor records — goes through this one entry point. That was an intentional design choice: swap the algorithm once, and it propagates everywhere.

The algorithm roster

Standard (production-ready):

  • sha256 — SHA-256, the default
  • sha512 — SHA-512
  • sha3-256 — SHA3-256 (Keccak)
  • sha3-512 — SHA3-512
  • blake2b — BLAKE2b-512

Experimental (with automatic fallback):

  • blake3 — BLAKE3-256
  • shake128 — SHAKE128 with 256-bit output
  • shake256 — SHAKE256 with 512-bit output

The experimental algorithms come with a graceful degradation chain. If blake3 isn't natively available via a PHP extension, the plugin falls back to BLAKE2b-512, and if that's not available, it falls back to SHA-256. The fallback: true flag in the return array lets callers surface a warning rather than silently emitting a weaker hash than intended.
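The degradation chain can be sketched roughly like this — a minimal illustration, not the actual MDSM_Hash_Helper internals. It assumes BLAKE3 would arrive via a hypothetical native extension function, and uses libsodium's generichash (which is BLAKE2b) for the middle tier:

```php
<?php
// Sketch of the blake3 -> blake2b-512 -> sha256 degradation chain.
// blake3() is a hypothetical extension function; libsodium's
// sodium_crypto_generichash IS BLAKE2b (64-byte output = BLAKE2b-512).
function mdsm_hash_with_fallback( string $message ): array {
    if ( function_exists( 'blake3' ) ) { // hypothetical native extension
        return [ 'algo' => 'blake3', 'hex' => blake3( $message ), 'fallback' => false ];
    }
    if ( function_exists( 'sodium_crypto_generichash' ) ) {
        $hex = bin2hex( sodium_crypto_generichash( $message, '', 64 ) );
        return [ 'algo' => 'blake2b', 'hex' => $hex, 'fallback' => true ];
    }
    // Floor of the chain: plain SHA-256, still flagged as a fallback.
    return [ 'algo' => 'sha256', 'hex' => hash( 'sha256', $message ), 'fallback' => true ];
}
```

The `fallback` flag is what lets the admin UI warn rather than silently downgrade.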

BLAKE3 in particular required its own implementation class (MDSM_BLAKE3) because PHP doesn't ship it natively. The HMAC construction for BLAKE3 is built manually using the standard H(K XOR opad || H(K XOR ipad || message)) structure, with the pure-PHP Blake3 hasher standing in for the compression function.
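The manual construction is just RFC 2104 HMAC over an arbitrary hash callable. Here's a generic sketch (my own illustration, not the MDSM_BLAKE3 source) — in the plugin, the pure-PHP BLAKE3 hasher would stand in for `$hash`:

```php
<?php
// Generic RFC 2104 HMAC: H(K XOR opad || H(K XOR ipad || message)).
// $hash must return RAW bytes; $block_size is the hash's block size.
function mdsm_manual_hmac( callable $hash, string $key, string $message, int $block_size = 64 ): string {
    if ( strlen( $key ) > $block_size ) {
        $key = $hash( $key ); // oversized keys are hashed down first
    }
    $key   = str_pad( $key, $block_size, "\0" );
    $inner = $hash( ( $key ^ str_repeat( "\x36", $block_size ) ) . $message );
    return bin2hex( $hash( ( $key ^ str_repeat( "\x5c", $block_size ) ) . $inner ) );
}
```

Plugging raw SHA-256 in as the callable reproduces PHP's built-in `hash_hmac('sha256', ...)` exactly, which is a handy sanity check for the construction.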

The packed string format

All stored hashes use a self-describing format:

Standard:  "sha256:abcdef…"
HMAC:      "hmac-sha256:abcdef…"
Legacy:    "abcdef…"              (treated as bare SHA-256)

The unpack() method handles all three. Legacy bare-hex hashes from before v1.3 verify correctly against hash() without any migration. The format tag in the packed string drives the entire downstream verification path — the global "active algorithm" setting is never consulted when verifying an existing hash.
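The dispatch logic is simple enough to sketch in a few lines (illustrative names, not the actual MDSM_Hash_Helper API):

```php
<?php
// Sketch of unpack(): the format tag in the stored string — not any
// global setting — decides how verification proceeds.
function mdsm_unpack( string $packed ): array {
    if ( false === strpos( $packed, ':' ) ) {
        // Legacy bare hex from before v1.3: treated as standard SHA-256.
        return [ 'mode' => 'standard', 'algo' => 'sha256', 'hex' => $packed ];
    }
    [ $tag, $hex ] = explode( ':', $packed, 2 );
    if ( 0 === strpos( $tag, 'hmac-' ) ) {
        return [ 'mode' => 'hmac', 'algo' => substr( $tag, 5 ), 'hex' => $hex ];
    }
    return [ 'mode' => 'standard', 'algo' => $tag, 'hex' => $hex ];
}
```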


HMAC Integrity Mode

Standard hash verification answers: has this content changed?

HMAC integrity mode answers: has this content changed, and was the original hash produced by someone with access to the secret key?

How it's configured

The key lives in wp-config.php, never in the database:

define( 'ARCHIVIOMD_HMAC_KEY', 'your-long-random-secret-at-least-32-chars' );

Then you enable the mode in Settings → Cryptographic Verification → HMAC Integrity Mode.

The plugin enforces a minimum key length of 32 characters and surfaces a warning (not an error) if the key is shorter. A missing key when HMAC mode is enabled is a hard error — hash generation is blocked, not silently degraded.

What changes in the hash flow

When HMAC mode is enabled, compute_packed() calls compute_hmac() instead of compute(). The packed string gets the hmac- prefix. Verification calls hash_equals() against a freshly computed HMAC rather than a plain hash.
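The verification side of that flow looks roughly like this — the constant name is from the post, the helper around it is my own sketch:

```php
<?php
// Sketch of HMAC verification: recompute with the wp-config key and
// compare with hash_equals() to avoid timing side channels.
function mdsm_verify_hmac( string $content, string $stored_hex, string $algo = 'sha256' ): bool {
    if ( ! defined( 'ARCHIVIOMD_HMAC_KEY' ) ) {
        return false; // the real plugin treats a missing key as a hard error
    }
    $fresh = hash_hmac( $algo, $content, ARCHIVIOMD_HMAC_KEY );
    return hash_equals( $fresh, $stored_hex );
}
```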

One important consequence: if you rotate the key constant in wp-config.php, every existing HMAC hash immediately fails verification. The old hashes don't become invalid per se — the hmac-sha256: prefix still tells the verifier what to do — but the key they were signed with is gone. This is by design. Key rotation is a deliberate action that invalidates the previous integrity chain.

The status helper

There's an hmac_status() method that returns a structured array for the admin UI: mode_enabled, key_defined, key_strong, ready, and a notice_level / notice_message pair for surfacing the right admin notice. The four states are: off, enabled-but-no-key, enabled-with-weak-key, and fully active.
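The four states map onto a small decision table. Here's an assumed reconstruction — field names come from the post, the branching logic is my guess at it:

```php
<?php
// Sketch of the hmac_status() decision table (logic is an assumption).
function mdsm_hmac_status( bool $mode_enabled, ?string $key ): array {
    $key_defined = is_string( $key ) && '' !== $key;
    $key_strong  = $key_defined && strlen( $key ) >= 32;
    $ready       = $mode_enabled && $key_defined;
    if ( ! $mode_enabled ) {
        $level = 'none';     // off
    } elseif ( ! $key_defined ) {
        $level = 'error';    // enabled-but-no-key: generation is blocked
    } elseif ( ! $key_strong ) {
        $level = 'warning';  // enabled-with-weak-key: works, but warned
    } else {
        $level = 'none';     // fully active
    }
    return [
        'mode_enabled' => $mode_enabled,
        'key_defined'  => $key_defined,
        'key_strong'   => $key_strong,
        'ready'        => $ready,
        'notice_level' => $level,
    ];
}
```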


Post content verification: deterministic hashing

The MDSM_Archivio_Post class handles WordPress post and page content. The challenge here is determinism: the same logical content must always produce the same hash, regardless of whitespace normalization, line ending differences, or editor artifacts.

The canonicalization pipeline strips the content down to a stable form before hashing. It also binds the hash to the post ID and author ID to prevent hash reuse — you can't take a hash generated for post 42 and pass it off as valid for post 99.
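As a rough illustration of both ideas — normalization plus ID binding — something like this (the plugin's actual pipeline may normalize more aggressively):

```php
<?php
// Illustrative canonicalization + binding: unify line endings, strip
// trailing whitespace, then prefix post and author IDs so a hash minted
// for one post can't be replayed for another.
function mdsm_canonical_payload( int $post_id, int $author_id, string $content ): string {
    $c = str_replace( [ "\r\n", "\r" ], "\n", $content ); // normalize line endings
    $c = preg_replace( '/[ \t]+$/m', '', $c );            // drop trailing whitespace
    $c = trim( $c );
    return $post_id . '|' . $author_id . '|' . $c;        // bind identity into the hash input
}
```

Hashing this payload instead of the raw content is what makes the hash both deterministic and non-transferable between posts.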

The result is stored in post meta. On every save, if auto-generate is enabled, the hash is recomputed and logged to the audit table (wp_archivio_post_audit). The audit table gained a post_type column in v1.5.9, and the schema migration runs inline on construction if the column is missing.

Verification badge

Three badge states are possible:

  • ✓ Verified (green) — current content hash matches the stored hash
  • ✗ Unverified (red) — content has changed since last hash generation
  • − Not Signed (gray) — no hash has been generated for this post

Badges can be auto-injected below titles or content, or placed manually with the [hash_verify] shortcode. Visitors can download a verification file containing the hash, algorithm, mode, post metadata, and (if available) RFC 3161 timestamp details.


RFC 3161 Trusted Timestamping

This is the most involved feature in the 1.6.x series. RFC 3161 is the internet standard for cryptographic timestamping — you send a hash to a Time Stamp Authority, they sign it and return a timestamp token (a .tsr file) that proves the hash existed at a specific point in time. The token is signed by the TSA's certificate chain, which anchors it to a trusted third party entirely outside your infrastructure.

Built-in TSA profiles

The plugin ships with four profiles:

| Provider | URL | Auth |
| --- | --- | --- |
| FreeTSA.org | https://freetsa.org/tsr | None (rate-limited) |
| DigiCert | http://timestamp.digicert.com | None |
| GlobalSign | http://timestamp.globalsign.com/tsa/r6advanced1 | None |
| Sectigo | https://timestamp.sectigo.com | None (throttle to 15s+) |

A custom endpoint option is also available for private TSAs or enterprise deployments.

The timestamping flow

  1. Take the content hash from the anchor record.
  2. If the content algorithm is SHA-256 and the hash is 64 hex characters, use the raw bytes directly as the RFC 3161 MessageImprint. If it's any other algorithm, SHA-256-hash the hex string — this is recorded in the manifest as "method": "sha256_of_hex" so you can reproduce the imprint independently.
  3. Build a DER-encoded TimeStampReq in pure PHP — no external ASN.1 libraries required. The structure follows RFC 3161 §2.4.1: a version integer, a MessageImprint sequence (algorithm OID + hash bytes), a random 64-bit nonce, and certReq TRUE.
  4. POST it to the TSA with Content-Type: application/timestamp-query.
  5. Receive the TimeStampResp, check the PKIStatus integer is 0 or 1, extract the serial and genTime for logging.
  6. Store the .tsr response and the original .tsq request in wp-content/uploads/meta-docs/tsr-timestamps/. A .manifest.json file is written alongside them with everything needed for offline verification:
{
  "content_hash_algorithm": "sha256",
  "content_hash_hex": "abcdef...",
  "tsr_message_imprint": {
    "algorithm": "sha256",
    "method": "direct",
    "note": "TSR message data equals the raw content hash bytes."
  },
  "verification_command": "openssl ts -verify -in file.tsr -queryfile file.tsq -CAfile /etc/ssl/certs/ca-certificates.crt"
}
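The imprint rule from step 2 is worth pinning down in code, since it's what an independent verifier has to reproduce. A minimal sketch (illustrative helper, not the plugin's API):

```php
<?php
// Sketch of the MessageImprint derivation: a 64-char SHA-256 hex hash
// passes through as raw bytes ("direct"); any other algorithm gets its
// hex STRING hashed with SHA-256 ("sha256_of_hex"), as the manifest records.
function mdsm_message_imprint( string $algo, string $hex ): array {
    if ( 'sha256' === $algo && 64 === strlen( $hex ) && ctype_xdigit( $hex ) ) {
        return [ 'method' => 'direct', 'bytes' => hex2bin( $hex ) ];
    }
    return [ 'method' => 'sha256_of_hex', 'bytes' => hash( 'sha256', $hex, true ) ];
}
```

Either way the imprint is 32 bytes, so a single SHA-256 OID works for every content algorithm in the TimeStampReq.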

The .tsr and .tsq files are blocked from direct HTTP access via .htaccess. They're served through an authenticated AJAX download handler instead. The manifest JSON is publicly accessible — it contains no secret data.

Offline verification

Because the .tsr is a standard RFC 3161 token, you can verify it with openssl ts on any machine, independent of the WordPress installation:

openssl ts -verify \
  -in document-20260214-143022.tsr \
  -queryfile document-20260214-143022.tsq \
  -CAfile /etc/ssl/certs/ca-certificates.crt

For FreeTSA you need to download their certificate separately (the cert_url is in the manifest). For DigiCert, GlobalSign, and Sectigo, the root is already in your system trust store.


External Anchoring

Separate from RFC 3161, the plugin can push cryptographic anchor records to GitHub or GitLab repositories. The idea is a distributed, tamper-evident audit trail: even if your WordPress database is compromised, the hashes are already committed to an external repository under a different trust domain.

The anchoring system is provider-agnostic. Every provider implements MDSM_Anchor_Provider_Interface:

interface MDSM_Anchor_Provider_Interface {
    public function push( array $record, array $settings ): array;
    public function test_connection( array $settings ): array;
}

push() returns ['success' => true, 'url' => '...'] or ['success' => false, 'error' => '...', 'retry' => bool, 'rate_limited' => bool]. The RFC 3161 provider implements the same interface, which is how multi-provider anchoring (added in v1.6.4) works — the queue just calls push() on each enabled provider independently.
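To make the contract concrete, here's a do-nothing provider against that interface — purely illustrative (the real GitHub/GitLab providers do authenticated HTTP calls where this one short-circuits):

```php
<?php
interface MDSM_Anchor_Provider_Interface {
    public function push( array $record, array $settings ): array;
    public function test_connection( array $settings ): array;
}

// Hypothetical minimal provider, just enough to show the return contract.
class MDSM_Null_Anchor_Provider implements MDSM_Anchor_Provider_Interface {
    public function push( array $record, array $settings ): array {
        if ( empty( $record['hash'] ) ) {
            // Permanent failure: no point retrying a record with no hash.
            return [ 'success' => false, 'error' => 'missing hash', 'retry' => false, 'rate_limited' => false ];
        }
        return [ 'success' => true, 'url' => 'about:blank' ];
    }
    public function test_connection( array $settings ): array {
        return [ 'success' => true ];
    }
}
```

The `retry` / `rate_limited` flags are what let the queue decide between backoff and permanent failure without knowing anything provider-specific.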

The queue

Anchor jobs are persisted to wp_options via MDSM_Anchor_Queue. Key properties:

  • Hard cap at 200 jobs — prevents the options row from growing unbounded on high-volume sites
  • Exponential backoff: 1 min → 2 → 4 → 8 → 16 minutes, up to 5 retries
  • Transient-based locking (15-second TTL) to guard against two cron processes running simultaneously and double-processing the same job
  • Per-provider state tracking (added in v1.6.4) so a failing Git provider doesn't block RFC 3161 jobs in the same queue entry
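The backoff schedule is a straight doubling series, which a one-liner captures — sketch only, the queue's real bookkeeping lives in MDSM_Anchor_Queue:

```php
<?php
// Backoff sketch: attempts 0..4 wait 1, 2, 4, 8, 16 minutes; null = give up.
function mdsm_next_retry_delay( int $attempt ): ?int {
    if ( $attempt >= 5 ) {
        return null; // retries exhausted after the fifth attempt
    }
    return 60 * ( 2 ** $attempt ); // seconds
}
```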

Multi-provider anchoring

Since v1.6.4, you can run GitHub/GitLab and RFC 3161 simultaneously. Each provider maintains independent retry state. Each writes its own entry to the Anchor Activity Log. If you're building a compliance evidence chain, you get a Git-hosted hash record and a cryptographically timestamped .tsr file for every anchor event.


WP-CLI

As of v1.6.2, several operations are available from the CLI:

# Drain the anchor queue
wp archiviomd process-queue

# Anchor a specific post
wp archiviomd anchor-post --post_id=42

# Verify a post's content hash
wp archiviomd verify --post_id=42

# Prune old anchor log entries
wp archiviomd prune-log

Log retention defaults to 90 days with automatic daily pruning via cron.


Compliance export

Tools → ArchivioMD → Compliance Tools includes a structured JSON export that packages the full evidence chain for a given post or document: post metadata, hash history, anchor log entries, and inlined RFC 3161 TSR manifests. The intent is a self-contained artifact you can hand to an auditor or feed into a SIEM without them needing access to the WordPress installation.


Backward compatibility

One thing I've been deliberate about: every hash format ever emitted by the plugin still verifies correctly against the current codebase.

  • Bare hex from before v1.3 → treated as sha256 / standard mode
  • sha256:hex from v1.3 → standard mode
  • hmac-sha256:hex from v1.4 → HMAC mode

No migration scripts, no format conversion. The packed string carries all the information needed to verify it.


The plugin on WordPress.org

It's live at [wordpress.org/plugins/archiviomd](https://wordpress.org/plugins/archiviomd).

1.6.6 will land on GitHub this weekend and will be uploaded to WordPress.org after some test-drives.

Happy to answer questions about any of the implementation details — particularly the ASN.1 builder, the BLAKE3 fallback chain, or the queue concurrency model. Those were the three areas that took the most iteration to get right.
