DEV Community: Aiden Vaines

¡Hola, soy DORA. Why hasn’t AI improved my metrics?

Aiden Vaines — Sun, 10 May 2026 20:58:36 +0000

Originally published at https://www.vaines.org/posts/2026-05-09-why-hasnt-ai-improved-my-metrics/

I'm often asked about AI stuff as the rest of my blog might suggest. The world is currently scrambling around for how and where AI best accelerates delivery of software programmes, but I still don't think there is one solution - similar to what happens when you ask any two people how JIRA should be used to be fair.

There's so much hype and a lot of vapour-thought for ways of working that sound legit but have never been battle tested at any scale. Intermixed with that are some amazing success stories and methodologies that work great for a particular org or project, or just a single team greatly enhancing their workflows.

A team I'm working with have been delivering code - yes, into production - for a little while and I'm seeing a pattern emerging, both elsewhere in my programmes and in those my peers work on, which are on similar AI acceleration journeys.

Engineers are reporting they are shipping code faster, the tools are being used regularly and they are mostly liked. Qualitative measures like SPACE show things improving and moving quickly but the quantitative measures like DORA, the entrenched way to measure how things are moving...haven't moved. In some cases they've reportedly got worse.

DORA Metric	Change
Deployment Frequency	➖
Lead Time for Changes	➖
Time to Restore Service	➖
Change Failure Rate	🔻

On the surface this is an uncomfortable place to be. AI is the second coming of Jesus if you believe the hype, so why doesn't it make numbers go up? Engineers reckon they’re faster and stuff is going out faster for the most part - yes, I know all about the '19% slower' study - but all the measures we use, rely on, report on, and optimise for haven't changed positively or resoundingly.

DORA cares not for your AI tools

DORA doesn’t care how quickly code appears on a screen. It cares how reliably and smoothly changes make it to production. Roughly that should include:

flowchart LR
    A[Planning and Refinement] --> B[Coding]
    B --> C[Review]
    C -->|Changes required| B
    C -->|Approved| D[Testing and Validation]
    D -->|Defects found| B
    D -->|Validated| E[Deployment and Release]

When you use AI automation like SpecKit, BMAD, or similar tooling, you take away some of the human bashing at the keyboard or poking and prodding at a handful of Claude/Copilot/Codex (why are they all C nouns?) terminals. The focus heavily switches from writing code to writing specifications, capturing and refining requirements and validating decisions. On the other side of the code generation task it's reviewing, criticising, and mobbing on a PR.

It's not unreasonable to say that AI generates more fluff than most meat-developers do, and it does take longer to properly go through, trim, redirect and refactor AI output. Even if the AI is doing the heavy lifting, and it does it faster than us mere mortals can, it needs iterating, checking and validating just as if humans wrote the code. There are decisions and choices in how it's built things, weird variable name choices, and there are 300 tests which all pass... because they are doing the equivalent of assert True. Ultimately you're responsible for the code you commit, even if you effectively outsourced it to a robot.

AI applies pressure at a single stage: code generation. There's now more work entering the system, but our review capacity, CI/CD throughput, release governance, and risk tolerances largely all stay where they are. The system is processing work at roughly the same rate; we're just tackling things larger than the old limits imposed by the bottleneck of us lazy humans limited by sleep, caffeine, and roughly 10 fingers.

Sizing of work becomes a bit of a weird space too. Before you start measuring your deployments, how do you estimate a piece of work? Yes, you know roughly how you might tackle it, maybe you've got a rough plan in your head of where in the codebase it touches, how you might test it and so on. But you're not hands-on writing the code; you'll be drawing up specifications and requirements, and code generation is being handed off to an LLM. You want to assume, like you would with your human peers, that it's having a good day and doesn't fall down rabbit holes, but the reviews and subsequent re-generations still take time.

If you imagine any old 3-point story that includes an amount of writing some code, but apply a SpecKit style AI workflow to it, the code is going to be almost entirely AI generated. You're likely to end up with more code to review and more edge cases through AI non-determinism. The time you may have spent writing the code now goes into writing an implicit scope or contract spec to hand to the AI.

On the determinism statement; I've heard this as a detractor statement for AI code generation quite a bit. I think if you asked any two developers, or the same developer a month apart, to write the same thing from the same information you would get two different results anyway as humans aren't deterministic either. I usually describe AI as being a cupboard of junior engineers. You chuck solid requirements in, you should get something working out but if you put crap in you get crap out. AI is the same, just each conversation (session) is a new engineer.

Once you've handed the specs for your theoretical 3-pointer over to the magic box of goblins and performed your favourite ritual of computation; slaughter a couple of goats, scroll through hackernews, whatever. Then you look at what the magic box has produced and hope it's good or at least close. DORA assumes relatively stable units of work but the assumption starts to wobble a bit when you hit review phase and the code produced is poor quality, missed the target or perhaps is totally broken and you need to stretch the metaphor and kick the box of goblins a bit.

You can now re-run the code generation phase, hope Jupiter is in retrograde, and hope the result is closer to the expectation. If where the AI went wrong is clear, you may be able to revisit the spec and correct the problem before re-generating. If the code was close enough, you might be able to steer it back into line or over the final hurdle. Whichever path gets taken is something we now need to measure. The measures for DORA might cover some of these activities, but they're not going to surface the time spent going back and forth, or the struggle and pain points with the process. If the metrics are not providing feedback for improvement, what's the point?

What are we actually measuring?

I started trying to think over what I'm actually using my metrics for, how they inform decisions, and where it made most sense to recalibrate given the changes in the approach. I think the question I'm trying to answer is 'what makes sense to track to get the same signals?' given the shift of effort left and right towards planning and review, away from the classic development task.

Before AI driven development workflows (excluding review fix feedback loops), I imagined it looked, in its extreme, like this:

flowchart LR
  A["Idea"] --> B["Slow Coding"]
  B --> C["Review"]
  C --> D["Deploy"]

With AI driven development, we push the effort left and right away from the code generation task.

flowchart LR
  A["Idea"] --> B["Slow Refinement"]
  B --> C["Fast Coding"]
  C --> D["Slow Review"]
  D --> E["Deploy"]

The overall amount of 'stuff' getting done hasn't really changed anywhere; what has changed is where it's being done and how it's carved up. The bottlenecks on how much can be done in a single unit of effort/work are quite different too. We're still reporting and recording DORA metrics, I just need to measure different tasks and activities underneath them.

AI driven development has led us to, generally, larger, denser changes. It's easier to identify and scope tasks which previously you'd have carved up into smaller tasks because, well....it's not you doing it.

I find I tend to think in units of 'delivered features' rather than what makes sense to develop specifically like I would in a more traditional capacity. Not because I'm not doing the work as I jest, but because it's moving from thinking in a 'doing' mindset in to more 'product' focus. For years as engineering types, we've felt like product and delivery people don't always understand why something can't be split into multiple tasks or divided in a certain way and .... oh no... I owe some people some apologies.

This trend towards larger units of work has a knock-on effect of bigger review effort, higher cognitive load for reviewers, and potentially a bigger failure surface. I thought the system/workflow might absorb the gains as additional scrutiny rather than expressing it as higher throughput. Which to some extent it has.

When we're reviewing this work, the team are mobbing on the review tasks - we used to call this pair-programming in the olden days. That's leading to our less senior team members better understanding what's going on, though I don't think this is as useful for learning as making mistakes and solving problems yourself. It helps with quorum for decision making and generally everyone actually having a good understanding of the system and generated code context. It is too early to really know what any of the long term impact of any of that is going to be.

I read a study a couple of weeks ago where 150 or so professional developers were asked to write new features for a Java app, or expand its existing feature set, with and without AI. They found task completion time improved by 30-50% with no overall change in code maintainability. Though there does seem to be a new study like this every five minutes and I'm never sure who actually funded them. Scepticism aside, it does seem interesting given that the previous generation of similar studies found worse code and worse results. Have models got better? Are we more accepting of slop? Are we getting better at steering AI for our goals?

History doesn't repeat but it does have four legs and barks

Manufacturing ran into problems like this decades ago; I've never worked in manufacturing but Gene Kim et al. promised me in The Phoenix Project that they did.

Speeding up one stage doesn’t improve overall throughput unless the constraints around it ease too; otherwise you just move the bottleneck. I think for this simplified SDLC example problem there are two main competing parts:

Cycle time: aka how long a piece of work takes. Martin Fowler has a better explanation than I could ever write
Throughput: how much the system delivers or can be processed by the system.

The AI development workflows compress parts of cycle time though throughput continues with a slightly different set of constraints or bottlenecks imposed by the front-loading and backloading of effort to refinement and review phases.
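To make the bottleneck point concrete, here's a minimal sketch. It assumes a strictly serial pipeline where steady-state throughput is capped by the slowest stage; the stage names and the "stories per week" capacities are made-up numbers purely for illustration, not measurements from any real team.

```python
def throughput(stage_capacity):
    """Steady-state throughput of a serial pipeline is capped by its slowest stage."""
    return min(stage_capacity.values())


def bottleneck(stage_capacity):
    """Name of the stage with the lowest capacity."""
    return min(stage_capacity, key=stage_capacity.get)


# Hypothetical capacities in stories per week (illustrative only).
before = {"refinement": 10, "coding": 6, "review": 8, "release": 12}

# Assume AI makes coding five times faster and nothing else changes.
after = dict(before, coding=30)

print(throughput(before), bottleneck(before))
print(throughput(after), bottleneck(after))
```

In this toy model a 5x faster coding stage only lifts throughput from 6 to 8, and the constraint simply relocates to review.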

DORA metrics are of course not the only way you can measure things in the traditional SDLC and general software development spaces. My favourites, or at the very least the four I have some actual experience of and exposure to, are:

Cycle time variants, basically measuring where time is spent, not just the total.
SPACE adds satisfaction, communication, and activity to broaden the view, and is more qualitative. I often describe SPACE as a vibes-based alternative to DORA's hard measurements.
DevEx focuses on feedback loops, cognitive load, and flow. It comes from the same people who wrote SPACE and is more focused, as the name suggests, on the Developer Experience.
DX Core 4 tries to reconcile measures into speed, effectiveness, quality, and business impact. This is the only one of the list which at least acknowledges AI as a factor in modern development

From a certain height with a bit of a squint they are all trying to surface similar useful information. I think it can all boil down to asking "How long does it take to do stuff?" and "How hard is it to do it?", essentially identifying where lifecycle friction exists. AI seems to just be adding a different type of friction which needs highlighting, without listening to the AI-token-hawking vendors who are profit incentivised to sell more tokens rather than useful implementations.

Which looks better ; 👓.....or 🕶️. 👓.......or 🕶️

In most projects, DORA is already being derived from the existing toolchain. Jira tracks state transitions, commits reference tickets, and CI/CD systems mark deployments. Lead time is often calculated as something like “In Progress → Deployed” or “In Progress → Closed” and that's cool, we should continue to do so, but we'll need to make sure our measures account for the changes in how work is fed into the system for AI based working patterns.

Instead of reducing everything to a single lead time number, we'll need to keep more of the intermediate signals and derive some additional ones from the same data. By that I mean, if you already have:

Jira issue transitions (e.g. In Progress, In Review, Done)
Git commits linked via ticket IDs
Pull request timestamps
CI/CD deployment events

Given the existing data, we can pull out these four extra measures, which I think should give us the extra information to keep reporting what we were showing previously under the same DORA sauce.

1. Review latency (from Git)

You’re likely already collecting PR data implicitly for current measures, just instead of counting merges, extract:

first_review_at - opened_at
merged_at - opened_at

This should surface how long work waits once it leaves “In Progress”: what would previously have been 'developer effort' is now 'reviewer effort', so we treat the review phase as the time or engineering friction point. This should feed into the lead time for change metrics we're reporting.
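As a rough sketch of those two subtractions: the field names opened_at, first_review_at and merged_at are the ones used above, but the dictionary shape and the timestamps are my assumption of what an export from your Git host might look like, not any particular API's response.

```python
from datetime import datetime


def parse_ts(value):
    # Many Git hosts emit ISO 8601 with a trailing "Z"; normalise it for fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def review_latency(pr):
    """Return time to first review and time to merge for one pull request."""
    opened = parse_ts(pr["opened_at"])
    first_review = parse_ts(pr["first_review_at"]) if pr.get("first_review_at") else None
    merged = parse_ts(pr["merged_at"]) if pr.get("merged_at") else None
    return {
        "time_to_first_review": first_review - opened if first_review else None,
        "time_to_merge": merged - opened if merged else None,
    }


# Hypothetical pull request record (illustrative timestamps).
pr = {
    "opened_at": "2026-05-01T09:00:00Z",
    "first_review_at": "2026-05-01T15:30:00Z",
    "merged_at": "2026-05-03T10:00:00Z",
}
latency = review_latency(pr)
print(latency["time_to_first_review"], latency["time_to_merge"])
```

PRs that are still open or unreviewed come back as None rather than zero, so they don't quietly drag an average down.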

2. Change footprint (from Git + Jira linkage)

You’re already linking commits to tickets so measure aggregates per ticket for:

Total files changed
Total additions/deletions
Number of PRs / commits

Now a “3-point story” has a measurable implementation footprint rather than just a planning estimate. I like to call this one the 'Slop-osity' measure, and I'm going to add a 'Slop-o-meter' to my dashboard when nobody is watching. These should feed heavily into the Deployment Rework Rate reporting.
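A minimal sketch of that per-ticket aggregation, assuming commit messages carry a Jira-style key such as PLAT-101 and that you've already extracted per-commit stats into dictionaries. The ticket keys, the record shape and the numbers are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed Jira-style key: uppercase project prefix, hyphen, number.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def footprint_by_ticket(commits):
    """Aggregate commit count, distinct files and line churn per ticket key."""
    totals = defaultdict(
        lambda: {"commits": 0, "files": set(), "additions": 0, "deletions": 0}
    )
    for commit in commits:
        match = TICKET_RE.search(commit["message"])
        if not match:
            continue  # commits without a ticket reference are skipped
        entry = totals[match.group(0)]
        entry["commits"] += 1
        entry["files"].update(commit["files"])
        entry["additions"] += commit["additions"]
        entry["deletions"] += commit["deletions"]
    return {
        ticket: {
            "commits": e["commits"],
            "files_changed": len(e["files"]),
            "additions": e["additions"],
            "deletions": e["deletions"],
        }
        for ticket, e in totals.items()
    }


# Hypothetical commit records (illustrative only).
commits = [
    {"message": "PLAT-101 add retry logic", "files": ["a.py", "b.py"], "additions": 120, "deletions": 10},
    {"message": "PLAT-101 regenerate after review", "files": ["a.py", "c.py"], "additions": 300, "deletions": 90},
    {"message": "PLAT-102 fix typo", "files": ["README.md"], "additions": 1, "deletions": 1},
    {"message": "tidy up", "files": ["d.py"], "additions": 5, "deletions": 5},
]
footprint = footprint_by_ticket(commits)
print(footprint["PLAT-101"])
```

Counting distinct files rather than summing per-commit file counts stops a regenerated file being counted twice.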

3. Flow distribution (from Jira)

If lead time today is measured as:

In Progress → Closed

Then keep the breakdown instead of collapsing it into a single A-B:

Time in “In Progress”
Time in “In Review”
Time in “Ready for Release” / waiting states

You’re already storing this in the issue history but previously you probably only needed the collapsed result. Depending on how your releases go this could feed into Deployment Frequency or Lead Time metrics.
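Here's one way the breakdown could be derived from an issue's history. I'm assuming you've already flattened the changelog into an ordered list of (timestamp, status entered) pairs; the status names and timestamps are invented for the example.

```python
from collections import defaultdict
from datetime import datetime


def time_in_status(transitions, closed_at):
    """Sum the hours spent in each status.

    transitions: chronological list of (ISO timestamp, status entered).
    closed_at: ISO timestamp at which the final status ended.
    """
    points = [(datetime.fromisoformat(ts), status) for ts, status in transitions]
    end = datetime.fromisoformat(closed_at)
    hours = defaultdict(float)
    # Pair each transition with the one after it; the last one ends at closed_at.
    following = points[1:] + [(end, None)]
    for (entered, status), (left, _next_status) in zip(points, following):
        hours[status] += (left - entered).total_seconds() / 3600
    return dict(hours)


# Hypothetical issue history (illustrative only).
history = [
    ("2026-05-01T09:00:00", "In Progress"),
    ("2026-05-01T13:00:00", "In Review"),
    ("2026-05-02T13:00:00", "In Progress"),
    ("2026-05-02T15:00:00", "In Review"),
    ("2026-05-03T09:00:00", "Ready for Release"),
]
breakdown = time_in_status(history, closed_at="2026-05-04T09:00:00")
print(breakdown)
```

The collapsed lead time for this made-up ticket is 72 hours, but only 6 of those were spent "In Progress"; that split is exactly what the single A-B number hides.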

4. Spec/rework loop count (from Jira + Git)

This is the one I think may be the most important for AI driven delivery. If a task goes:

flowchart LR
  A["Spec"] --> B["Generated Implementation"]
  B --> C["Review"]
  C --> D["Spec Correction"]
  D --> E["Regenerated Implementation"]

The loop is really useful information as it tells us whether the issue was poor requirements, poor model output, too much ambiguity, too large a change, or a workflow that encourages people to keep rolling the dice and burning ~rainforests~ tokens rather than narrowing the target.

This should be trackable by measuring:

Number of PR update bursts after review
Number of times a ticket moves backwards from review to active work - which I've never really thought to include, partially because Jira doesn't make it easy.
Number of linked spec or prompt changes after implementation starts
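The backwards-move count is the simplest of these to sketch. This assumes you can get the ordered list of statuses a ticket has passed through, and that your workflow uses "In Review" and "In Progress" as the status names; swap in whatever yours are called.

```python
def count_rework_loops(statuses, review="In Review", active="In Progress"):
    """Count how many times a ticket moved backwards from review into active work."""
    return sum(
        1
        for previous, current in zip(statuses, statuses[1:])
        if previous == review and current == active
    )


# Hypothetical status history for one ticket (illustrative only).
statuses = [
    "To Do",
    "In Progress",
    "In Review",
    "In Progress",
    "In Review",
    "In Progress",
    "In Review",
    "Done",
]
loops = count_rework_loops(statuses)
print(loops)
```

A count on its own doesn't say why the loop happened, so it's probably best read alongside the footprint and spec-change numbers rather than as a score.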

🌟'We did it!'

Given how AI code generation seems to be materialising within enterprises with an AI strategy (where the strategy isn't just "Here's Claude, do more, k thx bye" and feeding all the money into the AI money hole), there's still a need to accurately know how long something's going to take to deliver and how much it will cost. It needs to work, and it needs to be understood, documented, supported, and stable to various levels and degrees depending on what it is.

As we move away from each line of code being lovingly crafted like the Code Artisans of old to something where expertise and engineering is used to steer, guide, and assess AI code output, we still need to get our hands dirty with an ever smaller pile of business logic and niche bits of logic where a human is still better. I still see AI as being a major contributor to the 80/20 rule in this regard, where we offload large portions of the tasks and focus on the difficult bits, the review and oversight areas.

Measuring stuff is obviously important and until someone smarter than me comes up with a better solution, I'm sticking with DORA's concepts with some changes in how I'm measuring rather than what I'm measuring for.

The Comforting Lie Of SHA Pinning

Aiden Vaines — Tue, 24 Mar 2026 12:32:25 +0000

Originally published at https://www.vaines.org/posts/2026-03-24-the-comforting-lie-of-sha-pinning/

In March 2026, Trivy became the latest reminder that software supply chains are, at best, loosely held together with convention and trust.

A typosquatting attack slipped malicious code into what looked like a legitimate dependency path. The post-mortems are worth reading, and they all converge on a single recommendation: pin your dependencies. In the GitHub Actions world, that usually translates to "use commit SHAs, not tags".

There’s a widely held belief that pinning a GitHub Action to a commit SHA gives you immutability; it's what Microsoft/GitHub are recommending, and it's what Aqua are recommending. After all, a SHA is content-addressed. It cannot be moved. It cannot be re-tagged. It is, in theory, the most stable reference you can use. The problem with that line of thinking is that the resolution of that SHA is not scoped the way most people assume. Specifically, GitHub Actions does not meaningfully validate that the commit SHA you reference belongs to the repository you think it does.

Wait, what? No, that's not right...

I set up a deliberately small example to test this behaviour.

A “legitimate” action: avaines/blog_gh_sha_pinning_action
A consuming application: avaines/blog_gh_sha_pinning_app

The application references the action in the usual way:

uses: avaines/blog_gh_sha_pinning_action@<some-sha>

So far, so normal.

Now introduce an attacker:

Fork the action repository to aidenvaines-cgi/blog_gh_sha_pinning_action
Add a malicious step (in my case, just printing output, but in reality this is where you exfiltrate all the fun stuff like secrets and personal data)

Next, create a pull request to the consuming application that appears to simply bump the pinned SHA:

The SHA used in the PR comes from the attacker-controlled fork of the action, despite it still being referenced as avaines/blog_gh_sha_pinning_action

You might reasonably assume one of the following safeguards exists:

GitHub validates that the SHA belongs to avaines/blog_gh_sha_pinning_action
Or the workflow fails because the commit cannot be found in the specified repository

Neither is true, and that is madness

The workflow executes successfully!!!!!!!

GitHub resolves the commit SHA, finds a matching object, and executes it, regardless of which fork it originated from.

From the platform’s perspective, a fork is a separate repository with a shared object graph/history. When the runner resolves the reference, it ultimately looks up the commit in the Git object database; if that object exists and is reachable, it can be used regardless of which fork introduced it. A commit object is globally identifiable. If the SHA exists anywhere reachable, that is apparently sufficient.

The result is that a pull request can replace a pinned, trusted action with attacker-controlled code without changing the apparent repository reference.

If the reviewer is scanning for obvious changes like owner, repo name, or tag they will see none.

Only the SHA changes, and with that comes a huge amount of assumed trust. I know that owner, and I know that repository, it's just a version bump, and a minor one at that, with that comment next to the tag doing a lot of heavy lifting. We’ve just spent the last few years training people to treat that as best practice.

A lot of the current guidance focuses on avoiding tags because they are mutable, which is true: tags can be moved, and relying on them introduces an entirely different risk. GitHub already has a 'Make tags immutable' feature, but it's optional, so it is neither widely used nor something you can trust, as the owner (or an attacker) could just disable it.

Simply switching to SHA pinning does not eliminate the problem; in some respects it makes it worse. Tags are scoped to owner/repository because that's how they work. You could argue it's harder to compromise that repository and write actual changes to it than to hijack a reference through a forked repository. A commit object, by contrast, is content-addressed and can be reachable from multiple repositories that share history (e.g. forks).

I believe the industry advice is a bit of an overcorrection, and we’ve replaced one weak guarantee (mutable tags but scoped to repo) with another vastly worse idea in unscoped SHAs. Yes, you should check, and yes, you should validate it, but tags are human readable and SHAs are not. If you ask yourself "Do I always properly check?", do you? Because I can't say I do enough validation 100% of the time.

Supply Chain Woes

The Trivy incident is not interesting because of the tool, though it is the thing that's caused me a lot of bother over the last month; it's symptomatic of the constant supply chain threats we're seeing everywhere. Late last year NPM was basically a skip fire (https://www.wiz.io/blog/widespread-npm-supply-chain-attack-breaking-down-impact-scope-across-debug-chalk, https://www.wiz.io/blog/shai-hulud-npm-supply-chain-attack, etc). We've delegated so much behaviour to 3rd parties we can't, and shouldn't, implicitly trust. The https://www.npmjs.com/package/is-odd package became a bit of a meme for this exact problem, where tool chains like NPM and GitHub Actions place re-usable custom modules/actions/libraries as an attractive off the shelf solution to solve common problems so you don't have to.

Ironically, with the rise of AI, it's now easier to just vibe code the same functionality yourself than use one of these off the shelf resources. You still know exactly as much about how the security, your data, and to some extent the functionality work, with strangely more ownership and a smaller attack surface.

GitHub Actions in particular exacerbates this somewhat, as workflows routinely execute third-party code, the secrets are implicitly available to those workflows, and there are multiple ways to extract those at runtime with little audit or oversight. Then to top it all off, SHAs are not human friendly and tags are not immutable, so review processes tend to focus on what changed, not where it came from. It's all a fragile house of cards; sometimes I miss Jenkins.

If “use SHAs” is not sufficient, what is?

At a minimum, we need to introduce provenance checks. I've said above that SHAs are not human friendly like tags, so there are a couple of things we can, and probably should, be doing to validate: most obviously that the SHA or tag exists in the right repository. In my opinion GitHub should also enforce tag immutability.
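A rough sketch of what a "does this SHA exist in the right repository" check could look like. The parsing is plain regex over workflow text. The verification half leans on GitHub's REST "compare two commits" endpoint, whose response carries a status of ahead, behind, identical or diverged. My assumption (worth verifying against a real fork before relying on it) is that a commit that only lives in a fork will not come back as behind or identical when compared against a branch of the upstream repository. The SHA below is a placeholder, not a real commit, and the network call itself is deliberately left out.

```python
import re

# owner/repo, optional sub-path, then @ref; trailing "# v1.2.3" comments are ignored.
USES_RE = re.compile(r"uses:\s*([\w.-]+)/([\w.-]+)(/[^@\s]+)?@([^\s#]+)")
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def pinned_actions(workflow_text):
    """List every action reference in a workflow and whether it is pinned to a full SHA."""
    refs = []
    for owner, repo, _path, ref in USES_RE.findall(workflow_text):
        refs.append(
            {"owner": owner, "repo": repo, "ref": ref, "is_sha": bool(FULL_SHA_RE.fullmatch(ref))}
        )
    return refs


def compare_url(owner, repo, base, sha):
    """URL for GitHub's compare endpoint, with the upstream branch as base and the pin as head."""
    return f"https://api.github.com/repos/{owner}/{repo}/compare/{base}...{sha}"


def sha_is_in_base_history(compare_status):
    """A pinned SHA is an ancestor of base only if it is behind or identical to it."""
    return compare_status in {"behind", "identical"}


# Hypothetical workflow snippet; the SHA is a made-up placeholder.
workflow = """
steps:
  - uses: actions/checkout@v4
  - uses: avaines/blog_gh_sha_pinning_action@0123456789abcdef0123456789abcdef01234567 # v1.0.1
"""
refs = pinned_actions(workflow)
for ref in refs:
    if ref["is_sha"]:
        print(compare_url(ref["owner"], ref["repo"], "main", ref["ref"]))
```

Comparing against main is itself an assumption: plenty of actions cut releases from release branches or tags, so the base probably needs to be configurable per action. Anything that comes back ahead, diverged, or as an error would then be a prompt for a human to look at where the commit actually came from.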

We tend to describe these incidents as “supply chain attacks”, which is accurate but slightly misleading. It implies a complex and sophisticated multi-stage compromise by 1337 h4x0rz. In reality, the weakest link is often much simpler because humans are a bit shit sometimes. SHA pinning being touted as the solution is just security-theater rather than due diligence in an ecosystem that is not helping either.

Preparing for Quantum Computers That May or May Not Exist

Aiden Vaines — Mon, 23 Mar 2026 22:25:09 +0000

Originally published at https://www.vaines.org/posts/2026-03-23-preparing-for-quantum-computers-that-may-or-may-not-exist/

In my day-to-day life as a Platform Engineer I find I spend an unusual amount of time thinking about identity. Not so much in the philosophical sense. In the much less interesting sense of certificates, tokens, key exchanges, and authentication. Logging in to AWS. Signing API calls. Issuing certificates. Rotating secrets. Establishing tunnels between systems that would rather not trust each other... and everything in between.

The cloud runs on cryptography and if that cryptography fails, all your data is someone else’s data and there are probably a lot of meetings and paperwork coming your way in what I’ve heard called a ‘CV-generating event’.

All of that is why post-quantum cryptography (PQC) has quietly moved from scary mathematics I hear about from Numberphile into something we engineer-y types can be reasonably expected to at least recognise in a change log.

Recently AWS announced support for post-quantum digital certificates in IAM Roles Anywhere using FIPS 204. That was the moment I stopped and wondered when all the other PQC stuff had appeared, and if I properly understood what, why, and importantly how it worked. Not from the perspective of a cryptographer but from the perspective of someone responsible for keeping production systems secure, fed, and watered.

Before getting into my current understanding: I, and many others, are not so much worried that quantum computers are here; the worry is that so much data is being harvested ready to decrypt later. I remember when I was in my first year of college hearing about D-Wave promising they would have commercial quantum computers by the end of the year... it’s now 2026 and I’m still waiting.

I then remember Google and their “qubit processor” as I was heading into my time at uni. In researching this blog post I got to reminisce about the stuff I remember reading at the time. The 53-qubit processor that was demonstrating “quantum supremacy” in some creatively scoped benchmarking. Apparently IBM, IonQ and several others are operating machines with hundreds and thousands of qubits, but these seem to be noisy and error-prone, and not really up to enormous tasks like dismantling RSA.

The current estimates to crack RSA-2048 need thousands of stable logical qubits, which, once all the error correction overheads are included, works out at somewhere in the region of a million or more noisy physical qubits. Current machines are still several orders of magnitude below that.

That shouldn’t diminish the engineering and how cool this stuff is and the change it could bring though. Trying to manipulate physics at close to absolute zero temperatures is cool as all hell, but practically we’ve got a bit of work to do before the doomsday scenario that’s been attached to PQC since the start.

“At last IIa said, ‘What does “quantum” mean anyway?’
IIb shrugged. ‘It means add another nought,’ he said.”
— Pyramids, Terry Pratchett

Harvest Now, Decrypt Later

Whilst we don't have quantum computers capable of making our strongest encryption trivial just yet, everyone assumes they are coming; the maths and physics are understood at least in theory. Governments and standards bodies seem to be taking this seriously.

We know that there are massive Harvest Now, Decrypt Later (HNDL) operations; scraping huge volumes of data from everywhere and storing it, from corporates to state actors. Although that data is useless now, theoretically as soon as it becomes trivial to decrypt it’s a potential gold mine. Healthcare data, government communication, intellectual property, and long-lived secrets can remain valuable for decades. If quantum computers appear in the next 15–20 years, it’s all worth it. Given what we've seen with the rise of AI and especially the speed it has become ubiquitous, it’s not particularly far-fetched to imagine.

Agencies like NIST and the UK National Cyber Security Centre (NCSC) have spent nearly a decade running formal competitions to select some “quantum-resistant” algorithms to start making HNDL less and less of a viable strategy. My company, CGI, is recognised as one of the NCSC’s Post-Quantum Cryptography Assured Consultancy providers — as interesting as it is, it's not a part of the business I have any involvement with. Whilst I may spend my time with serverless functions, pipelines, and containers, securing those things in transit and the data they move around is a big part of my work too.

Post-Quantum Cryptography Standards

In 2024 NIST finalised the first group of post-quantum cryptographic standards, drawn from the four algorithms it had selected. Three of those are now formalised as FIPS standards, including one kept as a hash-based backup option, with the fourth still to be finalised. When I was first aware of this they had much cooler names; since then they’ve all been given fairly boring standardised names, so I’ve put both in this table.

Original Name	Standard Name	Purpose
Kyber	ML-KEM (FIPS 203)	Key encapsulation / encryption
Dilithium	ML-DSA (FIPS 204)	Digital signatures
Falcon	FN-DSA (FIPS 206, draft)	Digital signatures
SPHINCS+	SLH-DSA (FIPS 205)	Hash-based signatures (backup option)

The naming scheme is slightly boring but makes sense once decoded:

KEM = Key Encapsulation Mechanism
DSA = Digital Signature Algorithm
ML = Module-Lattice
SLH = Stateless Hash-based

Most of the interesting work happens in something called lattice-based cryptography, which is where algorithms like Kyber and Dilithium originate and the thing that piqued my interest, ultimately instigating me writing this post.

In cryptography, a lattice is apparently a repeating grid of points in higher-dimensional space. Many PQC algorithms rely on the mathematical difficulty of solving problems such as finding the shortest vector in a lattice, or solving a noisy linear equation over it. From what I can understand, these problems appear to be extremely difficult even for quantum computers — I don’t know how we know that, but it sounds cool.

Where classical algorithms rely on number theory problems like integer factorisation (RSA) or elliptic curve discrete logarithms (ECDSA), PQC algorithms instead rely on lattice problems that are just really hard to solve efficiently, both classically and by quantum means.

The downside is that lattice schemes tend to produce much larger keys and signatures, which I think means things are just going to take longer to do computationally. I suspect this just makes the internet slightly slower long term as a “just in case” mechanism to keep data secure.

Which from a platform engineering perspective, is where things start getting operationally interesting.

New Algo, Who Dis?

From an infrastructure perspective each of these already has a place and replaces an existing algorithm used today. Starting with Kyber:

ML-KEM (Kyber)

Kyber is a secure key exchange algorithm which replaces the role of RSA key exchange or elliptic-curve Diffie-Hellman in protocols like TLS:

Client and server exchange public information
Both derive the same shared secret
That secret becomes the symmetric encryption key

ML-DSA (Dilithium) and FALCON

Digital signature algorithms.

These replace RSA or ECDSA signatures, used for things like:

TLS certificates
Code signing
Identity assertions

Dilithium appears to be simpler to implement while Falcon produces smaller signatures but is more complex mathematically. I suspect that particular trade-off will decide which one actually sees wider use down the line.

SLH-DSA (SPHINCS+)

Other than ruining the naming trend after Warp Drive fuel and Lightsaber energy crystals, SPHINCS+ is a hash-based signature scheme. It’s slower and produces really large signatures, but it has a valuable property: its security relies only on the strength of cryptographic hash functions, which are reasonably well understood.

The Future Is Now

The interesting part for me as a Platform Engineer is that PQC is quietly showing up in various services and has been for some time now.

I most recently noticed it in an AWS announcement, which was the final incentive to do some more digging and put my thoughts together.

AWS recently introduced support for post-quantum certificates using ML-DSA (FIPS 204) in IAM Roles Anywhere. This allows external workloads to authenticate to AWS using certificates signed with PQC algorithms rather than traditional RSA or ECDSA, and hopefully be more prepared for HNDL-style risks.

Several cloud providers and CDNs have begun experimenting with hybrid TLS key exchanges, combining a classical algorithm (e.g. X25519) and a post-quantum algorithm (e.g. Kyber / ML-KEM). An attacker has to break both algorithms to recover the session key. This allows systems to gain quantum resistance without abandoning existing cryptographic infrastructure.
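To make the hybrid idea a bit more concrete, here is a minimal sketch of folding two shared secrets into one session key. The secrets are random placeholder bytes standing in for the outputs of X25519 and ML-KEM, and `combine_secrets` and its `label` are names made up for illustration. Real hybrid TLS feeds the combined secret into the TLS key schedule rather than a single HMAC call, so treat this as a picture of the principle and not of the protocol.

```python
import hashlib
import hmac
import os


def combine_secrets(classical_secret: bytes, pq_secret: bytes,
                    label: bytes = b"hybrid-demo") -> bytes:
    """Derive one 32-byte key from both secrets (illustrative only).

    The two secrets are concatenated and run through HMAC-SHA256, so the
    result depends on both inputs.
    """
    return hmac.new(label, classical_secret + pq_secret, hashlib.sha256).digest()


# Placeholders: in a real handshake these come from X25519 and ML-KEM.
classical = os.urandom(32)
post_quantum = os.urandom(32)

session_key = combine_secrets(classical, post_quantum)
print(session_key.hex())
```

Knowing only one of the two inputs is not enough to reproduce `session_key`, which is the property the hybrid approach is after.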

Cloudflare has already deployed hybrid TLS in production experiments and similar approaches are appearing across the ecosystem.

AWS has a Post Quantum Cryptography migration plan which boils down to inventorying cryptographic dependencies, introducing hybrid cryptography, then gradually moving to PQC where practical. Given I know a few places where SSL 1.0 is still knocking about, I’m not expecting we’ll complete this quickly.

So… Now What?

For cryptographers, mathematicians, and some InfoSec professionals, the interesting questions involve proofs and security reductions. For me, the questions are more mundane (and only partially because I don’t understand the maths, and partially because a lattice is still a pastry from Greggs to me).

The ones that stand out most to me are:

Key sizes – PQC public keys and signatures can be significantly larger than RSA/ECDSA equivalents. How large are these going to be in real deployments? Does my secrets management system support them? AWS Systems Manager Parameter Store’s 4KB limit is already pushing it.
Network overhead – larger handshakes increase TLS negotiation size and bandwidth consumption, and can eat into throughput. Some providers charge for that.
CPU impact – some PQC algorithms require more computation. What does that do to all my right-sizing spreadsheets?
Certificate chains – larger signatures propagate through the entire PKI. At scale, that becomes noticeable — and expensive.
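As a rough back-of-the-envelope check on the key size question, the sketch below compares a few raw sizes against a 4KB limit. The PQC byte counts are the raw sizes listed in the FIPS 203 and FIPS 204 parameter tables. Storing the value as plain base64 text is my assumption, as is treating 4KB as 4096 bytes. Real keys usually pick up extra wrapping (PEM, PKCS#8), so the true numbers would be a little worse.

```python
import base64

PARAMETER_STORE_LIMIT = 4096  # assumption: 4KB treated as 4096 bytes

# Raw sizes in bytes. PQC values are from the FIPS 203 / FIPS 204 tables,
# the classical rows are there for comparison.
SIZES = {
    "X25519 public key": 32,
    "RSA-2048 signature": 256,
    "ML-KEM-768 public key": 1184,
    "ML-KEM-768 private key": 2400,
    "ML-DSA-65 public key": 1952,
    "ML-DSA-65 private key": 4032,
    "ML-DSA-65 signature": 3309,
}


def stored_length(raw_bytes: int) -> int:
    """Length of the value once base64-encoded for a text-only store."""
    return len(base64.b64encode(bytes(raw_bytes)))


def fits(raw_bytes: int, limit: int = PARAMETER_STORE_LIMIT) -> bool:
    return stored_length(raw_bytes) <= limit


for name, size in SIZES.items():
    print(f"{name:24} {size:5d} B raw, {stored_length(size):5d} B base64, fits: {fits(size)}")
```

Under those assumptions the ML-KEM-768 material still fits, while an ML-DSA-65 private key or signature does not.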

The common guidance seems to favour hybrid approaches: Kyber-based key exchange combined with classical signatures, keeping overheads manageable while adding quantum resistance.

Cryptographers make algorithms viable. Platform Engineers make them deployable.

Quantum computing has followed a fairly familiar hype cycle. Early claims of revolutionary machines, followed by quieter but genuinely impressive (and limited) systems. Somewhere along the way governments and standards bodies started preparing for a future that may arrive gradually rather than suddenly.

In the meantime, cloud providers are already introducing post-quantum primitives into identity and encryption systems.

Even if quantum computers capable of breaking RSA are decades away, post-quantum cryptography has already arrived in the platforms we use. If nothing else, it’s a reminder that the cloud is mostly made of mathematics and physics wearing a trench coat.

“It’s very hard to talk quantum using a language originally designed to tell other monkeys where the ripe fruit is.”
— Night Watch, Terry Pratchett

For the moment, the best I think we non-mathematicians can do is keep an eye on the primitives appearing in our infrastructure, understand roughly what they do, and make sure that when the future eventually arrives it does not break all our toys.

I’ve quoted the man who first introduced me to the word “quantum” twice already, so it seems only fair to finish properly:

GNU Terry Pratchett.

JPEG Compression, but for Thought: AI as Clear-Text Encryption

Aiden Vaines — Mon, 23 Mar 2026 21:13:40 +0000

Originally published at https://www.vaines.org/posts/2026-01-26-jpeg-compression-for-thought/

There seems to be a trend in corporate and professional communications - looking at you LinkedIn - where people write bullet points and have AI tools expand on them.

Meanwhile, readers are using AI to summarise those same blocks of prose back into a few salient bullet points. Which rather defeats the point of expanding them in the first place.

What we have here is person A talking to person B via the worst version of the Telephone Game, just one that involves burning a couple of trees before your turn.

This kinda sounds like lossy compression, but for thought. It raises the obvious question: 'If you couldn't be bothered writing it, why would you expect someone to bother reading it?' Especially when the expansion is done without review or consideration.

For those who are not huge nerds and are unfamiliar with how JPEG compression works: it's an intentionally lossy process designed around human perception. Rather than preserving an image exactly, JPEG optimises for what the eye is most sensitive to. It prioritises brightness over colour, converts the image into a frequency representation, and then aggressively discards high-frequency detail like the fine texture and sharp edges we're least likely to notice. This information is 'quantised', rounded, and approximated, and what remains is encoded efficiently. At high quality settings the loss is barely visible; at lower settings, or after repeated saves, the artefacts become obvious. The image still looks broadly correct, but precision has been traded away in favour of size and convenience.
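For the curious, the "frequency representation, then quantise" step can be sketched on a single row of eight brightness values. This is a toy and not a JPEG encoder: real JPEG works on 8x8 blocks with standard quantisation tables and entropy coding, and the `STEPS` values and sample `row` here are made up for illustration.

```python
import math

N = 8
# Hypothetical step sizes: coarser steps for higher frequencies.
STEPS = [4, 6, 10, 16, 24, 40, 60, 80]


def _scale(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(samples):
    """Orthonormal DCT-II: samples -> frequency coefficients."""
    return [
        _scale(k) * sum(samples[n] * math.cos(math.pi * (n + 0.5) * k / N)
                        for n in range(N))
        for k in range(N)
    ]


def idct(coeffs):
    """Inverse of dct(): frequency coefficients -> samples."""
    return [
        sum(_scale(k) * coeffs[k] * math.cos(math.pi * (n + 0.5) * k / N)
            for k in range(N))
        for n in range(N)
    ]


def lossy_round_trip(samples):
    """Quantise each coefficient to a multiple of its step, then rebuild."""
    quantised = [round(c / q) for c, q in zip(dct(samples), STEPS)]
    restored = idct([v * q for v, q in zip(quantised, STEPS)])
    return quantised, restored


row = [52, 55, 61, 66, 70, 61, 64, 73]
quantised, restored = lossy_round_trip(row)
print(quantised)
print([round(v, 1) for v in restored])
```

The high-frequency entries at the end of `quantised` round away to zero, and the restored row comes back close to the original but not identical. That gap is the bit the rest of this post is about.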

If you apply the same process to written communication expanded then summarised through Large Language Models, the parallels are pretty obvious. Expanding bullet points into fluent prose and then summarising them again optimises for reader perception rather than correctness. High-level themes, tone, and narrative flow are preserved, while high-frequency details (constraints, exact wording, ordering, nuance, etc.) are treated as expendable. Claims are rounded, specifics are softened, and instructions are normalised into something that reads well. The result is text that feels coherent and plausible, even when the sharp details that made it actionable have been quietly discarded. Like a heavily compressed image, it survives casual inspection, but it no longer contains everything that was originally there.

What survives the round-trip?

In order to demonstrate the concept I've committed arson to a few trees and encoded then decoded some samples through a handful of models (I've only included the better results, but if you want to see the rest you can here).

Cake recipe

Original Bullets

Preheat oven to 180°C fan
Cream 200g butter with 200g sugar
Add 4 eggs, one at a time
Fold in 200g flour
Bake for 35 minutes, middle rack

Encoded:

Begin by preheating your oven to 180°C using the fan setting so it reaches the correct temperature by the time the batter is ready. In a mixing bowl, cream together 200 grams of butter and 200 grams of sugar until the mixture is light and smooth. Next, add the four eggs one at a time, mixing well after each addition to ensure they are fully incorporated. Once the eggs are combined, gently fold in 200 grams of flour, taking care not to overmix so the batter stays light. Transfer the mixture to a prepared baking tin and place it on the middle rack of the oven. Bake for about 35 minutes, until the cake is cooked through and lightly golden on top.

Decoded

Preheat oven to 180 °C (fan setting).
Cream 200 g butter and 200 g sugar until light and smooth.
Add 4 eggs one at a time, mixing well after each.
Gently fold in 200 g flour without over mixing.
Transfer batter to a prepared baking tin.
Bake on the middle rack for ~35 minutes, until cooked through and lightly golden.

We seem to have gone from 5 to 6 bullet points, but the procedural and numeric bits survive intact. It's a little 'fluffier' and more verbose, but it's still going to result in a cake.

Corporate Communication

Original Bullets

Project delivery will slip by six weeks
Cause is vendor dependency, not internal performance
Scope reduction is not acceptable
Additional budget of £250k is required
Board approval is needed this quarter

Encoded

Project delivery is now expected to slip by approximately six weeks. This delay is not the result of internal performance issues, but is instead driven by a critical dependency on an external vendor that has not met the required timelines. While a number of mitigation options have been explored, reducing the project scope is not considered acceptable, as it would compromise the intended outcomes and value of the work.

To address the delay and keep the project on track at the revised timeline, an additional £250k of funding will be required. This increase will allow the team to manage the vendor dependency effectively and avoid further disruption. As a result, formal approval from the Board is needed within the current quarter to ensure that the necessary actions can be taken without introducing additional risk or delay.

Decoded

Project delivery is expected to be delayed by approximately six weeks.
The delay is caused by a critical external vendor missing required timelines, not by internal performance issues.
Mitigation options have been reviewed, but reducing project scope is not acceptable as it would undermine intended outcomes and value.
An additional £250k in funding is required to manage the vendor dependency and maintain the revised timeline.
Board approval for the additional funding is needed within the current quarter to avoid further risk or delay.

We get the same number of points afterwards, but we lose a bit of precision: "will slip by six weeks" has come out as "approximately six weeks". It's nicer to read, but it trades specificity for plausibility, and commitment for comfort.

This isn't unexpected. These models tend to prefer high-probability, socially "safe", and low-commitment phrasing. That tendency shows up as hedging, softening, and abstraction.

Given how accurate this one is, it might lead you to conclude that corporate language is already somewhat pre-compressed <insert your own jokes here>.

Top Secret Instructions

Original Bullets

Sit on the east-facing bench in Hyde Park at 07:15
Carry a red paperback visible in your left hand
If approached by anyone else, leave immediately
The contact will say “Nice weather for March”
Reply “April has better light”, then walk away

Encoded

Arrive at Hyde Park early and take a seat on the east-facing bench at precisely 07:15. While you wait, hold a red paperback in your left hand so that it is clearly visible. This detail is important, as it serves as a visual marker rather than an invitation to engage. If anyone other than the intended contact approaches you for any reason, leave the area immediately without explanation.

The correct contact will identify themselves with the phrase, “Nice weather for March.” When you hear this, respond calmly with, “April has better light.” After giving this reply, do not continue the conversation. Stand up and walk away at once, maintaining a natural pace and demeanor.

Decoded

Arrive at Hyde Park early and sit on the east-facing bench at exactly 07:15.
Hold a red paperback in your left hand as a visual identifier, not an invitation.
If anyone other than the intended contact approaches, leave immediately without explanation.
The correct contact will say, “Nice weather for March.”
Reply with, “April has better light,” then end the interaction and walk away calmly.

With this final example, we have some high-stakes spy craft where deviation matters. Something where lossy language could be catastrophic. Thankfully the message survives. Though is holding a red book socially an invitation to something?

The high specificity points (precision, order, and the conditionals) mostly survive, so our clandestine operation probably works.

Across the three examples, the edges disappear first and the artefacts are subtle, but they drift away from the intent and actionability of the original language.........like a JPEG.

If we can encode and decode, can we encrypt hidden messages?

The "Expand -> Summarise" loop works in the sense that it produces fluent text, but it bleeds intent and precision along the way. If we're already doing a kind of encode/decode round-trip, what's a little extra data? Could we hide additional information in the output, in a way a human reader wouldn't notice, effectively encrypting it?

I read this post a while ago and it never really satisfied me; the secrets being hidden didn't feel hidden enough.

Steganography (which still sounds like a type of dinosaur) is the art of concealing information in other information. Classic examples include invisible ink or tattooing someone's head and waiting for the hair to grow back. Famously, Lord Robert Baden-Powell encoded maps and plans of enemy bases in drawings of insects and leaves whilst posing as a naturalist.

I'd been exposed to this previously, hiding an image in another image by flipping least-significant bits (detailed explanation here). That idea kept nagging at me: if LLM output is built out of tiny probabilistic choices, is there an equivalent "bit" you can flip in the text?
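For anyone who hasn't met the least-significant-bit trick, it looks roughly like this. The "image" is just a list of 0-255 brightness values, and `hide_bits` and `reveal_bits` are illustrative names rather than anything from a real imaging library.

```python
def hide_bits(pixels, bits):
    """Overwrite the lowest bit of each pixel with one bit of the secret."""
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels to carry the message")
    carrier = list(pixels)
    for i, bit in enumerate(bits):
        carrier[i] = (carrier[i] & ~1) | int(bit)  # clear the LSB, then set it
    return carrier


def reveal_bits(pixels, count):
    """Read the lowest bit back out of the first `count` pixels."""
    return "".join(str(p & 1) for p in pixels[:count])


image = [200, 13, 77, 254, 91, 0, 128, 33]
secret = "10110010"

stego = hide_bits(image, secret)
print(stego)
print(reveal_bits(stego, len(secret)))
```

Each pixel changes by at most 1, which is why the altered image looks the same to a person.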

When I was running the bullet-point examples, the same pattern kept showing up: the models normalise language because they select tokens by probability. That made me wonder - what if I hijacked that selection process to encode information?

It turns out this is broadly how models are watermarked or fingerprinted and my ideas are not exactly novel.

Anyway, with a bit of PyTorch and some time you can turn a secret into bits, then generate ordinary-looking text while forcing the model's next-token choice to encode those bits!

It turns out, as long as you decode with the same model, you can reverse the process! For example, say we want to hide the string "ABC".

First, we need to convert it into binary, using a small/lazy 5-bit encoding scheme with a character set of lowercase letters, space, and basic punctuation (" abcdefghijklmnopqrstuvwxyz.,!?"). Each character maps to a 5-bit binary number:

Space = 00000
'a' = 00001
'b' = 00010
'c' = 00011
and so on...

For uppercase letters, we add a special flag (11111) before the character's binary code to indicate "make the next character uppercase".

So "ABC" becomes:

11111 00001 (uppercase flag + 'a')
11111 00010 (uppercase flag + 'b')
11111 00011 (uppercase flag + 'c')

That's 30 bits total: 111110000111111000101111100011
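The scheme above is small enough to sketch in a few lines. This is a reconstruction from the description and not the code from the repository, so the function names are my own.

```python
CHARSET = " abcdefghijklmnopqrstuvwxyz.,!?"  # 31 symbols: codes 00000 to 11110
UPPER_FLAG = "11111"                         # the one code left over


def text_to_bits(text):
    """Encode text as a string of 0s and 1s, five bits per symbol."""
    chunks = []
    for ch in text:
        if ch.isupper():
            chunks.append(UPPER_FLAG)
            ch = ch.lower()
        # Raises ValueError for anything outside CHARSET (digits, hyphens...).
        chunks.append(format(CHARSET.index(ch), "05b"))
    return "".join(chunks)


def bits_to_text(bits):
    """Reverse of text_to_bits(); trailing partial chunks are ignored."""
    out = []
    make_upper = False
    for i in range(0, len(bits) - len(bits) % 5, 5):
        chunk = bits[i:i + 5]
        if chunk == UPPER_FLAG:
            make_upper = True
            continue
        ch = CHARSET[int(chunk, 2)]
        out.append(ch.upper() if make_upper else ch)
        make_upper = False
    return "".join(out)


encoded = text_to_bits("ABC")
print(encoded, len(encoded))
print(bits_to_text(encoded))
```

With only 31 symbols there is no room for digits or hyphens, so a message has to be spelled out in words before it can be encoded.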

Now to encode it, we give the LLM a prompt like "On rainy afternoons, my father would pull out his old record player..." and ask it to continue. For each bit in the binary string:

If the bit is 0, we force the LLM to choose its most likely next token
If the bit is 1, we force it to choose its second most likely token

The LLM generates a token per bit and the text looks completely 'natural', just a continuation of the story. But hidden within the token selection pattern is our secret message "ABC"!

To decode, run the same model over the text token-by-token, checking whether each observed token was the 1st or 2nd most likely choice at that point, then reconstruct the bits and map it back to “ABC”.

The "token-by-token" part matters. If you try to score everything with the full text as context, the model “changes its mind” about what was most likely earlier and the bits flip.
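Here is the encode/decode loop as a toy you can run without a GPU. The big assumption is `ranked_candidates`: it is a hypothetical stand-in for a real model, ranking a tiny made-up vocabulary with a hash instead of reading next-token probabilities from an LLM. The real thing swaps that function for a model's ranked logits. The shape of the loop is the same.

```python
import hashlib

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["the", "lake", "uncle", "dock", "fish", "swim", "summer", "water"]


def ranked_candidates(context):
    """Stand-in for an LLM: rank VOCAB from most to least 'likely',
    deterministically, using only the tokens seen so far."""
    prefix = " ".join(context)
    return sorted(
        VOCAB,
        key=lambda tok: hashlib.sha256(f"{prefix}|{tok}".encode()).hexdigest(),
    )


def hide(prompt, bits):
    """Continue the prompt, one token per bit."""
    tokens = list(prompt)
    for bit in bits:
        ranked = ranked_candidates(tokens)
        tokens.append(ranked[int(bit)])  # 0 -> top choice, 1 -> runner-up
    return tokens


def reveal(tokens, prompt_length):
    """Replay the text token-by-token and read off each token's rank."""
    bits = []
    for i in range(prompt_length, len(tokens)):
        ranked = ranked_candidates(tokens[:i])  # same prefix the encoder saw
        rank = ranked.index(tokens[i])
        if rank > 1:
            raise ValueError("token was neither the 1st nor 2nd choice")
        bits.append(str(rank))
    return "".join(bits)


prompt = ["every", "summer"]
secret_bits = "111110000111111000101111100011"  # "ABC" in the 5-bit scheme

story = hide(prompt, secret_bits)
print(" ".join(story))
print(reveal(story, len(prompt)) == secret_bits)
```

Decoding only works because `reveal` rebuilds exactly the prefix the encoder had at each step, and because both sides use the same "model".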

Who would have thought this innocuous if not verbose and terribly written tale.....

Every summer, my family would drive to the lake house where my uncle taught us how to fish from the old wooden dock that extended into Lake Erie.

I was always fascinated by how the fish swarmed around our feet, and I would spend hours watching the water, hoping to see the fish swim by.

I was never a good swimmer, but my uncle would always tell me to jump off the dock and swim to the other end. He would say, “Just go for it!” and he would jump in after me.

One summer, I was determined to learn how to swim, and I would spend hours practicing in our pool at home. I was getting closer and closer to being able to swim, and I was so proud of my progress. I would always tell my uncle how I was doing, and he would always encourage me to keep trying.

One summer, I finally felt like I was ready to jump into Lake Michigan and try to make it to the shore.

I was nervous, but my uncle was there to help me, and I knew I could make the distance. As soon as I jumped in, I felt the coldness of the lake and the current pulling at my legs. I was determined to make it to shore and I kept swimming, even though I was getting more and more exhausted. I could see the beach getting closer, and I knew I could make the distance. Finally I made it to shore, and I was exhausted but so proud that I had made it.

I was so happy that my uncle had been there with me to help me, and he was proud too.

I had learned how important having someone there to help me and support my efforts had made all of the difference. It was a lesson that I have carried throughout my life, and I am so grateful to have had the chance.

My uncle’s words, “Go for It!” have always stuck with me. I have used that advice to help motivate me in all of my endeavors, whether it’s trying something new, taking a chance, and pushing myself to do better.

I am so grateful to have had my uncle there with us, and I am proud to say that I am now a strong swimmer and have made the journey to shore countless of times. I am so grateful for the lessons I have learned from him, as well as the memories we made together. I am so grateful for the opportunity to have had him there to help and support me, as well as the chance to make the distance..Forms of the Lake Michigan.://> I am so proud to have learned from him and I am so grateful for the memories I will carry for a lifetime. I am grateful for the lessons I learned and the memories we made together. I will never forget my uncle’s words of advice, and the lessons I learned from him will always be a reminder of the importance and power of having a support team. I am so grateful for my family and the lessons they have taught us. I am so grateful to be a strong and confident swimmer, thanks in part to my uncle and his words, “Go For It!” I am grateful to have had the chance to learn from him, and I am so grateful for the lessons and memories I have learned and made with him. I am so proud to have learned how to make it to shore, and to have learned the importance of having someone there with you to help you make it. I am so grateful to my uncle for his words of encouragement, and I will never forgot the memories we made together.

I will always remember'

Is hiding this secret message...

Sit on the eastfacing bench in Hyde Park at seven fifteen. The contact will say Nice weather for March. Reply April has better light. gi kr yat

It gets a bit wobbly in places, forcing the 2nd-choice token occasionally produces.... interesting results. But it works! If MI5 need a hand my rates are reasonable.

The code for this is available on GitHub