I doubt I am the first to come up with this concept, but I am probably the first to name it.
Drift to Determinism (DriDe - as in "DRY'd" - Don't R...
Glad you're posting stuff like this on DEV :) Really appreciate the observe → extract → codify core loop. I need to get better about doing that myself
Agreed
Glad you enjoyed the article and hope the concept serves you well in whatever you build going forward!
Finally, a constructive contribution that doesn't suggest "guardrails" or "getting better at prompting" to solve the dilemma of unreliable results from costly AI reasoning that allegedly makes us go faster (in the wrong direction). Thanks Graham!
glad that you found it useful bud! (now to go catch up on your articles as I have not been on here much :-)
Hope you are well!
I like it, I think it's just "obvious" enough that some people might miss it.
That being said, the self-healing stuff is interesting. Personally I'd still rather be in control and have the LLM create a set of tests that send warnings. Then I have the choice to get the LLM involved again if I want to (and I probably will want to). I just don't think having the LLM in the system after it's done tinkering works for me.
Now in the future I think we will have specialised ML systems to handle all sorts of things and the default interface to talk to those systems will be LLMs. I just think it's a bit more long term than people like to admit.
I realised I wrote a silly comment but never got around to writing a real one. So here's a real one for the algorithm :)
Obviously the safest version alerts you before falling back and waits for feedback. I think it's just my bias that a lot of companies couldn't stomach that (though, as I said, I'm now actually wondering how many processes where you could risk a little non-determinism would also be timing-critical).
On reflection - depends what the process is. If it goes to a high value customer, yeah, give me the controls. If it outputs a load of reports, let it try and self heal and I can watch the notification later to know to look at them more closely!
Replying to thank you and for the algorithm :-)
I am building a whole system to actually enforce this concept in my spare time. I truly believe this is where agentic models and orchestration are heading.
If that sounds interesting to you, drop me a follow as I will be sharing learnings from that here.
Hope you enjoyed the article!
Interesting perspective. Your math is bothering me, though.
Isn't this the probability that all of the LLM calls are correct?
I think the calculation for your example should be:
0.00001 × 10,000 = 0.1 (so 0.1 failures per 10,000 workflow runs). It is assuming compounding effects, which to be fair is a little too aggressive, as most workflows are maybe 200-500 steps. But as our conversations get longer and tool calls multiply, the principle holds true. Chances of failure are multiplicative across complex systems, not cumulative.
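The two ways of counting can be compared directly. A quick sketch (plain Python, using only the numbers already quoted in this thread and later in the comments):

```python
# Reliability of a chain where every step must succeed is
# multiplicative (p ** n), not linear (1 - n * failure_rate).

def chain_reliability(p_step: float, n_steps: int) -> float:
    """Probability that all n_steps independent steps succeed."""
    return p_step ** n_steps

def expected_failures(p_step: float, n_steps: int) -> float:
    """Linear (additive) view: expected number of failing steps per run."""
    return n_steps * (1 - p_step)

# 0.001% failure per step over 10,000 steps:
print(chain_reliability(0.99999, 10_000))  # ~0.905, so ~9.5% of runs hit a failure
print(expected_failures(0.99999, 10_000))  # 0.1 expected failing steps per run

# 99.9% per step over a 500-step workflow:
print(chain_reliability(0.999, 500))       # ~0.606, i.e. roughly 60% reliability
```

The linear number (0.1) looks reassuring, but the multiplicative one is what a run actually experiences: each extra step compounds.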
It's something I'm struggling to describe, but you put it into words accurately.
My gripe with AI is that it's non-deterministic, but yes, right now every one of us can build the deterministic tools we need ourselves out of them.
This crystallisation concept resonates hard. I've been living a version of this with programmatic SEO: started by using a local LLM to generate analysis for thousands of stock pages, and every iteration the goal is to figure out which parts of the content pipeline can become pure template logic vs what genuinely needs language model reasoning.
The math on compounding failure rates is the part most people underestimate. Even at 99.9% per step, a 500-step workflow drops to 60% reliability. The invoice reconciliation example makes it visceral.
One thing I'd push back on slightly: the shadow mode phase might be harder than it sounds in practice. Running two systems in parallel while comparing outputs requires its own infrastructure. For smaller teams the pragmatic version might be more like: log everything the AI does, review weekly, replace the most predictable patterns first.
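The "log everything, review weekly" version can be as small as a decorator. A minimal sketch, where `classify` and its logic are hypothetical stand-ins for whatever the real model call is:

```python
import functools
import json
import time

def log_ai_call(log_path="ai_calls.jsonl"):
    """Append every input/output pair of an AI call to a JSONL file,
    so recurring patterns can be reviewed (and made deterministic) later."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            result = fn(*args, **kwargs)
            with open(log_path, "a") as f:
                f.write(json.dumps({
                    "ts": time.time(),
                    "fn": fn.__name__,
                    "args": repr(args),
                    "result": repr(result),
                }) + "\n")
            return result
        return inner
    return wrap

@log_ai_call()
def classify(text):
    # Hypothetical stand-in for an LLM call.
    return "invoice" if "invoice" in text.lower() else "other"
```

Grepping the JSONL file each week for inputs that always map to the same output is exactly the "most predictable patterns first" triage.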
Yeah, valid point on the pragmatism. But in a smaller team you tend to put the time in for each job / client / product, so you can probably be a little less precious about full automation and accuracy, as there is more attention on each step of a process. As with anything it is a balance; 100% agree!
That's a really good nuance. At smaller scale you can afford the human-in-the-loop review that catches drift before it crystallises. The danger zone is that middle ground: when you've grown past manual review but haven't yet invested in automated guardrails. That's exactly where drift compounds silently.
We hit that inflection point around 10k pages. Before that, I could spot-check enough to feel confident. After that, logging everything and reviewing patterns weekly (like you described) was the only thing that kept quality from quietly degrading. The key insight from your article that stuck with me is that determinism isn't about removing AI; it's about knowing exactly where the non-deterministic parts live.
What you describe feels a bit like using AI as a discovery engine, not the final execution layer. Let the LLM explore the solution space, observe the pattern, and then progressively replace the repeatable parts with deterministic code.
In other words: AI to figure things out, code to make them reliable.
If this approach becomes common, the real skill in the next few years might not be "using AI everywhere", but knowing where AI should disappear from the system.
That is exactly how I look at it. Discovery, and then deprecating itself, is what the LLM is for, with code / determinism for scale and robustness. Spot on (and a simpler explanation)! <3
The concept of drifting toward determinism is a great way to look at system reliability. In most projects, technical debt starts accumulating precisely when things become less predictable and more 'random' due to quick fixes. Moving back toward a deterministic approach usually requires a lot of discipline in the early stages, but it's the only way to build something that doesn't break every time you push a minor update. It's definitely a mindset shift that more teams need to adopt.
Love this. The "discipline in early stages" part is the key bit: once you get a certain distance in, it becomes much more difficult to start adding deterministic steps rather than guardrails.
The shadow version idea is the part that interests me most. But in practice, how do you decide when a deterministic tool is "good enough" to replace the AI fallback? Seems like that threshold is where most of the engineering complexity hides.
Definitely is complex, even if the principle is simple.
Put it like this, the whole system spec to make this work (at an enterprise scale) is around 90 pages!!
But the key point is that you treat the deterministic part as you would any code: unit tests, tests on old data and known outcomes, etc. So the deterministic part would get the same level of review, verification and scrutiny as any code going into your codebase (except we can "shortcut" some parts if we have enough edge-case data, historic data, etc. to validate against).
And you can always fall back to AI on poor inputs that don't match your deterministic code, so we get a lot of robustness from that as a secondary backstop.
Also, you can remove 95% of the complexity by just getting an AI to point you at where it thinks it can automate, then working with it to build the automation, and when you are happy, just switch it in!
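One way to make the "good enough to switch in" threshold concrete is to track agreement while shadowing. This is a sketch of the idea only, not the 90-page spec: the class name, threshold, and sample count are all illustrative choices of mine.

```python
class ShadowRunner:
    """Run a deterministic candidate silently alongside the live AI path,
    track how often the two agree, and flag when the candidate has earned
    promotion. Thresholds here are illustrative, not prescriptive."""

    def __init__(self, ai_fn, det_fn, threshold=0.99, min_samples=1000):
        self.ai_fn = ai_fn
        self.det_fn = det_fn
        self.threshold = threshold
        self.min_samples = min_samples
        self.matches = 0
        self.total = 0

    def __call__(self, x):
        ai_out = self.ai_fn(x)      # the AI path still serves the result
        det_out = self.det_fn(x)    # the shadow candidate runs silently
        self.total += 1
        self.matches += (ai_out == det_out)
        return ai_out

    @property
    def ready_to_promote(self):
        """True once agreement holds above threshold over enough samples."""
        return (self.total >= self.min_samples
                and self.matches / self.total >= self.threshold)
```

On promotion, the call order flips: the deterministic function serves results and the AI becomes the fallback for inputs it can't handle.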
Great read!
Thanks! <3
The DriDe concept maps well to what I've seen building AI-powered apps. Every time I add a feature that uses LLM output, there's a natural tension between letting the model be creative and making the output predictable enough that the UX doesn't break.
The practical solution I've landed on is layered constraints: let the model generate freely, then run the output through deterministic validation before it hits the user. Structured output schemas, regex guards on critical fields, fallback defaults. The AI handles the hard creative work, the deterministic layer makes it reliable.
Curious whether your framework accounts for the cost of determinism; sometimes the "drift" is where the value is, and locking it down too early kills the product's differentiation.
Deterministic validators are great and, despite how I phrased it, will probably exist in most systems.
I merely mean that you shift your thoughts from "how do I guard against mistakes / hallucinations" to "how do I get rid of AI wherever I can".
The way I am approaching it is via blackboarding, an idea my CTO floated a while back when the tech wasn't ready for it. Deterministic atoms watching a central board can create emergent workflows that can, in theory, be entirely deterministic without losing flexibility.
Still a theory on that side of things, but in general I think 95% of what most people are using AI for can definitely be improved to the point you halve or even quarter AI usage without losing creativity / becoming too rigid.
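For anyone unfamiliar, "deterministic atoms watching a central board" is the classic blackboard pattern. This is purely my own toy sketch of it, not the commenter's actual system; the atoms here are illustrative:

```python
def run_blackboard(board: dict, atoms) -> dict:
    """Fire each atom whose guard matches the board, repeating until a
    full pass changes nothing. Each atom is a (guard, action) pair of
    pure, deterministic functions over the board's contents."""
    changed = True
    while changed:
        changed = False
        for guard, action in atoms:
            if guard(board):
                before = dict(board)
                action(board)
                if board != before:
                    changed = True
    return board

# Illustrative atoms: parse raw input, then total it - each one only
# fires when the board holds what it needs and its output is missing.
atoms = [
    (lambda b: "raw" in b and "items" not in b,
     lambda b: b.update(items=[int(x) for x in b["raw"].split(",")])),
    (lambda b: "items" in b and "total" not in b,
     lambda b: b.update(total=sum(b["items"]))),
]
```

The workflow "emerges" from which atoms can fire in sequence, yet every individual step stays deterministic and unit-testable.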
I like the layered constraints, that is an equally complex problem space that is fun to work in / on!!
A refreshingly mature take on AI use
Great job!
thanks!
I love the idea. I do this manually constantly, because I don't quite trust that AI will effectively identify and DRY up these processes well enough across systems. Maybe within a skill, but architecting across skills is my current goal. E.g. need prospect enrichment information for processes x, y, and z? Use the single MCP that does that, which looks at our cache first before hitting a waterfall of potential enrichment services, all deterministically, before reverting to an AI SearchMonster.
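That cache-then-waterfall-then-AI ordering is easy to show in miniature. Everything here is hypothetical (the service callables, the `ai_search` last resort, the cache shape); it just sketches the priority order described above:

```python
def enrich_prospect(domain: str, cache: dict, services, ai_search):
    """Deterministic waterfall: cache first, then each enrichment service
    in priority order, with the AI searcher only as a last resort.
    `services` is an ordered list of callables returning a dict or None;
    `ai_search` is the non-deterministic fallback."""
    if domain in cache:                 # 1. cheapest: local cache hit
        return cache[domain]
    for service in services:            # 2. deterministic waterfall
        result = service(domain)
        if result is not None:
            cache[domain] = result      # backfill the cache on a hit
            return result
    result = ai_search(domain)          # 3. AI only when all else misses
    cache[domain] = result
    return result
```

Note the backfill on every hit: each AI resort makes the next lookup for that domain fully deterministic, which is the DriDe loop in one function.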
But, I like the idea... you'd think this would come built in with the agent skills. But alas, I guess their goal is to drive token usage rather than reduce it.
After reading this post, AI is definitely still in its infancy. It also opened my eyes about it being non-deterministic. Even if one process slips up, that could be costly and terrifying!