I think every developer goes through a phase where they accept errors as if errors were just a natural part of the system.
The API returns 400.
The API returns 422.
The API returns 500.
The terminal breaks.
Docker is not running.
The Node version is not installed.
The payload came with the wrong property name.
The type came wrong.
The route did not find the resource.
Playwright broke because a selector changed.
ESLint complains about something that even --fix cannot fix.
And we look at all of this as if it were normal.
But at some point, I started asking myself: why is this normal?
Why does a system receive information, identify that the information is almost correct, know exactly why it failed, often even know how to fix it, and still prefer to return an error to the user?
That is when the idea of Intent-based Healing became inevitable to me.
Not as a feature.
Not as a nice fallback layer.
But as a shift in mindset.
A system should not be designed only to validate data. It should be designed to understand intent, recover context, transform imperfect information into useful information, and learn from every recoverable failure.
When I started combining Intent with Self-healing, I noticed something curious: if you search for each concept separately, you can find content. You find things about intent recognition, intent-based systems, chatbots, self-healing infrastructure, DevOps, observability, Kubernetes, health checks, retries, and circuit breakers.
But when you combine a new concept with either of them, you find almost nothing.
Intent-based Healing?
Almost nothing.
Self-healing Router?
Almost nothing.
Human-in-the-Healing-Loop?
Almost nothing.
Interactive Healing?
Almost nothing.
Anti-fragile Intent-based Healing?
Then it is over. You are basically alone.
And that is a very good sign.
Because it means there is an entire architectural region still being treated as if it were only “error handling”, when in reality it is a new way of thinking about systems.
The first thing that changed my mind
Was realizing that opaque errors are architectural laziness.
An opaque error is when the system knows something went wrong, but returns a message that helps nobody. Not the user, not the developer, not the agent, and not the next execution.
“Internal Server Error.”
“Invalid payload.”
“Field required.”
“Unexpected type.”
“Command failed.”
“Module not found.”
This kind of response is not observability. It is a black hole.
The system received an intention, failed to execute it, and returned only the death certificate of the request.
But where is the analysis?
Where is the suggestion?
Where is the correction attempt?
Where is the learning so that this error does not happen again?
My turning point started with a very simple rage: I sent a POST request to a /users route, misspelled the name of a property, and my own API returned an error.
At that moment I thought:
My God, I sent a POST request to the
/usersroute. The API should know that those data are meant to create a user. It only needs to analyze the values and use them.
That was one of those small moments that changes the entire architecture.
Because I realized the API was treating the payload as a dead contract, not as living information.
If it received
userNameinstead ofusername, why should I return an error?If it received
phoneNumberinstead ofphone, why should I reject it?If it received
"32"instead of32, why should I pretend I did not understand?If it received
gmail.con, why can I not infer that it probably meantgmail.com?If it received a phone number without an area code, why can I not ask, infer, or complete it based on context?
If the intent is clear, the system should try to heal the execution before giving up.
From that point on, I started looking at errors differently.
An error is not only a failure.
An error is evidence.
An error is raw data about an intention that has not yet found its correct form.
An error is an opportunity to create a healing function so that the same failure never appears that way again.
That is the anti-fragile part.
A robust system withstands errors.
A resilient system recovers from errors.
An anti-fragile system improves because of errors.
And that is what I started pursuing.
Not only “do not break”.
Not only “try again”.
But transform every error into a new capability of the system.
If the user wrote a property name wrong, the system learns an alias.
If the type came wrong, the system learns a safe conversion.
If a field was missing, the system learns a minimal question.
If a command failed, the system learns the recovery command.
If a dependency does not exist, the system learns how to install it.
If a Playwright selector broke, the system learns how to search for another path.
If a tool returned a message saying exactly what to do, the system learns how to execute it by itself.
That is the difference between infrastructure self-healing and Intent-based Healing.
Most of the content I found about self-healing was DevOps. Did the service go down? Restart it. Did the container die? Spin up another one. Did the health check fail? Reschedule it. Did the pod freeze? Kill it and recreate it.
That is useful, but it is very limited.
Because, in practice, the system is only returning to its previous state. It did not understand the failure. It did not improve semantically. It only restarted.
It is as if the solution for every pain were to sleep and wake up again.
But Intent-based Healing does not only want to restore the process.
It wants to restore the intention.
The user’s intention was to create a customer.
The user’s intention was to search for a sale.
The user’s intention was to run a command.
The user’s intention was to open a page.
The user’s intention was to execute a flow.
If the intention is preserved, the system can try different paths until it finds a valid way to complete it.
And when it cannot complete it alone, it should not simply return an error. It should negotiate observability with the user, the developer, or the responsible agent.
That is where the idea of Adaptive Observability Negotiation came from.
Instead of returning a dead error, the system returns a useful question, a proposed correction, a hypothesis, a list of alternatives, or a partially recovered action.
The AI called this Human-in-the-Healing-Loop.
And it makes sense.
Because not every healing needs to be automatic from day one. Sometimes the system still does not know the best correction. But it can ask. It can show what it understood. It can request confirmation. It can transform human collaboration into a new healing rule.
This creates Interactive Healing.
The system fails, asks, learns, records, and fixes the entire class of that error.
After a while, you begin to realize that you are no longer “handling errors”. You are building an operational memory for the system.
And then comes the point of no return: when you truly enter self-healing, you cannot stand seeing the same error twice.
Because the second time already feels like incompetence.
The first time is discovery.
The second time is lack of automation.
That was when I built a self-healing router.
The idea was simple: if the user calls a route, sends a payload, or executes an intention in an imperfect way, the router should not only reject it. It should try to understand the destination, identify aliases, map fields, correct types, suggest transformations, and forward the intention to the correct behavior.
Then came tools for agents that assembled themselves and corrected themselves.
The tool did not need to be born perfect. It could carry schema, examples, description, constraints, and correction mechanisms. If the agent called it incorrectly, the system could adjust the call. If an argument was missing, it could ask only for the required argument. If the type came wrong, it could try to convert it. If the response broke, it could generate a new version of the tool.
Then I built a PoC to see whether I could fix Playwright code automatically.
It worked.
The selector broke, the automation analyzed the error, looked at the context, and tried to rewrite the action.
That opened another path.
Because if I could heal a Playwright test, I could also heal scripts, commands, pipelines, linting, project setup, local environments, and development workflows.
So I built a script that solved ESLint errors that even --fix could not solve.
Then I built an ObservableAgent that kept reading my project terminal.
For each type of error that happened, I copied the error and threw it into a file where I mapped that error to the command or transformation required to fix it.
In practice, I was creating a healing base.
Known error in.
Known correction out.
New error in.
The system observes, asks for help if needed, learns, and records.
Over time, it becomes a system that does not merely execute commands. It understands the environment.
Think about it: it is much easier for your system, instead of returning an error because Docker is not running, to simply execute a command to start Docker, or at least detect that state and offer the correct action.
It is much easier for the system, when it realizes a Node version is not installed, to install the version.
My latest case was a change in nvm.
When you try to use a version that is not installed, it returns a message saying that the version is not installed and even gives you the command to copy and paste.
That pisses me off.
Because the system knows the problem.
It knows the solution.
It knows the command.
It knows I will probably copy and paste it.
So why does it not execute it?
Why does this manual choreography still exist?
So I just asked the AI to create the pipeline so that all commands would execute without ever throwing that kind of error again.
This kind of thing seems small, but it changes your relationship with programming.
Because you start seeing every friction as an architectural bug.
It is not only the end user who suffers from dumb systems.
The developer suffers too.
A CLI that knows the correct command but tells you to copy and paste it.
A framework that knows which dependency is missing but only prints the error.
A runtime that knows the port is already occupied but does not offer to kill the process.
A linter that knows the pattern but does not fix it.
A build that breaks because of stale cache but does not clear the cache.
Docker is not running, but nobody tries to start it.
The database is offline, but nobody tries to bring it up.
An environment variable is missing, but it could be inferred from .env.example.
All of these are microfailures we have normalized.
But after you enter this mindset, they become unbearable.
You start thinking:
Why am I reading this?
Why am I copying this command?
Why am I repeating this correction?
Why does this tool not learn?
Why did this error not become a function?
This is where self-healing stops being “recovery” and becomes a development method.
Every manually fixed error should become a reusable unit.
A function.
A behavior.
A patch.
A mapper.
An alias.
A command.
A rule.
A memory.
A recovery policy.
An interactive question.
A fallback flow.
A new test.
An observability event.
A learning.
And if you work with CRM, this becomes even more obvious.
I only went deep into self-healing because I was building a CRM.
And in a CRM, any request can be the possibility of a sale.
I am not going to return an error just because the user wrote the property name wrong.
I am not going to lose a commercial intention because the payload came with the wrong type.
I am not going to throw away an opportunity because the user wrote “zap” instead of “whatsapp”.
I am not going to make a lead disappear because the phone number came without formatting.
I am not going to return 500 to someone who is trying to buy.
If there is intention, there is value.
And if there is value, the system has the obligation to try to recover.
That led me to another question:
Am I the only one who values information more than data?
Because if you return an error without even trying to use those values, you did not think about the information as a whole.
You only looked at the appearance of the data.
You saw a payload outside the contract and discarded it.
But that payload may be carrying exactly the information you need.
It is simply in a different form than the one you expected.
It is like receiving a letter with one misspelled word and throwing the whole letter away without reading it.
It is like a salesperson hearing a customer pronounce a product name with an accent and replying: “invalid input”.
It is like an attendant hanging up because the person forgot one digit of the ZIP code.
That is not an intelligent system.
That is a bureaucratic system.
Intent-based Healing is the opposite of that.
It starts from the premise that intention is more valuable than the initial shape of the payload.
The shape can be corrected.
The intention must be preserved.
And the more the system preserves intention, the more anti-fragile it becomes.
Because every real-world error becomes operational training.
Not necessarily model training.
System training.
Schema training.
Router training.
Parser training.
Alias training.
Policy training.
Agent training.
Observability training.
Recovery command training.
With every failure, the system gains a new defense.
With every inconsistency, a new normalization.
With every crooked payload, a new transformation.
With every opaque error, a new explanation.
With every human intervention, a new automation.
That is anti-fragility.
It is not merely surviving chaos.
It is using chaos as raw material.
The second-to-last major insight
Came when I discovered Linear Types through the Austral programming language.
The project had been stopped for almost a year. It did not even have a package manager. I built a more current website with more content at https://austral.codes/. In one week, I VibeCoded practically their entire roadmap and created a package manager using npm underneath.
But the idea of forcing the programmer to write code to destroy a value after it had been used once did not make sense to me.
A Linear Type only allows a value to be used once.
So my question was inevitable:
How does it not self-destruct?
Why do I need to manually remember to destroy something that, by definition, exists to be consumed only once?
To me, that became another concept: LinearAutoDestroy.
But that is for another article.
The point here is that the logic is the same.
If the language knows that the value can only be used once, it should participate in its destruction.
If the CLI knows the correction command, it should execute it or offer execution.
If the API knows the intent of the route, it should try to adapt the payload.
If the agent knows it made a wrong tool call, it should correct the call.
If the system knows that failure has already happened, it should never allow it to happen in the same way again.
That is my standard now.
A new error is acceptable.
A repeated error is technical debt.
An opaque error is disrespect toward intention.
And a recoverable error returned to the user is wasted value.
That is why I call this concept:
Anti-Fragile Intent-based Healing
Anti-fragile because it improves through failure.
Intent-based because intention is the central unit.
Healing because the correct response is not only to report the problem, but to try to restore the execution, learn from the failure, and prevent recurrence.
In the end, this became a system philosophy:
Do not return an error if you can fix it.
Do not ask everything again if you can infer it.
Do not interrupt the user if you can negotiate.
Do not throw away information because it came in an unexpected format.
Do not restart a service and pretend that is intelligence.
Do not accept the same error twice.
Do not treat a payload as garbage just because it did not respect your schema.
Do not treat the user as a programmer.
Do not treat the developer as a manual operator of obvious commands.
Make programming work for you.
The lazier the developer is, the more they automate their life: https://dev.to/fullagenticstack/vibe-coder-senior-manifesto-do-programador-vagabundo-1bh2
My first AI chatbot, in 2025, was naturally Intent-based, and naturally a question came to my mind: “if we do not train the AI, how will it correct its wrong answers?” That was the first time I saw the concept of self-healing. From that point on, I used it as the foundation for all my systems, and GPT called it Intent-based Healing, because 99% of the content I found was only about DevOps. What made me curious was that their solution was basically to restart the service. With my continuous use of self-healing and the creation of Adaptive Observability Negotiation, which the AI called Human-in-the-Healing-Loop, making Interactive Healing possible, I realized that I was no longer returning errors: the user’s intention would only fail if the user did not collaborate. Every error that could appear became an opportunity to learn how to correct that error. After creating a self-healing function so that it would not happen again, I caught myself thinking: DAMN, I WAS SO JUNIOR! How did I let the same error happen twice? How did I return error 500?!
Top comments (0)