<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rudson Kiyoshi Souza Carvalho</title>
    <description>The latest articles on DEV Community by Rudson Kiyoshi Souza Carvalho (@rudsoncarvalho).</description>
    <link>https://dev.to/rudsoncarvalho</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F151609%2F89dcc1f7-ea31-49e0-870e-83221de8c418.jpg</url>
      <title>DEV Community: Rudson Kiyoshi Souza Carvalho</title>
      <link>https://dev.to/rudsoncarvalho</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rudsoncarvalho"/>
    <language>en</language>
    <item>
      <title>The Right Proposal Lost Again: On Power Struggles Disguised as Technical Decisions</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Fri, 19 Jun 2026 15:18:48 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/the-right-proposal-lost-again-on-power-struggles-disguised-as-technical-decisions-4e8c</link>
      <guid>https://dev.to/rudsoncarvalho/the-right-proposal-lost-again-on-power-struggles-disguised-as-technical-decisions-4e8c</guid>
      <description>&lt;p&gt;You've seen this happen before, probably more than once. Two architecture proposals on the table. One is cheaper, simpler to operate, with fewer points of failure and less dependency on a single specialist. The other costs more, needs more people, more time, more meetings. The committee approves the second one.&lt;/p&gt;

&lt;p&gt;Nobody in the room will say "we picked the worse option because its author has lunch with the VP every Thursday." What comes out is always technically respectable: "the other solution scales better long-term," "this one reduces technical debt," "it's better aligned with the data governance the platform team has been pushing for." Sentence by sentence, none of it is a lie. And yet none of it is the real reason for the decision.&lt;/p&gt;

&lt;p&gt;You leave the meeting knowing, with an uncomfortable certainty in your gut, that you lost for a reason that wasn't on any slide. Then comes the worse part: you can't name that reason without sounding paranoid, or bitter, or "too political" — exactly the label the person who won will never get, because they were careful never to say the word "power" out loud.&lt;/p&gt;

&lt;p&gt;This pattern isn't a rare accident of a broken process. It is the process. It repeats in architecture decisions, in roadmap priorities, in who signs the RFC, in who gets the credit for the project that worked and who gets the blame for the one that didn't. It's frequent and regular enough to become a law. This pattern is what became the book &lt;strong&gt;"As Leis do Poder em Projetos"&lt;/strong&gt; (The Laws of Power in Projects).&lt;/p&gt;

&lt;h2&gt;The mechanism: power wearing a competence badge&lt;/h2&gt;

&lt;p&gt;Every tech company runs two operating systems in parallel. One is documented: architecture, metrics, OKRs, post-mortems, security committees, RFCs, promotion criteria. The other never shows up in the minutes, but decides just as much as the first one: who gains territory, who gets spared when the project blows up, who's allowed to disagree in public, and who's only allowed to disagree "later, in private."&lt;/p&gt;

&lt;p&gt;The reason this second system is nearly impossible to point a finger at is simple: it speaks the exact same language as the first one. "Technical debt" is a real engineering category — and also the perfect label for any old decision someone wants to revisit for reasons that have nothing to do with actual debt. "That service doesn't scale" might be a fact about capacity — or it might be the sentence that kills a rival team's project without anyone having to admit the problem was never scale. Metrics, security, compliance, governance: each of these words has the same property. From the outside, they're indistinguishable from pure technical competence. That's exactly why they work so well as weapons — and why no one is ever held accountable for using them that way.&lt;/p&gt;

&lt;p&gt;Questioning a security veto in public, for instance, doesn't look like technical rigor. It looks reckless. So nobody questions it, and the veto becomes the most efficient power instrument in the company — protected by the simple fact that disagreeing with it is socially expensive, even when it's being used for the wrong reasons.&lt;/p&gt;

&lt;p&gt;None of this requires a closed-door conspiracy. It's structure, not a plot: any hierarchy that distributes budget, prestige, and career survival unevenly will generate competition for those resources. What's particular about tech is that this competition is almost never called by its name. It's called architecture.&lt;/p&gt;

&lt;h2&gt;The book: 34 laws, six parts, from diagnosis to defense&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"As Leis do Poder em Projetos"&lt;/strong&gt; organizes this pattern into 34 laws, split into six parts that go from "this is happening to you right now, nobody just used that word" to "what to do about it without becoming the next person who does it to others."&lt;/p&gt;

&lt;h3&gt;I · Manufacturing the Problem&lt;/h3&gt;

&lt;p&gt;Before any power struggle has a winner, someone first needs to convince everyone that a problem exists — and, preferably, that only one specific solution solves it. This part maps how crisis narratives are manufactured, inflated, and timed before any actual technical decision reaches the table.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some people light fires to sell extinguishers&lt;/li&gt;
&lt;li&gt;"Technical debt" is a rhetorical weapon, not just an engineering concept&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;II · Owning the Narrative&lt;/h3&gt;

&lt;p&gt;Whoever decides which version of events survives into next quarter holds more power than whoever actually solved the problem. This part is about how a project's story gets written — and rewritten — by whoever controls the documents, the channels, and, above all, the vocabulary used to describe what happened.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whoever writes the post-mortem writes history&lt;/li&gt;
&lt;li&gt;Whoever controls the vocabulary controls the debate&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;III · Capturing Territory&lt;/h3&gt;

&lt;p&gt;Technical merit and upward mobility inside a company rarely follow the same ruler, and pretending otherwise only delays the moment you realize it. This part is about how territory — scope, headcount, visibility, access — gets won and defended, and why being in the right place at the right time usually beats having done the right work.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Credit goes to whoever is on stage on demo day&lt;/li&gt;
&lt;li&gt;Turtles don't climb trees on their own: merit and promotion are not the same thing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;IV · The Executive's Game&lt;/h3&gt;

&lt;p&gt;At the top, the rules change again. Decisions that look purely strategic are, with uncomfortable frequency, about the executive's personal survival within their own tenure — which usually lasts less than any three-year roadmap. This part looks at the board from the perspective of someone who only has a few quarters to show results before the next reorg.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CIO doesn't pick the best technology; they pick the one that survives the next change of command&lt;/li&gt;
&lt;li&gt;Decide which fires to let burn&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;V · Security, Risk, and Compliance as Power&lt;/h3&gt;

&lt;p&gt;When a dispute can't be won on technical merit, it tends to migrate to the one terrain where questioning looks reckless instead of rigorous: security, risk, compliance. This part exposes how these domains — created to protect the company — also work as the most efficient veto available to anyone willing to use it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security is the one veto nobody dares challenge in public&lt;/li&gt;
&lt;li&gt;Every exception becomes a precedent — and every precedent becomes power&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;VI · Surviving Without Being Naive (the defense)&lt;/h3&gt;

&lt;p&gt;The last part switches sides. After mapping the game, the book turns to how to play it without becoming what it describes — how to protect good work, allies, and reputation without resorting to the same dirty tactics the rest of the book spent five parts documenting.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't win arguments; win before them&lt;/li&gt;
&lt;li&gt;Being right is necessary; having narrative, timing, and allies is what makes being right count&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;This is not a manipulation manual&lt;/h2&gt;

&lt;p&gt;It's worth repeating, because it's easy to read the list above and conclude the opposite: this isn't a book about manipulating people to win internal disputes. The same tools that run through all 34 laws — controlling the narrative, choosing the timing, building allies, naming the problem before someone else names it for you — don't inherently belong to whoever acts in bad faith. They belong to whoever uses them first and uses them best.&lt;/p&gt;

&lt;p&gt;The difference between using these tools to defend honest work and using the same tools to sabotage someone else's isn't in the tool. It's in who decides what to load into it. You can write a rigorous post-mortem and still write the story. You can build allies to protect the right decision, not to bury the right person. The book doesn't pretend that line doesn't exist — it just argues that pretending the whole game doesn't exist is the fastest way to lose to whoever plays without that hesitation.&lt;/p&gt;

&lt;p&gt;Political clarity isn't the same thing as cynicism. It's the difference between being blindsided by the game and choosing, eyes open, what to do with the cards it deals you.&lt;/p&gt;

&lt;h2&gt;Launch&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fd9u367dth1srj7pp6kpa.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fd9u367dth1srj7pp6kpa.jpeg" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;As Leis do Poder em Projetos&lt;/strong&gt; (The Laws of Power in Projects) is currently in presale on Amazon, with launch expected for &lt;strong&gt;July 9, 2026&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://a.co/d/0bdF0Sk2" rel="noopener noreferrer"&gt;https://a.co/d/0bdF0Sk2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The book is currently available in Portuguese, but I'm considering an English edition if there is enough interest from the international software engineering community.&lt;/p&gt;

</description>
      <category>career</category>
      <category>leadership</category>
      <category>softwareengineering</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Right Proposal Lost Again: On Power Struggles Disguised as Technical Decisions</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Fri, 19 Jun 2026 13:56:27 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/a-proposta-certa-perdeu-de-novo-sobre-disputas-de-poder-disfarcadas-de-decisao-tecnica-2a15</link>
      <guid>https://dev.to/rudsoncarvalho/a-proposta-certa-perdeu-de-novo-sobre-disputas-de-poder-disfarcadas-de-decisao-tecnica-2a15</guid>
      <description>&lt;p&gt;Você já viu isso acontecer, provavelmente mais de uma vez. Duas propostas de arquitetura na mesa. Uma é mais barata, mais simples de operar, com menos pontos de falha e menos dependência de um único especialista. A outra custa mais, exige mais gente, mais tempo, mais reunião. O comitê aprova a segunda.&lt;/p&gt;

&lt;p&gt;Ninguém na sala vai dizer "escolhemos a opção pior porque o autor dela almoça com o VP toda quinta". O que sai é sempre tecnicamente respeitável: "a outra solução escala melhor no longo prazo", "essa reduz débito técnico", "está mais alinhada com a governança de dados que o time de plataforma está cobrando". Frase por frase, nada ali é mentira. E ainda assim nenhuma delas é o motivo real da decisão.&lt;/p&gt;

&lt;p&gt;Você sai da reunião sabendo, com uma certeza incômoda no estômago, que perdeu por um motivo que não estava em nenhum slide. Aí vem a parte pior: você não consegue nomear esse motivo sem parecer paranoico, ou amargurado, ou "político demais" — justamente o adjetivo que a pessoa que venceu jamais vai receber, porque ela teve o cuidado de nunca usar a palavra "poder" em voz alta.&lt;/p&gt;

&lt;p&gt;Esse padrão não é um acidente raro de processo malfeito. É o processo. Ele se repete em decisão de arquitetura, em prioridade de roadmap, em quem assina o RFC, em quem fica com o crédito do projeto que deu certo e em quem fica com a culpa do que deu errado. É frequente e regular o suficiente para virar lei. Foi esse padrão que virou o livro &lt;strong&gt;"As Leis do Poder em Projetos"&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;The mechanism: power wearing a competence badge&lt;/h2&gt;

&lt;p&gt;Every tech company runs two operating systems in parallel. One is documented: architecture, metrics, OKRs, post-mortems, security committees, RFCs, promotion criteria. The other never shows up in the minutes, but decides just as much as the first: who gains territory, who is spared when the project blows up, who is allowed to disagree in public, and who is only allowed to disagree "later, in private."&lt;/p&gt;

&lt;p&gt;The reason this second system is nearly impossible to point a finger at is simple: it speaks exactly the same language as the first. "Technical debt" is a real engineering category, and also the perfect label for any old decision someone wants to revisit for reasons that have nothing to do with debt at all. "That service doesn't scale" may be a fact about capacity, or it may be the sentence that kills a rival team's project without anyone having to admit the problem was never scale. Metrics, security, compliance, governance: each of these words has the same property. Seen from outside, they are indistinguishable from pure technical competence. That is exactly why they work so well as weapons, and why nobody is ever held accountable for using them that way.&lt;/p&gt;

&lt;p&gt;Questioning a security veto in public, for example, doesn't look like technical rigor. It looks reckless. So nobody questions it, and the veto becomes the most efficient power instrument in the company, protected by the simple fact that disagreeing with it is socially expensive, even when it is being used for the wrong purpose.&lt;/p&gt;

&lt;p&gt;None of this requires a closed-door conspiracy. It's structure, not a plot: any hierarchy that distributes budget, prestige, and career survival unevenly will generate competition for those resources. What is particular about tech is that this competition is almost never called by its name. It's called architecture.&lt;/p&gt;

&lt;h2&gt;The book: 34 laws, six parts, from diagnosis to defense&lt;/h2&gt;

&lt;p&gt;"As Leis do Poder em Projetos" organizes this pattern into 34 laws, split into six parts that go from "this is happening to you right now; it's just that nobody has used that word" to "what to do about it without becoming the next person who does it to others."&lt;/p&gt;

&lt;h3&gt;I · Manufacturing the Problem&lt;/h3&gt;

&lt;p&gt;Before any power struggle has a winner, someone first has to convince everyone that a problem exists and, preferably, that only one specific solution solves it. This part maps how crisis narratives are manufactured, inflated, and timed before any actual technical decision reaches the table.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some people light fires to sell extinguishers&lt;/li&gt;
&lt;li&gt;"Technical debt" is a rhetorical weapon, not just an engineering concept&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;II · Owning the Narrative&lt;/h3&gt;

&lt;p&gt;Whoever decides which version of the facts survives into the next quarter has more power than whoever actually solved the problem. This part is about how a project's story is written, and rewritten, by whoever controls the documents, the channels, and, above all, the vocabulary used to describe what happened.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whoever writes the post-mortem writes history&lt;/li&gt;
&lt;li&gt;Whoever controls the vocabulary controls the debate&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;III · Capturing Territory&lt;/h3&gt;

&lt;p&gt;Technical merit and advancement within a company rarely follow the same ruler, and pretending they do only delays the moment you realize it. This part is about how territory (scope, headcount, visibility, access) is won and defended, and why being in the right place at the right time usually counts for more than having done the right work.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Credit goes to whoever is on stage on demo day&lt;/li&gt;
&lt;li&gt;Turtles don't climb trees on their own: merit and advancement are different things&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;IV · The Executive's Game&lt;/h3&gt;

&lt;p&gt;At the top, the rules change again. Decisions that look purely strategic are, with uncomfortable frequency, about the executive's personal survival within their own tenure, which usually lasts less than any three-year roadmap. This part looks at the board from the perspective of someone who only has a few quarters to show results before the next reorganization.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CIO doesn't choose the best technology; they choose the one that survives the next change of command&lt;/li&gt;
&lt;li&gt;Decide which fires to let burn&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;V · Security, Risk, and Compliance as Power&lt;/h3&gt;

&lt;p&gt;When a dispute can't be won on technical merit, it tends to migrate to the one terrain where questioning looks reckless rather than rigorous: security, risk, compliance. This part exposes how these domains, created to protect the company, also work as the most efficient veto available to anyone willing to use it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security is the one veto nobody dares to challenge in public&lt;/li&gt;
&lt;li&gt;Every exception becomes a precedent, and every precedent becomes power&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;VI · Surviving Without Being Naive (the defense)&lt;/h3&gt;

&lt;p&gt;The last part switches sides. After mapping the game, the book turns to how to play it without becoming what it describes: how to protect good work, allies, and reputation without resorting to the same dirty tactics the rest of the book spent five parts documenting.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't win arguments; win before them&lt;/li&gt;
&lt;li&gt;Being right is necessary; having narrative, timing, and allies is what makes being right count&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;This is not a manipulation manual&lt;/h2&gt;

&lt;p&gt;It bears repeating, because it's easy to read the list above and conclude the opposite: this is not a book about how to manipulate people to win internal disputes. The same tools that run through the 34 laws (controlling the narrative, choosing the timing, building allies, naming the problem before someone else names it for you) don't inherently belong to those acting in bad faith. They belong to whoever uses them first and uses them best.&lt;/p&gt;

&lt;p&gt;The difference between using these tools to defend honest work and using the same tools to sabotage someone else's isn't in the tool. It's in who decides what to load into it. You can write a rigorous post-mortem and, even so, write the story. You can build allies to protect the right decision, not to bury the right person. The book doesn't pretend that line doesn't exist; it only argues that pretending the whole game doesn't exist is the fastest way to lose to whoever plays without that hesitation.&lt;/p&gt;

&lt;p&gt;Political clarity is not the same thing as cynicism. It's the difference between being caught off guard by the game and choosing, with your eyes open, what to do with the cards it deals you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fj18klimj3mohegrwy9f7.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fj18klimj3mohegrwy9f7.jpeg" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Launch&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;As Leis do Poder em Projetos&lt;/strong&gt; (The Laws of Power in Projects) is in presale on Amazon, with launch expected on &lt;strong&gt;July 9, 2026&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://a.co/d/0bdF0Sk2" rel="noopener noreferrer"&gt;https://a.co/d/0bdF0Sk2&lt;/a&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>leadership</category>
      <category>softwareengineering</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Your AI agent is inventing behavior — and you have no way to prove otherwise</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Tue, 16 Jun 2026 03:53:25 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/your-ai-agent-is-inventing-behavior-and-you-have-no-way-to-prove-otherwise-2c5e</link>
      <guid>https://dev.to/rudsoncarvalho/your-ai-agent-is-inventing-behavior-and-you-have-no-way-to-prove-otherwise-2c5e</guid>
      <description>&lt;p&gt;You reviewed the PR. The code looks right. The tests pass.&lt;/p&gt;

&lt;p&gt;But that new field in the API response — who asked for that?&lt;/p&gt;

&lt;p&gt;You check the history, the requirements, the conversation with the PO. It's nowhere. The AI agent just added it. And if you hadn't looked closely, it would have shipped to production.&lt;/p&gt;

&lt;p&gt;This can happen any time an agent generates code, and it will happen more often as pipelines become more autonomous. The problem isn't that the AI makes mistakes. It's that when it adds something nobody asked for, no structural mechanism exists today to stop it from getting through.&lt;/p&gt;

&lt;h2&gt;The gap nobody closed&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblbyvjeexkmj9hd5f60e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fblbyvjeexkmj9hd5f60e.png" alt="Where BPR fits: the empty layer" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's an entire ecosystem of standards for tracking what happens to software. But each one covers a different piece:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RTM, ReqIF, OSLC — track requirements, but they're documents humans fill in. Nothing stops an agent from generating code that doesn't correspond to any of them.&lt;/li&gt;
&lt;li&gt;SLSA, SPDX, CycloneDX — cover the build chain and component inventory. Excellent, but they operate on the compiled artifact, not on behavior.&lt;/li&gt;
&lt;li&gt;W3C PROV — models data provenance in general. It doesn't go down to "who asked for this field in the response?"&lt;/li&gt;
&lt;li&gt;C2PA — provenance for media and digital content. A different domain.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The layer between an approved requirement and generated behavior is empty. Today there's no machine-checkable contract that says: "this behavior has a traceable origin, or it's rejected."&lt;/p&gt;

&lt;h2&gt;The root of the problem&lt;/h2&gt;

&lt;p&gt;When a human engineer adds a field nobody asked for, there's natural friction: the PR goes to review, someone asks "where did this come from?", the person has to justify it.&lt;/p&gt;

&lt;p&gt;With AI agents the cycle is different:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The agent receives a context&lt;/li&gt;
&lt;li&gt;The agent generates code&lt;/li&gt;
&lt;li&gt;The code is plausible, it compiles, the tests pass&lt;/li&gt;
&lt;li&gt;Nobody has an automated way to ask: which requirement was this specific behavior derived from?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The honest answer is: there isn't one. And this isn't process fussiness — in regulated environments (finance, healthcare, aerospace), this is real risk.&lt;/p&gt;

&lt;h2&gt;The core idea behind BPR&lt;/h2&gt;

&lt;p&gt;The Behavioral Provenance Record (BPR) is a conformance specification that attacks exactly this problem.&lt;/p&gt;

&lt;p&gt;The logic is simple: instead of trying to prove the agent didn't invent anything — which is impossible — you turn invention into a structural failure.&lt;/p&gt;

&lt;p&gt;Every node in the pipeline (a requirement, a behavioral example, a scenario, a contract, a unit of code, a test) emits a provenance record. For a derived node, that record says: "this artifact came from that upstream node." If it doesn't cite an origin that resolves, it's rejected. The only nodes that don't cite an origin are the human-authored ones, which are the roots of the graph.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Core rule: provenance-or-reject.

Every derived node MUST cite the upstream node it came from.
No resolvable origin → rejected.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
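&lt;p&gt;As a rough illustration of the rule, here is a minimal Python sketch. It assumes a hypothetical record shape (an &lt;code&gt;id&lt;/code&gt;, a &lt;code&gt;provenance&lt;/code&gt; of &lt;code&gt;"authored"&lt;/code&gt; or &lt;code&gt;"derived"&lt;/code&gt;, and a &lt;code&gt;derived_from&lt;/code&gt; list of upstream ids); these names are illustrative and are not taken from the BPR schema.&lt;/p&gt;

```python
# Minimal sketch of "provenance-or-reject".
# Hypothetical record shape (NOT the BPR schema):
#   {"id": str, "provenance": "authored" | "derived", "derived_from": [str, ...]}


def check_provenance(records):
    """Return a list of (record_id, reason) rejections; empty means all pass."""
    known_ids = {r["id"] for r in records}
    rejections = []
    for r in records:
        if r["provenance"] == "authored":
            # Authored nodes are trust anchors: the roots of the graph.
            continue
        upstream = r.get("derived_from", [])
        if not upstream:
            rejections.append((r["id"], "derived node cites no upstream node"))
            continue
        for parent in upstream:
            if parent not in known_ids:
                rejections.append((r["id"], f"unresolvable origin: {parent}"))
    return rejections


records = [
    {"id": "need-1", "provenance": "authored"},
    {"id": "scenario-1", "provenance": "derived", "derived_from": ["need-1"]},
    {"id": "code-1", "provenance": "derived", "derived_from": ["scenario-1"]},
    {"id": "code-2", "provenance": "derived", "derived_from": []},
    {"id": "test-9", "provenance": "derived", "derived_from": ["scenario-404"]},
]

for node_id, reason in check_provenance(records):
    print(f"REJECT {node_id}: {reason}")
```

&lt;p&gt;Running it prints one rejection for &lt;code&gt;code-2&lt;/code&gt; (no origin at all) and one for &lt;code&gt;test-9&lt;/code&gt; (an origin that doesn't resolve). Each failure names the node, so the rejection is localized rather than a vague "something looks off."&lt;/p&gt;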



&lt;h2&gt;Where it enters the SDLC&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89b9qjt343w65uc7f1mn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89b9qjt343w65uc7f1mn.png" alt="Where BPR enters the SDLC" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's not a document you fill in afterward. BPR enters the flow at the moment the artifact is produced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The human authors the need and the behavioral examples. These are the trust anchors — &lt;code&gt;authored&lt;/code&gt; nodes, the roots of the graph.&lt;/li&gt;
&lt;li&gt;The agent derives the scenario, the contract, the code, and the test — emitting a &lt;code&gt;derived&lt;/code&gt; record at each step, citing what came before.&lt;/li&gt;
&lt;li&gt;A conformance gate (a CI stage, a PR check) reads the records and passes or fails the change before merge.&lt;/li&gt;
&lt;/ul&gt;
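&lt;p&gt;A conformance gate can be as small as a function that returns a process exit code. The sketch below shows one possible check, assuming hypothetical record fields (&lt;code&gt;id&lt;/code&gt;, &lt;code&gt;provenance&lt;/code&gt;, &lt;code&gt;derived_from&lt;/code&gt;) that are not part of the BPR spec: it fails any derived node whose chain of &lt;code&gt;derived_from&lt;/code&gt; links never reaches an authored root, which also catches derived nodes that only cite each other.&lt;/p&gt;

```python
import json

# Hypothetical CI gate: every derived node must trace back, through its
# derived_from links, to at least one authored node (a trust anchor).
# The record fields are illustrative assumptions, not the BPR schema.


def reaches_authored_root(node_id, by_id, seen):
    """Depth-first search: True if some ancestry path ends at an authored node."""
    if node_id in seen or node_id not in by_id:
        return False  # already visited in this search, or a dangling reference
    seen.add(node_id)
    node = by_id[node_id]
    if node["provenance"] == "authored":
        return True
    return any(reaches_authored_root(p, by_id, seen) for p in node.get("derived_from", []))


def gate(records):
    """Return a process exit code: 0 lets the change merge, 1 blocks it."""
    by_id = {r["id"]: r for r in records}
    failures = [
        r["id"]
        for r in records
        if r["provenance"] == "derived" and not reaches_authored_root(r["id"], by_id, set())
    ]
    for node_id in failures:
        print(f"FAIL {node_id}: no path to an authored root")
    print("gate:", "fail" if failures else "pass")
    return 1 if failures else 0


# code-7 and test-7 cite each other: a loop with no human-authored root.
records = json.loads("""[
  {"id": "need-1", "provenance": "authored"},
  {"id": "contract-1", "provenance": "derived", "derived_from": ["need-1"]},
  {"id": "code-7", "provenance": "derived", "derived_from": ["test-7"]},
  {"id": "test-7", "provenance": "derived", "derived_from": ["code-7"]}
]""")
exit_code = gate(records)
```

&lt;p&gt;In a real pipeline the records would be read from files in the change, and the CI job would pass &lt;code&gt;exit_code&lt;/code&gt; to &lt;code&gt;sys.exit&lt;/code&gt; so that a nonzero value blocks the merge.&lt;/p&gt;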

&lt;p&gt;The result is a traceability graph. The classic RTM you know is just a 2D projection of this graph — not the actual object.&lt;/p&gt;
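&lt;p&gt;To see why an RTM is only a projection, here is a small Python sketch that flattens a provenance graph into need-to-test rows. The record fields (&lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;derived_from&lt;/code&gt;) are hypothetical. Notice what the projection throws away: the scenarios, contracts, and code that sit between a need and its tests.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical records: kind is "need", "scenario", "contract", "code" or "test";
# derived_from lists upstream ids. Field names are illustrative, not BPR's.
records = [
    {"id": "N1", "kind": "need", "derived_from": []},
    {"id": "N2", "kind": "need", "derived_from": []},
    {"id": "S1", "kind": "scenario", "derived_from": ["N1"]},
    {"id": "S2", "kind": "scenario", "derived_from": ["N1"]},
    {"id": "T1", "kind": "test", "derived_from": ["S1"]},
    {"id": "T2", "kind": "test", "derived_from": ["S2"]},
]


def project_rtm(records):
    """Flatten the provenance graph into {need_id: sorted list of test ids}."""
    kinds = {r["id"]: r["kind"] for r in records}
    children = defaultdict(list)  # upstream id -> downstream ids
    for r in records:
        for parent in r["derived_from"]:
            children[parent].append(r["id"])

    rtm = {}
    for need in (r["id"] for r in records if r["kind"] == "need"):
        tests, stack, seen = set(), [need], set()
        while stack:  # iterative depth-first walk downstream from the need
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if kinds.get(node) == "test":
                tests.add(node)
            stack.extend(children[node])
        rtm[need] = sorted(tests)
    return rtm


for need, tests in project_rtm(records).items():
    print(need, "->", ", ".join(tests) or "(no test coverage)")
```

&lt;p&gt;The output is the familiar matrix: &lt;code&gt;N1&lt;/code&gt; maps to &lt;code&gt;T1&lt;/code&gt; and &lt;code&gt;T2&lt;/code&gt;, and &lt;code&gt;N2&lt;/code&gt; shows up with no coverage. The graph can always produce this table; the table can't give you the graph back.&lt;/p&gt;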

&lt;h2&gt;The hardest level: anti-invention&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqi4z122tq4bbq3p77k9l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqi4z122tq4bbq3p77k9l.png" alt="Anti-invention, made structural" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The interesting part is what happens at the highest level — when you want to know whether the agent added behavior nobody asked for.&lt;/p&gt;

&lt;p&gt;BPR doesn't try to prove a negative. Instead, each node can declare its claims — the behavioral assertions it makes. And every claim needs to cite an upstream claim.&lt;/p&gt;

&lt;p&gt;Every claim has a type defined by observability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;behavioral&lt;/code&gt; — changes the observable functional contract (response field, status code). Must be traced.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;operational&lt;/code&gt; — changes observable non-functional behavior (latency, retry, logging, metrics, security). Must be traced.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;implementation&lt;/code&gt; — internal, no observable footprint (data structure, in-process cache). Exempt — but only if non-observability is attested. Without attestation, it's treated as behavioral. This closes the loophole of "laundering" invention by labeling it an internal detail.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A behavioral claim with no upstream ancestry is invention — now a localized, named, attributable failure. Not a hallucination hidden in the middle of the code.&lt;/p&gt;
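&lt;p&gt;The classification rule can be sketched in a few lines of Python. The claim fields here (&lt;code&gt;type&lt;/code&gt;, &lt;code&gt;cites&lt;/code&gt;, &lt;code&gt;non_observable_attested&lt;/code&gt;) are hypothetical stand-ins, not the BPR schema, and the sketch only checks whether a traced claim cites anything at all, not whether the cited claim actually supports it.&lt;/p&gt;

```python
# Hypothetical claim shape (illustrative, not the BPR schema):
#   {"id", "node", "type": "behavioral" | "operational" | "implementation",
#    "cites": [upstream claim ids], "non_observable_attested": bool}
TRACED_TYPES = {"behavioral", "operational"}


def effective_type(claim):
    """Unattested 'implementation' claims are treated as behavioral."""
    if claim["type"] == "implementation" and not claim.get("non_observable_attested", False):
        return "behavioral"
    return claim["type"]


def find_inventions(claims):
    """Return (node, claim_id, effective_type) for every traced claim with no citation."""
    return [
        (c["node"], c["id"], effective_type(c))
        for c in claims
        if effective_type(c) in TRACED_TYPES and not c.get("cites")
    ]


claims = [
    # Traced back to the scenario claim it implements.
    {"id": "c1", "node": "code-1", "type": "behavioral", "cites": ["s1.returns-name"]},
    # New response field nobody asked for.
    {"id": "c2", "node": "code-1", "type": "behavioral", "cites": []},
    # Internal cache, attested as having no observable footprint: exempt.
    {"id": "c3", "node": "code-1", "type": "implementation", "cites": [],
     "non_observable_attested": True},
    # "Internal detail" label without attestation: a laundering attempt.
    {"id": "c4", "node": "code-1", "type": "implementation", "cites": []},
    # Retry policy added without any citation.
    {"id": "c5", "node": "code-1", "type": "operational"},
]

for node, claim_id, kind in find_inventions(claims):
    print(f"INVENTION {node}/{claim_id} ({kind}): no upstream claim cited")
```

&lt;p&gt;This flags &lt;code&gt;c2&lt;/code&gt; (an uncited response field), &lt;code&gt;c4&lt;/code&gt; (an "internal detail" with no attestation, so it falls back to behavioral), and &lt;code&gt;c5&lt;/code&gt; (an uncited retry policy). The attested cache, &lt;code&gt;c3&lt;/code&gt;, passes.&lt;/p&gt;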

&lt;h2&gt;Incremental adoption — not all or nothing&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funrkmfpt5ge30r66lvi3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Funrkmfpt5ge30r66lvi3.png" alt="Adopt incrementally: the conformance ladder" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the point I consider most important in the design: you don't have to adopt everything at once.&lt;/p&gt;

&lt;p&gt;L1 and L2 already deliver real value today, without depending on anything sophisticated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;L1: every scenario has a test. Every test verifies a scenario. Simple, deterministic, and most teams still don't have this enforced.&lt;/li&gt;
&lt;li&gt;L2: every approved need has downstream coverage through to a test. (Deferred, rejected, or informational requirements are explicitly exempt — the &lt;code&gt;status&lt;/code&gt; field handles that.)&lt;/li&gt;
&lt;/ul&gt;
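&lt;p&gt;Both levels are plain graph checks, which is part of why they're cheap to adopt. Here is a minimal Python sketch, assuming hypothetical record fields (&lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;derived_from&lt;/code&gt;) and treating only needs with &lt;code&gt;status == "approved"&lt;/code&gt; as subject to L2. L1 looks only at direct scenario-to-test links; L2 follows links transitively.&lt;/p&gt;

```python
from collections import defaultdict

# Illustrative records; "kind", "status" and "derived_from" are assumed names.
records = [
    {"id": "N1", "kind": "need", "status": "approved", "derived_from": []},
    {"id": "N2", "kind": "need", "status": "approved", "derived_from": []},
    {"id": "N3", "kind": "need", "status": "deferred", "derived_from": []},
    {"id": "S1", "kind": "scenario", "derived_from": ["N1"]},
    {"id": "S2", "kind": "scenario", "derived_from": ["N2"]},
    {"id": "T1", "kind": "test", "derived_from": ["S1"]},
    {"id": "T9", "kind": "test", "derived_from": []},
]


def check_l1(records):
    """L1: every scenario has a test, and every test verifies a scenario."""
    kinds = {r["id"]: r["kind"] for r in records}
    tests = [r for r in records if r["kind"] == "test"]
    tested = {p for t in tests for p in t["derived_from"] if kinds.get(p) == "scenario"}
    errors = [f"scenario {r['id']} has no test"
              for r in records if r["kind"] == "scenario" and r["id"] not in tested]
    errors += [f"test {t['id']} verifies no scenario"
               for t in tests if not any(kinds.get(p) == "scenario" for p in t["derived_from"])]
    return errors


def check_l2(records):
    """L2: every approved need reaches a test downstream; other statuses are exempt."""
    kinds = {r["id"]: r["kind"] for r in records}
    children = defaultdict(list)  # upstream id -> downstream ids
    for r in records:
        for parent in r["derived_from"]:
            children[parent].append(r["id"])

    def covered(node, seen):
        if node in seen:
            return False
        seen.add(node)
        return kinds.get(node) == "test" or any(covered(c, seen) for c in children[node])

    return [f"approved need {r['id']} has no downstream test"
            for r in records
            if r["kind"] == "need" and r.get("status") == "approved"
            and not covered(r["id"], set())]


print(check_l1(records))
print(check_l2(records))
```

&lt;p&gt;On this data, L1 reports that &lt;code&gt;S2&lt;/code&gt; has no test and &lt;code&gt;T9&lt;/code&gt; verifies nothing; L2 reports that &lt;code&gt;N2&lt;/code&gt; is uncovered, while the deferred &lt;code&gt;N3&lt;/code&gt; is exempt.&lt;/p&gt;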

&lt;p&gt;L3 is the frontier: it's where BPR addresses whether the agent invented behavior. It's harder and depends on external checkers (a human, an NLI model, an LLM judge), but it is fully specified and falsifiable.&lt;/p&gt;
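&lt;p&gt;Because BPR treats the checker as replaceable, one way to picture L3 is as a pluggable function that returns a verdict for each (claim, cited upstream claim) pair. The sketch below is an assumption about how that could be wired, not the spec's interface, and &lt;code&gt;naive_checker&lt;/code&gt; is a deliberately weak word-overlap toy standing in for a human, an NLI model, or an LLM judge.&lt;/p&gt;

```python
from typing import Callable

# A checker maps (claim_text, upstream_text) to a verdict string. In practice it
# could be a human reviewer, an NLI model or an LLM judge; this interface is an
# assumption made for the sketch.
Checker = Callable[[str, str], str]


def naive_checker(claim_text: str, upstream_text: str) -> str:
    """Toy stand-in: 'supported' only if every word of the claim appears upstream."""
    upstream_words = set(upstream_text.lower().split())
    missing = [w for w in claim_text.lower().split() if w not in upstream_words]
    return "supported" if not missing else "unsupported"


def l3_verdicts(claims, upstream, checker: Checker):
    """Ask the checker about each (claim, cited upstream claim) pair."""
    results = []
    for claim in claims:
        for cited in claim["cites"]:
            if cited in upstream:
                verdict = checker(claim["text"], upstream[cited])
            else:
                verdict = "unresolved"
            results.append((claim["id"], cited, verdict))
    return results


upstream = {"s1": "response includes the user name and email"}
claims = [
    {"id": "c1", "text": "response includes email", "cites": ["s1"]},
    {"id": "c2", "text": "response includes phone", "cites": ["s1"]},
]

results = l3_verdicts(claims, upstream, naive_checker)
for row in results:
    print(row)
print("conformant:", all(v == "supported" for _, _, v in results))
```

&lt;p&gt;Here &lt;code&gt;c1&lt;/code&gt; is supported and &lt;code&gt;c2&lt;/code&gt; is not, so the set isn't conformant. Swapping in a better checker changes the verdicts, not the structure, in line with BPR keeping checkers outside the standard.&lt;/p&gt;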

&lt;h2&gt;What BPR standardizes — and what it doesn't&lt;/h2&gt;

&lt;p&gt;This distinction is what separates a standard from a product:&lt;/p&gt;

&lt;p&gt;What the standard covers: the record schema, the serialization, the invariants, the conformance levels, and the claim semantics.&lt;/p&gt;

&lt;p&gt;What it leaves out (and you can replace): the validator, the checkers that produce verdicts, how the agent emits records in the pipeline, where the graph is stored, and the policy for who can attest.&lt;/p&gt;

&lt;p&gt;You can implement your own validator in Go, TypeScript, or any other language. If two independent implementations reach the same conformance verdict on the same examples, that's a standard, not a private API.&lt;/p&gt;
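&lt;p&gt;That criterion is easy to automate as a differential test. The Python sketch below runs two independently written toy validators for the provenance-or-reject rule over the same set of examples and lists any example where their verdicts differ. Both validators and the example format are hypothetical.&lt;/p&gt;

```python
# Differential conformance sketch: run two independently written validators over
# the same example set and report any example where their verdicts differ.
# Both validators and the example format are hypothetical.


def validator_a(records):
    """Loop-based check: every derived record cites only known ids."""
    ids = {r["id"] for r in records}
    for r in records:
        if r["provenance"] == "derived":
            parents = r.get("derived_from", [])
            if not parents or any(p not in ids for p in parents):
                return "reject"
    return "pass"


def validator_b(records):
    """Set-based check of the same rule, written separately."""
    ids = {r["id"] for r in records}
    derived = [r for r in records if r["provenance"] == "derived"]
    ok = all(r.get("derived_from") and set(r["derived_from"]).issubset(ids) for r in derived)
    return "pass" if ok else "reject"


examples = {
    "conformant": [
        {"id": "n1", "provenance": "authored"},
        {"id": "t1", "provenance": "derived", "derived_from": ["n1"]},
    ],
    "orphan": [{"id": "t1", "provenance": "derived", "derived_from": []}],
    "dangling": [{"id": "t1", "provenance": "derived", "derived_from": ["n404"]}],
}


def compare(examples, impl_a, impl_b):
    """Return the names of examples on which the two implementations disagree."""
    return [name for name, recs in examples.items() if impl_a(recs) != impl_b(recs)]


disagreements = compare(examples, validator_a, validator_b)
print("disagreements:", disagreements or "none")
```

&lt;p&gt;With these examples it prints &lt;code&gt;disagreements: none&lt;/code&gt;. A non-empty list would point at either a bug in one implementation or an ambiguity in the spec text.&lt;/p&gt;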

&lt;h2&gt;Honesty about the limits&lt;/h2&gt;

&lt;p&gt;Conformance level is a claim about structure, not correctness.&lt;/p&gt;

&lt;p&gt;"L3b conformant" means every behavioral claim is cited and attested as &lt;code&gt;supported&lt;/code&gt;. It doesn't mean the attestation is correct. It's not proof that nothing was invented.&lt;/p&gt;

&lt;p&gt;BPR guarantees attributability — that invented behavior can't be committed silently. A weak checker can still issue a wrong &lt;code&gt;supported&lt;/code&gt;. The quality of the checker is the responsibility of whoever chooses it, and is out of scope for a purpose-built standard.&lt;/p&gt;

&lt;p&gt;What BPR doesn't solve on its own: a malicious agent forging records, an incompetent checker, a bad original requirement, absence of organizational policy, tampering with files after the fact. These are solved through composition — record signing, attestation policy, human process — not by expanding the scope of the standard.&lt;/p&gt;
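
&lt;p&gt;Record signing, one of those compositions, can be layered on without touching the schema. A minimal sketch using an HMAC over a canonical JSON serialization, with only Python's standard library. The key and the record fields are placeholders, and a real deployment would more likely use asymmetric signatures so that verifiers never hold the signing key.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import hmac
import json

def canonical(record):
    # Stable byte form: sorted keys, no insignificant whitespace.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign(record, key):
    return hmac.new(key, canonical(record), hashlib.sha256).hexdigest()

def verify(record, signature, key):
    # compare_digest avoids leaking timing information.
    return hmac.compare_digest(sign(record, key), signature)

KEY = b"demo-key-not-for-production"  # placeholder key
record = {"id": "T-1", "kind": "test", "links": []}
sig = sign(record, KEY)
print(verify(record, sig, KEY))        # True
tampered = dict(record, kind="scenario")
print(verify(tampered, sig, KEY))      # False: edited after signing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;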

&lt;h2&gt;
  
  
  Current state and what's missing
&lt;/h2&gt;

&lt;p&gt;BPR is published as a preprint of an initial specification:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ JSON Schema (draft 2020-12)&lt;/li&gt;
&lt;li&gt;✅ Reference validator L0–L3c in Python (runs and rejects invalid input with a specific reason)&lt;/li&gt;
&lt;li&gt;✅ Conformant and deliberately broken examples (including laundering attempts)&lt;/li&gt;
&lt;li&gt;🔲 Versioning + staleness (v0.5) — when a requirement changes, which downstream nodes become stale?&lt;/li&gt;
&lt;li&gt;🔲 Anti-self-attestation at L3c — issuer ≠ verifier as a specifiable property&lt;/li&gt;
&lt;li&gt;🔲 Normative mapping to W3C PROV-O&lt;/li&gt;
&lt;/ul&gt;
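
&lt;p&gt;Staleness is still an open item. One plausible shape, offered only as a sketch and not as what v0.5 will specify: treat the records as a graph of downstream links and mark everything reachable from a changed requirement as stale. The graph below is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import deque

# Hypothetical downstream links: requirement to scenarios to tests.
DOWNSTREAM = {
    "N-1": ["S-1", "S-2"],
    "S-1": ["T-1"],
    "S-2": ["T-2", "T-3"],
    "N-2": ["S-3"],
    "S-3": ["T-4"],
}

def stale_after_change(changed, graph):
    """Breadth-first walk collecting every node downstream of `changed`."""
    stale, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale

print(sorted(stale_after_change("N-1", DOWNSTREAM)))
# ['S-1', 'S-2', 'T-1', 'T-2', 'T-3']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;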

&lt;p&gt;The line between a specification and a standard is an independent implementation. Today there's one — the reference one. Publishing this is the invitation for a second.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to test it right now
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/RudsonCarvalho/bpr.git
&lt;span class="nb"&gt;cd &lt;/span&gt;bpr
pip &lt;span class="nb"&gt;install &lt;/span&gt;jsonschema

&lt;span class="c"&gt;# base example — should pass at L2&lt;/span&gt;
python validator/validate.py examples/profile-retrieval.records.json &lt;span class="nt"&gt;--level&lt;/span&gt; L2

&lt;span class="c"&gt;# example with claims — should pass at L3b&lt;/span&gt;
python validator/validate.py examples/profile-retrieval.l3.records.json &lt;span class="nt"&gt;--level&lt;/span&gt; L3b

&lt;span class="c"&gt;# example with invention and laundering — should FAIL&lt;/span&gt;
python validator/validate.py examples/profile-retrieval.l3-broken.json &lt;span class="nt"&gt;--level&lt;/span&gt; L3b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We don't need to prove the AI didn't invent anything. We need any invention to be a named, localized, traceable failure — not a hallucination that shipped to production because the tests passed.&lt;/p&gt;

&lt;p&gt;BPR proposes exactly that: a minimal, neutral, verifiable contract. The immediate value is in L1/L2. The long-term promise is in L3.&lt;/p&gt;

&lt;p&gt;If you work with AI pipelines generating code: clone it, run the examples, try to break L3, send adversarial cases. That's what turns a specification into infrastructure.&lt;/p&gt;

&lt;p&gt;Specification + schema + validator: 👉 &lt;a href="https://github.com/RudsonCarvalho/bpr" rel="noopener noreferrer"&gt;github.com/RudsonCarvalho/bpr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Preprint with DOI: 👉 &lt;a href="https://doi.org/10.5281/zenodo.20710512" rel="noopener noreferrer"&gt;doi.org/10.5281/zenodo.20710512&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>softwareengineering</category>
      <category>devops</category>
    </item>
    <item>
      <title>Your agent skill was never loaded. And you have no way of knowing.</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Wed, 10 Jun 2026 03:48:55 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/your-agent-skill-was-never-loaded-and-you-have-no-way-of-knowing-1ekg</link>
      <guid>https://dev.to/rudsoncarvalho/your-agent-skill-was-never-loaded-and-you-have-no-way-of-knowing-1ekg</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/rudsoncarvalho/agent-skills-load-on-a-guess-and-cant-inherit-heres-the-fix-50ao" class="crayons-story__hidden-navigation-link"&gt;Agent skills load on a guess (and can't inherit). Here's the fix&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/rudsoncarvalho" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F151609%2F89dcc1f7-ea31-49e0-870e-83221de8c418.jpg" alt="rudsoncarvalho profile" class="crayons-avatar__image" width="800" height="830"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/rudsoncarvalho" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Rudson Kiyoshi Souza Carvalho
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Rudson Kiyoshi Souza Carvalho
                
              
              &lt;div id="story-author-preview-content-3862288" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/rudsoncarvalho" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F151609%2F89dcc1f7-ea31-49e0-870e-83221de8c418.jpg" class="crayons-avatar__image" alt="" width="800" height="830"&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Rudson Kiyoshi Souza Carvalho&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/rudsoncarvalho/agent-skills-load-on-a-guess-and-cant-inherit-heres-the-fix-50ao" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 10&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/rudsoncarvalho/agent-skills-load-on-a-guess-and-cant-inherit-heres-the-fix-50ao" id="article-link-3862288"&gt;
          Agent skills load on a guess (and can't inherit). Here's the fix
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/llm"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;llm&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/architecture"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;architecture&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/agents"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;agents&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
            &lt;a href="https://dev.to/rudsoncarvalho/agent-skills-load-on-a-guess-and-cant-inherit-heres-the-fix-50ao#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              &lt;span class="hidden s:inline"&gt;Add&amp;nbsp;Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            7 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Agent skills load on a guess (and can't inherit). Here's the fix</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Wed, 10 Jun 2026 03:37:00 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/agent-skills-load-on-a-guess-and-cant-inherit-heres-the-fix-50ao</link>
      <guid>https://dev.to/rudsoncarvalho/agent-skills-load-on-a-guess-and-cant-inherit-heres-the-fix-50ao</guid>
      <description>&lt;p&gt;Your agent skill was never loaded. And you have no way of knowing.&lt;/p&gt;

&lt;p&gt;Not "loaded the wrong version." Not "loaded late." Never loaded at all. The model read a one-line summary of it, decided it didn't need the details, and generated a confident, plausible, wrong artifact instead. No error. No log line. No stack trace. Just output that looks right and isn't.&lt;/p&gt;

&lt;p&gt;I kept running into this while building agents for regulated workflows, so I want to walk through &lt;em&gt;why&lt;/em&gt; it happens — it's structural, not a bug — and a small fix you can paste into your own stack today.&lt;/p&gt;

&lt;h2&gt;
  
  
  How skills actually load
&lt;/h2&gt;

&lt;p&gt;Most agent frameworks load skills the same way. The model never sees your skills up front. It sees a &lt;strong&gt;menu&lt;/strong&gt; — a list of names and short descriptions — and decides for itself, mid-task, whether to open any of them.&lt;/p&gt;

&lt;p&gt;Here's roughly the context the model wakes up to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AVAILABLE SKILLS
- rtm-format    : How to produce a requirements traceability matrix
- pii-redaction : Redact personal data before export
- audit-trail   : Log generated artifacts for compliance

TASK
Generate a requirements traceability matrix for the payments module.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, invisibly, it runs something like: &lt;em&gt;"Do I need to open &lt;code&gt;rtm-format&lt;/code&gt;? I already know what an RTM is — columns for requirements, sources, tests. I've got this."&lt;/em&gt; And it proceeds &lt;strong&gt;without ever opening the skill.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's the whole mechanism. It's a &lt;strong&gt;semantic trigger&lt;/strong&gt;: a probabilistic, model-driven &lt;em&gt;pull&lt;/em&gt;. The skill body only enters the context if the model first decides it's needed. There's no guarantee, and — this is the part that hurts in production — no observability. You can't tell from the output whether the skill fired.&lt;/p&gt;
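
&lt;p&gt;A toy model of that pull makes the failure visible. &lt;code&gt;wants_skill&lt;/code&gt; is a hypothetical stand-in for the model's own judgment; in a real framework that decision happens inside inference, not in code you control. Both runs complete without error, and only one of them ever puts the ID format in front of the model.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy skill registry: the model sees only the summary unless it pulls the body.
SKILLS = {
    "rtm-format": {
        "summary": "How to produce a requirements traceability matrix",
        "body": "Columns [ReqID, Source, Verification, Status]; "
                "ReqID format SYS-REQ-####; one row per requirement.",
    },
}

def build_context_pull(task, skills, wants_skill):
    # Up front, the model gets a menu of names and one-line summaries.
    menu = "\n".join(f"- {name}: {s['summary']}" for name, s in skills.items())
    context = f"AVAILABLE SKILLS\n{menu}\n\nTASK\n{task}"
    # Pull: a body enters the context only if the model decides it needs it.
    for name, s in skills.items():
        if wants_skill(name, s["summary"], task):
            context += f"\n\nSKILL {name}\n{s['body']}"
    return context

task = "Generate a requirements traceability matrix for the payments module."
confident = build_context_pull(task, SKILLS, lambda *args: False)  # "I've got this"
careful = build_context_pull(task, SKILLS, lambda *args: True)
print("SYS-REQ-####" in confident, "SYS-REQ-####" in careful)  # False True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;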

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgqtcorlqrb8r851zig6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgqtcorlqrb8r851zig6.png" alt="Pull vs push: how a skill reaches the model" width="800" height="531"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The TV-manual problem
&lt;/h2&gt;

&lt;p&gt;Think about a TV manual. Nobody opens it to turn the TV on — you already know how. You only reach for the manual when you &lt;em&gt;recognize&lt;/em&gt; you don't know something: pairing a soundbar, fixing some weird HDMI handshake.&lt;/p&gt;

&lt;p&gt;The whole system depends on one assumption: &lt;strong&gt;you know what you don't know.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An LLM breaks that assumption. It doesn't know what it doesn't know. It "knows" how to generate an RTM in the generic sense, so it never recognizes that it should open &lt;em&gt;your&lt;/em&gt; RTM skill — the one that says your column order is fixed, your IDs follow &lt;code&gt;SYS-REQ-####&lt;/code&gt;, and there's one row per requirement, no exceptions. From the model's point of view, it already knows how to turn on the TV. So it never opens the manual. And it hands you a perfectly formatted RTM that's wrong in every way that matters to your auditor.&lt;/p&gt;

&lt;p&gt;This is why the failure is structural. The model can only choose to load a skill &lt;em&gt;after&lt;/em&gt; recognizing it lacks the knowledge — and the cases where it's most confidently wrong are exactly the cases where it feels no need to check.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this quietly wrecks critical workflows
&lt;/h2&gt;

&lt;p&gt;A loud failure is a gift. A crash, a 500, a validation error — these tell you exactly where to look.&lt;/p&gt;

&lt;p&gt;The skipped-skill failure is the opposite. The agent produces a clean RTM with the wrong column order and non-conforming IDs. It passes a glance. It might pass review. It surfaces three weeks later when a compliance tool rejects the export or, worse, never surfaces at all because nobody catches it. The cost of a silent failure isn't the failure itself; it's the false confidence that travels with it.&lt;/p&gt;

&lt;p&gt;For ad-hoc help ("brainstorm some test cases"), probabilistic loading is fine, even good. For workflows where a specific artifact format is non-negotiable — regulated reporting, audit trails, anything with a downstream machine consumer — "the model will probably load the right skill" is not a foundation you want to stand on.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix: a Skill Resolver
&lt;/h2&gt;

&lt;p&gt;If a skill is &lt;em&gt;mandatory&lt;/em&gt; for a task type, the decision to load it shouldn't belong to the model at all. Make it a property of the pipeline.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Skill Resolver&lt;/strong&gt; is a tiny pre-dispatch step. It runs &lt;em&gt;before&lt;/em&gt; the LLM, looks at the task type, and injects the full body of every required skill straight into the context. No menu, no model discretion — push instead of pull.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;SKILL_STORE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtm-format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RTM SKILL: columns must be [ReqID, Source, Verification, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Status]; ReqID format SYS-REQ-####; one row per requirement.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audit-trail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIT SKILL: log every generated artifact with author, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp, and source skill version.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;REQUIRED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtm-format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audit-trail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;resolve_skills&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Runs BEFORE the LLM. Returns full skill bodies, not just summaries.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;REQUIRED&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;injected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;resolve_skills&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;required_skills&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;injected&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/required_skills&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Task: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_msg&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate an RTM for the payments module&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SKILL_STORE&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole idea. The key property isn't the line count — it's &lt;em&gt;where&lt;/em&gt; it runs. The injection happens outside the model's decision loop. By the time the LLM is called, &lt;code&gt;rtm-format&lt;/code&gt; is already in the context whether the model thought it needed it or not. The pull became a push.&lt;/p&gt;

&lt;p&gt;"Can't I just write it in &lt;code&gt;AGENTS.md&lt;/code&gt;?" You can, and it helps with prioritization — but it doesn't &lt;em&gt;guarantee&lt;/em&gt; anything. A line in &lt;code&gt;AGENTS.md&lt;/code&gt; is still an instruction the model interprets at inference time; it lives in the same probabilistic layer as the skill menu. The resolver lives one layer below inference, in code, where "always" actually means always.&lt;/p&gt;

&lt;h2&gt;
  
  
  Leveling up: skill inheritance for multinationals
&lt;/h2&gt;

&lt;p&gt;Now the real-world version. You're not running one agent; you're running one platform for a company with offices in a dozen countries. The audit format is global. The data-retention rules come from the EU's GDPR or Brazil's LGPD. The reporting template is set by each local central bank. And a single business unit has its own quirks on top.&lt;/p&gt;

&lt;p&gt;The naive answer is copy-and-modify: fork the global skill set per country, tweak as needed. That falls apart fast. The forks &lt;strong&gt;drift&lt;/strong&gt; — a fix to the global skill never reaches the copies. You lose &lt;strong&gt;lineage&lt;/strong&gt; — six months later nobody can say which rule came from HQ and which a local team invented. And every global change becomes &lt;strong&gt;N update points&lt;/strong&gt; instead of one.&lt;/p&gt;

&lt;p&gt;What you actually want is inheritance: a scope chain from global down to the business unit, where more specific scopes override less specific ones, &lt;em&gt;most-specific-wins&lt;/em&gt; — except for invariants that HQ locks and no local scope can touch. If you've ever debugged CSS specificity, this is the same cascade: the most specific rule wins, and &lt;code&gt;!important&lt;/code&gt; is your invariant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbn5a95o0y4qcap5obhrc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbn5a95o0y4qcap5obhrc.png" alt="Skill inheritance by scope, most-specific wins" width="800" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a resolver that walks that chain and keeps the lineage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;REGISTRY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;global&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtm-format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v2.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audit-trail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v4.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_invariant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audit-trail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;country:BR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtm-format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bu:BR/retail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtm-format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audit-trail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# tries to weaken
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scope_chain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;resolved&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lineage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scope_chain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;                 &lt;span class="c1"&gt;# walk least -&amp;gt; most specific
&lt;/span&gt;        &lt;span class="n"&gt;layer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
        &lt;span class="n"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;|=&lt;/span&gt; &lt;span class="n"&gt;layer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_invariant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;layer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;locked&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;resolved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;                       &lt;span class="c1"&gt;# invariant: can't be overridden
&lt;/span&gt;            &lt;span class="n"&gt;resolved&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;            &lt;span class="c1"&gt;# most-specific wins
&lt;/span&gt;            &lt;span class="n"&gt;lineage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scope&lt;/span&gt;               &lt;span class="c1"&gt;# who set it -&amp;gt; auditable
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resolved&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lineage&lt;/span&gt;

&lt;span class="n"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;global&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;country:BR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bu:BR/retail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;resolved&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lineage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;REGISTRY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;resolved&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; -&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;resolved&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (set by &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lineage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rtm-format   -&amp;gt; v1.0   (set by bu:BR/retail)
audit-trail  -&amp;gt; v4.0   (set by global)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;rtm-format&lt;/code&gt; cascades down to the business unit's &lt;code&gt;v1.0&lt;/code&gt;. But &lt;code&gt;audit-trail&lt;/code&gt; is locked global — the BU's attempt to swap in a weaker &lt;code&gt;v1.0&lt;/code&gt; is ignored, and the &lt;code&gt;lineage&lt;/code&gt; map tells you exactly which scope set each final value. One global change, one update point, full audit trail. That's &lt;strong&gt;Hierarchical Skill Resolution&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Microsoft APM fits
&lt;/h2&gt;

&lt;p&gt;None of this competes with &lt;a href="https://github.com/microsoft/apm" rel="noopener noreferrer"&gt;Microsoft's APM&lt;/a&gt; — it composes with it. There are three separate planes here, and it's worth keeping them straight:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfiofrvm92ji3eb0x053.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfiofrvm92ji3eb0x053.png" alt="Three planes: distribution, consumption, governance" width="799" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;APM is the &lt;strong&gt;distribution plane&lt;/strong&gt;: how skills are versioned, locked, and pulled from registries — the package-manager layer. The Skill Resolver is the &lt;strong&gt;consumption plane&lt;/strong&gt;: what's deterministically in the context when the model runs. HSR is the &lt;strong&gt;governance plane&lt;/strong&gt;: who controls which skill at which scope, and what can't be overridden. APM ships the &lt;code&gt;v1.0&lt;/code&gt;; the resolver guarantees it's actually present at inference; HSR decides that the BU was allowed to set it in the first place. None replaces the others.&lt;/p&gt;

&lt;p&gt;If this maps to a problem you're staring at, I opened a discussion in the APM community to push on the governance side — &lt;a href="https://github.com/microsoft/apm/discussions/1722" rel="noopener noreferrer"&gt;discussion #1722&lt;/a&gt;. Feedback and counter-arguments welcome.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest limitations
&lt;/h2&gt;

&lt;p&gt;Injection guarantees what &lt;em&gt;enters&lt;/em&gt; the context. It does not guarantee what the model &lt;em&gt;does&lt;/em&gt; with it. Stuff a skill into a 100k-token prompt and "lost in the middle" still applies — present isn't the same as attended-to. There's a token cost, too; injecting every required skill on every call adds up, so scope your &lt;code&gt;REQUIRED&lt;/code&gt; map tightly. And for genuinely open-ended, ad-hoc assistance, don't bother — probabilistic loading is the right tool there. The resolver earns its keep specifically where an output format is non-negotiable.&lt;/p&gt;
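&lt;p&gt;To put a number on that token cost, here is a minimal sketch comparing two policies: injecting every registered skill on every call, and injecting only what a task's &lt;code&gt;REQUIRED&lt;/code&gt; entry lists. It assumes &lt;code&gt;REQUIRED&lt;/code&gt; maps a task type to skill names. The skills other than &lt;code&gt;rtm-format&lt;/code&gt; and &lt;code&gt;audit-trail&lt;/code&gt;, the token sizes, and the workload are hypothetical placeholders, not measurements.&lt;/p&gt;

```python
# Hypothetical skill sizes in tokens; replace with counts from your own tokenizer.
SKILL_TOKENS = {
    "rtm-format": 1200,
    "audit-trail": 800,
    "brand-voice": 1500,
    "sql-style": 900,
}

# Hypothetical task-to-skills map: each task lists only the skills it truly needs.
REQUIRED = {
    "rtm_report": ["rtm-format", "audit-trail"],
    "marketing_copy": ["brand-voice"],
}


def injected_tokens(task, scoped=True):
    """Tokens added to one call: the task's required skills, or every skill."""
    skills = REQUIRED.get(task, []) if scoped else list(SKILL_TOKENS)
    return sum(SKILL_TOKENS[s] for s in skills)


def daily_overhead(calls_per_task, scoped=True):
    """Total injected tokens for a {task: number_of_calls} workload."""
    return sum(n * injected_tokens(t, scoped) for t, n in calls_per_task.items())


workload = {"rtm_report": 1000, "marketing_copy": 200}
print(daily_overhead(workload, scoped=True))   # tightly scoped REQUIRED map
print(daily_overhead(workload, scoped=False))  # every skill on every call
```

&lt;p&gt;With these placeholder numbers the scoped map injects 2.3M tokens a day instead of 5.28M. The point is the shape of the calculation, so plug in sizes measured with your own tokenizer.&lt;/p&gt;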

&lt;h2&gt;
  
  
  Wrap-up
&lt;/h2&gt;

&lt;p&gt;Semantic skill loading is a &lt;em&gt;pull&lt;/em&gt;, and the model decides whether to pull. That's perfect for exploration and quietly dangerous for compliance, because the model can't recognize a gap it doesn't know it has. A Skill Resolver flips it to a &lt;em&gt;push&lt;/em&gt; — moving the load decision out of the model and into your pipeline, in about fifteen lines. Add scope inheritance and you get governance for an org of any size, with lineage you can hand to an auditor.&lt;/p&gt;

&lt;p&gt;If you want to go deeper, the full write-up is in the paper, &lt;em&gt;Hierarchical Skill Resolution: Enabling Skill Inheritance and Deterministic Knowledge Injection for AI Agents&lt;/em&gt; (&lt;a href="https://doi.org/10.5281/zenodo.20619456" rel="noopener noreferrer"&gt;DOI: 10.5281/zenodo.20619456&lt;/a&gt;), and the governance discussion is over at &lt;a href="https://github.com/microsoft/apm/discussions/1722" rel="noopener noreferrer"&gt;APM #1722&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;What's the worst silent skill-skip you've shipped? I'd love to hear it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>architecture</category>
      <category>agents</category>
    </item>
    <item>
      <title>TERSE Tool Catalog (TTC): Cut Tool Catalog Token Usage by 66.6% in Your AI Agents</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Tue, 05 May 2026 15:14:53 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/terse-tool-catalog-ttc-cut-tool-catalog-token-usage-by-666-in-your-ai-agents-2i9n</link>
      <guid>https://dev.to/rudsoncarvalho/terse-tool-catalog-ttc-cut-tool-catalog-token-usage-by-666-in-your-ai-agents-2i9n</guid>
      <description>&lt;p&gt;If you’ve ever built or worked with &lt;strong&gt;AI agents&lt;/strong&gt; that use tools via the Model Context Protocol (MCP), you’ve probably felt the pain that nobody talks about out loud:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The tool catalog is eating your entire context window and budget.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A single tool defined in MCP JSON Schema typically consumes &lt;strong&gt;100–270 tokens&lt;/strong&gt;. With 50 tools installed, you’re already spending &lt;strong&gt;5,000–13,500 tokens&lt;/strong&gt; &lt;em&gt;before the user even writes their first message&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This isn’t just expensive — it actively hurts performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Higher cost on every single request&lt;/li&gt;
&lt;li&gt;Lower tool-selection accuracy as the catalog grows (attention dilution)&lt;/li&gt;
&lt;li&gt;Less room for actual user instructions, memory, or reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The good news? There’s a clean, elegant solution: &lt;strong&gt;TERSE Tool Catalog (TTC)&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Today’s MCP JSON Schema
&lt;/h2&gt;

&lt;p&gt;The current MCP format was designed for &lt;strong&gt;machine-to-machine execution contracts&lt;/strong&gt;, not for &lt;strong&gt;LLM reasoning&lt;/strong&gt;. As a result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There is &lt;strong&gt;no explicit trigger condition&lt;/strong&gt; (&lt;code&gt;WHEN&lt;/code&gt;) — the LLM has to guess from a free-form &lt;code&gt;description&lt;/code&gt; string.&lt;/li&gt;
&lt;li&gt;There is &lt;strong&gt;no error contract&lt;/strong&gt; (&lt;code&gt;ERR&lt;/code&gt;) — the model has no idea what to do when a tool fails.&lt;/li&gt;
&lt;li&gt;There is &lt;strong&gt;no retrieval taxonomy&lt;/strong&gt; (&lt;code&gt;TAGS&lt;/code&gt;) — dynamic tool retrieval (RAG over tools) becomes painful.&lt;/li&gt;
&lt;li&gt;Verbose parameter descriptions add noise with almost zero signal for the LLM.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: high cost and mediocre tool selection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing the TERSE Tool Catalog (TTC)
&lt;/h2&gt;

&lt;p&gt;TTC is an official &lt;strong&gt;extension of the TERSE Format&lt;/strong&gt; — a specification for dense, deterministic, human-and-machine-readable representations optimized for LLMs.&lt;/p&gt;

&lt;p&gt;It is &lt;strong&gt;not&lt;/strong&gt; just a compression of MCP JSON. It is a &lt;strong&gt;semantic reformulation&lt;/strong&gt; of the tool contract.&lt;/p&gt;

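&lt;p&gt;TTC keeps everything the LLM actually needs for execution and organizes the contract into four fields: &lt;code&gt;PURPOSE&lt;/code&gt; condenses MCP's free-form description, and the other three are fields MCP is missing entirely:&lt;/p&gt;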

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PURPOSE&lt;/code&gt; — clear one-line intent&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WHEN&lt;/code&gt; — explicit semantic trigger (the most important field for selection)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ERR&lt;/code&gt; — declared failure modes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TAGS&lt;/code&gt; — taxonomy for semantic grouping and retrieval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Measured result&lt;/strong&gt;: &lt;strong&gt;66.6% token reduction&lt;/strong&gt; across a 10-tool benchmark, &lt;em&gt;with net information gain&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  TTC Syntax — Clean and Simple
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOOL &amp;lt;tool-id&amp;gt;
  PURPOSE: &amp;lt;one-line description of what the tool does&amp;gt;
  IN: &amp;lt;param1&amp;gt;:&amp;lt;type&amp;gt;, &amp;lt;param2&amp;gt;:&amp;lt;type&amp;gt;?
  OUT: &amp;lt;return-type&amp;gt;
  ERR: &amp;lt;error1&amp;gt; | &amp;lt;error2&amp;gt; | &amp;lt;error3&amp;gt;
  WHEN: &amp;lt;natural language trigger condition&amp;gt;
  TAGS: &amp;lt;tag1&amp;gt;, &amp;lt;tag2&amp;gt;, &amp;lt;tag3&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
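&lt;p&gt;A small parser makes the grammar concrete. This is a minimal sketch that assumes exactly the layout above: a &lt;code&gt;TOOL&lt;/code&gt; line, then indented &lt;code&gt;KEY: value&lt;/code&gt; lines. &lt;code&gt;IN&lt;/code&gt; entries are comma-separated &lt;code&gt;name:type&lt;/code&gt; pairs with a trailing &lt;code&gt;?&lt;/code&gt; for optional ones, and &lt;code&gt;ERR&lt;/code&gt; alternatives are separated by &lt;code&gt;|&lt;/code&gt;. It is an illustration, not the reference implementation from the TERSE repository.&lt;/p&gt;

```python
def parse_ttc(text):
    """Parse TTC TOOL blocks into dicts (a sketch of the grammar shown above)."""
    tools = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("TOOL "):
            # Each TOOL line opens a new block; the tool id follows the keyword.
            current = {"id": line[len("TOOL "):].strip()}
            tools.append(current)
            continue
        if current is None:
            raise ValueError("field line before any TOOL line: " + line)
        # Split on the first colon only, so IN keeps its name:type pairs intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "IN":
            params = []
            for item in (p.strip() for p in value.split(",")):
                if not item:
                    continue
                name, _, ptype = item.partition(":")
                ptype = ptype.strip()
                params.append({
                    "name": name.strip(),
                    "type": ptype.rstrip("?"),
                    "optional": ptype.endswith("?"),
                })
            current["IN"] = params
        elif key == "ERR":
            current["ERR"] = [e.strip() for e in value.split("|") if e.strip()]
        elif key == "TAGS":
            current["TAGS"] = [t.strip() for t in value.split(",") if t.strip()]
        else:
            # PURPOSE, OUT and WHEN stay plain strings.
            current[key] = value
    return tools


SAMPLE = """TOOL gmail_send_email
  PURPOSE: send email via Gmail
  IN: to:string, subject:string, body:string, cc:string?
  OUT: message_id:string
  ERR: auth_failed | quota_exceeded | invalid_recipient
  WHEN: user wants to send or compose an email
  TAGS: gmail, email, communication"""

print(parse_ttc(SAMPLE)[0]["IN"][-1])
```

&lt;p&gt;Keeping &lt;code&gt;ERR&lt;/code&gt; and &lt;code&gt;TAGS&lt;/code&gt; as lists, rather than raw strings, is what lets an agent later match on a specific failure or filter tools by tag.&lt;/p&gt;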



&lt;h3&gt;
  
  
  Supported Types
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;string&lt;/code&gt;, &lt;code&gt;int&lt;/code&gt;, &lt;code&gt;float&lt;/code&gt;, &lt;code&gt;bool&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;array[string]&lt;/code&gt;, &lt;code&gt;array[int]&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;object&lt;/code&gt;, &lt;code&gt;any&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;?&lt;/code&gt; suffix marks an optional parameter.&lt;/p&gt;
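&lt;p&gt;The type list is small enough to validate mechanically. A minimal sketch, assuming that &lt;code&gt;array[...]&lt;/code&gt; may wrap any other supported type (including another array) and that &lt;code&gt;?&lt;/code&gt; is only valid as a trailing suffix on a parameter:&lt;/p&gt;

```python
import re

SCALAR_TYPES = {"string", "int", "float", "bool", "object", "any"}
# array[T], where T is itself a supported type (nesting is an assumption).
ARRAY_TYPE = re.compile(r"array\[(.+)\]")


def is_supported_type(t):
    """True if t is one of the TTC types listed above."""
    if t in SCALAR_TYPES:
        return True
    m = ARRAY_TYPE.fullmatch(t)
    return bool(m) and is_supported_type(m.group(1))


def check_param(spec):
    """Validate one IN entry such as 'cc:string?'; returns (name, type, optional)."""
    name, sep, ptype = spec.partition(":")
    if not sep or not name:
        raise ValueError("expected name:type, got " + repr(spec))
    optional = ptype.endswith("?")
    base = ptype[:-1] if optional else ptype
    if not is_supported_type(base):
        raise ValueError("unsupported type " + repr(base) + " in " + repr(spec))
    return name, base, optional


print(check_param("cc:string?"))
```

&lt;p&gt;A JSON Schema spelling such as &lt;code&gt;limit:integer&lt;/code&gt; is rejected, which catches converters that forget to map &lt;code&gt;integer&lt;/code&gt; to &lt;code&gt;int&lt;/code&gt;.&lt;/p&gt;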

&lt;h2&gt;
  
  
  Real-World Example: &lt;code&gt;gmail_send_email&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;MCP JSON Schema (208 tokens):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gmail_send_email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sends an email message via the Gmail API to one or more recipients..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input_schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;very&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;verbose&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TTC (55 tokens):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOOL gmail_send_email
  PURPOSE: send email via Gmail
  IN: to:string, subject:string, body:string, cc:string?
  OUT: message_id:string
  ERR: auth_failed | quota_exceeded | invalid_recipient
  WHEN: user wants to send or compose an email
  TAGS: gmail, email, communication
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



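&lt;p&gt;&lt;strong&gt;Same parameters and intent, 73.6% fewer tokens.&lt;/strong&gt; And the LLM now has structured fields (&lt;code&gt;WHEN&lt;/code&gt;, &lt;code&gt;ERR&lt;/code&gt;, &lt;code&gt;TAGS&lt;/code&gt;) to make better-informed decisions.&lt;/p&gt;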

&lt;h2&gt;
  
  
  Real Benchmark (10 Production Tools)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;JSON Schema&lt;/th&gt;
&lt;th&gt;TTC&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gmail_send_email&lt;/td&gt;
&lt;td&gt;208&lt;/td&gt;
&lt;td&gt;55&lt;/td&gt;
&lt;td&gt;73.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gmail_read_inbox&lt;/td&gt;
&lt;td&gt;121&lt;/td&gt;
&lt;td&gt;52&lt;/td&gt;
&lt;td&gt;57.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;drive_list_files&lt;/td&gt;
&lt;td&gt;141&lt;/td&gt;
&lt;td&gt;53&lt;/td&gt;
&lt;td&gt;62.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;calendar_create_event&lt;/td&gt;
&lt;td&gt;262&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;td&gt;70.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;slack_send_message&lt;/td&gt;
&lt;td&gt;206&lt;/td&gt;
&lt;td&gt;69&lt;/td&gt;
&lt;td&gt;66.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;github_create_issue&lt;/td&gt;
&lt;td&gt;269&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;td&gt;68.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;td&gt;...&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOTAL (10 tools)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1948&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;650&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;66.6%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
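&lt;p&gt;The Reduction column is simply &lt;code&gt;(json - ttc) / json&lt;/code&gt;. A quick sketch that recomputes it from the rows shown above; the elided rows are left out, and the total row uses the published 10-tool sums:&lt;/p&gt;

```python
# (JSON Schema tokens, TTC tokens), copied from the table above.
ROWS = {
    "gmail_send_email": (208, 55),
    "gmail_read_inbox": (121, 52),
    "drive_list_files": (141, 53),
    "calendar_create_event": (262, 78),
    "slack_send_message": (206, 69),
    "github_create_issue": (269, 84),
    "TOTAL (10 tools)": (1948, 650),
}


def reduction_pct(json_tokens, ttc_tokens):
    """Percentage of tokens saved, rounded to one decimal place."""
    return round(100 * (json_tokens - ttc_tokens) / json_tokens, 1)


for tool, (j, t) in ROWS.items():
    print(f"{tool:22} {reduction_pct(j, t):5.1f}%")
```

&lt;p&gt;Note that the 66.6% total is the ratio of the summed token counts, not the mean of the per-tool percentages.&lt;/p&gt;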

&lt;p&gt;&lt;strong&gt;Projection at scale&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;50 tools: ~9,740 tokens (JSON Schema) → ~3,250 tokens (TTC)&lt;/li&gt;
&lt;li&gt;100 tools: ~19,480 tokens (JSON Schema) → ~6,500 tokens (TTC), &lt;strong&gt;saving ~13,000 tokens per request&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why TTC Works So Well
&lt;/h2&gt;

&lt;p&gt;It follows the core TERSE principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maximum information density per token&lt;/li&gt;
&lt;li&gt;Determinism (same input → same output)&lt;/li&gt;
&lt;li&gt;Human + machine readability&lt;/li&gt;
&lt;li&gt;Full composability (tools → servers → agent context)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it adds exactly what LLMs need for better reasoning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;WHEN&lt;/code&gt; becomes the primary discriminator for tool selection&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ERR&lt;/code&gt; enables graceful degradation and fallback strategies&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TAGS&lt;/code&gt; makes dynamic tool retrieval (RAG over tools) trivial&lt;/li&gt;
&lt;/ul&gt;
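&lt;p&gt;As an illustration of the &lt;code&gt;ERR&lt;/code&gt; point, here is a hypothetical agent-side sketch. Because the failure modes are enumerated up front, the agent can map each one to a recovery step and detect unhandled ones before any call fails. The recovery action names are made up for this example; TTC itself only declares the error names.&lt;/p&gt;

```python
# Declared by the tool (from its TTC ERR line).
DECLARED_ERR = ["auth_failed", "quota_exceeded", "invalid_recipient"]

# Hypothetical agent-side policy: what to do for each declared failure mode.
RECOVERY = {
    "auth_failed": "ask_user_to_reauthenticate",
    "quota_exceeded": "retry_later",
    "invalid_recipient": "ask_user_to_correct_input",
}


def plan_recovery(error_code, declared=DECLARED_ERR, policy=RECOVERY):
    """Choose a recovery action; errors outside the contract are escalated."""
    if error_code in declared and error_code in policy:
        return policy[error_code]
    return "escalate_to_human"


def uncovered(declared=DECLARED_ERR, policy=RECOVERY):
    """Declared failure modes the policy does not handle yet."""
    return [e for e in declared if e not in policy]


print(plan_recovery("quota_exceeded"))
print(uncovered())
```

&lt;p&gt;Running &lt;code&gt;uncovered&lt;/code&gt; at startup turns a missing handler into a configuration error instead of a surprise at runtime.&lt;/p&gt;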

&lt;h2&gt;
  
  
  How to Use It in Your Agent Context
&lt;/h2&gt;

&lt;p&gt;At the start of a conversation (or via dynamic retrieval), you inject:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOOLS v1.0 [3/47]
  MCP gmail v1.2
    TOOL gmail_send_email
      ...
  MCP google_drive v2.0
    TOOL drive_read_file
      ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With semantic tool retrieval, you inject only the 3–5 most relevant tools per request, so the catalog's context cost per request stays roughly constant no matter how large your total catalog grows.&lt;/p&gt;
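&lt;p&gt;Here is a sketch of assembling such a block from a selected subset, grouped by MCP server. It assumes, as a reading of the example rather than a rule from the spec, that the header's &lt;code&gt;[3/47]&lt;/code&gt; means "tools injected / tools in the full catalog". The dict keys and the &lt;code&gt;drive_read_file&lt;/code&gt; purpose text are hypothetical.&lt;/p&gt;

```python
def render_catalog(selected, catalog_size, version="1.0"):
    """Render selected tools as a TOOLS block, grouped by MCP server.

    Each tool is a dict with hypothetical keys: 'server', 'server_version',
    'id' and 'fields' (a list of (KEY, value) pairs in TTC order).
    """
    # Group by (server, version); dicts keep first-seen insertion order.
    groups = {}
    for tool in selected:
        groups.setdefault((tool["server"], tool["server_version"]), []).append(tool)

    lines = [f"TOOLS v{version} [{len(selected)}/{catalog_size}]"]
    for (server, server_version), tools in groups.items():
        lines.append(f"  MCP {server} v{server_version}")
        for tool in tools:
            lines.append(f"    TOOL {tool['id']}")
            for field, value in tool["fields"]:
                lines.append(f"      {field}: {value}")
    return "\n".join(lines)


selected = [
    {"server": "gmail", "server_version": "1.2", "id": "gmail_send_email",
     "fields": [("PURPOSE", "send email via Gmail"),
                ("WHEN", "user wants to send or compose an email")]},
    {"server": "google_drive", "server_version": "2.0", "id": "drive_read_file",
     "fields": [("PURPOSE", "read a file from Google Drive")]},
]
print(render_catalog(selected, catalog_size=47))
```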

&lt;h2&gt;
  
  
  Reference Converter (Python)
&lt;/h2&gt;

&lt;p&gt;The author provides a ready-to-use reference implementation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/RudsonCarvalho/terse-format" rel="noopener noreferrer"&gt;github.com/RudsonCarvalho/terse-format&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It converts MCP JSON Schema → TTC with sensible defaults. For production use, you simply add explicit annotations for &lt;code&gt;OUT&lt;/code&gt;, &lt;code&gt;ERR&lt;/code&gt;, &lt;code&gt;WHEN&lt;/code&gt;, and &lt;code&gt;TAGS&lt;/code&gt; on the server side.&lt;/p&gt;
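&lt;p&gt;To show the shape of such a conversion, here is a minimal, hypothetical sketch (not the repository's converter). It maps JSON Schema property types to TTC types, marks properties missing from &lt;code&gt;required&lt;/code&gt; with &lt;code&gt;?&lt;/code&gt;, and defaults &lt;code&gt;PURPOSE&lt;/code&gt; to the first sentence of &lt;code&gt;description&lt;/code&gt;. &lt;code&gt;OUT&lt;/code&gt;, &lt;code&gt;ERR&lt;/code&gt;, &lt;code&gt;WHEN&lt;/code&gt; and &lt;code&gt;TAGS&lt;/code&gt; cannot be derived from MCP JSON, so they come from explicit annotations. The MCP specification names the schema field &lt;code&gt;inputSchema&lt;/code&gt;, while the examples here use &lt;code&gt;input_schema&lt;/code&gt;, so the sketch accepts either.&lt;/p&gt;

```python
# JSON Schema "type" values mapped to TTC types; anything else becomes "any".
JSON_TO_TTC = {"string": "string", "integer": "int", "number": "float",
               "boolean": "bool", "object": "object"}


def ttc_type(prop):
    """Map one JSON Schema property to a TTC type name."""
    t = prop.get("type")
    if t == "array":
        items = prop.get("items")
        return f"array[{ttc_type(items) if isinstance(items, dict) else 'any'}]"
    if not isinstance(t, str):
        return "any"  # missing type, or a list of types such as ["string", "null"]
    return JSON_TO_TTC.get(t, "any")


def mcp_to_ttc(tool, annotations=None):
    """Convert an MCP tool definition (dict) into a TTC block string."""
    annotations = annotations or {}
    schema = tool.get("inputSchema") or tool.get("input_schema") or {}
    required = set(schema.get("required", []))
    params = []
    for name, prop in schema.get("properties", {}).items():
        suffix = "" if name in required else "?"
        params.append(f"{name}:{ttc_type(prop)}{suffix}")
    # Default PURPOSE: first sentence of the free-form description.
    default_purpose = tool.get("description", "").split(". ")[0].rstrip(".")
    lines = [
        f"TOOL {tool['name']}",
        f"  PURPOSE: {annotations.get('PURPOSE', default_purpose)}",
        f"  IN: {', '.join(params)}",
    ]
    # These fields are not present in MCP JSON; they must be annotated explicitly.
    for key in ("OUT", "ERR", "WHEN", "TAGS"):
        if key in annotations:
            lines.append(f"  {key}: {annotations[key]}")
    return "\n".join(lines)


gmail = {
    "name": "gmail_send_email",
    "description": "Sends an email message via the Gmail API to one or more recipients. "
                   "Use this tool when the user explicitly requests to send an email.",
    "input_schema": {
        "type": "object",
        "required": ["to", "subject", "body"],
        "properties": {"to": {"type": "string"}, "subject": {"type": "string"},
                       "body": {"type": "string"}, "cc": {"type": "string"}},
    },
}
ANNOTATIONS = {
    "PURPOSE": "send email via Gmail",
    "OUT": "message_id:string",
    "ERR": "auth_failed | quota_exceeded | invalid_recipient",
    "WHEN": "user wants to send or compose an email",
    "TAGS": "gmail, email, communication",
}
print(mcp_to_ttc(gmail, ANNOTATIONS))
```

&lt;p&gt;Without annotations, the sketch still emits &lt;code&gt;TOOL&lt;/code&gt;, &lt;code&gt;PURPOSE&lt;/code&gt; and &lt;code&gt;IN&lt;/code&gt;; the annotated version reproduces the &lt;code&gt;gmail_send_email&lt;/code&gt; block shown earlier.&lt;/p&gt;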

&lt;h2&gt;
  
  
  Planned Future Extensions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;EXAMPLE&lt;/code&gt; block — input/output examples for few-shot learning&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;COST&lt;/code&gt; annotation — estimated token/latency cost per call&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CHAIN&lt;/code&gt; annotation — tool dependencies and composition patterns&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ALIAS&lt;/code&gt; field — alternative trigger phrases&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AUTH&lt;/code&gt; annotation — required OAuth scopes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The TERSE Tool Catalog is not just a token-saving trick. It is a &lt;strong&gt;genuine improvement in agent quality&lt;/strong&gt; — better tool selection, better error handling, and native support for semantic tool retrieval.&lt;/p&gt;

&lt;p&gt;If you work with agents, MCP, LangGraph, CrewAI, AutoGen, or any modern agentic framework, TTC is worth trying today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;p&gt;📄 Full spec (Zenodo): &lt;a href="https://doi.org/10.5281/zenodo.19869007" rel="noopener noreferrer"&gt;https://doi.org/10.5281/zenodo.19869007&lt;/a&gt;&lt;br&gt;&lt;br&gt;
💻 GitHub: &lt;a href="https://github.com/RudsonCarvalho/terse-format/tree/main/extensions/ttc" rel="noopener noreferrer"&gt;https://github.com/RudsonCarvalho/terse-format/tree/main/extensions/ttc&lt;/a&gt;&lt;br&gt;&lt;br&gt;
🌐 Landing page: &lt;a href="https://rudsoncarvalho.github.io/terse-format/" rel="noopener noreferrer"&gt;https://rudsoncarvalho.github.io/terse-format/&lt;/a&gt;&lt;br&gt;&lt;br&gt;
📦 TERSE Format (parent spec): &lt;a href="https://doi.org/10.5281/zenodo.19058364" rel="noopener noreferrer"&gt;https://doi.org/10.5281/zenodo.19058364&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>mcp</category>
      <category>token</category>
      <category>terse</category>
    </item>
    <item>
      <title>Your AI agent wastes 13,000 tokens before saying "hello"</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Wed, 29 Apr 2026 01:22:37 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/your-ai-agent-wastes-13000-tokens-before-saying-hello-3141</link>
      <guid>https://dev.to/rudsoncarvalho/your-ai-agent-wastes-13000-tokens-before-saying-hello-3141</guid>
      <description>&lt;p&gt;And you probably have no idea.&lt;/p&gt;




&lt;p&gt;If you have an agent with 50 MCP tools installed, here's what happens before any user message is processed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gmail_send_email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sends an email message via the Gmail API to one or more recipients. Use this tool when the user explicitly requests to send, compose and send, or deliver an email message to someone."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input_schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The recipient email address or comma-separated list"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The subject line of the email"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The body content of the email in plain text or HTML"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's &lt;strong&gt;~195 tokens&lt;/strong&gt;. Per tool. Before anything else.&lt;/p&gt;

&lt;p&gt;50 tools × 195 tokens = &lt;strong&gt;9,750 tokens of pure overhead&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And that's just the catalog. You haven't touched user context, conversation history, documents, or anything useful yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  "But there's prompt caching, right?"
&lt;/h2&gt;

&lt;p&gt;Yes. Depending on the provider, cached input tokens can be billed at roughly 10% of the base input rate.&lt;/p&gt;

&lt;p&gt;But caching &lt;strong&gt;does not reduce attention cost&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Those tokens still occupy the context window. The model still attends to all of them on every request. And if you use dynamic tool retrieval — selecting different tools per request based on user intent — the cache breaks on every different selection.&lt;/p&gt;

&lt;p&gt;The bill doesn't disappear. It just gets cheaper.&lt;/p&gt;
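&lt;p&gt;A simplified model shows why per-request tool selection defeats prefix caching: a provider that caches by exact prompt prefix can only reuse the cache when the prefix bytes match. This sketch uses a hash of the rendered prefix as a stand-in for the cache key; real providers' rules (minimum lengths, breakpoints, expiry) differ and are not modeled.&lt;/p&gt;

```python
import hashlib


def prefix_key(system_prompt, tool_blocks):
    """Stand-in cache key: hash of the exact prompt prefix (system + tools)."""
    prefix = system_prompt + "\n" + "\n".join(tool_blocks)
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


SYSTEM = "You are a helpful agent."
static = ["TOOL gmail_send_email", "TOOL drive_list_files", "TOOL calendar_create_event"]

# Same static catalog on every request: identical key, so the prefix is reusable.
k1 = prefix_key(SYSTEM, static)
k2 = prefix_key(SYSTEM, static)

# Dynamic retrieval picks a different subset per request: the key changes.
k3 = prefix_key(SYSTEM, ["TOOL gmail_send_email"])
k4 = prefix_key(SYSTEM, ["TOOL drive_list_files", "TOOL calendar_create_event"])

print(k1 == k2, k3 == k4)
```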




&lt;h2&gt;
  
  
  The real problem nobody talks about
&lt;/h2&gt;

&lt;p&gt;MCP JSON Schema was designed as a tool execution contract. Not as a semantic tool selection contract.&lt;/p&gt;

&lt;p&gt;The result: information critical for LLM reasoning is either absent or buried in free-form text:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No error contract&lt;/strong&gt; — the LLM doesn't know what to do when a call fails with &lt;code&gt;auth_failed&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No explicit trigger&lt;/strong&gt; — it has to infer "when to use this tool" from a paragraph of description&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No retrieval taxonomy&lt;/strong&gt; — no standard way to group or filter tools by domain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Verbose AND semantically incomplete. The worst of both worlds.&lt;/p&gt;




&lt;h2&gt;
  
  
  TTC — TERSE Tool Catalog
&lt;/h2&gt;

&lt;p&gt;I spent the last few weeks solving this problem. The result is an extension of the &lt;a href="https://github.com/RudsonCarvalho/terse-format" rel="noopener noreferrer"&gt;TERSE Format&lt;/a&gt; called &lt;strong&gt;TTC — TERSE Tool Catalog&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The same tool above in TTC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;TOOL gmail_send_email&lt;/span&gt;
  &lt;span class="s"&gt;PURPOSE&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;send email via Gmail&lt;/span&gt;
  &lt;span class="s"&gt;IN&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;to:string, subject:string, body:string, cc:string?&lt;/span&gt;
  &lt;span class="s"&gt;OUT&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;message_id:string&lt;/span&gt;
  &lt;span class="s"&gt;ERR&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;auth_failed | quota_exceeded | invalid_recipient&lt;/span&gt;
  &lt;span class="s"&gt;WHEN&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user wants to send or compose an email&lt;/span&gt;
  &lt;span class="s"&gt;TAGS&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gmail, email, communication&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;~55 tokens: a 73.6% reduction against the 208 tokens measured for this tool in the benchmark below.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And notice what was &lt;em&gt;added&lt;/em&gt;, not just removed:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;MCP JSON&lt;/th&gt;
&lt;th&gt;TTC&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ERR — failure contract&lt;/td&gt;
&lt;td&gt;❌ absent&lt;/td&gt;
&lt;td&gt;✅ explicit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WHEN — selection trigger&lt;/td&gt;
&lt;td&gt;❌ buried&lt;/td&gt;
&lt;td&gt;✅ explicit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TAGS — retrieval taxonomy&lt;/td&gt;
&lt;td&gt;❌ absent&lt;/td&gt;
&lt;td&gt;✅ explicit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  It's not compression. It's reallocation.
&lt;/h2&gt;

&lt;p&gt;This is the most important point in the spec:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TTC does not reduce tokens by removing semantic content. It reduces syntactic and documentary overhead from JSON Schema — which serves human readability, not LLM reasoning — and reinvests part of those savings into explicit tool-selection semantics.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The actual math:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP JSON Schema:         ~195 tokens per tool
TTC without new fields:   ~35 tokens
TTC with all fields:      ~65 tokens

The 30-token "reinvestment" buys:
  ERR  → failure contract (absent from MCP)
  WHEN → selection trigger (buried in MCP's free-form description)
  TAGS → retrieval taxonomy (absent from MCP)

Result: ~195 → ~65 tokens per tool, -66.6% measured across 10 tools.
But those 65 tokens carry higher reasoning signal
than the original 195.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is &lt;strong&gt;net reasoning-signal gain&lt;/strong&gt; — not information gain in the classical sense. A critic might say you removed content (parameter descriptions, JSON Schema constraints). Correct. Content that serves human documentation, not LLM inference.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real benchmark — 10 measured tools
&lt;/h2&gt;

&lt;p&gt;Measured with BPE tokenizer (cl100k_base) on 10 real MCP tool definitions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;JSON Schema&lt;/th&gt;
&lt;th&gt;TTC&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gmail_send_email&lt;/td&gt;
&lt;td&gt;208&lt;/td&gt;
&lt;td&gt;55&lt;/td&gt;
&lt;td&gt;73.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;calendar_create_event&lt;/td&gt;
&lt;td&gt;262&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;td&gt;70.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;github_create_issue&lt;/td&gt;
&lt;td&gt;269&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;td&gt;68.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;jira_create_ticket&lt;/td&gt;
&lt;td&gt;254&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;td&gt;69.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;slack_send_message&lt;/td&gt;
&lt;td&gt;206&lt;/td&gt;
&lt;td&gt;69&lt;/td&gt;
&lt;td&gt;66.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total (10 tools)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,948&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;650&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;66.6%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Projections for larger catalogs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Catalog size&lt;/th&gt;
&lt;th&gt;JSON Schema&lt;/th&gt;
&lt;th&gt;TTC&lt;/th&gt;
&lt;th&gt;Absolute saving&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;20 tools&lt;/td&gt;
&lt;td&gt;~3,896&lt;/td&gt;
&lt;td&gt;~1,300&lt;/td&gt;
&lt;td&gt;~2,596 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50 tools&lt;/td&gt;
&lt;td&gt;~9,740&lt;/td&gt;
&lt;td&gt;~3,250&lt;/td&gt;
&lt;td&gt;~6,490 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100 tools&lt;/td&gt;
&lt;td&gt;~19,480&lt;/td&gt;
&lt;td&gt;~6,500&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~12,980 tokens&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The absolute saving grows linearly. The larger the catalog, the higher the ROI.&lt;/p&gt;
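&lt;p&gt;The projection table is a straight linear extrapolation of the 10-tool totals. This sketch reproduces it, assuming every additional tool costs the measured per-tool average (194.8 tokens as JSON Schema, 65 as TTC):&lt;/p&gt;

```python
MEASURED_TOOLS = 10
JSON_TOTAL, TTC_TOTAL = 1948, 650  # cl100k_base totals from the benchmark above


def project(n_tools):
    """Linear projection: (json_tokens, ttc_tokens, saving) for n_tools."""
    json_tokens = round(JSON_TOTAL * n_tools / MEASURED_TOOLS)
    ttc_tokens = round(TTC_TOTAL * n_tools / MEASURED_TOOLS)
    return json_tokens, ttc_tokens, json_tokens - ttc_tokens


for n in (20, 50, 100):
    print(n, project(n))
```

&lt;p&gt;Real catalogs will deviate to the extent that their tools are larger or smaller than these ten.&lt;/p&gt;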




&lt;h2&gt;
  
  
  Normative WHEN vocabulary
&lt;/h2&gt;

&lt;p&gt;A natural-language field without a standard creates another problem: two independent MCP server authors can write WHEN conditions in incompatible styles, degrading selection accuracy in large catalogs.&lt;/p&gt;

&lt;p&gt;TTC v1.0 solves this with a normative vocabulary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;WHEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user [wants|requests|asks|needs|intends] to [action] [object]&lt;/span&gt;

&lt;span class="na"&gt;Conformant examples&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;WHEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user wants to send an email message&lt;/span&gt;
  &lt;span class="na"&gt;WHEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user requests to list files in Google Drive&lt;/span&gt;
  &lt;span class="na"&gt;WHEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user needs to create a calendar event&lt;/span&gt;

&lt;span class="na"&gt;Non-conformant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;WHEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;send email          ← missing intent verb&lt;/span&gt;
  &lt;span class="na"&gt;WHEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user email          ← missing action verb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
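&lt;p&gt;The pattern is regular enough to lint. A minimal sketch, assuming a conformant value is &lt;code&gt;user&lt;/code&gt;, one of the five intent verbs, &lt;code&gt;to&lt;/code&gt;, an action word, and an object of one or more words (every conformant example has one). This approximates the rule; it is not the spec's normative grammar.&lt;/p&gt;

```python
import re

# user + intent verb + "to" + action word + object (one or more words).
WHEN_PATTERN = re.compile(r"user (wants|requests|asks|needs|intends) to ([a-z]+) (.+)")


def check_when(line):
    """Return (intent, action, object) for a conformant WHEN value, else None."""
    value = line.split(":", 1)[1].strip() if line.startswith("WHEN:") else line.strip()
    m = WHEN_PATTERN.fullmatch(value)
    return m.groups() if m else None


for candidate in ("WHEN: user wants to send an email message",
                  "WHEN: send email",
                  "WHEN: user email"):
    print(candidate, "=", check_when(candidate))
```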



&lt;p&gt;Accuracy simulation (TF-IDF cosine similarity, 12 tools, 36 queries):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Condition&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP free-form description&lt;/td&gt;
&lt;td&gt;63.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTC WHEN controlled vocabulary&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;72.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delta&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+8.3 pp&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Caveat: TF-IDF simulation, not a real LLM benchmark. Directional evidence.&lt;/em&gt;&lt;/p&gt;
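&lt;p&gt;The post does not publish the simulation code. The following is therefore only a sketch of how such a comparison can be run. Each tool is ranked by TF-IDF cosine similarity between the user query and the tool's text, and the sketch counts how often the top-ranked tool is the correct one. It makes three assumptions: a lowercase whitespace tokenizer, smoothed IDF in the style of scikit-learn's default, and a three-tool catalog written for illustration. That catalog is not the set of 12 tools and 36 queries behind the table.&lt;/p&gt;

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simple)."""
    return text.lower().split()


def build_idf(documents):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def vectorize(text, idf):
    """Sparse TF-IDF vector; terms that never appear in the catalog are dropped."""
    tf = Counter(tokenize(text))
    return {term: freq * idf[term] for term, freq in tf.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_tool(query, catalog):
    """Return the tool name whose text is most similar to the query."""
    idf = build_idf(list(catalog.values()))
    query_vec = vectorize(query, idf)
    return max(catalog, key=lambda name: cosine(query_vec, vectorize(catalog[name], idf)))


def accuracy(labeled_queries, catalog):
    """Fraction of (query, expected_tool) pairs where the top-ranked tool is correct."""
    hits = sum(1 for query, expected in labeled_queries if select_tool(query, catalog) == expected)
    return hits / len(labeled_queries)


# Illustrative catalog of TTC-style WHEN lines (drive_list_files is hypothetical).
CATALOG = {
    "gmail_send_email": "user wants to send an email message",
    "drive_list_files": "user requests to list files in Google Drive",
    "calendar_create_event": "user needs to create a calendar event",
}

if __name__ == "__main__":
    print(select_tool("please send an email to Ana", CATALOG))
```

&lt;p&gt;To reproduce the shape of the experiment, swap the &lt;code&gt;CATALOG&lt;/code&gt; values between free-form MCP descriptions and TTC &lt;code&gt;WHEN&lt;/code&gt; lines and score the same labeled query set with &lt;code&gt;accuracy&lt;/code&gt;. As the caveat says, lexical overlap is only a proxy for how an LLM actually selects tools.&lt;/p&gt;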




&lt;h2&gt;
  
  
  Where it works best
&lt;/h2&gt;

&lt;p&gt;✅ &lt;strong&gt;Large catalogs&lt;/strong&gt; (20+ tools) — where absolute savings justify migration&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Local and smaller models&lt;/strong&gt; — Qwen 7B, Llama 3, Mistral — no cache, narrow windows&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Multi-agent pipelines&lt;/strong&gt; — overhead compounds with every context handoff&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;RAG over tools&lt;/strong&gt; — compact TTC is ideal for vector DB indexing and subset injection  &lt;/p&gt;

&lt;p&gt;❌ Small catalogs with large LLM and wide context — marginal gain&lt;br&gt;&lt;br&gt;
❌ Replacing JSON Schema in API execution contracts — not the use case  &lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;📄 &lt;strong&gt;Full spec (Zenodo):&lt;/strong&gt; &lt;a href="https://doi.org/10.5281/zenodo.19869007" rel="noopener noreferrer"&gt;https://doi.org/10.5281/zenodo.19869007&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/RudsonCarvalho/terse-format/tree/main/extensions/ttc" rel="noopener noreferrer"&gt;https://github.com/RudsonCarvalho/terse-format/tree/main/extensions/ttc&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Landing page:&lt;/strong&gt; &lt;a href="https://rudsoncarvalho.github.io/terse-format/" rel="noopener noreferrer"&gt;https://rudsoncarvalho.github.io/terse-format/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📦 &lt;strong&gt;TERSE Format (parent spec):&lt;/strong&gt; &lt;a href="https://doi.org/10.5281/zenodo.19058364" rel="noopener noreferrer"&gt;https://doi.org/10.5281/zenodo.19058364&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;If your agent has 50 tools installed and you haven't thought about catalog attention cost yet — now is a good time.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;ai&lt;/code&gt; &lt;code&gt;agents&lt;/code&gt; &lt;code&gt;mcp&lt;/code&gt; &lt;code&gt;llm&lt;/code&gt; &lt;code&gt;tooling&lt;/code&gt; &lt;code&gt;performance&lt;/code&gt; &lt;code&gt;opensource&lt;/code&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>mcp</category>
      <category>llm</category>
    </item>
    <item>
      <title>Your AI agent is wasting 13,000 tokens before it says "hi"</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Wed, 29 Apr 2026 01:22:14 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/seu-agente-de-ia-esta-desperdicando-13000-tokens-antes-de-dizer-oi-3pfg</link>
      <guid>https://dev.to/rudsoncarvalho/seu-agente-de-ia-esta-desperdicando-13000-tokens-antes-de-dizer-oi-3pfg</guid>
      <description>&lt;p&gt;And you probably don't even know it.&lt;/p&gt;




&lt;p&gt;If you have an agent with 50 MCP tools installed, here is what happens before a single user message is processed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gmail_send_email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sends an email message via the Gmail API to one or more 
    recipients. Use this tool when the user explicitly requests to send, 
    compose and send, or deliver an email message to someone."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input_schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The recipient email address or comma-separated list"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The subject line of the email"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The body content of the email in plain text or HTML"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's &lt;strong&gt;~195 tokens&lt;/strong&gt;. Per tool. Before anything else happens.&lt;/p&gt;

&lt;p&gt;50 tools × 195 tokens = &lt;strong&gt;9,750 tokens of pure overhead&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And that's just the catalog. We haven't even reached the user's context, the conversation memory, the documents, or anything else.&lt;/p&gt;




&lt;h2&gt;
  
  
  "But there's prompt caching, right?"
&lt;/h2&gt;

&lt;p&gt;Yes, and it cuts the monetary cost of those tokens to ~10% of the original price.&lt;/p&gt;

&lt;p&gt;But caching &lt;strong&gt;does not reduce the attention cost&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Those tokens still occupy the context window, and the model still attends to all of them on every request. If you use dynamic tool retrieval, selecting a different set of tools per request, the cache breaks every time the selection changes.&lt;/p&gt;

&lt;p&gt;The bill doesn't go away. It just gets cheaper.&lt;/p&gt;
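&lt;p&gt;A simplified model of prefix caching shows why dynamic selection hurts. In this model, a request can reuse the cached part of the prompt only up to the first point where it differs from an earlier request. The sketch below assumes tools are serialized in order at the start of the prompt, and it uses the per-tool JSON Schema counts from the benchmark table in this post. Real providers add their own rules, such as cache breakpoints, minimum prefix lengths, and expiry, which are not modeled here.&lt;/p&gt;

```python
# Per-tool JSON Schema token counts (cl100k_base) from the benchmark table.
TOOL_TOKENS = {
    "gmail_send_email": 208,
    "calendar_create_event": 262,
    "github_create_issue": 269,
    "jira_create_ticket": 254,
    "slack_send_message": 206,
}


def cached_tokens(previous, current):
    """Tokens of `current`'s tool block that a prefix cache could reuse.

    Simplified model: reuse stops at the first tool that differs from the
    previous request, because everything after it is a new prefix.
    """
    reused = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        reused += TOOL_TOKENS[new]
    return reused


if __name__ == "__main__":
    static = ["gmail_send_email", "calendar_create_event", "github_create_issue"]
    dynamic = ["slack_send_message", "calendar_create_event", "github_create_issue"]
    print(cached_tokens(static, static))   # same catalog every time: 739 reusable
    print(cached_tokens(static, dynamic))  # first tool swapped: 0 reusable
```

&lt;p&gt;Even in the fully cached case, all 739 tokens still sit in the context window on every request. Caching changes their price, not their presence.&lt;/p&gt;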




&lt;h2&gt;
  
  
  The real problem nobody talks about
&lt;/h2&gt;

&lt;p&gt;MCP's JSON Schema was designed as a tool-execution contract, not as a semantic selection contract.&lt;/p&gt;

&lt;p&gt;As a result, information the LLM needs in order to reason is either missing or buried in free text:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No error contract&lt;/strong&gt;: the LLM doesn't know what to do when a call fails with &lt;code&gt;auth_failed&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No explicit trigger&lt;/strong&gt;: the model has to infer "when to use this tool" from a paragraph-long description&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No retrieval taxonomy&lt;/strong&gt;: there is no way to group or filter tools by domain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: verbose AND semantically incomplete. The worst of both worlds.&lt;/p&gt;




&lt;h2&gt;
  
  
  TTC — TERSE Tool Catalog
&lt;/h2&gt;

&lt;p&gt;I spent the last few weeks working on this problem. The result is an extension of the &lt;a href="https://github.com/RudsonCarvalho/terse-format" rel="noopener noreferrer"&gt;TERSE Format&lt;/a&gt; called &lt;strong&gt;TTC (TERSE Tool Catalog)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is the same tool in TTC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOOL gmail_send_email
  PURPOSE: send email via Gmail
  IN: to:string, subject:string, body:string, cc:string?
  OUT: message_id:string
  ERR: auth_failed | quota_exceeded | invalid_recipient
  WHEN: user wants to send or compose an email
  TAGS: gmail, email, communication
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
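&lt;p&gt;The format is regular enough to parse in a few lines of code, which makes it practical for indexing and filtering. Below is a minimal Python sketch. It assumes the grammar implied by this single example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a &lt;code&gt;TOOL name&lt;/code&gt; line, followed by &lt;code&gt;KEY: value&lt;/code&gt; lines&lt;/li&gt;
&lt;li&gt;&lt;code&gt;?&lt;/code&gt; marks an optional parameter&lt;/li&gt;
&lt;li&gt;&lt;code&gt;|&lt;/code&gt; separates errors and &lt;code&gt;,&lt;/code&gt; separates tags&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;parse_ttc&lt;/code&gt; is hypothetical and is not the reference parser from the TTC repository.&lt;/p&gt;

```python
# Fields holding "name:type" parameter lists, and fields holding plain lists.
PARAM_FIELDS = ("IN", "OUT")
LIST_FIELDS = {"ERR": "|", "TAGS": ","}


def parse_param(spec):
    """'cc:string?' -> {'name': 'cc', 'type': 'string', 'required': False}."""
    name, _, type_name = spec.strip().partition(":")
    type_name = type_name.strip()
    return {
        "name": name.strip(),
        "type": type_name.rstrip("?"),
        "required": not type_name.endswith("?"),
    }


def parse_ttc(text):
    """Parse TTC TOOL blocks into a list of dicts."""
    tools = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("TOOL "):
            current = {"name": line[len("TOOL "):].strip()}
            tools.append(current)
            continue
        if current is None:
            raise ValueError(f"field outside a TOOL block: {line!r}")
        # Split on the first colon only: values such as IN contain more colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        key, value = key.strip(), value.strip()
        if key in PARAM_FIELDS:
            current[key] = [parse_param(p) for p in value.split(",")]
        elif key in LIST_FIELDS:
            current[key] = [item.strip() for item in value.split(LIST_FIELDS[key])]
        else:
            current[key] = value
    return tools


EXAMPLE = """\
TOOL gmail_send_email
  PURPOSE: send email via Gmail
  IN: to:string, subject:string, body:string, cc:string?
  OUT: message_id:string
  ERR: auth_failed | quota_exceeded | invalid_recipient
  WHEN: user wants to send or compose an email
  TAGS: gmail, email, communication
"""

if __name__ == "__main__":
    tool = parse_ttc(EXAMPLE)[0]
    print(tool["name"], tool["ERR"], tool["TAGS"])
```

&lt;p&gt;In this sketch, &lt;code&gt;TAGS&lt;/code&gt; is the list a retrieval layer would filter on, and &lt;code&gt;WHEN&lt;/code&gt; is the natural text to embed.&lt;/p&gt;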



&lt;p&gt;&lt;strong&gt;~55 tokens. A 73.6% reduction.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And notice what was added, not just what was removed:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;MCP JSON&lt;/th&gt;
&lt;th&gt;TTC&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ERR: failure contract&lt;/td&gt;
&lt;td&gt;❌ absent&lt;/td&gt;
&lt;td&gt;✅ explicit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WHEN: selection trigger&lt;/td&gt;
&lt;td&gt;❌ buried in free text&lt;/td&gt;
&lt;td&gt;✅ explicit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TAGS: retrieval taxonomy&lt;/td&gt;
&lt;td&gt;❌ absent&lt;/td&gt;
&lt;td&gt;✅ explicit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  It's not compression. It's reallocation.
&lt;/h2&gt;

&lt;p&gt;This is the most important point of the spec, and it is worth stating clearly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TTC does not save tokens by removing semantic content. It eliminates JSON Schema's syntactic and documentation overhead, which serves human readability rather than LLM reasoning. It then reinvests part of those savings in explicit tool-selection semantics.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The actual numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MCP JSON Schema:        ~195 tokens por tool
TTC sem campos novos:    ~35 tokens
TTC com todos os campos: ~65 tokens

Os 30 tokens de "reinvestimento" compram:
  ERR  → contrato de falha (ausente no MCP)
  WHEN → trigger semântico (ausente no MCP)  
  TAGS → taxonomia de retrieval (ausente no MCP)

Resultado: 195 → 65 tokens. -66.6%.
Mas os 65 tokens carregam mais sinal de raciocínio
do que os 195 originais.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is a &lt;strong&gt;net gain in reasoning signal&lt;/strong&gt;, not a gain in information in the classical sense.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-world benchmark: 10 tools measured
&lt;/h2&gt;

&lt;p&gt;I measured 10 real MCP tool definitions with a BPE tokenizer (cl100k_base). Five are shown below; the total row covers all ten:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;JSON Schema&lt;/th&gt;
&lt;th&gt;TTC&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gmail_send_email&lt;/td&gt;
&lt;td&gt;208&lt;/td&gt;
&lt;td&gt;55&lt;/td&gt;
&lt;td&gt;73.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;calendar_create_event&lt;/td&gt;
&lt;td&gt;262&lt;/td&gt;
&lt;td&gt;78&lt;/td&gt;
&lt;td&gt;70.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;github_create_issue&lt;/td&gt;
&lt;td&gt;269&lt;/td&gt;
&lt;td&gt;84&lt;/td&gt;
&lt;td&gt;68.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;jira_create_ticket&lt;/td&gt;
&lt;td&gt;254&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;td&gt;69.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;slack_send_message&lt;/td&gt;
&lt;td&gt;206&lt;/td&gt;
&lt;td&gt;69&lt;/td&gt;
&lt;td&gt;66.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total (10 tools)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,948&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;650&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;66.6%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
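&lt;p&gt;The reduction column is simply (JSON − TTC) / JSON. The short Python sketch below recomputes it from the table's own numbers. The raw counts came from a cl100k_base tokenizer. For example, &lt;code&gt;tiktoken.get_encoding("cl100k_base")&lt;/code&gt; can produce comparable counts, although exact numbers depend on how each definition is serialized.&lt;/p&gt;

```python
# (JSON Schema tokens, TTC tokens) per tool, copied from the table above.
MEASUREMENTS = {
    "gmail_send_email": (208, 55),
    "calendar_create_event": (262, 78),
    "github_create_issue": (269, 84),
    "jira_create_ticket": (254, 77),
    "slack_send_message": (206, 69),
}
TOTAL_10_TOOLS = (1948, 650)


def reduction(json_tokens, ttc_tokens):
    """Share of tokens removed, in percent, rounded to one decimal place."""
    return round(100 * (json_tokens - ttc_tokens) / json_tokens, 1)


if __name__ == "__main__":
    for name, (json_tokens, ttc_tokens) in MEASUREMENTS.items():
        print(f"{name}: {reduction(json_tokens, ttc_tokens)}%")
    print(f"total: {reduction(*TOTAL_10_TOOLS)}%")
```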

&lt;p&gt;Projeção para catálogos maiores:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Catalog&lt;/th&gt;
&lt;th&gt;JSON Schema&lt;/th&gt;
&lt;th&gt;TTC&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;20 tools&lt;/td&gt;
&lt;td&gt;~3,896&lt;/td&gt;
&lt;td&gt;~1,300&lt;/td&gt;
&lt;td&gt;~2,596 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50 tools&lt;/td&gt;
&lt;td&gt;~9,740&lt;/td&gt;
&lt;td&gt;~3,250&lt;/td&gt;
&lt;td&gt;~6,490 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100 tools&lt;/td&gt;
&lt;td&gt;~19,480&lt;/td&gt;
&lt;td&gt;~6,500&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~12,980 tokens&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The absolute savings grow linearly with catalog size: the larger the catalog, the higher the ROI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Normative WHEN vocabulary
&lt;/h2&gt;

&lt;p&gt;A free-text field without a standard creates another problem: authors of different MCP servers write &lt;code&gt;WHEN&lt;/code&gt; conditions in incompatible ways, which degrades selection accuracy in large catalogs.&lt;/p&gt;

&lt;p&gt;TTC v1.0 solves this with a normative vocabulary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WHEN: user [wants|requests|asks|needs|intends] to [ação] [objeto]

Exemplos conformantes:
  WHEN: user wants to send an email message
  WHEN: user requests to list files in Google Drive
  WHEN: user needs to create a calendar event

Não-conformante:
  WHEN: send email          ← falta verbo de intenção
  WHEN: user email          ← falta verbo de ação
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Accuracy simulation (TF-IDF cosine similarity, 12 tools, 36 queries):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Condition&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MCP free-form description&lt;/td&gt;
&lt;td&gt;63.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TTC WHEN controlled vocabulary&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;72.2%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delta&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+8.3 pp&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Caveat: this is a TF-IDF simulation, not a real LLM benchmark. Treat it as directional evidence.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Where it works best
&lt;/h2&gt;

&lt;p&gt;✅ &lt;strong&gt;Large catalogs&lt;/strong&gt; (20+ tools): where the absolute savings justify the migration&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Local and smaller models&lt;/strong&gt; (Qwen 7B, Llama 3, Mistral): no cache, narrow context windows&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Multi-agent pipelines&lt;/strong&gt;: the overhead compounds with every context handoff&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;RAG over tools&lt;/strong&gt;: compact TTC is well suited to vector-DB indexing and to injecting tool subsets&lt;/p&gt;

&lt;p&gt;❌ Small catalogs with a large LLM and a wide context window: marginal gain&lt;br&gt;&lt;br&gt;
❌ Replacing JSON Schema in API execution contracts: not the use case&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;📄 &lt;strong&gt;Full spec (Zenodo):&lt;/strong&gt; &lt;a href="https://doi.org/10.5281/zenodo.19869007" rel="noopener noreferrer"&gt;https://doi.org/10.5281/zenodo.19869007&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/RudsonCarvalho/terse-format/tree/main/extensions/ttc" rel="noopener noreferrer"&gt;https://github.com/RudsonCarvalho/terse-format/tree/main/extensions/ttc&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Landing page:&lt;/strong&gt; &lt;a href="https://rudsoncarvalho.github.io/terse-format/" rel="noopener noreferrer"&gt;https://rudsoncarvalho.github.io/terse-format/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📦 &lt;strong&gt;TERSE Format (parent spec):&lt;/strong&gt; &lt;a href="https://doi.org/10.5281/zenodo.19058364" rel="noopener noreferrer"&gt;https://doi.org/10.5281/zenodo.19058364&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;If your agent has 50 tools installed and you haven't thought about the catalog's attention cost yet, this is a good time to rethink it.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;ai&lt;/code&gt; &lt;code&gt;agents&lt;/code&gt; &lt;code&gt;mcp&lt;/code&gt; &lt;code&gt;llm&lt;/code&gt; &lt;code&gt;tooling&lt;/code&gt; &lt;code&gt;performance&lt;/code&gt; &lt;code&gt;opensource&lt;/code&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>mcp</category>
      <category>llm</category>
    </item>
    <item>
      <title>COA-MAS v2: A Meta-Framework for Cross-Domain Multi-Agent Governance</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Wed, 01 Apr 2026 23:29:15 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/coa-mas-v2-a-meta-framework-for-cross-domain-multi-agent-governance-4mji</link>
      <guid>https://dev.to/rudsoncarvalho/coa-mas-v2-a-meta-framework-for-cross-domain-multi-agent-governance-4mji</guid>
      <description>&lt;p&gt;AI agents are crossing organizational boundaries. They call tools in partner domains, delegate tasks to external services, and operate in chains where no single actor sees the full picture.&lt;/p&gt;

&lt;p&gt;COA-MAS v1 solved the intra-domain governance problem — a four-layer architecture, the Action Claim contract, and the AASG enforcement boundary that ensures zero cognitive load at runtime. If you haven't read it, the paper is at &lt;a href="https://doi.org/10.5281/zenodo.19057202" rel="noopener noreferrer"&gt;doi.org/10.5281/zenodo.19057202&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The cross-domain problem is different. And it took a full architectural pivot to solve it correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Silver Bullet Fallacy
&lt;/h2&gt;

&lt;p&gt;Early iterations of COA-MAS v2 tried to build a universal calibration mechanism — a way to translate risk scores between domains with different semantic spaces. After several rounds of debate and stress-testing, it became clear that this approach has the same flaw as trying to replace PIX, TED, wire transfers, and letters of credit with a single payment instrument.&lt;/p&gt;

&lt;p&gt;Each of those instruments exists because different transaction contexts require different guarantees. Resilience in distributed systems comes from routing to the right pattern based on context — not from finding the pattern that works everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Thesis
&lt;/h2&gt;

&lt;p&gt;COA-MAS v2 is a meta-framework, not a protocol. It standardizes one thing: the &lt;strong&gt;Action Intent&lt;/strong&gt; — a universal artifact that any federated governance pattern can consume. The choice of execution topology is delegated to a &lt;strong&gt;Pattern Selection Protocol&lt;/strong&gt; negotiated during trust peering.&lt;/p&gt;

&lt;p&gt;The Action Intent is the common currency. The federation mode is the exchange mechanism.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Action Intent
&lt;/h2&gt;

&lt;p&gt;The Action Intent is the "passport" of the COA-MAS federation. It is a standardized, cryptographically signed declaration of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Who&lt;/strong&gt; is acting — SPIFFE identity, delegation chain, GOV-RISK attestation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What&lt;/strong&gt; they intend to do — tool URI, operation type, resource scope&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What effect&lt;/strong&gt; they declare — reversibility, estimated scope, data sensitivity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cryptographic binding&lt;/strong&gt; — ephemeral DPoP public key for proof-of-possession&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Domain A's internal policy, prompts, and risk weights are never transmitted. Only the declared intent, authenticated by Domain A's governance layer.&lt;/p&gt;

&lt;p&gt;If Domain A lies — declares &lt;code&gt;bounded_set&lt;/code&gt; but attempts a full-table deletion — the signed intent becomes irrefutable forensic evidence. The problem moves from governance mathematics to organizational accountability, backed by cryptographic proof.&lt;/p&gt;

&lt;p&gt;The canonical JSON Schema is published at &lt;a href="https://doi.org/10.5281/zenodo.19376419" rel="noopener noreferrer"&gt;doi.org/10.5281/zenodo.19376419&lt;/a&gt;.&lt;/p&gt;
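&lt;p&gt;The following minimal Python sketch makes the "passport" concrete. It builds an intent that carries the four groups of fields listed above and serializes it canonically, so that a signature would cover exactly the same bytes on both sides. Every field name and scope value except &lt;code&gt;bounded_set&lt;/code&gt; is hypothetical and is not taken from the published schema. The signing step with Domain A's governance key is deliberately left out; the SHA-256 digest only stands in for the bytes a signature would bind.&lt;/p&gt;

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ActionIntent:
    # Who is acting
    spiffe_id: str
    delegation_chain: tuple
    # What they intend to do
    tool_uri: str
    operation: str
    resource_scope: str
    # Declared effect
    reversibility: str
    estimated_scope: str
    data_sensitivity: str
    # Cryptographic binding: identifies the ephemeral DPoP public key
    dpop_key_thumbprint: str

    def canonical_bytes(self):
        """Deterministic JSON: sorted keys, no insignificant whitespace."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode("utf-8")

    def digest(self):
        """SHA-256 of the canonical form (what a signature would cover)."""
        return hashlib.sha256(self.canonical_bytes()).hexdigest()


# Hypothetical scope ladder, narrowest first; only "bounded_set" comes from the post.
SCOPE_ORDER = ("single_resource", "bounded_set", "unbounded")


def exceeds_declaration(intent, observed_scope):
    """True if the executed action reached a wider scope than the intent declared."""
    return SCOPE_ORDER.index(observed_scope) > SCOPE_ORDER.index(intent.estimated_scope)


INTENT = ActionIntent(
    spiffe_id="spiffe://domain-a.example/agent/billing",
    delegation_chain=("spiffe://domain-a.example/user/ana",),
    tool_uri="mcp://domain-b.example/orders/delete",
    operation="delete",
    resource_scope="orders/2026-03",
    reversibility="irreversible",
    estimated_scope="bounded_set",
    data_sensitivity="internal",
    dpop_key_thumbprint="hypothetical-jwk-thumbprint",
)

if __name__ == "__main__":
    tampered = replace(INTENT, estimated_scope="unbounded")
    print(INTENT.digest() != tampered.digest())
    print(exceeds_declaration(INTENT, "unbounded"))
```

&lt;p&gt;Any change to the declared scope changes the digest. Once the digest is signed, an intent that said &lt;code&gt;bounded_set&lt;/code&gt; cannot later be passed off as a broader one, and an audit can compare the declaration against what actually ran.&lt;/p&gt;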

&lt;h2&gt;
  
  
  The Four Federation Modes
&lt;/h2&gt;

&lt;p&gt;The Pattern Selection Protocol routes each cross-domain interaction to the appropriate mode based on trust distance, acceptable latency, and cognitive burden tolerance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mode 0 — Intra-Domain (COA-MAS V1)&lt;/strong&gt;&lt;br&gt;
Same domain. Deterministic, microsecond latency, zero external dependencies. The foundation everything else builds on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mode 1 — Sovereign Visa&lt;/strong&gt;&lt;br&gt;
Domain A submits the Action Intent to Domain B's authorization endpoint. Domain B's GOV-RISK evaluates it using its own Executable Culture — full sovereignty, no calibration across semantic spaces. GOV-RISK-B issues a standard COA-MAS v1 Action Claim with DPoP binding. AASG-B validates a locally-trusted signature at runtime. Zero cognitive load.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mode 2 — Ambassador&lt;/strong&gt;&lt;br&gt;
Domain B doesn't expose tools to foreign agents at all. It exposes an agent communication interface. Domain A's intent becomes the opening message of an A2A conversation. Domain B's Ambassador agent formulates its own plan, submits it to GOV-RISK-B via Mode 0, and executes locally. Maximum isolation. Non-deterministic latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mode 3 — Clearinghouse&lt;/strong&gt;&lt;br&gt;
A neutral Domain C — a regulated hub both domains trust — evaluates the intent and issues a universally-accepted Action Claim. Appropriate for regulated industries (Open Finance, healthcare prior authorization). Opt-in only: it trades polycentric sovereignty for operational simplicity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Future Mode 4 — ZK-Policy&lt;/strong&gt;&lt;br&gt;
The CAGA-compliant target. Domain A generates a zero-knowledge proof of correct policy execution without revealing internal data. Domain B verifies mathematically. Not implementable in production today due to ZKML hardware constraints — but the meta-framework is explicitly designed to incorporate it as Mode 4 when viable, without requiring changes to the Action Intent schema or SPIFFE infrastructure.&lt;/p&gt;
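&lt;p&gt;In Mode 1, the runtime check at AASG-B can stay deterministic because the cognitive work already happened in GOV-RISK-B. Below is a minimal Python sketch of the non-cryptographic part of such a gate. It assumes the claim's signature has already been verified against a locally trusted key, and that the DPoP proof has been reduced to a key thumbprint. &lt;code&gt;ActionClaim&lt;/code&gt;, &lt;code&gt;Gate&lt;/code&gt; and their fields are hypothetical names, not the COA-MAS v1 claim format.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionClaim:
    claim_id: str             # unique id, used for replay protection
    tool_uri: str             # the single tool this claim authorizes
    expires_at: float         # Unix timestamp (issued_at + ttl_seconds)
    single_use: bool
    dpop_key_thumbprint: str  # key the caller must prove possession of


class Gate:
    """Deterministic admission check; no model calls at runtime."""

    def __init__(self):
        self._spent = set()

    def admit(self, claim, tool_uri, presented_key_thumbprint, now):
        """Return (allowed, reason) for one tool invocation."""
        if now >= claim.expires_at:
            return False, "expired"
        if claim.tool_uri != tool_uri:
            return False, "wrong_tool"
        if claim.dpop_key_thumbprint != presented_key_thumbprint:
            return False, "key_mismatch"
        if claim.single_use:
            if claim.claim_id in self._spent:
                return False, "replayed"
            self._spent.add(claim.claim_id)
        return True, "ok"


if __name__ == "__main__":
    gate = Gate()
    # A delete claim with a 120-second, single-use budget issued at t=1000.
    claim = ActionClaim("claim-1", "mcp://domain-b.example/orders/delete", 1120.0, True, "thumb-a")
    uri = claim.tool_uri
    print(gate.admit(claim, uri, "thumb-a", now=1000.0))  # (True, 'ok')
    print(gate.admit(claim, uri, "thumb-a", now=1001.0))  # (False, 'replayed')
```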
&lt;h2&gt;
  
  
  The Pattern Selection Protocol
&lt;/h2&gt;

&lt;p&gt;Domains don't negotiate a single mode — they negotiate a Federation Policy that maps operation families and resource classes to modes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mode_by_operation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"read"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ttl_seconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"single_use"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"delete"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"ttl_seconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"single_use"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"configure"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mode_by_resource_class"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"pii"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"regulated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same pair of domains can use Mode 1 for routine reads and Mode 2 for infrastructure operations — without renegotiating the peering relationship.&lt;/p&gt;
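&lt;p&gt;The following minimal Python sketch shows how a gateway might resolve that policy for a single interaction. The post does not say which rule wins when both an operation and a resource class match, so this sketch assumes the resource-class rule takes precedence. Unknown operations are rejected rather than defaulted. &lt;code&gt;select_mode&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import json

# The Federation Policy from the example above.
POLICY = json.loads("""
{
  "mode_by_operation": {
    "read":      {"mode": 1, "ttl_seconds": 1800, "single_use": false},
    "delete":    {"mode": 1, "ttl_seconds": 120,  "single_use": true},
    "configure": {"mode": 2}
  },
  "mode_by_resource_class": {
    "pii":       {"mode": 2},
    "regulated": {"mode": 3}
  }
}
""")


def select_mode(policy, operation, resource_class=None):
    """Return the policy entry (mode plus any TTL settings) for one interaction.

    Assumption: a matching resource-class rule overrides the operation rule.
    """
    by_class = policy["mode_by_resource_class"]
    if resource_class is not None and resource_class in by_class:
        return by_class[resource_class]
    by_operation = policy["mode_by_operation"]
    if operation in by_operation:
        return by_operation[operation]
    raise LookupError(f"no federation rule for operation {operation!r}")


if __name__ == "__main__":
    print(select_mode(POLICY, "read"))                 # mode 1, reusable for 30 minutes
    print(select_mode(POLICY, "read", "pii"))          # mode 2: Ambassador
    print(select_mode(POLICY, "delete", "regulated"))  # mode 3: Clearinghouse
```

&lt;p&gt;One consequence of this precedence choice is that a PII read loses the Mode 1 TTL settings, because the resource-class entry carries none. A real policy would need to state whether such settings merge or are replaced.&lt;/p&gt;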

&lt;h2&gt;
  
  
  Positioning Against CAGA
&lt;/h2&gt;

&lt;p&gt;Meyman [SSRN 6299461] formalizes the Cross-Agent Governance Alignment (CAGA) problem and identifies zero-knowledge proofs as the theoretically correct solution. COA-MAS v2 is the operationally deployable answer while ZKML hardware matures — trading full policy confidentiality for sub-millisecond runtime enforcement, zero integration cost for Domain B, and compatibility with stochastic LLM-based GOV-RISKs.&lt;/p&gt;

&lt;p&gt;The relationship is complementary. CAGA defines what a correct solution must prove. COA-MAS v2 defines how production systems navigate the space between the theoretically ideal and the operationally deployable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Published
&lt;/h2&gt;

&lt;p&gt;📄 &lt;strong&gt;Working Paper v0.3&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://doi.org/10.5281/zenodo.19376738" rel="noopener noreferrer"&gt;doi.org/10.5281/zenodo.19376738&lt;/a&gt;&lt;br&gt;
&lt;a href="https://zenodo.org/records/19376739" rel="noopener noreferrer"&gt;zenodo.org/records/19376739&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔧 &lt;strong&gt;Action Intent Schema v1.0.0&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://doi.org/10.5281/zenodo.19376419" rel="noopener noreferrer"&gt;doi.org/10.5281/zenodo.19376419&lt;/a&gt;&lt;br&gt;
&lt;a href="https://zenodo.org/records/19376420" rel="noopener noreferrer"&gt;zenodo.org/records/19376420&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📚 &lt;strong&gt;COA-MAS v1 (foundation)&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://doi.org/10.5281/zenodo.19057202" rel="noopener noreferrer"&gt;doi.org/10.5281/zenodo.19057202&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you're building cross-domain multi-agent systems and the governance layer is an afterthought, the meta-framework and the schema are open access. Feedback, critique, and stress-testing welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>architecture</category>
      <category>multiagent</category>
    </item>
    <item>
      <title>AI Agents Can Delete Your Production Database. Here's the Governance Framework That Stops Them.</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Tue, 31 Mar 2026 12:51:25 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/ai-agents-can-delete-your-production-database-heres-the-governance-framework-that-stops-them-ccj</link>
      <guid>https://dev.to/rudsoncarvalho/ai-agents-can-delete-your-production-database-heres-the-governance-framework-that-stops-them-ccj</guid>
      <description>&lt;p&gt;&lt;em&gt;This article presents COA-MAS — a governance framework for autonomous agents grounded in organizational theory, institutional design, and normative multi-agent systems research. The full paper is published on Zenodo: &lt;a href="https://zenodo.org/records/19057202" rel="noopener noreferrer"&gt;doi.org/10.5281/zenodo.19057202&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem No One Is Talking About
&lt;/h2&gt;

&lt;p&gt;Something unusual happened in early 2026. The IETF published an Internet-Draft on AI agent authentication and authorization. Eight major technology companies released version 1.0 of the Agent-to-Agent Protocol. And a widely read post demonstrated why the prevailing credential model for AI agents was structurally broken.&lt;/p&gt;

&lt;p&gt;The convergence wasn't coincidental. It signaled that a structural problem, present in agentic deployments from the start, had grown serious enough to have real consequences in production.&lt;/p&gt;

&lt;p&gt;We've built agents that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delete production databases&lt;/li&gt;
&lt;li&gt;Execute financial transactions&lt;/li&gt;
&lt;li&gt;Modify business logic&lt;/li&gt;
&lt;li&gt;Spawn other agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And we gave them &lt;strong&gt;API keys&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;An API key authorizes &lt;em&gt;access&lt;/em&gt;. It does not authorize a &lt;em&gt;specific action with a specific impact in a specific context&lt;/em&gt;. That distinction is the entire problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Structural Failure Mode: Distributed Cognitive Chaos
&lt;/h2&gt;

&lt;p&gt;I call this failure mode &lt;strong&gt;Distributed Cognitive Chaos (DCC)&lt;/strong&gt;: the structural consequence of deploying agents without formal authority hierarchies, authorization contracts, or enforcement boundaries.&lt;/p&gt;

&lt;p&gt;DCC has three symptoms:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Action hallucination&lt;/strong&gt; — an agent executes an action it was never authorized to perform, because nothing formally defined "authorized"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mandate drift&lt;/strong&gt; — through a chain of agent-to-agent delegations, the original human intent gets distorted beyond recognition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accountability collapse&lt;/strong&gt; — when something goes wrong, there is no tamper-evident record connecting the action to the authority that (supposedly) permitted it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is not a new problem. It's the oldest problem in organizational theory: how do you coordinate partially autonomous actors toward collective goals while preventing any individual actor from harming the collective?&lt;/p&gt;

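&lt;p&gt;Herbert Simon analyzed it in &lt;em&gt;Administrative Behavior&lt;/em&gt; (1947). Elinor Ostrom described institutional solutions to it in &lt;em&gt;Governing the Commons&lt;/em&gt; (1990). We just haven't applied those solutions to AI agents yet.&lt;/p&gt;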




&lt;h2&gt;
  
  
  COA-MAS: A Governance Framework Grounded in Theory
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;COA-MAS&lt;/strong&gt; (&lt;em&gt;Cognitive Organization Architecture for Multi-Agent Systems&lt;/em&gt;) is my answer. It synthesizes four intellectual traditions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simon's bounded rationality&lt;/strong&gt; → why agents need external governance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ostrom's institutional design principles&lt;/strong&gt; → how to structure governance for durability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normative multi-agent systems research&lt;/strong&gt; → how to formalize governance as computable norms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sociotechnical systems theory&lt;/strong&gt; → how to make social norms technically enforceable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The framework has three components. Each answers a different question.&lt;/p&gt;




&lt;h2&gt;
  
  
  Component 1: The Four-Layer Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Question: Who is in charge?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think of it as a corporate structure for AI agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────┐
│ LAYER 4 — STRATEGIC ORCHESTRATION                  │
│ Receives human objectives · decomposes into tasks  │
└─────────────────────────────────────────────┘
                        ↕
┌─────────────────────────────────────────────┐
│ LAYER 3 — COGNITIVE GOVERNANCE                     │
│ Evaluates proposed actions · issues authorization  │
│ documents · maintains audit ledger                 │
└─────────────────────────────────────────────┘
                        ↕
┌─────────────────────────────────────────────┐
│ LAYER 2 — FUNCTIONAL SPECIALIZATION                │
│ Domain agents · execute tasks within their         │
│ cognitive authority boundary                       │
└─────────────────────────────────────────────┘
                        ↕
┌─────────────────────────────────────────────┐
│ LAYER 1 — EXECUTABLE CULTURE (Constitutional)      │
│ Versioned YAML policies · weights · thresholds     │
│ Human-authored before runtime. Immutable during.   │
└─────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The critical insight, drawn from both Simon and Ostrom, is the &lt;strong&gt;separation between those who propose actions and those who authorize them&lt;/strong&gt;. An agent cannot authorize its own actions. This mirrors checks and balances in constitutional systems: the body that proposes an action is not the body that authorizes it, and neither is the body that records it.&lt;/p&gt;
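
&lt;p&gt;The proposer/authorizer separation can be enforced mechanically. The sketch below is a hypothetical Python illustration, not part of the published specification: a Layer 3 entry point (here called &lt;code&gt;authorize&lt;/code&gt;) refuses any authorization from an identity that appears anywhere in the proposal's &lt;code&gt;delegation_chain&lt;/code&gt;. The function name, the exception type, and the rule that the entire chain is excluded are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;class SelfAuthorizationError(Exception):
    """Raised when an identity tries to authorize an action it helped propose."""


def authorize(proposal, authorizer):
    """Hypothetical Layer 3 entry point.

    `proposal` is a dict shaped like the declared fields of an Action Claim.
    Approval is only possible from an identity outside the proposal path.
    """
    # Everyone who proposed or relayed the action: the whole delegation chain.
    involved = set(proposal["delegation_chain"])
    if authorizer in involved:
        raise SelfAuthorizationError(
            f"{authorizer} appears in the delegation chain and cannot authorize it"
        )
    return {"state": "AUTHORIZED", "authorized_by": authorizer}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For the delegation chain in the Action Claim example, &lt;code&gt;authorize(proposal, "agent:db-cleaner")&lt;/code&gt; raises, while a Layer 3 identity that is not on the chain receives an &lt;code&gt;AUTHORIZED&lt;/code&gt; result.&lt;/p&gt;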




&lt;h2&gt;
  
  
  Component 2: The Action Claim
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Question: What exactly is the agent authorized to do?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;Action Claim&lt;/strong&gt; is a formal authorization document that agents must present before executing any real-world action. It's analogous to a building permit — not just "you're allowed to build," but: the location, the dimensions, the materials, the timeline, the inspector, and the version of the building code that governed the approval.&lt;/p&gt;

&lt;p&gt;The Action Claim has three parts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;DECLARED&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;FIELDS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;filled&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;agent&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"proposed_transition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DELETE expired sessions older than 90 days"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"originating_goal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"scheduled maintenance task #4421"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"delegation_chain"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"human:ops-team"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agent:orchestrator-01"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agent:db-cleaner"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"estimated_impact"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"destructivity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"data_exposure"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"resource_consumption"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"privilege_escalation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"logic_integrity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"recursive_autonomy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.10&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;DERIVED&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;FIELDS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;filled&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;GOV-RISK&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(Layer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"justification_gap"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.08&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"APPROVE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"governance_signature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sha256:a3f9..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy_digest"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sha256:1b2c..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;AUDIT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;FIELDS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;filled&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;infrastructure&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ac_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ac-2026-03-31-00421"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AUTHORIZED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"committed_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-03-31T14:22:01Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tripartite structure reflects Ostrom's principle of separating operational decisions from the collective-choice rules that govern them. The agent operates at the operational level; Layer 3 applies institutional norms; the audit trail creates an immutable record connecting every decision to the rules that governed it.&lt;/p&gt;
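
&lt;p&gt;To make the derived fields concrete, here is a minimal Python sketch of how a Layer 3 component might fill &lt;code&gt;policy_digest&lt;/code&gt; and &lt;code&gt;governance_signature&lt;/code&gt;. It assumes HMAC-SHA256 over a canonical JSON encoding (sorted keys, no whitespace) and a demo key; the schema may specify a different canonicalization or signature scheme, and an asymmetric scheme such as Ed25519 would let verifiers check signatures without holding the signing key. &lt;code&gt;sign_claim&lt;/code&gt;, &lt;code&gt;canonical&lt;/code&gt;, and &lt;code&gt;GOVERNANCE_KEY&lt;/code&gt; are hypothetical names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import hmac
import json

# Demo key only; a real deployment would load it from a secret store available to Layer 3.
GOVERNANCE_KEY = b"demo-key-held-only-by-layer-3"

DECLARED_FIELDS = ("proposed_transition", "originating_goal",
                   "delegation_chain", "estimated_impact")


def canonical(obj):
    """Sorted keys and no whitespace, so equal claims always serialize identically."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def policy_digest(policy_text):
    """Pin the exact policy version (e.g. the Layer 1 YAML) that governed the decision."""
    return "sha256:" + hashlib.sha256(policy_text.encode("utf-8")).hexdigest()


def sign_claim(declared, justification_gap, decision, policy_text):
    """Copy the agent's declared fields, add the derived fields, and sign the result."""
    claim = {field: declared[field] for field in DECLARED_FIELDS}
    claim["justification_gap"] = justification_gap
    claim["decision"] = decision
    claim["policy_digest"] = policy_digest(policy_text)
    tag = hmac.new(GOVERNANCE_KEY, canonical(claim), hashlib.sha256).hexdigest()
    claim["governance_signature"] = "sha256:" + tag
    return claim
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the declared impact vector sits inside the signed body, an agent that edits its own &lt;code&gt;estimated_impact&lt;/code&gt; after approval invalidates the signature.&lt;/p&gt;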




&lt;h2&gt;
  
  
  Component 3: The AASG (Autonomous Agent Security Gateway)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Question: How is authorization enforced?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Think of the AASG as a customs inspector at the boundary between the agents' cognitive world and the real world where tools execute.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent Cognition (A2A) ────────────────► Real World (MCP)
                              │
                         [ AASG ]
                              │
                    Checks exactly 3 things:
                    1. Is the Action Claim valid and signed?
                    2. Is the agent identity correct?
                    3. Does the actual tool call match 
                       what was declared?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AASG is a &lt;strong&gt;physically separate process&lt;/strong&gt; — not a library in the agent, not a plugin in the MCP server. This separation is the architectural expression of Simon's insight: governance constraints must be external to the decision-maker they constrain. An agent cannot reliably police itself.&lt;/p&gt;
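
&lt;p&gt;The three gateway checks fit in a few lines. This Python sketch is hypothetical: it assumes the AASG shares an HMAC key with Layer 3 (and never with agents), treats the last hop of &lt;code&gt;delegation_chain&lt;/code&gt; as the only identity allowed to execute, and flags any dimension whose actual impact exceeds the declared value by more than the 0.20 threshold used in the example below. Only &lt;code&gt;ERR_AASG_003&lt;/code&gt; comes from the article; &lt;code&gt;INVALID_CLAIM&lt;/code&gt; and &lt;code&gt;IDENTITY_MISMATCH&lt;/code&gt; are placeholder codes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import hmac
import json

# Hypothetical shared key: held by Layer 3 and the AASG, never by agents.
GATEWAY_KEY = b"demo-key-shared-by-layer-3-and-aasg"
CONGRUENCE_THRESHOLD = 0.20


def _tag(claim):
    """HMAC-SHA256 over every claim field except the signature itself."""
    body = {k: v for k, v in claim.items() if k != "governance_signature"}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(GATEWAY_KEY, payload, hashlib.sha256).hexdigest()


def sign(claim):
    """Demo-only stand-in for Layer 3, so the example can produce valid claims."""
    return dict(claim, governance_signature="sha256:" + _tag(claim))


def check_tool_call(claim, caller_id, actual_impact):
    """Return None if the call may proceed, otherwise an error code."""
    # 1. Is the Action Claim valid and signed?
    expected = "sha256:" + _tag(claim)
    if claim.get("state") != "AUTHORIZED" or not hmac.compare_digest(
            claim.get("governance_signature", ""), expected):
        return "INVALID_CLAIM"
    # 2. Is the agent identity correct? Assumed: only the last hop may execute.
    if caller_id != claim["delegation_chain"][-1]:
        return "IDENTITY_MISMATCH"
    # 3. Does the actual call match what was declared, within the threshold?
    declared = claim["estimated_impact"]
    for dimension, actual in actual_impact.items():
        overshoot = actual - declared.get(dimension, 0.0) - CONGRUENCE_THRESHOLD
        if max(0.0, overshoot):  # any positive overshoot is a violation
            return "ERR_AASG_003: CONGRUENCE_VIOLATION"
    return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running the checks in this order means a forged or tampered claim is rejected before its contents are trusted for the identity and congruence checks.&lt;/p&gt;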

&lt;h3&gt;
  
  
  What happens when an agent lies?
&lt;/h3&gt;

&lt;p&gt;Consider an agent authorized to clean up expired sessions. It gets an AUTHORIZED Action Claim with &lt;code&gt;destructivity: 0.25&lt;/code&gt;. Then it tries to execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;  &lt;span class="c1"&gt;-- no WHERE clause&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AASG projects the actual tool call onto the governance space:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Declared: destructivity = 0.25 (bounded deletion)
Actual:   destructivity = 1.00 (full table wipe)

Congruence delta: 0.75 &amp;gt;&amp;gt; threshold (0.20)
→ ERR_AASG_003: CONGRUENCE_VIOLATION
→ Action never reaches the database
→ Merkle Ledger: INTERCEPTED entry recorded
→ GOV-AUDIT: SCOPE_SUBDECLARATION_DETECTED alert
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The action is blocked before any damage occurs — not discovered in a log review afterward.&lt;/p&gt;
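
&lt;p&gt;The article does not specify how the gateway projects a tool call onto the governance space, so the following Python sketch uses a deliberately crude, hypothetical heuristic for the destructivity dimension only: &lt;code&gt;DROP&lt;/code&gt;, &lt;code&gt;TRUNCATE&lt;/code&gt;, and an unbounded &lt;code&gt;DELETE&lt;/code&gt; score 1.0, while a &lt;code&gt;DELETE&lt;/code&gt; with a &lt;code&gt;WHERE&lt;/code&gt; clause scores 0.25, mirroring the numbers in the example above. A real projector would need a SQL parser rather than regular expressions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

CONGRUENCE_THRESHOLD = 0.20  # threshold used in the example above


def project_destructivity(sql):
    """Hypothetical heuristic mapping one SQL statement to a destructivity score in [0, 1]."""
    # Drop "--" line comments first so text such as "-- no WHERE clause" is not read as SQL.
    stmt = re.sub(r"--[^\n]*", "", sql).strip().upper()
    if re.match(r"(DROP|TRUNCATE)\b", stmt):
        return 1.0
    if re.match(r"DELETE\b", stmt):
        # Bounded deletion if a WHERE clause is present, full table wipe otherwise.
        return 0.25 if re.search(r"\bWHERE\b", stmt) else 1.0
    return 0.0


def check_congruence(declared_destructivity, sql):
    actual = project_destructivity(sql)
    delta = actual - declared_destructivity
    # Any overshoot beyond the threshold counts as a violation.
    if max(0.0, delta - CONGRUENCE_THRESHOLD):
        return f"ERR_AASG_003: CONGRUENCE_VIOLATION (delta={delta:.2f})"
    return "OK"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The comment stripping matters: without it, the words "no WHERE clause" in the SQL comment would make the unbounded delete look bounded.&lt;/p&gt;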




&lt;h2&gt;
  
  
  The Justification Gap: The First Computable Proportionality Metric
&lt;/h2&gt;

&lt;p&gt;The core innovation in COA-MAS is the &lt;strong&gt;Justification Gap&lt;/strong&gt; — a metric that answers: &lt;em&gt;is this specific action proportional to the mandate that authorized it?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The governance space has six dimensions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;What It Measures&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;d₁ Destructivity&lt;/td&gt;
&lt;td&gt;Irreversible state change (DELETE, DROP, PURGE)&lt;/td&gt;
&lt;td&gt;0.30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;d₂ Data Exposure&lt;/td&gt;
&lt;td&gt;Cross-zone data movement, PII handling&lt;/td&gt;
&lt;td&gt;0.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;d₃ Resource Consumption&lt;/td&gt;
&lt;td&gt;Financial/computational impact&lt;/td&gt;
&lt;td&gt;0.10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;d₄ Privilege Escalation&lt;/td&gt;
&lt;td&gt;IAM creation, credential access&lt;/td&gt;
&lt;td&gt;0.20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;d₅ Logic Integrity&lt;/td&gt;
&lt;td&gt;Business logic modification&lt;/td&gt;
&lt;td&gt;0.10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;d₆ Recursive Autonomy&lt;/td&gt;
&lt;td&gt;Sub-agent spawning, delegation depth&lt;/td&gt;
&lt;td&gt;0.05&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gⱼ(V_A, V_M) = Σ wᵢ · φ(aᵢ, mᵢ)

where φ(aᵢ, mᵢ) =
  0                 if aᵢ ≤ mᵢ            (within mandate)
  ((aᵢ - mᵢ)/mᵢ)²   if aᵢ &amp;gt; mᵢ &amp;gt; 0        (quadratic penalty)
  +∞                if aᵢ &amp;gt; 0 ∧ mᵢ = 0    (absolute prohibition)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The decision:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gⱼ ≤ 0.15  →  APPROVE
0.15 &amp;lt; Gⱼ ≤ 0.40  →  ESCALATE (human review with full dossier)
Gⱼ &amp;gt; 0.40 or Gⱼ = +∞  →  BLOCK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



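&lt;p&gt;The quadratic penalty tolerates small excesses but punishes large ones: exceeding a dimension's mandate by 10% adds 0.01·wᵢ to the gap, while doubling it adds the full wᵢ. A large excess signals a qualitatively different action from the one that was authorized. The ESCALATE tier is designed to support the human oversight that Article 14 of the EU AI Act requires for high-risk systems, and it gives the reviewer quantitative evidence instead of just a kill switch.&lt;/p&gt;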
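
&lt;p&gt;The formula and the decision tiers translate directly into code. The following Python sketch illustrates them under stated assumptions and is not the paper's reference implementation: dimension names follow the Action Claim example, weights come from the table above, and &lt;code&gt;bisect_left&lt;/code&gt; maps the gap onto the tiers so that a gap of exactly 0.15 is approved and one of exactly 0.40 is escalated.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import bisect
import math

# Weights from the governance-space table; they sum to 1.0.
WEIGHTS = {
    "destructivity": 0.30,
    "data_exposure": 0.25,
    "resource_consumption": 0.10,
    "privilege_escalation": 0.20,
    "logic_integrity": 0.10,
    "recursive_autonomy": 0.05,
}
TIER_BOUNDS = (0.15, 0.40)
TIERS = ("APPROVE", "ESCALATE", "BLOCK")


def phi(a, m):
    """Penalty for one dimension: action impact a against mandate m."""
    excess = max(0.0, a - m)
    if excess == 0.0:
        return 0.0                # within mandate
    if m == 0.0:
        return math.inf           # absolute prohibition
    return (excess / m) ** 2      # quadratic penalty


def justification_gap(action, mandate):
    return sum(w * phi(action[d], mandate[d]) for d, w in WEIGHTS.items())


def decide(gap):
    # bisect_left keeps boundary values in the lower tier:
    # exactly 0.15 is approved, exactly 0.40 is escalated, infinity is blocked.
    return TIERS[bisect.bisect_left(TIER_BOUNDS, gap)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using the Action Claim's impact vector as a stand-in mandate, doubling destructivity from 0.25 to 0.50 gives 0.30 × 1² = 0.30 (ESCALATE), while any nonzero privilege escalation against its 0.00 mandate is infinite and therefore blocked.&lt;/p&gt;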




&lt;h2&gt;
  
  
  Mandate Laundering: The Attack You Didn't Know Existed
&lt;/h2&gt;

&lt;p&gt;Here's the attack that no existing security framework addresses: &lt;strong&gt;mandate laundering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In a delegation chain &lt;code&gt;Human → Agent₁ → Agent₂ → ... → Agentₖ&lt;/code&gt;, each intermediate agent can marginally expand the mandate it passes on. Each local expansion looks proportionate. But the cumulative expansion is not.&lt;/p&gt;

&lt;p&gt;COA-MAS anchors the Justification Gap to the &lt;strong&gt;root human mandate&lt;/strong&gt;, regardless of intermediate expansions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;G_chain(Aₖ) = Gⱼ(V_{Aₖ}, V_{M₀})  ← root mandate, always

G_total = 0.30 · G_local + 0.70 · G_chain
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Non-Improvement Theorem&lt;/strong&gt;: For any permissive subdelegation, &lt;code&gt;G_chain&lt;/code&gt; is monotone non-decreasing. You cannot launder your way out of the original constraint.&lt;/p&gt;
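
&lt;p&gt;A small numeric experiment shows why anchoring to the root matters. In this hypothetical Python sketch only destructivity varies: the root mandate allows 0.25 and each of four delegations widens it by 20%. Measured against its parent, each hop looks well inside the APPROVE band. The final agent, acting at the edge of its own mandate, has a local gap of zero, but its chain gap against M₀ pushes &lt;code&gt;G_total&lt;/code&gt; into the ESCALATE band. The 20% growth rate, the four-hop chain, and the helper names are assumptions chosen for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

WEIGHTS = {"destructivity": 0.30, "data_exposure": 0.25, "resource_consumption": 0.10,
           "privilege_escalation": 0.20, "logic_integrity": 0.10, "recursive_autonomy": 0.05}


def phi(a, m):
    excess = max(0.0, a - m)
    if excess == 0.0:
        return 0.0
    return math.inf if m == 0.0 else (excess / m) ** 2


def gap(action, mandate):
    return sum(w * phi(action[d], mandate[d]) for d, w in WEIGHTS.items())


def total_gap(action, mandates):
    """mandates[0] is the root human mandate M0; mandates[-1] is the acting agent's own."""
    g_local = gap(action, mandates[-1])
    g_chain = gap(action, mandates[0])  # anchored to the root, whatever intermediates allowed
    return 0.30 * g_local + 0.70 * g_chain


def vector(destructivity):
    """Impact vector that only touches destructivity (illustration only)."""
    v = dict.fromkeys(WEIGHTS, 0.0)
    v["destructivity"] = destructivity
    return v


# Root mandate allows destructivity 0.25; each of four hops widens it by 20%.
mandates = [vector(0.25 * 1.2 ** hop) for hop in range(5)]
action = mandates[-1]  # the final agent uses everything its immediate mandate allows

hop_gaps = [gap(mandates[i + 1], mandates[i]) for i in range(4)]
print([round(g, 3) for g in hop_gaps])        # [0.012, 0.012, 0.012, 0.012]
print(round(total_gap(action, mandates), 3))  # 0.242
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every hop costs 0.3 × 0.2² = 0.012, yet the end-to-end expansion of 0.25 to about 0.52 yields a chain gap of roughly 0.35 and a total of about 0.242.&lt;/p&gt;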




&lt;h2&gt;
  
  
  How COA-MAS Fits the Standards Ecosystem
&lt;/h2&gt;

&lt;p&gt;COA-MAS doesn't compete with existing standards — it implements what they defer:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Initiative&lt;/th&gt;
&lt;th&gt;What It Solves&lt;/th&gt;
&lt;th&gt;What It Defers&lt;/th&gt;
&lt;th&gt;COA-MAS Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;IETF draft-klrc-aiagent-auth&lt;/td&gt;
&lt;td&gt;Identity, authentication, authorization (SPIFFE, OAuth 2.0)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Policy model explicitly out of scope&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Implements the policy model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A2A Protocol v1.0&lt;/td&gt;
&lt;td&gt;Agent coordination standard&lt;/td&gt;
&lt;td&gt;Authorization at execution boundary&lt;/td&gt;
&lt;td&gt;AASG is the enforcement point A2A lacks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP v1.0&lt;/td&gt;
&lt;td&gt;Agent-to-tool communication&lt;/td&gt;
&lt;td&gt;No semantic authorization layer&lt;/td&gt;
&lt;td&gt;AASG is the authorization gate MCP doesn't have&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The IETF draft's Section 12 explicitly states: "the policy model and document format are out of scope." That is precisely where COA-MAS contributes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Failure Mode Transition
&lt;/h2&gt;

&lt;p&gt;The most consequential architectural property of COA-MAS is the failure mode it introduces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional agentic systems&lt;/strong&gt;: fail semantically and silently. The agent reinterprets a guideline, slightly expands a scope, finds an unanticipated interpretation. Detectable only after damage, through log analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;COA-MAS&lt;/strong&gt;: introduces the explicit &lt;code&gt;CONGRUENCE_VIOLATION&lt;/code&gt; failure mode. When an agent attempts an action that violates its declared impact vector, the AASG returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A specific error code&lt;/li&gt;
&lt;li&gt;The dimension violated&lt;/li&gt;
&lt;li&gt;The quantitative delta&lt;/li&gt;
&lt;li&gt;A Merkle Ledger entry with full context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the organizational equivalent of a building inspector catching a code violation before the foundation is poured — not after the building collapses.&lt;/p&gt;
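
&lt;p&gt;The article does not detail the ledger's internal structure. As a hedged illustration of why a Merkle ledger makes such entries tamper-evident, this Python sketch hashes each entry (canonical JSON) into a leaf and folds the leaves into a single root. The 0x00/0x01 leaf and node prefixes follow the convention of RFC 6962 (Certificate Transparency); duplicating the last node on odd levels is a simplification that RFC 6962 does not use. Field names in the sample entry beyond &lt;code&gt;ac_id&lt;/code&gt; and &lt;code&gt;state&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import json


def _leaf(entry):
    data = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(b"\x00" + data).digest()  # 0x00 prefix marks a leaf


def merkle_root(entries):
    """Hex Merkle root over ledger entries, in append order."""
    level = [_leaf(e) for e in entries]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) != 1:
        if len(level) % 2:
            level.append(level[-1])  # simplification: duplicate the last node on odd levels
        # 0x01 prefix marks an inner node, so a leaf can never pass for a node.
        level = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


ledger = [
    {"ac_id": "ac-2026-03-31-00421", "state": "AUTHORIZED"},
    {"ac_id": "ac-2026-03-31-00421", "state": "INTERCEPTED",
     "error": "ERR_AASG_003: CONGRUENCE_VIOLATION", "dimension": "destructivity",
     "declared": 0.25, "actual": 1.00, "delta": 0.75},
]
root = merkle_root(ledger)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Changing any field of any entry, for example rewriting &lt;code&gt;delta&lt;/code&gt; to a smaller value, produces a different root, so an auditor who kept the earlier root can detect the edit.&lt;/p&gt;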




&lt;h2&gt;
  
  
  What's Published
&lt;/h2&gt;

&lt;p&gt;The full paper, &lt;strong&gt;COA-MAS: A Governance Framework for Autonomous Agents in Production Environments&lt;/strong&gt;, is available on Zenodo:&lt;/p&gt;

&lt;p&gt;📄 &lt;strong&gt;&lt;a href="https://zenodo.org/records/19057202" rel="noopener noreferrer"&gt;zenodo.org/records/19057202&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
🔑 &lt;strong&gt;DOI: &lt;a href="https://doi.org/10.5281/zenodo.19057202" rel="noopener noreferrer"&gt;doi.org/10.5281/zenodo.19057202&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
📜 License: CC BY 4.0&lt;/p&gt;

&lt;p&gt;The paper covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full formal specification of the Action Claim ontology&lt;/li&gt;
&lt;li&gt;Complete mathematical treatment of the Justification Gap&lt;/li&gt;
&lt;li&gt;Attack pattern neutralization (scope subdeclaration, decomposition attack, mandate laundering)&lt;/li&gt;
&lt;li&gt;EU AI Act regulatory alignment (Articles 9, 11, 13, 14)&lt;/li&gt;
&lt;li&gt;Positioning against IETF, A2A, MCP, and the AIMS model&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;The governance of autonomous agents is not a new problem. Simon identified its theoretical roots in 1947. Ostrom identified the institutional design solutions in 1990. Normative MAS researchers formalized the computational analogues through the 1990s and 2000s.&lt;/p&gt;

&lt;p&gt;What's new in 2026 is the urgency.&lt;/p&gt;

&lt;p&gt;Agents that can delete production databases and execute financial transactions are being deployed without the governance infrastructure this body of knowledge prescribes.&lt;/p&gt;

&lt;p&gt;COA-MAS applies established principles to a new domain. The question is not whether governance is necessary — it's whether we build it before or after the first major incident.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you're building multi-agent systems in production, I'd be genuinely interested in feedback on whether these primitives map to the problems you're encountering. The paper is open access — feel free to cite, critique, or extend.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;— Rudson Kiyoshi Souza Carvalho, Independent Researcher&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://doi.org/10.5281/zenodo.19057202" rel="noopener noreferrer"&gt;doi.org/10.5281/zenodo.19057202&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>architecture</category>
      <category>multiagent</category>
    </item>
    <item>
      <title>TERSE — A New Serialization Format Built for LLMs</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Tue, 31 Mar 2026 12:10:36 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/terse-a-new-serialization-format-built-for-llms-4n34</link>
      <guid>https://dev.to/rudsoncarvalho/terse-a-new-serialization-format-built-for-llms-4n34</guid>
      <description>&lt;p&gt;&lt;em&gt;JSON is the default. But defaults were built for a different world.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Every time you send structured data to a Large Language Model, you pay for it token by token. And if you're using JSON — which almost everyone is — you're paying for a lot of characters that carry no information.&lt;/p&gt;

&lt;p&gt;Take this simple payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"feature_a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"feature_b"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"verified"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Count the noise: braces, quotes around every key and string value, commas, colons with spaces. Now imagine this multiplied across thousands of API calls per day. That's real money.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;TERSE&lt;/strong&gt; to address this.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is TERSE?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;TERSE&lt;/strong&gt; (Token-Efficient Recursive Serialization Encoding) is a text-based data serialization format designed to represent the complete JSON data model with substantially fewer tokens — making it significantly more cost-efficient for use as input to Large Language Models.&lt;/p&gt;

&lt;p&gt;The same payload in TERSE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;user_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1001&lt;/span&gt;
&lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;active&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;feature_a feature_b&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;verified&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;T&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same information. ~47% fewer tokens.&lt;/p&gt;




&lt;h2&gt;
  
  
  How it compares
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Token savings vs JSON&lt;/th&gt;
&lt;th&gt;Full JSON coverage?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YAML&lt;/td&gt;
&lt;td&gt;~20%&lt;/td&gt;
&lt;td&gt;✓ (verbose arrays)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TOON&lt;/td&gt;
&lt;td&gt;~40%&lt;/td&gt;
&lt;td&gt;✗ (flat data only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TERSE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~47%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;✓&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;YAML is a genuine improvement over JSON — it's more compact and covers the full data model. But it was designed for humans to write, not for LLMs to consume. Verbose arrays (&lt;code&gt;- item&lt;/code&gt; per line), full-word booleans (&lt;code&gt;true&lt;/code&gt;/&lt;code&gt;false&lt;/code&gt;), and a notoriously complex parser spec limit its token savings.&lt;/p&gt;

&lt;p&gt;TOON goes further on token reduction but falls apart with nested objects — it only works for flat, uniform tabular data. If your payload has any nesting, TOON can't represent it.&lt;/p&gt;

&lt;p&gt;TERSE was designed to close that gap: full JSON data model coverage, with token efficiency as the primary design constraint.&lt;/p&gt;




&lt;h2&gt;
  
  
  The five design principles
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Bare strings&lt;/strong&gt; — identifiers and common values require no quotation marks. &lt;code&gt;production&lt;/code&gt; stays &lt;code&gt;production&lt;/code&gt;, not &lt;code&gt;"production"&lt;/code&gt;. Quotes are reserved for strings that actually need them — those containing spaces, reserved characters, or special syntax.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Compact primitives&lt;/strong&gt; — &lt;code&gt;null&lt;/code&gt;, &lt;code&gt;true&lt;/code&gt;, and &lt;code&gt;false&lt;/code&gt; become single characters: &lt;code&gt;~&lt;/code&gt;, &lt;code&gt;T&lt;/code&gt;, &lt;code&gt;F&lt;/code&gt;. Three of the most common values in any payload, each reduced to one token.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Implicit delimiters&lt;/strong&gt; — spaces separate values inside objects and arrays, so no commas are needed between elements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Schema arrays&lt;/strong&gt; — the biggest token win for tabular data. Uniform arrays of objects declare their fields once, then list values positionally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;users&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;#[id name role active]&lt;/span&gt;
  &lt;span class="s"&gt;1 Alice admin T&lt;/span&gt;
  &lt;span class="s"&gt;2 Bruno editor T&lt;/span&gt;
  &lt;span class="s"&gt;3 Carla viewer F&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The equivalent JSON repeats &lt;code&gt;"id"&lt;/code&gt;, &lt;code&gt;"name"&lt;/code&gt;, &lt;code&gt;"role"&lt;/code&gt;, &lt;code&gt;"active"&lt;/code&gt; on every single row. For a 100-row dataset, that's 400 quoted keys where TERSE declares 4.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Recursive structure&lt;/strong&gt; — all constructs nest arbitrarily. Objects inside arrays inside schema arrays — all valid, all compact. No flat-only limitations.&lt;/p&gt;
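
&lt;p&gt;To make the principles concrete, here is a partial TERSE encoder in Python. It is a sketch under explicit assumptions, not the reference implementation: it only covers a top-level object whose values are primitives, arrays of primitives, or uniform arrays of flat objects, because nested-object syntax is not shown in this part of the article. The set of characters allowed in bare strings (&lt;code&gt;BARE&lt;/code&gt;) and the use of JSON-style quoting for everything else are guesses; the TERSE specification defines the real rules.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import re

# Assumption: which strings may stay bare. The TERSE spec defines the real rule.
BARE = re.compile(r"[A-Za-z_][A-Za-z0-9_.@-]*")
# Bare "T" or "F" would read as booleans, so those strings must be quoted.
RESERVED = {"T", "F"}


def scalar(v):
    """Encode a JSON primitive using TERSE's compact forms."""
    if v is None:
        return "~"
    if v is True:
        return "T"
    if v is False:
        return "F"
    if isinstance(v, (int, float)):
        return json.dumps(v)
    if isinstance(v, str):
        if BARE.fullmatch(v) and v not in RESERVED:
            return v
        return json.dumps(v)  # assumed: JSON-style quoting and escaping
    raise TypeError(f"not a TERSE scalar: {v!r}")


def is_schema_array(v):
    """A non-empty list of dicts that all share the same keys in the same order."""
    return (isinstance(v, list) and len(v) != 0
            and all(isinstance(row, dict) for row in v)
            and all(list(row) == list(v[0]) for row in v))


def encode(doc, indent="  "):
    """Encode a top-level object; nested objects are outside this sketch."""
    lines = []
    for key, value in doc.items():
        if is_schema_array(value):
            fields = list(value[0])
            lines.append(f"{key}:")
            lines.append(f"{indent}#[{' '.join(fields)}]")
            for row in value:
                lines.append(indent + " ".join(scalar(row[f]) for f in fields))
        elif isinstance(value, list):
            lines.append(f"{key}: [{' '.join(scalar(x) for x in value)}]")
        elif isinstance(value, dict):
            raise NotImplementedError("nested-object syntax is not covered by this sketch")
        else:
            lines.append(f"{key}: {scalar(value)}")
    return "\n".join(lines)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Applied to the two payloads in this article, &lt;code&gt;encode&lt;/code&gt; reproduces the TERSE output shown above, including the &lt;code&gt;#[id name role active]&lt;/code&gt; schema header. Strings that would otherwise be misread, such as &lt;code&gt;"T"&lt;/code&gt; or &lt;code&gt;"1001"&lt;/code&gt;, fall back to quotes.&lt;/p&gt;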




&lt;h2&gt;
  
  
  A real example: nested order
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;JSON&lt;/strong&gt; (~180 tokens):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"orderId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ORD-001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Rafael Torres"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"r@email.com"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"sku"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"qty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;9.99&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"sku"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"B3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"qty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;24.50&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"paid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"notes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TERSE&lt;/strong&gt; (~95 tokens):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;orderId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ORD-001&lt;/span&gt;
&lt;span class="na"&gt;customer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rafael&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Torres"&lt;/span&gt; &lt;span class="nv"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;r@email.com&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
&lt;span class="na"&gt;items&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;#[sku qty price]&lt;/span&gt;
  &lt;span class="s"&gt;A1 2 &lt;/span&gt;&lt;span class="m"&gt;9.99&lt;/span&gt;
  &lt;span class="s"&gt;B3 1 &lt;/span&gt;&lt;span class="m"&gt;24.50&lt;/span&gt;
&lt;span class="na"&gt;paid&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;T&lt;/span&gt;
&lt;span class="na"&gt;notes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;~&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where TERSE separates itself from TOON and CSV — deeply nested structures work exactly as expected.&lt;/p&gt;
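&lt;p&gt;A recursive serializer for the order example can be sketched in a few functions. This is a minimal illustration, not the terse-js or terse-py implementation: the names &lt;code&gt;scalar&lt;/code&gt;, &lt;code&gt;inline&lt;/code&gt;, &lt;code&gt;is_uniform_table&lt;/code&gt; and &lt;code&gt;serialize&lt;/code&gt; are hypothetical, and the formatting rules are inferred from the examples above. Nested objects and arrays are written inline with spaces as delimiters, and uniform arrays of objects at the top level become schema arrays.&lt;/p&gt;

```python
# Hypothetical sketch, NOT the terse-py API: the formatting rules below are
# inferred from the examples in this article, not taken from the spec.


def scalar(value):
    if value is True:
        return "T"
    if value is False:
        return "F"
    if value is None:
        return "~"
    if isinstance(value, str) and (" " in value or value == ""):
        return '"' + value + '"'  # assumption: quote only when needed
    return str(value)


def inline(value):
    """Compact single-line form: spaces separate entries, no commas."""
    if isinstance(value, dict):
        return "{" + " ".join(k + ":" + inline(v) for k, v in value.items()) + "}"
    if isinstance(value, list):
        return "[" + " ".join(inline(v) for v in value) + "]"
    return scalar(value)


def is_uniform_table(value):
    """True for a non-empty list of dicts that all share the same keys."""
    if not isinstance(value, list) or not value:
        return False
    if not all(isinstance(row, dict) for row in value):
        return False
    keys = list(value[0])
    return all(list(row) == keys for row in value)


def serialize(doc):
    """Top-level keys one per line; uniform object arrays become schema arrays."""
    lines = []
    for key, value in doc.items():
        if is_uniform_table(value):
            fields = list(value[0])
            lines.append(key + ":")
            lines.append("  #[" + " ".join(fields) + "]")
            for row in value:
                # Cells recurse through inline(), so nested values still work.
                lines.append("  " + " ".join(inline(row[f]) for f in fields))
        else:
            lines.append(key + ": " + inline(value))
    return "\n".join(lines)


order = {
    "orderId": "ORD-001",
    "customer": {"name": "Rafael Torres", "email": "r@email.com"},
    "items": [
        {"sku": "A1", "qty": 2, "price": 9.99},
        {"sku": "B3", "qty": 1, "price": 24.50},
    ],
    "paid": True,
    "notes": None,
}
print(serialize(order))
```

&lt;p&gt;The output matches the TERSE block above line for line, except that Python prints the second price as &lt;code&gt;24.5&lt;/code&gt;, because a float does not keep its trailing zero.&lt;/p&gt;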




&lt;h2&gt;
  
  
  You don't write TERSE by hand
&lt;/h2&gt;

&lt;p&gt;The workflow is identical to JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your data (object/dict)
      ↓
serialize()        ← terse-js or terse-py
      ↓
TERSE string       ← sent to the LLM
      ↓
parse()            ← if you need it back
      ↓
Your data again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Just like nobody writes &lt;code&gt;JSON.stringify()&lt;/code&gt; output by hand — you call the function. TERSE works the same way. The format is optimized for the one reader that actually matters: the LLM.&lt;/p&gt;
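&lt;p&gt;The serialize-then-parse loop can be sketched for the simplest case, a single schema array. The &lt;code&gt;serialize&lt;/code&gt; and &lt;code&gt;parse&lt;/code&gt; functions below are hypothetical stand-ins, not the terse-js or terse-py API, and the typing rules are assumptions drawn from this article. It uses Python's standard &lt;code&gt;shlex.split&lt;/code&gt; so that quoted values containing spaces stay in one cell. One known limitation of this sketch: strings that look like &lt;code&gt;T&lt;/code&gt;, &lt;code&gt;F&lt;/code&gt;, &lt;code&gt;~&lt;/code&gt; or a number come back retyped.&lt;/p&gt;

```python
# Hypothetical serialize/parse round trip for one schema array. NOT the
# terse-js or terse-py API; the typing rules are assumptions from this article.
# Limitation: strings that look like T, F, ~ or numbers come back retyped.
import shlex


def dump_value(value):
    if value is True:
        return "T"
    if value is False:
        return "F"
    if value is None:
        return "~"
    if isinstance(value, str) and (" " in value or value == ""):
        return '"' + value + '"'  # quote only when spaces would split the cell
    return str(value)


def load_value(token):
    if token == "T":
        return True
    if token == "F":
        return False
    if token == "~":
        return None
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def serialize(name, rows):
    fields = list(rows[0])
    body = ["  " + " ".join(dump_value(row[f]) for f in fields) for row in rows]
    return "\n".join([name + ":", "  #[" + " ".join(fields) + "]"] + body)


def parse(text):
    header, schema, *body = text.splitlines()
    name = header.rstrip(":")
    fields = schema.strip()[2:-1].split()  # strip the leading "#[" and the "]"
    rows = [dict(zip(fields, map(load_value, shlex.split(line)))) for line in body]
    return {name: rows}


users = [
    {"id": 1, "name": "Alice Souza", "active": True, "score": 9.5},
    {"id": 2, "name": "Bruno", "active": False, "score": None},
]
text = serialize("users", users)
print(text)
print(parse(text) == {"users": users})
```

&lt;p&gt;The last line prints &lt;code&gt;True&lt;/code&gt;: the data survives the trip to text and back, which is the property the workflow relies on.&lt;/p&gt;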




&lt;h2&gt;
  
  
  On design intent: why not compress further?
&lt;/h2&gt;

&lt;p&gt;TERSE could go deeper — automatic key abbreviation, binary type encoding, dictionary compression. We deliberately stopped short of that.&lt;/p&gt;

&lt;p&gt;The goal is a format that remains &lt;strong&gt;human-auditable&lt;/strong&gt;: you can open a &lt;code&gt;.terse&lt;/code&gt; file in any text editor and understand what you're looking at without tooling. In LLM pipelines, auditability is a safety property, not just a convenience. When an agent misbehaves, you need to inspect its inputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two questions that come up
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can I use TERSE for REST API communication between microservices?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can, but it's not the primary use case. REST APIs are consumed by many clients across different teams and languages — JSON's universal support is a real advantage there. TERSE shines where you control both ends: serializing data before sending it to an LLM, and parsing the response on the other side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I use TERSE for application configuration, like YAML?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes — the format supports everything YAML does for config files: nested objects, arrays, typed values, comments. Worth considering if your config is also consumed by an LLM as context.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's available today
&lt;/h2&gt;

&lt;p&gt;The project includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Formal specification&lt;/strong&gt; (v0.7) with ABNF grammar, conformance rules, and security considerations — published on Zenodo with DOI: &lt;a href="https://doi.org/10.5281/zenodo.19058364" rel="noopener noreferrer"&gt;10.5281/zenodo.19058364&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reference implementations&lt;/strong&gt; in TypeScript, Python, Java, and Go&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live playground&lt;/strong&gt; where you can paste JSON and see the TERSE output in real time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything is open source under MIT (implementations) and CC BY 4.0 (specification).&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🌐 &lt;strong&gt;Landing page + playground&lt;/strong&gt;: &lt;a href="https://rudsoncarvalho.github.io/terse-format" rel="noopener noreferrer"&gt;rudsoncarvalho.github.io/terse-format&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📦 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/RudsonCarvalho/terse-format" rel="noopener noreferrer"&gt;github.com/RudsonCarvalho/terse-format&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📄 &lt;strong&gt;Spec (Zenodo DOI)&lt;/strong&gt;: &lt;a href="https://doi.org/10.5281/zenodo.19058364" rel="noopener noreferrer"&gt;10.5281/zenodo.19058364&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;npm install terse-js&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pip install terse-py&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;TERSE is still a draft — v0.7 is open for community review. If you work with LLM pipelines at scale, I'd love to hear whether this addresses a real pain point in your stack.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Rudson Kiyoshi Souza Carvalho — Independent Researcher&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>opensource</category>
      <category>ai</category>
      <category>token</category>
    </item>
    <item>
      <title>Resilience Evaluation and Optimization Framework — REOF</title>
      <dc:creator>Rudson Kiyoshi Souza Carvalho</dc:creator>
      <pubDate>Wed, 12 Jun 2024 12:23:30 +0000</pubDate>
      <link>https://dev.to/rudsoncarvalho/resilience-evaluation-and-optimization-framework-reof-4f9c</link>
      <guid>https://dev.to/rudsoncarvalho/resilience-evaluation-and-optimization-framework-reof-4f9c</guid>
      <description>&lt;p&gt;Author: Rudson Kiyoshi Souza Carvalho&lt;/p&gt;

&lt;p&gt;Date: April 2024&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Objective:&lt;/strong&gt; This document presents REOF, a framework for evaluating, quantifying, and optimizing the resilience and reliability of systems, with a focus on software applications.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;By systematically evaluating each critical component, the methodology helps proactively identify areas of vulnerability that could compromise the system's reliability and resilience.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;1. Introduction to REOF:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;REOF is a standardized tool for analyzing, quantifying, and expressing a system's resilience and reliability through a numerical index, the IRC (Índice de Resiliência e Confiabilidade, or Resilience and Reliability Index).&lt;br&gt;
The methodology focuses on preventing failures and on adopting best practices that increase reliability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. REOF Analysis Methodology:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The method is organized around evaluation verticals: REOF divides the analysis into "verticals" that represent the critical points of a system, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EE - External Input (points of interaction with the client)&lt;/li&gt;
&lt;li&gt;SE - External Outputs (sending data to other systems)&lt;/li&gt;
&lt;li&gt;CE - External Queries (integrations with other systems)&lt;/li&gt;
&lt;li&gt;DI - Internal Data (database queries, cache, etc.)&lt;/li&gt;
&lt;li&gt;AC - Containerized Application (health check configuration)&lt;/li&gt;
&lt;li&gt;SEC - Security Framework Enabled (e.g., Spring Security)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;One of the most important aspects of this framework is that it was designed to accommodate any vertical you create. You can define your own evaluation verticals and assess any process that has a set of best practices to evaluate, such as infrastructure verticals, mobile app construction techniques, and other processes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Protections and Weights:&lt;/strong&gt; For each vertical, a set of "protections" (best practices) that increase resilience is defined, each with a specific weight.&lt;br&gt;
"Together with your engineering or architecture team, you can list the best protection practices that give the system resilience and reliability, and assign a weight to each protection applied."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Index Calculation:&lt;/strong&gt; The IRC is calculated as the weighted sum of the scores of each vertical.&lt;/p&gt;
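&lt;p&gt;A minimal sketch of this weighted sum, under stated assumptions: the vertical codes come from the list above, but every protection name and weight in &lt;code&gt;PROTECTIONS&lt;/code&gt; is a hypothetical example (SE and SEC are left out for brevity), and a vertical's score is assumed to be the sum of the weights of the protections the service actually implements.&lt;/p&gt;

```python
# Illustrative sketch: the vertical codes come from the REOF list above, but
# every protection name and weight here is a hypothetical example.

PROTECTIONS = {
    "EE": {"timeout": 3, "rate_limiting": 2, "input_validation": 2},
    "CE": {"timeout": 3, "circuit_breaker": 3, "retry_with_backoff": 2},
    "DI": {"connection_pool_limits": 2, "cache_fallback": 2},
    "AC": {"liveness_probe": 2, "readiness_probe": 2},
}


def vertical_score(vertical, applied):
    """Sum the weights of the protections the service actually implements."""
    return sum(w for name, w in PROTECTIONS[vertical].items() if name in applied)


def raw_irc(applied_by_vertical):
    """Weighted sum over all verticals, before degradation and normalization."""
    return sum(
        vertical_score(v, applied_by_vertical.get(v, set())) for v in PROTECTIONS
    )


def max_irc():
    """Score a service would reach with every listed protection in place."""
    return sum(sum(weights.values()) for weights in PROTECTIONS.values())


service = {
    "EE": {"timeout", "input_validation"},
    "CE": {"timeout", "circuit_breaker"},
    "AC": {"liveness_probe", "readiness_probe"},
}
print(raw_irc(service), "of", max_irc())  # 15 of 23
```

&lt;p&gt;A vertical with no protections in place (DI in this example) simply contributes 0, which is what makes the gaps visible.&lt;/p&gt;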

&lt;p&gt;&lt;strong&gt;Degradation Factor:&lt;/strong&gt; A degradation factor is applied to account for the impact of hosting multiple domains/features in the same microservice (micro-monoliths).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For each additional domain/feature, I want to reduce the overall index by 10%, because adding new or different domains/features forces the service to share resources, and slowness in one feature can exhaust the resources available to the other features in the same microservice.&lt;/p&gt;
&lt;/blockquote&gt;
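&lt;p&gt;The text does not say whether the 10% cut is linear or compounds per domain, so this sketch shows both readings as assumptions. The function names and the &lt;code&gt;PENALTY_PER_EXTRA_DOMAIN&lt;/code&gt; constant are hypothetical, and the first domain is assumed to carry no penalty.&lt;/p&gt;

```python
# Sketch of the degradation factor. Assumptions: the first domain carries no
# penalty, and each additional domain costs 10%, either linearly (clamped at
# zero) or compounded; the text above does not say which.

PENALTY_PER_EXTRA_DOMAIN = 0.10


def degradation_linear(domains):
    extra = max(domains - 1, 0)
    return max(1.0 - PENALTY_PER_EXTRA_DOMAIN * extra, 0.0)


def degradation_compound(domains):
    extra = max(domains - 1, 0)
    return (1.0 - PENALTY_PER_EXTRA_DOMAIN) ** extra


for domains in (1, 2, 4):
    print(
        domains,
        round(degradation_linear(domains), 2),
        round(degradation_compound(domains), 3),
    )
```

&lt;p&gt;The two readings agree for one or two domains and diverge after that (0.7 vs. about 0.729 at four domains), so a team adopting the factor should pick one explicitly.&lt;/p&gt;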

&lt;p&gt;&lt;strong&gt;Index Normalization:&lt;/strong&gt; The IRC is normalized to a scale from 0 to 10, which makes it easier to communicate and to compare different systems.&lt;/p&gt;
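&lt;p&gt;One plausible way to normalize, sketched under the assumption that the raw weighted score is divided by the maximum score the defined protections allow, scaled to 10, and then multiplied by the degradation factor. The function name &lt;code&gt;normalized_irc&lt;/code&gt; and the example numbers are hypothetical.&lt;/p&gt;

```python
# Sketch of normalizing the IRC to a 0-10 scale. Assumption: raw score divided
# by the maximum possible score, scaled by 10, times the degradation factor.


def normalized_irc(raw_score, max_score, degradation=1.0):
    if max_score == 0:
        raise ValueError("no protections defined, so the index is undefined")
    value = 10.0 * raw_score / max_score * degradation
    return round(min(max(value, 0.0), 10.0), 1)  # clamp, one decimal place


print(normalized_irc(15, 23))       # one domain, no penalty
print(normalized_irc(15, 23, 0.8))  # three domains with a linear 10% cut each
```

&lt;p&gt;With these numbers the service scores 6.5 on its own and 5.2 once two extra domains are packed into it.&lt;/p&gt;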

&lt;p&gt;&lt;strong&gt;3. IRC/REOF as an SLA:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;REOF allows the IRC to be expressed as service levels (SLAs):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excellent (8 to 10)&lt;/li&gt;
&lt;li&gt;Good (5 to 7.9)&lt;/li&gt;
&lt;li&gt;Acceptable (3 to 4.9)&lt;/li&gt;
&lt;li&gt;Unsatisfactory (below 3)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;REOF reliability pyramid, by Ruds&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3ph37jwowxim33497zo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc3ph37jwowxim33497zo.png" alt=" " width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SLA for Excellent Service:&lt;/strong&gt; The IRC/REOF is greater than or equal to 8, indicating an excellent service level. This reflects the microservice's high reliability and efficiency, without the burden of additional domains.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SLA for Good Service:&lt;/strong&gt; The IRC/REOF is between 5 and 7.9, indicating a good service level. This reflects a reliable microservice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SLA for Acceptable Service:&lt;/strong&gt; The IRC/REOF is between 3 and 4.9, indicating an acceptable service level with room for improvement. Corrective measures should be applied to increase the service's reliability and reduce the impact of outages caused by the application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SLA for Unsatisfactory Service:&lt;/strong&gt; The IRC/REOF is below 3, indicating an unsatisfactory service level. The service needs review and improvement and cannot be considered reliable.&lt;/p&gt;
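&lt;p&gt;Mapping a normalized IRC to these bands is a simple threshold lookup. In this sketch, the gaps in the published ranges (for example, between 7.9 and 8) are assumed to be closed by treating each band's lower bound as an inclusive threshold. &lt;code&gt;SLA_BANDS&lt;/code&gt; and &lt;code&gt;sla_level&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
# Sketch mapping a normalized IRC (0-10) to the SLA bands listed above.
# Assumption: each band's lower bound is an inclusive threshold, so a score
# such as 7.95 (between the published 7.9 and 8) falls into "Good".

SLA_BANDS = [
    (8.0, "Excellent"),
    (5.0, "Good"),
    (3.0, "Acceptable"),
]


def sla_level(irc):
    """Return the SLA level name for a normalized IRC score."""
    if min(max(irc, 0.0), 10.0) != irc:
        raise ValueError("the IRC must already be normalized to 0-10")
    for threshold, level in SLA_BANDS:
        if irc >= threshold:
            return level
    return "Unsatisfactory"


for score in (9.2, 7.95, 6.5, 3.0, 2.4):
    print(score, sla_level(score))
```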

&lt;p&gt;&lt;strong&gt;4. Flexibility and Automation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;REOF is flexible and can be customized with new verticals and protections.&lt;br&gt;
The IRC calculation can be automated through static code analysis, although its precision may be limited.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. REOF vs. MTBF:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;REOF is a proactive measure that evaluates the system's robustness based on how it is built, whereas MTBF is a reactive measure that considers only the mean time between failures.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MTBF is a measure of luck over time. A high MTBF may indicate that a system has had a good operational track record under ideal environmental operating conditions; however, it does not necessarily distinguish genuinely well-designed systems from those that may simply have been 'lucky' enough to have a stable environment during the period of operation and evaluation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;REOF is more comprehensive and provides more actionable insights for improving resilience.&lt;/p&gt;
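&lt;p&gt;A short worked example makes the contrast concrete. MTBF is total operating time divided by the number of failures, so its only inputs are history. In this sketch, two hypothetical services with identical six-month histories get the same MTBF even though only one of them has any protections in place.&lt;/p&gt;

```python
# Worked MTBF example: MTBF = total operating time / number of failures.
# The two services are hypothetical: same history, different construction.


def mtbf(operating_hours, failures):
    if failures == 0:
        return float("inf")  # no failures observed yet: MTBF is unbounded
    return operating_hours / failures


history = {"operating_hours": 4380, "failures": 2}  # about six months
protected = {"protections": {"timeout", "circuit_breaker", "retry"}, **history}
unprotected = {"protections": set(), **history}

for name, svc in (("protected", protected), ("unprotected", unprotected)):
    # The protections field never enters the MTBF calculation.
    print(name, mtbf(svc["operating_hours"], svc["failures"]), "hours")
```

&lt;p&gt;Both lines report 2190.0 hours: nothing in the formula reflects how either service was built, which is the gap REOF is meant to cover.&lt;/p&gt;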

&lt;p&gt;&lt;strong&gt;6. Relationship with Chaos Engineering:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;REOF and Chaos Engineering are complementary approaches.&lt;br&gt;
REOF ensures that resilience best practices are applied during development, while Chaos Engineering tests the system's resilience in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Benefits of REOF:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Effective communication about system reliability.&lt;/li&gt;
&lt;li&gt;Precise identification of areas for improvement.&lt;/li&gt;
&lt;li&gt;A culture of continuous improvement and failure prevention.&lt;/li&gt;
&lt;li&gt;Risk management and SLA compliance.&lt;/li&gt;
&lt;li&gt;A better user experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;8. Cost Considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Implementing REOF may carry a significant upfront cost, but it reduces operational costs in the long run.&lt;br&gt;
Chaos Engineering may be cheap to implement, but its operational costs can be high while tests are running.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is the REOF method better than the MTBF method?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;MTBF is a statistic about how your system has operated according to its operational history: a measurement taken over time. A system can run very well under ideal operating conditions, and if nothing abnormal happens in your environment or infrastructure, MTBF will report that the system is extremely reliable, because failures depend on the conditions under which the system operates. This method does not know how your system was built. It considers how often failures occur over a period of time, not how robustly the system was built to handle different kinds of variation in its environment and so protect itself from failures. It is a reactive method.&lt;/p&gt;

&lt;p&gt;MTBF is a measure of luck as a function of time. A high MTBF may indicate that a system has had a good operating history under ideal environmental conditions, but it does not necessarily distinguish between genuinely well-designed systems and those that may have been "lucky" to have a stable environment during the period of operation and evaluation.&lt;/p&gt;

&lt;p&gt;REOF genuinely evaluates the system's robustness, meaning how the system was built to handle the different kinds of problems that can occur in production. It is a proactive method.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Relationship between the REOF method and Chaos Monkey/Engineering&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The REOF method differs from the use of tools such as Chaos Monkey in several fundamental ways. Both approaches aim to improve system resilience and reliability, but they do so in complementary ways. Chaos engineering is the discipline of experimenting on a system to build confidence in its ability to withstand turbulent conditions in production, whereas this method ensures that the best practices for withstanding chaos have been applied; in other words, it ensures the system is prepared for failure. The strengths of this reliability evaluation methodology compared with using Chaos Monkey are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Focus on Prevention and Continuous Improvement&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Holistic Evaluation: The methodology provides a comprehensive view of system performance over time, making it possible to identify trends, areas for improvement, and the impact of changes, unlike Chaos Monkey, which tests resilience in a more immediate and isolated way.&lt;/p&gt;

&lt;p&gt;Encouraging Innovation: Gamification (see the proposed "excellence challenge" topic) encourages teams to pursue continuous improvements and innovative solutions that raise their reliability scores, fostering a culture of operational excellence.&lt;/p&gt;

&lt;p&gt;Strategic Planning: It provides a basis for strategic planning and resource allocation by identifying critical areas that need attention and investment, something that applying Chaos Monkey in isolation does not directly provide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risk Management and Compliance&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Reducing Operational Risk: By focusing on the continuous evaluation and improvement of reliability, this methodology helps mitigate long-term operational risks, whereas Chaos Monkey is more of a stress-testing tool that exposes vulnerabilities.&lt;/p&gt;

&lt;p&gt;SLA Compliance: The methodology enables proactive monitoring and helps ensure that services meet or exceed the agreed SLAs, which is essential for customer satisfaction and regulatory compliance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Improving the User Experience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User Focus: Evaluating and improving reliability against SLAs emphasizes the importance of the user experience, with the aim of ensuring uninterrupted operation and optimized service performance.&lt;/p&gt;

&lt;p&gt;Anticipating Problems: It enables potential failures to be identified and corrected proactively, before they affect end users, whereas Chaos Monkey simulates failures to test resilience, which may or may not relate directly to the user experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complementarity with Resilience Testing Tools&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Integrated Approach: Although it focuses on evaluation and improvement, this methodology can be complemented by tools such as Chaos Monkey for a more robust approach to resilience. Together, they form a defense-in-depth strategy against failures and outages.&lt;/p&gt;

&lt;p&gt;In short, the reliability evaluation methodology brings a preventive, strategic approach to managing system reliability, with a focus on continuous improvement, innovation, and customer satisfaction. While Chaos Monkey is a valuable tool for testing resilience in a specific, isolated way, combining the two approaches offers a powerful path to operational excellence and system resilience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;REOF is a powerful framework for building and managing resilient systems. Its proactive approach, focus on prevention, and flexibility make it a valuable tool for any organization that wants to achieve operational excellence and ensure customer satisfaction.&lt;/p&gt;

&lt;p&gt;Follow the Medium link for more details about this framework: &lt;a href="https://medium.com/@rudsonkiyoshicarvalho/resilience-evaluation-and-optimization-framework-reof-541d23018460" rel="noopener noreferrer"&gt;Medium REOF&lt;/a&gt;&lt;/p&gt;

</description>
      <category>resilience</category>
      <category>microservices</category>
      <category>softwareengineer</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
