Most AI content teaches you how to write prompts.
This is not that.
I've spent three years at Ona building platform infrastructure for 1.7 million developers. I'm the first independent maintainer of OpenFGA, the CNCF authorization system based on Google's Zanzibar paper. I built Distill, a context deduplication library that cuts token usage by 30-40% in 12ms. I wrote an essay on AI fatigue that hit #1 on Hacker News, got covered by Business Insider, Futurism, and The New York Times, and was cited by the Hard Fork podcast.
I wrote down everything I learned from that work. The result is the Agentic Engineering Guide: 216 pages, 33 chapters, covering the full stack from context engineering to agent governance.
But before you decide whether to read it, let me give you the most useful parts for free.
The thing that breaks first
When teams move from Level 2 (chat agents) to Level 3 (agents that actually execute code, call APIs, write files), the first thing that breaks is not the model. It's authorization.
Your agent has access to your database. Your secrets. Your production environment. What permission model are you using?
Most teams answer: "the same one as the developer who set it up."
That's the wrong answer. A developer has permissions scoped to their identity and their judgment. An agent has permissions scoped to... whatever you gave it, running autonomously, at 2am, without anyone watching.
The guide covers Zanzibar-based authorization for agents, the Rule of Two (no agent action should be irreversible without a second check), and why most MCP deployments ship with a security gap that teams don't discover until something goes wrong.
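To make the Rule of Two concrete, here is a minimal sketch of what such a gate can look like: actions flagged as irreversible are held until a second, independent check approves them. The names (`Action`, `run_with_rule_of_two`, the example checks) are illustrative, not from OpenFGA or any specific library.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    """A hypothetical agent action with an irreversibility flag."""
    name: str
    irreversible: bool
    execute: Callable[[], str]

def run_with_rule_of_two(action: Action,
                         second_check: Callable[[Action], bool]) -> str:
    """Execute reversible actions directly; gate irreversible ones
    behind a second, independent check (a human, a policy engine,
    a canary run)."""
    if action.irreversible and not second_check(action):
        return f"BLOCKED: {action.name} needs a second approval"
    return action.execute()

# Dropping a table is irreversible, so it is blocked unless the
# second check signs off; a read-only query passes straight through.
drop = Action("drop_table", irreversible=True, execute=lambda: "dropped")
read = Action("select_rows", irreversible=False, execute=lambda: "10 rows")
print(run_with_rule_of_two(drop, second_check=lambda a: False))
print(run_with_rule_of_two(read, second_check=lambda a: False))
```

The point is that the check lives outside the agent: the agent at 2am cannot approve its own destructive action.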
The 30-40% problem
Here's a number that should concern you: 30-40% of the context you send to your LLM is redundant.
Your documentation says the same thing as your code comments. Your FAQ overlaps with your support tickets. Your API docs repeat what's in your tutorials. The LLM sees the same fact five different ways and gets confused. Same input, different output. Every time.
The instinct is to fix the prompt. It doesn't work. You cannot prompt your way out of garbage context.
The fix is upstream. Context engineering is the discipline of cleaning, deduplicating, compressing, and structuring the information before it reaches the model. The guide covers the 4-layer context stack, the meta-MCP pattern that cuts token usage by 88%, and why deterministic preprocessing beats LLM-based compression every time.
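As a toy illustration of deterministic preprocessing, here is a sketch that normalizes each context chunk and drops exact repeats before anything reaches the model. Real systems (including the author's Distill) do far more, such as near-duplicate detection; this only shows the principle that the dedup step can be cheap, fast, and deterministic. All names here are hypothetical.

```python
import hashlib

def dedupe_context(chunks: list[str]) -> list[str]:
    """Drop chunks that restate an already-seen chunk, comparing on a
    whitespace- and case-normalized hash so trivial rephrasings collapse."""
    seen: set[str] = set()
    result: list[str] = []
    for chunk in chunks:
        key = hashlib.sha256(
            " ".join(chunk.lower().split()).encode()
        ).hexdigest()
        if key not in seen:
            seen.add(key)
            result.append(chunk)
    return result

docs = [
    "The API rate limit is 100 requests per minute.",
    "the API rate limit is  100 requests per minute.",  # same fact, restated
    "Authentication uses bearer tokens.",
]
print(dedupe_context(docs))  # the duplicated fact survives only once
```

Because the pipeline is a pure function of its input, the same context always produces the same cleaned output, which is exactly the property LLM-based compression cannot give you.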
What 300 engineers told me about AI fatigue
In late 2025, I published a post about AI fatigue in engineering teams. It hit #1 on Hacker News. The comments were more useful than the post.
The pattern that emerged: teams that adopted AI tools without changing their workflows burned out faster than teams that didn't adopt AI at all. The tools added cognitive load without removing it. Engineers were reviewing AI output on top of writing their own code, not instead of it.
The teams that succeeded did something different. They treated agent adoption as an organizational change problem, not a technology problem. They changed review processes, changed how they measured productivity, changed what they expected from junior engineers. The technology was the easy part.
Chapter 20 of the guide covers the AI fatigue patterns in detail. Chapter 21 covers the Conductor Model: the workflow that lets engineers direct agents without becoming agents themselves.
The maturity model
Where does your team fall?
Level 1: Experimental. Individual developers using Copilot or Claude. No team policies. No shared context. No measurement.
Level 2: Structured. Team has agreed on which tools to use and when. Basic review policies. Some measurement of output quality.
Level 3: Integrated. Agents in the CI/CD pipeline. Automated quality gates. Cost tracking. Incident response procedures for when agents break things.
Level 4: Orchestrated. Agents run autonomously on task queues. Multi-agent systems with defined handoffs. Human oversight at the decision level, not the execution level.
Level 5: Autonomous. Agents operate 24/7. Background agents monitor repositories, fix issues, generate tests, update documentation. Humans set goals and review outcomes.
Most teams in early 2026 are at Level 2. The transition to Level 3 is where the engineering discipline becomes essential. The transition to Level 4 is where it becomes critical.
The guide has a full maturity assessment with specific practices for each level and a roadmap for moving between them.
The cognitive debt problem
Technical debt is code that works but is hard to maintain.
Cognitive debt is code that works but nobody understands.
At Ona, 88.5% of merged PRs are agent-authored. That's not a boast. It's a warning. When AI writes most of your code, the team's mental model of the codebase degrades. Engineers can review individual PRs without understanding the system those PRs are building. The code is correct. The understanding is gone.
This is more dangerous than technical debt. You can pay down technical debt by refactoring. You pay down cognitive debt by reading code you didn't write, understanding systems you didn't design, and rebuilding mental models that were never formed in the first place.
The guide covers three practices for managing cognitive debt: mandatory architecture reviews before agent-authored features ship, "explain this to me" sessions where engineers walk through agent-authored code without looking at the diff, and rotation policies that ensure every engineer touches every part of the codebase.
What's in the guide
33 chapters across 10 parts:
- Foundations: What agents are, what they can do, the capability spectrum from Level 1 to Level 4
- Context Engineering: The 4-layer stack, RAG vs. agentic search, token economics
- Security & Authorization: The agent threat model, Zanzibar for agents, prompt injection, sandboxing
- Protocols & Standards: MCP in production, A2A communication, AGENTS.md
- Observability: OpenTelemetry for agents, cost tracking, incident response
- Orchestration: The agent loop, multi-agent systems, memory and checkpoints
- Team Practices: AI fatigue, the Conductor Model, the maturity model
- Production Workflows: Your first agent in production, security checklists, measuring impact
- Production Engineering: Evaluation, enterprise adoption, FinOps, governance, model routing
- The Adoption Playbook: A step-by-step guide for taking a team from Level 1 to Level 3
Plus four appendices: tool directory, glossary, further reading, and templates.
Who it's for
Engineering leaders, senior engineers, and platform architects who are adopting AI agents or deciding whether to.
You should be comfortable with software engineering concepts (distributed systems, API design, CI/CD, observability). You don't need prior experience with AI or machine learning.
This is not a coding tutorial. Not a vendor comparison. Not a prompt engineering guide. It's a book about engineering judgment in the age of AI agents.
Get it
The full guide is free to read at agents.siddhantkhare.com and open source on GitHub.
If you want the PDF or EPUB to read offline, it's on Gumroad at pay-what-you-want (minimum $11). All future updates included.
Questions? I'm @Siddhant_K_code on X or Siddhant Khare on LinkedIn. Drop a comment below if you want me to go deeper on any of these topics.