<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dustin</title>
    <description>The latest articles on DEV Community by Dustin (@duske).</description>
    <link>https://dev.to/duske</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F288937%2F9611edc5-b8ef-49e6-9ab0-42798e9e9ecb.jpeg</url>
      <title>DEV Community: Dustin</title>
      <link>https://dev.to/duske</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/duske"/>
    <language>en</language>
    <item>
      <title>Agentic Engineering: Lessons Learned Vol. 2</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Mon, 23 Mar 2026 20:15:59 +0000</pubDate>
      <link>https://dev.to/duske/agentic-engineering-lessons-learned-vol-2-7mh</link>
      <guid>https://dev.to/duske/agentic-engineering-lessons-learned-vol-2-7mh</guid>
      <description>&lt;p&gt;Six months ago, we shared our &lt;a href="https://dev.to/duske/agentic-engineering-lessons-learned-vol-1-jbj"&gt;first lessons learned&lt;/a&gt; with agentic engineering. Since then, we've shipped real features, refactored infrastructure, screamed at agents — and discovered which of our original recommendations actually survived contact with production.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This space is moving &lt;strong&gt;fast&lt;/strong&gt;. Take these recommendations with a grain of salt and always validate them for your own use case. Our findings are as of March 2026; your mileage may vary.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;What Actually Stuck from Vol. 1&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Most of our original advice still holds.&lt;/strong&gt; Context engineering remains the core skill, and knowing the basic pitfalls like &lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" rel="noopener noreferrer"&gt;Context Poisoning, Context Distraction, Context Confusion, and Context Clash&lt;/a&gt; is key to managing your agent's context effectively. Having those basic principles in mind tells you what to expect and helps you avoid delegating tasks that are likely to fail.&lt;/p&gt;

&lt;h3&gt;Subagents: The context firewall&lt;/h3&gt;

&lt;p&gt;In Vol. 1, we recommended: "Subagents work best as researchers, not implementers."&lt;br&gt;
While the first part of that statement still holds, we've found that the second part is more nuanced.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qrtfa60ja667x0tom3d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qrtfa60ja667x0tom3d.png" alt="context firewall" width="554" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The better mental model is to think of subagents as &lt;strong&gt;context firewalls&lt;/strong&gt; rather than scoping them down to specific actions.&lt;br&gt;
They excel at isolating specific tasks and preventing context pollution, but that doesn't necessarily mean they can only read.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Research Tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By nature, read-only/research tasks lend themselves well to subagents because they often involve a lot of grepping, log analysis, or documentation reading, all to produce the much smaller insight that is actually needed.&lt;br&gt;
We use them all the time for that kind of work, as you can also spawn them in parallel without producing conflicts.&lt;br&gt;
For example, we have an &lt;code&gt;ops-investigation&lt;/code&gt; skill that, given your prompt about a specific production issue, spawns subagents to check logs and metrics in Grafana and to search through Sentry issues and recent deploys in parallel.&lt;br&gt;
Since our infrastructure code lives in the same monorepo as the application code, the subagents can also grep through the codebase to produce a distilled report of their findings. This is a huge time saver and lets us get to the root cause of issues much faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write Tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But with well-structured plans, subagents can also be used for write operations; the key theme here is &lt;strong&gt;phased implementation&lt;/strong&gt;.&lt;br&gt;
When the main agent ensures that the phases can be implemented in isolation, with clear boundaries and pre-coordination, subagents can be trusted to implement those phases without causing chaos in the main context.&lt;br&gt;
Even better, you can still steer the session: the main agent holds all the context, so you can iterate and give feedback on a subagent's implementation without losing the overall plot.&lt;/p&gt;

&lt;p&gt;The key learning: be deliberate about write operations, but don't avoid them entirely. The boundary between "safe" and "risky" writes is more nuanced than a simple read/write distinction.&lt;/p&gt;

&lt;p&gt;💰 One more thing: since you can control the model of each subagent, it is easy to use cheap models for noisy digging work like log analysis and feed only the distilled insights to the main agent, which can run on a more expensive model. This is a great way to optimize token usage while still getting valuable insights.&lt;/p&gt;
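&lt;p&gt;As a sketch of what this can look like in Claude Code, which defines subagents as markdown files with YAML frontmatter: the &lt;code&gt;model&lt;/code&gt; field pins a cheaper model for the noisy work. The subagent name and its instructions below are our own illustrative invention; check your tool's docs for the exact frontmatter it supports.&lt;/p&gt;

```markdown
---
name: log-digger
description: Digs through logs and recent deploy output for a given incident and reports only the distilled findings. Use for noisy investigation work.
model: haiku
---

Investigate the incident described in the prompt.
Grep the relevant logs and recent deploy output, then reply with a short
report: suspected root cause, supporting evidence, affected services.
Do not paste raw logs back; summarize.
```

&lt;p&gt;The main session then only ever sees the short report, not the thousands of log lines the subagent chewed through.&lt;/p&gt;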
&lt;h3&gt;Side quests: Session Forking and Branching&lt;/h3&gt;

&lt;p&gt;Just as you branch a git repository to experiment with a new feature or fix a bug, you can also branch your agent session. This is especially useful for debugging and investigation tasks that require a lot of context and exploration, while the main session, which should stay clean, keeps going.&lt;br&gt;
There are many ways to do this, and we're seeing more and more tools support this pattern natively, but the core idea is the same: &lt;strong&gt;fork the session to isolate the side-quest work from the main implementation context.&lt;/strong&gt; &lt;a href="https://lucumr.pocoo.org/2026/1/31/pi/" rel="noopener noreferrer"&gt;Pi&lt;/a&gt; in particular models sessions as a tree, which gives you fine-grained control over branching.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fth7w82oieq9nj87q6n29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fth7w82oieq9nj87q6n29.png" alt="Diagram showing a main agent session branching into a side quest" width="500" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some techniques we use are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handoff:&lt;/strong&gt;&lt;br&gt;
Create a handoff file as an artifact with the relevant context, then spawn a new session and load that file into context. Note that you actively compress the context, which can be good or bad depending on the task.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Main Session --writes context--&amp;gt; handoff.md --compressed load--&amp;gt; New Session with handoff knowledge
Main Session --&amp;gt; clean, proceed with next task
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fork:&lt;/strong&gt;&lt;br&gt;
Use the fork or branch feature of your coding agent. That way both sessions live and can proceed in parallel.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Main Session --&amp;gt; Fork!
  ├── Main Session (keeps building) — production work goes on
  └── Debug Session (goes exploring) — chaos contained, kill when done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Rewind:&lt;/strong&gt;&lt;br&gt;
Mark your session, dive into the investigation, then rewind to the marked point. Like time travel: the conversation is restored, but the bug stays fixed.&lt;br&gt;
Make sure to keep the code changes. Note that this is not parallelizable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session --&amp;gt; Mark --&amp;gt; Investigate --&amp;gt; Fix bug --&amp;gt; rewind to Session
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Side query:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;/btw&lt;/code&gt; or similar commands to ask questions or verify assumptions against the current context without derailing it.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Deep in implementation --&amp;gt; /btw --&amp;gt; Quick answer
                      └── main work continues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's simple context management: once you have built up valuable context, protect it.&lt;/p&gt;

&lt;h3&gt;Planning is non-negotiable, even for small tasks&lt;/h3&gt;

&lt;p&gt;This was already a strong recommendation in Vol. 1, and after six more months it's become our hardest rule: &lt;strong&gt;no plan, no implementation.&lt;/strong&gt; We've seen enough agent sessions derail mid-task to know that skipping the plan is always a false economy — the time you "save" comes back as wasted tokens and broken context.&lt;/p&gt;

&lt;p&gt;For larger work, we follow a &lt;code&gt;research/spec → plan → implement&lt;/code&gt; flow. But even for smaller tasks, a simple &lt;em&gt;grounding&lt;/em&gt; step — spawning an explore subagent to survey the relevant code before touching anything — &lt;strong&gt;makes a huge difference&lt;/strong&gt; in our experience. It forces the agent to build a mental model first, instead of guessing and course-correcting later.&lt;/p&gt;

&lt;p&gt;This also means that any tool that cannot produce or assist in producing a proper plan is not a tool we can use for anything but trivial tasks. Autocomplete-style copilots are great for boilerplate and line-level suggestions, but the moment a task requires understanding across files or making architectural choices, you need a planning step they simply don't offer.&lt;/p&gt;

&lt;p&gt;In the next months, we will map and extend our Software Development Lifecycle with clear phases and checkpoints, and enforce that agents follow that process.&lt;/p&gt;

&lt;h3&gt;Agent-ready codebases&lt;/h3&gt;

&lt;p&gt;This is an evergreen topic: the codebase controls what the agent "sees" when running in your project, so its impact&lt;br&gt;
cannot be overstated. And as your project evolves, an agent-ready codebase must be maintained and cared for like a garden.&lt;/p&gt;

&lt;p&gt;Key recommendations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a monorepo if you can 🤓 While there are other techniques for multi-repo setups (see &lt;a href="https://x.com/dexhorthy/status/2033972371934368125" rel="noopener noreferrer"&gt;Dexter Horthy's post&lt;/a&gt;), a monorepo simplifies setup and context not just for you but also for the agent&lt;/li&gt;
&lt;li&gt;Keep &lt;code&gt;AGENTS.md&lt;/code&gt; clean. Do not let AI slop accumulate in this crucial file. It shouldn't be longer than 100 lines; since it is read on every run, each line must be inspected and must survive the test of time.&lt;/li&gt;
&lt;li&gt;Commands:

&lt;ul&gt;
&lt;li&gt;Fast commands: To get fast feedback loops, you need fast commands. Ensure testing is fast and linting is fast. We optimize our linting and typecheck setup regularly (looking at you, ts-go and oxlint). If you have many distinct packages, output caching with tools like Turborepo or Nx can be a game changer.&lt;/li&gt;
&lt;li&gt;Clear commands: There should be a unified way to run tests, linters, type checkers, and other common tasks. Don't reinvent the wheel for each package and follow conventions that agents understand.&lt;/li&gt;
&lt;li&gt;Non-verbose commands: Ensure a passing test suite is not blurting out thousands of lines of output. Agents need to see the signal, not the noise. If your test runner is too verbose, consider switching or configuring it for cleaner output. See &lt;a href="https://www.humanlayer.dev/blog/context-efficient-backpressure" rel="noopener noreferrer"&gt;Context-Efficient Backpressure for Coding Agents&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Manage your MCPs: Even though Claude has an MCP loading tool that dynamically loads MCPs on demand once you cross a certain initial context threshold, we mostly use skills to connect to external tools and APIs. That way we can tailor them to our needs and control the context better.&lt;/li&gt;

&lt;/ul&gt;
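&lt;p&gt;To make the command recommendations concrete, here is a sketch of root-level scripts in a JS monorepo, assuming Turborepo for output caching; the script names are conventions we like, not a standard:&lt;/p&gt;

```json
{
  "scripts": {
    "test": "turbo run test",
    "lint": "turbo run lint",
    "typecheck": "turbo run typecheck",
    "check": "turbo run lint typecheck test"
  }
}
```

&lt;p&gt;One verb per task, identical from every package, and cached so a clean run is near-instant: exactly the kind of interface an agent can rely on.&lt;/p&gt;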

&lt;p&gt;There are also some readiness models going around, and until we have a more mature one, the Agent Readiness Model by &lt;a href="https://factory.ai/" rel="noopener noreferrer"&gt;Factory&lt;/a&gt; is a nice general overview.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example Criteria&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Functional&lt;/td&gt;
&lt;td&gt;Code runs, but requires manual setup and lacks automated validation. Basic tooling that every repository should have.&lt;/td&gt;
&lt;td&gt;README, linter, type checker, unit tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Documented&lt;/td&gt;
&lt;td&gt;Basic documentation and process exist. Workflows are written down and some automation is in place.&lt;/td&gt;
&lt;td&gt;AGENTS.md, devcontainer, pre-commit hooks, branch protection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standardized&lt;/td&gt;
&lt;td&gt;Clear processes are defined, documented, and enforced through automation. Development is standardized across the organization.&lt;/td&gt;
&lt;td&gt;Integration tests, secret scanning, distributed tracing, metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optimized&lt;/td&gt;
&lt;td&gt;Fast feedback loops and data-driven improvement. Systems are designed for productivity and measured continuously.&lt;/td&gt;
&lt;td&gt;Fast CI feedback, regular deployment frequency, flaky test detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Autonomous&lt;/td&gt;
&lt;td&gt;Systems are self-improving with sophisticated orchestration. Complex requirements decompose automatically into parallelized execution.&lt;/td&gt;
&lt;td&gt;Self-improving systems&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: &lt;a href="https://docs.factory.ai/web/agent-readiness/overview#the-5-readiness-levels" rel="noopener noreferrer"&gt;Factory - Agent Readiness Levels&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We'd place ourselves somewhere between level 3 and 4 — our processes are standardized and enforced through automation, but we're still working toward systematic flaky test detection.&lt;/p&gt;

&lt;h3&gt;Frontend Validation: Closing the Visual Feedback Loop&lt;/h3&gt;

&lt;p&gt;In Vol. 1, we focused heavily on backend workflows as they're more straightforward to validate with existing tools. Frontend work, however, presents a unique challenge: &lt;strong&gt;How do you validate visual output without a human in the loop?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a while, agentic frontend coding lagged behind in our team. The agent can write frontend code, but without visual feedback it can't validate that the code works. This creates a cumbersome loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent writes component&lt;/li&gt;
&lt;li&gt;You open browser, check it&lt;/li&gt;
&lt;li&gt;You report back: "The button is misaligned" or paste a screenshot&lt;/li&gt;
&lt;li&gt;Agent adjusts&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This manual validation step becomes a bottleneck and drains the developer's attention, since the developer has to act as a proxy for the agent's eyes.&lt;br&gt;
Luckily, browser-use tools are evolving rapidly, and we've experimented with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/chrome" rel="noopener noreferrer"&gt;Claude + Chrome extension&lt;/a&gt;:&lt;/strong&gt; Super easy to set up and use, but token-hungry. A great starting point to see what is possible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://agent-browser.dev/" rel="noopener noreferrer"&gt;agent browser&lt;/a&gt;:&lt;/strong&gt; Agent-friendly (headless) browser use CLI. Offers structured snapshot and clear output, cuts a lot of noise. Has a nice auth vault in place, too. Our champion at the moment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/SawyerHood/dev-browser" rel="noopener noreferrer"&gt;dev-browser skill&lt;/a&gt;:&lt;/strong&gt; This was the first tool we used on our CI. Less token-hungry than the mcps and easy to extend with own scripts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first time an agent fixed a bug completely on its own by verifying its work was a magical moment ✨.&lt;br&gt;
But as so often with magical AI moments, it quickly became the new normal and we faced new challenges 😅:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;headless environments&lt;/strong&gt;: To fully leverage frontend validation, we want to bring it into our CI pipelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;token usage&lt;/strong&gt;: Navigating a browser and validating screenshots is token-intensive, so we need to be strategic about when to use it and how to isolate it.
Tools like agent-browser are great as they have baked-in backpressure compared to the playwright-mcp.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;auth and setup scripts&lt;/strong&gt;: You want to give the agent the ability to start from a clean and well-designed state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;missing product knowledge&lt;/strong&gt;: It's very tedious if the agent has to re-"explore" the product through the UI to figure out how to do things. It should of course detect and use the buttons on the page, but it shouldn't have to learn on the fly that it needs to go to the configuration area and create a form template before it can create forms in our software. This must either be provided as knowledge (see next section) or ruled out as a task for the agent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're still refining this, but it's a massive leap forward.&lt;/p&gt;

&lt;h3&gt;Missed potential: Skills and documentation&lt;/h3&gt;

&lt;p&gt;We were admittedly slow adopters of skills, as it was not entirely clear to us how they fit into our existing commands&lt;br&gt;
and tooling. Now we're building more and more skills tailored specifically to our use cases. Some general recommendations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic:

&lt;ul&gt;
&lt;li&gt;Keep the SKILL.md focused and under roughly 200 lines. For everything else, use progressive disclosure with &lt;code&gt;references&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Skills are not just markdown files. Put your agent-friendly or vibe-coded helper scripts there too.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;description&lt;/code&gt; field in the metadata is crucial. It is for the &lt;strong&gt;agent&lt;/strong&gt;, not for humans, to decide when to invoke the skill.&lt;/li&gt;
&lt;li&gt;Speaking of invocation: don't expect agents to call skills autonomously; reference them explicitly when you want them used.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Skill content:

&lt;ul&gt;
&lt;li&gt;Anything you notice you have to explain to the agent repeatedly is a good candidate for a skill.&lt;/li&gt;
&lt;li&gt;Project-specific patterns and processes can be nicely encapsulated in skills. You often find yourself following an implicit or explicit process, e.g. &lt;code&gt;spec -&amp;gt; plan -&amp;gt; implement -&amp;gt; validate -&amp;gt; review&lt;/code&gt;, babysitting PRs, bug-hunting across several systems, etc. Take your time to document these in skills, and maybe later grow them into entire processes!&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
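&lt;p&gt;A minimal sketch of such a skill file, using the common SKILL.md frontmatter convention of &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt;; the content and the referenced file are an invented example, not our production skill:&lt;/p&gt;

```markdown
---
name: ops-investigation
description: Use when the user reports a production issue. Spawns parallel subagents to check logs, metrics, error tracking, and recent deploys, then produces a distilled incident report.
---

1. Clarify the affected service and time window.
2. Spawn parallel subagents: one for logs/metrics, one for the error
   tracker, one for recent deploys.
3. Merge the findings into one report: root-cause hypothesis,
   supporting evidence, suggested next steps.

For query templates, see references/queries.md.
```

&lt;p&gt;Note how the &lt;code&gt;description&lt;/code&gt; tells the agent &lt;em&gt;when&lt;/em&gt; to reach for the skill, while details are pushed into a reference file via progressive disclosure.&lt;/p&gt;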

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffh4nv40q02uo0kvfamv0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffh4nv40q02uo0kvfamv0.png" alt="Agent failing a task due to missing process context" width="500" height="771"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just having process steps documented in an actionable way, where before they were tribal knowledge or buried in onboarding guides, is a huge win.&lt;br&gt;
&lt;em&gt;Note that skills do not replace documentation, though: the intent behind &lt;strong&gt;why&lt;/strong&gt; you do things a certain way is mostly not captured.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In our company, there is still a lot of potential left in encoding our SDLC and product knowledge, as well as equally important non-technical information like company mission, processes, sales, and customer knowledge, in a form that is accessible and usable to agents.&lt;/p&gt;

&lt;h3&gt;Beyond Coding: Prototyping and Idea Validation&lt;/h3&gt;

&lt;p&gt;This one is obvious, especially for vibe coders, but this pattern emerged organically: &lt;strong&gt;Coding agents are excellent for rapid prototyping and exploring ideas.&lt;/strong&gt;&lt;br&gt;
We use coding agents not just to ship features and smash bugs, but to do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;UI/UX prototyping:&lt;/strong&gt; "Build me a quick playground prototype of this dashboard concept" → Full interactive mockup in minutes, awesome for validating design ideas with customers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idea validation:&lt;/strong&gt; "Does this architectural approach even work?" → Agent builds a proof-of-concept or grills your ideas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throwaway exploration:&lt;/strong&gt; Testing libraries, frameworks, or patterns without committing to them. Or even more modern, follow &lt;a href="https://github.com/karpathy/autoresearch" rel="noopener noreferrer"&gt;Karpathy's autoresearch pattern&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One tool that's become a staple for us is the &lt;a href="https://github.com/anthropics/claude-plugins-official/tree/main/plugins/playground" rel="noopener noreferrer"&gt;Claude Code Playground skill&lt;/a&gt;. It's a skill you can install in Claude Code that generates self-contained, single-file HTML playgrounds — complete with visual controls, live preview, and natural-language prompt output. No external dependencies, no build step.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ugbetilavp9clotkqxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ugbetilavp9clotkqxr.png" alt="playground-skill" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The workflow is dead simple: you describe what you want to explore — "Create a playground for button design styles", "Build an interactive color palette explorer" — and the skill generates an interactive HTML page you can open in your browser. Tweak parameters, see results instantly, copy out a prompt or config when you're happy. Need to show a customer something tangible? "Build a clickable prototype of the onboarding flow with variants." In minutes you have something real to point at, not a slide deck.&lt;/p&gt;

&lt;p&gt;That way, the question "is this even worth prototyping?" basically disappears. You can just... do it. In an hour you've tried three approaches, learned which one sucks, and moved on.&lt;/p&gt;

&lt;p&gt;Just make sure &lt;em&gt;you're&lt;/em&gt; the one calling the shots on what that insight means. Use your human brain where it counts: taste, direction, and the decisions that actually matter long-term.&lt;/p&gt;

&lt;h3&gt;Making reviews less painful&lt;/h3&gt;

&lt;p&gt;With code being generated in a fast and iterative way, we suffered, like many others, from the "review bottleneck".&lt;br&gt;
Pumping out more code means you need to review more code. Especially crucial for us is keeping the mental alignment&lt;br&gt;
in the team: it is rarely about individual lines of code but about the overall approach, how it fits into the existing codebase, and which tradeoffs are made.&lt;br&gt;
We did not give in to the temptation to just ship agent code without review, like some do; instead we looked into how to make reviews more efficient and less painful.&lt;/p&gt;

&lt;p&gt;What we've done so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strict CI: Same strict checks for every PR, no exceptions. The first quality gate is the CI, not the reviewer. If it doesn't meet the bar, it doesn't even reach human eyes.&lt;/li&gt;
&lt;li&gt;Agents for local self-review: Before you even submit a PR, ask an agent (in a fresh session, ideally with another capable model) to review the code. This catches many low-hanging issues and improves the quality of the initial submission, so your colleagues can focus on higher-level feedback.&lt;/li&gt;
&lt;li&gt;Code review on CI: We have a CI job that runs an agent to review the PR diff and provide feedback. This is not meant to replace human review but to catch obvious issues and provide a first pass of feedback. This can be your CodeRabbit or GH Copilot integration for example.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-friendly output&lt;/strong&gt;: Leveraging Nico Bailon's &lt;a href="https://github.com/nicobailon/visual-explainer" rel="noopener noreferrer"&gt;visual-explainer skill&lt;/a&gt;, we can flag certain PRs as needing a human-friendly, visual explanation of the changes. The agent then produces an HTML page with a visual diff and natural-language explanations of the changes, which is much easier and more pleasant to review than raw code diffs, especially to get started.&lt;/li&gt;
&lt;/ul&gt;
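&lt;p&gt;As an illustration of the CI review job, here is one way to wire up a first-pass agent review with GitHub Actions and a headless agent run. This is a sketch under assumptions: the workflow layout is ours, &lt;code&gt;claude -p&lt;/code&gt; stands in for whatever agent CLI you use, and you would still need agent credentials configured as secrets:&lt;/p&gt;

```yaml
name: agent-review
on: pull_request
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history so we can diff against the base branch
      - name: First-pass agent review
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > pr.diff
          # Headless agent run; swap in your agent CLI and auth of choice.
          claude -p "Review the diff in pr.diff. Flag bugs, risky changes, and missing tests. Be concise." > review.md
      - name: Publish review comment
        run: gh pr comment ${{ github.event.number }} --body-file review.md
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

&lt;p&gt;The point is the shape, not the specific tool: produce the diff, let an agent write a first pass, and surface it where humans already review.&lt;/p&gt;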

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje3zyzqy7gxfp8fwn4sn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje3zyzqy7gxfp8fwn4sn.png" alt="Sample output of the visual explainer skill for code reviews" width="800" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Outlook: Organizational impacts&lt;/h2&gt;

&lt;p&gt;We hope this was a useful peek into our learnings and practices with agentic engineering. As we continue to integrate these tools into our workflows and adjust them, the elephant in the room is how our organizational structures and processes will need to evolve to fully leverage the potential of agentic tools. Some early thoughts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Merging of job roles: With cheap prototypes and faster iteration, classic roles like "product manager", "designer", "developer" might blur as individuals can quickly prototype and validate ideas across disciplines.&lt;/li&gt;
&lt;li&gt;Quality: With faster iteration, how do you ensure quality and maintainability?&lt;/li&gt;
&lt;li&gt;Bottlenecks: Your team might have the same amount of people but they can do much more. Where are the new bottlenecks? How do you identify and address them?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So stay tuned for future posts where we will dive into these topics!&lt;/p&gt;

&lt;h2&gt;Further reading&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://engineering.simpl.de/post/agentic-engineering-lesson1/" rel="noopener noreferrer"&gt;Vol. 1: Context Engineering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/news/extended-thinking" rel="noopener noreferrer"&gt;Anthropic: Extended Thinking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://danielgriesser.com/posts/manage-the-context-window/" rel="noopener noreferrer"&gt;Daniel Griesser: Manage the Context Window&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" rel="noopener noreferrer"&gt;Drew Breunig: How Contexts Fail&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents" rel="noopener noreferrer"&gt;HumanLayer: Skill Issue - Harness Engineering for Coding Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lucumr.pocoo.org/2026/1/31/pi/" rel="noopener noreferrer"&gt;Pi: The Minimal Agent Within OpenClaw&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>contextengineering</category>
      <category>agentic</category>
      <category>llm</category>
      <category>claude</category>
    </item>
    <item>
      <title>Agentic Engineering: Lessons Learned Vol. 1</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Mon, 29 Sep 2025 06:00:00 +0000</pubDate>
      <link>https://dev.to/duske/agentic-engineering-lessons-learned-vol-1-jbj</link>
      <guid>https://dev.to/duske/agentic-engineering-lessons-learned-vol-1-jbj</guid>
      <description>&lt;p&gt;The buzz around agentic engineering is deafening, and for good reason: it promises to be a massive lever for software development. But as we've integrated these agents into our workflow, we've discovered that harnessing their power isn't as simple as writing a good prompt. True success comes from mastering a deeper, more dynamic skill: context engineering.&lt;/p&gt;

&lt;p&gt;This post is a dispatch from the front lines. We're cutting through the noise to share our lessons on managing an agent's context. We'll cover what worked, what failed, and provide practical strategies you can use today, drawn from our experience with tools like Claude Code.&lt;br&gt;
Note that many of these strategies are not new and have been discussed in various forms by experts (&lt;a href="https://www.dbreunig.com/" rel="noopener noreferrer"&gt;1&lt;/a&gt;, &lt;a href="https://rlancemartin.github.io/" rel="noopener noreferrer"&gt;2&lt;/a&gt;, &lt;a href="https://steipete.me/" rel="noopener noreferrer"&gt;3&lt;/a&gt;, &lt;a href="https://lucumr.pocoo.org/" rel="noopener noreferrer"&gt;4&lt;/a&gt;, &lt;a href="https://mariozechner.at/" rel="noopener noreferrer"&gt;5&lt;/a&gt;) in the field. However, we've found that consolidating these insights into actionable lessons has been invaluable for our team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This space is moving &lt;strong&gt;fast&lt;/strong&gt;. Take these recommendations with a grain of salt and always validate them for your own use case. Our findings are as of September 2025.&lt;/p&gt;
&lt;h2&gt;Context Engineering 101&lt;/h2&gt;


&lt;p&gt;ℹ️ We assume that you are already familiar with the basics of LLMs, prompt engineering, and agentic engineering. If not, check out the resources section at the end of this post.&lt;/p&gt;


&lt;p&gt;When thinking about agentic engineering, context engineering is one of the most important aspects to get right.&lt;/p&gt;
&lt;h3&gt;
  
  
  Mental model
&lt;/h3&gt;

&lt;p&gt;Since LLMs are stateless, the context is the only way to provide them with the necessary information to perform a task.&lt;br&gt;
When working on a codebase, you can think of this model as a function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Instructions&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nx"&gt;Knowledge&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nx"&gt;Tools&lt;/span&gt;
&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So if the context is your entire input to the model, it is crucial to get it right. While in the early days of LLMs prompt engineering was the main focus, context engineering encompasses more than just the prompt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instructions: besides the prompt or even an entire spec, this includes examples (few-shot), constraints, and other memories like agents.md/claude.md&lt;/li&gt;
&lt;li&gt;Knowledge: documentation, facts, and memories&lt;/li&gt;
&lt;li&gt;Tools: regular tool calls like grep, read file, and write file, but also MCP servers, subagents, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that there is no clear boundary between these categories. For example, a spec could be seen as instructions or knowledge, while documentation could be read with an MCP server like context7. The important part is to understand that the context is more than just the prompt and that all of these aspects need to be considered when working with agents.&lt;/p&gt;

&lt;p&gt;So if so many things can be context, how do you get it right? Luckily, smart people have already figured out the&lt;br&gt;
common issues you will face. We can highly recommend &lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" rel="noopener noreferrer"&gt;this article by Drew Breunig&lt;/a&gt; for more details, but let's get right into the main points:&lt;/p&gt;
&lt;h3&gt;
  
  
  Common pitfalls with long contexts
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;as coined by &lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" rel="noopener noreferrer"&gt;Drew Breunig&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context Poisoning&lt;/strong&gt;: When an error or hallucination enters the context and gets repeatedly referenced, corrupting subsequent responses. For example, if your code assistant hallucinates a non-existent API method early in a debugging session, it may keep trying to use that fictional method throughout the conversation, building increasingly nonsensical solutions around it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Distraction&lt;/strong&gt;: As context grows beyond certain thresholds, models start over-relying on their accumulated history rather than their training. A coding assistant might fixate on repeating past debugging attempts from its context instead of synthesizing new approaches, even when those old strategies clearly aren't working.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Confusion&lt;/strong&gt;: Irrelevant information in the context interferes with response quality. Loading a coding assistant with 40+ tool definitions when you only need 3-4 causes the model to make inappropriate tool calls or get distracted by unrelated capabilities, like trying to use a database migration tool when you asked for string manipulation (Tool loadout).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Clash&lt;/strong&gt;: When information gathered incrementally contains contradictions, models struggle to recover. If your coding assistant makes incorrect assumptions about your codebase architecture early on, those wrong assumptions remain in context and influence later responses—even after you provide corrections. The model gets "lost" and cannot recover from early missteps.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Crucial skill:&lt;/strong&gt;&lt;br&gt;
Think back to your first interactions with an LLM: just a few messages back and forth. Agents now take ever more turns to fulfill a task, so the context necessarily grows and accumulates information over time. Keeping this context relevant and focused is a crucial skill to master.&lt;/p&gt;
&lt;h3&gt;
  
  
  Context window management strategies
&lt;/h3&gt;

&lt;p&gt;So how do you manage the context window effectively?&lt;br&gt;
Lance Martin wrote an exceptional article about &lt;a href="https://rlancemartin.github.io/2025/06/23/context_engineering/" rel="noopener noreferrer"&gt;context engineering strategies&lt;/a&gt; and describes four main techniques:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✏️ Writing context means saving it outside the context window to help an agent perform a task.&lt;/li&gt;
&lt;li&gt;🔎 Selecting context means pulling it into the context window to help an agent perform a task.&lt;/li&gt;
&lt;li&gt;🗜️ Compressing context involves retaining only the tokens required to perform a task.&lt;/li&gt;
&lt;li&gt;✂️ Isolating context involves splitting it up to help an agent perform a task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each strategy is categorized by an emoji that we will use throughout this post to highlight which strategy we think is employed.&lt;/p&gt;
&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;p&gt;Equipped with that knowledge, let's dive into the lessons learned from our journey with agentic engineering:&lt;/p&gt;
&lt;h3&gt;
  
  
  🔎 Don't set yourself up for failure
&lt;/h3&gt;

&lt;p&gt;If your initial setup is already bad, you will have a hard time achieving good results with coding agents. This often happens when it is not tuned to your specific environment or tries to do too many things at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provide a &lt;strong&gt;focused&lt;/strong&gt; &lt;code&gt;Claude.md/agents.md&lt;/code&gt; file. It should be kept compact and relevant for all use cases; task-specific instructions should instead go into the prompt/spec, a &lt;code&gt;/&amp;lt;command&amp;gt;&lt;/code&gt;, or a subagent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain&lt;/strong&gt; your &lt;code&gt;Claude.md/agents.md&lt;/code&gt; file. Outdated or conflicting information will poison the context. In our case, we forgot to update how we ran tests after a migration, which left the agent rediscovering how to actually run tests every session, wasting tokens.&lt;/li&gt;
&lt;li&gt;Only keep MCP servers that you &lt;strong&gt;really&lt;/strong&gt; need. Conflicting tool information or overlapping MCP definitions will lead to context confusion and clash. The setup should be &lt;strong&gt;versioned&lt;/strong&gt; and reviewed by the team. Use &lt;code&gt;/context&lt;/code&gt; to identify what is actually used.&lt;/li&gt;
&lt;/ul&gt;
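&lt;p&gt;For illustration, a focused file might be as small as this (the commands and conventions shown here are hypothetical, not our actual setup):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# agents.md — compact, relevant for every task
## Commands
- Run tests: npm test        (keep this current after migrations!)
- Lint:      npm run lint
## Conventions
- TypeScript strict mode; follow the existing module layout
# Task-specific instructions go into the prompt/spec, /commands, or subagents.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;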

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jta4bh8ll24lc0lamx7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jta4bh8ll24lc0lamx7.png" alt=" " width="631" height="395"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  🔎 Codebase is context too (or make your codebase agent friendly)
&lt;/h3&gt;

&lt;p&gt;While you can put a lot of work into optimizing your prompts and context files like documentation, it will only get you so far if you do not make the codebase itself agent friendly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code architecture/patterns:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not only the prompt/spec is context; the codebase itself is context too. Since the model cannot "guess" your codebase, it will scan and crawl through it to find files, analyze coding patterns, and understand the architecture. Thus, it is crucial to have a clean and consistent codebase to get good results.&lt;br&gt;
If the codebase has lots of inconsistencies, bad patterns, or is just plain messy, the agent will struggle to understand it and will even use it as a reference for generating and aligning new code.&lt;br&gt;
Then you have the good old &lt;code&gt;garbage in -&amp;gt; garbage out&lt;/code&gt; problem: since the LLM only predicts the next most likely token, if the code is bad, the output will likely be bad too.&lt;/p&gt;

&lt;p&gt;To overcome this, you need to do the stuff most of us should do anyway: refactor, clean up, and improve the codebase. For example, by keeping a uniform structure, implementing well-known patterns, and keeping complex abstractions at bay. Ironically, by making the codebase agent friendly, your fellow humans will also benefit from it 🤡.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent friendly tooling:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can also make your codebase more agent friendly by using well-known tools and precise scripts that do not stress your context window as much. For example, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;centralize your logging&lt;/strong&gt; into a single place with service prefixes. E.g. route your backend, DB and &lt;a href="https://github.com/mitsuhiko/vite-console-forward-plugin" rel="noopener noreferrer"&gt;frontend logs&lt;/a&gt; to one central log file. Now the Agent can &lt;code&gt;grep/tail&lt;/code&gt; for the service and find all relevant logs in one place every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix noisy tools&lt;/strong&gt; like test runners or linters that might produce a lot of output with verbose data.&lt;/li&gt;
&lt;li&gt;Create &lt;strong&gt;helper scripts&lt;/strong&gt; that do the heavy lifting for the agent. If you have existing scripts, ensure they are easy to use and that their configuration is documented. This could be API generators, build scripts, or even tools that fetch the &lt;a href="https://github.com/automazeio/ccpm/blob/d01e80af9b52582058a76671e8f3b4a2448cc050/.claude/scripts/pm/next.sh#L21" rel="noopener noreferrer"&gt;next task like this example&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
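&lt;p&gt;A minimal sketch of the centralized-logging idea (the function and service names here are purely illustrative): prefix each service's lines and merge them into one stream destined for a single central file:&lt;/p&gt;

```python
def combine_logs(service_lines):
    """Merge per-service log lines into one stream with service
    prefixes, so an agent can grep/tail a single central file.
    service_lines maps a service name to its list of log lines."""
    merged = []
    for service, lines in service_lines.items():
        for line in lines:
            merged.append(f"[{service}] {line}")
    return merged
```

&lt;p&gt;Writing the merged lines to one file then lets the agent search for a single prefix like &lt;code&gt;[backend]&lt;/code&gt; instead of hunting through several locations.&lt;/p&gt;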
&lt;h3&gt;
  
  
  🔎/🗜️ Watch the context window size
&lt;/h3&gt;

&lt;p&gt;As a general rule of thumb, keep the context window usage below 60%. This ensures that the model has enough "space" to reason and generate output. Above that, you often start to see issues like context distraction and confusion.&lt;br&gt;
When starting a task that will likely exceed the context window, plan ahead and clear or summarize the context deliberately. This can be done by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clearing the context with &lt;code&gt;/clear&lt;/code&gt; or starting a new session. Ensure you have a working todo list or &lt;code&gt;todo.md&lt;/code&gt; file to keep track of what still needs to be done.&lt;/li&gt;
&lt;li&gt;summarizing the context with &lt;code&gt;/compact&lt;/code&gt; and providing the summary back to the agent. ⚠️ Be careful with this, as the summary might miss important details; so far, we found this to be less effective than expected.&lt;/li&gt;
&lt;li&gt;using the &lt;code&gt;/context&lt;/code&gt; command, which gives you a good overview of current context window usage and what is taking up the most space.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  🔎 Paradigm shift: Read the spec!
&lt;/h3&gt;

&lt;p&gt;For most non-trivial tasks, it really pays off to spend some time creating and evaluating an implementation plan together with the agent (duh!). This helps to build a common understanding of the task and also keeps the context focused.&lt;br&gt;
While you don't have to go all-in on spec-driven development, simply leveraging Claude's plan mode or generating a markdown file with a spec and task breakdown can often be sufficient for smaller tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8owg8fpdk58y9yu5yqbq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8owg8fpdk58y9yu5yqbq.png" alt=" " width="500" height="680"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But if you choose to go down that path, make sure to actually read the spec &lt;strong&gt;thoroughly&lt;/strong&gt;. You might feel amazed by the agent's well-structured output, but it might not align with your expectations or simply contradict the codebase at hand.&lt;br&gt;
This can lead to context poisoning and distraction, as the agent will keep referencing the spec and trying to align with it.&lt;br&gt;
It may sound trivial, but being hyped often leads to skimming through the spec and missing important details.&lt;/p&gt;

&lt;p&gt;Behind this lies a bigger paradigm shift that is approaching fast: the role of the developer is shifting from spending big chunks of time writing code to spending more time on planning, specifying, and validating. For many use cases, the actual coding part is becoming less important, as the agent can take over a lot of that work.&lt;br&gt;
Still, &lt;strong&gt;feeling productive while writing less code&lt;/strong&gt; and reading through generated specs is a skill that needs to be learned and practiced.&lt;/p&gt;
&lt;h3&gt;
  
  
  ✏️ Outsource context via filesystem
&lt;/h3&gt;

&lt;p&gt;One nice way of outsourcing context is to leverage the filesystem. Since the filesystem is not limited by the context window, you can use it to store large amounts of data that the agent can reference when needed.&lt;br&gt;
By providing tools like bash scripts or subagents (see below) that write to disk but return a summary/crucial info, you keep the context window small and focused while giving the agent the choice to access more information when needed.&lt;/p&gt;
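&lt;p&gt;As a sketch of this pattern (the names are hypothetical): a tool wrapper that persists the full output to disk and returns only a compact digest to the agent:&lt;/p&gt;

```python
import tempfile

def run_and_summarize(tool_name, full_output, max_lines=5):
    """Write a tool's full (potentially huge) output to disk and return
    only a short summary plus the file path, so the main context stays
    small while the agent can still read the details on demand."""
    log = tempfile.NamedTemporaryFile(
        mode="w", prefix=f"{tool_name}-", suffix=".log", delete=False
    )
    log.write(full_output)
    log.close()
    lines = full_output.splitlines()
    return {
        "summary": "\n".join(lines[:max_lines]),  # first few lines only
        "total_lines": len(lines),
        "full_log": log.name,  # the agent can cat/grep this when needed
    }
```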
&lt;h3&gt;
  
  
  ✏️ Directives: Hansel and Gretel
&lt;/h3&gt;

&lt;p&gt;Instead of writing elaborate prompts and precisely embedding examples or crucial files for the task, you can also use comment directives as breadcrumbs 🍞 to guide the agent.&lt;br&gt;
Giuseppe Gurgone wrote a great article about &lt;a href="https://giuseppegurgone.com/comment-directives-claude-code" rel="noopener noreferrer"&gt;comment directives&lt;/a&gt; and how to use them effectively.&lt;br&gt;
Basically, you scatter certain types of comments across your codebase with some extra information on top. Then, when firing up the agent,&lt;br&gt;
you instruct it to look for those comments and use them as context. This way, you can provide a lot of relevant information very locally without bloating the context window upfront.&lt;/p&gt;

&lt;p&gt;For example, when you want to finish a PR and know that you want to address certain aspects in a follow-up PR, you can add a comment directive like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* @Implement
 * Replace with generated api-client
 * timeouts should be configurable
 * no retries
 */
async fetchUserData(userId: string): Promise&amp;lt;UserData&amp;gt; {
  // ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ✂️/🗜️ Subagents: Researchers, not implementers
&lt;/h3&gt;

&lt;p&gt;Subagents are a great way of &lt;strong&gt;isolating context&lt;/strong&gt;, parallelizing tasks, and specializing on certain aspects. But they are not a silver bullet, and anthropomorphizing them into human roles did not work well for us.&lt;br&gt;
The main lesson learned here is that subagents currently work best as researchers, not implementers. They can be used nicely to gather information, explore possibilities, and provide insights that the main agent can then use to make decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faddilll0xa32fuvzvcet.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faddilll0xa32fuvzvcet.gif" alt="subagent" width="400" height="298"&gt;&lt;/a&gt;&lt;/p&gt;
Unsupervised Subagents "fixing" code in parallel



&lt;p&gt;If you use subagents as implementers, you often run into issues like context clash and confusion, as the subagent might take actions that are not &lt;em&gt;aligned with the main agent's goal&lt;/em&gt;. Also, spawning subagents in parallel could lead to conflicting writes that the main agent then needs to resolve.&lt;br&gt;
Instead, by using them for aggregating and collecting data, you can leverage their strengths without running into those issues. Plus, read-only tasks can usually be parallelized without any headaches.&lt;br&gt;
We used subagents for tasks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;finding/traversing codebase for certain patterns&lt;/li&gt;
&lt;li&gt;analyzing logs/code for certain errors&lt;/li&gt;
&lt;li&gt;checking deployment status/issues of platforms like Kubernetes by providing read-only access to &lt;code&gt;kubectl&lt;/code&gt; or cloud CLIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Test runner and build scripts&lt;/strong&gt;&lt;br&gt;
Okay, we lied. There is one special use case for subagents as implementers that we found useful: running tests.&lt;br&gt;
Our tests were quite verbose (see making the codebase agent friendly 🫠) and often produced a lot of output (roughly 40k tokens!).&lt;br&gt;
By using a subagent to run the tests, we could isolate the context and provide only the relevant, summarized output back to the main agent. This way, the main agent could focus on the task at hand without being distracted by the noise of the test output.&lt;br&gt;
The same goes for build scripts like &lt;code&gt;npm build&lt;/code&gt; or &lt;code&gt;docker build&lt;/code&gt;: their output is mostly irrelevant to the main task, yet we sometimes used them to verify that the code still builds, ymmv.&lt;/p&gt;

&lt;p&gt;For example, this could be baked into your subagent definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Execute tests efficiently across all services in the monorepo
2. Interpret test results and provide actionable feedback
3. Identify the most appropriate test commands based on user context
4. Provide clear summaries of test outcomes with specific error details when failures occur
5. You do not modify source code or fix bugs; your role is strictly to run tests and report results. You can suggest next steps but do not implement them.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
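&lt;p&gt;Stripped of the subagent machinery, the underlying trick can be sketched as a small wrapper (our own illustrative code, not our actual setup) that swallows the verbose output and reports only the verdict plus the tail, where test runners usually print their failure summary:&lt;/p&gt;

```python
import subprocess

def run_tests_compact(cmd, tail_lines=20):
    """Run a possibly very verbose test command, keep the full output
    out of the conversation, and report only pass/fail plus the last
    few lines, which usually contain the failure summary."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    lines = (proc.stdout + proc.stderr).splitlines()
    return {"passed": proc.returncode == 0, "tail": lines[-tail_lines:]}
```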






&lt;h2&gt;
  
  
  Outlook: Spec driven development 📝
&lt;/h2&gt;

&lt;p&gt;While the lessons learned so far are quite generic and can be applied to most agentic engineering setups, there is one aspect worth mentioning in more detail: spec driven development.&lt;br&gt;
With tools like &lt;a href="https://kiro.dev/" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt; or &lt;a href="https://github.com/github/spec-kit/tree/main" rel="noopener noreferrer"&gt;Spec Kit&lt;/a&gt; gaining traction quickly, spec driven development is becoming more and more popular.&lt;br&gt;
This is an entire topic in itself, so we will cover it in more detail in the next volume of this series - so stay tuned.&lt;/p&gt;

&lt;p&gt;Context engineering will still apply, as this is just a more formalized, thorough and elaborate way of providing the right context well aligned with your goals.&lt;br&gt;
As with anything in the agentic engineering space, it is still early days and we are just scratching the surface of what is possible. Take it with a grain of salt and enjoy the ride!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" rel="noopener noreferrer"&gt;https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html" rel="noopener noreferrer"&gt;https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rlancemartin.github.io/2025/06/23/context_engineering/" rel="noopener noreferrer"&gt;https://rlancemartin.github.io/2025/06/23/context_engineering/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://steipete.me/posts/2025/essential-reading" rel="noopener noreferrer"&gt;https://steipete.me/posts/2025/essential-reading&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/" rel="noopener noreferrer"&gt;https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://giuseppegurgone.com/comment-directives-claude-code" rel="noopener noreferrer"&gt;https://giuseppegurgone.com/comment-directives-claude-code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cognition.ai/blog/dont-build-multi-agents" rel="noopener noreferrer"&gt;https://cognition.ai/blog/dont-build-multi-agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-03-own-your-context-window.md" rel="noopener noreferrer"&gt;https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-03-own-your-context-window.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nathanonn.com/how-a-read-only-sub-agent-saved-my-context-window-and-fixed-my-wordpress-theme/" rel="noopener noreferrer"&gt;https://www.nathanonn.com/how-a-read-only-sub-agent-saved-my-context-window-and-fixed-my-wordpress-theme/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>agentic</category>
      <category>contextengineering</category>
    </item>
    <item>
      <title>The RAG Autonomy Spectrum: A Guide to Designing Smarter AI Systems</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Wed, 11 Jun 2025 07:09:44 +0000</pubDate>
      <link>https://dev.to/duske/the-rag-autonomy-spectrum-a-guide-to-designing-smarter-ai-systems-5eg2</link>
      <guid>https://dev.to/duske/the-rag-autonomy-spectrum-a-guide-to-designing-smarter-ai-systems-5eg2</guid>
      <description>&lt;p&gt;When building a LLM-powered application, having a good overview of possible cognitive architectures patterns can be a key factor in designing effective systems.&lt;br&gt;
Too quickly you can get caught up in the details or latest AI hype, and lose sight of the bigger picture.&lt;br&gt;
Which parts shall be LLM-powered? What parts should be fixed to ensure reproducibility and reliability?&lt;br&gt;
So today we will explore some of the most common cognitive architectures patterns and how they can be applied. As the application at hand can vary tremendously in terms of size, complexity and requirements, we will focus on implementing simple &lt;strong&gt;RAG&lt;/strong&gt; (Retrieval Augmented Generation) systems as a use case to illustrate the concepts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozoqx9634ovgqt1k0eck.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozoqx9634ovgqt1k0eck.jpg" alt="llm meme" width="500" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Building Blocks of AI: Exploring Cognitive Architectural Patterns
&lt;/h2&gt;

&lt;p&gt;So what do we mean by cognitive architecture patterns? We borrow this term from a thought-provoking &lt;a href="https://blog.langchain.dev/what-is-a-cognitive-architecture/" rel="noopener noreferrer"&gt;post by Harrison Chase (LangChain)&lt;/a&gt;, in which he classifies AI architectures by their level of autonomy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4tyo5sj504kzt8rsa23.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4tyo5sj504kzt8rsa23.png" alt="autonomy levels" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 1: Cognitive Architectural Patterns by Harrison Chase &lt;a href="https://blog.langchain.dev/what-is-a-cognitive-architecture/" rel="noopener noreferrer"&gt; [Source] &lt;/a&gt;



&lt;p&gt;Let's go through them quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 1: Code&lt;/strong&gt;&lt;br&gt;
Every step and call is hard-coded. This is classic code without any LLM involvement.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 2: LLM Call&lt;/strong&gt;&lt;br&gt;
The first level that includes an LLM call - for example, translating a selected text. The developer still defines when this single step is invoked, e.g. receiving and sanitizing the text (code), translating it using a model (LLM), and post-processing and returning the response (code).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;3. Level: Chain&lt;/strong&gt;&lt;br&gt;
Instead of using only a single LLM-powered step, you leverage multiple LLM-calls in a defined order to make your application more powerful. For example, you could invoke a model a second time to summarize the content, so that your user gets a brief news feed in the target language.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 4: Router&lt;/strong&gt;&lt;br&gt;
Now we're leaving the realm of applications whose steps are defined a priori by the developer. Previously, we called an LLM within a step to produce a result; here it acts as a router that decides which step to invoke next based on the input and context. This increased flexibility allows for more dynamic and adaptive applications, but also more unpredictable results. Note that we do not introduce cycles here, so it still represents a directed acyclic graph (DAG). Imagine a web crawler that scans company websites and extracts relevant information, then uses a router to grade each company and decide whether or not to add it to a list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 5: State Machine&lt;/strong&gt;&lt;br&gt;
We are now entering the field of agents by adding cycles to the DAG and turning it into a state machine. This allows for even more adaptive applications, enabling the AI to refine its actions and repeat steps until a certain outcome is achieved (please set an iteration/recursion limit 👀). For instance, an agentic web crawler could simply be given an instruction about which kinds of companies are relevant to the user's interests. The crawler would then iterate through the websites, extracting relevant information, grading it, and deciding whether to add each company to the list. When the match quality is below a certain threshold, the crawler could refine the given instruction and try again until it meets the desired outcome. Despite all that variability, the developer still controls which steps can be taken at any time, thus keeping the rough game plan in their hands.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 6: Autonomous Agents&lt;/strong&gt;&lt;br&gt;
The agent is now also in control of which tools/steps are available. The system is just given an initial instruction and a set of tools. It can then decide which steps to take or tools to call. It could also refine prompts or explore and add new tools to its arsenal. While this is the most powerful level, it is also the most unpredictable and requires careful monitoring and control.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
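&lt;p&gt;The router level (level 4) can be sketched in a few lines (&lt;code&gt;llm_pick&lt;/code&gt; and the handler names are hypothetical): the LLM chooses the branch, but the developer still defines every available branch:&lt;/p&gt;

```python
def run_router(item, llm_pick, handlers):
    """Level 4 sketch: a single LLM decision picks the next step from a
    fixed, developer-defined set of handlers (still a DAG, no cycles).
    llm_pick is an assumed function returning one of the handler names."""
    choice = llm_pick(item, options=sorted(handlers))
    # Fall back to a safe default if the model returns an unknown name.
    handler = handlers.get(choice, handlers["discard"])
    return handler(item)
```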

&lt;p&gt;ℹ️ Please note that there are more ways of classifying LLM-powered agents by their autonomy levels. The &lt;a href="https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents" rel="noopener noreferrer"&gt;smolagents&lt;/a&gt; library starts with level 2/3 as base level and &lt;strong&gt;is more granular in the agentic realm&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG: Grounding AI in Reality
&lt;/h2&gt;

&lt;p&gt;Now that we have established possible levels of autonomy, let's see how we can apply them to one of the most common use cases: Retrieval Augmented Generation (RAG).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A quick primer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditionally, large language models (LLMs) have been limited by their reliance on outdated knowledge, susceptibility to hallucination, and inability to access private or real-time data. These limitations have hindered their ability to provide accurate and context-rich responses. To address these challenges, RAG was developed. By using some kind of knowledge base and binding a retrieval mechanism to the LLM, we can achieve factual grounding, specialize on specific domains, provide recent information as well as citations/sources and control what data can be accessed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this still relevant?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Do we even need RAG today? Aren't LLMs' context windows ever increasing, and aren't models getting better at understanding context?&lt;br&gt;
While this development is real, there are still striking points and use cases that make RAG a suitable choice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;if your data is mostly static and you need a needle-in-haystack search&lt;/li&gt;
&lt;li&gt;accuracy: LLM can struggle with large contexts, especially when &lt;a href="https://arxiv.org/pdf/2307.03172.pdf" rel="noopener noreferrer"&gt;the data is "lost" in the middle&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;cost: more tokens mean higher latency and higher cost per call&lt;/li&gt;
&lt;li&gt;volume: deal with thousands of documents&lt;/li&gt;
&lt;li&gt;no comprehensive understanding of a full document is required (e.g. code, summarization, analysis)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A key thing to remember is that the retrieval part in RAG does not have to mean vector embedding search. You can (and often should) retrieve data in various ways,&lt;br&gt;
for example with a keyword-based search or a hybrid approach.&lt;br&gt;
For the sake of brevity, we skip a deep dive into RAG techniques here, as this is such a broad topic that we might cover it in a future post.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG autonomy evolution
&lt;/h2&gt;

&lt;p&gt;Now, with our cognitive architecture patterns in hand, we can nicely dissect common RAG techniques and rank them by their autonomy level.&lt;br&gt;
This should give you a practical understanding of how such levels can be applied in the real world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Legend:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;span&gt; &lt;/span&gt; LLM call: blue&lt;/li&gt;
&lt;li&gt;
&lt;span&gt; &lt;/span&gt; Router decision: red&lt;/li&gt;
&lt;li&gt;
&lt;span&gt; &lt;/span&gt; Query: yellow&lt;/li&gt;
&lt;li&gt;
&lt;span&gt; &lt;/span&gt; Response: green&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Level 1: Classic Search
&lt;/h3&gt;

&lt;p&gt;While this is not RAG, it serves as the common ground, showing how traditional and simple retrieval systems can be designed.&lt;br&gt;
The user sends a query, the system looks for relevant documents in a knowledge base and returns them as the response. This is the pure "retrieval" step, with no LLM involved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotf815hsspzp2e49h5lc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotf815hsspzp2e49h5lc.png" alt="Classic Search" width="446" height="510"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 2: Classic Search



&lt;h3&gt;
  
  
  Level 2: Classic RAG
&lt;/h3&gt;

&lt;p&gt;This is the classic RAG pattern, where the system retrieves relevant documents, augments the context with them, and then generates a response using an LLM.&lt;br&gt;
As we are on level 2, we only incorporate a single LLM call (blue box), in this case to generate the output. All the other steps are known ahead of time, making it a linear process that is easy to grasp. In many cases, the knowledge base is a vector database, but it can also be a keyword-based search or any other retrieval mechanism.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tlgl5840tsl991i0jh8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tlgl5840tsl991i0jh8.png" alt="Classic RAG" width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 3: Classic RAG
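
&lt;p&gt;The linear retrieve-augment-generate flow can be sketched in a few lines; &lt;code&gt;retrieve&lt;/code&gt; and &lt;code&gt;llm_complete&lt;/code&gt; are hypothetical stand-ins for your knowledge base and model client:&lt;/p&gt;

```python
def classic_rag(query, retrieve, llm_complete, top_k=4):
    """Level 2: a single LLM call; every step is known ahead of time."""
    # 1. Retrieve: fetch candidate documents for the query
    documents = retrieve(query, top_k=top_k)
    # 2. Augment: place the documents into the prompt context
    context = "\n\n".join(documents)
    prompt = ("Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    # 3. Generate: one LLM call produces the response
    return llm_complete(prompt)

# Usage with toy stand-ins:
answer = classic_rag(
    "What is RAG?",
    retrieve=lambda q, top_k: ["RAG = retrieval-augmented generation."],
    llm_complete=lambda prompt: f"(model answer based on {len(prompt)} prompt chars)",
)
```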



&lt;p&gt;Querying multiple knowledge bases in parallel is also still considered a level 2 RAG technique, as we do not introduce any additional LLM calls to improve the retrieval process. The LLM is only used to generate the final response based on the retrieved documents. This is shown in the figure below, where we retrieve documents from two different knowledge bases and then generate a response based on the combined context (e.g. by merging the result lists with reciprocal rank fusion (RRF)).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy0zi39v8xwonx9nmvy7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy0zi39v8xwonx9nmvy7.png" alt="RAG with multi-query" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 4: RAG with multi-query and RRF
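
&lt;p&gt;Reciprocal rank fusion itself is only a few lines: each document earns 1 / (k + rank) per result list it appears in, and the summed scores decide the merged order (k = 60 is the constant commonly used in practice):&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document scores 1 / (k + rank), summed
    over every list it appears in (rank is 1-based)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Results from two knowledge bases queried in parallel:
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # doc_b first: it appears near the top of both lists
```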



&lt;h3&gt;
  
  
  Level 3: Chained RAG
&lt;/h3&gt;

&lt;p&gt;Here we introduce multiple LLM calls (blue boxes) to improve the system's capabilities. Many RAG implementations work this way, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2305.14283" rel="noopener noreferrer"&gt;Rewrite-Retrieve-Read (RRR)&lt;/a&gt;: The initial query is rewritten to improve its quality to hopefully retrieve relevant documents (Figure 5).
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwiq53v7ef1wp0q8ioo3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwiq53v7ef1wp0q8ioo3.png" alt="Rewrite-Retrieve-Read RAG" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 5: Rewrite-Retrieve-Read RAG
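
&lt;p&gt;A sketch of the Rewrite-Retrieve-Read chain, with &lt;code&gt;llm_complete&lt;/code&gt; and &lt;code&gt;retrieve&lt;/code&gt; as hypothetical stand-ins - note the two LLM calls:&lt;/p&gt;

```python
def rewrite_retrieve_read(query, llm_complete, retrieve):
    """Level 3 chain with two LLM calls: rewrite, then retrieve, then read."""
    # LLM call 1: turn the raw user query into a better search query
    rewritten = llm_complete(f"Rewrite this as a concise search query: {query}")
    documents = retrieve(rewritten)
    # LLM call 2: read the documents and answer the original question
    context = "\n".join(documents)
    return llm_complete(f"Context:\n{context}\n\nQuestion: {query}")

# Usage with scripted stand-ins:
answer = rewrite_retrieve_read(
    "uh, how do llms stay grounded again?",
    llm_complete=lambda prompt: ("llm grounding techniques"
                                 if prompt.startswith("Rewrite") else "final answer"),
    retrieve=lambda q: [f"doc about {q}"],
)
```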





&lt;ul&gt;
&lt;li&gt;Rerank RAG: After retrieving documents, we can rerank them based on their relevance to the query. This can be done by using a second LLM call to score the documents or by using a separate ranking model (Figure 6).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzg0dyabb9scxdepboyzx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzg0dyabb9scxdepboyzx.png" alt="Rerank RAG" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 6: Rerank RAG
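
&lt;p&gt;The rerank step boils down to sorting by a relevance score; &lt;code&gt;score_fn&lt;/code&gt; is a stand-in for a second LLM call or a dedicated ranking model (the word-overlap score below is purely illustrative):&lt;/p&gt;

```python
def rerank(query, documents, score_fn, top_k=3):
    """Order retrieved documents by relevance; score_fn stands in for a
    second LLM call or a dedicated ranking model."""
    ranked = sorted(documents, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:top_k]

# Toy score: number of words the query and the document share
def overlap_score(query, doc):
    return len(set(query.lower().split()).intersection(doc.lower().split()))

docs = ["blue whales are mammals", "rust borrow checker", "whales sing songs"]
print(rerank("do whales sing", docs, overlap_score, top_k=2))
# → ['whales sing songs', 'blue whales are mammals']
```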



&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://aclanthology.org/2023.acl-long.99/" rel="noopener noreferrer"&gt;Hypothetical Document Embeddings&lt;/a&gt; (HyDE) : This technique generates hypothetical document embeddings based on the query and then retrieves documents that are similar to these embeddings. This can be used to improve the retrieval quality by generating embeddings that are more relevant to the query.&lt;/li&gt;
&lt;/ul&gt;
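
&lt;p&gt;HyDE as a sketch - the key twist is that the embedding fed into the vector search comes from a generated document, not from the raw query (&lt;code&gt;llm_complete&lt;/code&gt;, &lt;code&gt;embed&lt;/code&gt; and &lt;code&gt;vector_search&lt;/code&gt; are hypothetical stand-ins):&lt;/p&gt;

```python
def hyde_retrieve(query, llm_complete, embed, vector_search):
    """HyDE: generate a hypothetical answer document, embed that,
    and search with its embedding instead of the raw query's."""
    hypothetical_doc = llm_complete(f"Write a short passage answering: {query}")
    vector = embed(hypothetical_doc)
    return vector_search(vector)

# Toy stand-ins:
result = hyde_retrieve(
    "What is HyDE?",
    llm_complete=lambda prompt: "A hypothetical passage about HyDE.",
    embed=lambda text: [float(len(text))],
    vector_search=lambda vector: ["real retrieved doc"],
)
```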

&lt;p&gt;Of course, nobody stops you from combining the techniques above: Rewrite the query, retrieve documents from multiple knowledge bases, and then rerank them before generating the final response. From an architectural perspective, this is still a linear process, as you know every step and when it will be run a priori.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 4: RAG with Routers
&lt;/h3&gt;

&lt;p&gt;On level 4, LLMs take over parts of the control flow and decide which step to take next based on the input and context. This allows for more dynamic and adaptive RAG systems, where the LLM can choose to take additional steps to improve retrieval results or decide whether to rerank documents.&lt;/p&gt;

&lt;p&gt;In the example below (Figure 7), the &lt;a href="https://arxiv.org/pdf/2401.15884" rel="noopener noreferrer"&gt;corrective RAG (CRAG) pattern&lt;/a&gt; is implemented. After retrieving documents, the LLM grades the documents with a score. If the documents fall below a certain threshold, a corrective step is taken by invoking a web search to find more relevant documents. This is the first time we see an LLM-powered router in action: it decides whether to take the corrective step based on the retrieved documents' quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felb4zuv25nv4fyjhl6it.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felb4zuv25nv4fyjhl6it.png" alt="Corrective RAG" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 7: Corrective RAG
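
&lt;p&gt;The corrective router can be sketched as a simple threshold check over LLM-assigned grades (all callables are hypothetical stand-ins; a real CRAG implementation also refines the retrieved documents):&lt;/p&gt;

```python
def corrective_rag(query, retrieve, grade, web_search, llm_complete, threshold=0.7):
    """CRAG-style router: grade the retrieved documents and fall back
    to a web search when even the best grade is below the threshold."""
    documents = retrieve(query)
    grades = [grade(query, doc) for doc in documents]
    if not grades or threshold > max(grades):
        # Corrective branch: the router judged the documents too weak
        documents = web_search(query)
    context = "\n".join(documents)
    return llm_complete(f"Context:\n{context}\n\nQuestion: {query}")

# Stand-ins that force the corrective branch:
answer = corrective_rag(
    "who won the 2030 world cup?",
    retrieve=lambda q: ["stale document"],
    grade=lambda q, doc: 0.1,            # grader finds the docs irrelevant
    web_search=lambda q: ["fresh web result"],
    llm_complete=lambda prompt: prompt,  # echo, so we can inspect the context
)
```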



&lt;p&gt;Note that we do not introduce cycles here, so the flow still forms a directed acyclic graph (DAG). You still know all the possible steps and when they could be invoked, but the LLM decides whether to take them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 5: RAG with State Machines
&lt;/h3&gt;

&lt;p&gt;By adding cycles, these agentic RAG techniques can perform reflexive actions by observing and evaluating the results of previous steps and then deciding whether to take corrective actions. The system can then restart (parts of) the process until a certain outcome is achieved. A rather complex example is &lt;a href="https://arxiv.org/abs/2310.11511" rel="noopener noreferrer"&gt;Self-RAG&lt;/a&gt; (Figure 8), which leverages three grading steps (routers) to check for relevant documents, a grounded response and the usefulness w.r.t. the question.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9cdxhfhxoeoojngy6bd6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9cdxhfhxoeoojngy6bd6.png" alt="Self-RAG" width="800" height="242"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 8: Self-RAG
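
&lt;p&gt;A loose sketch of the cyclic control flow - three graders act as routers, and a failed check restarts part of the process (this simplifies the actual Self-RAG paper considerably; all callables are stand-ins):&lt;/p&gt;

```python
def self_rag(query, retrieve, llm_complete, grade_docs, grade_grounded,
             grade_useful, rewrite, max_rounds=3):
    """Cyclic sketch loosely modeled on Self-RAG: three graders act as
    routers, and a failed check restarts part of the process."""
    for _ in range(max_rounds):
        # Router 1: keep only documents graded as relevant
        documents = [d for d in retrieve(query) if grade_docs(query, d)]
        if not documents:
            query = rewrite(query)   # nothing relevant: transform the query, retry
            continue
        answer = llm_complete(query, documents)
        # Router 2: is the answer grounded in the documents?
        if not grade_grounded(answer, documents):
            continue                 # hallucination suspected: regenerate
        # Router 3: is the answer actually useful for the question?
        if grade_useful(query, answer):
            return answer
        query = rewrite(query)
    return None  # give up after max_rounds

# Stand-ins that pass every check on the first round:
answer = self_rag(
    "what is self-rag?",
    retrieve=lambda q: ["a doc about self-rag"],
    llm_complete=lambda q, docs: "grounded answer",
    grade_docs=lambda q, d: True,
    grade_grounded=lambda a, d: True,
    grade_useful=lambda q, a: True,
    rewrite=lambda q: q,
)
```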



&lt;p&gt;Taking a look at this architecture, we can see how many parts of the process are controlled by the LLM. This allows for a more adaptive system, but the complexity also increases. &lt;br&gt;
Using structured responses and having proper tracing in place are crucial to reason about the system's behavior and to debug it when things go wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 6: Autonomous RAG Agents
&lt;/h3&gt;

&lt;p&gt;Looking at the previous example, one might think that a RAG technique on level 6 must be so complex that it cannot fit on the screen. But in fact, the base is quite simple: The LLM is given an instruction and a set of tools (for example retrieval techniques) and can then decide which steps to take.&lt;br&gt;
This means that we do not know ahead of time which steps will be taken, how many times they will be invoked, and in which order. &lt;br&gt;
To fully reach autonomy level 6, the LLM should also be able to refine its instruction and add new tools to its arsenal. One super interesting approach for this is &lt;a href="https://arxiv.org/abs/2402.01030" rel="noopener noreferrer"&gt;CodeAct&lt;/a&gt;, which allows LLMs to write and execute code on the fly. Applied to our use case, it could write a new retrieval technique based on the user's needs and then use it to retrieve relevant documents 🤯.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzyct01pwq7uvghfmwb1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzyct01pwq7uvghfmwb1.png" alt="autonomous rag" width="632" height="571"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 9: Autonomous RAG
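
&lt;p&gt;Stripped to its base, such an agent is a loop: the LLM inspects the history, picks a tool (or finishes), and the observation is appended for the next turn. A minimal sketch with hypothetical stand-ins:&lt;/p&gt;

```python
def rag_agent(task, llm_decide, tools, max_steps=10):
    """Agentic loop: the LLM inspects the history and picks the next
    tool call; neither the order nor the number of calls is fixed."""
    history = [("task", task)]
    for _ in range(max_steps):
        action = llm_decide(history, list(tools))
        if action["tool"] == "finish":
            return action["args"]
        # Execute the chosen tool and feed the observation back
        result = tools[action["tool"]](**action["args"])
        history.append((action["tool"], result))
    return None  # step budget exhausted

# A scripted "LLM" that searches once and then finishes:
def decide(history, tool_names):
    if history[-1][0] == "task":
        return {"tool": "search", "args": {"query": history[-1][1]}}
    return {"tool": "finish", "args": history[-1][1]}

answer = rag_agent("find the release notes", decide,
                   {"search": lambda query: "found: release notes v2"})
```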



&lt;h2&gt;
  
  
  The right tool for the job 🔨
&lt;/h2&gt;

&lt;p&gt;Does this mean that we should always strive for the highest autonomy level? Not necessarily. While higher autonomy levels can lead to more adaptive and powerful systems, they also come with increased complexity, unpredictability, and potential for failure. &lt;br&gt;
Especially when dealing with large amounts of rather static data, a simpler RAG technique might be more suitable. In general, &lt;a href="https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents#-when-to-use-agents---when-to-avoid-them" rel="noopener noreferrer"&gt;it is advised&lt;/a&gt; to prefer deterministic, less autonomous approaches the more you know about the workflow in advance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple Agents?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On the other hand, people building coding agents &lt;a href="https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-for" rel="noopener noreferrer"&gt;report&lt;/a&gt; that an agent equipped with simple retrieval tools can outperform more complex systems that rely on advanced vector embeddings and indices. &lt;a href="https://x.com/jobergum/status/1928355375847248108" rel="noopener noreferrer"&gt;It has also been shown&lt;/a&gt; that for deep research contexts, a simple combination of a keyword search like BM25 and an agent can achieve results on par with complex RAG systems, while having lower inference and storage costs and less complexity. This breaks with the common belief that a large volume of data requires complex vector embeddings for an agentic use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the evolving landscape of AI, cognitive architecture patterns provide a structured approach to design and compare LLM-powered systems. From simple code to complex autonomous agents, each level of autonomy offers its own advantages and challenges. &lt;br&gt;
While more autonomy brings more complexity, it also opens doors to adaptive and powerful systems that can reason, plan, and execute tasks in ways that were previously unfeasible. As with nearly any topic in software architecture, there is no one-size-fits-all solution. &lt;em&gt;Start with the simplest architecture that meets your needs, scaling autonomy only when tasks require dynamic decision-making.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An interesting trend is the rise of Agentic RAG, which combines the power of retrieval with the flexibility of agents. Especially when taking into account &lt;a href="https://www.latent.space/p/why-mcp-won" rel="noopener noreferrer"&gt;the rise of&lt;/a&gt; &lt;a href="https://duske.me/posts/mcp/" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt;, new data sources and tools can be added on the fly, allowing agentic systems to adapt to new requirements without complex redesign or reconfiguration. What we are particularly excited about is the potential of simple tools like keyword search in agentic systems, proving that simple tools, wielded wisely, can be remarkably powerful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/smolagents/examples/rag" rel="noopener noreferrer"&gt;smolagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.dev/what-is-a-cognitive-architecture" rel="noopener noreferrer"&gt;https://blog.langchain.dev/what-is-a-cognitive-architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents" rel="noopener noreferrer"&gt;https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pashpashpash.substack.com/p/understanding-long-documents-with" rel="noopener noreferrer"&gt;https://pashpashpash.substack.com/p/understanding-long-documents-with&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-fo" rel="noopener noreferrer"&gt;https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-fo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/jobergum/status/1928355375847248108" rel="noopener noreferrer"&gt;https://x.com/jobergum/status/1928355375847248108&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://langchain-ai.github.io/langgraphjs/tutorials/rag/langgraph_self_rag/" rel="noopener noreferrer"&gt;https://langchain-ai.github.io/langgraphjs/tutorials/rag/langgraph_self_rag/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://langchain-ai.github.io/langgraphjs/tutorials/rag/langgraph_crag/" rel="noopener noreferrer"&gt;https://langchain-ai.github.io/langgraphjs/tutorials/rag/langgraph_crag/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/smolagents/v1.17.0/en/conceptual_guides/intro_agents#code-agents" rel="noopener noreferrer"&gt;https://huggingface.co/docs/smolagents/v1.17.0/en/conceptual_guides/intro_agents#code-agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Papers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2402.01030" rel="noopener noreferrer"&gt;Executable Code Actions Elicit Better LLM Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2310.11511" rel="noopener noreferrer"&gt;Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2401.15884.pdf" rel="noopener noreferrer"&gt;Corrective Retrieval Augmented Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aclanthology.org/2023.acl-long.99/" rel="noopener noreferrer"&gt;Precise Zero-Shot Dense Retrieval without Relevance Labels&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2305.14283" rel="noopener noreferrer"&gt;Query Rewriting for Retrieval-Augmented Large Language Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Originally posted at &lt;a href="https://engineering.simpl.de/post/rag_autonomy/" rel="noopener noreferrer"&gt;SIMPL engineering blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>architecture</category>
      <category>rag</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Model Context Protocol (MCP) - Should I stay or should I go? 🎶</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Mon, 10 Mar 2025 22:12:17 +0000</pubDate>
      <link>https://dev.to/duske/model-context-protocol-mcp-should-i-stay-or-should-i-go-3c9n</link>
      <guid>https://dev.to/duske/model-context-protocol-mcp-should-i-stay-or-should-i-go-3c9n</guid>
      <description>&lt;p&gt;In this article, we'll explore the Model Context Protocol (MCP) briefly and help you decide whether it deserves your attention or can be safely ignored for now.&lt;br&gt;
The AI landscape has been buzzing with excitement around Large Language Models (LLMs), and MCP has emerged as one of the key protocols in this rapidly evolving ecosystem.&lt;/p&gt;

&lt;p&gt;As with any hype, it is important to take a step back and understand the basics before onboarding the train - choo choo 🚂!&lt;br&gt;
So here is my shot at explaining the MCP use cases and their benefits.&lt;/p&gt;


&lt;h2&gt;
  
  
  Getting external data into the LLM
&lt;/h2&gt;

&lt;p&gt;Traditionally, LLMs are trained on a vast amount of data at a certain point in time. That means, there is going to be a cut-off point of data that is available to the LLM - anything newer than that is not available.&lt;br&gt;
However, in many cases, you want to use LLMs to process data that is stored outside of the training data.&lt;br&gt;
This can be for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;customer support chats&lt;/li&gt;
&lt;li&gt;product descriptions&lt;/li&gt;
&lt;li&gt;the latest memes on the internet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Smart people came up with a number of ways to do that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-tuning the model on the data. This is a very time-consuming process and does not scale well.&lt;/li&gt;
&lt;li&gt;Using a so-called "Knowledge Base" to store the data and RAG (Retrieval-Augmented Generation) to answer questions. A good fit for knowledge retrieval tasks.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://platform.openai.com/docs/guides/function-calling" rel="noopener noreferrer"&gt;Function calling&lt;/a&gt;: Provide functions (e.g. custom code) with semantic meaning to the LLM. Then the LLM  can decide, to let the function run or not. For example, the prompt could be: "Please check if the user is eligible for a discount" and the function could be a &lt;code&gt;check_discount_eligibility&lt;/code&gt; function.&lt;/li&gt;
&lt;/ul&gt;
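
&lt;p&gt;For the function-calling option, the flow is: advertise a JSON schema for the function, let the model return a structured call request, and have the application execute it. A minimal sketch in the style of OpenAI-like function calling (the schema layout and the discount function are illustrative, not a specific SDK's API):&lt;/p&gt;

```python
import json

# Tool schema advertised to the model (names and fields are illustrative):
tools = [{
    "type": "function",
    "function": {
        "name": "check_discount_eligibility",
        "description": "Check if a user is eligible for a discount",
        "parameters": {
            "type": "object",
            "properties": {"user_id": {"type": "string"}},
            "required": ["user_id"],
        },
    },
}]

def check_discount_eligibility(user_id):
    # Toy business logic standing in for a real lookup
    return {"user_id": user_id, "eligible": user_id.startswith("vip")}

def dispatch(tool_call):
    """The model only requests a call; the application executes it."""
    registry = {"check_discount_eligibility": check_discount_eligibility}
    fn = registry[tool_call["name"]]
    return fn(**json.loads(tool_call["arguments"]))

# Pretend the LLM responded with this structured tool call:
result = dispatch({"name": "check_discount_eligibility",
                   "arguments": '{"user_id": "vip-42"}'})
```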

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fn9mbraefvihwqj816h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fn9mbraefvihwqj816h.png" alt="Options for Integrating External Data into LLM Apps" width="800" height="625"&gt;&lt;/a&gt;&lt;br&gt;
Options for Integrating External Data into LLM Apps&lt;/p&gt;

&lt;h2&gt;
  
  
  Design time vs run time
&lt;/h2&gt;

&lt;p&gt;If we look at the 3 techniques above, we can see that they all have one thing in common: They are integrated at &lt;strong&gt;design time&lt;/strong&gt; - meaning that an engineer needs to carefully integrate the data/code into an application before the user can use it.&lt;br&gt;
This works well for many use cases where you - the developer - have control over the underlying LLM/agent and want to achieve the best results.&lt;br&gt;
In addition, as long as LLM agents are still a bit clunky, constructing such an application requires fine-tuning and adjustments anyway.&lt;/p&gt;

&lt;p&gt;However, once you become a user of such an application, you are not in control of the underlying LLM/agent. For example, when you use the Cursor editor, you use its agent to help you write code, but you don't rewire Cursor's internals.&lt;/p&gt;

&lt;p&gt;This is where MCP servers come into play. These servers provide functionality according to a defined protocol - the MCP protocol - and can be integrated at runtime.&lt;br&gt;
Imagine using Cursor, an AI-powered code editor, to write database queries. You don’t control its internal agent, but with an MCP server, you can plug in your Postgres schema at runtime—no need to wait for Cursor’s developers to build it in. This flexibility lets users extend apps instantly, bypassing the delays of design-time updates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is this just an API?
&lt;/h2&gt;

&lt;p&gt;Not quite. Unlike stateless REST APIs or design-time function calling, MCP is an open, standardized protocol for applications to provide context to LLMs, using stateful connections, a client-server architecture and pre-defined capabilities and messages. It is powered by a JSON-RPC API (not necessarily over HTTP) that is defined in the MCP specification and inspired by the &lt;a href="https://langserver.org/" rel="noopener noreferrer"&gt;Language Server Protocol&lt;/a&gt;.&lt;br&gt;
I think a lot of the confusion arises from the fact that MCP is often compared to conventional REST APIs or function calling, especially when the use cases are trivial.&lt;/p&gt;
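
&lt;p&gt;To illustrate the wire format, a tool invocation in MCP is a JSON-RPC 2.0 request along these lines (the tool name and arguments are made up for illustration; consult the MCP specification for the exact message shapes):&lt;/p&gt;

```python
import json

# Shape of an MCP tool invocation as a JSON-RPC 2.0 request
# (tool name and arguments are hypothetical):
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",
        "arguments": {"sql": "SELECT count(*) FROM users"},
    },
}
wire_message = json.dumps(request)
```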

&lt;p&gt;For function calling, the integration is done at design time and the function is part of the application - so there is a mismatch.&lt;br&gt;
Let's take a look at the API vs. MCP discussion: At first glance, the two are similar, as you could design an LLM-powered application that consumes any OpenAPI-spec-compliant API and converts it into tools on the fly.&lt;br&gt;
In fact, there is even an &lt;a href="https://cookbook.openai.com/examples/function_calling_with_an_openapi_spec" rel="noopener noreferrer"&gt;OpenAI cookbook for that&lt;/a&gt;. &lt;br&gt;
Having a stateful, 1:1-mapped client-server connection as MCP defines it, just to get the weather in a certain city, is overkill. And if it is just a small stateless REST API, providing an OpenAPI runner is good enough.&lt;/p&gt;

&lt;p&gt;But once you have a more complex use case that involves state or requires deep interaction between the LLM and the application, MCP can be a great fit.&lt;br&gt;
For instance, &lt;a href="https://modelcontextprotocol.io/docs/concepts/sampling#sampling" rel="noopener noreferrer"&gt;sampling&lt;/a&gt; allows servers to request LLM completions through the client while maintaining data control, and &lt;a href="https://modelcontextprotocol.io/docs/concepts/roots" rel="noopener noreferrer"&gt;roots&lt;/a&gt; define the client's resources, like filesystems, that MCP servers should work with.&lt;br&gt;
Of course, such complex workflows require a powerful client, which might be missing in some users' applications. And as with any new technology, the debugging and tooling are not as mature as for the battle-proven HTTP APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The network
&lt;/h2&gt;

&lt;p&gt;No, not the network of the internet - but the people and companies behind a standard.&lt;br&gt;
As MCP is designed for AI engineering, it attracts a fresh group of people that are passionate about the future of AI.&lt;br&gt;
Those can then participate in the development of an &lt;strong&gt;open&lt;/strong&gt; standard - which suggests no lock-in. &lt;/p&gt;

&lt;p&gt;What makes it even more interesting is that it is backed by Anthropic, who have a great standing in the developer community, thus providing visibility and trust in the standard's long-term perspective.&lt;br&gt;
The more people implement MCP servers, the more attractive they become for users, as they know they will be supported in the future.&lt;br&gt;
This in turn drives the adoption of MCP, and the standard becomes more robust and mature. Looking back at the last months, we can definitely see a sharp increase in the number of MCP servers (1100) as well as clients and registries (per &lt;a href="https://www.latent.space/p/why-mcp-won" rel="noopener noreferrer"&gt;Why MCP Won&lt;/a&gt;).&lt;br&gt;
Pair this with a fast-evolving roadmap and lessons from similar protocols like LSP (Language Server Protocol), and you (might?) have a recipe for success.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you are &lt;strong&gt;not in control&lt;/strong&gt; of the underlying Agentic LLM, MCP servers can be a great way to add external data and functionality to the application at runtime.&lt;br&gt;
Think of &lt;em&gt;plugins&lt;/em&gt;, that extend the capabilities of the LLM without the need to change the underlying model or the application.&lt;br&gt;
Thus if you are a user, MCPs can supercharge your LLM-powered application with additional capabilities.&lt;/p&gt;

&lt;p&gt;If you are a developer and design the actual system, MCPs can be an overkill if you just want to integrate a stateless (RESTful) API - which is quite common.&lt;br&gt;
Relying on conventional tooling like OpenAPI, function calling or &lt;a href="https://python.langchain.com/docs/integrations/tools/" rel="noopener noreferrer"&gt;third-party toolkits like LangChain's&lt;/a&gt; is good enough for many use cases. So far, proper tools need tailored agent logic to be useful.&lt;/p&gt;

&lt;p&gt;Still, APIs and standards are only as powerful as the people behind them, and MCP is growing and evolving fast while already having a large group of supporters.&lt;br&gt;
Such network effects can make MCP the de-facto standard for LLM integration in the future - even if it is not perfect for every use case.&lt;br&gt;
As with many topics in the AI space, take predictions with a grain of salt and enjoy the ride.&lt;/p&gt;

&lt;p&gt;For an even deeper dive, check out the &lt;a href="https://www.latent.space/p/why-mcp-won" rel="noopener noreferrer"&gt;Why MCP Won&lt;/a&gt; article by &lt;a href="https://www.latent.space/" rel="noopener noreferrer"&gt;Latent Space&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.latent.space/p/why-mcp-won" rel="noopener noreferrer"&gt;Why MCP Won&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://spec.modelcontextprotocol.io/specification/2024-11-05/" rel="noopener noreferrer"&gt;MCP Specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.dev/mcp-fad-or-fixture/" rel="noopener noreferrer"&gt;MCP: Flash in the Pan or Future Standard?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/integrations/tools/" rel="noopener noreferrer"&gt;Langchain's Tools&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>mcp</category>
      <category>api</category>
      <category>llm</category>
      <category>openapi</category>
    </item>
    <item>
      <title>Multi-tenant database patterns - through a SaaS lens</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Tue, 03 Oct 2023 16:39:35 +0000</pubDate>
      <link>https://dev.to/duske/multi-tenant-database-patterns-through-a-saas-lens-1pc5</link>
      <guid>https://dev.to/duske/multi-tenant-database-patterns-through-a-saas-lens-1pc5</guid>
      <description>&lt;p&gt;This article shall give an overview of various popular multi-tenant database patterns and their pros and cons - through a SaaS lens.&lt;br&gt;
That means that we analyze the patterns in terms of their suitability for SaaS applications and their tradeoffs.&lt;br&gt;
While there are many good posts already available on the internet, I want to bring together different naming conventions and patterns in one place, for a &lt;br&gt;
better overview and comparison. &lt;/p&gt;

&lt;h2&gt;
  
  
  An example use case
&lt;/h2&gt;

&lt;p&gt;Let's assume we have a SaaS application that helps customers organize their employees and machines. For the sake of brevity,&lt;br&gt;
it will consist of a web client, a backend API and a database. &lt;br&gt;
It could look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhist0zzlbu2m7q8xqo2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhist0zzlbu2m7q8xqo2m.png" alt="Figure 1: Example app" width="800" height="169"&gt;&lt;/a&gt;&lt;br&gt;
Figure 1: Example app&lt;/p&gt;

&lt;h3&gt;
  
  
  The various scopes of multi-tenancy
&lt;/h3&gt;

&lt;p&gt;Taking a look at the example application, one can imagine that there might be different scopes of multi-tenancy and that is absolutely true.&lt;br&gt;
At the highest level (think of zoom level = 0), we can distinguish between &lt;strong&gt;single-tenant&lt;/strong&gt; and &lt;strong&gt;multi-tenant systems&lt;/strong&gt;, simply meaning that a system is either used by one tenant or multiple tenants:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2thlo3c3f86taznc233.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2thlo3c3f86taznc233.png" alt="Figure 2: High level comparison" width="800" height="429"&gt;&lt;/a&gt;&lt;br&gt;
Figure 2: High level comparison&lt;/p&gt;

&lt;p&gt;At the next level (zoom level = 1), we can distinguish between multi-tenancy applied for the backend, the client or the database. &lt;br&gt;
For example, you could have the entire stack (client, backend, database) be single-tenant, meaning that each tenant has its own client, backend and database. Here you essentially achieve the single-tenancy of Figure 1.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqf5vb4w3cef5i73kln7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqf5vb4w3cef5i73kln7.png" alt="Figure 3: Single-tenant stack" width="800" height="272"&gt;&lt;/a&gt;&lt;br&gt;
Figure 3: Single-tenant stack&lt;/p&gt;

&lt;p&gt;Of course, no one is stopping you from applying multi-tenancy to single components, like the backend only, for example. Here it means that some tenants get their own backend, but all tenants share the same database.&lt;br&gt;
In other words, the backend is siloed per tenant, but the database is pooled.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwshweffa481mdmopnobw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwshweffa481mdmopnobw.png" alt="Figure 4: Mixed tenancy example" width="800" height="378"&gt;&lt;/a&gt;&lt;br&gt;
Figure 4: Mixed tenancy example&lt;/p&gt;

&lt;p&gt;Since many cloud-native systems are built with stateless backends for scalability, we will focus on applying multi-tenancy at the database level in this article (zoom level = 2).&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fun0ug92wep8bow9yicol.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fun0ug92wep8bow9yicol.png" alt="Figure 4: Multi tenancy for DBs" width="800" height="218"&gt;&lt;/a&gt;&lt;br&gt;
Figure 4: Multi tenancy for DBs&lt;/p&gt;

&lt;h2&gt;
  
  
  Terminology and the various ways of applying multi-tenancy
&lt;/h2&gt;

&lt;p&gt;To get things started, let's define some terms that will be used throughout this article. Please note that we focus on the data layer (database),&lt;br&gt;
so the terms are related to processing and storing data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tenant&lt;/strong&gt;: A tenant is a group of users whose data belongs together and is managed as a unit. Usually, a tenant is a customer of the SaaS application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tenant ID&lt;/strong&gt;: A tenant ID is a unique identifier for a tenant. It can be used to identify a tenant in the database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-tenant Database&lt;/strong&gt;: A single-tenant database is a database that is used by only one tenant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tenant Database&lt;/strong&gt;: A multi-tenant database is a database that is used by multiple tenants.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The patterns
&lt;/h2&gt;

&lt;p&gt;As with nearly every software architecture topic, there is no one-size-fits-all solution. The same is true for multi-tenancy, so let&lt;br&gt;
the tradeoff-festival begin!&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: Separate database server aka the silo pattern
&lt;/h3&gt;

&lt;p&gt;In this shared-nothing approach, each tenant gets its own database server. Running a dedicated database instance per tenant enables&lt;br&gt;
maximum isolation between tenants, eliminating the noisy-neighbor issue and possibly boosting compliance. This pattern is also known as the &lt;strong&gt;silo pattern&lt;/strong&gt;.&lt;br&gt;
It also allows for maximum flexibility in terms of database configuration, since each tenant can have its own configuration.&lt;br&gt;
Its strength is also its weakness: keeping those different servers properly configured, up to date, monitored and backed up is a very resource-intensive task.&lt;br&gt;
Infrastructure-as-Code can certainly help here, but it is still a lot of work. On top of the increased complexity in terms of deployment and operation,&lt;br&gt;
this pattern also has a high cost, since each tenant needs its own database server. It is best suited for large tenants with high security and compliance requirements.&lt;/p&gt;

&lt;p&gt;An implementation of this pattern could look like the image above (single-tenant stack) or like this, if you do not want to set up a dedicated backend API for each tenant as well:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2npxrensptn490cpidc6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2npxrensptn490cpidc6.png" alt="Figure 5: Silo pattern" width="800" height="537"&gt;&lt;/a&gt;&lt;br&gt;
Figure 5: Silo pattern&lt;/p&gt;

&lt;p&gt;Please keep in mind that somewhere the mapping between tenant and database server has to be stored, so that the backend knows which database server to connect to for a given tenant.&lt;br&gt;
Those mapping/lookup components are sometimes called &lt;a href="https://learn.microsoft.com/en-us/azure/azure-sql/database/saas-dbpertenant-provision-and-catalog?view=azuresql#introduction-to-the-saas-catalog-pattern" rel="noopener noreferrer"&gt;catalogs&lt;/a&gt; and could be a simple key-value store or a more sophisticated service registry. Designing a proper catalog is a topic for another article, but it is important to keep in mind that it is needed.&lt;/p&gt;
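&lt;p&gt;To make the catalog idea more concrete, here is a minimal sketch in Python. It assumes a plain in-memory dictionary as the lookup store, and all tenant IDs and connection strings are made up for illustration; a production catalog would be a durable store or service of its own:&lt;/p&gt;

```python
# Minimal catalog sketch for the silo pattern: map a tenant ID to the
# connection string of that tenant's dedicated database server.
# All tenant IDs and DSNs below are made-up examples.
CATALOG = {
    "tenant-a": "postgresql://db-tenant-a.internal:5432/app",
    "tenant-b": "postgresql://db-tenant-b.internal:5432/app",
}

def resolve_dsn(tenant_id: str) -> str:
    """Return the connection string for a tenant, failing loudly for
    unknown tenants instead of silently falling back to a default."""
    try:
        return CATALOG[tenant_id]
    except KeyError:
        raise LookupError(f"no database registered for tenant {tenant_id!r}")
```

&lt;p&gt;The backend would call &lt;code&gt;resolve_dsn&lt;/code&gt; once per request (or per connection checkout) before opening a connection to the tenant's server.&lt;/p&gt;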

&lt;h3&gt;
  
  
  Pattern 2: Separate by schema or database
&lt;/h3&gt;

&lt;p&gt;This umbrella term describes a pattern where multiple tenants share the same database server, but are isolated by logical constructs, e.g.&lt;br&gt;
a dedicated database or schema for each tenant. This pattern is also known as the &lt;strong&gt;bridge pattern&lt;/strong&gt;.&lt;br&gt;
By sacrificing some isolation, this pattern reduces the complexity and costs of the silo pattern, since you do not need to set up a dedicated database server for each tenant.&lt;br&gt;
It also allows for customization, since each tenant can have its own database or schema. This pattern is best suited for tenants with medium security and compliance requirements.&lt;br&gt;
It is really a hybrid model, where you can benefit or shoot yourself in the foot, depending on the requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it is isolated, but does not provide the same level of isolation as the silo pattern and does not eliminate the noisy-neighbor issue&lt;/li&gt;
&lt;li&gt;it has less infrastructure cost, but suffers from all-or-nothing availability&lt;/li&gt;
&lt;li&gt;it is flexible and allows customization for tenant-specific custom data, but deployment complexity is still high and needs to be thoroughly orchestrated with the backend deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regardless of the logical construct used, the mapping between tenant and logical construct has to be stored somewhere, so that the backend knows which logical construct to connect to for a given tenant.&lt;br&gt;
If using databases to separate tenants, it could look like this:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrgealseposw7e24pzxs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrgealseposw7e24pzxs.png" alt="Figure 6: Bridge pattern with separate databases" width="800" height="377"&gt;&lt;/a&gt;&lt;br&gt;
Figure 6: Bridge pattern with separate databases&lt;/p&gt;

&lt;p&gt;If you choose to use schemas to separate tenants, it could look like this:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb4syfq51d18meuxb0zi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb4syfq51d18meuxb0zi.png" alt="Figure 7: Bridge pattern with separate schemas" width="800" height="377"&gt;&lt;/a&gt;&lt;br&gt;
Figure 7: Bridge pattern with separate schemas&lt;/p&gt;

&lt;p&gt;This pattern was often used in the past, as it traditionally offered higher agility and lower costs than the silo pattern. Imho,&lt;br&gt;
these advantages are not as relevant anymore, since the cloud-native movement has made it possible to spin up and manage new database servers with much less overhead.&lt;br&gt;
Still, your team needs proper experience and resources if you need that extra control for performance tuning and isolation.&lt;/p&gt;
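&lt;p&gt;To illustrate the schema variant of the bridge pattern, here is a small Python sketch. The &lt;code&gt;tenant_&lt;/code&gt; naming convention and the validation rule are assumptions made for this example; the point is to derive the schema name deterministically and never interpolate untrusted input into the &lt;code&gt;SET search_path&lt;/code&gt; statement (PostgreSQL syntax):&lt;/p&gt;

```python
import re

def tenant_schema(tenant_id: str) -> str:
    """Derive a schema name like 'tenant_acme' from a tenant ID.
    Only a restricted character set is allowed, so a tenant ID can
    never smuggle SQL into the statement built below."""
    if not re.fullmatch(r"[a-z0-9_]+", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}"

def search_path_statement(tenant_id: str) -> str:
    """Statement a backend could issue after checking out a pooled
    connection, so unqualified table names resolve to the tenant's schema."""
    return f'SET search_path TO "{tenant_schema(tenant_id)}"'
```

&lt;p&gt;The database-per-tenant variant works analogously, except that the catalog resolves a database name (or full connection string) instead of a schema.&lt;/p&gt;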

&lt;p&gt;For even more agility, let's take a look at the next pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: Separate by table column
&lt;/h3&gt;

&lt;p&gt;This pattern is also known as the &lt;strong&gt;pool pattern&lt;/strong&gt;. It is somewhat similar to the bridge pattern, but instead of separating tenants by database or schema, it &lt;strong&gt;separates them by a table column&lt;/strong&gt;. &lt;br&gt;
This means that all tenants share the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;database server&lt;/li&gt;
&lt;li&gt;database&lt;/li&gt;
&lt;li&gt;schema and tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The isolation is achieved by giving each relevant table an additional column like &lt;code&gt;tenant_id&lt;/code&gt; that is used to identify and separate the data of different tenants.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bxc5ro5o98veo1fg7s6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bxc5ro5o98veo1fg7s6.png" alt="Figure 6: Pool pattern with separation by tenant_id column" width="800" height="334"&gt;&lt;/a&gt;&lt;br&gt;
Figure 6: Pool pattern with separation by tenant_id column&lt;/p&gt;

&lt;p&gt;Note that a true catalog component is not really needed, as the tenant_id is very lightweight and no connection credentials need to be stored.&lt;br&gt;
Such an ID can often be kept in the user's session or JWT.&lt;/p&gt;
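&lt;p&gt;Here is a runnable sketch of the pool pattern, using SQLite as a stand-in for the real database server (the &lt;code&gt;invoices&lt;/code&gt; table and the tenant IDs are made up). The crucial point is that every single query filters by &lt;code&gt;tenant_id&lt;/code&gt;:&lt;/p&gt;

```python
import sqlite3

# SQLite stands in for the real database server here; the schema is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (tenant_id TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("tenant-a", 100), ("tenant-a", 250), ("tenant-b", 999)],
)

def invoices_for(tenant_id: str) -> list:
    """Every data-access query must filter by tenant_id; forgetting this
    single WHERE clause is exactly the risk of the pool pattern."""
    cur = conn.execute(
        "SELECT tenant_id, amount FROM invoices WHERE tenant_id = ?",
        (tenant_id,),
    )
    return cur.fetchall()

print(invoices_for("tenant-a"))  # tenant-b's row is never visible here
```

&lt;p&gt;Forgetting that one &lt;code&gt;WHERE&lt;/code&gt; clause leaks data across tenants, which is why enforcing the filter centrally in the database (e.g. with Row-Level-Security) is attractive.&lt;/p&gt;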

&lt;p&gt;Of course, there are obvious drawbacks that might make it unsuitable for your use case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;isolation is achieved by a column, so it is not as strong as the bridge pattern and especially not as strong as the silo pattern&lt;/li&gt;
&lt;li&gt;per-tenant customizations are tricky, since you would need to add additional columns to the tables&lt;/li&gt;
&lt;li&gt;similar issues as bridge pattern:

&lt;ul&gt;
&lt;li&gt;noisy-neighbor can be an issue&lt;/li&gt;
&lt;li&gt;all-or-nothing availability&lt;/li&gt;
&lt;li&gt;limited scalability&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;complex per tenant backups and restores&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;But especially for SaaS businesses, there can be &lt;strong&gt;compelling reasons&lt;/strong&gt; to follow this approach once it is combined with additional technologies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data isolation can be enforced with Row-Level-Security (RLS) in the database&lt;/li&gt;
&lt;li&gt;if customization can be kept to a minimum, per-tenant customization can be achieved with JSON data types&lt;/li&gt;
&lt;li&gt;by sharding the data by tenant_id, scalability can be achieved (see the Citus extension for PostgreSQL)&lt;/li&gt;
&lt;li&gt;it is very easy to add new tenants, since you do not need to set up a new database or schema&lt;/li&gt;
&lt;li&gt;it is straightforward to monitor&lt;/li&gt;
&lt;li&gt;it offers unmatched agility in terms of deployment and operation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;As you can see, there is no one-size-fits-all solution but more like a spectrum of solutions, ranging from maximum isolation to maximum agility.&lt;br&gt;
Depending on your use case, the following questions might help you to decide which pattern to choose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How much isolation do you need? &lt;/li&gt;
&lt;li&gt;How much agility do you need? Do you need to be able to spin up new tenants quickly? How often do you want to deploy?&lt;/li&gt;
&lt;li&gt;What kind of SLA or performance requirements do you have? &lt;/li&gt;
&lt;li&gt;Do you need to have precise cost monitoring/metering per tenant?&lt;/li&gt;
&lt;li&gt;How much customization do you need? Do you want to provide special features for each tenant?&lt;/li&gt;
&lt;li&gt;How much scalability do you need? Do you expect 10s, 100s or 1000s of tenants?&lt;/li&gt;
&lt;li&gt;How many resources and how much expertise do you have? Do you have a dedicated team/expert for database operations/devops?&lt;/li&gt;
&lt;li&gt;What kind of regulations do you need to follow? E.g. GDPR, ISO 27001?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's up next:&lt;/strong&gt;&lt;br&gt;
For SaaS scale-ups, however, the pool pattern can often be a good fit, since it allows for fast iteration cycles through fast deployments and operations.&lt;br&gt;
And as we've put on a SaaS lens, let's keep those goggles on and focus on the implementation and risk mitigation of the pool pattern in the next article.&lt;/p&gt;

&lt;h3&gt;
  
  
  Useful resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/azure-sql/database/saas-tenancy-app-design-patterns?view=azuresql" rel="noopener noreferrer"&gt;https://learn.microsoft.com/en-us/azure/azure-sql/database/saas-tenancy-app-design-patterns?view=azuresql&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/saas-multitenant-managed-postgresql/matrix.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/prescriptive-guidance/latest/saas-multitenant-managed-postgresql/matrix.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/architecture/guide/multitenant/approaches/storage-data#databases" rel="noopener noreferrer"&gt;https://learn.microsoft.com/en-us/azure/architecture/guide/multitenant/approaches/storage-data#databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://renatoargh.files.wordpress.com/2018/01/article-multi-tenant-data-architecture-2006.pdf" rel="noopener noreferrer"&gt;https://renatoargh.files.wordpress.com/2018/01/article-multi-tenant-data-architecture-2006.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thenile.dev/blog/multi-tenant-rls" rel="noopener noreferrer"&gt;https://www.thenile.dev/blog/multi-tenant-rls&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/de/blogs/database/choose-the-right-postgresql-data-access-pattern-for-your-saas-application/" rel="noopener noreferrer"&gt;https://aws.amazon.com/de/blogs/database/choose-the-right-postgresql-data-access-pattern-for-your-saas-application/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>architecture</category>
      <category>database</category>
      <category>saas</category>
    </item>
    <item>
      <title>How Kubernetes handles offline nodes</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Sun, 08 Mar 2020 18:18:07 +0000</pubDate>
      <link>https://dev.to/duske/how-kubernetes-handles-offline-nodes-53b5</link>
      <guid>https://dev.to/duske/how-kubernetes-handles-offline-nodes-53b5</guid>
      <description>&lt;p&gt;Kubernetes is a great tool for orchestrating containerized workloads on a cluster of nodes. If you've ever experienced the sudden downtime of a node, you maybe came in touch with Kubernetes' rescheduling strategies of deployments that kicked in after some time. In this post I want to highlight how such situations are recognized by the system. This can be helpful to understand and tune rescheduling mechanics or when developing your own operators and resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  The process
&lt;/h2&gt;

&lt;p&gt;In order to demonstrate this process in a more appealing way, the following graphic will be used to visualize the key actions and decisions. We will deal with a system consisting of one master and one node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdmrpf9i8dhd1es0m0aw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdmrpf9i8dhd1es0m0aw.png" alt="kubernetes-timeouts" width="593" height="958"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a healthy system, the kubelet running on the node continuously reports its status to the master (1).&lt;br&gt;
This is controlled by setting the CLI param &lt;code&gt;--node-status-update-frequency&lt;/code&gt; of the kubelet, whose default is 10s.&lt;br&gt;
That way, the master stays informed about the health of the cluster nodes and can schedule pods in a proper way.&lt;/p&gt;

&lt;p&gt;Now (2), the kubelet loses its connection to the master. For instance, the node could have crashed or the network could be faulty.&lt;br&gt;
The master obviously cannot know the reason, but when monitoring the nodes the timeout &lt;code&gt;--node-monitor-grace-period&lt;/code&gt; gets checked (3). Per default this timeout is set to &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/" rel="noopener noreferrer"&gt;40 seconds in the controller manager&lt;/a&gt;. That means that a node has 40 seconds to recover and send its status to the master before the next step (4) is entered.&lt;/p&gt;

&lt;p&gt;If the node can successfully recover, the system stays healthy and continues with the loop (1).&lt;br&gt;
If the node could not respond within the given timeout, its status is set to &lt;code&gt;Unknown&lt;/code&gt; and a second timeout (5) starts. This timeout, called &lt;code&gt;--pod-eviction-timeout&lt;/code&gt;, controls when the pods on the node are ready to be evicted (as well as "Taints and Tolerations" in the next section). The default value is set to 5 minutes.&lt;br&gt;
As soon as the node responds within this timeframe (6), the master sets its status back to &lt;code&gt;Ready&lt;/code&gt; and the process continues with the usual loop from the beginning.&lt;br&gt;
But when this timeout is exceeded with a non-responding node, the pods are finally marked for deletion (7).&lt;br&gt;
It should be noted that these pods are not removed instantly. Instead, the node has to go online again and connect to the master in order to confirm this deletion (&lt;a href="https://github.com/kubernetes/kubernetes/issues/55713#issuecomment-518340883" rel="noopener noreferrer"&gt;2-phase confirmation&lt;/a&gt;).&lt;br&gt;
If that is not possible, for example when the node has left the cluster permanently, you have to remove these pods manually.&lt;/p&gt;
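&lt;p&gt;The decision flow above can be condensed into a toy model. This is a deliberate simplification and not the controller manager's actual code; the parameter names mirror the CLI flags and use the default values mentioned above:&lt;/p&gt;

```python
def node_view(seconds_since_last_heartbeat: float,
              node_monitor_grace_period: float = 40.0,
              pod_eviction_timeout: float = 300.0) -> str:
    """Toy model of how the master judges a silent node: first the
    grace period must expire, then the eviction timeout on top of it.
    (Simplified: the real flow also involves taints and tolerations.)"""
    deadline = node_monitor_grace_period + pod_eviction_timeout
    if seconds_since_last_heartbeat > deadline:
        return "Unknown, pods marked for deletion"
    if seconds_since_last_heartbeat > node_monitor_grace_period:
        return "Unknown"
    return "Ready"

print(node_view(10))    # still within the grace period
print(node_view(120))   # grace period exceeded, eviction timer running
print(node_view(400))   # both timeouts exceeded
```

&lt;p&gt;With the defaults, a node that stays silent for more than 40 + 300 = 340 seconds ends up in the eviction case.&lt;/p&gt;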
&lt;h2&gt;
  
  
  Taints and Tolerations
&lt;/h2&gt;

&lt;p&gt;Even though you set the eviction timeout &lt;code&gt;--pod-eviction-timeout&lt;/code&gt; to a lower value, you may notice that pods still need 5 minutes to be deleted. This is due to the admission controller, which adds a default toleration to every pod that allows it to stay on a not-ready or unreachable node for a period of time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tolerations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node.kubernetes.io/not-ready&lt;/span&gt;
  &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NoExecute&lt;/span&gt;
  &lt;span class="na"&gt;tolerationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node.kubernetes.io/unreachable&lt;/span&gt;
  &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Exists&lt;/span&gt;
  &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NoExecute&lt;/span&gt;
  &lt;span class="na"&gt;tolerationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see in the default configuration above, the value is set to 300 seconds/5 minutes. One possible solution is to apply a custom configuration to each pod, where this value is adjusted to your needs. You can also &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/" rel="noopener noreferrer"&gt;adjust this setting globally&lt;/a&gt;.&lt;br&gt;
For instance, when a value (&lt;code&gt;tolerationSeconds&lt;/code&gt;) of 20 seconds is chosen, it will take 60 seconds overall for a pod to be deleted, because the &lt;code&gt;--node-monitor-grace-period&lt;/code&gt; value is taken into account beforehand.&lt;/p&gt;
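&lt;p&gt;That arithmetic, as a tiny helper (a back-of-the-envelope model that ignores the kubelet's reporting interval and controller sync jitter):&lt;/p&gt;

```python
def seconds_until_pod_deletion(node_monitor_grace_period: float = 40.0,
                               toleration_seconds: float = 300.0) -> float:
    """Rough overall time from a node going silent until its pods are
    marked for deletion: the node must first be declared not-ready
    (grace period), then the pod's toleration has to expire."""
    return node_monitor_grace_period + toleration_seconds

print(seconds_until_pod_deletion(40, 20))   # the 60-second example above
print(seconds_until_pod_deletion())         # with the defaults
```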

&lt;h2&gt;
  
  
  Wrapping it up
&lt;/h2&gt;

&lt;p&gt;I hope you now have a rough idea of how Kubernetes recognizes and handles offline nodes. Especially the two timeouts as well as the default taints and tolerations configuration can be a caveat.&lt;br&gt;
This can come in handy when you develop your own operator that has to deal with non-responding nodes. For instance, Kubernetes' deployment controller recognizes these situations automatically and reschedules the configured pods.&lt;br&gt;
This is also one of the reasons why you should avoid using "naked" pods, because this helpful handling has to be implemented by you in that case.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Modern multi-architecture builds with Docker</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Sat, 04 Jan 2020 22:30:38 +0000</pubDate>
      <link>https://dev.to/duske/modern-multi-architecture-builds-with-docker-51j5</link>
      <guid>https://dev.to/duske/modern-multi-architecture-builds-with-docker-51j5</guid>
      <description>&lt;p&gt;In this post I'm going to explain several ways to build docker images for multiple architectures. With the ongoing rise of ARM-architectures, for example the Raspberry Pi or Amazon's efficient &lt;a href="https://aws.amazon.com/ec2/instance-types/a1/?nc1=h_ls" rel="noopener noreferrer"&gt;EC2 A1-Instances&lt;/a&gt;, multi-architecture builds will probably gain more focus.&lt;/p&gt;

&lt;p&gt;If you're on a single computer, building and running docker images is very easy. The &lt;code&gt;build&lt;/code&gt; command analyzes a given &lt;code&gt;Dockerfile&lt;/code&gt; and runs the specific instructions. To do so, Docker uses the kernel of your OS (or your VM, depending on your setup). This can bind the architecture of the image to the host architecture, especially when you compile a binary inside it.&lt;/p&gt;

&lt;p&gt;When using a typical desktop PC, this architecture is probably &lt;code&gt;x86/amd64&lt;/code&gt;. So if you run this image on a different computer with the same architecture, everything is fine. But what do you do when a different architecture is the target, for example ARM?&lt;/p&gt;

&lt;h2&gt;
  
  
  Compile programs for different architectures
&lt;/h2&gt;

&lt;p&gt;Generally speaking, 3 typical techniques are used to compile software for a different architecture, which are briefly explained in this section.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Build on the target system
&lt;/h3&gt;

&lt;p&gt;Building your software directly on the target system is obviously the easiest approach, since you probably do not have to change anything in your code.&lt;/p&gt;

&lt;p&gt;Let's consider building software for the ARM architecture. For example, you could transfer your code to a Raspberry Pi, install the toolchain and build your code there. Since Docker is supported on this device, you can use the same commands as on your desktop computer.&lt;br&gt;
Typical problems with this approach are the limited access to ARM-powered hardware and, since such devices are often not that powerful, slow build performance. E.g., compiling C dependencies can take a lot of time on a Raspberry Pi (though it gets faster with every new board).&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Emulate the hardware
&lt;/h3&gt;

&lt;p&gt;If you do not have access to the target system, emulation is another viable solution. While in virtualization only certain parts of a computer's hardware are simulated in order to run a guest OS, emulation simulates the complete hardware. This makes emulation slower than virtualization, but it is also not limited to the underlying hardware, making it possible to simulate hardware like an ARM processor on an x86 system.&lt;/p&gt;

&lt;p&gt;As you might have guessed, this approach is truly powerful by enabling you to build software for various architectures, but simulating the entire hardware is a huge overhead. Wouldn't it be great to simulate only the hardware components required for building the software?&lt;/p&gt;
&lt;h4&gt;
  
  
  User-mode emulation
&lt;/h4&gt;

&lt;p&gt;Usually, when an executable file is passed to an &lt;code&gt;exec&lt;/code&gt; system call, the kernel expects it to be a native binary for the current system.&lt;br&gt;
If you ever encountered the error &lt;code&gt;exec user process caused "exec format error"&lt;/code&gt; in a docker image, you tried to run a binary which can't be executed on the processor.&lt;br&gt;
Luckily, with &lt;code&gt;binfmt_misc&lt;/code&gt; it is possible to register custom interpreters in userland to handle foreign binaries.&lt;br&gt;
To do so, you basically register the respective interpreter together with a "magic number" that identifies binaries of the specific format.&lt;br&gt;
The idea is that, in combination with a powerful emulator, you can still run and build binaries for foreign architectures on your system, by emulating only the required parts.&lt;/p&gt;
&lt;h4&gt;
  
  
  QEMU
&lt;/h4&gt;

&lt;p&gt;One famous emulator capable of such a feature is &lt;a href="https://www.qemu.org" rel="noopener noreferrer"&gt;QEMU&lt;/a&gt;. Besides user-mode emulation, it supports various architectures, full-system emulation and virtualization as well.&lt;/p&gt;

&lt;p&gt;To use user-mode emulation with QEMU, we need to register this emulator for some foreign architectures, for example ARM. The Hypriot project has explained this well in &lt;a href="https://blog.hypriot.com/post/docker-intel-runs-arm-containers/" rel="noopener noreferrer"&gt;this article&lt;/a&gt;. They even built a Docker image that runs in &lt;code&gt;privileged&lt;/code&gt; mode to register the magic numbers with QEMU on your host system for you.&lt;br&gt;
An excerpt of the &lt;a href="https://github.com/hypriot/qemu-register/blob/master/register.sh" rel="noopener noreferrer"&gt;register script&lt;/a&gt; is shown below to give you a rough idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Register new interpreters&lt;/span&gt;
&lt;span class="c"&gt;# - important: using flags 'C' and 'F'&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;':qemu-arm:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/qemu-arm:CF'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /proc/sys/fs/binfmt_misc/register
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;':qemu-aarch64:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb7\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/qemu-aarch64:CF'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /proc/sys/fs/binfmt_misc/register
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;':qemu-ppc64le:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x15\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\x00:/qemu-ppc64le:CF'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /proc/sys/fs/binfmt_misc/register

&lt;span class="c"&gt;# Show results&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"---"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Installed interpreter binaries:"&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-al&lt;/span&gt; /qemu-&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"---"&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /proc/sys/fs/binfmt_misc
&lt;span class="k"&gt;for &lt;/span&gt;file &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    case&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in
    &lt;/span&gt;status|register&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Registered interpreter=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
        &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"---"&lt;/span&gt;
        &lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="k"&gt;esac&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Use a cross-compiler
&lt;/h3&gt;

&lt;p&gt;The last option is using a cross-compiler to build software for different architectures.&lt;br&gt;
While a standard compiler builds for the system it's running on, a cross-compiler can generate binaries for other architectures as well.&lt;br&gt;
Since it does not rely on any emulation but runs natively on your system, the build performance is as good as option 1.&lt;br&gt;
Modern languages put in a lot of effort to support this feature well; with &lt;a href="https://golangcookbook.com/chapters/running/cross-compiling/" rel="noopener noreferrer"&gt;Golang it is a breeze&lt;/a&gt;. For example, you specify the OS (GOOS) and the target architecture (GOARCH) to cross-compile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# build for mac&lt;/span&gt;
&lt;span class="nv"&gt;GOOS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;darwin &lt;span class="nv"&gt;GOARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;386 go build main.go
&lt;span class="c"&gt;# build for Raspberry Pi&lt;/span&gt;
&lt;span class="nv"&gt;GOOS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linux &lt;span class="nv"&gt;GOARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;arm go build main.go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If multi-arch builds are an ongoing task for you, definitely check out such programming languages.&lt;/p&gt;
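&lt;p&gt;If you target several platforms regularly, the two &lt;code&gt;go build&lt;/code&gt; invocations above generalize into a small loop. A minimal sketch (the platform list and output naming are illustrative; the &lt;code&gt;go build&lt;/code&gt; line only runs if a Go toolchain and a &lt;code&gt;main.go&lt;/code&gt; are actually present):&lt;/p&gt;

```shell
# Cross-compile main.go for a list of os/arch pairs in one loop.
for platform in darwin/amd64 linux/arm linux/arm64; do
  goos=${platform%/*}    # part before the slash, e.g. "linux"
  goarch=${platform#*/}  # part after the slash, e.g. "arm64"
  out="main-${goos}-${goarch}"
  echo "building ${out}"
  # Guarded so the sketch degrades gracefully without a Go toolchain.
  if command -v go >/dev/null 2>&1 && [ -f main.go ]; then
    GOOS="$goos" GOARCH="$goarch" go build -o "$out" main.go
  fi
done
```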

&lt;h2&gt;
  
  
  Docker images and multi-arch
&lt;/h2&gt;

&lt;p&gt;Let's take a look at how Docker supports multi-arch images. As far as I know, two approaches are mainly used:&lt;/p&gt;

&lt;h3&gt;
  
  
  Separate image
&lt;/h3&gt;

&lt;p&gt;One option is to build a separate image for each architecture, either in a dedicated repository or under a designated tag per supported platform.&lt;br&gt;
This usually requires a separate Dockerfile as well, as seen in the &lt;a href="https://github.com/coreos/flannel" rel="noopener noreferrer"&gt;coreos/flannel repository&lt;/a&gt;, for example:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dockerfile&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# AMD 64&lt;/span&gt;
FROM alpine
ENV &lt;span class="nv"&gt;FLANNEL_ARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;amd64
ADD dist/qemu-&lt;span class="nv"&gt;$FLANNEL_ARCH&lt;/span&gt;&lt;span class="nt"&gt;-static&lt;/span&gt; /usr/bin/qemu-&lt;span class="nv"&gt;$FLANNEL_ARCH&lt;/span&gt;&lt;span class="nt"&gt;-static&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dockerfile.arm&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; arm32v6/alpine&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; FLANNEL_ARCH=arm&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt;&lt;span class="s"&gt; dist/qemu-$FLANNEL_ARCH-static /usr/bin/qemu-$FLANNEL_ARCH-static&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By passing the architecture as build arguments or environment variables to the Dockerfile, you can run instructions specific to these platforms. In this example, an env variable &lt;code&gt;FLANNEL_ARCH&lt;/code&gt; is used for this purpose.&lt;/p&gt;
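&lt;p&gt;Instead of duplicating the whole file, the two Dockerfiles above could also be collapsed into one by passing the architecture as build arguments. A hedged sketch (the &lt;code&gt;BASE&lt;/code&gt; and &lt;code&gt;ARCH&lt;/code&gt; arguments are illustrative, not taken from the flannel repository):&lt;/p&gt;

```dockerfile
# Build with e.g.: docker build --build-arg BASE=arm32v6/alpine --build-arg ARCH=arm .
ARG BASE=alpine
FROM ${BASE}
# ARG values used before FROM are not visible in the stage unless re-declared.
ARG ARCH=amd64
ENV FLANNEL_ARCH=${ARCH}
ADD dist/qemu-${FLANNEL_ARCH}-static /usr/bin/qemu-${FLANNEL_ARCH}-static
```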

&lt;h3&gt;
  
  
  Manifests
&lt;/h3&gt;

&lt;p&gt;A Docker manifest is a very simple concept: basically, an object containing a list of image references, one for each supported architecture. A Docker client can then pull an image by inspecting this manifest file returned by the registry, searching the list for a matching &lt;code&gt;platform&lt;/code&gt;, and loading the image by its identifying &lt;code&gt;digest&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;An example docker manifest file, containing images for &lt;code&gt;linux/arm&lt;/code&gt;, &lt;code&gt;linux/amd64&lt;/code&gt; and &lt;code&gt;linux/ppc64le&lt;/code&gt;, is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;schemaVersion&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mediaType&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/vnd.docker.distribution.manifest.list.v2+json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;manifests&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mediaType&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/vnd.docker.distribution.manifest.v2+json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;size&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;424&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;digest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256:f67dcc5fc786f04f0743abfe0ee5dae9bd8caf8efa6c8144f7f2a43889dc513b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;platform&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;architecture&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;arm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;os&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;linux&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mediaType&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/vnd.docker.distribution.manifest.v2+json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;size&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;424&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;digest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256:b64ca0b60356a30971f098c92200b1271257f100a55b351e6bbe985638352f3a&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;platform&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;architecture&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;amd64&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;os&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;linux&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mediaType&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/vnd.docker.distribution.manifest.v2+json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;size&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;425&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;digest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256:df436846483aff62bad830b730a0d3b77731bcf98ba5e470a8bbb8e9e346e4e8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;platform&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;architecture&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ppc64le&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;os&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;linux&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this feature is currently &lt;em&gt;experimental&lt;/em&gt; on the Docker client, it's already integrated into the Docker registry and containerd. The &lt;a href="https://github.com/opencontainers/image-spec/blob/master/image-layout.md" rel="noopener noreferrer"&gt;OCI Image specification&lt;/a&gt; includes manifests as well.&lt;/p&gt;

&lt;h5&gt;
  
  
  Generating manifests
&lt;/h5&gt;

&lt;p&gt;As stated above, you currently need to enable experimental features on the docker client to work with manifests. Once you've done that, &lt;code&gt;docker manifest&lt;/code&gt; should be a valid command. Generating a manifest is then very easy via the CLI.&lt;/p&gt;
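&lt;p&gt;To enable it, set the experimental flag in the client configuration, typically &lt;code&gt;~/.docker/config.json&lt;/code&gt; (alternatively, export &lt;code&gt;DOCKER_CLI_EXPERIMENTAL=enabled&lt;/code&gt; in your shell):&lt;/p&gt;

```json
{
  "experimental": "enabled"
}
```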

&lt;p&gt;Let's say we developed a &lt;code&gt;superapp&lt;/code&gt; and built an image for &lt;code&gt;amd64&lt;/code&gt; and &lt;code&gt;arm&lt;/code&gt; manually.&lt;br&gt;
Now we want to publish these two images, &lt;code&gt;app/superapp-amd64&lt;/code&gt; and &lt;code&gt;app/superapp-arm&lt;/code&gt;, as "one" image &lt;code&gt;app/superapp&lt;/code&gt; by using a manifest.&lt;br&gt;
This can be achieved with the following two commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker manifest create app/superapp app/superapp-amd64 app/superapp-arm
&lt;span class="c"&gt;# Created manifest list docker.io/app/superapp&lt;/span&gt;
&lt;span class="c"&gt;# Push the manifest&lt;/span&gt;
docker manifest push app/superapp 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
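&lt;p&gt;If the per-architecture images do not carry complete platform metadata, the entries of the manifest list can be annotated explicitly before pushing. A sketch reusing the hypothetical &lt;code&gt;app/superapp&lt;/code&gt; images from above (requires a Docker client with experimental features enabled and the images already pushed):&lt;/p&gt;

```shell
# Attach explicit platform details to the arm entry of the manifest list.
docker manifest annotate app/superapp app/superapp-arm --os linux --arch arm
# Verify the resulting manifest list before pushing it.
docker manifest inspect app/superapp
```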



&lt;h2&gt;
  
  
  Using Docker's &lt;code&gt;buildx&lt;/code&gt; for ARM builds
&lt;/h2&gt;

&lt;p&gt;In June 2019, Docker announced tooling support for building Docker images for the ARM architecture as an experimental feature.&lt;br&gt;
For a setup guide, see &lt;a href="https://engineering.docker.com/2019/04/multi-arch-images/" rel="noopener noreferrer"&gt;https://engineering.docker.com/2019/04/multi-arch-images/&lt;/a&gt; for Docker Desktop and &lt;a href="https://engineering.docker.com/2019/06/getting-started-with-docker-for-arm-on-linux/" rel="noopener noreferrer"&gt;https://engineering.docker.com/2019/06/getting-started-with-docker-for-arm-on-linux/&lt;/a&gt; for Linux.&lt;/p&gt;

&lt;p&gt;This feature combines the approaches above, namely QEMU, binfmt_misc, and manifests, and bundles them into a single tool, &lt;code&gt;buildx&lt;/code&gt;. It allows you to write a single Dockerfile which can be used to build images for various platforms without changing it. And thanks to the QEMU user-mode emulation, you can build and run images which have a different architecture than your current system.&lt;/p&gt;
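&lt;p&gt;Before the first cross-platform build, a dedicated builder instance has to be created and selected. A minimal sketch, assuming an experimental Docker 19.03+ client (the builder name &lt;code&gt;multiarch&lt;/code&gt; is arbitrary):&lt;/p&gt;

```shell
# Create a new builder instance and make it the current one.
docker buildx create --name multiarch --use
# Start the builder and list the platforms it can target.
docker buildx inspect --bootstrap
```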
&lt;h5&gt;
  
  
  A simple example
&lt;/h5&gt;

&lt;p&gt;Let's package a simple Go-application as a docker image for &lt;code&gt;arm64&lt;/code&gt; and &lt;code&gt;amd64&lt;/code&gt;. We write a simple program that reports the OS and the architecture of the system. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;main.go&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"fmt"&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"runtime"&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OS: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Architecture: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOOS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOARCH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we take a look at the Dockerfile, we see that it is not aware of the architectures which should be supported.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;FROM golang:alpine AS builder
RUN &lt;span class="nb"&gt;mkdir&lt;/span&gt; /app
ADD &lt;span class="nb"&gt;.&lt;/span&gt; /app/
WORKDIR /app
RUN go build &lt;span class="nt"&gt;-o&lt;/span&gt; report &lt;span class="nb"&gt;.&lt;/span&gt;

FROM busybox
RUN &lt;span class="nb"&gt;mkdir&lt;/span&gt; /app
WORKDIR /app
COPY &lt;span class="nt"&gt;--from&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;builder /app/report &lt;span class="nb"&gt;.&lt;/span&gt;
CMD &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"./report"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now comes the interesting part, the cross-architecture build with &lt;code&gt;buildx&lt;/code&gt;. Let's build an image for &lt;code&gt;linux/amd64&lt;/code&gt; and &lt;code&gt;linux/arm64&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker buildx build &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64,linux/arm64 &lt;span class="nt"&gt;-t&lt;/span&gt; foo/bar  &lt;span class="nt"&gt;--push&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;with the &lt;code&gt;--platform&lt;/code&gt; flag, you specify the platforms the image should be built for&lt;/li&gt;
&lt;li&gt;just like with &lt;code&gt;docker build&lt;/code&gt;, the &lt;code&gt;-t&lt;/code&gt; flag lets you define the image tag&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--push&lt;/code&gt; pushes the image directly to the registry after a successful build&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  A simple architecture-aware example
&lt;/h5&gt;

&lt;p&gt;Sometimes you do not want to build everything natively in the Dockerfile, for example when downloading prebuilt third-party binaries for the target architecture. Docker's &lt;code&gt;buildx&lt;/code&gt; has you covered here as well, as it exposes, for each value in &lt;code&gt;--platform&lt;/code&gt;, &lt;a href="https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope" rel="noopener noreferrer"&gt;build arguments like &lt;code&gt;TARGETARCH&lt;/code&gt; or &lt;code&gt;TARGETOS&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For instance, this is a &lt;a href="https://github.com/Duske/ipfs-cluster-multiarch/blob/master/Dockerfile" rel="noopener noreferrer"&gt;Dockerfile&lt;/a&gt; for a multi-architecture image for ipfs-cluster. In order to skip the compilation of the ipfs-cluster binary from source, we download the right version for our architecture via &lt;code&gt;wget&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;golang:1.12-stretch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;

&lt;span class="c"&gt;# This dockerfile builds and runs ipfs-cluster-service.&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; SUEXEC_VERSION v0.2&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; TINI_VERSION v0.16.1&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; IPFS_CLUSTER_VERSION v0.11.0&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; TARGETARCH&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-x&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; /tmp &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git clone https://github.com/ncopa/su-exec.git &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;su-exec &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; make &lt;span class="se"&gt;\ &lt;/span&gt;     &lt;span class="c"&gt;### native build ###&lt;/span&gt;
  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git checkout &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="nv"&gt;$SUEXEC_VERSION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
...

&lt;span class="k"&gt;RUN &lt;/span&gt;wget https://dist.ipfs.io/ipfs-cluster-ctl/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;IPFS_CLUSTER_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/ipfs-cluster-ctl_&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;IPFS_CLUSTER_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;_linux-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TARGETARCH&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.tar.gz 
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When built with the &lt;code&gt;--platform linux/amd64,linux/arm64,linux/arm&lt;/code&gt; flag, &lt;code&gt;TARGETARCH&lt;/code&gt; is set to &lt;code&gt;amd64&lt;/code&gt;, &lt;code&gt;arm64&lt;/code&gt;, and &lt;code&gt;arm&lt;/code&gt; in turn, so each build downloads the correct binary. Native builds are still possible, since the hardware is emulated: &lt;code&gt;su-exec&lt;/code&gt;, for instance, is simply compiled with &lt;code&gt;make&lt;/code&gt;. Nice!&lt;/p&gt;

&lt;h3&gt;
  
  
  Caveats
&lt;/h3&gt;

&lt;p&gt;As an experimental tool, &lt;code&gt;buildx&lt;/code&gt; still has some rough edges for the developer. The main issues I encountered were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can only push the images directly to a registry. A file export of the Docker image is only possible in the &lt;strong&gt;OCI format&lt;/strong&gt;. &lt;a href="https://github.com/docker/buildx/issues/166#issuecomment-544827163" rel="noopener noreferrer"&gt;Issue 166&lt;/a&gt;, &lt;a href="https://github.com/docker/buildx/issues/186" rel="noopener noreferrer"&gt;Issue 186&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;When an error is thrown while building your image, it's often very hard to decipher the error message in a multi-arch error log. If you can, first try to build with the usual &lt;code&gt;docker build&lt;/code&gt; command for your system and if it works, switch to &lt;code&gt;buildx&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Due to the emulation, the build is still orders of magnitude slower, especially if you need to compile your software's dependencies. Check for prebuilt dependencies/binaries online and load the correct version, using &lt;code&gt;TARGETARCH&lt;/code&gt; for example.&lt;/li&gt;
&lt;/ul&gt;
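&lt;p&gt;One way to soften the last caveat, at least for the compile step, is to combine &lt;code&gt;buildx&lt;/code&gt; with cross-compilation: run the compiler natively on the build host and only assemble the final image per platform. A hedged sketch for the Go report app from above (&lt;code&gt;BUILDPLATFORM&lt;/code&gt;, &lt;code&gt;TARGETOS&lt;/code&gt;, and &lt;code&gt;TARGETARCH&lt;/code&gt; are provided automatically by &lt;code&gt;buildx&lt;/code&gt;):&lt;/p&gt;

```dockerfile
# The builder stage is pinned to the host's native platform, so the Go
# compiler runs without emulation ...
FROM --platform=$BUILDPLATFORM golang:alpine AS builder
ARG TARGETOS
ARG TARGETARCH
WORKDIR /app
COPY . .
# ... and cross-compiles for the requested target instead.
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o report .

FROM busybox
WORKDIR /app
COPY --from=builder /app/report .
CMD ["./report"]
```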

&lt;h2&gt;
  
  
  Wrapping it up
&lt;/h2&gt;

&lt;p&gt;I hope this gave you a brief overview of the various techniques for multi-architecture Docker images, or at least raised your interest in them.&lt;br&gt;
It's really exciting to see that Docker is investing in simplifying this process, and while &lt;code&gt;buildx&lt;/code&gt; is still young, it's already usable.&lt;br&gt;
Besides building on your local machine, installing the user-mode emulation is also possible on CI servers, so a multi-architecture build pipeline can be set up.&lt;br&gt;
Thinking of containers as a way to package software, it's a logical next step to make that software available for various architectures as easily as possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://lwn.net/Articles/679308" rel="noopener noreferrer"&gt;containers with binfmt_misc&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://engineering.docker.com/2019/04/multi-arch-images/" rel="noopener noreferrer"&gt;docker buildx setup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://asciinema.org/a/GYOx4B88r272HWrLTyFwo156s" rel="noopener noreferrer"&gt;Demo with buildx&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bamnet/building-multiarch-docker-images-8a70002b3476" rel="noopener noreferrer"&gt;Multiarch images with Go and without buildx&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docker</category>
      <category>arm</category>
      <category>compile</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
