<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dustin</title>
    <description>The latest articles on DEV Community by Dustin (@duske).</description>
    <link>https://dev.to/duske</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F288937%2F9611edc5-b8ef-49e6-9ab0-42798e9e9ecb.jpeg</url>
      <title>DEV Community: Dustin</title>
      <link>https://dev.to/duske</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/duske"/>
    <language>en</language>
    <item>
      <title>Agentic Engineering: Lessons Learned Vol. 2</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Mon, 23 Mar 2026 20:15:59 +0000</pubDate>
      <link>https://dev.to/duske/agentic-engineering-lessons-learned-vol-2-7mh</link>
      <guid>https://dev.to/duske/agentic-engineering-lessons-learned-vol-2-7mh</guid>
      <description>&lt;p&gt;Six months ago, we shared our &lt;a href="https://dev.to/duske/agentic-engineering-lessons-learned-vol-1-jbj"&gt;first lessons learned&lt;/a&gt; with agentic engineering. Since then, we've shipped real features, refactored infrastructure, screamed at agents — and discovered which of our original recommendations actually survived contact with production.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This space is moving &lt;strong&gt;fast&lt;/strong&gt;. Take these recommendations with a grain of salt and always validate them for your own use case. Our findings are as of March 2026; your mileage may vary.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;What Actually Stuck from Vol. 1&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Most of our original advice still holds.&lt;/strong&gt; Context engineering remains the core skill, and knowing the basic pitfalls like &lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" rel="noopener noreferrer"&gt;Context Poisoning, Context Distraction, Context Confusion, and Context Clash&lt;/a&gt; is key to managing your agent's context effectively. Having those basic principles in mind tells you what to expect and helps you avoid delegating tasks that are likely to fail.&lt;/p&gt;

&lt;h3&gt;Subagents: The context firewall&lt;/h3&gt;

&lt;p&gt;In Vol. 1, we recommended: "Subagents work best as researchers, not implementers."&lt;br&gt;
While the first part of that statement still holds, we've found that the second part is more nuanced.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qrtfa60ja667x0tom3d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qrtfa60ja667x0tom3d.png" alt="context firewall" width="554" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The better mental model is to think of subagents as &lt;strong&gt;context firewalls&lt;/strong&gt; rather than scoping them down to specific actions.&lt;br&gt;
They excel at isolating specific tasks and preventing context pollution, but that doesn't necessarily mean they can only read.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Research Tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By nature, read-only/research tasks lend themselves well to subagents because they often involve a lot of grepping, log analysis, or documentation reading, all to produce the much smaller insight that is actually needed.&lt;br&gt;
We use them all the time for that kind of work, as you can also spawn them in parallel without producing conflicts.&lt;br&gt;
For example, we have an &lt;code&gt;ops-investigation&lt;/code&gt; skill that, given your prompt about a specific production issue, spawns subagents to check logs and metrics in Grafana and to search through Sentry issues and recent deploys in parallel.&lt;br&gt;
Since our infrastructure code lives in the same monorepo as the application code, the subagents can also grep through the codebase to produce a distilled report of their findings. This is a huge time saver and lets us get to the root cause of issues much faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Write Tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But with well-structured plans, subagents can also be used for write operations; the key theme here is &lt;strong&gt;phased implementation&lt;/strong&gt;.&lt;br&gt;
When the main agent ensures that the phases can be implemented in isolation, with clear boundaries and pre-coordination, subagents can be trusted to implement those phases without causing chaos in the main context.&lt;br&gt;
Even better, you can still steer the session: the main agent holds all the context, so you can iterate and give feedback on a subagent's implementation without losing the overall plot.&lt;/p&gt;

&lt;p&gt;The key learning: be deliberate about write operations, but don't avoid them entirely. The boundary between "safe" and "risky" writes is more nuanced than a simple read/write distinction.&lt;/p&gt;

&lt;p&gt;💰 One more thing: since you can control the model of each subagent, it is easy to use cheap models for noisy digging work like log analysis and feed only the distilled insights to the main agent, which can run on a more expensive model. This is a great way to optimize token usage while still getting valuable insights.&lt;/p&gt;
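&lt;p&gt;As a sketch of what this can look like in Claude Code, which defines subagents as markdown files with YAML frontmatter: the &lt;code&gt;model&lt;/code&gt; field pins a cheaper model for the noisy work. The subagent name and its instructions below are our own illustrative invention; check your tool's docs for the exact frontmatter it supports.&lt;/p&gt;

```markdown
---
name: log-digger
description: Digs through logs and recent deploy output for a given incident and reports only the distilled findings. Use for noisy investigation work.
model: haiku
---

Investigate the incident described in the prompt.
Grep the relevant logs and recent deploy output, then reply with a short
report: suspected root cause, supporting evidence, affected services.
Do not paste raw logs back; summarize.
```

&lt;p&gt;The main session then only ever sees the short report, not the thousands of log lines the subagent chewed through.&lt;/p&gt;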
&lt;h3&gt;Side quests: Session Forking and Branching&lt;/h3&gt;

&lt;p&gt;Just as you branch a git repository to experiment with a new feature or fix a bug, you can also branch your agent session. This is especially useful for debugging and investigation tasks that require a lot of context and exploration, while the main session, which should stay clean, keeps going.&lt;br&gt;
There are many ways to do this, and we're seeing more and more tools support this pattern natively, but the core idea is the same: &lt;strong&gt;fork the session to isolate the side-quest work from the main implementation context.&lt;/strong&gt; &lt;a href="https://lucumr.pocoo.org/2026/1/31/pi/" rel="noopener noreferrer"&gt;Pi&lt;/a&gt; in particular models sessions as a tree, which gives you fine-grained control over branching.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fth7w82oieq9nj87q6n29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fth7w82oieq9nj87q6n29.png" alt="Diagram showing a main agent session branching into a side quest" width="500" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some techniques we use are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Handoff:&lt;/strong&gt;&lt;br&gt;
Create a handoff file as an artifact with the relevant context, then spawn a new session and load that file into context. Note that you actively compress the context, which can be good or bad depending on the task.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Main Session --writes context--&amp;gt; handoff.md --compressed load--&amp;gt; New Session with handoff knowledge
Main Session --&amp;gt; clean, proceed with next task
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fork:&lt;/strong&gt;&lt;br&gt;
Use the fork or branch feature of your coding agent. That way both sessions live and can proceed in parallel.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Main Session --&amp;gt; Fork!
  ├── Main Session (keeps building) — production work goes on
  └── Debug Session (goes exploring) — chaos contained, kill when done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Rewind:&lt;/strong&gt;&lt;br&gt;
Mark your session, dive into the investigation, then rewind to the marked point. Like time travel: the conversation is restored, but the bug stays fixed.&lt;br&gt;
Make sure to keep the code changes. Note that this is not parallelizable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session --&amp;gt; Mark --&amp;gt; Investigate --&amp;gt; Fix bug --&amp;gt; rewind to Session
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Side query:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;code&gt;/btw&lt;/code&gt; or similar commands to ask questions or verify assumptions against the current context without derailing it.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Deep in implementation --&amp;gt; /btw --&amp;gt; Quick answer
                      └── main work continues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's simple context management: once you have built up valuable context, protect it.&lt;/p&gt;

&lt;h3&gt;Planning is non-negotiable, even for small tasks&lt;/h3&gt;

&lt;p&gt;This was already a strong recommendation in Vol. 1, and after six more months it's become our hardest rule: &lt;strong&gt;no plan, no implementation.&lt;/strong&gt; We've seen enough agent sessions derail mid-task to know that skipping the plan is always a false economy — the time you "save" comes back as wasted tokens and broken context.&lt;/p&gt;

&lt;p&gt;For larger work, we follow a &lt;code&gt;research/spec → plan → implement&lt;/code&gt; flow. But even for smaller tasks, a simple &lt;em&gt;grounding&lt;/em&gt; step — spawning an explore subagent to survey the relevant code before touching anything — &lt;strong&gt;makes a huge difference&lt;/strong&gt; in our experience. It forces the agent to build a mental model first, instead of guessing and course-correcting later.&lt;/p&gt;

&lt;p&gt;This also means that any tool that cannot produce or assist in producing a proper plan is not a tool we can use for anything but trivial tasks. Autocomplete-style copilots are great for boilerplate and line-level suggestions, but the moment a task requires understanding across files or making architectural choices, you need a planning step they simply don't offer.&lt;/p&gt;

&lt;p&gt;In the next months, we will map and extend our Software Development Lifecycle with clear phases and checkpoints, and enforce that agents follow that process.&lt;/p&gt;

&lt;h3&gt;Agent-ready codebases&lt;/h3&gt;

&lt;p&gt;This is an evergreen topic: the codebase controls what the agent "sees" when running in your project, so its impact&lt;br&gt;
cannot be overstated. And as your project evolves, an agent-ready codebase must be maintained and cared for like a garden.&lt;/p&gt;

&lt;p&gt;Key recommendations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a monorepo if you can 🤓 While there are other techniques for multi-repo setups (see &lt;a href="https://x.com/dexhorthy/status/2033972371934368125" rel="noopener noreferrer"&gt;Dexter Horthy's post&lt;/a&gt;), a monorepo simplifies setup and context not just for you but also for the agent&lt;/li&gt;
&lt;li&gt;Keep &lt;code&gt;AGENTS.md&lt;/code&gt; clean. Do not let AI slop accumulate in this crucial file. It shouldn't be longer than 100 lines; since it is read on every run, each line must be inspected and must survive the test of time.&lt;/li&gt;
&lt;li&gt;Commands:

&lt;ul&gt;
&lt;li&gt;Fast commands: To get fast feedback loops, you need fast commands. Ensure testing is fast and linting is fast. We optimize our linting and typecheck setup regularly (looking at you, ts-go and oxlint). If you have many distinct packages, output caching with tools like Turborepo or Nx can be a game changer.&lt;/li&gt;
&lt;li&gt;Clear commands: There should be a unified way to run tests, linters, type checkers, and other common tasks. Don't reinvent the wheel for each package and follow conventions that agents understand.&lt;/li&gt;
&lt;li&gt;Non-verbose commands: Ensure a passing test suite is not blurting out thousands of lines of output. Agents need to see the signal, not the noise. If your test runner is too verbose, consider switching or configuring it for cleaner output. See &lt;a href="https://www.humanlayer.dev/blog/context-efficient-backpressure" rel="noopener noreferrer"&gt;Context-Efficient Backpressure for Coding Agents&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Manage your MCPs: Even though Claude has an MCP loading tool that dynamically loads MCPs on demand once you cross a certain initial context threshold, we mostly use skills to connect to external tools and APIs. That way we can tailor them to our needs and control the context better.&lt;/li&gt;

&lt;/ul&gt;
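&lt;p&gt;To make the command recommendations concrete, here is a sketch of root-level scripts in a JS monorepo, assuming Turborepo for output caching; the script names are conventions we like, not a standard:&lt;/p&gt;

```json
{
  "scripts": {
    "test": "turbo run test",
    "lint": "turbo run lint",
    "typecheck": "turbo run typecheck",
    "check": "turbo run lint typecheck test"
  }
}
```

&lt;p&gt;One verb per task, identical from every package, and cached so a clean run is near-instant: exactly the kind of interface an agent can rely on.&lt;/p&gt;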

&lt;p&gt;There are also some readiness models going around, and until we have a more mature one, the Agent Readiness Model by &lt;a href="https://factory.ai/" rel="noopener noreferrer"&gt;Factory&lt;/a&gt; is a nice general overview.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example Criteria&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Functional&lt;/td&gt;
&lt;td&gt;Code runs, but requires manual setup and lacks automated validation. Basic tooling that every repository should have.&lt;/td&gt;
&lt;td&gt;README, linter, type checker, unit tests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Documented&lt;/td&gt;
&lt;td&gt;Basic documentation and process exist. Workflows are written down and some automation is in place.&lt;/td&gt;
&lt;td&gt;AGENTS.md, devcontainer, pre-commit hooks, branch protection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standardized&lt;/td&gt;
&lt;td&gt;Clear processes are defined, documented, and enforced through automation. Development is standardized across the organization.&lt;/td&gt;
&lt;td&gt;Integration tests, secret scanning, distributed tracing, metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optimized&lt;/td&gt;
&lt;td&gt;Fast feedback loops and data-driven improvement. Systems are designed for productivity and measured continuously.&lt;/td&gt;
&lt;td&gt;Fast CI feedback, regular deployment frequency, flaky test detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Autonomous&lt;/td&gt;
&lt;td&gt;Systems are self-improving with sophisticated orchestration. Complex requirements decompose automatically into parallelized execution.&lt;/td&gt;
&lt;td&gt;Self-improving systems&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: &lt;a href="https://docs.factory.ai/web/agent-readiness/overview#the-5-readiness-levels" rel="noopener noreferrer"&gt;Factory - Agent Readiness Levels&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We'd place ourselves somewhere between level 3 and 4 — our processes are standardized and enforced through automation, but we're still working toward systematic flaky test detection.&lt;/p&gt;

&lt;h3&gt;Frontend Validation: Closing the Visual Feedback Loop&lt;/h3&gt;

&lt;p&gt;In Vol. 1, we focused heavily on backend workflows as they're more straightforward to validate with existing tools. Frontend work, however, presents a unique challenge: &lt;strong&gt;How do you validate visual output without a human in the loop?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For a while, agentic frontend coding lagged behind in our team. The agent can write frontend code, but without visual feedback it can't validate that the code works. This creates a cumbersome loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent writes component&lt;/li&gt;
&lt;li&gt;You open browser, check it&lt;/li&gt;
&lt;li&gt;You report back: "The button is misaligned" or paste a screenshot&lt;/li&gt;
&lt;li&gt;Agent adjusts&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This manual validation step becomes a bottleneck and drains the developer's attention, since the developer has to act as a proxy for the agent's eyes.&lt;br&gt;
Luckily, browser-use tools are evolving rapidly, and we've experimented with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://code.claude.com/docs/en/chrome" rel="noopener noreferrer"&gt;Claude + Chrome extension&lt;/a&gt;:&lt;/strong&gt; Super easy to set up and use, but token-hungry. A great starting point to see what is possible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://agent-browser.dev/" rel="noopener noreferrer"&gt;agent browser&lt;/a&gt;:&lt;/strong&gt; Agent-friendly (headless) browser use CLI. Offers structured snapshot and clear output, cuts a lot of noise. Has a nice auth vault in place, too. Our champion at the moment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/SawyerHood/dev-browser" rel="noopener noreferrer"&gt;dev-browser skill&lt;/a&gt;:&lt;/strong&gt; This was the first tool we used on our CI. Less token-hungry than the mcps and easy to extend with own scripts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first time an agent fixed a bug completely on its own by verifying its work was a magical moment ✨.&lt;br&gt;
But as so often with magical AI moments, it quickly became the new normal and we faced new challenges 😅:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;headless environments&lt;/strong&gt;: To fully leverage frontend validation, we want to bring it into our CI pipelines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;token usage&lt;/strong&gt;: Navigating a browser and validating screenshots is token-intensive, so we need to be strategic about when to use it and how to isolate it.
Tools like agent-browser are great as they have baked-in backpressure compared to the playwright-mcp.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;auth and setup scripts&lt;/strong&gt;: You want to give the agent the ability to start from a clean and well-designed state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;missing product knowledge&lt;/strong&gt;: It's very tedious if the agent has to re-"explore" the product through the UI to figure out how to do things. It should of course detect and use the buttons on the page, but it shouldn't have to learn on the fly that it needs to go to the configuration area and create a form template before it can create forms in our software. This must either be provided as knowledge (see next section) or ruled out as a task for the agent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're still refining this, but it's a massive leap forward.&lt;/p&gt;

&lt;h3&gt;Missed potential: Skills and documentation&lt;/h3&gt;

&lt;p&gt;We were admittedly slow adopters of skills, as it was not entirely clear to us how they fit into our existing commands&lt;br&gt;
and tooling. Now we're building more and more skills tailored specifically to our use cases. Some general recommendations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic:

&lt;ul&gt;
&lt;li&gt;Keep the SKILL.md focused and under roughly 200 lines. For everything else, use progressive disclosure with &lt;code&gt;references&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Skills are not just markdown files. Put your agent-friendly or vibe-coded helper scripts there too.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;description&lt;/code&gt; field in the metadata is crucial. It is for the &lt;strong&gt;agent&lt;/strong&gt;, not for humans, to decide when to invoke the skill.&lt;/li&gt;
&lt;li&gt;Speaking of invocation: don't expect agents to call skills autonomously; reference them explicitly when you want them used.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Skill content:

&lt;ul&gt;
&lt;li&gt;Anything you notice you have to explain to the agent repeatedly is a good candidate for a skill.&lt;/li&gt;
&lt;li&gt;Project-specific patterns and processes can be nicely encapsulated in skills. You often find yourself following an implicit or explicit process, e.g. &lt;code&gt;spec -&amp;gt; plan -&amp;gt; implement -&amp;gt; validate -&amp;gt; review&lt;/code&gt;, babysitting PRs, bug-hunting across several systems, etc. Take your time to document these in skills, and maybe later grow them into entire processes!&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
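&lt;p&gt;A minimal sketch of such a skill file, using the common SKILL.md frontmatter convention of &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt;; the content and the referenced file are an invented example, not our production skill:&lt;/p&gt;

```markdown
---
name: ops-investigation
description: Use when the user reports a production issue. Spawns parallel subagents to check logs, metrics, error tracking, and recent deploys, then produces a distilled incident report.
---

1. Clarify the affected service and time window.
2. Spawn parallel subagents: one for logs/metrics, one for the error
   tracker, one for recent deploys.
3. Merge the findings into one report: root-cause hypothesis,
   supporting evidence, suggested next steps.

For query templates, see references/queries.md.
```

&lt;p&gt;Note how the &lt;code&gt;description&lt;/code&gt; tells the agent &lt;em&gt;when&lt;/em&gt; to reach for the skill, while details are pushed into a reference file via progressive disclosure.&lt;/p&gt;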

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffh4nv40q02uo0kvfamv0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffh4nv40q02uo0kvfamv0.png" alt="Agent failing a task due to missing process context" width="500" height="771"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just having process steps documented in an actionable way, where before they were tribal knowledge or buried in onboarding guides, is a huge win.&lt;br&gt;
&lt;em&gt;Note that skills do not replace documentation, though: the intent behind &lt;strong&gt;why&lt;/strong&gt; you do things a certain way is mostly not captured.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In our company, there is still a lot of potential left in encoding our SDLC and product knowledge, as well as equally important non-technical information like company mission, processes, sales, and customer knowledge, in a form that is accessible and usable to agents.&lt;/p&gt;

&lt;h3&gt;Beyond Coding: Prototyping and Idea Validation&lt;/h3&gt;

&lt;p&gt;This one is obvious, especially for vibe coders, but this pattern emerged organically: &lt;strong&gt;Coding agents are excellent for rapid prototyping and exploring ideas.&lt;/strong&gt;&lt;br&gt;
We use coding agents not just to ship features and smash bugs, but to do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;UI/UX prototyping:&lt;/strong&gt; "Build me a quick playground prototype of this dashboard concept" → Full interactive mockup in minutes, awesome for validating design ideas with customers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idea validation:&lt;/strong&gt; "Does this architectural approach even work?" → Agent builds a proof-of-concept or grills your ideas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throwaway exploration:&lt;/strong&gt; Testing libraries, frameworks, or patterns without committing to them. Or even more modern, follow &lt;a href="https://github.com/karpathy/autoresearch" rel="noopener noreferrer"&gt;Karpathy's autoresearch pattern&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One tool that's become a staple for us is the &lt;a href="https://github.com/anthropics/claude-plugins-official/tree/main/plugins/playground" rel="noopener noreferrer"&gt;Claude Code Playground skill&lt;/a&gt;. It's a skill you can install in Claude Code that generates self-contained, single-file HTML playgrounds — complete with visual controls, live preview, and natural-language prompt output. No external dependencies, no build step.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ugbetilavp9clotkqxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ugbetilavp9clotkqxr.png" alt="playground-skill" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The workflow is dead simple: you describe what you want to explore — "Create a playground for button design styles", "Build an interactive color palette explorer" — and the skill generates an interactive HTML page you can open in your browser. Tweak parameters, see results instantly, copy out a prompt or config when you're happy. Need to show a customer something tangible? "Build a clickable prototype of the onboarding flow with variants." In minutes you have something real to point at, not a slide deck.&lt;/p&gt;

&lt;p&gt;That way, the question "is this even worth prototyping?" basically disappears. You can just... do it. In an hour you've tried three approaches, learned which one sucks, and moved on.&lt;/p&gt;

&lt;p&gt;Just make sure &lt;em&gt;you're&lt;/em&gt; the one calling the shots on what that insight means. Use your human brain where it counts: taste, direction, and the decisions that actually matter long-term.&lt;/p&gt;

&lt;h3&gt;Making reviews less painful&lt;/h3&gt;

&lt;p&gt;With code being generated in a fast and iterative way, we suffered, like many others, from the "review bottleneck".&lt;br&gt;
Pumping out more code means you need to review more code. Especially crucial for us is keeping the mental alignment&lt;br&gt;
in the team: it is rarely about individual lines of code but about the overall approach, how it fits into the existing codebase, and which tradeoffs are made.&lt;br&gt;
We did not give in to the temptation to just ship agent code without review, like some do; instead we looked into how to make reviews more efficient and less painful.&lt;/p&gt;

&lt;p&gt;What we've done so far:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strict CI: Same strict checks for every PR, no exceptions. The first quality gate is the CI, not the reviewer. If it doesn't meet the bar, it doesn't even reach human eyes.&lt;/li&gt;
&lt;li&gt;Agents for local self-review: Before you even submit a PR, ask an agent (in a fresh session, ideally with another capable model) to review the code. This catches many low-hanging issues and improves the quality of the initial submission, so your colleagues can focus on higher-level feedback.&lt;/li&gt;
&lt;li&gt;Code review on CI: We have a CI job that runs an agent to review the PR diff and provide feedback. This is not meant to replace human review but to catch obvious issues and provide a first pass of feedback. This can be your CodeRabbit or GH Copilot integration for example.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-friendly output&lt;/strong&gt;: Leveraging Nico Bailon's &lt;a href="https://github.com/nicobailon/visual-explainer" rel="noopener noreferrer"&gt;visual-explainer skill&lt;/a&gt;, we can flag certain PRs as needing a human-friendly, visual explanation of the changes. The agent then produces an HTML page with a visual diff and natural-language explanations of the changes, which is much easier and more pleasant to review than raw code diffs, especially to get started.&lt;/li&gt;
&lt;/ul&gt;
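&lt;p&gt;As an illustration of the CI review job, here is one way to wire up a first-pass agent review with GitHub Actions and a headless agent run. This is a sketch under assumptions: the workflow layout is ours, &lt;code&gt;claude -p&lt;/code&gt; stands in for whatever agent CLI you use, and you would still need agent credentials configured as secrets:&lt;/p&gt;

```yaml
name: agent-review
on: pull_request
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history so we can diff against the base branch
      - name: First-pass agent review
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > pr.diff
          # Headless agent run; swap in your agent CLI and auth of choice.
          claude -p "Review the diff in pr.diff. Flag bugs, risky changes, and missing tests. Be concise." > review.md
      - name: Publish review comment
        run: gh pr comment ${{ github.event.number }} --body-file review.md
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

&lt;p&gt;The point is the shape, not the specific tool: produce the diff, let an agent write a first pass, and surface it where humans already review.&lt;/p&gt;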

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje3zyzqy7gxfp8fwn4sn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fje3zyzqy7gxfp8fwn4sn.png" alt="Sample output of the visual explainer skill for code reviews" width="800" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Outlook: Organizational impacts&lt;/h2&gt;

&lt;p&gt;We hope this was a useful peek into our learnings and practices with agentic engineering. As we continue to integrate these tools into our workflows and adjust them, the elephant in the room is how our organizational structures and processes will need to evolve to fully leverage the potential of agentic tools. Some early thoughts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Merging of job roles: With cheap prototypes and faster iteration, classic roles like "product manager", "designer", "developer" might blur as individuals can quickly prototype and validate ideas across disciplines.&lt;/li&gt;
&lt;li&gt;Quality: With faster iteration, how do you ensure quality and maintainability?&lt;/li&gt;
&lt;li&gt;Bottlenecks: Your team might have the same amount of people but they can do much more. Where are the new bottlenecks? How do you identify and address them?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So stay tuned for future posts where we will dive into these topics!&lt;/p&gt;

&lt;h2&gt;Further reading&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://engineering.simpl.de/post/agentic-engineering-lesson1/" rel="noopener noreferrer"&gt;Vol. 1: Context Engineering&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/news/extended-thinking" rel="noopener noreferrer"&gt;Anthropic: Extended Thinking&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://danielgriesser.com/posts/manage-the-context-window/" rel="noopener noreferrer"&gt;Daniel Griesser: Manage the Context Window&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" rel="noopener noreferrer"&gt;Drew Breunig: How Contexts Fail&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents" rel="noopener noreferrer"&gt;HumanLayer: Skill Issue - Harness Engineering for Coding Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lucumr.pocoo.org/2026/1/31/pi/" rel="noopener noreferrer"&gt;Pi: The Minimal Agent Within OpenClaw&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>contextengineering</category>
      <category>agentic</category>
      <category>llm</category>
      <category>claude</category>
    </item>
    <item>
      <title>Agentic Engineering: Lessons Learned Vol. 1</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Mon, 29 Sep 2025 06:00:00 +0000</pubDate>
      <link>https://dev.to/duske/agentic-engineering-lessons-learned-vol-1-jbj</link>
      <guid>https://dev.to/duske/agentic-engineering-lessons-learned-vol-1-jbj</guid>
      <description>&lt;p&gt;The buzz around agentic engineering is deafening, and for good reason: it promises to be a massive lever for software development. But as we've integrated these agents into our workflow, we've discovered that harnessing their power isn't as simple as writing a good prompt. True success comes from mastering a deeper, more dynamic skill: context engineering.&lt;/p&gt;

&lt;p&gt;This post is a dispatch from the front lines. We're cutting through the noise to share our lessons on managing an agent's context. We'll cover what worked, what failed, and provide practical strategies you can use today, drawn from our experience with tools like Claude Code.&lt;br&gt;
Note that many of these strategies are not new and have been discussed in various forms by experts (&lt;a href="https://www.dbreunig.com/" rel="noopener noreferrer"&gt;1&lt;/a&gt;, &lt;a href="https://rlancemartin.github.io/" rel="noopener noreferrer"&gt;2&lt;/a&gt;, &lt;a href="https://steipete.me/" rel="noopener noreferrer"&gt;3&lt;/a&gt;, &lt;a href="https://lucumr.pocoo.org/" rel="noopener noreferrer"&gt;4&lt;/a&gt;, &lt;a href="https://mariozechner.at/" rel="noopener noreferrer"&gt;5&lt;/a&gt;) in the field. However, we've found that consolidating these insights into actionable lessons has been invaluable for our team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; This space is moving &lt;strong&gt;fast&lt;/strong&gt;. Take these recommendations with a grain of salt and always validate them for your own use case. Our findings are as of September 2025.&lt;/p&gt;
&lt;h2&gt;Context Engineering 101&lt;/h2&gt;


&lt;p&gt;ℹ️ We assume that you are already familiar with the basics of LLMs, prompt engineering, and agentic engineering. If not, check out the resources section at the end of this post.&lt;/p&gt;


&lt;p&gt;When thinking about agentic engineering, context engineering is one of the most important aspects to get right.&lt;/p&gt;
&lt;h3&gt;
  
  
  Mental model
&lt;/h3&gt;

&lt;p&gt;Since LLMs are stateless, the context is the only way to provide them with the necessary information to perform a task.&lt;br&gt;
When working on a codebase, you can think of this model as a function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Instructions&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nx"&gt;Knowledge&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nx"&gt;Tools&lt;/span&gt;
&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So if the context is your entire input to the model, it is crucial to get it right. While in the early days of LLMs prompt engineering was the main focus, context engineering encompasses more than just the prompt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instructions: besides the prompt or even an entire spec, this includes examples (few-shot), constraints, and other memories like agents.md/claude.md&lt;/li&gt;
&lt;li&gt;Knowledge: documentation, facts, and memories&lt;/li&gt;
&lt;li&gt;Tools: regular tool calls like grep, read file, and write file, but also MCP servers, subagents, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that there is no clear boundary between these categories. For example, a spec could be seen as instructions or knowledge, while documentation could be read with an MCP server like context7. The important part is to understand that the context is more than just the prompt and that all of these aspects need to be considered when working with agents.&lt;/p&gt;

&lt;p&gt;So if so many things can be context, how do you get it right? Luckily, smart people have already figured out the&lt;br&gt;
common issues you will face. We can highly recommend &lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" rel="noopener noreferrer"&gt;this article by Drew Breunig&lt;/a&gt; for more details, but let's get right into the main points:&lt;/p&gt;
&lt;h3&gt;
  
  
  Common pitfalls with long contexts
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;as coined by &lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" rel="noopener noreferrer"&gt;Drew Breunig&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context Poisoning&lt;/strong&gt;: When an error or hallucination enters the context and gets repeatedly referenced, corrupting subsequent responses. For example, if your code assistant hallucinates a non-existent API method early in a debugging session, it may keep trying to use that fictional method throughout the conversation, building increasingly nonsensical solutions around it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Distraction&lt;/strong&gt;: As context grows beyond certain thresholds, models start over-relying on their accumulated history rather than their training. A coding assistant might fixate on repeating past debugging attempts from its context instead of synthesizing new approaches, even when those old strategies clearly aren't working.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Confusion&lt;/strong&gt;: Irrelevant information in the context interferes with response quality. Loading a coding assistant with 40+ tool definitions when you only need 3-4 causes the model to make inappropriate tool calls or get distracted by unrelated capabilities, like trying to use a database migration tool when you asked for string manipulation (Tool loadout).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Clash&lt;/strong&gt;: When information gathered incrementally contains contradictions, models struggle to recover. If your coding assistant makes incorrect assumptions about your codebase architecture early on, those wrong assumptions remain in context and influence later responses—even after you provide corrections. The model gets "lost" and cannot recover from early missteps.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Crucial skill:&lt;/strong&gt;&lt;br&gt;
Think back to your first interactions with an LLM: just a few messages back and forth. Agents now take ever more turns to fulfill a task, so the context necessarily grows and accumulates information over time. Keeping this context relevant and focused is a crucial skill to master.&lt;/p&gt;
&lt;h3&gt;
  
  
  Context window management strategies
&lt;/h3&gt;

&lt;p&gt;So how do you manage the context window effectively?&lt;br&gt;
Lance Martin wrote an exceptional article about &lt;a href="https://rlancemartin.github.io/2025/06/23/context_engineering/" rel="noopener noreferrer"&gt;context engineering strategies&lt;/a&gt; and describes four main techniques:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✏️ Writing context means saving it outside the context window to help an agent perform a task.&lt;/li&gt;
&lt;li&gt;🔎 Selecting context means pulling it into the context window to help an agent perform a task.&lt;/li&gt;
&lt;li&gt;🗜️ Compressing context involves retaining only the tokens required to perform a task.&lt;/li&gt;
&lt;li&gt;✂️ Isolating context involves splitting it up to help an agent perform a task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each strategy is categorized by an emoji that we will use throughout this post to highlight which strategy we think is employed.&lt;/p&gt;
&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;p&gt;Equipped with that knowledge, let's dive into the lessons learned from our journey with agentic engineering:&lt;/p&gt;
&lt;h3&gt;
  
  
  🔎 Don't set yourself up for failure
&lt;/h3&gt;

&lt;p&gt;If your initial setup is already bad, you will have a hard time achieving good results with coding agents. This often happens when it is not tuned to your specific environment or tries to do too many things at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provide a &lt;strong&gt;focused&lt;/strong&gt; &lt;code&gt;Claude.md/agents.md&lt;/code&gt; file. It should be kept compact and relevant for all use cases; task-specific instructions should instead go into the prompt/spec, a &lt;code&gt;/&amp;lt;command&amp;gt;&lt;/code&gt;, or a subagent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain&lt;/strong&gt; your &lt;code&gt;Claude.md/agents.md&lt;/code&gt; file. Outdated or conflicting information will poison the context. In our case, we forgot to update how we ran tests after a migration, which left the agent rediscovering how to actually run tests every session, wasting tokens.&lt;/li&gt;
&lt;li&gt;Only keep MCP servers that you &lt;strong&gt;really&lt;/strong&gt; need. Conflicting tool information or overlapping MCP definitions will lead to context confusion and clash. The setup should be &lt;strong&gt;versioned&lt;/strong&gt; and reviewed by the team. Use &lt;code&gt;/context&lt;/code&gt; to identify what is actually used.&lt;/li&gt;
&lt;/ul&gt;
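&lt;p&gt;For illustration, a focused file might be as small as this (the commands and conventions shown here are hypothetical, not our actual setup):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# agents.md — compact, relevant for every task
## Commands
- Run tests: npm test        (keep this current after migrations!)
- Lint:      npm run lint
## Conventions
- TypeScript strict mode; follow the existing module layout
# Task-specific instructions go into the prompt/spec, /commands, or subagents.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;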

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jta4bh8ll24lc0lamx7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jta4bh8ll24lc0lamx7.png" alt=" " width="631" height="395"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  🔎 Codebase is context too (or make your codebase agent friendly)
&lt;/h3&gt;

&lt;p&gt;While you can put a lot of work into optimizing your prompts and context files like documentation, it will only get you so far if you do not make the codebase itself agent friendly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code architecture/patterns:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not only the prompt/spec is context; the codebase itself is context too. Since the model cannot "guess" your codebase, it will scan and crawl through it to find files, analyze coding patterns, and understand the architecture. Thus, it is crucial to have a clean and consistent codebase to get good results.&lt;br&gt;
If the codebase has lots of inconsistencies, bad patterns, or is just plain messy, the agent will struggle to understand it and will even use it as a reference for generating and aligning new code.&lt;br&gt;
Then you have the good old &lt;code&gt;garbage in -&amp;gt; garbage out&lt;/code&gt; problem: since the LLM only predicts the next most likely token, if the code is bad, the output will likely be bad too.&lt;/p&gt;

&lt;p&gt;To overcome this, you need to do the stuff most of us should do anyway: refactor, clean up, and improve the codebase. For example, by keeping a uniform structure, implementing well-known patterns, and keeping complex abstractions at bay. Ironically, by making the codebase agent friendly, your fellow humans will also benefit from it 🤡.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent friendly tooling:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can also make your codebase more agent friendly by using well-known tools and precise scripts that do not stress your context window as much. For example, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;centralize your logging&lt;/strong&gt; into a single place with service prefixes. E.g. route your backend, DB and &lt;a href="https://github.com/mitsuhiko/vite-console-forward-plugin" rel="noopener noreferrer"&gt;frontend logs&lt;/a&gt; to one central log file. Now the Agent can &lt;code&gt;grep/tail&lt;/code&gt; for the service and find all relevant logs in one place every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix noisy tools&lt;/strong&gt; like test runners or linters that might produce a lot of output with verbose data.&lt;/li&gt;
&lt;li&gt;Create &lt;strong&gt;helper scripts&lt;/strong&gt; that do the heavy lifting for the agent. If you have existing scripts, ensure they are easy to use and that their configuration is documented. This could be API generators, build scripts, or even tools that fetch the &lt;a href="https://github.com/automazeio/ccpm/blob/d01e80af9b52582058a76671e8f3b4a2448cc050/.claude/scripts/pm/next.sh#L21" rel="noopener noreferrer"&gt;next task like this example&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
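&lt;p&gt;A minimal sketch of the centralized-logging idea (the function and service names here are purely illustrative): prefix each service's lines and merge them into one stream destined for a single central file:&lt;/p&gt;

```python
def combine_logs(service_lines):
    """Merge per-service log lines into one stream with service
    prefixes, so an agent can grep/tail a single central file.
    service_lines maps a service name to its list of log lines."""
    merged = []
    for service, lines in service_lines.items():
        for line in lines:
            merged.append(f"[{service}] {line}")
    return merged
```

&lt;p&gt;Writing the merged lines to one file then lets the agent search for a single prefix like &lt;code&gt;[backend]&lt;/code&gt; instead of hunting through several locations.&lt;/p&gt;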
&lt;h3&gt;
  
  
  🔎/🗜️ Watch the context window size
&lt;/h3&gt;

&lt;p&gt;As a general rule of thumb, keep the context window usage below 60%. This ensures that the model has enough "space" to reason and generate output. Above that, you often start to see issues like context distraction and confusion.&lt;br&gt;
When starting a task that will likely exceed the context window, plan ahead and clear or summarize the context deliberately. This can be done by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;clearing the context with &lt;code&gt;/clear&lt;/code&gt; or starting a new session. Ensure you have a working todo list or &lt;code&gt;todo.md&lt;/code&gt; file to keep track of what still needs to be done.&lt;/li&gt;
&lt;li&gt;summarizing the context with &lt;code&gt;/compact&lt;/code&gt; and providing the summary back to the agent. ⚠️ Be careful with this, as the summary might miss important details; so far, we found this to be less effective than expected.&lt;/li&gt;
&lt;li&gt;using the &lt;code&gt;/context&lt;/code&gt; command, which gives you a good overview of current context window usage and what is taking up the most space.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  🔎 Paradigm shift: Read the spec!
&lt;/h3&gt;

&lt;p&gt;For most non-trivial tasks, it really pays off to spend some time creating and evaluating an implementation plan together with the agent (duh!). This helps to build a common understanding of the task and also keeps the context focused.&lt;br&gt;
While you don't have to go all-in on spec-driven development, simply leveraging Claude's plan mode or generating a markdown file with a spec and task breakdown can often be sufficient for smaller tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8owg8fpdk58y9yu5yqbq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8owg8fpdk58y9yu5yqbq.png" alt=" " width="500" height="680"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But if you choose to go down that path, make sure to actually read the spec &lt;strong&gt;thoroughly&lt;/strong&gt;. You might feel amazed by the agent's well-structured output, but it might not align with your expectations or simply contradict the codebase at hand.&lt;br&gt;
This can lead to context poisoning and distraction, as the agent will keep referencing the spec and trying to align with it.&lt;br&gt;
It may sound trivial, but being hyped often leads to skimming through the spec and missing important details.&lt;/p&gt;

&lt;p&gt;Behind this lies a bigger paradigm shift that is approaching fast: the role of the developer is shifting from spending big chunks of time writing code to spending more time on planning, specifying, and validating. For many use cases, the actual coding part is becoming less important, as the agent can take over a lot of that work.&lt;br&gt;
Still, &lt;strong&gt;feeling productive while writing less code&lt;/strong&gt; and reading through generated specs is a skill that needs to be learned and practiced.&lt;/p&gt;
&lt;h3&gt;
  
  
  ✏️ Outsource context via filesystem
&lt;/h3&gt;

&lt;p&gt;One nice way of outsourcing context is to leverage the filesystem. Since the filesystem is not limited by the context window, you can use it to store large amounts of data that the agent can reference when needed.&lt;br&gt;
By providing tools like bash scripts or subagents (see below) that write to disk but return a summary/crucial info, you keep the context window small and focused while giving the agent the choice to access more information when needed.&lt;/p&gt;
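&lt;p&gt;As a sketch of this pattern (the names are hypothetical): a tool wrapper that persists the full output to disk and returns only a compact digest to the agent:&lt;/p&gt;

```python
import tempfile

def run_and_summarize(tool_name, full_output, max_lines=5):
    """Write a tool's full (potentially huge) output to disk and return
    only a short summary plus the file path, so the main context stays
    small while the agent can still read the details on demand."""
    log = tempfile.NamedTemporaryFile(
        mode="w", prefix=f"{tool_name}-", suffix=".log", delete=False
    )
    log.write(full_output)
    log.close()
    lines = full_output.splitlines()
    return {
        "summary": "\n".join(lines[:max_lines]),  # first few lines only
        "total_lines": len(lines),
        "full_log": log.name,  # the agent can cat/grep this when needed
    }
```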
&lt;h3&gt;
  
  
  ✏️ Directives: Hansel and Gretel
&lt;/h3&gt;

&lt;p&gt;Instead of writing elaborate prompts and precisely embedding examples or crucial files for the task, you can also use comment directives as breadcrumbs 🍞 to guide the agent.&lt;br&gt;
Giuseppe Gurgone wrote a great article about &lt;a href="https://giuseppegurgone.com/comment-directives-claude-code" rel="noopener noreferrer"&gt;comment directives&lt;/a&gt; and how to use them effectively.&lt;br&gt;
Basically, you scatter certain types of comments across your codebase with some extra information on top. Then, when firing up the agent,&lt;br&gt;
you instruct it to look for those comments and use them as context. This way, you can provide a lot of relevant information very locally without bloating the context window upfront.&lt;/p&gt;

&lt;p&gt;For example, when you want to finish a PR and know that you want to address certain aspects in a follow-up PR, you can add a comment directive like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* @Implement
 * Replace with generated api-client
 * timeouts should be configurable
 * no retries
 */
async fetchUserData(userId: string): Promise&amp;lt;UserData&amp;gt; {
  // ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ✂️/🗜️ Subagents: Researchers, not implementers
&lt;/h3&gt;

&lt;p&gt;Subagents are a great way of &lt;strong&gt;isolating context&lt;/strong&gt;, parallelizing tasks, and specializing on certain aspects. But they are not a silver bullet, and anthropomorphizing them into human roles did not work well for us.&lt;br&gt;
The main lesson learned here is that subagents currently work best as researchers, not implementers. They can be used nicely to gather information, explore possibilities, and provide insights that the main agent can then use to make decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faddilll0xa32fuvzvcet.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faddilll0xa32fuvzvcet.gif" alt="subagent" width="400" height="298"&gt;&lt;/a&gt;&lt;/p&gt;
Unsupervised Subagents "fixing" code in parallel



&lt;p&gt;If you use subagents as implementers, you often run into issues like context clash and confusion, as the subagent might take actions that are not &lt;em&gt;aligned with the main agent's goal&lt;/em&gt;. Also, spawning subagents in parallel could lead to conflicting writes that the main agent then needs to resolve.&lt;br&gt;
Instead, by using them for aggregating and collecting data, you can leverage their strengths without running into those issues. Plus, read-only tasks can usually be parallelized without any headaches.&lt;br&gt;
We used subagents for tasks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;finding/traversing codebase for certain patterns&lt;/li&gt;
&lt;li&gt;analyzing logs/code for certain errors&lt;/li&gt;
&lt;li&gt;checking deployment status/issues of platforms like Kubernetes by providing read-only access to &lt;code&gt;kubectl&lt;/code&gt; or cloud CLIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Test runner and build scripts&lt;/strong&gt;&lt;br&gt;
Okay, we lied. There is one special use case for subagents as implementers that we found useful: running tests.&lt;br&gt;
Our tests were quite verbose (see making the codebase agent friendly 🫠) and often produced a lot of output (roughly 40k tokens!).&lt;br&gt;
By using a subagent to run the tests, we could isolate the context and provide only the relevant, summarized output back to the main agent. This way, the main agent could focus on the task at hand without being distracted by the noise of the test output.&lt;br&gt;
The same goes for build scripts like &lt;code&gt;npm build&lt;/code&gt; or &lt;code&gt;docker build&lt;/code&gt;: their output is mostly irrelevant to the main task, yet we sometimes used them to verify that the code still builds, ymmv.&lt;/p&gt;

&lt;p&gt;For example, this could be baked into your subagent definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Execute tests efficiently across all services in the monorepo
2. Interpret test results and provide actionable feedback
3. Identify the most appropriate test commands based on user context
4. Provide clear summaries of test outcomes with specific error details when failures occur
5. You do not modify source code or fix bugs; your role is strictly to run tests and report results. You can suggest next steps but do not implement them.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
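&lt;p&gt;Stripped of the subagent machinery, the underlying trick can be sketched as a small wrapper (our own illustrative code, not our actual setup) that swallows the verbose output and reports only the verdict plus the tail, where test runners usually print their failure summary:&lt;/p&gt;

```python
import subprocess

def run_tests_compact(cmd, tail_lines=20):
    """Run a possibly very verbose test command, keep the full output
    out of the conversation, and report only pass/fail plus the last
    few lines, which usually contain the failure summary."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    lines = (proc.stdout + proc.stderr).splitlines()
    return {"passed": proc.returncode == 0, "tail": lines[-tail_lines:]}
```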






&lt;h2&gt;
  
  
  Outlook: Spec driven development 📝
&lt;/h2&gt;

&lt;p&gt;While the lessons learned so far are quite generic and can be applied to most agentic engineering setups, there is one aspect worth mentioning in more detail: spec driven development.&lt;br&gt;
With tools like &lt;a href="https://kiro.dev/" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt; or &lt;a href="https://github.com/github/spec-kit/tree/main" rel="noopener noreferrer"&gt;Spec Kit&lt;/a&gt; gaining traction quickly, spec driven development is becoming more and more popular.&lt;br&gt;
This is an entire topic in itself, so we will cover it in more detail in the next volume of this series - so stay tuned.&lt;/p&gt;

&lt;p&gt;Context engineering will still apply, as this is just a more formalized, thorough and elaborate way of providing the right context well aligned with your goals.&lt;br&gt;
As with anything in the agentic engineering space, it is still early days and we are just scratching the surface of what is possible. Take it with a grain of salt and enjoy the ride!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html" rel="noopener noreferrer"&gt;https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html" rel="noopener noreferrer"&gt;https://www.dbreunig.com/2025/06/26/how-to-fix-your-context.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://rlancemartin.github.io/2025/06/23/context_engineering/" rel="noopener noreferrer"&gt;https://rlancemartin.github.io/2025/06/23/context_engineering/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://steipete.me/posts/2025/essential-reading" rel="noopener noreferrer"&gt;https://steipete.me/posts/2025/essential-reading&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/" rel="noopener noreferrer"&gt;https://lucumr.pocoo.org/2025/7/30/things-that-didnt-work/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://giuseppegurgone.com/comment-directives-claude-code" rel="noopener noreferrer"&gt;https://giuseppegurgone.com/comment-directives-claude-code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cognition.ai/blog/dont-build-multi-agents" rel="noopener noreferrer"&gt;https://cognition.ai/blog/dont-build-multi-agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-03-own-your-context-window.md" rel="noopener noreferrer"&gt;https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-03-own-your-context-window.md&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nathanonn.com/how-a-read-only-sub-agent-saved-my-context-window-and-fixed-my-wordpress-theme/" rel="noopener noreferrer"&gt;https://www.nathanonn.com/how-a-read-only-sub-agent-saved-my-context-window-and-fixed-my-wordpress-theme/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>agentic</category>
      <category>contextengineering</category>
    </item>
    <item>
      <title>The RAG Autonomy Spectrum: A Guide to Designing Smarter AI Systems</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Wed, 11 Jun 2025 07:09:44 +0000</pubDate>
      <link>https://dev.to/duske/the-rag-autonomy-spectrum-a-guide-to-designing-smarter-ai-systems-5eg2</link>
      <guid>https://dev.to/duske/the-rag-autonomy-spectrum-a-guide-to-designing-smarter-ai-systems-5eg2</guid>
      <description>&lt;p&gt;When building a LLM-powered application, having a good overview of possible cognitive architectures patterns can be a key factor in designing effective systems.&lt;br&gt;
Too quickly you can get caught up in the details or latest AI hype, and lose sight of the bigger picture.&lt;br&gt;
Which parts shall be LLM-powered? What parts should be fixed to ensure reproducibility and reliability?&lt;br&gt;
So today we will explore some of the most common cognitive architectures patterns and how they can be applied. As the application at hand can vary tremendously in terms of size, complexity and requirements, we will focus on implementing simple &lt;strong&gt;RAG&lt;/strong&gt; (Retrieval Augmented Generation) systems as a use case to illustrate the concepts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozoqx9634ovgqt1k0eck.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozoqx9634ovgqt1k0eck.jpg" alt="llm meme" width="500" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Building Blocks of AI: Exploring Cognitive Architectural Patterns
&lt;/h2&gt;

&lt;p&gt;So what do we mean by cognitive architecture patterns? We borrow this term from a thought-provoking &lt;a href="https://blog.langchain.dev/what-is-a-cognitive-architecture/" rel="noopener noreferrer"&gt;post by Harrison Chase (LangChain)&lt;/a&gt;, in which he classifies AI architectures by their level of autonomy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4tyo5sj504kzt8rsa23.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4tyo5sj504kzt8rsa23.png" alt="autonomy levels" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 1: Cognitive Architectural Patterns by Harrison Chase &lt;a href="https://blog.langchain.dev/what-is-a-cognitive-architecture/" rel="noopener noreferrer"&gt; [Source] &lt;/a&gt;



&lt;p&gt;Let's go through them quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 1: Code&lt;/strong&gt;&lt;br&gt;
Every step and call is hard-coded. This is classic code without any LLM involvement.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 2: LLM Call&lt;/strong&gt;&lt;br&gt;
The first level that includes an LLM call - for example, translating a selected text. The developer still defines when this single step is invoked, e.g. receiving and sanitizing the text (code), translating it using a model (LLM), and post-processing and returning the response (code).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;3. Level: Chain&lt;/strong&gt;&lt;br&gt;
Instead of using only a single LLM-powered step, you leverage multiple LLM-calls in a defined order to make your application more powerful. For example, you could invoke a model a second time to summarize the content, so that your user gets a brief news feed in the target language.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 4: Router&lt;/strong&gt;&lt;br&gt;
Now we're leaving the realm of applications whose steps are defined a priori by the developer. Previously, we called an LLM within a step to produce a result; here it acts as a router that decides which step to invoke next based on the input and context. This increased flexibility allows for more dynamic and adaptive applications, but also more unpredictable results. Note that we do not introduce cycles here, so it still represents a directed acyclic graph (DAG). Imagine a web crawler that scans company websites and extracts relevant information, then uses a router to grade each company and decide whether or not to add it to a list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 5: State Machine&lt;/strong&gt;&lt;br&gt;
We are now entering the field of agents by adding cycles to the DAG and turning it into a state machine. This allows for even more adaptive applications, enabling the AI to refine its actions and repeat steps until a certain outcome is achieved (please set an iteration/recursion limit 👀). For instance, an agentic web crawler could simply be given an instruction about which kinds of companies are relevant to the user's interests. The crawler would then iterate through the websites, extracting relevant information, grading it, and deciding whether to add each company to the list. When the match quality is below a certain threshold, the crawler could refine the given instruction and try again until it meets the desired outcome. Despite all that variability, the developer still controls which steps can be taken at any time, thus keeping the rough game plan in their hands.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 6: Autonomous Agents&lt;/strong&gt;&lt;br&gt;
The agent is now also in control of which tools/steps are available. The system is just given an initial instruction and a set of tools. It can then decide which steps to take or tools to call. It could also refine prompts or explore and add new tools to its arsenal. While this is the most powerful level, it is also the most unpredictable and requires careful monitoring and control.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
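&lt;p&gt;The router level (level 4) can be sketched in a few lines (&lt;code&gt;llm_pick&lt;/code&gt; and the handler names are hypothetical): the LLM chooses the branch, but the developer still defines every available branch:&lt;/p&gt;

```python
def run_router(item, llm_pick, handlers):
    """Level 4 sketch: a single LLM decision picks the next step from a
    fixed, developer-defined set of handlers (still a DAG, no cycles).
    llm_pick is an assumed function returning one of the handler names."""
    choice = llm_pick(item, options=sorted(handlers))
    # Fall back to a safe default if the model returns an unknown name.
    handler = handlers.get(choice, handlers["discard"])
    return handler(item)
```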

&lt;p&gt;ℹ️ Please note that there are more ways of classifying LLM-powered agents by their autonomy levels. The &lt;a href="https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents" rel="noopener noreferrer"&gt;smolagents&lt;/a&gt; library starts with level 2/3 as base level and &lt;strong&gt;is more granular in the agentic realm&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG: Grounding AI in Reality
&lt;/h2&gt;

&lt;p&gt;Now that we have established possible levels of autonomy, let's see how we can apply them to one of the most common use cases: Retrieval Augmented Generation (RAG).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A quick primer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditionally, large language models (LLMs) have been limited by their reliance on outdated knowledge, susceptibility to hallucination, and inability to access private or real-time data. These limitations have hindered their ability to provide accurate and context-rich responses. To address these challenges, RAG was developed. By using some kind of knowledge base and binding a retrieval mechanism to the LLM, we can achieve factual grounding, specialize on specific domains, provide recent information as well as citations/sources and control what data can be accessed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is this still relevant?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Do we even need RAG today? Aren't LLMs' context windows ever increasing, and aren't models getting better at understanding context?&lt;br&gt;
While this development is real, there are still striking points and use cases that make RAG a suitable choice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;if your data is mostly static and you need a needle-in-haystack search&lt;/li&gt;
&lt;li&gt;accuracy: LLM can struggle with large contexts, especially when &lt;a href="https://arxiv.org/pdf/2307.03172.pdf" rel="noopener noreferrer"&gt;the data is "lost" in the middle&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;cost: more tokens mean higher latency and higher cost per call&lt;/li&gt;
&lt;li&gt;volume: deal with thousands of documents&lt;/li&gt;
&lt;li&gt;no comprehensive understanding of a full document is required (e.g. code, summarization, analysis)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A key thing to remember is that the retrieval part in RAG does not have to mean vector embedding search. You can (and often should) retrieve data in various ways,&lt;br&gt;
for example with a keyword-based search or a hybrid approach.&lt;br&gt;
For the sake of brevity, we skip a deep dive into RAG techniques here, as this is such a broad topic that we might cover it in a future post.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG autonomy evolution
&lt;/h2&gt;

&lt;p&gt;Now, with our cognitive architecture patterns in hand, we can nicely dissect common RAG techniques and rank them by their autonomy level.&lt;br&gt;
This should give you a practical understanding of how such levels can be applied in the real world.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Legend:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;span&gt; &lt;/span&gt; LLM call: blue&lt;/li&gt;
&lt;li&gt;
&lt;span&gt; &lt;/span&gt; Router decision: red&lt;/li&gt;
&lt;li&gt;
&lt;span&gt; &lt;/span&gt; Query: yellow&lt;/li&gt;
&lt;li&gt;
&lt;span&gt; &lt;/span&gt; Response: green&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Level 1: Classic Search
&lt;/h3&gt;

&lt;p&gt;While this is not RAG, it serves as the common ground, showing how traditional and simple retrieval systems can be designed.&lt;br&gt;
The user sends a query, the system looks for relevant documents in a knowledge base and returns them as the response. This is the pure "retrieval" step, with no LLM involved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotf815hsspzp2e49h5lc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotf815hsspzp2e49h5lc.png" alt="Classic Search" width="446" height="510"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 2: Classic Search



&lt;h3&gt;
  
  
  Level 2: Classic RAG
&lt;/h3&gt;

&lt;p&gt;This is the classic RAG pattern, where the system retrieves relevant documents, augments the context with them, and then generates a response using an LLM.&lt;br&gt;
As we are on level 2, we only incorporate a single LLM call (blue box), in this case to generate the output. All the other steps are known ahead of time, making it a linear process that is easy to grasp. In many cases, the knowledge base is a vector database, but it can also be a keyword-based search or any other retrieval mechanism.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tlgl5840tsl991i0jh8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tlgl5840tsl991i0jh8.png" alt="Classic RAG" width="800" height="491"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 3: Classic RAG
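
&lt;p&gt;The linear retrieve-augment-generate flow can be sketched in a few lines; &lt;code&gt;retrieve&lt;/code&gt; and &lt;code&gt;llm_complete&lt;/code&gt; are hypothetical stand-ins for your knowledge base and model client:&lt;/p&gt;

```python
def classic_rag(query, retrieve, llm_complete, top_k=4):
    """Level 2: a single LLM call; every step is known ahead of time."""
    # 1. Retrieve: fetch candidate documents for the query
    documents = retrieve(query, top_k=top_k)
    # 2. Augment: place the documents into the prompt context
    context = "\n\n".join(documents)
    prompt = ("Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    # 3. Generate: one LLM call produces the response
    return llm_complete(prompt)

# Usage with toy stand-ins:
answer = classic_rag(
    "What is RAG?",
    retrieve=lambda q, top_k: ["RAG = retrieval-augmented generation."],
    llm_complete=lambda prompt: f"(model answer based on {len(prompt)} prompt chars)",
)
```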



&lt;p&gt;Querying multiple knowledge bases in parallel is also still considered a level 2 RAG technique, as we do not introduce any additional LLM calls to improve the retrieval process. The LLM is only used to generate the final response based on the retrieved documents. This is shown in the figure below, where we retrieve documents from two different knowledge bases and then generate a response based on the combined context (e.g. by merging the result lists with reciprocal rank fusion (RRF)).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy0zi39v8xwonx9nmvy7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy0zi39v8xwonx9nmvy7.png" alt="RAG with multi-query" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 4: RAG with multi-query and RRF
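
&lt;p&gt;Reciprocal rank fusion itself is only a few lines: each document earns 1 / (k + rank) per result list it appears in, and the summed scores decide the merged order (k = 60 is the constant commonly used in practice):&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document scores 1 / (k + rank), summed
    over every list it appears in (rank is 1-based)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Results from two knowledge bases queried in parallel:
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # doc_b first: it appears near the top of both lists
```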



&lt;h3&gt;
  
  
  Level 3: Chained RAG
&lt;/h3&gt;

&lt;p&gt;Here we introduce multiple LLM calls (blue boxes) to improve the system's capabilities. Many RAG implementations work this way, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://arxiv.org/abs/2305.14283" rel="noopener noreferrer"&gt;Rewrite-Retrieve-Read (RRR)&lt;/a&gt;: The initial query is rewritten to improve its quality to hopefully retrieve relevant documents (Figure 5).
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwiq53v7ef1wp0q8ioo3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwiq53v7ef1wp0q8ioo3.png" alt="Rewrite-Retrieve-Read RAG" width="800" height="361"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 5: Rewrite-Retrieve-Read RAG
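
&lt;p&gt;A sketch of the Rewrite-Retrieve-Read chain, with &lt;code&gt;llm_complete&lt;/code&gt; and &lt;code&gt;retrieve&lt;/code&gt; as hypothetical stand-ins - note the two LLM calls:&lt;/p&gt;

```python
def rewrite_retrieve_read(query, llm_complete, retrieve):
    """Level 3 chain with two LLM calls: rewrite, then retrieve, then read."""
    # LLM call 1: turn the raw user query into a better search query
    rewritten = llm_complete(f"Rewrite this as a concise search query: {query}")
    documents = retrieve(rewritten)
    # LLM call 2: read the documents and answer the original question
    context = "\n".join(documents)
    return llm_complete(f"Context:\n{context}\n\nQuestion: {query}")

# Usage with scripted stand-ins:
answer = rewrite_retrieve_read(
    "uh, how do llms stay grounded again?",
    llm_complete=lambda prompt: ("llm grounding techniques"
                                 if prompt.startswith("Rewrite") else "final answer"),
    retrieve=lambda q: [f"doc about {q}"],
)
```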





&lt;ul&gt;
&lt;li&gt;Rerank RAG: After retrieving documents, we can rerank them based on their relevance to the query. This can be done by using a second LLM call to score the documents or by using a separate ranking model (Figure 6).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzg0dyabb9scxdepboyzx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzg0dyabb9scxdepboyzx.png" alt="Rerank RAG" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 6: Rerank RAG
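
&lt;p&gt;The rerank step boils down to sorting by a relevance score; &lt;code&gt;score_fn&lt;/code&gt; is a stand-in for a second LLM call or a dedicated ranking model (the word-overlap score below is purely illustrative):&lt;/p&gt;

```python
def rerank(query, documents, score_fn, top_k=3):
    """Order retrieved documents by relevance; score_fn stands in for a
    second LLM call or a dedicated ranking model."""
    ranked = sorted(documents, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:top_k]

# Toy score: number of words the query and the document share
def overlap_score(query, doc):
    return len(set(query.lower().split()).intersection(doc.lower().split()))

docs = ["blue whales are mammals", "rust borrow checker", "whales sing songs"]
print(rerank("do whales sing", docs, overlap_score, top_k=2))
# → ['whales sing songs', 'blue whales are mammals']
```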



&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://aclanthology.org/2023.acl-long.99/" rel="noopener noreferrer"&gt;Hypothetical Document Embeddings&lt;/a&gt; (HyDE) : This technique generates hypothetical document embeddings based on the query and then retrieves documents that are similar to these embeddings. This can be used to improve the retrieval quality by generating embeddings that are more relevant to the query.&lt;/li&gt;
&lt;/ul&gt;
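
&lt;p&gt;HyDE as a sketch - the key twist is that the embedding fed into the vector search comes from a generated document, not from the raw query (&lt;code&gt;llm_complete&lt;/code&gt;, &lt;code&gt;embed&lt;/code&gt; and &lt;code&gt;vector_search&lt;/code&gt; are hypothetical stand-ins):&lt;/p&gt;

```python
def hyde_retrieve(query, llm_complete, embed, vector_search):
    """HyDE: generate a hypothetical answer document, embed that,
    and search with its embedding instead of the raw query's."""
    hypothetical_doc = llm_complete(f"Write a short passage answering: {query}")
    vector = embed(hypothetical_doc)
    return vector_search(vector)

# Toy stand-ins:
result = hyde_retrieve(
    "What is HyDE?",
    llm_complete=lambda prompt: "A hypothetical passage about HyDE.",
    embed=lambda text: [float(len(text))],
    vector_search=lambda vector: ["real retrieved doc"],
)
```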

&lt;p&gt;Of course, nobody stops you from combining the techniques above: Rewrite the query, retrieve documents from multiple knowledge bases, and then rerank them before generating the final response. From an architectural perspective, this is still a linear process, as you know every step and when it will be run a priori.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 4: RAG with Routers
&lt;/h3&gt;

&lt;p&gt;On level 4, LLMs take over parts of the control flow and decide which step to take next based on the input and context. This allows for more dynamic and adaptive RAG systems, where the LLM can choose to take additional steps to improve retrieval results or decide whether to rerank documents.&lt;/p&gt;

&lt;p&gt;In the example below (Figure 7), the &lt;a href="https://arxiv.org/pdf/2401.15884" rel="noopener noreferrer"&gt;corrective RAG (CRAG) pattern&lt;/a&gt; is implemented. After retrieving documents, the LLM grades the documents with a score. If the documents fall below a certain threshold, a corrective step is taken by invoking a web search to find more relevant documents. This is the first time we see an LLM-powered router in action: it decides whether to take the corrective step based on the retrieved documents' quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felb4zuv25nv4fyjhl6it.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Felb4zuv25nv4fyjhl6it.png" alt="Corrective RAG" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 7: Corrective RAG
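
&lt;p&gt;The corrective router can be sketched as a simple threshold check over LLM-assigned grades (all callables are hypothetical stand-ins; a real CRAG implementation also refines the retrieved documents):&lt;/p&gt;

```python
def corrective_rag(query, retrieve, grade, web_search, llm_complete, threshold=0.7):
    """CRAG-style router: grade the retrieved documents and fall back
    to a web search when even the best grade is below the threshold."""
    documents = retrieve(query)
    grades = [grade(query, doc) for doc in documents]
    if not grades or threshold > max(grades):
        # Corrective branch: the router judged the documents too weak
        documents = web_search(query)
    context = "\n".join(documents)
    return llm_complete(f"Context:\n{context}\n\nQuestion: {query}")

# Stand-ins that force the corrective branch:
answer = corrective_rag(
    "who won the 2030 world cup?",
    retrieve=lambda q: ["stale document"],
    grade=lambda q, doc: 0.1,            # grader finds the docs irrelevant
    web_search=lambda q: ["fresh web result"],
    llm_complete=lambda prompt: prompt,  # echo, so we can inspect the context
)
```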



&lt;p&gt;Note that we do not introduce cycles here, so the flow still forms a directed acyclic graph (DAG). You still know all the possible steps and when they could be invoked, but the LLM decides whether to take them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 5: RAG with State Machines
&lt;/h3&gt;

&lt;p&gt;By adding cycles, these agentic RAG techniques can perform reflexive actions by observing and evaluating the results of previous steps and then deciding whether to take corrective actions. The system can then restart (parts of) the process until a certain outcome is achieved. A rather complex example is &lt;a href="https://arxiv.org/abs/2310.11511" rel="noopener noreferrer"&gt;Self-RAG&lt;/a&gt; (Figure 8), which leverages three grading steps (routers) to check for relevant documents, a grounded response and the usefulness w.r.t. the question.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9cdxhfhxoeoojngy6bd6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9cdxhfhxoeoojngy6bd6.png" alt="Self-RAG" width="800" height="242"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 8: Self-RAG
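
&lt;p&gt;A loose sketch of the cyclic control flow - three graders act as routers, and a failed check restarts part of the process (this simplifies the actual Self-RAG paper considerably; all callables are stand-ins):&lt;/p&gt;

```python
def self_rag(query, retrieve, llm_complete, grade_docs, grade_grounded,
             grade_useful, rewrite, max_rounds=3):
    """Cyclic sketch loosely modeled on Self-RAG: three graders act as
    routers, and a failed check restarts part of the process."""
    for _ in range(max_rounds):
        # Router 1: keep only documents graded as relevant
        documents = [d for d in retrieve(query) if grade_docs(query, d)]
        if not documents:
            query = rewrite(query)   # nothing relevant: transform the query, retry
            continue
        answer = llm_complete(query, documents)
        # Router 2: is the answer grounded in the documents?
        if not grade_grounded(answer, documents):
            continue                 # hallucination suspected: regenerate
        # Router 3: is the answer actually useful for the question?
        if grade_useful(query, answer):
            return answer
        query = rewrite(query)
    return None  # give up after max_rounds

# Stand-ins that pass every check on the first round:
answer = self_rag(
    "what is self-rag?",
    retrieve=lambda q: ["a doc about self-rag"],
    llm_complete=lambda q, docs: "grounded answer",
    grade_docs=lambda q, d: True,
    grade_grounded=lambda a, d: True,
    grade_useful=lambda q, a: True,
    rewrite=lambda q: q,
)
```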



&lt;p&gt;Taking a look at this architecture, we can see how many parts of the process are controlled by the LLM. This allows for a more adaptive system, but the complexity also increases. &lt;br&gt;
Using structured responses and having proper tracing in place are crucial to reason about the system's behavior and to debug it when things go wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Level 6: Autonomous RAG Agents
&lt;/h3&gt;

&lt;p&gt;Looking at the previous example, one might think that a RAG technique on level 6 must be so complex that it cannot fit on the screen. But in fact, the base is quite simple: The LLM is given an instruction and a set of tools (for example retrieval techniques) and can then decide which steps to take.&lt;br&gt;
This means that we do not know ahead of time which steps will be taken, how many times they will be invoked, and in which order. &lt;br&gt;
To fully reach autonomy level 6, the LLM should also be able to refine its instruction and add new tools to its arsenal. One super interesting approach for this is &lt;a href="https://arxiv.org/abs/2402.01030" rel="noopener noreferrer"&gt;CodeAct&lt;/a&gt;, which allows LLMs to write and execute code on the fly. Applied to our use case, it could write a new retrieval technique based on the user's needs and then use it to retrieve relevant documents 🤯.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzyct01pwq7uvghfmwb1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzyct01pwq7uvghfmwb1.png" alt="autonomous rag" width="632" height="571"&gt;&lt;/a&gt;&lt;/p&gt;
Figure 9: Autonomous RAG
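
&lt;p&gt;Stripped to its base, such an agent is a loop: the LLM inspects the history, picks a tool (or finishes), and the observation is appended for the next turn. A minimal sketch with hypothetical stand-ins:&lt;/p&gt;

```python
def rag_agent(task, llm_decide, tools, max_steps=10):
    """Agentic loop: the LLM inspects the history and picks the next
    tool call; neither the order nor the number of calls is fixed."""
    history = [("task", task)]
    for _ in range(max_steps):
        action = llm_decide(history, list(tools))
        if action["tool"] == "finish":
            return action["args"]
        # Execute the chosen tool and feed the observation back
        result = tools[action["tool"]](**action["args"])
        history.append((action["tool"], result))
    return None  # step budget exhausted

# A scripted "LLM" that searches once and then finishes:
def decide(history, tool_names):
    if history[-1][0] == "task":
        return {"tool": "search", "args": {"query": history[-1][1]}}
    return {"tool": "finish", "args": history[-1][1]}

answer = rag_agent("find the release notes", decide,
                   {"search": lambda query: "found: release notes v2"})
```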



&lt;h2&gt;
  
  
  The right tool for the job 🔨
&lt;/h2&gt;

&lt;p&gt;Does this mean that we should always strive for the highest autonomy level? Not necessarily. While higher autonomy levels can lead to more adaptive and powerful systems, they also come with increased complexity, unpredictability, and potential for failure. &lt;br&gt;
Especially when dealing with large amounts of rather static data, a simpler RAG technique might be more suitable. In general, &lt;a href="https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents#-when-to-use-agents---when-to-avoid-them" rel="noopener noreferrer"&gt;it is advised&lt;/a&gt; to prefer deterministic, less autonomous approaches the more you know about the workflow in advance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple Agents?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On the other hand, people building coding agents &lt;a href="https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-for" rel="noopener noreferrer"&gt;report&lt;/a&gt; that an agent equipped with simple retrieval tools can outperform more complex systems that rely on advanced vector embeddings and indices. &lt;a href="https://x.com/jobergum/status/1928355375847248108" rel="noopener noreferrer"&gt;It has also been shown&lt;/a&gt; that for deep research contexts, a simple combination of a keyword search like BM25 and an agent can achieve results on par with complex RAG systems, while having lower inference and storage costs and less complexity. This breaks with the common belief that a large volume of data requires complex vector embeddings for an agentic use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In the evolving landscape of AI, cognitive architecture patterns provide a structured approach to design and compare LLM-powered systems. From simple code to complex autonomous agents, each level of autonomy offers its own advantages and challenges. &lt;br&gt;
While more autonomy brings more complexity, it also opens doors to adaptive and powerful systems that can reason, plan, and execute tasks in ways that were previously unfeasible. As with nearly any topic in software architecture, there is no one-size-fits-all solution. &lt;em&gt;Start with the simplest architecture that meets your needs, scaling autonomy only when tasks require dynamic decision-making.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;An interesting trend is the rise of Agentic RAG, which combines the power of retrieval with the flexibility of agents. Especially when taking into account &lt;a href="https://www.latent.space/p/why-mcp-won" rel="noopener noreferrer"&gt;the rise of&lt;/a&gt; &lt;a href="https://duske.me/posts/mcp/" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt;, new data sources and tools can be added on the fly, allowing agentic systems to adapt to new requirements without complex redesign or reconfiguration. What we are particularly excited about is the potential of simple tools like keyword search in agentic systems, proving that simple tools, wielded wisely, can be remarkably powerful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/smolagents/examples/rag" rel="noopener noreferrer"&gt;smolagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.dev/what-is-a-cognitive-architecture" rel="noopener noreferrer"&gt;https://blog.langchain.dev/what-is-a-cognitive-architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents" rel="noopener noreferrer"&gt;https://huggingface.co/docs/smolagents/conceptual_guides/intro_agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pashpashpash.substack.com/p/understanding-long-documents-with" rel="noopener noreferrer"&gt;https://pashpashpash.substack.com/p/understanding-long-documents-with&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-fo" rel="noopener noreferrer"&gt;https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-fo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://x.com/jobergum/status/1928355375847248108" rel="noopener noreferrer"&gt;https://x.com/jobergum/status/1928355375847248108&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://langchain-ai.github.io/langgraphjs/tutorials/rag/langgraph_self_rag/" rel="noopener noreferrer"&gt;https://langchain-ai.github.io/langgraphjs/tutorials/rag/langgraph_self_rag/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://langchain-ai.github.io/langgraphjs/tutorials/rag/langgraph_crag/" rel="noopener noreferrer"&gt;https://langchain-ai.github.io/langgraphjs/tutorials/rag/langgraph_crag/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/smolagents/v1.17.0/en/conceptual_guides/intro_agents#code-agents" rel="noopener noreferrer"&gt;https://huggingface.co/docs/smolagents/v1.17.0/en/conceptual_guides/intro_agents#code-agents&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Papers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2402.01030" rel="noopener noreferrer"&gt;Executable Code Actions Elicit Better LLM Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2310.11511" rel="noopener noreferrer"&gt;Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2401.15884.pdf" rel="noopener noreferrer"&gt;Corrective Retrieval Augmented Generation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aclanthology.org/2023.acl-long.99/" rel="noopener noreferrer"&gt;Precise Zero-Shot Dense Retrieval without Relevance Labels&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2305.14283" rel="noopener noreferrer"&gt;Query Rewriting for Retrieval-Augmented Large Language Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Originally posted at &lt;a href="https://engineering.simpl.de/post/rag_autonomy/" rel="noopener noreferrer"&gt;SIMPL engineering blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>architecture</category>
      <category>rag</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Model Context Protocol (MCP) - Should I stay or should I go? 🎶</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Mon, 10 Mar 2025 22:12:17 +0000</pubDate>
      <link>https://dev.to/duske/model-context-protocol-mcp-should-i-stay-or-should-i-go-3c9n</link>
      <guid>https://dev.to/duske/model-context-protocol-mcp-should-i-stay-or-should-i-go-3c9n</guid>
      <description>&lt;p&gt;In this article, we'll explore the Model Context Protocol (MCP) briefly and help you decide whether it deserves your attention or can be safely ignored for now.&lt;br&gt;
The AI landscape has been buzzing with excitement around Large Language Models (LLMs), and MCP has emerged as one of the key protocols in this rapidly evolving ecosystem.&lt;/p&gt;

&lt;p&gt;As with any hype, it is important to take a step back and understand the basics before onboarding the train - choo choo 🚂!&lt;br&gt;
So here is my shot at explaining the MCP use cases and their benefits.&lt;/p&gt;


&lt;h2&gt;
  
  
  Getting external data into the LLM
&lt;/h2&gt;

&lt;p&gt;Traditionally, LLMs are trained on a vast amount of data at a certain point in time. That means, there is going to be a cut-off point of data that is available to the LLM - anything newer than that is not available.&lt;br&gt;
However, in many cases, you want to use LLMs to process data that is stored outside of the training data.&lt;br&gt;
This can be for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;customer support chats&lt;/li&gt;
&lt;li&gt;product descriptions&lt;/li&gt;
&lt;li&gt;the latest memes on the internet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Smart people came up with a number of ways to do that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-tuning the model on the data. This is a very time-consuming process and does not scale well.&lt;/li&gt;
&lt;li&gt;Using a so-called "Knowledge Base" to store the data and RAG (Retrieval-Augmented Generation) to answer questions. A good fit for knowledge retrieval tasks.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://platform.openai.com/docs/guides/function-calling" rel="noopener noreferrer"&gt;Function calling&lt;/a&gt;: Provide functions (e.g. custom code) with semantic meaning to the LLM. Then the LLM  can decide, to let the function run or not. For example, the prompt could be: "Please check if the user is eligible for a discount" and the function could be a &lt;code&gt;check_discount_eligibility&lt;/code&gt; function.&lt;/li&gt;
&lt;/ul&gt;
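
&lt;p&gt;For the function-calling option, the flow is: advertise a JSON schema for the function, let the model return a structured call request, and have the application execute it. A minimal sketch in the style of OpenAI-like function calling (the schema layout and the discount function are illustrative, not a specific SDK's API):&lt;/p&gt;

```python
import json

# Tool schema advertised to the model (names and fields are illustrative):
tools = [{
    "type": "function",
    "function": {
        "name": "check_discount_eligibility",
        "description": "Check if a user is eligible for a discount",
        "parameters": {
            "type": "object",
            "properties": {"user_id": {"type": "string"}},
            "required": ["user_id"],
        },
    },
}]

def check_discount_eligibility(user_id):
    # Toy business logic standing in for a real lookup
    return {"user_id": user_id, "eligible": user_id.startswith("vip")}

def dispatch(tool_call):
    """The model only requests a call; the application executes it."""
    registry = {"check_discount_eligibility": check_discount_eligibility}
    fn = registry[tool_call["name"]]
    return fn(**json.loads(tool_call["arguments"]))

# Pretend the LLM responded with this structured tool call:
result = dispatch({"name": "check_discount_eligibility",
                   "arguments": '{"user_id": "vip-42"}'})
```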

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fn9mbraefvihwqj816h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fn9mbraefvihwqj816h.png" alt="Options for Integrating External Data into LLM Apps" width="800" height="625"&gt;&lt;/a&gt;&lt;br&gt;
Options for Integrating External Data into LLM Apps&lt;/p&gt;

&lt;h2&gt;
  
  
  Design time vs run time
&lt;/h2&gt;

&lt;p&gt;If we look at the 3 techniques above, we can see that they all have one thing in common: They are integrated at &lt;strong&gt;design time&lt;/strong&gt; - meaning that an engineer needs to carefully integrate the data/code into an application before the user can use it.&lt;br&gt;
This works well for many use cases where you - the developer - have control over the underlying LLM/agent and want to achieve the best results.&lt;br&gt;
In addition, as long as LLM agents are still a bit clunky, constructing such an application requires fine-tuning and adjustments anyway.&lt;/p&gt;

&lt;p&gt;However, once you become a user of such an application, you are not in control of the underlying LLM/agent. For example, when you use the Cursor editor, you use its agent to help you write code, but you don't rewire Cursor's internals.&lt;/p&gt;

&lt;p&gt;This is where MCP servers come into play. These servers provide functionality according to a defined protocol - the MCP protocol - and can be integrated at runtime.&lt;br&gt;
Imagine using Cursor, an AI-powered code editor, to write database queries. You don’t control its internal agent, but with an MCP server, you can plug in your Postgres schema at runtime—no need to wait for Cursor’s developers to build it in. This flexibility lets users extend apps instantly, bypassing the delays of design-time updates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is this just an API?
&lt;/h2&gt;

&lt;p&gt;Not quite. Unlike stateless REST APIs or design-time function calling, MCP is an open, standardized protocol for applications to provide context to LLMs, using stateful connections, a client-server architecture and pre-defined capabilities and messages. It is powered by a JSON-RPC API (not necessarily over HTTP) that is defined in the MCP specification and inspired by the &lt;a href="https://langserver.org/" rel="noopener noreferrer"&gt;Language Server Protocol&lt;/a&gt;.&lt;br&gt;
I think a lot of the confusion arises from the fact that MCP is often compared to conventional REST APIs or function calling, especially when the use cases are trivial.&lt;/p&gt;
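
&lt;p&gt;To illustrate the wire format, a tool invocation in MCP is a JSON-RPC 2.0 request along these lines (the tool name and arguments are made up for illustration; consult the MCP specification for the exact message shapes):&lt;/p&gt;

```python
import json

# Shape of an MCP tool invocation as a JSON-RPC 2.0 request
# (tool name and arguments are hypothetical):
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",
        "arguments": {"sql": "SELECT count(*) FROM users"},
    },
}
wire_message = json.dumps(request)
```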

&lt;p&gt;For function calling, the integration is done at design time and the function is part of the application - so there is a mismatch.&lt;br&gt;
Let's take a look at the API vs. MCP discussion: At first glance, the two are similar, as you could design an LLM-powered application that consumes any OpenAPI-spec-compliant API and converts it into tools on the fly.&lt;br&gt;
In fact, there is even an &lt;a href="https://cookbook.openai.com/examples/function_calling_with_an_openapi_spec" rel="noopener noreferrer"&gt;OpenAI cookbook for that&lt;/a&gt;. &lt;br&gt;
Having a stateful, 1:1-mapped client-server connection as MCP defines it, just to get the weather in a certain city, is overkill. And if it is just a small stateless REST API, providing an OpenAPI runner is good enough.&lt;/p&gt;

&lt;p&gt;But once you have a more complex use case that involves state or requires deep interaction between the LLM and the application, MCP can be a great fit.&lt;br&gt;
For instance, &lt;a href="https://modelcontextprotocol.io/docs/concepts/sampling#sampling" rel="noopener noreferrer"&gt;sampling&lt;/a&gt; allows servers to request LLM completions through the client while maintaining data control, and &lt;a href="https://modelcontextprotocol.io/docs/concepts/roots" rel="noopener noreferrer"&gt;roots&lt;/a&gt; define the client's resources, like filesystems, that MCP servers should work with.&lt;br&gt;
Of course, such complex workflows require a powerful client, which might be missing in some users' applications. And as with any new technology, the debugging and tooling are not as mature as for the battle-proven HTTP APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The network
&lt;/h2&gt;

&lt;p&gt;No, not the network of the internet - but the people and companies behind a standard.&lt;br&gt;
As MCP is designed for AI engineering, it attracts a fresh group of people that are passionate about the future of AI.&lt;br&gt;
Those can then participate in the development of an &lt;strong&gt;open&lt;/strong&gt; standard - which suggests no lock-in. &lt;/p&gt;

&lt;p&gt;What makes it even more interesting is that it is backed by Anthropic, who have a great standing in the developer community, thus providing visibility and trust in the standard's long-term perspective.&lt;br&gt;
The more people implement MCP servers, the more attractive they become for users, as they know they will be supported in the future.&lt;br&gt;
This in turn drives the adoption of MCP, and the standard becomes more robust and mature. Looking back at the last months, we can definitely see a sharp increase in the number of MCP servers (1100) as well as clients and registries (per &lt;a href="https://www.latent.space/p/why-mcp-won" rel="noopener noreferrer"&gt;Why MCP Won&lt;/a&gt;).&lt;br&gt;
Pair this with a fast-evolving roadmap and lessons from similar protocols like LSP (Language Server Protocol), and you (might?) have a recipe for success.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you are &lt;strong&gt;not in control&lt;/strong&gt; of the underlying Agentic LLM, MCP servers can be a great way to add external data and functionality to the application at runtime.&lt;br&gt;
Think of &lt;em&gt;plugins&lt;/em&gt;, that extend the capabilities of the LLM without the need to change the underlying model or the application.&lt;br&gt;
Thus if you are a user, MCPs can supercharge your LLM-powered application with additional capabilities.&lt;/p&gt;

&lt;p&gt;If you are a developer and design the actual system, MCPs can be an overkill if you just want to integrate a stateless (RESTful) API - which is quite common.&lt;br&gt;
Relying on conventional tooling like OpenAPI, function calling or &lt;a href="https://python.langchain.com/docs/integrations/tools/" rel="noopener noreferrer"&gt;third-party toolkits like LangChain's&lt;/a&gt; is good enough for many use cases. So far, proper tools need tailored agent logic to be useful.&lt;/p&gt;

&lt;p&gt;Still, APIs and standards are only as powerful as the people behind them, and MCP is growing and evolving fast while already having a large group of supporters.&lt;br&gt;
Such network effects can make MCP the de-facto standard for LLM integration in the future - even if it is not perfect for every use case.&lt;br&gt;
As with many topics in the AI space, take predictions with a grain of salt and enjoy the ride.&lt;/p&gt;

&lt;p&gt;For an even deeper dive, check out the &lt;a href="https://www.latent.space/p/why-mcp-won" rel="noopener noreferrer"&gt;Why MCP Won&lt;/a&gt; article by &lt;a href="https://www.latent.space/" rel="noopener noreferrer"&gt;Latent Space&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.latent.space/p/why-mcp-won" rel="noopener noreferrer"&gt;Why MCP Won&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://spec.modelcontextprotocol.io/specification/2024-11-05/" rel="noopener noreferrer"&gt;MCP Specification&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.langchain.dev/mcp-fad-or-fixture/" rel="noopener noreferrer"&gt;MCP: Flash in the Pan or Future Standard?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/integrations/tools/" rel="noopener noreferrer"&gt;Langchain's Tools&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>mcp</category>
      <category>api</category>
      <category>llm</category>
      <category>openapi</category>
    </item>
    <item>
      <title>Multi-tenant database patterns - through a SaaS lens</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Tue, 03 Oct 2023 16:39:35 +0000</pubDate>
      <link>https://dev.to/duske/multi-tenant-database-patterns-through-a-saas-lens-1pc5</link>
      <guid>https://dev.to/duske/multi-tenant-database-patterns-through-a-saas-lens-1pc5</guid>
      <description>&lt;p&gt;This article shall give an overview of various popular multi-tenant database patterns and their pros and cons - through a SaaS lens.&lt;br&gt;
That means that we analyze the patterns in terms of their suitability for SaaS applications and their tradeoffs.&lt;br&gt;
While there are many good posts already available on the internet, I want to bring together different naming conventions and patterns in one place, for a &lt;br&gt;
better overview and comparison. &lt;/p&gt;

&lt;h2&gt;
  
  
  An example use case
&lt;/h2&gt;

&lt;p&gt;Let's assume we have a SaaS application that helps customers organize their employees and machines. For the sake of brevity,&lt;br&gt;
it will consist of a web client, a backend API and a database. &lt;br&gt;
It could look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhist0zzlbu2m7q8xqo2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhist0zzlbu2m7q8xqo2m.png" alt="Figure 1: Example app" width="800" height="169"&gt;&lt;/a&gt;&lt;br&gt;
Figure 1: Example app&lt;/p&gt;

&lt;h3&gt;
  
  
  The various scopes of multi-tenancy
&lt;/h3&gt;

&lt;p&gt;Taking a look at the example application, one can imagine that there might be different scopes of multi-tenancy and that is absolutely true.&lt;br&gt;
At the highest level (think of zoom level = 0), we can distinguish between &lt;strong&gt;single-tenant&lt;/strong&gt; and &lt;strong&gt;multi-tenant systems&lt;/strong&gt;, simply meaning that a system is either used by one tenant or multiple tenants:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2thlo3c3f86taznc233.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2thlo3c3f86taznc233.png" alt="Figure 2: High level comparison" width="800" height="429"&gt;&lt;/a&gt;&lt;br&gt;
Figure 2: High level comparison&lt;/p&gt;

&lt;p&gt;At the next level (zoom level = 1), we can distinguish between multi-tenancy applied for the backend, the client or the database. &lt;br&gt;
For example, you could have the entire stack (client, backend, database) be single-tenant, meaning that each tenant has its own client, backend and database. Here you essentially achieve the single-tenancy of Figure 1.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqf5vb4w3cef5i73kln7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqf5vb4w3cef5i73kln7.png" alt="Figure 3: Single-tenant stack" width="800" height="272"&gt;&lt;/a&gt;&lt;br&gt;
Figure 3: Single-tenant stack&lt;/p&gt;

&lt;p&gt;Of course, no one is stopping you from applying multi-tenancy to single components, like the backend only, for example. Here it means that some tenants get their own backend, but all tenants share the same database.&lt;br&gt;
In other words, the backend is siloed per tenant, but the database is pooled.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwshweffa481mdmopnobw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwshweffa481mdmopnobw.png" alt="Figure 4: Mixed tenancy example" width="800" height="378"&gt;&lt;/a&gt;&lt;br&gt;
Figure 4: Mixed tenancy example&lt;/p&gt;

&lt;p&gt;Since many cloud-native systems are built with stateless backends for scalability, we will focus on applying multi-tenancy at the database level in this article (zoom level = 2).&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fun0ug92wep8bow9yicol.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fun0ug92wep8bow9yicol.png" alt="Figure 4: Multi tenancy for DBs" width="800" height="218"&gt;&lt;/a&gt;&lt;br&gt;
Figure 4: Multi tenancy for DBs&lt;/p&gt;

&lt;h2&gt;
  
  
  Terminology and the various ways of applying multi-tenancy
&lt;/h2&gt;

&lt;p&gt;To get things started, let's define some terms that will be used throughout this article. Please note that we focus on the data layer (database),&lt;br&gt;
so the terms are related to processing and storing data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tenant&lt;/strong&gt;: A tenant is a group of users whose data belongs together and is managed as a unit. Usually, a tenant is a customer of the SaaS application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tenant ID&lt;/strong&gt;: A tenant ID is a unique identifier for a tenant. It can be used to identify a tenant in the database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-tenant Database&lt;/strong&gt;: A single-tenant database is a database that is used by only one tenant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tenant Database&lt;/strong&gt;: A multi-tenant database is a database that is used by multiple tenants.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The patterns
&lt;/h2&gt;

&lt;p&gt;As with nearly every software architecture topic, there is no one-size-fits-all solution. The same is true for multi-tenancy, so let&lt;br&gt;
the tradeoff-festival begin!&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: Separate database server aka the silo pattern
&lt;/h3&gt;

&lt;p&gt;In this shared-nothing approach, each tenant gets its own database server. Running a dedicated database instance per tenant enables&lt;br&gt;
maximum isolation between tenants, eliminating the noisy-neighbor issue and possibly boosting compliance. This pattern is also known as the &lt;strong&gt;silo pattern&lt;/strong&gt;.&lt;br&gt;
It also allows for maximum flexibility in terms of database configuration, since each tenant can have its own configuration.&lt;br&gt;
Its strength is also its weakness: keeping those different servers properly configured, up to date, monitored and backed up is a very resource-intensive task.&lt;br&gt;
Infrastructure-as-Code can certainly help here, but it is still a lot of work. On top of the increased complexity in terms of deployment and operation,&lt;br&gt;
this pattern also has a high cost, since each tenant needs its own database server. It is best suited for large tenants with high security and compliance requirements.&lt;/p&gt;

&lt;p&gt;An implementation of this pattern could look like the image above (single-tenant stack) or like this, if you do not want to set up a dedicated backend API for each tenant as well:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2npxrensptn490cpidc6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2npxrensptn490cpidc6.png" alt="Figure 5: Silo pattern" width="800" height="537"&gt;&lt;/a&gt;&lt;br&gt;
Figure 5: Silo pattern&lt;/p&gt;

&lt;p&gt;Please keep in mind that somewhere the mapping between tenant and database server has to be stored, so that the backend knows which database server to connect to for a given tenant.&lt;br&gt;
Those mapping/lookup components are sometimes called &lt;a href="https://learn.microsoft.com/en-us/azure/azure-sql/database/saas-dbpertenant-provision-and-catalog?view=azuresql#introduction-to-the-saas-catalog-pattern" rel="noopener noreferrer"&gt;catalogs&lt;/a&gt; and could be a simple key-value store or a more sophisticated service registry. Designing a proper catalog is a topic for another article, but it is important to keep in mind that it is needed.&lt;/p&gt;
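&lt;p&gt;To make the catalog idea more concrete, here is a minimal sketch in Python. It assumes a plain in-memory dictionary as the lookup store, and all tenant IDs and connection strings are made up for illustration; a production catalog would be a durable store or service of its own:&lt;/p&gt;

```python
# Minimal catalog sketch for the silo pattern: map a tenant ID to the
# connection string of that tenant's dedicated database server.
# All tenant IDs and DSNs below are made-up examples.
CATALOG = {
    "tenant-a": "postgresql://db-tenant-a.internal:5432/app",
    "tenant-b": "postgresql://db-tenant-b.internal:5432/app",
}

def resolve_dsn(tenant_id: str) -> str:
    """Return the connection string for a tenant, failing loudly for
    unknown tenants instead of silently falling back to a default."""
    try:
        return CATALOG[tenant_id]
    except KeyError:
        raise LookupError(f"no database registered for tenant {tenant_id!r}")
```

&lt;p&gt;The backend would call &lt;code&gt;resolve_dsn&lt;/code&gt; once per request (or per connection checkout) before opening a connection to the tenant's server.&lt;/p&gt;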

&lt;h3&gt;
  
  
  Pattern 2: Separate by schema or database
&lt;/h3&gt;

&lt;p&gt;This umbrella term describes a pattern where multiple tenants share the same database server, but are isolated by logical constructs, e.g.&lt;br&gt;
a dedicated database or schema for each tenant. This pattern is also known as the &lt;strong&gt;bridge pattern&lt;/strong&gt;.&lt;br&gt;
By sacrificing some isolation, this pattern reduces the complexity and costs of the silo pattern, since you do not need to set up a dedicated database server for each tenant.&lt;br&gt;
It also allows for customization, since each tenant can have its own database or schema. This pattern is best suited for tenants with medium security and compliance requirements.&lt;br&gt;
It is really a hybrid model, where you can benefit or shoot yourself in the foot, depending on the requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it is isolated, but does not provide the same level of isolation as the silo pattern and does not eliminate the noisy-neighbor issue&lt;/li&gt;
&lt;li&gt;it has less infrastructure cost, but suffers from all-or-nothing availability&lt;/li&gt;
&lt;li&gt;it is flexible and allows customization for tenant-specific custom data, but deployment complexity is still high and needs to be thoroughly orchestrated with the backend deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Regardless of the logical construct used, the mapping between tenant and logical construct has to be stored somewhere, so that the backend knows which logical construct to connect to for a given tenant.&lt;br&gt;
If using databases to separate tenants, it could look like this:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrgealseposw7e24pzxs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrgealseposw7e24pzxs.png" alt="Figure 6: Bridge pattern with separate databases" width="800" height="377"&gt;&lt;/a&gt;&lt;br&gt;
Figure 6: Bridge pattern with separate databases&lt;/p&gt;

&lt;p&gt;If you choose to use schemas to separate tenants, it could look like this:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb4syfq51d18meuxb0zi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmb4syfq51d18meuxb0zi.png" alt="Figure 7: Bridge pattern with separate schemas" width="800" height="377"&gt;&lt;/a&gt;&lt;br&gt;
Figure 7: Bridge pattern with separate schemas&lt;/p&gt;

&lt;p&gt;This pattern was often used in the past, as it traditionally offered higher agility and lower costs than the silo pattern. Imho,&lt;br&gt;
these advantages are not as relevant anymore, since the cloud-native movement has made it possible to spin up and manage new database servers with much less overhead.&lt;br&gt;
Still, your team needs proper experience and resources if you need that extra control for performance tuning and isolation.&lt;/p&gt;
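&lt;p&gt;To illustrate the schema variant of the bridge pattern, here is a small Python sketch. The &lt;code&gt;tenant_&lt;/code&gt; naming convention and the validation rule are assumptions made for this example; the point is to derive the schema name deterministically and never interpolate untrusted input into the &lt;code&gt;SET search_path&lt;/code&gt; statement (PostgreSQL syntax):&lt;/p&gt;

```python
import re

def tenant_schema(tenant_id: str) -> str:
    """Derive a schema name like 'tenant_acme' from a tenant ID.
    Only a restricted character set is allowed, so a tenant ID can
    never smuggle SQL into the statement built below."""
    if not re.fullmatch(r"[a-z0-9_]+", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}"

def search_path_statement(tenant_id: str) -> str:
    """Statement a backend could issue after checking out a pooled
    connection, so unqualified table names resolve to the tenant's schema."""
    return f'SET search_path TO "{tenant_schema(tenant_id)}"'
```

&lt;p&gt;The database-per-tenant variant works analogously, except that the catalog resolves a database name (or full connection string) instead of a schema.&lt;/p&gt;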

&lt;p&gt;For even more agility, let's take a look at the next pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: Separate by table column
&lt;/h3&gt;

&lt;p&gt;This pattern is also known as the &lt;strong&gt;pool pattern&lt;/strong&gt;. It is somewhat similar to the bridge pattern, but instead of separating tenants by database or schema, it &lt;strong&gt;separates them by a table column&lt;/strong&gt;. &lt;br&gt;
This means that all tenants share the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;database server&lt;/li&gt;
&lt;li&gt;database&lt;/li&gt;
&lt;li&gt;schema and tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The isolation is achieved by giving each relevant table an additional column like &lt;code&gt;tenant_id&lt;/code&gt; that is used to identify and separate the data of different tenants.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bxc5ro5o98veo1fg7s6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bxc5ro5o98veo1fg7s6.png" alt="Figure 6: Pool pattern with separation by tenant_id column" width="800" height="334"&gt;&lt;/a&gt;&lt;br&gt;
Figure 6: Pool pattern with separation by tenant_id column&lt;/p&gt;

&lt;p&gt;Note that a true catalog component is not really needed, as the tenant_id is very lightweight and no connection credentials need to be stored.&lt;br&gt;
Such an ID can often be kept in the user's session or JWT.&lt;/p&gt;
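&lt;p&gt;Here is a runnable sketch of the pool pattern, using SQLite as a stand-in for the real database server (the &lt;code&gt;invoices&lt;/code&gt; table and the tenant IDs are made up). The crucial point is that every single query filters by &lt;code&gt;tenant_id&lt;/code&gt;:&lt;/p&gt;

```python
import sqlite3

# SQLite stands in for the real database server here; the schema is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (tenant_id TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("tenant-a", 100), ("tenant-a", 250), ("tenant-b", 999)],
)

def invoices_for(tenant_id: str) -> list:
    """Every data-access query must filter by tenant_id; forgetting this
    single WHERE clause is exactly the risk of the pool pattern."""
    cur = conn.execute(
        "SELECT tenant_id, amount FROM invoices WHERE tenant_id = ?",
        (tenant_id,),
    )
    return cur.fetchall()

print(invoices_for("tenant-a"))  # tenant-b's row is never visible here
```

&lt;p&gt;Forgetting that one &lt;code&gt;WHERE&lt;/code&gt; clause leaks data across tenants, which is why enforcing the filter centrally in the database (e.g. with Row-Level-Security) is attractive.&lt;/p&gt;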

&lt;p&gt;Of course, there are obvious drawbacks that might make it unsuitable for your use case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;isolation is achieved by a column, so it is not as strong as the bridge pattern and especially not as strong as the silo pattern&lt;/li&gt;
&lt;li&gt;per-tenant customizations are tricky, since you would need to add additional columns to the tables&lt;/li&gt;
&lt;li&gt;similar issues as bridge pattern:

&lt;ul&gt;
&lt;li&gt;noisy-neighbor can be an issue&lt;/li&gt;
&lt;li&gt;all-or-nothing availability&lt;/li&gt;
&lt;li&gt;limited scalability&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;complex per tenant backups and restores&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;But especially for SaaS businesses, there can be &lt;strong&gt;compelling reasons&lt;/strong&gt; to follow this approach once it is combined with additional technologies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;data isolation can be enforced with Row-Level-Security (RLS) in the database&lt;/li&gt;
&lt;li&gt;if customization can be kept to a minimum, per-tenant customization can be achieved with JSON data types&lt;/li&gt;
&lt;li&gt;by sharding the data by tenant_id, scalability can be achieved (see the Citus extension for PostgreSQL)&lt;/li&gt;
&lt;li&gt;it is very easy to add new tenants, since you do not need to set up a new database or schema&lt;/li&gt;
&lt;li&gt;it is straightforward to monitor&lt;/li&gt;
&lt;li&gt;it offers unmatched agility in terms of deployment and operation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;As you can see, there is no one-size-fits-all solution but more like a spectrum of solutions, ranging from maximum isolation to maximum agility.&lt;br&gt;
Depending on your use case, the following questions might help you to decide which pattern to choose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How much isolation do you need? &lt;/li&gt;
&lt;li&gt;How much agility do you need? Do you need to be able to spin up new tenants quickly? How often do you want to deploy?&lt;/li&gt;
&lt;li&gt;What kind of SLA or performance requirements do you have? &lt;/li&gt;
&lt;li&gt;Do you need to have precise cost monitoring/metering per tenant?&lt;/li&gt;
&lt;li&gt;How much customization do you need? Do you want to provide special features for each tenant?&lt;/li&gt;
&lt;li&gt;How much scalability do you need? Do you expect 10s, 100s or 1000s of tenants?&lt;/li&gt;
&lt;li&gt;How many resources and how much expertise do you have? Do you have a dedicated team/expert for database operations/devops?&lt;/li&gt;
&lt;li&gt;What kind of regulations do you need to follow? E.g. GDPR, ISO 27001?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's up next:&lt;/strong&gt;&lt;br&gt;
For SaaS scale-ups, however, the pool pattern can often be a good fit, since it allows for fast iteration cycles through fast deployments and operations.&lt;br&gt;
And as we've put on a SaaS lens, let's keep those goggles on and focus on the implementation and risk mitigation of the pool pattern in the next article.&lt;/p&gt;

&lt;h3&gt;
  
  
  Useful resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/azure-sql/database/saas-tenancy-app-design-patterns?view=azuresql" rel="noopener noreferrer"&gt;https://learn.microsoft.com/en-us/azure/azure-sql/database/saas-tenancy-app-design-patterns?view=azuresql&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/saas-multitenant-managed-postgresql/matrix.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/prescriptive-guidance/latest/saas-multitenant-managed-postgresql/matrix.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/architecture/guide/multitenant/approaches/storage-data#databases" rel="noopener noreferrer"&gt;https://learn.microsoft.com/en-us/azure/architecture/guide/multitenant/approaches/storage-data#databases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://renatoargh.files.wordpress.com/2018/01/article-multi-tenant-data-architecture-2006.pdf" rel="noopener noreferrer"&gt;https://renatoargh.files.wordpress.com/2018/01/article-multi-tenant-data-architecture-2006.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thenile.dev/blog/multi-tenant-rls" rel="noopener noreferrer"&gt;https://www.thenile.dev/blog/multi-tenant-rls&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/de/blogs/database/choose-the-right-postgresql-data-access-pattern-for-your-saas-application/" rel="noopener noreferrer"&gt;https://aws.amazon.com/de/blogs/database/choose-the-right-postgresql-data-access-pattern-for-your-saas-application/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>architecture</category>
      <category>database</category>
      <category>saas</category>
    </item>
    <item>
      <title>How Kubernetes handles offline nodes</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Sun, 08 Mar 2020 18:18:07 +0000</pubDate>
      <link>https://dev.to/duske/how-kubernetes-handles-offline-nodes-53b5</link>
      <guid>https://dev.to/duske/how-kubernetes-handles-offline-nodes-53b5</guid>
      <description>&lt;p&gt;Kubernetes is a great tool for orchestrating containerized workloads on a cluster of nodes. If you've ever experienced the sudden downtime of a node, you maybe came in touch with Kubernetes' rescheduling strategies of deployments that kicked in after some time. In this post I want to highlight how such situations are recognized by the system. This can be helpful to understand and tune rescheduling mechanics or when developing your own operators and resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  The process
&lt;/h2&gt;

&lt;p&gt;In order to demonstrate this process in a more appealing way, the following graphic will be used to visualize the key actions and decisions. We will deal with a system consisting of one master and one node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdmrpf9i8dhd1es0m0aw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdmrpf9i8dhd1es0m0aw.png" alt="kubernetes-timeouts" width="593" height="958"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a healthy system, the kubelet running on the node continuously reports its status to the master (1).&lt;br&gt;
This is controlled by setting the CLI param &lt;code&gt;--node-status-update-frequency&lt;/code&gt; of the kubelet, whose default is 10s.&lt;br&gt;
That way, the master stays informed about the health of the cluster nodes and can schedule pods in a proper way.&lt;/p&gt;

&lt;p&gt;Now (2), the kubelet loses its connection to the master. For instance, the node could have crashed or the network could be faulty.&lt;br&gt;
The master obviously cannot know the reason, but when monitoring the nodes the timeout &lt;code&gt;--node-monitor-grace-period&lt;/code&gt; gets checked (3). Per default this timeout is set to &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/" rel="noopener noreferrer"&gt;40 seconds in the controller manager&lt;/a&gt;. That means that a node has 40 seconds to recover and send its status to the master before the next step (4) is entered.&lt;/p&gt;

&lt;p&gt;If the node can successfully recover, the system stays healthy and continues with the loop (1).&lt;br&gt;
If the node could not respond within the given timeout, its status is set to &lt;code&gt;Unknown&lt;/code&gt; and a second timeout (5) starts. This timeout, called &lt;code&gt;--pod-eviction-timeout&lt;/code&gt;, controls when the pods on the node are ready to be evicted (as well as "Taints and Tolerations" in the next section). The default value is set to 5 minutes.&lt;br&gt;
As soon as the node responds within this timeframe (6), the master sets its status back to &lt;code&gt;Ready&lt;/code&gt; and the process continues with the usual loop from the beginning.&lt;br&gt;
But when this timeout is exceeded with a non-responding node, the pods are finally marked for deletion (7).&lt;br&gt;
It should be noted that these pods are not removed instantly. Instead, the node has to go online again and connect to the master in order to confirm this deletion (&lt;a href="https://github.com/kubernetes/kubernetes/issues/55713#issuecomment-518340883" rel="noopener noreferrer"&gt;2-phase confirmation&lt;/a&gt;).&lt;br&gt;
If that is not possible, for example when the node has left the cluster permanently, you have to remove these pods manually.&lt;/p&gt;
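&lt;p&gt;The decision flow above can be condensed into a toy model. This is a deliberate simplification and not the controller manager's actual code; the parameter names mirror the CLI flags and use the default values mentioned above:&lt;/p&gt;

```python
def node_view(seconds_since_last_heartbeat: float,
              node_monitor_grace_period: float = 40.0,
              pod_eviction_timeout: float = 300.0) -> str:
    """Toy model of how the master judges a silent node: first the
    grace period must expire, then the eviction timeout on top of it.
    (Simplified: the real flow also involves taints and tolerations.)"""
    deadline = node_monitor_grace_period + pod_eviction_timeout
    if seconds_since_last_heartbeat > deadline:
        return "Unknown, pods marked for deletion"
    if seconds_since_last_heartbeat > node_monitor_grace_period:
        return "Unknown"
    return "Ready"

print(node_view(10))    # still within the grace period
print(node_view(120))   # grace period exceeded, eviction timer running
print(node_view(400))   # both timeouts exceeded
```

&lt;p&gt;With the defaults, a node that stays silent for more than 40 + 300 = 340 seconds ends up in the eviction case.&lt;/p&gt;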
&lt;h2&gt;
  
  
  Taints and Tolerations
&lt;/h2&gt;

&lt;p&gt;Even though you set the eviction timeout &lt;code&gt;--pod-eviction-timeout&lt;/code&gt; to a lower value, you may notice that pods still need 5 minutes to be deleted. This is due to the admission controller, which adds a default toleration to every pod that allows it to stay on a not-ready or unreachable node for a period of time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tolerations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node.kubernetes.io/not-ready&lt;/span&gt;
  &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NoExecute&lt;/span&gt;
  &lt;span class="na"&gt;tolerationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;300&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;node.kubernetes.io/unreachable&lt;/span&gt;
  &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Exists&lt;/span&gt;
  &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NoExecute&lt;/span&gt;
  &lt;span class="na"&gt;tolerationSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see in the default configuration above, the value is set to 300 seconds/5 minutes. One possible solution is to apply a custom configuration to each pod, where this value is adjusted to your needs. You can also &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/" rel="noopener noreferrer"&gt;adjust this setting globally&lt;/a&gt;.&lt;br&gt;
For instance, when a value (&lt;code&gt;tolerationSeconds&lt;/code&gt;) of 20 seconds is chosen, it will take 60 seconds overall for a pod to be deleted, because the &lt;code&gt;--node-monitor-grace-period&lt;/code&gt; value is taken into account beforehand.&lt;/p&gt;
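&lt;p&gt;That arithmetic, as a tiny helper (a back-of-the-envelope model that ignores the kubelet's reporting interval and controller sync jitter):&lt;/p&gt;

```python
def seconds_until_pod_deletion(node_monitor_grace_period: float = 40.0,
                               toleration_seconds: float = 300.0) -> float:
    """Rough overall time from a node going silent until its pods are
    marked for deletion: the node must first be declared not-ready
    (grace period), then the pod's toleration has to expire."""
    return node_monitor_grace_period + toleration_seconds

print(seconds_until_pod_deletion(40, 20))   # the 60-second example above
print(seconds_until_pod_deletion())         # with the defaults
```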

&lt;h2&gt;
  
  
  Wrapping it up
&lt;/h2&gt;

&lt;p&gt;I hope you now have a rough idea of how Kubernetes recognizes and handles offline nodes. Especially the two timeouts as well as the default taints and tolerations configuration can be a caveat.&lt;br&gt;
This can come in handy when you develop your own operator that has to deal with non-responding nodes. For instance, Kubernetes' deployment controller recognizes these situations automatically and reschedules the configured pods.&lt;br&gt;
This is also one of the reasons why you should avoid using "naked" pods, because this helpful handling has to be implemented by you in that case.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Modern multi-architecture builds with Docker</title>
      <dc:creator>Dustin</dc:creator>
      <pubDate>Sat, 04 Jan 2020 22:30:38 +0000</pubDate>
      <link>https://dev.to/duske/modern-multi-architecture-builds-with-docker-51j5</link>
      <guid>https://dev.to/duske/modern-multi-architecture-builds-with-docker-51j5</guid>
      <description>&lt;p&gt;In this post I'm going to explain several ways to build docker images for multiple architectures. With the ongoing rise of ARM-architectures, for example the Raspberry Pi or Amazon's efficient &lt;a href="https://aws.amazon.com/ec2/instance-types/a1/?nc1=h_ls" rel="noopener noreferrer"&gt;EC2 A1-Instances&lt;/a&gt;, multi-architecture builds will probably gain more focus.&lt;/p&gt;

&lt;p&gt;If you're on a single computer, building and running docker images is very easy. The &lt;code&gt;build&lt;/code&gt; command analyzes a given &lt;code&gt;Dockerfile&lt;/code&gt; and runs the specific instructions. To do so, Docker uses the kernel of your OS (or your VM, depending on your setup). This can bind the architecture of the image to the host architecture, especially when you compile a binary inside it.&lt;/p&gt;

&lt;p&gt;When using a typical desktop PC, this architecture is probably &lt;code&gt;x86/amd64&lt;/code&gt;. So if you run this image on a different computer with the same architecture, everything is fine. But what do you do when a different architecture is the target, for example ARM?&lt;/p&gt;

&lt;h2&gt;
  
  
  Compile programs for different architectures
&lt;/h2&gt;

&lt;p&gt;Generally speaking, 3 typical techniques are used to compile software for a different architecture, which are briefly explained in this section.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Build on the target system
&lt;/h3&gt;

&lt;p&gt;Building your software directly on the target system is obviously the easiest approach, since you probably do not have to change anything in your code.&lt;/p&gt;

&lt;p&gt;Let's consider building software for the ARM architecture. For example, you could transfer your code to a Raspberry Pi, install the toolchain and build your code there. Since Docker is supported on this device, you can use the same commands as on your desktop computer.&lt;br&gt;
Typical problems with this approach are the limited access to ARM-powered hardware and, since such devices are often not that powerful, slow build performance. E.g., compiling C dependencies can take a lot of time on a Raspberry Pi (though it gets faster with every new board).&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Emulate the hardware
&lt;/h3&gt;

&lt;p&gt;If you do not have access to the target system, emulation is another viable solution. While in virtualization only certain parts of a computer's hardware are simulated in order to run a guest OS, emulation simulates the complete hardware. This makes emulation slower than virtualization, but it is also not limited to the underlying hardware, making it possible to simulate hardware like an ARM processor on an x86 system.&lt;/p&gt;

&lt;p&gt;As you might have guessed, this approach is truly powerful by enabling you to build software for various architectures, but simulating the entire hardware is a huge overhead. Wouldn't it be great to simulate only the hardware components required for building the software?&lt;/p&gt;
&lt;h4&gt;
  
  
  User-mode emulation
&lt;/h4&gt;

&lt;p&gt;Usually, when an executable file is passed to an &lt;code&gt;exec&lt;/code&gt; system call, the kernel expects it to be a native binary for the current system.&lt;br&gt;
If you ever encountered the error &lt;code&gt;exec user process caused "exec format error"&lt;/code&gt; in a docker image, you tried to run a binary which can't be executed on the processor.&lt;br&gt;
Luckily, with &lt;code&gt;binfmt_misc&lt;/code&gt; it is possible to register custom interpreters in userland to handle foreign binaries.&lt;br&gt;
To do so, you basically register the respective interpreter together with a "magic number" that identifies binaries of the specific format.&lt;br&gt;
The idea is that, in combination with a powerful emulator, you can still run and build binaries for foreign architectures on your system, by emulating only the required parts.&lt;/p&gt;
&lt;h4&gt;
  
  
  QEMU
&lt;/h4&gt;

&lt;p&gt;One famous emulator capable of such a feature is &lt;a href="https://www.qemu.org" rel="noopener noreferrer"&gt;QEMU&lt;/a&gt;. Besides user-mode emulation, it supports various architectures, full-system emulation and virtualization as well.&lt;/p&gt;

&lt;p&gt;To use user-mode emulation with QEMU, we need to register this emulator for some foreign architectures, for example ARM. The Hypriot project has explained this well in &lt;a href="https://blog.hypriot.com/post/docker-intel-runs-arm-containers/" rel="noopener noreferrer"&gt;this article&lt;/a&gt;. They even built a Docker image that runs in &lt;code&gt;privileged&lt;/code&gt; mode to register the magic numbers with QEMU on your host system for you.&lt;br&gt;
An excerpt of the &lt;a href="https://github.com/hypriot/qemu-register/blob/master/register.sh" rel="noopener noreferrer"&gt;register script&lt;/a&gt; is shown below to give you a rough idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Register new interpreters&lt;/span&gt;
&lt;span class="c"&gt;# - important: using flags 'C' and 'F'&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;':qemu-arm:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/qemu-arm:CF'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /proc/sys/fs/binfmt_misc/register
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;':qemu-aarch64:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb7\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/qemu-aarch64:CF'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /proc/sys/fs/binfmt_misc/register
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;':qemu-ppc64le:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x15\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\x00:/qemu-ppc64le:CF'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /proc/sys/fs/binfmt_misc/register

&lt;span class="c"&gt;# Show results&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"---"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Installed interpreter binaries:"&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-al&lt;/span&gt; /qemu-&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"---"&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /proc/sys/fs/binfmt_misc
&lt;span class="k"&gt;for &lt;/span&gt;file &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    case&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in
    &lt;/span&gt;status|register&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Registered interpreter=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
        &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"---"&lt;/span&gt;
        &lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="k"&gt;esac&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Use a cross-compiler
&lt;/h3&gt;

&lt;p&gt;The last option is using a cross-compiler to build software for different architectures.&lt;br&gt;
While a standard compiler builds for the system it's running on, a cross-compiler can generate binaries for other architectures as well.&lt;br&gt;
Since it does not rely on any emulation but runs natively on your system, the build performance is as good as option 1.&lt;br&gt;
Modern languages put in a lot of effort to support this feature well; with &lt;a href="https://golangcookbook.com/chapters/running/cross-compiling/" rel="noopener noreferrer"&gt;Golang it is a breeze&lt;/a&gt;. For example, you specify the OS (GOOS) and the target architecture (GOARCH) to cross-compile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# build for mac&lt;/span&gt;
&lt;span class="nv"&gt;GOOS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;darwin &lt;span class="nv"&gt;GOARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;386 go build main.go
&lt;span class="c"&gt;# build for Raspberry Pi&lt;/span&gt;
&lt;span class="nv"&gt;GOOS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linux &lt;span class="nv"&gt;GOARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;arm go build main.go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If multi-arch builds are an ongoing task for you, definitely check out such programming languages.&lt;/p&gt;
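&lt;p&gt;If you target several platforms regularly, the two &lt;code&gt;go build&lt;/code&gt; invocations above generalize into a small loop. A minimal sketch (the platform list and output naming are illustrative; the &lt;code&gt;go build&lt;/code&gt; line only runs if a Go toolchain and a &lt;code&gt;main.go&lt;/code&gt; are actually present):&lt;/p&gt;

```shell
# Cross-compile main.go for a list of os/arch pairs in one loop.
for platform in darwin/amd64 linux/arm linux/arm64; do
  goos=${platform%/*}    # part before the slash, e.g. "linux"
  goarch=${platform#*/}  # part after the slash, e.g. "arm64"
  out="main-${goos}-${goarch}"
  echo "building ${out}"
  # Guarded so the sketch degrades gracefully without a Go toolchain.
  if command -v go >/dev/null 2>&1 && [ -f main.go ]; then
    GOOS="$goos" GOARCH="$goarch" go build -o "$out" main.go
  fi
done
```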

&lt;h2&gt;
  
  
  Docker images and multi-arch
&lt;/h2&gt;

&lt;p&gt;Let's take a look at how Docker supports multi-arch images. As far as I know, two approaches are mainly used:&lt;/p&gt;

&lt;h3&gt;
  
  
  Separate image
&lt;/h3&gt;

&lt;p&gt;One option is to build a separate image for each architecture, either in a dedicated repository or under a designated tag per supported platform.&lt;br&gt;
This usually requires a separate Dockerfile as well, as seen in the &lt;a href="https://github.com/coreos/flannel" rel="noopener noreferrer"&gt;coreos/flannel repository&lt;/a&gt;, for example:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dockerfile&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# AMD 64&lt;/span&gt;
FROM alpine
ENV &lt;span class="nv"&gt;FLANNEL_ARCH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;amd64
ADD dist/qemu-&lt;span class="nv"&gt;$FLANNEL_ARCH&lt;/span&gt;&lt;span class="nt"&gt;-static&lt;/span&gt; /usr/bin/qemu-&lt;span class="nv"&gt;$FLANNEL_ARCH&lt;/span&gt;&lt;span class="nt"&gt;-static&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Dockerfile.arm&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; arm32v6/alpine&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; FLANNEL_ARCH=arm&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt;&lt;span class="s"&gt; dist/qemu-$FLANNEL_ARCH-static /usr/bin/qemu-$FLANNEL_ARCH-static&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By passing the architecture as build arguments or environment variables to the Dockerfile, you can run instructions specific to these platforms. In this example, an env variable &lt;code&gt;FLANNEL_ARCH&lt;/code&gt; is used for this purpose.&lt;/p&gt;
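&lt;p&gt;Instead of duplicating the whole file, the two Dockerfiles above could also be collapsed into one by passing the architecture as build arguments. A hedged sketch (the &lt;code&gt;BASE&lt;/code&gt; and &lt;code&gt;ARCH&lt;/code&gt; arguments are illustrative, not taken from the flannel repository):&lt;/p&gt;

```dockerfile
# Build with e.g.: docker build --build-arg BASE=arm32v6/alpine --build-arg ARCH=arm .
ARG BASE=alpine
FROM ${BASE}
# ARG values used before FROM are not visible in the stage unless re-declared.
ARG ARCH=amd64
ENV FLANNEL_ARCH=${ARCH}
ADD dist/qemu-${FLANNEL_ARCH}-static /usr/bin/qemu-${FLANNEL_ARCH}-static
```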

&lt;h3&gt;
  
  
  Manifests
&lt;/h3&gt;

&lt;p&gt;A Docker manifest is a very simple concept: basically, an object containing a list of image references, one for each supported architecture. A Docker client can then pull an image by inspecting this manifest file returned by the registry, searching the list for a matching &lt;code&gt;platform&lt;/code&gt;, and loading the image by its identifying &lt;code&gt;digest&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;An example docker manifest file, containing images for &lt;code&gt;linux/arm&lt;/code&gt;, &lt;code&gt;linux/amd64&lt;/code&gt; and &lt;code&gt;linux/ppc64le&lt;/code&gt;, is shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;schemaVersion&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mediaType&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/vnd.docker.distribution.manifest.list.v2+json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
   &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;manifests&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mediaType&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/vnd.docker.distribution.manifest.v2+json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;size&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;424&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;digest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256:f67dcc5fc786f04f0743abfe0ee5dae9bd8caf8efa6c8144f7f2a43889dc513b&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;platform&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;architecture&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;arm&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;os&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;linux&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mediaType&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/vnd.docker.distribution.manifest.v2+json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;size&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;424&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;digest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256:b64ca0b60356a30971f098c92200b1271257f100a55b351e6bbe985638352f3a&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;platform&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;architecture&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;amd64&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;os&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;linux&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mediaType&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/vnd.docker.distribution.manifest.v2+json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;size&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;425&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;digest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256:df436846483aff62bad830b730a0d3b77731bcf98ba5e470a8bbb8e9e346e4e8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;platform&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;architecture&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ppc64le&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;os&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;linux&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
         &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While this feature is currently &lt;em&gt;experimental&lt;/em&gt; on the Docker client, it's already integrated into the Docker registry and containerd. The &lt;a href="https://github.com/opencontainers/image-spec/blob/master/image-layout.md" rel="noopener noreferrer"&gt;OCI Image specification&lt;/a&gt; includes manifests as well.&lt;/p&gt;

&lt;h5&gt;
  
  
  Generating manifests
&lt;/h5&gt;

&lt;p&gt;As stated above, you currently need to enable experimental features on the docker client to work with manifests. Once you've done that, &lt;code&gt;docker manifest&lt;/code&gt; should be a valid command. Generating a manifest is then very easy via the CLI.&lt;/p&gt;
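&lt;p&gt;To enable it, set the experimental flag in the client configuration, typically &lt;code&gt;~/.docker/config.json&lt;/code&gt; (alternatively, export &lt;code&gt;DOCKER_CLI_EXPERIMENTAL=enabled&lt;/code&gt; in your shell):&lt;/p&gt;

```json
{
  "experimental": "enabled"
}
```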

&lt;p&gt;Let's say we developed a &lt;code&gt;superapp&lt;/code&gt; and built an image for &lt;code&gt;amd64&lt;/code&gt; and &lt;code&gt;arm&lt;/code&gt; manually.&lt;br&gt;
Now we want to publish these two images, &lt;code&gt;app/superapp-amd64&lt;/code&gt; and &lt;code&gt;app/superapp-arm&lt;/code&gt;, as "one" image &lt;code&gt;app/superapp&lt;/code&gt; by using a manifest.&lt;br&gt;
This can be achieved with the following two commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker manifest create app/superapp app/superapp-amd64 app/superapp-arm
&lt;span class="c"&gt;# Created manifest list docker.io/app/superapp&lt;/span&gt;
&lt;span class="c"&gt;# Push the manifest&lt;/span&gt;
docker manifest push app/superapp 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
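&lt;p&gt;If the per-architecture images do not carry complete platform metadata, the entries of the manifest list can be annotated explicitly before pushing. A sketch reusing the hypothetical &lt;code&gt;app/superapp&lt;/code&gt; images from above (requires a Docker client with experimental features enabled and the images already pushed):&lt;/p&gt;

```shell
# Attach explicit platform details to the arm entry of the manifest list.
docker manifest annotate app/superapp app/superapp-arm --os linux --arch arm
# Verify the resulting manifest list before pushing it.
docker manifest inspect app/superapp
```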



&lt;h2&gt;
  
  
  Using Docker's &lt;code&gt;buildx&lt;/code&gt; for ARM builds
&lt;/h2&gt;

&lt;p&gt;In June 2019, Docker announced tooling support for building Docker images for the ARM architecture as an experimental feature.&lt;br&gt;
For a setup guide, see &lt;a href="https://engineering.docker.com/2019/04/multi-arch-images/" rel="noopener noreferrer"&gt;https://engineering.docker.com/2019/04/multi-arch-images/&lt;/a&gt; for Docker Desktop and &lt;a href="https://engineering.docker.com/2019/06/getting-started-with-docker-for-arm-on-linux/" rel="noopener noreferrer"&gt;https://engineering.docker.com/2019/06/getting-started-with-docker-for-arm-on-linux/&lt;/a&gt; for Linux.&lt;/p&gt;

&lt;p&gt;This feature combines the approaches above, namely QEMU, binfmt_misc, and manifests, and bundles them into a single tool, &lt;code&gt;buildx&lt;/code&gt;. It allows you to write a single Dockerfile which can be used to build images for various platforms without changing it. And thanks to the QEMU user-mode emulation, you can build and run images which have a different architecture than your current system.&lt;/p&gt;
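&lt;p&gt;Before the first cross-platform build, a dedicated builder instance has to be created and selected. A minimal sketch, assuming an experimental Docker 19.03+ client (the builder name &lt;code&gt;multiarch&lt;/code&gt; is arbitrary):&lt;/p&gt;

```shell
# Create a new builder instance and make it the current one.
docker buildx create --name multiarch --use
# Start the builder and list the platforms it can target.
docker buildx inspect --bootstrap
```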
&lt;h5&gt;
  
  
  A simple example
&lt;/h5&gt;

&lt;p&gt;Let's package a simple Go-application as a docker image for &lt;code&gt;arm64&lt;/code&gt; and &lt;code&gt;amd64&lt;/code&gt;. We write a simple program that reports the OS and the architecture of the system. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;main.go&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"fmt"&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="s"&gt;"runtime"&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"OS: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Architecture: %s&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOOS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOARCH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we take a look at the Dockerfile, we see that it is not aware of the architectures which should be supported.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;FROM golang:alpine AS builder
RUN &lt;span class="nb"&gt;mkdir&lt;/span&gt; /app
ADD &lt;span class="nb"&gt;.&lt;/span&gt; /app/
WORKDIR /app
RUN go build &lt;span class="nt"&gt;-o&lt;/span&gt; report &lt;span class="nb"&gt;.&lt;/span&gt;

FROM busybox
RUN &lt;span class="nb"&gt;mkdir&lt;/span&gt; /app
WORKDIR /app
COPY &lt;span class="nt"&gt;--from&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;builder /app/report &lt;span class="nb"&gt;.&lt;/span&gt;
CMD &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"./report"&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now comes the interesting part, the cross-architecture build with &lt;code&gt;buildx&lt;/code&gt;. Let's build an image for &lt;code&gt;linux/amd64&lt;/code&gt; and &lt;code&gt;linux/arm64&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker buildx build &lt;span class="nt"&gt;--platform&lt;/span&gt; linux/amd64,linux/arm64 &lt;span class="nt"&gt;-t&lt;/span&gt; foo/bar  &lt;span class="nt"&gt;--push&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;with the &lt;code&gt;--platform&lt;/code&gt; flag, you specify the platforms the image should be built for&lt;/li&gt;
&lt;li&gt;just like with &lt;code&gt;docker build&lt;/code&gt;, the &lt;code&gt;-t&lt;/code&gt; flag lets you define the image tag&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--push&lt;/code&gt; pushes the image directly to the registry after a successful build&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  A simple architecture-aware example
&lt;/h5&gt;

&lt;p&gt;Sometimes you do not want to build everything natively in the Dockerfile, for example when downloading prebuilt third-party binaries for the target architecture. Docker's &lt;code&gt;buildx&lt;/code&gt; has you covered here as well, as it exposes, for each value in &lt;code&gt;--platform&lt;/code&gt;, &lt;a href="https://docs.docker.com/engine/reference/builder/#automatic-platform-args-in-the-global-scope" rel="noopener noreferrer"&gt;build arguments like &lt;code&gt;TARGETARCH&lt;/code&gt; or &lt;code&gt;TARGETOS&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For instance, this is a &lt;a href="https://github.com/Duske/ipfs-cluster-multiarch/blob/master/Dockerfile" rel="noopener noreferrer"&gt;Dockerfile&lt;/a&gt; for a multi-architecture image for ipfs-cluster. In order to skip the compilation of the ipfs-cluster binary from source, we download the right version for our architecture via &lt;code&gt;wget&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;golang:1.12-stretch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;

&lt;span class="c"&gt;# This dockerfile builds and runs ipfs-cluster-service.&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; SUEXEC_VERSION v0.2&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; TINI_VERSION v0.16.1&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; IPFS_CLUSTER_VERSION v0.11.0&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; TARGETARCH&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-x&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd&lt;/span&gt; /tmp &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git clone https://github.com/ncopa/su-exec.git &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;su-exec &lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; make &lt;span class="se"&gt;\ &lt;/span&gt;     &lt;span class="c"&gt;### native build ###&lt;/span&gt;
  &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git checkout &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="nv"&gt;$SUEXEC_VERSION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
...

&lt;span class="k"&gt;RUN &lt;/span&gt;wget https://dist.ipfs.io/ipfs-cluster-ctl/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;IPFS_CLUSTER_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/ipfs-cluster-ctl_&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;IPFS_CLUSTER_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;_linux-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TARGETARCH&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.tar.gz 
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When built with the &lt;code&gt;--platform linux/amd64,linux/arm64,linux/arm&lt;/code&gt; flag, &lt;code&gt;TARGETARCH&lt;/code&gt; is set to &lt;code&gt;amd64&lt;/code&gt;, &lt;code&gt;arm64&lt;/code&gt;, and &lt;code&gt;arm&lt;/code&gt; in turn, so each build downloads the correct binary. Native builds are still possible, since the hardware is emulated: &lt;code&gt;su-exec&lt;/code&gt;, for instance, is simply compiled with &lt;code&gt;make&lt;/code&gt;. Nice!&lt;/p&gt;

&lt;h3&gt;
  
  
  Caveats
&lt;/h3&gt;

&lt;p&gt;As an experimental tool, &lt;code&gt;buildx&lt;/code&gt; still has some rough edges for the developer. The main issues I encountered were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can only push the images directly to a registry. A file export of the Docker image is only possible in the &lt;strong&gt;OCI format&lt;/strong&gt;. &lt;a href="https://github.com/docker/buildx/issues/166#issuecomment-544827163" rel="noopener noreferrer"&gt;Issue 166&lt;/a&gt;, &lt;a href="https://github.com/docker/buildx/issues/186" rel="noopener noreferrer"&gt;Issue 186&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;When an error is thrown while building your image, it's often very hard to decipher the error message in a multi-arch error log. If you can, first try to build with the usual &lt;code&gt;docker build&lt;/code&gt; command for your system and if it works, switch to &lt;code&gt;buildx&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Due to the emulation, the build is still orders of magnitude slower, especially if you need to compile your software's dependencies. Check for prebuilt dependencies/binaries online and load the correct version, using &lt;code&gt;TARGETARCH&lt;/code&gt; for example.&lt;/li&gt;
&lt;/ul&gt;
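&lt;p&gt;One way to soften the last caveat, at least for the compile step, is to combine &lt;code&gt;buildx&lt;/code&gt; with cross-compilation: run the compiler natively on the build host and only assemble the final image per platform. A hedged sketch for the Go report app from above (&lt;code&gt;BUILDPLATFORM&lt;/code&gt;, &lt;code&gt;TARGETOS&lt;/code&gt;, and &lt;code&gt;TARGETARCH&lt;/code&gt; are provided automatically by &lt;code&gt;buildx&lt;/code&gt;):&lt;/p&gt;

```dockerfile
# The builder stage is pinned to the host's native platform, so the Go
# compiler runs without emulation ...
FROM --platform=$BUILDPLATFORM golang:alpine AS builder
ARG TARGETOS
ARG TARGETARCH
WORKDIR /app
COPY . .
# ... and cross-compiles for the requested target instead.
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o report .

FROM busybox
WORKDIR /app
COPY --from=builder /app/report .
CMD ["./report"]
```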

&lt;h2&gt;
  
  
  Wrapping it up
&lt;/h2&gt;

&lt;p&gt;I hope this gave you a brief overview of the various techniques for multi-architecture Docker images, or at least raised your interest in them.&lt;br&gt;
It's really exciting to see that Docker is investing in simplifying this process, and while &lt;code&gt;buildx&lt;/code&gt; is still young, it's already usable.&lt;br&gt;
Besides building on your local machine, installing the user-mode emulation is also possible on CI servers, so a multi-architecture build pipeline can be set up.&lt;br&gt;
Thinking of containers as a way to package software, it's a logical next step to make that software available for various architectures as easily as possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://lwn.net/Articles/679308" rel="noopener noreferrer"&gt;containers with binfmt_misc&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://engineering.docker.com/2019/04/multi-arch-images/" rel="noopener noreferrer"&gt;docker buildx setup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://asciinema.org/a/GYOx4B88r272HWrLTyFwo156s" rel="noopener noreferrer"&gt;Demo with buildx&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@bamnet/building-multiarch-docker-images-8a70002b3476" rel="noopener noreferrer"&gt;Multiarch images with Go and without buildx&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>docker</category>
      <category>arm</category>
      <category>compile</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
