<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pramoda Sahu</title>
    <description>The latest articles on DEV Community by Pramoda Sahu (@pramod_sahu_d5bd2e6de82d1).</description>
    <link>https://dev.to/pramod_sahu_d5bd2e6de82d1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3440481%2F1cfb9605-321e-407e-9e91-64194300417c.png</url>
      <title>DEV Community: Pramoda Sahu</title>
      <link>https://dev.to/pramod_sahu_d5bd2e6de82d1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pramod_sahu_d5bd2e6de82d1"/>
    <language>en</language>
    <item>
      <title>Understanding the Agent Loop: How Tool-Using LLM Systems Actually Work</title>
      <dc:creator>Pramoda Sahu</dc:creator>
      <pubDate>Thu, 18 Jun 2026 10:21:25 +0000</pubDate>
      <link>https://dev.to/pramod_sahu_d5bd2e6de82d1/understanding-the-agent-loop-how-tool-using-llm-systems-actually-work-2mb5</link>
      <guid>https://dev.to/pramod_sahu_d5bd2e6de82d1/understanding-the-agent-loop-how-tool-using-llm-systems-actually-work-2mb5</guid>
      <description>&lt;p&gt;If you are building with tool-calling models, the most important design decision is often not the prompt. It is the loop around the model.&lt;/p&gt;

&lt;p&gt;An LLM can decide it wants to use a tool, but it cannot execute that tool by itself. The surrounding application or SDK has to assemble context, inspect the model response, run tools, append results, and continue until a final answer is produced. That runtime cycle is the &lt;strong&gt;agent loop&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article explains what the agent loop actually is, where the model stops and the harness begins, how tool calling works step by step, and which engineering tradeoffs show up once you move beyond demos.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;An agent loop is the execution cycle that lets a model inspect context, request tools, observe results, and continue until it reaches a final answer.&lt;/li&gt;
&lt;li&gt;The model is only one part of the system. The harness or SDK owns orchestration: prompt assembly, tool execution, retries, approvals, and termination.&lt;/li&gt;
&lt;li&gt;State management matters as much as prompting. If you lose prior tool outputs or conversation continuity, the agent will behave like it forgot what just happened.&lt;/li&gt;
&lt;li&gt;Performance depends heavily on prompt growth control, stable prompt prefixes, caching, and bounded tool output.&lt;/li&gt;
&lt;li&gt;Safe agent design requires validation, approval gates for side effects, and clear rules for concurrency and history propagation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Agent Loop Is the System, Not Just the Model
&lt;/h2&gt;

&lt;p&gt;The core problem is simple: a one-shot model call cannot inspect the world, act on it, and adapt to the result unless something outside the model manages that cycle.&lt;/p&gt;

&lt;p&gt;That is the harness's job.&lt;/p&gt;

&lt;p&gt;OpenAI's Codex architecture describes a user interaction as a turn, but a single turn may contain multiple internal iterations of model inference and tool execution. The OpenAI Agents SDK describes the same idea directly: invoke the agent, check whether there is final output, handle handoffs if needed, otherwise execute tool calls and re-run.&lt;/p&gt;

&lt;p&gt;A practical mental model looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build the input state.&lt;/li&gt;
&lt;li&gt;Call the model.&lt;/li&gt;
&lt;li&gt;Inspect the response.&lt;/li&gt;
&lt;li&gt;If the model requested tools, validate and execute them.&lt;/li&gt;
&lt;li&gt;Append tool results back into context.&lt;/li&gt;
&lt;li&gt;Call the model again.&lt;/li&gt;
&lt;li&gt;Stop only when the model returns a final answer.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That means the harness, not the model alone, is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt assembly&lt;/li&gt;
&lt;li&gt;Message history management&lt;/li&gt;
&lt;li&gt;Tool schema registration&lt;/li&gt;
&lt;li&gt;Tool execution&lt;/li&gt;
&lt;li&gt;Validation and error handling&lt;/li&gt;
&lt;li&gt;Retry logic&lt;/li&gt;
&lt;li&gt;Approval workflows&lt;/li&gt;
&lt;li&gt;State persistence&lt;/li&gt;
&lt;li&gt;Loop termination&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why two systems using the same model can behave very differently. Their harnesses may make different decisions about context, tool ordering, truncation, approvals, and continuation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Goes Into a Single Turn
&lt;/h2&gt;

&lt;p&gt;Before the loop can run, the system needs to define what the model sees.&lt;/p&gt;

&lt;h3&gt;
  
  
  The input state
&lt;/h3&gt;

&lt;p&gt;A typical turn includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System or developer instructions&lt;/li&gt;
&lt;li&gt;Tool definitions or schemas&lt;/li&gt;
&lt;li&gt;Previous messages&lt;/li&gt;
&lt;li&gt;Previous tool-call results&lt;/li&gt;
&lt;li&gt;The current user request&lt;/li&gt;
&lt;li&gt;Sometimes environment state, session metadata, or hidden runtime instructions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because follow-up reasoning depends on prior observations being present. If the model requested a tool in one iteration and the result is not added back correctly, the next iteration cannot build on that work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inner loop vs outer loop
&lt;/h3&gt;

&lt;p&gt;There are really two loops to think about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inner loop&lt;/strong&gt;: model inference and tool execution inside a single user turn&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outer loop&lt;/strong&gt;: the broader multi-turn conversation across user follow-ups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This distinction shows up clearly in Codex-style architectures. A user asks for something once, but the agent may internally perform several tool steps before replying. Then the next user message arrives, and the entire conversation thread continues from that accumulated state.&lt;/p&gt;

&lt;p&gt;That is why state continuity is not optional. Without it, the outer loop breaks and the inner loop starts reasoning from an incomplete view of reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Model Decides Between Text and a Tool Call
&lt;/h2&gt;

&lt;p&gt;Once the harness provides the current turn state, the model has a decision boundary: answer directly, or request one or more tools.&lt;/p&gt;

&lt;p&gt;Tool calling works because the model is given structured tool definitions. Instead of producing only natural language, it can emit a structured request indicating which tool it wants and which arguments it wants to pass.&lt;/p&gt;

&lt;p&gt;At that point, the model is effectively yielding control back to the application.&lt;/p&gt;

&lt;p&gt;With custom tools, the client harness must take over, run the tool, and return the result. With hosted tools, more of that orchestration can happen inside the API itself.&lt;/p&gt;

&lt;p&gt;This is an important architectural choice:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool type&lt;/th&gt;
&lt;th&gt;Who orchestrates execution?&lt;/th&gt;
&lt;th&gt;Main tradeoff&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hosted tool&lt;/td&gt;
&lt;td&gt;API/runtime handles more of the loop&lt;/td&gt;
&lt;td&gt;Simpler orchestration, less direct control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom function tool&lt;/td&gt;
&lt;td&gt;Client harness executes it&lt;/td&gt;
&lt;td&gt;More flexibility, more operational responsibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP tool&lt;/td&gt;
&lt;td&gt;Depends on integration and discovery flow&lt;/td&gt;
&lt;td&gt;Adds discovery and caching concerns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The advantage of client-side orchestration is control. The cost is that you now own the failure modes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Execution Mechanics in Practice
&lt;/h2&gt;

&lt;p&gt;Once the model emits a tool request, the harness needs to do more than just run it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Validate before execution
&lt;/h3&gt;

&lt;p&gt;A safe harness should validate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool name&lt;/li&gt;
&lt;li&gt;Argument structure&lt;/li&gt;
&lt;li&gt;Argument types&lt;/li&gt;
&lt;li&gt;Permission rules&lt;/li&gt;
&lt;li&gt;Whether the tool is read-only or mutating&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not just a security concern. It is also a quality concern. If the model asks for a tool with invalid arguments, returning an explicit tool error often gives it enough signal to self-correct on the next loop iteration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Return the observation in the right format
&lt;/h3&gt;

&lt;p&gt;The model needs a structured observation that closes the action-observation cycle.&lt;/p&gt;

&lt;p&gt;A minimal pattern looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;initial_question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;MODEL_DEFAULTS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;function_responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_functions_from_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_responses&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;More reasoning required, continuing...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;function_responses&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;previous_response_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;MODEL_DEFAULTS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key detail is not just the loop itself. It is that the next request continues from the previous response and includes the tool outputs produced by the harness.&lt;/p&gt;

&lt;p&gt;A more explicit observation payload looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function_call_output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;call_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;response_2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;o3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reasoning.encrypted_content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;function_call_output&lt;/code&gt; item is the observation that lets the model continue reasoning with the tool result now available in context.&lt;/p&gt;

&lt;h2&gt;
  
  
  State Management Patterns: Where Many Agents Fail
&lt;/h2&gt;

&lt;p&gt;One of the easiest ways to break an agent is to lose state continuity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common state strategies
&lt;/h3&gt;

&lt;p&gt;There are several patterns in current OpenAI tooling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full history replay managed by the client&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;previous_response_id&lt;/code&gt; for server-managed continuation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;conversation_id&lt;/code&gt; for conversation continuity&lt;/li&gt;
&lt;li&gt;SDK-managed session persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each approach has tradeoffs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Full replay vs server-managed continuation
&lt;/h3&gt;

&lt;p&gt;With full replay, the client sends all prior messages and tool results every time. This is simple to reason about, but payload size grows quickly.&lt;/p&gt;

&lt;p&gt;With server-managed continuation, the client can send the new input along with a continuation identifier such as &lt;code&gt;previous_response_id&lt;/code&gt;. That reduces payload size and offloads some history management.&lt;/p&gt;

&lt;p&gt;This example from the Agents SDK shows response chaining:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reply very concisely.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;previous_response_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Setting auto_previous_response_id=True enables response chaining
&lt;/span&gt;        &lt;span class="c1"&gt;# automatically for the first turn, even when there is no actual
&lt;/span&gt;        &lt;span class="c1"&gt;# previous response ID yet.
&lt;/span&gt;        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;previous_response_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;previous_response_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;auto_previous_response_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;previous_response_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_response_id&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Assistant: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final_output&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is convenient, but you still need to choose a consistent state strategy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do not mix incompatible modes
&lt;/h3&gt;

&lt;p&gt;The Agents SDK documentation explicitly warns against combining session persistence with &lt;code&gt;conversation_id&lt;/code&gt;, &lt;code&gt;previous_response_id&lt;/code&gt;, or &lt;code&gt;auto_previous_response_id&lt;/code&gt; in the same run path.&lt;/p&gt;

&lt;p&gt;That is a practical design rule: pick one continuity model per call flow.&lt;/p&gt;

&lt;p&gt;If you mix them, debugging becomes much harder because it is no longer obvious which state the model is actually seeing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt Growth, Caching, and Why Stable Prefixes Matter
&lt;/h2&gt;

&lt;p&gt;As the loop continues, context grows.&lt;/p&gt;

&lt;p&gt;Every new model call may include prior instructions, tool schemas, user messages, and tool outputs. If you simply keep appending everything forever, the number of bytes sent over the lifetime of a conversation can grow quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Codex emphasizes prompt prefixes
&lt;/h3&gt;

&lt;p&gt;The Codex architecture discussion highlights a useful principle: keep old prompt content as an exact prefix of the new prompt whenever possible. That improves prompt-cache reuse.&lt;/p&gt;

&lt;p&gt;In practical terms, stable ordering matters for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System instructions&lt;/li&gt;
&lt;li&gt;Tool definitions&lt;/li&gt;
&lt;li&gt;Environment metadata&lt;/li&gt;
&lt;li&gt;Prior messages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If these move around between calls, cacheability drops. The same issue affects reproducibility. Even tool-definition ordering bugs can introduce cache misses and inconsistent behavior.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compaction strategies
&lt;/h3&gt;

&lt;p&gt;A production harness usually needs some combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Truncating verbose tool output&lt;/li&gt;
&lt;li&gt;Summarizing old history&lt;/li&gt;
&lt;li&gt;Keeping static instructions stable and early&lt;/li&gt;
&lt;li&gt;Bounding shell or retrieval output&lt;/li&gt;
&lt;li&gt;Preserving only the most relevant observations verbatim&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters even more for shell, retrieval, or computer-use tasks, where output can become noisy very quickly.&lt;/p&gt;

&lt;p&gt;The goal is not just lower cost. It is maintaining a usable reasoning substrate for the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Safety and Control in the Loop
&lt;/h2&gt;

&lt;p&gt;The more powerful the tools, the more important the harness becomes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Approval gates and side effects
&lt;/h3&gt;

&lt;p&gt;Read-only tool calls are different from side-effectful operations.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fetching documentation is relatively low risk&lt;/li&gt;
&lt;li&gt;Sending an email, editing a file, or executing a deployment is high risk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mutating actions should often be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serialized instead of run concurrently&lt;/li&gt;
&lt;li&gt;Approval-gated&lt;/li&gt;
&lt;li&gt;Sandboxed when possible&lt;/li&gt;
&lt;li&gt;Logged with enough metadata for auditability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is one reason agent frameworks expose concurrency settings and approval workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Validate arguments, not intentions
&lt;/h3&gt;

&lt;p&gt;You cannot safely assume that a tool request is correct just because it came from the model. Validate the arguments before execution, and return structured error feedback when something is wrong.&lt;/p&gt;

&lt;p&gt;That gives the loop a chance to recover without silently doing the wrong thing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do not over-prompt reasoning models
&lt;/h3&gt;

&lt;p&gt;OpenAI's function-calling guidance for reasoning models notes that you should not force extra "think more before every function call" prompting. Reasoning models already perform internal reasoning, and excessive prompting can degrade performance.&lt;/p&gt;

&lt;p&gt;That is a useful reminder that harness quality is often more important than prompt verbosity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Agent Extensions and Their Tradeoffs
&lt;/h2&gt;

&lt;p&gt;Once a single-agent loop works, teams often add handoffs or agent-as-tool patterns.&lt;/p&gt;

&lt;p&gt;Conceptually, the loop stays the same:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Invoke one agent.&lt;/li&gt;
&lt;li&gt;Detect whether it produced final output, a tool request, or a handoff.&lt;/li&gt;
&lt;li&gt;Route execution accordingly.&lt;/li&gt;
&lt;li&gt;Continue until termination.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Agents SDK summarizes the semantics clearly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The agent will run in a loop until a final output is generated. The loop runs like so:

1. The agent is invoked with the given input.
2. If there is a final output (i.e. the agent produces something of type `agent.output_type`), the loop terminates.
3. If there's a handoff, we run the loop again, with the new agent.
4. Else, we run tool calls (if any), and re-run the loop.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The tricky part is not the idea of handoffs. It is history propagation.&lt;/p&gt;

&lt;p&gt;Recent community discussions show that when one agent is exposed as a tool to another, developers are often unsure how much history is forwarded automatically. In practice, this means you should not assume that all relevant context follows the handoff unless your framework explicitly guarantees it.&lt;/p&gt;

&lt;p&gt;For multi-agent systems, explicit context composition is often safer than implicit inheritance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Failure Modes and Debugging Strategies
&lt;/h2&gt;

&lt;p&gt;Most agent bugs look obvious in hindsight.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure mode 1: losing continuity
&lt;/h3&gt;

&lt;p&gt;Symptoms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent repeats itself&lt;/li&gt;
&lt;li&gt;It forgets prior tool results&lt;/li&gt;
&lt;li&gt;MCP tool discovery keeps happening again&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check whether you are correctly passing &lt;code&gt;previous_response_id&lt;/code&gt;, &lt;code&gt;conversation_id&lt;/code&gt;, or full message history.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure mode 2: context flooding
&lt;/h3&gt;

&lt;p&gt;Symptoms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long, low-quality responses&lt;/li&gt;
&lt;li&gt;Poor tool selection&lt;/li&gt;
&lt;li&gt;The model misses relevant facts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check whether tool output is too verbose. Cap output size, summarize logs, and keep only useful observations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure mode 3: unstable prompt construction
&lt;/h3&gt;

&lt;p&gt;Symptoms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cache misses&lt;/li&gt;
&lt;li&gt;Inconsistent behavior across similar runs&lt;/li&gt;
&lt;li&gt;Higher token usage than expected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check the ordering of instructions, tool schemas, and environment metadata.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure mode 4: unsafe tool execution
&lt;/h3&gt;

&lt;p&gt;Symptoms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Invalid API calls&lt;/li&gt;
&lt;li&gt;Accidental side effects&lt;/li&gt;
&lt;li&gt;Hard-to-reproduce failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Validate tool names and arguments before execution. Treat tool requests as proposals, not commands.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure mode 5: incorrect concurrency
&lt;/h3&gt;

&lt;p&gt;Symptoms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Race conditions&lt;/li&gt;
&lt;li&gt;Conflicting writes&lt;/li&gt;
&lt;li&gt;Non-deterministic outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run read-only operations concurrently only when safe. Serialize or approval-gate mutating operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Architecture Takeaways
&lt;/h2&gt;

&lt;p&gt;The recent OpenAI ecosystem changes make one thing clear: the important boundary is no longer just model prompting. It is orchestration design.&lt;/p&gt;

&lt;p&gt;The Responses API, Agents SDK, MCP integrations, and Codex harness examples all point to the same execution model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model chooses actions&lt;/li&gt;
&lt;li&gt;The harness controls reality&lt;/li&gt;
&lt;li&gt;State continuity determines coherence&lt;/li&gt;
&lt;li&gt;Prompt discipline determines scalability&lt;/li&gt;
&lt;li&gt;Safety controls determine whether the system is usable in practice&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are building an agent today, the fastest path to a better system is often not a new prompt. It is a better loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The agent loop is the action-observation cycle that makes tool-using LLM systems possible.&lt;/li&gt;
&lt;li&gt;The harness owns orchestration: context assembly, tool execution, validation, retries, approvals, and termination.&lt;/li&gt;
&lt;li&gt;State continuity is critical. Losing prior responses or tool outputs breaks reasoning quality quickly.&lt;/li&gt;
&lt;li&gt;Server-managed continuation can simplify history handling, but you should choose one state strategy consistently.&lt;/li&gt;
&lt;li&gt;Prompt growth is an engineering problem. Stable prefixes, truncation, compaction, and bounded tool output all matter.&lt;/li&gt;
&lt;li&gt;Hosted tools and custom tools shift the orchestration boundary in different ways.&lt;/li&gt;
&lt;li&gt;Multi-agent patterns introduce history propagation and control-flow complexity that should be designed explicitly.&lt;/li&gt;
&lt;li&gt;Safe execution requires argument validation, side-effect controls, and careful concurrency handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;p&gt;If you want to go deeper, these resources are worth reading next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI function-calling guide&lt;/li&gt;
&lt;li&gt;OpenAI reasoning function-calls cookbook&lt;/li&gt;
&lt;li&gt;OpenAI Agents SDK running agents documentation&lt;/li&gt;
&lt;li&gt;OpenAI's Codex architecture write-up on the agent loop&lt;/li&gt;
&lt;li&gt;OpenAI MCP tool guide&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;An agent loop is not a small implementation detail. It is the core runtime pattern that turns a model into a working system.&lt;/p&gt;

&lt;p&gt;Once you see the loop clearly, many design decisions make more sense: why history management matters, why tool output must be bounded, why prompt ordering affects cacheability, and why side effects need approval and validation.&lt;/p&gt;

&lt;p&gt;If you are building with tool-calling models, make the loop explicit first. Define how state is carried forward, how tools are validated, how observations are appended, and how the run terminates. In practice, that foundation will usually improve reliability more than any prompt tweak.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Understanding Pi Coding Agent: A Minimal, Extensible Architecture for Terminal-First AI Coding Workflow</title>
      <dc:creator>Pramoda Sahu</dc:creator>
      <pubDate>Thu, 18 Jun 2026 09:17:18 +0000</pubDate>
      <link>https://dev.to/pramod_sahu_d5bd2e6de82d1/understanding-pi-coding-agent-a-minimal-extensible-architecture-for-terminal-first-ai-coding-40d4</link>
      <guid>https://dev.to/pramod_sahu_d5bd2e6de82d1/understanding-pi-coding-agent-a-minimal-extensible-architecture-for-terminal-first-ai-coding-40d4</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pi Coding Agent is built as a layered TypeScript toolkit, not a sealed coding assistant product.&lt;/li&gt;
&lt;li&gt;Its architecture separates provider access, agent runtime, coding workflow, and terminal UI into distinct packages.&lt;/li&gt;
&lt;li&gt;Context engineering is a first-class feature through &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;SYSTEM.md&lt;/code&gt;, &lt;code&gt;APPEND_SYSTEM.md&lt;/code&gt;, skills, and extension hooks.&lt;/li&gt;
&lt;li&gt;Pi can run interactively, headlessly over JSONL RPC, or be embedded through its SDK using the same underlying runtime.&lt;/li&gt;
&lt;li&gt;The flexibility comes with tradeoffs: no built-in sandbox, strict RPC framing rules, and extension authors need to understand trust and compaction behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Most coding agents present themselves as finished products: you install them, learn their commands, and work within the boundaries the authors chose. That can be fine if the built-in workflow matches your needs. It becomes limiting when you want to change how prompts are assembled, how tools are registered, how sessions are summarized, or how the agent is embedded inside your own application.&lt;/p&gt;

&lt;p&gt;Pi Coding Agent takes a different path.&lt;/p&gt;

&lt;p&gt;Based on the official Pi homepage, documentation, and repository, Pi from Earendil Works is better understood as a minimal agent harness with a coding-oriented runtime than as a fixed end-user product. It ships with useful defaults, but its architecture assumes users may want to replace or extend large parts of the workflow. The project explicitly positions advanced behavior such as plan-like workflows, extra commands, and other higher-level capabilities as things that can live in extensions or packages instead of being hardcoded into the core.&lt;/p&gt;

&lt;p&gt;That design choice matters for engineers building AI tooling. It affects maintainability, portability, and how easily the system can adapt to terminals, IDE wrappers, automation pipelines, or internal developer platforms.&lt;/p&gt;

&lt;p&gt;In this article, we will look at how Pi is structured, why its layering matters, how its context pipeline works, and what tradeoffs appear once you start using extensions, RPC mode, or SDK embedding.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem Pi is trying to solve
&lt;/h2&gt;

&lt;p&gt;A coding agent has to do several jobs at once:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Talk to one or more model providers.&lt;/li&gt;
&lt;li&gt;Maintain an agent loop with tool calls and state.&lt;/li&gt;
&lt;li&gt;Manage coding-specific concerns such as filesystem access, shell execution, session history, and context limits.&lt;/li&gt;
&lt;li&gt;Provide a user interface or integration surface.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Many tools solve all four inside one tightly coupled application. That can make the initial experience simple, but it often makes customization expensive. If you want to change prompt composition or session summarization, you may end up forking the project or working against internal assumptions.&lt;/p&gt;

&lt;p&gt;Pi’s architecture addresses this by splitting responsibilities into layers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pi stack: four layers instead of one monolith
&lt;/h2&gt;

&lt;p&gt;According to the repository README, Pi is organized as a monorepo with distinct packages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;@earendil-works/pi-ai&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;@earendil-works/pi-agent-core&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;@earendil-works/pi-coding-agent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;@earendil-works/pi-tui&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This package split is the clearest way to understand the system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: &lt;code&gt;pi-ai&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the provider abstraction layer. Its role is to present a unified interface across multiple model providers.&lt;/p&gt;

&lt;p&gt;Why this layer exists:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent loop should not depend directly on one provider SDK.&lt;/li&gt;
&lt;li&gt;Provider switching should not require rewriting the coding runtime.&lt;/li&gt;
&lt;li&gt;Frontends and extension systems should remain provider-agnostic where possible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a standard but important decision. If provider-specific details leak into higher layers, the whole system becomes harder to test and evolve.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: &lt;code&gt;pi-agent-core&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the runtime layer for core agent behavior, including tool calling and state management.&lt;/p&gt;

&lt;p&gt;Why this matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool execution is a runtime concern, not a terminal UI concern.&lt;/li&gt;
&lt;li&gt;State transitions in the loop should be reusable in both CLI and embedded modes.&lt;/li&gt;
&lt;li&gt;A headless integration should get the same agent behavior as the interactive one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Architecturally, this is the part that keeps Pi from being “just a CLI.”&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: &lt;code&gt;pi-coding-agent&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is where Pi becomes a coding agent rather than a generic agent harness.&lt;/p&gt;

&lt;p&gt;This layer includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;coding workflow behavior&lt;/li&gt;
&lt;li&gt;sessions and persistence&lt;/li&gt;
&lt;li&gt;built-in file and shell tools&lt;/li&gt;
&lt;li&gt;compaction and summarization&lt;/li&gt;
&lt;li&gt;extensions&lt;/li&gt;
&lt;li&gt;skills&lt;/li&gt;
&lt;li&gt;mode-specific runtime assembly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This package is the operational center of the project. It contains the logic that most users think of as “Pi,” while still remaining separable from the lower-level runtime and the higher-level UI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 4: &lt;code&gt;pi-tui&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the terminal UI layer.&lt;/p&gt;

&lt;p&gt;Its presence as a distinct package is important because it suggests the user interface is not the agent itself. The same runtime can support different frontends.&lt;/p&gt;

&lt;p&gt;That leads directly to one of Pi’s strongest architectural decisions: frontend/runtime separation.&lt;/p&gt;

&lt;h2&gt;
  
  
  One runtime, multiple modes
&lt;/h2&gt;

&lt;p&gt;The official docs describe four major usage modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;interactive&lt;/li&gt;
&lt;li&gt;print/JSON&lt;/li&gt;
&lt;li&gt;RPC&lt;/li&gt;
&lt;li&gt;SDK embedding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means Pi is not tied to its terminal interface, even if the terminal is the primary experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interactive mode
&lt;/h3&gt;

&lt;p&gt;This is the user-facing CLI workflow most people will start with. It combines the runtime with the terminal UI and built-in commands.&lt;/p&gt;

&lt;h3&gt;
  
  
  Print and JSON modes
&lt;/h3&gt;

&lt;p&gt;These modes are useful for automation or simple scripting where you want structured output without a long-lived interactive session.&lt;/p&gt;

&lt;h3&gt;
  
  
  RPC mode
&lt;/h3&gt;

&lt;p&gt;RPC mode exposes Pi through a JSONL protocol over stdin/stdout. This is the mode that makes IDE integrations, editor plugins, and service wrappers plausible without reimplementing the core runtime.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pi &lt;span class="nt"&gt;--mode&lt;/span&gt; rpc &lt;span class="o"&gt;[&lt;/span&gt;options]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"req-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hello, world!"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a strong design choice because subprocess embedding is often the easiest integration path for tools written in another language or running in another environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  SDK mode
&lt;/h3&gt;

&lt;p&gt;For Node.js and TypeScript applications, Pi can be embedded in-process through its SDK.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;CreateAgentSessionRuntimeFactory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;createAgentSessionFromServices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;createAgentSessionRuntime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;createAgentSessionServices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;getAgentDir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;runRpcMode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;SessionManager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@earendil-works/pi-coding-agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;createRuntime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CreateAgentSessionRuntimeFactory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionManager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionStartEvent&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;services&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createAgentSessionServices&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;cwd&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;...(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createAgentSessionFromServices&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;services&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;sessionManager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;sessionStartEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})),&lt;/span&gt;
    &lt;span class="nx"&gt;services&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;diagnostics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;services&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;diagnostics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;runtime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createAgentSessionRuntime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;createRuntime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;agentDir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getAgentDir&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;sessionManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SessionManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runRpcMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This snippet shows the decomposition clearly: services, session manager, runtime creation, then a mode runner on top.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core runtime flow: prompt, tools, persistence, compaction
&lt;/h2&gt;

&lt;p&gt;For AI agents, architecture is really about workflow under constraints. Pi’s runtime appears to follow a loop like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load startup context and trust-sensitive configuration.&lt;/li&gt;
&lt;li&gt;Assemble the system prompt and working context.&lt;/li&gt;
&lt;li&gt;Run extension hooks before the model call.&lt;/li&gt;
&lt;li&gt;Send the provider request.&lt;/li&gt;
&lt;li&gt;Receive model output, including possible tool calls.&lt;/li&gt;
&lt;li&gt;Execute tool calls and attach results.&lt;/li&gt;
&lt;li&gt;Repeat until the assistant completes.&lt;/li&gt;
&lt;li&gt;Persist session entries.&lt;/li&gt;
&lt;li&gt;Compact older context when token pressure increases.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The interesting part is that this pipeline is not fully hardcoded. The extension system lets you intercept multiple stages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extension hooks make the loop observable and adjustable
&lt;/h3&gt;

&lt;p&gt;The extension docs describe lifecycle events around startup, provider requests, tool calls, compaction, tree navigation, and shutdown. Examples mentioned in the source material include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;session_start&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;before_agent_start&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_call&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;before_provider_request&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;after_provider_response&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session_before_compact&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session_compact&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session_before_tree&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session_tree&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;session_shutdown&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That event model suggests a publish/subscribe architecture around the core loop instead of a single monolithic pipeline. This is one of the biggest reasons Pi feels more like a toolkit than a product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context engineering is built into the architecture
&lt;/h2&gt;

&lt;p&gt;A lot of agent systems treat prompt engineering as text pasted into a config file. Pi treats it as infrastructure.&lt;/p&gt;

&lt;p&gt;According to the docs and homepage, Pi can load:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AGENTS.md&lt;/code&gt; and &lt;code&gt;CLAUDE.md&lt;/code&gt; from user/global and project directories&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SYSTEM.md&lt;/code&gt; to replace the default system prompt&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;APPEND_SYSTEM.md&lt;/code&gt; to append to it&lt;/li&gt;
&lt;li&gt;skills loaded on demand&lt;/li&gt;
&lt;li&gt;prompt templates&lt;/li&gt;
&lt;li&gt;extension-provided prompt modifications&lt;/li&gt;
&lt;li&gt;project trust state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a minor convenience feature. It changes how the system is operated.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why on-demand skills matter
&lt;/h3&gt;

&lt;p&gt;Skills are loaded only when needed instead of always being included in the prompt. That helps avoid bloating context windows and prompt caches.&lt;/p&gt;

&lt;p&gt;This is a practical tradeoff:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always-loaded instructions are simpler.&lt;/li&gt;
&lt;li&gt;On-demand loading is more efficient and gives finer control.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pi chooses the second option, which fits its broader design: minimal default core, dynamic behavior at runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt customization through extensions
&lt;/h3&gt;

&lt;p&gt;Pi also allows extensions to modify the assembled system prompt before model execution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;promptCustomizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ExtensionAPI&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;before_agent_start&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;systemPromptOptions&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;customPrompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;addToolGuidance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;systemPromptOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;appendSection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mergeWithUserAppend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;systemPromptOptions&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;customPrompt&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;appendSection&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a strong example of Pi’s philosophy. Prompt composition is not just a file-loading step; it is part of the runtime and open to modification.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sessions, JSONL persistence, and branching
&lt;/h2&gt;

&lt;p&gt;Pi stores sessions in JSONL and supports commands such as &lt;code&gt;/resume&lt;/code&gt;, &lt;code&gt;/new&lt;/code&gt;, &lt;code&gt;/tree&lt;/code&gt;, &lt;code&gt;/fork&lt;/code&gt;, and &lt;code&gt;/clone&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That combination implies that the session model is not a flat transcript. It supports branching workflows where a user can explore alternate paths.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why JSONL is a sensible choice
&lt;/h3&gt;

&lt;p&gt;JSONL is a practical format for agent session storage because it is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;append-friendly&lt;/li&gt;
&lt;li&gt;easy to inspect&lt;/li&gt;
&lt;li&gt;easy to process line by line&lt;/li&gt;
&lt;li&gt;convenient for event-like histories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For terminal-first tools, that is often a better fit than requiring a heavier database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Branching changes the context story
&lt;/h3&gt;

&lt;p&gt;The source material notes that branch summarization is used when switching branches so that context from the abandoned branch can be injected into the new branch’s working context.&lt;/p&gt;

&lt;p&gt;That matters because branching is not just a UI feature. It affects memory and continuity.&lt;/p&gt;

&lt;p&gt;Pi also distinguishes between full history and in-memory working context. Compaction affects the latter, not the underlying stored session history. That is an important operational detail if you are debugging behavior or writing extensions that depend on prior entries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compaction is not just token trimming
&lt;/h2&gt;

&lt;p&gt;Most agent systems eventually need summarization because context windows are finite. Pi exposes compaction as a visible architectural feature rather than hiding it as internal bookkeeping.&lt;/p&gt;

&lt;p&gt;The docs describe two summarization mechanisms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;auto/manual compaction&lt;/li&gt;
&lt;li&gt;branch summarization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They also define cut-point rules. For example, tool results must remain attached to their tool calls, so valid compaction boundaries are restricted.&lt;/p&gt;

&lt;p&gt;That is exactly the kind of implementation detail extension authors need to know. If your extension assumes history can be split anywhere, you may break tool-call coherence.&lt;/p&gt;

&lt;p&gt;Pi even allows custom compaction logic through hooks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;session_before_compact&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;preparation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;branchEntries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;customInstructions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;signal&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Cancel:&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="c1"&gt;// Custom summary:&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;compaction&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;firstKeptEntryId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;preparation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;firstKeptEntryId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tokensBefore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;preparation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokensBefore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes compaction a policy surface, not just an implementation detail.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tradeoffs of customizable compaction
&lt;/h3&gt;

&lt;p&gt;The flexibility is useful, but it increases the burden on extension authors.&lt;/p&gt;

&lt;p&gt;You need to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;firstKeptEntryId&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tokensBefore&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;serialized and truncated tool outputs&lt;/li&gt;
&lt;li&gt;valid cut points&lt;/li&gt;
&lt;li&gt;how repeated compactions relate to earlier kept boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you ignore those details, summaries may be technically valid but operationally misleading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Extensions are the real center of Pi’s design
&lt;/h2&gt;

&lt;p&gt;Pi’s homepage explicitly says it skips some built-in features and expects users to add them through extensions or packages. That is one of the most unusual and important aspects of the project.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dynamic tool registration
&lt;/h3&gt;

&lt;p&gt;Tools are not fixed at compile time. An extension can register them during session startup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ExtensionAPI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@earendil-works/pi-coding-agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Type&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;typebox&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ECHO_PARAMS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Message to echo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;dynamicToolsExtension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ExtensionAPI&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;registeredToolNames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Set&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;registerEchoTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;registeredToolNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;registeredToolNames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;registerTool&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Echo a message with prefix: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;promptSnippet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Echo back user-provided text with &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt; prefix`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;promptGuidelines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Use echo_session when the user asks for exact echo output.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ECHO_PARAMS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_toolCallId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;}${&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
          &lt;span class="na"&gt;details&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;prefix&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="nx"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;session_start&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;registerEchoTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;echo_session&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Echo Session&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;[session] &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ui&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;notify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Registered dynamic tool: echo_session&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;info&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a clear signal that Pi’s workflow surface is intended to be extended, not merely configured.&lt;/p&gt;

&lt;h3&gt;
  
  
  What extensions can change
&lt;/h3&gt;

&lt;p&gt;Based on the provided material, extensions can influence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;commands&lt;/li&gt;
&lt;li&gt;tools&lt;/li&gt;
&lt;li&gt;provider request/response handling&lt;/li&gt;
&lt;li&gt;prompt assembly&lt;/li&gt;
&lt;li&gt;compaction behavior&lt;/li&gt;
&lt;li&gt;tree navigation behavior&lt;/li&gt;
&lt;li&gt;UI interactions&lt;/li&gt;
&lt;li&gt;workflow logic around session lifecycle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is unusually broad. It also explains why Pi can remain small at the core while still supporting highly specialized workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Headless integrations: RPC mode and its sharp edges
&lt;/h2&gt;

&lt;p&gt;RPC mode is one of Pi’s most practical features for teams building wrappers or custom frontends. But the protocol details matter.&lt;/p&gt;

&lt;p&gt;The docs specify strict JSONL semantics with LF as the record delimiter.&lt;/p&gt;

&lt;p&gt;The source material calls out a concrete gotcha: Node’s &lt;code&gt;readline&lt;/code&gt; is not protocol-compliant for this use case because it can split on Unicode line separators such as &lt;code&gt;U+2028&lt;/code&gt; and &lt;code&gt;U+2029&lt;/code&gt;, which are valid inside JSON strings.&lt;/p&gt;

&lt;p&gt;That means a robust client should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;split records on &lt;code&gt;\n&lt;/code&gt; only&lt;/li&gt;
&lt;li&gt;accept optional &lt;code&gt;\r\n&lt;/code&gt; by stripping the trailing &lt;code&gt;\r&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;avoid generic line readers that reinterpret other Unicode characters as line boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a good example of a small but important systems detail. If you are embedding Pi inside an editor extension or orchestrator, protocol correctness matters more than convenience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security and operational concerns
&lt;/h2&gt;

&lt;p&gt;Pi’s flexibility does not remove operational risk.&lt;/p&gt;

&lt;h3&gt;
  
  
  No built-in sandbox
&lt;/h3&gt;

&lt;p&gt;The repository README states that Pi does not provide a built-in permission system for filesystem, process, network, or credential access. It runs with the launching user’s permissions.&lt;/p&gt;

&lt;p&gt;That has an obvious implication: if you need stronger isolation, you should containerize or otherwise sandbox it externally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Trust model affects what loads
&lt;/h3&gt;

&lt;p&gt;Before trust is granted, Pi loads only a subset of context and extension sources. According to the docs, project-local extensions, package-managed project extensions, and project settings are loaded only after trust resolution.&lt;/p&gt;

&lt;p&gt;In non-interactive modes, trust prompts are not shown, so automation behavior depends on defaults or explicit CLI overrides.&lt;/p&gt;

&lt;p&gt;If you are building tooling around Pi, document this clearly. Otherwise, a project may behave differently in interactive use versus CI-like or subprocess-driven environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  Extension lifecycle resets on fork and clone
&lt;/h3&gt;

&lt;p&gt;After &lt;code&gt;/fork&lt;/code&gt; or &lt;code&gt;/clone&lt;/code&gt;, Pi emits &lt;code&gt;session_shutdown&lt;/code&gt; for the old extension instance, reloads and rebinds extensions, and then emits &lt;code&gt;session_start&lt;/code&gt; for the new session.&lt;/p&gt;

&lt;p&gt;That means in-memory extension state is not automatically preserved. If state matters, persist it into session entries or rebuild it during startup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this architecture matters in practice
&lt;/h2&gt;

&lt;p&gt;Pi’s design is especially useful when you need one of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a terminal-first agent that is still scriptable&lt;/li&gt;
&lt;li&gt;a reusable runtime for editor or service integration&lt;/li&gt;
&lt;li&gt;custom prompt assembly without forking the core project&lt;/li&gt;
&lt;li&gt;organization-specific commands, tools, or policies through extensions&lt;/li&gt;
&lt;li&gt;session storage that is inspectable and easy to process&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, Pi is less about delivering one ideal workflow and more about providing a stable substrate for many workflows.&lt;/p&gt;

&lt;p&gt;That is the real architectural difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pi is best understood as a layered toolkit for coding agents, not a fixed assistant product.&lt;/li&gt;
&lt;li&gt;The package split separates providers, agent runtime, coding workflow, and terminal UI in a clean way.&lt;/li&gt;
&lt;li&gt;Context engineering is deeply integrated through files, skills, prompt templates, and hooks.&lt;/li&gt;
&lt;li&gt;Sessions are durable and branch-aware through JSONL persistence and summarization mechanisms.&lt;/li&gt;
&lt;li&gt;Extensions are central to the design and can reshape tools, prompts, compaction, and workflow behavior.&lt;/li&gt;
&lt;li&gt;RPC and SDK modes make the same runtime usable in terminals, subprocess integrations, and custom applications.&lt;/li&gt;
&lt;li&gt;Operational safety is your responsibility: sandboxing, trust configuration, and extension-state handling all need deliberate design.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Pi Coding Agent stands out because it treats extensibility as the default architecture rather than an afterthought. The minimal core is not a limitation by accident; it is the mechanism that keeps the system adaptable.&lt;/p&gt;

&lt;p&gt;That makes Pi especially interesting for engineers who want more than a terminal chatbot. If you need a coding agent that can be embedded, wrapped, or reshaped without forking the entire application, Pi’s layered design is worth studying.&lt;/p&gt;

&lt;p&gt;The practical next step is to evaluate it in the mode closest to your real use case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you want a terminal workflow, start with interactive mode.&lt;/li&gt;
&lt;li&gt;If you want editor or service integration, inspect RPC framing carefully.&lt;/li&gt;
&lt;li&gt;If you want deep control over behavior, study the extension lifecycle and compaction hooks before writing custom logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Pi, the architecture is the product.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
