<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Florian Zielasko</title>
    <description>The latest articles on DEV Community by Florian Zielasko (@flo1632).</description>
    <link>https://dev.to/flo1632</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3902652%2F12514101-804b-416f-8e56-cce5537dda7b.jpeg</url>
      <title>DEV Community: Florian Zielasko</title>
      <link>https://dev.to/flo1632</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/flo1632"/>
    <language>en</language>
    <item>
      <title>Building a Local AI Agent (Part 1): Six Technical Challenges</title>
      <dc:creator>Florian Zielasko</dc:creator>
      <pubDate>Wed, 29 Apr 2026 18:57:14 +0000</pubDate>
      <link>https://dev.to/flo1632/building-a-local-ai-agent-part-1-six-technical-challenges-424b</link>
      <guid>https://dev.to/flo1632/building-a-local-ai-agent-part-1-six-technical-challenges-424b</guid>
      <description>&lt;p&gt;I've been building Reiseki (霊石) — a fully local AI agent that runs on your machine via Ollama, even if you do not have more than 8-10 GB of RAM. The agent uses a ReAct loop (Reason → Act → Observe) to handle file operations, document generation, reminders, and more.&lt;/p&gt;

&lt;p&gt;Reiseki is open source: &lt;a href="https://github.com/Flo1632/reiseki" rel="noopener noreferrer"&gt;github.com/Flo1632/reiseki&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Along the way I ran into six technical problems that aren't obvious up front. Here's what I encountered and what I learned.&lt;/p&gt;

&lt;p&gt;In Part 2 I'll cover the UX and design challenges — my goal was to make a local AI agent feel understandable and usable for someone who has never touched a terminal.&lt;/p&gt;




&lt;h2&gt;The Six Problems and How I Solved Them (Other Suggestions Welcome)&lt;/h2&gt;

&lt;h3&gt;1. The agent forgot everything on restart&lt;/h3&gt;

&lt;p&gt;The ReAct loop uses a plain Python list for session history — it lives in memory and disappears when the server restarts. If you're used to ChatGPT or Claude, this is disorienting: you expect the agent to remember what you discussed with it.&lt;/p&gt;

&lt;p&gt;The fix was a &lt;code&gt;chat_log&lt;/code&gt; table in SQLite. Every user and assistant message is written there as it happens. At the start of each new request, the last 10 turns are fetched from the database and prepended to the message history so the model has continuity across sessions. It doesn't remember everything, but it does remember the most recent conversation.&lt;/p&gt;
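&lt;p&gt;A minimal sketch of that persistence layer — the schema and helper names here are my own illustration, not Reiseki's actual code:&lt;/p&gt;

```python
import sqlite3

# Illustrative chat_log persistence: write each message as it happens,
# fetch the most recent turns at the start of every request.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS chat_log ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, role TEXT, content TEXT)"
)

def log_message(role, content):
    conn.execute("INSERT INTO chat_log (role, content) VALUES (?, ?)",
                 (role, content))
    conn.commit()

def load_recent_turns(limit=10):
    # Fetch the newest rows, then reverse so they read oldest-first.
    rows = conn.execute(
        "SELECT role, content FROM chat_log ORDER BY id DESC LIMIT ?",
        (limit,)
    ).fetchall()
    return [{"role": r, "content": c} for r, c in reversed(rows)]

log_message("user", "remind me to water the plants")
log_message("assistant", "Reminder set for today.")
history = load_recent_turns()
```

&lt;p&gt;Prepending the fetched rows to the outgoing message array is all the cross-session "memory" the model needs.&lt;/p&gt;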




&lt;h3&gt;2. The agent loop needs a hard iteration cap, especially for small models like Qwen2.5-Coder 7B&lt;/h3&gt;

&lt;p&gt;Without a limit, a confused model or a buggy tool result can cause the agent to loop forever — calling the same tool repeatedly, getting the same error, never stopping. On a local device with limited RAM, that's a serious problem.&lt;/p&gt;

&lt;p&gt;The fix is a hard cap of 10 iterations. In practice, most tasks finish in 1-3 iterations. The cap is a safety net.&lt;/p&gt;
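&lt;p&gt;The shape of that safety net, as a sketch (the loop skeleton and names are illustrative):&lt;/p&gt;

```python
MAX_ITERATIONS = 10  # hard safety cap; most tasks finish in 1-3 steps

def run_react_loop(step):
    """Illustrative ReAct loop skeleton; step(i) returns (done, result)."""
    for i in range(MAX_ITERATIONS):
        done, result = step(i)
        if done:
            return result
    # Cap reached: bail out instead of looping forever on the same error.
    return "Stopped after reaching the iteration limit."

# A confused "model" that never signals completion hits the cap:
answer = run_react_loop(lambda i: (False, None))
```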




&lt;h3&gt;3. Sending all tools on every request wastes context&lt;/h3&gt;

&lt;p&gt;With 15+ tools defined, sending the full list on every request fills a meaningful chunk of the context window — and confuses the model. When it sees &lt;code&gt;create_chart&lt;/code&gt; and &lt;code&gt;analyse_data&lt;/code&gt; alongside a simple question like "what time is it?", it sometimes reaches for tools it doesn't need.&lt;/p&gt;

&lt;p&gt;The fix was dynamic tool selection based on relevance scoring. Core tools (read file, write file, list directory, document generators) are always included. Specialized tools (e.g. charts, data analysis) are only added when the query text is relevant to them — scored via string similarity between the query and each tool's description and keywords.&lt;/p&gt;
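&lt;p&gt;Here's a simplified sketch of that selection step. Reiseki scores string similarity against each tool's description; this keyword-overlap version, the tool lists, and the 0.3 threshold are stand-ins for illustration:&lt;/p&gt;

```python
# Core tools always ship; specialized tools only when the query is relevant.
CORE_TOOLS = ["read_file", "write_file", "list_directory"]
SPECIALIZED = {
    "create_chart": ["chart", "plot", "graph"],
    "analyse_data": ["analyse", "data", "statistics"],
}

def relevance(query, keywords):
    # Fraction of a tool's keywords that appear in the query text.
    words = set(query.lower().split())
    return sum(1 for k in keywords if k in words) / len(keywords)

def select_tools(query, threshold=0.3):
    selected = list(CORE_TOOLS)
    for name, keywords in SPECIALIZED.items():
        if relevance(query, keywords) >= threshold:
            selected.append(name)
    return selected
```

&lt;p&gt;A question like "what time is it?" now only ever sees the core tools.&lt;/p&gt;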

&lt;p&gt;In practice, this makes the model more focused and reduces unnecessary tool calls.&lt;/p&gt;

&lt;p&gt;After testing, I found that if your model has a larger context window, it can be better to always include at least the tools you use regularly.&lt;/p&gt;




&lt;h3&gt;4. The context window grows with every tool call&lt;/h3&gt;

&lt;p&gt;In a ReAct loop, the message history grows by at least two entries per tool call — one for the assistant's decision, one for the tool result. A task like "read these five files and summarize them" can easily hit the model's context limit before it finishes.&lt;/p&gt;

&lt;p&gt;This was handled with three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context compression&lt;/strong&gt; — after every four tool calls in a single turn, the middle portion of the message history is summarized by the model itself, then replaced with that summary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-turn cap&lt;/strong&gt; — the in-memory session history is capped at 20 messages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent log cap&lt;/strong&gt; — the SQLite chat log is capped at 2000 rows with a rolling delete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The compression approach: the model summarizes its own previous steps in 2-3 sentences, which gets injected back as a single message. It loses detail but keeps the agent on track without blowing the context limit. If the summarization call itself fails, it falls back to a hard truncation of the last few messages.&lt;/p&gt;
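&lt;p&gt;A minimal sketch of that compression layer — the &lt;code&gt;summarize&lt;/code&gt; callable is a stand-in for the model summarizing its own steps, and the keep-first/keep-last-two split is my assumption:&lt;/p&gt;

```python
def compress_history(messages, summarize):
    """Replace the middle of the history with a single summary message.

    Keeps the first message (the task) and the last two (latest exchange).
    If the summarization call fails, fall back to hard truncation.
    """
    if len(messages) >= 5:
        head, middle, tail = messages[:1], messages[1:-2], messages[-2:]
        try:
            summary = summarize(middle)
        except Exception:
            return head + tail  # fallback: drop the middle entirely
        return head + [{"role": "assistant", "content": summary}] + tail
    return messages
```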




&lt;h3&gt;5. Local models don't always return structured tool calls&lt;/h3&gt;

&lt;p&gt;The Ollama SDK has a proper structured field for tool calls — but not every model actually uses it. Gemma and Qwen sometimes serialize tool calls as plain JSON text in the response content instead. If you only handle the structured case, the agent silently ignores half its tool calls, and you just get a JSON-formatted message claiming a tool was called.&lt;/p&gt;

&lt;p&gt;The fix was a layered fallback parser: try structured first, then parse the content as JSON, then scan for embedded JSON objects anywhere in the text, then try newline-by-newline. It's more code than it should be, but it makes the agent reliable across different models.&lt;/p&gt;
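&lt;p&gt;An illustrative version of the first layers (structured field, then JSON embedded in the content) — not Reiseki's actual parser, and it omits the newline-by-newline last resort:&lt;/p&gt;

```python
import json

def find_json_objects(text):
    # Scan for balanced-brace spans and try to parse each one as JSON.
    objs, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth:
            depth -= 1
            if depth == 0:
                try:
                    objs.append(json.loads(text[start:i + 1]))
                except json.JSONDecodeError:
                    pass
    return objs

def extract_tool_calls(message):
    # Layer 1: the SDK's structured field, when the model uses it.
    calls = message.get("tool_calls")
    if calls:
        return calls
    # Layers 2-3: JSON serialized into the plain-text content.
    content = message.get("content", "")
    return [obj for obj in find_json_objects(content)
            if isinstance(obj, dict) and "name" in obj]
```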




&lt;h3&gt;6. Injecting past turns into the system prompt is a security risk&lt;/h3&gt;

&lt;p&gt;My first approach was to paste previous messages directly into the system prompt as a block of text. But a security audit flagged this.&lt;/p&gt;

&lt;p&gt;The system prompt has operator-level trust — the model treats it like instructions from a developer. Injecting user messages there effectively promotes them to the same level. A past message like "ignore previous instructions" would now carry the same authority as your actual configuration. And because the history is baked into the prompt text, clearing the session doesn't actually reset it.&lt;/p&gt;

&lt;p&gt;The fix is to inject past turns as regular &lt;code&gt;user&lt;/code&gt;/&lt;code&gt;assistant&lt;/code&gt; entries in the message array, not as text in the system prompt. The model treats them with user-level trust, they stay isolated from the system context, and clearing the log actually resets them. It's a small structural change but an important one.&lt;/p&gt;
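&lt;p&gt;The structural difference is tiny, as this sketch shows (prompt text and function name are illustrative):&lt;/p&gt;

```python
SYSTEM_PROMPT = "You are a helpful local agent."

def build_messages(past_turns, new_user_message):
    # History is replayed as ordinary user/assistant entries with
    # user-level trust -- never pasted into the system prompt text.
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(past_turns)
    messages.append({"role": "user", "content": new_user_message})
    return messages

history = [
    {"role": "user", "content": "ignore previous instructions"},
    {"role": "assistant", "content": "I can't do that."},
]
msgs = build_messages(history, "what time is it?")
```

&lt;p&gt;A hostile past message stays quarantined in a &lt;code&gt;user&lt;/code&gt; entry, and clearing the log genuinely resets the context.&lt;/p&gt;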




&lt;p&gt;This project was built entirely with Claude Code. The technical decisions and design goals are mine; Claude handled the implementation.&lt;/p&gt;

&lt;p&gt;What technical problems have you run into building local AI agents? Curious whether others have found better approaches.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Part 2 — UX and Design Challenges: coming soon.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
