<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: josephsenior</title>
    <description>The latest articles on DEV Community by josephsenior (@youssefmejdi).</description>
    <link>https://dev.to/youssefmejdi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1696299%2F234c18dd-e824-4b8d-824a-68a39c7dd405.jpeg</url>
      <title>DEV Community: josephsenior</title>
      <link>https://dev.to/youssefmejdi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/youssefmejdi"/>
    <language>en</language>
    <item>
      <title>Why File Editing Is the Hardest Part of Building a Coding Agent</title>
      <dc:creator>josephsenior</dc:creator>
      <pubDate>Sun, 24 May 2026 21:11:13 +0000</pubDate>
      <link>https://dev.to/youssefmejdi/why-file-editing-is-the-hardest-part-of-building-a-coding-agent-24k8</link>
      <guid>https://dev.to/youssefmejdi/why-file-editing-is-the-hardest-part-of-building-a-coding-agent-24k8</guid>
      <description>&lt;p&gt;&lt;strong&gt;Lessons from building Grinta, an autonomous coding agent runtime from scratch.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I started building &lt;strong&gt;Grinta&lt;/strong&gt;, my autonomous coding agent runtime, I thought file editing would be one of the easier parts.&lt;/p&gt;

&lt;p&gt;The agent reads files, decides what to change, and writes the result back.&lt;/p&gt;

&lt;p&gt;Simple, right?&lt;/p&gt;

&lt;p&gt;It was not.&lt;/p&gt;

&lt;p&gt;File editing became one of the most painful parts of the whole system. Not because writing files is hard, but because making an LLM edit files reliably is a completely different problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The naive assumption
&lt;/h2&gt;

&lt;p&gt;At first, I thought the problem was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Give the model a file editing tool, ask it for the change, apply the result.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But the reality was much uglier.&lt;/p&gt;

&lt;p&gt;The model does not just need to know &lt;em&gt;what&lt;/em&gt; to edit. It also has to deal with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;preserving indentation&lt;/li&gt;
&lt;li&gt;escaping content correctly&lt;/li&gt;
&lt;li&gt;targeting the right file&lt;/li&gt;
&lt;li&gt;targeting the right symbol or string&lt;/li&gt;
&lt;li&gt;not hallucinating patches&lt;/li&gt;
&lt;li&gt;not corrupting code blocks&lt;/li&gt;
&lt;li&gt;not mixing plain text with tool calls&lt;/li&gt;
&lt;li&gt;recovering when an edit fails&lt;/li&gt;
&lt;li&gt;knowing when to use one editing strategy over another&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a lot of cognitive load for something that is supposed to be a deterministic file operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  JSON was not enough
&lt;/h2&gt;

&lt;p&gt;My first instinct was to rely on normal structured tool calls.&lt;/p&gt;

&lt;p&gt;Something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/app.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"old_string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"old code"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"new_string"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"new code"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sounds clean.&lt;/p&gt;

&lt;p&gt;The problem is that code content inside JSON is still code content inside JSON.&lt;/p&gt;

&lt;p&gt;The model has to produce escaped newlines, escaped quotes, valid JSON strings, correct indentation, and valid source code at the same time.&lt;/p&gt;

&lt;p&gt;That is where things started breaking.&lt;/p&gt;

&lt;p&gt;Sometimes the model produced literal &lt;code&gt;\n&lt;/code&gt; sequences instead of real newlines.&lt;/p&gt;

&lt;p&gt;Sometimes it escaped quotes incorrectly.&lt;/p&gt;

&lt;p&gt;Sometimes the content was technically valid JSON but invalid code.&lt;/p&gt;

&lt;p&gt;Sometimes it mixed markdown-style formatting into the payload.&lt;/p&gt;

&lt;p&gt;The frustrating part was that the model understood the intended edit, but the transport format became the failure point.&lt;/p&gt;

&lt;h2&gt;
  
  
  XML/raw blocks helped, then failed differently
&lt;/h2&gt;

&lt;p&gt;After that, I experimented with XML-style editing blocks and raw content blocks.&lt;/p&gt;

&lt;p&gt;The idea was simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Keep metadata structured, but let the code payload be raw text.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This reduced some JSON escaping problems.&lt;/p&gt;

&lt;p&gt;But it created a new problem: the model now had to switch mental formats.&lt;/p&gt;

&lt;p&gt;Most tools were normal native tool calls, but file editing used a different XML/raw format.&lt;/p&gt;

&lt;p&gt;That context switch was surprisingly expensive.&lt;/p&gt;

&lt;p&gt;Sometimes the model respected the XML boundary.&lt;/p&gt;

&lt;p&gt;Sometimes it mixed JSON escaping inside the XML block anyway.&lt;/p&gt;

&lt;p&gt;Sometimes it wrote explanations around the block.&lt;/p&gt;

&lt;p&gt;Sometimes it treated the raw block like markdown.&lt;/p&gt;

&lt;p&gt;So the problem was not fully solved. It just moved somewhere else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Patches and range edits are not magic either
&lt;/h2&gt;

&lt;p&gt;Then I looked at patch-style editing and range replacement.&lt;/p&gt;

&lt;p&gt;Patches are attractive because they are compact and familiar to developers.&lt;/p&gt;

&lt;p&gt;Line ranges are attractive because they avoid searching for old strings.&lt;/p&gt;

&lt;p&gt;But in an autonomous agent loop, both have weaknesses.&lt;/p&gt;

&lt;p&gt;Patches can fail when surrounding context changes, when the model invents context, or when the patch format is slightly malformed.&lt;/p&gt;

&lt;p&gt;Line ranges can fail when the file changes between read and write, or when the model relies on stale line numbers.&lt;/p&gt;

&lt;p&gt;They are useful internally, but exposing too many of these low-level editing styles directly to the model creates tool-shopping.&lt;/p&gt;

&lt;p&gt;The model starts asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Should I use a patch?&lt;/li&gt;
&lt;li&gt;Should I replace a range?&lt;/li&gt;
&lt;li&gt;Should I rewrite the file?&lt;/li&gt;
&lt;li&gt;Should I use XML?&lt;/li&gt;
&lt;li&gt;Should I use shell?&lt;/li&gt;
&lt;li&gt;Should I use string replacement?&lt;/li&gt;
&lt;li&gt;Should I use AST editing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly the wrong mental model.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem was not the format
&lt;/h2&gt;

&lt;p&gt;After trying multiple approaches, I realized something important:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The problem was not only JSON vs XML vs patches.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The deeper problem was that I was exposing too many editing mental models to the agent.&lt;/p&gt;

&lt;p&gt;I was asking the model to decide not only what should change, but also how the editing system itself should work.&lt;/p&gt;

&lt;p&gt;That is backwards.&lt;/p&gt;

&lt;p&gt;A coding agent should not need to think in terms of raw file writes, patch formats, XML blocks, shell heredocs, section edits, range edits, and AST edits.&lt;/p&gt;

&lt;p&gt;The model-facing API should describe intent.&lt;/p&gt;

&lt;p&gt;The runtime should handle implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pivot: intent-based editing tools
&lt;/h2&gt;

&lt;p&gt;So I started simplifying Grinta’s editing surface.&lt;/p&gt;

&lt;p&gt;Instead of exposing many editing mechanisms, I am moving toward a smaller set of intent-based tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;read&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;create&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;edit_symbols&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;replace_string&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;multiedit&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The idea is simple.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;read&lt;/code&gt; is for inspecting context: files, ranges, or symbols.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;create&lt;/code&gt; is for creating something new: a file or a code symbol.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;edit_symbols&lt;/code&gt; is for modifying or deleting existing code symbols.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;replace_string&lt;/code&gt; is for exact text replacement inside one file.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;multiedit&lt;/code&gt; is for atomic multi-file refactoring.&lt;/p&gt;

&lt;p&gt;That gives the model a much simpler decision tree.&lt;/p&gt;

&lt;p&gt;It no longer has to choose between ten editing formats.&lt;/p&gt;

&lt;p&gt;It chooses intent.&lt;/p&gt;

&lt;p&gt;The runtime handles path safety, validation, diffs, syntax checks, atomic writes, and rollback.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reads can be flexible, writes must be anchored
&lt;/h2&gt;

&lt;p&gt;One rule that became very important is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reads may search. Writes must target.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example, reading a symbol can be flexible.&lt;/p&gt;

&lt;p&gt;If the model asks to read a symbol and there is exactly one match, the runtime can auto-resolve it and return the content.&lt;/p&gt;

&lt;p&gt;If there are multiple matches, it returns candidates.&lt;/p&gt;

&lt;p&gt;If there are no matches, it returns useful feedback.&lt;/p&gt;

&lt;p&gt;That is safe because reading does not mutate the project.&lt;/p&gt;

&lt;p&gt;But writing is different.&lt;/p&gt;

&lt;p&gt;When editing a symbol, the runtime should not guess.&lt;/p&gt;

&lt;p&gt;If the target is ambiguous, the edit should fail and return candidates.&lt;/p&gt;

&lt;p&gt;The model can then retry with a more precise target.&lt;/p&gt;

&lt;p&gt;That one rule removes a lot of dangerous behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  Runtime responsibility matters
&lt;/h2&gt;

&lt;p&gt;This experience also changed how I think about agent architecture.&lt;/p&gt;

&lt;p&gt;A lot of agent reliability does not come from the prompt.&lt;/p&gt;

&lt;p&gt;It comes from the runtime.&lt;/p&gt;

&lt;p&gt;The runtime should enforce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;path safety&lt;/li&gt;
&lt;li&gt;exact matching&lt;/li&gt;
&lt;li&gt;ambiguity rejection&lt;/li&gt;
&lt;li&gt;atomic writes&lt;/li&gt;
&lt;li&gt;validation before commit&lt;/li&gt;
&lt;li&gt;structured errors&lt;/li&gt;
&lt;li&gt;rollback behavior&lt;/li&gt;
&lt;li&gt;diff generation&lt;/li&gt;
&lt;li&gt;stuck detection&lt;/li&gt;
&lt;li&gt;content guards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, if the model sends code content that clearly looks serialized, such as a whole function body containing literal &lt;code&gt;\n&lt;/code&gt; everywhere, the runtime should reject it before corrupting the file.&lt;/p&gt;

&lt;p&gt;The solution is not to beg the model harder.&lt;/p&gt;

&lt;p&gt;The solution is to make dangerous states impossible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The lesson
&lt;/h2&gt;

&lt;p&gt;The biggest lesson so far is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Reliability does not come from giving the model more ways to act.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Reliability comes from giving it fewer, clearer choices, and moving complexity into deterministic code.&lt;/p&gt;

&lt;p&gt;A coding agent should not expose implementation complexity as product surface.&lt;/p&gt;

&lt;p&gt;The model should not have to think about transport formats, patch formats, editor blocks, or shell escaping.&lt;/p&gt;

&lt;p&gt;It should think in terms of intent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read context&lt;/li&gt;
&lt;li&gt;create something new&lt;/li&gt;
&lt;li&gt;edit existing symbols&lt;/li&gt;
&lt;li&gt;replace exact text&lt;/li&gt;
&lt;li&gt;perform an atomic multi-file refactor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else should be runtime responsibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Still building
&lt;/h2&gt;

&lt;p&gt;Grinta is still a work in progress.&lt;/p&gt;

&lt;p&gt;I am still fighting file editing reliability, state machines, finish detection, circuit breakers, TUI integration, async execution, crash recovery, and context management.&lt;/p&gt;

&lt;p&gt;But this specific lesson changed the way I think about autonomous coding agents.&lt;/p&gt;

&lt;p&gt;The hard part is not just making the model smart.&lt;/p&gt;

&lt;p&gt;The hard part is designing a system where the model has fewer opportunities to be wrong.&lt;/p&gt;

&lt;p&gt;That is the real engineering challenge.&lt;/p&gt;

&lt;p&gt;I’m building Grinta in public here:&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/josephsenior/Grinta-Coding-Agent" rel="noopener noreferrer"&gt;https://github.com/josephsenior/Grinta-Coding-Agent&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>python</category>
    </item>
  </channel>
</rss>
