josephsenior

Posted on May 24

Why File Editing Is the Hardest Part of Building a Coding Agent

#ai #python #opensource #programming

Lessons from building Grinta, an autonomous coding agent runtime from scratch.

When I started building Grinta, my autonomous coding agent runtime, I thought file editing would be one of the easier parts.

The agent reads files, decides what to change, and writes the result back.

Simple, right?

It was not.

File editing became one of the most painful parts of the whole system. Not because writing files is hard, but because making an LLM edit files reliably is a completely different problem.

The naive assumption

At first, I thought the problem was simple:

Give the model a file editing tool, ask it for the change, apply the result.

But the reality was much uglier.

The model does not just need to know what to edit. It also has to deal with:

preserving indentation
escaping content correctly
targeting the right file
targeting the right symbol or string
not hallucinating patches
not corrupting code blocks
not mixing plain text with tool calls
recovering when an edit fails
knowing when to use one editing strategy over another

That is a lot of cognitive load for something that is supposed to be a deterministic file operation.

JSON was not enough

My first instinct was to rely on normal structured tool calls.

Something like:

{
  "path": "src/app.py",
  "old_string": "old code",
  "new_string": "new code"
}

This sounds clean.

The problem is that code inside a JSON string has to be correct twice: once as a JSON string, and again as source code after decoding.

The model has to produce escaped newlines, escaped quotes, valid JSON strings, correct indentation, and valid source code at the same time.

That is where things started breaking.

Sometimes the model produced literal \n sequences instead of real newlines.

Sometimes it escaped quotes incorrectly.

Sometimes the content was technically valid JSON but invalid code.

Sometimes it mixed markdown-style formatting into the payload.

The frustrating part was that the model understood the intended edit, but the transport format became the failure point.
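To make the literal \n failure concrete, here is a small sketch of how a double-escaped payload decodes. The field names follow the JSON example above; the sample code string and variable names are made up for illustration, and this is not Grinta's actual tool schema.

```python
import json

# Hypothetical edit payload; field names mirror the JSON example above.
new_code = 'def greet(name):\n    print(f"Hello, {name}")\n'

# Correct transport: json.dumps escapes newlines and quotes exactly once.
good_wire = json.dumps({"path": "src/app.py", "new_string": new_code})
good = json.loads(good_wire)["new_string"]

# Failure mode: the model escapes the code itself, then the JSON layer
# escapes it again. The decoded string holds a backslash followed by "n".
double_escaped = new_code.replace("\n", "\\n")
bad_wire = json.dumps({"path": "src/app.py", "new_string": double_escaped})
bad = json.loads(bad_wire)["new_string"]

print(good == new_code)   # the round trip preserves the code
print(bad.count("\n"))    # no real newlines are left
print("\\n" in bad)       # literal backslash-n sequences remain
```

Both payloads are valid JSON, so a schema validator accepts either one. Only the first decodes to code that can be written to disk as-is.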

XML/raw blocks helped, then failed differently

After that, I experimented with XML-style editing blocks and raw content blocks.

The idea was simple:

Keep metadata structured, but let the code payload be raw text.

This reduced some JSON escaping problems.

But it created a new problem: the model now had to switch mental formats.

Most tools were normal native tool calls, but file editing used a different XML/raw format.

That context switch was surprisingly expensive.

Sometimes the model respected the XML boundary.

Sometimes it mixed JSON escaping inside the XML block anyway.

Sometimes it wrote explanations around the block.

Sometimes it treated the raw block like markdown.

So the problem was not fully solved. It just moved somewhere else.

Patches and range edits are not magic either

Then I looked at patch-style editing and range replacement.

Patches are attractive because they are compact and familiar to developers.

Line ranges are attractive because they avoid searching for old strings.

But in an autonomous agent loop, both have weaknesses.

Patches can fail when surrounding context changes, when the model invents context, or when the patch format is slightly malformed.

Line ranges can fail when the file changes between read and write, or when the model relies on stale line numbers.
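One way a runtime can catch stale line numbers is to tie every range edit to a fingerprint of the content the model actually read. The sketch below assumes that approach; `fingerprint`, `replace_range`, and `StaleReadError` are hypothetical names, not part of Grinta.

```python
import hashlib


class StaleReadError(Exception):
    """Raised when the file no longer matches what the model read."""


def fingerprint(text: str) -> str:
    # Hash of the exact content returned by the read step.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def replace_range(current_text: str, start: int, end: int,
                  new_lines: list[str], read_hash: str) -> str:
    """Replace 1-indexed, inclusive lines start..end.

    The edit is refused if the content changed after it was read,
    because the line numbers may then point at different code.
    """
    if fingerprint(current_text) != read_hash:
        raise StaleReadError("file changed since it was read; re-read first")
    lines = current_text.splitlines(keepends=True)
    if not (1 <= start <= end <= len(lines)):
        raise ValueError(f"range {start}-{end} is outside 1-{len(lines)}")
    lines[start - 1:end] = new_lines
    return "".join(lines)


original = "a = 1\nb = 2\nc = 3\n"
seen = fingerprint(original)
updated = replace_range(original, 2, 2, ["b = 20\n"], seen)

# Something else inserts a line, so line 2 is no longer "b = 2".
drifted = "import os\n" + original
try:
    replace_range(drifted, 2, 2, ["b = 20\n"], seen)
    stale_detected = False
except StaleReadError:
    stale_detected = True
```

The check does not make line ranges safe to expose directly, but it turns a silent wrong edit into an explicit error the agent can recover from.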

They are useful internally, but exposing too many of these low-level editing styles directly to the model encourages tool-shopping: the model spends effort choosing a mechanism instead of making the change.

The model starts asking:

Should I use a patch?
Should I replace a range?
Should I rewrite the file?
Should I use XML?
Should I use shell?
Should I use string replacement?
Should I use AST editing?

That is exactly the wrong mental model.

The real problem was not the format

After trying multiple approaches, I realized something important:

The problem was not only JSON vs XML vs patches.

The deeper problem was that I was exposing too many editing mental models to the agent.

I was asking the model to decide not only what should change, but also how the editing system itself should work.

That is backwards.

A coding agent should not need to think in terms of raw file writes, patch formats, XML blocks, shell heredocs, section edits, range edits, and AST edits.

The model-facing API should describe intent.

The runtime should handle implementation.

The pivot: intent-based editing tools

So I started simplifying Grinta’s editing surface.

Instead of exposing many editing mechanisms, I am moving toward a smaller set of intent-based tools:

read
create
edit_symbols
replace_string
multiedit

The idea is simple.

read is for inspecting context: files, ranges, or symbols.

create is for creating something new: a file or a code symbol.

edit_symbols is for modifying or deleting existing code symbols.

replace_string is for exact text replacement inside one file.

multiedit is for atomic multi-file refactoring.

That gives the model a much simpler decision tree.

It no longer has to choose among a long list of editing formats.

It chooses intent.

The runtime handles path safety, validation, diffs, syntax checks, atomic writes, and rollback.
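As a rough illustration of that split, here is what a replace_string handler could look like on the runtime side. This is a sketch under my own assumptions (exactly one match required, temp file plus rename for the atomic write); the `EditError` type and the function signature are hypothetical, not Grinta's real API.

```python
import os
import tempfile
from pathlib import Path


class EditError(Exception):
    """Structured failure the agent can read and react to."""


def replace_string(path: Path, old: str, new: str) -> None:
    # Read bytes so line endings are preserved exactly.
    text = path.read_bytes().decode("utf-8")
    if not old:
        raise EditError("old string must not be empty")
    count = text.count(old)
    if count != 1:
        # Zero matches or several matches: refuse instead of guessing.
        raise EditError(f"expected exactly 1 match, found {count}")
    updated = text.replace(old, new, 1)

    # Atomic write: fill a temp file in the same directory, then rename
    # it over the target so readers never observe a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(updated.encode("utf-8"))
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

The model only supplies a path, the old text, and the new text. Match counting, encoding, and the write strategy stay in deterministic code.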

Reads can be flexible, writes must be anchored

One rule that became very important is:

Reads may search. Writes must target.

For example, reading a symbol can be flexible.

If the model asks to read a symbol and there is exactly one match, the runtime can auto-resolve it and return the content.

If there are multiple matches, it returns candidates.

If there are no matches, it returns useful feedback.

That is safe because reading does not mutate the project.

But writing is different.

When editing a symbol, the runtime should not guess.

If the target is ambiguous, the edit should fail and return candidates.

The model can then retry with a more precise target.

That one rule removes a lot of dangerous behavior.
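The rule can be sketched with a toy symbol index. Everything here is hypothetical (the `Symbol` record, the `file::name` candidate labels, the function names); it only illustrates the asymmetry between a read that searches loosely and a write that demands an exact target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    file: str
    qualified_name: str  # e.g. "Parser.parse"
    source: str


class TargetError(Exception):
    def __init__(self, message: str, candidates: list[str]):
        super().__init__(message)
        self.candidates = candidates


def _label(sym: Symbol) -> str:
    return f"{sym.file}::{sym.qualified_name}"


def search(index: list[Symbol], name: str) -> list[Symbol]:
    # Loose match: full qualified name or just the trailing component.
    return [s for s in index
            if s.qualified_name == name or s.qualified_name.endswith("." + name)]


def read_symbol(index: list[Symbol], name: str) -> dict:
    """Reads may search: auto-resolve a single hit, else report candidates."""
    matches = search(index, name)
    if len(matches) == 1:
        return {"status": "ok", "target": _label(matches[0]),
                "source": matches[0].source}
    status = "ambiguous" if matches else "not_found"
    return {"status": status, "candidates": [_label(s) for s in matches]}


def resolve_for_write(index: list[Symbol], file: str, qualified_name: str) -> Symbol:
    """Writes must target: exact file and exact qualified name, or fail."""
    exact = [s for s in index
             if s.file == file and s.qualified_name == qualified_name]
    if len(exact) != 1:
        short = qualified_name.rsplit(".", 1)[-1]
        raise TargetError(f"no unique symbol {qualified_name!r} in {file!r}",
                          [_label(s) for s in search(index, short)])
    return exact[0]


INDEX = [
    Symbol("src/parser.py", "Parser.parse", "def parse(self): ..."),
    Symbol("src/cli.py", "parse", "def parse(argv): ..."),
    Symbol("src/cli.py", "main", "def main(): ..."),
]
```

With this index, reading "main" resolves automatically, reading "parse" returns two candidates, and a write aimed at "parse" in src/parser.py fails with the same two candidates so the model can retry with "Parser.parse".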

Runtime responsibility matters

This experience also changed how I think about agent architecture.

A lot of agent reliability does not come from the prompt.

It comes from the runtime.

The runtime should enforce:

path safety
exact matching
ambiguity rejection
atomic writes
validation before commit
structured errors
rollback behavior
diff generation
stuck detection
content guards

For example, if the model sends code content that clearly looks serialized, such as a whole function body containing literal \n everywhere, the runtime should reject it before corrupting the file.

The solution is not to beg the model harder.

The solution is to make dangerous states impossible.
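A content guard for the serialized-code case can be very small. The heuristic below is an assumption of mine, not Grinta's actual check: it flags content that has several literal backslash-n sequences and more of them than real newlines. The threshold and the function name are illustrative.

```python
def looks_serialized(content: str, min_escapes: int = 3) -> bool:
    """Heuristic guard: does this look like code with escaped newlines?

    Counts literal backslash-n sequences against real newline characters.
    Ordinary source may contain a few escapes inside string literals, so
    the guard only fires when escapes are frequent and outnumber real
    line breaks.
    """
    literal = content.count("\\n")
    real = content.count("\n")
    return literal >= min_escapes and literal > real


# A whole function squeezed onto one line with literal \n sequences.
serialized = "def f():\\n    x = 1\\n    y = 2\\n    return x + y"

# Normal code that happens to contain one escape in a string literal.
normal = 'def f():\n    print("done\\n")\n    return 1\n'

print(looks_serialized(serialized))
print(looks_serialized(normal))
```

A heuristic like this will have false positives on unusual files, so rejecting with a structured error that explains the reason is safer than silently rewriting the content.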

The lesson

The biggest lesson so far is this:

Reliability does not come from giving the model more ways to act.

Reliability comes from giving it fewer, clearer choices, and moving complexity into deterministic code.

A coding agent should not expose implementation complexity as product surface.

The model should not have to think about transport formats, patch formats, editor blocks, or shell escaping.

It should think in terms of intent:

read context
create something new
edit existing symbols
replace exact text
perform an atomic multi-file refactor

Everything else should be runtime responsibility.
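For the multi-file case, "runtime responsibility" can be sketched as: stage every edit in memory, validate all of them, and only then touch disk. The code below is an illustration under stated assumptions (Python-only syntax validation via `ast.parse`, exact single-match replacement, best-effort restore on write failure); `multiedit` here is a hypothetical function, not Grinta's implementation.

```python
import ast
from pathlib import Path


class MultiEditError(Exception):
    pass


def multiedit(edits: dict[Path, tuple[str, str]]) -> None:
    """Apply {path: (old, new)} edits so that all succeed or none are written."""
    staged: dict[Path, tuple[str, str]] = {}

    # Phase 1: compute and validate every result without writing anything.
    for path, (old, new) in edits.items():
        text = path.read_text(encoding="utf-8")
        if not old or text.count(old) != 1:
            raise MultiEditError(f"{path}: old string must match exactly once")
        updated = text.replace(old, new, 1)
        if path.suffix == ".py":
            try:
                ast.parse(updated)  # validation before commit
            except SyntaxError as exc:
                raise MultiEditError(f"{path}: edit breaks syntax: {exc.msg}") from exc
        staged[path] = (text, updated)

    # Phase 2: write; on an I/O failure, restore files already written.
    written: list[Path] = []
    try:
        for path, (_, updated) in staged.items():
            path.write_text(updated, encoding="utf-8")
            written.append(path)
    except OSError:
        for path in written:
            path.write_text(staged[path][0], encoding="utf-8")
        raise
```

If any single edit fails validation in phase 1, no file has been modified yet, so the most common failures need no rollback at all.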

Still building

Grinta is still a work in progress.

I am still fighting file editing reliability, state machines, finish detection, circuit breakers, TUI integration, async execution, crash recovery, and context management.

But this specific lesson changed the way I think about autonomous coding agents.

The hard part is not just making the model smart.

The hard part is designing a system where the model has fewer opportunities to be wrong.

That is the real engineering challenge.

I’m building Grinta in public here:

https://github.com/josephsenior/Grinta-Coding-Agent