DEV Community

Odmar Goyvaerts

Rendering is Prompting: Stop Inventing Things Your LLM Already Knows

There's a mindset trap that most people fall into when building LLM agents: they treat the prompt as something special. A carefully engineered control surface. A contract. Something you have to design from scratch.

It's just text.

And once you really internalize that, something shifts. Because if the prompt is just text, then it can render things. And the model already has incredibly dense associations with environments that humans have been using for 30+ years.

That's the core insight I want to share: rendering is prompting. You don't have to describe a world to your agent — you can just show it one it already knows.


The Tax of Invention

Every time you invent custom syntax, custom instructions, or a custom abstraction for your agent, you pay a tax. You spend tokens explaining it. The model has weaker associations with it. Behavior is less predictable.

Most agent frameworks are full of this. Custom tool schemas. Elaborate system prompts describing what the agent "can do". Invented DSLs for multi-step reasoning.

Compare these two approaches to web search:

Invented:

```
web_search(query="latest transformer research")
```

Existing:

```
curl https://www.google.com/search?q=latest+transformer+research
```

The second one is just a shell command. But the model already knows curl deeply — it implies HTTP, headers, response codes, piping output. It composes naturally with everything else in a shell environment. You didn't have to document anything.

Every tool you can map to an existing Unix command is a tool you don't have to explain.

That said — this isn't a rule against ever inventing anything. Sometimes a custom action is the right call, especially when the operation has no natural existing equivalent. The point is to reach for existing conventions first, and only invent when you genuinely have to. The less you invent, the less you explain.
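To make the mapping concrete, here's a minimal sketch of the dispatch side, assuming a hypothetical handler registry. The names (`HANDLERS`, `dispatch`, `fake_curl`) are mine, and the `curl` handler just echoes; a real one would perform the HTTP request:

```python
import shlex

# Hypothetical registry: rendered command name -> handler you implement.
HANDLERS = {}

def handler(name):
    """Register a handler under a rendered command name."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register

@handler("curl")
def fake_curl(argv):
    # A real agent would perform the HTTP request here; this sketch
    # just echoes what would be fetched.
    return f"GET {argv[-1]} -> 200 OK"

def dispatch(command: str) -> str:
    """Route a command the model emitted to the matching handler."""
    argv = shlex.split(command)
    fn = HANDLERS.get(argv[0])
    if fn is None:
        return f"sh: {argv[0]}: command not found"
    return fn(argv)
```

You're borrowing curl's semantics while keeping full control over what actually executes.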


Shell Agents: Flexibility Through Familiarity

A shell agent is simple: the prompt renders like a real terminal.

```
(master) [~/project/src] write_file utils.py
```

That one line gives the agent its git branch, its virtual environment, its current directory — all the situational awareness a developer would have. You didn't describe any of it. You just rendered it.

The model pattern-matches immediately to "I am a developer in a terminal" and pulls in decades of implicit knowledge about conventions, available operations, and expected output format. The rendering primes not just what the agent does but how it communicates — shell output is terse, precise, no fluff. The model picks that up naturally.
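Rendering that line is trivial on the harness side. A sketch, following the `(branch) [dir]` convention above (the optional venv prefix is my assumption, not shown in the example):

```python
def render_prompt(branch, cwd, venv=None):
    """Render a one-line terminal prompt carrying situational context:
    git branch, working directory, and optionally an active venv."""
    prefix = f"({venv}) " if venv else ""
    return f"{prefix}({branch}) [{cwd}]"
```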

MCP calls? Just commands:

```
(master) [~/project] mcp filesystem read_file config.json
```

Sub-shells, REPLs, switching contexts — all of it just falls out of the shell metaphor. python3 shifts the prompt to >>>. exit brings you back. The model already knows what those mean.
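The harness only needs to track which prompt to render next. A minimal sketch, assuming just two modes (names are hypothetical):

```python
# Which prompt string each mode renders (extend as needed).
PROMPTS = {"shell": "$ ", "python": ">>> "}

def next_mode(mode, command):
    """Decide which environment the agent is 'in' after a command.
    `python3` drops into the Python REPL; `exit` pops back out."""
    cmd = command.strip()
    if mode == "shell" and cmd == "python3":
        return "python"
    if mode == "python" and cmd in ("exit", "exit()"):
        return "shell"
    return mode
```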


DOM Agents: Structure Through Rendering

When you need more rigidity — structured multi-step execution, parallel actions, inspectable state — you can render a different kind of environment: a document.

The action syntax is minimal:

```
^-- <action_name> [optional parameter]
[body]
```

The agent can emit multiple actions in a single response:

```
^-- think
I need to fetch both files before writing the output.

^-- read_file config.json

^-- read_file user.json
```

A parser sweeps the response, dispatches all actions (potentially in parallel), and injects results back inline after the node that generated them:

```
^-- read_file config.json
^-- result
{"theme": "dark", "lang": "en"}

^-- read_file user.json
^-- result
{"name": "Alex", "role": "admin"}
```
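The sweep itself can be small. A sketch of such a parser (the function name and tuple shape are mine, not the framework's actual API):

```python
def parse_actions(response: str):
    """Extract `^-- <name> [param]` action nodes from a model response.
    Returns (name, param, body) tuples; the body is everything up to
    the next action marker."""
    actions, current = [], None
    for line in response.splitlines():
        if line.startswith("^-- "):
            if current:
                actions.append(current)
            name, _, param = line[4:].partition(" ")
            current = [name, param or None, ""]
        elif current is not None:
            current[2] += line + "\n"
    if current:
        actions.append(current)
    # Strip trailing whitespace from bodies before returning.
    return [(n, p, b.strip()) for n, p, b in actions]
```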

Think of it like a webpage for the model. The document is the state. The agent reads it, acts, the document updates, it reads again. No external memory to sync. No state object to maintain. The model always has an unambiguous read of what happened and in what order just by reading top to bottom.

And because results are injected locally — immediately after the action that triggered them — the model always sees cause and effect together.
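In code, that local injection might look like this sketch (keying results by the full action line is an assumption of mine, not the framework's interface):

```python
def inject_results(document: str, results: dict) -> str:
    """Insert each action's result directly after its node, so the
    model always sees cause and effect together. `results` maps an
    action line to its output text."""
    out = []
    for line in document.splitlines():
        out.append(line)
        if line in results:
            out.append("^-- result")
            out.append(results[line])
    return "\n".join(out)
```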


Combining Them

The shell and DOM aren't separate systems. They're just different render modes. The DOM is the default environment, and the agent can drop into a shell whenever it needs to:

```
^-- open_shell
ls -la && cat config.json

^-- result
drwxr-xr-x  src/
-rw-r--r--  config.json
{"theme": "dark"}
```

Then exit re-renders back to the document. The agent doesn't context-switch mentally — it's still reading one document, it just has a shell widget embedded in it. Like a terminal pane in an IDE.

Which raises an interesting point.


Buffers All the Way Down

Here's a realization that took me a while to land on: the DOM and shell models are just different buffers in a prompt. Different regions of text with different render modes.

An IDE is also just multiple buffers with different roles — editor, file tree, terminal, status bar. If the prompt is just text, and different sections are different buffers, then an IDE-like environment for a model is just... a layout problem.

For the editor buffer specifically, something like ed is a natural fit. It's line-addressed, text-based, and the model knows it deeply. No GUI needed — the entire editing interface is just text in a buffer:

```
(master) [~/project] ed utils.py
156
1,5p
def foo():
    pass
```

The status buffer handles ambient context:

```
[utils.py:24] [errors: 2] [git: 3 changes]
```

That one line gives the model file position, error state, and git status — without explaining any of it.

One important clarification: when I say the model "already knows" these commands, that doesn't mean curl or ed literally execute. These are still custom actions implemented by the developer — curl dispatches to whatever HTTP handler you've written, ed dispatches to your file editing logic. The key is that the names and conventions carry dense associations. The model already knows what curl implies, what valid usage looks like, what output to expect. You're borrowing the semantics, not the binary.

This is the direction I'm taking the framework next: buffers as a first-class abstraction. Named, independently updatable, composable into layouts. Control over which buffers appear in the prompt at any given time to manage context window pressure.
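A sketch of what that abstraction could look like (class and function names are hypothetical, not the framework's actual API):

```python
class Buffer:
    """A named region of text with its own role in the prompt."""
    def __init__(self, name, content=""):
        self.name = name
        self.content = content

def render_layout(buffers, visible):
    """Compose the visible buffers into one prompt, top to bottom.
    Hiding a buffer is how you relieve context-window pressure."""
    parts = [b.content for b in buffers if b.name in visible]
    return "\n\n".join(parts)
```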


The Takeaway

The model absorbed 30+ years of human-computer interaction during training. Terminals, browsers, REPLs, document formats, editors — it knows these environments with extraordinary depth.

You're not locked into treating the prompt as a blank canvas you have to fill with instructions. You can render an environment the model already inhabits.

Stop describing worlds to your agents. Start rendering them.
