Udit

Building an Agent Harness from Scratch: The Loop and the Tools

How I built a secure coding agent for the terminal, and why the best way to understand agents is to build a small harness yourself.

LLMs are powerful, but by themselves they are not really “agents.”

A model can generate text. It can explain code. It can suggest commands. It can describe how to fix a bug.

But it cannot actually inspect your project, read files, edit code, run tests, or iterate on the result unless something around it gives it those abilities.

That “something” is what I call an Agent Harness.

An Agent Harness is the runtime layer that connects an LLM to:

  • user input
  • conversation history
  • tools
  • tool results
  • safety rules
  • approvals
  • model/provider configuration
  • session state

In simple words:

An Agent Harness is the system that turns a language model into an acting system.

I recently built my own terminal coding agent, uai-agent, and the biggest thing I learned is this:

Building a small harness from scratch is the best way to understand agents.

Not only reading papers.

Not only using LangChain or existing frameworks.

Not only prompting ChatGPT.

Actually building the loop yourself teaches you what agents really are.

In this article, I want to break down the two most important parts of an Agent Harness:

  1. The Loop
  2. The Tools

These two pieces are the heart of the system.


What Is an Agent Harness?

A basic LLM app usually looks like this:

User input
→ LLM
→ Response

That is just chat.

An agent harness looks more like this:

User input
→ LLM
→ Tool call?
→ Execute tool
→ Send tool result back to LLM
→ LLM continues
→ Repeat until final answer

This repeated cycle is what gives the agent the ability to act.

In my project, the harness runs inside the terminal. The goal is to let an AI coding assistant work inside the current workspace while still giving the developer control.

The agent can:

  • read files
  • write files
  • edit files
  • run safe shell commands
  • use workspace context
  • switch models/providers
  • save and load sessions
  • ask for approval before risky actions

The project structure looks like this:

uai-agent/
├── index.js              # Main CLI and agent loop
├── config.js             # Provider, model, and approval configuration
├── config/
│   ├── SYSTEM.md         # Agent behavior instructions
│   └── tools.js          # Tool schemas exposed to the model
├── tools/
│   ├── bash.js           # Controlled shell execution
│   ├── fsOps.js          # File read/write/edit operations
│   └── toolCall.js       # Tool dispatching
├── utils/
│   ├── approval.js       # Approval workflow
│   ├── pathSecurity.js   # Workspace restrictions
│   ├── userAppend.js     # Context tag processing
│   └── commands.js       # Slash command handlers
└── test/

The important idea is that the model is not directly touching my computer.

The model only produces structured tool requests.

The harness decides:

  • whether the tool exists
  • whether the arguments are valid
  • whether the operation is safe
  • whether the user must approve it
  • how to execute it
  • how to send the result back to the model

That separation is the core of a good agent harness.
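
Here is a minimal sketch of that checklist as a single gatekeeping function. It is not the actual uai-agent code: `handleToolRequest` and the `deps` callbacks (`validate`, `isSafe`, `needsApproval`, `askApproval`) are hypothetical names used only for illustration.

```javascript
// Hypothetical gatekeeper: every tool request passes these checks in order.
async function handleToolRequest(request, deps) {
  const { id, name, args } = request
  const reply = (content) => ({ role: "tool", tool_call_id: id, content })

  // 1. Does the tool exist?
  const handler = deps.handlers.get(name)
  if (!handler) return reply(`Unknown tool: ${name}`)

  // 2. Are the arguments valid? (validate returns null or an error string)
  const problem = deps.validate(name, args)
  if (problem) return reply(`Invalid arguments: ${problem}`)

  // 3. Is the operation safe?
  if (!deps.isSafe(name, args)) return reply("Blocked by safety rules")

  // 4. Must the user approve it?
  if (deps.needsApproval(name) && !(await deps.askApproval(name, args))) {
    return reply("User declined this tool call")
  }

  // 5 and 6. Execute, then hand the result back as a tool message.
  return reply(String(await handler(args)))
}
```

Every rejection still produces a tool message, so the model learns why the request failed and can try something else.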


Part 1: The Loop

The loop is the heart of the agent.

Without a loop, the model can only answer once.

With a loop, the model can:

  1. receive a task
  2. inspect the project
  3. call a tool
  4. observe the output
  5. make another decision
  6. continue until the task is done

A simplified version of the loop looks like this:

while (true) {
  const userInput = await askUser()

  messages.push({
    role: "user",
    content: userInput
  })

  // Inner loop: keep calling the model until it stops requesting tools,
  // so tool results go back to the model instead of prompting the user again.
  while (true) {
    const response = await model.chat({
      messages,
      tools
    })

    if (!response.hasToolCalls) {
      messages.push({
        role: "assistant",
        content: response.content
      })

      print(response.content)
      break
    }

    messages.push(response.assistantMessage)

    const toolResults = await executeTools(response.toolCalls)

    messages.push(...toolResults)
  }
}

That is the basic shape of an agent.

The important thing is not just calling the model.

The important thing is maintaining the conversation state.

In my actual harness, the conversation is stored in a message array:

const msgArray = [
  {
    role: "system",
    content: systemPrompt
  }
]

Every user message, assistant response, tool call, and tool result gets added to this array.

That means the model can see the history of what happened.

For example:

System: You are a coding assistant.
User: Find why the tests are failing.
Assistant: I need to inspect package.json.
Assistant tool_call: read package.json
Tool: contents of package.json
Assistant: I should run npm test.
Assistant tool_call: bash npm test
Tool: test output
Assistant: The failure is in fsOps.test.js...

This is what makes the agent feel continuous.

The LLM is stateless by itself.

The harness gives it memory through the message array.
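
To make that concrete, here is roughly how the first part of the transcript above could look as message objects in the OpenAI-style chat format. The ids and file contents are invented for illustration, and `unansweredToolCalls` is a hypothetical helper, not something from uai-agent.

```javascript
// Illustrative history in the OpenAI-style chat format; ids and contents are invented.
const msgArray = [
  { role: "system", content: "You are a coding assistant." },
  { role: "user", content: "Find why the tests are failing." },
  {
    role: "assistant",
    content: "I need to inspect package.json.",
    tool_calls: [
      {
        id: "call_1",
        type: "function",
        // Arguments travel as a JSON string, not as an object
        function: { name: "read", arguments: JSON.stringify({ filePath: "./package.json" }) }
      }
    ]
  },
  // Each tool result points back at the call it answers via tool_call_id
  { role: "tool", tool_call_id: "call_1", content: '{ "scripts": { "test": "node --test" } }' }
]

// Hypothetical sanity check: list tool calls that have no matching tool result yet.
function unansweredToolCalls(messages) {
  const answered = new Set(
    messages.filter((m) => m.role === "tool").map((m) => m.tool_call_id)
  )
  return messages
    .flatMap((m) => m.tool_calls ?? [])
    .filter((call) => !answered.has(call.id))
    .map((call) => call.id)
}
```

A check like this is handy before each model call, because a tool call without a matching result usually means the history is out of sync.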


The Real Agent Loop in uai-agent

In index.js, the main loop is basically:

(async () => {
  while (true) {
    await main()
  }
})()

Inside main(), the harness does several things:

  1. initialize the model client
  2. ask the user for input
  3. handle slash commands like /clear, /model, /save
  4. attach extra context from tags like @./file.js or @workspace
  5. send messages and tools to the model
  6. stream the assistant response
  7. collect tool calls if the model requests them
  8. ask for approval if needed
  9. execute the tools
  10. send tool results back into the conversation
  11. save the session

The conceptual flow is:

User
 ↓
Add context
 ↓
Send messages + tools to model
 ↓
Stream assistant response
 ↓
Did model request tools?
 ↓
Yes → approve → execute → append tool result → loop again
No  → print final answer → wait for next user

This is the most important mental model for building agents.

The loop is not magic.

It is just controlled repetition.
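
"Controlled" can be taken literally. The sketch below runs one user turn with a step cap so a confused model cannot loop forever. The names `runTurn`, `callModel`, and `executeTool` are hypothetical, and the `maxSteps` limit is my own addition for illustration rather than a documented uai-agent feature.

```javascript
// Runs one user turn: call the model, execute any requested tools, repeat.
// callModel and executeTool are injected so the sketch stays provider-agnostic.
async function runTurn(messages, callModel, executeTool, maxSteps = 10) {
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(messages)
    messages.push(reply)

    const calls = reply.tool_calls ?? []
    if (calls.length === 0) {
      return reply.content // final answer: hand control back to the user
    }

    for (const call of calls) {
      const content = await executeTool(call)
      messages.push({ role: "tool", tool_call_id: call.id, content })
    }
  }
  return `Stopped after ${maxSteps} steps without a final answer.`
}
```

Because the model and the tool executor are passed in, the same function can be driven by a stub in tests or by a real client in the CLI.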


Why the Loop Matters

A single LLM response is not enough for coding work.

Imagine asking:

Fix the failing tests.

The model needs to do multiple steps:

1. Inspect the project.
2. Read package.json.
3. Run the test command.
4. Read the failing test file.
5. Read the source file.
6. Edit the source file.
7. Run tests again.
8. Report the fix.

That cannot be done in one normal chat completion unless the harness lets the model call tools and observe results.

The loop is what creates this behavior:

Think → Act → Observe → Think → Act → Observe

Or in coding-agent terms:

Prompt → Tool call → Tool result → Next prompt

This is the agent pattern.


Part 2: Tools

If the loop is the heart of the agent, tools are the hands.

Tools define what the agent can actually do.

In my harness, I started with four practical coding tools:

read
bash
write
edit

These are simple, but they are enough to build a useful coding assistant.

Tool 1: read

The read tool lets the agent inspect a file.

{
  name: "read",
  description: "Read the contents of a file.",
  parameters: {
    filePath: "string"
  }
}

The model might request:

{
  "filePath": "./package.json"
}

The harness then reads the file and sends the content back as a tool result.

This is important because the model should not guess what is inside your codebase.

A coding agent should inspect before editing.


Tool 2: bash

The bash tool lets the agent run terminal commands.

Example:

{
  "command": "npm test"
}

But this is also the most dangerous tool.

A shell command can do real damage if you are careless.

So in my harness, bash is heavily restricted.

The code uses an allowlist of commands such as:

const ALLOWED_COMMANDS = new Set([
  "ls",
  "dir",
  "pwd",
  "echo",
  "cat",
  "head",
  "tail",
  "find",
  "grep",
  "wc",
  "git",
  "npm",
  "node",
  "true",
  "false",
  "seq"
])

It also blocks dangerous patterns like:

const BLOCKED_COMMANDS = [
  "rm -rf /",
  "mkfs",
  "dd if=",
  "shutdown",
  "reboot",
  "sudo",
  "cat /etc/shadow"
]

The harness does not simply trust the model.

It validates the command first.

That is a very important principle:

The model can request an action, but the harness decides whether the action is allowed.
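
As a rough sketch of how the two lists could work together, the function below checks a command string without running anything. `checkCommand` is a hypothetical name, the lists are shortened copies of the ones above, and the real bash.js performs more checks than this.

```javascript
const ALLOWED_COMMANDS = new Set(["ls", "pwd", "cat", "grep", "git", "npm", "node"])
const BLOCKED_COMMANDS = ["rm -rf /", "mkfs", "dd if=", "shutdown", "reboot", "sudo"]

// Returns { ok: true } or { ok: false, reason } without executing anything.
function checkCommand(command) {
  const trimmed = command.trim()
  if (trimmed === "") return { ok: false, reason: "Empty command" }

  // Blunt substring match: a false positive is cheaper than a missed attack
  const blocked = BLOCKED_COMMANDS.find((pattern) => trimmed.includes(pattern))
  if (blocked) return { ok: false, reason: `Blocked pattern: ${blocked}` }

  // Only the program name (first token) is compared against the allowlist
  const program = trimmed.split(/\s+/)[0]
  if (!ALLOWED_COMMANDS.has(program)) {
    return { ok: false, reason: `Command not in allowlist: ${program}` }
  }
  return { ok: true }
}
```

The allowlist does most of the work here. The blocklist is a second net for dangerous arguments passed to an otherwise allowed program.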


Tool 3: write

The write tool lets the agent create or overwrite a file.

{
  name: "write",
  parameters: {
    filePath: "string",
    content: "string"
  }
}

This is useful for generating new files, tests, docs, or config files.

But again, the harness checks the path before writing.

The model should not be able to write anywhere on your machine.

It should only operate inside the current workspace.


Tool 4: edit

The edit tool replaces exact text in a file.

{
  name: "edit",
  parameters: {
    filePath: "string",
    oldContent: "string",
    newContent: "string"
  }
}

This is safer than asking the model to rewrite entire files every time.

The model must provide the exact old content and the replacement.

A simplified implementation looks like this:

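import fs from "node:fs"

// Replace the first exact occurrence of oldContent in the file.
function editFile(filePath, oldContent, newContent) {
  const data = fs.readFileSync(filePath, "utf-8")

  if (!data.includes(oldContent)) {
    return "Error: oldContent not found. No changes made."
  }

  // A replacer function inserts newContent literally, so "$&" or "$1"
  // inside the new code is not interpreted as a replacement pattern.
  fs.writeFileSync(
    filePath,
    data.replace(oldContent, () => newContent),
    "utf-8"
  )

  return "File edited successfully."
}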

This gives the agent precision.

It also reduces accidental edits.
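
One refinement worth considering is to refuse an edit when the old text appears more than once, since the model may have meant a different occurrence. The sketch below works on strings only so it is easy to test. `applyEdit` is a hypothetical helper and the uniqueness rule is my suggestion, not a description of what fsOps.js currently does.

```javascript
// Pure helper: returns the edited text, or an error when the match is missing or ambiguous.
function applyEdit(data, oldContent, newContent) {
  if (oldContent === "") {
    return { ok: false, error: "oldContent must not be empty." }
  }

  // Count non-overlapping occurrences of oldContent
  const matches = data.split(oldContent).length - 1
  if (matches === 0) {
    return { ok: false, error: "oldContent not found. No changes made." }
  }
  if (matches > 1) {
    return {
      ok: false,
      error: `oldContent matches ${matches} places. Add more surrounding context.`
    }
  }

  // The replacer function keeps "$&" or "$1" in newContent literal
  return { ok: true, text: data.replace(oldContent, () => newContent) }
}
```

The error message tells the model how to recover: include more surrounding lines so the match becomes unique.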


Tool Schemas: How the Model Knows What It Can Do

The model does not magically know your tools.

You have to describe them.

In config/tools.js, each tool is defined as a function schema.

For example, the read tool:

{
  type: "function",
  function: {
    name: "read",
    description: "Read the contents of a file.",
    strict: true,
    parameters: {
      type: "object",
      properties: {
        filePath: {
          type: "string",
          description: "The path to the file to read."
        }
      },
      required: ["filePath"],
      additionalProperties: false
    }
  }
}

This schema does three things:

  1. tells the model the tool exists
  2. explains when to use it
  3. defines the exact argument shape

The important part is:

strict: true

and:

additionalProperties: false

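On providers that support strict mode, this constrains the model to arguments that match the schema exactly. Other OpenAI-compatible providers may ignore the flag, so the harness should still validate arguments itself.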

The model should not call:

{
  "path": "./index.js",
  "random": true
}

It should call:

{
  "filePath": "./index.js"
}

Tool schemas are the contract between the model and the harness.
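
A contract is only useful if the harness checks it. Below is a minimal sketch of validating parsed arguments against the `parameters` object of a schema. `validateArgs` is a hypothetical helper that covers only the small subset of JSON Schema used in this article (primitive types, `required`, `additionalProperties: false`). It is not a general validator.

```javascript
// Validates args against a small JSON Schema subset; returns a list of error strings.
function validateArgs(schema, args) {
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return ["arguments must be an object"]
  }
  const errors = []
  const properties = schema.properties ?? {}

  for (const key of schema.required ?? []) {
    if (!Object.hasOwn(args, key)) errors.push(`missing required property: ${key}`)
  }
  for (const [key, value] of Object.entries(args)) {
    if (!Object.hasOwn(properties, key)) {
      if (schema.additionalProperties === false) errors.push(`unexpected property: ${key}`)
    } else if (typeof value !== properties[key].type) {
      // typeof only covers primitive types such as "string", "number", "boolean"
      errors.push(`${key} must be of type ${properties[key].type}`)
    }
  }
  return errors
}

// The parameters object from the read schema above
const readParameters = {
  type: "object",
  properties: { filePath: { type: "string", description: "The path to the file to read." } },
  required: ["filePath"],
  additionalProperties: false
}
```

With this in place, `validateArgs(readParameters, { filePath: "./index.js" })` returns an empty array, while the bad call shown earlier yields one error for the missing `filePath` and one for each unexpected property.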


Tool Dispatching: Turning Model Requests into Real Actions

Once the model emits a tool call, the harness has to execute it.

In my project, this happens in tools/toolCall.js.

The idea is to keep a map of tool names to handlers:

const toolHandlers = new Map()

toolHandlers.set("read", async (input) => {
  return readFile(input.filePath)
})

toolHandlers.set("write", async (input) => {
  return writeFile(input.filePath, input.content)
})

toolHandlers.set("edit", async (input) => {
  return editFile(
    input.filePath,
    input.oldContent,
    input.newContent
  )
})

toolHandlers.set("bash", async (input) => {
  return bash(input.command)
})

Then, when the model asks for a tool, the harness does:

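const handler = toolHandlers.get(toolName)

if (!handler) {
  // Every tool message must reference the call it answers,
  // even when the requested tool does not exist.
  return {
    role: "tool",
    tool_call_id: toolCallId,
    content: `Unknown tool: ${toolName}`
  }
}

const output = await handler(input)

return {
  role: "tool",
  tool_call_id: toolCallId,
  content: output
}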

This is a clean design because adding a new tool becomes simple:

  1. implement the tool
  2. register the handler
  3. add the schema
  4. write tests
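
Two details are easy to miss when dispatching. In the OpenAI-style format the arguments arrive as a JSON string, and a handler can throw. The sketch below handles both and always returns a tool message. `dispatchToolCall` is a hypothetical name, and the error wording is made up for illustration.

```javascript
// Turns one OpenAI-style tool call into a tool message; never throws.
async function dispatchToolCall(toolCall, toolHandlers) {
  const { id, function: fn } = toolCall
  const reply = (content) => ({ role: "tool", tool_call_id: id, content })

  const handler = toolHandlers.get(fn.name)
  if (!handler) return reply(`Unknown tool: ${fn.name}`)

  let input
  try {
    input = JSON.parse(fn.arguments) // arguments arrive as a JSON string
  } catch {
    return reply("Error: tool arguments were not valid JSON.")
  }

  try {
    return reply(String(await handler(input)))
  } catch (err) {
    // Report the failure to the model instead of crashing the loop
    return reply(`Error: ${err.message}`)
  }
}
```

Returning errors as tool results keeps the loop alive and gives the model a chance to correct itself on the next step.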

The Most Important Safety Rule: The Model Requests, the Harness Decides

One mistake people make when building their first agent is giving the model too much direct power.

That is dangerous.

A model can hallucinate.

A model can misunderstand.

A model can produce risky commands.

A model can accidentally leak sensitive information.

So the harness must be the authority.

There are several safety layers.


1. Workspace-Scoped File Access

The agent should only operate inside the current project.

In utils/pathSecurity.js, paths are resolved against the current working directory.

The idea is:

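import { realpathSync } from "node:fs"
import { resolve, sep } from "node:path"

function resolveWorkspacePath(filePath) {
  const root = realpathSync(process.cwd())
  // realpathSync follows symlinks and throws if the path does not exist,
  // so new files need their parent directory checked instead.
  const candidate = realpathSync(resolve(filePath))

  // Compare whole path segments: a bare startsWith(root) would also
  // accept a sibling such as "/work/app-secrets" for root "/work/app".
  if (candidate !== root && !candidate.startsWith(root + sep)) {
    return {
      ok: false,
      reason: "Path outside workspace is not allowed"
    }
  }

  return {
    ok: true,
    realPath: candidate
  }
}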

This prevents the model from reading or writing files like:

/etc/passwd
~/.ssh/id_rsa
../some-other-project/.env

This is critical.

If you are building a coding agent, do not skip workspace restrictions.


2. .gitignore Protection

My harness can also treat files matched by .gitignore as unsafe.

Why?

Because .gitignore often lists entries like:

.env
node_modules
dist
secrets.json
coverage

If a file is ignored, there is a good chance it should not be casually sent to an LLM.

So the harness checks .gitignore patterns and can require approval or deny access.

That is a practical safety feature many simple agents miss.
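
To give a feel for the check, here is a deliberately simplified matcher. It understands only three pattern shapes (a plain name, a directory written as `dir/`, and `*.ext`) and ignores the rest of the real .gitignore syntax, such as negation and `**`. `isIgnored` is a hypothetical helper, not the code in uai-agent. A real harness should rely on a proper gitignore parser.

```javascript
// Simplified matcher for three pattern shapes only: "name", "dir/", and "*.ext".
// Negation ("!"), "**", and leading-slash anchoring are NOT handled.
function isIgnored(relativePath, patterns) {
  const segments = relativePath.split("/").filter((s) => s !== "" && s !== ".")
  const fileName = segments[segments.length - 1] ?? ""

  return patterns.some((raw) => {
    const pattern = raw.trim().replace(/\/$/, "")
    if (pattern === "" || pattern.startsWith("#")) return false // blank line or comment

    if (pattern.startsWith("*.")) {
      return fileName.endsWith(pattern.slice(1)) // "*.log" matches "app.log"
    }
    // Matches the file itself or any parent directory with that name
    return segments.includes(pattern)
  })
}
```

With the list above, `.env` and anything under `node_modules` or `dist` would be flagged, and the harness can then ask for approval or deny the read.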


3. Bash Validation

The bash tool validates commands before execution.

It blocks:

  • shell operators
  • redirection
  • command substitution
  • absolute paths
  • parent traversal
  • dangerous git operations
  • unsafe npm commands
  • destructive commands

For example, the harness rejects commands with shell metacharacters:

const SHELL_METACHARS = /[;&|`$<>\n\r]/

That means the model cannot do:

cat package.json && rm -rf .

It also executes commands using execFile, not a shell:

execFileAsync(file, args, {
  cwd: process.cwd(),
  shell: false
})

This is much safer than passing arbitrary strings to a shell.
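
Since execFile takes a program and an argument array rather than one string, the command has to be split first. Here is a minimal sketch of that step combined with three of the checks listed above. `parseCommand` is a hypothetical name, and the naive whitespace split means quoted arguments containing spaces are not supported.

```javascript
const SHELL_METACHARS = /[;&|`$<>\n\r]/

// Splits a command into { file, args } suitable for execFile, or explains the rejection.
function parseCommand(command) {
  if (SHELL_METACHARS.test(command)) {
    return { ok: false, reason: "Shell metacharacters are not allowed" }
  }

  // Naive split: quoted arguments with spaces are not supported in this sketch
  const [file, ...args] = command.trim().split(/\s+/)
  if (!file) return { ok: false, reason: "Empty command" }

  for (const token of [file, ...args]) {
    if (token.startsWith("/")) {
      return { ok: false, reason: `Absolute path not allowed: ${token}` }
    }
    if (token.split("/").includes("..")) {
      return { ok: false, reason: `Parent traversal not allowed: ${token}` }
    }
  }
  return { ok: true, file, args }
}
```

A result with `ok: true` can be passed straight to `execFileAsync(file, args, ...)`, and anything else goes back to the model as an error message.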


4. Approval Workflow

A good harness should not always auto-execute everything.

In config.js, I use approval modes:

export const autoApprove = {
  default: "auto",

  bash: {
    promptExecution: true,
    promptSending: true
  },

  read: {
    promptExecution: false,
    promptSending: true
  },

  write: {
    promptExecution: true,
    promptSending: false
  },

  edit: {
    promptExecution: true,
    promptSending: false
  }
}

The modes are:

auto   → safe operations can run automatically, risky ones ask
manual → use per-tool settings
block  → block all tool calls
allow  → approve everything, useful only for testing

This gives the developer control.

For example, if the agent wants to run a shell command, the CLI can ask:

Execute this tool call? (y/N):

And after getting the output, it can ask whether to send that output back to the model.

This matters because command output may contain sensitive information.
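
As a sketch of how such a config could be turned into a decision, the function below maps a tool and a phase to "run", "ask", or "deny". `decideApproval` is a hypothetical name and this is not the logic in approval.js. For brevity it resolves auto the same way as manual, using the per-tool flags. How a harness classifies "risky" in auto mode is a separate design decision.

```javascript
// Hypothetical resolver: maps (config, tool, phase) to "run", "ask", or "deny".
// phase is "execution" (before running the tool) or "sending"
// (before the tool output goes back to the model).
function decideApproval(config, toolName, phase) {
  const mode = config.default
  if (mode === "block") return "deny"
  if (mode === "allow") return "run"

  // Assumption: "auto" and "manual" both fall through to the per-tool flags here
  const flag = phase === "execution" ? "promptExecution" : "promptSending"
  const toolSettings = config[toolName]

  // Unknown or malformed tool entries fall back to asking, the cautious default
  if (typeof toolSettings !== "object" || toolSettings === null) return "ask"
  return toolSettings[flag] ? "ask" : "run"
}
```

With the config above, a bash call asks before both execution and sending, while a read runs without a prompt but asks before its output is sent to the model.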


Context Tags: A Small Feature That Makes the Agent Feel Useful

One feature I added is context tags.

The user can type:

Review @./README.md and suggest improvements.

or:

Look at @workspace and tell me how this project is structured.

The harness detects these tags before sending the message to the model.

In utils/userAppend.js, it parses patterns like:

@./file.js
@workspace

Then it adds the file contents or workspace listing to the user message.

This is a nice middle ground between manual copy-paste and fully autonomous file reading.

The user can explicitly attach context, and the agent can still use tools later if it needs more.
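
Here is a minimal sketch of the detection step. The regular expression and the `extractContextTags` name are my own illustration rather than the code in userAppend.js. Reading the files and building the workspace listing would happen afterwards, behind the same path checks as the tools.

```javascript
// Matches "@workspace" or "@./relative/path" at the start of the input or after whitespace,
// so an email address like "me@workspace.dev" is not treated as a tag.
const TAG_PATTERN = /(?:^|\s)@(workspace\b|\.\/[^\s]+)/g

function extractContextTags(input) {
  const files = []
  let workspace = false

  for (const match of input.matchAll(TAG_PATTERN)) {
    if (match[1] === "workspace") {
      workspace = true
    } else {
      // Drop trailing sentence punctuation such as the "." in "@./README.md."
      files.push(match[1].replace(/[.,;:!?]+$/, ""))
    }
  }
  return { files: [...new Set(files)], workspace }
}
```

For the first example above this returns `./README.md` as the only file, and for the second it sets `workspace` to true with no files.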


Provider-Agnostic Model Configuration

Another design decision: I did not want the harness to be locked to one provider.

The project uses OpenAI-compatible chat completion APIs, so the config can support multiple providers.

In config.js, providers are registered like:

export const models = {
  openai: {
    apiKey: keys.OPENAI_API_KEY,
    baseURL: keys.OPENAI_BASE_URL,
    gpt54mini: {
      model: "gpt-5.4-mini"
    }
  }
}

System Prompt: The Agent’s Operating Manual

Tools define what the agent can do.

The system prompt defines how the agent should behave.

In config/SYSTEM.md, I describe rules like:

  • be concise and action-oriented
  • inspect files before changing them
  • prefer small focused edits
  • do not invent tools
  • operate inside the workspace
  • avoid destructive operations
  • use bash only when needed
  • follow command restrictions

This is important because tool schemas alone are not enough.

You also need behavioral instructions.

The system prompt is like the agent’s operating manual.

But the system prompt is not security.

Security must still live in code.

A good rule is:

Prompt for behavior.
Code for enforcement.

Final Thoughts

An agent is not just an LLM.

An agent is an LLM inside a loop, connected to tools, controlled by a harness.

The loop gives the agent continuity.

Tools give the agent the ability to act.

Safety rules keep that action under control.

Building my own harness made agents feel much less mysterious. Under the hood, the core idea is simple:

User asks
Model decides
Harness validates
Tool executes
Result returns
Model continues

That is the whole pattern.

Of course, production systems add many more layers: memory, planning, retries, tracing, evals, permissions, sandboxes, and deployment infrastructure.

But the core is still the same.

If you want to understand agents deeply, build a small harness yourself.

That is what uai-agent tries to be: it runs inside your workspace, gives the model practical coding tools, and keeps the developer in control with safety checks and approvals.

If you are interested, check out the repo here:

GitHub: https://github.com/uditrajput03/uai-agent

And if you are building your own harness, start with the loop and the tools.

Everything else grows from there.
