Introduction
This article is my point of view on AI agents, with a technical deep dive into how they work. I'll share how I built a working AI agent from scratch, decomposing every component and discussing the trade-offs in latency, cost, and reliability along the way.
My goal is to make explicit the deterministic system that's wrapped around a probabilistic core.
Defining an AI Agent
Before building anything, we need to draw hard boundaries between three commonly conflated systems: scripts, chatbots, and agents.
Assumptions
You should be comfortable with JavaScript (async/await, APIs), basic HTTP concepts, and JSON data structures.
Scripts (Deterministic Program)
A script is just a fixed mapping:
y=f(x)
- Same input → same output
- No adaptation
- No internal state beyond the execution context
for example:
function classify(input) {
  if (input.includes("error")) return "bug";
  return "general";
}
There is no notion of iteration, decision-making under uncertainty, or external tool usage.
Chatbots (Single-Step LLM System)
A chatbot introduces probabilistic behavior:
y∼p(y∣x)
The output is sampled from a probability distribution, but it is still a single step: there is no iterative reasoning loop and no explicit action execution.
for example:
const response = await llm("Explain recursion simply");
Even with conversation history, this remains a mapping, not a system: there is no persistent goal tracking and no structured interaction with the environment.
Agent (Iterative, Stateful System)
An agent is fundamentally different:
at∼π(a∣st),st+1=f(st,at,rt)
| Mathematical Term | Meaning | Code Representation |
|---|---|---|
| st | Current state | state object |
| at | Chosen action | action JSON |
| rt | Tool execution result | result |
| π | Policy (decision model) | llm() function |
| f | State transition | updateState() |
async function step(state, memory){
  const action = await policy(state, memory); // at
  const result = await execute(action);       // rt
  const nextState = updateState(state, result); // s_{t+1}
  return { nextState, action, result };
}
It's iterative (multi-step execution), stateful (maintaining memory across steps), and action-oriented (interacting with tools and the environment).
for example:
while (!done) {
  const action = decide(state);
  const result = act(action);
  state = update(state, result);
}
This loop is the defining feature; without it, you just have a wrapper around an API.
Overview
I've only just begun learning about AI agents myself, and this is my very first small one, tinybot:
import { createGroq } from '@ai-sdk/groq';
import { generateText } from 'ai';

const groq = createGroq({
  apiKey: process.env.GROQ_API_KEY, // never hardcode your API key
});

const model = groq('llama-3.3-70b-versatile');

const { text } = await generateText({
  model,
  system: 'Answer everything in exactly 3 words.',
  prompt: 'What is the meaning of life?',
});

console.log(text);
tinybot response:
taha@192 tinybot % node tinybot.js
Find True Happiness
Why Most AI Agent Tutorials Fall Short
I think most tutorials treat AI agents as black boxes, creating over-abstraction through reliance on frameworks that hide the core mechanics.
Like this:
const agent = new Agent({...});
await agent.run();
As a result, many developers cannot debug failures, extend functionality, or reason about performance.
A More Precise View
At its core, an AI agent can be modeled as a discrete-time control system.
At each time step t, the agent:
- Observes a state st
- Chooses an action at
- Receives a result rt
- Transitions to a new state st+1
We can express this formally:
st+1=f(st,at,rt)
Where:
- st = current state (input + memory)
- at = action chosen by the agent
- rt = result of executing the action
- f = state transition function
State Representation (st)
State is the most underexplained part of agent systems.
Formally, it is everything the agent conditions on:
st=(x,mt,ht)
Where:
- x = current input
- mt= memory (retrieved knowledge)
- ht= interaction history
example:
const state = {
input: "Find a good fishing rod under $1000",
memory: [...retrievedDocs],
history: [...previousSteps]
};
Key insight:
- The LLM never “sees” your system, only the serialized state you provide.
- Bad state design = bad decisions.
Deterministic System, Probabilistic Core
An important distinction: the agent system (loop, tools, memory) is deterministic, while the policy (LLM) is probabilistic.
We can think of the full system as:
Deterministic Runtime + Probabilistic Policy = AI Agent
Or more formally:
Agent=Runtime(π,T,M)
Where:
- π = policy (LLM)
- T = set of tools
- M = memory system
Why This Matters
This framing is not academic; it directly impacts how you build systems:
- If you don’t control st, the agent behaves unpredictably
- If you don’t constrain at, the agent may hallucinate actions
- If f is poorly designed, the system becomes unstable
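One concrete way to constrain at is to validate every decision against a whitelist of tool schemas before executing it. This is only a sketch; `toolSchemas` and `validateAction` are hypothetical names, not part of any framework:

```javascript
// A minimal sketch of constraining the action space (a_t).
// `toolSchemas` is an assumed registry of allowed actions and their arguments.
const toolSchemas = {
  search_products: { required: ["query"] },
  final: { required: ["output"] }
};

function validateAction(action) {
  const schema = toolSchemas[action.action];
  if (!schema) {
    throw new Error(`Unknown action: ${action.action}`);
  }
  for (const key of schema.required) {
    if (!(key in (action.args ?? {}))) {
      throw new Error(`Missing argument "${key}" for ${action.action}`);
    }
  }
  return action;
}
```

Rejecting unknown or malformed actions at this boundary is what keeps a hallucinated tool call from ever reaching the executor.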
Stateless vs Stateful Systems
Stateless
Stateless means each decision is independent:
at∼π(a∣x)
- No memory
- No accumulation of knowledge
- Limited reasoning depth
Stateful
Decisions depend on history:
at∼π(a∣st)
- Enables multi-step reasoning
- Allows correction and refinement
- Introduces complexity (memory growth, noise)
Code Comparison
- Stateless:
await llm("Summarize this article");
- Stateful:
await llm(buildPrompt({
input,
history,
retrievedMemory
}));
From Theory to Execution: Full Step Trace
Let’s walk one iteration concretely:
Step 1: Initial state
state = {
input: "Find a good fishing rod under $1000",
history: [],
memory: []
};
Step 2: Policy decision
{
"action": "search_products",
"args": { "query": "fishing rod under 1500" }
}
Step 3: Tool execution
result = [
{ name: "Rod 1", price: 800 },
{ name: "Rod 2", price: 650 }
];
Step 4: Policy decision (the policy corrects its query to match the $1000 budget)
{
"action": "search_products",
"args": { "query": "fishing rod under 1000" }
}
Step 5: State transition
state = {
...state,
history: [
{
action: "search_products",
result
}
]
};
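The transition in Step 5 can be written as a small pure function implementing f(st, at, rt); a minimal sketch consistent with the trace above:

```javascript
// f(s_t, a_t, r_t): append the step to history without mutating the old state.
function updateState(state, action, result) {
  return {
    ...state,
    history: [...state.history, { action, result }]
  };
}
```

Returning a fresh object keeps every intermediate st inspectable, which makes debugging traces like this one much easier.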
Key Takeaways
- An agent is defined by its loop, not its model
- State design directly determines decision quality
- The LLM is just a policy function, not the system itself
- Determinism is a configuration choice, not a default
Core Architecture
After defining what an agent is, we need to look at the structure of the system. The question we need to answer is:
how do we decompose an agent into components that are modular, testable, and scalable?
The answer is that an agent can be represented as a composition of interacting modules:
Agent=(π,M,T,E)
Where:
- π = policy (LLM decision function)
- M = memory system
- T = toolset
- E = execution runtime (loop + orchestration)
Conceptual Architecture
User Input
↓
State Builder (input + memory + history)
↓
Policy (LLM)
↓
Action (JSON)
↓
Tool Executor
↓
Result
↓
Memory Update
↓
Loop (repeat or terminate)
The way I see it, the conceptual architecture above is more of a feedback system than a pipeline.
Data Flow
- State → Policy
Serialize state into a prompt
- Policy → Action
LLM outputs structured decision
- Action → Tool
System executes external function
- Tool → Result
Returns data to agent
- Result → State Update
Incorporated into next iteration
Concrete representation:
async function agentStep(state, memory){
const prompt = buildPrompt(state, memory);
const action = await llm(prompt); // π(st)
const parsed = parseAction(action); // structured at
const result = await execute(parsed); // T(at)
const nextState = updateState(state, parsed, result); // f(...)
return { nextState, parsed, result };
}
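`parseAction` is used above but never shown. Here is one hedged way to implement it, assuming the model returns a JSON object somewhere in its raw text (production systems would prefer a structured-output mode instead):

```javascript
// Extract and validate the first JSON-looking span in raw LLM output.
// Naive sketch: grabs from the first "{" to the last "}".
function parseAction(text) {
  const match = text.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("No JSON action found in model output");
  const parsed = JSON.parse(match[0]);
  if (typeof parsed.action !== "string") {
    throw new Error('Parsed object is missing an "action" field');
  }
  return parsed;
}
```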
Serialization Boundary
A serialization boundary is the checkpoint where an agent "packs its bags": for state to travel across a network or wait in storage, it needs to take a formal format, like JSON, YAML, or TOON.
At the end of the day, the key point to remember is that the LLM cannot operate on objects; it operates on text.
So we define a serialization function:
function buildPrompt(state) {
return `
You are an agent.
User goal:
${state.input}
History:
${JSON.stringify(state.history)}
Available tools:
${JSON.stringify(toolSchemas)}
`;
}
Final verdict: the serialization function is the encoding half of the process; the decoding half happens inside the LLM's "brain" when it parses your prompt to understand the context.
Memory Systems
Without memory, an agent reduces to a stateless function. Memory is what turns an agent from a reactive loop into a system capable of contextual reasoning and personalization.
Short Term Memory
Short term memory is what you pass directly into the model.
Implementation
const history = [
{
action: { name: "search_products", args: { query: "fishing rod" } },
result: [{ name: "Rod 1", price: 850 }]
}
];
Injecting into prompt
function buildPrompt(state) {
return `
User goal:
${state.input}
History:
${JSON.stringify(state.history, null, 2)}
`;
}
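Because short term memory is re-serialized into every prompt, it grows with each step. A simple sliding window keeps the prompt bounded; `MAX_HISTORY` is an assumed tuning knob, and real systems might summarize older steps instead of dropping them:

```javascript
// Keep only the most recent steps (naive truncation).
const MAX_HISTORY = 5;

function pruneHistory(history) {
  return history.slice(-MAX_HISTORY);
}
```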
Long Term Memory
Short term memory is insufficient when the agent needs to remember large documents, user preferences, or cross-session knowledge; this is where persistent memory comes in.
Storage Options
- Database (PostgreSQL, MongoDB)
- Vector database (for semantic search)
- File based storage (simple file systems)
for example:
await db.insert({
userId: "123",
text: "User prefers Scorpion fishing rods",
createdAt: Date.now()
});
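Storage is only half of long term memory; retrieval is the other half. A naive keyword match is sketched below, with `memories` as an assumed in-memory stand-in for a real database (a vector database would match by semantic similarity instead of substring overlap):

```javascript
// Naive keyword retrieval over stored memories.
const memories = [
  { userId: "123", text: "User prefers Scorpion fishing rods" },
  { userId: "123", text: "User budget is usually under $1000" }
];

function retrieve(query) {
  const terms = query.toLowerCase().split(/\s+/);
  return memories.filter(m =>
    terms.some(t => m.text.toLowerCase().includes(t))
  );
}
```

Whatever `retrieve` returns is what gets injected into `state.memory` and, eventually, the prompt.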
Tooling and Action Execution
Memory allows an agent to think with context but tools allow an agent to act on the world.
With tools, it becomes an interactive system capable of retrieving data, triggering workflows, and producing side effects; and without these tools, an agent is limited to text generation.
What Makes a “Tool”
A tool is any callable function that:
- Accepts structured input
- Performs an operation (internal or external)
- Returns a result to the agent
Examples of Tools
- API calls (weather, search, payments)
- File system operations
- Computation utilities
for example:
const tools = {
getWeather: async ({ city }) => {
const res = await fetch(`https://api.weather.com/${city}`);
return res.json();
}
};
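Dispatching an action to a tool can then be a simple lookup into that registry. Sketched here with a toy `add` tool so the example stays self-contained; in practice the registry would hold entries like getWeather above:

```javascript
// Generic executor: dispatch a structured action to the matching tool.
const tools = {
  add: async ({ a, b }) => a + b // toy tool, stands in for real API calls
};

async function execute(action) {
  const tool = tools[action.name];
  if (!tool) throw new Error(`Unknown tool: ${action.name}`);
  return tool(action.args);
}
```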
Bear in mind that for tools, timeouts matter a lot: without constraints, latency can grow endlessly, and one slow tool can block the entire agent loop.
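One hedged way to enforce that constraint is wrapping every tool call in a timeout via Promise.race; `TIMEOUT_MS` is an assumed default you would tune per tool:

```javascript
// Reject any tool promise that takes longer than `ms` milliseconds.
const TIMEOUT_MS = 5000;

function withTimeout(promise, ms = TIMEOUT_MS) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Tool timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so a won race doesn't leave a pending rejection.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage would look like `await withTimeout(tools.getWeather({ city }), 3000)`.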
Core Runtime
The center of the entire system is the agent loop. Everything we’ve built so far, from policy and memory to tools, only becomes meaningful when orchestrated through a controlled execution loop.
Minimal loop
async function runAgent(input) {
let state = {
input,
history: [],
memory: []
};
for (let step = 0; step < 10; step++) {
const action = await policy(state);
const result = await execute(action);
state = updateState(state, action, result);
if (isDone(state, action)) break;
}
return state.output;
}
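To see the control flow of that loop without an LLM, here is the same runtime driven by a scripted mock policy. Every name here (mockPolicy, mockExecute, runMockAgent) is hypothetical, purely to make the loop observable:

```javascript
// Deterministic stand-in for the LLM policy: search once, then finish.
function mockPolicy(state) {
  if (state.history.length === 0) {
    return { type: "search_products", args: { query: "fishing rod" } };
  }
  return { type: "final", output: "Rod 1 looks like the best fit." };
}

// Deterministic stand-in for tool execution.
function mockExecute(action) {
  if (action.type === "search_products") {
    return [{ name: "Rod 1", price: 800 }];
  }
  return null;
}

function runMockAgent(input) {
  let state = { input, history: [], output: null };
  for (let step = 0; step < 10; step++) {
    const action = mockPolicy(state);
    if (action.type === "final") {
      state.output = action.output; // explicit final action terminates the loop
      break;
    }
    const result = mockExecute(action);
    state = { ...state, history: [...state.history, { action, result }] };
  }
  return state;
}
```

Swapping mockPolicy for a real LLM call is the only change needed to go from this toy to the loop above.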
Termination Conditions
Without termination logic, the loop is unbounded.
Practical Conditions
1. Explicit Final Action
if (action.type === "final"){
return action.output;
}
2. Max Step Limit
if (step >= MAX_STEPS){
throw new Error("Max steps exceeded");
}
3. Heuristic Completion
function isDone(state){
return state.history.length > 0 &&
state.history[state.history.length - 1].action.type === "final";
}
Why This Matters
Without termination, we will have:
- Infinite loops
- Unbounded cost
- API rate issues
Conclusion
This article walked through what an AI agent actually looks like under the hood, from the control loop to memory and tools, with a small, minimal JavaScript implementation. Keep in mind that this is not a deep or complete system, just a minimal, educational one: basically what I learned while exploring AI agents, and there’s still a lot missing.
If you’re trying to learn this too, my advice is: don’t start with frameworks; just try to build a small agent yourself. Even a basic version will force you to understand a lot, and that’s where the real learning happens.