When we tackle a task, whether it's debugging code, answering questions, analysing data, or completing a workflow, we're essentially problem-solving agents using the tools at our disposal:
- If it's familiar: We recall the steps from our memory or check our knowledge base
- If it's new: We google error messages, scan Stack Overflow, ping colleagues on Slack, tail logs, run diagnostic commands
The Learning Loop
Once we complete a task, one of two things (or both) happens:
- We document it - Add it to the wiki, create a runbook, update the knowledge base
- It lives in our head - We just remember "now I know how to do this"
Next time the same task appears, we skip the research phase and go straight to our documented solution or our memory. It's faster, and we're more confident.

Sometimes, though, the old solution no longer works. The environment changed, dependencies were updated, or the root cause shifted. Now we're back in research mode, but with context: we're not starting from zero, we're debugging why our known pattern failed. Once we find the new path, we update our mental model or (ideally) the documentation.
This is exactly how AI agents with memory should work: try the known pattern first, and if it fails, explore new paths while updating the knowledge base.
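A minimal sketch of that loop in Python. The `research` stand-in, the plain-dict memory, and the `works` success check are all illustrative, not a real agent framework:

```python
# Toy sketch: try the remembered plan first, fall back to research on a
# miss or a failure. All helpers here are stand-ins for illustration.
def research(task):
    # Stand-in for googling, reading docs, asking colleagues
    return f"plan_for_{task}"

def solve(task, memory, works):
    """`works(task, plan)` simulates whether a plan still succeeds."""
    plan = memory.get(task)
    if plan is not None and works(task, plan):
        return plan, "from_memory"      # fast path: known pattern held up
    plan = research(task)               # back to research mode
    if works(task, plan):
        memory[task] = plan             # document the discovery
        return plan, "researched"
    return None, "failed"
```

The second time the same task arrives, the research step is skipped entirely.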
Building Agents That Learn Like We Do
Agents need the same learning loop we use, but systematised into layers:
Layer 1: The Ledger (Execution History)
Every action gets logged: which tool was called, what happened, and whether it succeeded.
What to store:
- Agent actions (which tool was called, when, why)
- Inputs (user queries, context, parameters)
- Outputs (tool results, agent responses, errors)
- Metadata (timestamps, latency, token usage, success/failure)
```json
{
  "trajectory_id": "uuid",
  "timestamp": "iso_datetime",
  "agent_id": "agent_name",
  "step": {
    "action": "tool_call",
    "tool_name": "web_search",
    "input": {...},
    "output": {...},
    "reasoning": "why this tool was chosen"
  },
  "context": {...}
}
```
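One possible shape for the ledger itself: append records of this form to a JSON Lines file, one record per line. The `log_step` helper follows the schema above but is otherwise an assumption, not a prescribed API:

```python
import json
import uuid
from datetime import datetime, timezone

def log_step(path, agent_id, tool_name, tool_input, tool_output,
             reasoning, context=None):
    """Append one execution step to a JSONL ledger (one record per line)."""
    record = {
        "trajectory_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "step": {
            "action": "tool_call",
            "tool_name": tool_name,
            "input": tool_input,
            "output": tool_output,
            "reasoning": reasoning,
        },
        "context": context or {},
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

JSONL keeps appends cheap and lets Layers 2 and 3 replay history with a simple line-by-line scan.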
Layer 2: Smart Tool Selection (Retrieval Layer)
Instead of passing all available tools to the agent, dynamically serve only relevant tools.
How it works:
- Embed tool descriptions using text embeddings
- At runtime, embed the user's task
- Retrieve top-k most relevant tools via vector similarity
- Pass only these tools to the agent
```python
# Index tools
tools = [
    {"name": "web_search", "description": "Search the web for current information"},
    {"name": "calculator", "description": "Perform mathematical calculations"},
    # ... more tools
]

# Create embeddings
tool_embeddings = embed_texts([t["description"] for t in tools])

# At runtime
task_embedding = embed_text(user_query)
relevant_tool_indices = vector_search(task_embedding, tool_embeddings, top_k=5)
available_tools = [tools[i] for i in relevant_tool_indices]
```
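To see the mechanism end to end, here is a self-contained toy version that swaps the real embedding model for a bag-of-words vector and cosine similarity. The `embed_text`, `cosine`, and `retrieve_tools` helpers are illustrative, not a production retriever:

```python
import math
from collections import Counter

def embed_text(text):
    """Toy bag-of-words 'embedding'; a real system would call a model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_tools(query, tools, top_k=2):
    """Return the top_k tools whose descriptions best match the query."""
    q = embed_text(query)
    ranked = sorted(tools,
                    key=lambda t: cosine(q, embed_text(t["description"])),
                    reverse=True)
    return ranked[:top_k]

tools = [
    {"name": "web_search", "description": "Search the web for current information"},
    {"name": "calculator", "description": "Perform mathematical calculations"},
    {"name": "web_fetch", "description": "Fetch the contents of a web page"},
]
```

Only the retrieved subset ever reaches the agent's context; the rest of the catalogue stays in the index.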
Layer 3: Tool Relationship Map (Knowledge Graph)
Model which tools work well together and in what sequences.
What to capture:
- Tool dependencies (tool A requires output from tool B)
- Sequential patterns (tool chains that succeed together)
- Conditional relationships (if X fails, try Y)
- Context requirements (tool C needs specific input types)
```python
# Analyse past trajectories to build the tool relationship graph.
# `success_rate` is assumed to be computed from Layer 1 history for
# each tool pair (successes / attempts).
for trajectory in trajectories:
    for i in range(len(trajectory) - 1):
        current_tool = trajectory[i].tool_name
        next_tool = trajectory[i + 1].tool_name
        graph.add_node(current_tool)
        graph.add_node(next_tool)
        graph.add_edge(current_tool, next_tool, weight=success_rate)
```
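A self-contained variant that actually derives the edge weights from data, assuming each trajectory is a `(tool_names, succeeded)` pair and using a plain dict instead of a graph library:

```python
from collections import defaultdict

def build_graph(trajectories):
    """Edge weight = fraction of trajectories containing a tool pair
    that ultimately succeeded. `trajectories` is a list of
    (list_of_tool_names, succeeded_bool) pairs -- an assumed format."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for steps, succeeded in trajectories:
        for a, b in zip(steps, steps[1:]):   # consecutive tool pairs
            attempts[(a, b)] += 1
            if succeeded:
                successes[(a, b)] += 1
    return {pair: successes[pair] / attempts[pair] for pair in attempts}
```

The resulting dict maps `(tool_a, tool_b)` edges to empirical success rates, which is all Layer 3 needs to rank complementary tools.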
The point: Agents shouldn't just remember individual solutions. They should learn patterns and workflows.
The Execution Flow
1. User submits task
2. Retrieve relevant tools based on the task embedding (Layer 2)
   - → Returns: [tool_a, tool_b, tool_c]
3. Consult the knowledge graph for tool relationships (Layer 3)
   - → Identifies: "When tool_a was used successfully, tool_d and tool_e were often needed"
   - → Returns: [tool_d, tool_e], complementary tools from past successful trajectories
4. Agent executes with the curated toolset
   - → Available tools: [tool_a, tool_b, tool_c, tool_d, tool_e]
5. Store the complete trajectory in Layer 1
   - → Records which tools were actually used and in what order
6. Update Layer 3 based on success/failure
   - → Strengthens edges between tools that worked well together
   - → Weakens or removes edges for failed combinations
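The whole flow above can be sketched as one orchestration function. The layer callbacks, the `execute` function, and the ledger format are hypothetical stand-ins:

```python
def run_task(task, layer2_retrieve, layer3_suggest, execute, ledger):
    """Curate a toolset from Layers 2 and 3, execute, then close the loop."""
    retrieved = layer2_retrieve(task)          # semantically relevant tools
    complements = layer3_suggest(retrieved)    # historically co-used tools
    toolset = list(dict.fromkeys(retrieved + complements))  # dedupe, keep order
    tools_used, succeeded = execute(task, toolset)
    # Layer 1: record what actually happened for future graph updates
    ledger.append({"task": task, "tools_used": tools_used,
                   "succeeded": succeeded})
    return toolset, succeeded
```

Each call leaves behind exactly the trajectory record that the Layer 3 update step consumes.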
The result: an agent that gets better with every problem it solves, just like we do, but faster and without forgetting.
Handling the Cold Start Problem
Layer 3 faces a classic bootstrapping challenge. It needs trajectories to learn patterns, but agents need patterns to select optimal tools. Here's how to address this:
1. Pre-seed with Expert Knowledge
Start with manually curated tool relationships based on documentation and common workflows:
```python
# Pre-populate the graph with known tool relationships
expert_patterns = [
    ("web_search", "web_fetch", {"weight": 0.9, "source": "expert"}),
    ("gdrive_get", "salesforce_update", {"weight": 0.8, "source": "expert"}),
    ("database_query", "data_analysis", {"weight": 0.85, "source": "expert"}),
]

for source, target, metadata in expert_patterns:
    graph.add_edge(source, target, **metadata)
```
This pre-seeding lets the agent skip the cold start phase entirely, beginning in "warm start" mode with baseline patterns that improve over time.
2. System Maturity Phases
The system adapts based on how much it has learned:
- Warm Start (Initial Phase):
  - Layer 3 contains pre-seeded expert patterns
  - Layer 2 remains primary; Layer 3 provides supplementary hints
  - Example: "User needs web_search → Layer 2 returns [web_search, api_call], Layer 3 weakly suggests [web_fetch] based on expert knowledge"
- Hot Start (After Learning):
  - Layer 3 has rich, validated patterns from real trajectories
  - Layer 3 provides strong suggestions based on proven workflows
  - Example: "User needs web_search → Layer 2 returns [web_search, api_call], Layer 3 strongly recommends [web_fetch, content_parser] based on 47 successful patterns"
Key Point: Pre-seeded patterns serve as a starting baseline. As the agent executes tasks, real trajectories either validate and strengthen these patterns or reveal better alternatives.
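One way to implement this maturity-aware weighting is to discount an edge until enough real observations back it. The 0.5 floor and the 20-observation threshold below are illustrative choices, not part of any standard:

```python
def suggestion_strength(weight, observations, min_observations=20):
    """Scale a Layer 3 edge weight by how much real evidence backs it.

    Fresh expert-seeded edges (0 observations) act as weak hints;
    edges confirmed by real trajectories approach their full weight.
    """
    evidence = min(observations / min_observations, 1.0)
    return weight * (0.5 + 0.5 * evidence)
```

A pre-seeded 0.9 edge thus starts as a 0.45 hint in warm start and reaches its full 0.9 strength once it has been validated enough times.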
The Efficiency Gains
1. Avoiding Tool Definitions Overload
Traditional agents load every available tool into their context. All 50+ tool descriptions, schemas, and examples. This burns through tokens before the agent even starts thinking.
Layer 2 changes this. Instead of "here are all your tools" the system retrieves only the 3-5 relevant tools for the specific task.
The result: smaller context windows, faster processing, lower costs, and agents that can scale to hundreds of tools without drowning in their own toolbox.
2. Reducing Intermediate Tool Result Token Consumption
When Layer 3 knows "Tool B needs Tool A's output," the agent can write code to pipe data directly between tools without the LLM processing it twice.
Traditional approach: Consider a task like "Download my meeting transcript from Google Drive and attach it to the Salesforce lead." The model makes calls like:

```
TOOL CALL: gdrive.getDocument(documentId: "abc123")
→ returns "Discussed Q4 goals...\n[full transcript text]"
  (loaded into model context)

TOOL CALL: salesforce.updateRecord(
  objectType: "SalesMeeting",
  recordId: "00Q5f000001abcXYZ",
  data: { "Notes": "Discussed Q4 goals...\n[full transcript text]" }
)
  (model needs to write the entire transcript into context again)
```
Intermediate tool results flow through the model twice: once when reading, once when writing to the next tool.
Code execution approach: Agent writes code that passes Tool A's output directly to Tool B in the execution environment. The LLM never sees the intermediate data, only the final result.
This can reduce token usage drastically for workflows involving large documents or datasets.
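A sketch of the kind of code the agent might generate and run in its execution environment. The `gdrive` and `salesforce` client objects and their method names here are hypothetical, not a real SDK:

```python
# Runs inside the sandbox: the transcript stays in this environment and
# is piped straight between the two (hypothetical) clients. Only the
# short summary string is returned to the model.
def attach_transcript(gdrive, salesforce, document_id, record_id):
    transcript = gdrive.get_document(document_id)   # potentially huge payload
    salesforce.update_record(
        object_type="SalesMeeting",
        record_id=record_id,
        data={"Notes": transcript},                 # piped directly, no LLM hop
    )
    return f"Attached transcript ({len(transcript)} chars) to {record_id}"
```

The model's context holds only the one-line summary, not two copies of the transcript.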
3. Continuous Learning
Every successful or failed trajectory refines the graph:
```python
if trajectory.is_successful():
    strengthen_edges(trajectory.tool_sequence)
else:
    weaken_edges(trajectory.tool_sequence)
```
Pre-seeded expert patterns gradually evolve into data-driven patterns based on actual performance. If an expert-defined relationship doesn't work well in practice, the system learns to deprioritise it.
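A minimal way to implement strengthening and weakening is an exponential moving average per edge, nudging weights toward 1.0 on success and 0.0 on failure. The learning rate and the neutral prior for unseen pairs are illustrative:

```python
def update_edge(graph, a, b, succeeded, lr=0.1):
    """Move an edge weight toward 1.0 on success, toward 0.0 on failure.
    `graph` is a dict mapping (tool_a, tool_b) pairs to weights."""
    w = graph.get((a, b), 0.5)          # unseen pairs start neutral
    target = 1.0 if succeeded else 0.0
    graph[(a, b)] = w + lr * (target - w)
    return graph[(a, b)]
```

Because every update decays the old weight, an expert-seeded edge that keeps failing drifts toward zero regardless of its initial value.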
Summary
We solve problems by trying known solutions first, then researching when we hit something new. We document our discoveries and build mental models of what works together. AI agents need the same learning loop.
This three-layer memory architecture transforms agents from stateless tools into learning systems:
- Layer 1 remembers what happened
- Layer 2 finds relevant tools efficiently
- Layer 3 learns which tools complement each other
Start with expert-curated tool relationships to overcome the cold start problem, then let the agent learn from real trajectories. Each successful (or failed) workflow strengthens the graph's understanding of tool relationships.
The payoff: Agents that handle hundreds of tools without drowning in context, reduce token usage in complex workflows, and continuously improve with every problem they solve, just like we do.