When we tackle a task, whether it's debugging code, answering questions, analysing data, or completing a workflow, we're essentially problem-solving agents using the tools at our disposal:
- If it's familiar: We recall the steps from our memory or check our knowledge base
- If it's new: We google error messages, scan Stack Overflow, ping colleagues on Slack, tail logs, run diagnostic commands
The Learning Loop
Once we complete a task, one of two things (or both) happens:
- We document it - Add it to the wiki, create a runbook, update the knowledge base
- It lives in our head - We just remember "now I know how to do this"
Next time the same task appears, we skip the research phase and go straight to our documented solution or our memory. It's faster, and we're more confident.

Sometimes, though, the old solution no longer works. The environment changed, dependencies were updated, or the root cause shifted. Now we're back in research mode, but with context: we're not starting from zero, we're debugging why our known pattern failed. Once we find the new path, we update our mental model or (ideally) the documentation.
This is exactly how AI agents with memory should work: try the known pattern first, and if it fails, explore new paths while updating the knowledge base.
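A minimal sketch of that loop in Python. The `research` stand-in, the plain-dict memory, and the `works` success check are all illustrative, not a real agent framework:

```python
# Toy sketch: try the remembered plan first, fall back to research on a
# miss or a failure. All helpers here are stand-ins for illustration.
def research(task):
    # Stand-in for googling, reading docs, asking colleagues
    return f"plan_for_{task}"

def solve(task, memory, works):
    """`works(task, plan)` simulates whether a plan still succeeds."""
    plan = memory.get(task)
    if plan is not None and works(task, plan):
        return plan, "from_memory"      # fast path: known pattern held up
    plan = research(task)               # back to research mode
    if works(task, plan):
        memory[task] = plan             # document the discovery
        return plan, "researched"
    return None, "failed"
```

The second time the same task arrives, the research step is skipped entirely.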
Building Agents That Learn Like We Do
Agents need the same learning loop we use, but systematised into layers:
Layer 1: The Ledger (Execution History)
Every action gets logged: which tool was called, what happened, and whether it succeeded.
What to store:
- Agent actions (which tool was called, when, why)
- Inputs (user queries, context, parameters)
- Outputs (tool results, agent responses, errors)
- Metadata (timestamps, latency, token usage, success/failure)
```json
{
  "trajectory_id": "uuid",
  "timestamp": "iso_datetime",
  "agent_id": "agent_name",
  "step": {
    "action": "tool_call",
    "tool_name": "web_search",
    "input": {...},
    "output": {...},
    "reasoning": "why this tool was chosen"
  },
  "context": {...}
}
```
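One possible shape for the ledger itself: append records of this form to a JSON Lines file, one record per line. The `log_step` helper follows the schema above but is otherwise an assumption, not a prescribed API:

```python
import json
import uuid
from datetime import datetime, timezone

def log_step(path, agent_id, tool_name, tool_input, tool_output,
             reasoning, context=None):
    """Append one execution step to a JSONL ledger (one record per line)."""
    record = {
        "trajectory_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "step": {
            "action": "tool_call",
            "tool_name": tool_name,
            "input": tool_input,
            "output": tool_output,
            "reasoning": reasoning,
        },
        "context": context or {},
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

JSONL keeps appends cheap and lets Layers 2 and 3 replay history with a simple line-by-line scan.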
Layer 2: Smart Tool Selection (Retrieval Layer)
Instead of passing all available tools to the agent, dynamically serve only relevant tools.
How it works:
- Embed tool descriptions using text embeddings
- At runtime, embed the user's task
- Retrieve top-k most relevant tools via vector similarity
- Pass only these tools to the agent
```python
# Index tools
tools = [
    {"name": "web_search", "description": "Search the web for current information"},
    {"name": "calculator", "description": "Perform mathematical calculations"},
    # ... more tools
]

# Create embeddings
tool_embeddings = embed_texts([t["description"] for t in tools])

# At runtime
task_embedding = embed_text(user_query)
relevant_tool_indices = vector_search(task_embedding, tool_embeddings, top_k=5)
available_tools = [tools[i] for i in relevant_tool_indices]
```
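To see the mechanism end to end, here is a self-contained toy version that swaps the real embedding model for a bag-of-words vector and cosine similarity. The `embed_text`, `cosine`, and `retrieve_tools` helpers are illustrative, not a production retriever:

```python
import math
from collections import Counter

def embed_text(text):
    """Toy bag-of-words 'embedding'; a real system would call a model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_tools(query, tools, top_k=2):
    """Return the top_k tools whose descriptions best match the query."""
    q = embed_text(query)
    ranked = sorted(tools,
                    key=lambda t: cosine(q, embed_text(t["description"])),
                    reverse=True)
    return ranked[:top_k]

tools = [
    {"name": "web_search", "description": "Search the web for current information"},
    {"name": "calculator", "description": "Perform mathematical calculations"},
    {"name": "web_fetch", "description": "Fetch the contents of a web page"},
]
```

Only the retrieved subset ever reaches the agent's context; the rest of the catalogue stays in the index.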
Layer 3: Tool Relationship Map (Knowledge Graph)
Model which tools work well together and in what sequences.
What to capture:
- Tool dependencies (tool A requires output from tool B)
- Sequential patterns (tool chains that succeed together)
- Conditional relationships (if X fails, try Y)
- Context requirements (tool C needs specific input types)
```python
# Analyse past trajectories to build the tool relationship graph.
# `success_rate` is assumed to be computed from Layer 1 history for
# each tool pair (successes / attempts).
for trajectory in trajectories:
    for i in range(len(trajectory) - 1):
        current_tool = trajectory[i].tool_name
        next_tool = trajectory[i + 1].tool_name
        graph.add_node(current_tool)
        graph.add_node(next_tool)
        graph.add_edge(current_tool, next_tool, weight=success_rate)
```
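A self-contained variant that actually derives the edge weights from data, assuming each trajectory is a `(tool_names, succeeded)` pair and using a plain dict instead of a graph library:

```python
from collections import defaultdict

def build_graph(trajectories):
    """Edge weight = fraction of trajectories containing a tool pair
    that ultimately succeeded. `trajectories` is a list of
    (list_of_tool_names, succeeded_bool) pairs -- an assumed format."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for steps, succeeded in trajectories:
        for a, b in zip(steps, steps[1:]):   # consecutive tool pairs
            attempts[(a, b)] += 1
            if succeeded:
                successes[(a, b)] += 1
    return {pair: successes[pair] / attempts[pair] for pair in attempts}
```

The resulting dict maps `(tool_a, tool_b)` edges to empirical success rates, which is all Layer 3 needs to rank complementary tools.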
The point: Agents shouldn't just remember individual solutions. They should learn patterns and workflows.
The Execution Flow
1. User submits task
2. Retrieve relevant tools based on the task embedding (Layer 2)
   - → Returns: [tool_a, tool_b, tool_c]
3. Consult the knowledge graph for tool relationships (Layer 3)
   - → Identifies: "When tool_a was used successfully, tool_d and tool_e were often needed"
   - → Returns: [tool_d, tool_e], complementary tools from past successful trajectories
4. Agent executes with the curated toolset
   - → Available tools: [tool_a, tool_b, tool_c, tool_d, tool_e]
5. Store the complete trajectory in Layer 1
   - → Records which tools were actually used and in what order
6. Update Layer 3 based on success/failure
   - → Strengthens edges between tools that worked well together
   - → Weakens or removes edges for failed combinations
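The whole flow above can be sketched as one orchestration function. The layer callbacks, the `execute` function, and the ledger format are hypothetical stand-ins:

```python
def run_task(task, layer2_retrieve, layer3_suggest, execute, ledger):
    """Curate a toolset from Layers 2 and 3, execute, then close the loop."""
    retrieved = layer2_retrieve(task)          # semantically relevant tools
    complements = layer3_suggest(retrieved)    # historically co-used tools
    toolset = list(dict.fromkeys(retrieved + complements))  # dedupe, keep order
    tools_used, succeeded = execute(task, toolset)
    # Layer 1: record what actually happened for future graph updates
    ledger.append({"task": task, "tools_used": tools_used,
                   "succeeded": succeeded})
    return toolset, succeeded
```

Each call leaves behind exactly the trajectory record that the Layer 3 update step consumes.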
The result: an agent that gets better with every problem it solves, just like we do, but faster and without forgetting.
Handling the Cold Start Problem
Layer 3 faces a classic bootstrapping challenge. It needs trajectories to learn patterns, but agents need patterns to select optimal tools. Here's how to address this:
1. Pre-seed with Expert Knowledge
Start with manually curated tool relationships based on documentation and common workflows:
```python
# Pre-populate the graph with known tool relationships
expert_patterns = [
    ("web_search", "web_fetch", {"weight": 0.9, "source": "expert"}),
    ("gdrive_get", "salesforce_update", {"weight": 0.8, "source": "expert"}),
    ("database_query", "data_analysis", {"weight": 0.85, "source": "expert"}),
]

for source, target, metadata in expert_patterns:
    graph.add_edge(source, target, **metadata)
```
This pre-seeding lets the agent skip the cold start phase entirely, beginning in "warm start" mode with baseline patterns that improve over time.
2. System Maturity Phases
The system adapts based on how much it has learned:
- Warm Start (Initial Phase):
  - Layer 3 contains pre-seeded expert patterns
  - Layer 2 remains primary; Layer 3 provides supplementary hints
  - Example: "User needs web_search → Layer 2 returns [web_search, api_call], Layer 3 weakly suggests [web_fetch] based on expert knowledge"
- Hot Start (After Learning):
  - Layer 3 has rich, validated patterns from real trajectories
  - Layer 3 provides strong suggestions based on proven workflows
  - Example: "User needs web_search → Layer 2 returns [web_search, api_call], Layer 3 strongly recommends [web_fetch, content_parser] based on 47 successful patterns"
Key Point: Pre-seeded patterns serve as a starting baseline. As the agent executes tasks, real trajectories either validate and strengthen these patterns or reveal better alternatives.
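One way to implement this maturity-aware weighting is to discount an edge until enough real observations back it. The 0.5 floor and the 20-observation threshold below are illustrative choices, not part of any standard:

```python
def suggestion_strength(weight, observations, min_observations=20):
    """Scale a Layer 3 edge weight by how much real evidence backs it.

    Fresh expert-seeded edges (0 observations) act as weak hints;
    edges confirmed by real trajectories approach their full weight.
    """
    evidence = min(observations / min_observations, 1.0)
    return weight * (0.5 + 0.5 * evidence)
```

A pre-seeded 0.9 edge thus starts as a 0.45 hint in warm start and reaches its full 0.9 strength once it has been validated enough times.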
The Efficiency Gains
1. Avoiding Tool Definitions Overload
Traditional agents load every available tool into their context. All 50+ tool descriptions, schemas, and examples. This burns through tokens before the agent even starts thinking.
Layer 2 changes this. Instead of "here are all your tools" the system retrieves only the 3-5 relevant tools for the specific task.
The result: smaller context windows, faster processing, lower costs, and agents that can scale to hundreds of tools without drowning in their own toolbox.
2. Reducing Intermediate Tool Result Token Consumption
When Layer 3 knows "Tool B needs Tool A's output," the agent can write code to pipe data directly between tools without the LLM processing it twice.
Traditional approach: Consider a task like "Download my meeting transcript from Google Drive and attach it to the Salesforce lead." The model makes calls like:

```
TOOL CALL: gdrive.getDocument(documentId: "abc123")
→ returns "Discussed Q4 goals...\n[full transcript text]"
  (loaded into model context)

TOOL CALL: salesforce.updateRecord(
  objectType: "SalesMeeting",
  recordId: "00Q5f000001abcXYZ",
  data: { "Notes": "Discussed Q4 goals...\n[full transcript text]" }
)
  (model needs to write the entire transcript into context again)
```
Intermediate tool results flow through the model twice: once when reading, once when writing to the next tool.
Code execution approach: Agent writes code that passes Tool A's output directly to Tool B in the execution environment. The LLM never sees the intermediate data, only the final result.
This can reduce token usage drastically for workflows involving large documents or datasets.
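A sketch of the kind of code the agent might generate and run in its execution environment. The `gdrive` and `salesforce` client objects and their method names here are hypothetical, not a real SDK:

```python
# Runs inside the sandbox: the transcript stays in this environment and
# is piped straight between the two (hypothetical) clients. Only the
# short summary string is returned to the model.
def attach_transcript(gdrive, salesforce, document_id, record_id):
    transcript = gdrive.get_document(document_id)   # potentially huge payload
    salesforce.update_record(
        object_type="SalesMeeting",
        record_id=record_id,
        data={"Notes": transcript},                 # piped directly, no LLM hop
    )
    return f"Attached transcript ({len(transcript)} chars) to {record_id}"
```

The model's context holds only the one-line summary, not two copies of the transcript.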
3. Continuous Learning
Every successful or failed trajectory refines the graph:
```python
if trajectory.is_successful():
    strengthen_edges(trajectory.tool_sequence)
else:
    weaken_edges(trajectory.tool_sequence)
```
Pre-seeded expert patterns gradually evolve into data-driven patterns based on actual performance. If an expert-defined relationship doesn't work well in practice, the system learns to deprioritise it.
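A minimal way to implement strengthening and weakening is an exponential moving average per edge, nudging weights toward 1.0 on success and 0.0 on failure. The learning rate and the neutral prior for unseen pairs are illustrative:

```python
def update_edge(graph, a, b, succeeded, lr=0.1):
    """Move an edge weight toward 1.0 on success, toward 0.0 on failure.
    `graph` is a dict mapping (tool_a, tool_b) pairs to weights."""
    w = graph.get((a, b), 0.5)          # unseen pairs start neutral
    target = 1.0 if succeeded else 0.0
    graph[(a, b)] = w + lr * (target - w)
    return graph[(a, b)]
```

Because every update decays the old weight, an expert-seeded edge that keeps failing drifts toward zero regardless of its initial value.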
Summary
We solve problems by trying known solutions first, then researching when we hit something new. We document our discoveries and build mental models of what works together. AI agents need the same learning loop.
This three-layer memory architecture transforms agents from stateless tools into learning systems:
- Layer 1 remembers what happened
- Layer 2 finds relevant tools efficiently
- Layer 3 learns which tools complement each other
Start with expert-curated tool relationships to overcome the cold start problem, then let the agent learn from real trajectories. Each successful (or failed) workflow strengthens the graph's understanding of tool relationships.
The payoff: Agents that handle hundreds of tools without drowning in context, reduce token usage in complex workflows, and continuously improve with every problem they solve, just like we do.