AI Needs More Than Just a Model: The Missing Role of Context, Orchestration, and Agents

Solving the AI Puzzle: Why External Context, Orchestration, and Agents Are Key to Intelligence


Disclaimer:

This blog is intended as a beginner-friendly guide to AI, focusing specifically on why Large Language Models (LLMs) need to interact with agents to solve AI queries.

It does not cover advanced topics such as training, fine-tuning, or building LLMs from scratch because, at the time of writing, I am not deeply familiar with those areas. My experience instead comes from building an in-house orchestration system, which serves as the central component facilitating communication between the different AI components needed to solve a query. This system routes requests from input agents to classifiers and on to specific agents and tools, ultimately generating a response.

The goal is to introduce key concepts and terminology, the importance of agents, and the need for orchestration.

Target Audience:

  • Beginners in AI who are exploring concepts like LLMs, agents, and orchestration.
  • Developers from non-AI domains and enthusiasts looking to understand how AI models interact with real-world systems.
  • Product Managers seeking high-level insights without deep technical details.

Analogy

If we think of the human brain as an AI model (e.g., an LLM), then the hands, vocal cords, eyes, etc. would be its agents.

In humans, the brain processes sensory inputs and issues commands to physical agents (e.g., muscles). Here's a simplified analogy:

  1. Your retina captures visual data of a ball coming toward you. The retina acts as an input agent, sending this data to the brain for processing.
  2. Your brain analyzes the visual input and decides that you need to catch the ball. The brain functions as the LLM in this analogy.
  3. The brain sends a signal to your hands to catch the ball. Your hands serve as the output agent, executing the action.

If the brain (LLM) is trained properly, it instructs your hands to act correctly. However, if the training is insufficient, it might make an incorrect decision, such as instructing your eyes to close instead of catching the ball, leaving you vulnerable to an accident.

AI (LLM) can process input, reason with it, and generate outputs, much like the brain processes sensory data and makes decisions. However, just as the brain requires physical agents like hands to interact with the environment, LLMs require agents to bridge the gap between reasoning and action.

Assume we use LLaMA 3.2 as the LLM. Our primary input and output type is text, which makes interaction straightforward. While this works well for text-based outputs like answering questions, it doesn't suffice for all use cases. For instance, text output is suitable when we only want an answer, but if we need to perform an action (e.g., sending an email, opening a file), the LLM cannot handle this because its sole responsibility is to generate text-based outputs. It has no inherent capability to perform actions or interact with the physical or digital world.


Importance of Agents in AI

What Are Agents?

From our earlier analogy, we assumed that our hands and eyes are agents in the physical world. In the digital world, an agent is a software component that extends an LLM: it can read the environment on the LLM's behalf, and it can turn the LLM's response into actionable output.

Can LLMs Directly Call Agents?

At present, LLMs cannot directly perform actions. They can:

  1. Understand the input.
  2. Provide reasoning and structured outputs.

However, an LLM can, at best, instruct what should happen; it cannot interact with the environment directly to perform those actions. For instance:

  • An LLM can generate an HTML response but cannot open a browser to display it.
  • It can suggest sending an email but cannot send it.

To bridge this gap, we need output agents. These agents read the LLM's output and perform the necessary actions.

Example: Output Agent

If the LLM generates an HTML response, an output agent can:

  • Open a web browser or embed a browser within an application.
  • Render the HTML content for the user to view.
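
Here is a minimal sketch of such an output agent in Python. It assumes the LLM's reply arrives as a plain string and uses the standard library's webbrowser module to render it; the HTML check and file handling are just one possible implementation, not a prescribed one.

import tempfile
import webbrowser


def render_html_output(llm_response: str) -> None:
    """Output agent: if the LLM response looks like HTML, render it in a browser."""
    if "<html" not in llm_response.lower():
        # Not HTML; fall back to a plain-text action such as printing.
        print(llm_response)
        return

    # Write the HTML to a temporary file and open it with the default browser.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".html", delete=False, encoding="utf-8"
    ) as f:
        f.write(llm_response)
        path = f.name

    webbrowser.open(f"file://{path}")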

How Agents Extend the Capabilities of LLMs

Agents serve as intermediaries that interact with the LLM to sense the environment or execute actions.

To make the LLM's outputs actionable, we need to augment the system with agents that can process inputs before they reach the LLM and/or process the LLM's output and turn it into actions.

Let's say we are going to create an agent-based system that leverages an LLM to act as a humorous AI assistant.

In this setup, we define two key agents: the Input Agent and the Output Agent.

Flow of Request Handling

  1. User submits a query.
  2. The Input Agent receives the query, prepends a system prompt, and forwards it to the LLM.
  3. The LLM processes the prompt, understands the context, and generates a response accordingly.
  4. The Output Agent, which understands the LLM's response format, parses it and executes the necessary actions - such as rendering the output, speaking it aloud, or performing other predefined tasks.

User → Input Agent → LLM → Output Agent
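
A minimal sketch of this flow in Python, assuming a placeholder call_llm function (standing in for whatever LLM you actually host, e.g., LLaMA 3.2) and using the system prompt from the example below; the output agent is passed in as a callable, such as the HTML renderer sketched earlier.

from typing import Callable

SYSTEM_PROMPT = (
    "You are a funny AI. Your job is to entertain users. "
    "Make sure your responses are in HTML format with inbuilt styles."
)


def call_llm(prompt: str) -> str:
    """Placeholder for the actual LLM call (e.g., a locally hosted LLaMA 3.2)."""
    raise NotImplementedError("Wire this up to your LLM of choice.")


def input_agent(user_query: str) -> str:
    """Input Agent: prepend the system prompt to the user query."""
    return f"{SYSTEM_PROMPT}\n\nUser query: {user_query}"


def handle_request(user_query: str, output_agent: Callable[[str], None]) -> None:
    """User -> Input Agent -> LLM -> Output Agent."""
    prompt = input_agent(user_query)
    llm_response = call_llm(prompt)
    output_agent(llm_response)  # e.g., render_html_output from the earlier sketch

# Usage: handle_request("Tell me a joke.", output_agent=render_html_output)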

Example:

Let's consider an example where we instruct the LLM to entertain the user with a joke and format the response in HTML:

System Prompt: "You are a funny AI. Your job is to entertain users. Make sure your responses are in HTML format with inbuilt styles."

Query: "Tell me a joke."

Input to LLM = "System Prompt + Query"

The user submits the "Tell me a joke" query, and the Input Agent builds the prompt for the LLM by prepending the system prompt to the user query.

Expected Output from LLM:

<html lang="en">
  <body style="font-family: Arial; color: blue;">
    <p>Why don't scientists trust atoms? Because they make up everything!</p>
  </body>
</html>

The LLM successfully generates the output, but displaying it as a webpage or further acting on this HTML requires an output agent.

The output agent we created recognizes the HTML format in the response and renders the output in a web browser.

What we discussed so far is a simple agent system where each agent is hardwired to interact with the LLM in a predefined way. While this setup works for limited use cases, it becomes impractical as we introduce more agents and tasks.

For example:

  • What if we want to add a new agent that performs speech synthesis?
  • What if multiple agents need to work together dynamically, rather than following a fixed sequence?
  • What if we need to support multiple LLMs based on different tasks?
  • How do we manage interdependencies between agents efficiently?

A manually hardcoded agent system is rigid and difficult to scale. Each new agent requires manual integration, and there's no flexibility in execution. To build truly scalable AI applications, we need a dynamic system that can decide which agents to use, route tasks efficiently, and adapt to new capabilities, all without modifying the entire workflow.

This is where Orchestration comes in, acting as the brain of the system, intelligently managing agents, LLM interactions, and task execution.


The Need for an Orchestrator

Agents play a crucial role in AI systems by extending the capabilities of LLMs. However, since LLMs cannot directly call agents, we need a central component, an Orchestrator, that connects the LLM with the required context and agents to complete a task.

List of Components in the Orchestrator System

Assume we are going to create an orchestrator system with the following components:

  1. Input Client: Sends user queries to the Orchestrator.
  2. Agents: Responsible for different tasks, categorized as:
    • Classifier Agent: Determines the type of request and decides which agent should handle it.
    • Intermediate Agent: Performs predefined tasks.
  3. LLM: Processes input queries and helps in decision-making.

The Orchestrator itself runs as a daemon process, exposing APIs that allow input clients, agents, and the LLM to register and interact within the ecosystem.

In this system, we assume:

  • Multiple input clients and agents can register.
  • Only one LLM can be registered at a time.

Registration Process

  • Input Clients register with a unique ID so that the Orchestrator knows where to send responses once the request is fulfilled.
  • LLM is also registered, so agents and input clients can access it via the Orchestrator when needed.
  • Agents register with:
    • A unique ID
    • A description of the agent role
    • An input schema specifying what kind of input the agent expects in order to perform its predefined task
    • An output schema defining what the agent will return after processing

This metadata allows the LLM to determine whether the system has the necessary capability to process a given request.
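
As a rough illustration, registration might look like the sketch below. The AgentRegistration and Orchestrator names and the exact fields are assumptions made for this post, not the API of any particular framework; the example entry uses the Camera Agent described in the next section.

from dataclasses import dataclass, field


@dataclass
class AgentRegistration:
    agent_id: str          # unique ID
    description: str       # human-readable role, later shown to the LLM
    input_schema: dict     # what the agent expects as input
    output_schema: dict    # what the agent returns after processing


@dataclass
class Orchestrator:
    agents: dict = field(default_factory=dict)

    def register_agent(self, reg: AgentRegistration) -> None:
        self.agents[reg.agent_id] = reg

    def agent_descriptions(self) -> list:
        """Metadata handed to the LLM so it can judge the system's capabilities."""
        return [
            {
                "agentName": reg.agent_id,
                "description": reg.description,
                "inputSchema": reg.input_schema,
                "outputSchema": reg.output_schema,
            }
            for reg in self.agents.values()
        ]


orchestrator = Orchestrator()
orchestrator.register_agent(
    AgentRegistration(
        agent_id="cameraAgent",
        description="I'm a Camera Agent. I can take snapshots from the dashboard or interior camera.",
        input_schema={"takePicFromDashboard": "bool", "takePicFromInterior": "bool"},
        output_schema={"dashboardPicPath": "string|null", "interiorPicPath": "string|null"},
    )
)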


Example Query Flow

User Query:
"Take a snapshot from the dashboard camera, merge it with a selfie from the interior camera, and send it to Sridhar's email."

Agents & Their Responsibilities

[1] Camera Agent

Role: Captures snapshots from either the dashboard or interior camera.

Description: "I'm a Camera Agent. I can take snapshots from the dashboard or interior camera."

Input Schema:

{
  "takePicFromDashboard": bool,
  "takePicFromInterior": bool
}

Output Schema:

{
  "dashboardPicPath": "string|null",
  "interiorPicPath": "string|null"
}

[2] Photo Editor Agent

Role: Creates a collage from multiple photos.

Description: "I'm a Photo Editor Agent. I can create a collage from multiple images."

Input Schema:

{
  "picPaths": ["string"]
}

Output Schema:

{
  "collagePicPath": "string|null"
}

[3] Email Agent

Role: Sends an email with multimedia attachments.

Description: "I'm an Email Agent. I can query user contacts, accept multimedia content, and send an email."

Input Schema:

{
  "emailSender": "string",
  "content": "string",
  "mimeType": "string"
}

Output Schema:

{
  "isEmailSent": bool
}

How the Orchestrator Uses LLM to Execute the Plan

When the user submits a query, the Orchestrator sends it to the LLM along with the list of registered agent descriptions.

Example LLM Prompt to Generate an Execution Plan

I want you to act as a planner. Given a user query, devise a plan to execute it.

You can find the list of attached agents to understand the system's capabilities.

Agent descriptions:
[  
  /* All agent descriptions will be added here */  
]

Input Query: "/* Replace user query here */"

Output Format:
[
  {
    "agentName": "string",
    "agentParam": { /* The input schema expected by the agent */ }
  },
  {
    "agentName": "string",
    "agentParam": { /* The input schema expected by the agent */ }
  }
]

This allows the LLM to analyze the system's capabilities and generate an execution plan.
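
For the example query above, the plan the LLM returns might look roughly like the following. This is purely illustrative: the agent names follow the registration sketch earlier, the emailRecipient and mimeType values are guesses, and how intermediate outputs (such as the snapshot paths) get substituted into later steps is a detail the Orchestrator has to handle, indicated here with angle-bracket placeholders.

[
  {
    "agentName": "cameraAgent",
    "agentParam": { "takePicFromDashboard": true, "takePicFromInterior": true }
  },
  {
    "agentName": "photoEditorAgent",
    "agentParam": { "picPaths": ["<dashboardPicPath>", "<interiorPicPath>"] }
  },
  {
    "agentName": "emailAgent",
    "agentParam": {
      "emailRecipient": "Sridhar",
      "content": "<collagePicPath>",
      "mimeType": "image/jpeg"
    }
  }
]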

Since the Orchestrator understands the LLM's structured response, it carefully executes the agents in the correct order to fulfill the request.

Final Execution Flow

Input Client → Orchestrator → LLM → Camera Agent → Photo Editor Agent → Email Agent → (Optional: LLM Summarizes Response)
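
A minimal sketch of the execution step, assuming the Orchestrator registry from the earlier registration sketch and a hypothetical run_agent hook that actually invokes an agent (for example, over the Orchestrator's API):

import json


def run_agent(agent, params: dict) -> dict:
    """Hypothetical hook: invoke the agent with `params` (e.g., via the
    Orchestrator's API) and return data matching the agent's output schema."""
    raise NotImplementedError


def execute_plan(orchestrator, plan_json: str) -> list:
    """Run each step of the LLM-generated plan in order and collect the outputs."""
    plan = json.loads(plan_json)
    results = []
    for step in plan:
        agent = orchestrator.agents.get(step["agentName"])
        if agent is None:
            raise ValueError(f"No agent registered for {step['agentName']!r}")
        results.append(run_agent(agent, step["agentParam"]))
    return results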

As AI systems evolve, orchestrators will become even more essential in bridging the gap between reasoning and action. But will AI systems ever replace human roles entirely?


Will AI Replace Humans?

I see two major work streams emerging in AI:

  1. Model Development: Engineers and researchers focused on building more efficient and powerful AI models.
  2. Agent & Orchestration Systems: People working on AI agents, orchestration, and external context integration, which often requires domain-specific knowledge.

Let's say we would like to build a video-logging system mounted on a car dashboard. An AI agent in that system might be responsible for:

  • Periodically capturing photo feeds.
  • Geo-tagging them with location data.
  • Converting the data into embeddings for an LLM to process.

In this case, the agent must understand the domain, including how to interact with the camera, read GPS data, and structure the output for further AI-driven analysis. This is where human expertise in software, hardware, and system integration remains essential.
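
To give a feel for how domain-specific such an agent gets, here is a rough sketch; capture_frame, read_gps, and embed_image are hypothetical stand-ins for the camera, GPS, and embedding integrations a real implementation would need.

import time


def capture_frame() -> bytes:
    """Hypothetical: grab a snapshot from the dashboard camera."""
    raise NotImplementedError


def read_gps() -> tuple:
    """Hypothetical: return the current (latitude, longitude)."""
    raise NotImplementedError


def embed_image(image: bytes) -> list:
    """Hypothetical: convert the image into an embedding for the LLM pipeline."""
    raise NotImplementedError


def video_log_loop(interval_seconds: int = 60) -> None:
    """Periodically capture a frame, geo-tag it, and store its embedding."""
    while True:
        frame = capture_frame()
        latitude, longitude = read_gps()
        record = {
            "timestamp": time.time(),
            "location": (latitude, longitude),
            "embedding": embed_image(frame),
        }
        # A real system would persist this record for later retrieval and analysis.
        print(record)
        time.sleep(interval_seconds)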

AI's Role in the Workforce

AI is already transforming the workforce by automating mundane and repetitive tasks. However, AI is not yet at a stage where it can function independently. For at least the next 5–10 years, human involvement will be crucial in guiding AI systems, making them more useful, adaptable, and safe.

For instance, while AI can generate code, we are still far from an AI-driven system that can write, deploy, and manage software autonomously, such as using AWS APIs to deploy a new system without any human intervention. Until AI can reliably handle real-world complexity, human oversight will be crucial.


Summary

To ensure scalability, an Orchestrator system is essential. It seamlessly integrates various agents, classifiers, and LLMs, efficiently managing request routing, coordinating multiple agents, and utilizing the LLM at different stages of the workflow to accomplish a desired task.

While there are open-source agent orchestration systems available, I haven't explored them yet. For our specific use case, we built a custom Orchestrator tailored to our requirements. Once I've had the opportunity to evaluate other solutions, I'll be able to provide more insights into the existing open-source options.


Annexure

PS: I used ChatGPT and QuillBot's grammar check to refine my writing.
https://quillbot.com/grammar-check
