DEV Community: Nagashree Bhat

Designing a Multi-Agent AI System for Content Analysis and Recommendations

Nagashree Bhat — Mon, 18 May 2026 18:49:10 +0000

As AI systems evolve, a single model is often no longer enough.

One model may be good at rewriting content, another at analyzing tone, and another at evaluating quality or extracting insights. Very quickly, what starts as a simple LLM integration turns into a coordination problem.

This is where multi-agent systems become powerful.

Instead of relying on one model to do everything, we can design a system where multiple specialized agents collaborate to solve a larger task. Each agent has a focused responsibility, while an orchestration layer manages communication, context, and execution flow.

I recently worked on systems that moved in this direction — where AI was not just generating responses, but coordinating analysis, recommendations, and contextual reasoning across multiple components. This article draws from those ideas while keeping the architecture generic.

In this article, we’ll walk through how to design a scalable multi-agent AI system for content analysis and recommendations using AWS, while also exploring why orchestration and context management become the real engineering challenge at scale.

The Problem

Imagine a marketing or product team reviewing a webpage.

Instead of manually analyzing content, they want an AI system that can:

evaluate clarity and tone
compare messaging against competitors
suggest improvements
generate alternative versions
explain why one version may perform better

At first glance, this seems like a straightforward LLM problem. Just send everything to a large model and ask for recommendations.

But in practice, this approach quickly becomes difficult to maintain.

Prompts grow larger, costs increase, outputs become inconsistent, and responsibilities blur together. One massive prompt ends up trying to perform analysis, reasoning, generation, evaluation, and comparison all at once.

A better approach is to split responsibilities across specialized agents.

Why Multi-Agent Systems Matter

Multi-agent systems work well because they mirror how humans solve complex problems.

Instead of one person doing everything, specialists collaborate:

one analyzes
one researches
one critiques
one generates solutions

AI systems can follow the same pattern.

Rather than building one enormous prompt, we create smaller focused agents that coordinate through orchestration. Each agent is optimized for a narrower task, which improves maintainability, prompt quality, and scalability.

This shift is important because modern AI systems are increasingly becoming orchestration problems rather than pure model problems.

The challenge is no longer just generating text — it’s coordinating reasoning across multiple components while maintaining consistency and control.

High-Level Architecture

Instead of a simple request-response flow, the system behaves more like a coordinated network of specialized workers.

                 User Request
                        │
                        ▼
              API Gateway Layer
                        │
                        ▼
               Orchestrator Agent
        (Task Planning & Coordination)
                        │
                        ▼
                    MCP Layer
     (Structured Context + Shared Schema)
                        │
        ┌───────────────┼────────────────┐
        │               │                │
        ▼               ▼                ▼
 Content Agent   Competitor Agent   Tone Agent
        │               │                │
        └───────────────┼────────────────┘
                        ▼
              Recommendation Agent
                        │
                        ▼
                 LLM Inference Layer
              (Bedrock / External APIs)
                        │
                        ▼
                   Final Response

At a high level, the flow starts with a user request entering through Amazon API Gateway. The request is then passed to an orchestration layer, which determines which agents should execute and what context they require.

Before requests reach downstream agents, MCP standardizes the structure of context, metadata, and task instructions. This ensures that all agents operate using a consistent interface rather than exchanging loosely structured prompts.

Each agent performs a specialized task and returns structured outputs that are later combined into a final recommendation.

What matters most here is not the individual model call — it’s the coordination between components.

The Role of the Orchestrator

The orchestrator is effectively the brain of the system.

Instead of directly generating responses, it decides:

which agents should execute
how tasks should be sequenced
what context each agent needs
how outputs should be combined

This represents one of the biggest architectural shifts in modern AI systems.

In simpler applications, the backend directly calls the model.

In multi-agent systems, the backend coordinates reasoning across multiple specialized workflows.

The orchestrator becomes less of a request handler and more of a lightweight decision engine.

In AWS-based systems, this orchestration layer can be implemented using AWS Lambda for event-driven workloads or containerized services for more complex orchestration requirements.

Specialized Agents

The strength of the system comes from specialization.

For example:

a Content Agent may evaluate clarity and structure
a Tone Agent may determine whether messaging matches the intended audience
a Competitor Agent may compare positioning against external content
a Recommendation Agent may synthesize all outputs into actionable suggestions

Because each agent focuses on a narrower task, prompts remain smaller, easier to optimize, and more consistent.

This also creates flexibility. Teams can independently improve or replace individual agents without redesigning the entire system.

For example, a Competitor Agent may retrieve publicly available messaging and identify differences in positioning, pricing language, or value propositions.

If a product page says:

“Simple pricing for growing businesses”

the Competitor Agent may retrieve competing messaging such as:

“Transparent pricing with no hidden fees”
“Built for small teams scaling quickly”

The agent can then identify differences in positioning, clarity, and emphasis before passing insights to the Recommendation Agent.

A Recommendation Agent can then combine outputs from multiple agents and generate actionable suggestions such as:

simplifying technical language
improving audience alignment
strengthening differentiation
increasing clarity around pricing or value

This layered approach allows recommendations to feel more contextual and explainable rather than purely generative.

Why MCP Becomes Critical

As soon as multiple agents are introduced, context management becomes significantly harder.

Different agents may:

require different inputs
produce different output structures
depend on shared metadata
need awareness of previous reasoning steps

Without structure, orchestration quickly becomes chaotic.

This is where MCP becomes essential.

MCP, or Model Context Protocol, is an open protocol introduced to standardize how context and structured interactions flow between AI systems, tools, and models.

In multi-agent architectures, MCP acts as a structured interface between the orchestration layer and downstream agents. Instead of allowing every component to exchange arbitrary prompts and responses, MCP standardizes how context flows through the system.

It defines:

structured inputs
shared metadata
response schemas
task instructions
contextual constraints

This creates a clean separation between orchestration logic and model interaction.

More importantly, it transforms prompt engineering from scattered application logic into a manageable architectural layer.

A Simple MCP Example

To make this more concrete, imagine a user selects the following text:

“Our pricing plans work for businesses of all sizes.”

The orchestration layer may construct an MCP payload like this before routing it to downstream agents:

{
  "task": "content_optimization",
  "context": {
    "audience": "small business owners",
    "tone": "confident",
    "goal": "increase engagement"
  },
  "input": {
    "selected_text": "Our pricing plans work for businesses of all sizes."
  },
  "agents": [
    "tone_agent",
    "recommendation_agent"
  ]
}

Instead of passing loosely structured prompts between components, MCP standardizes how context, metadata, and instructions move through the system.

For example:

the Tone Agent may evaluate whether the messaging aligns with the target audience
the Recommendation Agent may generate alternative versions optimized for clarity and engagement

As systems grow, this structured approach becomes increasingly important for maintainability and consistency.

A Simple Orchestrator Flow

Once the MCP payload is created, the orchestrator determines which agents should execute based on the task type and context.

A simplified orchestration flow may look like this:

def orchestrate_request(mcp_payload):
    task = mcp_payload["task"]
    agents = mcp_payload["agents"]

    results = {}

    # In production systems, independent agents
    # may execute in parallel to reduce latency.

    if "tone_agent" in agents:
        results["tone"] = run_tone_agent(mcp_payload)

    if "competitor_agent" in agents:
        results["competitor"] = run_competitor_agent(mcp_payload)

    if "recommendation_agent" in agents:
        results["recommendation"] = run_recommendation_agent(
            mcp_payload,
            previous_results=results
        )

    return results

In production systems, orchestration becomes significantly more complex:

some agents execute in parallel
others depend on prior outputs
retries and fallbacks must be managed carefully

But even in simplified form, the key idea remains the same:
the orchestrator coordinates reasoning across specialized agents rather than relying on a single monolithic prompt.

Model Inference Layer

The actual model calls can be handled through Amazon Bedrock.

A simplified inference call may look like this:

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=prompt
)

One advantage of this architecture is model flexibility.

Not every agent needs the same model:

lightweight analysis agents may use smaller, faster models
reasoning-heavy agents may use larger foundation models

This improves both performance and cost efficiency.

In production systems, choosing the right model for the right task is often more important than simply using the largest available model everywhere.

Managing Cost and Latency

Multi-agent systems introduce a new challenge: orchestration overhead.

More agents mean:

more prompts
more model calls
higher latency
increased operational cost

This means orchestration design matters just as much as model quality.

Several practical strategies help manage this:

executing independent agents in parallel
caching repeated outputs
routing lightweight tasks to smaller models
selectively invoking agents only when necessary

One important lesson from production systems is that unnecessary orchestration can quickly become expensive.

Good orchestration is often about deciding when not to invoke an agent.

Reliability and Failure Handling

Distributed AI systems must assume partial failure.

An individual agent may:

timeout
fail
return inconsistent output

The overall system should still remain functional.

This means agents should fail independently, and orchestration should support graceful degradation.

For example, if a competitor analysis agent becomes unavailable, the recommendation system should still be capable of generating useful suggestions using internal analysis alone.

The goal is resilience, not perfection.

Observability in Multi-Agent Systems

Observability becomes significantly more important once multiple agents are introduced.

In traditional systems, monitoring is often focused on infrastructure health, API latency, and request throughput. Multi-agent systems introduce an additional layer of complexity because reasoning itself becomes distributed across multiple components.

Teams now need visibility into:

which agents executed
token usage per agent
orchestration paths
model failures
response quality trends

Without strong observability, debugging becomes difficult because failures may not come from infrastructure issues alone — they may emerge from orchestration flow, context inconsistencies, or low-quality intermediate outputs generated by downstream agents.

As systems scale, observability becomes just as important as the models themselves.

Final Thoughts

Multi-agent systems represent a major shift in how production AI applications are designed.

The complexity no longer comes primarily from the model itself.

It comes from:

orchestration
coordination
context management
maintaining consistency across agents and workflows

That is why abstractions like MCP matter so much.

They provide the architectural foundation needed to keep AI systems maintainable as workflows, agents, and models continue to evolve.

In many ways, MCP is what transforms AI integrations from experimental prototypes into scalable production systems.

Designing an AI-powered content optimization system using LLMs on AWS

Nagashree Bhat — Wed, 06 May 2026 05:21:20 +0000

Modern applications are no longer just about functionality — they are expected to be intelligent, adaptive, and personalized.

Whether its rewriting a headline, improving product descriptions, or suggesting better UI copy, users increasingly expect systems to assist them in thinking, not just execute tasks.

I recently built a system like this — a GenAI-powered content optimization service for marketing teams. This article draws from that experience while keeping the design generic and broadly applicable.

In this article, we’ll walk through how to design a scalable system that uses large language models(LLMs) to generate high-quality text improvements in real time. More importantly, we’ll focus not just on the model, but on the architecture decisions, tradeoffs, and production challenges that make such a system reliable at scale

The Problem

Imagine a user interacting with a product where they can select a piece of text — a headline, a paragraph, or a short description — and ask the system to improve it.

The system should respond within seconds, offering multiple variations tailored to tone, clarity, or audience. Behind the scenes, this means handling a large number of requests, constructing meaningful prompts, calling an LLM, and returning structured outputs — all while keeping latency low and costs under control.

At small scale, this might seem straightforward. But as usage grows, challenges around consistency, orchestration, and performance start to emerge.

High Level Architecture

At a high level, the system can be viewed as a pipeline with a few key stages: receiving the request, constructing the prompt, generating responses using an LLM, and post-processing the output before returning it to the user.

Instead of a simple request-response system, I model this as a context-driven pipeline where MCP acts as a first-class abstraction between orchestration and model inference.

Keeping these stages loosely coupled is essential for scaling and evolving the system over time.

How the system works

When a user submits a request, it first enters through Amazon API Gateway, which acts as the front door to the system. It handles routing, authentication, and rate limiting, ensuring that incoming traffic is controlled and secure.

From there, the request moves into the orchestration layer, typically powered by AWS Lambda. This is where the system interprets the input, applied business rules, and prepares the prompt for the language model.

Rather than embedding all prompt logic directly inside application code, introducing a clean abstraction for managing context becomes critical as the system grows.

Why a Model Context Protocol(MCP) matters

As AI systems evolve, one of the hardest problems is not calling the model — it’s managing context in a consistent and scalable way.

Prompts are no longer static strings. They are dynamic, structured, and influenced by user input, metadata, and system constraints. Without a clear abstraction, the logic quickly becomes fragmented across the codebase.

A Model Context Protocol(MCP) addresses this by acting as a structured interface between the orchestration layer and the model.

Instead of tightly coupling prompt construction with application logic, MCP standardizes how inputs are built, how context is passed, and how outputs are structured. In practice, the orchestration layer prepares the request, MCP transforms it into a consistent format, and the model consumes it in a predictable way.

This separation significantly improves maintainability. It allows teams to swap models without rewriting business logic, ensures consistent outputs across use cases, and creates a foundation for scaling into more advanced patterns like multi-agent systems.

Most importantly, it turns prompt engineering from scattered logic into a first-class, manageable layer in the architecture.

Model inference and response generation

Once the prompt is constructed, it is sent to the model layer. In a managed AWS setup, this can be handled by Amazon Bedrock, which provides access to multiple foundation model without requiring infrastructure management.

The model generates variations of the input text, which are then passed back to the orchestration layer.

Before returning results to the user, the system performs post-processing. This step ensures that outputs are safe, relevant, and consistently formatted. It also provides an opportunity to enforce constraints and improve overall quality.

To support debugging and continuous improvement, requests and responses can be stored in Amazon DynamoDB. This enables teams to analyze outputs, refine prompts, and track performance over time.

Tradeoffs that shape the system

Designing AI systems is fundamentally about making tradeoffs.

A single-step generation approach is fast and simple, but a multi-step pipeline can produce higher-quality results at the cost of increased latency and complexity.

Model selection introduces another tradeoff. Larger models generally produce better outputs but slower and more expensive, while smaller models offer faster responses with less nuance. The right choice depends on the user experience you want to deliver.

Cost becomes increasingly important at scale. Techniques like caching repeated prompt, limiting request rates, and optimizing prompt size help control expenses without sacrificing quality.

There is also a balance between flexibility and control. More flexible prompts allow for creative outputs but can lead to inconsistency, while structured prompts improve predictability at the expense of variation.

Scaling and Reliability

As the system grows, it must handle increasing traffic without compromising performance.

Serverless components like Lambda scale naturally with demand, making them well-suited for event-driven workloads. At the same time, reliability must be built into every layer.

Caching helps reduce redundant model calls. Parallelizing requests enables the system to generate multiple variations efficiently. Fallback mechanisms ensure that even if the model fails, the system can still return a meaningful response.

Together, these strategies ensure that the system remains responsive and resilient under load.

Safety and Observability

AI systems require strong guardrails to operate safely in production.

Inputs must be validated, and outputs should be filtered to avoid unsafe or irrelevant responses. Prompt constraints further guide the model toward acceptable behavior.

Observability is equally important. Tracking metrics such as latency, error rates, token usage, and cost per request provides visibility into system performance and helps teams make informed improvements.

A practical insight

In real-world systems, the hardest challenges are rarely about the model itself.

They are about designing effective prompts, managing latency, controlling costs, and ensuring consistent outputs across a wide range of inputs.

The surrounding system — not just the model — determines whether the solution succeeds.

Final Thoughts

Building an AI-powered content optimization system is not just about integrating an LLM. It’s about designing a system that can reliably deliver value under real-world constraints.

By separating concerns, introducing structured abstractions like MCP, and carefully balancing tradeoffs, you can build systems that are both intelligent and production-ready.

Closing Insight

As AI systems scale, the complexity doesn’t come from the model — it comes from managing context, consistency, and coordination across the system.

That’s where MCP becomes a true differentiator.

It turns prompt engineering into an architectural layer, enables clean separation between logic and models, and creates a foundation for evolving simple LLM integrations into fully orchestrated, multi-agent systems.

In many ways, MCP is not just an implementation detail — it’s what makes modern AI systems maintainable at scale.