Building a useful AI assistant is no longer about a single clever prompt.
Once you have tools, memory, and multiple agents, you need an orchestrator.
In my own work (especially with OrKa-reasoning experiments) I eventually converged on two simple orchestration loops that cover most real use cases:
- A linear loop for step by step analysis and context extraction.
- A circular streaming loop for voice and live chat, where background agents enrich context in real time.
This guide explains why you need both, when to use each one, and how to design them in any stack or framework.
You can think of this as a blueprint that you can map to your own code, whether you use OrKa, LangChain, your own custom orchestrator, or plain queues and workers.
1. The three layers you should always separate
Before loops, define your layers. This makes every diagram, API and code path clearer.
1. Execution layer
- Agents and responders live here.
- "Agent" means any unit that does work: a model call, a tool, a heuristic function, a router.
- "Responder" is the agent that produces the final user facing output for a turn or a session.
2. Communication layer
- How agents talk to each other and to the orchestrator.
- Examples: queues, events, internal RPC calls, function callbacks.
- You rarely want agents to call each other directly. Route everything through this layer so you can trace and control it.
3. Memory layer
- Where you store and retrieve state across time.
- Can be a vector store, a key value store, a database, or a log.
- It should not be "hidden in the prompt". Treat memory as its own component.
4. Time as a first class dimension
Both loops treat time explicitly:
- In the linear loop you have discrete steps: T0, T1, T2, T3.
- In the circular loop you have a continuous stream while the conversation is active.
Once you have these pieces, you can design the two orchestration patterns.
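As a minimal sketch (all names here are illustrative, not tied to any framework), the three layers can stay separate with very little code: agents do work, all messages go through one traceable bus, and memory is an explicit component with read/write methods.

```python
from dataclasses import dataclass, field

# Execution layer: any unit of work exposes one method.
class Agent:
    def run(self, payload: dict) -> dict:
        raise NotImplementedError

# Communication layer: agents never call each other directly;
# they post here, so every message can be traced and controlled.
@dataclass
class Bus:
    log: list = field(default_factory=list)

    def send(self, sender: str, message: dict) -> None:
        self.log.append((sender, message))

# Memory layer: explicit reads and writes, not hidden prompt state.
@dataclass
class Memory:
    store: dict = field(default_factory=dict)

    def read(self, key: str, default=None):
        return self.store.get(key, default)

    def write(self, key: str, value) -> None:
        self.store[key] = value
```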
2. Loop 1: Linear orchestrator for context extraction and analysis

The first pattern is a linear pipeline. Think of it as a conveyor belt for understanding.
2.1 When to use the linear loop
Use it when:
- You have a fixed input (text, transcript, document, set of logs).
- You want to run several analytic passes over it.
- Latency matters, but responses do not need to be sub second interactive.
- Output is usually a summary, a report, a classification, or structured data.
Good examples:
- Conversation analysis after a call has ended.
- Extracting entities and topics from chat logs.
- Multi stage document processing (OCR, cleaning, classification, summarization).
- Offline quality checks for previous sessions.
2.2 Mental model
Picture a horizontal diagram:
- Left: an INPUT arrow.
- Right: a Responder that produces the final structured output.
- In between: time steps T0 to Tn.
- Each time slice has:
- one or more agents in the Execution layer
- a Communication band in the middle
- a Memory band at the top
At each step, agents:
- may retrieve from memory
- may store new facts or summaries back into memory
The orchestrator walks through these steps one by one.
2.3 Step by step design
You can design a linear workflow in five steps.
Step 1: Define the final output
Decide what the responder will produce. Some examples:
- JSON with fields like `intent`, `sentiment`, `entities`, `summary`.
- A human readable report that you will send to a dashboard.
- Labels and scores that feed another system.
Write this down early. Every other agent should exist to help this responder succeed.
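One lightweight way to pin the output down, assuming Python and a `TypedDict` (the field names are just examples): upstream agents then have a concrete contract to fill.

```python
from typing import TypedDict

class AnalysisOutput(TypedDict):
    """The record the responder must produce for each conversation."""
    intent: str
    sentiment: str
    entities: dict
    summary: str

def validate_output(record: dict) -> bool:
    """Check that a responder result carries every agreed field."""
    return set(AnalysisOutput.__annotations__) <= set(record)
```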
Step 2: Split the job into stages
Ask yourself:
- What must be known first so that later steps can reuse it?
- What can be done independently?
For example, for conversation analysis:
- Normalization and language detection.
- Entity extraction (names, account ids, products).
- Topic and intent detection.
- Sentiment and escalation risk.
- Final summary and suggestions.
Each stage becomes a time slice with one or more agents.
Step 3: Design the memory schema
For each stage, list:
- What the agent reads from memory.
- What the agent writes back.
A very simple schema might be:
```json
{
  "language": "en",
  "entities": {...},
  "topics": [...],
  "sentiment": {...},
  "summary": "..."
}
```
You can also scope memory by:
- `session_id`
- `user_id`
- `time_window` (for rolling analysis)
The key rule: agents should not depend on hidden context inside prompts. The orchestrator passes them a clean input and a structured slice of memory.
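One way to sketch scoped memory (the `ScopedMemory` name and API are assumptions, not any framework's): every entry is keyed by a scope such as a session id or user id, and reads can optionally enforce a time window.

```python
import time

class ScopedMemory:
    """Toy memory keyed by (scope, key); a scope might be a
    session_id, a user_id, or anything else you partition by."""

    def __init__(self):
        self._data = {}

    def write(self, scope, key, value):
        # Tag every write with a timestamp so time-window reads work.
        self._data[(scope, key)] = (time.time(), value)

    def read(self, scope, key, max_age=None):
        entry = self._data.get((scope, key))
        if entry is None:
            return None
        ts, value = entry
        if max_age is not None and time.time() - ts > max_age:
            return None  # outside the requested time window
        return value
```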
Step 4: Wire store and retrieve
For each agent, specify two small functions:
- `read(memory) -> context`
- `write(memory, result) -> memory`
In code it can look like this:
```python
for step in steps:
    # 1. Load what this step needs
    ctx = step.read(memory)
    # 2. Run the agent with input and context
    result = step.agent.run(raw_input, ctx)
    # 3. Write new facts
    memory = step.write(memory, result)
```
Note the "may" in "may store" and "may retrieve": some steps will only write, some will only read.
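This loop assumes each step bundles an agent with its read and write rules. One possible shape (the `Step` and `KeywordAgent` names are illustrative):

```python
from dataclasses import dataclass
from typing import Any, Callable

class KeywordAgent:
    """Stand-in for a model call; anything with .run(raw_input, ctx) works."""
    def __init__(self, fn):
        self._fn = fn

    def run(self, raw_input, ctx):
        return self._fn(raw_input, ctx)

@dataclass
class Step:
    name: str
    agent: Any                           # must expose .run(raw_input, ctx)
    read: Callable[[dict], dict]         # memory -> context slice
    write: Callable[[dict, dict], dict]  # (memory, result) -> new memory

def run_pipeline(steps, raw_input, memory):
    for step in steps:
        ctx = step.read(memory)
        result = step.agent.run(raw_input, ctx)
        memory = step.write(memory, result)
    return memory
```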
Step 5: Implement the responder as the last step
The responder is just another agent with a special role:
- It reads everything it needs from memory.
- It produces the final answer.
- It may log additional metadata back to memory.
In many stacks this is a single chat completion call that uses:
- The original input.
- The outputs of previous analytic agents.
- Any long term user or session memory you decide to attach.
2.4 Example: conversation analysis pipeline
Imagine you want to analyze support chats after they end.
You can define:
- `LanguageDetectorAgent`
  - Reads: raw transcript
  - Writes: `memory["language"]`
- `EntityExtractorAgent`
  - Reads: transcript, language
  - Writes: `memory["entities"]`
- `TopicClassifierAgent`
  - Reads: transcript, entities
  - Writes: `memory["topics"]`
- `SentimentAgent`
  - Reads: transcript
  - Writes: `memory["sentiment"]`
- `SummaryResponder`
  - Reads: transcript, entities, topics, sentiment
  - Writes: final human readable summary and a JSON record.
This maps perfectly to the linear diagram and is easy to debug step by step.
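As a toy end-to-end sketch of that pipeline (keyword heuristics stand in for real model calls, and each stage writes exactly one memory field):

```python
def language_detector(transcript, memory):
    memory["language"] = "en"  # stand-in for a real detector
    return memory

def entity_extractor(transcript, memory):
    # Crude heuristic: capitalized words as candidate entities.
    memory["entities"] = [w for w in transcript.split() if w.istitle()]
    return memory

def topic_classifier(transcript, memory):
    memory["topics"] = ["billing"] if "invoice" in transcript else ["general"]
    return memory

def sentiment_agent(transcript, memory):
    memory["sentiment"] = "negative" if "angry" in transcript else "neutral"
    return memory

def summary_responder(transcript, memory):
    # Reads everything earlier stages wrote, produces the final record.
    memory["summary"] = (
        f"{memory['language']} chat, topics={memory['topics']}, "
        f"sentiment={memory['sentiment']}"
    )
    return memory

PIPELINE = [language_detector, entity_extractor,
            topic_classifier, sentiment_agent, summary_responder]

def analyze(transcript):
    memory = {}
    for stage in PIPELINE:
        memory = stage(transcript, memory)
    return memory
```

Each stage is trivially debuggable on its own, which is the main payoff of the linear layout.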
3. Loop 2: Circular streaming orchestrator for live chat and voice

The second pattern appears once you move from offline analysis to live interaction.
With voice or interactive chat, you want to:
- React quickly while the user is still speaking or typing.
- Run several background analyses in parallel.
- Avoid sending the full transcript to every agent on every turn.
The circular loop pattern is built for that.
3.1 When to use the circular loop
Use it when:
- You stream audio or tokens in and out.
- You have a central "assistant" that talks to the user.
- You also want background agents that detect things like:
- sentiment shifts
- safety or compliance issues
- intent changes
- entities that should update a CRM
- interesting moments to bookmark
Think of a voice assistant, a real time meeting copilot, or a smart chatbot with live tools.
3.2 Mental model
Picture a circular diagram with concentric rings.
From center to outside:
- Responder in the middle.
- Main Execution ring around it.
- Communication ring.
- Memory ring.
- Agents Execution ring at the outside.
- An outer Time band that wraps around everything.
Input and output are green arrows that cross all rings. Time flows along the outer band as a stream of chunks or tokens.
Key idea:
- The responder loop processes the conversation in real time.
- Outer agents run in parallel, watch the same stream, and provide context through memory.
3.3 Step by step design
Step 1: Define the central responder loop
Your responder is the "voice" of the system.
Define:
- How it receives input chunks.
- How it produces output chunks.
- How often it reads from memory.
For example:
```python
while session_active:
    chunk = read_input_chunk()          # text or audio tokens
    context = memory.read_recent(...)   # signals from context agents
    reply_chunk = responder(chunk, context)
    write_output_chunk(reply_chunk)
```
You can implement responder as:
- One LLM call with a rolling window.
- A chain of small agents that produce tokens.
- A hybrid of LLM plus rule based logic.
The key is that this loop does not own all the work. It asks memory for extra signals that the outer agents have produced.
Step 2: Identify which signals can live in outer agents
Ask yourself:
- What information would help the responder, but does not need to be computed inside its main prompt every time?
Examples:
- Current sentiment and its trend over the last N seconds.
- Detected entities and slots like `{customer_name}`, `{product}`, `{order_id}`.
- Safety flags with severity scores.
- Topics that have been discussed so far.
- Next best actions suggested for the human operator.
Each of these can be produced by one or more context agents on the outer ring.
Step 3: Design the memory schema for streaming
Memory in streaming systems often has:
- A rolling part (last N seconds or tokens).
- A session part (facts that are true for the whole session).
- A global or user part (long term facts across sessions).
For example:
```json
{
  "rolling": {
    "recent_sentiment": [...],
    "recent_topics": [...]
  },
  "session": {
    "customer_name": "...",
    "current_ticket_id": "...",
    "has_accepted_terms": true
  },
  "user": {
    "lifetime_value_segment": "gold",
    "preferred_language": "en"
  }
}
```
Outer agents usually:
- Read the rolling slice plus some session context.
- Write updated signals back, possibly aggregating multiple chunks.
The responder:
- Reads what it needs from all three scopes.
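The rolling part can be sketched with a bounded deque, assuming you cap it by item count rather than wall-clock time (the `RollingMemory` name is an assumption):

```python
from collections import deque

class RollingMemory:
    """Keeps only the last N signals per key; session and user scopes
    would live in ordinary dicts alongside this."""

    def __init__(self, maxlen=20):
        self._maxlen = maxlen
        self._signals = {}

    def append(self, key, value):
        # deque(maxlen=...) silently drops the oldest item when full.
        self._signals.setdefault(key, deque(maxlen=self._maxlen)).append(value)

    def recent(self, key, n=5):
        return list(self._signals.get(key, []))[-n:]
```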
Step 4: Wire context agents around the stream
Each context agent has a simple shape:
```python
def context_agent_loop():
    while session_active:
        chunk = read_input_chunk()
        mem_view = memory.read_scope("rolling", "session")
        signal = run_agent_logic(chunk, mem_view)
        memory.write_signal(agent_id, signal)
```
Implementation tips:
- You do not need every agent to inspect every chunk. Some can run at a lower frequency, for example every N seconds.
- Use queues or topics per agent so the orchestrator can control resource usage.
- Tag signals with timestamps so the responder can select only fresh ones.
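The timestamp tip can be sketched like this (the function names are illustrative, not any library's API): every signal carries its creation time, and the responder filters out stale ones.

```python
import time

def write_signal(memory, agent_id, signal, now=None):
    """Store a signal tagged with its creation time."""
    memory.setdefault("signals", {})[agent_id] = {
        "value": signal,
        "ts": time.time() if now is None else now,
    }

def fresh_signals(memory, max_age, now=None):
    """Return only signals younger than max_age seconds."""
    now = time.time() if now is None else now
    return {
        agent_id: entry["value"]
        for agent_id, entry in memory.get("signals", {}).items()
        if now - entry["ts"] <= max_age
    }
```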
Step 5: Let the responder consume context selectively
Inside the responder, treat signals from context agents as hints, not as gospel.
For example, the prompt can say:
You receive input from the user and a set of context signals created by other agents.
Each signal has a name and a confidence.
Use them as hints to guide your reply, but prefer the actual user message when signals look inconsistent.
That way your outer ring can fail safely without breaking the core interaction.
3.4 Example: voice support assistant
You can combine these ideas into a simple design.
Outer agents:
- `ASRAgent` (if you handle raw audio)
  - Converts audio into text chunks.
  - Writes into `rolling.transcript`.
- `SentimentWatcherAgent`
  - Reads recent transcript.
  - Writes a rolling sentiment score and trend.
- `EntityTrackerAgent`
  - Extracts order ids, product names, locations.
  - Writes them into `session.entities`.
- `ComplianceAgent`
  - Watches for forbidden phrases.
  - Writes risk flags into `rolling.compliance`.
Central responder:
- Reads the current user utterance and:
- latest sentiment
- recognized entities
- any active compliance flags
- Generates the next reply chunk in real time.
All of this happens while the user is talking, without sending the full raw transcript to every agent at every step.
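As a minimal single-process sketch of that design (a real system would run the watchers on threads or an event loop, and the keyword heuristics stand in for model calls): each chunk fans out to the watchers, which enrich memory, and the responder then consumes their signals.

```python
def sentiment_watcher(chunk, memory):
    score = -1 if "terrible" in chunk else 0
    memory.setdefault("rolling", {}).setdefault("recent_sentiment", []).append(score)

def compliance_agent(chunk, memory):
    if "password" in chunk:
        memory.setdefault("rolling", {})["compliance"] = "risk: credential shared"

def responder(chunk, memory):
    # Signals are hints: read them, but the user message stays primary.
    rolling = memory.get("rolling", {})
    if rolling.get("compliance"):
        return "Please never share credentials in chat."
    if rolling.get("recent_sentiment", [0])[-1] < 0:
        return "I am sorry about that. Let me help."
    return "Got it, go on."

def handle_turn(chunk, memory, watchers=(sentiment_watcher, compliance_agent)):
    for watch in watchers:            # outer ring: enrich memory
        watch(chunk, memory)
    return responder(chunk, memory)   # inner loop: consume signals
```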
4. How to choose between linear and circular
Here is a practical checklist.
Use the linear orchestrator if:
- Input is fixed and finite.
- You can afford to wait for all stages to finish before replying.
- Main goal is analysis, extraction, or offline insight.
- You want reproducible deterministic workflows.
Use the circular streaming orchestrator if:
- You must keep latency low while a conversation is ongoing.
- You need long running observers that enrich context.
- You want to separate the "voice" of the system from its background intelligence.
- You treat the session as an ongoing process rather than as isolated turns.
Many products actually need both:
- Circular loop during the live session.
- Linear loop right after the session to produce deeper analysis and training data.
If you keep the three layers and the time dimension clear in your head, switching between both becomes straightforward.
5. Practical tips and pitfalls
5.1 Keep memory explicit and queryable
- Avoid hiding crucial state in the prompt history.
- Use structured memory objects and explicit read/write functions.
- Log memory changes so you can replay and debug sessions.
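The replay idea can be sketched with an append-only journal (the `ReplayableMemory` name is an assumption): every write is logged, so a saved journal can rebuild the exact state of a past session while debugging.

```python
class ReplayableMemory:
    def __init__(self):
        self.store = {}
        self.journal = []  # append-only change log

    def write(self, key, value):
        self.journal.append((key, value))
        self.store[key] = value

    @classmethod
    def replay(cls, journal):
        """Rebuild state from a saved journal, e.g. while debugging."""
        mem = cls()
        for key, value in journal:
            mem.write(key, value)
        return mem
```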
5.2 Make agents idempotent and composable
- Wherever possible, design agents so that running them twice on the same input produces the same result.
- This helps with retries and with mixing them in different workflows.
5.3 Watch cost and latency separately
- In linear flows you usually pay in total cost and overall latency.
- In circular flows you pay in per chunk latency and in steady state cost.
- Monitor both, and be ready to move some work from inner to outer loop or vice versa.
5.4 Use diagrams as living documentation
The two diagrams that inspired this guide are simple:
- A horizontal banded diagram for the linear loop.
- A circular banded diagram for the streaming loop.
Keep them close to your code:
- In a `docs/` folder.
- In your orchestrator repository README.
- Even inside your OrKa or other YAML definitions as comments.
They help new contributors answer the question:
"Where does this agent live, and which loop is it part of?"
6. Light touch: how OrKa fits in
In my own project, OrKa-reasoning, I encode both loops as YAML workflows and use an orchestrator runtime to execute them. The diagrams here are direct visualizations of those flows.
You do not need OrKa to benefit from this guide, though.
The key ideas are independent:
- Separate execution, communication, and memory.
- Treat time explicitly.
- Use two simple loops instead of one giant graph.
Once you think in these terms, you can map them to any framework or stack you like.
7. Next steps
To apply this guide in your own project:
- Pick one use case that feels messy today.
- Decide if it is primarily analytic or live interactive.
- Draw either the linear or the circular diagram for it.
- List agents, memory fields, and store/retrieve rules.
- Implement the orchestrator loop in your existing toolchain.
- Add one or two context agents on the side, and see how much simpler the main responder becomes.
You will notice that many problems which felt like "prompt engineering" issues were actually orchestration issues all along.
Once you solve those at the architecture level, prompts become smaller, agents become clearer, and the overall system is easier to reason about and to evolve.