Author: Maria Shimkovska
If you came to our London tech event, you saw me walk through this as a live demo. A few people asked if I could write it up, so here it is: the same demo, but as something you can clone, run, and poke at yourself, to see how you can take some of your own business processes and build them into an agentic system like this one. Keep in mind this is just a demo; the goal is to show how you can build a production-style agentic system and add orchestration to oversee everything.
You can grab the code here, where I also cover setup in more detail.
Quick context before we dig in. An "agent" in this post means an AI model that can use tools and make judgment calls on its own, not just answer a question. A "Customer 360" is a complete picture of a customer pulled from every system where their data lives, like billing, support, and product usage. The goal of the demo is to show how agents can assemble that picture and decide what to do about it.
Getting it running
The whole thing is designed to go from clone to running UI in about a minute.
Clone the repo, then copy .env.example to .env at the repo root and fill in your Orkes credentials and OpenAI key. That's the only configuration you need.
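For reference, a filled-in .env looks something like this. The exact variable names are whatever .env.example defines; the ones below are typical Orkes and OpenAI names, shown purely as an illustration:

```shell
# Orkes Conductor credentials (variable names illustrative -- check .env.example)
CONDUCTOR_SERVER_URL=https://your-cluster.orkesconductor.io/api
CONDUCTOR_AUTH_KEY=your-key-id
CONDUCTOR_AUTH_SECRET=your-key-secret

# OpenAI key used by the agents
OPENAI_API_KEY=your-openai-key
```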
Then start the stack with one command:
./start_demo.sh
That's genuinely it. The script boots the Agentspan server, waits for it to be ready, sets your API credentials, spins up the Conductor workers, the Express backend, and the React frontend, all in one go. If you already have an Agentspan server running from a previous session, it'll restart it cleanly. Logs for each component go to the logs/ folder if you need to debug. Hit Ctrl+C to stop everything.
Then open http://localhost:5173 in your browser.
The UI is honestly the smallest part of this. The interesting pieces are Conductor and Agentspan, but I wanted a full end-to-end flow so you can see how everything connects.
What happens when you hit Run
- Pick a scenario in the UI. There are three, each designed to exercise different branches of the system:
- John Doe, an at-risk existing customer
- Marcus Webb, a watchlist case whose usage is softening but isn't yet critical
- Marina Petrova, a brand new customer the system has never seen
- Click Run. The frontend calls the Express backend, which starts the Conductor workflow on Orkes.
- Workers pick up each task and run the agents via Agentspan.
- The UI polls every 500ms and shows progress as each step completes.
- Final output appears when the workflow finishes.
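That start-then-poll loop is simple enough to sketch. Here's a minimal Python version, where start_fn and status_fn are stand-ins for the backend calls (both names are illustrative, not the demo's actual API):

```python
import time

def run_scenario(start_fn, status_fn, interval=0.5):
    """Kick off a workflow, then poll its status until it reaches a
    terminal state -- the same loop the UI runs every 500ms.

    start_fn() -> workflow id; status_fn(id) -> dict with a 'status' key.
    """
    workflow_id = start_fn()
    while True:
        status = status_fn(workflow_id)
        if status["status"] in ("COMPLETED", "FAILED", "TERMINATED"):
            return status
        time.sleep(interval)  # poll interval; the demo UI uses 0.5s
```

Injecting the two callables keeps the loop testable without a live server.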
That's the user-facing loop. Before we dig into the agents themselves, it's worth zooming out to see how the pieces underneath fit together, because the architecture is doing a lot of the heavy lifting.
The architecture, piece by piece
Here's what the pipeline actually looks like, all on one page:
Incoming event
│
▼
┌─────────────────┐
│ Identity Agent │ Works out who the event belongs to
└────────┬────────┘
│
▼
Is this a new customer?
│
┌────┴────┐
│ Yes │ No
▼ ▼
┌──────────┐ ┌───────────────┐ ┌────────────────┐
│Onboarding│ │ Health Agent │──▶│ Strategy Agent │
│ Agent │ └───────────────┘ └────────────────┘
└──────────┘
A new customer gets routed to Onboarding. An existing customer goes through Health, then Strategy. Every agent receives everything the previous agents produced, so by the end you have one combined payload covering identity, health, and the recommended next action.
Now the systems that make that happen.
The three main systems
There are three moving parts: Conductor, Agentspan, and the agents themselves. Each does a distinct job, and they work independently of each other, which is the point.
Conductor is the coordinator
This is essentially the project manager for the whole system. It owns the workflow definition: what runs, in what order, and what happens at each fork in the road. When you click Run in the UI, the Express backend tells Orkes (the hosted version of Conductor) to start a new execution of the customer_360_refresh workflow.
From that point, Conductor is in charge. It queues up the first task, waits for a worker to pick it up, receives the result, and decides what comes next. It handles retries if something fails, tracks state across every step, and enforces the routing logic.
For example, it uses a branching step to send new customers down the onboarding path and existing customers down the health and strategy path. Conductor doesn't know or care what the agents are doing inside each task. It just moves data through the pipeline.
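The branch itself is tiny. Here's a sketch in Python, keyed off the action_taken field the Identity Agent sets when it creates a brand new customer record (the task names are illustrative, not the workflow's actual task definitions):

```python
def route(identity_result):
    """Mirror the workflow's branch: a newly created customer goes to
    onboarding; a known customer goes through health, then strategy."""
    if identity_result.get("action_taken") == "created":
        return ["onboarding_agent"]
    return ["health_agent", "strategy_agent"]
```

In the real system this decision lives in Conductor's workflow definition, not in worker code; the point is that the routing is deterministic and visible, not buried inside a model call.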
Agentspan is where the agents actually run
It runs as a local server on port 6767 and is what executes the AI model calls. Each agent is registered there with its model, its tools, its instructions, and its safety checks.
When a worker needs to run the health agent, it calls Agentspan with the input. Agentspan handles the back and forth with the model, including tool calls, retries when a safety check fails, and making sure the output matches the expected format.
If Conductor is the nervous system connecting everything, Agentspan is the brain doing the actual thinking.
The workers are the bridge between the two
They're Python processes that keep asking Conductor, "do you have any tasks for me?" When Conductor hands one off, the worker unpacks the input, calls the right Agentspan agent, and posts the result back to Conductor.
The workers reach out to Conductor rather than Conductor pushing work to them, which means you can run as many workers as you want and they'll never step on each other.
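One iteration of that poll-execute-report cycle can be sketched like this. The three callables are stand-ins for the Conductor client and the Agentspan call that the demo wires up for real:

```python
def process_next(poll_task, run_agent, post_result):
    """One pass of the worker loop: ask Conductor for a task, run the
    right Agentspan agent on its input, post the result back.
    Returns True if a task was handled, False if the queue was empty."""
    task = poll_task()                 # "do you have any tasks for me?"
    if task is None:
        return False                   # nothing queued; caller can back off
    output = run_agent(task["input"])  # the reasoning happens in Agentspan
    post_result(task["task_id"], output)
    return True
```

Because workers pull rather than receive pushes, running this loop in ten processes at once is safe: Conductor hands each queued task to exactly one poller.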
The agents
The agents sit at the end of this chain, and this is where the reasoning actually happens. Each one is scoped to a single responsibility:
- Identity works out who the incoming event belongs to
- Health combines signals from four systems into a score and a risk summary
- Strategy decides the single most important next action
- Onboarding runs only for brand new customers, to kick off the welcome process
Each agent receives the accumulated output of every step before it, adds its own section, and passes the whole thing forward. By the time the workflow completes, you have one unified payload covering identity, health, and recommended action, assembled piece by piece as it moved through the pipeline. (We'll dig into each agent individually in the next section.)
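That accumulation step is worth seeing concretely. A minimal sketch, with made-up agent names (each agent is a callable that reads the full payload so far and returns its own section):

```python
def run_pipeline(event, agents):
    """Each agent sees everything produced before it and adds one section,
    so the final payload is the union of every step's output."""
    payload = {"event": event}
    for name, agent in agents:
        payload[name] = agent(payload)  # agent gets the accumulated context
    return payload
```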
The supporting pieces
Three systems do the orchestration and the thinking, but a few other parts of the repo keep the whole thing honest.
Data stores live in /data. customer_store.py is the identity graph: every known customer and all the different IDs they have across source systems (so an event from Salesforce with a contact ID can be traced back to the same person in Zendesk, Stripe, and so on). health_store.py holds the signals the Health Agent needs, like product usage, support tickets, billing events, and engagement history, plus the playbooks that match each health status. scenario_inputs.py is just sample data for the three demo scenarios. In a real system these would be connections to your live databases; for a demo they're self-contained Python files you can read and change.
Guardrails (in guardrails.py) are safety checks that run on every agent's inputs and outputs. They're deterministic code, meaning they always run the same way regardless of what the AI model decides, and they sit at the boundary of each agent to catch things the model shouldn't be trusted with. A few examples:
- validate_input_record checks that an incoming event has the required fields and comes from a known source system
- no_prompt_injection blocks attempts to smuggle instructions into user-supplied text fields
- conservative_identity_match flags suspicious combinations, like a NO_MATCH result paired with a high confidence score, for a human to review
- no_pii_in_output blocks patterns like social security numbers or credit card numbers from appearing in any agent's output
These exist because AI models are good at reasoning but bad at being reliably boring. The guardrails handle the boring, must-not-fail parts so the agents don't have to.
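To make "deterministic code at the boundary" concrete, here's a hypothetical stand-in for the no_pii_in_output check. The repo's guardrails.py will differ in detail; the point is that it's plain regexes, applied the same way every time, regardless of what the model produced:

```python
import re

# Patterns are illustrative, not the repo's actual rules.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")            # e.g. 123-45-6789
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")          # 13-16 digit card-like runs

def no_pii_in_output(text):
    """Return True if the text is safe to emit (no SSN/card-like patterns)."""
    return not (SSN.search(text) or CARD.search(text))
```

A failed check never reaches the user; Agentspan retries the agent instead, which is exactly the "reliably boring" behavior you can't ask a model to guarantee.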
The UI (in /demo-ui) has two halves. The frontend is a React app on port 5173 with the three scenario buttons, a step-by-step progress view, and a results panel. The backend is a small Express API on port 3001 that kicks off workflow executions and proxies the status polling to Orkes. The UI is genuinely the least interesting part of the system, but it gives you a way to see what's happening. The pipeline runs the same way whether the UI is open or not.
With all of these connections clear, let's get into why each of those four steps is an agent and not just a regular function, because this is another big part of building agentic systems.
Why every agent is actually an agent in this example
It's tempting, when you're building something like this, to let "agent" become a label you slap on any AI model call. I've tried to be strict about it here. Each of the four components below earns the name because there's real judgment involved that you can't cleanly reduce to code that always follows the same rules.
Identity Agent
What it does: Takes a raw event from any source system (like Salesforce, Zendesk, Stripe, and so on) and decides whether it belongs to a known customer of this company.
Why it has to be an agent: Matching people is inherently messy. The same person shows up as j.doe@acme.com in one system and John Doe / Acme, Inc. in another. A rules engine can calculate similarity scores, and ours does, but it can't reason about whether a 0.78 score with a shared team email like billing@ is actually the account rather than a specific person, or whether two candidates with similar names at the same company are the same human or two colleagues.
The agent's real job is the judgment call in the gray zone: MATCH, UNCERTAIN, or NO_MATCH. It has to weigh conflicting signals, apply the conservative matching rule ("false merges are worse than missed ones"), and decide when to escalate to a human reviewer. That reasoning step, asking "given all of this, what's the right call and why?", is where an AI model earns its place over code in this example.
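The deterministic half of that split is easy to write down. A sketch with illustrative thresholds (not the repo's actual numbers): everything in the middle band is deliberately left for the agent, and possibly a human, to resolve.

```python
def classify_match(score, lo=0.6, hi=0.9):
    """Map a similarity score to a band. The rules end here; everything
    returned as UNCERTAIN is the Identity Agent's judgment call.
    Thresholds are hypothetical."""
    if score >= hi:
        return "MATCH"
    if score < lo:
        return "NO_MATCH"
    return "UNCERTAIN"
```

A 0.78 score, like the shared billing@ example above, lands squarely in UNCERTAIN: the code refuses to decide, which is the design.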
Health Agent
What it does: Pulls signals from four separate systems (usage, support, billing, and customer records), combines them into a score, and surfaces risks and opportunities.
Why it has to be an agent: The scoring logic itself (calculate_health_score) is fixed, meaning the same inputs always produce the same number. That's intentional. You want a reproducible score. But the agent earns its place in the steps before and after.
Before scoring: it has to decide which customer ID to use. A person record arrives, but their health data lives on the account. The agent has to navigate that relationship, call the right tools, and pass the right data to calculate_health_score. A hardcoded pipeline would break the moment the data model shifts.
After scoring: it has to interpret the outputs in context and produce a human-readable summary. "Product usage declined 38.2% over the last 30 days" combined with "2 escalated tickets" combined with "renewal in 21 days" tells a story that's more than the sum of its parts. The agent connects those dots into a coherent risk narrative rather than just spitting out a list of triggered rules.
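To show what "fixed scoring" means in practice, here's a hypothetical version of the scoring function. The repo's calculate_health_score will weigh things differently; what matters is the shape, meaning no model call, no randomness, same inputs, same number:

```python
def calculate_health_score(usage_trend_pct, escalated_tickets, days_to_renewal):
    """Deterministic toy scoring. Weights are made up for illustration."""
    score = 100
    score += min(usage_trend_pct, 0)      # penalize usage declines only
    score -= 10 * escalated_tickets       # each escalation costs 10 points
    if days_to_renewal < 30:
        score -= 15                        # renewal pressure window
    return max(0, min(100, round(score)))  # clamp to 0..100
```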
Strategy Agent
What it does: Reads the identity and health output and decides the single most important next action.
Why it has to be an agent: This is the most agent-like of the four. Our prioritize_customer_action tool has a priority order built in (escalations beat renewal risk, renewal risk beats usage decline), but that order is static. Real accounts don't fit cleanly into one bucket. Marcus Webb (WATCHLIST) has usage decline and stale engagement and a ticket backlog. None of those trigger the highest-priority rules on their own, but together they tell a different story.
The agent has to weigh which combination of signals matters most for this specific customer, pull the right playbook, decide whether to create a task or trigger outreach or both, and write the summary in language a customer success manager can act on. That synthesis, turning context into a specific, personalized recommendation with reasoning attached, is what separates it from a simple decision tree.
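The static part of that decision really is just an ordered lookup. A sketch in the spirit of prioritize_customer_action, with illustrative condition names:

```python
# First matching condition wins: escalations beat renewal risk,
# renewal risk beats usage decline. Names are illustrative.
PRIORITY = ["escalated_tickets", "renewal_risk", "usage_decline"]

def top_action(signals):
    """Return the highest-priority triggered condition, or None."""
    for condition in PRIORITY:
        if condition in signals:
            return condition
    return None
```

A Marcus Webb-style account triggers several low-priority conditions at once; the lookup can only pick one, and that's the gap the Strategy Agent fills.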
Onboarding Agent
What it does: For brand new customers only. It creates a kickoff task, builds a 30-day plan, and triggers a welcome sequence.
Why it has to be an agent: This one is the most tool-like of the four; the tools are largely static templates right now. But it still earns the "agent" label for two reasons.
First, routing. It only runs when action_taken == "created". That condition is checked by the workflow router, but the agent still has to confirm it's in the right context before acting, and gracefully handle edge cases like a missing email, an unknown role, or no customer success manager assigned yet.
Second, personalization. build_onboarding_plan returns the same four-week template for everyone today, but an agent can adapt it. A VP of Engineering gets different week-3 actions than a Head of Operations. As the tools get richer, the agent can tailor the plan to the customer's role, company size, and plan tier without anyone having to hardcode every combination.
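As a sketch of what that tailoring could look like: the roles and plan contents below are made up (the repo's build_onboarding_plan currently returns one fixed template), but the shape shows how a role-aware version stays a plain function the agent can call.

```python
# Hypothetical role-aware planning; all strings are illustrative.
BASE_PLAN = {1: "kickoff call", 2: "core setup", 3: "team rollout", 4: "review"}
WEEK3_BY_ROLE = {
    "VP of Engineering": "API and CI integration walkthrough",
    "Head of Operations": "reporting and workflow automation setup",
}

def build_onboarding_plan(role):
    """Return the four-week plan, swapping week 3 for known roles."""
    plan = dict(BASE_PLAN)
    plan[3] = WEEK3_BY_ROLE.get(role, plan[3])
    return plan
```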
Wrapping up
The thread running through all four: the parts that should stay consistent stay consistent, and the agents sit around them doing the reasoning work that brittle code can't. Scoring is a function. Priority ordering is a lookup. Matching thresholds are numbers in a config. What the agents handle is everything in between: deciding which tool to call, how to interpret the output, when the rules don't fit the situation, and how to narrate the result in a way a human can actually use.