Every B2B company I know has the same inbox problem. Someone fills out a contact form, sends an email, or reaches out on LinkedIn. That message lands in a shared inbox. Then it sits there. Someone reads it eventually, tries to figure out what it's about, who should handle it, and forwards it to the right person. Sometimes that takes an hour. Sometimes a day. Sometimes the message just gets buried and nobody ever responds.
This is not a technology problem. It's a triage problem. And it costs real money because every hour a qualified lead sits unanswered in your inbox is an hour where that lead might decide to go somewhere else.
I got tired of watching this happen at our company. So I spent a weekend building an agent that handles the entire process automatically. No manual reading, no forwarding, no guessing who should deal with it. A request comes in, the agent figures out what it is, researches the company behind it, derives concrete next steps, and routes everything to the right person on the team. The whole thing takes a few seconds.
Here's how I built it and what I learned along the way.
The idea behind the pipeline
The core concept is simple. Instead of one big prompt that tries to do everything at once, the agent runs through four distinct steps. Each step produces structured output that feeds into the next one. That makes the results predictable and easy to debug when something goes off.
Step one is classification. The agent reads the incoming message and determines the category. Is this a sales inquiry, a support request, a partnership proposal, press, or something else entirely? It also assigns a priority level and detects the language. The output is a short summary plus metadata that the rest of the pipeline can work with.
Step two is company research. This is where it gets interesting. The agent takes the company name from the request and runs a web search via Tavily to gather real-time information: company size, industry, what they do, what challenges they probably face. If Tavily is not available, it falls back to what the model already knows, which is usually enough for larger companies but obviously less useful for smaller ones. The point is that by the time a human looks at this request, they already have a full brief on who's asking.
Step three takes the classification and the company research and derives concrete action items. Not vague suggestions like "follow up soon" but specific tasks. Something like "prepare a demo focused on their compliance requirements" or "check if our API supports the integration they mentioned." At least three actions per request, each one tied to what we actually know about the sender.
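To make that concrete, the derivation step is mostly prompt assembly: the structured outputs of steps one and two go in as JSON, and the model is constrained to JSON out. Here's a minimal sketch of that wiring; the function name and prompt wording are my illustration, not the exact production prompt:

```python
import json

def build_action_prompt(classification: dict, research: dict) -> str:
    """Combine the outputs of steps one and two into one focused prompt.

    Illustrative sketch: the real prompt wording will differ, but the
    structure is the same -- prior step outputs go in as JSON, and the
    model is told to return JSON only.
    """
    return (
        "You are deriving next steps for an inbound B2B request.\n"
        f"Classification:\n{json.dumps(classification, indent=2)}\n"
        f"Company research:\n{json.dumps(research, indent=2)}\n"
        "Return at least three concrete action items as a JSON array "
        "of strings, each tied to what is known about the sender. "
        "Respond with valid JSON only."
    )

prompt = build_action_prompt(
    {"category": "sales", "urgency": "medium"},
    {"industry": "manufacturing", "size": "50 employees"},
)
```

Because each step's input is just the previous step's JSON, swapping the model or tweaking one prompt never ripples through the rest of the pipeline.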
Step four is routing. The agent compares the request against predefined team profiles and decides who should handle it. These profiles live in a simple JSON structure. Here's an example with dummy data to illustrate how it works:
[
  {
    "id": "cto",
    "name": "Max Müller",
    "title": "CTO",
    "focus": "Technology, architecture, AI implementation, infrastructure",
    "handles": [
      "Technical implementation questions",
      "AI and automation projects",
      "System integrations and APIs",
      "Technical due diligence"
    ]
  },
  {
    "id": "ceo",
    "name": "Lisa Schmidt",
    "title": "CEO",
    "focus": "Strategy, partnerships, enterprise deals, company leadership",
    "handles": [
      "Strategic partnerships",
      "Large enterprise inquiries",
      "Investors and press",
      "Topics that no other role clearly covers"
    ]
  },
  {
    "id": "cro",
    "name": "Tom Weber",
    "title": "CRO (Sales & Marketing)",
    "focus": "New customer acquisition, proposals, campaigns, lead qualification",
    "handles": [
      "Pricing inquiries and proposals",
      "Product demos and onboarding",
      "Marketing partnerships",
      "General sales conversations"
    ]
  }
]
The routing step takes the classification and the company research, weighs them against each profile, and picks the best match. The assignment comes with a written explanation, so the person receiving it immediately understands why it landed on their desk. Adding a new role is just appending another object to the array.
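One practical detail worth handling: the model occasionally returns a role id that doesn't exist. A small guardrail maps the decision onto the known profiles and falls back to the CEO, whose profile explicitly covers topics no other role handles. This is my own sketch, and the assignee_id field name is a hypothetical:

```python
# Trimmed-down version of the team profiles shown above.
TEAM_PROFILES = [
    {"id": "cto", "name": "Max Müller"},
    {"id": "ceo", "name": "Lisa Schmidt"},
    {"id": "cro", "name": "Tom Weber"},
]

def resolve_assignee(routing_decision: dict, fallback: str = "ceo") -> dict:
    """Map the model's routing decision onto a known profile.

    If the model returns an unknown id, fall back to the CEO, whose
    profile covers "topics that no other role clearly covers".
    """
    by_id = {p["id"]: p for p in TEAM_PROFILES}
    return by_id.get(routing_decision.get("assignee_id"), by_id[fallback])

assert resolve_assignee({"assignee_id": "cro"})["name"] == "Tom Weber"
assert resolve_assignee({"assignee_id": "coo"})["id"] == "ceo"
```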
The tech behind it
The backend is Python with FastAPI. One single endpoint that takes a JSON body and kicks off the entire pipeline:
POST /inbound
{
  "name": "Anna Bauer",
  "email": "anna@example.de",
  "company": "TechStartup GmbH",
  "message": "We're looking for a solution to automate the deployment of our container workloads. Do you have experience with regulated industries?"
}
That's it. Name, email, company, message go in. Classification, company research, action items, and the routing decision come back as structured JSON.
Each step in the pipeline calls Claude via the Anthropic API. I use claude-sonnet-4-6 as the default model because it's fast enough for real-time processing and smart enough to produce useful structured output. Every call gets a specific system prompt that constrains the output format to JSON. That's important because the steps need to feed into each other cleanly.
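Even with a system prompt that demands JSON, models occasionally wrap the answer in a markdown fence or add a stray sentence. A small defensive parser keeps the pipeline robust to that; this is my own sketch of the pattern, not code from the project:

```python
import json
import re

def parse_json_response(text: str) -> dict:
    """Extract the JSON object from a model response.

    Handles the common failure mode where the model wraps its answer
    in a markdown code fence despite being told to return JSON only.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError(f"no JSON object found in response: {text!r}")
    return json.loads(match.group(0))

# Works on clean output and on fenced output alike.
assert parse_json_response('{"category": "sales"}')["category"] == "sales"
assert parse_json_response('```json\n{"urgency": "medium"}\n```')["urgency"] == "medium"
```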
The web search runs through the Tavily SDK. It's optional, meaning the agent works without it, but the company research is noticeably better with live search results. Tavily returns structured web data that the model can reason about, which is more reliable than asking the model to just guess.
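The optional dependency comes down to a plain fallback branch. Here's a sketch of that pattern; the query string and fallback wording are illustrative, while TavilyClient.search and its results structure are the real SDK interface:

```python
def research_company(company: str, tavily_client=None) -> str:
    """Gather company context, preferring live search when available.

    With no Tavily client configured, return a marker so the research
    prompt can tell the model to rely on its own prior knowledge.
    """
    if tavily_client is None:
        return f"No live search available. Use prior knowledge about {company}."
    # TavilyClient.search returns a dict whose "results" list contains
    # entries with "title", "url", and "content" fields.
    response = tavily_client.search(f"{company} company profile industry size")
    return "\n".join(r["content"] for r in response["results"])

context = research_company("TechStartup GmbH")
```

In production the client would be built once at startup from an API key, something like TavilyClient(api_key=...) when the environment variable is set, None otherwise.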
In production this doesn't need a frontend at all. The structured JSON response is designed to be pushed directly into whatever tool your team already uses. A ticket system, Notion, Slack, a CRM. The agent does the thinking, the target system does the presenting.
To give you a sense of what that structured output actually looks like, here's the classification step. The function takes the raw request data, builds a prompt, and tells Claude to classify the inquiry and respond exclusively in valid JSON. Claude reads the free text, picks up on implicit signals, and returns something like this:
{
  "category": "sales",
  "urgency": "medium",
  "summary": "TechStartup GmbH is looking for automated container deployment with a focus on regulated industries.",
  "language": "en"
}
No keyword matching, no rule engine, no if "price" in message logic. The model figures out from context that someone looking for a solution implies a sales inquiry, that mentioning regulated industries signals compliance awareness, and that a VP of Strategy reaching out about a long term collaboration is a partnership rather than a sales lead. That's the whole point of using an LLM here instead of writing classification rules by hand.
Each subsequent step follows the same pattern. Build a focused prompt, get structured JSON back, pass it forward. There's no framework, no agent SDK, no complex orchestration layer. Just sequential async functions that each do one thing well.
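The orchestration really is that small. With the step internals replaced by placeholders (each one is an async Claude call in the real pipeline), the skeleton looks like this:

```python
import asyncio

# Placeholder steps -- in the real pipeline each is an async Claude
# call with its own focused system prompt and JSON-constrained output.
async def classify(request: dict) -> dict:
    return {"category": "sales", "urgency": "medium"}

async def research(request: dict, classification: dict) -> dict:
    return {"industry": "manufacturing"}

async def derive_actions(classification: dict, research_result: dict) -> list:
    return ["prepare a demo", "check API fit", "schedule a call"]

async def route(classification: dict, research_result: dict) -> dict:
    return {"assignee": "cro", "reason": "pricing-related sales inquiry"}

async def run_pipeline(request: dict) -> dict:
    """Run the four steps in order; each output feeds the next."""
    classification = await classify(request)
    research_result = await research(request, classification)
    actions = await derive_actions(classification, research_result)
    routing = await route(classification, research_result)
    return {
        "classification": classification,
        "research": research_result,
        "actions": actions,
        "routing": routing,
    }

result = asyncio.run(run_pipeline({"company": "TechStartup GmbH"}))
```

That's the entire orchestration layer. The FastAPI endpoint just awaits run_pipeline and returns the dict.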
What I learned building this
The biggest lesson was that splitting the work into separate steps makes the output dramatically better than trying to do everything in a single prompt. When you ask a model to classify, research, plan, and route all at once, the quality degrades across the board. When you give it one job at a time with clear constraints, each output is significantly more useful.
The second lesson was about structured output. Making the model return JSON with a defined schema for every step means the pipeline is deterministic in structure even if the content varies. That's what makes it possible to actually integrate this into other systems. You could take the JSON response and post it straight to a CRM, send it as a Slack message, or store it in a database without any parsing or transformation.
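Turning the pipeline result into a Slack message, for example, is a pure formatting step. This sketch assumes Slack's standard incoming-webhook payload, which accepts a JSON body with a "text" field; the actual HTTP POST to the webhook URL is left out:

```python
def to_slack_payload(result: dict) -> dict:
    """Format a pipeline result as a Slack incoming-webhook payload."""
    actions = "\n".join(f"- {a}" for a in result["actions"])
    text = (
        f"*New {result['classification']['category']} request* "
        f"(urgency: {result['classification']['urgency']})\n"
        f"Assigned to: {result['routing']['assignee']}\n"
        f"Suggested actions:\n{actions}"
    )
    return {"text": text}

payload = to_slack_payload({
    "classification": {"category": "sales", "urgency": "medium"},
    "routing": {"assignee": "cro"},
    "actions": ["prepare a demo", "check API fit"],
})
```

A Notion page or CRM record is the same idea with a different field mapping; nothing in the pipeline output needs parsing or transformation first.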
The third thing I noticed is how much value the company research step adds. Without it the routing still works, but the action items are generic. With it they become specific enough that the person receiving the request can actually act on them immediately. That's the difference between "follow up with the lead" and "schedule a call focused on their migration from on-prem to cloud; they're a 50-person manufacturing company dealing with NIS2 compliance deadlines."
Where this goes next
Right now the agent processes requests that I send manually to the API. That's fine for testing but obviously not how this should work in production.
The first extension I'm planning is connecting the agent to Outlook mailboxes and web forms directly. The idea is that the agent polls incoming messages on a schedule, processes everything new, and pushes the structured results into Notion via their API. No human in the loop for the triage part. The right person on the team just opens Notion and sees a fully prepared brief waiting for them, complete with company context, priority, and suggested actions.
The second and more ambitious extension is a feedback loop. Right now the agent suggests what to do but it never sees what actually happens. The plan is to feed the actual responses back into the system so the agent can learn from them over time. Which types of requests get answered with which template. What tone works for which industry. How urgency correlates with actual conversion. Once that loop is running the agent could start drafting responses on its own. Not sending them automatically at first, but preparing a draft that a human reviews and approves. And eventually, for certain categories where the pattern is clear enough, the agent could handle the entire response autonomously.
That's the long term vision. An agent that doesn't just sort your inbox but actually handles a meaningful chunk of your inbound communication. Not because it replaces the team but because it handles the repetitive 80% so the team can focus on the conversations that actually need a human.
What's your take?
I'm curious what others are building in this space. Are you automating inbound workflows? Using agents for internal routing or triage? What would you do differently in the pipeline, and what's the biggest challenge you've run into with structured LLM output in production?
Would love to hear your ideas in the comments.