<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hello Insurance</title>
    <description>The latest articles on DEV Community by Hello Insurance (@hello-insurance).</description>
    <link>https://dev.to/hello-insurance</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1690418%2F063901c5-ccaa-4f0d-87ef-e403a0ee5fe1.png</url>
      <title>DEV Community: Hello Insurance</title>
      <link>https://dev.to/hello-insurance</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hello-insurance"/>
    <language>en</language>
    <item>
      <title>Future of Agentic Underwriting Workbench</title>
      <dc:creator>Hello Insurance</dc:creator>
      <pubDate>Sun, 03 Aug 2025 23:52:46 +0000</pubDate>
      <link>https://dev.to/hello-insurance/future-of-agentic-underwriting-workbench-5882</link>
      <guid>https://dev.to/hello-insurance/future-of-agentic-underwriting-workbench-5882</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;We didn’t start by asking how to add agents to underwriting. We asked how underwriting should work in an agentic world, and built from there.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After months of quiet iteration, we're opening the doors to what we've been building:&lt;br&gt;
The &lt;strong&gt;Agentic Underwriter Workbench&lt;/strong&gt;: a modern, extensible platform designed from first principles to support experimentation, scale, and AI-native underwriting.&lt;/p&gt;

&lt;p&gt;Demo at &lt;a href="https://www.youtube.com/watch?v=Wm7BROVPLbI" rel="noopener noreferrer"&gt;Future of Agentic Underwriting&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why We Built This
&lt;/h2&gt;

&lt;p&gt;Too many GenAI projects start with the tech and retrofit the workflow.&lt;/p&gt;

&lt;p&gt;We did the opposite. We mapped how underwriting should work if built today: with connected agents, schema-first design, flexible product modeling, and an interface that treats intelligence as a teammate, not a sidecar.&lt;/p&gt;

&lt;p&gt;This is the foundation.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Makes It Different
&lt;/h2&gt;

&lt;p&gt;Here’s a preview of the architectural highlights:&lt;/p&gt;

&lt;h3&gt;
  
  
  Cortex: Pluggable AI Engine
&lt;/h3&gt;

&lt;p&gt;Built in FastAPI with swappable LLMs (OpenAI, Anthropic, Gemini, Phi) via config, Cortex is designed for goal-based, multi-agent orchestration and modular tool invocation via JSON-RPC.&lt;/p&gt;
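&lt;p&gt;As a rough illustration of what "swappable LLMs via config" can look like (the provider names and models below are my own placeholders, not Cortex's actual configuration), the swap can be as small as a config lookup:&lt;/p&gt;

```python
# Hypothetical sketch of config-driven provider selection.
# The registry entries are illustrative, not Cortex's real config.
PROVIDERS = {
    "openai": {"model": "gpt-4o-mini"},
    "anthropic": {"model": "claude-3-5-sonnet"},
    "gemini": {"model": "gemini-1.5-pro"},
}

def resolve_provider(config):
    """Pick a provider block from config, falling back to a default."""
    name = config.get("llm_provider", "openai")
    if name not in PROVIDERS:
        raise ValueError(f"Unknown provider: {name}")
    return name, PROVIDERS[name]
```

&lt;p&gt;The rest of the orchestration code only sees the resolved provider block, so changing models is a config edit, not a code change.&lt;/p&gt;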

&lt;h3&gt;
  
  
  Schema-Driven Dynamic Forms
&lt;/h3&gt;

&lt;p&gt;Using rjsf + shared AJV-validated data contracts, we render multi-product forms (Life, Auto, Renters...) from JSON Schema. Data-capture forms for each product line are defined at the schema level, avoiding the costly UI rewrites that new product lines would otherwise require.&lt;/p&gt;
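&lt;p&gt;To make the idea concrete, here is a hand-rolled sketch of a schema-defined product form (field names are invented for illustration; the real contracts are AJV-validated JSON Schema shared between front end and back end):&lt;/p&gt;

```python
# A minimal stand-in for a schema-driven form contract: the product
# line ships a JSON-Schema-like definition, and the UI renders fields
# from it instead of hard-coding a form per product.
RENTERS_SCHEMA = {
    "title": "Renters Application",
    "required": ["applicant_name", "monthly_rent"],
    "properties": {
        "applicant_name": {"type": "string"},
        "monthly_rent": {"type": "number"},
        "has_pets": {"type": "boolean"},
    },
}

def missing_required(schema, payload):
    """Return the required fields absent from a submitted payload."""
    return [f for f in schema["required"] if f not in payload]
```

&lt;p&gt;Adding a new product line then means shipping a new schema document, not a new UI.&lt;/p&gt;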

&lt;h3&gt;
  
  
  Activity Stream: A Living Timeline
&lt;/h3&gt;

&lt;p&gt;Think GitHub meets audit logs. Built with Zustand, Fuse.js, and UI card components, the stream captures all events (ratings, comments, file uploads) in a searchable, real-time feed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Agentic UI
&lt;/h3&gt;

&lt;p&gt;The assistant panel is persistent, context-aware, file-friendly, and can run in recommend or autonomous mode, just like a teammate. It integrates deeply with the rest of the workflow rather than being bolted onto the side.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design Language Inspired by NotebookLM
&lt;/h3&gt;

&lt;p&gt;We were inspired by what made Google’s NotebookLM intuitive and borrowed its principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Left nav for queue context&lt;/li&gt;
&lt;li&gt;Central workspace for focus&lt;/li&gt;
&lt;li&gt;Right assistant panel for support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2w29dxgb1oaj9d2glwjo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2w29dxgb1oaj9d2glwjo.png" alt="Hi Underwriting Workbench" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5vc86ztvrigiu962ddl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz5vc86ztvrigiu962ddl.png" alt="Google Notebooklm" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This post lays the groundwork. In future installments, we’ll go deeper into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cortex’s evolving multi-agent orchestration engine&lt;/li&gt;
&lt;li&gt;The JSON-RPC + MCP architecture behind tool invocation&lt;/li&gt;
&lt;li&gt;UI design trade-offs around schema-first workflows&lt;/li&gt;
&lt;li&gt;Creating agent memory and semantic retrieval layers&lt;/li&gt;
&lt;li&gt;Embedding underwriting intelligence into every pixel of the platform&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re building this in the open.&lt;/p&gt;

&lt;p&gt;If you're working on underwriting, claims automation, agentic workflows, or GenAI-driven UIs—we’d love to connect.&lt;/p&gt;

&lt;p&gt;Built by folks who have lived the technology pain in insurance and don’t want to duct-tape AI onto legacy systems.&lt;/p&gt;

&lt;p&gt;Feel free to check us out at &lt;a href="http://helloinsurance.substack.com" rel="noopener noreferrer"&gt;http://helloinsurance.substack.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agentic</category>
      <category>ai</category>
      <category>insurance</category>
      <category>programming</category>
    </item>
    <item>
      <title>Growing the Tree: Multi-Agent LLMs Meet RAG, Vector Search, and Goal-Oriented Thinking - Part 2</title>
      <dc:creator>Hello Insurance</dc:creator>
      <pubDate>Sun, 04 May 2025 15:14:27 +0000</pubDate>
      <link>https://dev.to/hello-insurance/growing-the-tree-multi-agent-llms-meet-rag-vector-search-and-goal-oriented-thinking-part-2-3ck</link>
      <guid>https://dev.to/hello-insurance/growing-the-tree-multi-agent-llms-meet-rag-vector-search-and-goal-oriented-thinking-part-2-3ck</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;strong&gt;Simulating Better Decision-Making in Insurance and Care Management Through RAG&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the Part 1 &lt;a href="https://dev.to/hello-insurance/building-a-cli-for-multi-agent-tree-of-thought-from-idea-to-execution-part-1-3f84"&gt;post&lt;/a&gt;, I walked through how I built a CLI that runs multi-agent conversations in a social media (Reddit-inspired) thread style. Each persona responds, builds off the others, and together they simulate a deeper discussion. It worked, but it had limits: the agents lacked memory, had no direction, and couldn’t access supporting documents. As promised in that post, I’ve made updates to address those gaps.&lt;/p&gt;

&lt;p&gt;This post covers the updates I have made:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Added a goal round.&lt;/li&gt;
&lt;li&gt;Each agent can now reference external files (support for URLs coming next).&lt;/li&gt;
&lt;li&gt;Integrated Qdrant and OpenAI embeddings for retrieval-augmented generation (RAG).&lt;/li&gt;
&lt;li&gt;Started scoring how well responses relate to retrieved context.&lt;/li&gt;
&lt;li&gt;Showcased the decision goal round with real business use cases (an Auto Insurance Quote and developing a Care Plan).&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  More Flexible Persona Design
&lt;/h3&gt;

&lt;p&gt;In the earlier version, personas were defined directly inside the CLI using &lt;code&gt;--persona&lt;/code&gt; flags. That worked, but it got messy quickly. With this update, persona definitions live in a standalone &lt;code&gt;personas.json&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;This change gave us a lot more flexibility:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unlimited (within reason) personas.&lt;/li&gt;
&lt;li&gt;Each persona can now define a regular_prompt, goal_prompt, and ref_files.&lt;/li&gt;
&lt;li&gt;We can assign unique files for each persona to ground their responses.&lt;/li&gt;
&lt;li&gt;It keeps the CLI clean while making persona behaviors extensible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I have also introduced a schema file (&lt;code&gt;persona.schema.json&lt;/code&gt;) to validate each persona object. This schema ensures that required fields are present, and that file references are structured consistently. That structure not only helps us catch errors early, but also gives us a stable contract to build tooling, validations, and even a visual persona builder down the line.&lt;/p&gt;

&lt;p&gt;This design opens up new paths for evolving the system: dynamic persona loading, fine-tuned goal routing, optimized routes for each persona, or even persona specialization per domain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
  {
    "name": "Surgeon",
    "llm": "ChatGPT",
    "model": "gpt-3.5-turbo",
    "engagement": 1,
    "references": [
      { "type": "file", "value": "./ref-files/caremgmt-hip/hip-replacement-recovery.md" },
      { "type": "file", "value": "./ref-files/caremgmt-hip/patient-background.md" }
    ],
    "regular_prompt": "You are the orthopedic surgeon who performed the hip replacement. Offer clinical insights on the patient's recovery trajectory, highlight any red flags to watch for, and ensure the physical milestones are on track.",
    "goal_prompt": "Summarize your final post-operative assessment for Mrs. Carter's hip replacement recovery. Highlight red flags, clearance criteria for outpatient PT, and any clinical restrictions that must be followed."
  },
  {
    "name": "Care Manager",
    "llm": "ChatGPT",
    "model": "gpt-3.5-turbo",
    "engagement": 1,
    "references": [
      { "type": "vector:qdrant", "value": "collection=care_guidelines,product=care" },
      { "type": "file", "value": "./ref-files/caremgmt-hip/patient-background.md" }
    ],
    "regular_prompt": "You are a care management specialist assigned to Mrs. Carter. Evaluate discharge readiness, ensure safe transitions, and recommend support services such as PT, home health, or equipment based on her environment and needs.",
    "goal_prompt": "Submit a structured care coordination plan for Mrs. Carter. Include: (1) home health referrals, (2) safety enhancements, (3) follow-up schedule, and (4) caregiver instructions."
  },
  {
    "name": "Michael (Son)",
    "llm": "ChatGPT",
    "model": "gpt-3.5-turbo",
    "engagement": 1,
    "references": [
      { "type": "file", "value": "./ref-files/caremgmt-hip/patient-background.md" }
    ],
    "regular_prompt": "You are Michael, Mrs. Carter’s son. Express family concerns, ask questions about her safety and comfort, and advocate for what would help her most during recovery at home.",
    "goal_prompt": "List your top 2–3 concerns for your mother's recovery and what kind of help you hope the care team can provide. Be specific about her home environment and daily challenges."
  }
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
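&lt;p&gt;A minimal Python sketch of the kind of checks &lt;code&gt;persona.schema.json&lt;/code&gt; enforces, using field names from the example above (the required-field list and error messages are my own illustration, not the actual schema):&lt;/p&gt;

```python
# Hypothetical validator mirroring persona.schema.json: required
# fields must exist and each reference must be a type/value pair.
REQUIRED_FIELDS = ["name", "llm", "model", "regular_prompt", "goal_prompt"]

def validate_persona(persona):
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in persona:
            errors.append(f"missing field: {field}")
    for ref in persona.get("references", []):
        if "type" not in ref or "value" not in ref:
            errors.append(f"malformed reference: {ref}")
    return errors
```

&lt;p&gt;In practice a JSON Schema validator (AJV on the JS side, or Python’s jsonschema package) does this work declaratively; the point is that every persona object is checked against a stable contract before the CLI runs.&lt;/p&gt;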






&lt;h3&gt;
  
  
  Adding a Goal Round
&lt;/h3&gt;

&lt;p&gt;Conversations need direction. Without it, threads drift.&lt;/p&gt;

&lt;p&gt;I have introduced a goal round to ground the discussion. It's simple: before any agents speak, the system sets a clear goal for the conversation. That goal gets injected into the first message and referenced in later turns.&lt;/p&gt;

&lt;p&gt;The system now supports structured goal types like decision, summary, consensus, reflection, and rebuttal. These give the conversation a clear intent for its final round. In this update, I have implemented the decision goal, where each agent weighs the discussion and makes a call. This structure helps tie the conversation together and surface final judgments in a consistent, directed way.&lt;/p&gt;

&lt;p&gt;This change alone made the threads feel more cohesive. Instead of rambling or contradicting each other, the agents start circling the same target.&lt;/p&gt;
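&lt;p&gt;The mechanics can be sketched in a few lines. The goal templates below are illustrative, not the tool’s actual wording; the shape of the idea is what matters: a structured goal type maps to an instruction that gets injected into the conversation.&lt;/p&gt;

```python
# Sketch of a goal round: a structured goal type is resolved to an
# instruction, injected into the opening message, and replayed before
# the final round. Template wording is illustrative.
GOAL_TEMPLATES = {
    "decision": "Weigh the discussion so far and make a final call.",
    "summary": "Summarize the key points raised by all personas.",
    "consensus": "State where the group agrees and where it disagrees.",
}

def build_goal_message(goal_type, topic):
    instruction = GOAL_TEMPLATES[goal_type]
    return f"GOAL ({goal_type}): {instruction} Topic: {topic}"
```
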




&lt;h3&gt;
  
  
  File Support for Personas
&lt;/h3&gt;

&lt;p&gt;Next, I gave agents some superpowers—aka access to external files. Now, each persona can have a references field that lists one or more local files. Those files get summarized and added into that agent's system prompt.&lt;/p&gt;

&lt;p&gt;In this iteration, it's straightforward. The files are short and injected as-is. At this point, I haven’t implemented chunking or RAG. But even with basic support, it unlocked new use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Underwriters referencing a rate table&lt;/li&gt;
&lt;li&gt;Care planners reading recovery guidelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point is: agents can now ground their reasoning in real documents, not just a one-shot prompt.&lt;/p&gt;

&lt;p&gt;Combined with the new persona file format, this setup lets each agent control its own references independently. We can fine-tune prompts and contextual grounding per persona, which is critical for simulating real-world expertise.&lt;/p&gt;

&lt;p&gt;And as we evolve the system—adding chunked documents, URL-based sources, or dynamic embedding refreshes—this design scales naturally without breaking the prompt pipeline.&lt;/p&gt;
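&lt;p&gt;The file-grounding step itself is simple enough to sketch. This is my own simplified rendering of "files get injected into the agent's system prompt" (the reader function is passed in so the sketch stays self-contained):&lt;/p&gt;

```python
# Sketch of per-persona file grounding: each file reference is read
# and appended to that agent's system prompt. Non-file reference
# types (e.g. vector:qdrant) are handled elsewhere and skipped here.
def build_system_prompt(persona, read_file):
    parts = [persona["regular_prompt"]]
    for ref in persona.get("references", []):
        if ref["type"] == "file":
            parts.append("Reference material: " + read_file(ref["value"]))
    return "\n\n".join(parts)
```
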




&lt;h3&gt;
  
  
  Retrieval-Augmented Generation (RAG) with Qdrant
&lt;/h3&gt;

&lt;p&gt;Once file support worked, I took the next iteration further. Instead of shoving full files into prompts, I started embedding content and retrieving only what’s relevant.&lt;/p&gt;

&lt;p&gt;This can be broken down into three stages:&lt;/p&gt;

&lt;h4&gt;
  
  
  Tool Selection: Embedding and Vector DB
&lt;/h4&gt;

&lt;p&gt;Each chunk is embedded using &lt;a href="https://openai.com/index/new-embedding-models-and-api-updates/" rel="noopener noreferrer"&gt;OpenAI’s&lt;/a&gt; &lt;code&gt;text-embedding-3-small&lt;/code&gt;. I evaluated a few options:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9wx5pwdnynjfchey0rg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9wx5pwdnynjfchey0rg.png" alt="Embedding Comparison" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To keep things simple and reduce friction, I went with &lt;strong&gt;OpenAI&lt;/strong&gt;. I am already using OpenAI APIs, so this choice aligned with my stack. The embeddings are fast, cost-effective, and accurate enough for what I needed. This let me move faster without spinning up additional infrastructure.&lt;/p&gt;
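&lt;p&gt;For the curious, the embedding step looks roughly like this. The naive fixed-width chunker is my own stand-in (a real pipeline might split on Markdown sections instead), while the API call uses OpenAI’s standard embeddings endpoint:&lt;/p&gt;

```python
# Documents are split into chunks, then each chunk is embedded with
# text-embedding-3-small. The chunker is a naive fixed-width stand-in.
def chunk_text(text, max_chars=500):
    chunks = []
    for start in range(0, len(text), max_chars):
        chunks.append(text[start:start + max_chars])
    return chunks

def embed_chunks(client, chunks):
    # client is an openai.OpenAI() instance; not invoked in this sketch.
    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    return [d.embedding for d in resp.data]
```
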

&lt;p&gt;As for the vector store, here’s a quick comparison of Open Source self-hosting options:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94bdg8lks5pg0uozsj7a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94bdg8lks5pg0uozsj7a.png" alt="Vector Database Comparison" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I ultimately chose &lt;a href="https://qdrant.tech/" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt;. It hit the right balance for my needs: easy to host and integrate, flexible enough for semantic search, and well-documented. It supports multiple embedding types (I plan to use video/image embedding in future) and matched my goal of keeping iteration speed high without sacrificing performance.&lt;/p&gt;

&lt;h4&gt;
  
  
  Qdrant Setup: Embedding and Upload
&lt;/h4&gt;

&lt;p&gt;To embed and load documents, I run a containerized process that uses our &lt;code&gt;upload_to_qdrant.py&lt;/code&gt; script.&lt;/p&gt;

&lt;p&gt;Example command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run --rm \
  --add-host=host.docker.internal:host-gateway \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  -e QDRANT_HOST=host.docker.internal \
  -v "${PWD}/vector-setup:/app" \
  multillm-tot-vector-setup \
  python /app/upload_to_qdrant.py \
    --folder /app/caremgmt-hip \
    --manifest /app/hello-care-guidelines.json \
    --collection care_guidelines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script reads a manifest file, embeds Markdown documents, and uploads them to a named Qdrant collection with all metadata (title, tags, filename, etc.).&lt;/p&gt;
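&lt;p&gt;Conceptually, the script’s core loop pairs each manifest entry with its embedding and builds the payloads that get upserted into the collection. The manifest structure below is assumed for illustration (the real script’s format may differ); in a live run these point dicts would be handed to qdrant-client’s upsert call:&lt;/p&gt;

```python
# Sketch of the manifest-to-points step inside upload_to_qdrant.py:
# each document entry is paired with its embedding vector and the
# metadata (title, filename, tags) that later supports filtering.
# The manifest shape here is an assumption, not the repo's format.
def build_points(manifest, embeddings):
    points = []
    for i, entry in enumerate(manifest["documents"]):
        points.append({
            "id": i,
            "vector": embeddings[i],
            "payload": {
                "title": entry["title"],
                "filename": entry["filename"],
                "tags": entry.get("tags", []),
            },
        })
    return points
```
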

&lt;h4&gt;
  
  
  Retrieval Flow During Conversation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;When a persona is about to respond, we embed the current message (and optionally, the goal).&lt;/li&gt;
&lt;li&gt;We query Qdrant to retrieve the top-matching chunks.&lt;/li&gt;
&lt;li&gt;Retrieved chunks are deduplicated by filename and section title to avoid redundant context.&lt;/li&gt;
&lt;li&gt;The final set of unique chunks is injected into the agent's system prompt.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives us tight, focused prompts that still bring in meaningful external knowledge, without blowing past token limits.&lt;/p&gt;

&lt;p&gt;This setup let us scale cleanly from static file injection to dynamic retrieval across large corpora.&lt;/p&gt;
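&lt;p&gt;The deduplication step in that flow can be sketched directly (the chunk dict keys are my own naming; the real code may label filename and section differently):&lt;/p&gt;

```python
# Sketch of retrieval-time dedupe: chunks sharing a filename and
# section title are collapsed, keeping the first (highest-ranked) hit,
# before the survivors are injected into the agent's system prompt.
def dedupe_chunks(chunks):
    seen = set()
    unique = []
    for c in chunks:
        key = (c["filename"], c["section"])
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique
```
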

&lt;h4&gt;
  
  
  RAG Beyond Vector Search
&lt;/h4&gt;

&lt;p&gt;It’s worth noting that while RAG often gets reduced to “just vector search,” a full retrieval-augmented system touches multiple layers of infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vector DB — for semantic similarity search (e.g., Qdrant)&lt;/li&gt;
&lt;li&gt;Search DB — for keyword/hybrid retrieval (e.g., Elasticsearch)&lt;/li&gt;
&lt;li&gt;Document DB — for managing source documents, versions, and metadata&lt;/li&gt;
&lt;li&gt;Graph DB — for modeling relationships and entity linkages; believed to be a key frontier for improving reasoning and recall&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;RAG doesn’t have to use all of these at once, but thoughtful integration of more than one often leads to better relevance, traceability, and depth. As these systems mature, this kind of architecture is worth keeping in mind.&lt;/p&gt;




&lt;h4&gt;
  
  
  Introducing a RAG Score
&lt;/h4&gt;

&lt;p&gt;To understand how well each agent is using the context, I started experimenting with a &lt;strong&gt;RAG score&lt;/strong&gt;. It's not fancy. Right now, I am just checking semantic similarity between the agent's response and the retrieved chunks.&lt;/p&gt;

&lt;p&gt;The idea is to track alignment. Are they using what was retrieved? Are they staying on topic? This score isn’t exposed to the agents, but it's useful for me as a debugging and tuning tool.&lt;/p&gt;

&lt;p&gt;Down the line, this could be used to drive filtering, voting, or reward signals.&lt;/p&gt;
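&lt;p&gt;Since the score is just semantic similarity, a bare-bones version fits in a few lines. This averages cosine similarity between the response embedding and each retrieved chunk (my simplification; the actual scoring may weight or threshold differently):&lt;/p&gt;

```python
import math

# Bare-bones RAG score: mean cosine similarity between the response
# embedding and the embeddings of the retrieved chunks.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rag_score(response_vec, chunk_vecs):
    return sum(cosine(response_vec, v) for v in chunk_vecs) / len(chunk_vecs)
```
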




&lt;h4&gt;
  
  
  Real-World Use Cases
&lt;/h4&gt;

&lt;p&gt;I tested all this on two business use cases. Both were grounded in actual vector-embedded files that agents used to retrieve context. The inputs were curated, and the outputs were near-perfect. The real world is never that simple. Creating the knowledge base itself is a long, arduous task. And that’s where most of the work lies. Fortunately, with LLMs, it's now easier than ever to build and enrich that knowledge base.&lt;/p&gt;

&lt;h5&gt;
  
  
  1️⃣Auto Insurance Underwriting
&lt;/h5&gt;

&lt;p&gt;Prompt&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You are reviewing an auto insurance application. Make a timely decision on whether to approve with terms or decline coverage.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;CLI Command&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python main.py \
  --prompt "You are reviewing an auto insurance application. Make a timely decision on whether to approve with terms or decline coverage." \
  --rounds 2 \
  --personas-file './input/underwriting-auto/insurance-personas.json' \
  --output html \
  --save-to './output/underwriting-auto/submission-discussion.html' \
  --goal-round decision
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqw5vpkyyne950jux7wd8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqw5vpkyyne950jux7wd8.png" alt="Screen copy of the underwriting output html" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I ran a conversation where agents assessed risk for an auto insurance policy. Three agents participated: Insurance Agent, Underwriter, and Actuary.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Insurance Agent had access to the submission application (submission-application.md).&lt;/li&gt;
&lt;li&gt;The Actuary used a static rate table (underwriting-rate-table.md) and underwriting guidelines (underwriting-guidelines.md).&lt;/li&gt;
&lt;li&gt;The Underwriter had access to all of the above plus the applicant's claims history (loss-history.md) and underwriting decision workflow (vector embedding).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Files used for RAG were embedded via Qdrant, and the persona prompts were grounded in their respective views. This made it possible to have a multi-perspective evaluation of the same submission with clear logic, tradeoffs, and pricing risk assessments.&lt;/p&gt;

&lt;h5&gt;
  
  
  2️⃣Care Planning
&lt;/h5&gt;

&lt;p&gt;Prompt&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You are developing a personalized care plan for Mrs. Elaine Carter, a 62-year-old woman recovering from a total left hip replacement. Collaborate across clinical, care coordination, and family perspectives to ensure a safe recovery, appropriate support services, and readiness for outpatient transition.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;CLI Command&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python main.py \
  --prompt "You are developing a personalized care plan for Mrs. Elaine Carter, a 62-year-old woman recovering from a total left hip replacement. Collaborate across clinical, care coordination, and family perspectives to ensure a safe recovery, appropriate support services, and readiness for outpatient transition." \
  --rounds 2 \
  --personas-file './input/caremgmt-hip/care-personas.json' \
  --output html \
  --save-to './output/caremgmt-hip/care-plan-discussion.html' \
  --goal-round decision
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobolc54runqstsitm779.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobolc54runqstsitm779.jpg" alt="Screen copy of the care management output html" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I created a discussion around a 62-year-old woman recovering from hip replacement surgery (Mrs. Carter). The setup included a Surgeon, a Care Manager, and the patient’s son (Michael).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Surgeon referenced a clinical recovery timeline (hip-replacement-recovery.md).&lt;/li&gt;
&lt;li&gt;The Care Manager retrieved content from a vectorized care guideline (care-guidelines.md) and a home safety checklist (home-environment-checklist.md).&lt;/li&gt;
&lt;li&gt;Michael (the son) responded based on family-provided background info (patient-background.md).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The responses were coordinated, personalized, and relevant. They showed how each persona could bring its expertise forward while grounding its reasoning in specific content. This approach supports everything from planning safe discharge protocols to modeling risk acceptance for a new policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;As you can see in the screenshot, the agent made a decision, offered supporting recommendations, and explained the rationale, in structured JSON format. In real-world settings like underwriting or care planning, this kind of assistive intelligence could save valuable time for teams constrained by human capacity. &lt;br&gt;
It’s not about replacing knowledge workers, it's about surfacing relevant context and generating suggestions before the case is even reviewed. Think of it as a recommendation engine: one that boosts productivity, speeds up decision cycles, and creates a more consistent starting point for complex decisions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Both of these showed us this setup isn't just a hello-world implementation of Agentic RAG. It's a flexible system for making decisions with embedded context and specialized roles.&lt;/p&gt;




&lt;h4&gt;
  
  
  What's Next
&lt;/h4&gt;

&lt;p&gt;Here’s what I am looking at next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chunking strategies for longer documents&lt;/li&gt;
&lt;li&gt;Incorporating external URLs&lt;/li&gt;
&lt;li&gt;CAG (Cache-Augmented Generation) for less frequently changing sources&lt;/li&gt;
&lt;li&gt;Voting, route pruning, and exploration strategies for agents&lt;/li&gt;
&lt;li&gt;Mindmap or workflow views for agent threads&lt;/li&gt;
&lt;li&gt;Semantic grouping of documents&lt;/li&gt;
&lt;li&gt;Real-time updates to vector embeddings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the first version was about simulating conversation, this version is about adding memory, purpose, and context. It’s no longer just talk. Now, the tree can think 🧠.&lt;/p&gt;

&lt;p&gt;While this tool taught me a lot about prompt orchestration, vector-based retrieval, and agent coordination, I wouldn’t necessarily re-implement it in a production stack. For real-world use, I’d probably reach for tools like AutoGen, LangGraph, Dify, or Flowise, which are built for scalability and orchestration out of the box.&lt;/p&gt;

&lt;p&gt;That said, building it from scratch gave me a far deeper appreciation for how these systems work under the hood.&lt;/p&gt;

&lt;p&gt;👉 The full repo is open source. Feel free to follow along or contribute to the code on &lt;a href="https://github.com/gajakannan/public-showcase" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;




&lt;h4&gt;
  
  
  Personal Reflections on Prompting
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;Although I consider myself an advanced prompt user, I learned a lot more during the buildout of this tool. Designing for multi-agent orchestration, goal alignment, and context-aware interactions pushed me to understand how prompts interact, not just with the LLM, but with each other. The nuances of injection order, repetition, and grounding became far more visible at scale than they ever did in single-agent experiments. &lt;br&gt;
At their core, agentic systems like this rely on orchestrated prompt scaffolding (injection and chaining) and semantic retrieval. And when composed correctly, that scaffolding can mimic thought, enabling agents to weigh tradeoffs, simulate judgment, and converge on decisions that feel intentional. &lt;br&gt;
Maybe that’s what humans do too. We just don’t think of it in terms of prompt flows and vector contexts. &lt;br&gt;
It’s like a chef crafting a dish: they’re not magically creating ingredients, but they know how to combine them in a way that feels like magic. And often, that’s just enough to feed the hungry crowd!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>vectordatabase</category>
      <category>rag</category>
      <category>llm</category>
      <category>agenticai</category>
    </item>
    <item>
      <title>Building a CLI for Multi-Agent Tree-of-Thought: From Idea to Execution - Part 1</title>
      <dc:creator>Hello Insurance</dc:creator>
      <pubDate>Sun, 04 May 2025 14:37:36 +0000</pubDate>
      <link>https://dev.to/hello-insurance/building-a-cli-for-multi-agent-tree-of-thought-from-idea-to-execution-part-1-3f84</link>
      <guid>https://dev.to/hello-insurance/building-a-cli-for-multi-agent-tree-of-thought-from-idea-to-execution-part-1-3f84</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;What if LLMs had opinions and argued in threads? What happens when we give LLMs not just memory and tools, but also autonomy, voices, perspectives, and structure?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I built a simple CLI tool, a tiny experiment in multi-agent, tree-of-thought reasoning. The tool lets you simulate a conversation between different AI personas, each contributing their thoughts in a threaded format.&lt;/p&gt;

&lt;p&gt;This concept of enabling language models to work together to solve complex problems, with each agent assuming a unique role based on its strengths, is a central tenet of multi-agent LLM systems. Such systems are often observed to outperform traditional single-agent models, particularly when dealing with intricate tasks that necessitate diverse expertise and collaborative decision-making.&lt;/p&gt;




&lt;h3&gt;
  
  
  🛠️ The CLI: What It Does
&lt;/h3&gt;

&lt;p&gt;The CLI tool I built supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Defining multiple personas (e.g., Philosopher, Technologist, Policymaker, Educator)&lt;/li&gt;
&lt;li&gt;Assigning each a role or lens to interpret the prompt&lt;/li&gt;
&lt;li&gt;Running rounds of discussion threads (like Reddit comments)&lt;/li&gt;
&lt;li&gt;Outputting results in HTML, Markdown, or JSON&lt;/li&gt;
&lt;li&gt;Optional logging of engagement and “most insightful” paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first part wasn’t meant to be a goal-oriented enterprise-grade agent framework. It’s intentionally simple, an experiment in structure, not infrastructure.&lt;/p&gt;

&lt;p&gt;👉 Git repo: &lt;a href="http://github.com/gajakannan/public-showcase/tree/main/multillm-tot" rel="noopener noreferrer"&gt;http://github.com/gajakannan/public-showcase/tree/main/multillm-tot&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  💡 Try It Yourself
&lt;/h3&gt;

&lt;p&gt;Here are a few example prompts and command-line inputs to experiment with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python main.py 
     --prompt "With increased complexity should we relook proliferation of Microservice and build Modular Monoliths" 
     --rounds 5 
     --personas-file './input/microservice-personas.json' 
     --output html 
     --save-to "./output/microservice-discussion.html"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczms2pad0ixm4qchyia9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczms2pad0ixm4qchyia9.png" alt="Screen copy of output html" width="800" height="401"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python main.py --prompt "Can a specialized AI or AGI replace primary care physicians" --rounds 20 --personas-file './input/pcp-personas.json' --output html --save-to "./output/pcp-discussion.html"

python main.py \
     --prompt "Is AI gonna replace primary care physicians" \
     --rounds 8 \
     --personas-file './input/pcp-personas.json' \
     --output html \
     --save-to "./output/pcp-discussion.html"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hrwv0gxp77eln8g6rwl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hrwv0gxp77eln8g6rwl.png" alt="Screen copy of output html" width="800" height="395"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python main.py 
     --prompt "Which is the good front end technology to develop web applications" 
     --rounds 25 
     -personas-file './input/frontend-personas.json' 
     --output html 
     --save-to "./output/discussion.html"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0i8ys937d1fxbkzdzjip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0i8ys937d1fxbkzdzjip.png" alt="Screen copy of output html" width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  🧎‍♂️ Why Tree-of-Thought?
&lt;/h3&gt;

&lt;p&gt;LLMs are great at linear reasoning, but sometimes ideas need branches, not just steps. Tree-of-Thought lets agents diverge in their thinking, reflect on different possibilities, and build on each other’s ideas; the focus is on expanding the thought space rather than converging on a single “right answer.”&lt;/p&gt;

&lt;p&gt;This CLI is a simple inspiration, not a full-fledged ToT that implements path optimization or scoring. Perhaps in the future, it could.&lt;/p&gt;
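&lt;p&gt;As a rough illustration (not the repo’s actual code), a minimal ToT loop can be sketched as breadth-first expansion over candidate thoughts, with the LLM call stubbed out:&lt;/p&gt;

```python
# Minimal Tree-of-Thought sketch: breadth-first expansion of candidate
# "thoughts". generate() is a stub standing in for a real LLM call; the
# actual CLI in the repo works differently (personas replying in rounds).

def generate(thought: str, k: int = 2) -> list[str]:
    # Stub: a real implementation would prompt an LLM for k continuations.
    return [f"{thought} -> branch {i}" for i in range(k)]

def expand(prompt: str, depth: int) -> list[str]:
    frontier = [prompt]
    for _ in range(depth):
        frontier = [child for t in frontier for child in generate(t)]
    return frontier

leaves = expand("Should we build modular monoliths?", depth=2)
print(len(leaves))  # 2 branches per node, depth 2 -> 4 leaves
```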




&lt;h3&gt;
  
  
  🤖 Where This Fits: RAG, Agentic RAG, and CAG
&lt;/h3&gt;

&lt;p&gt;If you’ve been following the evolution of LLM architecture, you’ll recognize these three models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RAG (Retrieval-Augmented Generation): Adds external knowledge to LLMs.&lt;/li&gt;
&lt;li&gt;Agentic RAG: Enables LLMs to delegate tasks to autonomous agents (e.g., web search, coding).&lt;/li&gt;
&lt;li&gt;CAG (Cache-Augmented Generation): Optimizes for speed by caching memory instead of retrieving it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This CLI doesn’t fit neatly into any of those buckets: it’s a collaborative multi-agent conversational tool. It doesn’t fetch external data, but it does orchestrate reasoning. Each agent is a simulated persona, capable of reflecting, responding, and evolving the conversation.&lt;/p&gt;
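&lt;p&gt;To make “orchestrating reasoning” concrete, here is a hedged sketch of the round-robin loop at the heart of such a tool, with the model call stubbed out; the repo’s implementation differs in its prompting details:&lt;/p&gt;

```python
# Round-robin multi-persona discussion sketch. reply() stubs the LLM call;
# each persona sees the shared thread so far and appends its contribution.

def reply(persona: dict, thread: list[str]) -> str:
    # Stub: a real version would send persona["lens"] plus the thread to an LLM.
    return f'{persona["name"]}: response to {len(thread)} prior message(s)'

def run_discussion(prompt: str, personas: list[dict], rounds: int) -> list[str]:
    thread = [f"Prompt: {prompt}"]
    for _ in range(rounds):
        for persona in personas:
            thread.append(reply(persona, thread))
    return thread

personas = [{"name": "Philosopher"}, {"name": "Technologist"}]
thread = run_discussion("Can AGI replace primary care physicians?", personas, rounds=2)
print(len(thread))  # 1 prompt + 2 rounds * 2 personas = 5 entries
```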




&lt;h3&gt;
  
  
  🌱 What’s Next?
&lt;/h3&gt;

&lt;p&gt;This CLI is just a start. I’m toying with ideas like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integrating OpenAI tools to let each agent choose its own approach—whether that means browsing the web for real-time data, executing code snippets, reflecting quietly on prior context, or invoking APIs. This modularity opens the door to more flexible decision-making pathways where agents can act autonomously based on the task at hand. Frameworks like LangChain, CrewAI, or AutoGPT can be used to orchestrate these multi-agent workflows, where each agent has a set of tools and reasoning capabilities and can decide which to invoke depending on its role and context. For instance, one agent might fact-check a statement using retrieval tools powered by LangChain, while another writes simulation code, and a third prompts itself with counterfactuals for deeper reflection. CrewAI can be especially useful when simulating structured teams, while AutoGPT lends itself to more open-ended exploration.&lt;/li&gt;
&lt;li&gt;Assigning voting power or influence to different personas—where each agent's opinion can carry a weight based on their domain authority, confidence level, or even engagement score. This allows the system to resolve disagreement in a structured way, such as weighted consensus, probabilistic sampling, or majority opinion. For example, an actuarial persona might be granted higher voting weight in pricing discussions, while a customer advocate might have more say in usability debates.&lt;/li&gt;
&lt;li&gt;Implementing Agentic RAG and CAG options—where agents can dynamically retrieve data or leverage cached memory to balance responsiveness with accuracy, opening the door for adaptive workflows like delegated web searches or instant responses based on frequently accessed context.&lt;/li&gt;
&lt;li&gt;Assigning a goal for the agents, such as reaching consensus, challenging assumptions, ranking solutions, or role-playing stakeholder positions can dramatically shift how they interact and contribute. For example, a debate-style goal encourages conflict and contrast, while a synthesis goal prioritizes convergence. These behavioral nudges can help simulate more realistic human-like collaboration or disagreement.&lt;/li&gt;
&lt;/ul&gt;
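&lt;p&gt;The voting idea above can be sketched as a simple weighted consensus; the persona names and weights below are purely illustrative:&lt;/p&gt;

```python
from collections import defaultdict

# Weighted-consensus sketch: each persona casts a vote carrying a domain
# weight, and the option with the highest total weight wins.

def weighted_consensus(votes: list[tuple[str, str, float]]) -> str:
    totals: dict[str, float] = defaultdict(float)
    for _persona, option, weight in votes:
        totals[option] += weight
    return max(totals, key=totals.get)

votes = [
    ("Actuary",          "raise premium", 0.5),  # higher weight in pricing talks
    ("CustomerAdvocate", "hold premium",  0.2),
    ("Underwriter",      "raise premium", 0.3),
]
print(weighted_consensus(votes))  # "raise premium" wins, 0.8 vs 0.2
```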

&lt;p&gt;Bringing this into the insurance domain, here are some business use cases that could benefit from multi-agent reasoning and simulation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Underwriting: simulating multi-underwriter collaboration in property insurance, an Agentic RAG example in which each underwriter agent brings a different lens to evaluating the same risk. For instance, one agent may specialize in structural risk, another in climate exposure, and another in occupancy or usage-based data. These agents could debate, validate, or challenge each other’s views asynchronously, ultimately helping the human underwriter synthesize a more holistic decision.&lt;/li&gt;
&lt;li&gt;Claims: agents representing medical experts, policy terms, and historical precedent triage a complex case. Each agent provides input based on its specialization, such as medical necessity, coverage interpretation, or comparative case history, and the system surfaces areas of alignment or contention. The result might be a collaborative decision summary, a weighted score, or a generated explanation suitable for review or audit.&lt;/li&gt;
&lt;li&gt;Fraud detection: where multiple perspectives evaluate anomalies in claim data or customer behavior. Vector embeddings can significantly enhance this use case by enabling similarity searches across high-dimensional claim histories, customer behavior, or provider patterns. Each agent can retrieve semantically similar past cases using vector embeddings, offering comparative reasoning. Combined with RAG, the agents can pull in structured anomaly flags, historical fraud investigations, and contextual metadata to enrich their evaluations, making the simulation both data-aware and behaviorally diverse.&lt;/li&gt;
&lt;li&gt;Product design: where marketing, actuarial, and distribution personas simulate how a new coverage option might perform or be perceived. By combining Agentic RAG and Tree-of-Thought prompting, these personas can retrieve relevant market data, historical uptake metrics, and regulatory constraints to shape their opinions. The simulation can branch into competing strategies, for example, a low-premium, high-volume plan versus a niche, high-margin variant, and converge through deliberation or voting. The result might be a ranked list of product ideas, a synthesized go-to-market narrative, or an early warning about friction points across departments.&lt;/li&gt;
&lt;li&gt;Customer service: where empathy agents, compliance agents, and procedural agents work together to craft responses that are both kind and accurate.&lt;/li&gt;
&lt;/ul&gt;
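&lt;p&gt;The fraud-detection case above hinges on similarity search over embeddings. A dependency-free sketch of cosine similarity over made-up claim vectors (in practice the vectors would come from an embedding model and live in a vector database):&lt;/p&gt;

```python
import math

# Cosine similarity over toy claim-embedding vectors. The vectors and claim
# IDs here are fabricated for illustration only.

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

claims = {
    "claim-101": [0.9, 0.1, 0.0],  # made-up embeddings
    "claim-202": [0.2, 0.8, 0.5],
    "claim-303": [0.0, 0.9, 0.4],
}
query = [0.85, 0.15, 0.05]  # new claim to triage

# Rank past claims by similarity to the new one; agents could then reason
# over the nearest neighbors.
ranked = sorted(claims, key=lambda c: cosine(query, claims[c]), reverse=True)
print(ranked[0])  # prints "claim-101", the most similar past claim
```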

&lt;p&gt;But mostly, I wanted to share a tiny, runnable proof‑of‑concept—something that shows how easy it is to spark new ideas once your LLMs are in conversation, not isolation.&lt;/p&gt;

&lt;p&gt;In Part 2, we’ll evolve this CLI into a goal-driven ToT engine, add RAG powered by vector embeddings, benchmark different agent roles, and walk through two real-world scenarios, insurance underwriting and care management, to show how the tool can be used in a business context.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>rag</category>
      <category>agenticai</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
