Daniel Cordeiro

Posted on Jun 5

Building an AI Billing Assistant: Integrating LangChain ReAct Agents with Spring Boot Microservices

#ai #langchain #python #java

Introduction

A significant portion of telecom customer support calls follow the same pattern: "What is my current bill?", "Why did it go up?", "I want to dispute this charge." These are structured, predictable requests with clear resolution paths — exactly the kind of interaction that a well-designed AI agent can handle reliably, without a human in the loop.

This post presents the design and development of the Smart Billing Assistant, an AI-powered telecom customer support agent that puts this idea into practice. A Python FastAPI service hosts a LangChain ReAct agent that gives customers a natural language interface to five self-service flows: viewing invoices, understanding bill changes, filing disputes, requesting plan changes, and checking payment status. Under the hood, the agent orchestrates two independent Java Spring Boot microservices — a reactive billing service (Spring WebFlux + R2DBC) and a transactional provisioning service (Spring MVC + JPA) — backed by PostgreSQL. The development lifecycle was shaped by spec-driven requirements, test-driven implementation and Claude Code as an AI pair programmer.

1. The Project Overview

Note: The v1 term mentionated through this post is a demo version of the Smart Billing Assistant project, built for exploring purposes. Several design decisions throughout this document are explicitly simplified to keep scope manageable.

Why This Domain

Business Support Systems (BSS) are the software backbone of a telecom company: billing, invoicing, customer accounts, payments. They are high-volume, data-intensive, and historically painful for customer support teams. A large portion of inbound support calls are about bill questions and simple service changes — exactly the kind of structured, predictable request that an AI agent handles well.

What the Agent Can Do

Six user stories were implemented:

User Story	What the customer says	What happens
US-01	"What is my current bill?"	The agent retrieves the customer's current invoice, including the billing summary and the individual line items that make up the total.
US-02	"Why is my bill so high?"	The agent compares the current billing cycle against the prior one, identifies overages and one-time charges, and explains what drove the increase.
US-03	"I want to dispute this charge"	The agent opens a dispute ticket for the specified charge, returns a unique reference number, and informs the customer that resolution takes up to 5 business days.
US-04	"Can I switch to a cheaper plan?"	The agent verifies whether the customer's line is eligible for the requested plan and either applies the change immediately for upgrades or schedules it for the next billing cycle for downgrades.
US-05	"Did you receive my payment?"	The agent returns the status of the customer's most recent payment — whether it was received, pending, or failed — together with the timestamp of the last update.
US-06	"Why was it so high?" (follow-up)	The agent resolves the reference to "it" from the active session context and answers without asking the customer to re-identify themselves or repeat previous information.

The Architecture

The system was decomposed into three independently deployable services:

agent-service (Python/FastAPI + LangChain): The sole customer entry point. Owns JWT validation, conversational session state, LangChain ReAct orchestration, and escalation logic. Calls the Java services over synchronous REST.
billing-service (Java/Spring WebFlux): The source of truth for invoices, line items, payments, and disputes. Uses reactive R2DBC for non-blocking database access.
provisioning-service (Java/Spring MVC): The source of truth for plan catalogues, eligibility rules, and customer line configuration. Uses JPA/Hibernate with blocking I/O — plan changes are infrequent transactional writes.

In summary, this is a holistic view of the project idea and the architecture it produced, and the following sections narrow the process in detail.

2. The Development Process

2.1 Using Claude Code as a Pair Programmer

The entire project was built using Claude Code [1] — Anthropic's CLI-based AI coding agent — as an interactive pair programmer. Rather than treating it as a one-shot code generator, it was used as a persistent collaborator across 13 implementation sessions, each one driving a task from TDD flow (Red → Green → Refactor).

CLAUDE.md: The Project Instruction File

The key to making Claude Code useful across sessions is the CLAUDE.md file. This file lives in the project root and is automatically loaded by Claude Code at the start of every session. It acts as the project contract: what the system should do, what design principles to follow, what quality gates to enforce, and exactly what steps to execute after each task is completed.

Depending on the project, a CLAUDE.md might include:

Design principles: KISS, YAGNI, AHA, SOLID — as concrete enforcement rules (e.g. "Don't add features beyond what was asked"; "No Redis in v1 — YAGNI").
Quality gates: ≥80% line coverage, SonarQube Maintainability Rating 'A', Cognitive Complexity ≤15 per method (≤10 for Python production code).
Task Completion Protocol: An 8-step automated loop — implement → run tests → local validation (for feat: tasks) → commit → open PR → monitor CI → fix failures → report green.
Git conventions: Conventional Commits required; Semantic Release handles versioning; no manual version bumps in pom.xml or pyproject.toml.

Why Start Simple and Evolve

A comprehensive CLAUDE.md was not written upfront. The project started with a minimal version — basic rules about TDD and commit conventions — and expanded it as new needs emerged during actual development. So, each CLAUDE.md addition was prompted by a real friction point, evolving the process incrementally, driven by actual need.

Why Document Artifacts Instead of Prompting

One of the highest-leverage decisions was treating PROJECT_IDEA.md, REQUIREMENTS.md, DESIGN.md, and TASKS.md as first-class project artifacts — files Claude Code reads directly rather than content repeated in every chat prompt.

Benefits:

Token efficiency: The context is loaded once from stable files, not repeated in every session.
Consistency: The same scope, terminology, and decisions are visible in every session.
Guardrails: Claude Code stays bounded by what is documented; speculative features don't creep in.
Memory: Session history notes in the docs capture every design decision — when returning to a task, the rationale is already there.

2.2 Defining the Requirements

Spec-Driven Development

Before writing a single line of code, REQUIREMENTS.md was produced: user stories with explicit acceptance scenarios for every state the system needs to handle. This approach is inspired by spec-driven development tools like Kiro [2] — requirements come first, and the code is tested against them.

To ground the requirements in real domain knowledge, Claude Code was asked to adopt the role of a Billing Manager stakeholder with expertise in BSS telecom. This simulated discovery session drove a structured discussion: what do customers actually call about? What are the edge cases? What is in scope and what crosses a line the support agent should not cross? The conversation surfaced business rules (the 90-day dispute window, the delinquency threshold, the OSS/BSS boundary) that would otherwise have been discovered late — during implementation or testing.

The format: each user story has an "As a / I want / So that" header, followed by numbered acceptance scenarios (S1, S2, S3...) that map directly to test cases and eventually to BDD Gherkin feature files.

Example (US-03 — Dispute a Charge):

US-03: As a customer, I want to dispute a charge on my invoice,
       so that I can get incorrect charges reviewed.

S1 — Valid dispute filed: Customer provides invoice ID and line item ID.
     System creates a dispute record and returns a reference number
     with a 5-business-day resolution SLA.

S2 — Duplicate dispute blocked: An open dispute already exists for
     that line item. System returns the existing reference number
     instead of creating a second dispute.

S3 — Outside 90-day window: The charge is more than 90 days old.
     System rejects the request and explains the eligibility window.

These scenarios drove every test: unit tests mocked the repository layer and asserted each branch; integration tests against Testcontainers PostgreSQL confirmed end-to-end behavior; BDD feature files in Cucumber (Java) and pytest-bdd (Python) verified full conversation flows.

GLOSSARY.md: Capturing the Domain

One output of the requirements session was GLOSSARY.md — a glossary of BSS telecom terms. Proration, CDR, dunning, delinquency, cold handoff — these terms appear in the code, in tests, and in the agent's responses. Having a shared glossary ensures that when the code says OUTSTANDING or the agent mentions "5-business-day SLA", it means the same thing to everyone reading it.

2.3 Defining the Design

LangChain, LangGraph, and the ReAct Pattern

LangChain [3] is a Python framework for building applications powered by LLMs. Its core concept is the tool: a Python function the LLM can call to take actions or retrieve information. The LLM decides which tool to call, what arguments to pass, and what to do with the result.

Each tool is a plain Python function decorated with @tool. The LLM reads the function's docstring to understand when and how to use it — no routing tables, no decision trees. LangChain handles the mechanics of formatting the tool call, parsing the LLM's response, and invoking the function.

ReAct [4] (Reason + Act) is the agent pattern used here. The LLM alternates between:

Reasoning: "The customer asked about their bill. I should call get_current_invoice."
Acting: Call the tool, get the result.
Observing: "The invoice shows a $45 data overage. The customer needs an explanation."
Reasoning again: Do I need more information, or can I answer now?

This loop runs entirely inside the agent-service. From the customer's perspective, they send a message and receive a reply. Inside, the LLM may have called two or three tools, observed intermediate results, and reasoned about each before generating the final response.

LangGraph adds state management to this loop. It is a graph-based runtime where nodes are functions (like "call the LLM" or "execute a tool call") and edges control flow. Crucially, it provides a MemorySaver checkpointer that persists conversation history between turns — keyed by a thread_id. This is what gives the agent multi-turn memory.

The thread_id is the session UUID. On every POST /chat request, the agent loads the conversation history for that thread, runs the ReAct loop, and saves the updated state — including the new user message, any tool calls, and the final AI response. The customer's next message picks up exactly where the last one left off.

LangSmith traces every ReAct loop: which tools were called, what the LLM reasoned at each step, how long each operation took. Invaluable for debugging agent behavior.

Spring WebFlux + R2DBC and Spring MVC + JPA

Spring WebFlux + R2DBC powers the billing-service. WebFlux is Spring's reactive web framework [5]: instead of one blocking thread per HTTP request, it uses a small, fixed thread pool with event-loop-based I/O. Requests are represented as non-blocking streams (Mono<T> for a single value, Flux<T> for a sequence). R2DBC (Reactive Relational Database Connectivity) is the reactive counterpart to JDBC — database queries return Mono or Flux publishers rather than blocking the calling thread. This stack is a good fit for the billing-service because invoice lookups, comparison queries, and payment status checks are all read-heavy operations that hit the database frequently and concurrently.

Spring MVC + JPA powers the provisioning-service. Spring MVC is the classic, blocking web framework [6]: one thread per request, synchronous database calls through JPA/Hibernate. This is the choice for the provisioning-service because plan changes are low-frequency, write-heavy transactional operations — the simplicity of blocking code outweighs any throughput benefit from reactive streams. JPA's entity mapping and transaction management make the write path straightforward to reason about and test.

Prometheus + Grafana Observability: Both Java services expose metrics through the Micrometer instrumentation library, which is included with Spring Boot Actuator. Prometheus scrapes the /actuator/prometheus endpoint on both services and stores the metrics as time-series data. Grafana connects to Prometheus as a datasource and visualises the data in dashboards.

Key Design Decisions

Decision	Chosen	Alternative	Reason
Language split	Python + Java	Monolith (either)	LangChain is Python-first; Spring WebFlux is battle-tested for high-volume BSS data. Each language serves the layer where it is strongest, and that domain alignment justifies the operational complexity of running two runtimes.
Agent pattern	LangChain ReAct	Structured routing	A structured router requires every customer intent to be hardcoded upfront. ReAct lets the LLM reason dynamically across multi-step billing queries.
Session state	LangGraph in-process	Redis	Redis adds a new infrastructure dependency, serialisation, and failure handling for no v1 benefit. Session loss on restart is accepted at this scale. Redis is the natural v2 step when horizontal scaling is needed.
Inter-service	REST (sync)	Message queues	The customer waits in real time — async messaging would require correlation IDs and timeout handling for an inherently synchronous interaction. REST gives predictable latency and simple error propagation.
Disputes	Flag-only	Auto-reversal	Auto-reversal requires a Revenue Assurance approval workflow that is outside the agent's authority and out of v1 scope. The agent captures the claim and issues a reference number; the reversal decision stays with a human reviewer.
JWT validation	agent-service boundary	API Gateway	JWT validation was placed at the FastAPI boundary rather than in a dedicated API Gateway because the agent-service is the only external-facing service in v1. Single external-facing service makes a dedicated gateway YAGNI for v1.
Database type	PostgreSQL for billing and provisioning services	MongoDB for provisioning	Eligibility checks depend on structured SQL queries across `plans` and `customer_lines`. The array columns and event-log pattern in provisioning create minor relational friction but do not outweigh the operational cost of running a second database technology at v1 scale. MongoDB is the natural revisit if the plan catalogue grows to support dozens of configurable attributes per tier.

2.4 Executing the Tasks

TASKS.md: From Design to Executable Work

The final design output was TASKS.md: 13 tasks, each decomposed into sub-tasks, each sub-task referencing specific user story scenarios. TDD within each task followed the same rhythm: write a failing unit test, implement just enough to pass, write a failing integration test, implement until it passes, then refactor.

The Three Services and Their Roles: Putting It All Together

agent-service is the sole customer-facing entry point. It validates JWT tokens, creates and manages conversational sessions, and runs the LangChain ReAct agent loop. When a customer sends a message, the agent reasons about intent, calls whichever billing or provisioning tool is needed, observes the result, and formulates a natural language response.
billing-service is the source of truth for all financial data. It owns invoices and their line items, payment records, and dispute tickets. It is the only service that knows whether a customer's account is active or suspended, and whether their balance is overdue. Its reactive stack (Project Reactor + R2DBC) handles high read volumes — invoice queries, comparison lookups, payment status checks — without blocking server threads.
provisioning-service owns the plan catalogue, eligibility rules, and each customer's current line configuration. It decides which plans a customer can switch to (based on network capability flags and regional availability) and applies or schedules the resulting plan change.

3. Conclusion

Key Takeaways

Each language in its own domain. Python is the natural home for LangChain. Java is proven for high-volume transactional services. Combining them means the agent layer and the data layer each run in the ecosystem where they are best supported.
Spec-driven development pays off at test time. Producing REQUIREMENTS.md with explicit acceptance scenarios before touching the code makes test-writing focused. Every scenario has a name, a precondition, and an expected outcome.
ReAct agents are powerful but need guardrails. The ReAct loop gives the LLM significant autonomy — it decides which tool to call, in what order, and when to stop. For billing queries, this flexibility is valuable: a customer asking "why did my bill change?" may require the agent to call two tools and reason about both results before answering — a flow that would be brittle to hardcode. But autonomy introduces risk: the LLM could call a write tool (like file_dispute) when the customer only asked a question. The tool closure pattern (capturing customer_id in the closure, returning status-keyed dicts instead of raising exceptions) and the clear separation of read and write tools are the guardrails that keep the agent predictable. LangSmith traces make any misbehavior visible and debuggable.
Load the customer's data once at session start. Fetching the customer's invoice and eligible plans at session creation makes every follow-up question in the conversation instantaneous and natural. The customer does not need to repeat their account details on every follow-up message, because the agent already has the relevant data loaded from the moment the session opened.
CLAUDE.md as a living document. The instruction file that guides an AI pair programmer grows alongside the project. Each friction point or new decision is an opportunity to add a rule that prevents the same issue from recurring.

Trade-offs and Limitations

Session loss on restart: LangGraph's MemorySaver stores all conversation state in the agent-service process memory. When the process restarts — during a deployment, a crash, or a container restart — every active session is immediately lost. A customer mid-conversation would receive a "session not found" error and have to start over from scratch. For v1, where there is a single process and restarts are infrequent, this is acceptable. In a multi-instance production setup, it becomes a hard blocker: a session created on instance A would not be visible to instance B, making load balancing impossible without sticky sessions. Redis would solve this by persisting each conversation turn to a shared external store, making sessions portable across instances and restart-safe.
JWT validation at the application boundary: In a proper production microservices architecture, JWT validation is the responsibility of an API Gateway (Kong, AWS API Gateway, nginx with an auth module, etc.). The gateway validates the token, extracts verified claims, and forwards them as trusted headers (e.g., X-Customer-Id) to services behind it. In v1, this responsibility was placed directly in the FastAPI boundary because the agent-service is the only external-facing service — it effectively acts as its own gateway, making a dedicated one YAGNI. The problem surfaces when the system grows: a second external-facing service would need to duplicate the same JWT logic, and rotating the signing key or changing the token format would require updating every service that validates tokens.
Delinquency as synchronous cross-service call: When a customer requests a plan change, the provisioning-service makes a blocking HTTP call to the billing-service to check whether the account has an overdue balance. This creates a direct runtime dependency between the two services: if the billing-service is slow or temporarily unavailable, the provisioning plan change endpoint is also degraded — even though plan logic has nothing to do with invoice processing. For v1 with a small user base, this coupling is manageable. At higher volume, the right approach is an event-driven model: the billing-service publishes an account status event when delinquency is detected, and the provisioning-service maintains a local read-model of account statuses updated from those events. Plan change requests then query the local cache — no synchronous cross-service call needed at runtime.
Single PostgreSQL instance: The billing and provisioning schemas both live inside one PostgreSQL container in Docker Compose. While the application code enforces strict schema separation (no cross-schema queries, no shared tables), they share the same database process, disk I/O, connection pool, and resource limits. A slow billing query can starve provisioning reads. In production, each service should own its own PostgreSQL instance — separate containers, separate data volumes, potentially separate machines — so that they can be tuned, backed up, scaled, and failed over independently.
Relational database for both services: The billing-service is unambiguous — financial records, ACID guarantees, complex aggregation queries across billing cycles map naturally to relational. The provisioning-service is less clear-cut: the plans table uses array columns (regions[], network_flags) and plan_changes is effectively an event log — both patterns that a document store handles more naturally. For v1, eligibility checks benefit from structured SQL queries, KISS applies, and the operational cost of introducing a second database technology outweighs the schema flexibility it would bring at this scale. If the plan catalogue grows to support dozens of configurable attributes per tier, MongoDB would be the natural migration path for the provisioning-service's plan data.

Source code: github.com/dancodingbr/smart-billing-assistant

References

[1] Claude Code — Anthropic's CLI-based AI coding agent. github.com/anthropics/claude-code

[2] Kiro — AWS AI-powered IDE built around spec-driven development. kiro.dev

[3] LangChain — Framework for building LLM-powered applications. python.langchain.com

[4] ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al., 2022. arxiv.org/abs/2210.03629

[5] Spring WebFlux — Reactive web framework built on Project Reactor. docs.spring.io/spring-framework/reference/web/webflux.html

[6] Spring Boot — Convention-over-configuration framework for Java microservices. spring.io/projects/spring-boot

DEV Community