DEV Community: Jobinesh Purushothaman

Healthcare Standards Every Software Engineer Should Know Before Building Healthcare Applications

Jobinesh Purushothaman — Sun, 10 May 2026 01:27:04 +0000

Healthcare Engineering Standards — A Practical Guide for Software Engineers

HL7, FHIR, openEHR, DICOM, CDA, and what actually matters when you build for healthcare.

Why this guide exists

Healthcare is one of the most complex domains in software engineering. If you're coming from SaaS, the rules of the game change the moment you join a health-tech team.

In an e-commerce app, renaming a field breaks a few internal services. In healthcare, the same mistake can break a lab integration, delay a diagnosis, or compromise patient safety.

That's why this domain leans heavily on standards — not frameworks.

What makes healthcare software different

In most apps, you control the schema, APIs evolve quickly, and integrations are optional.

In healthcare, it's the opposite:

Systems must exchange data across organizations.
Patient safety depends on accurate data.
Interoperability is mandatory, not a nice-to-have.

Picture a single patient journey: clinic → diagnostic lab → pharmacy → insurance. Four organizations, four tech stacks, one shared truth about the patient. Making that work is most of the job.

Most healthcare engineering is integration engineering.

The healthcare standards landscape

Standard	Purpose
HL7 v2	Hospital messaging
FHIR	Modern healthcare APIs
SMART on FHIR	EHR app integration
CDA	Clinical documents
openEHR	Structured health records
DICOM	Medical imaging
SNOMED / LOINC / ICD	Clinical terminology
X12 / EDI	Insurance & billing

A single hospital workflow often uses several at once: HL7 for admissions, FHIR for patient APIs, DICOM for MRI scans, SNOMED for diagnosis codes.

Let's walk through the ones that matter most.

1. HL7 v2 — the backbone of hospital integration

HL7 v2 is the messaging standard for admissions, lab results, scheduling, and billing events. It's old, it's quirky, and it's still everywhere.

When a patient is admitted, an HL7 ADT message fans out: pharmacy picks it up, the lab spins up records, billing kicks in.

A real message looks like this:

MSH|^~\&|LAB|HOSPITAL|EHR|HOSPITAL
PID|1||12345||DOE^JOHN
OBX|1|NM|WBC||5.4

Pipe-delimited, terse, and full of routing metadata.

What you'll actually do: TCP/MLLP messaging, interface engines, transformations, retries — and a lot of normalizing inconsistent data. One hospital sends M. Another sends Male. Your integration layer makes them agree.

Common tools: Mirth Connect, Rhapsody, Cloverleaf. A typical Mirth pipeline receives HL7 lab messages, validates them, transforms the payload, and pushes records into a FHIR API downstream.

Docs: hl7.org/implement/standards

2. FHIR — modern healthcare APIs

FHIR is what happens when healthcare standards meet REST. JSON, OAuth2, predictable resource URLs.

Instead of vendor-specific endpoints like:

GET /patientInfo?id=123

FHIR gives you:

GET /Patient/123

A resource looks like any modern API payload:

{
  "resourceType": "Patient",
  "id": "123"
}

Resources cover the usual suspects: Patient, Observation, Encounter, Medication, and dozens more.

Why engineers like it: it feels normal. React, Angular, or a mobile app can consume FHIR APIs the same way they'd consume any SaaS API.

The catch: FHIR allows profiles and extensions. Two hospitals can both "support FHIR" while modelling allergies or medications differently. Mapping logic doesn't disappear — it just moves up the stack.

Docs: hl7.org/fhir

3. SMART on FHIR — apps inside EHRs

SMART on FHIR layers OAuth2, OpenID Connect, and a launch workflow on top of FHIR. The result: third-party apps that run securely inside Epic, Cerner, and other EHRs.

A doctor clicks an AI clinical assistant from inside their EHR. The app receives patient context, fetches FHIR data, and shows recommendations — no separate login, no copy-pasting MRNs.

It's how healthcare apps become plugins instead of silos. Patient portals, clinical dashboards, and AI assistants all ride on it.

Docs: docs.smarthealthit.org

4. CDA — clinical document architecture

CDA is XML-based and built for documents: discharge summaries, referrals, clinical notes.

After surgery, a hospital generates a CDA discharge document with diagnosis, medications, procedures, and physician notes, then ships it to the next provider.

Reality check: CDA is verbose. A "simple" patient summary easily becomes hundreds of nested tags. Parsing and validation take real effort, and FHIR has been quietly eating its lunch for newer use cases.

Docs: HL7 CDA product brief

5. openEHR — structured longitudinal records

openEHR is about the long game: structured, lifelong health records built on archetypes and templates.

The big idea is separating clinical knowledge from technical implementation. Clinicians define what "blood pressure" means once, as a reusable template. Engineers store it in a standardized structure. When the clinical definition changes, the database schema doesn't have to.

That separation is what makes long-term maintainability possible.

Docs: openehr.org · specifications.openehr.org

6. DICOM — medical imaging

DICOM is the standard for medical imaging, radiology workflows, and PACS systems. An MRI machine doesn't output a PNG — it outputs DICOM.

A single CT scan can contain hundreds of high-resolution slices. That means engineering for:

large files and streaming,
fast retrieval,
optimized rendering in specialized viewers.

Common tools: Orthanc, OHIF Viewer, dcm4che. OHIF, for example, lets doctors open scans in a browser, zoom, annotate, and compare studies side by side.

Docs: dicomstandard.org

7. Terminologies — the unsung heroes

Without standard terminology, analytics breaks and interoperability fails.

Standard	Purpose
ICD-10	Disease classification
SNOMED CT	Clinical terminology
LOINC	Lab observations
RxNorm	Medications

One system stores Heart Attack. Another stores Myocardial Infarction. SNOMED makes sure both map to the same concept — which is the difference between a working dashboard and a misleading one.

Docs: SNOMED CT · LOINC · ICD-10 · RxNorm

8. Security and compliance

Healthcare systems need privacy, auditability, access control, and traceability — by law.

You must always be able to answer: who accessed what, when, and what did they change?

Region	Regulation
US	HIPAA
Europe	GDPR
India	ABDM

Practically, that translates to encryption, RBAC/ABAC, and immutable audit logs. Even DBAs typically can't read patient records directly without leaving a trace.

Docs: HIPAA · GDPR · ABDM · OWASP Top 10 · OWASP API Security

9. Integration is the real work

A real healthcare platform might tie together 20-year-old HL7 systems, modern cloud APIs, insurance gateways, and imaging systems — all at once.

Two patterns show up again and again.

HL7 integration hub:

Hospital Systems → Integration Engine → Internal Services

The engine receives lab events, validates messages, and routes them downstream.

FHIR gateway:

Legacy Systems → FHIR Gateway → External Apps

The legacy database stays put. Modern apps talk to a clean FHIR layer in front of it.

10. Skills that actually matter

If you want to grow in healthcare engineering, invest in:

distributed systems,
interoperability engineering,
API design,
security engineering,
event-driven architecture.

A strong healthcare engineer can debug an HL7 message, design a secure API, navigate compliance, and integrate a legacy system without breaking it.

Final thoughts

Healthcare engineering sits at the intersection of interoperability, compliance, distributed systems, legacy integration, and clinical workflows. Engineers who understand the standards — not just the frameworks — become disproportionately valuable.

A single career-defining project might look like this: build a FHIR API platform, integrate lab systems via HL7, secure patient access with SMART on FHIR, store scans with DICOM, and normalize diagnoses with SNOMED.

That mix of engineering and domain depth is what makes healthcare software different — and worth learning.

Learn the standards first. The frameworks get easier after that.

What a Real HIPAA Audit Actually Looks Like for Healthcare AI

Jobinesh Purushothaman — Sun, 26 Apr 2026 18:20:29 +0000

An auditor sits across from you with a single page of questions. They are not interested in your model architecture, your prompt engineering, or your evaluation harness. They want to know one thing: when your AI agent answered a clinician's question last Tuesday, what data did it see, who authorized that access, and can you prove it.
This is the moment most clinical AI systems quietly fail. Not because the team did not care about compliance — they did — but because the system was architected to make AI work, not to make audits work. Authorization was an application-layer concern. Audit logs captured user clicks but not model retrievals. The vector database lived outside the compliance perimeter. The agent reached data through generated queries that were never persisted in a form an auditor could reconstruct.
Clinical AI is shipping into hospitals now. The first wave of HIPAA audits and security reviews of these systems is already underway. The architectural patterns most teams are using were not designed for regulated workloads, and they do not hold up under serious scrutiny. This article is the question list I wish more teams had on the wall before their first audit.

What a HIPAA Audit Is Actually Looking For

HIPAA audits, whether driven by the Office for Civil Rights or by a covered entity's own internal review, do not test whether your AI is good. They test whether your handling of Protected Health Information is defensible. The Privacy Rule, the Security Rule, and the Breach Notification Rule define the structure. The questions an auditor asks fall into a narrow set of categories that map to those rules — and they are the same questions, in roughly the same order, every time.
There are six categories worth designing for explicitly. Each is a question you should be able to answer in minutes, not weeks, with evidence drawn directly from your system.

1. Who saw what, and when?

This is the foundational audit question. For any patient, any record, any field, the auditor expects you to produce a record of every access — read or write, by a human or a system — with a timestamp, an actor, and a reason. The HIPAA Security Rule requires audit controls; the Privacy Rule's accounting of disclosures provision adds a patient-facing layer that requires the same data, in a different format.
In a non-AI system, this is hard but tractable. Application-level access logs, database audit triggers, and a periodic export are usually enough. In an AI system, this question fragments. A clinician asks an agent a question. The agent retrieves five structured records and three free-text notes. It calls a model. The model returns a draft. The clinician sees the draft. Which of those five structured records counts as a disclosure to the clinician? All of them, even the ones that did not influence the answer? The ones that were quoted in the response? The ones the clinician scrolled past in the source citations? The auditor will ask, and "the model decided what to surface" is not an answer that survives the meeting.
What the architecture must support: every retrieval the agent performs — structured query, vector search, tool call — must produce an audit record tied to the requesting user, the clinical justification, the records returned, and the records ultimately surfaced. The records returned and the records surfaced are different sets, and both matter.

2. Was the access authorized?

Audit logs are necessary but not sufficient. The next question is whether each access was permitted under the user's role, the patient's consent directives, the purpose of use declared at session start, and the minimum-necessary standard. If a behavioral health note appears in the agent's retrieval set for a request that did not require it, the system has failed the test, even if the note never reached the user.
The hardest part is that authorization in clinical systems is contextual. The same physician has different access to the same patient depending on whether they are the patient's attending, the patient's covering provider, a consulting specialist, or none of the above. A psychiatric note may be visible to the patient's psychiatrist but not to a cardiologist consulting on the same encounter. A break-the-glass declaration permits access that would otherwise be denied, but creates an obligation to document the justification.
What the architecture must support: authorization belongs in the data layer, not the application layer. Every read — structured, vector, tool-mediated — must pass through the same policy engine that knows about role, relationship, consent, purpose of use, and minimum necessary. Filtering after retrieval is too late; the auditor will ask whether the agent saw the data, not whether it surfaced the data.

3. What did the model actually see?

This is the question that separates AI systems from the systems that came before them. When a model produces a response, the audit record must show not only the user's query and the model's output but the full prompt the model received — including the retrieval context, the system instructions, and any tool results that were inlined. If the model saw a sentence from a note in its prompt, that sentence is part of the disclosure record, whether or not it appeared in the final response.
The corollary is that any de-identification you applied to the prompt is also part of the audit. If your egress gateway redacted patient names before sending the prompt to an external model, the auditor will ask to see the redaction logs, the redaction rules, and evidence that the rules worked correctly on this specific prompt. Safe-harbor de-identification has eighteen specific identifier categories; expert-determination de-identification has a different standard. The auditor will ask which one you used and how it was validated.
What the architecture must support: every model invocation produces an audit event recording who caused it, which model received the prompt, whether de-identification was applied, and whether the prompt left the compliance boundary. The prompt and response themselves go to a separate prompt store, linked to the audit event by a single ID. The audit event records the decision; the prompt store carries the content. Both are queryable years later. "We don't log prompts because they're large" is a finding, not an excuse.

4. Did the data leave your perimeter, and under what agreement?

If your clinical AI uses an external model — Claude, GPT, Gemini, anything hosted outside your environment — the audit shifts to the egress boundary. The auditor will ask which model host received PHI, what business associate agreement governs the relationship, what data residency commitments exist, and whether any prompts crossed a region or jurisdiction boundary. Multi-region deployments under HIPAA and GDPR add layers of complexity here, especially when the model host's infrastructure is itself multi-region.
If you use an on-premises or in-boundary model, the questions are different but no less rigorous. The auditor will ask about the network boundaries, the model's training data lineage, and whether the model's outputs can be traced back to specific inputs in a way that distinguishes hallucination from disclosure.
What the architecture must support: every model call is routed through a gateway that records the model host, the BAA in force, the region of execution, the de-identification applied, and the user and patient context. "We send prompts directly from the application to OpenAI" is a sentence that ends an audit before it begins.

5. Can you reconstruct any single decision the system made?

This question is the audit equivalent of source-code traceability. The auditor picks a single response the agent produced — three months ago, six months ago, a year ago — and asks you to reconstruct it. What was the user's exact question? What retrieval context was assembled? What tools were called and what did they return? What prompt was sent to the model? What response came back? Which parts of the response were surfaced to the user? Were there any human-in-the-loop edits, and what were they?
If you can answer this in an afternoon with a query against your audit store, you are ready. If you need to engage your AI vendor to extract logs, your engineering team to dig through three different stores, and your security team to correlate timestamps, you are not. The retention period is also part of the question — most HIPAA programs require six years, longer in some states for some categories of records.
What the architecture must support: lineage as a first-class data citizen. Every AI output should carry a trace ID that resolves to the full reconstruction of how it was produced. This is not a feature you add in year three. It is the feature you build first.

6. What happens when something goes wrong?

Breach notification is the question that focuses minds. The auditor will ask how you would detect that PHI was disclosed inappropriately by your AI system, how quickly you could identify the affected patients, and how you would notify them under HIPAA's 60-day rule. "Our model hallucinated and we are not sure who saw what" is a breach response that becomes a breach itself.
The harder version of this question concerns inference. If your AI system produced an answer that revealed PHI the user was not authorized to see — not because of a retrieval failure but because the model inferred it from non-PHI context — is that a disclosure? Under most reasonable readings of the Privacy Rule, yes. Designing for that case requires the same logging discipline as the others, plus an evaluation framework that can detect inference leaks before they ship.
What the architecture must support: incident response that begins with a query. Given a suspected disclosure, you must be able to identify the specific records exposed, the users who received the output, the timeframe, and the patients affected, in hours not weeks. This is a function of how cleanly your audit data is structured, not of how skilled your incident responders are.

The Architecture That Answers These Questions

None of these six questions are surprising. The HIPAA rules are public and have not changed materially in years. What is new is that AI systems make answering them harder — because they fragment the access pattern, mediate retrieval through models, generate prompts dynamically, and call out to external services that are not in your direct control.

The architecture that makes these questions answerable rests on four design choices. None are exotic. All are non-negotiable if you intend to operate in regulated clinical environments.
Authorization in the data layer. Every read passes through a single policy engine that knows about user, role, relationship, consent, purpose of use, and minimum necessary. Structured queries, vector retrieval, and agent tool calls are all subject to the same rules and produce the same audit records.
Typed tool interfaces between agents and data. Agents do not write SQL or FHIR search queries. They invoke narrow, audited tools — search_patients, get_observations, semantic_search_notes — each of which inherits the user's permissions, grounds clinical concepts through a terminology service, and writes a record an auditor can read. Letting a model write queries directly is a compliance incident waiting to happen.
Vector storage inside the compliance boundary. Embeddings of clinical notes are PHI, even when the original text is chunked and transformed. They live in storage that meets the same standards as the relational source of truth, with metadata that supports ACL-aware filtering at query time. A third-party vector SaaS is rarely the right answer.
An egress gateway for every model call. All prompts to all model hosts, internal or external, route through a single gateway that handles de-identification, BAA-aware routing, region selection, and token-level logging. The gateway is the only path out of your perimeter, and its logs are the spine of your audit posture.

What an Audit Event Actually Looks Like

The four design choices above land on a simple deliverable: every read, write, agent action, and model call produces one structured audit event. The event records a decision, not a payload. It tells an auditor who did what, to whose data, in what context, and whether it was permitted. Anything beyond that lives in a separate store linked by ID.
A clinician opening a patient's chart and listing the patient's recent observations produces an event that looks like this:
Successful clinical read

event_name          = OBSERVATION_READ
application_name    = EHR
action_category     = READ
user_role_type      = CLINICIAN
operation_outcome   = SUCCESS
user_identity       = jchen.md
tenant_identifier   = 7b2e9f04-5a31-4d8c-9e72-1c4f8a6d5b29
event_timestamp_utc = 2026-04-27T00:00:00Z
attributes     = [
    patient_id      = [PT-9182734],
    purpose_of_use  = [TREATMENT],
    resource_count  = [47]
]

That is enough to satisfy the first audit question. The auditor knows who the user was, what they did, to which patient, in what application, under what declared purpose, and that the access succeeded. The timestamp is in the event envelope. Forty-seven observations were returned. The minimum-necessary standard is defensible because the purpose of use is recorded; if the purpose were ever set to a non-treatment value, the policy engine would have decided differently.
The event that matters even more is the denial. Most teams forget to emit one. Auditors do not.
Authorization denial — the event most teams forget to emit

event_name          = NOTE_READ
application_name    = EHR
action_category     = READ
user_role_type      = CLINICIAN
operation_outcome   = DENIED
user_identity       = jchen.md
tenant_identifier   = 7b2e9f04-5a31-4d8c-9e72-1c4f8a6d5b29
event_timestamp_utc = 2026-04-27T00:00:00Z
attributes     = [
    patient_id      = [PT-9182734],
    purpose_of_use  = [TREATMENT],
    deny_reason     = [BEHAVIORAL_HEALTH_SEGMENTATION]
]

This event is the proof that your data layer enforced consent. The denial reason names the specific rule that fired. When an auditor asks whether your system protects behavioral health records correctly, you do not show them documentation; you show them the denial events.
When the AI copilot reaches data on the user's behalf, the same event shape covers the access. Two fields establish the chain of accountability: the agent identifier, and the user the agent is acting on behalf of.
Agent acting on behalf of a user

event_name          = SEMANTIC_SEARCH_NOTES_READ
application_name    = AI_COPILOT
action_category     = READ
user_role_type      = CLINICIAN
operation_outcome   = SUCCESS
user_identity       = jchen.md
tenant_identifier   = 7b2e9f04-5a31-4d8c-9e72-1c4f8a6d5b29
event_timestamp_utc = 2026-04-27T00:00:00Z
attributes     = [
    patient_id      = [PT-9182734],
    purpose_of_use  = [TREATMENT],
    on_behalf_of    = [jchen.md],
    agent_id        = [chart_copilot],
    resource_count  = [6]
]

The agent retrieved six notes. The user is jchen.md. The on_behalf_of field is also jchen.md. That equality is the proof, recorded in the audit event itself, that the agent did not exceed the user's permissions. If on_behalf_of ever differed from user, or were absent, that is the finding. No prose needed.
Notice what is not in any of these events. There is no prompt content, no token count, no model output, no embedding vector, no hashes. Those belong in a prompt store and an observability store, linked by ID. The audit event records the decision. Mixing the two is the failure mode that produces 50-field audit records that are expensive to store and useless to read.

The Reframe

Most clinical AI teams approach compliance as a layer they add late, often under pressure from a security review or a customer's procurement process. This works in non-regulated AI domains because the cost of getting compliance wrong is reputational. In healthcare, the cost is patient harm, regulatory action, and existential risk to the organization.
The systems that ship and survive are the ones designed, from day one, to answer an auditor's questions in minutes. Authorization, audit, and lineage are not features bolted onto a working system; they are the load-bearing structure of the system itself. Build them first, and the AI fits. Build them last, and the AI does not ship.
If you are responsible for a clinical AI system, the most useful exercise you can do this week is to walk through these six questions for a single response your agent produced last week. If you cannot answer them in an afternoon, you know where the work is.

Solving Tool Integration and Orchestration in AI Agents with MCP

Jobinesh Purushothaman — Mon, 20 Apr 2026 03:47:41 +0000

Once you move beyond simple LLM demos, the complexity shifts from the model to everything around it. The real problem becomes how your system interacts with tools, APIs, and data in a way the model can reliably use.

Most implementations handle this by wiring tools directly into the application layer. That usually leads to duplicated definitions, hardcoded execution paths, and tightly coupled logic. It works at small scale, but breaks down as the number of tools and use cases grows.

This is the gap Model Context Protocol(MCP) is trying to solve.

What MCP Actually Does

MCP is a standard way to expose tools and data to a model.

Instead of embedding tool logic inside every app, you define them once and expose them through an MCP server. Any agent or client can connect to it and use those capabilities.

This separates:

capabilities (tools) → MCP server
decision-making → agent
Why This Matters for Agents

Most so-called agents are still structured like this:

if (intent === "create_ticket") {
  return createTicket()
}

That’s just routing logic.

An actual agent should:

choose tools dynamically
decide the sequence of actions
adapt based on results

For that to work, tools need to be:

discoverable
structured
decoupled from application logic

That’s exactly what MCP enables.

Where JSON-RPC Fits In

MCP uses JSON-RPC 2.0 as its communication layer.

It’s a simple protocol for calling functions using JSON. Nothing fancy, but very effective for this use case.

JSON-RPC Request

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_user",
    "arguments": {
      "user_id": "42"
    }
  }
}

JSON-RPC Response

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "name": "John Doe",
    "email": "john@example.com"
  }
}

This is the core interaction. MCP builds on top of this structure.

Minimal MCP Setup

Define tools once on the server:

mcpServer.tool("get_user", async ({ user_id }) => {
  return db.users.find(user_id)
})

mcpServer.tool("create_ticket", async ({ issue }) => {
  return jira.create(issue)
})

This becomes your reusable capability layer.

What Execution Looks Like

Input:

User 42 has a billing issue

Agent flow:

call get_user
call create_ticket
return response Under the Hood (JSON-RPC Calls)

{
  "method": "tools/call",
  "params": {
    "name": "get_user",
    "arguments": { "user_id": "42" }
  }
}
{
  "method": "tools/call",
  "params": {
    "name": "create_ticket",
    "arguments": { "issue": "Billing issue" }
  }
}

There is no predefined workflow here. The agent decides what to do based on available tools.

Why MCP + JSON-RPC Works

tools are defined once and reused
no repeated integration logic
agents can chain calls naturally
clean separation between execution and decision-making

This is what makes systems feel more agentic instead of scripted.

Without vs With MCP

Without MCP

hardcoded flows
duplicated integrations
tightly coupled logic

With MCP

shared tool layer
dynamic execution
cleaner architecture

What MCP Does Not Handle

MCP does not solve:

authentication
validation
safety

Those still need to be implemented at the tool level.

if (!query.toLowerCase().startsWith("select")) {
  throw new Error("Only read queries allowed")
}

When This Makes Sense

Use MCP when:

multiple tools are involved
tools need to be reused across systems
building agent-style workflows

Skip it when:

scope is small
only a few functions are needed
Final Take

This shifts the model from:

calling predefined functions

to:

interacting with a system of capabilities

That shift is what enables real agent behavior instead of scripted flows.

Building a Simple AI Agent with Micronaut, MCP, and LangChain4j

Jobinesh Purushothaman — Tue, 14 Apr 2026 04:52:32 +0000

An AI agent is a system that uses a language model to understand instructions, decide on actions, and execute them using available tools.
In practice, what does this look like?
In this article, we build a simple task-management AI agent in Java using Micronaut, LangChain4j, and Model Context Protocol (MCP). It demonstrates how an agent interprets natural language, selects the right action, and executes it safely through a structured tool interface.

Full code for this project is available here:

What is an AI Agent?

An AI agent is a software system that uses a language model to interpret user input, reason about it, and take actions by invoking tools or APIs.

At a minimum, an AI agent consists of:

A reasoning model – typically an LLM that understands user instructions
A set of tools – functions or APIs the agent can invoke
An execution loop – a cycle of: → understand → decide → act → return result

In this project, that loop is implemented cleanly and explicitly.

How the agent works in this repo

TaskPlannerAiService (LangChain4j) prompts the model to produce a single structured JSON tool call
TaskAgentOrchestrator parses and validates that JSON
McpTaskClient executes the selected tool via MCP

This design enforces an important rule:

The model never directly modifies business data.
It only decides what should happen, while the system controls how it happens.

What is MCP (Model Context Protocol)?

Model Context Protocol (MCP) is a standardized protocol that defines how AI agents interact with external tools and services in a structured and reliable way.

Without MCP, applications often implement custom tool-calling formats, leading to inconsistent integrations and fragile systems.

MCP provides:

A standard interface for exposing tools
Structured schemas for tool arguments
A JSON-RPC-based communication model
A clean separation between AI decision-making and system execution

Why MCP matters

In this project, MCP provides:

A stable tool interface (create-task, list-tasks, complete-task, etc.)
Structured and validated arguments
A predictable lifecycle (initialize → tools/call)
Loose coupling between the agent and backend services

In simple terms:

MCP is the contract between AI reasoning and real-world actions.

Project Architecture

This project is split into two modules:

1. `task-mcp-server` — MCP Tool Server (Micronaut)

This module exposes task-related operations as MCP tools using Micronaut.

Tools are defined using annotations like:

@Tool(name = "create-task")
@Tool(name = "list-tasks")
@Tool(name = "complete-task")
@Tool(name = "set-priority")

All tools operate on an in-memory TaskStore.

💡 Key detail:
Both REST APIs and MCP tools share the same store. So:

Data created via REST is visible to MCP
Data created via MCP is visible to REST

2. `task-agent` — AI Agent Runtime

This module contains the AI-driven decision-making layer.

Skills as configuration

Instead of hardcoding behavior, the agent uses a skills.md file:

@SystemMessage(fromResource = "skills.md")
@UserMessage("User instruction: {{instruction}}")
String plan(@V("instruction") String instruction);

This allows you to:

Update agent behavior without recompiling
Define tool usage rules in Markdown

Orchestration layer

TaskAgentOrchestrator is responsible for:

Parsing model output
Validating JSON structure
Applying safe defaults
Calling MCP tools via McpTaskClient

MCP client

McpTaskClient communicates with the MCP server using JSON-RPC:

Endpoint: http://127.0.0.1:8080/mcp
Flow: initialize → tools/call

End-to-End Flow

Example instruction

"Create task Buy milk with high priority and tag home"

Execution steps

Agent sends instruction + skills definition to the model
Model returns structured JSON:

{
  "tool": "create-task",
  "arguments": {
    "title": "Buy milk",
    "priority": "HIGH",
    "tags": "home"
  }
}

Orchestrator parses and validates the JSON
MCP client calls create-task
MCP server executes and returns the result

Why This Pattern Works

This architecture is simple but powerful.

Benefits

Add new tools without changing agent logic
Update behavior via skills.md
Swap LLM providers easily
Keep execution deterministic and safe
Avoid unpredictable model side effects

Instead of letting the model “do everything,” you:

Let the model decide
Let your system execute

Running the Project Locally

Start MCP server

cd task-mcp-server
mvn exec:java

Start agent

cd task-agent
OPENAI_API_KEY=<your-key> mvn exec:java

Call the agent

curl -sS -X POST http://127.0.0.1:8081/api/agent/run \
  -H 'content-type: application/json' \
  -d '{"instruction":"Create task Buy milk with high priority and tag home"}'

Inspect skills

curl -sS http://127.0.0.1:8081/api/agent/skills

Repositories

Here are the key modules used in this article and what they do:

task-mcp-server
https://github.com/jobinesh/java-ai-lab/tree/main/task-mcp-server
Micronaut-based MCP server that exposes task management tools (create-task, list-tasks, etc.) via MCP and REST. This is where all actual business logic executes.

task-agent
https://github.com/jobinesh/java-ai-lab/tree/main/task-agent
LangChain4j-based AI agent that interprets user instructions, decides which tool to call, and invokes MCP endpoints.

Final Takeaway

Think of the system in three layers:

AI Agent (LangChain4j): decides what to do
MCP Server (Micronaut): defines what can be done
Business Logic: ensures how it is done safely

That separation is the key to building reliable AI systems.

It keeps your AI flexible, your APIs stable, and your business logic safe.

If you're exploring AI agents in Java, this pattern is a great starting point—and a solid foundation for production-grade systems.

Designing a Scalable Recovery Service for Distributed Systems

Jobinesh Purushothaman — Sun, 12 Apr 2026 01:08:24 +0000

Failures are a normal and expected part of distributed systems.

If your application processes data asynchronously—for example, by consuming messages from Kafka or running background jobs—failures will happen regularly. A service might crash, a downstream dependency might become unavailable, or the data itself might fail validation.

If these failures are not handled properly, your system can lose data, create duplicates, or leave you with no visibility into what went wrong.

This article explains a simple and practical approach to building a recovery service that helps you handle failures in a reliable and scalable way.

What problem are we solving?

Before going deeper, let us clarify a few terms.

An event or message is simply a unit of work your system processes, such as a Kafka message.
Asynchronous processing means this work is handled in the background, not immediately in a user request.
A failure occurs when this processing does not complete successfully. In many systems, failures are handled by retrying immediately. However, this approach is not sufficient in real-world systems.

Why simple retries are not enough

In-memory retries are useful, but they do not solve real-world problems:

If the service crashes, all in-progress retries are lost.
Some failures should be retried later, not immediately.
There is no persistent record of what failed and how many times it was retried.
There is no clear final state, which can lead to endless retry loops.

To address these issues, failures must be persisted and handled asynchronously.

What is a recovery service?

A recovery service is a background component responsible for handling failed work.

It performs three main functions:

It stores failed tasks in a database.
It retries them later in a controlled manner.
It tracks the final outcome of each task.

Instead of retrying immediately, the system records the failure and processes it later using dedicated worker processes.

How failures are stored

Each failure is stored as a recovery task in a database table.
A recovery task typically contains the following information.

Failure context

This includes details about what failed, such as the event type and the payload being processed.

Lifecycle status

This represents the outcome of the task and can be one of the following:

FAILED, meaning it is waiting to be retried
RESOLVED, meaning it was successfully recovered
PERMANENT_FAILURE, meaning it will not be retried again

Retry metadata

This includes how many times the task has been retried and when it should be retried next.

Lock information

This indicates which worker is currently processing the task and when the lock was acquired.

Important design principle: status and lock are different

The lifecycle status and the execution lock represent different concepts.

The status describes the business outcome of the task.
The lock indicates which worker is currently processing the task. Keeping these two concepts separate helps avoid race conditions and keeps the design clear.

How multiple workers process tasks safely

To scale the system, multiple worker instances can run in parallel. The challenge is to ensure that the same task is not processed more than once at the same time.

This is solved using database row locking.

Workers use a query like the following:

SELECT *
FROM recovery_task
WHERE status = 'FAILED'
  AND next_retry_at <= NOW()
FOR UPDATE SKIP LOCKED
LIMIT 10;

This query locks the selected rows and ensures that other workers skip them.

As a result, each worker processes a different set of tasks, and no central coordination mechanism is required.

Worker lifecycle

Each recovery worker follows a simple loop.

First, it fetches a batch of failed tasks.
Then, it processes each task using the appropriate recovery logic.
Finally, it updates the task based on the outcome.

If the processing succeeds, the task is marked as RESOLVED.
If it fails but can be retried, the system schedules the next retry.
If the maximum number of retries is reached, the task is marked as PERMANENT_FAILURE.

Keeping the system generic

The recovery system should not contain business-specific logic.

Instead, responsibilities should be clearly separated.

The worker is responsible for coordination, retry handling, and state updates.
The handler is responsible for the actual recovery logic. For example, a handler might republish a Kafka message or re-trigger a failed operation.

This separation makes the recovery system reusable across different parts of the application.

Why this works without leader election

Some systems use leader election to ensure that only one instance performs certain tasks.

In this design, leader election is not required because the work is divided at the database row level.

Each worker processes different rows, and the database ensures that no two workers process the same task at the same time.

This approach allows the system to scale horizontally without introducing additional coordination complexity.

Safety guarantees

This design provides several important guarantees.

It prevents duplicate processing through database locks.
It allows recovery from worker crashes through lock expiration.
It provides full visibility into failures and retries.
It ensures that each task reaches a clear final state.

Practical tips
All updates, such as claiming tasks and updating results, should be performed within transactions.
Recovery handlers should be idempotent so that repeated execution does not cause issues.
Useful debugging information, such as error messages and request identifiers, should be stored.
Retry limits should be clearly defined.

- Metrics should be added to monitor system behavior.

Final thoughts

Failures are unavoidable in distributed systems, especially when processing data asynchronously.

Instead of relying only on retries, introducing a dedicated recovery service allows you to handle failures in a controlled and reliable way.

This approach improves system reliability, scalability, and observability, and it forms an essential part of building production-ready systems.

Understanding AI Metering in Enterprise Systems

Jobinesh Purushothaman — Wed, 08 Apr 2026 01:19:13 +0000

As AI becomes part of everyday workflows, organizations need a simple way to understand how it is being used. It is no longer enough to know that an AI feature exists or that users are interacting with it. Teams also need visibility into how usage is measured, how access is governed, and how consumption maps to what has been purchased or allocated.

This is where AI metering helps.

AI metering is a structured way to track AI consumption across products, teams, and workflows. It gives organizations a practical view of usage, entitlement, reporting, and planning.

Why AI Metering Matters

AI adoption in enterprise systems is rarely uniform. Some workflows use AI occasionally. Others depend on it heavily. Without a common way to measure usage, organizations end up with fragmented visibility. Different teams see different signals, but no one gets a clear picture of overall consumption.

A metering model helps solve that. It gives organizations a consistent way to measure usage across multiple AI capabilities and answer practical questions:

How much AI capacity is available?
How much has been consumed?
Which capabilities are driving usage?
How should teams plan for growth, renewal, or limits?

This visibility is useful not only for finance and operations, but also for product teams, administrators, and customers.

From Activity to Measurable Consumption

A core idea in AI metering is that meaningful AI activity should translate into measurable consumption. Once that is done consistently, usage can be tracked across different AI capabilities, even when those capabilities do different kinds of work.

This matters because not all AI interactions are equal. Some tasks are lightweight and frequent. Others are more complex or resource-intensive. A useful metering model reflects those differences through predefined consumption rules.

That turns raw activity into something more useful: a structured view of consumption that supports reporting, analysis, and decision-making.

The Role of AI Credits

One practical way to manage AI consumption is to use a normalized unit such as AI credits. AI credits create a shared language for measuring different types of AI usage under one model.

This makes it easier to report usage consistently, connect consumption to entitlement, and compare activity across multiple AI capabilities. The exact term is less important than the idea behind it: a common measure that makes different kinds of AI usage easier to understand.

Why Shared Credit Pools Help

In many systems, it is more practical to manage AI consumption through a shared pool instead of tying usage rigidly to one user or one workflow. A shared pool gives organizations flexibility. As priorities shift and adoption grows, capacity can be used where it adds the most value.

This is especially useful in enterprise environments where different teams adopt AI at different times. A pooled model reduces friction and makes it easier to scale usage without repeated changes to entitlement structures.

The main benefit is simple: pooled consumption supports flexibility while still keeping usage measurable and governed.

How AI Metering Works

At a high level, AI metering starts by identifying what a customer or organization is allowed to use. That entitlement may come from a subscription, package, contract, or internal allocation model. It defines the amount of AI capacity available and the scope of AI capabilities included.

When an AI workflow runs, the metering process identifies which capability was used and which customer, tenant, or organization should be associated with that activity. It then checks whether the usage falls within the valid access scope.

Once the activity is valid, the system applies a predefined consumption rule. That rule determines how much of the available AI credit balance should be counted for the interaction. The amount may vary depending on the type of task, the capability involved, or the selected consumption model.

After the usage amount is calculated, the remaining balance is updated. The interaction is also recorded so there is a reliable history for reporting, review, and reconciliation.

Finally, usage data is published to reporting or analytics systems so stakeholders can monitor adoption, understand trends, and track how credits are being used over time.

In simple terms, the flow looks like this:
[Entitlement Check] -> [Usage Detection] -> [Rule Lookup] -> [Credit Calculation] -> [Balance Update] -> [Audit Record] -> [Reporting]

A High-Level Algorithm for AI Metering

A simple way to think about AI metering is through this business-level flow:

Identify the active entitlement and available AI credits.
Detect when an AI capability completes a meaningful usage event.
Determine which capability generated the event and who owns the usage.
Verify that the usage is covered by the current access scope.
Retrieve the relevant consumption rule.
Calculate the credits consumed for the event.
Update the remaining balance and aggregate totals.
Store a usage record for audit and reconciliation.
Publish summarized usage to reporting or analytics systems.
Check for conditions such as low balance, exhaustion, renewal, or adjustment.
Update balances when subscriptions or allocations change.
Support correction or replay when data arrives late or needs reconciliation.

This algorithm stays intentionally high level. The goal is to explain the operating model, not the implementation details.

More Than Billing

AI metering is often associated with billing, but it is useful well beyond that.

A good metering model also supports governance, planning, transparency, and product insight. When usage is visible and structured, organizations can better understand adoption patterns, evaluate which capabilities are delivering value, identify overuse or underuse, and make better operational decisions.

For administrators and customers, metering answers simple but important questions: what is available, what has been consumed, and what may need attention next. For product teams, it provides a clearer view of how AI capabilities are being used in real environments.

A Practical Capability for Scaled AI Adoption

As AI becomes more embedded in enterprise systems, organizations need a way to manage it with the same clarity they apply to other core services. That does not mean adding complexity for its own sake. It means adding enough structure to support visibility, accountability, and confident scaling.

AI metering helps provide that structure.

It turns AI usage into something measurable, reviewable, and easier to plan around. That is the main takeaway: AI metering helps organizations treat AI not as a black box, but as a service with understandable consumption and clearer operational control.

DEV Community: Jobinesh Purushothaman

Healthcare Standards Every Software Engineer Should Know Before Building Healthcare Applications

Why this guide exists

What makes healthcare software different

The healthcare standards landscape

1. HL7 v2 — the backbone of hospital integration

2. FHIR — modern healthcare APIs

3. SMART on FHIR — apps inside EHRs

4. CDA — clinical document architecture

5. openEHR — structured longitudinal records

6. DICOM — medical imaging

7. Terminologies — the unsung heroes

8. Security and compliance

9. Integration is the real work

10. Skills that actually matter

Final thoughts

What a Real HIPAA Audit Actually Looks Like for Healthcare AI

What a HIPAA Audit Is Actually Looking For

The Architecture That Answers These Questions

What an Audit Event Actually Looks Like

The Reframe

Solving Tool Integration and Orchestration in AI Agents with MCP

What MCP Actually Does

Where JSON-RPC Fits In

Minimal MCP Setup

What Execution Looks Like

Why MCP + JSON-RPC Works

Without vs With MCP

What MCP Does Not Handle

When This Makes Sense

Building a Simple AI Agent with Micronaut, MCP, and LangChain4j

What is an AI Agent?

How the agent works in this repo

What is MCP (Model Context Protocol)?

Why MCP matters

Project Architecture

1. task-mcp-server — MCP Tool Server (Micronaut)

2. task-agent — AI Agent Runtime

Skills as configuration

Orchestration layer

MCP client

End-to-End Flow

Example instruction

Execution steps

Why This Pattern Works

Benefits

Running the Project Locally

Start MCP server

Start agent

Call the agent

Inspect skills

Repositories

Final Takeaway

Designing a Scalable Recovery Service for Distributed Systems

What problem are we solving?

Why simple retries are not enough

What is a recovery service?

How failures are stored

Failure context

Lifecycle status

Retry metadata

Lock information

Important design principle: status and lock are different

How multiple workers process tasks safely

Worker lifecycle

Keeping the system generic

Why this works without leader election

Safety guarantees

Practical tips

- Metrics should be added to monitor system behavior.

Final thoughts

Understanding AI Metering in Enterprise Systems

Why AI Metering Matters

From Activity to Measurable Consumption

The Role of AI Credits

Why Shared Credit Pools Help

How AI Metering Works

A High-Level Algorithm for AI Metering

More Than Billing

A Practical Capability for Scaled AI Adoption

1. `task-mcp-server` — MCP Tool Server (Micronaut)

2. `task-agent` — AI Agent Runtime