<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jobinesh Purushothaman</title>
    <description>The latest articles on DEV Community by Jobinesh Purushothaman (@jobinesh).</description>
    <link>https://dev.to/jobinesh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1634980%2F36f9cf15-ad8e-473b-83a9-1f8e6623834d.jpg</url>
      <title>DEV Community: Jobinesh Purushothaman</title>
      <link>https://dev.to/jobinesh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jobinesh"/>
    <language>en</language>
    <item>
      <title>What a Real HIPAA Audit Actually Looks Like for Healthcare AI</title>
      <dc:creator>Jobinesh Purushothaman</dc:creator>
      <pubDate>Sun, 26 Apr 2026 18:20:29 +0000</pubDate>
      <link>https://dev.to/jobinesh/what-a-real-hipaa-audit-asks-of-your-healthcare-ai-system-73e</link>
      <guid>https://dev.to/jobinesh/what-a-real-hipaa-audit-asks-of-your-healthcare-ai-system-73e</guid>
      <description>&lt;p&gt;An auditor sits across from you with a single page of questions. They are not interested in your model architecture, your prompt engineering, or your evaluation harness. They want to know one thing: when your AI agent answered a clinician's question last Tuesday, what data did it see, who authorized that access, and can you prove it.&lt;br&gt;
This is the moment most clinical AI systems quietly fail. Not because the team did not care about compliance — they did — but because the system was architected to make AI work, not to make audits work. Authorization was an application-layer concern. Audit logs captured user clicks but not model retrievals. The vector database lived outside the compliance perimeter. The agent reached data through generated queries that were never persisted in a form an auditor could reconstruct.&lt;br&gt;
Clinical AI is shipping into hospitals now. The first wave of HIPAA audits and security reviews of these systems is already underway. The architectural patterns most teams are using were not designed for regulated workloads, and they do not hold up under serious scrutiny. This article is the question list I wish more teams had on the wall before their first audit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkz4bg4yrwh6g9jpfysm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkz4bg4yrwh6g9jpfysm.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;What a HIPAA Audit Is Actually Looking For&lt;/h2&gt;

&lt;p&gt;HIPAA audits, whether driven by the Office for Civil Rights or by a covered entity's own internal review, do not test whether your AI is good. They test whether your handling of Protected Health Information is defensible. The Privacy Rule, the Security Rule, and the Breach Notification Rule define the structure. The questions an auditor asks fall into a narrow set of categories that map to those rules — and they are the same questions, in roughly the same order, every time.&lt;br&gt;
There are six categories worth designing for explicitly. Each is a question you should be able to answer in minutes, not weeks, with evidence drawn directly from your system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Who saw what, and when?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the foundational audit question. For any patient, any record, any field, the auditor expects you to produce a record of every access — read or write, by a human or a system — with a timestamp, an actor, and a reason. The HIPAA Security Rule requires audit controls; the Privacy Rule's accounting of disclosures provision adds a patient-facing layer that requires the same data, in a different format.&lt;br&gt;
In a non-AI system, this is hard but tractable. Application-level access logs, database audit triggers, and a periodic export are usually enough. In an AI system, this question fragments. A clinician asks an agent a question. The agent retrieves five structured records and three free-text notes. It calls a model. The model returns a draft. The clinician sees the draft. Which of those five structured records counts as a disclosure to the clinician? All of them, even the ones that did not influence the answer? The ones that were quoted in the response? The ones the clinician scrolled past in the source citations? The auditor will ask, and "the model decided what to surface" is not an answer that survives the meeting.&lt;br&gt;
What the architecture must support: every retrieval the agent performs — structured query, vector search, tool call — must produce an audit record tied to the requesting user, the clinical justification, the records returned, and the records ultimately surfaced. The records returned and the records surfaced are different sets, and both matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Was the access authorized?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Audit logs are necessary but not sufficient. The next question is whether each access was permitted under the user's role, the patient's consent directives, the purpose of use declared at session start, and the minimum-necessary standard. If a behavioral health note appears in the agent's retrieval set for a request that did not require it, the system has failed the test, even if the note never reached the user.&lt;br&gt;
The hardest part is that authorization in clinical systems is contextual. The same physician has different access to the same patient depending on whether they are the patient's attending, the patient's covering provider, a consulting specialist, or none of the above. A psychiatric note may be visible to the patient's psychiatrist but not to a cardiologist consulting on the same encounter. A break-the-glass declaration permits access that would otherwise be denied, but creates an obligation to document the justification.&lt;br&gt;
What the architecture must support: authorization belongs in the data layer, not the application layer. Every read — structured, vector, tool-mediated — must pass through the same policy engine that knows about role, relationship, consent, purpose of use, and minimum necessary. Filtering after retrieval is too late; the auditor will ask whether the agent saw the data, not whether it surfaced the data.&lt;/p&gt;
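&lt;p&gt;As a sketch of what that single decision point can look like (all names here, such as PolicyEngine and AccessRequest, are illustrative, not taken from any real product), one decide call can front every read path:&lt;/p&gt;

```python
# Illustrative sketch only: PolicyEngine, AccessRequest, and the rule below
# are hypothetical, not taken from any real product.
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessRequest:
    user: str
    role: str              # e.g. "CLINICIAN"
    relationship: str      # e.g. "ATTENDING", "COVERING", "NONE"
    purpose_of_use: str    # e.g. "TREATMENT"
    patient_id: str
    data_category: str     # e.g. "OBSERVATION", "BEHAVIORAL_HEALTH_NOTE"

class PolicyEngine:
    """Single decision point for structured, vector, and tool-mediated reads."""

    def decide(self, req: AccessRequest) -> tuple[str, str]:
        # Consent segmentation: behavioral health notes need a direct
        # treatment relationship, not merely the clinician role.
        if req.data_category == "BEHAVIORAL_HEALTH_NOTE" and req.relationship != "ATTENDING":
            return ("DENIED", "BEHAVIORAL_HEALTH_SEGMENTATION")
        return ("SUCCESS", "")

engine = PolicyEngine()
outcome, reason = engine.decide(AccessRequest(
    user="jchen.md", role="CLINICIAN", relationship="NONE",
    purpose_of_use="TREATMENT", patient_id="PT-9182734",
    data_category="BEHAVIORAL_HEALTH_NOTE"))
# outcome is "DENIED", reason is "BEHAVIORAL_HEALTH_SEGMENTATION"
```

&lt;p&gt;Because structured queries, vector search, and tool calls all pass through the same decide call, every path produces the same outcome and the same deny reason, which is exactly what the audit events should record.&lt;/p&gt;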

&lt;p&gt;&lt;strong&gt;3. What did the model actually see?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the question that separates AI systems from the systems that came before them. When a model produces a response, the audit record must show not only the user's query and the model's output but the full prompt the model received — including the retrieval context, the system instructions, and any tool results that were inlined. If the model saw a sentence from a note in its prompt, that sentence is part of the disclosure record, whether or not it appeared in the final response.&lt;br&gt;
The corollary is that any de-identification you applied to the prompt is also part of the audit. If your egress gateway redacted patient names before sending the prompt to an external model, the auditor will ask to see the redaction logs, the redaction rules, and evidence that the rules worked correctly on this specific prompt. Safe-harbor de-identification has eighteen specific identifier categories; expert-determination de-identification has a different standard. The auditor will ask which one you used and how it was validated.&lt;br&gt;
What the architecture must support: every model invocation produces an audit event recording who caused it, which model received the prompt, whether de-identification was applied, and whether the prompt left the compliance boundary. The prompt and response themselves go to a separate prompt store, linked to the audit event by a single ID. The audit event records the decision; the prompt store carries the content. Both are queryable years later. "We don't log prompts because they're large" is a finding, not an excuse.&lt;/p&gt;
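&lt;p&gt;A minimal sketch of that split, with hypothetical names: the audit event carries the decision and a single prompt_ref, while the content lives in a separate store keyed by that ID.&lt;/p&gt;

```python
# Hypothetical sketch of the audit-event / prompt-store split: the event
# records the decision plus one linking ID; the content lives elsewhere.
import uuid
from datetime import datetime, timezone

audit_log = []     # append-only decision records (stand-in for a real store)
prompt_store = {}  # content records, keyed by prompt_ref

def record_model_call(user, model_host, deidentified, left_boundary, prompt, response):
    prompt_ref = str(uuid.uuid4())
    prompt_store[prompt_ref] = {"prompt": prompt, "response": response}
    audit_log.append({
        "event_name": "MODEL_INVOCATION",
        "user_identity": user,
        "model_host": model_host,
        "deidentification_applied": deidentified,
        "left_compliance_boundary": left_boundary,
        "prompt_ref": prompt_ref,  # the only link to the content
        "event_timestamp_utc": datetime.now(timezone.utc).isoformat(),
    })
    return prompt_ref

ref = record_model_call("jchen.md", "external-llm", True, True,
                        prompt="[REDACTED] summarize recent labs", response="draft")
# The audit event stays small and queryable; the prompt is one lookup away.
```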

&lt;p&gt;&lt;strong&gt;4. Did the data leave your perimeter, and under what agreement?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your clinical AI uses an external model — Claude, GPT, Gemini, anything hosted outside your environment — the audit shifts to the egress boundary. The auditor will ask which model host received PHI, what business associate agreement governs the relationship, what data residency commitments exist, and whether any prompts crossed a region or jurisdiction boundary. Multi-region deployments under HIPAA and GDPR add layers of complexity here, especially when the model host's infrastructure is itself multi-region.&lt;br&gt;
If you use an on-premises or in-boundary model, the questions are different but no less rigorous. The auditor will ask about the network boundaries, the model's training data lineage, and whether the model's outputs can be traced back to specific inputs in a way that distinguishes hallucination from disclosure.&lt;br&gt;
What the architecture must support: every model call is routed through a gateway that records the model host, the BAA in force, the region of execution, the de-identification applied, and the user and patient context. "We send prompts directly from the application to OpenAI" is a sentence that ends an audit before it begins.&lt;/p&gt;
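&lt;p&gt;One way to make the gateway concrete (the routing table and field names are assumptions for illustration): every outbound call resolves against a BAA- and region-aware table, and an unconfigured host fails closed.&lt;/p&gt;

```python
# Hypothetical routing table for an egress gateway: every outbound model call
# resolves against recorded BAA and region commitments, and fails closed.
ROUTES = {
    "external-llm": {"baa": "BAA-2026-041", "region": "us-east", "deidentify": True},
    "onprem-llm": {"baa": None, "region": "in-boundary", "deidentify": False},
}

def route_model_call(model_host):
    route = ROUTES.get(model_host)
    if route is None:
        # An unknown host is a hard failure, never a default path out.
        raise PermissionError(f"no egress route configured for {model_host}")
    if route["region"] != "in-boundary" and route["baa"] is None:
        raise PermissionError(f"{model_host} leaves the boundary without a BAA")
    return route

route = route_model_call("external-llm")
# route["deidentify"] is True: PHI is redacted before this host sees a prompt
```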

&lt;p&gt;&lt;strong&gt;5. Can you reconstruct any single decision the system made?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This question is the audit equivalent of source-code traceability. The auditor picks a single response the agent produced — three months ago, six months ago, a year ago — and asks you to reconstruct it. What was the user's exact question? What retrieval context was assembled? What tools were called and what did they return? What prompt was sent to the model? What response came back? Which parts of the response were surfaced to the user? Were there any human-in-the-loop edits, and what were they?&lt;br&gt;
If you can answer this in an afternoon with a query against your audit store, you are ready. If you need to engage your AI vendor to extract logs, your engineering team to dig through three different stores, and your security team to correlate timestamps, you are not. The retention period is also part of the question — most HIPAA programs require six years, longer in some states for some categories of records.&lt;br&gt;
What the architecture must support: lineage as a first-class data citizen. Every AI output should carry a trace ID that resolves to the full reconstruction of how it was produced. This is not a feature you add in year three. It is the feature you build first.&lt;/p&gt;
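&lt;p&gt;A sketch of what trace-ID lineage looks like in practice, over illustrative in-memory events rather than a real audit store:&lt;/p&gt;

```python
# Sketch of trace-ID lineage over illustrative in-memory events; a real system
# would run the same query against its audit store.
events = [
    {"trace_id": "t-100", "seq": 1, "event_name": "SEMANTIC_SEARCH_NOTES_READ"},
    {"trace_id": "t-100", "seq": 2, "event_name": "MODEL_INVOCATION"},
    {"trace_id": "t-100", "seq": 3, "event_name": "RESPONSE_SURFACED"},
    {"trace_id": "t-101", "seq": 1, "event_name": "OBSERVATION_READ"},
]

def reconstruct(trace_id):
    """Return the ordered chain of events behind one AI response."""
    chain = [e for e in events if e["trace_id"] == trace_id]
    return sorted(chain, key=lambda e: e["seq"])

steps = [e["event_name"] for e in reconstruct("t-100")]
# steps: the retrieval, then the model call, then what was surfaced to the user
```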

&lt;p&gt;&lt;strong&gt;6. What happens when something goes wrong?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Breach notification is the question that focuses minds. The auditor will ask how you would detect that PHI was disclosed inappropriately by your AI system, how quickly you could identify the affected patients, and how you would notify them under HIPAA's 60-day rule. "Our model hallucinated and we are not sure who saw what" is a breach response that becomes a breach itself.&lt;br&gt;
The harder version of this question concerns inference. If your AI system produced an answer that revealed PHI the user was not authorized to see — not because of a retrieval failure but because the model inferred it from non-PHI context — is that a disclosure? Under most reasonable readings of the Privacy Rule, yes. Designing for that case requires the same logging discipline as the others, plus an evaluation framework that can detect inference leaks before they ship.&lt;br&gt;
What the architecture must support: incident response that begins with a query. Given a suspected disclosure, you must be able to identify the specific records exposed, the users who received the output, the timeframe, and the patients affected, in hours not weeks. This is a function of how cleanly your audit data is structured, not of how skilled your incident responders are.&lt;/p&gt;
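&lt;p&gt;Sketched with illustrative data, the exposure report for a suspected disclosure reduces to one function over the audit store:&lt;/p&gt;

```python
# Illustrative exposure report: given a suspected record, one pass over the
# audit store yields the users, patients, and timeframe involved.
audit_events = [
    {"user": "jchen.md", "patient_id": "PT-9182734", "record_id": "note-77",
     "ts": "2026-04-27T00:00:00Z"},
    {"user": "rpatel.md", "patient_id": "PT-9182734", "record_id": "note-77",
     "ts": "2026-04-27T01:12:00Z"},
    {"user": "jchen.md", "patient_id": "PT-5550001", "record_id": "note-12",
     "ts": "2026-04-26T09:00:00Z"},
]

def exposure_report(record_id):
    hits = [e for e in audit_events if e["record_id"] == record_id]
    return {
        "users": sorted({e["user"] for e in hits}),
        "patients": sorted({e["patient_id"] for e in hits}),
        "window": (min(e["ts"] for e in hits), max(e["ts"] for e in hits)),
    }

report = exposure_report("note-77")
# report identifies both receiving users, the affected patient, and the window
```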
&lt;h2&gt;The Architecture That Answers These Questions&lt;/h2&gt;

&lt;p&gt;None of these six questions are surprising. The HIPAA rules are public and have not changed materially in years. What is new is that AI systems make answering them harder — because they fragment the access pattern, mediate retrieval through models, generate prompts dynamically, and call out to external services that are not in your direct control.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zmqkwe60sfw5ke3ixpv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5zmqkwe60sfw5ke3ixpv.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
The architecture that makes these questions answerable rests on four design choices. None are exotic. All are non-negotiable if you intend to operate in regulated clinical environments.&lt;br&gt;
&lt;strong&gt;Authorization in the data layer.&lt;/strong&gt; Every read passes through a single policy engine that knows about user, role, relationship, consent, purpose of use, and minimum necessary. Structured queries, vector retrieval, and agent tool calls are all subject to the same rules and produce the same audit records.&lt;br&gt;
&lt;strong&gt;Typed tool interfaces between agents and data.&lt;/strong&gt; Agents do not write SQL or FHIR search queries. They invoke narrow, audited tools — search_patients, get_observations, semantic_search_notes — each of which inherits the user's permissions, grounds clinical concepts through a terminology service, and writes a record an auditor can read. Letting a model write queries directly is a compliance incident waiting to happen.&lt;br&gt;
&lt;strong&gt;Vector storage inside the compliance boundary.&lt;/strong&gt; Embeddings of clinical notes are PHI, even when the original text is chunked and transformed. They live in storage that meets the same standards as the relational source of truth, with metadata that supports ACL-aware filtering at query time. A third-party vector SaaS is rarely the right answer.&lt;br&gt;
&lt;strong&gt;An egress gateway for every model call.&lt;/strong&gt; All prompts to all model hosts, internal or external, route through a single gateway that handles de-identification, BAA-aware routing, region selection, and token-level logging. The gateway is the only path out of your perimeter, and its logs are the spine of your audit posture.&lt;/p&gt;
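&lt;p&gt;The typed-tool design choice can be sketched minimally (the function and field names are hypothetical): the tool checks the user's permission, performs the search, and writes an audit record, so the agent never composes a raw query itself.&lt;/p&gt;

```python
# Minimal sketch of a typed tool (names hypothetical): the agent calls a
# narrow function that checks permission and writes an audit record; it never
# writes SQL, FHIR search queries, or vector queries directly.
audit = []

def semantic_search_notes(user, patient_id, query, allowed):
    # "allowed" stands in for the real data-layer policy engine.
    if not allowed(user, patient_id):
        audit.append({"event": "NOTE_READ", "outcome": "DENIED", "user": user})
        return []
    results = ["note-1", "note-2"]  # stand-in for an ACL-filtered vector search
    audit.append({"event": "NOTE_READ", "outcome": "SUCCESS", "user": user,
                  "resource_count": len(results)})
    return results

notes = semantic_search_notes("jchen.md", "PT-9182734", "recent A1c trend",
                              allowed=lambda u, p: True)
# notes holds the two stand-in results; audit holds one SUCCESS event
```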
&lt;h2&gt;What an Audit Event Actually Looks Like&lt;/h2&gt;

&lt;p&gt;The four design choices above land on a simple deliverable: every read, write, agent action, and model call produces one structured audit event. The event records a decision, not a payload. It tells an auditor who did what, to whose data, in what context, and whether it was permitted. Anything beyond that lives in a separate store linked by ID.&lt;br&gt;
A clinician opening a patient's chart and listing the patient's recent observations produces an event that looks like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Successful clinical read&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;event_name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;OBSERVATION_READ&lt;/span&gt;
&lt;span class="py"&gt;application_name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;EHR&lt;/span&gt;
&lt;span class="py"&gt;action_category&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;READ&lt;/span&gt;
&lt;span class="py"&gt;user_role_type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;CLINICIAN&lt;/span&gt;
&lt;span class="py"&gt;operation_outcome&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;SUCCESS&lt;/span&gt;
&lt;span class="py"&gt;user_identity&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;jchen.md&lt;/span&gt;
&lt;span class="py"&gt;tenant_identifier&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;7b2e9f04-5a31-4d8c-9e72-1c4f8a6d5b29&lt;/span&gt;
&lt;span class="py"&gt;event_timestamp_utc&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;2026-04-27T00:00:00Z&lt;/span&gt;
&lt;span class="py"&gt;attributes&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[&lt;/span&gt;
    &lt;span class="py"&gt;patient_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[PT-9182734],&lt;/span&gt;
    &lt;span class="py"&gt;purpose_of_use&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[TREATMENT],&lt;/span&gt;
    &lt;span class="py"&gt;resource_count&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[47]&lt;/span&gt;
&lt;span class="err"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is enough to satisfy the first audit question. The auditor knows who the user was, what they did, to which patient, in what application, under what declared purpose, and that the access succeeded. The timestamp is in the event envelope. Forty-seven observations were returned. The minimum-necessary standard is defensible because the purpose of use is recorded; if the purpose were ever set to a non-treatment value, the policy engine would have decided differently.&lt;br&gt;
The event that matters even more is the denial. Most teams forget to emit one. Auditors do not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authorization denial&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;event_name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;NOTE_READ&lt;/span&gt;
&lt;span class="py"&gt;application_name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;EHR&lt;/span&gt;
&lt;span class="py"&gt;action_category&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;READ&lt;/span&gt;
&lt;span class="py"&gt;user_role_type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;CLINICIAN&lt;/span&gt;
&lt;span class="py"&gt;operation_outcome&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;DENIED&lt;/span&gt;
&lt;span class="py"&gt;user_identity&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;jchen.md&lt;/span&gt;
&lt;span class="py"&gt;tenant_identifier&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;7b2e9f04-5a31-4d8c-9e72-1c4f8a6d5b29&lt;/span&gt;
&lt;span class="py"&gt;event_timestamp_utc&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;2026-04-27T00:00:00Z&lt;/span&gt;
&lt;span class="py"&gt;attributes&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[&lt;/span&gt;
    &lt;span class="py"&gt;patient_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[PT-9182734],&lt;/span&gt;
    &lt;span class="py"&gt;purpose_of_use&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[TREATMENT],&lt;/span&gt;
    &lt;span class="py"&gt;deny_reason&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[BEHAVIORAL_HEALTH_SEGMENTATION]&lt;/span&gt;
&lt;span class="err"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This event is the proof that your data layer enforced consent. The denial reason names the specific rule that fired. When an auditor asks whether your system protects behavioral health records correctly, you do not show them documentation; you show them the denial events.&lt;br&gt;
When the AI copilot reaches data on the user's behalf, the same event shape covers the access. Two fields establish the chain of accountability: the agent identifier, and the user the agent is acting on behalf of.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent acting on behalf of a user&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;event_name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;SEMANTIC_SEARCH_NOTES_READ&lt;/span&gt;
&lt;span class="py"&gt;application_name&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;AI_COPILOT&lt;/span&gt;
&lt;span class="py"&gt;action_category&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;READ&lt;/span&gt;
&lt;span class="py"&gt;user_role_type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;CLINICIAN&lt;/span&gt;
&lt;span class="py"&gt;operation_outcome&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;SUCCESS&lt;/span&gt;
&lt;span class="py"&gt;user_identity&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;jchen.md&lt;/span&gt;
&lt;span class="py"&gt;tenant_identifier&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;7b2e9f04-5a31-4d8c-9e72-1c4f8a6d5b29&lt;/span&gt;
&lt;span class="py"&gt;event_timestamp_utc&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;2026-04-27T00:00:00Z&lt;/span&gt;
&lt;span class="py"&gt;attributes&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[&lt;/span&gt;
    &lt;span class="py"&gt;patient_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[PT-9182734],&lt;/span&gt;
    &lt;span class="py"&gt;purpose_of_use&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[TREATMENT],&lt;/span&gt;
    &lt;span class="py"&gt;on_behalf_of&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[jchen.md],&lt;/span&gt;
    &lt;span class="py"&gt;agent_id&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[chart_copilot],&lt;/span&gt;
    &lt;span class="py"&gt;resource_count&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;[6]&lt;/span&gt;
&lt;span class="err"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent retrieved six notes. The user is jchen.md. The on_behalf_of field is also jchen.md. That equality, recorded in the audit event itself, is the proof that the agent did not exceed the user's permissions. If on_behalf_of ever differed from user_identity, or were absent, that is the finding. No prose needed.&lt;br&gt;
Notice what is not in any of these events. There is no prompt content, no token count, no model output, no embedding vector, no hashes. Those belong in a prompt store and an observability store, linked by ID. The audit event records the decision. Mixing the two is the failure mode that produces 50-field audit records that are expensive to store and useless to read.&lt;/p&gt;

&lt;h2&gt;The Reframe&lt;/h2&gt;

&lt;p&gt;Most clinical AI teams approach compliance as a layer they add late, often under pressure from a security review or a customer's procurement process. This works in non-regulated AI domains because the cost of getting compliance wrong is reputational. In healthcare, the cost is patient harm, regulatory action, and existential risk to the organization.&lt;br&gt;
The systems that ship and survive are the ones designed, from day one, to answer an auditor's questions in minutes. Authorization, audit, and lineage are not features bolted onto a working system; they are the load-bearing structure of the system itself. Build them first, and the AI fits. Build them last, and the AI does not ship.&lt;br&gt;
If you are responsible for a clinical AI system, the most useful exercise you can do this week is to walk through these six questions for a single response your agent produced last week. If you cannot answer them in an afternoon, you know where the work is.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>healthcare</category>
      <category>hipaa</category>
    </item>
    <item>
      <title>Solving Tool Integration and Orchestration in AI Agents with MCP</title>
      <dc:creator>Jobinesh Purushothaman</dc:creator>
      <pubDate>Mon, 20 Apr 2026 03:47:41 +0000</pubDate>
      <link>https://dev.to/jobinesh/solving-tool-integration-and-orchestration-in-ai-agents-with-mcp-3gpn</link>
      <guid>https://dev.to/jobinesh/solving-tool-integration-and-orchestration-in-ai-agents-with-mcp-3gpn</guid>
      <description>&lt;p&gt;Once you move beyond simple LLM demos, the complexity shifts from the model to everything around it. The real problem becomes how your system interacts with tools, APIs, and data in a way the model can reliably use.&lt;/p&gt;

&lt;p&gt;Most implementations handle this by wiring tools directly into the application layer. That usually leads to duplicated definitions, hardcoded execution paths, and tightly coupled logic. It works at small scale, but breaks down as the number of tools and use cases grows.&lt;/p&gt;

&lt;p&gt;This is the gap the &lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;Model Context Protocol (MCP)&lt;/a&gt; is trying to solve.&lt;/p&gt;

&lt;h2&gt;What MCP Actually Does&lt;/h2&gt;

&lt;p&gt;MCP is a standard way to expose tools and data to a model.&lt;/p&gt;

&lt;p&gt;Instead of embedding tool logic inside every app, you define them once and expose them through an MCP server. Any agent or client can connect to it and use those capabilities.&lt;/p&gt;

&lt;p&gt;This separates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;capabilities (tools) → MCP server&lt;/li&gt;
&lt;li&gt;decision-making → agent&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why This Matters for Agents&lt;/h2&gt;

&lt;p&gt;Most so-called agents are still structured like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;intent&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;create_ticket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;createTicket&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s just routing logic.&lt;/p&gt;

&lt;p&gt;An actual agent should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;choose tools dynamically&lt;/li&gt;
&lt;li&gt;decide the sequence of actions&lt;/li&gt;
&lt;li&gt;adapt based on results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For that to work, tools need to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;discoverable&lt;/li&gt;
&lt;li&gt;structured&lt;/li&gt;
&lt;li&gt;decoupled from application logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s exactly what MCP enables.&lt;/p&gt;

&lt;h2&gt;Where JSON-RPC Fits In&lt;/h2&gt;

&lt;p&gt;MCP uses &lt;a href="https://www.jsonrpc.org/specification" rel="noopener noreferrer"&gt;JSON-RPC 2.0&lt;/a&gt; as its communication layer.&lt;/p&gt;

&lt;p&gt;It’s a simple protocol for calling functions using JSON. Nothing fancy, but very effective for this use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;JSON-RPC Request&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jsonrpc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tools/call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"42"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;JSON-RPC Response&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"jsonrpc"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"John Doe"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"john@example.com"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the core interaction. MCP builds on top of this structure.&lt;/p&gt;
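&lt;p&gt;Tool discovery rides on the same envelope: before calling anything, an MCP client sends a tools/list request (a method MCP defines) to learn what the server exposes. Sketched in Python for clarity:&lt;/p&gt;

```python
# Sketch of the discovery step MCP layers on the same JSON-RPC envelope:
# a client asks the server what tools exist via the "tools/list" method.
import json

request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/list",
    "params": {},
}
wire = json.dumps(request)
# A conforming server answers with a result containing a "tools" array,
# e.g. entries for "get_user" and "create_ticket" with their input schemas.
```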

&lt;h2&gt;Minimal MCP Setup&lt;/h2&gt;

&lt;p&gt;Define tools once on the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;mcpServer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;get_user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;user_id&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="nx"&gt;mcpServer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;create_ticket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;issue&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;jira&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This becomes your reusable capability layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Execution Looks Like
&lt;/h2&gt;

&lt;p&gt;Input:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User 42 has a billing issue&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Agent flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;call get_user&lt;/li&gt;
&lt;li&gt;call create_ticket&lt;/li&gt;
&lt;li&gt;return response&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Under the Hood (JSON-RPC Calls)
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tools/call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"42"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tools/call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"create_ticket"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"issue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Billing issue"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no predefined workflow here. The agent decides what to do based on available tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP + JSON-RPC Works
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;tools are defined once and reused&lt;/li&gt;
&lt;li&gt;no repeated integration logic&lt;/li&gt;
&lt;li&gt;agents can chain calls naturally&lt;/li&gt;
&lt;li&gt;clean separation between execution and decision-making&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what makes systems feel more agentic instead of scripted.&lt;/p&gt;

&lt;h2&gt;
  
  
  Without vs With MCP
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Without MCP&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hardcoded flows&lt;/li&gt;
&lt;li&gt;duplicated integrations&lt;/li&gt;
&lt;li&gt;tightly coupled logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With MCP&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shared tool layer&lt;/li&gt;
&lt;li&gt;dynamic execution&lt;/li&gt;
&lt;li&gt;cleaner architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What MCP Does Not Handle
&lt;/h2&gt;

&lt;p&gt;MCP does not solve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;authentication&lt;/li&gt;
&lt;li&gt;validation&lt;/li&gt;
&lt;li&gt;safety&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those still need to be implemented at the tool level.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;select&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Only read queries allowed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  When This Makes Sense
&lt;/h2&gt;

&lt;p&gt;Use MCP when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multiple tools are involved&lt;/li&gt;
&lt;li&gt;tools need to be reused across systems&lt;/li&gt;
&lt;li&gt;building agent-style workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Skip it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scope is small&lt;/li&gt;
&lt;li&gt;only a few functions are needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;This shifts the model from:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;calling predefined functions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;interacting with a system of capabilities&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That shift is what enables real agent behavior instead of scripted flows.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Building a Simple AI Agent with Micronaut, MCP, and LangChain4j</title>
      <dc:creator>Jobinesh Purushothaman</dc:creator>
      <pubDate>Tue, 14 Apr 2026 04:52:32 +0000</pubDate>
      <link>https://dev.to/jobinesh/building-a-simple-ai-agent-with-micronaut-mcp-and-langchain4j-21k6</link>
      <guid>https://dev.to/jobinesh/building-a-simple-ai-agent-with-micronaut-mcp-and-langchain4j-21k6</guid>
      <description>&lt;p&gt;An AI agent is a system that uses a language model to understand instructions, decide on actions, and execute them using available tools.&lt;br&gt;
&lt;strong&gt;In practice, what does this look like?&lt;/strong&gt;&lt;br&gt;
In this article, we build a simple task-management AI agent in Java using Micronaut, LangChain4j, and Model Context Protocol (MCP). It demonstrates how an agent interprets natural language, selects the right action, and executes it safely through a structured tool interface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full code for this project is available here:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/jobinesh/java-ai-lab/tree/main/task-agent" rel="noopener noreferrer"&gt;https://github.com/jobinesh/java-ai-lab/tree/main/task-agent&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/jobinesh/java-ai-lab/tree/main/task-mcp-server" rel="noopener noreferrer"&gt;https://github.com/jobinesh/java-ai-lab/tree/main/task-mcp-server&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1yqr249zv9t1io8lu7p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1yqr249zv9t1io8lu7p.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What is an AI Agent?
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;AI agent&lt;/strong&gt; is a software system that uses a language model to interpret user input, reason about it, and take actions by invoking tools or APIs.&lt;/p&gt;

&lt;p&gt;At a minimum, an AI agent consists of:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A reasoning model&lt;/strong&gt; – typically an LLM that understands user instructions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A set of tools&lt;/strong&gt; – functions or APIs the agent can invoke&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An execution loop&lt;/strong&gt; – a cycle of
understand → decide → act → return result&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this project, that loop is implemented cleanly and explicitly.&lt;/p&gt;
&lt;h3&gt;
  
  
  How the agent works in this repo
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;TaskPlannerAiService&lt;/code&gt; (LangChain4j) prompts the model to produce a &lt;strong&gt;single structured JSON tool call&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;TaskAgentOrchestrator&lt;/code&gt; parses and validates that JSON&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;McpTaskClient&lt;/code&gt; executes the selected tool via MCP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design enforces an important rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The model &lt;strong&gt;never directly modifies business data&lt;/strong&gt;.&lt;br&gt;
It only decides &lt;em&gt;what should happen&lt;/em&gt;, while the system controls &lt;em&gt;how it happens&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
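&lt;p&gt;This rule can be sketched in a few lines of plain Java. The sketch is illustrative only: the class and tool names are assumptions rather than the repo's actual code, and real dispatch happens over MCP rather than an in-process switch.&lt;/p&gt;

```java
// Illustrative sketch: the model only names a tool; the system validates
// the choice against a fixed set and performs the call itself.
public class AgentDispatch {

    // Route a model decision to a known tool; reject anything else,
    // so the model can never trigger an action the system did not define.
    public static String execute(String tool, String argument) {
        switch (tool) {
            case "create-task":
                return "created:" + argument; // stand-in for the real tool call
            case "list-tasks":
                return "task-list";
            default:
                throw new IllegalArgumentException("Unknown tool: " + tool);
        }
    }

    public static void main(String[] args) {
        System.out.println(execute("create-task", "Buy milk"));
    }
}
```

&lt;p&gt;The model's JSON output supplies only the tool name and its arguments; everything that actually runs is code the system owns.&lt;/p&gt;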


&lt;h2&gt;
  
  
  What is MCP (Model Context Protocol)?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; is a standardized protocol that defines how AI agents interact with external tools and services in a structured and reliable way.&lt;/p&gt;

&lt;p&gt;Without MCP, applications often implement custom tool-calling formats, leading to inconsistent integrations and fragile systems.&lt;/p&gt;

&lt;p&gt;MCP provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;standard interface for exposing tools&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Structured schemas for tool arguments&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;JSON-RPC-based communication model&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A clean separation between &lt;strong&gt;AI decision-making&lt;/strong&gt; and &lt;strong&gt;system execution&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Why MCP matters
&lt;/h3&gt;

&lt;p&gt;In this project, MCP provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A stable tool interface (&lt;code&gt;create-task&lt;/code&gt;, &lt;code&gt;list-tasks&lt;/code&gt;, &lt;code&gt;complete-task&lt;/code&gt;, etc.)&lt;/li&gt;
&lt;li&gt;Structured and validated arguments&lt;/li&gt;
&lt;li&gt;A predictable lifecycle (&lt;code&gt;initialize&lt;/code&gt; → &lt;code&gt;tools/call&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Loose coupling between the agent and backend services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In simple terms:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;MCP is the &lt;strong&gt;contract between AI reasoning and real-world actions&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Project Architecture
&lt;/h2&gt;

&lt;p&gt;This project is split into two modules:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. &lt;code&gt;task-mcp-server&lt;/code&gt; — MCP Tool Server (Micronaut)
&lt;/h3&gt;

&lt;p&gt;This module exposes task-related operations as MCP tools using Micronaut.&lt;/p&gt;

&lt;p&gt;Tools are defined using annotations like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;@Tool(name = "create-task")&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;@Tool(name = "list-tasks")&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;@Tool(name = "complete-task")&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;@Tool(name = "set-priority")&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All tools operate on an in-memory &lt;code&gt;TaskStore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;💡 &lt;strong&gt;Key detail:&lt;/strong&gt;&lt;br&gt;
Both REST APIs and MCP tools share the same store. So:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data created via REST is visible to MCP&lt;/li&gt;
&lt;li&gt;Data created via MCP is visible to REST&lt;/li&gt;
&lt;/ul&gt;
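&lt;p&gt;A toy version of that shared store shows why both entry points see the same data. The names and shape here are assumptions for illustration; the real &lt;code&gt;TaskStore&lt;/code&gt; holds task objects, not strings.&lt;/p&gt;

```java
// Illustrative in-memory store. Both the REST controller and the MCP
// tool layer would hold the same singleton instance, so a write from
// either side is immediately visible to the other.
public class TaskStore {
    private final StringBuilder tasks = new StringBuilder();

    public synchronized void create(String title) {
        if (tasks.length() > 0) {
            tasks.append(", ");
        }
        tasks.append(title);
    }

    public synchronized String list() {
        return tasks.toString();
    }

    public static void main(String[] args) {
        TaskStore shared = new TaskStore();
        shared.create("Buy milk");         // e.g. created via an MCP tool
        System.out.println(shared.list()); // e.g. read via the REST API
    }
}
```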
&lt;h3&gt;
  
  
  2. &lt;code&gt;task-agent&lt;/code&gt; — AI Agent Runtime
&lt;/h3&gt;

&lt;p&gt;This module contains the AI-driven decision-making layer.&lt;/p&gt;
&lt;h4&gt;
  
  
  Skills as configuration
&lt;/h4&gt;

&lt;p&gt;Instead of hardcoding behavior, the agent uses a &lt;code&gt;skills.md&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@SystemMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fromResource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"skills.md"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@UserMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"User instruction: {{instruction}}"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;plan&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@V&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"instruction"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update agent behavior without recompiling&lt;/li&gt;
&lt;li&gt;Define tool usage rules in Markdown&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Orchestration layer
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;TaskAgentOrchestrator&lt;/code&gt; is responsible for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parsing model output&lt;/li&gt;
&lt;li&gt;Validating JSON structure&lt;/li&gt;
&lt;li&gt;Applying safe defaults&lt;/li&gt;
&lt;li&gt;Calling MCP tools via &lt;code&gt;McpTaskClient&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  MCP client
&lt;/h4&gt;

&lt;p&gt;&lt;code&gt;McpTaskClient&lt;/code&gt; communicates with the MCP server using JSON-RPC:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Endpoint: &lt;code&gt;http://127.0.0.1:8080/mcp&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Flow: &lt;code&gt;initialize&lt;/code&gt; → &lt;code&gt;tools/call&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
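&lt;p&gt;Both calls in that flow are ordinary JSON-RPC payloads. A minimal sketch of how a client might assemble them (string formatting only, no HTTP; the exact &lt;code&gt;params&lt;/code&gt; accepted by &lt;code&gt;initialize&lt;/code&gt; vary by MCP version):&lt;/p&gt;

```java
// Illustrative builders for the two JSON-RPC requests the MCP client
// sends: "initialize" once at startup, then "tools/call" per action.
public class McpPayloads {

    public static String initialize(int id) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
             + ",\"method\":\"initialize\",\"params\":{}}";
    }

    public static String toolsCall(int id, String tool, String argsJson) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
             + ",\"method\":\"tools/call\",\"params\":{\"name\":\"" + tool
             + "\",\"arguments\":" + argsJson + "}}";
    }

    public static void main(String[] args) {
        System.out.println(initialize(1));
        System.out.println(toolsCall(2, "create-task", "{\"title\":\"Buy milk\"}"));
    }
}
```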

&lt;h2&gt;
  
  
  End-to-End Flow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Example instruction
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Create task Buy milk with high priority and tag home"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Execution steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Agent sends instruction + skills definition to the model&lt;/li&gt;
&lt;li&gt;Model returns structured JSON:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"create-task"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Buy milk"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HIGH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"home"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Orchestrator parses and validates the JSON&lt;/li&gt;
&lt;li&gt;MCP client calls &lt;code&gt;create-task&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;MCP server executes and returns the result&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why This Pattern Works
&lt;/h2&gt;

&lt;p&gt;This architecture is simple but powerful.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add new tools without changing agent logic&lt;/li&gt;
&lt;li&gt;Update behavior via &lt;code&gt;skills.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Swap LLM providers easily&lt;/li&gt;
&lt;li&gt;Keep execution deterministic and safe&lt;/li&gt;
&lt;li&gt;Avoid unpredictable model side effects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of letting the model “do everything,” you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Let the model &lt;strong&gt;decide&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Let your system &lt;strong&gt;execute&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Running the Project Locally
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Start MCP server
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;task-mcp-server
mvn &lt;span class="nb"&gt;exec&lt;/span&gt;:java
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Start agent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;task-agent
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;your-key&amp;gt; mvn &lt;span class="nb"&gt;exec&lt;/span&gt;:java
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Call the agent
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sS&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://127.0.0.1:8081/api/agent/run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'content-type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"instruction":"Create task Buy milk with high priority and tag home"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Inspect skills
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sS&lt;/span&gt; http://127.0.0.1:8081/api/agent/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Repositories
&lt;/h2&gt;

&lt;p&gt;Here are the key modules used in this article and what they do:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;task-mcp-server&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/jobinesh/java-ai-lab/tree/main/task-mcp-server" rel="noopener noreferrer"&gt;https://github.com/jobinesh/java-ai-lab/tree/main/task-mcp-server&lt;/a&gt;&lt;br&gt;
Micronaut-based MCP server that exposes task management tools (create-task, list-tasks, etc.) via MCP and REST. This is where all actual business logic executes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;task-agent&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/jobinesh/java-ai-lab/tree/main/task-agent" rel="noopener noreferrer"&gt;https://github.com/jobinesh/java-ai-lab/tree/main/task-agent&lt;/a&gt;&lt;br&gt;
LangChain4j-based AI agent that interprets user instructions, decides which tool to call, and invokes MCP endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Takeaway
&lt;/h2&gt;

&lt;p&gt;Think of the system in three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI Agent (LangChain4j):&lt;/strong&gt; decides &lt;em&gt;what to do&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Server (Micronaut):&lt;/strong&gt; defines &lt;em&gt;what can be done&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business Logic:&lt;/strong&gt; ensures &lt;em&gt;how it is done safely&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That separation is the key to building reliable AI systems.&lt;/p&gt;

&lt;p&gt;It keeps your &lt;strong&gt;AI flexible&lt;/strong&gt;, your &lt;strong&gt;APIs stable&lt;/strong&gt;, and your &lt;strong&gt;business logic safe&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;If you're exploring AI agents in Java, this pattern is a great starting point—and a solid foundation for production-grade systems.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>micronaut</category>
      <category>langchain</category>
    </item>
    <item>
      <title>Designing a Scalable Recovery Service for Distributed Systems</title>
      <dc:creator>Jobinesh Purushothaman</dc:creator>
      <pubDate>Sun, 12 Apr 2026 01:08:24 +0000</pubDate>
      <link>https://dev.to/jobinesh/designing-a-scalable-recovery-service-for-distributed-systems-1oio</link>
      <guid>https://dev.to/jobinesh/designing-a-scalable-recovery-service-for-distributed-systems-1oio</guid>
      <description>&lt;p&gt;Failures are a normal and expected part of distributed systems.&lt;/p&gt;

&lt;p&gt;If your application processes data asynchronously—for example, by consuming messages from Kafka or running background jobs—failures will happen regularly. A service might crash, a downstream dependency might become unavailable, or the data itself might fail validation.&lt;/p&gt;

&lt;p&gt;If these failures are not handled properly, your system can lose data, create duplicates, or leave you with no visibility into what went wrong.&lt;/p&gt;

&lt;p&gt;This article explains a simple and practical approach to building a recovery service that helps you handle failures in a reliable and scalable way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What problem are we solving?
&lt;/h2&gt;

&lt;p&gt;Before going deeper, let us clarify a few terms.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An event or message is simply a unit of work your system processes, such as a Kafka message.&lt;/li&gt;
&lt;li&gt;Asynchronous processing means this work is handled in the background, not immediately in a user request.&lt;/li&gt;
&lt;li&gt;A failure occurs when this processing does not complete successfully.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In many systems, failures are handled by retrying immediately. However, this approach is not sufficient in real-world systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why simple retries are not enough
&lt;/h2&gt;

&lt;p&gt;In-memory retries are useful, but they do not solve real-world problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the service crashes, all in-progress retries are lost.&lt;/li&gt;
&lt;li&gt;Some failures should be retried later, not immediately.&lt;/li&gt;
&lt;li&gt;There is no persistent record of what failed and how many times it was retried.&lt;/li&gt;
&lt;li&gt;There is no clear final state, which can lead to endless retry loops.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To address these issues, failures must be &lt;strong&gt;persisted and handled asynchronously&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a recovery service?
&lt;/h2&gt;

&lt;p&gt;A recovery service is a background component responsible for handling failed work.&lt;/p&gt;

&lt;p&gt;It performs three main functions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It stores failed tasks in a database.&lt;/li&gt;
&lt;li&gt;It retries them later in a controlled manner.&lt;/li&gt;
&lt;li&gt;It tracks the final outcome of each task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of retrying immediately, the system records the failure and processes it later using dedicated worker processes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwxlgt0vakzhyqy2dxcm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmwxlgt0vakzhyqy2dxcm.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How failures are stored
&lt;/h2&gt;

&lt;p&gt;Each failure is stored as a &lt;strong&gt;recovery task&lt;/strong&gt; in a database table. &lt;br&gt;
A recovery task typically contains the following information.&lt;/p&gt;
&lt;h3&gt;
  
  
  Failure context
&lt;/h3&gt;

&lt;p&gt;This includes details about what failed, such as the event type and the payload being processed.&lt;/p&gt;
&lt;h3&gt;
  
  
  Lifecycle status
&lt;/h3&gt;

&lt;p&gt;This represents the outcome of the task and can be one of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;FAILED&lt;/code&gt;, meaning it is waiting to be retried&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RESOLVED&lt;/code&gt;, meaning it was successfully recovered&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PERMANENT_FAILURE&lt;/code&gt;, meaning it will not be retried again&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Retry metadata
&lt;/h3&gt;

&lt;p&gt;This includes how many times the task has been retried and when it should be retried next.&lt;/p&gt;
&lt;h3&gt;
  
  
  Lock information
&lt;/h3&gt;

&lt;p&gt;This indicates which worker is currently processing the task and when the lock was acquired.&lt;/p&gt;
&lt;h2&gt;
  
  
  Important design principle: status and lock are different
&lt;/h2&gt;

&lt;p&gt;The lifecycle status and the execution lock represent different concepts.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;status&lt;/strong&gt; describes the business outcome of the task.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;lock&lt;/strong&gt; indicates which worker is currently processing the task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keeping these two concepts separate helps avoid race conditions and keeps the design clear.&lt;/p&gt;
&lt;h2&gt;
  
  
  How multiple workers process tasks safely
&lt;/h2&gt;

&lt;p&gt;To scale the system, multiple worker instances can run in parallel. The challenge is to ensure that the same task is not processed more than once at the same time.&lt;/p&gt;

&lt;p&gt;This is solved using database row locking.&lt;/p&gt;

&lt;p&gt;Workers use a query like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;recovery_task&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'FAILED'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;next_retry_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;NOW&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;SKIP&lt;/span&gt; &lt;span class="n"&gt;LOCKED&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query locks the selected rows and ensures that other workers skip them.&lt;/p&gt;

&lt;p&gt;As a result, each worker processes a different set of tasks, and no central coordination mechanism is required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Worker lifecycle
&lt;/h2&gt;

&lt;p&gt;Each recovery worker follows a simple loop.&lt;/p&gt;

&lt;p&gt;First, it fetches a batch of failed tasks.&lt;br&gt;
Then, it processes each task using the appropriate recovery logic.&lt;br&gt;
Finally, it updates the task based on the outcome.&lt;/p&gt;

&lt;p&gt;If the processing succeeds, the task is marked as &lt;code&gt;RESOLVED&lt;/code&gt;.&lt;br&gt;
If it fails but can be retried, the system schedules the next retry.&lt;br&gt;
If the maximum number of retries is reached, the task is marked as &lt;code&gt;PERMANENT_FAILURE&lt;/code&gt;.&lt;/p&gt;
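&lt;p&gt;That outcome handling reduces to a small retry policy. A sketch in plain Java follows; the backoff base, the doubling factor, and the retry cap are arbitrary example values, not prescriptions.&lt;/p&gt;

```java
// Illustrative retry policy: exponential backoff plus a retry cap.
// Status names mirror the recovery task lifecycle described above.
public class RetryPolicy {
    static final int MAX_RETRIES = 5;
    static final long BASE_DELAY_SECONDS = 60;

    // Delay before attempt n (1-based): 60s, 120s, 240s, ...
    public static long backoffSeconds(int attempt) {
        return BASE_DELAY_SECONDS * (long) Math.pow(2, attempt - 1);
    }

    // Lifecycle status to store after a failed attempt.
    public static String statusAfterFailure(int retryCount) {
        if (retryCount >= MAX_RETRIES) {
            return "PERMANENT_FAILURE"; // no further retries
        }
        return "FAILED"; // eligible to be picked up again later
    }

    public static void main(String[] args) {
        System.out.println(backoffSeconds(3));     // 240
        System.out.println(statusAfterFailure(5)); // PERMANENT_FAILURE
    }
}
```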

&lt;h2&gt;
  
  
  Keeping the system generic
&lt;/h2&gt;

&lt;p&gt;The recovery system should not contain business-specific logic.&lt;/p&gt;

&lt;p&gt;Instead, responsibilities should be clearly separated.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;worker&lt;/strong&gt; is responsible for coordination, retry handling, and state updates.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;handler&lt;/strong&gt; is responsible for the actual recovery logic. For example, a handler might republish a Kafka message or re-trigger a failed operation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation makes the recovery system reusable across different parts of the application.&lt;/p&gt;
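&lt;p&gt;The split can be expressed as a one-method handler interface. The interface and method names below are illustrative, not taken from a specific framework.&lt;/p&gt;

```java
// Illustrative worker/handler split: the worker owns retries and state
// transitions; the handler owns only the domain-specific recovery action.
interface RecoveryHandler {
    boolean recover(String payload); // true means the task was recovered
}

public class RecoveryWorker {

    // The worker records the outcome; it knows nothing about Kafka,
    // HTTP, or whatever else the handler actually does.
    public static String process(RecoveryHandler handler, String payload) {
        if (handler.recover(payload)) {
            return "RESOLVED";
        }
        return "FAILED"; // retry metadata would be updated here
    }

    public static void main(String[] args) {
        RecoveryHandler republish = payload -> true; // stub: republish succeeded
        System.out.println(process(republish, "order-123"));
    }
}
```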

&lt;h2&gt;
  
  
  Why this works without leader election
&lt;/h2&gt;

&lt;p&gt;Some systems use leader election to ensure that only one instance performs certain tasks.&lt;/p&gt;

&lt;p&gt;In this design, leader election is not required because the work is divided at the database row level.&lt;/p&gt;

&lt;p&gt;Each worker processes different rows, and the database ensures that no two workers process the same task at the same time.&lt;/p&gt;

&lt;p&gt;This approach allows the system to scale horizontally without introducing additional coordination complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Safety guarantees
&lt;/h2&gt;

&lt;p&gt;This design provides several important guarantees.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It prevents duplicate processing through database locks.&lt;/li&gt;
&lt;li&gt;It allows recovery from worker crashes through lock expiration.&lt;/li&gt;
&lt;li&gt;It provides full visibility into failures and retries.&lt;/li&gt;
&lt;li&gt;It ensures that each task reaches a clear final state.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Practical tips
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;All updates, such as claiming tasks and updating results, should be performed within transactions.&lt;/li&gt;
&lt;li&gt;Recovery handlers should be idempotent so that repeated execution does not cause issues.&lt;/li&gt;
&lt;li&gt;Useful debugging information, such as error messages and request identifiers, should be stored.&lt;/li&gt;
&lt;li&gt;Retry limits should be clearly defined.&lt;/li&gt;
&lt;li&gt;Metrics should be added to monitor system behavior.&lt;/li&gt;
&lt;/ul&gt;
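&lt;p&gt;Handler idempotency can be as simple as keying each recovery action on the task identifier. A minimal sketch, where the in-memory set stands in for a unique constraint or deduplication table:&lt;/p&gt;

```python
processed = set()  # in production: a unique key or dedup table, not memory

def handle_once(task_id, action):
    """Run action for task_id at most once, even if the worker retries
    after crashing between doing the work and recording the result."""
    if task_id in processed:
        return "skipped"
    action(task_id)
    processed.add(task_id)
    return "executed"
```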

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;Failures are unavoidable in distributed systems, especially when processing data asynchronously.&lt;/p&gt;

&lt;p&gt;Instead of relying only on retries, introducing a dedicated recovery service allows you to handle failures in a controlled and reliable way.&lt;/p&gt;

&lt;p&gt;This approach improves system reliability, scalability, and observability, and it forms an essential part of building production-ready systems.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>distributedsystems</category>
      <category>sre</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Understanding AI Metering in Enterprise Systems</title>
      <dc:creator>Jobinesh Purushothaman</dc:creator>
      <pubDate>Wed, 08 Apr 2026 01:19:13 +0000</pubDate>
      <link>https://dev.to/jobinesh/understanding-ai-metering-in-enterprise-systems-4b7f</link>
      <guid>https://dev.to/jobinesh/understanding-ai-metering-in-enterprise-systems-4b7f</guid>
      <description>&lt;p&gt;As AI becomes part of everyday workflows, organizations need a simple way to understand how it is being used. It is no longer enough to know that an AI feature exists or that users are interacting with it. Teams also need visibility into how usage is measured, how access is governed, and how consumption maps to what has been purchased or allocated.&lt;/p&gt;

&lt;p&gt;This is where AI metering helps.&lt;/p&gt;

&lt;p&gt;AI metering is a structured way to track AI consumption across products, teams, and workflows. It gives organizations a practical view of usage, entitlement, reporting, and planning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Metering Matters
&lt;/h2&gt;

&lt;p&gt;AI adoption in enterprise systems is rarely uniform. Some workflows use AI occasionally. Others depend on it heavily. Without a common way to measure usage, organizations end up with fragmented visibility. Different teams see different signals, but no one gets a clear picture of overall consumption.&lt;/p&gt;

&lt;p&gt;A metering model helps solve that. It gives organizations a consistent way to measure usage across multiple AI capabilities and answer practical questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How much AI capacity is available?&lt;/li&gt;
&lt;li&gt;How much has been consumed?&lt;/li&gt;
&lt;li&gt;Which capabilities are driving usage?&lt;/li&gt;
&lt;li&gt;How should teams plan for growth, renewal, or limits?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This visibility is useful not only for finance and operations, but also for product teams, administrators, and customers.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Activity to Measurable Consumption
&lt;/h2&gt;

&lt;p&gt;A core idea in AI metering is that meaningful AI activity should translate into measurable consumption. Once that is done consistently, usage can be tracked across different AI capabilities, even when those capabilities do different kinds of work.&lt;/p&gt;

&lt;p&gt;This matters because not all AI interactions are equal. Some tasks are lightweight and frequent. Others are more complex or resource-intensive. A useful metering model reflects those differences through predefined consumption rules.&lt;/p&gt;

&lt;p&gt;That turns raw activity into something more useful: a structured view of consumption that supports reporting, analysis, and decision-making.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Role of AI Credits
&lt;/h2&gt;

&lt;p&gt;One practical way to manage AI consumption is to use a normalized unit such as AI credits. AI credits create a shared language for measuring different types of AI usage under one model.&lt;/p&gt;

&lt;p&gt;This makes it easier to report usage consistently, connect consumption to entitlement, and compare activity across multiple AI capabilities. The exact term is less important than the idea behind it: a common measure that makes different kinds of AI usage easier to understand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Shared Credit Pools Help
&lt;/h2&gt;

&lt;p&gt;In many systems, it is more practical to manage AI consumption through a shared pool instead of tying usage rigidly to one user or one workflow. A shared pool gives organizations flexibility. As priorities shift and adoption grows, capacity can be used where it adds the most value.&lt;/p&gt;

&lt;p&gt;This is especially useful in enterprise environments where different teams adopt AI at different times. A pooled model reduces friction and makes it easier to scale usage without repeated changes to entitlement structures.&lt;/p&gt;

&lt;p&gt;The main benefit is simple: pooled consumption supports flexibility while still keeping usage measurable and governed.&lt;/p&gt;

&lt;h2&gt;
  
  
  How AI Metering Works
&lt;/h2&gt;

&lt;p&gt;At a high level, AI metering starts by identifying what a customer or organization is allowed to use. That entitlement may come from a subscription, package, contract, or internal allocation model. It defines the amount of AI capacity available and the scope of AI capabilities included.&lt;/p&gt;

&lt;p&gt;When an AI workflow runs, the metering process identifies which capability was used and which customer, tenant, or organization should be associated with that activity. It then checks whether the usage falls within the valid access scope.&lt;/p&gt;

&lt;p&gt;Once the activity is valid, the system applies a predefined consumption rule. That rule determines how much of the available AI credit balance should be counted for the interaction. The amount may vary depending on the type of task, the capability involved, or the selected consumption model.&lt;/p&gt;

&lt;p&gt;After the usage amount is calculated, the remaining balance is updated. The interaction is also recorded so there is a reliable history for reporting, review, and reconciliation.&lt;/p&gt;

&lt;p&gt;Finally, usage data is published to reporting or analytics systems so stakeholders can monitor adoption, understand trends, and track how credits are being used over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In simple terms, the flow looks like this:&lt;/strong&gt;&lt;br&gt;
[Entitlement Check] -&amp;gt; [Usage Detection] -&amp;gt; [Rule Lookup] -&amp;gt; [Credit Calculation] -&amp;gt; [Balance Update] -&amp;gt; [Audit Record] -&amp;gt; [Reporting]&lt;/p&gt;
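&lt;p&gt;That flow can be sketched as one function walking a usage event through each stage. Every structure and field name here is illustrative, not a real metering API:&lt;/p&gt;

```python
def meter_usage(event, entitlement, rules, ledger):
    """Walk one usage event through the metering pipeline sketched above."""
    # Entitlement check: is the capability within the valid access scope?
    if event["capability"] not in entitlement["capabilities"]:
        raise PermissionError("capability not covered by entitlement")

    # Rule lookup and credit calculation.
    rule = rules[event["capability"]]
    credits = rule["credits_per_call"] * event.get("units", 1)

    # Balance update plus an audit record for later reconciliation.
    entitlement["balance"] -= credits
    ledger.append({"tenant": event["tenant"],
                   "capability": event["capability"],
                   "credits": credits})
    return credits
```

&lt;p&gt;A reporting step would then read from the ledger rather than recomputing usage from raw events.&lt;/p&gt;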

&lt;h2&gt;
  
  
  A High-Level Algorithm for AI Metering
&lt;/h2&gt;

&lt;p&gt;A simple way to think about AI metering is through this business-level flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Identify the active entitlement and available AI credits.&lt;/li&gt;
&lt;li&gt;Detect when an AI capability completes a meaningful usage event.&lt;/li&gt;
&lt;li&gt;Determine which capability generated the event and who owns the usage.&lt;/li&gt;
&lt;li&gt;Verify that the usage is covered by the current access scope.&lt;/li&gt;
&lt;li&gt;Retrieve the relevant consumption rule.&lt;/li&gt;
&lt;li&gt;Calculate the credits consumed for the event.&lt;/li&gt;
&lt;li&gt;Update the remaining balance and aggregate totals.&lt;/li&gt;
&lt;li&gt;Store a usage record for audit and reconciliation.&lt;/li&gt;
&lt;li&gt;Publish summarized usage to reporting or analytics systems.&lt;/li&gt;
&lt;li&gt;Check for conditions such as low balance, exhaustion, renewal, or adjustment.&lt;/li&gt;
&lt;li&gt;Update balances when subscriptions or allocations change.&lt;/li&gt;
&lt;li&gt;Support correction or replay when data arrives late or needs reconciliation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This algorithm stays intentionally high level. The goal is to explain the operating model, not the implementation details.&lt;/p&gt;
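&lt;p&gt;The condition checks near the end of that flow (low balance, exhaustion) lend themselves to a simple threshold classifier. The 10 percent low-water mark below is an assumed example value, not a recommendation:&lt;/p&gt;

```python
def check_balance(balance, capacity, low_water_ratio=0.1):
    """Classify the remaining credit balance against assumed thresholds."""
    if balance <= 0:
        return "exhausted"    # block or queue further usage
    if balance <= capacity * low_water_ratio:
        return "low_balance"  # notify administrators, suggest renewal
    return "ok"
```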

&lt;h2&gt;
  
  
  More Than Billing
&lt;/h2&gt;

&lt;p&gt;AI metering is often associated with billing, but it is useful well beyond that.&lt;/p&gt;

&lt;p&gt;A good metering model also supports governance, planning, transparency, and product insight. When usage is visible and structured, organizations can better understand adoption patterns, evaluate which capabilities are delivering value, identify overuse or underuse, and make better operational decisions.&lt;/p&gt;

&lt;p&gt;For administrators and customers, metering answers simple but important questions: what is available, what has been consumed, and what may need attention next. For product teams, it provides a clearer view of how AI capabilities are being used in real environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Practical Capability for Scaled AI Adoption
&lt;/h2&gt;

&lt;p&gt;As AI becomes more embedded in enterprise systems, organizations need a way to manage it with the same clarity they apply to other core services. That does not mean adding complexity for its own sake. It means adding enough structure to support visibility, accountability, and confident scaling.&lt;/p&gt;

&lt;p&gt;AI metering helps provide that structure.&lt;/p&gt;

&lt;p&gt;It turns AI usage into something measurable, reviewable, and easier to plan around. That is the main takeaway: AI metering helps organizations treat AI not as a black box, but as a service with understandable consumption and clearer operational control.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>metering</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
