DEV Community: Thuyavan

Moving Beyond Probabilistic Outputs: Designing AI for High-Stakes Reliability

Thuyavan — Fri, 05 Jun 2026 05:44:28 +0000

Many of the AI applications we interact with today are built on a streamlined, direct architecture:

User → Prompt → LLM → Response

That works surprisingly well for:

chat assistants,
summarization,
content generation,
and general productivity tooling.

High-stakes environments, where accuracy is non-negotiable, need more structural support than this pattern provides.

In specialized fields like healthcare or finance, a probabilistic response is not a minor inconvenience; it is a risk that has to be managed through system design.

I’ve spent the last few weeks exploring a decision-support architecture specifically tailored for these critical settings. The goal is to ensure that every output is grounded in fact, every recommendation is fully explainable, and every step of the reasoning process is auditable.

The central shift in this approach is viewing the Large Language Model (LLM) not as the entire system, but as a specialized component that requires clear boundaries and deterministic oversight.

Rethinking the Role of the Model

Standard AI architectures often rely heavily on the model's internal memory and prompt engineering. While impressive, LLMs are fundamentally designed to predict the next likely token, which introduces a level of uncertainty that can be challenging for regulated industries.

In sectors like compliance, policy-making, or healthcare, the model needs to be supported by a framework that provides authority and verification. The architecture itself acts as a safeguard, guiding the model's reasoning toward consistent and safe outcomes.

An Engineering-First Framework

This architecture treats the LLM as a "reasoning engine" situated within a larger, deterministic pipeline. The goal is a design that prioritizes visibility and control at every stage, built on the core philosophy of Orchestration Over Generation.

Instead of relying on an LLM to manage its own multi-step reasoning, the entire pipeline is managed by an Agent Orchestrator. This system uses an explicit, checkpointed state machine to run the case pipeline, providing total observability and natively supporting human-in-the-loop (HITL) interrupts.
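A minimal sketch of what an explicit, checkpointed state machine with a human-in-the-loop pause could look like. The step names, the `CaseState` fields, and the in-memory checkpoint store are all assumptions for illustration; the actual orchestrator is not shown in this post.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical step names; the real pipeline stages may differ.
STEPS = ["normalize", "retrieve", "reason", "verify"]


@dataclass
class CaseState:
    case_id: str
    next_step: int = 0                 # index into STEPS
    data: dict = field(default_factory=dict)
    awaiting_human: bool = False


class Orchestrator:
    def __init__(self, handlers: dict[str, Callable[[CaseState], bool]]):
        # Each handler may mutate state.data and returns True to request human review.
        self.handlers = handlers
        # In-memory stand-in for a durable checkpoint store: case_id -> completed steps.
        self.checkpoints: dict[str, list[str]] = {}

    def run(self, state: CaseState) -> CaseState:
        while state.next_step < len(STEPS) and not state.awaiting_human:
            name = STEPS[state.next_step]
            needs_review = self.handlers[name](state)
            state.next_step += 1
            # Checkpoint after every step so a run can be audited or resumed.
            self.checkpoints.setdefault(state.case_id, []).append(name)
            state.awaiting_human = needs_review
        return state

    def resume(self, state: CaseState) -> CaseState:
        # Called once a human has reviewed the paused case.
        state.awaiting_human = False
        return self.run(state)
```

Because the control flow lives in plain code rather than in the prompt, every transition is visible and the pause point is deterministic.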

The Knowledge and Data Layer

Rather than relying on the model to "know" facts, the system retrieves them from verified databases and structured records. To combat the semantic blurring common in a standard vector database, the data foundation is split into strict categories:

The Enterprise Fact Store: The immutable system of record for the entity (e.g., the client or case file). It sits on PostgreSQL and handles deterministic queries based on strict temporal logic.
The Taxonomy Server: Before the orchestrator searches for context, an Intake & Normalization node parses unstructured input and maps it to standardized industry codes.
The Knowledge Index: A hybrid retrieval system (vector + keyword) that searches over curated, versioned business rules and guidelines. Crucially, it returns passages with stable IDs to enforce strict citation.
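To illustrate how a hybrid index can merge vector and keyword results while keeping stable passage IDs, here is a small sketch using reciprocal rank fusion. RRF is my choice for the example, not necessarily what the Knowledge Index uses, and the passage IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of passage IDs into one scored list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] = scores.get(passage_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical versioned passage IDs returned by each retriever.
vector_hits = ["rule-12@v3", "rule-07@v1", "rule-31@v2"]
keyword_hits = ["rule-07@v1", "rule-44@v1", "rule-12@v3"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
top_ids = [passage_id for passage_id, _ in fused]
```

The fused list carries only IDs and scores, so anything the model later cites can be traced back to an exact, versioned passage.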

The Pipeline: Constrained Reasoning and Verification

When a user submits an unstructured report, the orchestrator executes a tightly controlled sequence via FastAPI, ensuring every output is grounded in fact. The sequence of steps includes:

Input Normalization: The Taxonomy Server parses unstructured input and maps it to standardized industry codes.
Deterministic Retrieval of Facts: The orchestrator simultaneously builds the specific entity's context from the Fact Store and retrieves the relevant domain knowledge in a Parallel Retrieval step.
Structured Context Assembly: The retrieved facts and knowledge are assembled as context.
Constrained Reasoning: The LLM acts purely as a Reasoner/Proposer under contract. It is strictly instructed to generate typed, evidence-bearing suggestions constrained by a JSON schema.
Rule-Based Safety and Faithfulness Verification: Before any output proceeds, two dedicated layers ensure its validity.
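The "reasoner under contract" step can be sketched as a strict parser that rejects anything outside the agreed shape. The field names, the allowed actions, and the rule that every cited ID must come from the retrieved set are assumptions for illustration, and this uses only the standard library rather than a real JSON Schema validator.

```python
import json
from dataclasses import dataclass

# Hypothetical action vocabulary.
ALLOWED_ACTIONS = {"recommend", "flag_for_review", "no_action"}


@dataclass(frozen=True)
class Suggestion:
    action: str
    rationale: str
    evidence_ids: tuple


def parse_suggestion(raw: str, retrieved_ids: set) -> Suggestion:
    """Parse the model's JSON output, raising ValueError on any contract violation."""
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    if set(obj) != {"action", "rationale", "evidence_ids"}:
        raise ValueError(f"unexpected or missing fields: {sorted(obj)}")
    if not isinstance(obj["action"], str) or obj["action"] not in ALLOWED_ACTIONS:
        raise ValueError("action is not in the allowed set")
    if not isinstance(obj["rationale"], str) or not obj["rationale"].strip():
        raise ValueError("rationale must be a non-empty string")
    ids = obj["evidence_ids"]
    if not isinstance(ids, list) or not ids or not all(isinstance(i, str) for i in ids):
        raise ValueError("evidence_ids must be a non-empty list of strings")
    unknown = [i for i in ids if i not in retrieved_ids]
    if unknown:
        raise ValueError(f"cites passages that were never retrieved: {unknown}")
    return Suggestion(obj["action"], obj["rationale"], tuple(ids))
```

A suggestion that cites a passage the retriever never returned fails here, before it reaches any later verification step.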

The Absolute Guardrail: A Dedicated Safety Engine

A key decision in this architecture is the separation of probabilistic reasoning from deterministic safety. This engine ensures that no LLM is allowed to make the final safety or compliance decision.

After the LLM proposes an action, the output is intercepted by this dedicated Deterministic Safety Engine. Built on traditional, verifiable code, this engine runs conflict-resolution, constraint-violation, and duplicate-action checks using versioned rules and structured data. If the LLM proposes an action that violates an established hard rule, it is programmatically blocked before the operator ever sees it.
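Here is a rough sketch of the three checks as plain, deterministic code. The `ProposedAction` shape, the rule containers, and the version string are hypothetical; the point is only that no model call appears anywhere in this path.

```python
from dataclasses import dataclass

RULESET_VERSION = "2026-01"  # hypothetical version tag for the rules below


@dataclass(frozen=True)
class ProposedAction:
    kind: str
    target: str


def check_action(
    proposal: ProposedAction,
    active_actions: list,
    forbidden_pairs: set,
    blocked_targets: set,
) -> list:
    """Return the list of violated checks; an empty list means the proposal may proceed."""
    violations = []
    # Duplicate-action check: the exact same action is already in effect.
    if proposal in active_actions:
        violations.append("duplicate-action")
    # Constraint check: the target is blocked outright for this entity.
    if proposal.target in blocked_targets:
        violations.append("constraint-violation")
    # Conflict check: the target clashes with something already active.
    for existing in active_actions:
        if frozenset({proposal.target, existing.target}) in forbidden_pairs:
            violations.append("conflict")
            break
    return violations
```

Anything that returns a non-empty list is dropped before the operator sees it, and the ruleset version can be logged alongside the decision for auditing.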

Ensuring Faithfulness Through Verification

A Faithfulness Verifier then cross-checks the model's output against the retrieved evidence. This secondary NLI-style (Natural Language Inference) check confirms that every generated claim is directly entailed by its cited evidence. If the model hallucinates a fact, the verifier flags it, forces an abstention, or signals for human review, a feature that is essential for building long-term trust.
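The verification loop itself is simple once the entailment check is pluggable. In this sketch `toy_entails` is a deliberately crude word-overlap placeholder, not a real NLI model; in practice that function would wrap whichever entailment model the system uses.

```python
from typing import Callable

# (premise, hypothesis) -> True if the premise entails the hypothesis
Entails = Callable[[str, str], bool]


def verify_claims(claims: list, evidence: dict, entails: Entails) -> list:
    """Return the claims that are not supported by the passage they cite.

    claims is a list of (claim_text, passage_id) pairs;
    evidence maps passage_id to passage text.
    """
    unsupported = []
    for text, passage_id in claims:
        passage = evidence.get(passage_id)
        if passage is None or not entails(passage, text):
            unsupported.append(text)
    return unsupported


def toy_entails(premise: str, hypothesis: str) -> bool:
    # PLACEHOLDER ONLY: treats "every word of the claim appears in the passage"
    # as entailment. A real verifier would call an NLI model here.
    return set(hypothesis.lower().split()) <= set(premise.lower().split())
```

If `verify_claims` returns anything, the orchestrator can abstain or route the case to a human instead of showing the suggestion.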

Privacy, Latency, and Infrastructure

Deploying a mission-critical system requires addressing data sovereignty and response times.

Trust Boundaries: A pluggable LLM Gateway handles all routing. If an external, hosted model is used, the gateway strips the text of sensitive PII (de-identification) before it crosses the trust boundary, and re-identifies the response when it returns.
Squeezing out Latency: The pipeline relies heavily on parallelization and a Redis cache for entity summaries and prompt-cache keys. For low-latency inference in production, the deployment strategy can shift: run the API and process orchestrators as PM2-managed services directly on dedicated VMs, and keep a self-hosted engine like vLLM physically adjacent to the embedding generation and orchestrator. Cutting network hops this way can substantially reduce Time-To-First-Token.
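The de-identify and re-identify round trip at the trust boundary can be sketched like this. Matching only email addresses with a regex is an assumption to keep the example short; real PII detection covers far more than that, and the placeholder format is made up.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def deidentify(text: str) -> tuple:
    """Replace each distinct email with a placeholder; return (masked_text, mapping)."""
    mapping = {}  # placeholder -> original value
    seen = {}     # original value -> placeholder

    def swap(match):
        value = match.group(0)
        if value not in seen:
            token = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = token
            mapping[token] = value
        return seen[value]

    return EMAIL.sub(swap, text), mapping


def reidentify(text: str, mapping: dict) -> str:
    """Restore the original values in a response that came back across the boundary."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

The mapping never leaves the trusted side, so the hosted model only ever sees placeholders.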

The Future of Trustworthy AI

Designing for reliability means shifting our focus from prompt engineering to system engineering. By creating clear trust boundaries, implementing observability, and treating the LLM as a specialized component within a robust infrastructure, we can deploy autonomous systems in environments where precision is paramount.

Ultimately, the most dependable AI systems will feel less like "black boxes" and more like carefully engineered distributed systems: reliable, predictable, and ready for the most critical tasks. LLMs are not databases, and they are not deterministic logic engines; by restricting the LLM to a highly constrained reasoning role within a rigid, stateful architecture, you make its mistakes far easier to catch before they reach an operator.

Build Log: Untangling SameSite, Same-Origin, and Cookie Auth in a Microservice Platform

Thuyavan — Wed, 27 May 2026 13:46:23 +0000

Over the last few days I went deep into one of those deceptively simple auth problems that turns into a browser security rabbit hole.

The original goal sounded straightforward:

Move the platform from localStorage JWT auth toward secure cookie-based authentication across multiple microservices.

But once I started reconciling the actual implementation against the original migration epic, I realized the real problem wasn’t JWTs.

It was understanding:

same-origin vs same-site,
how browsers attach cookies,
whether SameSite=None was actually necessary,
and how deployment topology changes security behavior.

The Existing Architecture

The platform is composed of multiple services:

login/auth service
BFF
billing
notification
frontend application

The frontend was still heavily localStorage-token based:

Authorization: Bearer ${localStorage.getItem("token")}

Meanwhile:

OTP login was already cookie-based
backends already supported:

credentials: true
cookie parsing
auth middleware

So the backend groundwork for cookie auth was mostly there.

The original migration epic proposed:

switch cookies to SameSite=None; Secure
move frontend to cookie auth
remove bearer-token usage
stop returning access tokens in login responses

But there was a problem.

The Security Contradiction

While reviewing the actual codebase, I noticed something important:

The auth cookies had already been hardened to:

SameSite=Strict

That change was intentional.

It had been introduced earlier to close a CSRF exposure.

So now there was a contradiction:

the migration epic wanted SameSite=None
but security hardening had intentionally moved to Strict

That immediately raised the question:

Do we even need SameSite=None?

And the answer depends entirely on deployment topology.

Same-Origin vs Same-Site

This turned out to be the key distinction.

Same-Origin

An origin is:

scheme + host + port

Examples:

URL A	URL B	Same-Origin?
`https://app.company.com`	`https://app.company.com/x`	Yes
`https://app.company.com`	`https://api.company.com`	No
`http://localhost:5104`	`http://localhost:3001`	No

Same-origin controls:

CORS
JS access
localStorage isolation

Same-Site

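A site is roughly the registrable domain (eTLD+1). Modern browsers also factor in the scheme (schemeful same-site), so the http:// and https:// versions of the same domain count as different sites.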

Examples:

URL A	URL B	Same-Site?
`https://app.company.com`	`https://api.company.com`	Yes
`http://localhost:5104`	`http://localhost:3001`	Yes
`https://company.com`	`https://billing.io`	No

This distinction matters because:

SameSite cookie behavior operates at the site level, not the origin level.

That means:

app.company.com -> api.company.com

is:

cross-origin
but still same-site

So SameSite=Strict cookies still work there.

That realization changed the entire migration plan.
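To make the two comparisons concrete, here is a small sketch that classifies a pair of URLs. The big assumption is `naive_site`: it treats the last two host labels as the registrable domain, whereas browsers consult the Public Suffix List, so this is wrong for suffixes like co.uk and for raw IP addresses. It also compares schemes, matching schemeful same-site behavior.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Origin = (scheme, host, port), with default ports filled in."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def naive_site(url: str) -> tuple:
    # ASSUMPTION: registrable domain = last two labels of the host.
    # Real browsers use the Public Suffix List instead.
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    return (parts.scheme, ".".join(labels[-2:]))


def classify(url_a: str, url_b: str) -> dict:
    return {
        "same_origin": origin(url_a) == origin(url_b),
        "same_site": naive_site(url_a) == naive_site(url_b),
    }
```

Running `classify("https://app.company.com", "https://api.company.com")` gives cross-origin but same-site, which is exactly the case where SameSite=Strict cookies are still sent.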

Reverse Proxy Topology Changes Everything

The next major insight came from analyzing nginx routing.

There are two fundamentally different deployment models.

Option A — Single Gateway (Same-Origin)

Browser sees:

https://app.company.com/
https://app.company.com/auth/*
https://app.company.com/billing/*
https://app.company.com/notifications/*

Internally nginx fans requests out to different services:

auth:3001
billing:3009
notifications:3010

But the browser never sees that.

So to the browser:

everything is same-origin
cookies are trivially attached
no need for SameSite=None
minimal CORS complexity

This is the cleanest architecture.

Option B — Separate API Hostnames

Browser sees:

https://app.company.com
https://api.company.com

This becomes:

cross-origin
but same-site

In this setup:

SameSite=Strict still works
but CORS credentials become mandatory

Example:

fetch(url, {
  credentials: "include"
})

and server-side:


credentials: true
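What that server-side flag has to translate into on the wire is worth spelling out: for credentialed requests the browser will not accept a wildcard `Access-Control-Allow-Origin`, so the server must echo a specific allowed origin and send `Access-Control-Allow-Credentials: true`. A framework-agnostic sketch, where the allowlist contents are an assumption:

```python
from typing import Optional

# Hypothetical allowlist; only the frontend origin may send credentialed requests.
ALLOWED_ORIGINS = {"https://app.company.com"}


def cors_headers(request_origin: Optional[str]) -> dict:
    """Return the CORS response headers for a credentialed cross-origin request."""
    if request_origin not in ALLOWED_ORIGINS:
        # No CORS headers: the browser blocks the response from the calling script.
        return {}
    return {
        # Must be the exact origin, never "*", when credentials are involved.
        "Access-Control-Allow-Origin": request_origin,
        "Access-Control-Allow-Credentials": "true",
        # Tell caches that the response varies by requesting origin.
        "Vary": "Origin",
    }
```

Most CORS middleware does this echoing for you once an explicit origin list and the credentials option are configured.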

Important Realization

A reverse proxy alone does NOT automatically make things same-origin.

What matters is:

what hostname the browser sees.

That was probably the biggest conceptual breakthrough in this debugging session.

SPA Routing Pitfalls

I also explored consolidating multiple SPAs under one gateway.

Example:

location /billing-app/ {
    alias /usr/share/nginx/billing/;
    try_files $uri $uri/ /billing-app/index.html;
}

This introduces several subtle problems.

1. Prefix collisions

If:


/billing/

already proxies billing APIs, then the frontend cannot also live there.

Solution:

/billing/ for APIs
/billing-app/ for SPA
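The reason the separate prefixes work is that nginx picks the longest matching prefix location (leaving regex locations aside). A tiny model of that rule, using the upstreams from earlier plus hypothetical labels for the static SPA roots:

```python
# Prefix -> destination. The "static:*" labels are made up for this sketch.
ROUTES = {
    "/auth/": "auth:3001",
    "/billing/": "billing:3009",
    "/notifications/": "notifications:3010",
    "/billing-app/": "static:billing-spa",
    "/": "static:main-spa",
}


def route(path: str) -> str:
    """Pick the destination with the longest prefix that matches the path."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    return ROUTES[max(matches, key=len)]
```

`/billing-app/settings` does not start with `/billing/`, so the API and the SPA never compete for the same request.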

2. Vite base path issues

Frontend builds must specify:


base: "/billing-app/"

Otherwise assets load from /assets/... and break in production.

3. React Router basename

<BrowserRouter basename="/billing-app">

Without this:

client-side navigation breaks
refreshes 404

4. SPA fallback routing

Each SPA needs its own fallback:


try_files $uri /billing-app/index.html;

Otherwise deep routes fail.

The HTTPS Problem

One final issue surfaced during review.

The deployment was still running on:


http://<raw-ip>

But production cookies were marked:

Secure

Browsers reject cookies marked Secure when they are set over plain HTTP.

Which means:

cookie auth silently fails
even before any SameSite logic matters

So before the final auth migration:

HTTPS termination needs to exist
preferably with a proper domain
not a raw IP
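A quick way to see the failure mode is to build the hardened cookie and then apply the browser's rule to it. The cookie name `session` is an assumption, and `browser_accepts` is a simplified model that only looks at the Secure attribute and ignores the localhost exception some browsers make.

```python
from http.cookies import SimpleCookie


def build_session_cookie(value: str) -> str:
    """Build a Set-Cookie value with the hardened attributes discussed above."""
    cookie = SimpleCookie()
    cookie["session"] = value
    cookie["session"]["path"] = "/"
    cookie["session"]["httponly"] = True
    cookie["session"]["secure"] = True
    cookie["session"]["samesite"] = "Strict"
    return cookie["session"].OutputString()


def browser_accepts(set_cookie: str, page_scheme: str) -> bool:
    # Simplified model: a Secure cookie is only stored when set over https.
    attrs = {part.strip().lower() for part in set_cookie.split(";")[1:]}
    return "secure" not in attrs or page_scheme == "https"


header = build_session_cookie("abc123")
```

With the deployment on plain HTTP, `browser_accepts(header, "http")` is false, which is the silent failure described above: the login response looks fine but the cookie is never stored.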

Final Direction

After walking through all of this, the architecture direction became much clearer:

Best End-State

single gateway
path-based routing
same-origin frontend/API
SameSite=Strict
cookie-only auth
no localStorage bearer tokens

That gives:

simpler auth
lower CSRF exposure
fewer CORS headaches
cleaner frontend architecture

Biggest Lesson

The most valuable part of this debugging session wasn’t a code change.

It was realizing that:

browser security behavior is deeply tied to deployment topology.

Two systems with identical backend code can behave completely differently depending on:

domains,
ports,
proxies,
and what the browser actually sees.

That understanding made the rest of the migration decisions much more obvious.