Mudassir Khan

Stop letting the prompt be your state machine

You shipped an LLM feature six months ago. Now the same user input produces wildly different outputs depending on... nothing you can point to. Something in the sampling? The time the context filled up and a chunk got dropped? Nobody knows. This is what happens when the prompt becomes your runtime.


The trap: the prompt as an accidental runtime

Here is what the trap looks like in TypeScript:

async function handleUserRequest(input: string): Promise<string> {
  const prompt = `
    You are a helpful assistant.
    The user said: ${input}
    Previous context: ${someGlobalContext}

    Decide what to do, gather any information you need,
    format the response, and return it.
  `;
  return llm.complete(prompt);
}

The model is doing everything here: deciding the intent, gathering data, formatting output, choosing what to persist. That is a footgun. You handed the runtime to a stochastic function.

Gartner has predicted that more than 40% of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, or inadequate risk controls. Deterministic, testable workflows address the last two. The fix is not a better prompt. The fix is to stop using the prompt as an architecture.


What "deterministic" can and cannot mean here

Be honest about what you can and cannot control.

You cannot control: the model's exact output. It is probabilistic by design.

You can control:

  • The shape of the output (structured output plus schema validation)
  • The steps that run before and after the model call
  • What data enters the model
  • What happens when the output fails validation
  • Whether a human reviews the result before it commits to anything irreversible

Determinism here means: the same inputs, the same workflow steps, the same guardrails every time. Not the same tokens every time. That is a realistic and achievable target. It is also the thing teams skip when they are moving fast.
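
To make the distinction concrete, here is a minimal sketch in which a stubbed model words its answer differently on each call while the workflow around it takes the same path and returns the same shape. `fakeModel`, `Reply`, and `handle` are hypothetical names made up for this illustration, not part of any library.

```typescript
// Hypothetical stand-in for a model: the wording varies from call to call.
const phrasings = ["Sure, here is a summary.", "Summary follows.", "Here you go."];
let calls = 0;
function fakeModel(_prompt: string): string {
  const text = phrasings[calls % phrasings.length];
  calls += 1;
  return JSON.stringify({ intent: "summarize", text });
}

type Reply =
  | { status: "accepted"; intent: "summarize"; text: string }
  | { status: "rejected"; reason: string };

// The deterministic part: same steps, same checks, same fallback on every call.
function handle(prompt: string): Reply {
  const raw = fakeModel(prompt);
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { status: "rejected", reason: "not JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { status: "rejected", reason: "not an object" };
  }
  const candidate = parsed as { intent?: unknown; text?: unknown };
  if (candidate.intent !== "summarize" || typeof candidate.text !== "string") {
    return { status: "rejected", reason: "wrong shape" };
  }
  return { status: "accepted", intent: "summarize", text: candidate.text };
}

// Same input twice: the tokens differ, the path and the shape do not.
const first = handle("Summarize the report");
const second = handle("Summarize the report");
```

Both calls come back as `accepted` with the same intent, even though the `text` field differs. That is the kind of determinism worth testing for.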


Typed workflow steps around the model call

Break the work into discrete typed steps. Each step has a clear input type and a clear output type. The model call is one step in the pipeline, not the whole thing.

type WorkflowInput = {
  userId: string;
  rawRequest: string;
};

type EnrichedInput = WorkflowInput & {
  userContext: UserContext;
  relevantDocs: string[];
};

type ModelOutput = {
  intent: "summarize" | "search" | "draft" | "unknown";
  confidence: number;
  payload: string;
};

type WorkflowResult = {
  response: string;
  audit: {
    intent: string;
    humanReviewed: boolean;
  };
};

async function enrich(input: WorkflowInput): Promise<EnrichedInput> {
  const [userContext, relevantDocs] = await Promise.all([
    fetchUserContext(input.userId),
    fetchRelevantDocs(input.rawRequest),
  ]);
  return { ...input, userContext, relevantDocs };
}

async function classify(enriched: EnrichedInput): Promise<ModelOutput> {
  // Model call is isolated here, not scattered everywhere
  const raw = await llm.complete(buildClassificationPrompt(enriched));
  return parseAndValidate(raw);
}

async function respond(output: ModelOutput): Promise<WorkflowResult> {
  const response = await generateResponse(output);
  return {
    response,
    audit: { intent: output.intent, humanReviewed: false },
  };
}

async function runWorkflow(input: WorkflowInput): Promise<WorkflowResult> {
  const enriched = await enrich(input);
  const classified = await classify(enriched);
  return respond(classified);
}

Each step is independently unit-testable. You can mock classify to return a fixed ModelOutput and test respond in complete isolation. That was impossible when the prompt was the runtime.
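
As a rough sketch of that isolation, the version below passes the steps into the workflow as plain functions so a test can swap in a stub for the model step. The `Steps` type, `makeWorkflow`, and the body of `respond` are assumptions for illustration (the enrich step is left out for brevity); the workflow above calls its steps directly.

```typescript
type WorkflowInput = { userId: string; rawRequest: string };
type ModelOutput = {
  intent: "summarize" | "search" | "draft" | "unknown";
  confidence: number;
  payload: string;
};
type WorkflowResult = {
  response: string;
  audit: { intent: string; humanReviewed: boolean };
};

// Hypothetical seam: each step is injected, so a test can replace any of them.
type Steps = {
  classify: (input: WorkflowInput) => Promise<ModelOutput>;
  respond: (output: ModelOutput) => Promise<WorkflowResult>;
};

function makeWorkflow(steps: Steps) {
  return async (input: WorkflowInput): Promise<WorkflowResult> => {
    const classified = await steps.classify(input);
    return steps.respond(classified);
  };
}

// Stub for the model step: a fixed ModelOutput, no network call, no sampling.
const fixedClassify: Steps["classify"] = async () => ({
  intent: "search",
  confidence: 0.93,
  payload: "quarterly revenue",
});

// The step under test (illustrative body, not the article's generateResponse).
const respond: Steps["respond"] = async (output) => ({
  response: `Running ${output.intent} for: ${output.payload}`,
  audit: { intent: output.intent, humanReviewed: false },
});

const workflowUnderTest = makeWorkflow({ classify: fixedClassify, respond });
const pending = workflowUnderTest({ userId: "u1", rawRequest: "find revenue" });
```

Because the stub always returns the same `ModelOutput`, any change in the final result can only come from `respond`, which is exactly what a unit test wants.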

Diagram of the typed LLM workflow: WorkflowInput feeds enrich(), producing EnrichedInput, which feeds classify() (LLM call), producing ModelOutput, which feeds respond(), producing WorkflowResult


Structured output + schema validation as a contract

The model call step should never return a raw string when you need structured data. Use JSON mode, tool calling, or a schema-constrained completion, then validate immediately.

import { z } from "zod";

const ModelOutputSchema = z.object({
  intent: z.enum(["summarize", "search", "draft", "unknown"]),
  confidence: z.number().min(0).max(1),
  payload: z.string().min(1),
});

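async function classify(enriched: EnrichedInput): Promise<ModelOutput> {
  // Ask for JSON output (the exact option name depends on your LLM client)
  const raw = await llm.complete(buildClassificationPrompt(enriched), {
    response_format: { type: "json_object" },
  });

  // JSON mode is not a guarantee: treat unparseable output as a validation failure too
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    parsed = undefined; // safeParse below rejects this and reports the error
  }
  const result = ModelOutputSchema.safeParse(parsed);

  if (!result.success) {
    // Keep the raw text so the failure can be inspected or queued for review
    throw new ClassificationValidationError(result.error, raw);
  }

  return result.data;
}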

Zod gives you a contract. If the model drifts, the validation catches it before the rest of your app sees the output. The answer to "how do you validate LLM responses?" is: schema validation on parse, not on trust.
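
If you are not using Zod, the same contract can be approximated with a hand-written check. The sketch below is a dependency-free stand-in for the idea, not a replacement for the schema above; `validateModelOutput` and `ValidationResult` are hypothetical names, and the two sample strings are made-up model responses.

```typescript
type ModelOutput = {
  intent: "summarize" | "search" | "draft" | "unknown";
  confidence: number;
  payload: string;
};

type ValidationResult =
  | { success: true; data: ModelOutput }
  | { success: false; issues: string[] };

const INTENTS = ["summarize", "search", "draft", "unknown"] as const;

function isIntent(value: unknown): value is ModelOutput["intent"] {
  return typeof value === "string" && (INTENTS as readonly string[]).indexOf(value) !== -1;
}

// Mirrors the three rules of the schema: enum intent, confidence in [0, 1], non-empty payload.
function validateModelOutput(raw: string): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { success: false, issues: ["response is not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { success: false, issues: ["response is not a JSON object"] };
  }
  const { intent, confidence, payload } = parsed as Record<string, unknown>;
  const issues: string[] = [];
  if (!isIntent(intent)) issues.push("intent is not one of the allowed values");
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    issues.push("confidence must be a number between 0 and 1");
  }
  if (typeof payload !== "string" || payload.length === 0) {
    issues.push("payload must be a non-empty string");
  }
  if (issues.length > 0) return { success: false, issues };
  return {
    success: true,
    data: {
      intent: intent as ModelOutput["intent"],
      confidence: confidence as number,
      payload: payload as string,
    },
  };
}

const good = validateModelOutput('{"intent":"search","confidence":0.9,"payload":"docs"}');
const drifted = validateModelOutput('{"intent":"lookup","confidence":"high","payload":""}');
```

The drifted response fails on all three fields, and every failure is reported at once rather than one per retry. A schema library gives you the same behavior with far less code, which is the argument for using one.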


Retries, idempotency, and failure gates

Validation failures should not pass silently, and they should not crash the request either. Wrap the model call with a retry budget and a typed failure signal:

type ClassifyResult =
  | { ok: true; data: ModelOutput }
  | { ok: false; reason: "validation" | "timeout" | "rate_limit"; raw?: string };

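// RateLimitError and TimeoutError are assumed to be thrown by your LLM client
async function classifySafe(
  enriched: EnrichedInput,
  maxAttempts = 2
): Promise<ClassifyResult> {
  let lastRaw: string | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const data = await classify(enriched);
      return { ok: true, data };
    } catch (err) {
      if (err instanceof ClassificationValidationError) {
        lastRaw = err.raw; // keep the last bad output for the failure signal
        continue; // retry on schema failure while the budget lasts
      }
      if (err instanceof RateLimitError) {
        return { ok: false, reason: "rate_limit" };
      }
      if (err instanceof TimeoutError) {
        return { ok: false, reason: "timeout" };
      }
      throw err; // unknown errors are not validation failures; let them surface
    }
  }
  // Retry budget exhausted on validation failures
  return { ok: false, reason: "validation", raw: lastRaw };
}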

Idempotency matters when retries touch external state. If your workflow calls an API inside the model step, wrap it in an idempotency key so a retry does not double the side effect. The workflow layer controls this. The model itself cannot.
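
One way to picture the idempotency key is the in-memory sketch below. The key format, the `withIdempotencyKey` helper, and the `Map`-backed store are assumptions for illustration; a production system would more likely use a durable store, or pass the key to an external API that supports idempotency keys itself.

```typescript
// Hypothetical in-memory record of side effects that have already started, by key.
const started = new Map<string, Promise<unknown>>();

// Runs `effect` at most once per key; later calls with the same key reuse the first result.
function withIdempotencyKey<T>(key: string, effect: () => Promise<T>): Promise<T> {
  const existing = started.get(key);
  if (existing) return existing as Promise<T>;
  const running = effect();
  started.set(key, running);
  // If the effect fails, forget the key so a later retry can run it again.
  running.catch(() => started.delete(key));
  return running;
}

// Illustrative side effect: pretend to send an email and count how often it happens.
let emailsSent = 0;
async function sendEmail(to: string): Promise<string> {
  emailsSent += 1;
  return `sent to ${to}`;
}

// The key is built from stable workflow data, never from model output.
const key = ["send-email", "user-42", "request-7"].join(":");

const attempts = Promise.all([
  withIdempotencyKey(key, () => sendEmail("user-42@example.com")),
  withIdempotencyKey(key, () => sendEmail("user-42@example.com")), // simulated retry
]);
```

The second call finds the key and reuses the first promise, so the email goes out once no matter how many times the step is retried.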


Where a human gate belongs

A hybrid memory and retrieval approach (automatic retrieval at request start plus explicit storage) keeps agent state predictable. So does knowing when not to automate the final step.

High-impact or irreversible steps should route to a human via a control gate before committing. Not because LLMs are bad. Because some decisions carry real consequences and the cost of a wrong one outweighs the automation gain.

async function runWorkflow(input: WorkflowInput): Promise<WorkflowResult> {
  const enriched = await enrich(input);
  const classifyResult = await classifySafe(enriched);

  if (!classifyResult.ok) {
    return queueForHumanReview(enriched, classifyResult.reason);
  }

  const { data: classified } = classifyResult;

  // Low-confidence draft intents route to human review
  if (classified.intent === "draft" && classified.confidence < 0.85) {
    return queueForHumanReview(enriched, "low confidence on draft intent");
  }

  return respond(classified);
}

The control gate is a typed branch in your workflow, not a prompt instruction. "Only do this if you are sure" is not a guardrail. A typed branch is.
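
A minimal sketch of that branch as a pure function, which makes the gate itself testable without any model call. The `GateDecision` type, the `decideGate` name, and the choice to always send `"unknown"` to review are assumptions for illustration; the 0.85 threshold is the one used above.

```typescript
type ModelOutput = {
  intent: "summarize" | "search" | "draft" | "unknown";
  confidence: number;
  payload: string;
};

type GateDecision =
  | { route: "auto" }
  | { route: "human"; reason: string };

const DRAFT_CONFIDENCE_THRESHOLD = 0.85;

function decideGate(output: ModelOutput): GateDecision {
  switch (output.intent) {
    case "draft":
      return output.confidence < DRAFT_CONFIDENCE_THRESHOLD
        ? { route: "human", reason: "low confidence on draft intent" }
        : { route: "auto" };
    case "unknown":
      // Assumption: an unrecognized intent is never safe to automate.
      return { route: "human", reason: "unknown intent" };
    case "summarize":
    case "search":
      return { route: "auto" };
    default: {
      // Compile-time exhaustiveness check: a new intent forces a decision here.
      const unreachable: never = output.intent;
      throw new Error(`Unhandled intent: ${String(unreachable)}`);
    }
  }
}

const lowDraft = decideGate({ intent: "draft", confidence: 0.6, payload: "Dear team" });
const highDraft = decideGate({ intent: "draft", confidence: 0.95, payload: "Dear team" });
const search = decideGate({ intent: "search", confidence: 0.4, payload: "pricing" });
```

Because the decision is plain data, the routing rules can be covered by ordinary unit tests, and the `never` check means adding a fifth intent will not compile until someone decides how it should be gated.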

If you want to go deeper on how this fits into a full system, I wrote up the production architecture for agents including how to wire these patterns together at scale.


FAQ

How do you make LLM output deterministic?
You cannot make the model itself deterministic. You make the system deterministic around it. Schema-validated structured output, typed workflow steps, and retry gates with failure signals are the practical levers. The model is one isolated black-box step in an otherwise typed, testable pipeline.

What is structured output?
Structured output means the model returns data in a schema you define rather than freeform prose. Most providers support JSON mode or function calling. You parse and validate the result immediately with a schema library. If it does not match the schema, treat it as a failed call, not a soft warning.

How do you validate LLM responses?
Parse the response as JSON, then run it through a schema validator. Zod is a common choice in TypeScript projects. A safeParse call gives you a typed result: success with data or failure with an error you can act on. Failure is an exception to handle, not a case to log and move on.


If you want a deeper look at how deterministic workflows fit into a full production system, I cover the complete production architecture for agents on my site.

If you want Next.js for AI products wired up end to end, that is exactly the kind of work I take on.


Drop a comment below. Curious what patterns people use to keep LLM features testable in production.
