TL;DR
- Problem: LLMs make up identifiers (UUIDs) that don't exist in your data, which breaks your retrieval flow
- Solution 1: Use enum constraints with readable identifiers when you control the workflow
- Solution 2: Map UUIDs to simple tokens (ITEM-1, ITEM-2) for longer agent conversations
- Result: Way fewer hallucinations and more reliable ID references
You've spent weeks building a neat RAG (Retrieval-Augmented Generation) system. Your embeddings are sharp, your retrieval is fast, and you've got the LLM hooked up to search through your data. You test it out, and then... the model returns a UUID that doesn't exist in your system.
This is one of those problems that catches you off guard because the LLM sounds confident. It's not hedging or saying "I'm not sure." It's just making up an identifier like a7f3c9e2-1b4d-4e8f-9c5d-2a8f7b9e3d1c with complete conviction.
Smaller models (like mini/fast variants) tend to do this more often. They'll invent IDs to sound authoritative.
The Core Problem
LLMs are pattern-matching machines. When you ask them to pick from a list of items using UUIDs, they see the format - that alphanumeric pattern with hyphens - and they try to reproduce it. But they're not really "remembering" your specific IDs. They're just generating something that looks like a UUID, and often it's completely made up.
This creates problems downstream: you can't match outputs to stored objects, logging gets messy, and audit trails break. You end up with a bunch of "ID not found" errors.
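Here's what that dead end looks like in practice - a minimal sketch where `items` is your stored data and the model output shape is hypothetical:
// The model confidently returns a UUID that isn't in your data
const modelOutput = { itemId: "a7f3c9e2-1b4d-4e8f-9c5d-2a8f7b9e3d1c" };
const match = items.find((item) => item.id === modelOutput.itemId);
if (!match) {
  // Nothing to join against, nothing useful to log except the made-up ID
  logger.error(`LLM returned unknown id: ${modelOutput.itemId}`);
}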
Solution 1: Enums for Controlled Flows
If you're building a structured, agentic workflow where you control the entire decision path, you have a powerful tool: structured outputs with enums.
The idea is simple: instead of letting the LLM generate IDs freely, you constrain its response to only pick from valid options. Here's how it works:
// Helper to normalize titles and avoid duplicates
function normalizeTitle(t: string) {
return t.trim().toLowerCase();
}
// Build a list of valid choices from your data
const seen = new Map<string, number>();
const validChoices = items.map((item) => {
const norm = normalizeTitle(item.title);
const count = (seen.get(norm) ?? 0) + 1;
seen.set(norm, count);
// Handle duplicate titles by adding a suffix
return count === 1 ? item.title : `${item.title} (${count})`;
});
// Create a schema where the model must pick from an enum
const schema = z.object({
selectedTitle: z.enum(validChoices as [string, ...string[]]),
reasoning: z.string().max(240), // Cap verbosity
confidence: z.number(),
});
// Use your LLM's structured output feature
const response = await llm.parse({
schema,
messages,
});
When you pass an enum constraint to the LLM, it can't hallucinate an identifier: the response is restricted to the valid options. If it still returns something outside the enum, parsing fails and you can retry.
The key trick: Use simple, human-readable identifiers (like titles) in your enum, not UUIDs. UUIDs are harder for models to reproduce accurately because they're random. Titles are memorable patterns.
Then verify the match server-side:
const selectedTitle = response.selectedTitle;
// Find the actual item by matching the normalized title
const actualItem = items.find(
(item) => normalizeTitle(item.title) === normalizeTitle(selectedTitle),
);
if (!actualItem) {
// Invalid selection - add feedback and retry
messages.push({
role: "user",
content: `That title wasn't in the list. Please pick from: ${validChoices.join(", ")}`,
});
// Retry with feedback (consider exponential backoff)
}
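If you want that retry to be automatic, a small bounded loop does the job. This is a sketch reusing the `llm.parse` call, `schema`, `items`, and `validChoices` from above; the function name and attempt count are just illustrative:
// Hypothetical retry wrapper: bail out after a few attempts
async function selectWithRetry(maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await llm.parse({ schema, messages });
    const actualItem = items.find(
      (item) => normalizeTitle(item.title) === normalizeTitle(response.selectedTitle),
    );
    if (actualItem) return { actualItem, response };
    // Feed the error back so the next attempt has more context
    messages.push({
      role: "user",
      content: `That title wasn't in the list. Please pick from: ${validChoices.join(", ")}`,
    });
  }
  throw new Error("LLM failed to pick a valid title after retries");
}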
When to use this: Controlled flows where you're calling the LLM as part of a defined process. Think: multi-step decision trees, validation agents, classification pipelines.
Limitations: Watch out for enum size - OpenAI caps schemas at 1000 total enum values, with character length limits for larger enums. If you're dealing with hundreds of items, you might want to do some top-N prefiltering or break it into chunks.
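Prefiltering is usually just a similarity sort before you build the enum. A sketch, assuming each item already carries a retrieval score (the `similarityToQuery` field here is hypothetical):
// Keep only the top candidates so the enum stays small
const topItems = [...items]
  .sort((a, b) => b.similarityToQuery - a.similarityToQuery)
  .slice(0, 50);
// Then build validChoices (and the schema) from topItems instead of the full list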
Solution 2: Deterministic Alias Tokens for Multi-Turn Flows
But what if your LLM is having a longer conversation where you're not controlling every single prompt? The agent is exploring, reasoning, making its own calls. Enums get awkward here because the agent might reference the same ID across multiple turns, and rebuilding your enum every time is messy.
That's where token mapping comes in.
The idea: create a simple mapping from tricky identifiers (like UUIDs) to easy ones (like ITEM-1, ITEM-2).
// Create token maps at the start of your agent run
const idToToken = new Map<string, string>();
const tokenToId = new Map<string, string>();
let tokenCounter = 1;
// When the agent needs to search or reference items,
// return tokens instead of real IDs
function mapItemsToTokens(items: { id: string; title: string; description: string }[]) {
return items.map((item) => {
// Reuse existing token if we've seen this item before
let token = idToToken.get(item.id);
if (!token) {
token = `ITEM-${tokenCounter++}`;
idToToken.set(item.id, token);
tokenToId.set(token, item.id);
}
return {
id: token, // Return the token, not the real UUID
title: item.title,
description: item.description,
};
});
}
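In practice you apply this at the boundary where the agent's tools return data, so the real UUIDs never enter the conversation. A sketch with a hypothetical `searchItems` retrieval call:
// Tool handler: the agent only ever sees tokenized results
async function handleSearchTool(query: string) {
  const results = await searchItems(query); // your real retrieval call
  return mapItemsToTokens(results);
}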
Now when the agent works with items, it's dealing with ITEM-1, ITEM-2, etc. These are way harder to make up because they're simple and predictable. The agent picks them up quickly and usually sticks with them.
At the end, decode the tokens back to real IDs. If an unknown token should abort the whole run, a throwing helper is the simplest option:
// Strict variant: throw on an unknown token
function decodeTokenOrThrow(token: string): string {
  const id = tokenToId.get(token);
  if (!id) throw new Error(`Unknown token: ${token}`);
  return id;
}
More often you'll want to log the bad token and skip that entry instead:
const decodedMatches = agentResponse.matches
.map((match) => {
const realId = tokenToId.get(match.itemId);
if (!realId) {
logger.error(`Agent returned unknown token: ${match.itemId}`);
return null;
}
return {
itemId: realId,
confidence: match.confidence,
};
})
.filter((m) => m !== null);
Why this works: Tokens are simple and sequential. The model picks them up easily and reuses them instead of making up new IDs. You've basically given the agent a clean vocabulary to work with.
When to use this: Multi-turn conversations, exploratory agents, any flow where the LLM is doing its own thing and needs to reference items consistently.
Pro tip: Every few turns, remind the model what tokens are active ("Active items: ITEM-1:titleA, ITEM-2:titleB...") to keep things fresh in longer chats.
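Building that reminder is just a walk over the token map. A sketch, assuming you still have the original `items` around to look titles up from (the function name is illustrative):
// Hypothetical reminder message: list the active tokens and their titles
function buildTokenReminder(items: { id: string; title: string }[]) {
  const active = items
    .filter((item) => idToToken.has(item.id))
    .map((item) => `${idToToken.get(item.id)}:${item.title}`);
  return `Active items: ${active.join(", ")}`;
}
// Push it into the conversation every few turns
messages.push({ role: "user", content: buildTokenReminder(items) });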
Comparing the Approaches
| Aspect | Enum Constraints | Token Aliasing |
|---|---|---|
| Setup Cost | Low (list mapping) | Medium (bidirectional maps) |
| Prompt Size Impact | High for large lists | Low (small tokens) |
| Multi-Turn Stability | Requires rebuild each turn | Naturally consistent |
| Hallucination Prevention | Complete (hard constraint) | Very high (simple patterns) |
| Best For | Single-shot decisions | Long conversations |
| Failure Mode | Parse error (retryable) | Unknown token (logged) |
Combining Both Strategies
If you're building something more involved, you can mix both approaches:
- Use tokens as your main interface to keep hallucinations low
- Add enum validation for critical decisions as an extra safety net
- Verify everything server-side before you commit the data
This gives you layered protection: the LLM rarely makes mistakes (tokens are simple), and when it does, you catch it.
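For example, for a critical final decision you can constrain the model to the active tokens themselves. A sketch reusing `tokenToId`, `llm.parse`, and the zod pattern from earlier:
// Enum over active tokens: the model can only pick an ID you actually issued
const activeTokens = [...tokenToId.keys()];
const finalSchema = z.object({
  selectedItem: z.enum(activeTokens as [string, ...string[]]),
  reasoning: z.string().max(240),
});
const finalChoice = await llm.parse({ schema: finalSchema, messages });
const realId = tokenToId.get(finalChoice.selectedItem); // maps back to the real UUID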
The Takeaway
LLMs love to make up identifiers. They see the UUID pattern and just run with it. But you've got options:
- For structured flows: Use enums to lock them into valid choices
- For exploratory flows: Give them simple tokens to work with instead of UUIDs
Either approach cuts down hallucinations significantly. Pick whichever fits your setup better, and you'll save yourself a lot of debugging headaches.
