After 18 months of LLM integrations, these are the patterns that fail most often in production. Not theoretical failures — real incidents.
Pattern 1: Trusting JSON Mode Completely
Everyone assumes JSON mode guarantees valid JSON. It doesn't — it steers the model toward JSON, so you still need to validate the output.
```python
import json

response = llm(format="json", prompt=user_prompt)
try:
    data = json.loads(response)
except json.JSONDecodeError:
    data = retry_with_stricter_prompt(user_prompt)
```
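Parsing alone isn't enough: the response can be well-formed JSON but the wrong shape. A minimal stdlib-only sketch — the `REQUIRED` fields and the `parse_llm_json` helper are hypothetical, and in practice a schema library (pydantic, jsonschema) does this more thoroughly:

```python
import json

# Hypothetical expected fields; adapt to your own response schema.
REQUIRED = {"summary": str, "score": int}

def parse_llm_json(raw: str) -> dict:
    """Parse an LLM response and shape-check it; raise ValueError on mismatch."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, ftype in REQUIRED.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"bad or missing field: {field!r}")
    return data
```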
Pattern 2: No Timeout on LLM Calls
LLM API calls can hang. Without timeouts, your request thread blocks forever.
```python
import signal

def on_timeout(signum, frame):
    raise TimeoutError("LLM call timed out")

signal.signal(signal.SIGALRM, on_timeout)  # alarm needs a handler, or SIGALRM kills the process
signal.alarm(30)  # 30 second timeout
response = llm.call(messages)
signal.alarm(0)  # cancel the alarm
```

Note that `signal.alarm` only works on Unix and in the main thread; when your HTTP client offers a per-request timeout parameter, prefer that.
Pattern 3: Ignoring Token Count
Token counts = money. Without tracking, you don't know what's expensive until the bill arrives.
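A minimal sketch of per-call cost logging, assuming your client exposes prompt and completion token counts (most chat APIs return them in a usage object). The prices below are placeholders, not any provider's real rates:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm.cost")

# Placeholder per-1K-token prices; substitute your provider's actual rates.
PRICE_PER_1K = {"prompt": 0.0025, "completion": 0.01}

def log_token_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Log token counts and return the estimated cost in dollars."""
    cost = (prompt_tokens / 1000) * PRICE_PER_1K["prompt"] \
         + (completion_tokens / 1000) * PRICE_PER_1K["completion"]
    log.info("tokens: prompt=%d completion=%d cost=$%.4f",
             prompt_tokens, completion_tokens, cost)
    return cost
```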
Pattern 4: No Retry Logic
LLM APIs fail. Your code should handle it with exponential backoff.
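A sketch of exponential backoff with jitter. Catching bare `Exception` here is for brevity; in real code, catch only your client's transient errors (rate limits, timeouts, 5xx):

```python
import random
import time

def call_with_retries(fn, max_attempts=4, base_delay=1.0):
    """Retry fn() with exponential backoff plus jitter; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter avoids synchronized retry storms.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```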
Pattern 5: Hardcoding Model Names
`model="gpt-4o"` is fragile. Model names change and get deprecated. Read the model name from an environment variable.
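A minimal sketch — the `LLM_MODEL` variable name and the fallback default are illustrative choices, not a convention any SDK requires:

```python
import os

def get_model() -> str:
    """Read the model name from the environment, with a fallback default."""
    return os.environ.get("LLM_MODEL", "gpt-4o")
```

Now switching models is a config change, not a code change and redeploy.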
Pattern 6: No Circuit Breaker
One bad API day shouldn't take down your whole app.
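A bare-bones sketch of the idea: after a run of consecutive failures, fail fast instead of hammering a dead API, then allow a probe call after a cooldown. The thresholds are arbitrary, and a production system would want a real library with half-open state handling:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a probe after `reset_after` seconds."""
    def __init__(self, threshold=5, reset_after=60.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown elapsed: allow one probe call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```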
Pattern 7: Forgetting Edge Cases
Empty input. Max length input. Unicode. Your LLM handles these differently than expected.
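These cases are cheap to pin down with explicit tests. In this sketch, `summarize` and `MAX_INPUT_CHARS` are hypothetical stand-ins for your own LLM wrapper and its input limit:

```python
MAX_INPUT_CHARS = 8_000  # hypothetical input limit

def summarize(text: str) -> str:
    """Stand-in for your LLM wrapper: handles empty input, truncates overlong input."""
    if not text.strip():
        return ""
    return text[:MAX_INPUT_CHARS]

# Explicit edge-case checks: empty, whitespace-only, max-length, Unicode-heavy.
for case in ["", "   ", "a" * 100_000, "emoji \U0001F980 and \u200b zero-width"]:
    assert isinstance(summarize(case), str)
    assert len(summarize(case)) <= MAX_INPUT_CHARS
```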
The Prevention Stack
| Pattern | Prevention |
|---|---|
| Trusting JSON mode | Always validate with a schema library |
| No timeouts | Always set a timeout on API calls |
| No token tracking | Log every call's token count |
| No retry logic | Retry with exponential backoff |
| No circuit breaker | Add a circuit breaker around the client |
| Forgetting edge cases | Write explicit tests for empty, max-length, and Unicode input |
Most of these are basic distributed systems patterns applied to LLM integrations.
If you want a monitoring tool that catches some of these: DriftWatch, from £9.90/mo.