
Jamie Cole
The 7 LLM Integration Patterns That Break in Production

After 18 months of LLM integrations, these are the patterns that fail most often in production. Not theoretical failures — real incidents.

Pattern 1: Trusting JSON Mode Completely

Everyone assumes JSON mode guarantees valid JSON. It doesn't. It attempts JSON. You still need validation.

import json

response = llm(format="json", prompt=user_prompt)
try:
    data = json.loads(response)
except json.JSONDecodeError:
    # Fall back to a retry with a stricter prompt
    data = retry_with_stricter_prompt(user_prompt)

Pattern 2: No Timeout on LLM Calls

LLM API calls can hang. Without a timeout, your request thread blocks forever. Prefer your SDK's timeout parameter if it has one; a SIGALRM guard is a last resort (Unix, main thread only).

import signal

def _on_timeout(signum, frame):
    raise TimeoutError("LLM call timed out")

# Without a handler, SIGALRM's default action kills the process
signal.signal(signal.SIGALRM, _on_timeout)
signal.alarm(30)  # 30 second timeout
response = llm.call(messages)
signal.alarm(0)   # cancel the alarm on success

Pattern 3: Ignoring Token Count

Token counts = money. Without tracking, you don't know what's expensive until the bill arrives.
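A minimal sketch of per-call tracking. The prices here are placeholders, not real rates, and the field names are assumptions; most SDK responses expose input/output token counts under a usage object.

```python
# Placeholder per-1K-token prices in USD; check your provider's current rates
PRICE_PER_1K = {"input": 0.0025, "output": 0.01}

def log_usage(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute and log the cost of one call from its token counts."""
    cost = (input_tokens / 1000) * PRICE_PER_1K["input"] \
         + (output_tokens / 1000) * PRICE_PER_1K["output"]
    print(f"{model}: {input_tokens} in / {output_tokens} out -> ${cost:.4f}")
    return cost
```

Ship these logs to whatever metrics pipeline you already have; the point is that cost is visible per call, not per invoice.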

Pattern 4: No Retry Logic

LLM APIs fail. Your code should handle it with exponential backoff.
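A generic backoff wrapper, as a sketch. In real code, catch only the transient error types your SDK raises (rate limits, 5xx) rather than bare Exception.

```python
import random
import time

def call_with_retries(fn, max_attempts=5, base_delay=1.0):
    """Retry fn() with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Delays grow 1x, 2x, 4x, ...; jitter avoids thundering herds
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Usage: `call_with_retries(lambda: llm.call(messages))`.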

Pattern 5: Hardcoding Model Names

Hardcoding model="gpt-4o" is fragile. Model names change and get deprecated. Use environment variables.
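The fix is one line; the variable name `LLM_MODEL` is an arbitrary choice for this sketch.

```python
import os

# Read the model name from the environment, with an explicit fallback
MODEL = os.environ.get("LLM_MODEL", "gpt-4o")
```

Now a deprecation is a config change, not a code deploy.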

Pattern 6: No Circuit Breaker

One bad API day shouldn't take down your whole app. After repeated failures, stop calling the API and fail fast until a cooldown passes.
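A bare-bones breaker to show the shape; the thresholds are arbitrary, and a library like `pybreaker` gives you this with less code.

```python
import time

class CircuitBreaker:
    """Fail fast after repeated failures; allow a retry after a cooldown."""

    def __init__(self, max_failures=5, cooldown=60.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown elapsed: allow one probe call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the count
        return result
```

When the breaker is open, return a cached or degraded response instead of letting every request wait out a full timeout.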

Pattern 7: Forgetting Edge Cases

Empty input. Max length input. Unicode. Your LLM handles these differently than expected.
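The cheapest defense is a guard in front of the API call. The character limit here is an assumption; size it to your model's context window (ideally in tokens, not characters).

```python
MAX_INPUT_CHARS = 8000  # assumed limit; tune to your model's context window

def validate_input(text: str) -> str:
    """Reject empty input and truncate oversized input before it hits the API."""
    if not text or not text.strip():
        raise ValueError("empty prompt")
    return text[:MAX_INPUT_CHARS]
```

Then write explicit tests for the cases that bite: `""`, whitespace-only, max-length, and non-ASCII input.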

The Prevention Stack

| Pattern | Prevention |
| --- | --- |
| JSON validation | Always validate with a schema library |
| Timeouts | Always set a timeout on API calls |
| Token tracking | Log every call's token count |
| Retries | Implement with exponential backoff |
| Circuit breaker | Add a circuit breaker pattern |
| Edge cases | Write explicit tests |

Most of these are basic distributed systems patterns applied to LLM integrations.

If you want a monitoring tool that catches some of these: DriftWatch from GBP9.90/mo
