After 18 months of LLM integrations, these are the patterns that fail most often in production. Not theoretical failures — real incidents.
Pattern 1: Trusting JSON Mode Completely
Everyone assumes JSON mode guarantees valid JSON. It doesn't — it steers the model toward JSON, so you still need to validate the output.
```python
import json

response = llm(format="json", prompt=user_prompt)
try:
    data = json.loads(response)
except json.JSONDecodeError:
    data = retry_with_stricter_prompt(user_prompt)
```
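Parsing alone isn't enough: the response can be well-formed JSON but the wrong shape. A minimal stdlib-only sketch — the `REQUIRED` fields and the `parse_llm_json` helper are hypothetical, and in practice a schema library (pydantic, jsonschema) does this more thoroughly:

```python
import json

# Hypothetical expected fields; adapt to your own response schema.
REQUIRED = {"summary": str, "score": int}

def parse_llm_json(raw: str) -> dict:
    """Parse an LLM response and shape-check it; raise ValueError on mismatch."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, ftype in REQUIRED.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"bad or missing field: {field!r}")
    return data
```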
Pattern 2: No Timeout on LLM Calls
LLM API calls can hang. Without timeouts, your request thread blocks forever.
```python
import signal

def on_timeout(signum, frame):
    raise TimeoutError("LLM call timed out")

signal.signal(signal.SIGALRM, on_timeout)  # alarm needs a handler, or SIGALRM kills the process
signal.alarm(30)  # 30 second timeout
response = llm.call(messages)
signal.alarm(0)  # cancel the alarm
```

Note that `signal.alarm` only works on Unix and in the main thread; when your HTTP client offers a per-request timeout parameter, prefer that.
Pattern 3: Ignoring Token Count
Token counts = money. Without tracking, you don't know what's expensive until the bill arrives.
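A minimal sketch of per-call cost logging, assuming your client exposes prompt and completion token counts (most chat APIs return them in a usage object). The prices below are placeholders, not any provider's real rates:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm.cost")

# Placeholder per-1K-token prices; substitute your provider's actual rates.
PRICE_PER_1K = {"prompt": 0.0025, "completion": 0.01}

def log_token_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Log token counts and return the estimated cost in dollars."""
    cost = (prompt_tokens / 1000) * PRICE_PER_1K["prompt"] \
         + (completion_tokens / 1000) * PRICE_PER_1K["completion"]
    log.info("tokens: prompt=%d completion=%d cost=$%.4f",
             prompt_tokens, completion_tokens, cost)
    return cost
```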
Pattern 4: No Retry Logic
LLM APIs fail. Your code should handle it with exponential backoff.
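A sketch of exponential backoff with jitter. Catching bare `Exception` here is for brevity; in real code, catch only your client's transient errors (rate limits, timeouts, 5xx):

```python
import random
import time

def call_with_retries(fn, max_attempts=4, base_delay=1.0):
    """Retry fn() with exponential backoff plus jitter; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; jitter avoids synchronized retry storms.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```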
Pattern 5: Hardcoding Model Names
`model="gpt-4o"` is fragile. Model names change and get deprecated. Read the model name from an environment variable.
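A minimal sketch — the `LLM_MODEL` variable name and the fallback default are illustrative choices, not a convention any SDK requires:

```python
import os

def get_model() -> str:
    """Read the model name from the environment, with a fallback default."""
    return os.environ.get("LLM_MODEL", "gpt-4o")
```

Now switching models is a config change, not a code change and redeploy.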
Pattern 6: No Circuit Breaker
One bad API day shouldn't take down your whole app.
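A bare-bones sketch of the idea: after a run of consecutive failures, fail fast instead of hammering a dead API, then allow a probe call after a cooldown. The thresholds are arbitrary, and a production system would want a real library with half-open state handling:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a probe after `reset_after` seconds."""
    def __init__(self, threshold=5, reset_after=60.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown elapsed: allow one probe call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```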
Pattern 7: Forgetting Edge Cases
Empty input. Max length input. Unicode. Your LLM handles these differently than expected.
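These cases are cheap to pin down with explicit tests. In this sketch, `summarize` and `MAX_INPUT_CHARS` are hypothetical stand-ins for your own LLM wrapper and its input limit:

```python
MAX_INPUT_CHARS = 8_000  # hypothetical input limit

def summarize(text: str) -> str:
    """Stand-in for your LLM wrapper: handles empty input, truncates overlong input."""
    if not text.strip():
        return ""
    return text[:MAX_INPUT_CHARS]

# Explicit edge-case checks: empty, whitespace-only, max-length, Unicode-heavy.
for case in ["", "   ", "a" * 100_000, "emoji \U0001F980 and \u200b zero-width"]:
    assert isinstance(summarize(case), str)
    assert len(summarize(case)) <= MAX_INPUT_CHARS
```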
The Prevention Stack
| Pattern | Prevention |
|---|---|
| Trusting JSON mode | Always validate with a schema library |
| No timeouts | Always set a timeout on API calls |
| No token tracking | Log every call's token count |
| No retry logic | Retry with exponential backoff |
| No circuit breaker | Add a circuit breaker around the client |
| Forgetting edge cases | Write explicit tests for empty, max-length, and Unicode input |
Most of these are basic distributed systems patterns applied to LLM integrations.
If you want a monitoring tool that catches some of these: DriftWatch, from £9.90/mo.