Debby McKinney

You're Probably Going to Hit These LiteLLM Issues in Production

If you're using LiteLLM and planning to scale, there are specific production issues you should know about. I'm not here to bash the project, but these are real problems documented in its GitHub repository that you're likely to hit at scale.

The Database Bottleneck (GitHub Issue #12067)

This is the most common problem teams hit. LiteLLM stores request logs in your PostgreSQL database. According to their own documentation and confirmed in issue #12067: "when there are 1M+ logs in the DB, it might slow down the LLM API requests."

One user reports running 100,000 requests per day. Simple math: that's 1 million logs in 10 days. Then your API requests start slowing down because the gateway is querying a database full of logs.

The suggested workaround involves moving logs to DynamoDB or blob storage and disabling database logging. But then you lose the UI dashboard that teams rely on for monitoring. You're trading functionality for performance.
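Before it bites, it's worth watching the log table directly instead of waiting for slow requests. A minimal sketch, assuming psql is available, DATABASE_URL points at LiteLLM's Postgres, and the table name matches the common LiteLLM schema (check yours, it may differ by version):

  # Count rows in the spend logs table. The table name is an assumption
  # based on common LiteLLM schemas; verify it against your own database.
  psql "$DATABASE_URL" -c 'SELECT count(*) FROM "LiteLLM_SpendLogs";'

  # Rough runway estimate: days until the ~1M-row slowdown threshold.
  echo $(( 1000000 / 100000 ))   # 10 days at 100k requests/day

If that count is climbing toward seven figures, you're on the clock for the workaround decision above.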

Performance Degrades Over Time (GitHub Issue #6345)

Issue #6345 documents this pattern clearly: "Performance gradually degrades over time" and "after 2-3 hours it is getting slower."

The temporary fix? Restart the service. Performance comes back, then degrades again after a few hours.

This isn't a configuration problem you can tune your way out of. When your production gateway needs periodic restarts to maintain performance, that's an architectural constraint.
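If you suspect this is happening to you, measure it rather than guessing. A minimal sketch, assuming the proxy runs on LiteLLM's default port 4000 and LITELLM_KEY holds a valid key; the model name is a placeholder:

  # Log end-to-end latency every 5 minutes so degradation over hours
  # shows up as a trend, not an anecdote.
  while true; do
    t=$(curl -s -o /dev/null -w '%{time_total}' \
      -X POST http://localhost:4000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $LITELLM_KEY" \
      -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}]}')
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ),$t" >> gateway_latency.csv
    sleep 300
  done

If the numbers climb over a few hours and drop back after a restart, you're seeing the same pattern as issue #6345.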

Cached Responses Are Slow (GitHub Issue #9910)

Issue #9910 shows something concerning. A user makes a request that hits the cache. The logs confirm "Cache Hit: True" with a processing time of just 1 millisecond.

But the actual response time? 10.598 seconds.

The cache worked - but the response is still slow. That defeats the entire purpose of caching. Something in the request path is adding 10+ seconds even when the actual LLM call is skipped.
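You can check for this yourself with curl's timing variables. A sketch, again assuming the proxy on port 4000 with caching enabled; the idea is to compare the second (cached) call's total time against the cache-lookup time reported in the logs:

  # Send the identical request twice; the second should hit the cache.
  # If total time stays high while the cache lookup is ~1 ms, the delay
  # lives elsewhere in the request path.
  for i in 1 2; do
    curl -s -o /dev/null \
      -w "run $i: connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n" \
      -X POST http://localhost:4000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $LITELLM_KEY" \
      -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "same prompt every time"}]}'
  done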

800+ Open Issues

As of January 2026, LiteLLM has over 800 open issues on GitHub. Many are feature requests, but a significant portion are bugs and production problems.

The volume itself isn't necessarily bad - popular projects get lots of issues. But the patterns in those issues reveal architectural constraints that matter at scale.

What This Means for Your Team

These aren't edge cases. They're documented, reproducible problems that multiple users have encountered:

If you're running < 10,000 requests/day: You probably won't hit these issues. The database won't fill up fast enough. Performance degradation might not be noticeable.

If you're running > 100,000 requests/day: You'll cross the 1M-log threshold within about 10 days. You'll need workarounds. You'll need to monitor and restart services. Operational complexity increases.

If you're running serverless: Add the cold start problem on top (3+ second import times from the Reddit discussion we saw earlier).
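The import cost is easy to verify on your own machine before you deploy anything. A quick check, assuming litellm is installed in the current Python environment (this measures plain import time, not a full cold-start benchmark):

  # How long does importing litellm take by itself?
  time python -c "import litellm"

  # Python 3.7+ can break the cost down per module (importtime writes to stderr).
  python -X importtime -c "import litellm" 2>&1 | tail -5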

Alternatives Worth Testing

Bifrost

Built in Go for gateway workloads. A different architecture means different tradeoffs:

  • No database in the request path (logs don't slow down API calls)
  • Performance stays consistent over time (no degradation requiring restarts)
  • Memory usage stays stable

    GitHub: maximhq / bifrost

    Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

    Bifrost


    The fastest way to build AI applications that never go down

    Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.

    Quick Start

    Get started

    Go from zero to production-ready AI gateway in under a minute.

    Step 1: Start Bifrost Gateway

    # Install and run locally
    npx -y @maximhq/bifrost
    
    # Or use Docker
    docker run -p 8080:8080 maximhq/bifrost

    Step 2: Configure via Web UI

    # Open the built-in web interface
    open http://localhost:8080

    Step 3: Make your first API call

    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
      }'

    That's it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…

Portkey

Hosted service. Zero operational burden since you're not managing infrastructure. Performance is their problem, not yours.

Downside: Not self-hosted. Won't work if you have data residency requirements.

Building Your Own

For simple use cases (2-3 providers, basic routing), a thin wrapper around provider SDKs might be enough. More work upfront, but you control exactly what you build.
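As a sketch of what "thin" can mean at its simplest, here's sequential failover between two providers in plain curl. The endpoints and headers come from each provider's public API, the model names are placeholders, and a real wrapper would add retries, timeouts, and response normalization:

  # Try the primary provider; fall back to the secondary if the call fails.
  # Note the request shapes differ per provider, which is exactly the work
  # a gateway normally does for you.
  PROMPT='Hello'

  curl -sf https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"gpt-4o-mini\", \"messages\": [{\"role\": \"user\", \"content\": \"$PROMPT\"}]}" \
  || curl -sf https://api.anthropic.com/v1/messages \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"claude-3-5-haiku-latest\", \"max_tokens\": 256, \"messages\": [{\"role\": \"user\", \"content\": \"$PROMPT\"}]}"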

The Choice

Don't pick based on popularity. Test with your actual workload:

  1. Run LiteLLM with production-like traffic
  2. Monitor database growth daily
  3. Watch performance over 24+ hours
  4. Check if the operational complexity fits your team

Then test alternatives the same way.

The right choice depends on your scale and operational capacity. But know what you're choosing before you're debugging production issues.

Being Fair

LiteLLM solved an important problem - making multi-provider access easy. The maintainers are responsive. Updates ship frequently.

But the GitHub issues show real limitations at scale. These aren't hypothetical - they're documented problems from production users.

If you're starting fresh, test thoroughly before committing. If you're already on LiteLLM and hitting these issues, alternatives exist.

Just make sure you understand the tradeoffs.

