LiteLLM reached 34,000+ GitHub stars as a popular multi-provider LLM gateway. But examining specific GitHub issues reveals production limitations that matter for teams at scale.
Issue #12067: Database Becomes a Bottleneck
GitHub issue #12067 documents a scaling problem. A user running LiteLLM with PostgreSQL, Redis, and AWS infrastructure describes their situation:
"As per the docs -> when there are 1M+ logs in the DB, it might slow down the LLM API requests"
"My daily request quota is ~100,000 requests, which means I will hit that quota within roughly 10 days."
The math is straightforward: at 100,000 requests per day, the database reaches 1 million logs in 10 days. Then API requests slow down because the gateway queries that database.
The documented workaround involves moving logs to DynamoDB or blob storage and disabling database logging. But the user asks:
"Once I disable storing the logs into Postgres DB, and move them to DynamoDB -> would I be able to see the logs on the LiteLLM Proxy UI?"
The tradeoff: functionality (log visibility in the proxy UI) versus performance (a fast request path).
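The usual fix is to get log writes out of the hot path entirely. Below is a minimal sketch of that idea using LiteLLM's custom callback hook; the exact class path can vary between versions, and ship_to_blob_storage is a hypothetical helper standing in for whatever external store (S3, DynamoDB, blob storage) you choose.

```python
# Sketch: ship request logs to external storage via a LiteLLM callback instead
# of writing every request to Postgres in the hot path.
# Assumes LiteLLM's CustomLogger hook; ship_to_blob_storage is a hypothetical
# helper you would implement (e.g. an S3 or DynamoDB writer).
import asyncio

import litellm
from litellm.integrations.custom_logger import CustomLogger


async def ship_to_blob_storage(record: dict) -> None:
    """Hypothetical async writer to S3/DynamoDB/blob storage."""
    ...


class ExternalLogShipper(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        record = {
            "model": kwargs.get("model"),
            "latency_s": (end_time - start_time).total_seconds(),
            "cache_hit": kwargs.get("cache_hit"),
        }
        # Fire-and-forget: log shipping never blocks the API response.
        asyncio.create_task(ship_to_blob_storage(record))


litellm.callbacks = [ExternalLogShipper()]
```

This keeps request handling fast, but as the issue points out, logs shipped elsewhere may no longer show up in the LiteLLM proxy UI.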
Issue #6345: Performance Degradation Pattern
GitHub issue #6345 describes performance behavior over time:
"Performance gradually degrades over time. The issue persists even after disabling router features and Redis."
"But then after 2-3 hours it is getting slower."
"The problem is temporarily resolved by restarting the service or the entire system, with initial requests being faster after a restart."
The user tested with different configurations (4 workers, then 1 worker) without resolving the degradation. Performance returns after restart, then degrades again.
This pattern suggests resource accumulation or state management issues rather than simple configuration problems.
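One way to test the resource-accumulation hypothesis is to sample the gateway process's memory alongside request latency over a few hours and see whether the two climb together. A rough sketch using psutil; the PID and endpoint are placeholders for your own deployment.

```python
# Sketch: correlate proxy memory growth with request latency over time.
# PROXY_PID and ENDPOINT are placeholders for your own deployment.
import time

import psutil
import requests

PROXY_PID = 12345
ENDPOINT = "http://localhost:4000/health"  # placeholder health endpoint

proc = psutil.Process(PROXY_PID)
while True:
    rss_mb = proc.memory_info().rss / 1e6
    start = time.perf_counter()
    requests.get(ENDPOINT, timeout=30)
    latency_ms = (time.perf_counter() - start) * 1000
    print(f"{time.strftime('%H:%M:%S')}  rss={rss_mb:.0f} MB  latency={latency_ms:.0f} ms")
    time.sleep(60)  # one sample per minute, left running for a few hours
```

If latency climbs while memory stays flat, the cause is more likely connection or state handling than a simple leak; either way, the data makes the restart-and-degrade cycle measurable.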
Issue #9910: Cached Response Latency
GitHub issue #9910 shows an unexpected result. The user makes a request that hits the cache:
```
time curl -X POST http://${LITELLM_POD_IP}/v1/chat/completions ...
{ ... 0.00s user 0.01s system 0% cpu 10.598 total}
```
LiteLLM logs confirm the cache hit:
```
Cost: $0.000000
Cache Hit: True
Status: Success
Start Time: 2025-04-11T10:01:45.742000Z
End Time: 2025-04-11T10:01:45.743000Z
```
Processing time: 1 millisecond. Actual response time: 10.6 seconds.
The cache worked - the LLM wasn't called. But something in the request path added 10+ seconds. Caching is meant to reduce latency, not preserve it.
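This kind of gap is easy to check from the client side: time the same request twice and compare wall-clock latency with the processing time the gateway reports. A minimal sketch, where the base URL, API key, and model are placeholders:

```python
# Sketch: measure client-observed latency for a repeated (cacheable) request.
# If caching works end to end, the second call should return in milliseconds.
import time

import requests

BASE_URL = "http://localhost:4000"            # placeholder proxy URL
HEADERS = {"Authorization": "Bearer sk-..."}  # placeholder key
BODY = {
    "model": "gpt-4o-mini",  # placeholder model
    "messages": [{"role": "user", "content": "What is an LLM gateway?"}],
}

for attempt in ("cold", "cached"):
    start = time.perf_counter()
    r = requests.post(f"{BASE_URL}/v1/chat/completions", json=BODY, headers=HEADERS)
    elapsed = time.perf_counter() - start
    print(f"{attempt}: status={r.status_code} wall_time={elapsed:.2f}s")
```

If the second call still takes seconds while the gateway logs a one-millisecond cache hit, the delay lives somewhere else in the request path, which is exactly what issue #9910 reports.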
What These Issues Indicate
These three issues share a theme: architectural constraints that emerge at scale.
Database in the request path - Logs accumulate and slow down API queries. Solutions require disabling features or complex workarounds.
Resource management over time - Performance degrades after hours of operation. Temporary fixes involve restarting services.
Request path efficiency - Even when skipping the LLM call entirely, requests take seconds instead of milliseconds.
The 1,000+ Open Issues Context
As of this writing, LiteLLM has over 1,000 open GitHub issues. Not all are bugs - many are feature requests or discussions. But the volume, combined with specific patterns like the ones above, reveals where the architecture faces constraints.
The maintainers are active. Updates ship frequently. Issues get responses. But some problems require architectural changes rather than patches.
Alternative Architectures
Some teams moved to Bifrost (https://github.com/maximhq/bifrost), which makes different architectural choices:
- Written in Go (compiled binary vs Python interpreter)
- Logs separate from request path (no database queries during API calls)
- Different memory management (no degradation over time)
Benchmarks show significantly faster performance, but Bifrost has a smaller community and supports fewer providers than LiteLLM.
Other teams use hosted services (Portkey, OpenRouter) to avoid infrastructure management entirely.
When These Issues Matter
For development and prototyping: These issues are typically irrelevant. Low volume masks the problems.
For moderate production (< 10,000 requests/day): The database takes months to reach the 1M-log mark, and performance degradation might not be noticeable.
For high-scale production (> 100,000 requests/day): All three issues become operational concerns requiring workarounds.
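These thresholds follow directly from the documented ~1M-log ceiling; a quick calculation shows how long each tier takes to reach it:

```python
# Rough estimate: days until the documented ~1M-log mark at a given request volume.
LOG_THRESHOLD = 1_000_000

for requests_per_day in (1_000, 10_000, 100_000):
    days = LOG_THRESHOLD / requests_per_day
    print(f"{requests_per_day:>7,} req/day -> ~{days:,.0f} days to 1M logs")
```

At 100,000 requests/day the ceiling arrives in roughly 10 days, which is why issue #12067's reporter hit it so quickly.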
Reading GitHub Issues
The three issues examined here aren't cherry-picked for criticism. They're representative patterns from users encountering production constraints.
Issue #12067: 100,000 requests/day is not unusual for production applications.
Issue #6345: Performance degradation after hours is documented by multiple users.
Issue #9910: Slow cached responses contradict the purpose of caching.
The Maintainer Response
Credit to LiteLLM's maintainers for transparency. The documentation acknowledges the 1M log limitation. The founder engages in GitHub issues and community discussions.
But acknowledging limitations doesn't eliminate them. Teams need to decide if documented constraints fit their requirements.
Making Informed Decisions
Don't choose infrastructure based on popularity metrics. Read the actual issues:
- Check if reported problems match your use case
- Verify if workarounds fit your operational capacity
- Test with production-like volume and duration (see the sketch below)
- Compare alternatives under your specific workload
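For the volume-and-duration test, a small sustained-load script is enough to surface both the latency drift from issue #6345 and the cached-response gap from issue #9910. A sketch using asyncio and httpx, where the URL, key, and model are placeholders:

```python
# Sketch: sustained load against a gateway, tracking latency drift over time.
# BASE_URL, the API key, and the model are placeholders for your setup.
import asyncio
import statistics
import time

import httpx

BASE_URL = "http://localhost:4000"            # placeholder gateway URL
HEADERS = {"Authorization": "Bearer sk-..."}  # placeholder key
BODY = {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}]}


async def one_request(client: httpx.AsyncClient) -> float:
    start = time.perf_counter()
    await client.post(f"{BASE_URL}/v1/chat/completions", json=BODY, headers=HEADERS)
    return time.perf_counter() - start


async def run(hours: float = 3, concurrency: int = 10) -> None:
    async with httpx.AsyncClient(timeout=60) as client:
        deadline = time.monotonic() + hours * 3600
        while time.monotonic() < deadline:
            latencies = await asyncio.gather(
                *(one_request(client) for _ in range(concurrency))
            )
            p95 = statistics.quantiles(latencies, n=20)[-1]
            print(f"median={statistics.median(latencies):.2f}s  p95={p95:.2f}s")


asyncio.run(run())
```

Running it for a few hours, rather than a few minutes, is what distinguishes a configuration that merely works from one that keeps working.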
The Pattern
First-generation tools often prioritize rapid development and wide compatibility. Second-generation tools learn from production deployments and optimize for specific constraints.
LiteLLM made multi-provider access accessible. The GitHub issues show where that approach encounters limits.
Whether those limits matter depends entirely on your requirements.
Verified sources:
- Issue #12067: https://github.com/BerriAI/litellm/issues/12067
- Issue #6345: https://github.com/BerriAI/litellm/issues/6345
- Issue #9910: https://github.com/BerriAI/litellm/issues/9910
- Bifrost: https://github.com/maximhq/bifrost