Debby McKinney

You're Probably Going to Hit These LiteLLM Issues in Production

If you're using LiteLLM and planning to scale, there are specific production issues you should know about. I'm not here to bash the project, but these are real problems documented in its GitHub repository that you're likely to hit at scale.

The Database Bottleneck (GitHub Issue #12067)

This is the most common problem teams hit. LiteLLM stores request logs in your PostgreSQL database. According to their own documentation and confirmed in issue #12067: "when there are 1M+ logs in the DB, it might slow down the LLM API requests."

One user reports running 100,000 requests per day. Simple math: that's 1 million logs in 10 days. Then your API requests start slowing down because the gateway is querying a database full of logs.

The suggested workaround involves moving logs to DynamoDB or blob storage and disabling database logging. But then you lose the UI dashboard that teams rely on for monitoring. You're trading functionality for performance.
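Before it bites, it's worth watching the log table directly instead of waiting for slow requests. A minimal sketch, assuming psql is available, DATABASE_URL points at LiteLLM's Postgres, and the table name matches the common LiteLLM schema (check yours, it may differ by version):

  # Count rows in the spend logs table. The table name is an assumption
  # based on common LiteLLM schemas; verify it against your own database.
  psql "$DATABASE_URL" -c 'SELECT count(*) FROM "LiteLLM_SpendLogs";'

  # Rough runway estimate: days until the ~1M-row slowdown threshold.
  echo $(( 1000000 / 100000 ))   # 10 days at 100k requests/day

If that count is climbing toward seven figures, you're on the clock for the workaround decision above.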

Performance Degrades Over Time (GitHub Issue #6345)

Issue #6345 documents this pattern clearly: "Performance gradually degrades over time" and "after 2-3 hours it is getting slower."

The temporary fix? Restart the service. Performance comes back, then degrades again after a few hours.

This isn't a configuration problem you can tune your way out of. When your production gateway needs periodic restarts to maintain performance, that's an architectural constraint.
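If you suspect this is happening to you, measure it rather than guessing. A minimal sketch, assuming the proxy runs on LiteLLM's default port 4000 and LITELLM_KEY holds a valid key; the model name is a placeholder:

  # Log end-to-end latency every 5 minutes so degradation over hours
  # shows up as a trend, not an anecdote.
  while true; do
    t=$(curl -s -o /dev/null -w '%{time_total}' \
      -X POST http://localhost:4000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $LITELLM_KEY" \
      -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "ping"}]}')
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ),$t" >> gateway_latency.csv
    sleep 300
  done

If the numbers climb over a few hours and drop back after a restart, you're seeing the same pattern as issue #6345.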

Cached Responses Are Slow (GitHub Issue #9910)

Issue #9910 shows something concerning. A user makes a request that hits the cache. The logs confirm "Cache Hit: True" with a processing time of just 1 millisecond.

But the actual response time? 10.598 seconds.

The cache worked - but the response is still slow. That defeats the entire purpose of caching. Something in the request path is adding 10+ seconds even when the actual LLM call is skipped.
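You can check for this yourself with curl's timing variables. A sketch, again assuming the proxy on port 4000 with caching enabled; the idea is to compare the second (cached) call's total time against the cache-lookup time reported in the logs:

  # Send the identical request twice; the second should hit the cache.
  # If total time stays high while the cache lookup is ~1 ms, the delay
  # lives elsewhere in the request path.
  for i in 1 2; do
    curl -s -o /dev/null \
      -w "run $i: connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n" \
      -X POST http://localhost:4000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $LITELLM_KEY" \
      -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "same prompt every time"}]}'
  done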

800+ Open Issues

As of January 2026, LiteLLM has over 800 open issues on GitHub. Many are feature requests, but a significant portion are bugs and production problems.

The volume itself isn't necessarily bad - popular projects get lots of issues. But the patterns in those issues reveal architectural constraints that matter at scale.

What This Means for Your Team

These aren't edge cases. They're documented, reproducible problems that multiple users have encountered:

If you're running < 10,000 requests/day: You probably won't hit these issues. The database won't fill up fast enough. Performance degradation might not be noticeable.

If you're running > 100,000 requests/day: You'll cross the 1M-log threshold within about 10 days. You'll need workarounds. You'll need to monitor and restart services. Operational complexity increases.

If you're running serverless: Add the cold start problem on top (3+ second import times from the Reddit discussion we saw earlier).
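The import cost is easy to verify on your own machine before you deploy anything. A quick check, assuming litellm is installed in the current Python environment (this measures plain import time, not a full cold-start benchmark):

  # How long does importing litellm take by itself?
  time python -c "import litellm"

  # Python 3.7+ can break the cost down per module (importtime writes to stderr).
  python -X importtime -c "import litellm" 2>&1 | tail -5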

Alternatives Worth Testing

Bifrost

Built in Go for gateway workloads. A different architecture means different tradeoffs:

  • No database in the request path (logs don't slow down API calls)
  • Performance stays consistent over time (no degradation requiring restarts)
  • Memory usage stays stable

    GitHub: maximhq / bifrost

    Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

    Bifrost


    The fastest way to build AI applications that never go down

    Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.

    Quick Start

    Get started

    Go from zero to production-ready AI gateway in under a minute.

    Step 1: Start Bifrost Gateway

    # Install and run locally
    npx -y @maximhq/bifrost
    
    # Or use Docker
    docker run -p 8080:8080 maximhq/bifrost

    Step 2: Configure via Web UI

    # Open the built-in web interface
    open http://localhost:8080

    Step 3: Make your first API call

    curl -X POST http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
      }'

    That's it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…

Portkey

Hosted service. Zero operational burden since you're not managing infrastructure. Performance is their problem, not yours.

Downside: Not self-hosted. Won't work if you have data residency requirements.

Building Your Own

For simple use cases (2-3 providers, basic routing), a thin wrapper around provider SDKs might be enough. More work upfront, but you control exactly what you build.
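As a sketch of what "thin" can mean at its simplest, here's sequential failover between two providers in plain curl. The endpoints and headers come from each provider's public API, the model names are placeholders, and a real wrapper would add retries, timeouts, and response normalization:

  # Try the primary provider; fall back to the secondary if the call fails.
  # Note the request shapes differ per provider, which is exactly the work
  # a gateway normally does for you.
  PROMPT='Hello'

  curl -sf https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"gpt-4o-mini\", \"messages\": [{\"role\": \"user\", \"content\": \"$PROMPT\"}]}" \
  || curl -sf https://api.anthropic.com/v1/messages \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"claude-3-5-haiku-latest\", \"max_tokens\": 256, \"messages\": [{\"role\": \"user\", \"content\": \"$PROMPT\"}]}"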

The Choice

Don't pick based on popularity. Test with your actual workload:

  1. Run LiteLLM with production-like traffic
  2. Monitor database growth daily
  3. Watch performance over 24+ hours
  4. Check if the operational complexity fits your team

Then test alternatives the same way.

The right choice depends on your scale and operational capacity. But know what you're choosing before you're debugging production issues.

Being Fair

LiteLLM solved an important problem - making multi-provider access easy. The maintainers are responsive. Updates ship frequently.

But the GitHub issues show real limitations at scale. These aren't hypothetical - they're documented problems from production users.

If you're starting fresh, test thoroughly before committing. If you're already on LiteLLM and hitting these issues, alternatives exist.

Just make sure you understand the tradeoffs.

