The routing overhead caught us off guard. We were running caption generation through a large model for every input, even though roughly 70% of inputs only needed a fast small model. Adding a gateway with cost-aware routing (we landed on Bifrost for this, though LiteLLM and Portkey do the same thing: https://github.com/maximhq/bifrost) cut LLM spend in our vision pipeline by 38% without touching the heavy-model cases.
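The routing idea can be sketched in a few lines. Everything below is a hypothetical illustration of the pattern, not Bifrost's actual API or our production heuristic: the model names, the complexity estimate, and the threshold are all placeholders.

```python
# Sketch of cost-aware routing: send "easy" inputs to a cheap model,
# escalate the rest. All names below are hypothetical placeholders.

CHEAP_MODEL = "small-caption-model"   # fast, cheap captioner (hypothetical)
HEAVY_MODEL = "large-caption-model"   # expensive heavy model (hypothetical)

def estimate_complexity(image_meta: dict) -> float:
    """Toy heuristic: more detected objects and more pixels => harder input."""
    objects = image_meta.get("num_objects", 0)
    pixels = image_meta.get("width", 0) * image_meta.get("height", 0)
    return objects / 10 + pixels / 4_000_000

def pick_model(image_meta: dict, threshold: float = 1.0) -> str:
    """Route inputs below the complexity threshold to the cheap model."""
    if estimate_complexity(image_meta) < threshold:
        return CHEAP_MODEL
    return HEAVY_MODEL

# A small, sparse image stays on the cheap model:
print(pick_model({"num_objects": 2, "width": 640, "height": 480}))
# → small-caption-model
# A dense, high-resolution scene escalates:
print(pick_model({"num_objects": 30, "width": 4000, "height": 3000}))
# → large-caption-model
```

In practice the gateway handles this per request based on rules or cost budgets you configure; the win comes from the heavy model only seeing the minority of inputs that actually need it.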