When an agent feels sluggish, the instinct is to blame reasoning quality.
But in agentic AI systems, reasoning is rarely the real problem.
Inference today looks like:
- planning a path forward
- calling tools
- waiting on external systems
- re-planning based on outputs
- generating a final response across long sessions
That entire loop is inference.
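That loop can be sketched in a few lines. This is a minimal illustration, not any framework's API: `plan`, `call_tool`, and `run_agent` are hypothetical stubs standing in for a model call, an external tool, and the orchestration loop.

```python
import time

def plan(history):
    """Stand-in model call: decide the next step from the session so far."""
    # Assumed behavior for illustration: call a tool once, then respond.
    if any(role == "tool_result" for role, _ in history):
        return ("respond", "final answer")
    return ("call_tool", "search")

def call_tool(name):
    """Stand-in external system call; this wait often dominates latency."""
    time.sleep(0.01)  # placeholder for network / API round-trip
    return f"{name} output"

def run_agent(query, max_steps=10):
    """One agentic session: every iteration is another inference round-trip."""
    history = [("user", query)]
    for _ in range(max_steps):
        action, arg = plan(history)               # inference: planning
        if action == "call_tool":
            result = call_tool(arg)               # waiting on external systems
            history.append(("tool_result", result))  # re-plan next iteration
        else:
            history.append(("assistant", arg))    # final generation
            return arg
    return None
```

Notice that the model is invoked on every pass, so per-call latency compounds across the whole session rather than appearing once.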
In a recent chat with Yunmo and Alex from FriendliAI, we explored why inference has quietly become the biggest bottleneck in agent performance, and how teams are optimizing for it.
The key shift:
Latency, throughput, and cost aren’t infra trade-offs anymore. They’re product decisions.
If you’re building agentic systems, this is worth rethinking.
▶️ Full webinar link: https://shorturl.at/moj3x