Function Calling vs RAG: 2.3s Latency Gap in Production

#llm #rag #openai #functioncalling

RAG answered our pricing question in 800ms. Function calling took 3.1 seconds. Same model, same query.

This wasn't supposed to happen. Function calling is deterministic — parse the schema, call the endpoint, return structured data. RAG has to embed the query, search vectors, rank chunks, stuff context. On paper, function calling should be faster.

But when we instrumented both approaches on a real customer support chatbot answering "What's the price of the Pro plan?", RAG consistently beat function calling by 2-3x. The culprit? Sequential model calls. Function calling needed one round-trip to the model to generate the function call, then the endpoint call itself, then a second model round-trip to synthesize the response. RAG ran its retrieval step and then answered with a single model call.
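To make the sequential-call point concrete, here is a minimal sketch that models each pipeline as a list of stages run back to back. The `Stage` type, the helper names, and every latency figure are illustrative assumptions, not the measurements from this post; replace them with numbers from your own traces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One sequential step in a request pipeline (hypothetical model)."""
    name: str
    latency_ms: float
    is_llm_call: bool = False


def total_latency_ms(stages):
    """Stages run one after another, so end-to-end latency is their sum."""
    return sum(stage.latency_ms for stage in stages)


def llm_round_trips(stages):
    """Count how many stages are round-trips to the model."""
    return sum(1 for stage in stages if stage.is_llm_call)


# Placeholder value, NOT a measurement: assumed cost of one model round-trip.
LLM_CALL_MS = 700.0

# Function calling: model call -> run the function -> model call again.
function_calling = [
    Stage("LLM: pick function and arguments", LLM_CALL_MS, is_llm_call=True),
    Stage("execute the function (endpoint call)", 100.0),
    Stage("LLM: synthesize answer from result", LLM_CALL_MS, is_llm_call=True),
]

# RAG: retrieval steps, then a single model call over the stuffed context.
rag = [
    Stage("embed query", 30.0),
    Stage("vector search and rank chunks", 50.0),
    Stage("LLM: answer from retrieved context", LLM_CALL_MS, is_llm_call=True),
]

if __name__ == "__main__":
    for label, pipeline in (("function calling", function_calling), ("RAG", rag)):
        print(
            f"{label}: {total_latency_ms(pipeline):.0f} ms total, "
            f"{llm_round_trips(pipeline)} LLM round-trip(s)"
        )
```

With these placeholder values the function calling path sums to 1500 ms across two model round-trips, while the RAG path sums to 780 ms with one. The absolute numbers are made up; the shape is the point: when the model call dominates each request, paying for it twice outweighs the cost of the retrieval steps.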

This post walks through when each approach actually wins in production. Not the theoretical comparison, but what happens when you measure end-to-end latency, token costs, and failure modes on real queries.