AI responses were taking 6 to 8 seconds. Not the model. Data fetching.
The flow was fully sequential:

- Fetch from the CRM
- Fetch from inventory
- Fetch from accounting
- Then build the response
Each question was starting from zero.
Same data. Same queries. Repeated every time.
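Roughly what that looked like. A minimal sketch; the function names and timings are stand-ins for illustration, not our real integrations:

```python
import time

# Hypothetical stand-ins for the real CRM, inventory, and accounting
# integrations. Names and delays are assumptions.
def fetch_crm(user_id: str) -> dict:
    time.sleep(2)  # simulate a slow network call
    return {"customer": user_id}

def fetch_inventory(user_id: str) -> dict:
    time.sleep(2)
    return {"stock": []}

def fetch_accounting(user_id: str) -> dict:
    time.sleep(2)
    return {"invoices": []}

# The old flow: every question paid for all three fetches, one after
# another, before the model even saw the prompt.
def build_context(user_id: str) -> dict:
    context = {}
    context.update(fetch_crm(user_id))
    context.update(fetch_inventory(user_id))
    context.update(fetch_accounting(user_id))
    return context  # several seconds of wall-clock time, every question
```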
The fix was simple: stop fetching everything again and again.
We changed how context is handled.
Instead of fetching on every request:

- Preload frequently used data when the session starts
- Store it in the session context
- Reuse it across multiple questions

So the second and third questions do not hit all the systems again.
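A minimal sketch of the idea, reusing the `build_context` stub from above. The structure is illustrative, not our production code:

```python
# Session-scoped context store: fetch once per session, reuse after.
_session_context: dict[str, dict] = {}

def start_session(session_id: str, user_id: str) -> None:
    # Pay the fetch cost once, when the session starts.
    _session_context[session_id] = build_context(user_id)

def get_context(session_id: str, user_id: str) -> dict:
    # Follow-up questions reuse the preloaded context instead of
    # hitting all three systems again.
    if session_id not in _session_context:
        start_session(session_id, user_id)
    return _session_context[session_id]
```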
Problems we hit:

**Stale data.** Some values changed during the session, so the cached context drifted from reality. Fixed by adding an expiry on the cached context.
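A sketch of the expiry fix, assuming a simple per-session TTL. The actual value is a tuning decision that depends on how fast each system's data changes:

```python
import time

TTL_SECONDS = 300  # assumption; tune to how quickly the source data changes

_cache: dict[str, tuple[float, dict]] = {}

def get_fresh_context(session_id: str, user_id: str) -> dict:
    entry = _cache.get(session_id)
    if entry is not None:
        fetched_at, context = entry
        if time.time() - fetched_at < TTL_SECONDS:
            return context  # still within the expiry window, reuse it
    # Missing or expired: refetch and restamp the entry.
    context = build_context(user_id)
    _cache[session_id] = (time.time(), context)
    return context
```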
**Memory growth.** The session context kept growing. Fixed by limiting what we store and cleaning out unused data.
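One way to bound the store is LRU eviction: cap the number of sessions kept and drop the least recently used. A sketch; the cap value is an assumption:

```python
from collections import OrderedDict

MAX_SESSIONS = 500  # assumption; depends on context size and available memory

class BoundedSessionStore:
    """Keeps at most max_sessions contexts, evicting the least recently used."""

    def __init__(self, max_sessions: int = MAX_SESSIONS) -> None:
        self._store: OrderedDict[str, dict] = OrderedDict()
        self._max = max_sessions

    def get(self, session_id: str) -> dict | None:
        context = self._store.get(session_id)
        if context is not None:
            self._store.move_to_end(session_id)  # mark as recently used
        return context

    def put(self, session_id: str, context: dict) -> None:
        self._store[session_id] = context
        self._store.move_to_end(session_id)
        while len(self._store) > self._max:
            self._store.popitem(last=False)  # evict least recently used
```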
Result:

- 6 to 8 seconds down to around 3 seconds on average
- Follow-up queries became much faster
The AI did not get faster. We just stopped wasting time fetching the same data.
This is the kind of problem that shows up in BrainPack deployments where AI depends on multiple existing systems. The bottleneck is almost always in how data is orchestrated, not the model.