The latency is trash. I’m sorry, but you can’t build a fluid voice conversation when your memory layer takes 500ms+ just to fetch a relevant fact. By the time the AI responds, the user has already hung up or started talking again. And the pricing for high-volume retrieval? A joke.
I got tired of waiting, so I built my own free solution: Orthanc.ai.
It’s a memory layer designed specifically for speed: no bloat, no external API dependencies for the embeddings. It uses optimized local models to handle storage and retrieval instantly.
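The core idea (local embeddings plus an in-process vector index, so retrieval never leaves the machine) can be sketched roughly like this. This is a toy illustration, not the actual Orthanc implementation: the hash-based `toy_embed` stands in for a real local embedding model, and the `MemoryStore` class name is made up for the example.

```python
# Minimal sketch of a local, in-memory memory layer: no network
# round-trip on the retrieval path, which is where the latency win comes from.
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    # Deterministic pseudo-embedding: hash the text to seed a random vector.
    # A real system would call a small local embedding model here instead.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class MemoryStore:
    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str) -> None:
        # Embed once at write time; reads are then just dot products.
        self.texts.append(text)
        self.vectors.append(toy_embed(text))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Cosine similarity (vectors are unit-norm) against all stored facts.
        q = toy_embed(query)
        sims = np.array([v @ q for v in self.vectors])
        top = np.argsort(-sims)[:k]
        return [self.texts[i] for i in top]

store = MemoryStore()
store.add("User's dog is named Biscuit.")
store.add("User prefers morning calls.")
print(store.retrieve("User's dog is named Biscuit.", k=1)[0])
```

Because everything stays in-process, a lookup is a handful of dot products rather than an HTTP call, which is the difference between microseconds and hundreds of milliseconds in a voice loop.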
I open-sourced the whole thing. Steal the code here: https://github.com/orthanc-protocol/client-sdk.git
If you want the speed but don't want to manage the servers, I’m giving away free credits for the hosted API/SDK. Hit me up and I'll send you a key, or make a quick profile on the website and grab one yourself.