We wanted to add some cool AI chat features to our product. Getting a demo working from the providers' code examples was easy; scaling it reliably and securely was a lot harder.
Even at moderate scale, keeping it reliable and available was a major headache. Provider outages caused our product to fail at the worst times (like product demos).
What does it take to get our AI features production-ready?
We realized we needed multiple LLM providers so that we could fail over gracefully to another when, not if, one of them had an outage.
Different providers also had different rate limits, so we added the ability to retry a request against a different provider whenever we hit one.
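Here's a minimal sketch of both ideas: a failover chain that rotates to the next provider on an outage or a rate limit. The `Provider` interface, `RateLimitError`, and `completeWithFailover` are hypothetical stand-ins for illustration, not the llmasaservice.io API.

```typescript
// Hypothetical provider abstraction: each client SDK is wrapped to
// expose a single complete() call that throws RateLimitError on
// HTTP 429 and any other Error on an outage.
interface Provider {
  name: string;
  complete(prompt: string): Promise<string>;
}

class RateLimitError extends Error {}

// Try each provider in order, rotating on failure.
async function completeWithFailover(
  providers: Provider[],
  prompt: string,
): Promise<string> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider.complete(prompt);
    } catch (err) {
      lastError = err;
      if (err instanceof RateLimitError) {
        // Rate-limited: rotate to the next provider instead of
        // waiting out this provider's rate-limit window.
        console.warn(`${provider.name} rate-limited, rotating`);
      } else {
        // Outage or other failure: same remedy, try the next one.
        console.warn(`${provider.name} failed, rotating`, err);
      }
    }
  }
  // Every provider failed; surface the last error to the caller.
  throw lastError;
}
```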
And let’s not forget EU customers. Without data-tenancy settings to route their AI chat requests to LLM providers in the EU, they wouldn’t be able to use our software.
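Data-tenancy routing can sit right on top of the same failover chain: each region gets its own provider pool, and EU requests only ever touch the EU pool. Again, a sketch with assumed names, reusing `Provider` and `completeWithFailover` from above.

```typescript
// Region names and pools are illustrative; populate each pool with
// providers whose endpoints are hosted in that region.
type Region = "eu" | "us";

const providerPools: Record<Region, Provider[]> = {
  eu: [], // providers with EU-hosted endpoints only
  us: [], // default providers
};

async function completeForTenant(
  tenantRegion: Region,
  prompt: string,
): Promise<string> {
  // An EU tenant's request never leaves the EU pool, even on failover.
  return completeWithFailover(providerPools[tenantRegion], prompt);
}
```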
We added response caching, a developer control panel, customer token allowances, secure API key storage, load shedding, and PII data redaction, too.
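As a taste of one of those, here's roughly what response caching looks like: hash the model and prompt, and serve repeats from memory for a few minutes instead of spending provider tokens. The key scheme and TTL are illustrative assumptions; a real deployment would likely use a shared store such as Redis.

```typescript
import { createHash } from "node:crypto";

// In-memory cache of completions, keyed by model + prompt.
const cache = new Map<string, { value: string; expiresAt: number }>();
const TTL_MS = 5 * 60 * 1000; // 5 minutes, purely illustrative

function cacheKey(model: string, prompt: string): string {
  return createHash("sha256").update(`${model}\n${prompt}`).digest("hex");
}

async function cachedComplete(
  model: string,
  prompt: string,
  complete: (prompt: string) => Promise<string>,
): Promise<string> {
  const key = cacheKey(model, prompt);
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value; // cache hit

  // Cache miss: call the provider and remember the answer.
  const value = await complete(prompt);
  cache.set(key, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}
```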
And now we’ve packaged up everything we’ve learned for you to use in your applications.
Head on over to https://llmasaservice.io/ and check it out. We're looking for application developers to pilot it. Get in touch, and let's get your AI features into production!