While making an LLM product using OpenAI's API, one of the first questions that arises is:
Why is OpenAI's API so slow?
Depending on the question and the prompt, the response can be quite long, and the longer it is, the longer the API takes to return the answer.
If you ask a question on ChatGPT's official page, the response comes little by little, chunk by chunk, so even if it takes some time to return the full response, it is not that bad, as you can read the answer as it arrives.
This is because ChatGPT's responses are streams of data instead of one big blob, so if the responses in your application seem crazy slow, it is because you are actually waiting for ChatGPT to generate the full response.
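To see the streaming in action: OpenAI's chat completions endpoint accepts a `stream: true` flag, after which it returns server-sent events you can read chunk by chunk. A minimal sketch (the parsing helper assumes OpenAI's current SSE format, and the model name is just an example):

```javascript
// Parse one SSE chunk from OpenAI's streaming API into content deltas.
// Each chunk contains lines like: data: {"choices":[{"delta":{"content":"Hi"}}]}
function parseSseChunk(chunk) {
  const deltas = [];
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") continue; // end-of-stream sentinel
    const content = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (content) deltas.push(content);
  }
  return deltas;
}

// Stream a completion, invoking onText for each piece of text as it arrives.
async function streamCompletion(apiKey, prompt, onText) {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: prompt }],
      stream: true, // ask for chunked server-sent events instead of one blob
    }),
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const text of parseSseChunk(decoder.decode(value))) onText(text);
  }
}
```

With this, the user starts reading the answer after the first chunk instead of staring at a spinner until the whole completion is done.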
Then, how do I stream the response like in OpenAI's official web page?
Typically, you would make the request to OpenAI's API from your backend, since you need to pass your OpenAI API key, which is private to you. But re-streaming the data to the frontend can get really tricky depending on your stack and infra...
Then why not just make the query directly from the mobile/web app? You could just use `fetch` to query OpenAI directly there. The answer is simple: because you do not want to leak your API key to the world.
But what if there was a way of querying OpenAI's API directly from the browser/mobile app without leaking your API key...
Meet Signway, a service that brings the power of pre-signed URLs to your apps.
Signway is a high-performance gateway that proxies pre-signed requests to the specified destination.
It uses the same cryptographic trick that AWS S3 uses with its pre-signed URLs, but it redirects requests to specified destinations instead of giving access to S3 objects.
You can see this in more detail at https://github.com/gabotechs/signway.
The idea is the following:
- You configure Signway with an `id` and a `secret`. The managed version already does this for you, but you could also launch it with Docker and configure it yourself.
- In your backend, instead of querying OpenAI's API, you create a pre-signed URL using Signway's Python SDK or JavaScript SDK. For that, you will need the same `id` and `secret` that Signway is configured with.
- You send the generated pre-signed URL back to the frontend, and you make the request using `fetch` as if you were querying OpenAI's API, but with a different URL (Signway's pre-signed one).
- The request passes through Signway, which validates that it is authentic and has not expired. If everything looks good, you will see OpenAI's response directly in the client app, without any backend in the middle.
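Put together, the steps above might look like the following sketch. Here `createSignedUrl` is a hypothetical stand-in for the actual SDK call (check Signway's Python/JavaScript SDK docs for the real function names and options):

```javascript
// Backend: hand out short-lived signed URLs, never the OpenAI API key.
// `createSignedUrl` is injected here as a placeholder for the SDK function.
function makeSignedUrlForClient(createSignedUrl) {
  return createSignedUrl({
    host: "https://api.openai.com", // where Signway will proxy the request
    path: "/v1/chat/completions",
    method: "POST",
    expiresInSeconds: 60, // a short expiry limits abuse of a leaked URL
  });
}

// Frontend: the same fetch call you would make to OpenAI, just a different URL.
async function askFromBrowser(signedUrl, prompt) {
  const response = await fetch(signedUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  return response; // read response.body exactly as with a direct OpenAI call
}
```

Note that the streaming code on the frontend does not change at all: from the browser's point of view it is still talking to a chat completions endpoint, only at a different URL.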
What's next?
- Check out this post if you want to learn how to use Signway for streaming ChatGPT responses in the browser.
- There is an example here where you can hack around with Signway yourself, using `python` to generate pre-signed URLs, `curl` to make HTTP requests to OpenAI, and `docker` to launch Signway on your local machine.
- Check the GitHub repo, as it has a more detailed explanation of how this works.
- If you want a fully managed solution, check Signway's website.