How often do you see this error message when working with LLMs?
openai.error.RateLimitError: Rate limit reached for default-gpt-3.5-turbo in organization
If you work with LLMs, you have probably already experienced the problem of processing multiple messages in a reasonable time. A large LLM can take up to 20 seconds to generate a response. Obviously, this means that processing messages sequentially, waiting 20 seconds after every request before sending the next one, is not a good option.
Thus many data scientists have switched to multiprocessing to speed things up. Multiprocessing limits your requests to the number of threads on your local machine. In other words, say 4 local threads are waiting for responses from the server API while the server processes those four requests. But does this mean you are using all the compute power of the server? In reality, your 4 API requests would never account for even 0.1% of the server load.
A more advanced developer will say that the correct solution is to use async requests. In this case the server processes all received requests in parallel. Simply speaking, if there were no server-side limits, you could send all your requests in one go and get the results back in about 20 seconds, because the server would process all of them in parallel.
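For illustration, here is a minimal sketch of that "send everything at once" approach using the pre-1.0 `openai` package (the one that raises the `openai.error.*` exceptions shown above); the model name and prompts are just placeholders:

```python
import asyncio
import openai  # openai<1.0, which provides ChatCompletion.acreate

async def ask(prompt: str) -> str:
    # acreate is the async variant of ChatCompletion.create
    response = await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

async def ask_all(prompts: list[str]) -> list[str]:
    # All requests are fired at once; the server works on them in parallel
    return await asyncio.gather(*(ask(p) for p in prompts))

# answers = asyncio.run(ask_all(["prompt 1", "prompt 2", "prompt 3"]))
```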
However, APIs tend to return errors, so it is quite expected that many of your requests will fail with a RateLimitError or a generic APIError. This means that a correct async implementation should use a reasonable batch size that won't trigger rate-limit exceptions from the server, and on top of that it should restart all failed async tasks.
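As a sketch of the "restart failed tasks" part, reusing the `ask` helper from the snippet above (the retry count and back-off delay are arbitrary choices, not tuned values):

```python
import asyncio

async def ask_with_retries(prompts, max_rounds: int = 5):
    # results[i] stays None until prompt i succeeds
    results = [None] * len(prompts)
    pending = list(range(len(prompts)))

    for _ in range(max_rounds):
        if not pending:
            break
        # return_exceptions=True keeps one failed request from cancelling the rest
        outcomes = await asyncio.gather(
            *(ask(prompts[i]) for i in pending), return_exceptions=True
        )
        still_failing = []
        for i, outcome in zip(pending, outcomes):
            if isinstance(outcome, Exception):
                still_failing.append(i)  # e.g. RateLimitError, APIError
            else:
                results[i] = outcome
        pending = still_failing
        if pending:
            await asyncio.sleep(10)  # back off before restarting failed tasks
    return results
```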
I have created a small code snippet that will help you start using such an async implementation of API requests to ChatGPT in your projects. Please check my GitHub repo for examples.
GitHub: async_chatgpt
For example, here is how easy it is to run the code directly from a Jupyter notebook:
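I can't reproduce the repo's exact interface here, but a notebook cell would look roughly like this (`process_chats` is a placeholder name for illustration, not necessarily the repo's actual function; check the repo for the real entry point):

```python
# In a Jupyter cell you can await coroutines directly,
# because the notebook already runs an asyncio event loop.
chats = [
    [{"role": "user", "content": "Summarise this review: ..."}],
    [{"role": "user", "content": "Summarise this review: ..."}],
]
results = await process_chats(chats, model="gpt-3.5-turbo")
```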
To make your code more robust, my recommendation would be to break your chats into reasonable batches (for example, send 50 chats at a time) and store the API responses of each batch on disk before sending the next batch in async mode.
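A sketch of that batching pattern, reusing `ask_with_retries` from the snippet above (the batch size and output paths are just examples):

```python
import json
import os

async def process_in_batches(prompts, batch_size: int = 50, out_dir: str = "responses"):
    # Send `batch_size` prompts at a time and persist each batch before moving on,
    # so a crash or a rate-limit storm never costs more than one batch of work.
    os.makedirs(out_dir, exist_ok=True)
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        answers = await ask_with_retries(batch)
        with open(os.path.join(out_dir, f"batch_{start // batch_size:04d}.json"), "w") as f:
            json.dump(answers, f)
```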
While a batch is being processed, you will see logs with information on how many tasks were completed, how many will be restarted, and what exceptions occurred during processing.
I hope this helps you automate some of your analysis.
If you like this post, don't hesitate to:
☑ add a reaction ♡ 🌺 🍒 ✨
☑ post a comment 📣 🔊
☑ give a star to the github repo ★ ☆ ★