Sending a prompt to Ollama through the terminal returns a response in about 3 seconds, but doing the same thing programmatically from Python takes about 40 seconds, and I am keen to close that gap. The cause seems to be that the model is reloaded on every call, even though it is already running.
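For reference, here is a minimal sketch of the kind of programmatic call I mean. It assumes the Ollama server is listening on its default local address (`http://localhost:11434`) and uses `llama3` purely as a placeholder model name; the helper names `build_payload` and `ask` are made up for this example. The request includes the `keep_alive` field of the `/api/generate` endpoint, which asks the server to keep the model in memory after the request finishes. I am not sure this is the cause of the delay, but it looks like the relevant setting.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="llama3", keep_alive="30m"):
    """Build the JSON body for a non-streaming generate request.

    `model` is a placeholder name; `keep_alive` asks the server to keep
    the model loaded for that long after the request completes.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,           # return one JSON object instead of a stream
        "keep_alive": keep_alive,  # e.g. "30m", or -1 to keep it loaded indefinitely
    }


def ask(prompt, model="llama3", keep_alive="30m", timeout=120):
    """Send the prompt to the local Ollama server and return the response text."""
    body = json.dumps(build_payload(prompt, model, keep_alive)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

Calling `ask("Why is the sky blue?")` twice in a row and timing each call should show whether only the first call pays the loading cost. If both calls are slow, the reload is probably triggered by something else, for example each request using different model options.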
Can someone help me in this regard?