Hi everyone, I started working on this project a year ago, building new applications with GPT models on GPU clouds. I found that these applications typically run on GPU clouds in industrial environments, where the cost of an LLM request can be ten times higher than that of a traditional query.
Is there a method to improve the scheduling of LLM requests?