- Use LLMFit to select a model from the list.
- Download the selected model. It will be saved to `.cache/llmfit/models/Qwen2.5-Coder-7B-Instruct-Q8_0.gguf`.
- Get `llama.cpp` from its releases tab.
- Run the local chat-interface server with this command:
  `llama-server -m .cache/llmfit/models/Qwen2.5-Coder-7B-Instruct-Q8_0.gguf -ngl -1`
- `-ngl` sets how many model layers are offloaded to the GPU; `-1` means auto.
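Once the server is running, it exposes an OpenAI-compatible API. Here is a minimal sketch of querying it from Python, assuming llama-server's default address of `http://localhost:8080` (change the URL if you passed `--port`); the prompt text and `temperature` value are just illustrative choices:

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port 8080.
URL = "http://localhost:8080/v1/chat/completions"

# OpenAI-style chat payload; llama-server serves whatever model was
# loaded with -m, so the "model" field is informational here.
payload = {
    "model": "Qwen2.5-Coder-7B-Instruct-Q8_0",
    "messages": [
        {"role": "user", "content": "Write a Python one-liner that reverses a string."}
    ],
    "temperature": 0.2,
}

def ask(url: str = URL) -> str:
    """Send the chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask())
```

Because the endpoint follows the OpenAI chat-completions shape, you can also point the official `openai` client (or any compatible tool) at the same base URL instead of hand-rolling requests.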