Discussion on: How to run Llama 2 on anything

Bat Orgil Batjargal

Thank you for the post!

As of Sep 2023, do you have any recommendations for running the model locally on a MacBook with an Intel CPU and Intel graphics card?

Downloading the model with

wget https://huggingface.co/substratusai/Llama-2-13B-chat-GGUF/resolve/main/model.bin -O model.q4_k_s.gguf

has worked for me.

However, when I run

./main -t 4 -m ./models/model.q4_k_s.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"

it fails to load the model:

ggml_metal_init: load pipeline error: Error Domain=CompilerError Code=2 "AIR builtin function was called but no definition was found." UserInfo={NSLocalizedDescription=AIR builtin function was called but no definition was found.}
llama_new_context_with_model: ggml_metal_init() failed
llama_init_from_gpt_params: error: failed to create context with model './models/model.q4_k_s.gguf'
main: error: unable to load model

Thank you!

Chandler • Edited

I would start by making sure your llama.cpp source is up to date, doing a clean recompile, and double-checking the path to your model file.
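As a minimal sketch, assuming a Makefile-based checkout of llama.cpp from around that time, that would look something like:

git pull
make clean
make

If Metal support is being compiled in by default on your build (the ggml_metal_init in your log suggests it is), then LLAMA_NO_METAL=1 make may produce a CPU-only binary on versions from that period; the exact flag name varies across llama.cpp releases, so check the Makefile of your checkout.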

I don't know much about getting Llama.cpp working on Intel Macs, but I'd first try running it with Metal disabled (set -ngl to 0, as shown below) and see if that works. The only reports I can find about Llama.cpp on Intel Macs are a few issues saying it doesn't work, where the reporters were trying to get it running with another GPU in their system (both were iMacs with additional GPUs installed). If you can't get it working with the above advice, I'd advise opening an issue on the Llama.cpp GitHub.
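For example, your command from above with GPU offloading disabled; the only change is the added -ngl 0 flag (short for --n-gpu-layers), which keeps all layers on the CPU:

./main -t 4 -ngl 0 -m ./models/model.q4_k_s.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"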