DEV Community

Choonho Son

Llama 2 in Apple Silicon Macbook (2/3)

To work with Llama 2 easily on a MacBook, it is highly recommended to convert the model to a quantized format.

The llama.cpp repository is a C/C++ port of Llama inference.

Download llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make

Convert model to GGML format

cd llama.cpp
python3 -m venv llama2
source llama2/bin/activate
python3 -m pip install -r requirements.txt

The conversion process consists of two steps.

  1. Convert the model to f16 format
  2. Convert (quantize) the f16 model to GGML

Convert to f16 format

mkdir -p models/7B
python3 convert.py --outfile models/7B/ggml-model-f16.bin \
--outtype f16 \
../llama2/llama/llama-2-7b-chat \
--vocab-dir ../llama2/llama

Before running the conversion, create the output directory (e.g. models/7B).

--outfile specifies the output file name
--outtype specifies the output type, which is f16 here
--vocab-dir specifies the directory containing the tokenizer.model file
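
The conversion call can also be assembled from a script. The following is a minimal sketch, not part of llama.cpp: the helper name `build_convert_command` is hypothetical, and it only assumes the `convert.py` options described above. It checks that tokenizer.model is present and creates the output directory before returning the argument list.

```python
from pathlib import Path


def build_convert_command(model_dir, vocab_dir, outfile, outtype="f16"):
    """Return the argument list for the convert.py call shown above.

    Hypothetical helper for illustration; it does not run the conversion.
    """
    model_dir, vocab_dir, outfile = Path(model_dir), Path(vocab_dir), Path(outfile)

    # convert.py needs tokenizer.model, so fail early with a clear message.
    tokenizer = vocab_dir / "tokenizer.model"
    if not tokenizer.is_file():
        raise FileNotFoundError(f"tokenizer.model not found in {vocab_dir}")

    # Equivalent of `mkdir -p models/7B`.
    outfile.parent.mkdir(parents=True, exist_ok=True)

    return [
        "python3", "convert.py",
        "--outfile", str(outfile),
        "--outtype", outtype,
        str(model_dir),
        "--vocab-dir", str(vocab_dir),
    ]
```

From inside the llama.cpp directory, the returned list could be passed to `subprocess.run(cmd, check=True)` to start the conversion.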

If you have trouble finding the tokenizer.model file, see tokenizer.model

Convert f16 model to GGML

This step is called quantizing the model.

./quantize ./models/7B/ggml-model-f16.bin \
./models/7B/ggml-model-q4_0.bin q4_0
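
To give an idea of what a 4-bit quantization like q4_0 does, here is a simplified sketch in plain Python. It is an illustration under assumptions, not llama.cpp's code: the function names are hypothetical, and the real implementation is written in C, stores the scale as a 16-bit float, and packs two 4-bit codes into each byte. The idea is that each block of 32 weights shares one scale, and every weight is replaced by a small integer between 0 and 15.

```python
BLOCK_SIZE = 32  # assumption: q4_0 groups weights into blocks of 32


def quantize_block(values):
    """Map one block of floats to 4-bit codes (0..15) plus one shared scale."""
    # Take the value with the largest magnitude, keeping its sign.
    extreme = max(values, key=abs)
    scale = extreme / -8.0
    if scale == 0.0:
        # All-zero block: every weight maps to the middle code.
        return 0.0, [8] * len(values)
    # Shift into the 0..15 range, round, and clamp.
    codes = [min(15, max(0, int(v / scale + 8.5))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the 4-bit codes."""
    return [(c - 8) * scale for c in codes]


# Round-trip one example block and report the largest error.
weights = [i / 10 - 1.6 for i in range(BLOCK_SIZE)]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
print("max error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

The restored values are only approximations of the originals, which is the price paid for the smaller file.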

After quantization, the file is much smaller (13G for f16 versus 3.6G for q4_0):

mzc01-choonhoson@MZC01-CHOONHOSON 7B % ls -alh
total 33831448
drwxr-xr-x@ 4 mzc01-choonhoson  staff   128B  9 12 17:23 .
drwxr-xr-x@ 5 mzc01-choonhoson  staff   160B  9 12 16:50 ..
-rw-r--r--@ 1 mzc01-choonhoson  staff    13G  9 12 17:23 ggml-model-f16.bin
-rw-r--r--@ 1 mzc01-choonhoson  staff   3.6G  9 12 17:23 ggml-model-q4_0.bin
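
The sizes in the listing can be roughly reproduced with some arithmetic. The sketch below rests on two assumptions that are not stated in the article: the 7B model has roughly 6.7 billion weights, and q4_0 stores each block of 32 weights in 18 bytes (16 bytes of 4-bit codes plus a 2-byte scale), which works out to 4.5 bits per weight.

```python
def estimated_size_gib(n_params, bits_per_weight):
    """Rough file size in GiB for a model with n_params weights."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: roughly 6.7 billion weights for the 7B model.
N_PARAMS = 6.7e9

# f16 stores each weight in 16 bits.
f16_gib = estimated_size_gib(N_PARAMS, 16)

# Assumption: 18 bytes per block of 32 weights, i.e. 4.5 bits per weight.
q4_0_gib = estimated_size_gib(N_PARAMS, 18 * 8 / 32)

print(f"f16:  ~{f16_gib:.1f} GiB")
print(f"q4_0: ~{q4_0_gib:.1f} GiB")
print(f"ratio: ~{f16_gib / q4_0_gib:.2f}x")
```

The estimates land close to the 13G and 3.6G shown by ls. Small differences are expected because file metadata is ignored and the parameter count is approximate.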

Example

All done. Run the example binary:

./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/chat-with-bob.txt
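
For reference, the same command can be written as an annotated argument list. This is a minimal sketch with a hypothetical helper name, `build_main_command`. The comments reflect my reading of the flags in the llama.cpp version used here, so check `./main --help` for your build.

```python
def build_main_command(model_path, prompt_file, n_predict=1024):
    """Return the argument list for the ./main call shown above.

    Hypothetical helper for illustration; it does not start the program.
    """
    return [
        "./main",
        "-m", str(model_path),      # quantized model file to load
        "-n", str(n_predict),       # maximum number of tokens to generate
        "--repeat_penalty", "1.0",  # 1.0 means repeated tokens are not penalized
        "--color",                  # colorize output to separate input from generated text
        "-i",                       # interactive mode
        "-r", "User:",              # reverse prompt: return control when this text appears
        "-f", str(prompt_file),     # file holding the initial prompt
    ]


cmd = build_main_command(
    "./models/7B/ggml-model-q4_0.bin", "./prompts/chat-with-bob.txt"
)
print(" ".join(cmd))
```

From inside the llama.cpp directory, the list could be handed to `subprocess.run(cmd)` to start the interactive chat.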

References

GGML - Large Language Models for Everyone
https://github.com/rustformers/llm/blob/main/crates/ggml/README.md

Series

Llama 2 in Apple Silicon Macbook (1/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-13-54h

Llama 2 in Apple Silicon Macbook (2/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-23-2j51

Llama 2 in Apple Silicon Macbook (3/3)
https://dev.to/choonho/llama-2-in-apple-silicon-macbook-33-3hb7
