Originally published on https://www.ankitbabber.com
I have a Mac with Intel silicon. I also have an eGPU with an AMD 6900 XT (...alright!). BUT I COULDN'T HARNESS THAT POWER AND RUN AN LLM LOCALLY WITH OLLAMA!!! If you have a Mac with Intel silicon, then you know that the CPU and integrated GPU are insufficient for running an LLM locally, and I didn't want to buy new hardware just to play around with LLMs. Then I found llama.cpp, an amazing repo that helps democratize the use of LLMs on local machines. However, the install instructions on the llama.cpp repo did not cover the issues I came across, which is why I was motivated to write up a post to help others out. If you have older hardware that isn't supported by the current tools for running an LLM locally (specifically a Mac with Intel silicon and an AMD eGPU), then this post is for you! As a side note, even if your hardware isn't a 1:1 match for mine but the Vulkan backend for llama.cpp is the right fit for you, this post may still be useful for setting up that backend.
0. Dependencies and Setup
- If you have a Mac with Intel silicon, then you need to use the Vulkan backend when setting up llama.cpp, because llama.cpp only supports the Metal API on Macs with Apple silicon and AMD only provides HIP support for Linux.
- Make sure you haven't installed MoltenVK or other Vulkan SDK components piecemeal via a package manager like brew, because that can interfere with the Vulkan SDK install.
- Download and install the Vulkan SDK.
- Install cmake and libomp with your favorite package manager.
- Make sure you verify the sha256 hash to ensure the file you downloaded is correct.
# verify the sha256 hash on your local machine, I'll do it on my Mac with the below command
openssl dgst -sha256 ./path/to/download/vulkansdk-macos-1.4.304.0.zip
# compare the output against the sha256 checksum published on the Vulkan SDK download page
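# optionally, let the shell do the comparison for you:
# (replace <expected-sha256> with the checksum copied from the download page)
# echo "<expected-sha256>  ./path/to/download/vulkansdk-macos-1.4.304.0.zip" | shasum -a 256 -c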
# you will also need to install cmake and libomp
brew install cmake libomp
brew doctor --verbose
# if there is no output, then proceed
# if there are files listed here, copy them and save them in a separate txt file
# get the `llama.cpp` repo from github
git clone https://github.com/ggerganov/llama.cpp.git
# or with the github cli get the `llama.cpp` repo from github
# gh repo clone ggerganov/llama.cpp
# ensure you are in the llama.cpp directory for the next steps
cd llama.cpp
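- Before building, it's worth confirming that the Vulkan install can actually see your eGPU. The SDK ships the vulkaninfo tool; assuming the SDK's binaries are on your PATH after the install, a summary listing should include the AMD card:
vulkaninfo --summary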
1. Build llama.cpp Using cmake
- When I tried to build llama.cpp with cmake for the first time, following the instructions on the llama.cpp repo for Vulkan, I got the following error after running cmake -B build -DGGML_VULKAN=ON:
-- Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)
-- Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES)
-- Could NOT find OpenMP (missing: OpenMP_C_FOUND OpenMP_CXX_FOUND)
CMake Warning at ggml/src/ggml-cpu/CMakeLists.txt:53 (message):
OpenMP not found
- On a Mac, the Xcode Command Line Tools (xcode-select) provide Clang, which can compile OpenMP code.
- My version of cmake could not find a usable OpenMP installation with that Clang, so to make my life easier I just installed OpenMP via brew install libomp.
- After this, I needed to link libomp when running cmake -B build -DGGML_VULKAN=ON from the instructions on the llama.cpp repo for Vulkan. Since I installed libomp with brew, the path to libomp below reflects that. I also explicitly ensured the Metal backend was off.
cmake -B build -DGGML_METAL=OFF -DGGML_VULKAN=ON \
-DOpenMP_C_FLAGS=-fopenmp=lomp \
-DOpenMP_CXX_FLAGS=-fopenmp=lomp \
-DOpenMP_C_LIB_NAMES="libomp" \
-DOpenMP_CXX_LIB_NAMES="libomp" \
-DOpenMP_libomp_LIBRARY="$(brew --prefix)/opt/libomp/lib/libomp.dylib" \
-DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include" \
-DOpenMP_CXX_LIB_NAMES="libomp" \
-DOpenMP_C_FLAGS="-Xpreprocessor -fopenmp $(brew --prefix)/opt/libomp/lib/libomp.dylib -I$(brew --prefix)/opt/libomp/include"
- I didn't run into any other errors after running the previous command, but make sure to read through the output to ensure everything is fine on your system.
- To complete the build process, run cmake --build build --config Release.
- You will see many objects being built so that llama.cpp can run, along with the following warnings as of version b4686 of llama.cpp:
ggml_vulkan: Generating and compiling shaders to SPIR-V
[ 6%] Building CXX object ggml/src/ggml-vulkan/CMakeFiles/ggml-vulkan.dir/ggml-vulkan.cpp.o
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:1382:2: warning: extra ';' outside of a function is incompatible with C++98 [-Wc++98-compat-extra-semi]
1382 | };
| ^
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:7048:16: warning: 'return' will never be executed [-Wunreachable-code-return]
7048 | return false;
| ^~~~~
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8208:15: warning: 'break' will never be executed [-Wunreachable-code-break]
8208 | } break;
| ^~~~~
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8167:15: warning: 'break' will never be executed [-Wunreachable-code-break]
8167 | } break;
| ^~~~~
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8088:15: warning: 'break' will never be executed [-Wunreachable-code-break]
8088 | } break;
| ^~~~~
/Users/ankit/Playground/llama.cpp/ggml/src/ggml-vulkan/ggml-vulkan.cpp:8035:13: warning: 'break' will never be executed [-Wunreachable-code-break]
8035 | break;
| ^~~~~
6 warnings generated.
[ 6%] Building CXX object ggml/src/ggml-vulkan/CMakeFiles/ggml-vulkan.dir/ggml-vulkan-shaders.cpp.o
[ 7%] Linking CXX shared library ../../../bin/libggml-vulkan.dylib
[ 7%] Built target ggml-vulkan
[ 8%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o
cc: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
In file included from /Users/ankit/Playground/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:40:
/usr/local/opt/libomp/include/omp.h:54:9: warning: ISO C restricts enumerator values to range of 'int' (2147483648 is too large) [-Wpedantic]
54 | omp_sched_monotonic = 0x80000000
| ^ ~~~~~~~~~~
/usr/local/opt/libomp/include/omp.h:411:7: warning: ISO C restricts enumerator values to range of 'int' (18446744073709551615 is too large) [-Wpedantic]
411 | KMP_ALLOCATOR_MAX_HANDLE = UINTPTR_MAX
| ^ ~~~~~~~~~~~
/usr/local/opt/libomp/include/omp.h:427:7: warning: ISO C restricts enumerator values to range of 'int' (18446744073709551615 is too large) [-Wpedantic]
427 | KMP_MEMSPACE_MAX_HANDLE = UINTPTR_MAX
| ^ ~~~~~~~~~~~
/usr/local/opt/libomp/include/omp.h:471:39: warning: ISO C restricts enumerator values to range of 'int' (18446744073709551615 is too large) [-Wpedantic]
471 | typedef enum omp_event_handle_t { KMP_EVENT_MAX_HANDLE = UINTPTR_MAX } omp_event_handle_t;
| ^ ~~~~~~~~~~~
4 warnings generated.
[ 8%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 9%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o
cc: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 10%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 10%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 11%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 11%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o
c++: warning: /usr/local/opt/libomp/lib/libomp.dylib: 'linker' input unused [-Wunused-command-line-argument]
[ 12%] Linking CXX shared library ../../bin/libggml-cpu.dylib
[ 12%] Built target ggml-cpu
- I have not run into any issues from these warnings yet, but I will update this post if I do.
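- If you want a quick sanity check that the build produced a working binary before downloading any models, printing the version info should do it (this uses llama-cli's standard --version flag; the exact output will vary with your build):
./build/bin/llama-cli --version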
2. Getting Models for llama.cpp
- You will need to download models to run with llama.cpp.
- Hugging Face is an excellent source for models, but make sure you get quantized models. Quantized models store their weights at a lower numerical precision (fewer bits per weight), so they fit on hardware that doesn't have enough RAM to run the full-precision model; the trade-off is a small loss in output quality.
- I will create a future post about quantizing models, but for now we will use a pre-quantized model for the purposes of testing our build.
- cd ../ or go one level up, outside of the llama.cpp directory, and mkdir llm-models. We will store all of our models outside of the llama.cpp repo (the commands are sketched below).
- Go to the newly created directory: cd llm-models.
- I downloaded the 8B-parameter Meta Llama 3.1 Instruct model from ggml's Hugging Face page, quantized with the Q4_0 method.
- Ensure you download the model into the llm-models/ directory. Once the download is complete, we are ready to run the model locally.
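- A minimal sketch of those steps is below; the download line is only an illustration, so replace the URL placeholder with the actual link from the model's Hugging Face page.
# starting from inside the llama.cpp directory
cd ..
mkdir llm-models
cd llm-models
# download the quantized .gguf into this directory, e.g. with curl
# (replace <model-url> with the download link from the Hugging Face model page)
# curl -L -o meta-llama-3.1-8b-instruct-q4_0.gguf "<model-url>"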
3. Running llama.cpp
# from the llm-models directory, go back into the llama.cpp checkout
cd ../llama.cpp
# start interactive mode
./build/bin/llama-cli -m ../llm-models/meta-llama-3.1-8b-instruct-q4_0.gguf
- If everything went well, llama.cpp will see the AMD GPU:
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6900 XT (MoltenVK) | uma: 0 | fp16: 1 | warp size: 64 | shared memory: 65536 | matrix cores: none
- Test out interactive mode and have fun running an LLM locally!
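- If you want a rough sense of how the eGPU performs with this model, llama.cpp also builds a llama-bench tool you can point at the same file (the path assumes the llm-models layout from the previous section):
./build/bin/llama-bench -m ../llm-models/meta-llama-3.1-8b-instruct-q4_0.gguf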
4. Last Minute Cleanup
- On my Mac, the Vulkan SDK created a bunch of dylib, static lib, pc, and header files in /usr/local/lib.
- When you run brew doctor --verbose, brew will give you a bunch of warnings that it found unbrewed files.
- You can choose to ignore this warning. However, if it bothers you like it bothered me, you'll want to do something about it.
- WARNING: The next steps involve altering brew locally, and this is a temporary fix. If you are not comfortable working with bash functions or altering ruby code, do not proceed.
- Go to /usr/local/Homebrew/Library/Homebrew. There you will find diagnostic.rb.
- Open this file and add the corresponding files to the allow_list array in the functions def check_for_stray_dylibs, def check_for_stray_static_libs, def check_for_stray_pcs, and def check_for_stray_headers. You can run brew doctor --verbose to get the list of files again. A sketch of what this edit looks like is shown below.
- Make sure not to include the files that brew doctor --verbose reported before the Vulkan SDK install, which you saved to a separate txt file. If there were unbrewed files before installing the Vulkan SDK, then they must be addressed separately and are outside the scope of this post.
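- The sketch below shows the general shape of the edit for check_for_stray_dylibs; the surrounding code in diagnostic.rb differs between Homebrew versions, and the file names here are only examples, so use the exact names from your own brew doctor --verbose output and repeat the same idea in the other three functions.
# /usr/local/Homebrew/Library/Homebrew/diagnostic.rb (illustrative excerpt)
def check_for_stray_dylibs
  # ... Homebrew's existing code ...
  allow_list = [
    # ... entries Homebrew already ships ...
    "libvulkan.1.dylib",   # example: a dylib the Vulkan SDK dropped into /usr/local/lib
    "libMoltenVK.dylib",   # example: replace with the names brew doctor reported for you
  ]
  # ... Homebrew's existing code ...
end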
diagnostic.rb, save the file. - When you
cd /usr/local/Homebrewand rungit status, you will see that thediagnostic.rbis changed. We cannot commit these changes. - If you run
brew doctor --verbosethe files added to/usr/local/libby theVulkanSDK are no longer there. - These changes are not permanent. To ensure I don't have to edit
diagnostic.rbeach time I upgradebrew. I wrote twobashfunctions and saved the list ofdylibs,static libs,pcs, andheadersto aJSONfile. - Add the following functions to your
.bashrc,.zshrcor/customforoh-my-zsh
function allow-stash()
{
    # stash the local edits to diagnostic.rb before touching brew
    CURR_DIR=$(pwd)
    cd /usr/local/Homebrew || return
    git stash
    cd "$CURR_DIR"
}
function allow-stash-apply()
{
    # re-apply the stashed diagnostic.rb edits
    CURR_DIR=$(pwd)
    cd /usr/local/Homebrew || return
    git stash apply
    cd "$CURR_DIR"
}
- allow-stash saves the changes made to diagnostic.rb with git stash. We're simply saving the additions made to the allow_list in the functions listed above.
- Then I run brew update && brew upgrade.
- Then I run allow-stash-apply to re-apply my changes to diagnostic.rb.
- You can create a third function to combine all these steps into a single command:
function update-brew()
{
    # stash the diagnostic.rb edits, update and upgrade brew, then re-apply the edits
    allow-stash && brew update && brew upgrade && allow-stash-apply
}
- bash chains these commands with &&, running them left to right and stopping if any step fails, so the order matters.
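- With these functions sourced in your shell, keeping brew up to date without losing the diagnostic.rb edits is then a single command:
update-brew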
Comments
Followed the instructions and got it built and running, but the output from the LLM is all @@@@@@@@@. Did you come across this problem? I have a 6800 XT in an eGPU on an Intel Mac like you.
I am running into the exact same problem. I have been able to successfully compile llama.cpp with Vulkan using these instructions, but the output is gibberish when offloading layers to the GPU (on both my MacBook Pro and Hackintosh).
I do notice @ababber didn't add -ngl at the end of the llama-cli command in the instructions. Running like that, I noticed no layers are actually offloaded to the GPU, so the model only runs on the CPU, and it looks like it is running successfully (it gives normal output), but of course the GPU is not actually utilised then.
So to make sure the layers are properly offloaded to the GPU, the command should look something like:
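# the -ngl value below is just an example; it sets how many layers to offload to the GPU
./build/bin/llama-cli -m ../llm-models/meta-llama-3.1-8b-instruct-q4_0.gguf -ngl 33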
But when running like this, the output is just gibberish.
Anyone got different results?