MrDoe

Hosting Your Own AI Chatbot on Android Devices

Are you tired of handing over your personal data to big tech companies every time you interact with an AI assistant? Well, I've got good news - there's a way to run powerful language models right on your Android smartphone or tablet, and it all starts with llama.cpp.

In this in-depth tutorial, I'll walk you through the process of setting up llama.cpp on your Android device, so you can experience the freedom and customizability of local AI processing. No more relying on distant servers or worrying about your data being compromised. It's time to take back control and unlock the full potential of modern machine learning technology.

The Advantages of Running a Large Language Model (LLM) Locally

Before we dive into the technical details, let's look at why you'd want to run AI models locally on an Android device in the first place.

Firstly, it gives you complete control over your data. When you engage with a cloud-based AI assistant, your conversations, queries, and even personal information are sent to remote servers, where you have little to no visibility or control over how they're used, or whether they're sold to third-party companies.

With llama.cpp, everything happens right on your device. Your interactions with the AI never leave your smartphone or tablet, ensuring your privacy remains intact. Plus, you can even use these local AI models in places where you don't have an internet connection or aren't allowed to access cloud-based AI services, like some workplaces.

But the benefits don't stop there. By running a local AI, you also have the power to customize it. Instead of being limited to the pre-built models offered by big tech companies, you can hand-pick AI models tailored to your specific needs and interests. Or, if you own the right hardware and have experience with AI models, you can even fine-tune a model yourself to create a truly personalized AI experience.

Getting Started with llama.cpp on Android

Alright, let's dive into setting up llama.cpp on your Android device.

Prerequisites

Before we begin, make sure your Android device meets the following requirements:

  • Android 8.0 or later
  • At least 6-8GB of RAM for optimal performance
  • A modern Snapdragon or MediaTek CPU with at least 4 cores (see the quick check after this list)
  • Enough storage space for the application and language model files (typically 1-8GB)
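
If you're unsure about your device's core count or available RAM, you can check both from Termux once it's installed in Step 1 (the free command may require pkg install procps first):

nproc
free -h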

Step 1: Install F-Droid and Termux

First, you'll need to install the F-Droid app repository on your Android device. F-Droid is a great source for open-source software, and it's where we'll be getting the Termux terminal emulator.

Head over to the F-Droid website and follow the instructions to install the app. Once that's done, open F-Droid, search for Termux, and install the latest version.
Please don't use Google Play Store to install Termux, as the version there is very outdated.

Setup Termux Repositories (optional)

If you switch the Termux repository to a mirror in your country, you can get faster download speeds when installing packages:

termux-change-repo

If you need help, check the Termux Wiki site.

Step 2: Set up the llama.cpp Environment

With Termux installed, it's time to get the llama.cpp project up and running. Start by opening the Termux app and installing the following packages, which we'll need later for compiling llama.cpp:

pkg install clang wget git cmake

Now clone the llama.cpp git repository to your phone:

git clone https://github.com/ggerganov/llama.cpp.git

Next, we need to set up the Android NDK (Native Development Kit) to compile the llama.cpp project. Visit the Termux-NDK repository and download the latest NDK release. Extract the ZIP file, then set the NDK path in Termux:

unzip [NDK_ZIP_FILE].zip
export NDK=~/[EXTRACTED_NDK_PATH]
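
To confirm the path is set correctly, check that the CMake toolchain file used in the next step actually exists under $NDK:

ls $NDK/build/cmake/android.toolchain.cmake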

Step 3.1: Compile llama.cpp with Android NDK

With the NDK set up, you can now compile llama.cpp for your Android device. There are two options: with or without GPU acceleration. I recommend starting with the non-GPU version, as it's a bit simpler to set up.

cd ~/llama.cpp
mkdir build
cd build
cmake -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-24 -DCMAKE_C_FLAGS=-march=native ..
make

If everything goes well, you should now have working llama.cpp binaries in the build folder of the project. You can now continue with downloading a model file (Step 4).
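
Depending on the llama.cpp version, the compiled binaries end up either directly in build or in build/bin. A quick sanity check is to print the help text of the main example (adjust the path if necessary):

./bin/main -h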

Step 3.2: Build llama.cpp with GPU Acceleration (optional)

Building llama.cpp with OpenCL and CLBlast support can increase the overall performance, but requires some additional steps:

Download necessary packages:

apt install ocl-icd opencl-headers opencl-clhpp clinfo libopenblas

Download CLBlast, compile it and copy clblast.h into the llama.cpp folder:

git clone https://github.com/CNugteren/CLBlast.git
cd CLBlast
cmake .
cmake --build . --config Release
mkdir install
cmake --install . --prefix ~/CLBlast/install
cp libclblast.so* $PREFIX/lib
cp ./include/clblast.h ../llama.cpp
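
Optionally, verify that the CLBlast shared library ended up where Termux's linker can find it:

ls $PREFIX/lib/libclblast.so*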

Copy OpenBLAS files to llama.cpp:

cd ~/llama.cpp
cp /data/data/com.termux/files/usr/include/openblas/cblas.h .
cp /data/data/com.termux/files/usr/include/openblas/openblas_config.h .

Build llama.cpp with CLBlast:

cd ~/llama.cpp
mkdir build
cd build
cmake -DLLAMA_CLBLAST=ON -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-24 -DCMAKE_C_FLAGS=-march=native -DCLBlast_DIR=~/CLBlast/install/lib/cmake/CLBlast ..
make

Add LD_LIBRARY_PATH to ~/.bashrc so the program can load the vendor's OpenCL driver and run directly on the physical GPU:

echo "export LD_LIBRARY_PATH=/vendor/lib64:$LD_LIBRARY_PATH:$PREFIX" >> ~/.bashrc
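
To apply the change in your current session without restarting Termux:

source ~/.bashrc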

Check if GPU is available for OpenCL:

clinfo -l

If everything is working, for example on a Qualcomm Snapdragon SoC, the output will look like this:

Platform #0: QUALCOMM Snapdragon(TM)
 `-- Device #0: QUALCOMM Adreno(TM)

Step 4: Download and Copy a Language Model

Finally, you'll need to download a compatible language model and copy it to the ~/llama.cpp/models directory. Head over to Hugging Face and search for a GGUF-formatted model that fits within your device's available RAM. I'd recommend starting with TinyLlama-1.1B.
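
As an illustration, you can also download a quantized TinyLlama build straight from Termux with wget. The repository and file name below are just an example; substitute the URL of whichever GGUF model you pick on Hugging Face:

wget -P ~/llama.cpp/models https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

If you download the model this way, it already lands in the models directory and you can skip the storage steps below.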

Once you've downloaded the model file, use the

termux-setup-storage

command in Termux to grant access to your device's shared storage. Then, move the model file to the llama.cpp models directory:

mv ~/storage/downloads/model_name.gguf ~/llama.cpp/models

Step 5: Running llama.cpp

With the llama.cpp environment set up and a language model in place, you're ready to start interacting with your very own local AI assistant. I recommend running the llama.cpp web server (if you built with CMake as shown above, the binary may be in build/bin; adjust the path accordingly):

cd llama.cpp
./server -m models/[YourModelName].gguf -t [#threads]

Replace [#threads] with the number of CPU cores in your device minus one; if you use all cores, the device may become unresponsive.
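
For example, on a typical 8-core SoC with a (hypothetical) model file named tinyllama.gguf, you would start the server with seven threads:

./server -m models/tinyllama.gguf -t 7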

You can then access the AI chatbot locally by opening http://localhost:8080 in your mobile browser.
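
The server also exposes an HTTP API, so you can query the model from the command line instead of the browser. A minimal request to the completion endpoint looks roughly like this, where n_predict limits the number of generated tokens:

curl http://localhost:8080/completion -H "Content-Type: application/json" -d '{"prompt": "What is the capital of France?", "n_predict": 64}'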


Alternatively, you can run the llama.cpp chat directly in Termux:

./main -m models/[YourModelName].gguf --color -ins

Conclusion

While performance will vary based on your device's hardware capabilities, even mid-range phones should be able to run llama.cpp reasonably well as long as you choose small enough models that fit into your device's memory. High-end devices will, of course, be able to take fuller advantage of the model's capabilities.
