DEV Community

Running Llama 3.2 on Android: A Step-by-Step Guide Using Ollama

KAMAL KISHOR on October 11, 2024

Llama 3.2 was recently introduced at Meta’s Developer Conference, showcasing impressive multimodal capabilities and a version optimized for mobile ...
Sam Rahimi

Awesome! Worked out of the box for me (Vivo v30 Lite, Android 14, and the latest Termux APK, specifically: github.com/termux/termux-app/relea...)

Note that this will almost certainly NOT work if you download Termux from the Google Play Store; while the Play Store build is fine for casual use, it is NOT the same as the open-source build distributed on GitHub.

KAMAL KISHOR

Glad to hear it worked out of the box for you on your Vivo V30 Lite with Android 14! 🎉 Thanks for sharing your setup details.

And yes, you're absolutely right—downloading Termux from the Google Play Store can cause issues since it's outdated and lacks important updates. The GitHub version is the way to go for the latest features and compatibility.

Honoré SOKE

Great article. Thank you.

manutown

The 3B model is running smoothly.
Thank you very much.

KAMAL KISHOR

Glad to hear the 3B model is running smoothly! 😊 If you have any other questions or need more help, don't hesitate to ask. I'm here to help. Happy experimenting with Ollama and AI models! 🚀

manutown

And I've gotten it running on a low-end Android phone, the Xiaomi Poco C65. It's not a high-end device like the Samsung S23, where it runs very fast. Thanks. (screenshot attached)

KAMAL KISHOR

So you got it working on a Xiaomi Poco C65, that's great! Even though it's a low-end device, it shows that software optimization and efficiency can make the difference. 🚀 Did you notice any performance issues, or does it run well overall?

manutown

I also installed it on a Galaxy S24 and it runs its analyses in roughly the same time. If I use the 1B model it responds faster, but it tends to make things up and its answers go in circles.

b9Joker108

I got this error 🤕 on my Samsung Galaxy Tab S9 Ultra:

❯ go build .
# github.com/ollama/ollama/llama
ggml-quants.c:4023:88: error: always_inline function 'vmmlaq_s32' requires target feature 'i8mm', but would be inlined into function 'ggml_vec_dot_q4_0_q8_0' that is compiled without support for 'i8mm'
ggml-quants.c:4023:76: error: always_inline function 'vmmlaq_s32' requires target feature 'i8mm', but would be inlined into function 'ggml_vec_dot_q4_0_q8_0' that is compiled without support for 'i8mm'
ggml-quants.c:4023:64: error: always_inline function 'vmmlaq_s32' requires target feature 'i8mm', but would be inlined into function 'ggml_vec_dot_q4_0_q8_0' that is compiled without support for 'i8mm'
ggml-quants.c:4023:52: error: always_inline function 'vmmlaq_s32' requires target feature 'i8mm', but would be inlined into function 'ggml_vec_dot_q4_0_q8_0' that is compiled without support for 'i8mm'
# github.com/ollama/ollama/discover
gpu_info_cudart.c:61:13: warning: comparison of different enumeration types ('cudartReturn_t' (aka 'enum cudartReturn_enum') and 'enum cudaError_enum') [-Wenum-compare]
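
As a first diagnostic (just a sketch, assuming an aarch64 device), it is worth checking whether the CPU even advertises the i8mm extension that the build is trying to use:

    # Show the advertised CPU features; look for "i8mm" in the list.
    # No "i8mm" entry means the SoC (or kernel) does not expose it.
    grep -m1 Features /proc/cpuinfo
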
Ervan Kurniawan • Edited

Successfully ran deepseek-coder locally for the first time, even if it's too slow! (I'll switch later to a model that runs faster on my Samsung A51 😅)
(screenshot attached)

lanbase • Edited

Hi, I got the same error on an Honor Magic6 Pro (Snapdragon 8 Gen 3).

~/ollama $ go build .
# github.com/ollama/ollama/discover
gpu_info_cudart.c:61:13: warning: comparison of different enumeration types ('cudartReturn_t' (aka 'enum cudartReturn_enum') and 'enum cudaError_enum') [-Wenum-compare]
# github.com/ollama/ollama/llama
ggml-quants.c:4023:88: error: always_inline function 'vmmlaq_s32' requires target feature 'i8mm', but would be inlined into function 'ggml_vec_dot_q4_0_q8_0' that is compiled without support for 'i8mm'
ggml-quants.c:4023:76: error: always_inline function 'vmmlaq_s32' requires target feature 'i8mm', but would be inlined into function 'ggml_vec_dot_q4_0_q8_0' that is compiled without support for 'i8mm'
ggml-quants.c:4023:64: error: always_inline function 'vmmlaq_s32' requires target feature 'i8mm', but would be inlined into function 'ggml_vec_dot_q4_0_q8_0' that is compiled without support for 'i8mm'
ggml-quants.c:4023:52: error: always_inline function 'vmmlaq_s32' requires target feature 'i8mm', but would be inlined into function 'ggml_vec_dot_q4_0_q8_0' that is compiled without support for 'i8mm'
~/ollama $



Update:

I have found a workaround here:

github.com/ollama/ollama/issues/7292

cheers.

Johan • Edited

I had the same error, but found a workaround here.

Basically, you modify llama.go#L37-L38 to remove -D__ARM_FEATURE_MATMUL_INT8
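
A sketch of that edit from Termux, assuming the define is still set in llama/llama.go as it was at the time (check with grep first, since the file layout moves around between Ollama versions):

    cd ~/ollama
    # Confirm where the flag is set before touching anything
    grep -n "__ARM_FEATURE_MATMUL_INT8" llama/llama.go
    # Strip the define from the cgo flags, then rebuild
    sed -i 's/ -D__ARM_FEATURE_MATMUL_INT8//g' llama/llama.go
    go build .
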

Nigel Burton

Qualcomm's spec sheets for the Snapdragon 8 Gen 3 suggest it can use the GPU and a DSP to speed up LLM inference.

Do you, or any other readers, know whether Ollama is taking advantage of that hardware?

If not, are there any open-source projects that utilize the full capabilities of the Gen 3?

Thanks for the very useful article.

Learn AI

Here's a detailed response to your questions regarding the utilization of Snapdragon 8 Gen 3 hardware (GPU and DSP) for LLM inference, particularly with Ollama and other open-source projects:


1. Is Ollama Taking Advantage of Snapdragon 8 Gen 3 Hardware?

As of the latest information, Ollama does not currently fully utilize the GPU and DSP capabilities of the Snapdragon 8 Gen 3 for LLM inference. While Ollama supports running models like Llama 3.2 on Android devices using Termux, its primary focus has been on CPU-based inference. There are discussions and efforts to integrate GPU and NPU support, but these are still in progress and not yet fully realized.

For example:

  • A user reported that the NPU on a Snapdragon X Elite device (another Arm-based Qualcomm platform with a Hexagon NPU, like the Snapdragon 8 Gen 3) was not being utilized when running Ollama.
  • Developers have mentioned that GPU support (via OpenCL) is being worked on, but NPU support will require more effort and has no estimated timeline.

2. Open-Source Projects Utilizing Snapdragon 8 Gen 3 Hardware

Several open-source projects and frameworks are actively leveraging the full capabilities of the Snapdragon 8 Gen 3, including its GPU, DSP, and NPU for AI and LLM tasks:

a. Qualcomm AI Hub Models

  • The Qualcomm AI Hub Models project provides optimized machine learning models for Snapdragon devices, including the Snapdragon 8 Gen 3. These models are designed to take advantage of the hardware's CPU, GPU, and NPU for tasks like image classification, object detection, and LLM inference.
  • The project supports various runtimes, including Qualcomm AI Engine Direct, TensorFlow Lite, and ONNX, enabling efficient deployment on Snapdragon hardware.

b. MiniCPM-Llama3-V 2.5

  • This open-source multimodal model is optimized for deployment on Snapdragon 8 Gen 3 devices. It uses 4-bit quantization and integrates with Qualcomm’s QNN framework to unlock NPU acceleration, achieving significant speed-ups in image encoding and language decoding.
  • The model demonstrates how open-source projects can leverage Snapdragon hardware for efficient on-device AI applications.

c. Llama.cpp

  • Llama.cpp is a popular open-source project for running LLMs locally. While it primarily focuses on CPU inference, there are ongoing efforts to add GPU and NPU support for Snapdragon devices. For example, developers are working on an OpenCL-based backend for Adreno GPUs, which could extend to Snapdragon 8 Gen 3.
  • Some users have reported successful performance benchmarks on Snapdragon X Elite devices, indicating potential for future optimizations.

d. Qualcomm AI Engine Direct

  • This framework allows developers to compile and optimize models for Snapdragon hardware, including the GPU and NPU. It is used in projects like EdgeStableDiffusion, which demonstrates how large models like Stable Diffusion can be efficiently run on Snapdragon 8 Gen 2 and Gen 3 devices.

3. Future Prospects

  • Ollama: While Ollama does not yet fully utilize Snapdragon 8 Gen 3 hardware, the development community is actively working on GPU and NPU support. This could significantly improve performance for on-device LLM inference in the future.
  • Open-Source Ecosystem: Projects like Qualcomm AI Hub Models, MiniCPM-Llama3-V 2.5, and Llama.cpp are leading the way in leveraging Snapdragon hardware. These efforts highlight the potential for open-source tools to fully utilize the capabilities of modern mobile chipsets.

Conclusion

Currently, Ollama does not fully utilize the GPU and DSP capabilities of the Snapdragon 8 Gen 3, but there are promising open-source projects like Qualcomm AI Hub Models, MiniCPM-Llama3-V 2.5, and Llama.cpp that are making significant strides in this area. As development continues, we can expect more tools to take full advantage of Snapdragon hardware for efficient on-device AI and LLM inference.

For further details, you can explore the referenced projects and their discussion threads.

Comment deleted
Ernestoyoofi

Your URL is incomplete; it should look like this: https://github.com/ollama/ollama.git
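
For reference, the full shallow-clone command (the --depth 1 flag just keeps the download small):

    git clone --depth 1 https://github.com/ollama/ollama.git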

Nann T T Hein

Thank you 🫢

Dev

I got this error (screenshot attached) and I don't know what to do. Can someone please help me?

KAMAL KISHOR • Edited

It looks like your build error is due to missing ARM NEON FP16 support. The identifiers vld1q_f16 and vld1_f16 are ARM NEON intrinsics for float16 operations, which might not be enabled by default in your compiler.

Possible fixes:

  1. Ensure you're using a recent GCC (10+) or Clang version that supports NEON FP16.
  2. Try compiling with the flag:

     CFLAGS="-mfpu=neon-fp16" make

  3. If you're building on x86 instead of ARM, the code might not be compatible. You may need to disable NEON-related optimizations.

As for the warnings about format specifiers (%lu vs. uint64_t), you can fix them by using the PRIu64 macro from <inttypes.h>, or by casting the argument to (unsigned long long) and printing it with %llu.

Let me know if you need more help!

manutown

How do I add the DeepSeek model to Ollama and select it? Could you add the instructions as another option?

KAMAL KISHOR

  1. Download the DeepSeek model from Hugging Face.

  2. Convert the model to GGUF format (if necessary).

  3. Create a Modelfile and specify the path to the model (see the sketch below).

  4. Build the model in Ollama:

     ollama create deepseek -f Modelfile

  5. Run the model:

     ollama run deepseek

Troubleshooting:

  • If the model doesn't load, make sure the model file is in the correct format (GGUF/GGML).
  • Check the Ollama logs for errors by running the server in the foreground:

     ./ollama serve

  • If you run into memory issues, try a smaller model or run Ollama on a device with more RAM.

If you follow these steps, you'll be able to add and use the DeepSeek model in Ollama. Let me know if you need more help!
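
As a sketch of step 3, here is a one-line Modelfile pointing at a local GGUF file, plus the commands to build and run it from Termux (the GGUF file name is only a placeholder; substitute whatever you actually downloaded):

    cd ~/ollama
    # Write a minimal Modelfile that points at the local GGUF weights
    echo 'FROM ./deepseek-coder-6.7b-instruct.Q4_K_M.gguf' > Modelfile
    # Register the model with Ollama under the name "deepseek", then run it
    ./ollama create deepseek -f Modelfile
    ./ollama run deepseek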

Brayan Herrera

This is what I get after running go build . (screenshot attached)

Picaso X

./ollama run llama3.2:3b --verbose

Error: could not connect to ollama app
What can I do to solve this problem?

KAMAL KISHOR

The error Error: could not connect to ollama app typically occurs when the Ollama server is not running or is not accessible. Since you're running this on an Android device using Termux, here are some steps to troubleshoot and resolve the issue:


1. Ensure the Ollama Server is Running

  • The Ollama server must be running in the background for the ollama run command to work.
  • Start the Ollama server by running:

     ./ollama serve &
    
  • The & at the end runs the server in the background. You should see a message confirming the server is running.

Check if the server is running:

  • Run the following command to see if the Ollama process is active:

     ps aux | grep ollama
    
  • If you don't see the ollama serve process, restart it.


2. Verify the Ollama Server Port

  • By default, Ollama uses port 11434. Ensure this port is not blocked or used by another process.
  • Check if the port is open:

     netstat -tuln | grep 11434
    
  • If the port is not listed, the server might not be running correctly (a quick curl check is sketched below).
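
A minimal reachability check over HTTP, assuming the server is on the default host and port (the root endpoint answers with "Ollama is running", and /api/tags lists the locally available models):

     # Should return "Ollama is running" when the server is up
     curl http://localhost:11434/

     # List locally available models through the HTTP API
     curl http://localhost:11434/api/tags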


3. Check for Errors in the Server Logs

  • If the server fails to start, check the logs for errors:

     ./ollama serve
    
  • Look for any error messages that might indicate why the server isn't starting. Common issues include missing dependencies or insufficient memory.


4. Ensure Termux Has Proper Permissions

  • Termux needs storage and internet permissions to function correctly.
  • Grant storage access:

     termux-setup-storage
    
  • Ensure Termux has internet access by testing with:

     ping google.com
    

5. Reinstall Ollama

  • If the server still doesn't start, try reinstalling Ollama:

     cd ~
     rm -rf ollama
     git clone --depth 1 https://github.com/ollama/ollama.git
     cd ollama
     go generate ./...
     go build .
    
  • Then, start the server again:

     ./ollama serve &
    

6. Check Device Resources

  • Running Llama 3.2 on an Android device can be resource-intensive. Ensure your device has enough RAM and storage.
  • If your device is low on resources, the server might fail to start. Try closing other apps or using a smaller model like llama3.2:1b.

7. Test with a Smaller Model

  • If the issue persists, test with a smaller model to rule out resource constraints:

     ./ollama run llama3.2:1b
    
  • If the smaller model works, the issue might be related to the device's ability to handle the 3B model.


8. Use Verbose Mode for Debugging

  • Run the Ollama server with debug logging enabled to get more detailed logs:

     OLLAMA_DEBUG=1 ./ollama serve
    
  • Look for any specific errors or warnings that might indicate the root cause.


9. Check for Termux Updates

  • Ensure you're using the latest version of Termux. Older versions might have compatibility issues.
  • Update Termux packages:

     pkg update && pkg upgrade
    

10. Restart Termux

  • Sometimes, simply restarting Termux can resolve connectivity issues:

    • Close Termux completely and reopen it.
    • Restart the Ollama server:
       ./ollama serve &
    

11. Verify the Model Name

  • Double-check that the model name llama3.2:3b is correct. If the model hasn't been pulled yet, the run command will fail.
  • List available models:

     ./ollama list
    
  • If the model isn't listed, pull it first:

     ./ollama pull llama3.2:3b
    

12. Check Network Connectivity

  • If you're behind a proxy or firewall, ensure Termux has access to the internet.
  • Test connectivity by pulling a model:

     ./ollama pull llama3.2:1b
    
  • If the pull fails, there might be a network issue.


13. Use a Different Device

  • If none of the above steps work, try running Ollama on a different Android device or a PC. This will help determine if the issue is specific to your device.

Summary of Commands

Here’s a quick summary of the key commands to troubleshoot and resolve the issue:

# Start Ollama server
./ollama serve &

# Check if server is running
ps aux | grep ollama

# Check port usage
netstat -tuln | grep 11434

# Reinstall Ollama
cd ~
rm -rf ollama
git clone --depth 1 https://github.com/ollama/ollama.git
cd ollama
go generate ./...
go build .

# Run a smaller model
./ollama run llama3.2:1b

# Pull the model manually
./ollama pull llama3.2:3b

If you’ve tried all the steps above and the issue persists, feel free to provide more details about the error logs or behavior, and I’ll help you further!