Daniel Odii

Run AI Locally: Converting an Android phone into a personal LLM server

Cloud AI is convenient. It is also expensive and dependent on internet access.

Over the weekend, I tried something different: I converted an old 4GB Android phone into a local LLM server and routed it to my PC.

The goal was simple. Run AI offline. No subscriptions. No API costs.

Here is what worked, what did not, and what this experiment reveals about the future of edge AI.

The Stack

The setup was minimal:

• Termux for a Linux-like environment on Android
• Ollama for running local language models
• Qwen2 (0.5b variant) as the lightweight model
• One old Android device with 4GB RAM

Below are the exact steps that worked for me when setting up Termux and Ollama on Android.

  1. Install Termux

Download the latest APK from Termux's official GitHub releases page (the Google Play build is outdated and no longer maintained, so use the GitHub or F-Droid release). After installation, grant storage and network permissions when prompted.

2. Update packages and install Ollama

Inside Termux:

```shell
pkg update && pkg install ollama
```

This installs Ollama directly in the Termux environment.

3. Start the Ollama server

Expose it to your local network:

```shell
export OLLAMA_HOST=0.0.0.0:11434
ollama serve &
```

Setting 0.0.0.0 allows other devices on the same network to connect.
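Before moving on, it helps to confirm the server is actually reachable from another machine. A minimal Python sketch for that check is below; Ollama answers plain GET requests on its port, so a successful HTTP response is enough to confirm connectivity. The IP address in the comment is a placeholder, not from my setup.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at base_url within the timeout."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


# Replace with your phone's LAN IP (shown in Android's Wi-Fi settings):
# print(ollama_is_up("http://192.168.1.50:11434"))
```

If this returns False, the usual culprits are the phone and PC being on different networks, or `OLLAMA_HOST` not being set before `ollama serve` was started.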

4. Pull a lightweight model

For low-RAM devices, I used:

```shell
ollama pull qwen2:0.5b
```

This pulls the 0.5B parameter variant of Qwen2, which is small enough to run on constrained hardware. If download speed is an issue, using alternative mirrors can help.
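Why does 0.5B fit on a 4GB phone? A back-of-envelope estimate makes it concrete. Assuming roughly 4-bit quantized weights (typical for Ollama's default model tags), the weights alone need about a quarter of a gigabyte; the exact figure depends on the quantization Ollama ships, so treat this as a lower bound:

```python
def approx_weight_ram_gb(params_billions: float, bits_per_weight: float = 4.0) -> float:
    """Back-of-envelope RAM for model weights alone.

    Ignores the KV cache, activations, and runtime overhead, which add
    more on top -- a lower bound, not a sizing guarantee.
    """
    total_bytes = params_billions * 1e9 * (bits_per_weight / 8)
    return total_bytes / 1e9


# qwen2:0.5b at ~4-bit: roughly 0.25 GB of weights, which leaves
# headroom on a 4GB device even after Android's own memory use.
```

By the same arithmetic, a 7B model at 4-bit already needs around 3.5 GB for weights alone, which is why it won't fit comfortably on this hardware.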

5. Run the model and test with a prompt

```shell
ollama run qwen2:0.5b
```

Note: On some setups, Ollama may throw an error about a missing serve executable. Creating a symbolic link fixes it:

```shell
ln -s $PREFIX/bin/ollama $PREFIX/bin/serve
```

This maps the expected command to the correct binary.

6. Access from your PC

From your computer, send a request to the phone’s local IP:

```shell
curl http://[phone-ip]:11434/api/generate -d '{"model": "qwen2:0.5b", "prompt": "Test"}'
```

If everything is configured correctly, the phone responds with generated text.

At this point, your Android device is functioning as a local LLM server.
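The curl call above can also be scripted. By default, Ollama's `/api/generate` endpoint streams its reply as newline-delimited JSON objects, each carrying a `response` fragment and a `done` flag. Here is a hedged Python sketch that sends a prompt and stitches the fragments back together; the IP in the comment is a placeholder for your phone's address:

```python
import json
import urllib.request


def collect_stream(lines) -> str:
    """Join 'response' fragments from a newline-delimited JSON stream,
    stopping once the final chunk sets 'done'."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(base_url: str, model: str, prompt: str) -> str:
    """Send a prompt to Ollama's /api/generate and return the completion."""
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return collect_stream(resp)  # the response body iterates line by line


# Replace with your phone's LAN IP:
# print(generate("http://192.168.1.50:11434", "qwen2:0.5b", "Test"))
```

This is the same request the curl test makes, just with the streamed chunks reassembled into one string on the PC side.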

The Android device I used has a weak mobile CPU and limited RAM. Inference times were noticeably slow. Large prompts required patience.

There were additional bottlenecks:

• Termux introduces slight I/O latency since it runs a Linux environment on top of Android.
• The phone throttled performance to manage heat and battery health.
• Sustained loads caused noticeable slowdowns.

Phones are not designed to behave like servers. Thermal limits are very real.

Still, the system remained functional.

What This Actually Proves

The interesting part is not performance. It is feasibility.

A few years ago, running a language model required serious hardware. Now, even a retired Android phone can serve a lightweight LLM.

This experiment highlights three shifts:

Model compression is improving rapidly.

Edge AI is becoming practical.

Personal AI infrastructure is possible without cloud dependence.

This was not about replacing high-performance systems. It was about exploring autonomy.

Offline AI changes the equation. No network dependency. No usage limits. No recurring costs.

Is It Practical?

For production workloads? No, not really.

For experimentation, learning, and private local tooling, yes.

If you are building tools that require lightweight inference or offline capabilities, small models running on edge devices are increasingly viable.

The tradeoff is speed.

The benefit is independence.

I hope you enjoyed it!
