Introduction
I tried running PrismML's 1-bit LLMs, Bonsai-8B and Bonsai-1.7B, in the Linux Terminal app on a Google Pixel 7a. Bonsai-8B has a compact model size of just 1.15GB and reportedly runs on a Raspberry Pi 4B. Since it was described as "an 8-billion-parameter LLM that 'runs on a smartphone': '1-bit Bonsai' at 1.15GB claiming production-level performance draws attention," I decided to test whether it actually runs on a smartphone.
References
https://prismml.com/news/bonsai-8b
https://qiita.com/y0kud4/items/3f7faeea52d7eec01b0f
https://qiita.com/moritalous/items/96cdc8bcd48d8a193556
https://qiita.com/revsystem/items/7ad150a7f99aaea27691
https://zenn.dev/kun432/scraps/ce16474a3be277
Test Environment
- Google Pixel 7a
- Android 16
- Linux Terminal app
Setup
Enabling the Linux Terminal app
To enable the Linux terminal app on a Pixel device, refer to the article Running AWS Bedrock API Using Google Pixel's Linux Terminal app.
Building llama.cpp
Build llama.cpp by following this article:
https://qiita.com/moritalous/items/96cdc8bcd48d8a193556
Install the packages required for the build:
sudo apt update
sudo apt install -y git cmake build-essential libopenblas-dev pkg-config openssl libssl-dev
Clone the llama.cpp fork maintained by PrismML:
git clone https://github.com/PrismML-Eng/llama.cpp && cd llama.cpp
Build the project. This takes approximately 15 minutes.
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release -j2
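The `-j2` above limits the build to two parallel compile jobs, presumably to keep memory usage down inside the Terminal app's VM. If you want to pick the job count dynamically instead, a small sketch (halving `nproc` is my own heuristic, not something from the article):

```shell
# Pick a conservative parallel job count for a memory-constrained VM
# (halving nproc is an assumed heuristic; the article simply uses -j2)
jobs=$(( $(nproc) / 2 ))
[ "$jobs" -lt 1 ] && jobs=1
echo "building with -j$jobs"
```

You would then pass `-j$jobs` to `cmake --build` in place of the hard-coded `-j2`.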
Running the Models
Since the Pixel's Linux Terminal app does not support Japanese input, I used the simple English prompt 'hello'. Both models streamed output too slowly to be practical. In addition, the Linux Terminal app crashed frequently, and each crash required a full recovery of the environment, so I was only able to test the following two patterns.
Bonsai-8B
The first run takes longer because the model has to be downloaded. When the context size was not specified with -c, a segmentation fault occurred. Following the advice at https://zenn.dev/kun432/scraps/ce16474a3be277#comment-3592ca1fdb0075, I set the context size to 32784.
./build/bin/llama-cli \
-hf prism-ml/Bonsai-8B-gguf \
-p "hello" \
-c 32784
The results were as follows:
[Prompt: 1.5 t/s | Generation: 0.1 t/s]
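At 0.1 tokens per second, even a short reply takes a very long time. A rough back-of-the-envelope estimate (the 100-token reply length is an assumption, not a measurement from the run):

```shell
# Wall-clock estimate at the measured generation rate
# (rate of 0.1 t/s is from the run above; 100 tokens is an assumed reply length)
awk -v rate=0.1 -v tokens=100 \
    'BEGIN { secs = tokens / rate; printf "%d tokens at %.1f t/s: %.0f s (~%.0f min)\n", tokens, rate, secs, secs/60 }'
```

So a 100-token answer would take on the order of a quarter of an hour, which matches the impression that the output was too slow to be practical.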
Bonsai-1.7B
Similarly, I ran Bonsai-1.7B. This model did not require specifying the context size.
./build/bin/llama-cli \
-hf prism-ml/Bonsai-1.7B-gguf \
-p "hello"
The results were as follows:
[Prompt: 4.4 t/s | Generation: 1.5 t/s]
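Putting the two measured generation rates side by side shows how much more usable the smaller model is on this device (both rates are taken from the runs above):

```shell
# Relative generation speed of the two models (rates from the runs above)
awk 'BEGIN { printf "Bonsai-1.7B generates ~%.0fx faster than Bonsai-8B (1.5 vs 0.1 t/s)\n", 1.5 / 0.1 }'
```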
Conclusion
I tested Bonsai-8B and Bonsai-1.7B in the Google Pixel 7a's Linux Terminal app and confirmed that both models run. However, even though these are lightweight models, the device's performance was insufficient for practical use. That said, it was an interesting finding that the build itself completed successfully.