Introduction
I tried running PrismML's 1-bit LLMs, Bonsai-8B and Bonsai-1.7B, in the Linux Terminal app on a Google Pixel 7a. Bonsai-8B has a compact model size of just 1.15GB and reportedly runs on a Raspberry Pi 4B. Since it was described as "an 8-billion-parameter LLM that 'runs on a smartphone': '1-bit Bonsai' at 1.15GB claiming production-level performance draws attention," I decided to test whether it actually runs on a smartphone.
References
https://prismml.com/news/bonsai-8b
https://qiita.com/y0kud4/items/3f7faeea52d7eec01b0f
https://qiita.com/moritalous/items/96cdc8bcd48d8a193556
https://qiita.com/revsystem/items/7ad150a7f99aaea27691
https://zenn.dev/kun432/scraps/ce16474a3be277
Test Environment
- Google Pixel 7a
- Android 16
- Linux Terminal app
Setup
Enabling the Linux Terminal app
To enable the Linux terminal app on a Pixel device, refer to the article Running AWS Bedrock API Using Google Pixel's Linux Terminal app.
Building llama.cpp
Build llama.cpp by following this article:
https://qiita.com/moritalous/items/96cdc8bcd48d8a193556
Install the packages required for the build:
sudo apt update
sudo apt install -y git cmake build-essential libopenblas-dev pkg-config openssl libssl-dev
Clone the llama.cpp fork maintained by PrismML:
git clone https://github.com/PrismML-Eng/llama.cpp && cd llama.cpp
Build the project. This takes approximately 15 minutes.
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release -j2
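The `-j2` above limits the build to two parallel compile jobs, presumably to keep memory usage down inside the Terminal app's VM. If you want to pick the job count dynamically instead, a small sketch (halving `nproc` is my own heuristic, not something from the article):

```shell
# Pick a conservative parallel job count for a memory-constrained VM
# (halving nproc is an assumed heuristic; the article simply uses -j2)
jobs=$(( $(nproc) / 2 ))
[ "$jobs" -lt 1 ] && jobs=1
echo "building with -j$jobs"
```

You would then pass `-j$jobs` to `cmake --build` in place of the hard-coded `-j2`.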
Running the Models
Since the Pixel's Linux Terminal app does not support Japanese input, I used the simple English prompt 'hello'. Both models streamed output too slowly to be practical. In addition, the Linux Terminal app crashed frequently, and each crash required a full recovery of the environment, so I was only able to test the following two patterns.
Bonsai-8B
The first run takes longer because the model has to be downloaded. When the context size was not specified with -c, a segmentation fault occurred. Following the advice at https://zenn.dev/kun432/scraps/ce16474a3be277#comment-3592ca1fdb0075, I set the context size to 32784.
./build/bin/llama-cli \
-hf prism-ml/Bonsai-8B-gguf \
-p "hello" \
-c 32784
The results were as follows:
[Prompt: 1.5 t/s | Generation: 0.1 t/s]
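At 0.1 tokens per second, even a short reply takes a very long time. A rough back-of-the-envelope estimate (the 100-token reply length is an assumption, not a measurement from the run):

```shell
# Wall-clock estimate at the measured generation rate
# (rate of 0.1 t/s is from the run above; 100 tokens is an assumed reply length)
awk -v rate=0.1 -v tokens=100 \
    'BEGIN { secs = tokens / rate; printf "%d tokens at %.1f t/s: %.0f s (~%.0f min)\n", tokens, rate, secs, secs/60 }'
```

So a 100-token answer would take on the order of a quarter of an hour, which matches the impression that the output was too slow to be practical.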
Bonsai-1.7B
Similarly, I ran Bonsai-1.7B. This model did not require specifying the context size.
./build/bin/llama-cli \
-hf prism-ml/Bonsai-1.7B-gguf \
-p "hello"
The results were as follows:
[Prompt: 4.4 t/s | Generation: 1.5 t/s]
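Putting the two measured generation rates side by side shows how much more usable the smaller model is on this device (both rates are taken from the runs above):

```shell
# Relative generation speed of the two models (rates from the runs above)
awk 'BEGIN { printf "Bonsai-1.7B generates ~%.0fx faster than Bonsai-8B (1.5 vs 0.1 t/s)\n", 1.5 / 0.1 }'
```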
Conclusion
I tested Bonsai-8B and Bonsai-1.7B in the Google Pixel 7a's Linux Terminal app and confirmed that both models run. However, even though these are lightweight models, the device's performance was insufficient for practical use. That said, it was an interesting finding that the build itself completed successfully.