
Khushi Nakra

Trying On-Device LLM Inference with Python

Running large language models on-device is becoming increasingly practical. Instead of sending prompts to a cloud API, models can run directly on hardware, improving privacy and reducing network dependency.

Below is a minimal example of running an LLM in Python using the picoLLM inference engine.

Install the SDK

Install the Python package:

pip install picollm

Get an AccessKey and a Model

To run a model, you need:

  1. An AccessKey from the Picovoice Console
  2. A downloaded model file

You can create an account and download models here:

https://console.picovoice.ai/

picoLLM supports several open-weight models such as Gemma, Llama, Mistral, Mixtral, and Phi, and runs on Linux, macOS, Windows, and Raspberry Pi with CPU or GPU inference.
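Engine creation is also where you choose the inference hardware. As a hedged sketch (the `device` parameter and its accepted values are taken from the picoLLM Python SDK docs; verify against your SDK version, and note the AccessKey and model path below are placeholders):

```python
import picollm

# 'device' selects where inference runs; the picoLLM Python SDK documents
# values such as 'best' (the default), 'cpu', and 'gpu' -- check the docs
# for your SDK version. AccessKey and model path are placeholders.
engine = picollm.create(
    access_key="${ACCESS_KEY}",
    model_path="path/to/model.pllm",
    device="cpu",  # force CPU inference, e.g. on a Raspberry Pi
)
```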

Minimal Example

Import the package:

import picollm

Create an engine instance:

engine = picollm.create(access_key, model_path)

Generate a completion. The call returns a result object; its `completion` field holds the generated text:

res = engine.generate(prompt)
print(res.completion)
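Putting the steps above together, a minimal end-to-end script might look like this. The AccessKey and model path are placeholders, and the `res.completion` field and `release()` call follow the picoLLM Python SDK; double-check the API reference for your SDK version:

```python
import picollm

# Placeholders: substitute your own AccessKey (from the Picovoice Console)
# and the path to a model file downloaded from the Console.
access_key = "${ACCESS_KEY}"
model_path = "path/to/model.pllm"

# Load the model; this can take a few seconds for larger models.
engine = picollm.create(access_key=access_key, model_path=model_path)

try:
    # Run a single completion and print the generated text.
    res = engine.generate(prompt="Explain on-device inference in one sentence.")
    print(res.completion)
finally:
    # Free the native resources held by the engine.
    engine.release()
```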

Next Steps

For full API details, step-by-step demos, and a complete walkthrough, see the original article:

https://picovoice.ai/blog/how-to-run-llms-locally-with-python/
