Running large language models on-device is becoming increasingly practical. Instead of sending prompts to a cloud API, models can run directly on your own hardware, which keeps data private and removes the network round-trip.
Below is a minimal example of running an LLM in Python using the picoLLM inference engine.
Install the SDK
Install the Python package:
pip install picollm
Get an AccessKey and a Model
To run a model, you need:
- An AccessKey from the Picovoice Console
- A downloaded model file
You can create an account and download models here:
https://console.picovoice.ai/
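Rather than hardcoding the AccessKey into your script, a common convention (not specific to picoLLM) is to read it from an environment variable. A minimal sketch, assuming a variable named PICOVOICE_ACCESS_KEY:

```python
import os

def load_access_key(env_var: str = "PICOVOICE_ACCESS_KEY") -> str:
    """Read the Picovoice AccessKey from the environment, failing early
    with a clear message if it has not been set."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"Set {env_var} to the AccessKey from your Picovoice Console account"
        )
    return key
```

The variable name is just a convention; any mechanism that keeps the key out of source control (a secrets manager, a .env file) works equally well.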
picoLLM supports several open-weight models such as Gemma, Llama, Mistral, Mixtral, and Phi, and runs on Linux, macOS, Windows, and Raspberry Pi with CPU or GPU inference.
Minimal Example
Import the package:
import picollm
Create an engine instance, passing your AccessKey and the path to the downloaded model file:
engine = picollm.create(access_key=access_key, model_path=model_path)
Generate a completion. The call returns a result object; the generated text is in its completion attribute:
res = engine.generate(prompt)
print(res.completion)
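Putting the steps above together, a minimal end-to-end sketch might look like the following. The model filename and environment variable are placeholders you substitute with your own values, and the try/finally ensures the engine's resources are released even if generation fails:

```python
import os

def run_completion(access_key: str, model_path: str, prompt: str) -> str:
    """Create a picoLLM engine, generate one completion, and clean up."""
    # Imported inside the function so this module stays importable
    # even where the picollm package is not installed.
    import picollm

    engine = picollm.create(access_key=access_key, model_path=model_path)
    try:
        res = engine.generate(prompt)
        return res.completion
    finally:
        engine.release()

# Only run when credentials are available; both values are placeholders.
if os.environ.get("PICOVOICE_ACCESS_KEY"):
    text = run_completion(
        access_key=os.environ["PICOVOICE_ACCESS_KEY"],
        model_path="phi2-290.pllm",  # path to a model downloaded from the Console
        prompt="Explain on-device inference in one sentence.",
    )
    print(text)
```

Releasing the engine matters more on-device than with a cloud client: the model weights occupy real memory on the host, so long-running programs should create the engine once, reuse it across prompts, and release it on shutdown.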
Next Steps
For full API details and step-by-step demos:
- Python SDK docs: https://picovoice.ai/docs/api/picollm-python/
- Quick start guide: https://picovoice.ai/docs/quick-start/picollm-python/
- Demo source code: https://github.com/Picovoice/picollm/tree/main/demo/python
For a full walkthrough and explanation, see the original article:
https://picovoice.ai/blog/how-to-run-llms-locally-with-python/