
Khushi Nakra

Trying On-Device LLM Inference with Python

Running large language models on-device is becoming increasingly practical. Instead of sending prompts to a cloud API, models can run directly on hardware, improving privacy and reducing network dependency.

Below is a minimal example of running an LLM in Python using the picoLLM inference engine.

Install the SDK

Install the Python package:

pip install picollm

Get an AccessKey and a Model

To run a model, you need:

  1. An AccessKey from the Picovoice Console
  2. A downloaded model file

You can create an account and download models here:

https://console.picovoice.ai/

picoLLM supports several open-weight models such as Gemma, Llama, Mistral, Mixtral, and Phi, and runs on Linux, macOS, Windows, and Raspberry Pi with CPU or GPU inference.
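Engine creation is also where you choose the inference hardware. As a hedged sketch (the `device` parameter and its accepted values are taken from the picoLLM Python SDK docs; verify against your SDK version, and note the AccessKey and model path below are placeholders):

```python
import picollm

# 'device' selects where inference runs; the picoLLM Python SDK documents
# values such as 'best' (the default), 'cpu', and 'gpu' -- check the docs
# for your SDK version. AccessKey and model path are placeholders.
engine = picollm.create(
    access_key="${ACCESS_KEY}",
    model_path="path/to/model.pllm",
    device="cpu",  # force CPU inference, e.g. on a Raspberry Pi
)
```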

Minimal Example

Import the package:

import picollm

Create an engine instance:

engine = picollm.create(access_key, model_path)

Generate a completion. The call returns a result object; its `completion` field holds the generated text:

res = engine.generate(prompt)
print(res.completion)
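Putting the steps above together, a minimal end-to-end script might look like this. The AccessKey and model path are placeholders, and the `res.completion` field and `release()` call follow the picoLLM Python SDK; double-check the API reference for your SDK version:

```python
import picollm

# Placeholders: substitute your own AccessKey (from the Picovoice Console)
# and the path to a model file downloaded from the Console.
access_key = "${ACCESS_KEY}"
model_path = "path/to/model.pllm"

# Load the model; this can take a few seconds for larger models.
engine = picollm.create(access_key=access_key, model_path=model_path)

try:
    # Run a single completion and print the generated text.
    res = engine.generate(prompt="Explain on-device inference in one sentence.")
    print(res.completion)
finally:
    # Free the native resources held by the engine.
    engine.release()
```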

Next Steps

For full API details, step-by-step demos, and a complete walkthrough, see the original article:

https://picovoice.ai/blog/how-to-run-llms-locally-with-python/
