Vijay Vimalananda Guru
Build Your Own AI Assistant App - No Cloud Required!

Introduction

This article will help you build your own AI assistant application on an edge device like the Raspberry Pi. I originally started this project with the goal of automating all the smart devices in my home, but we will cover that topic in a future article. Before we start, we need to understand what we are going to build and the prerequisites involved.

What You'll Build

  • An API service that communicates between your Android/iOS app and the SLM (Small Language Model) hosted on the Raspberry Pi. Here, we are going to use the Python framework FastAPI.
  • A mobile application (both Android and iOS) using Compose Multiplatform.

Prerequisites

  • Hardware: Although a Raspberry Pi 4 (4GB/8GB) is enough, it is better to have a Raspberry Pi 5 (4GB/8GB/16GB) with a 32GB SD card and an active cooler.
  • Software: Android Studio (I'm using Panda), Python 3.11+, and the Tailscale app (which we will talk about later)
  • Model: TinyLlama-1.1B-Chat-v1.0 (~680MB, quantized; consumes 1.1GB–1.9GB of RAM, which is more than enough for short queries)
  • Network: The Pi can be connected over Wi-Fi or Ethernet
  • Operating System: Raspberry Pi OS for the Raspberry Pi, and macOS/Windows/Linux for mobile app development.

Architecture

  • Frontend: Compose Multiplatform (single codebase for Android/iOS)
  • Backend: FastAPI (handles async requests & model inference)

Let's Begin with the Frontend

After installing Android Studio, you can download the source code from my GitHub repository or write your own implementation - I'm sure many of you can come up with a better version. To keep this article concise, I won't go into further detail about the CMP (Compose Multiplatform) code here.

Tech Stack

  • UI: Compose Multiplatform
  • Language: Kotlin
  • Architecture: Clean Architecture (Data, Domain, Presentation)
  • Networking Library: Ktor
  • Local Database: SQLDelight
  • Dependency Injection: Koin
  • Concurrency: Kotlin Coroutines & Flow

Project Structure

  • composeApp/src/commonMain: Shared logic, data models, repositories, and UI components.
  • composeApp/src/androidMain: Android-specific implementations (e.g., Database Driver, MainApplication).
  • composeApp/src/iosMain: iOS-specific implementations and entry points.
  • data/: Network services (AIChatApiService) and repository implementations.
  • domain/: Business logic and Use Cases.
  • presentation/: ViewModels and Compose UI screens.

Important Note

The application communicates with the AI model via REST API. You need to configure the BASE_URL in AIChatApiService.kt:

// Use this if you have not set up Tailscale
const val BASE_URL = "http://<YOUR_RASPBERRY_PI_IP>:8000"
// Use this if you have set up Tailscale
const val BASE_URL = "http://<IP_PROVIDED_BY_TAILSCALE>:8000"

Once we set up the backend on the Raspberry Pi, we can come back and replace this URL.

Let's Begin with Backend Setup

To streamline this process, we will begin directly with installing the required packages and setting up the AI engine - and, of course, implementing the API service that communicates between the app and the AI model.

After installing Raspberry Pi OS, power on the Pi and open a terminal.

Let's Begin with Hosting the Model

Step 1

# Install git and essential tools for compiling
sudo apt install git build-essential cmake -y

# Download the llama.cpp project to run the model on the CPU
git clone https://github.com/ggerganov/llama.cpp

# Once done, go to the models folder
cd llama.cpp/models

# Download the model from Hugging Face, or choose your favourite model (0.5B-3B parameters recommended)
curl -L -o tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# Go back to the project root and prepare it for building
cd ..
cmake -B build

# Compile the project. This will take some time
cmake --build build --config Release

Open a new terminal window and run this

# Start the model server
./llama.cpp/build/bin/llama-server -m ./llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -c 512 -t 4 --port 8080
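Before wiring up the app, it's worth confirming the model server responds. Here's a quick Python check - a sketch, assuming llama.cpp's built-in /completion endpoint on port 8080 (the prompt is just an example):

```python
import json
import urllib.request

# llama-server's built-in completion endpoint (assumes the command above is running)
LLAMA_URL = "http://localhost:8080/completion"

def build_payload(prompt: str, n_predict: int = 64) -> dict:
    """Build the JSON body that llama.cpp's /completion endpoint expects."""
    return {"prompt": prompt, "n_predict": n_predict}

def ask(prompt: str) -> str:
    """POST a prompt to the running llama-server and return the generated text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        LLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["content"]
```

With the server running, calling ask("What is a Raspberry Pi?") should return a short generated answer after a few seconds of inference on the Pi.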

Step 2
Now, we will implement an API that communicates with the model we hosted in Step 1.

# Check for OS updates
sudo apt update && sudo apt upgrade -y

# Make a separate folder for the API service. I'm naming it quanta
mkdir quanta && cd quanta

# Create an isolated virtual environment
python3 -m venv .venv

# Activate the virtual environment
source .venv/bin/activate

# Install the required packages
pip install fastapi uvicorn requests httpx psutil

# Create a Python file named aichatservice.py
touch aichatservice.py

Open the file aichatservice.py and paste in the code available in this repo.

Save the file and run the command below

uvicorn aichatservice:app --host 0.0.0.0 --port 8000

Try this in any API testing tool, like Postman or REST Client
Request Type: POST
URL: http://<YOUR_RASPBERRY_PI_IP>:8000/generate

// Request Body:
{
  "prompt": "can you explain to me AI in short?"
}

Please note that whenever you log in to or restart the Raspberry Pi, you have to start the web server using the uvicorn command above (picture 1). You also need to start the LLM server using llama.cpp (picture 2).

Picture 1

Picture 2
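If you'd rather not start both servers by hand after every reboot, one option is to register them as systemd services. A sketch for the model server - the paths and the user name pi are assumptions, so adjust them to your setup:

```ini
# /etc/systemd/system/llama-server.service
# Enable with: sudo systemctl enable --now llama-server
[Unit]
Description=llama.cpp model server
After=network.target

[Service]
User=pi
ExecStart=/home/pi/llama.cpp/build/bin/llama-server -m /home/pi/llama.cpp/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -c 512 -t 4 --port 8080
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

A similar unit works for the API service, with WorkingDirectory=/home/pi/quanta and ExecStart pointing at /home/pi/quanta/.venv/bin/uvicorn aichatservice:app --host 0.0.0.0 --port 8000.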

We are now done with the backend part. 

Additional: To make the API accessible outside your local network, you can use port forwarding (at the router level - not secure) or a tunneling service. I'm going to use a private VPN (Tailscale), which works securely from anywhere.

Open the terminal and run the commands below.

# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
# After installation, run this
sudo tailscale up

After running sudo tailscale up, you will be prompted to authenticate using the link provided. Once authentication succeeds, you will be redirected to the Tailscale dashboard, where you can find the IP address of the connected device. Copy it and use it in the URL: http://<IP_PROVIDED_BY_TAILSCALE>:8000/generate.

And that is it - you are all set. Now it's time to run the application in Android Studio.

Final Output

Android

iOS

Since this article only covers the basics of building your own AI assistant, I didn't focus much on the design, code comments, architecture, or "Token Streaming". In the next article, we will explore how to communicate with smart devices using Docker and Home Assistant.

Let me know your thoughts in the comments so that it will help me improve further.
