Michal Kraus

I built an open-source AI assistant that actually runs my day — here's the architecture

For the past few months, I've been running a personal AI assistant on a $5 VPS. Not a chatbot — an actual assistant that manages my calendar, triages my email, controls my Spotify, sends me proactive reminders, and remembers my preferences over time.

Today I'm open-sourcing it. It's called Rook.

GitHub: github.com/barman1985/Rook


Why I built this

Every AI assistant I tried fell into one of two categories:

  1. Chatbots — they answer questions but don't do anything
  2. Overengineered platforms — they need Kubernetes, five microservices, and a PhD to deploy

I wanted something in between. An AI that lives in Telegram (zero onboarding — no new app to install), actually executes tasks via tool use, and runs on a single VPS I already had lying around.


What Rook does

  • 📅 Google Calendar — create, edit, delete, search events
  • 📧 Gmail — read, search, send emails
  • 🎵 Spotify — play, search, playlists, device management
  • 📺 TV/Chromecast — power, apps, volume control via ADB
  • 🧠 Memory — remembers preferences using ACT-R cognitive architecture
  • 🔔 Proactive — morning briefing at 7am, calendar reminders every 15 min, evening summary
  • 🎙️ Voice — local STT (faster-whisper) + TTS (Piper) — completely free and private
  • 🔌 MCP Server — expose all tools to Claude Desktop or Cursor
  • 🧩 Plugins — drop a Python file, restart, new skill is live

The architecture (the part I'm most proud of)

Rook has 5 layers with strict dependency direction — each layer only depends on the layer below it:

┌─────────────────────────────────┐
│       Transport layer           │  Telegram, MCP, CLI
├─────────────────────────────────┤
│       Router / Orchestrator     │  Intent → model → agentic loop
├─────────────────────────────────┤
│       Skill layer (pluggable)   │  Calendar, Email, Spotify, ...
│  ┌──────┐ ┌──────┐ ┌─────────┐  │
│  │built │ │built │ │community│  │  Drop a .py, done.
│  │ -in  │ │ -in  │ │ plugin  │  │
│  └──────┘ └──────┘ └─────────┘  │
├─────────────────────────────────┤
│           Event bus             │  on "calendar.reminder" → notify
├─────────────────────────────────┤
│         Core services           │  Config, DB, Memory, LLM client
└─────────────────────────────────┘

**Why this matters:**

  • Skills never import from Transport. Calendar doesn't know it's being called from Telegram. Tomorrow it could be WhatsApp or a CLI.
  • Event bus decouples everything. The scheduler emits calendar.reminder — it doesn't know or care who's listening. The notification service picks it up and sends a Telegram message.
  • One config, one DB, one LLM client. No module reads .env directly. No module opens its own SQLite connection. Everything flows through Core.
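The pattern is easier to see in code. Here's a minimal pub/sub sketch of the idea (the class and method names are my illustration, not Rook's actual API):

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Minimal pub/sub bus: emitters and listeners never import each other."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: dict) -> None:
        # The emitter fires and forgets; whoever subscribed gets called.
        for handler in self._handlers[event]:
            handler(payload)

# Scheduler side: emits the event without knowing about Telegram.
bus = EventBus()
sent = []

# Notification side: subscribes and turns the event into a message.
bus.subscribe("calendar.reminder", lambda p: sent.append(f"Reminder: {p['title']}"))
bus.emit("calendar.reminder", {"title": "Standup at 10:00"})
print(sent)  # ['Reminder: Standup at 10:00']
```

Swapping Telegram for WhatsApp then means subscribing a different handler, with zero changes on the emitting side.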

The plugin system

This is what I think makes Rook actually useful for others. Adding a new integration is one Python file:

# rook/skills/community/weather.py
import httpx

from rook.skills.base import Skill, tool

class WeatherSkill(Skill):
    name = "weather"
    description = "Get weather forecasts"

    @tool("get_weather", "Get current weather for a city")
    def get_weather(self, city: str) -> str:
        # wttr.in returns a one-line forecast with format=3
        return httpx.get(f"https://wttr.in/{city}?format=3").text

skill = WeatherSkill()

That's it. The @tool decorator registers the method with the LLM, and its type hints are automatically converted into a JSON schema. Drop the file in skills/community/, restart Rook, and the LLM can now call get_weather.

No core changes. No PR needed. No registration boilerplate.
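If you're wondering how schema inference from type hints works, the core trick fits in a few lines of inspect. This is a simplified sketch for illustration, not Rook's actual code:

```python
import inspect

# Minimal mapping from Python annotations to JSON-schema types
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def infer_schema(fn):
    """Build a JSON-schema parameter description from a function signature."""
    params = inspect.signature(fn).parameters
    props = {
        name: {"type": PY_TO_JSON.get(p.annotation, "string")}
        for name, p in params.items()
        if name != "self"  # the bound instance is not an LLM-visible parameter
    }
    return {"type": "object", "properties": props, "required": list(props)}

def get_weather(self, city: str) -> str: ...

print(infer_schema(get_weather))
# {'type': 'object', 'properties': {'city': {'type': 'string'}}, 'required': ['city']}
```

That schema is exactly the shape LLM tool-use APIs expect, which is why no registration boilerplate is needed.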


ACT-R memory (not just another key-value store)

Most AI assistants either forget everything between sessions or dump everything into a flat database. Rook's memory is inspired by the ACT-R cognitive architecture from psychology.

Every memory has an activation score based on:

  • Recency — when was it last accessed? (power law decay)
  • Frequency — how often is it accessed? (logarithmic boost)
  • Confidence — how reliable is this fact?

When the LLM needs context, only the most activated memories get injected into the system prompt. Frequently used memories stay sharp. Unused ones naturally fade. Just like your brain.
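In ACT-R terms, base-level activation is B = ln(Σⱼ tⱼ⁻ᵈ) over past accesses. Here's a rough sketch of how the factors combine; folding confidence in as an additive term is my simplification, not Rook's exact formula:

```python
import math

def activation(access_ages_hours: list[float], confidence: float,
               decay: float = 0.5) -> float:
    # Base-level activation: B = ln(sum over past accesses of t^-d).
    # Recent accesses (small t) and frequent accesses (more terms in the
    # sum) both raise the score; old, rarely-touched memories sink.
    base = math.log(sum(t ** -decay for t in access_ages_hours))
    # Treating confidence as an additive bonus is an assumption for this sketch.
    return base + confidence

fresh = activation([1.0, 2.0, 5.0], confidence=0.9)   # touched often, recently
stale = activation([200.0], confidence=0.9)           # touched once, long ago
print(fresh > stale)  # True
```

Ranking memories by this score and injecting only the top few is what keeps the system prompt small while still surfacing what matters.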


Local voice (zero API costs)

Rook processes voice messages locally:

  • STT: faster-whisper (base model, CPU int8) — transcribes voice messages from Telegram
  • TTS: Piper (Czech voice, 61MB ONNX model) — Rook speaks back

Both run on the VPS. No cloud API calls, no per-minute billing, completely private.


Setup takes 2 minutes

git clone https://github.com/barman1985/Rook.git
cd Rook && python -m venv venv && source venv/bin/activate
python -m rook.setup  # interactive wizard guides you through everything
python -m rook.main

The setup wizard asks for your API keys step by step, auto-detects available integrations, and generates .env. Docker is also supported.
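The output is a plain .env file, roughly along these lines (the variable names below are illustrative guesses; check the file the wizard actually generates for the real keys):

```shell
# Illustrative .env -- key names are guesses, not Rook's actual config
ANTHROPIC_API_KEY=sk-ant-...
TELEGRAM_BOT_TOKEN=123456:ABC...
GOOGLE_CLIENT_ID=...
SPOTIFY_CLIENT_ID=...
```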

What you need:

  • Any VPS or home server (runs fine on 1GB RAM)
  • Python 3.11+
  • Anthropic API key (~$5-10/month for personal use)
  • Telegram (as the interface)

Numbers

  • 3,100 lines of Python
  • 10 skills, 32 tools
  • 65 tests (62 pass, 3 skip for optional dependencies)
  • MIT license
  • Runs on a $5 VPS alongside other projects

What's next

  • More community skills (weather, Notion, Todoist, Home Assistant)
  • Bluesky integration for social posting
  • Ollama support for local LLM fallback
  • GitHub Actions CI pipeline
  • Skill marketplace

If you've ever wanted an AI assistant that actually does things instead of just chatting, give Rook a try. Star the repo if it looks useful, and I'd love feedback on the architecture.

GitHub: github.com/barman1985/Rook
Support: Buy me a coffee

♜ Rook — your strategic advantage.
