Full visual version: foodlbs.github.io/openclad
There's a gap between what AI chatbots can do and what they actually do for you day-to-day. ChatGPT can write a poem, but it can't turn off your living room lights when you're already in bed. Claude can analyze a document, but it won't proactively send you a news briefing at 9 AM.
I wanted something different: an AI agent that runs continuously, has access to my real tools and services, makes decisions autonomously for low-risk tasks, and asks permission before doing anything destructive. Something I could message from my phone and get things done — not just get answers.
So I built Jarvis.
## What Jarvis Can Do
| Time | What happens |
|---|---|
| 9:00 AM | Jarvis sends a news briefing to Telegram — top headlines, tech, markets. No prompt needed. |
| 10:30 AM | "What's the status of my job applications?" → queries SQLite tracker, returns summary |
| 2:00 PM | Send a PDF → Jarvis summarizes, stores in vector memory, saves markdown to Documents |
| 11:00 PM | "Turn off all the lights." → Approval button on Telegram → tap Approve → lights off |
The key insight: Jarvis doesn't just respond to questions. It executes tasks, remembers context, and operates on a schedule — all while respecting a risk classification system that keeps me in control of anything with real-world side effects.
## Architecture Overview
The system breaks down into four layers; the architecture diagram is in the visual version linked at the top.
## The Tech Stack
| Layer | Technology | Why |
|---|---|---|
| Language | Python 3.12 + uv workspace | Fast deps, monorepo support |
| Agent | Claude Agent SDK (subprocess) | Process isolation, crash recovery |
| Chat | aiogram 3.x | Async Telegram, inline keyboards |
| State | Redis | Task state, buffers, retry queue |
| Vector Memory | ChromaDB (local) | Free, no cloud dependency |
| Embeddings | OpenAI text-embedding-3-small | Cost-effective semantic search |
| MCP Framework | FastMCP | Simple Python MCP server creation |
| Config | pydantic-settings + YAML | Type-safe + env var overrides |
| Logging | structlog | Structured JSON logs |
| Daemon | macOS LaunchAgent | Auto-start, background execution |
## Deep Dive: How Each Piece Works
### 1. The Conversation Flow
When you send a message on Telegram, two things shape the request before the agent even runs:
Smart model selection — Short, simple messages (<100 chars, no complex keywords) auto-route to Haiku instead of Sonnet. Saves cost and cuts latency for quick tasks while preserving Sonnet's reasoning for demanding work.
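The routing heuristic can be sketched in a few lines (the threshold, keyword list, and model names here are illustrative stand-ins, not the exact production values):

```python
# Illustrative sketch of short-message routing: cheap model for quick
# requests, the stronger model for anything long or keyword-flagged.
COMPLEX_KEYWORDS = {"analyze", "research", "compare", "write", "plan"}

def select_model(message: str) -> str:
    """Route short, simple messages to Haiku; everything else to Sonnet."""
    text = message.lower()
    is_short = len(message) < 100
    is_simple = not any(kw in text for kw in COMPLEX_KEYWORDS)
    return "claude-haiku" if is_short and is_simple else "claude-sonnet"
```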
Conversation buffer — Last 10 turns stored in Redis with a 1-hour TTL. Follow-ups like "What about the bedroom lights?" work because Jarvis remembers the context.
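The real buffer lives in Redis (LPUSH plus LTRIM to cap the list, EXPIRE for the 1-hour TTL); this dependency-free sketch reproduces the same keep-the-last-10-turns behavior:

```python
from collections import deque

MAX_TURNS = 10  # same cap the Redis LTRIM enforces

class ConversationBuffer:
    """In-process stand-in for the Redis-backed conversation buffer."""

    def __init__(self) -> None:
        # deque(maxlen=...) silently drops the oldest turn, like LTRIM
        self.turns: deque = deque(maxlen=MAX_TURNS)

    def append(self, role: str, text: str) -> None:
        self.turns.append({"role": role, "text": text})

    def context(self) -> list:
        return list(self.turns)
```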
### 2. The Risk System
The most important piece. Two-tier classification:
✅ AUTONOMOUS — No approval needed
- File reads, web search, memory queries
- Code sandbox, browser navigation
- Job tracker reads, calendar reads
🔒 REQUIRE APPROVAL — Inline Telegram button
- File writes / deletes
- Email sends, calendar edits
- Smart home control, phone calls, purchases
Classification isn't just by tool name — it examines input parameters too. Reading ~/Documents is autonomous. Writing to /etc/ always requires approval. Checking a lock status is autonomous. Unlocking it requires approval regardless of context.
Configured via risk_policy.yaml — no code changes needed:
```yaml
risk_overrides:
  mcp__filesystem__write_file: autonomous         # Trust local file writes
  mcp__smart_home__call_service: require_approval # Always ask

context_escalation:
  dangerous_paths:
    - /system
    - /etc
    - /usr/bin
  sensitive_entities:
    - lock
    - alarm
    - security
```
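A minimal sketch of how the override table and context escalation might combine, assuming the tool names above; the real classifier inspects the full tool input, not just these two parameters:

```python
# Values mirrored from the risk_policy.yaml sample above.
RISK_OVERRIDES = {
    "mcp__filesystem__write_file": "autonomous",
    "mcp__smart_home__call_service": "require_approval",
}
DANGEROUS_PATHS = ("/system", "/etc", "/usr/bin")
SENSITIVE_ENTITIES = ("lock", "alarm", "security")

def classify(tool: str, params: dict) -> str:
    """Base risk from the override table, then escalate on risky inputs."""
    risk = RISK_OVERRIDES.get(tool, "require_approval")  # safe default
    path = str(params.get("path", ""))
    if any(path.startswith(p) for p in DANGEROUS_PATHS):
        return "require_approval"  # writing under /etc always escalates
    entity = str(params.get("entity_id", ""))
    if any(s in entity for s in SENSITIVE_ENTITIES):
        return "require_approval"  # locks/alarms escalate regardless
    return risk
```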
### 3. The Memory System
Two layers, each optimized for different retrieval patterns:
**Short-term:** Markdown files injected into the system prompt
| File | Purpose |
|---|---|
| `preferences.md` | Learned preferences ("prefers TypeScript over JavaScript") |
| `projects.md` | Active project status |
| `chat_history.md` | Recent session summaries — auto-trims at 30 sessions |
| `journal.md` | Agent reflections and observations |
The ContextLoader stitches these into every agent invocation. When chat_history.md exceeds 30 sessions, older entries compress into bullet-point summaries to keep the prompt under control.
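The 30-session trim can be sketched like this (session parsing is simplified to one string per session, and `trim_history` is a hypothetical helper name):

```python
MAX_SESSIONS = 30  # same cap chat_history.md is trimmed to

def trim_history(sessions: list) -> tuple:
    """Keep the newest 30 sessions; compress older ones to one bullet each."""
    if len(sessions) <= MAX_SESSIONS:
        return sessions, []
    old, recent = sessions[:-MAX_SESSIONS], sessions[-MAX_SESSIONS:]
    # One truncated first line per compressed session
    bullets = [f"- {s.splitlines()[0][:80]}" for s in old]
    return recent, bullets
```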
**Long-term:** ChromaDB vector store with semantic search
After completing significant tasks, Jarvis stores a 2-3 sentence summary with metadata. Before tackling complex work, it searches: "Have I solved something like this before?"
The combination gives Jarvis both working memory (always loaded) and recall (searchable when needed).
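The real store is ChromaDB with OpenAI embeddings; this dependency-free sketch shows the same store-then-recall pattern, with a toy bag-of-words similarity standing in for learned embeddings:

```python
from collections import Counter
import math

class TaskMemory:
    """Toy stand-in for the ChromaDB-backed long-term memory."""

    def __init__(self) -> None:
        self.entries: list = []  # (summary, metadata) pairs

    def store(self, summary: str, metadata: dict) -> None:
        self.entries.append((summary, metadata))

    def recall(self, query: str, k: int = 3) -> list:
        """Return the k summaries most similar to the query."""
        q = Counter(query.lower().split())

        def score(text: str) -> float:
            d = Counter(text.lower().split())
            overlap = sum((q & d).values())  # shared word count
            return overlap / math.sqrt((len(q) * len(d)) or 1)

        ranked = sorted(self.entries, key=lambda e: score(e[0]), reverse=True)
        return [summary for summary, _ in ranked[:k]]
```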
### 4. The Skill Framework
Skills are Markdown files defining triggers, steps, and required tools:
```markdown
## Daily News Briefing

**Trigger:** "news update", "daily news", "morning briefing"
**Schedule:** Every day at 9:00 AM EST

### Steps

1. Search for current top headlines
2. Search for tech industry news
3. Search for business/markets news
4. Format into clean briefing with sections
5. Output ONLY the briefing — no meta-commentary
```
The skill manager routes incoming messages by matching keywords and context. If a request matches multiple skills, it chains them.
The interesting part: when Jarvis notices a repeatable pattern (3+ similar tool call sequences), it suggests creating a new skill. The system grows organically based on actual usage.
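Trigger matching can be sketched as a keyword lookup (the skill names and trigger lists here are illustrative; the real manager parses them out of the skill Markdown files):

```python
# Illustrative skill registry: name -> trigger phrases.
SKILLS = {
    "daily-news-briefing": ["news update", "daily news", "morning briefing"],
    "job-tracker-status": ["job applications", "application status"],
}

def match_skills(message: str) -> list:
    """Return every skill whose trigger phrase appears in the message."""
    text = message.lower()
    return [name for name, triggers in SKILLS.items()
            if any(t in text for t in triggers)]
```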
### 5. The Scheduler
The scheduler parses a human-readable `schedules.md` into cron-like entries:
```markdown
## Daily News Briefing

**Schedule:** Every day at 9:00 AM EST
**Action:** Execute daily-news-briefing skill
**Output:** Send formatted news briefing via Telegram
**Status:** Active
```
Each job runs as an async task with max turns capped at 15 to enforce conciseness and suppress the model's meta-commentary tendency.
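Parsing one entry of the format above might look like this (`parse_entry` is a hypothetical helper; the actual cron conversion is omitted):

```python
import re

# Matches bold-keyed fields like "**Schedule:** Every day at 9:00 AM EST"
ENTRY_RE = re.compile(r"\*\*(\w+):\*\*\s*(.+)")

def parse_entry(block: str) -> dict:
    """Extract the heading and **Key:** fields from one schedules.md entry."""
    fields = {"name": block.strip().splitlines()[0].lstrip("# ").strip()}
    for key, value in ENTRY_RE.findall(block):
        fields[key.lower()] = value.strip()
    return fields
```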
### 6. Resilience & Fallback
Running 24/7 means things will break. Three mechanisms handle it:
Retry queue — Failed tasks enter a Redis sorted set with exponential backoff (30s → 60s → 120s). After 3 retries, the task is abandoned and a failure notification is sent. On retry, the model downgrades to Haiku to reduce cost.
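The backoff schedule itself is simple to sketch (the real queue stores tasks in a Redis sorted set scored by this next-attempt timestamp):

```python
BASE_DELAY = 30   # seconds before the first retry
MAX_RETRIES = 3   # then the task is abandoned

def next_retry_at(attempt: int, now: float):
    """Timestamp of the next attempt (doubling delay), or None if abandoned."""
    if attempt >= MAX_RETRIES:
        return None
    return now + BASE_DELAY * (2 ** attempt)  # 30s, 60s, 120s
```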
API fallback — If the Claude API is rate-limited, Jarvis detects error keywords ("rate_limit", "529", "overloaded") and falls back to the Claude Code CLI with a cached OAuth token.
Circuit breaker — Standard closed → open → half-open pattern. After 5 consecutive failures, circuit opens and rejects calls for 60 seconds before allowing a probe request.
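A minimal version of that breaker, using the numbers above (5 failures to open, a 60-second cooldown before a probe is allowed):

```python
class CircuitBreaker:
    """Closed -> open -> half-open breaker with a fixed cooldown."""

    def __init__(self, threshold: int = 5, cooldown: float = 60.0) -> None:
        self.threshold, self.cooldown = threshold, cooldown
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit opened

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True                              # closed: allow
        if now - self.opened_at >= self.cooldown:
            return True                              # half-open: allow probe
        return False                                 # open: reject

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = now                     # trip the circuit

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None                        # close again
```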
## Project Structure
```
personal-ai-agent/
├── main.py               # Entry point & orchestrator
├── pyproject.toml        # uv workspace root
├── compose.yaml          # Docker Compose (Redis)
├── packages/
│   ├── core/             # Agent, config, state, risk, retry, scheduler
│   ├── interfaces/       # Telegram bot, handlers, approval flow
│   └── mcp_servers/      # Job tracker, smart home, memory
├── data/
│   ├── agent_context/    # personality.md, skills/, schedules.md
│   ├── memory/chroma/    # Vector store persistence
│   └── secrets/          # OAuth credentials (gitignored)
├── configs/
│   ├── agent.yaml        # Runtime config
│   └── risk_policy.yaml  # Risk classification overrides
└── tests/                # 20+ unit tests
```
The monorepo keeps things modular — core has no Telegram dependency, interfaces has no MCP dependency, and mcp_servers are standalone FastMCP processes. Want Discord instead of Telegram? Replace interfaces without touching core.
## Key Design Decisions
Why local ChromaDB over Pinecone? No cloud dependency. The vector store lives at data/memory/chroma/ — just files on disk. Zero cost, zero round-trip latency, git-backupable.
Why the subprocess model? Each agent invocation is isolated. If it crashes, nothing leaks. SDK upgrade? Restart the process. Simplest possible isolation boundary.
Why Telegram over a custom UI? Already on my phone, laptop, and watch. Inline keyboards, file attachments, voice messages, rich formatting — all built-in. A custom UI would have taken weeks for a worse experience.
Why Markdown for schedules and skills? Editable with any text editor, version-controlled with git, readable by the agent itself. When Jarvis creates a new skill, it writes a .md file.
Why Redis for everything stateful? One dependency, in-memory speed, TTL for auto-cleanup, pub/sub for future real-time features.
## Running It Yourself
```bash
# Clone the starter
git clone https://github.com/yourusername/jarvis-starter.git
cd jarvis-starter

# Configure
cp .env.example .env   # Add your API keys

# Install
uv sync --all-packages

# Start Redis
docker compose up -d redis

# Run
uv run python main.py
```
You'll need:
- Anthropic API key (Claude)
- Telegram bot token (from @BotFather)
- Your Telegram chat ID
- Optional: OpenAI key (embeddings), Google OAuth (Calendar/Gmail), Home Assistant URL + token
The starter ships with the full structure, a working Telegram bot, the risk system, two MCP servers (filesystem + memory), and one example skill (daily news briefing). Everything else is added incrementally.
## What I'd Do Differently
- Voice pipeline — Telegram voice transcription works but is clunky. A dedicated streaming voice pipeline would transform the experience.
- Parameterized skills — Skills as templates: `Research {topic} with depth {shallow|deep}` instead of flat instruction sets.
- Multi-agent orchestration — Some tasks need parallel sub-agents (researcher + writer). The single-agent model hits turn limits on complex workflows.
- Observability dashboard — Task history, tool usage, cost tracking, memory growth. The event stream is there but underutilized.
## Wrapping Up
Jarvis has been running 24/7 on my Mac for about two weeks — delivering morning news, tracking job applications, helping with research, and controlling my apartment — all through Telegram.
The total codebase is ~2,000 lines of Python across three packages, plus Markdown files for personality, skills, and schedules. The agent framework (Claude SDK + MCP servers) does the heavy lifting; the surrounding infrastructure — risk classification, retry logic, conversation persistence, skill routing — is what transforms a chatbot into an actual assistant.
Check out the starter repo and if you build something cool with it, I'd love to hear about it.
Built with Claude Agent SDK · aiogram · Redis · ChromaDB · FastMCP · Running on a Mac Mini as a LaunchAgent
