DEV Community

AK DevCraft

$0 Personal Agentic AI Assistant - Architecture - Part 1

OpenClaw Challenge Submission 🦞

Introduction

We've all bought one: a productivity tool that promised to change everything, charged monthly, and quietly became background noise. AI assistants are going the same way: another tab, another login, another $20/month for something you open twice a week. Most of us probably regret at least one subscription we are paying for today.

Welp! What if you didn't have to pay at all?

Today, the infrastructure exists to run a capable, always-on personal AI assistant, one that lives in the apps you already use every day, like Telegram or WhatsApp, remembers you, browses the web, and handles real tasks, for exactly zero dollars a month. Not a trial. Not a teaser. Permanently free, on infrastructure you control.

This article explains the architecture that makes it possible and why each piece matters.

The Subscription Trap

Most people's AI setup looks like this: Claude.ai, ChatGPT, or another AI provider in a browser tab or mobile app, opened when needed, closed when done. Conversations are saved, and you can pick up where you left off if you return to the same thread. But that's passive history, not active memory: you have to go and find it. And the whole time, the assistant can't reach out, take action, or do anything unless you open it first.

That's not an assistant. That's a very smart search box.

A real assistant is always on. It knows who you are. It operates in the apps you already use. It can take actions, not just generate text, and it doesn't charge you for existing.

Until recently, building that required either paying cloud AI bills or owning serious hardware, and both are out of reach for most people. That has changed.

Three Shifts That Can Make This Possible

1. Open-weight models are now genuinely capable

Meta's Llama, Google's Gemma, and others have closed the gap with proprietary models significantly over the past few years. A 3-8 billion parameter model running locally can handle the majority of the everyday tasks people actually use AI assistants for: summarising, drafting, answering questions, and light reasoning.

2. Cloud providers offer permanently free compute

Oracle Cloud's Always Free tier gives you up to 4 ARM CPU cores and 24GB of RAM — permanently, with no expiry date. Not a 12-month trial like AWS. Not credits that run out. A real server running 24/7 at zero cost, forever, as long as you keep the account active.

That's enough to run Ollama with a capable local model.
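To see why 24 GB is enough, here is a rough back-of-the-envelope sketch in Python. It assumes memory is dominated by the model weights (parameters times bits per weight) plus a flat allowance for the runtime and context cache. The 1.5 GB overhead and the quantisation levels are illustrative assumptions, not measured figures.

```python
SERVER_RAM_GB = 24  # Oracle Always Free ARM allowance quoted above


def estimate_model_ram_gb(params_billion: float, bits_per_weight: float,
                          overhead_gb: float = 1.5) -> float:
    """Rough RAM estimate: weight storage plus a flat overhead.

    overhead_gb is an assumed allowance for the runtime and context
    cache; real usage varies with context length and the model runner.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 1e9 + overhead_gb


if __name__ == "__main__":
    for label, params in [("3B", 3), ("8B", 8)]:
        for bits in (4, 8, 16):
            need = estimate_model_ram_gb(params, bits)
            verdict = "fits" if need < SERVER_RAM_GB else "does not fit"
            print(f"{label} model at {bits}-bit: ~{need:.1f} GB ({verdict})")
```

Under these assumptions, even an 8B model at 4-bit quantisation needs only around 5.5 GB, which leaves plenty of room for the agent and the operating system.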

3. Free API tiers have become genuinely useful

Google's Gemini 2.5 Flash-Lite allows roughly 250K tokens per minute (TPM) on the free tier, with no credit card required. For a personal assistant handling one person's queries, that's more than enough headroom. When a local model is too slow or too limited for a task, Gemini catches it, for free.
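A quick sanity check on that headroom, as a small Python sketch. The two limits are the figures this article quotes for the free tier (250K tokens per minute and 1,000 requests per day). The usage numbers (2,000 tokens per request, 100 requests a day, a peak of 3 requests in one minute) are purely hypothetical, so plug in your own.

```python
FREE_TOKENS_PER_MINUTE = 250_000  # free-tier figure quoted in this article
FREE_REQUESTS_PER_DAY = 1_000     # free-tier figure quoted in this article


def free_tier_usage(tokens_per_request: int, requests_per_day: int,
                    peak_requests_per_minute: int) -> dict:
    """Return the fraction of each free-tier limit a usage pattern consumes."""
    peak_tokens = tokens_per_request * peak_requests_per_minute
    return {
        "tpm_fraction": peak_tokens / FREE_TOKENS_PER_MINUTE,
        "rpd_fraction": requests_per_day / FREE_REQUESTS_PER_DAY,
    }


if __name__ == "__main__":
    # Hypothetical single-user workload; adjust to your own habits.
    usage = free_tier_usage(tokens_per_request=2_000,
                            requests_per_day=100,
                            peak_requests_per_minute=3)
    print(f"Peak minute uses {usage['tpm_fraction']:.1%} of the token limit")
    print(f"A full day uses {usage['rpd_fraction']:.1%} of the request limit")
```

With these made-up numbers, a single user stays well under both limits, and that is before counting the requests the local model absorbs.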

Put these three things together, and the economics change completely.

The Architecture - Tech Stack

  • Oracle Cloud ARM Instance: your always-on server. 4 CPU cores, 24 GB RAM, permanently free. Hosts everything. Never sleeps, never charges.

  • Ollama: runs open-weight language models locally on your server. No external API calls, no cost, no data leaving your machine. The primary brain for most tasks.

  • Gemini API (free tier): Google's cloud model, used as a fallback when the local model is too slow or the task is too complex. 1,000 free requests per day, no credit card.

  • OpenClaw: the agent layer that ties everything together. Connects to Telegram, maintains memory across conversations, runs scheduled tasks, and routes requests between local and cloud models intelligently.

  • Tavily: a web search API built for AI agents.

  • MCP: Model Context Protocol servers, the bridge that lets the agent fetch data from backend services and tools.
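To make the "local first, cloud as fallback" idea concrete, here is a minimal Python sketch. It is not how OpenClaw routes requests internally; it only illustrates the pattern. The prompt-length threshold, the `llama3.2:3b` model tag, and the `GEMINI_API_KEY` environment variable are assumptions chosen for illustration, and the two HTTP helpers follow the public Ollama and Gemini REST endpoints as I understand them.

```python
import json
import os
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
GEMINI_URL = ("https://generativelanguage.googleapis.com/v1beta/"
              "models/gemini-2.5-flash-lite:generateContent")


def _post_json(url, payload, headers, timeout):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def ask_ollama(prompt, model="llama3.2:3b", timeout=60.0):
    """Ask the local Ollama server (the model tag is an assumption)."""
    body = _post_json(
        OLLAMA_URL,
        {"model": model, "prompt": prompt, "stream": False},
        {},
        timeout,
    )
    return body["response"]


def ask_gemini(prompt, timeout=30.0):
    """Ask Gemini; assumes the key is stored in GEMINI_API_KEY."""
    body = _post_json(
        GEMINI_URL,
        {"contents": [{"parts": [{"text": prompt}]}]},
        {"x-goog-api-key": os.environ["GEMINI_API_KEY"]},
        timeout,
    )
    return body["candidates"][0]["content"]["parts"][0]["text"]


def route(prompt, local=ask_ollama, cloud=ask_gemini, max_local_chars=2000):
    """Try the local model for short prompts, else fall back to the cloud.

    Returns a (backend_name, answer) tuple. The length threshold is a
    hypothetical heuristic standing in for real task-complexity checks.
    """
    if len(prompt) <= max_local_chars:
        try:
            return "local", local(prompt)
        except Exception:
            # Local model timed out or is unavailable: use the fallback.
            pass
    return "cloud", cloud(prompt)
```

Calling `route("Summarise my notes")` would try Ollama first and only spend a Gemini request if the local call fails. Passing `local` and `cloud` as parameters keeps the routing logic testable without either service running.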

What It Can Actually Do

This isn't just a toy setup. On this stack, you get:

  • Telegram access — message your agent from your phone, anywhere, like texting a person

  • Persistent memory — it remembers your preferences, ongoing projects, and past conversations

  • Web search — real-time search via Tavily's free tier integrated directly into responses

  • File operations — read, write, and summarise documents on the server

  • GitHub integration — search issues, review code, summarise pull requests

  • Scheduled tasks — set reminders, recurring summaries, automated workflows

  • Custom agents — define specialised subagents for specific tasks (code review, research, writing)

What it can't do as well as a paid service: complex multi-step reasoning at speed, very long document analysis, and tasks that push the limits of a 3B parameter model. For those, the Gemini fallback steps in.

The Honest Tradeoffs

Zero cost doesn't mean zero compromise. Know what you're getting into:

  • Speed: local CPU inference is slower than cloud APIs. A response that takes a few seconds on Claude.ai might take more than 30 seconds locally. With Gemini as a fallback, complex tasks are fast. Simple tasks on the local model are slow but free.

  • Quality ceiling: a 3B local model is noticeably less capable than Claude Sonnet or GPT-4. For writing, summarisation, and Q&A, it's fine. For nuanced reasoning or complex code, it shows limitations.

  • Setup effort: this is not a five-minute install. There are Virtual Cloud Network (VCN) configurations, systemd services, API keys, and model downloads involved. It takes an afternoon to set up correctly. Once running, it requires minimal maintenance.

  • Oracle ARM capacity: Oracle's free ARM instances are in high demand. You may need to retry provisioning multiple times or upgrade to Pay As You Go (which still costs $0 for Always Free resources) to get reliable access.

Who This Is For

It makes sense if:

  • You're comfortable with a terminal and basic Linux
  • You want AI infrastructure you actually control
  • You're experimenting and don't want ongoing costs
  • You're comfortable with slower responses in exchange for zero cost

It doesn't make sense if:

  • You need production-grade reliability
  • Response speed is critical
  • You want a turnkey experience with no configuration
  • You'd rather pay $10-20/month for something that just works

For the right person, this is the most interesting AI setup you can build right now. Not because it beats the paid alternatives on any individual metric, but because it's yours: running on your server, with your data, on your terms, for nothing. Just as importantly, the agent lives on a separate server, so the private data on your laptop stays out of its reach and is far less likely to be exposed by accident.

What's Next

This article is the first in a five-part series:

  1. The Architecture ← you are here
  2. Setting Up a Free Cloud Server: VCN, ARM instances, static IPs, the gotchas
  3. Running Ollama on ARM: model selection, disk management, CPU inference reality
  4. Installing OpenClaw on Linux: avoiding every trap
  5. The Complete Setup: Telegram, Gemini fallback, end-to-end testing

Stay tuned. All links will be updated as articles are published.

If you have read this far, I have done a decent job of keeping you reading. Please leave a comment or share any corrections.

Top comments (2)

AK DevCraft

Well! quite late to the OpenClaw challenge but sometimes it’s better to put yourself out there than never try at all.

Harjot Singh

A $0 agentic assistant is a great forcing function - the budget constraint makes you architect well instead of papering over inefficiency with a bigger model. Free tiers + local models + ruthless context discipline is exactly the stack that teaches you where the spend actually goes, because you feel every wasted call.

The architecture decisions that keep it at $0 are the same ones that keep a paid system cheap at scale: route trivial work to local/free, cache aggressively, and only escalate to a paid API for the rare hard call. The discipline you're forced into at $0 is what most people should be doing at $200/mo anyway. Same routing thesis behind Moonshift (prompt to a shipped SaaS on your own GitHub+Vercel). Looking forward to the rest of the series; what's handling the local inference - Ollama, or a free API tier? (Moonshift's first run's free if useful.)