I built a fully local AI assistant at 16 — no cloud, no API keys, runs on your GPU

#ai #llm #privacy #showdev

I'm 16, from Pune, India. For the past couple of years I've been building O-AI — a fully local AI desktop assistant. No cloud. No API keys. No data leaving your machine. Everything runs on your own GPU.

Why I built it

Every AI assistant I tried sent data somewhere. ChatGPT, Copilot, Gemini — all cloud. I wanted something that felt like JARVIS from Iron Man: smart, fast, personal, and private. So I built it from scratch.

What O-AI can do

Core engine:

Runs LLMs fully on-device via llama.cpp / Ollama (no internet needed once the model is downloaded)
Self-learning core — extracts facts from every conversation and stores them permanently
Fine-tuning pipeline — train the model on your own data, locally
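
To make the self-learning idea concrete, here is a minimal sketch of a fact store. It is an illustration, not O-AI's actual code: the `FactStore` class, the table layout, and the naive "my X is Y" regex are all assumptions, and a real extractor would more likely ask the LLM itself to pull out facts.

```python
import re
import sqlite3

# Hypothetical pattern: matches simple statements like "my name is Asha".
FACT_PATTERN = re.compile(r"\bmy (\w[\w ]*?) is ([^.,!?]+)", re.IGNORECASE)


class FactStore:
    """Keeps user facts in SQLite so they can survive restarts."""

    def __init__(self, path=":memory:"):
        # Pass a file path for permanent storage; ":memory:" is only for demos.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS facts (key TEXT PRIMARY KEY, value TEXT)"
        )

    def learn(self, message):
        """Extract facts from one message, store them, and return what was found."""
        found = [
            (key.strip().lower(), value.strip())
            for key, value in FACT_PATTERN.findall(message)
        ]
        with self.db:  # commits on success, rolls back on error
            self.db.executemany(
                "INSERT OR REPLACE INTO facts (key, value) VALUES (?, ?)", found
            )
        return found

    def recall(self, key):
        """Return the stored value for a key, or None if nothing is known."""
        row = self.db.execute(
            "SELECT value FROM facts WHERE key = ?", (key.lower(),)
        ).fetchone()
        return row[0] if row else None
```

Because the key is the primary key, a newer statement about the same thing overwrites the older one, so the store always holds the latest version of each fact.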

Voice & language:

Voice control in English, Hindi, and Marathi via Whisper (running locally)
Responds in whichever of those languages you speak

Modes:

JARVIS mode — arc-reactor HUD, 4 reactive states, British-male voice, "sir" persona
Take Over PC mode — full desktop automation
Animated floating desktop pet (4 types, draggable, reacts to voice)

30+ automation fast-paths: open apps, search the web, control media, screen vision, run code, edit files, cursor control, social media steps, clipboard ops...
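
One plausible way to build fast-paths like these is a table of patterns that is checked before the LLM is ever consulted. The sketch below assumes that design. The patterns and action names are hypothetical, and the function only describes the action instead of performing it.

```python
import re

# Hypothetical fast-path table: (compiled pattern, action name).
FAST_PATHS = [
    (re.compile(r"^open (?P<app>[\w .-]+)$", re.I), "open_app"),
    (re.compile(r"^(?:search|google) (?:for )?(?P<query>.+)$", re.I), "web_search"),
    (
        re.compile(
            r"^(?P<cmd>pause|play|next|previous)"
            r"(?: (?:the )?(?:music|song|track|video))?$",
            re.I,
        ),
        "media_control",
    ),
]


def match_fast_path(command):
    """Return (action, args) for a known command, or None to fall back to the LLM."""
    text = command.strip()
    for pattern, action in FAST_PATHS:
        match = pattern.match(text)
        if match:
            return action, match.groupdict()
    return None
```

Anything that returns None would go to the slower multi-step agent, so common commands stay quick without limiting what the assistant can handle.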

Multi-step agent system: plan → execute → verify loop with 14+ step types (web_search, fetch_url, read_screen, run_code, edit_file, open_social, and more)
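
The execute and verify halves of that loop can be sketched as below. This is a simplified illustration under my own assumptions: `Step`, `StepResult`, `run_plan`, and the retry count are hypothetical names, and the planning half (the LLM producing the list of steps) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    kind: str                                  # e.g. "web_search", "run_code"
    args: dict = field(default_factory=dict)


@dataclass
class StepResult:
    ok: bool
    output: str = ""


def run_plan(steps, executors, verifiers, max_retries=1):
    """Run steps in order; a step counts as done only if its verifier agrees."""
    log = []
    for step in steps:
        for attempt in range(max_retries + 1):
            result = executors[step.kind](step.args)
            verified = result.ok and verifiers[step.kind](step.args, result)
            log.append((step.kind, attempt, verified))
            if verified:
                break
        else:
            # Every attempt failed verification: stop and report the failure.
            return False, log
    return True, log
```

Each step type gets its own executor and its own verifier, so adding a new step type means adding one entry to each dictionary.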

Stack

Backend:  Python (Flask IPC + agent core)
Frontend: Electron + vanilla JS
LLM:      llama.cpp / Ollama
Voice:    Whisper (local) + Edge TTS / neural voice
Vision:   PIL + screen capture

The hardest bugs

"Says done but isn't" — Early versions reported success even when an agent step failed. Fixed by building a proper outcome verifier that reads the actual result, not the plan.

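As an illustration of reading the actual result, here is what a verifier for an edit_file step might look like. The function name and return shape are assumptions for this sketch, not O-AI's real verifier.

```python
from pathlib import Path


def verify_edit_file(path, expected_snippet):
    """Confirm an edit by re-reading the file instead of trusting the step's report."""
    target = Path(path)
    if not target.is_file():
        return False, f"{path} does not exist"
    if expected_snippet not in target.read_text(encoding="utf-8"):
        return False, f"{path} does not contain the expected text"
    return True, "verified"
```

The key point is that the check looks at the file on disk, so a step that "succeeded" without changing anything is still reported as a failure.
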
The "opens a random video" bug — Asking the agent to play something would open random YouTube videos. Root cause: the plan validator wasn't catching placeholder URLs like [video_url]. Fixed with a universal content guard on all plans.
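
A guard of this kind can be sketched with one regular expression. The plan shape (a list of dicts with an "args" mapping) and the set of placeholder styles are assumptions made for this example.

```python
import re

# Unfilled template slots such as [video_url], <url>, {link} or {{query}}.
PLACEHOLDER = re.compile(
    r"\[[a-z_ ]+\]|<[a-z_ ]+>|\{\{?[a-z_ ]+\}?\}", re.IGNORECASE
)


def find_placeholders(plan):
    """Return (step_index, arg_name, value) for every string arg with a placeholder."""
    problems = []
    for index, step in enumerate(plan):
        for name, value in step.get("args", {}).items():
            if isinstance(value, str) and PLACEHOLDER.search(value):
                problems.append((index, name, value))
    return problems
```

A plan that comes back with a non-empty list would be rejected or sent back for re-planning before any step runs. The pattern is deliberately strict about letters and underscores only, but legitimate text in square brackets can still trigger it, so a real guard would need tuning.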

GPU offloading on Windows — Getting all 32 layers onto the GPU with the right CUDA flags took way too long. Worth it though.
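
For reference, this is roughly how layer offloading is requested when launching llama.cpp's server from Python. It is a sketch, not my exact setup: the model path and context size are placeholders, and the flag only has an effect if the binary was built with CUDA support.

```python
def llama_server_command(model_path, gpu_layers=32, ctx_size=4096, binary="llama-server"):
    """Build an argv list for llama.cpp's server with layers offloaded to the GPU."""
    return [
        binary,
        "--model", str(model_path),
        "--n-gpu-layers", str(gpu_layers),  # number of layers to place on the GPU
        "--ctx-size", str(ctx_size),        # assumed value; pick what fits in VRAM
    ]
```

The returned list can be passed straight to subprocess.Popen. If the model does not fit in VRAM, lowering gpu_layers keeps the remaining layers on the CPU.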

What I learned

Building something real teaches you more than any tutorial. Every bug is a design decision you haven't made yet. If you're not embarrassed by v1, you shipped too late.