Natnael Getenew

Posted on Mar 12

I Built a Personal AI Computer With Gemini - Here's How

#gemini #googlecloud #ai #hackathon

This article was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge

The Problem Nobody Has Solved

306,000 people starred Open Claw on GitHub. They all want the same thing: a personal AI that actually does things. Sends emails. Manages calendars. Runs code. Browses the web. Learns new skills.

But every solution looks the same: clone the repo, install Docker, configure API keys, run terminal commands, manage a cloud bill. The technology is amazing. The accessibility is terrible.

8 billion people want a personal AI computer. 99% of them will never run a Docker container.

So I built Elora.

What Elora Is

Elora is not a chatbot. She's a personal AI computer that lives on your phone.

She has her own sandbox (a persistent cloud VM where she installs packages, runs code, and saves files - isolated per user). She has her own skill system (she can search for skills, install them, or write new ones from scratch). And she has a security layer that protects everything she does.

You download the app and talk to her. That's it. No setup. No API keys. No Docker.

Elora Live Voice Architecture

Elora Wake Word

The Tech Stack

Layer	Technology
Mobile	Expo / React Native (TypeScript)
Voice	Gemini Live API (real-time bidirectional audio)
Agent	Google ADK (multi-agent orchestration)
LLM	Gemini 2.0 Flash / 2.5 Flash
Browser	Playwright + Gemini 2.5 Flash (computer use)
Code Sandbox	E2B (per-user persistent VMs)
Skills	Custom skill engine (search, install, create, execute)
Security	Agntor trust protocol
Memory	MemU + Firestore + text-embedding-004
Backend	FastAPI on Google Cloud Run
IaC	Terraform + GitHub Actions CI/CD

Let me walk through how I built the pieces that matter.

1. Voice That Feels Alive - Gemini Live API

The Gemini Live API is what makes Elora feel real. It's full duplex audio - she talks while you talk, you can interrupt her mid-sentence, and she handles it naturally.

Here's the architecture for voice:

Phone (mic) → PCM audio chunks via WebSocket → Cloud Run
  → Gemini Live API session (bidirectional)
  → Audio response chunks → WebSocket → Phone (speaker)

The mobile app maintains three simultaneous WebSocket connections:

Text chat - ADK agent with full tool calling
Live audio - Gemini Live API with real-time audio streaming
Wake word - Always-on "Hey Elora" detection

The wake word detector is its own Gemini Live session configured to only respond with "WAKE" when it hears the trigger phrase. Minimal tokens, always listening.

The hardest part: Gemini Live API doesn't support ADK's tool-calling protocol natively. So I built a parallel system - manual JSON schemas for every tool declaration, a dispatch function that maps tool names to the same Python functions the ADK agent uses, and a response handler that streams tool results back into the Live session. Every tool works in both text mode (ADK) and voice mode (Live API).

2. Vision - She Sees Your World

During a live call, Elora watches through your camera. The mobile app captures frames and sends them as base64 JPEG over the WebSocket. On the backend, a proactive vision loop runs every 3 seconds:

# Simplified proactive vision logic
if camera_active and user_quiet_for_8s and last_proactive_25s_ago:
    frame = latest_camera_frame
    faces = await recognize_faces(frame, user_id)
    prompt = f"[VISION CHECK] You see: {faces}. Comment if relevant."
    session.send(prompt + frame)
    # If Elora responds with <silent>, swallow the response

She doesn't just respond when asked - she speaks up when she sees something worth mentioning. Point the camera at a friend she's seen before, and she'll say their name. That's face recognition using Gemini Vision with two-pass comparison against stored reference images in Cloud Storage.

3. The Skill System - Why Elora Is a Computer, Not a Chatbot

This is the feature I'm most proud of. Every AI assistant has a fixed set of tools. Elora can learn new ones.

The skill system works in four modes:

Search: Query the skill registry (bundled + community) by keyword.

Install: Download a skill definition (YAML metadata + Python code) into the user's Firestore profile and deploy it to their sandbox.

Execute: Load the skill code, fill in template parameters, and run it in the user's E2B sandbox. Real code, real output.

Create: This is the magic. Tell Elora "create a skill that checks if a website is up" and she:

Writes the Python code
Creates a YAML skill definition with parameters
Tests the code in your sandbox with a dry run
Validates the output
Saves it permanently to your library

The skill you asked for now exists forever. You can run it tomorrow, next week, next year.

# Bundled skills ship with Elora
BUNDLED_SKILLS = {
    "weather": {
        "name": "weather",
        "description": "Get current weather for any city",
        "code": "import requests\nurl = f'https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lon}&current_weather=true'\n..."
    },
    "crypto_prices": { ... },
    "hackernews": { ... },
    "exchange_rates": { ... },
    "wikipedia": { ... },
    "rss_reader": { ... },
}

Six skills ship bundled. Users can create unlimited custom ones. And there's a community registry where you can publish skills for others to use.

This is what transforms Elora from "assistant" to "computer." A computer isn't defined by what it ships with - it's defined by what you can make it do.

4. Per-User Sandbox - Your Computer in the Cloud

Every Elora user gets their own isolated cloud VM via E2B. This isn't shared compute - it's YOUR machine.

def get_or_create_sandbox(user_id: str):
    # Check in-memory cache
    if user_id in _active_sandboxes:
        return _active_sandboxes[user_id]

    # Check Firestore for paused sandbox ID
    doc = _get_sandbox_doc(user_id).get()
    if doc.exists:
        sandbox_id = doc.to_dict().get("sandbox_id")
        # Reconnect to existing sandbox
        sandbox = Sandbox.connect(sandbox_id)
    else:
        # Create new sandbox with pre-installed packages
        sandbox = Sandbox(timeout=3600, metadata={"user_id": user_id})
        sandbox.commands.run("pip install requests beautifulsoup4 feedparser pyyaml")
        sandbox.commands.run("mkdir -p /home/user/skills /home/user/workspace /home/user/data")
        # Persist sandbox ID
        _get_sandbox_doc(user_id).set({"sandbox_id": sandbox.sandbox_id})

    _active_sandboxes[user_id] = sandbox
    return sandbox

Sandboxes auto-pause when idle and reconnect when needed. Packages you install persist. Files you create persist. The sandbox ID is stored in Firestore so it survives server restarts.

When Elora runs code for you - whether it's a skill, a script you asked for, or a data analysis
it runs in YOUR sandbox. Nobody else can see it or touch it.

5. Security - The Agntor Trust Protocol

When your AI agent has access to your email, calendar, files, and code execution, security isn't optional.

The Agntor trust protocol runs as middleware on every incoming message:

Prompt injection guard - 12 regex patterns + 3 heuristic checks + structural analysis. Catches "ignore previous instructions" and its 50 variants.
PII/secret redaction - Detects and masks API keys, tokens, credit card numbers, and SSNs before they reach the model.
Tool guardrails - Blocklist (shell.exec, eval) and confirmation list (send_email, delete_file). Dangerous tools are blocked. Sensitive tools require explicit confirmation.
SSRF protection - Validates all URLs against private IP ranges with DNS resolution. Prevents the model from being tricked into accessing internal services.
Agent identity - A verifiable identity endpoint that exposes Elora's capabilities and security posture.

$ curl https://elora-backend-453139277365.us-central1.run.app/agent/identity
{
  "agent_name": "Elora",
  "version": "0.5.0",
  "security": {
    "prompt_guard": true,
    "pii_redaction": true,
    "tool_guardrails": true,
    "ssrf_protection": true
  }
}

The entire security layer is pure Python - no external dependencies. It's fast enough to run on every message without noticeable latency.

6. Multi-Agent Architecture - Google ADK

Elora uses Google's Agent Development Kit (ADK) with a hierarchical multi-agent architecture:

elora_root (orchestrator)
├── web_researcher    → web_search + fetch_webpage
├── browser_worker    → Playwright + Gemini computer-use
├── email_calendar    → Gmail + Google Calendar (full CRUD)
├── file_memory       → Cloud Storage + Firestore memory
└── research_loop     → LoopAgent with self-verification

The root agent decides which sub-agent to delegate to based on the user's intent. "Send an email" goes to email_calendar. "What's on Hacker News" goes to browser_worker. "Remember that I prefer morning meetings" goes to file_memory.

ADK's constraint of one parent per agent forced clean separation of concerns. Each sub-agent has exactly the tools it needs and nothing more.

7. 40+ Real Tools

These aren't mock tools. They execute real actions:

Gmail - Send, read, archive, trash, label, batch manage
Google Calendar - Create, update, delete, list, search
Browser - Playwright opens real pages, takes screenshots, Gemini reasons about what it sees
Code execution - Python and JavaScript in your personal sandbox
SMS - Twilio (with deep-link fallback)
Google Slides & Docs - Programmatic creation with shareable links
Face recognition - Two-pass Gemini Vision comparison against stored references
File management - Upload, read, list, delete in per-user Cloud Storage
Reminders - Natural language time parsing, push notification delivery
People memory - Names, relationships, birthdays, contact info, appearance
Proactive engine - Meeting alerts, birthday nudges, stale contact check-ins

8. Memory - She Remembers Everything

Elora has a 3-layer memory system:

Layer 1: Raw facts. After every conversation, a background task extracts key facts and stores them as vector embeddings (text-embedding-004) in Firestore. Semantic search retrieves relevant memories on every new conversation.

Layer 2: Compacted profile. Periodically, Gemini Flash merges and deduplicates raw facts into a structured user profile - preferences, relationships, work info, goals.

Layer 3: Session summaries. After every call, a summary is generated. The last 3 summaries are injected into the next session for continuity.

This is powered by MemU, which achieves 92% accuracy on the Locomo memory benchmark at 10x lower always-on cost compared to traditional RAG approaches.

Deployment

The entire backend deploys to Google Cloud Run with a single git push:

GitHub Actions builds the Docker image
Pushes to Artifact Registry
Deploys to Cloud Run with all environment variables
Creates Firestore indexes

Infrastructure is managed with Terraform:

resource "google_cloud_run_service" "elora_backend" {
  name     = "elora-backend"
  location = "us-central1"

  template {
    spec {
      containers {
        image = "us-central1-docker.pkg.dev/${var.project_id}/elora/backend:latest"
        resources {
          limits = { cpu = "2", memory = "2Gi" }
        }
      }
      timeout_seconds = 3600  # Long-running WebSocket connections
    }
  }
}

40+ Cloud Run revisions tell the development story. The backend has been continuously deployed and iterated throughout the build.

What I Learned

The gap between "chatbot" and "computer" is isolation, persistence, and extensibility. It's not about making the LLM smarter. It's about giving it a sandbox that persists, a skill system that grows, and security that you can trust. That's what makes it a computer.

Security can't be an afterthought. When your agent can read your email, send texts, and execute code, prompt injection isn't theoretical - it's an attack vector. Build the guard first.

ADK multi-agent is production-ready. The one-parent-per-agent constraint feels limiting at first, but it forces clean architecture. Each agent has exactly the tools and context it needs.

Try It

The backend is live:

https://elora-backend-453139277365.us-central1.run.app

The code is open:

https://github.com/Garinmckayl/elora

To run the mobile app:

git clone https://github.com/Garinmckayl/elora.git
cd elora/app
npm install
npx expo start --tunnel

Scan the QR code with Expo Go. Talk to Elora.

Built by a solo developer in Addis Ababa, Ethiopia for the Gemini Live Agent Challenge. #GeminiLiveAgentChallenge

GitHub: github.com/Garinmckayl/elora

DEV Community