TL;DR: I built and open-sourced a production-ready AI platform that combines chat, image analysis, video analysis, and website generation. It uses free models where possible and costs ~$0/month to run. Live demo | GitHub
Why I Built This
Every AI tool I tried was either:
- Too expensive — GPT-4 API bills adding up fast
- Single-purpose — chat OR image analysis, never both
- Closed source — no way to learn from the architecture
I wanted a single platform that handles multiple AI modalities, uses the best free models available, and is fully open-source so other developers can learn from it.
The result is HOCKS AI — a multi-modal AI assistant platform.
🔗 Live: hocks.app
📦 Source: github.com/x-tahosin/hocks-ai
What It Does
| Feature | AI Model | Monthly Cost |
|---|---|---|
| 💬 Streaming Chat | OpenRouter GPT-OSS-120B (free) | $0 |
| 🌐 Website Generator | OpenRouter Nemotron-3 120B (free) | $0 |
| 🖼️ Image Analysis | Google Gemini 2.0 Flash | ~$0.002/call |
| 🎬 Video Analysis | Google Gemini 2.0 Flash | ~$0.003/call |
| 🧠 Memory System | Firebase Firestore | $0 (free tier) |
| 🔐 Auth + Admin | Firebase Auth | $0 |
Total monthly cost: ~$0–5 depending on vision API usage.
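The arithmetic behind that estimate is simple enough to sketch. A minimal cost estimator using the per-call prices from the table above; the call volumes in the example are illustrative assumptions, not measured traffic:

```javascript
// Rough monthly cost estimate from the per-call prices in the table above.
// Call volumes below are illustrative assumptions, not real traffic numbers.
const PRICES = { chat: 0, website: 0, image: 0.002, video: 0.003 };

function monthlyCost(callsPerMonth) {
  return Object.entries(callsPerMonth).reduce(
    (total, [feature, calls]) => total + calls * (PRICES[feature] ?? 0),
    0
  );
}

// e.g. 1,000 image analyses and 500 video analyses per month:
console.log(monthlyCost({ chat: 10000, image: 1000, video: 500 }).toFixed(2)); // "3.50"
```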
The Hybrid Model Strategy
This is the key architectural decision. Instead of paying for one expensive model for everything, I split by capability:

Free Models for Text Tasks
```
Chat + Code Generation → OpenRouter API
├── openai/gpt-oss-120b:free (120B params, conversational)
└── nvidia/nemotron-3-super-120b-a12b:free (code generation)
```
These free 120B parameter models are genuinely production-quality for text tasks. GPT-OSS-120B handles conversational AI beautifully — context tracking, nuanced responses, multi-turn dialogue. Nemotron-3 excels at code generation and can build full websites from prompts.
Paid Models for Vision Tasks
```
Image + Video Analysis → Google Gemini 2.0 Flash
├── analyzeImage (~$0.002/call)
└── analyzeVideo (~$0.003/call)
```
Free models simply can't match Gemini's multimodal capabilities yet. Image understanding, OCR, visual reasoning — Gemini 2.0 Flash delivers production-quality results at extremely low per-call costs.
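The capability split above boils down to a small routing table. A sketch of that routing logic as plain JavaScript; the model IDs come from the sections above, while the task names and the `routeModel` helper are illustrative:

```javascript
// Route each request type to the cheapest model that can handle it.
// Model IDs match the hybrid strategy above; task names are illustrative.
const MODEL_ROUTES = {
  chat:    { model: "openai/gpt-oss-120b:free",               provider: "openrouter" },
  website: { model: "nvidia/nemotron-3-super-120b-a12b:free", provider: "openrouter" },
  image:   { model: "gemini-2.0-flash",                       provider: "gemini" },
  video:   { model: "gemini-2.0-flash",                       provider: "gemini" },
};

function routeModel(task) {
  const route = MODEL_ROUTES[task];
  if (!route) throw new Error(`Unknown task: ${task}`);
  return route;
}

console.log(routeModel("chat").provider); // "openrouter"
console.log(routeModel("image").model);   // "gemini-2.0-flash"
```

Keeping the mapping in one table makes it trivial to swap a model out later (say, when a free vision model becomes good enough) without touching the call sites.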
Architecture Deep Dive
```
┌─────────────────────────────────────────────┐
│         Frontend (React 18 + Vite)          │
│        Firebase Hosting / hocks.app         │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│     Firebase Cloud Functions (Node 20)      │
├─────────────────────────────────────────────┤
│ streamChat ────► OpenRouter (GPT-OSS-120B)  │
│ generateCode ──► OpenRouter (Nemotron-3)    │
│ analyzeImage ──► Google Gemini 2.0 Flash    │
│ analyzeVideo ──► Google Gemini 2.0 Flash    │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│              Firebase Services              │
│ • Firestore (users, memories, analytics)    │
│ • Authentication (Google + Email/Pass)      │
│ • Secret Manager (all API keys)             │
│ • Storage (file uploads)                    │
└─────────────────────────────────────────────┘
```
Key Design Decisions
1. Zero API Keys in Frontend
Every AI call is proxied through Firebase Cloud Functions. API keys live exclusively in Firebase Secret Manager — not in environment variables, not in .env files, not anywhere in client code.
```javascript
const { onCall } = require("firebase-functions/v2/https");
const { defineSecret } = require("firebase-functions/params");
const { GoogleGenerativeAI } = require("@google/generative-ai");

// Cloud Function reads the secret at runtime
const geminiApiKey = defineSecret("GEMINI_API_KEY");

exports.analyzeImage = onCall(
  { secrets: [geminiApiKey] },
  async (request) => {
    // Key is only available server-side
    const genAI = new GoogleGenerativeAI(geminiApiKey.value());
    const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
    // ...
  }
);
```
2. SSE Streaming for Real-Time Chat
Instead of waiting for the full response, the chat streams tokens in real-time using Server-Sent Events:
```javascript
// Server: stream each chunk from OpenRouter back to the client
const decoder = new TextDecoder();
let fullText = "";
const reader = orResponse.body.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value, { stream: true });
  fullText += text;
  res.write(`data: ${JSON.stringify({ text, fullText })}\n\n`);
}
res.end();

// Client: render tokens as they arrive
eventSource.onmessage = (event) => {
  const { text } = JSON.parse(event.data);
  updateChatUI(text); // Instant visual feedback
};
```
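One detail streaming clients have to handle: a network chunk can end mid-frame, so the `data: …\n\n` frames need reassembly before parsing. A minimal sketch of that buffering logic; `parseSSE` is a hypothetical helper, not from the repo:

```javascript
// Split a raw SSE buffer into complete `data:` payloads, returning any
// trailing partial frame so the caller can prepend it to the next chunk.
function parseSSE(buffer) {
  const frames = buffer.split("\n\n");
  const remainder = frames.pop(); // possibly an incomplete frame
  const payloads = frames
    .filter((f) => f.startsWith("data: "))
    .map((f) => JSON.parse(f.slice("data: ".length)));
  return { payloads, remainder };
}

const chunk = 'data: {"text":"Hel"}\n\ndata: {"text":"lo"}\n\ndata: {"te';
const { payloads, remainder } = parseSSE(chunk);
console.log(payloads.map((p) => p.text).join("")); // "Hello"
console.log(remainder);                            // 'data: {"te'
```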
3. Per-User Memory System
The AI remembers context across sessions. Users can save memories that persist in Firestore and are injected into every AI conversation:
```javascript
// Inject saved memories into the system prompt
let systemContent = SYSTEM_PROMPT;
if (memories.length > 0) {
  systemContent += "\n\n=== USER'S SAVED MEMORIES ===\n";
  memories.forEach((mem, i) => {
    systemContent += `${i + 1}. ${mem.content}\n`;
  });
}
```
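One caveat with this approach: memories accumulate forever, and injecting all of them will eventually crowd out the context window. A sketch of the same prompt-building step with a rough character budget; the budget value and the `buildSystemPrompt` helper are assumptions, not from the repo:

```javascript
// Build the system prompt, injecting at most `budget` characters of
// saved memories. The 2000-character budget is an illustrative assumption.
function buildSystemPrompt(basePrompt, memories, budget = 2000) {
  let block = "";
  let n = 1;
  for (const mem of memories) {
    const line = `${n}. ${mem.content}\n`;
    if (block.length + line.length > budget) break; // stop before overflowing
    block += line;
    n++;
  }
  return block
    ? `${basePrompt}\n\n=== USER'S SAVED MEMORIES ===\n${block}`
    : basePrompt;
}

const prompt = buildSystemPrompt("You are HOCKS AI.", [
  { content: "Prefers TypeScript" },
  { content: "Works in UTC+6" },
]);
console.log(prompt.includes("1. Prefers TypeScript")); // true
```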
4. Admin Dashboard with Cost Tracking
Built-in analytics track every API call in real-time:
- Usage counters per feature (chat, image, video, website)
- Daily cost breakdown with budget alerts
- Feature toggles — disable any AI feature instantly
- Audit logging for all admin actions
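The counter-plus-budget logic behind those dashboard features is straightforward. An in-memory sketch of the tracker; in the real app these counters presumably live in Firestore, and the prices and field names here are illustrative:

```javascript
// In-memory sketch of the usage/cost tracker. The real app presumably
// persists these counters in Firestore; prices and fields are illustrative.
const PER_CALL_COST = { chat: 0, website: 0, image: 0.002, video: 0.003 };

function makeTracker(dailyBudget) {
  const usage = { counts: {}, cost: 0 };
  return {
    record(feature) {
      usage.counts[feature] = (usage.counts[feature] ?? 0) + 1;
      usage.cost += PER_CALL_COST[feature] ?? 0;
    },
    overBudget: () => usage.cost > dailyBudget,
    snapshot: () => ({ cost: usage.cost, counts: { ...usage.counts } }),
  };
}

const tracker = makeTracker(0.004); // tiny budget to show the alert firing
tracker.record("image");
tracker.record("video");
console.log(tracker.snapshot().counts.image); // 1
console.log(tracker.overBudget());            // true (0.005 > 0.004)
```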
Security Architecture
| Layer | Implementation |
|---|---|
| API Keys | Firebase Secret Manager (never in code) |
| Data Isolation | Firestore rules enforce per-user access |
| Admin Access | Custom claims + email verification |
| Authentication | Firebase Auth (Google + email/password) |
| Audit Trail | Every admin action logged with timestamp |
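The per-user data isolation row is typically enforced with Firestore security rules along these lines; this is a sketch using the standard rules syntax, and the collection names are assumptions rather than copied from the repo:

```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Users may read/write only documents under their own uid
    match /users/{uid}/{doc=**} {
      allow read, write: if request.auth != null && request.auth.uid == uid;
    }
    // Admin-only collections gated on a custom claim
    match /adminLogs/{logId} {
      allow read, write: if request.auth != null && request.auth.token.admin == true;
    }
  }
}
```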
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 18, Vite, CSS3 (Glassmorphism dark UI) |
| Backend | Firebase Cloud Functions (Node.js 20) |
| AI Engine | Google Gemini 2.0 Flash + OpenRouter (free models) |
| Database | Cloud Firestore |
| Auth | Firebase Authentication |
| Hosting | Firebase Hosting (custom domain) |
| Secrets | Firebase Secret Manager |
Get Started in 5 Minutes
```bash
# Clone
git clone https://github.com/x-tahosin/hocks-ai.git
cd hocks-ai

# Install
cd functions && npm install && cd ..

# Set your API keys securely
firebase functions:secrets:set GEMINI_API_KEY
firebase functions:secrets:set OPENROUTER_API_KEY

# Deploy everything
firebase deploy
```
You need:
- Node.js 20+
- Firebase CLI (`npm i -g firebase-tools`)
- A Gemini API key from ai.google.dev (free)
- An OpenRouter API key from openrouter.ai (free models available)
What I Learned
- Free AI models are production-viable — 120B parameter models handle conversational AI surprisingly well
- Hybrid strategies save money — use free for text, paid only for vision
- Firebase Secret Manager > .env files — proper secret management matters in production
- SSE streaming transforms UX — users seeing real-time responses feels dramatically better than waiting
- Cost tracking from day one — know exactly where every dollar goes
Try It
- 🔗 Live demo: hocks.app
- 📦 Source code: github.com/x-tahosin/hocks-ai
- ⭐ Star the repo if you find it useful!
What free AI models are you using in production? I'd love to hear about your hybrid model strategies in the comments.