TL;DR: I built and open-sourced a production-ready AI platform that combines chat, image analysis, video analysis, and website generation. It uses free models where possible and costs ~$0/month to run. Live demo | GitHub
Why I Built This
Every AI tool I tried was either:
- Too expensive — GPT-4 API bills adding up fast
- Single-purpose — chat OR image analysis, never both
- Closed source — no way to learn from the architecture
I wanted a single platform that handles multiple AI modalities, uses the best free models available, and is fully open-source so other developers can learn from it.
The result is HOCKS AI — a multi-modal AI assistant platform.
🔗 Live: hocks.app
📦 Source: github.com/x-tahosin/hocks-ai
What It Does
| Feature | AI Model | Monthly Cost |
|---|---|---|
| 💬 Streaming Chat | OpenRouter GPT-OSS-120B (free) | $0 |
| 🌐 Website Generator | OpenRouter Nemotron-3 120B (free) | $0 |
| 🖼️ Image Analysis | Google Gemini 2.0 Flash | ~$0.002/call |
| 🎬 Video Analysis | Google Gemini 2.0 Flash | ~$0.003/call |
| 🧠 Memory System | Firebase Firestore | $0 (free tier) |
| 🔐 Auth + Admin | Firebase Auth | $0 |
Total monthly cost: ~$0–5 depending on vision API usage.
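The arithmetic behind that estimate is simple enough to sketch. A minimal cost estimator using the per-call prices from the table above; the call volumes in the example are illustrative assumptions, not measured traffic:

```javascript
// Rough monthly cost estimate from the per-call prices in the table above.
// Call volumes below are illustrative assumptions, not real traffic numbers.
const PRICES = { chat: 0, website: 0, image: 0.002, video: 0.003 };

function monthlyCost(callsPerMonth) {
  return Object.entries(callsPerMonth).reduce(
    (total, [feature, calls]) => total + calls * (PRICES[feature] ?? 0),
    0
  );
}

// e.g. 1,000 image analyses and 500 video analyses per month:
console.log(monthlyCost({ chat: 10000, image: 1000, video: 500 }).toFixed(2)); // "3.50"
```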
The Hybrid Model Strategy
This is the key architectural decision. Instead of paying for one expensive model for everything, I split by capability:

Free Models for Text Tasks
```
Chat + Code Generation → OpenRouter API
├── openai/gpt-oss-120b:free (120B params, conversational)
└── nvidia/nemotron-3-super-120b-a12b:free (code generation)
```
These free 120B parameter models are genuinely production-quality for text tasks. GPT-OSS-120B handles conversational AI beautifully — context tracking, nuanced responses, multi-turn dialogue. Nemotron-3 excels at code generation and can build full websites from prompts.
Paid Models for Vision Tasks
```
Image + Video Analysis → Google Gemini 2.0 Flash
├── analyzeImage (~$0.002/call)
└── analyzeVideo (~$0.003/call)
```
Free models simply can't match Gemini's multimodal capabilities yet. Image understanding, OCR, visual reasoning — Gemini 2.0 Flash delivers production-quality results at extremely low per-call costs.
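The capability split above boils down to a small routing table. A sketch of that routing logic as plain JavaScript; the model IDs come from the sections above, while the task names and the `routeModel` helper are illustrative:

```javascript
// Route each request type to the cheapest model that can handle it.
// Model IDs match the hybrid strategy above; task names are illustrative.
const MODEL_ROUTES = {
  chat:    { model: "openai/gpt-oss-120b:free",               provider: "openrouter" },
  website: { model: "nvidia/nemotron-3-super-120b-a12b:free", provider: "openrouter" },
  image:   { model: "gemini-2.0-flash",                       provider: "gemini" },
  video:   { model: "gemini-2.0-flash",                       provider: "gemini" },
};

function routeModel(task) {
  const route = MODEL_ROUTES[task];
  if (!route) throw new Error(`Unknown task: ${task}`);
  return route;
}

console.log(routeModel("chat").provider); // "openrouter"
console.log(routeModel("image").model);   // "gemini-2.0-flash"
```

Keeping the mapping in one table makes it trivial to swap a model out later (say, when a free vision model becomes good enough) without touching the call sites.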
Architecture Deep Dive
```
┌─────────────────────────────────────────────┐
│         Frontend (React 18 + Vite)          │
│        Firebase Hosting / hocks.app         │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│     Firebase Cloud Functions (Node 20)      │
├─────────────────────────────────────────────┤
│ streamChat ────► OpenRouter (GPT-OSS-120B)  │
│ generateCode ──► OpenRouter (Nemotron-3)    │
│ analyzeImage ──► Google Gemini 2.0 Flash    │
│ analyzeVideo ──► Google Gemini 2.0 Flash    │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│              Firebase Services              │
│ • Firestore (users, memories, analytics)    │
│ • Authentication (Google + Email/Pass)      │
│ • Secret Manager (all API keys)             │
│ • Storage (file uploads)                    │
└─────────────────────────────────────────────┘
```
Key Design Decisions
1. Zero API Keys in Frontend
Every AI call is proxied through Firebase Cloud Functions. API keys live exclusively in Firebase Secret Manager — not in environment variables, not in .env files, not anywhere in client code.
```javascript
const { onCall } = require("firebase-functions/v2/https");
const { defineSecret } = require("firebase-functions/params");
const { GoogleGenerativeAI } = require("@google/generative-ai");

// Cloud Function reads the secret at runtime
const geminiApiKey = defineSecret("GEMINI_API_KEY");

exports.analyzeImage = onCall(
  { secrets: [geminiApiKey] },
  async (request) => {
    // Key is only available server-side
    const genAI = new GoogleGenerativeAI(geminiApiKey.value());
    const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
    // ...
  }
);
```
2. SSE Streaming for Real-Time Chat
Instead of waiting for the full response, the chat streams tokens in real-time using Server-Sent Events:
```javascript
// Server: stream each chunk from OpenRouter back to the client
const decoder = new TextDecoder();
let fullText = "";
const reader = orResponse.body.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const text = decoder.decode(value, { stream: true });
  fullText += text;
  res.write(`data: ${JSON.stringify({ text, fullText })}\n\n`);
}
res.end();

// Client: render tokens as they arrive
eventSource.onmessage = (event) => {
  const { text } = JSON.parse(event.data);
  updateChatUI(text); // Instant visual feedback
};
```
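One detail streaming clients have to handle: a network chunk can end mid-frame, so the `data: …\n\n` frames need reassembly before parsing. A minimal sketch of that buffering logic; `parseSSE` is a hypothetical helper, not from the repo:

```javascript
// Split a raw SSE buffer into complete `data:` payloads, returning any
// trailing partial frame so the caller can prepend it to the next chunk.
function parseSSE(buffer) {
  const frames = buffer.split("\n\n");
  const remainder = frames.pop(); // possibly an incomplete frame
  const payloads = frames
    .filter((f) => f.startsWith("data: "))
    .map((f) => JSON.parse(f.slice("data: ".length)));
  return { payloads, remainder };
}

const chunk = 'data: {"text":"Hel"}\n\ndata: {"text":"lo"}\n\ndata: {"te';
const { payloads, remainder } = parseSSE(chunk);
console.log(payloads.map((p) => p.text).join("")); // "Hello"
console.log(remainder);                            // 'data: {"te'
```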
3. Per-User Memory System
The AI remembers context across sessions. Users can save memories that persist in Firestore and are injected into every AI conversation:
```javascript
// Inject saved memories into the system prompt
let systemContent = SYSTEM_PROMPT;
if (memories.length > 0) {
  systemContent += "\n\n=== USER'S SAVED MEMORIES ===\n";
  memories.forEach((mem, i) => {
    systemContent += `${i + 1}. ${mem.content}\n`;
  });
}
```
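One caveat with this approach: memories accumulate forever, and injecting all of them will eventually crowd out the context window. A sketch of the same prompt-building step with a rough character budget; the budget value and the `buildSystemPrompt` helper are assumptions, not from the repo:

```javascript
// Build the system prompt, injecting at most `budget` characters of
// saved memories. The 2000-character budget is an illustrative assumption.
function buildSystemPrompt(basePrompt, memories, budget = 2000) {
  let block = "";
  let n = 1;
  for (const mem of memories) {
    const line = `${n}. ${mem.content}\n`;
    if (block.length + line.length > budget) break; // stop before overflowing
    block += line;
    n++;
  }
  return block
    ? `${basePrompt}\n\n=== USER'S SAVED MEMORIES ===\n${block}`
    : basePrompt;
}

const prompt = buildSystemPrompt("You are HOCKS AI.", [
  { content: "Prefers TypeScript" },
  { content: "Works in UTC+6" },
]);
console.log(prompt.includes("1. Prefers TypeScript")); // true
```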
4. Admin Dashboard with Cost Tracking
Built-in analytics track every API call in real-time:
- Usage counters per feature (chat, image, video, website)
- Daily cost breakdown with budget alerts
- Feature toggles — disable any AI feature instantly
- Audit logging for all admin actions
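The counter-plus-budget logic behind those dashboard features is straightforward. An in-memory sketch of the tracker; in the real app these counters presumably live in Firestore, and the prices and field names here are illustrative:

```javascript
// In-memory sketch of the usage/cost tracker. The real app presumably
// persists these counters in Firestore; prices and fields are illustrative.
const PER_CALL_COST = { chat: 0, website: 0, image: 0.002, video: 0.003 };

function makeTracker(dailyBudget) {
  const usage = { counts: {}, cost: 0 };
  return {
    record(feature) {
      usage.counts[feature] = (usage.counts[feature] ?? 0) + 1;
      usage.cost += PER_CALL_COST[feature] ?? 0;
    },
    overBudget: () => usage.cost > dailyBudget,
    snapshot: () => ({ cost: usage.cost, counts: { ...usage.counts } }),
  };
}

const tracker = makeTracker(0.004); // tiny budget to show the alert firing
tracker.record("image");
tracker.record("video");
console.log(tracker.snapshot().counts.image); // 1
console.log(tracker.overBudget());            // true (0.005 > 0.004)
```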
Security Architecture
| Layer | Implementation |
|---|---|
| API Keys | Firebase Secret Manager (never in code) |
| Data Isolation | Firestore rules enforce per-user access |
| Admin Access | Custom claims + email verification |
| Authentication | Firebase Auth (Google + email/password) |
| Audit Trail | Every admin action logged with timestamp |
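The per-user data isolation row is typically enforced with Firestore security rules along these lines; this is a sketch using the standard rules syntax, and the collection names are assumptions rather than copied from the repo:

```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Users may read/write only documents under their own uid
    match /users/{uid}/{doc=**} {
      allow read, write: if request.auth != null && request.auth.uid == uid;
    }
    // Admin-only collections gated on a custom claim
    match /adminLogs/{logId} {
      allow read, write: if request.auth != null && request.auth.token.admin == true;
    }
  }
}
```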
Tech Stack
| Layer | Technology |
|---|---|
| Frontend | React 18, Vite, CSS3 (Glassmorphism dark UI) |
| Backend | Firebase Cloud Functions (Node.js 20) |
| AI Engine | Google Gemini 2.0 Flash + OpenRouter (free models) |
| Database | Cloud Firestore |
| Auth | Firebase Authentication |
| Hosting | Firebase Hosting (custom domain) |
| Secrets | Firebase Secret Manager |
Get Started in 5 Minutes
```bash
# Clone
git clone https://github.com/x-tahosin/hocks-ai.git
cd hocks-ai

# Install
cd functions && npm install && cd ..

# Set your API keys securely
firebase functions:secrets:set GEMINI_API_KEY
firebase functions:secrets:set OPENROUTER_API_KEY

# Deploy everything
firebase deploy
```
You need:
- Node.js 20+
- Firebase CLI (`npm i -g firebase-tools`)
- A Gemini API key from ai.google.dev (free)
- An OpenRouter API key from openrouter.ai (free models available)
What I Learned
- Free AI models are production-viable — 120B parameter models handle conversational AI surprisingly well
- Hybrid strategies save money — use free for text, paid only for vision
- Firebase Secret Manager > .env files — proper secret management matters in production
- SSE streaming transforms UX — users seeing real-time responses feels dramatically better than waiting
- Cost tracking from day one — know exactly where every dollar goes
Try It
- 🔗 Live demo: hocks.app
- 📦 Source code: github.com/x-tahosin/hocks-ai
- ⭐ Star the repo if you find it useful!
What free AI models are you using in production? I'd love to hear about your hybrid model strategies in the comments.