Hudson Mathew
Hero's AI

🧠 I Built an AI Assistant with Multi-Model Fallback, Voice Chat & a Personal Data Analyst – Here's How

What happens when your AI goes down mid-conversation? You lose users. I built Hero's AI to make sure that never happens – and added a whole lot more along the way.

Live Demo

Have you ever used an AI tool that just... stopped working? Maybe it hit a quota limit, the API went down, or the model just returned nothing. If you're building on top of a single AI provider, you're one outage away from a broken product.

That's the problem I set out to solve with Hero's AI – a full-stack, production-grade AI assistant platform built with Django and Python. But I didn't stop at just "fixing uptime." I ended up building something that handles voice chat, web search, document analysis, and even runs data analytics on uploaded spreadsheets – all in one platform.

Here's a deep dive into how it works, what I built, and what I learned.


🚀 Project Overview

Hero's AI is a personal AI assistant platform that goes way beyond a simple chatbot.

Think of it as your smart digital companion – one that can:

  • Write and debug code
  • Search the web for real-time information
  • Hold voice conversations
  • Analyze uploaded documents
  • Run data analytics on your CSV or Excel files without you writing a single line of code

The platform is built on Django, uses Google Gemini as its primary AI brain, and has a waterfall fallback system through OpenRouter and Groq so users always get a response – even if one provider goes down.
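The core of that reliability story fits in a few lines. Below is a minimal, hypothetical sketch of a waterfall fallback chain: the provider functions are fake stand-ins for the real Gemini, OpenRouter, and Groq clients, but the control flow is the same idea.

```python
# Minimal waterfall-fallback sketch. The provider functions below are
# illustrative stand-ins, not the project's real API clients.

def call_gemini(prompt):
    raise RuntimeError("quota exceeded")  # simulate the primary going down

def call_openrouter(prompt):
    return f"openrouter: {prompt}"

def call_groq(prompt):
    return f"groq: {prompt}"

PROVIDERS = [call_gemini, call_openrouter, call_groq]

def ask(prompt):
    """Try each provider in order; return the first successful response."""
    last_error = None
    for provider in PROVIDERS:
        try:
            return provider(prompt)
        except Exception as exc:  # swallow the failure, move down the chain
            last_error = exc
    raise RuntimeError("all providers failed") from last_error

print(ask("hello"))  # → openrouter: hello
```

Here the simulated Gemini failure is swallowed and the next provider answers, so the caller never sees the outage.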

It also ships with two powerful sub-modules:

  • Baymax – the AI orchestration engine
  • Infinsight – a RAG-powered data analytics layer

🔗 GitHub: https://github.com/Hudsonmathew1910/Hero-s-AI

✨ Key Features

  • 🔄 Multi-model fallback chain – Gemini → OpenRouter (6 models) → Groq (4 models). Zero downtime from API failures.
  • 💬 Text Chat – General conversation with persistent, per-user session history
  • 👨‍💻 Coding Assistant – Code generation, debugging, and refactoring across Python, JS, SQL, and more
  • 🔍 Web Search Mode – Real-time answers using DuckDuckGo + Wikipedia, summarized by Gemini
  • 🎙️ Voice Chat – Short, natural spoken-style responses optimized for text-to-speech
  • 📄 File Analysis – Upload PDFs, DOCX, or TXT files and ask questions about them
  • 📊 Infinsight Analytics – Upload CSV/Excel, ask plain-English questions, get analyst-grade reports
  • 🔐 Encrypted API Key Storage – User keys are Fernet-encrypted server-side, never exposed to the browser
  • 🔑 Google OAuth Login – One-click sign-in alongside traditional email/password
  • 🧠 NLP Intent Detection – Custom regex-based NLP pipeline routes each message to the right handler
  • 🎛️ Custom Instructions – Users can set their name, preferences, and AI tone globally
  • 🕵️ Temporary Chat Mode – Privacy-first mode where conversations aren't saved

🤖 Baymax (The AI Orchestration Module)

Baymax is the core engine of Hero's AI – named after everyone's favorite healthcare companion, because it's always there when you need it.

hero-s-ai.onrender.com

What does Baymax do?

Baymax is an orchestrator class that decides:

  1. Which AI model to use for a given task
  2. What to do when that model fails
  3. How many tokens to allocate based on task complexity

Every user message flows through the NLP layer first (more on that below), and then Baymax picks the right handler and model for the job.
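That dispatch step can be pictured as a simple lookup table. The handler names and model IDs below are hypothetical placeholders, not Baymax's actual routing configuration:

```python
# Hypothetical intent→handler routing table; handler names and model IDs
# are illustrative stand-ins for Baymax's real routing logic.

def handle_coding(message):
    return ("gemini-2.5-flash", f"code answer for: {message}")

def handle_text(message):
    return ("nvidia/nemotron", f"chat answer for: {message}")

HANDLERS = {"coding": handle_coding, "text": handle_text}

def dispatch(intent, message):
    # Unknown intents fall back to general chat rather than erroring out.
    handler = HANDLERS.get(intent, handle_text)
    return handler(message)

model, reply = dispatch("coding", "fix my loop")
```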

Baymax's Key Features

  • Smart Model Routing – Routes coding tasks to Gemini 2.5 Flash, general chat to OpenRouter NVIDIA Nemotron, and so on. The right model for the right task.
  • Waterfall Fallback – If the primary model fails, Baymax silently tries up to 10 fallback models across OpenRouter and Groq. The user never sees an error.
  • Adaptive Token Budgets – Short conversational replies use fewer tokens (faster, cheaper). File analysis and data queries get up to 4,000 tokens for thorough output.
  • Compressed History – Instead of sending the full chat history (expensive!), Baymax loads only 2–6 recent turns depending on task type and compresses them into a single context block.
  • Parallel DB Lookups – API keys, user settings, and chat sessions are fetched in parallel threads to minimize latency.
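The token-budget and history-compression bullets combine naturally into one helper. The specific numbers below are invented placeholders, not the project's real budgets:

```python
# Hypothetical per-task budgets and history windows (illustrative values only).
TOKEN_BUDGETS = {"voice": 300, "text": 1000, "coding": 2000, "file": 4000}
HISTORY_TURNS = {"voice": 2, "text": 4, "coding": 6, "file": 2}

def build_context(task, history):
    """Keep only the most recent turns and flatten them into one text block."""
    turns = history[-HISTORY_TURNS.get(task, 4):]
    block = "\n".join(f"{role}: {text}" for role, text in turns)
    return block, TOKEN_BUDGETS.get(task, 1000)

history = [("user", "hi"), ("ai", "hello"), ("user", "write a loop"),
           ("ai", "done"), ("user", "explain it")]
context, budget = build_context("voice", history)  # 2 turns, 300-token budget
```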

How does it help users?

Users never have to worry about which AI to use or whether it's available. Baymax handles all of that invisibly. You ask a question, you get an answer – always.


📊 Infinsight (The Data Analytics Module)

Infinsight is the feature I'm most proud of. It turns Hero's AI into a personal data analyst that anyone can use – no Python, no SQL, no Excel expertise required.

hero-s-ai.onrender.com/infinsight

What does Infinsight do?

Users upload a CSV, Excel, or PDF file. Then they ask questions in plain English like:

  • "What's the average revenue by region?"
  • "Which customers are most likely to churn?"
  • "Show me a sales trend for the last 6 months."

Infinsight handles the rest.

How it works under the hood

This is where it gets interesting. Infinsight uses a Retrieval-Augmented Generation (RAG) pipeline:

  1. Parse – The file is parsed into text chunks. CSV rows become natural-language summaries; PDFs are chunked page by page.
  2. Embed – Each chunk is converted into a 768-dimensional vector using gemini-embedding-001.
  3. Store – Vectors are stored in a Pinecone vector database under a user-specific namespace.
  4. Retrieve – When a query arrives, the most relevant chunks are fetched via semantic similarity search.
  5. Analyze – The relevant context and dataset schema are passed to Gemini. If the query needs computation (sums, trends, forecasts), Gemini generates Pandas code, which is executed safely in a sandboxed asteval interpreter.
  6. Report – The computed results are passed back to Gemini, which writes a clean, human-readable analyst report.
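The retrieve step is easiest to see with a toy example. The "embedding" below is a fake character-count vector standing in for gemini-embedding-001, and a plain dict stands in for Pinecone, but the cosine-similarity ranking is the same principle:

```python
import math

def embed(text):
    """Fake 3-dim embedding (character counts); a stand-in for real embeddings."""
    return [text.count("a"), text.count("e"), len(text) / 10]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# In-memory stand-in for the Pinecone index (per-user namespaces omitted).
store = {chunk: embed(chunk) for chunk in [
    "average revenue by region", "customer churn rates", "sales trends"]}

def retrieve(query, top_k=1):
    qv = embed(query)
    ranked = sorted(store, key=lambda c: cosine(store[c], qv), reverse=True)
    return ranked[:top_k]

print(retrieve("revenue per region"))  # → ['average revenue by region']
```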

What kind of analysis can it do?

  • 📈 Time-series trend analysis
  • 🧮 RFM (Recency, Frequency, Monetary) customer segmentation
  • 📉 Linear regression forecasting
  • 🔗 Profit-discount correlation analysis
  • 📋 Summary statistics, grouping, filtering
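As a concrete illustration of one of these, here is roughly what an RFM computation looks like in pandas. The table and column names are invented for the example; in Infinsight the equivalent code is generated by Gemini against your actual schema:

```python
import pandas as pd

# Toy transactions table; column names are illustrative, not Infinsight's schema.
tx = pd.DataFrame({
    "customer": ["a", "a", "b", "c", "c", "c"],
    "days_ago": [2, 30, 90, 5, 10, 1],
    "amount":   [50, 20, 10, 100, 80, 60],
})

rfm = tx.groupby("customer").agg(
    recency=("days_ago", "min"),   # days since the most recent purchase
    frequency=("amount", "size"),  # number of purchases
    monetary=("amount", "sum"),    # total spend
)
print(rfm)
```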

The value it delivers

Infinsight removes the biggest barrier to data analysis: you don't need to know how to code. Upload your file, ask your question, get your answer. It's like having a junior data analyst on call 24/7.


🛠 Tech Stack

  • Backend – Django 5.2, Python 3.11, Gunicorn
  • Database – PostgreSQL (Neon cloud)
  • Primary AI – Google Gemini 2.5 Flash / Lite
  • AI Gateway – OpenRouter (100+ LLMs)
  • AI Fallback – Groq (Llama, Qwen, Kimi)
  • NLP – Custom regex engine (Nlp.py)
  • Vector DB – Pinecone
  • Embeddings – gemini-embedding-001 (768 dim)
  • Web Search – DuckDuckGo (ddgs) + Wikipedia
  • File Parsing – pdfplumber, pypdf, python-docx
  • Data Analysis – pandas, numpy, scikit-learn
  • Secure Execution – asteval (sandboxed Python)
  • Encryption – cryptography (Fernet)
  • Auth – Django sessions + Google OAuth 2.0
  • Frontend – Vanilla JS + CSS (no framework)
  • Deployment – Render

โš™๏ธ How It Works โ€” Step by Step

Here's the full flow from the moment you type a message to the moment you get a response:

Step 1 – Authentication
You log in via email/password or Google OAuth. Your session is stored server-side with a 2-week expiry.

Step 2 – API Key Setup
You enter your Gemini and OpenRouter API keys. They're encrypted with Fernet and stored in the database – they never touch the browser again.

Step 3 – Send a Message
You type your message (or use voice) and select a mode: Text, Coding, Web Search, File Analysis, etc.

Step 4 – NLP Pre-processing
Your message is cleaned (noise removed, punctuation normalized) and analyzed for intent. If you selected text mode but included code blocks, the system upgrades the request to coding mode automatically.
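A stripped-down version of that upgrade logic might look like this; the patterns are invented for illustration, and the real Nlp.py pipeline is more thorough:

```python
import re

# Toy regex intent router; patterns are illustrative, not the real Nlp.py rules.
INTENT_PATTERNS = [
    ("coding", re.compile(r"```|\bdef\b|\bfunction\b|\bdebug\b", re.I)),
    ("websearch", re.compile(r"\b(latest|news|today|current)\b", re.I)),
]

def detect_intent(message, selected_mode="text"):
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(message):
            return intent  # e.g. text mode with a code block upgrades to coding
    return selected_mode  # nothing matched: keep the mode the user picked

print(detect_intent("please debug this ```py``` snippet"))  # → coding
```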

Step 5 – Parallel DB Lookups
Your API keys, user settings, and chat session are fetched in parallel threads. This keeps things fast.

Step 6 – History Load
2–6 recent turns from your current chat session are loaded and compressed into a context block.

Step 7 – Baymax Dispatches
The right handler is called: handle_text, handle_coding, handle_websearch, handle_voice_chat, handle_file, etc.

Step 8 – AI Model Call + Fallback
The primary model is called. If it fails → try fallback OpenRouter models → try Groq. A response always comes back.

Step 9 – Response Returned
The AI reply is returned as JSON to the frontend. Non-temporary chats are saved to the database asynchronously (no latency added).

Step 10 – Infinsight Path (for data files)
Upload → Parse → Embed → Store in Pinecone → Retrieve by similarity → Feed to Gemini → Generate Pandas code → Execute in sandbox → Interpret result → Return analyst report.


🖼 Screenshots / Demo

Main Chat UI

Infinsight Analytics Report

Voice Chat Overlay


🚧 Challenges & Solutions

Challenge 1 – Single Point of Failure

Problem: If Gemini's API quota ran out mid-session, the entire platform would break.
Solution: Built a waterfall fallback system – Primary → 6 OpenRouter models → 4 Groq models. If one fails, the next kicks in automatically with zero user-facing errors.

Challenge 2 – Generic AI Responses

Problem: A single model with a single prompt gives mediocre results across different task types.
Solution: Built the NLP intent detection layer to classify every message and route it to a purpose-built handler with a task-specific system prompt and model selection.

Challenge 3 – Secure API Key Management

Problem: Storing API keys in localStorage or plain text in a DB is a security nightmare.
Solution: All keys are encrypted with Fernet symmetric encryption before database storage. They're decrypted server-side only when needed and never returned to the client.
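For readers who haven't used it, Fernet from the cryptography package makes this pattern short. The sketch below simplifies key management: the master key would be loaded from an environment secret in production, not generated per run.

```python
from cryptography.fernet import Fernet

# Sketch only: in a real deployment the master key comes from a secret store,
# never generated at import time like this.
master_key = Fernet.generate_key()
fernet = Fernet(master_key)

user_api_key = "sk-example-123"
token = fernet.encrypt(user_api_key.encode())  # ciphertext stored in the DB
plain = fernet.decrypt(token).decode()         # decrypted server-side on use
```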

Challenge 4 – Data Analysis Without Code

Problem: CSV/Excel analysis requires Python or SQL skills that most users don't have.
Solution: Infinsight's RAG pipeline retrieves relevant data context and has Gemini generate Pandas code on the fly. The code runs in a sandboxed asteval interpreter and the output is translated into a readable report.
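To give a feel for the sandboxing idea without pulling in asteval, here is a stdlib-only stand-in that admits pure arithmetic expressions only (asteval itself is richer and allows whitelisted function calls):

```python
import ast

# Stdlib stand-in for asteval: parse the generated code, reject dangerous
# node types, then evaluate with no builtins. Arithmetic expressions only.
BANNED = (ast.Import, ast.ImportFrom, ast.Attribute, ast.Call)

def safe_eval(expr, names):
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, BANNED):
            raise ValueError(f"disallowed construct: {type(node).__name__}")
    return eval(compile(tree, "<sandbox>", "eval"), {"__builtins__": {}}, names)

print(safe_eval("revenue * 0.1 + tax", {"revenue": 200, "tax": 5}))  # → 25.0
```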

Challenge 5 – Expensive Long Conversations

Problem: Sending full chat history to the LLM on every turn is slow and burns tokens fast.
Solution: History is limited to 2–6 turns depending on task type, compressed into a single block, and paired with adaptive token budgets that scale with task complexity.


๐ŸŒ Use Cases

👩‍🎓 Students & Learners
Get explanations, essay help, coding guidance, and concept breakdowns – all in one place.

👨‍💻 Developers
Generate boilerplate, debug tricky functions, refactor messy code, and get algorithmic suggestions.

📊 Data Analysts
Upload your CSV or Excel file and ask questions in plain English. No pandas, no SQL, no problem.

🔬 Researchers
Use web search mode to get AI-summarized answers with citations from live web results – no stale training data.

🏢 Businesses
Deploy as an internal tool with per-user API key isolation, custom instructions, and persistent chat history.

🧑 General Users
Voice chat, document Q&A, general conversation, web lookups – a true all-in-one assistant.


🔮 Future Improvements

Here's what's on the roadmap:

  • ๐Ÿ–ผ๏ธ Image Generation โ€” Integrate Stable Diffusion or DALLยทE for visual output
  • ๐Ÿ“ฑ Mobile App โ€” A React Native companion app for on-the-go access
  • ๐Ÿ”Œ Plugin System โ€” Allow users to connect custom tools and APIs to the assistant
  • ๐Ÿงฉ Infinsight v2 โ€” Support for SQL databases and real-time data streams
  • ๐ŸŒ Multi-language Support โ€” NLP and UI support for non-English users
  • ๐Ÿ‘ฅ Team Workspaces โ€” Shared chat sessions and collaborative analytics for teams
  • ๐Ÿ“ก Webhook Integrations โ€” Trigger Hero's AI from Slack, Notion, or GitHub events
  • ๐Ÿง  Long-term Memory โ€” Persistent semantic memory across sessions using vector search

🎉 Conclusion

Building Hero's AI was one of the most rewarding projects I've worked on. It pushed me to think about reliability, security, user experience, and AI engineering all at once.

The biggest lesson? Don't build on a single point of failure. Whether it's your AI provider, your database, or your auth system – build fallbacks in from day one.

If you're a developer who's ever wanted to build a serious AI product – one that handles real-world edge cases, keeps user data secure, and actually works when APIs go down – I hope Hero's AI gives you some useful ideas and patterns to borrow.

Feel free to check out the repo, open issues, or drop me a message. I'd love to hear your thoughts! 🚀

🔗 GitHub: https://github.com/Hudsonmathew1910/Hero-s-AI
📧 Contact: hudsonmathew2004@gmail.com


If you found this useful, drop a ❤️ and share it with a developer friend. And if you build something inspired by this – I'd genuinely love to see it!


Tags: #python #django #ai #machinelearning #webdev #opensource #showdev #productivity
