I Built an AI Assistant with Multi-Model Fallback, Voice Chat & a Personal Data Analyst: Here's How
What happens when your AI goes down mid-conversation? You lose users. I built Hero's AI to make sure that never happens, and added a whole lot more along the way.
Live Demo
Have you ever used an AI tool that just... stopped working? Maybe it hit a quota limit, the API went down, or the model just returned nothing. If you're building on top of a single AI provider, you're one outage away from a broken product.
That's the problem I set out to solve with Hero's AI, a full-stack, production-grade AI assistant platform built with Django and Python. But I didn't stop at just "fixing uptime." I ended up building something that handles voice chat, web search, document analysis, and even runs data analytics on uploaded spreadsheets, all in one platform.
Here's a deep dive into how it works, what I built, and what I learned.
Project Overview
Hero's AI is a personal AI assistant platform that goes way beyond a simple chatbot.
Think of it as your smart digital companion, one that can:
- Write and debug code
- Search the web for real-time information
- Hold voice conversations
- Analyze uploaded documents
- Run data analytics on your CSV or Excel files without you writing a single line of code
The platform is built on Django, uses Google Gemini as its primary AI brain, and has a waterfall fallback system through OpenRouter and Groq so users always get a response, even if one provider goes down.
It also ships with two powerful sub-modules:
- Baymax: the AI orchestration engine
- Infinsight: a RAG-powered data analytics layer
GitHub: https://github.com/Hudsonmathew1910/Hero-s-AI
Key Features
- Multi-model fallback chain: Gemini → OpenRouter (6 models) → Groq (4 models). Zero downtime from API failures.
- Text Chat: General conversation with persistent, per-user session history
- Coding Assistant: Code generation, debugging, and refactoring across Python, JS, SQL, and more
- Web Search Mode: Real-time answers using DuckDuckGo + Wikipedia, summarized by Gemini
- Voice Chat: Short, natural spoken-style responses optimized for text-to-speech
- File Analysis: Upload PDFs, DOCX, or TXT files and ask questions about them
- Infinsight Analytics: Upload CSV/Excel, ask plain-English questions, get analyst-grade reports
- Encrypted API Key Storage: User keys are Fernet-encrypted server-side, never exposed to the browser
- Google OAuth Login: One-click sign-in alongside traditional email/password
- NLP Intent Detection: Custom regex-based NLP pipeline routes each message to the right handler
- Custom Instructions: Users can set their name, preferences, and AI tone globally
- Temporary Chat Mode: Privacy-first mode where conversations aren't saved
Baymax (The AI Orchestration Module)
Baymax is the core engine of Hero's AI, named after everyone's favorite healthcare companion, because it's always there when you need it.
What does Baymax do?
Baymax is an orchestrator class that decides:
- Which AI model to use for a given task
- What to do when that model fails
- How many tokens to allocate based on task complexity
Every user message flows through the NLP layer first (more on that below), and then Baymax picks the right handler and model for the job.
Baymax's Key Features
- Smart Model Routing: Routes coding tasks to Gemini 2.5 Flash, general chat to OpenRouter NVIDIA Nemotron, and so on. The right model for the right task.
- Waterfall Fallback: If the primary model fails, Baymax silently tries up to 10 fallback models across OpenRouter and Groq. The user never sees an error.
- Adaptive Token Budgets: Short conversational replies use fewer tokens (faster, cheaper). File analysis and data queries get up to 4,000 tokens for thorough output.
- Compressed History: Instead of sending the full chat history (expensive!), Baymax loads only 2–6 recent turns depending on task type and compresses them into a single context block.
- Parallel DB Lookups: API keys, user settings, and chat sessions are fetched in parallel threads to minimize latency.
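To make the waterfall idea concrete, here is a minimal sketch of the pattern (not the project's actual code): try each provider/model in order and return the first success. `call_model`, `ask_with_fallback`, and the model IDs are all illustrative stand-ins.

```python
# Minimal waterfall-fallback sketch. `call_model` is a stub; a real
# implementation would hit the Gemini/OpenRouter/Groq APIs.

PRIMARY = ("gemini", "gemini-2.5-flash")
FALLBACKS = [
    ("openrouter", "nvidia/nemotron"),      # illustrative model IDs
    ("openrouter", "another/fallback"),
    ("groq", "llama-3.3-70b"),
]

def call_model(provider: str, model: str, prompt: str) -> str:
    """Stub for a real API call; raises when the provider is down."""
    if provider == "groq":                  # pretend only Groq is up
        return f"[{model}] answer"
    raise RuntimeError(f"{provider} unavailable")

def ask_with_fallback(prompt: str) -> str:
    last_error = None
    for provider, model in [PRIMARY, *FALLBACKS]:
        try:
            return call_model(provider, model, prompt)
        except Exception as exc:            # quota hit, timeout, empty reply
            last_error = exc                # remember and try the next model
    raise RuntimeError("all models failed") from last_error
```

The key property is that every failure is swallowed and the loop moves on; only if the entire chain is exhausted does the user see an error.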
How does it help users?
Users never have to worry about which AI to use or whether it's available. Baymax handles all of that invisibly. You ask a question, you get an answer, always.
Infinsight (The Data Analytics Module)
Infinsight is the feature I'm most proud of. It turns Hero's AI into a personal data analyst that anyone can use: no Python, no SQL, no Excel expertise required.
hero-s-ai.onrender.com/infinsight

What does Infinsight do?
Users upload a CSV, Excel, or PDF file. Then they ask questions in plain English like:
- "What's the average revenue by region?"
- "Which customers are most likely to churn?"
- "Show me a sales trend for the last 6 months."
Infinsight handles the rest.
How it works under the hood
This is where it gets interesting. Infinsight uses a Retrieval-Augmented Generation (RAG) pipeline:
- Parse: The file is parsed into text chunks. CSV rows become natural-language summaries; PDFs are chunked page by page.
- Embed: Each chunk is converted into a 768-dimensional vector using gemini-embedding-001.
- Store: Vectors are stored in a Pinecone vector database under a user-specific namespace.
- Retrieve: When a query arrives, the most relevant chunks are fetched via semantic similarity search.
- Analyze: The relevant context and dataset schema are passed to Gemini. If the query needs computation (sums, trends, forecasts), Gemini generates Pandas code, which is executed safely in a sandboxed asteval interpreter.
- Report: The computed results are passed back to Gemini, which writes a clean, human-readable analyst report.
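The embed/store/retrieve steps can be sketched with a dependency-free toy. In place of gemini-embedding-001 and Pinecone, this uses a bag-of-words "embedding" and an in-memory list, purely to show the shape of the retrieval step; every name here is illustrative.

```python
# Toy retrieval sketch: hashing-free bag-of-words vectors and cosine
# similarity stand in for real 768-dim embeddings and a Pinecone index.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": word counts (real system: gemini-embedding-001)
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

store = []  # (chunk_text, vector) pairs; real system: Pinecone namespace

def index(chunks):
    for chunk in chunks:
        store.append((chunk, embed(chunk)))

def retrieve(query: str, k: int = 2):
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Usage mirrors the pipeline: `index(...)` the parsed chunks once at upload time, then `retrieve("average revenue by region")` fetches the most relevant rows to feed the LLM.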
What kind of analysis can it do?
- Time-series trend analysis
- RFM (Recency, Frequency, Monetary) customer segmentation
- Linear regression forecasting
- Profit-discount correlation analysis
- Summary statistics, grouping, filtering
The value it delivers
Infinsight removes the biggest barrier to data analysis: you don't need to know how to code. Upload your file, ask your question, get your answer. It's like having a junior data analyst on call 24/7.
Tech Stack
| Layer | Technology |
|---|---|
| Backend | Django 5.2, Python 3.11, Gunicorn |
| Database | PostgreSQL (Neon cloud) |
| Primary AI | Google Gemini 2.5 Flash / Lite |
| AI Gateway | OpenRouter (100+ LLMs) |
| AI Fallback | Groq (Llama, Qwen, Kimi) |
| NLP | Custom Regex Engine (Nlp.py) |
| Vector DB | Pinecone |
| Embeddings | gemini-embedding-001 (768 dim) |
| Web Search | DuckDuckGo (ddgs) + Wikipedia |
| File Parsing | pdfplumber, pypdf, python-docx |
| Data Analysis | pandas, numpy, scikit-learn |
| Secure Execution | asteval (sandboxed Python) |
| Encryption | cryptography (Fernet) |
| Auth | Django sessions + Google OAuth 2.0 |
| Frontend | Vanilla JS + CSS (no framework) |
| Deployment | Render |
How It Works: Step by Step
Here's the full flow from the moment you type a message to the moment you get a response:
Step 1: Authentication
You log in via email/password or Google OAuth. Your session is stored server-side with a 2-week expiry.
Step 2: API Key Setup
You enter your Gemini and OpenRouter API keys. They're encrypted with Fernet and stored in the database; they never touch the browser again.
Step 3: Send a Message
You type your message (or use voice) and select a mode: Text, Coding, Web Search, File Analysis, etc.
Step 4: NLP Pre-processing
Your message is cleaned (noise removed, punctuation normalized) and analyzed for intent. If you wrote text mode but included code blocks, the system upgrades it to coding mode automatically.
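The regex-based routing described above can be sketched in a few lines. The patterns and labels below are illustrative, not the actual rules in the project's Nlp.py:

```python
# Illustrative regex intent router: first matching pattern wins,
# with a catch-all "text" intent at the end of the list.
import re

INTENT_PATTERNS = [
    ("coding",    re.compile(r"```|\bdef\b|\bfunction\b|\bSELECT\b", re.I)),
    ("websearch", re.compile(r"\b(latest|today|news|current)\b", re.I)),
    ("text",      re.compile(r".", re.S)),   # catch-all fallback
]

def detect_intent(message: str) -> str:
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(message):
            return intent
    return "text"
```

This also shows the "upgrade" behavior: a message sent in text mode that contains a code block or a `def` will match the coding pattern first and be routed to the coding handler.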
Step 5: Parallel DB Lookups
Your API keys, user settings, and chat session are fetched in parallel threads. This keeps things fast.
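A hedged sketch of that parallel-lookup pattern using the stdlib; the `fetch_*` functions here are stand-ins for the project's real ORM queries, with `time.sleep` simulating DB latency:

```python
# Parallel DB lookups via a thread pool: three round-trips overlap,
# so wall time is roughly max(latencies), not their sum.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_api_keys(user_id):
    time.sleep(0.05)                 # simulate DB latency
    return {"gemini": "<encrypted>"}

def fetch_settings(user_id):
    time.sleep(0.05)
    return {"tone": "friendly"}

def fetch_session(user_id):
    time.sleep(0.05)
    return {"session_id": 42}

def load_user_context(user_id):
    with ThreadPoolExecutor(max_workers=3) as pool:
        keys = pool.submit(fetch_api_keys, user_id)
        prefs = pool.submit(fetch_settings, user_id)
        sess = pool.submit(fetch_session, user_id)
        return keys.result(), prefs.result(), sess.result()
```

Threads (rather than asyncio) fit here because the blocking work is I/O against the database, where the GIL is released while waiting.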
Step 6: History Load
2–6 recent turns from your current chat session are loaded and compressed into a context block.
Step 7: Baymax Dispatches
The right handler is called: handle_text, handle_coding, handle_websearch, handle_voice_chat, handle_file, etc.
Step 8: AI Model Call + Fallback
The primary model is called. If it fails, the fallback OpenRouter models are tried, then Groq. A response always comes back.
Step 9: Response Returned
The AI reply is returned as JSON to the frontend. Non-temporary chats are saved to the database asynchronously (no latency added).
Step 10: Infinsight Path (for data files)
Upload → Parse → Embed → Store in Pinecone → Retrieve by similarity → Feed to Gemini → Generate Pandas code → Execute in sandbox → Interpret result → Return analyst report.
Screenshots / Demo
Challenges & Solutions
Challenge 1: Single Point of Failure
Problem: If Gemini's API quota ran out mid-session, the entire platform would break.
Solution: Built a waterfall fallback system: Primary → 6 OpenRouter models → 4 Groq models. If one fails, the next kicks in automatically with zero user-facing errors.
Challenge 2: Generic AI Responses
Problem: A single model with a single prompt gives mediocre results across different task types.
Solution: Built the NLP intent detection layer to classify every message and route it to a purpose-built handler with a task-specific system prompt and model selection.
Challenge 3: Secure API Key Management
Problem: Storing API keys in localStorage or plain text in a DB is a security nightmare.
Solution: All keys are encrypted with Fernet symmetric encryption before database storage. They're decrypted server-side only when needed and never returned to the client.
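The encrypt-at-rest pattern with Fernet (from the `cryptography` package) is only a few lines. This is a sketch under my own naming, not the project's actual module; in production the key comes from an environment variable, never from code:

```python
# Encrypt API keys before persisting; decrypt server-side only,
# immediately before the outbound API call.
from cryptography.fernet import Fernet

FERNET_KEY = Fernet.generate_key()   # in production: load from env/secrets
fernet = Fernet(FERNET_KEY)

def store_api_key(raw_key: str) -> bytes:
    """Encrypt before writing to the DB; only ciphertext is persisted."""
    return fernet.encrypt(raw_key.encode())

def load_api_key(ciphertext: bytes) -> str:
    """Decrypt server-side when the key is actually needed."""
    return fernet.decrypt(ciphertext).decode()
```

Because Fernet is authenticated encryption (AES-CBC plus an HMAC), a tampered ciphertext fails loudly on decrypt instead of yielding garbage.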
Challenge 4: Data Analysis Without Code
Problem: CSV/Excel analysis requires Python or SQL skills that most users don't have.
Solution: Infinsight's RAG pipeline retrieves relevant data context and has Gemini generate Pandas code on the fly. The code runs in a sandboxed asteval interpreter and the output is translated into a readable report.
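To illustrate the sandboxing idea without pulling in asteval itself, here is a dependency-free miniature built on the stdlib `ast` module: parse the generated code, reject anything outside a small whitelist, then evaluate with no builtins. A real sandbox like asteval does far more than this sketch.

```python
# Whitelist-based expression sandbox: a toy stand-in for asteval.
import ast

ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
           ast.Name, ast.Load, ast.Call, ast.operator, ast.unaryop)

SAFE_NAMES = {"sum": sum, "min": min, "max": max, "len": len}

def safe_eval(expr: str):
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"disallowed: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id not in SAFE_NAMES:
            raise ValueError(f"unknown name: {node.id}")
    # No builtins available; only the whitelisted names resolve.
    return eval(compile(tree, "<sandbox>", "eval"), {"__builtins__": {}}, SAFE_NAMES)
```

Arithmetic and whitelisted calls work, while attempts to reach `__import__`, attributes, or anything else outside the whitelist raise before any code runs.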
Challenge 5: Expensive Long Conversations
Problem: Sending full chat history to the LLM on every turn is slow and burns tokens fast.
Solution: History is limited to 1–6 turns depending on task type, compressed into a single block, and paired with adaptive token budgets that scale with task complexity.
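The history-compression idea fits in a few lines. The per-task turn limits and names below are illustrative, not the project's actual values:

```python
# Keep only the most recent turns for the task type and flatten them
# into a single context block to send alongside the new message.
TURN_LIMITS = {"voice": 2, "text": 4, "file": 6}   # illustrative values

def compress_history(turns, task_type):
    recent = turns[-TURN_LIMITS.get(task_type, 4):]
    return "\n".join(f"{role}: {text}" for role, text in recent)
```

A voice query over a long chat thus ships only the last two turns as one compact string, instead of the whole transcript.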
Use Cases
Students & Learners
Get explanations, essay help, coding guidance, and concept breakdowns, all in one place.
Developers
Generate boilerplate, debug tricky functions, refactor messy code, and get algorithmic suggestions.
Data Analysts
Upload your CSV or Excel file and ask questions in plain English. No pandas, no SQL, no problem.
Researchers
Use web search mode to get AI-summarized answers with citations from live web results, with no stale training data.
Businesses
Deploy as an internal tool with per-user API key isolation, custom instructions, and persistent chat history.
General Users
Voice chat, document Q&A, general conversation, web lookups: a true all-in-one assistant.
Future Improvements
Here's what's on the roadmap:
- Image Generation: Integrate Stable Diffusion or DALL·E for visual output
- Mobile App: A React Native companion app for on-the-go access
- Plugin System: Allow users to connect custom tools and APIs to the assistant
- Infinsight v2: Support for SQL databases and real-time data streams
- Multi-language Support: NLP and UI support for non-English users
- Team Workspaces: Shared chat sessions and collaborative analytics for teams
- Webhook Integrations: Trigger Hero's AI from Slack, Notion, or GitHub events
- Long-term Memory: Persistent semantic memory across sessions using vector search
Conclusion
Building Hero's AI was one of the most rewarding projects I've worked on. It pushed me to think about reliability, security, user experience, and AI engineering all at once.
The biggest lesson? Don't build on a single point of failure. Whether it's your AI provider, your database, or your auth system, build fallbacks in from day one.
If you're a developer who's ever wanted to build a serious AI product (one that handles real-world edge cases, keeps user data secure, and actually works when APIs go down), I hope Hero's AI gives you some useful ideas and patterns to borrow.
Feel free to check out the repo, open issues, or drop me a message. I'd love to hear your thoughts!
GitHub: https://github.com/Hudsonmathew1910/Hero-s-AI
Contact: hudsonmathew2004@gmail.com
Feel free to explore Hero's AI
If you found this useful, drop a ❤️ and share it with a developer friend. And if you build something inspired by this, I'd genuinely love to see it!
Tags: #python #django #ai #machinelearning #webdev #opensource #showdev #productivity