Hudson Mathew
Hero's AI

🧠 I Built an AI Assistant with Multi-Model Fallback, Voice Chat & a Personal Data Analyst – Here's How

What happens when your AI goes down mid-conversation? You lose users. I built Hero's AI to make sure that never happens – and added a whole lot more along the way.

Live Demo

Have you ever used an AI tool that just... stopped working? Maybe it hit a quota limit, the API went down, or the model just returned nothing. If you're building on top of a single AI provider, you're one outage away from a broken product.

That's the problem I set out to solve with Hero's AI – a full-stack, production-grade AI assistant platform built with Django and Python. But I didn't stop at just "fixing uptime." I ended up building something that handles voice chat, web search, document analysis, and even runs data analytics on uploaded spreadsheets – all in one platform.

Here's a deep dive into how it works, what I built, and what I learned.


🚀 Project Overview

Hero's AI is a personal AI assistant platform that goes way beyond a simple chatbot.

Think of it as your smart digital companion – one that can:

  • Write and debug code
  • Search the web for real-time information
  • Hold voice conversations
  • Analyze uploaded documents
  • Run data analytics on your CSV or Excel files without you writing a single line of code

The platform is built on Django, uses Google Gemini as its primary AI brain, and has a waterfall fallback system through OpenRouter and Groq so users always get a response – even if one provider goes down.
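The core of that reliability story fits in a few lines. Below is a minimal, hypothetical sketch of a waterfall fallback chain: the provider functions are fake stand-ins for the real Gemini, OpenRouter, and Groq clients, but the control flow is the same idea.

```python
# Minimal waterfall-fallback sketch. The provider functions below are
# illustrative stand-ins, not the project's real API clients.

def call_gemini(prompt):
    raise RuntimeError("quota exceeded")  # simulate the primary going down

def call_openrouter(prompt):
    return f"openrouter: {prompt}"

def call_groq(prompt):
    return f"groq: {prompt}"

PROVIDERS = [call_gemini, call_openrouter, call_groq]

def ask(prompt):
    """Try each provider in order; return the first successful response."""
    last_error = None
    for provider in PROVIDERS:
        try:
            return provider(prompt)
        except Exception as exc:  # swallow the failure, move down the chain
            last_error = exc
    raise RuntimeError("all providers failed") from last_error

print(ask("hello"))  # → openrouter: hello
```

Here the simulated Gemini failure is swallowed and the next provider answers, so the caller never sees the outage.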

It also ships with two powerful sub-modules:

  • Baymax – the AI orchestration engine
  • Infinsight – a RAG-powered data analytics layer

🔗 GitHub: https://github.com/Hudsonmathew1910/Hero-s-AI

✨ Key Features

  • 🔄 Multi-model fallback chain – Gemini → OpenRouter (6 models) → Groq (4 models). Zero downtime from API failures.
  • 💬 Text Chat – General conversation with persistent, per-user session history
  • 👨‍💻 Coding Assistant – Code generation, debugging, and refactoring across Python, JS, SQL, and more
  • 🔍 Web Search Mode – Real-time answers using DuckDuckGo + Wikipedia, summarized by Gemini
  • 🎙️ Voice Chat – Short, natural spoken-style responses optimized for text-to-speech
  • 📄 File Analysis – Upload PDFs, DOCX, or TXT files and ask questions about them
  • 📊 Infinsight Analytics – Upload CSV/Excel, ask plain-English questions, get analyst-grade reports
  • 🔐 Encrypted API Key Storage – User keys are Fernet-encrypted server-side, never exposed to the browser
  • 🔑 Google OAuth Login – One-click sign-in alongside traditional email/password
  • 🧠 NLP Intent Detection – Custom regex-based NLP pipeline routes each message to the right handler
  • 🎛️ Custom Instructions – Users can set their name, preferences, and AI tone globally
  • 🕵️ Temporary Chat Mode – Privacy-first mode where conversations aren't saved

🤖 Baymax (The AI Orchestration Module)

Baymax is the core engine of Hero's AI – named after everyone's favorite healthcare companion, because it's always there when you need it.

hero-s-ai.onrender.com

What does Baymax do?

Baymax is an orchestrator class that decides:

  1. Which AI model to use for a given task
  2. What to do when that model fails
  3. How many tokens to allocate based on task complexity

Every user message flows through the NLP layer first (more on that below), and then Baymax picks the right handler and model for the job.
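That dispatch step can be pictured as a simple lookup table. The handler names and model IDs below are hypothetical placeholders, not Baymax's actual routing configuration:

```python
# Hypothetical intent→handler routing table; handler names and model IDs
# are illustrative stand-ins for Baymax's real routing logic.

def handle_coding(message):
    return ("gemini-2.5-flash", f"code answer for: {message}")

def handle_text(message):
    return ("nvidia/nemotron", f"chat answer for: {message}")

HANDLERS = {"coding": handle_coding, "text": handle_text}

def dispatch(intent, message):
    # Unknown intents fall back to general chat rather than erroring out.
    handler = HANDLERS.get(intent, handle_text)
    return handler(message)

model, reply = dispatch("coding", "fix my loop")
```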

Baymax's Key Features

  • Smart Model Routing – Routes coding tasks to Gemini 2.5 Flash, general chat to OpenRouter NVIDIA Nemotron, and so on. The right model for the right task.
  • Waterfall Fallback – If the primary model fails, Baymax silently tries up to 10 fallback models across OpenRouter and Groq. The user never sees an error.
  • Adaptive Token Budgets – Short conversational replies use fewer tokens (faster, cheaper). File analysis and data queries get up to 4,000 tokens for thorough output.
  • Compressed History – Instead of sending the full chat history (expensive!), Baymax loads only 2–6 recent turns depending on task type and compresses them into a single context block.
  • Parallel DB Lookups – API keys, user settings, and chat sessions are fetched in parallel threads to minimize latency.
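The token-budget and history-compression bullets combine naturally into one helper. The specific numbers below are invented placeholders, not the project's real budgets:

```python
# Hypothetical per-task budgets and history windows (illustrative values only).
TOKEN_BUDGETS = {"voice": 300, "text": 1000, "coding": 2000, "file": 4000}
HISTORY_TURNS = {"voice": 2, "text": 4, "coding": 6, "file": 2}

def build_context(task, history):
    """Keep only the most recent turns and flatten them into one text block."""
    turns = history[-HISTORY_TURNS.get(task, 4):]
    block = "\n".join(f"{role}: {text}" for role, text in turns)
    return block, TOKEN_BUDGETS.get(task, 1000)

history = [("user", "hi"), ("ai", "hello"), ("user", "write a loop"),
           ("ai", "done"), ("user", "explain it")]
context, budget = build_context("voice", history)  # 2 turns, 300-token budget
```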

How does it help users?

Users never have to worry about which AI to use or whether it's available. Baymax handles all of that invisibly. You ask a question, you get an answer – always.


📊 Infinsight (The Data Analytics Module)

Infinsight is the feature I'm most proud of. It turns Hero's AI into a personal data analyst that anyone can use – no Python, no SQL, no Excel expertise required.

hero-s-ai.onrender.com/infinsight

What does Infinsight do?

Users upload a CSV, Excel, or PDF file. Then they ask questions in plain English like:

  • "What's the average revenue by region?"
  • "Which customers are most likely to churn?"
  • "Show me a sales trend for the last 6 months."

Infinsight handles the rest.

How it works under the hood

This is where it gets interesting. Infinsight uses a Retrieval-Augmented Generation (RAG) pipeline:

  1. Parse – The file is parsed into text chunks. CSV rows become natural-language summaries; PDFs are chunked page by page.
  2. Embed – Each chunk is converted into a 768-dimensional vector using gemini-embedding-001.
  3. Store – Vectors are stored in a Pinecone vector database under a user-specific namespace.
  4. Retrieve – When a query arrives, the most relevant chunks are fetched via semantic similarity search.
  5. Analyze – The relevant context and dataset schema are passed to Gemini. If the query needs computation (sums, trends, forecasts), Gemini generates Pandas code, which is executed safely in a sandboxed asteval interpreter.
  6. Report – The computed results are passed back to Gemini, which writes a clean, human-readable analyst report.
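The retrieve step is easiest to see with a toy example. The "embedding" below is a fake character-count vector standing in for gemini-embedding-001, and a plain dict stands in for Pinecone, but the cosine-similarity ranking is the same principle:

```python
import math

def embed(text):
    """Fake 3-dim embedding (character counts); a stand-in for real embeddings."""
    return [text.count("a"), text.count("e"), len(text) / 10]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# In-memory stand-in for the Pinecone index (per-user namespaces omitted).
store = {chunk: embed(chunk) for chunk in [
    "average revenue by region", "customer churn rates", "sales trends"]}

def retrieve(query, top_k=1):
    qv = embed(query)
    ranked = sorted(store, key=lambda c: cosine(store[c], qv), reverse=True)
    return ranked[:top_k]

print(retrieve("revenue per region"))  # → ['average revenue by region']
```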

What kind of analysis can it do?

  • 📈 Time-series trend analysis
  • 🧮 RFM (Recency, Frequency, Monetary) customer segmentation
  • 📉 Linear regression forecasting
  • 🔗 Profit-discount correlation analysis
  • 📋 Summary statistics, grouping, filtering
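As a concrete illustration of one of these, here is roughly what an RFM computation looks like in pandas. The table and column names are invented for the example; in Infinsight the equivalent code is generated by Gemini against your actual schema:

```python
import pandas as pd

# Toy transactions table; column names are illustrative, not Infinsight's schema.
tx = pd.DataFrame({
    "customer": ["a", "a", "b", "c", "c", "c"],
    "days_ago": [2, 30, 90, 5, 10, 1],
    "amount":   [50, 20, 10, 100, 80, 60],
})

rfm = tx.groupby("customer").agg(
    recency=("days_ago", "min"),   # days since the most recent purchase
    frequency=("amount", "size"),  # number of purchases
    monetary=("amount", "sum"),    # total spend
)
print(rfm)
```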

The value it delivers

Infinsight removes the biggest barrier to data analysis: you don't need to know how to code. Upload your file, ask your question, get your answer. It's like having a junior data analyst on call 24/7.


🛠 Tech Stack

  • Backend – Django 5.2, Python 3.11, Gunicorn
  • Database – PostgreSQL (Neon cloud)
  • Primary AI – Google Gemini 2.5 Flash / Lite
  • AI Gateway – OpenRouter (100+ LLMs)
  • AI Fallback – Groq (Llama, Qwen, Kimi)
  • NLP – Custom regex engine (Nlp.py)
  • Vector DB – Pinecone
  • Embeddings – gemini-embedding-001 (768 dim)
  • Web Search – DuckDuckGo (ddgs) + Wikipedia
  • File Parsing – pdfplumber, pypdf, python-docx
  • Data Analysis – pandas, numpy, scikit-learn
  • Secure Execution – asteval (sandboxed Python)
  • Encryption – cryptography (Fernet)
  • Auth – Django sessions + Google OAuth 2.0
  • Frontend – Vanilla JS + CSS (no framework)
  • Deployment – Render

โš™๏ธ How It Works โ€” Step by Step

Here's the full flow from the moment you type a message to the moment you get a response:

Step 1 – Authentication
You log in via email/password or Google OAuth. Your session is stored server-side with a 2-week expiry.

Step 2 – API Key Setup
You enter your Gemini and OpenRouter API keys. They're encrypted with Fernet and stored in the database – they never touch the browser again.

Step 3 – Send a Message
You type your message (or use voice) and select a mode: Text, Coding, Web Search, File Analysis, etc.

Step 4 – NLP Pre-processing
Your message is cleaned (noise removed, punctuation normalized) and analyzed for intent. If you selected text mode but included code blocks, the system upgrades the request to coding mode automatically.
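A stripped-down version of that upgrade logic might look like this; the patterns are invented for illustration, and the real Nlp.py pipeline is more thorough:

```python
import re

# Toy regex intent router; patterns are illustrative, not the real Nlp.py rules.
INTENT_PATTERNS = [
    ("coding", re.compile(r"```|\bdef\b|\bfunction\b|\bdebug\b", re.I)),
    ("websearch", re.compile(r"\b(latest|news|today|current)\b", re.I)),
]

def detect_intent(message, selected_mode="text"):
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(message):
            return intent  # e.g. text mode with a code block upgrades to coding
    return selected_mode  # nothing matched: keep the mode the user picked

print(detect_intent("please debug this ```py``` snippet"))  # → coding
```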

Step 5 – Parallel DB Lookups
Your API keys, user settings, and chat session are fetched in parallel threads. This keeps things fast.

Step 6 – History Load
2–6 recent turns from your current chat session are loaded and compressed into a context block.

Step 7 – Baymax Dispatches
The right handler is called: handle_text, handle_coding, handle_websearch, handle_voice_chat, handle_file, etc.

Step 8 – AI Model Call + Fallback
The primary model is called. If it fails → try fallback OpenRouter models → try Groq. A response always comes back.

Step 9 – Response Returned
The AI reply is returned as JSON to the frontend. Non-temporary chats are saved to the database asynchronously (no latency added).

Step 10 – Infinsight Path (for data files)
Upload → Parse → Embed → Store in Pinecone → Retrieve by similarity → Feed to Gemini → Generate Pandas code → Execute in sandbox → Interpret result → Return analyst report.


🖼 Screenshots / Demo

Main Chat UI

Infinsight Analytics Report

Voice Chat Overlay


🚧 Challenges & Solutions

Challenge 1 – Single Point of Failure

Problem: If Gemini's API quota ran out mid-session, the entire platform would break.
Solution: Built a waterfall fallback system – Primary → 6 OpenRouter models → 4 Groq models. If one fails, the next kicks in automatically with zero user-facing errors.

Challenge 2 – Generic AI Responses

Problem: A single model with a single prompt gives mediocre results across different task types.
Solution: Built the NLP intent detection layer to classify every message and route it to a purpose-built handler with a task-specific system prompt and model selection.

Challenge 3 – Secure API Key Management

Problem: Storing API keys in localStorage or plain text in a DB is a security nightmare.
Solution: All keys are encrypted with Fernet symmetric encryption before database storage. They're decrypted server-side only when needed and never returned to the client.
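For readers who haven't used it, Fernet from the cryptography package makes this pattern short. The sketch below simplifies key management: the master key would be loaded from an environment secret in production, not generated per run.

```python
from cryptography.fernet import Fernet

# Sketch only: in a real deployment the master key comes from a secret store,
# never generated at import time like this.
master_key = Fernet.generate_key()
fernet = Fernet(master_key)

user_api_key = "sk-example-123"
token = fernet.encrypt(user_api_key.encode())  # ciphertext stored in the DB
plain = fernet.decrypt(token).decode()         # decrypted server-side on use
```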

Challenge 4 – Data Analysis Without Code

Problem: CSV/Excel analysis requires Python or SQL skills that most users don't have.
Solution: Infinsight's RAG pipeline retrieves relevant data context and has Gemini generate Pandas code on the fly. The code runs in a sandboxed asteval interpreter and the output is translated into a readable report.
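To give a feel for the sandboxing idea without pulling in asteval, here is a stdlib-only stand-in that admits pure arithmetic expressions only (asteval itself is richer and allows whitelisted function calls):

```python
import ast

# Stdlib stand-in for asteval: parse the generated code, reject dangerous
# node types, then evaluate with no builtins. Arithmetic expressions only.
BANNED = (ast.Import, ast.ImportFrom, ast.Attribute, ast.Call)

def safe_eval(expr, names):
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, BANNED):
            raise ValueError(f"disallowed construct: {type(node).__name__}")
    return eval(compile(tree, "<sandbox>", "eval"), {"__builtins__": {}}, names)

print(safe_eval("revenue * 0.1 + tax", {"revenue": 200, "tax": 5}))  # → 25.0
```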

Challenge 5 – Expensive Long Conversations

Problem: Sending full chat history to the LLM on every turn is slow and burns tokens fast.
Solution: History is limited to 2–6 turns depending on task type, compressed into a single block, and paired with adaptive token budgets that scale with task complexity.


๐ŸŒ Use Cases

👩‍🎓 Students & Learners
Get explanations, essay help, coding guidance, and concept breakdowns – all in one place.

👨‍💻 Developers
Generate boilerplate, debug tricky functions, refactor messy code, and get algorithmic suggestions.

📊 Data Analysts
Upload your CSV or Excel file and ask questions in plain English. No pandas, no SQL, no problem.

🔬 Researchers
Use web search mode to get AI-summarized answers with citations from live web results – no stale training data.

🏢 Businesses
Deploy as an internal tool with per-user API key isolation, custom instructions, and persistent chat history.

🧑 General Users
Voice chat, document Q&A, general conversation, web lookups – a true all-in-one assistant.


🔮 Future Improvements

Here's what's on the roadmap:

  • ๐Ÿ–ผ๏ธ Image Generation โ€” Integrate Stable Diffusion or DALLยทE for visual output
  • ๐Ÿ“ฑ Mobile App โ€” A React Native companion app for on-the-go access
  • ๐Ÿ”Œ Plugin System โ€” Allow users to connect custom tools and APIs to the assistant
  • ๐Ÿงฉ Infinsight v2 โ€” Support for SQL databases and real-time data streams
  • ๐ŸŒ Multi-language Support โ€” NLP and UI support for non-English users
  • ๐Ÿ‘ฅ Team Workspaces โ€” Shared chat sessions and collaborative analytics for teams
  • ๐Ÿ“ก Webhook Integrations โ€” Trigger Hero's AI from Slack, Notion, or GitHub events
  • ๐Ÿง  Long-term Memory โ€” Persistent semantic memory across sessions using vector search

🎉 Conclusion

Building Hero's AI was one of the most rewarding projects I've worked on. It pushed me to think about reliability, security, user experience, and AI engineering all at once.

The biggest lesson? Don't build on a single point of failure. Whether it's your AI provider, your database, or your auth system – build fallbacks in from day one.

If you're a developer who's ever wanted to build a serious AI product – one that handles real-world edge cases, keeps user data secure, and actually works when APIs go down – I hope Hero's AI gives you some useful ideas and patterns to borrow.

Feel free to check out the repo, open issues, or drop me a message. I'd love to hear your thoughts! 🚀

🔗 GitHub: https://github.com/Hudsonmathew1910/Hero-s-AI
📧 Contact: hudsonmathew2004@gmail.com


If you found this useful, drop a ❤️ and share it with a developer friend. And if you build something inspired by this – I'd genuinely love to see it!


Tags: #python #django #ai #machinelearning #webdev #opensource #showdev #productivity
