DEV Community

Konstantinos
I Built a Real JARVIS in Python with Knowledge Graphs, BERT Emotion Detection, Face Recognition and NASA API

Ever watched Iron Man and thought — could I actually build that? I did, and after months of work, here's what I ended up with.
JARVIS_AI is a modular personal voice assistant that goes beyond the typical "hey computer, play music" projects. Instead of hardcoded if/else command matching, it's built around a personal Knowledge Graph that stores and retrieves facts about you, a BERT model that reads your emotional state, and an OpenCV pipeline that knows when you're physically at your screen.
Here's the GitHub: https://github.com/Konstantinos123456789/JARVIS_AI

Why I Built It This Way
Most voice assistant projects I found online were basically giant if/elif blocks. You say "what's the time" and it matches a string. That's not an assistant — that's a fancy dictionary lookup.
I wanted something that could:

Remember things about me across sessions
Understand how I'm feeling, not just what I'm saying
Know when I'm present without me having to wake it up

That led me to three core architectural decisions that make this project different.

  1. Personal Knowledge Graph (NetworkX)
    Instead of storing personal info in a flat config file, JARVIS uses a graph structure built with NetworkX. Your name, age, preferences, relationships, favorite movies, cuisine, books — all stored as nodes and edges.
```python
import networkx as nx

G = nx.Graph()
G.add_node("User", type="Person", name="YOUR_NAME", age="YOUR_AGE")
G.add_node("FavoriteMovie", type="Media", title="Inception")
G.add_edge("User", "FavoriteMovie", type="likes")
```

    When you ask "what is my favorite movie?", JARVIS traverses the graph to find the answer rather than looking up a hardcoded variable. This makes it trivially easy to add new facts and relationships — just add nodes and edges.
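To make that traversal concrete, here's a minimal sketch of how a "what is my favorite movie?" query can walk the graph. `liked_media` is a hypothetical helper for illustration, not code from the repo:

```python
import networkx as nx

G = nx.Graph()
G.add_node("User", type="Person", name="Konstantinos")
G.add_node("FavoriteMovie", type="Media", title="Inception")
G.add_edge("User", "FavoriteMovie", type="likes")

def liked_media(graph, person="User"):
    """Return titles of Media nodes connected to `person` via a 'likes' edge."""
    titles = []
    for neighbor in graph.neighbors(person):
        edge = graph.get_edge_data(person, neighbor)
        node = graph.nodes[neighbor]
        if edge.get("type") == "likes" and node.get("type") == "Media":
            titles.append(node["title"])
    return titles

print(liked_media(G))  # ['Inception']
```

Answering a new kind of question ("who is my sister?", "what cuisine do I like?") is then just a different edge type to filter on, with no new parsing code.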

  2. BERT for Emotion & Intent Detection
    JARVIS uses BERT via HuggingFace Transformers combined with NLTK VADER for sentiment analysis. This means it doesn't just classify your command — it tries to understand your emotional state from how you're speaking.
    If you sound frustrated, JARVIS responds differently than if you sound curious or happy. It's a small touch but it makes the interaction feel much more natural.

  3. Face & Gesture Recognition (OpenCV)
    Using OpenCV, JARVIS detects when you sit down in front of your screen via webcam. It knows when you're present and can greet you automatically, and knows when you've walked away. No need to say a wake word — it's context-aware.
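The tricky part of webcam presence detection isn't the detector itself (a stock Haar cascade via `cv2.CascadeClassifier` handles that) but debouncing noisy per-frame results into clean arrived/left events. Here's a cascade-agnostic sketch of that state machine; `PresenceTracker` is an illustrative name, not from the repo:

```python
class PresenceTracker:
    """Debounce per-frame face detections into arrive/leave events.

    Feed it one boolean per webcam frame (face seen or not); it emits
    'greet' after `arrive_frames` consecutive hits and 'away' after
    `leave_frames` consecutive misses, ignoring single-frame flicker.
    """

    def __init__(self, arrive_frames=5, leave_frames=30):
        self.arrive_frames = arrive_frames
        self.leave_frames = leave_frames
        self.present = False
        self._hits = 0
        self._misses = 0

    def update(self, face_seen):
        if face_seen:
            self._hits += 1
            self._misses = 0
            if not self.present and self._hits >= self.arrive_frames:
                self.present = True
                return "greet"
        else:
            self._misses += 1
            self._hits = 0
            if self.present and self._misses >= self.leave_frames:
                self.present = False
                return "away"
        return None
```

In the main loop you'd call `tracker.update(len(faces) > 0)` once per captured frame and speak a greeting only when it returns `"greet"`, so a single missed detection doesn't make JARVIS think you left the room.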

What It Can Do
Here's the full feature list:

🧠 Personal Knowledge Graph — remembers your preferences, birthdays, relationships
💬 BERT emotion detection — understands your mood from speech
👁️ Face & gesture recognition — knows when you're at the screen
🌌 NASA integration — space news, asteroid info, Astronomy Picture of the Day
📈 AI stock recommendations — tell it your investment goals, get suggestions
🔍 Voice-controlled search — Wikipedia, Google, YouTube, all hands-free
📝 Note taking — create and save notes by voice
🗓️ Special days tracker — remembers birthdays and anniversaries
📸 Screenshot capture by voice
💻 System monitoring — CPU usage, IP address
🌐 Chrome tab automation — open, close, switch tabs by voice
🔊 Fully hands-free operation
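The NASA features boil down to authenticated GET requests. Here's a stdlib-only sketch for the Astronomy Picture of the Day endpoint; `DEMO_KEY` is NASA's public rate-limited key, and `apod_url`/`fetch_apod` are illustrative names rather than the repo's actual functions:

```python
import json
import urllib.parse
import urllib.request

NASA_APOD = "https://api.nasa.gov/planetary/apod"

def apod_url(api_key="DEMO_KEY", date=None):
    """Build the Astronomy Picture of the Day request URL."""
    params = {"api_key": api_key}
    if date:
        params["date"] = date  # YYYY-MM-DD
    return NASA_APOD + "?" + urllib.parse.urlencode(params)

def fetch_apod(api_key="DEMO_KEY"):
    """Fetch today's APOD metadata: title, explanation, image URL."""
    with urllib.request.urlopen(apod_url(api_key), timeout=10) as resp:
        return json.load(resp)
```

JARVIS can then read the `title` and `explanation` fields aloud via pyttsx3 when you ask for today's space picture.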

Tech Stack:

  • Language: Python 3.8+
  • NLP / Intent: BERT (HuggingFace Transformers)
  • Sentiment: NLTK VADER
  • Knowledge Graph: NetworkX
  • Computer Vision: OpenCV
  • Voice Input: Google Speech Recognition
  • Voice Output: pyttsx3
  • Space Data: NASA API
  • Local LLM (optional): Ollama
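Because Ollama is optional, the integration has to degrade gracefully when the server isn't running. A sketch against Ollama's documented `/api/generate` endpoint (`ask_llm` and the fallback string are illustrative, not the repo's code):

```python
import json
import urllib.request
from urllib.error import URLError

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_llm(prompt, model="llama3", url=OLLAMA_URL,
            fallback="Sorry, my local brain is offline."):
    """Query a local Ollama server; fall back gracefully if it isn't running."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)["response"]
    except (URLError, OSError, KeyError):
        return fallback  # Ollama not installed/running: stay usable
```

This keeps the assistant fully functional without a GPU or a local model; the LLM just adds open-ended conversation on top of the structured commands.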

Current Limitations
I want to be upfront about where this project stands:

Windows-only right now — uses os.startfile and taskkill which are Windows-specific. Cross-platform support is the next big goal.
Requires minimum 8GB RAM due to BERT model size
Ollama integration is built but optional — you need Ollama running locally to use it
Needs a microphone and optionally a webcam
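For anyone tackling the cross-platform goal, the `os.startfile` dependency can be isolated behind a small dispatcher. A sketch, assuming the standard per-OS launchers (`opener_command`/`open_path` are illustrative names):

```python
import os
import subprocess
import sys

def opener_command(platform=sys.platform):
    """Return the OS-native file launcher, or None where os.startfile applies."""
    if platform.startswith("win"):
        return None            # Windows: use os.startfile directly
    if platform == "darwin":
        return ["open"]        # macOS
    return ["xdg-open"]        # Linux / BSD desktops

def open_path(path):
    """Open a file or URL with the platform's default application."""
    cmd = opener_command()
    if cmd is None:
        os.startfile(path)     # only exists on Windows
    else:
        subprocess.run(cmd + [path], check=True)
```

`taskkill` would need the same treatment (e.g. `pkill` or `psutil.Process.terminate()` elsewhere), but funneling every platform-specific call through one module makes the Linux/macOS port mostly mechanical.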

What's Next

Cross-platform support (Linux & macOS)
GUI interface
Cloud sync for the knowledge graph
Multi-language support
Smart home device integration

Try It Out
The repo has a full setup guide, .env.example, and a first release ready to go.
👉 https://github.com/Konstantinos123456789/JARVIS_AI
I'd love feedback — especially on the Knowledge Graph architecture. Is NetworkX the right choice for this use case? Would you have done it differently? Drop a comment below.
