Mohammed Ayaan Adil Ahmed
Moving LLMs to the Edge: Building a Private AI Study Companion with Llama 3

Most AI tutors are just wrappers around an API. When my teammate Ahmed Mohammed Ayaan Adil and I sat down to build Brain Dump, we wanted to solve two specific problems: the statelessness of current AI tools, and the high cost and privacy concerns of cloud-based learning.

🧠 The Core Concept: The "Living Knowledge File"

Instead of just chatting, Brain Dump acts as a distillation engine. It converts messy, long-form learning conversations into a structured, personal Knowledge File.

Think of it as your brain’s notes, but automatically organized and refined by AI as you learn. It doesn't just "forget" the context after a session; it builds a persistent map of what you actually know.
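To make the idea concrete, here is a minimal sketch of what one Knowledge File entry might look like. The field names and topic are illustrative assumptions, not Brain Dump's actual schema:

```python
import json
from datetime import date

# Hypothetical structure of a distilled "Knowledge File" entry.
# Every field name here is illustrative, not the app's real schema.
knowledge_file = {
    "topic": "Binary Search",
    "concepts": [
        {"name": "invariant",
         "summary": "The target, if present, always lies within [lo, hi]."},
        {"name": "midpoint",
         "summary": "Use lo + (hi - lo) // 2 to avoid overflow in fixed-width languages."},
    ],
    "hiccups": ["off-by-one errors when shrinking the hi bound"],
    "last_reviewed": date.today().isoformat(),
}

# Persist the distilled notes so they survive across sessions.
with open("knowledge_notes.json", "w") as f:
    json.dump(knowledge_file, f, indent=2)
```

The point is persistence: instead of a transcript that vanishes, each session updates a small structured file the app can reload later.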

🛠️ The Tech Stack

We focused on local execution to keep the data where it belongs—with the user.

  • The Orchestrator: FastAPI and LangChain.
  • The Hardware Edge: Optimized for NPU (Neural Processing Unit) integration to offload LLM tasks from the CPU.
  • Local LLM: We used the ROCm stack to run Llama 3 8B locally, ensuring low latency without a subscription fee.
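For readers who want to try a similar setup, here is a stdlib-only sketch of querying a locally served Llama 3 8B model. It assumes an Ollama server on its default port (`localhost:11434`) with the `llama3:8b` tag pulled; swap in whatever endpoint your ROCm serving stack exposes:

```python
import json
import urllib.request

# Build a request against Ollama's /api/generate endpoint.
# The host, port, and model tag are assumptions about your local setup.
def build_request(prompt: str, model: str = "llama3:8b") -> urllib.request.Request:
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Send the prompt and return the generated text.
def ask_local_llm(prompt: str) -> str:
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires the local server to be running):
# print(ask_local_llm("Explain gradient descent in two sentences."))
```

Because the request goes to localhost, nothing the student types ever leaves the machine.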

Why the Edge?

Running locally reduces the marginal cost per user to near-zero. More importantly, it ensures that a student's learning process—including their specific "hiccups" and knowledge gaps—stays private on their own machine rather than being fed back into a corporate training set.


⚡ Key Feature: Hiccup Detection & Pathway Engine

We didn't want a passive chatbot that just nods along. We built a custom Hiccup Detection Chain.

When the system detects a gap in prerequisite knowledge (a "hiccup"), it doesn't just re-explain the current topic. Instead, it:

  1. Pauses the current lesson flow.
  2. Generates a targeted 10-minute micro-learning pathway to fix the specific misunderstanding.
  3. Resumes the main topic only once the foundational gap is bridged.
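The pause/detour/resume loop above can be sketched as a small state machine. The detection heuristic and the pathway timings below are placeholders, not the actual Hiccup Detection Chain:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class LessonSession:
    topic: str
    paused_topic: Optional[str] = None
    pathway: List[str] = field(default_factory=list)

    def detect_hiccup(self, student_answer: str,
                      prerequisites: Dict[str, str]) -> Optional[str]:
        # Toy heuristic: flag the first prerequisite whose keyword
        # the student's answer never mentions. The real chain would
        # use the LLM itself to judge the answer.
        for concept, keyword in prerequisites.items():
            if keyword not in student_answer.lower():
                return concept
        return None

    def start_micro_pathway(self, gap: str) -> List[str]:
        # Step 1: pause the current lesson. Step 2: build a short
        # remedial plan targeting the specific gap.
        self.paused_topic, self.topic = self.topic, gap
        self.pathway = [
            f"2 min: one-paragraph refresher on {gap}",
            f"5 min: two worked examples of {gap}",
            f"3 min: quick self-check quiz on {gap}",
        ]
        return self.pathway

    def resume(self) -> str:
        # Step 3: return to the main topic once the gap is bridged.
        self.topic, self.paused_topic = self.paused_topic, None
        self.pathway = []
        return self.topic
```

The key design choice is that the detour is a first-class state, so the main lesson is never lost, only shelved.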

💡 Reflections

Optimizing a local LLM to handle real-time distillation was a massive technical win. It proved that we are moving toward a world where powerful, personalized AI doesn't require a constant "umbilical cord" to a cloud provider.

Check out the code here:

📚 Study Companion — Beginner's Guide

A smart study chatbot that helps you learn topics, tracks what you know, and gives you a step-by-step plan when you're stuck.


🧠 What Does This App Do?

You type questions or topics you're studying. The app:

  • Answers your questions like a tutor
  • Automatically saves concepts and definitions you've learned
  • Gives you a 10-minute action plan when you say "I'm stuck"
  • Lets you export your notes to Markdown, Anki flashcards, or Notion
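The export step in the last bullet can be sketched in a few lines. The note fields (`concept`, `definition`) are illustrative assumptions about the saved-notes format; Anki's importer accepts tab-separated front/back pairs, which is what the second function emits:

```python
from typing import Dict, List

# Turn saved notes into a Markdown document.
def export_markdown(notes: List[Dict[str, str]]) -> str:
    lines = ["# My Study Notes", ""]
    for note in notes:
        lines.append(f"## {note['concept']}")
        lines.append(note["definition"])
        lines.append("")
    return "\n".join(lines)

# Turn saved notes into Anki-importable TSV (front<TAB>back per line).
def export_anki_tsv(notes: List[Dict[str, str]]) -> str:
    return "\n".join(f"{n['concept']}\t{n['definition']}" for n in notes)
```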

📁 What Each File Does

| File | What it is |
| --- | --- |
| `app.py` | The entire app — all the code lives here |
| `.env` | Your secret API key — never share this |
| `.gitignore` | Tells git which files to NOT upload to GitHub |
| `requirements.txt` | List of libraries the app needs to run |
| `knowledge_notes.json` | Auto-created when you run the app — stores your saved notes |

⚙️ How to Set It Up (First Time)

Step 1 — Install Python


How are you integrating local LLMs into your workflow? Let's discuss in the comments!
