You plug in a USB drive. Double-click a file. A local AI starts talking to you — no installation, no API key, no internet required. When you're done, you pull the drive out. Nothing stays on the host machine.
That's what I built. Here's how you can build one too.
Why a USB-Portable AI?
Cloud AI is powerful but comes with trade-offs: you need internet, your conversations go through someone else's servers, and you can't use it in airgapped environments.
A USB-portable AI solves these:
- Privacy: Everything stays on the drive. No data leaves the machine.
- Portability: Move between computers without installing anything.
- Offline: Works on a plane, in a basement, during an outage.
- Ownership: You control the model, the data, the entire stack.
The catch? You need to bundle an inference engine, a model, and an interface — all on a single drive. Let me walk through each piece.
The Stack
You need three things:
- Inference engine — runs the model (Ollama)
- Model weights — the actual AI (a quantized LLM, 2-5GB)
- Interface — how the user interacts (web app or Electron)
Picking the Inference Engine
Ollama is ideal for this because:
- Single binary (~43MB for Linux, ~84MB for Windows)
- Respects the OLLAMA_HOME and OLLAMA_MODELS environment variables
- Can point its model storage at any directory (including a USB)
The key insight: set OLLAMA_HOME and OLLAMA_MODELS to paths on the USB drive before launching Ollama. It'll use the bundled model instead of downloading one.
# Linux/Mac launcher (simplified)
export OLLAMA_HOME="$(pwd)/ollama"
export OLLAMA_MODELS="$(pwd)/ollama/models"
./ollama/ollama-linux serve &
:: Windows launcher (simplified)
set OLLAMA_HOME=%~dp0ollama
set OLLAMA_MODELS=%~dp0ollama\models
start /b "" "%~dp0ollama\ollama.exe" serve
Bundling the Model
Pull your model on a dev machine, then copy the blobs directory:
ollama pull llama3.2:3b
# Models live in ~/.ollama/models/ (Linux/Mac) or %USERPROFILE%\.ollama\models\ (Windows)
mkdir -p /path/to/usb/ollama
cp -r ~/.ollama/models /path/to/usb/ollama/   # copy the directory itself to avoid a nested models/models/
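It's worth a quick sanity check before calling it bundled: point a server at the USB copy and make sure the model shows up. A minimal sketch, assuming no other Ollama instance is holding port 11434:
# Serve from the USB copy, then ask the local API what models it sees
OLLAMA_MODELS=/path/to/usb/ollama/models ollama serve &
sleep 2
ollama list   # should list llama3.2:3b, now backed by the USB blobs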
A 3B parameter model at Q4 quantization is about 2GB — comfortable on a 16GB+ drive. You can go up to 7-8B on a 64GB drive with room for documents.
The Interface: Two Options
Option A: Web app — Bundle Node.js + your server. The AI opens in the user's default browser. Lighter weight, works everywhere.
Option B: Electron app — Bundle a desktop application. Better UX, but the binary alone is 170MB+ on Windows.
I went with both: Electron for Windows (polished experience), web mode for Linux/Mac (smaller footprint since you can't easily cross-compile Electron).
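Whichever option you pick, the interface is a thin layer over the same thing: HTTP calls to Ollama's local API. Here's the core request every frontend ends up making, shown with curl against the bundled llama3.2:3b:
# The request every interface boils down to: POST a prompt to the local API
curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "Say hello.", "stream": false}'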
The Launcher Pattern
The launcher is the most important piece. It needs to:
- Detect its own location (the USB drive path; see the skeleton after this list)
- Set environment variables pointing to the USB
- Start Ollama with a health check loop
- Start the interface once Ollama is ready
- Clean up on exit
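Steps 1, 2, and 5 look roughly like this on Linux/Mac. A sketch that resolves everything from the script's own location instead of the current working directory:
#!/usr/bin/env bash
# Resolve the launcher's own directory, so it works from any mount point
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
export OLLAMA_HOME="$SCRIPT_DIR/ollama"
export OLLAMA_MODELS="$SCRIPT_DIR/ollama/models"

# Kill Ollama when the launcher exits, so nothing keeps running on the host
cleanup() { kill "$OLLAMA_PID" 2>/dev/null; }
trap cleanup EXIT

"$SCRIPT_DIR/ollama/ollama-linux" serve &
OLLAMA_PID=$!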
Here's the health check pattern — don't just sleep and hope Ollama is ready:
# Wait for Ollama to respond (up to 30s)
for i in $(seq 1 30); do
if curl -s http://127.0.0.1:11434/api/tags >/dev/null 2>&1; then
echo "Ollama ready."
break
fi
sleep 1
done
On Windows, use curl (bundled since Windows 10 version 1803) or PowerShell:
:: Poll the local API once per second, up to 30 attempts
set ATTEMPTS=0
:check_loop
set /a ATTEMPTS+=1
if %ATTEMPTS% gtr 30 goto :start_anyway
curl -s -o nul http://127.0.0.1:11434/api/tags >nul 2>nul
if %errorlevel% equ 0 goto :ollama_ready
timeout /t 1 /nobreak >nul
goto :check_loop
Cross-Platform File System
This was the hardest decision. Your options:
| Format | Windows | Mac | Linux | Max File Size |
|---|---|---|---|---|
| FAT32 | Native | Native | Native | 4GB (deal-breaker) |
| exFAT | Native | Native | Native | 128PB |
| NTFS | Native | Read-only* | Read/write | 16TB |
*Out of the box; writing NTFS on macOS requires a third-party driver.
FAT32 is out — model files exceed 4GB. exFAT is tempting (universal read/write), but NTFS gives you file permissions and hidden attributes on Windows.
I went with NTFS: Windows gets the best experience (it's the target platform), Linux handles it fine through ntfs-3g, and Mac can at least read it. The Linux launcher runs the AI as a web app, so it doesn't need to write to the NTFS partition anyway.
Adding Persistence
The AI should remember conversations across sessions. The key: store all state in a data/ directory on the USB.
/Cinder/
/data/
/memory/ # Conversation history, embeddings
/identity/ # Personality, preferences
vault.hc # Encrypted container (VeraCrypt)
/ollama/
/models/ # Model weights
ollama.exe # Inference engine
/Windows/ # Electron app
/Linux/ # Node.js binary
For sensitive data, I added a VeraCrypt container that auto-mounts on launch. The launcher checks for VeraCrypt (bundled as portable), prompts for the password, and mounts it as a drive letter:
if exist "%VAULT_FILE%" (
"%VC_EXE%" /v "%VAULT_FILE%" /l V /q /s
if exist "V:\" echo Vault mounted on V:\
)
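The mirror image matters just as much: dismounting on exit, so no mapped drive lingers on the host. VeraCrypt's dismount switch handles it:
:: On shutdown: dismount the vault quietly so nothing stays behind
"%VC_EXE%" /d V /q /s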
Practical Lessons
1. The .env trap: Don't hardcode absolute paths in config files. The USB could be D:\, E:\, or /media/user/MYDRIVE. Always resolve paths relative to the launcher script.
2. Model selection matters: Bigger isn't always better for USB. A well-tuned 3B model with a custom Modelfile (see the sketch after this list) gives better personality than a generic 7B. Your users won't have 64GB RAM.
3. First-run experience: On first launch, VeraCrypt Portable needs to extract itself. Ollama needs a moment to load the model. Handle these gracefully with progress messages, not silence.
4. Bundle the Node binary: Don't assume Node.js is installed. For Linux, a static Node binary (~96MB) means zero dependencies. Worth the space.
5. Test the actual flow: It's easy to test components individually and miss that the launcher → Ollama → server → frontend chain breaks when paths have spaces or the drive letter changes.
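On lesson 2, a Modelfile is how the personality gets baked in before anything is copied to the USB. A minimal sketch; the model name and persona text here are illustrative, not the ones I ship:
# Build a persona'd model on the dev machine, then copy it like any other
cat > Modelfile <<'EOF'
FROM llama3.2:3b
PARAMETER temperature 0.7
SYSTEM You are a portable assistant. Be concise, and never assume internet access.
EOF
ollama create cinder-3b -f Modelfile   # lands in ~/.ollama/models like a pull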
What's Possible
Once you have the base working, you can add:
- Dashboard widgets: weather, time, news (fetched when online, gracefully absent when offline)
- Document ingestion: drop PDFs into a folder, the AI reads them on next launch (sketched after this list)
- Encrypted vault: private files that travel with the AI
- Skill/personality system: the AI grows and adapts over sessions
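The ingestion one is simpler than it sounds: a pre-launch sweep over an inbox folder. A hypothetical sketch, assuming a bundled pdftotext binary and the data/ layout above (paths and names are illustrative):
# Hypothetical ingestion pass: convert any new PDFs to text the AI can load
for pdf in "$SCRIPT_DIR"/data/inbox/*.pdf; do
  [ -e "$pdf" ] || continue                      # folder may be empty
  txt="$SCRIPT_DIR/data/memory/$(basename "${pdf%.pdf}").txt"
  [ -e "$txt" ] || ./bin/pdftotext "$pdf" "$txt" # skip already-ingested files
done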
The key principle: everything on the drive, nothing on the host. The user should be able to walk away with their entire AI relationship in their pocket.
Getting Started
- Get a 32GB+ USB drive (USB 3.0 — speed matters)
- Install Ollama on a dev machine
- Pull a small model (ollama pull llama3.2:3b or phi3:mini)
- Set up a basic Express + React frontend (or fork an existing one)
- Write the launcher scripts
- Copy everything to the USB
- Test on a different machine
The whole stack — engine, model, server, interface — fits in about 7GB. The rest is yours for documents and memories.
This is a real system I built and use. If you're interested in the broader autonomous AI project behind it, check out meridian-ai on Dev.to.