Hey devs!
I'm working solo on a project called Azura and I'd love blunt technical + product feedback before I go too deep.
TL;DR
- Local-first personal AI assistant (Windows / macOS / Linux)
- Runs 7B-class models locally on your own machine
- Optional cloud inference with 70B+ models (potentially up to ~120B if I can get a GPU cluster cheap enough)
- Cloud only sees temporary context for a given query, then it's gone
- Goal: let AI work with highly personalized data while keeping your data on-device and making AI compute more sustainable by offloading work to the user's hardware
Think of it as Signal, but for AI:
- private by default
- transparent about what leaves your device
- and actually usable as a daily "second brain".
Problem I'm trying to solve
Most AI tools today:
- ship all your prompts and files to a remote server
- keep embeddings / logs indefinitely
- centralize all compute in big datacenters
That's bad if you want to:
- use AI on sensitive data (legal docs, internal company info, personal notes)
- build a long-term memory of your life and work
- not rely 100% on someone else's infrastructure for every tiny inference
On top of that, current AI usage is very cloud-heavy. Every small task hits a GPU in a datacenter, even when a smaller local model would be good enough.
Azura's goal:
Let AI work deeply with your personal data while keeping that data on your device by default, and offload as much work as possible to the user's hardware to make AI more sustainable.
Core concept
Azura has two main execution paths:
- Local path (default)
  - Desktop app (Win / macOS / Linux)
  - Local backend (Rust / llama.cpp / vector DB)
  - Uses a 7B model running on your machine (rough sketch of a local call below)
  - Good for:
    - day-to-day chat
    - note-taking / journaling
    - searching your own docs/files
    - "second brain" queries that don't need super high IQ
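To make the local path a bit more concrete, here's a minimal sketch of what the local call could look like, assuming a llama.cpp `llama-server` instance running a 7B model on its default port (8080) and the `reqwest` + `serde_json` crates. `ask_local` and the parameter values are placeholders for illustration, not Azura's actual API.

```rust
// Minimal sketch of the local path: query a llama.cpp `llama-server`
// running a 7B model on localhost. Assumed Cargo dependencies:
//   reqwest = { version = "0.12", features = ["blocking", "json"] }
//   serde_json = "1"
// `ask_local` is an illustrative name, not Azura's real API.
use serde_json::{json, Value};

fn ask_local(prompt: &str) -> Result<String, Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    // llama-server exposes a /completion endpoint (default port 8080).
    let resp: Value = client
        .post("http://127.0.0.1:8080/completion")
        .json(&json!({
            "prompt": prompt,
            "n_predict": 256,     // cap the local generation length
            "temperature": 0.7,
        }))
        .send()?
        .json()?;
    // The generated text comes back in the "content" field.
    Ok(resp["content"].as_str().unwrap_or_default().to_string())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let answer = ask_local("Summarize my notes from yesterday in two sentences.")?;
    println!("{answer}");
    Ok(())
}
```

The real backend might call llama.cpp bindings directly instead of going over HTTP; the server endpoint is just the easiest thing to show in a few lines.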
- Cloud inference path (optional)
  - When a query is too complex / heavy for the local 7B (rough routing sketch below):
    - Azura builds a minimal context (chunks of docs, metadata, etc.)
    - Sends that context + query to a 70B+ model in the cloud (ideally up to ~120B later)
  - Data handling:
    - Files / context are used only temporarily for that request
    - Held in memory or short-lived storage just long enough to run the inference
    - Then discarded: no long-term cloud memory of your life (sketch of the ephemeral flow below)
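For the routing decision itself, here's a rough, std-only sketch of the kind of heuristic I'm playing with: estimate how heavy a query is, and only build a minimal cloud payload when the local 7B probably isn't enough. `Route`, `CloudRequest`, and the thresholds are all made-up placeholders for illustration.

```rust
// Rough sketch of the local-vs-cloud routing heuristic (std-only).
// All types, names, and thresholds here are illustrative placeholders.

/// Where a query should be executed.
enum Route {
    Local, // 7B model on the user's machine
    Cloud, // 70B+ model, minimal context only
}

/// The only thing that ever leaves the device: the query plus a
/// minimal, explicitly selected set of context chunks.
struct CloudRequest {
    query: String,
    context_chunks: Vec<String>, // top-k retrieved chunks, nothing more
}

/// Very rough token estimate (~4 characters per token for English text).
fn estimate_tokens(text: &str) -> usize {
    text.len() / 4
}

/// Decide whether the local 7B is likely good enough.
fn route(query: &str, retrieved_chunks: &[String]) -> Route {
    let ctx_tokens: usize = retrieved_chunks.iter().map(|c| estimate_tokens(c)).sum();
    let query_tokens = estimate_tokens(query);

    // Placeholder heuristics: long multi-step prompts or large contexts
    // go to the big model; everything else stays local.
    let looks_heavy = query_tokens > 300
        || ctx_tokens > 3_000
        || query.to_lowercase().contains("step by step");

    if looks_heavy { Route::Cloud } else { Route::Local }
}

/// Build the minimal payload for the cloud path.
fn build_cloud_request(query: &str, retrieved_chunks: &[String], top_k: usize) -> CloudRequest {
    CloudRequest {
        query: query.to_string(),
        // Only the top-k most relevant chunks are included; the rest of
        // the user's data never leaves the device.
        context_chunks: retrieved_chunks.iter().take(top_k).cloned().collect(),
    }
}

fn main() {
    let chunks = vec!["Meeting notes from March: pricing discussion with Company A...".to_string()];
    match route("What did I do for project A in March?", &chunks) {
        Route::Local => println!("-> answer locally with the 7B"),
        Route::Cloud => {
            let req = build_cloud_request("What did I do for project A in March?", &chunks, 8);
            println!("-> send {} context chunks to the cloud model", req.context_chunks.len());
        }
    }
}
```

The property I care about is that `build_cloud_request` is a single choke point: whatever it doesn't put into `context_chunks` never leaves the device.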
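And on the data-handling side, this is roughly the lifecycle I want the cloud worker to enforce: the request context lives only in memory for one inference call and is dropped right after. Again a std-only sketch with made-up names (`EphemeralContext`, `run_cloud_inference`), not the real implementation.

```rust
// Sketch of the "ephemeral context" idea on the cloud worker side.
// Names are placeholders; the point is the lifecycle, not the API.

/// Context for exactly one inference request. It is never written to
/// disk and is dropped as soon as the request finishes.
struct EphemeralContext {
    query: String,
    context_chunks: Vec<String>,
}

impl Drop for EphemeralContext {
    fn drop(&mut self) {
        // Drop the contents explicitly. A crate like `zeroize` could
        // actually overwrite the bytes, which `clear()` alone does not;
        // this is only meant to show the intent.
        self.query.clear();
        self.context_chunks.clear();
    }
}

/// Run one inference and return only the answer. The context is moved
/// into this function and dropped when it returns, so nothing about the
/// request outlives the call.
fn run_cloud_inference(ctx: EphemeralContext) -> String {
    // Placeholder for the actual 70B call.
    format!(
        "(answer to '{}' based on {} context chunks)",
        ctx.query,
        ctx.context_chunks.len()
    )
} // <- ctx dropped here; no logs, no embeddings, no copies kept

fn main() {
    let ctx = EphemeralContext {
        query: "Summarize these contract clauses".to_string(),
        context_chunks: vec!["Clause 1 ...".to_string(), "Clause 2 ...".to_string()],
    };
    let answer = run_cloud_inference(ctx);
    println!("{answer}");
}
```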
Context engine (high-level idea)
On top of "just call an LLM", I'm experimenting with a structured context engine:
- Ingests:
  - files, PDFs, notes, images
- Stores:
  - embeddings + metadata (timestamps, tags, entities, locations)
- Builds:
  - a lightweight relationship graph (people, projects, events, topics)
- Answers:
  - "What did I do for project A in March?"
  - "Show me everything related to 'Company A' and 'pricing'."
  - "What did I wear at the gala in Tokyo?" (from ingested images)
Standard RAG is part of this, but the goal is an ongoing personal knowledge base that the LLM can query, not just a vector search API.
All of this long-term data lives on-device. (A rough schema sketch follows below.)
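To give a feel for what I mean by "structured", here's a rough, std-only sketch of the kind of on-device schema I have in mind: chunks with embeddings plus metadata, extracted entities, and typed edges between them. All names and fields are placeholders, not the final design.

```rust
// Rough sketch of the on-device context-engine schema (std-only).
// All names and fields are illustrative placeholders.

/// One ingested chunk: a slice of a file, note, PDF page, or image caption.
struct Chunk {
    id: u64,
    source_path: String,  // where it came from on disk
    text: String,         // extracted text (or image description)
    embedding: Vec<f32>,  // vector for similarity search
    timestamp: u64,       // unix seconds, for "in March" style queries
    tags: Vec<String>,    // user- or system-assigned tags
}

/// An entity extracted from the chunks: a person, project, event, place...
struct Entity {
    id: u64,
    name: String, // e.g. "Company A", "gala in Tokyo"
    kind: EntityKind,
}

enum EntityKind {
    Person,
    Project,
    Event,
    Topic,
    Location,
}

/// A typed edge in the lightweight relationship graph,
/// e.g. (me) --attended--> (gala in Tokyo).
struct Edge {
    from: u64,                 // Entity id
    to: u64,                   // Entity id
    relation: String,          // "attended", "works_on", "mentions", ...
    evidence_chunks: Vec<u64>, // Chunk ids that support this edge
}

/// The whole knowledge base lives on-device; the LLM queries it,
/// it is never uploaded wholesale.
struct KnowledgeBase {
    chunks: Vec<Chunk>,
    entities: Vec<Entity>,
    edges: Vec<Edge>,
}

/// Example of the kind of question this enables:
/// "Show me everything related to 'Company A' and 'pricing'."
fn chunks_about(kb: &KnowledgeBase, tag_a: &str, tag_b: &str) -> Vec<u64> {
    kb.chunks
        .iter()
        .filter(|c| c.tags.iter().any(|t| t.as_str() == tag_a) && c.tags.iter().any(|t| t.as_str() == tag_b))
        .map(|c| c.id)
        .collect()
}

fn main() {
    let kb = KnowledgeBase { chunks: vec![], entities: vec![], edges: vec![] };
    let hits = chunks_about(&kb, "Company A", "pricing");
    println!("{} matching chunks", hits.len());
}
```

The point is that the example questions above become lookups over this structure plus vector search, rather than vector search alone.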
Sustainability angle (important to me)
Part of the vision is:
- Don't hit a giant GPU cluster for every small query.
- Let the userās device handle as much as possible (7B locally).
- Use big cloud models only when they actually add value.
Over time, I'd like Azura to feel like a hybrid compute layer:
- Local where possible,
- Cloud only for heavy stuff,
- Always explicit and transparent,
- And most of all, PRIVATE. (A rough policy sketch follows below.)
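To be concrete about "explicit and transparent": I want the offload policy to be something the user literally picks and can audit. A placeholder sketch of what that could look like, nothing about it is final:

```rust
// Placeholder sketch of a user-facing cloud-offload policy and its
// local transparency log. Names and fields are illustrative only.

/// How Azura is allowed to use cloud inference. Chosen by the user,
/// shown in settings, never changed silently.
enum CloudPolicy {
    Never,              // fully local, period
    AskEveryTime,       // show exactly what would be sent, then ask
    AutoWithVisibleLog, // allowed, but every call is logged locally
}

/// One entry in the locally stored transparency log.
struct CloudCallLog {
    timestamp: u64,
    chunk_count: usize, // how many context chunks were sent
    bytes_sent: usize,  // total payload size
    model: String,      // which remote model handled it
}

fn main() {
    // Default to the most conservative option until the user opts in.
    let _policy = CloudPolicy::Never;
    let _log: Vec<CloudCallLog> = Vec::new();
}
```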
What I'd love feedback on
- Architecture sanity
  - Does the "local-first + direct cloud inference" setup look sane?
  - Any better patterns you've used for mixing on-device models with cloud models?
- Security + privacy model
  - For ephemeral cloud context: what would you need to see (docs / guarantees / telemetry) to trust this?
  - Anything obvious I'm missing around temporary file handling?
- Sustainability / cost
  - As engineers: do you care about offloading compute to end-user devices vs fully cloud?
  - Any horror stories from optimizing 7B vs 70B usage that I should know about?
- Would you actually use this?
  - If you're into self-hosting / local LLMs:
    - What's missing for this to replace "Ollama + notebook + random SaaS" for you?
Next steps
Right now I'm:
- Testing 7B models on typical consumer hardware
- Designing the first version of the context engine and schema
If this resonates, I'd really appreciate:
- Architecture critiques
- "This will break because X" comments
- Ideas for must-have features for a real, daily-use personal AI
Thanks for reading!
Happy to dive into any part in more detail if youāre curious.