Azura: local-first personal assistant (feedback wanted)

Hey devs šŸ‘‹

I'm working solo on a project called Azura and I’d love blunt technical + product feedback before I go too deep.

TL;DR

  • Local-first personal AI assistant (Windows / macOS / Linux)
  • Runs 7B-class models locally on your own machine
  • Optional cloud inference with 70B+ models (potentially up to ~120B if I can get a GPU cluster cheap enough)
  • Cloud only sees temporary context for a given query, then it’s gone
  • Goal: let AI work with highly personalized data while keeping your data on-device and making AI compute more sustainable by offloading work to the user’s hardware

Think of it as Signal, but for AI:

  • private by default
  • transparent about what leaves your device
  • and actually usable as a daily ā€œsecond brainā€.

Problem I’m trying to solve

Most AI tools today:

  • ship all your prompts and files to a remote server
  • keep embeddings / logs indefinitely
  • centralize all compute in big datacenters

That’s bad if you want to:

  • use AI on sensitive data (legal docs, internal company info, personal notes)
  • build a long-term memory of your life and work
  • not rely 100% on someone else’s infrastructure for every tiny inference

On top of that, current AI usage is very cloud-heavy. Every small task hits a GPU in a datacenter, even when a smaller local model would be good enough.

Azura’s goal:

Let AI work deeply with your personal data while keeping that data on your device by default, and offload as much work as possible to the user’s hardware to make AI more sustainable.


Core concept

Azura has two main execution paths:

  1. Local path (default)

    • Desktop app (Win / macOS / Linux)
    • Local backend (Rust / llama.cpp / vector DB)
    • Uses a 7B model running on your machine
    • Good for:
      • day-to-day chat
      • note-taking / journaling
      • searching your own docs/files
      • ā€œsecond brainā€ queries that don’t need super high IQ
  2. Cloud inference path (optional)

    • When a query is too complex / heavy for the local 7B:
      • Azura builds a minimal context (chunks of docs, metadata, etc.)
      • Sends that context + query to a 70B+ model in the cloud (ideally up to ~120B later)
    • Data handling:
      • Files / context are used only temporarily for that request
      • Held in memory or short-lived storage just long enough to run the inference
      • Then discarded – no long-term cloud memory of your life
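To make the routing idea concrete, here's a very rough Rust sketch of the local-vs-cloud decision plus the ephemeral context. Everything in it (the `Route` enum, the heuristic, the struct shapes) is a placeholder for illustration, not actual Azura code:

```rust
// Rough sketch of the local-vs-cloud routing plus ephemeral context.
// The heuristic, names, and shapes are placeholders, not real Azura code.

enum Route {
    Local, // 7B model via llama.cpp on the user's machine
    Cloud, // 70B+ model, ephemeral context only
}

struct Query {
    text: String,
    needs_long_context: bool,
    needs_heavy_reasoning: bool,
}

/// Ephemeral context assembled per request: it lives only for the
/// duration of the cloud call and is dropped right after the response.
struct EphemeralContext {
    chunks: Vec<String>,   // doc chunks retrieved on-device
    metadata: Vec<String>, // tags, timestamps, entities
}

fn route(query: &Query) -> Route {
    // Stay local unless the query clearly exceeds what the 7B handles well.
    // Real routing would probably combine token counts, task type,
    // and maybe a small classifier.
    if query.needs_heavy_reasoning || query.needs_long_context {
        Route::Cloud
    } else {
        Route::Local
    }
}

fn answer(query: Query) -> String {
    match route(&query) {
        Route::Local => {
            // call into the local llama.cpp backend (omitted here)
            format!("[local 7B] {}", query.text)
        }
        Route::Cloud => {
            // Build the minimal context, send it, then let it go out of scope.
            let ctx = EphemeralContext {
                chunks: vec![/* retrieved on-device */],
                metadata: vec![],
            };
            let response = format!(
                "[cloud 70B] {} ({} chunks sent)",
                query.text,
                ctx.chunks.len()
            );
            drop(ctx); // context discarded; no long-term cloud memory
            response
        }
    }
}

fn main() {
    let q = Query {
        text: "Summarize my notes from last week".to_string(),
        needs_long_context: false,
        needs_heavy_reasoning: false,
    };
    println!("{}", answer(q));
}
```

The key point is the explicit `drop(ctx)`: the cloud context is assembled per request and thrown away as soon as the response comes back.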

Context engine (high-level idea)

On top of ā€œjust call an LLMā€, I’m experimenting with a structured context engine:

  • Ingests:
    • files, PDFs, notes, images
  • Stores:
    • embeddings + metadata (timestamps, tags, entities, locations)
  • Builds:
    • a lightweight relationship graph (people, projects, events, topics)
  • Answers:
    • ā€œWhat did I do for project A in March?ā€
    • ā€œShow me everything related to ā€˜Company A’ and ā€˜pricing’.ā€
    • ā€œWhat did I wear at the gala in Tokyo?ā€ (from ingested images)

Standard RAG is part of this, but the goal is an ongoing personal knowledge base that the LLM can query, not just a vector search API.

All of this long-term data lives on-device.
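To make that a bit more concrete, here's a rough sketch of the kind of on-device structures I'm thinking about for the context engine. Field names, types, and the `related_chunks` helper are all illustrative, not a final schema:

```rust
// Rough sketch of the on-device context engine schema.
// Field names, types, and the query helper are illustrative, not final.

use std::collections::HashMap;

/// One ingested chunk of a file, note, or image caption.
struct Chunk {
    id: u64,
    source_path: String,
    text: String,
    embedding: Vec<f32>,   // stored in the local vector DB
    created_at: u64,       // unix timestamp
    tags: Vec<String>,
    entities: Vec<String>, // people, projects, places mentioned
}

/// Lightweight relationship graph on top of the chunks.
#[derive(Hash, PartialEq, Eq, Clone)]
enum NodeKind {
    Person,
    Project,
    Event,
    Topic,
}

#[derive(Hash, PartialEq, Eq, Clone)]
struct Node {
    kind: NodeKind,
    name: String,
}

struct Graph {
    // node -> chunk ids that mention it
    mentions: HashMap<Node, Vec<u64>>,
    // node -> related nodes (co-occurrence, explicit links, ...)
    edges: HashMap<Node, Vec<Node>>,
}

impl Graph {
    /// "Show me everything related to 'Company A' and 'pricing'":
    /// intersect the chunk sets of the two nodes.
    fn related_chunks(&self, a: &Node, b: &Node) -> Vec<u64> {
        let empty = Vec::new();
        let set_a = self.mentions.get(a).unwrap_or(&empty);
        let set_b = self.mentions.get(b).unwrap_or(&empty);
        set_a
            .iter()
            .filter(|id| set_b.contains(*id))
            .copied()
            .collect()
    }
}

fn main() {
    let company = Node { kind: NodeKind::Topic, name: "Company A".into() };
    let pricing = Node { kind: NodeKind::Topic, name: "pricing".into() };

    let mut mentions = HashMap::new();
    mentions.insert(company.clone(), vec![1, 2, 3]);
    mentions.insert(pricing.clone(), vec![2, 3, 4]);

    let graph = Graph { mentions, edges: HashMap::new() };
    println!("{:?}", graph.related_chunks(&company, &pricing)); // [2, 3]
}
```

The embeddings handle fuzzy recall, while the graph handles the ā€œeverything related to X and Yā€ style of question that pure vector search tends to answer badly.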


Sustainability angle (important to me)

Part of the vision is:

  • Don’t hit a giant GPU cluster for every small query.
  • Let the user’s device handle as much as possible (7B locally).
  • Use big cloud models only when they actually add value.

Over time, I’d like Azura to feel like a hybrid compute layer:

  • Local where possible
  • Cloud only for heavy stuff
  • Always explicit and transparent
  • And most of all, PRIVATE
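For the ā€œexplicit and transparentā€ part, the rough idea is to show exactly what would leave the device and ask first. A minimal sketch, assuming a hypothetical `CloudRequest` shape and a simple terminal confirmation (the real thing would be a UI prompt):

```rust
// Sketch of the "explicit and transparent" part: before anything leaves the
// device, show the user exactly what would be sent and ask for confirmation.
// The CloudRequest shape and the confirm step are assumptions, not real code.

use std::io::{self, Write};

struct CloudRequest {
    model: String, // e.g. "70b-cloud"
    query: String,
    context_chunks: Vec<String>,
}

/// Print a summary of the outgoing request and ask the user to confirm.
fn confirm_upload(req: &CloudRequest) -> io::Result<bool> {
    println!("About to send to cloud model '{}':", req.model);
    println!("  query: {}", req.query);
    println!(
        "  context: {} chunk(s), {} bytes total",
        req.context_chunks.len(),
        req.context_chunks.iter().map(|c| c.len()).sum::<usize>()
    );
    print!("Proceed? [y/N] ");
    io::stdout().flush()?;

    let mut answer = String::new();
    io::stdin().read_line(&mut answer)?;
    Ok(answer.trim().eq_ignore_ascii_case("y"))
}

fn main() -> io::Result<()> {
    let req = CloudRequest {
        model: "70b-cloud".to_string(),
        query: "Compare these two contracts".to_string(),
        context_chunks: vec!["chunk one".to_string(), "chunk two".to_string()],
    };
    if confirm_upload(&req)? {
        println!("sending to cloud...");
    } else {
        println!("staying local.");
    }
    Ok(())
}
```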

What I’d love feedback on

  1. Architecture sanity

    • Does the ā€œlocal-first + direct cloud inferenceā€ setup look sane?
    • Any better patterns you’ve used for mixing on-device models with cloud models?
  2. Security + privacy model

    • For ephemeral cloud context: what would you need to see (docs / guarantees / telemetry) to trust this?
    • Anything obvious I’m missing around temporary file handling?
  3. Sustainability / cost

    • As engineers: do you care about offloading compute to end-user devices vs fully cloud?
    • Any horror stories optimizing 7B vs 70B usage that I should know about?
  4. Would you actually use this?

    • If you’re into self-hosting / local LLMs:
      • What’s missing for this to replace ā€œOllama + notebook + random SaaSā€ for you?

Next steps

Right now I’m:

  • Testing 7B models on typical consumer hardware
  • Designing the first version of the context engine and schema

If this resonates, I’d really appreciate:

  • Architecture critiques
  • ā€œThis will break because Xā€ comments
  • Ideas for must-have features for a real, daily-use personal AI

Thanks for reading šŸ™

Happy to dive into any part in more detail if you’re curious.
