Most Retrieval Augmented Generation (RAG) systems rely on cloud infrastructure.
Vector databases like Pinecone, Weaviate, and Milvus are usually deployed on servers.
But what if the entire RAG pipeline could run on a mobile device?
I recently experimented with this idea while building an Android app called EdgeDox.
The architecture looks like this:
Document → Chunking → Embeddings → Vector Search → LLM → Answer
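The stages above map naturally onto small interfaces that can be composed into one pipeline object. A hypothetical sketch (the interface and class names are mine for illustration, not EdgeDox's actual code):

```java
import java.util.List;

// Hypothetical interfaces for each pipeline stage; names are
// illustrative and not taken from EdgeDox or any specific library.
interface Chunker     { List<String> split(String document); }
interface Embedder    { float[] embed(String text); }
interface VectorIndex {
    void add(String chunk, float[] vector);
    List<String> topK(float[] query, int k);
}
interface LocalLlm    { String generate(String prompt); }

class RagPipeline {
    private final Chunker chunker;
    private final Embedder embedder;
    private final VectorIndex index;
    private final LocalLlm llm;

    RagPipeline(Chunker c, Embedder e, VectorIndex v, LocalLlm l) {
        chunker = c; embedder = e; index = v; llm = l;
    }

    // Ingestion: chunk the document, embed each chunk, index it.
    void ingest(String document) {
        for (String chunk : chunker.split(document)) {
            index.add(chunk, embedder.embed(chunk));
        }
    }

    // Query: embed the question, retrieve similar chunks,
    // and hand them to the local LLM as context.
    String answer(String question) {
        List<String> context = index.topK(embedder.embed(question), 3);
        return llm.generate("Context: " + String.join("\n", context)
                + "\nQuestion: " + question);
    }
}
```

Keeping each stage behind an interface makes it easy to swap, say, one embedding model for another without touching the rest of the pipeline.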
Vector Storage
For vector storage I used ZVEC, a lightweight embedded vector database that runs directly inside the application process rather than as a separate server. That makes it a good fit for mobile AI applications.
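I haven't reproduced ZVEC's actual API here. Conceptually, though, an embedded vector store is just an in-process index with a similarity search over stored vectors. A minimal brute-force sketch of that idea, using cosine similarity:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal in-process vector store: brute-force cosine similarity.
// This illustrates the concept only; ZVEC's real API and indexing
// strategy (and its performance) differ.
class TinyVectorStore {
    private final List<float[]> vectors = new ArrayList<>();
    private final List<String> payloads = new ArrayList<>();

    void add(String payload, float[] vector) {
        payloads.add(payload);
        vectors.add(vector);
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        // Small epsilon guards against division by zero.
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-9);
    }

    // Return the k stored payloads most similar to the query vector.
    List<String> search(float[] query, int k) {
        Integer[] order = new Integer[vectors.size()];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (i, j) ->
            Double.compare(cosine(vectors.get(j), query),
                           cosine(vectors.get(i), query)));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, order.length); i++) {
            result.add(payloads.get(order[i]));
        }
        return result;
    }
}
```

A real embedded database replaces the linear scan with an approximate nearest-neighbor index and persists vectors to disk, but the contract is the same: add vectors, query by similarity.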
Pipeline
1. Import a document (PDF or plain text)
2. Split the document into chunks
3. Generate embeddings on-device
4. Store the embeddings in ZVEC
5. Run a semantic similarity search for each query
6. Pass the retrieved chunks to the local LLM
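The chunking step deserves a note, since chunk boundaries directly affect what the similarity search can retrieve. A simple fixed-size chunker with overlap, so that text spanning a boundary still appears whole in at least one chunk (the sizes are illustrative defaults, not EdgeDox's settings):

```java
import java.util.ArrayList;
import java.util.List;

// Fixed-size character chunking with overlap between consecutive
// chunks, so context that straddles a boundary is not lost.
class FixedChunker {
    static List<String> split(String text, int chunkSize, int overlap) {
        if (overlap >= chunkSize) {
            throw new IllegalArgumentException("overlap must be < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break;
        }
        return chunks;
    }
}
```

Sentence- or paragraph-aware splitting usually retrieves better than raw character windows, but the fixed-size version above is the easiest baseline to start from.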
Everything runs fully offline.
Advantages
No cloud dependency
Documents never leave the device
No network round-trip latency
Works without an internet connection
The result is EdgeDox, an Android app that lets users chat with their PDFs entirely offline.
You can try it here:
https://play.google.com/store/apps/details?id=io.cyberfly.edgedox