Programming Central

Posted on • Originally published at programmingcentral.hashnode.dev

Beyond Keyword Search: Building a Local Vector Database on Android with Room and Gemini Nano

The landscape of Android development is undergoing a seismic shift. For decades, we’ve built apps around structured, relational data. We’ve mastered the art of the SELECT * FROM users WHERE id = 123 query. But as Generative AI moves from the cloud to the palm of our hands, the way we store and retrieve information must evolve. We are moving from a world of literal matches to a world of semantic meaning.

If you are building an AI-powered note-taking app, a local personal assistant, or a privacy-first document reader, you don't just want to find words; you want to find ideas. This is where Local Vector Databases come into play. In this guide, we will explore how to turn the industry-standard Room database into a high-performance vector store using Google’s AICore and Gemini Nano.

The Theoretical Foundation: Why Vectors?

To understand why we need a vector database, we first have to bridge the gap between traditional relational data and the high-dimensional world of Generative AI.

In a standard Android app, queries are binary: a string either matches or it doesn’t. However, GenAI operates on embeddings. An embedding is a numerical representation of content—be it text, image, or audio—as a high-dimensional vector (essentially an array of floating-point numbers).

Imagine the phrases "The puppy is sleeping" and "A small dog is napping." To a standard SQLite database, these share almost no common keywords. To an embedding model, these two phrases are mathematically "close" to each other in a multi-dimensional space. By storing these vectors, we enable Retrieval-Augmented Generation (RAG). Instead of feeding a massive, 50-page document into Gemini Nano’s limited context window, we store the document as chunks of vectors in Room, retrieve only the most relevant chunks based on mathematical proximity, and feed only those to the model.

The Power of AICore and Gemini Nano

Google’s implementation of AICore as a system-level service is a strategic masterstroke for Android developers. Much like CameraX abstracts the fragmented world of camera hardware, AICore abstracts the underlying NPU (Neural Processing Unit) and GPU acceleration.

By moving the LLM (Large Language Model) to the system level, Android provides three massive benefits:

  1. Shared Memory: Multiple apps can use the same model instance, preventing the "app bloat" that would occur if every APK bundled its own 2GB model.
  2. Lifecycle Management: Loading an LLM is computationally "heavy." AICore manages the model's "warm-up" phase, ensuring it’s ready when the user needs it without freezing your app's UI.
  3. Seamless Updates: Model weights are updated via Play System Updates, meaning your app gets smarter without you having to push a new version to the Play Store.

The "Why" of Room as a Vector Store

You might be wondering: Why use Room instead of a dedicated vector database like Milvus or Pinecone?

On mobile, the constraints are different. We prioritize privacy, low latency, and offline availability. Sending a user's private notes to a cloud-based vector store is a privacy nightmare. Room allows us to keep everything on-device.

However, transitioning to a vector-enabled app is like a complex Room database migration. In a standard migration, you add a column. In a vector migration, you are adding a mathematical representation of your data. If you change your embedding model (e.g., moving from a 384-dimension model to a 768-dimension model), your existing vectors become mathematically incompatible. This is a "destructive migration" where every single row must be re-processed through the new model to maintain search integrity.

Technical Stack: Setting the Stage

To implement this architecture, we need a modern stack that bridges the gap between local persistence and AI inference.

dependencies {
    // Room for local persistence
    val roomVersion = "2.6.1"
    implementation("androidx.room:room-runtime:$roomVersion")
    implementation("androidx.room:room-ktx:$roomVersion")
    ksp("androidx.room:room-compiler:$roomVersion")

    // MediaPipe for Local Embeddings (Text Embedder)
    implementation("com.google.mediapipe:tasks-text:0.10.14")

    // Hilt for Dependency Injection
    implementation("com.google.dagger:hilt-android:2.50")
    ksp("com.google.dagger:hilt-android-compiler:2.50")

    // Coroutines for non-blocking math operations
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3")
}
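Note that the ksp(...) entries require the KSP plugin, and Hilt needs its Gradle plugin as well. A minimal sketch of the module-level plugins block (versions omitted; they are typically pinned in the root build script or a version catalog):

plugins {
    id("com.android.application")
    id("org.jetbrains.kotlin.android")
    id("com.google.devtools.ksp")
    id("com.google.dagger.hilt.android")
}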

Step 1: Defining the Data Layer

Since SQLite doesn't have a native VECTOR type, we have to be clever. We store the FloatArray in a serialized form. A comma-separated string is readable and easy to debug, so we'll start there; for production, a binary BLOB is the better choice for performance (covered in the deep dive below).

The Entity and Type Converters

@Entity(tableName = "semantic_store")
data class EmbeddingEntity(
    @PrimaryKey(autoGenerate = true) val id: Int = 0,
    val originalText: String,
    // Arrays in data classes compare by reference; override equals/hashCode
    // if you compare these entities (e.g., in tests or DiffUtil)
    val vector: FloatArray
)

class VectorConverters {
    @TypeConverter
    fun fromFloatArray(value: FloatArray): String {
        return value.joinToString(",")
    }

    @TypeConverter
    fun toFloatArray(value: String): FloatArray {
        return value.split(",").map { it.toFloat() }.toFloatArray()
    }
}

The DAO (Data Access Object)

Our DAO remains simple. The "magic" of the search doesn't happen in SQL (yet), but in our repository.

@Dao
interface EmbeddingDao {
    @Insert(onConflict = OnConflictStrategy.REPLACE)
    suspend fun insertEmbedding(embedding: EmbeddingEntity)

    @Query("SELECT * FROM semantic_store")
    suspend fun getAllEmbeddings(): List<EmbeddingEntity>
}
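To wire these pieces together, the converters must be registered on the RoomDatabase class via @TypeConverters. A minimal sketch (the class name is illustrative):

@Database(entities = [EmbeddingEntity::class], version = 1)
@TypeConverters(VectorConverters::class)
abstract class SemanticDatabase : RoomDatabase() {
    abstract fun embeddingDao(): EmbeddingDao
}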

Step 2: The Math of Meaning (Cosine Similarity)

Since we are using Room, we don't have a SEARCH BY SIMILARITY operator. Instead, we perform a Linear Scan. We pull the vectors into memory and calculate the Cosine Similarity.

Mathematically, the similarity between two vectors $A$ and $B$ is:
$$\text{similarity} = \frac{A \cdot B}{|A| |B|}$$

In Kotlin, we implement this as a single pass over both arrays. Because this is CPU-intensive, we must run it on Dispatchers.Default.

import kotlin.math.sqrt

private fun calculateCosineSimilarity(vecA: FloatArray, vecB: FloatArray): Float {
    var dotProduct = 0.0f
    var normA = 0.0f // squared magnitude of vecA
    var normB = 0.0f // squared magnitude of vecB
    for (i in vecA.indices) {
        dotProduct += vecA[i] * vecB[i]
        normA += vecA[i] * vecA[i]
        normB += vecB[i] * vecB[i]
    }
    // Guard against zero vectors to avoid division by zero
    val denominator = sqrt(normA) * sqrt(normB)
    return if (denominator == 0f) 0f else dotProduct / denominator
}

Step 3: Implementing the Semantic Search Repository

The repository is the orchestrator. It takes a raw string, turns it into a vector using a model (like MediaPipe or Gemini), and then compares it against the database.

@Singleton
class SemanticRepository @Inject constructor(
    private val dao: EmbeddingDao,
    @ApplicationContext private val context: Context
) {
    // Initialize the MediaPipe Text Embedder lazily; loading the model file does disk I/O
    private val textEmbedder by lazy {
        TextEmbedder.createFromOptions(
            context,
            TextEmbedder.TextEmbedderOptions.builder()
                .setBaseOptions(BaseOptions.builder()
                    .setModelAssetPath("mobile_bert_embedding.tflite").build())
                .build()
        )
    }

    suspend fun search(query: String, limit: Int = 3): List<Pair<String, Float>> = withContext(Dispatchers.Default) {
        // 1. Vectorize the query
        val queryResult = textEmbedder.embed(query)
        val queryVector = queryResult.embeddingResult().embeddings().first().floatEmbedding()

        // 2. Fetch all candidates from Room
        val allStored = dao.getAllEmbeddings()

        // 3. Compute similarity and rank
        allStored.map { entity ->
            val score = calculateCosineSimilarity(queryVector, entity.vector)
            entity.originalText to score
        }
        .filter { it.second > 0.6f } // Keep meaningful matches; the 0.6 threshold is model-dependent
        .sortedByDescending { it.second }
        .take(limit)
    }
}
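Search is only half of the pipeline; documents must first be chunked, embedded, and persisted. Here is a minimal indexing sketch that would live in the same repository (the fixed-size chunking is a naive placeholder; sentence- or paragraph-aware splitting generally produces better retrieval):

    suspend fun addDocument(text: String, chunkSize: Int = 512) = withContext(Dispatchers.Default) {
        text.chunked(chunkSize).forEach { chunk ->
            // Embed each chunk and persist the vector alongside the original text
            val vector = textEmbedder.embed(chunk)
                .embeddingResult().embeddings().first().floatEmbedding()
            dao.insertEmbedding(EmbeddingEntity(originalText = chunk, vector = vector))
        }
    }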

Step 4: UI State Management with ViewModel

To ensure a smooth user experience, we use a StateFlow to manage the search lifecycle. This prevents the UI from "janking" while the CPU is crunching numbers.

@HiltViewModel
class SearchViewModel @Inject constructor(
    private val repository: SemanticRepository
) : ViewModel() {

    private val _uiState = MutableStateFlow<SearchState>(SearchState.Idle)
    val uiState = _uiState.asStateFlow()

    fun onSearchClicked(query: String) {
        viewModelScope.launch {
            _uiState.value = SearchState.Loading
            try {
                val results = repository.search(query)
                _uiState.value = SearchState.Success(results)
            } catch (e: Exception) {
                _uiState.value = SearchState.Error(e.localizedMessage ?: "Unknown Error")
            }
        }
    }
}

sealed class SearchState {
    object Idle : SearchState()
    object Loading : SearchState()
    data class Success(val results: List<Pair<String, Float>>) : SearchState()
    data class Error(val message: String) : SearchState()
}
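On the UI side, this state can be collected in Compose. A minimal sketch, assuming hiltViewModel() from androidx.hilt:hilt-navigation-compose and collectAsStateWithLifecycle from androidx.lifecycle:lifecycle-runtime-compose:

@Composable
fun SearchResults(viewModel: SearchViewModel = hiltViewModel()) {
    val state by viewModel.uiState.collectAsStateWithLifecycle()
    when (val s = state) {
        is SearchState.Idle -> Text("Type a query to search your notes")
        is SearchState.Loading -> CircularProgressIndicator()
        is SearchState.Success -> LazyColumn {
            items(s.results) { (text, score) ->
                Text("%.2f  %s".format(score, text))
            }
        }
        is SearchState.Error -> Text("Error: ${s.message}")
    }
}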

Engineering Deep Dive: Performance and Pitfalls

Building a local vector store isn't without its challenges. As your dataset grows, a linear scan ($O(n)$) will eventually slow down. Here is how to handle the "scale" problem.

1. The "Fetch-All" Memory Problem

If you have 10,000 embeddings, loading them all into RAM via dao.getAllEmbeddings() might trigger an OutOfMemoryError.
The Solution: Use SQL to narrow the search space. You can use standard keyword tags or metadata (like date_created) to filter the list of candidates before performing the heavy vector math in Kotlin.
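For example, a DAO method that pre-filters candidates on a hypothetical category column (not part of the entity defined above) before the in-memory ranking:

@Query("SELECT * FROM semantic_store WHERE category = :category")
suspend fun getEmbeddingsByCategory(category: String): List<EmbeddingEntity>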

2. Precision and Storage

Using joinToString(",") to store vectors is human-readable but inefficient. For a production app, store the raw bytes as a BLOB via a ByteBuffer:

// Optimized converters: 4 bytes per float instead of ~10 for its decimal string
@TypeConverter
fun fromFloatArray(array: FloatArray): ByteArray {
    val buffer = ByteBuffer.allocate(array.size * 4)
    array.forEach { buffer.putFloat(it) }
    return buffer.array()
}

@TypeConverter
fun toFloatArray(bytes: ByteArray): FloatArray {
    val buffer = ByteBuffer.wrap(bytes)
    return FloatArray(bytes.size / 4) { buffer.getFloat() }
}

This reduces storage size by ~60% and speeds up the retrieval process significantly.

3. Threading and ANRs

Calculating cosine similarity for a 768-dimensional vector across 1,000 rows means 768,000 loop iterations, each performing several multiplications and additions. If you do this on the Main thread, your app will hang. Always wrap your mathematical loops in withContext(Dispatchers.Default).

4. Model Consistency

This is the most common bug in AI development. If your "Save" logic uses one embedding model and your "Search" logic uses another, the results will be pure noise. Always version your embeddings in the database. If the model version changes, trigger a background worker to re-embed the data.
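A minimal sketch of that guard, assuming you add a modelVersion column to EmbeddingEntity, a matching DAO query, and Hilt's WorkManager integration (androidx.hilt:hilt-work); getEmbeddingsForOtherVersions and repository.embed are hypothetical helpers wrapping the DAO and TextEmbedder code shown earlier:

@HiltWorker
class ReEmbedWorker @AssistedInject constructor(
    @Assisted appContext: Context,
    @Assisted params: WorkerParameters,
    private val dao: EmbeddingDao,
    private val repository: SemanticRepository
) : CoroutineWorker(appContext, params) {

    override suspend fun doWork(): Result {
        // Hypothetical version-aware query; requires a modelVersion column on the entity
        val stale = dao.getEmbeddingsForOtherVersions(CURRENT_MODEL_VERSION)
        stale.forEach { entity ->
            // Re-run the current embedding model over the original text
            val fresh = repository.embed(entity.originalText)
            dao.insertEmbedding(
                entity.copy(vector = fresh, modelVersion = CURRENT_MODEL_VERSION)
            )
        }
        return Result.success()
    }

    companion object { const val CURRENT_MODEL_VERSION = 2 }
}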

The Future: RAG on the Edge

What we’ve built here is the foundation of a Retrieval-Augmented Generation pipeline. By combining Room’s persistence with Gemini Nano’s reasoning, we can create apps that truly "understand" the user.

Imagine a user asking their phone: "What did my boss say about the project deadline in that meeting last week?"

  1. Your app queries Room for vectors semantically similar to "project deadline" and "boss."
  2. Room returns the relevant transcript snippets.
  3. Your app feeds those snippets into Gemini Nano.
  4. Gemini Nano provides a concise, summarized answer.
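
Here is a minimal sketch of that flow; generateContent is a placeholder for whatever on-device inference entry point your SDK exposes (e.g., the experimental Google AI Edge SDK for Gemini Nano), not a confirmed API:

suspend fun answerFromMemory(question: String): String {
    // Steps 1-2: retrieve the most relevant snippets from Room
    val snippets = repository.search(question, limit = 3)

    // Step 3: ground the prompt in the retrieved context
    val prompt = buildString {
        appendLine("Answer using only the context below.")
        appendLine("Context:")
        snippets.forEach { (text, _) -> appendLine("- $text") }
        append("Question: $question")
    }

    // Step 4: placeholder call into the on-device model
    return generateContent(prompt)
}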

All of this happens without a single byte of data leaving the device. No cloud costs, no latency, and total user privacy.

Conclusion

Local vector databases are no longer a luxury—they are a necessity for the next generation of Android apps. By leveraging Room as a storage engine and Kotlin Coroutines for mathematical orchestration, we can bring the power of semantic search to every user.

The transition from WHERE title = 'Apple' to cosineSimilarity(query, storedVector) is more than just a code change; it’s a mindset shift. We are no longer just building databases; we are building digital memories.

Let's Discuss

  1. The Scalability Challenge: At what point (number of rows) do you think a linear scan in Room becomes too slow for a mobile device, and would you consider moving to a specialized library like FAISS?
  2. Privacy vs. Power: Would you prefer a system-level model like Gemini Nano (shared, updated by Google) or a bundled model (larger APK, but total control over versioning)?

Leave a comment below and let's build the future of on-device AI together!

The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the ebook
On-Device GenAI with Android Kotlin: Mastering Gemini Nano, AICore, and local LLM deployment using MediaPipe and Custom TFLite models. You can find it here: Leanpub.com

Also check out the other programming & AI ebooks covering Python, TypeScript, C#, Swift, and Kotlin: Leanpub.com

Android Kotlin & AI Masterclass:
Book 1: On-Device GenAI. Mastering Gemini Nano, AICore, and local LLM deployment using MediaPipe and Custom TFLite models.
Book 2: Edge AI Performance. Optimizing hardware acceleration via NPU (Neural Processing Unit), GPU, and DSP. Advanced quantization and model pruning.
Book 3: Android AI Agents. Building autonomous apps that use Tool Calling, Function Injection, and Screen Awareness to perform tasks for the user.
