Description: Learn how to integrate Vector Search into your Kotlin Ktor application using MongoDB.
In this article, we will explore advanced MongoDB techniques in conjunction with the Kotlin Ktor API, building on the foundation established in our previous article, Mastering Kotlin: Creating an API With Ktor and MongoDB Atlas. Our focus will be on integrating Hugging Face, Vector Search, and MongoDB Atlas triggers/functions to extend the functionality and performance of our API.
We will start with an overview of these advanced MongoDB techniques and their role in contemporary API development. Then, we will move on to practical implementations, showing how you can integrate Hugging Face for natural language processing, leverage Vector Search for fast and relevant data retrieval, and automate database processes using triggers and functions.
Prerequisites
- MongoDB Atlas account
  - Note: Get started with MongoDB Atlas for free! If you don't already have an account, MongoDB offers a free-forever Atlas cluster.
- Hugging Face account
- Source code from the previous article
- MongoDB Tools
Demonstration
We'll begin by importing a dataset of fitness exercises into MongoDB Atlas as documents. Then, we'll create a trigger that activates upon insertion. For each inserted document, a function will call the Hugging Face API, sending the exercise description to be converted into an embedding, which will be saved in the exercises collection as descEmbedding:
In the second part, we will modify the Kotlin Ktor application to incorporate HTTP client calls, enabling interaction with the Hugging Face API. Additionally, we will create a /exercises/processRequest endpoint. This endpoint will accept a text input, which will be processed by the Hugging Face API to generate an embedding. We will then compare this embedding with the descEmbedding values generated in the first part and, using vector search, return the three closest results (in this case, the fitness exercises most relevant to the search):
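Conceptually, vector search ranks stored embeddings by their distance to a query embedding and returns the closest k. The brute-force Kotlin sketch below illustrates the idea that Atlas Vector Search implements efficiently with an index (the functions euclidean and nearest are illustrative only, not part of the application):

```kotlin
// Euclidean distance between two embeddings of equal length.
fun euclidean(a: List<Double>, b: List<Double>): Double {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    return kotlin.math.sqrt(a.zip(b).sumOf { (x, y) -> (x - y) * (x - y) })
}

// Return the k documents whose embedding is closest to the query.
// Atlas does this with an approximate-nearest-neighbor index rather
// than scanning every document as this sketch does.
fun <T> nearest(query: List<Double>, docs: List<Pair<T, List<Double>>>, k: Int): List<T> =
    docs.sortedBy { (_, emb) -> euclidean(query, emb) }
        .take(k)
        .map { it.first }
```

With three documents and k = 2, the two exercises whose embeddings lie closest to the query embedding come back first, which is exactly the behavior we will get from the `$vectorSearch` stage later on.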
MongoDB Setup and Hugging Face Integration
1. Creating the exercises collection
The first step in achieving our goal is to create an empty collection called "exercises" that will later store our dataset. Begin by logging in to your MongoDB Atlas account. From the Atlas dashboard, navigate to your cluster and select the database where you want to create the collection. Click on the "Collections" tab to manage your collections within that database and create an empty exercises collection:
2. Creating a trigger and function
Next, we need to create a trigger that will activate whenever a new document is inserted into the exercises collection. Navigate to the Triggers tab and create a trigger named "Trigger_Exercises" as shown in the images below:
Remember to choose the "exercises" collection, select "Insert Document" for the operation type, and enable "Full Document."
Finally, paste the following function code into the "Function" field and click "Save":
exports = async function(changeEvent) {
  const doc = changeEvent.fullDocument;
  const url = 'https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2';
  const hf_key = context.values.get("HF_value");

  try {
    console.log(`Processing document with id: ${doc._id}`);
    let response = await context.http.post({
      url: url,
      headers: {
        'Authorization': [`Bearer ${hf_key}`],
        'Content-Type': ['application/json']
      },
      body: JSON.stringify({
        inputs: [doc.description]
      })
    });
    let responseData = EJSON.parse(response.body.text());

    if (response.statusCode === 200) {
      console.log("Successfully received embedding.");
      const embedding = responseData[0];
      const collection = context.services.get("Cluster0").db("my_database").collection("exercises");
      const result = await collection.updateOne(
        { _id: doc._id },
        { $set: { descEmbedding: embedding } }
      );
      if (result.modifiedCount === 1) {
        console.log("Successfully updated the document.");
      } else {
        console.log("Failed to update the document.");
      }
    } else {
      console.log(`Failed to receive embedding. Status code: ${response.statusCode}`);
    }
  } catch (err) {
    console.error(err);
  }
};
This function serves as a bridge between MongoDB and the Hugging Face API, enriching documents stored in a MongoDB collection with embeddings generated by the API. The function is triggered by a change event in the MongoDB collection, specifically when a new document is inserted.
Now, let's explore the functionality of this function:
- Event handling: The function extracts the full document from the MongoDB change event to be processed.
- Hugging Face API interaction: It interacts with the Hugging Face API to obtain an embedding for the document's description. This involves sending an HTTP POST request to the API's feature extraction endpoint, with the document's description as input.
- MongoDB update: Upon receiving a successful response from the Hugging Face API, the function updates the document in the MongoDB collection with the extracted embedding. This enriches the document with additional information useful for various natural language processing tasks.
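To make this flow concrete, here is a minimal Kotlin sketch of the per-document enrichment, with the Hugging Face call replaced by a stubbed embedder (ExerciseDoc, fakeEmbed, and enrich are illustrative names for this sketch only, not part of the application):

```kotlin
// A pared-down exercise document: before enrichment, descEmbedding is null.
data class ExerciseDoc(
    val id: Int,
    val description: String,
    val descEmbedding: List<Double>? = null
)

// Stand-in for the Hugging Face feature-extraction call: deterministically
// maps a description to a small fixed-size vector. The real trigger sends
// the description to the API and receives a 384-dimension embedding.
fun fakeEmbed(text: String): List<Double> =
    List(4) { i -> text.count { it.isLetter() && it.code % 4 == i }.toDouble() }

// What the trigger does per inserted document: compute the embedding for
// the description and set it on the document as descEmbedding.
fun enrich(doc: ExerciseDoc): ExerciseDoc =
    doc.copy(descEmbedding = fakeEmbed(doc.description))
```

The real function differs only in where the vector comes from (the Hugging Face API) and where the update lands (the exercises collection via updateOne).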
3. Renaming the function
To align our environment with the demonstration image, let's change the name of our function to Function_Exercises. To do this, access the "Functions" menu and edit the function:
Then, enter the new name and click “Save”:
4. Getting the Hugging Face access token
The function we previously created requires a token to access Hugging Face. We need to obtain and configure it in Atlas. To do this, log in to your Hugging Face account, and access the settings to create your key:
After copying your key, let's return to MongoDB Atlas and configure our key for access. Click on the "Values" button in the side menu and select “Create New Value”:
Now, we need to create a secret and a value that will be associated with this secret.
First, create the secret by entering the key from Hugging Face:
Then, create a value named HF_value (which will be used in our function) and associate it with the secret, as shown in the image:
If everything has gone perfectly, our values will look like this:
We have finished configuring our environment. To recap:
Creating the empty collection:
- We created an empty collection named "exercises" in MongoDB Atlas. This collection will receive input data, triggering a process that converts each exercise description into embedded values.
Setting up triggers and functions:
- A trigger named "Trigger_Exercises" was created to activate upon document insertion.
- The trigger calls a function named "Function_Exercises" for each inserted document.
- The function processes the description using the Hugging Face API to generate embedded values, which are then added to the "exercises" collection.
Final configuration:
- To complete the setup, we associated a secret and a value with the Hugging Face key in MongoDB Atlas.
5. Importing a dataset
In this step, we will import a dataset of 50 documents containing information about exercises:
To achieve this, I will use MongoDB Tools to import the exercises.json file via the command line. After installing MongoDB Tools, simply place the exercises.json file in the "bin" folder and execute the command, as shown in the image below:
.\mongoimport mongodb+srv://<user>:<password>@cluster0.xpto.mongodb.net/my_database --collection exercises --jsonArray .\exercises.json
Note: Remember to replace the user, password, and cluster with your own.
If everything goes well, we will see that we have imported 50 exercises.
Now, let's check the logs of our function to ensure everything went smoothly. To do this, navigate to the "App Services" tab and click on "Logs":
And now, let's view our collection:
As we can see, we have transformed the descriptions of the 50 exercises into vector values and assigned them to the "descEmbedding" field.
Improving Kotlin Ktor With Hugging Face API and Vector Search
Let's proceed with the changes in our Kotlin application. If you haven't already, you can download the application. Our objective is to create a /exercises/processRequest endpoint that sends an input to Hugging Face, such as:
"I need an exercise for my shoulders and to lose my belly fat."
We will convert this input into an embedding and use Vector Search to return the three exercises that most closely match it. To begin, let's add two dependencies to the build.gradle.kts file that will allow us to make HTTP calls to Hugging Face:
build.gradle.kts
//Client
implementation("io.ktor:ktor-client-core:$ktor_version")
implementation("io.ktor:ktor-client-cio:$ktor_version")
In the ports package, we will create a repository that will retrieve exercises from the database:
domain/ports/ExercisesRepository
package com.mongodb.domain.ports
import com.mongodb.domain.entity.Exercises
interface ExercisesRepository {
    suspend fun findSimilarExercises(embedding: List<Double>): List<Exercises>
}
We will create a response to display some information to the user:
application/response/ExercisesResponse
package com.mongodb.application.response
data class ExercisesResponse(
    val exerciseNumber: Int,
    val bodyPart: String,
    val type: String,
    val description: String,
    val title: String
)
Now, create the Exercises class:
domain/entity/Exercises
package com.mongodb.domain.entity
import com.mongodb.application.response.ExercisesResponse
import org.bson.codecs.pojo.annotations.BsonId
import org.bson.types.ObjectId
data class Exercises(
    @BsonId
    val id: ObjectId,
    val exerciseNumber: Int,
    val title: String,
    val description: String,
    val type: String,
    val bodyPart: String,
    val equipment: String,
    val level: String,
    val rating: Double,
    val ratingDesc: String,
    val descEmbedding: List<Double>
) {
    fun toResponse() = ExercisesResponse(
        exerciseNumber = exerciseNumber,
        title = title,
        description = description,
        bodyPart = bodyPart,
        type = type
    )
}
Next, we will implement the interface. It communicates with the database by executing an aggregation pipeline that uses the vector search index we will create later.
infrastructure/ExercisesRepositoryImpl
package com.mongodb.infrastructure
import com.mongodb.domain.entity.Exercises
import com.mongodb.domain.ports.ExercisesRepository
import com.mongodb.kotlin.client.coroutine.MongoDatabase
import kotlinx.coroutines.flow.toList
import org.bson.Document
class ExercisesRepositoryImpl(
    private val mongoDatabase: MongoDatabase
) : ExercisesRepository {

    companion object {
        const val EXERCISES_COLLECTION = "exercises"
    }

    override suspend fun findSimilarExercises(embedding: List<Double>): List<Exercises> {
        // numCandidates controls how many candidates the index considers;
        // Atlas recommends setting it well above limit for better recall.
        val result = mongoDatabase.getCollection<Exercises>(EXERCISES_COLLECTION).aggregate(
            listOf(
                Document(
                    "\$vectorSearch",
                    Document("queryVector", embedding)
                        .append("path", "descEmbedding")
                        .append("numCandidates", 3L)
                        .append("index", "vector_index")
                        .append("limit", 3L)
                )
            )
        )
        return result.toList()
    }
}
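For reference, the aggregation stage built above corresponds to the following pipeline in JSON form (the queryVector values here are placeholders; at runtime it holds the full 384-dimension embedding):

```json
[
  {
    "$vectorSearch": {
      "queryVector": [0.12, -0.04],
      "path": "descEmbedding",
      "numCandidates": 3,
      "index": "vector_index",
      "limit": 3
    }
  }
]
```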
Now, let's create our endpoint to access Hugging Face and then call the method created earlier:
application/routes/ExercisesRoutes
package com.mongodb.application.routes
import com.mongodb.application.request.SentenceRequest
import com.mongodb.domain.ports.ExercisesRepository
import com.mongodb.huggingFaceApiUrl
import io.ktor.client.*
import io.ktor.client.call.*
import io.ktor.client.engine.cio.*
import io.ktor.client.request.*
import io.ktor.client.statement.*
import io.ktor.http.*
import io.ktor.http.content.*
import io.ktor.server.application.*
import io.ktor.server.request.*
import io.ktor.server.response.*
import io.ktor.server.routing.*
import org.koin.ktor.ext.inject
fun Route.exercisesRoutes() {
    val repository by inject<ExercisesRepository>()

    route("/exercises/processRequest") {
        post {
            val sentence = call.receive<SentenceRequest>()
            val response = requestSentenceTransform(sentence.input, call.huggingFaceApiUrl())
            if (response.status.isSuccess()) {
                val embedding = sentence.convertResponse(response.body())
                val similarDocuments = repository.findSimilarExercises(embedding)
                call.respond(HttpStatusCode.Accepted, similarDocuments.map { it.toResponse() })
            } else {
                call.respond(HttpStatusCode.BadGateway, "Failed to generate the embedding.")
            }
        }
    }
}

suspend fun requestSentenceTransform(input: String, huggingFaceURL: String): HttpResponse {
    return HttpClient(CIO).use { client ->
        client.post(huggingFaceURL) {
            setBody(TextContent(input, ContentType.Text.Plain))
        }
    }
}
Next, let's create the request that we will send to Hugging Face. In this class, in addition to the input, we include a converter that turns the response body from a String into a list of Doubles:
application/request/SentenceRequest
package com.mongodb.application.request
data class SentenceRequest(
    val input: String
) {
    fun convertResponse(body: String) =
        body
            .replace("[", "")
            .replace("]", "")
            .split(",")
            .map { it.trim().toDouble() }
}
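To see what convertResponse does, here is a standalone version of the same parsing logic, assuming the Hugging Face response body is a (possibly nested) JSON array of numbers such as "[[0.1, 0.2]]":

```kotlin
// Standalone copy of SentenceRequest.convertResponse: strip the brackets,
// split on commas, and parse each element into a Double, yielding the
// flat embedding vector.
fun convertResponse(body: String): List<Double> =
    body
        .replace("[", "")
        .replace("]", "")
        .split(",")
        .map { it.trim().toDouble() }
```

Note that this simple parser only handles arrays of numbers; for any other response shape (for example an error object), a JSON library would be the more robust choice.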
Let's include the route created earlier and a huggingFaceApiUrl method in our Application class. Here's the complete code:
Application.kt
package com.mongodb
import com.mongodb.application.routes.exercisesRoutes
import com.mongodb.application.routes.fitnessRoutes
import com.mongodb.domain.ports.ExercisesRepository
import com.mongodb.domain.ports.FitnessRepository
import com.mongodb.infrastructure.ExercisesRepositoryImpl
import com.mongodb.infrastructure.FitnessRepositoryImpl
import com.mongodb.kotlin.client.coroutine.MongoClient
import io.ktor.serialization.gson.*
import io.ktor.server.application.*
import io.ktor.server.plugins.contentnegotiation.*
import io.ktor.server.plugins.swagger.*
import io.ktor.server.routing.*
import io.ktor.server.tomcat.*
import org.koin.dsl.module
import org.koin.ktor.plugin.Koin
import org.koin.logger.slf4jLogger
fun main(args: Array<String>): Unit = EngineMain.main(args)

fun Application.module() {
    install(ContentNegotiation) {
        gson {
        }
    }
    // Other code..
    routing {
        // Other code..
        exercisesRoutes()
    }
}

fun ApplicationCall.huggingFaceApiUrl(): String {
    return application.environment.config.propertyOrNull("ktor.huggingface.api.url")?.getString()
        ?: throw RuntimeException("Failed to access Hugging Face API base URL.")
}
Finally, let's include the Hugging Face endpoint in the application.conf file.
application.conf
ktor {
    // Other code ..
    huggingface {
        api {
            url = "https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2"
        }
    }
}
Creating the vector search index
Now, we need to go back to Atlas and create our vector search index. Follow the images below:
Select Atlas Vector Search:
If everything is okay, you will see a success message like the one below, indicating that the index was successfully created in MongoDB Atlas:
The following index definition creates a vector index on the descEmbedding field in our exercises collection. The type field specifies that this is a vector index. The path field indicates the field containing the vector data, in this case descEmbedding. The numDimensions field specifies the dimensionality of the vectors, which is 384 for embeddings produced by the all-MiniLM-L6-v2 model. Lastly, the similarity field specifies the metric used to compare vectors, in this case the Euclidean distance.
{
  "fields": [
    {
      "type": "vector",
      "path": "descEmbedding",
      "numDimensions": 384,
      "similarity": "euclidean"
    }
  ]
}
After implementing the latest updates and configurations, it's time to test the application. Let's start by running the application. Open Application.kt and click on the run button:
Once the application is up and running, you can proceed with testing using the following curl command:
curl --location 'http://localhost:8081/exercises/processRequest' \
--header 'Content-Type: application/json' \
--data '{
"input": "I need an exercise for my shoulders and to lose my belly fat"
}'
Conclusion
This article showcased how to enrich MongoDB documents with embeddings from the Hugging Face API, leveraging its powerful natural language processing capabilities. The provided function demonstrates handling change events in a MongoDB collection and interacting with an external API. This integration offers developers opportunities to enhance their applications with NLP features, highlighting the potential of combining technologies for more intelligent applications.
The example source code is available on GitHub.
If you have any questions or want to discuss further implementations, feel free to reach out to the MongoDB Developer Community forum for support and guidance.