
OpenSearch as Vector DB: Supercharge Your LLM

Go beyond interactive log analytics and real-time application monitoring: you can now deploy ML models directly in OpenSearch (for a quick intro to OpenSearch, check OpenSearch for humans).

Amazon OpenSearch Service allows you to deploy a secured OpenSearch cluster in minutes.


Setup:

In this particular case, the OpenSearch 2.7 cluster is backed by r6gd.4xlarge instances. Since we're not using ML nodes with NVIDIA® V100 Tensor Core GPUs, we need to change the ml_commons plugin configuration so that the model can run on our Graviton2-based data nodes.

Using Dev Tools, we can run queries directly in the console. The first step is to set the plugin's only_run_on_ml_node setting to false.

# change the config
PUT _cluster/settings
{
   "persistent":{
     "plugins.ml_commons.only_run_on_ml_node": false
   }
}


After updating the plugin configuration, the next step is to upload a pre-trained model using the API (OpenSearch currently supports only the TorchScript and ONNX formats). OpenSearch publishes a list of supported pre-trained models, most of them sentence-transformers models from Hugging Face.

Steps:

⚠️ When sizing the OpenSearch cluster, make sure the nodes have enough memory for ML inference; otherwise you risk running into a CircuitBreakerException.
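Per-node memory usage and circuit breaker state can be checked directly from Dev Tools (the same call appears in the recap at the end):

# get memory usage per node and per circuit breaker
GET _nodes/stats/breaker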

Most deep learning models are larger than 100 MB, which makes it hard to fit them into a single document, so OpenSearch splits the model file into smaller chunks and stores them in a model index. Upload the model using the API; in this case, I've chosen the pre-trained sentence-transformers model all-MiniLM-L12-v2.
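The upload request for this model (also shown in the recap below):

# upload the pre-trained model in TorchScript format
POST /_plugins/_ml/models/_upload
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}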

  • After uploading the model, OpenSearch responds with a task_id, which we use to get the model_id.

  • With the model_id, we load the model from the model index into memory: POST /_plugins/_ml/models/<model_id>/_load (both calls are shown below).

  • Once the model is loaded successfully, we can use the text_embedding algorithm.
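The corresponding requests for the first two steps look like this:

# get the model_id using the task_id returned by the upload request
GET /_plugins/_ml/tasks/<task_id>

# load the model from the model index into memory
POST /_plugins/_ml/models/<model_id>/_load

The load call returns another task_id; once that task completes, the model is ready to serve the _predict request shown next: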

# embed a sentence using the loaded model (the model_id goes in the path)
POST /_plugins/_ml/_predict/text_embedding/lu14l4kB_GAWF5uBi_Ol
{
  "text_docs": ["sentence to be embedded"],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}


That's it! For an in-depth explanation of what an embedding is, check embedding algorithm and LLM and Vector Databases.
As a quick recap, here are all the steps:



# get settings
GET /_cluster/settings?include_defaults=true

# get memory usage per node and breaker
GET _nodes/stats/breaker

# if the cluster has no dedicated ML nodes, set this to false
PUT _cluster/settings
{
   "persistent":{
     "plugins.ml_commons.only_run_on_ml_node": false
   }
}

# upload pre-trained model
POST /_plugins/_ml/models/_upload
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT"
}

# get the model_id using the task_id returned by the previous request
GET /_plugins/_ml/tasks/<task_id>

# load the model into memory
POST /_plugins/_ml/models/<model_id>/_load

# use the task_id returned by the load call to check the load status
GET /_plugins/_ml/tasks/<task_id>

# embed text
POST /_plugins/_ml/_predict/text_embedding/lu14l4kB_GAWF5uBi_Ol
{
  "text_docs":[ "test to embed here"],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}
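With embeddings in hand, the natural follow-up for using OpenSearch as a vector database is to store them in a k-NN index. A minimal sketch, assuming a hypothetical index and field name (all-MiniLM-L12-v2 produces 384-dimensional vectors):

# hypothetical k-NN index for storing the embeddings (names are illustrative)
PUT /my-vector-index
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "my_embedding": { "type": "knn_vector", "dimension": 384 }
    }
  }
}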
