<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jihun Lim</title>
    <description>The latest articles on DEV Community by Jihun Lim (@heuri).</description>
    <link>https://dev.to/heuri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1149787%2F5f66517c-beb2-481c-9e98-f31186d4ae0f.png</url>
      <title>DEV Community: Jihun Lim</title>
      <link>https://dev.to/heuri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/heuri"/>
    <language>en</language>
    <item>
      <title>Model Distillation for Amazon Nova Vision: Fine-Tuning Text-Image-to-Text</title>
      <dc:creator>Jihun Lim</dc:creator>
      <pubDate>Wed, 07 May 2025 06:12:13 +0000</pubDate>
      <link>https://dev.to/heuri/model-distillation-for-amazon-nova-vision-fine-tuning-text-image-to-text-1bhm</link>
      <guid>https://dev.to/heuri/model-distillation-for-amazon-nova-vision-fine-tuning-text-image-to-text-1bhm</guid>
      <description>&lt;p&gt;In this post, I'll introduce a Text-Image-to-Text fine-tuning method to effectively transfer the Vision capabilities of Amazon &lt;code&gt;Nova Pro&lt;/code&gt; Model to the &lt;code&gt;Lite&lt;/code&gt; Model.&lt;/p&gt;

&lt;p&gt;Before diving into the main content, I'd like to mention that I initially wanted to cover Model Distillation techniques in the Vision field directly, but the current support for this in Amazon Bedrock is limited. As an alternative, I'll share how to implement Vision Language Model distillation indirectly using the &lt;strong&gt;"Fine-Tuning: Text-Image-to-Text"&lt;/strong&gt; approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚗️ Model Distillation&lt;a id="distillation"&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;At re:Invent 2024, the Amazon Bedrock ecosystem began providing a new model customization feature called &lt;strong&gt;Model Distillation&lt;/strong&gt;, in addition to Fine-tuning and Continued pre-training. Also, &lt;a href="https://aws.amazon.com/ko/about-aws/whats-new/2025/04/amazon-nova-premier-complex-tasks-model-distillation/" rel="noopener noreferrer"&gt;recently (April 30)&lt;/a&gt;, Amazon released &lt;code&gt;Nova Premier&lt;/code&gt; as a teacher model for model distillation of complex tasks.&lt;/p&gt;

&lt;p&gt;Model distillation is a technique that transfers knowledge from a large teacher model to a smaller student model, allowing you to reduce model size and computational costs while maintaining performance as much as possible.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock Model Distillation consists of two main steps: first, generating the training data, and second, fine-tuning the student model on that generated data to create the distilled model.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5urxjteyfab69l7sa79s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5urxjteyfab69l7sa79s.png" alt="Model Distillation" width="800" height="303"&gt;&lt;/a&gt;&lt;br&gt;
Bedrock doesn't officially support model distillation for image tasks at present. However, if you understand the basic principles of the distillation process, you can implement model distillation for image tasks on your own by using a teacher model to generate training data and performing fine-tuning separately.&lt;/p&gt;


&lt;h2&gt;
  
  
  📸 Task Setting - Comparing Image Labeling Tasks&lt;a id="task"&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Multimodal models with Vision Understanding capabilities include &lt;a href="https://huggingface.co/docs/transformers/en/tasks/image_captioning" rel="noopener noreferrer"&gt;Image Captioning&lt;/a&gt; functionality that can describe given images. Given an image and a request to extract keywords for the aspects you care about (photography technique, mood, objects, etc.), the model returns relevant keywords for that image.&lt;/p&gt;

&lt;p&gt;👇 &lt;em&gt;Image Labeling Example Prompt&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are an image keyword extraction expert. Please analyze the image and extract concise keywords optimized for search.

Extract keywords according to the following 5 categories, but provide the final result as a single list separated by commas without category distinctions:

1. Main objects/people: People (gender, age group, ethnicity), animals, objects, and other core elements
2. Location/background: Places, landscapes, environments (indoor/outdoor), time, season
3. Actions/emotions: Verbs describing activities, adjectives indicating mood
4. Visual characteristics: Main colors, composition, photography techniques, image style
5. Contextual elements: Fashion, landmarks, cultural context, event/festival-related information

Please provide 2-5 keywords per category, totaling 15-25 search-optimized keywords. Avoid duplications and be concise.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The image below shows the results of Image Labeling performed using the &lt;code&gt;Nova Pro&lt;/code&gt; and &lt;code&gt;Lite&lt;/code&gt; models on one of the photos from the &lt;a href="https://huggingface.co/datasets/ShutterstockInc/high_resolution_images" rel="noopener noreferrer"&gt;&lt;code&gt;ShutterstockInc/high_resolution_images&lt;/code&gt;&lt;/a&gt; dataset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0l1phd061iz22lwrbd9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0l1phd061iz22lwrbd9.png" alt="Image Labeling" width="800" height="226"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that even with the same prompt, the two models' responses differ considerably. &lt;strong&gt;Please remember that in this post, rather than judging which model is superior at Image Labeling tasks, we focus on making the Lite model produce responses similar to those of the Pro model!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To quantify the similarity between the two models' answers, we measured the Jaccard index between the two keyword sets, which came out to 0.129. Now, let's see how much closer the responses can become by fine-tuning the Lite model with data from Pro.&lt;/p&gt;
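&lt;p&gt;For reference, the Jaccard index used throughout this post can be computed in a few lines of Python (a minimal sketch; the keyword lists below are illustrative, not the actual model outputs):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def jaccard(a, b):
    # |A ∩ B| / |A ∪ B| over the two keyword sets
    sa, sb = set(a), set(b)
    return len(sa.intersection(sb)) / len(sa.union(sb))

pro_keywords = ["woman", "beach", "sunset", "walking"]   # illustrative
lite_keywords = ["woman", "beach", "ocean", "standing"]  # illustrative
print(round(jaccard(pro_keywords, lite_keywords), 3))    # 0.333
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;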




&lt;h2&gt;
  
  
  🧑‍🔬 Self-Implementation of VLM Model Distillation&lt;a id="self-diatillation"&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dataset Preparation Process
&lt;/h3&gt;

&lt;p&gt;To distill VLM models ourselves, we'll perform fine-tuning using the Text-Image-to-Text approach. For this, we need to prepare the fine-tuning dataset in the following four steps.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In this post, we used the medium dataset of &lt;a href="https://huggingface.co/datasets/ShutterstockInc/high_resolution_images" rel="noopener noreferrer"&gt;&lt;code&gt;ShutterstockInc/high_resolution_images&lt;/code&gt;&lt;/a&gt; available on Hugging Face to implement VLM model distillation ourselves.&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Image Preprocessing
&lt;/h4&gt;

&lt;p&gt;The scope of image preprocessing is very broad. Here, assuming that classification suitable for specific tasks has been completed, we'll only cover preprocessing related to image resizing. Different tasks require different resolutions, but in most cases, high-resolution images are not necessary.&lt;/p&gt;

&lt;p&gt;For example, Claude models calculate the token count of an image using the following formula: &lt;code&gt;Token count = (width px × height px) ÷ 750&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For a 300 × 199 image&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total pixels: 300 × 199 = 59,700 pixels&lt;/li&gt;
&lt;li&gt;Required tokens: 59,700 ÷ 750 = 79.6 ≈ &lt;strong&gt;80 tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For a 1000 × 665 image&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total pixels: 1000 × 665 = 665,000 pixels&lt;/li&gt;
&lt;li&gt;Required tokens: 665,000 ÷ 750 = 886.67 ≈ &lt;strong&gt;887 tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you can see, token consumption varies greatly depending on image resolution, so it's important to appropriately reduce the size of high-resolution images before building a training dataset. This not only reduces model training costs but also contributes to improved processing speed, enabling efficient learning without performance degradation for most tasks.&lt;/p&gt;
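&lt;p&gt;Putting the formula above into code, a small sketch like the following can estimate token counts and pick target dimensions that fit a token budget (the resizing helper is my own illustration, not part of the original pipeline):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def approx_tokens(width, height):
    # Claude-style estimate: tokens ≈ (width px × height px) ÷ 750
    return round(width * height / 750)

def resize_dims(width, height, max_tokens=1000):
    # Scale factor that brings the estimated token count under the budget;
    # min() leaves images that already fit untouched
    scale = min(1.0, (max_tokens * 750 / (width * height)) ** 0.5)
    return (int(width * scale), int(height * scale))

print(approx_tokens(300, 199))   # 80
print(approx_tokens(1000, 665))  # 887
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The dimensions returned by &lt;code&gt;resize_dims&lt;/code&gt; can then be applied with your image library of choice (e.g., Pillow's &lt;code&gt;Image.resize&lt;/code&gt;) before the images are uploaded for training.&lt;/p&gt;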

&lt;h4&gt;
  
  
  2. Reference Data Composition
&lt;/h4&gt;

&lt;p&gt;In this process, we call the teacher model to generate prompt-response pair data. The responses generated by the teacher model are later used as fine-tuning data for the student model.&lt;/p&gt;

&lt;p&gt;We called the teacher model through the Converse API that supports multimodal functionality, and saved the model's responses and corresponding image filenames in JSONL format for building the fine-tuning dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;system_prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="n"&gt;conversation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_prompts&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bytes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;converse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;teacher_model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;inferenceConfig&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxTokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;reponse_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;jsonl_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reponse_text&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. Training Dataset Creation
&lt;/h4&gt;

&lt;p&gt;Following Bedrock's fine-tuning requirements, we create the dataset needed for model learning in JSONL format, referencing the &lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/fine-tune-prepare-data-understanding.html#custom-fine-tune-constraints" rel="noopener noreferrer"&gt;Preparing data for fine-tuning Understanding models&lt;/a&gt; guidelines.&lt;/p&gt;

&lt;p&gt;In this post, we prepare the data in the &lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/fine-tune-prepare-data-understanding.html#customize-fine-tune-examples" rel="noopener noreferrer"&gt;Single image custom fine tuning format&lt;/a&gt;.&lt;br&gt;
In this process, we &lt;strong&gt;use the data generated in the second step&lt;/strong&gt; to populate the &lt;code&gt;text&lt;/code&gt; fields under &lt;code&gt;system&lt;/code&gt; and &lt;code&gt;messages&lt;/code&gt;, and the &lt;code&gt;uri&lt;/code&gt; field of &lt;code&gt;image&lt;/code&gt;, completing the dataset.&lt;/p&gt;
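&lt;p&gt;To make this step concrete, here is a hypothetical sketch of assembling one training record. The &lt;code&gt;schemaVersion&lt;/code&gt; value, S3 bucket, and placeholder variables are assumptions for illustration; verify the exact field layout against the linked documentation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

# Placeholder values, assumed for illustration
system_prompt = "You are an image keyword extraction expert."
user_prompt = "Analyze the image and extract search-optimized keywords."
image_name = "sample.jpg"                  # image filename from step 2
teacher_response = "woman, beach, sunset"  # teacher model's response from step 2

record = {
    "schemaVersion": "bedrock-conversation-2024",  # assumed; check the docs
    "system": [{"text": system_prompt}],
    "messages": [
        {
            "role": "user",
            "content": [
                {"text": user_prompt},
                {"image": {"format": "jpeg",
                           "source": {"s3Location": {"uri": "s3://my-bucket/images/" + image_name}}}},
            ],
        },
        {"role": "assistant", "content": [{"text": teacher_response}]},
    ],
}

# One JSON object per line, as JSONL requires
with open("train.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;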

&lt;h4&gt;
  
  
  4. Dataset Validation
&lt;/h4&gt;

&lt;p&gt;Before starting the fine-tuning process, first check the validity of your dataset using the &lt;a href="https://github.com/aws-samples/amazon-bedrock-samples/tree/main/custom-models/bedrock-fine-tuning/nova/understanding/dataset_validation#dataset-validation-for-fine-tuning-nova-understanding-models" rel="noopener noreferrer"&gt;Dataset Validation for Fine-tuning Nova Understanding models&lt;/a&gt; script provided by the &lt;code&gt;aws-samples&lt;/code&gt; GitHub repository.&lt;/p&gt;

&lt;p&gt;Running the command &lt;code&gt;python3 nova_ft_dataset_validator.py -i &amp;lt;file path&amp;gt; -m &amp;lt;model name&amp;gt;&lt;/code&gt; will perform the check, and if all samples pass validation, the message &lt;code&gt;Validation successful, all samples passed&lt;/code&gt; will be displayed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fine-tuning
&lt;/h3&gt;

&lt;p&gt;Once dataset preparation is complete, the fine-tuning process is very simple. Just specify the S3 location where the dataset is stored in the Amazon Bedrock console and set the necessary hyperparameter values.&lt;/p&gt;

&lt;p&gt;For this training, we increased the Nova Lite model's epoch value from its default of 2 to 5, while keeping the default values for the other parameters.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys4haukxjxqelsgyhjyv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fys4haukxjxqelsgyhjyv.png" alt="Hyperparams" width="709" height="558"&gt;&lt;/a&gt;&lt;br&gt;
Upon completion of training, training result metrics are stored in the S3 location specified during the fine-tuning process. Through the &lt;code&gt;step_wise_training_metrics.csv&lt;/code&gt; file, you can check training loss values for each step and epoch, allowing you to confirm the model's learning progress.&lt;/p&gt;
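&lt;p&gt;For a quick look at learning progress, a sketch like this can parse the metrics file (the column name here is an assumption; check the header of the CSV Bedrock actually writes):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import csv

def training_losses(path):
    # Read the per-step loss values; "training_loss" is an assumed column name
    with open(path) as f:
        return [float(row["training_loss"]) for row in csv.DictReader(f)]

# e.g., plot training_losses("step_wise_training_metrics.csv") with matplotlib
# to confirm the loss decreases across steps and epochs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;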




&lt;h2&gt;
  
  
  🖍️ Fine-Tuning Text-Image-to-Text Results&lt;a id="results"&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;In this post, we used the &lt;code&gt;medium&lt;/code&gt; dataset of &lt;a href="https://huggingface.co/datasets/ShutterstockInc/high_resolution_images" rel="noopener noreferrer"&gt;🤗 &lt;code&gt;ShutterstockInc/high_resolution_images&lt;/code&gt;&lt;/a&gt;, which consists of 1,000 images.&lt;br&gt;
Of these, we used 900 images as training data, and the remaining 100 images were held out to verify model performance after fine-tuning. Given the limited amount of data, we conducted two training sessions, using 300 and 900 images respectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nova Pro &amp;amp; Nova Lite Comparison&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First, to check the performance difference between Nova Pro and Lite models without fine-tuning, we compared the analysis results for 100 images. The Jaccard similarity between the two models was found to be mostly distributed between 0.1 and 0.4.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1rnidjxcaxsa7v7hhxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1rnidjxcaxsa7v7hhxd.png" alt="case1" width="756" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nova Pro &amp;amp; Nova Lite (300 images)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After training with 300 samples, the Jaccard similarity improved to between 0.2 and 0.6. This shows that even with a relatively small amount of data, the Lite model can approach the Pro model's behavior.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7sixkmk79an861te8sg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7sixkmk79an861te8sg.png" alt="case2" width="756" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nova Pro &amp;amp; Nova Lite (900 images)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After training with 900 samples, the Jaccard similarity again fell between 0.2 and 0.6, and compared to the model trained with 300 images (red), the model trained with 900 images (purple) showed slightly higher performance.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0kw82ny1mb2ekp1wi3z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0kw82ny1mb2ekp1wi3z.png" alt="case3" width="756" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this experiment, we used only 900 images due to image data limitations, but Amazon Bedrock's image fine-tuning feature supports up to 20,000 data points. Therefore, we expect performance to improve further if fine-tuning is performed with more data.&lt;/p&gt;




&lt;h2&gt;
  
  
  💸 Model Customization Costs&lt;a id="cost"&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;I've listed the costs incurred in the experiment, which I hope will help you estimate expected costs when planning future fine-tuning tasks. 🙃&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nova Lite Fine-Tuning Costs&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Usage Type&lt;/th&gt;
&lt;th&gt;Data Count&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Training Time&lt;/th&gt;
&lt;th&gt;Provisioned Throughput Cost (No Commitment)&lt;/th&gt;
&lt;th&gt;Model Storage Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;USE1-NovaLite-Customization-Training&lt;/td&gt;
&lt;td&gt;300 images&lt;/td&gt;
&lt;td&gt;About $2.10&lt;/td&gt;
&lt;td&gt;About 1 hour&lt;/td&gt;
&lt;td&gt;$108.15 per hour&lt;/td&gt;
&lt;td&gt;$1.95 per month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;USE1-NovaLite-Customization-Training&lt;/td&gt;
&lt;td&gt;900 images&lt;/td&gt;
&lt;td&gt;About $7.50&lt;/td&gt;
&lt;td&gt;About 2 hours&lt;/td&gt;
&lt;td&gt;$108.15 per hour&lt;/td&gt;
&lt;td&gt;$1.95 per month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;These figures do not include the cost of generating the prompt-response pair data with the teacher model. To estimate that cost, run the task once, measure the token consumption, and calculate it separately.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🌟 Conclusion&lt;a id="outro"&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;In this post, we explored how to implement model distillation indirectly through Text-Image-to-Text fine-tuning in a situation where Amazon Bedrock does not officially support model distillation for Vision tasks.&lt;/p&gt;

&lt;p&gt;For successful VLM model distillation, a systematic dataset preparation process is essential. The steps of optimizing token consumption through image preprocessing, building reference data using teacher models, creating training datasets that meet Bedrock requirements, and validating datasets before fine-tuning directly impact model performance.&lt;/p&gt;

&lt;p&gt;Also, after completing fine-tuning, it's necessary to confirm the model's performance improvement through a validation process. In this article, we measured response consistency between models using Jaccard similarity and found that as the amount of data increased, the Lite model came closer to the Pro model's responses.&lt;/p&gt;

&lt;p&gt;While this indirect distillation method is not an officially supported feature, it shows that, with proper dataset composition and fine-tuning, a lightweight model can achieve results similar to a high-performance model. We hope that official support for Vision model distillation in Amazon Bedrock will expand in the future; until then, this approach can be useful in practice. I hope this methodology helps in your projects as well.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🤣 Actually, this post is part of what I experimented with while preparing for my AWS Seoul Summit 2025 presentation. I'll share the presentation video here when it becomes available!&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>genai</category>
      <category>amazonnova</category>
      <category>modeldistillation</category>
      <category>aws</category>
    </item>
    <item>
      <title>Providing a caching layer for LLM with Langchain in AWS</title>
      <dc:creator>Jihun Lim</dc:creator>
      <pubDate>Sat, 23 Dec 2023 15:12:46 +0000</pubDate>
      <link>https://dev.to/heuri/providing-a-caching-layer-for-llm-with-langchain-in-aws-5h7g</link>
      <guid>https://dev.to/heuri/providing-a-caching-layer-for-llm-with-langchain-in-aws-5h7g</guid>
      <description>&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;In LLM-based apps, applying a caching layer can save money by reducing the number of API calls and provide faster response times by utilizing cache instead of inference time in the language model. In this post, let's take a look at how you can utilize the Redis offerings from AWS as a caching layer, including &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/11/vector-search-amazon-memorydb-redis-preview/" rel="noopener noreferrer"&gt;vector search for Amazon MemoryDB for Redis&lt;/a&gt;, which was recently released in preview.&lt;/p&gt;

&lt;p&gt;👇 Architecture with caching for LLM in AWS&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fry3p4zp5njl6nau01xuz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fry3p4zp5njl6nau01xuz.png" alt="cache_architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://python.langchain.com/docs/integrations/llms/llm_caching" rel="noopener noreferrer"&gt;LLM Caching integrations&lt;/a&gt; : 🦜️🔗, offerings include In Memory, SQLite, Redis, GPTCache, Cassandra, and more.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Caching in 🦜️🔗
&lt;/h2&gt;

&lt;p&gt;Currently, Langchain offers &lt;strong&gt;two major caching&lt;/strong&gt; methods and &lt;strong&gt;the option to choose whether to cache or not&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard Cache: Determines cache hits for &lt;strong&gt;prompts&lt;/strong&gt; and &lt;strong&gt;responses&lt;/strong&gt; for exactly the same sentence.&lt;/li&gt;
&lt;li&gt;Semantic Cache: Determines cache hits for &lt;strong&gt;prompts&lt;/strong&gt; and &lt;strong&gt;responses&lt;/strong&gt; for semantically similar sentences.&lt;/li&gt;
&lt;li&gt;Optional Caching: Provides the ability to optionally apply a cache hit or not.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's see how to use the RedisCache provided by Langchain with &lt;code&gt;Redis on EC2&lt;/code&gt; (self-installed), &lt;code&gt;ElastiCache for Redis&lt;/code&gt;, and &lt;code&gt;MemoryDB for Redis&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;✅ &lt;em&gt;Testing is conducted with the &lt;code&gt;Claude 2.1&lt;/code&gt; model through Bedrock in the SageMaker Notebook Instances environment.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🐳 Redis Stack on EC2
&lt;/h2&gt;

&lt;p&gt;This is how to install Redis directly on EC2 and use its VectorDB features. To use Redis's Vector Search, you need Redis Stack, which extends the core features of Redis OSS. I deployed the redis-stack image on EC2 via Docker and used it that way.&lt;/p&gt;

&lt;p&gt;👇 Installing the Redis Stack with Docker&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;yum update &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;yum &lt;span class="nb"&gt;install &lt;/span&gt;docker &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;service docker start
&lt;span class="nv"&gt;$ &lt;/span&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; redis-stack &lt;span class="nt"&gt;-p&lt;/span&gt; 6379:6379 redis/redis-stack:latest
&lt;span class="nv"&gt;$ &lt;/span&gt;docker ps
&lt;span class="nv"&gt;$ &lt;/span&gt;docker logs &lt;span class="nt"&gt;-f&lt;/span&gt; redis-stack


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Use &lt;strong&gt;redis-cli&lt;/strong&gt; to check for connection&lt;br&gt;
&lt;code&gt;$ redis-cli -c -h {$Cluster_Endpoint} -p {$PORT}&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once Redis is ready, install langchain, redis, and boto3 for using Amazon Bedrock.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;$ pip install langchain redis boto3 --quiet&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Standard Cache
&lt;/h3&gt;

&lt;p&gt;Next, import the libraries required for the Standard Cache.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.globals&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;set_llm_cache&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.llms.bedrock&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Bedrock&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.cache&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RedisCache&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Redis&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Write the code to invoke the LLM as follows, providing the caching layer via the &lt;code&gt;set_llm_cache()&lt;/code&gt; function.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;ec2_redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://{EC2_Endpoiont}:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RedisCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ec2_redis&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-v2:1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-west-2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;set_llm_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When measuring time using the built-in &lt;code&gt;%%time&lt;/code&gt; command in Jupyter, it can be observed that the Wall time significantly reduces from &lt;strong&gt;7.82s&lt;/strong&gt; to &lt;strong&gt;97.7ms&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67jc6wxl01zmdp2me39k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67jc6wxl01zmdp2me39k.png" alt="redisStandard"&gt;&lt;/a&gt;&lt;/p&gt;
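&lt;p&gt;&lt;em&gt;Conceptually, the Standard Cache keys on the exact prompt string. Here is a minimal, self-contained sketch of that lookup behavior — the &lt;code&gt;ExactMatchCache&lt;/code&gt; class and &lt;code&gt;invoke&lt;/code&gt; helper are illustrative stand-ins I wrote for this post, not the Langchain/Bedrock API:&lt;/em&gt;&lt;/p&gt;

```python
class ExactMatchCache:
    """Toy stand-in for RedisCache: the key is the exact prompt string."""
    def __init__(self):
        self._store = {}

    def lookup(self, prompt):
        return self._store.get(prompt)

    def update(self, prompt, answer):
        self._store[prompt] = answer


def invoke(cache, prompt):
    """Return (answer, cache_hit); the f-string stands in for a real model call."""
    hit = cache.lookup(prompt)
    if hit is not None:
        return hit, True
    answer = f"answer to: {prompt}"
    cache.update(prompt, answer)
    return answer, False


cache = ExactMatchCache()
print(invoke(cache, "Where is Las Vegas?")[1])  # False: miss, the model is called
print(invoke(cache, "Where is Las Vegas?")[1])  # True: identical prompt, cache hit
print(invoke(cache, "Where is Vegas?")[1])      # False: different string, miss again
```

&lt;p&gt;&lt;em&gt;Note the last call: any change to the prompt string misses the cache, which is exactly the limitation the Semantic Cache addresses.&lt;/em&gt;&lt;/p&gt;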

&lt;h3&gt;
  
  
  Semantic Cache
&lt;/h3&gt;

&lt;p&gt;The Redis Stack Docker image I used supports a vector similarity search feature called &lt;a href="https://github.com/RediSearch/RediSearch" rel="noopener noreferrer"&gt;RediSearch&lt;/a&gt;. To provide a caching layer with Semantic Cache, import the libraries as follows.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.globals&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;set_llm_cache&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.cache&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RedisSemanticCache&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.llms.bedrock&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Bedrock&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.embeddings&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockEmbeddings&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Unlike the Standard Cache, the Semantic Cache uses an embedding model to match queries with semantically similar meanings, so we'll use the &lt;strong&gt;Amazon Titan Embedding&lt;/strong&gt; model.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-v2:1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-west-2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bedrock_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amazon.titan-embed-text-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-west-2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;set_llm_cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RedisSemanticCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ec2_redis&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bedrock_embeddings&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Querying for the location of &lt;strong&gt;Las Vegas&lt;/strong&gt; and then making a second query for &lt;strong&gt;Vegas&lt;/strong&gt;, which is semantically similar, produced a cache hit: the Wall time dropped dramatically from &lt;strong&gt;4.6s&lt;/strong&gt; to &lt;strong&gt;532ms&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnogzjxfc9knxhp1uq8dg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnogzjxfc9knxhp1uq8dg.png" alt="redisSemantic"&gt;&lt;/a&gt;&lt;/p&gt;
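&lt;p&gt;&lt;em&gt;To see why "Vegas" can hit a cache entry for "Las Vegas", here is a self-contained sketch of similarity-based lookup. The bag-of-words "embedding" and the &lt;code&gt;SemanticCache&lt;/code&gt; class are toy stand-ins for Titan Embeddings and RedisSemanticCache, not their real APIs:&lt;/em&gt;&lt;/p&gt;

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

class SemanticCache:
    """Toy stand-in for RedisSemanticCache: a hit is any stored entry whose
    embedding similarity to the query meets the threshold."""
    def __init__(self, embed, threshold=0.8):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (vector, answer)

    def lookup(self, prompt):
        v = self.embed(prompt)
        for vec, answer in self.entries:
            if cosine(v, vec) >= self.threshold:
                return answer
        return None

    def update(self, prompt, answer):
        self.entries.append((self.embed(prompt), answer))

# Toy "embedding": bag-of-words over a tiny vocabulary, standing in for Titan.
VOCAB = ["where", "is", "las", "vegas", "located"]
def embed(text):
    words = text.lower().replace("?", "").split()
    return [words.count(w) for w in VOCAB]

cache = SemanticCache(embed, threshold=0.8)
cache.update("Where is Las Vegas located?", "Nevada, USA")
print(cache.lookup("Where is Vegas located?"))         # Nevada, USA (similar enough)
print(cache.lookup("What is the capital of France?"))  # None (below threshold)
```

&lt;p&gt;&lt;em&gt;The real setup works the same way, just with Titan embeddings and Redis vector search instead of a Python loop: a near-duplicate question lands above the similarity threshold and returns the cached answer without calling the LLM.&lt;/em&gt;&lt;/p&gt;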




&lt;h2&gt;
  
  
  ☁️ Amazon ElastiCache (Serverless) for Redis
&lt;/h2&gt;

&lt;p&gt;Amazon ElastiCache is a fully managed, Redis-compatible service. Simply swapping the &lt;code&gt;Redis on EC2&lt;/code&gt; endpoint for the ElastiCache endpoint in the same code yields the following results.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;❗️ &lt;em&gt;If you are using &lt;a href="https://aws.amazon.com/ko/blogs/aws/amazon-elasticache-serverless-for-redis-and-memcached-now-generally-available/" rel="noopener noreferrer"&gt;ElastiCache Serverless&lt;/a&gt;, which was announced on 11/27/2023, there are some differences. When specifying the 'url', you need to write &lt;code&gt;rediss:&lt;/code&gt; instead of &lt;code&gt;redis:&lt;/code&gt; as it encrypts the data in transit via &lt;code&gt;TLS&lt;/code&gt;.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
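&lt;p&gt;&lt;em&gt;The scheme difference can be captured in a small helper — a hypothetical convenience function I'm sketching here, not part of Langchain:&lt;/em&gt;&lt;/p&gt;

```python
def cache_url(endpoint, port=6379, tls=False):
    """Build a Redis connection URL; TLS-only endpoints such as ElastiCache
    Serverless (and MemoryDB) need the rediss:// scheme instead of redis://."""
    scheme = "rediss" if tls else "redis"
    return f"{scheme}://{endpoint}:{port}"


print(cache_url("my-ec2-host"))                    # redis://my-ec2-host:6379
print(cache_url("my-serverless-cache", tls=True))  # rediss://my-serverless-cache:6379
```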

&lt;p&gt;⚡️ &lt;em&gt;How to enable TLS with &lt;code&gt;redis-cli&lt;/code&gt; on Amazon Linux 2&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Build the &lt;code&gt;redis-cli&lt;/code&gt; utility with the TLS option enabled:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;yum &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nb"&gt;install &lt;/span&gt;openssl-devel gcc
&lt;span class="nv"&gt;$ &lt;/span&gt;wget http://download.redis.io/redis-stable.tar.gz
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;tar &lt;/span&gt;xvzf redis-stable.tar.gz
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;redis-stable
&lt;span class="nv"&gt;$ &lt;/span&gt;make distclean
&lt;span class="nv"&gt;$ &lt;/span&gt;make redis-cli &lt;span class="nv"&gt;BUILD_TLS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;yes&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;sudo install&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; 755 src/redis-cli /usr/local/bin/


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Connectivity : &lt;code&gt;$ redis-cli -c -h {$Cluster_Endpoint} --tls -p {$PORT}&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Standard Cache
&lt;/h3&gt;

&lt;p&gt;Since the Standard Cache stores no separate embedding values, LLM caching works on ElastiCache, which is built on Redis OSS technology. For the same question, the Wall time drops significantly from &lt;strong&gt;45.4ms&lt;/strong&gt; to &lt;strong&gt;2.76ms&lt;/strong&gt; across two iterations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzf1ck8177a6a7zas0owu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzf1ck8177a6a7zas0owu.png" alt="ecStandard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Semantic Cache
&lt;/h3&gt;

&lt;p&gt;For the Semantic Cache, on the other hand, ElastiCache does not support Vector Search, so running the same code as above produces the following error: &lt;code&gt;ResponseError: unknown command 'module', with args beginning with: LIST&lt;/code&gt;. This occurs because ElastiCache does not expose the &lt;code&gt;MODULE LIST&lt;/code&gt; command and does not load RediSearch. In other words, ElastiCache provides no Vector Search, so the Semantic Cache cannot be used.&lt;/p&gt;
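&lt;p&gt;&lt;em&gt;One way to fail fast before enabling the Semantic Cache is to probe for the module — a sketch assuming a redis-py-style client that exposes &lt;code&gt;module_list()&lt;/code&gt;; the &lt;code&gt;NoModuleClient&lt;/code&gt; stub below only mimics ElastiCache's rejection of the command:&lt;/em&gt;&lt;/p&gt;

```python
def supports_redisearch(client):
    """Return True if the server reports the RediSearch ("search") module.
    MODULE LIST raises on ElastiCache/MemoryDB, which we treat as unsupported."""
    try:
        modules = client.module_list()  # redis-py: MODULE LIST
    except Exception:
        return False  # e.g. "unknown command 'module'" on ElastiCache
    names = {m.get("name") or m.get(b"name", b"").decode() for m in modules}
    return "search" in names


# Stub mimicking ElastiCache, which rejects the MODULE command entirely:
class NoModuleClient:
    def module_list(self):
        raise RuntimeError("unknown command 'module', with args beginning with: LIST")


print(supports_redisearch(NoModuleClient()))  # False
```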




&lt;h2&gt;
  
  
  ⛅️ Amazon MemoryDB for Redis
&lt;/h2&gt;

&lt;p&gt;MemoryDB is another Redis-compatible in-memory database service from AWS, with added durability. Again, it works well with the Standard Cache, which stores no embedding values, but the Semantic Cache returns the same error message as ElastiCache, because standard MemoryDB does not support Vector Search either.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;❗️ &lt;em&gt;Note that MemoryDB also uses &lt;code&gt;TLS&lt;/code&gt; by default, just like ElastiCache Serverless.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Standard Cache
&lt;/h3&gt;

&lt;p&gt;Since MemoryDB does not support Vector search, I will only cover the Standard Cache case in this section. For the same question, the Wall time per iteration drops from &lt;strong&gt;6.67s&lt;/strong&gt; to &lt;strong&gt;38.2ms&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrccf1xp54gprval34d4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnrccf1xp54gprval34d4.png" alt="mmrStandard"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🌩️ Vector search for Amazon MemoryDB for Redis
&lt;/h2&gt;

&lt;p&gt;Finally, it's time for MemoryDB with Vector search. This newly launched capability, available in Public Preview, runs on the same MemoryDB service: when creating a cluster, you can activate &lt;strong&gt;Vector search&lt;/strong&gt;, but this setting cannot be changed after the cluster is created.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;❗️ &lt;em&gt;The content is based on testing during the 'public preview' stage and the results may vary in the future.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Standard Cache
&lt;/h3&gt;

&lt;p&gt;For the same question, it can be observed that the Wall time for each iteration has reduced from &lt;strong&gt;14.8s&lt;/strong&gt; to &lt;strong&gt;2.13ms&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc0skj23yx653vpxwfvkf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc0skj23yx653vpxwfvkf.png" alt="vmmrStandard"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Semantic Cache
&lt;/h3&gt;

&lt;p&gt;Before running this test, I expected the same results as with the Redis Stack, since Vector search is supported. However, I got the same error messages as with the Redis offerings that do not support Vector Search.&lt;/p&gt;

&lt;p&gt;Of course, the lack of Langchain Cache support doesn't mean this update fails to deliver Vector search; I'll clarify this in the next section.&lt;/p&gt;




&lt;h2&gt;
  
  
  Redis as a Vector Database
&lt;/h2&gt;

&lt;p&gt;If you check the &lt;a href="https://github.com/aws-samples/amazon-memorydb-for-redis-samples/tree/main/tutorials/langchain-memorydb" rel="noopener noreferrer"&gt;Langchain MemoryDB Github&lt;/a&gt; on aws-samples, you can find example code to utilize Redis as a VectorStore. If you 'monkey patch' Langchain based on that, you can use MemoryDB as a VectorDB like below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbgr19a9gu8l1h49ow7bl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbgr19a9gu8l1h49ow7bl.png" alt="vmmrSemantic"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the example above, the cache is implemented using the &lt;a href="https://docs.aws.amazon.com/memorydb/latest/devguide/vector-search-examples.html#vector-search-examples-foundational-model-buffer-memory" rel="noopener noreferrer"&gt;Foundation Model (FM) Buffer Memory&lt;/a&gt; method introduced in the AWS documentation. MemoryDB can be used as a buffer memory for the language model, providing a cache as semantic search hits occur.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;❗️ &lt;em&gt;This example is only possible on MemoryDB with Vector search enabled. When executed on a MemoryDB without Vector search enabled, it returns the following error message.&lt;/em&gt; &lt;code&gt;ResponseError: -ERR Command not enabled, instance needs to be configured for Public Preview for Vector Similarity Search&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Outro
&lt;/h2&gt;

&lt;p&gt;The test results so far are tabulated as follows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Langchain Cache Test Results&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cache/DB&lt;/th&gt;
&lt;th&gt;Redis Stack on EC2&lt;/th&gt;
&lt;th&gt;ElastiCache (Serverless)&lt;/th&gt;
&lt;th&gt;MemoryDB&lt;/th&gt;
&lt;th&gt;VectorSearch MemoryDB (Preview)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;O&lt;/td&gt;
&lt;td&gt;O&lt;/td&gt;
&lt;td&gt;O&lt;/td&gt;
&lt;td&gt;O&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic&lt;/td&gt;
&lt;td&gt;O&lt;/td&gt;
&lt;td&gt;X&lt;/td&gt;
&lt;td&gt;X&lt;/td&gt;
&lt;td&gt;Partial support (expected to be available in the future)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Since Langchain already supports many AWS services, it would be nice to see MemoryDB in the Langchain documentation as well. I originally planned to test only MemoryDB with Vector search, but out of curiosity I kept adding test targets. Nevertheless, it was fun to explore the various Redis-compatible services on AWS and their subtle differences, such as whether they require TLS.&lt;/p&gt;

&lt;p&gt;Thanks for taking the time to read this, and please point out any errors! 😃&lt;/p&gt;

</description>
      <category>llm</category>
      <category>langchain</category>
      <category>memorydb</category>
      <category>cache</category>
    </item>
  </channel>
</rss>
