<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Satoshi Kaneyasu</title>
    <description>The latest articles on DEV Community by Satoshi Kaneyasu (@satoshi256kbyte).</description>
    <link>https://dev.to/satoshi256kbyte</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3965426%2F87907d26-fd60-4eb4-86f5-96e2173963e4.png</url>
      <title>DEV Community: Satoshi Kaneyasu</title>
      <link>https://dev.to/satoshi256kbyte</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/satoshi256kbyte"/>
    <language>en</language>
    <item>
      <title>Getting Started with Vector Databases Using Amazon Aurora PostgreSQL + pgvector</title>
      <dc:creator>Satoshi Kaneyasu</dc:creator>
      <pubDate>Wed, 03 Jun 2026 03:32:51 +0000</pubDate>
      <link>https://dev.to/aws-builders/getting-started-with-vector-databases-using-amazon-aurora-postgresql-pgvector-4go6</link>
      <guid>https://dev.to/aws-builders/getting-started-with-vector-databases-using-amazon-aurora-postgresql-pgvector-4go6</guid>
      <description>&lt;p&gt;Hello!&lt;br&gt;
I'm Satoshi Kaneyasu, DevOps engineer at Serverworks.&lt;br&gt;
In this article, I'll introduce the basic concepts and terminology of vector databases for those who are just starting to learn about them.&lt;/p&gt;
&lt;h2&gt;
  
  
  Target Audience
&lt;/h2&gt;

&lt;p&gt;This article is aimed at readers who are new to vector databases.&lt;br&gt;
You may have heard that vector databases are related to LLMs and RAG but aren't quite sure what they actually are.&lt;br&gt;
If that sounds like you, this article was written with you in mind.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Is a Vector Database?
&lt;/h2&gt;

&lt;p&gt;A vector database is a database that stores data as vectors (arrays of numbers) and searches for data using "distance" or "similarity" between vectors.&lt;/p&gt;

&lt;p&gt;Traditional relational databases search for data using "exact match" or "partial match" (LIKE queries), but vector databases can search for things that are &lt;strong&gt;semantically similar&lt;/strong&gt;.&lt;br&gt;
For example, searching for "weather in Tokyo" might return results like "temperature in Tokyo" or "weather conditions in Kanto" — data that differs as a string but is semantically related.&lt;/p&gt;
&lt;h3&gt;
  
  
  Visualizing Vector Space
&lt;/h3&gt;

&lt;p&gt;In a vector database, all data is represented as points in a multidimensional space. When searching, the query is also converted into a vector, and data that is "close in distance" within that space is retrieved.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnllzqqgkrf7b5n2p6dfb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnllzqqgkrf7b5n2p6dfb.png" alt="Vector Space Diagram" width="784" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This diagram represents it in two dimensions, but in a real vector database, proximity and distance are defined across many dimensions.&lt;/p&gt;
&lt;h3&gt;
  
  
  Use Cases for Vector Databases
&lt;/h3&gt;

&lt;p&gt;Vector databases are used across a wide range of applications:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Knowledge base search to provide external knowledge to LLMs. Allows internal documents and up-to-date information to be reflected in LLM responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Semantic Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Searching internal documents or FAQs by meaning rather than keywords. Handles spelling variations and synonyms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recommendation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Recommending products and content whose vectors are close to a user's preference vector. Used as an alternative or complement to collaborative filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Image Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Searching for similar images (face recognition, product image matching). Images are vectorized using an embedding model and compared&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Anomaly Detection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Detecting data that deviates far from the vector of normal patterns. Used in log analysis and security monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Duplicate Detection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Detecting similar documents or code. Used for plagiarism detection and content deduplication&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Of these, RAG is the use case you are most likely to encounter today.&lt;/p&gt;
&lt;h2&gt;
  
  
  RAG: A Technique for Improving Answer Accuracy
&lt;/h2&gt;
&lt;h3&gt;
  
  
  What Is RAG?
&lt;/h3&gt;

&lt;p&gt;RAG (Retrieval-Augmented Generation) is a technique that improves LLM response accuracy by searching for relevant information from external data sources before generating a response, then including that information in the prompt.&lt;/p&gt;

&lt;p&gt;LLMs cannot accurately answer questions about information that is not in their training data (internal documents, recent news, specialized technical information, etc.).&lt;br&gt;
With RAG, you can have the LLM reference external knowledge stored in a vector database to generate more accurate and up-to-date responses.&lt;/p&gt;

&lt;p&gt;If you use Amazon Bedrock to host the LLM for RAG, there is a fully managed RAG feature called &lt;strong&gt;Knowledge Bases&lt;/strong&gt;.&lt;br&gt;
With Knowledge Bases, you simply register documents stored in S3 and AWS manages everything — vectorization, vector database setup, and search.&lt;br&gt;
Since you don't need to set up a vector database yourself, this is ideal when you want to try RAG quickly or minimize infrastructure management.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/bedrock/knowledge-bases/" rel="noopener noreferrer"&gt;Amazon Bedrock Knowledge Bases&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since this article focuses on the vector database itself, we'll proceed without using Knowledge Bases.&lt;/p&gt;
&lt;h3&gt;
  
  
  RAG Processing Flow
&lt;/h3&gt;

&lt;p&gt;The RAG process follows this flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user inputs a question (e.g., "What is AWS Lambda?")&lt;/li&gt;
&lt;li&gt;The application vectorizes the question text using an embedding model&lt;/li&gt;
&lt;li&gt;The vectorized query is used to search the vector database and retrieve relevant documents

&lt;ul&gt;
&lt;li&gt;At this point, you specify how many relevant documents to retrieve (e.g., top_k=3)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The retrieved relevant documents are sent to the LLM as context in the prompt&lt;/li&gt;
&lt;li&gt;The LLM generates a response while referencing the search results&lt;/li&gt;
&lt;li&gt;The response is returned to the user&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm71z9u3aephw32265dvf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm71z9u3aephw32265dvf.png" alt="RAG Processing Flow" width="784" height="354"&gt;&lt;/a&gt;&lt;/p&gt;
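&lt;p&gt;The flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this article: &lt;code&gt;embed&lt;/code&gt;, &lt;code&gt;search&lt;/code&gt;, and &lt;code&gt;generate&lt;/code&gt; are hypothetical callables standing in for the embedding model, the vector database query, and the LLM call, and the prompt wording is made up for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def build_prompt(question, documents):
    """Step 4: place the retrieved documents in the prompt as context."""
    context = "\n".join("- " + doc for doc in documents)
    return "Answer using only this context:\n" + context + "\n\nQuestion: " + question


def answer_with_rag(question, embed, search, generate, top_k=3):
    """Steps 2-6 of the flow; the three callables are hypothetical stand-ins."""
    query_vector = embed(question)              # Step 2: vectorize the question
    documents = search(query_vector, top_k)     # Step 3: retrieve top_k documents
    prompt = build_prompt(question, documents)  # Step 4: add them as context
    return generate(prompt)                     # Steps 5-6: generate and return
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing the three steps in as callables keeps the sketch independent of any particular model or database, which also makes it easy to try with stub functions.&lt;/p&gt;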

&lt;p&gt;As you can see, the vector database plays a central role in RAG as the "search engine for external knowledge."&lt;br&gt;
From here, let's dive deeper into the "vector database search" step.&lt;/p&gt;
&lt;h2&gt;
  
  
  Searching a Vector Database
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Search Flow
&lt;/h3&gt;

&lt;p&gt;The vector database search flow works as follows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user searches for "weather in Tokyo"&lt;/li&gt;
&lt;li&gt;The application vectorizes "weather in Tokyo" using an embedding model (e.g., a 1024-dimensional vector)&lt;/li&gt;
&lt;li&gt;Cosine distance is calculated against the data in the vector database (pre-vectorized using the same model)&lt;/li&gt;
&lt;li&gt;The k results with the smallest distance are returned&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4gf4ajaujv906wsyrpza.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4gf4ajaujv906wsyrpza.png" alt="Vector Database Search Flow" width="784" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a vector database, data is represented as multidimensional arrays of numbers.&lt;br&gt;
Therefore, data is converted to numbers at insertion time, and search queries are converted at search time.&lt;br&gt;
This conversion is called vectorization, or embedding.&lt;/p&gt;

&lt;p&gt;The key point of vector database search is that &lt;strong&gt;the search query itself is also vectorized&lt;/strong&gt;.&lt;br&gt;
Instead of searching with raw text, it is converted to a vector using an embedding model (described later), and data that is close in vector space is retrieved.&lt;/p&gt;

&lt;p&gt;From here, I'll use implementation examples with Aurora PostgreSQL + pgvector (abbreviated throughout) and Python code.&lt;/p&gt;

&lt;p&gt;There are multiple options for building a vector database on AWS, but I find Aurora PostgreSQL + pgvector to be the most approachable starting point, and it's a great way to feel the difference between a conventional relational database and a vector database.&lt;/p&gt;
&lt;h3&gt;
  
  
  Search Implementation Code
&lt;/h3&gt;

&lt;p&gt;Here is an implementation example using Aurora PostgreSQL + pgvector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ① Vectorize the text query (handler.py)
&lt;/span&gt;&lt;span class="n"&gt;embedding_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;  &lt;span class="c1"&gt;# 1024-dimensional vector
&lt;/span&gt;
&lt;span class="c1"&gt;# ② Search the DB with the vectorized query (logic.py)
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="c1"&gt;# Calculate cosine distance between query vector and DB vectors,
&lt;/span&gt;        &lt;span class="c1"&gt;# return top_k results in ascending distance order
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT content, embedding &amp;lt;=&amp;gt; %s::vector AS distance &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FROM embeddings ORDER BY distance LIMIT %s;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; operator here is pgvector's cosine distance operator.&lt;br&gt;
A smaller value means higher similarity.&lt;br&gt;
Because we're using Aurora PostgreSQL + pgvector, we can use SQL to query the vector DB.&lt;br&gt;
This code uses a parameterized query to safely pass the query vector and the result count (top_k) into the &lt;code&gt;%s&lt;/code&gt; placeholders.&lt;/p&gt;

&lt;p&gt;Several terms have appeared in this simple search, so let me explain them.&lt;/p&gt;
&lt;h3&gt;
  
  
  Embedding (= Vectorization)
&lt;/h3&gt;

&lt;p&gt;Embedding refers to the process of converting data such as text or images into a numerical vector.&lt;br&gt;
It is also called "vectorization."&lt;/p&gt;

&lt;p&gt;Humans intuitively know that "Tokyo weather forecast" and "Tokyo temperature" are similar, but a computer comparing the raw strings only sees that the characters differ.&lt;br&gt;
By numerically representing meaning through embedding, computers can mathematically calculate "semantic closeness."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before: "Tokyo weather forecast"
After:  [0.0231, -0.0142, 0.0567, ..., 0.0412]  ← 1024 numbers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Embedding Implementation Code
&lt;/h3&gt;

&lt;p&gt;Here is an implementation example using Amazon Bedrock's Titan Embeddings V2.&lt;br&gt;
The &lt;code&gt;generate_embedding&lt;/code&gt; function implemented here is called at step ① in the "Search Implementation Code" above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;EmbeddingResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Vectorize text using Bedrock Titan Embeddings V2.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_get_bedrock_client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputText&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# Before: text
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dimensions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# Output dimensions
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normalize&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;# Normalize (set vector length to 1)
&lt;/span&gt;    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Assumed timing logic (needs "import time"); the excerpt does not show it
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;amazon.titan-embed-text-v2:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;elapsed_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;  &lt;span class="c1"&gt;# Duration of the Bedrock call in milliseconds
&lt;/span&gt;
    &lt;span class="n"&gt;response_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# After: [float] × 1024
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;elapsed_ms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setting &lt;code&gt;"normalize": True&lt;/code&gt; in the request body normalizes the output vector to length 1.&lt;br&gt;
For vectors of length 1, cosine similarity equals the dot product, which is cheaper to compute.&lt;/p&gt;
&lt;h3&gt;
  
  
  Dimensions
&lt;/h3&gt;

&lt;p&gt;The embedding implementation code included a parameter called "dimensions."&lt;br&gt;
The number of dimensions is the count of values in a single vector.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;3-dimensional vector:    [0.5, -0.3, 0.8]           ← 3 numbers
1024-dimensional vector: [0.023, -0.014, ..., 0.041] ← 1024 numbers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



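&lt;p&gt;More dimensions allow for finer representation of "meaning," but storage consumption increases accordingly. The sizes below assume 4 bytes per value (a single-precision float, which is what pgvector's &lt;code&gt;vector&lt;/code&gt; type stores) and exclude row and index overhead.&lt;/p&gt;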

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Size per vector&lt;/th&gt;
&lt;th&gt;Size for 100k records&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;1 KB&lt;/td&gt;
&lt;td&gt;~100 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;4 KB&lt;/td&gt;
&lt;td&gt;~400 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;6 KB&lt;/td&gt;
&lt;td&gt;~600 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3072&lt;/td&gt;
&lt;td&gt;12 KB&lt;/td&gt;
&lt;td&gt;~1.2 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The number of dimensions is determined by the embedding model you use. Titan Embeddings V2 lets you choose from 256, 512, or 1024, allowing you to balance accuracy and cost based on your use case.&lt;/p&gt;
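&lt;p&gt;The figures in the table can be reproduced with a little arithmetic. The sketch below assumes 4 bytes per value (single-precision floats) and ignores per-row and index overhead; &lt;code&gt;vector_storage_bytes&lt;/code&gt; is a helper name made up for this example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;BYTES_PER_VALUE = 4  # assumption: single-precision floats


def vector_storage_bytes(dimensions, records=1):
    """Approximate raw vector size, ignoring per-row and index overhead."""
    return dimensions * BYTES_PER_VALUE * records


for dims in (256, 1024, 1536, 3072):
    per_vector_kb = vector_storage_bytes(dims) / 1024
    total_mb = vector_storage_bytes(dims, records=100_000) / 1_000_000
    print(dims, per_vector_kb, round(total_mb, 1))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the size grows linearly with both the dimension count and the record count, halving the dimensions roughly halves the raw storage needed.&lt;/p&gt;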

&lt;h3&gt;
  
  
  Embedding Models
&lt;/h3&gt;

&lt;p&gt;Text is converted to vectors by specialized embedding models, which are distinct from LLMs (generative models).&lt;br&gt;
Embedding models specialize in producing representations that can be compared for semantic similarity.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Features&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Titan Embeddings V2&lt;/td&gt;
&lt;td&gt;Amazon Bedrock&lt;/td&gt;
&lt;td&gt;256/512/1024&lt;/td&gt;
&lt;td&gt;AWS native. Has normalization option. High affinity with AWS environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cohere Embed v3&lt;/td&gt;
&lt;td&gt;Amazon Bedrock&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;Multilingual support. Evaluated as highly accurate for Japanese&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-small&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;256–1536&lt;/td&gt;
&lt;td&gt;Lightweight and low cost. Multilingual support. Best for cost-sensitive use cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;text-embedding-3-large&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;256–3072&lt;/td&gt;
&lt;td&gt;High accuracy and multilingual support. Flexible dimension selection&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;An important note: &lt;strong&gt;you must use the same model for both search and registration&lt;/strong&gt;.&lt;br&gt;
Vectors generated by different models don't exist in the same space, so distance calculations are meaningless.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html" rel="noopener noreferrer"&gt;Amazon Titan Text Embeddings V2 - Bedrock Documentation&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Cosine Similarity and Cosine Distance
&lt;/h3&gt;

&lt;p&gt;Cosine similarity represents "how much two vectors point in the same direction" as a number between -1 and 1.&lt;br&gt;
Closer to 1 means more semantically similar, closer to 0 means unrelated, and closer to -1 means semantically opposite.&lt;/p&gt;

&lt;p&gt;Cosine distance is defined as &lt;code&gt;1 - cosine similarity&lt;/code&gt; and ranges from 0 to 2.&lt;br&gt;
A smaller value means higher similarity, and pgvector's &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; operator returns this cosine distance.&lt;br&gt;
"Distance" and "similarity" are just opposite representations of the same concept.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Range&lt;/th&gt;
&lt;th&gt;"More similar" direction&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cosine Similarity&lt;/td&gt;
&lt;td&gt;-1 to 1&lt;/td&gt;
&lt;td&gt;Larger value (closer to 1)&lt;/td&gt;
&lt;td&gt;Threshold judgment (e.g., "hit if &amp;gt;= 0.95")&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cosine Distance&lt;/td&gt;
&lt;td&gt;0 to 2&lt;/td&gt;
&lt;td&gt;Smaller value (closer to 0)&lt;/td&gt;
&lt;td&gt;ORDER BY in SQL, KNN search&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The search implementation code sorts by cosine distance (&lt;code&gt;embedding &amp;lt;=&amp;gt; %s::vector&lt;/code&gt;), while the semantic cache described later applies a threshold to cosine similarity (&lt;code&gt;similarity &amp;gt;= 0.95&lt;/code&gt;).&lt;/p&gt;
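&lt;p&gt;To make the relationship between the two metrics concrete, here is a small pure-Python sketch. The two-dimensional vectors are toy values chosen for illustration, and the function names are made up for this example; in practice the database computes the distance for you.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance as defined above: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


same = cosine_distance([1.0, 0.0], [2.0, 0.0])        # same direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])  # unrelated directions
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])   # opposite directions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The three cases land at the two ends and the middle of the 0 to 2 range, and the vector lengths (1, 2, and 3 here) have no effect on the result.&lt;/p&gt;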
&lt;h3&gt;
  
  
  top_k
&lt;/h3&gt;

&lt;p&gt;top_k is the number of highest-ranked results to return from a search. Choose a value that suits the use case.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small top_k (1–5)&lt;/strong&gt;: Returns only the most relevant results. Suitable when you want to limit the context passed to an LLM in RAG&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large top_k (10–100)&lt;/strong&gt;: Returns a wide range of candidates. Suitable for recommendations or displaying a list of candidates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In RAG, it is common to pass the full set of top_k results as context to the LLM.&lt;br&gt;
Be aware that making top_k too large will lengthen the context, increasing the LLM's token consumption and latency.&lt;/p&gt;
&lt;h3&gt;
  
  
  Normalization
&lt;/h3&gt;

&lt;p&gt;Normalization is the process of setting the length (norm) of a vector to 1.&lt;br&gt;
With Titan Embeddings V2, setting &lt;code&gt;"normalize": True&lt;/code&gt; in the request body normalizes the output vector automatically.&lt;br&gt;
Cosine similarity between normalized vectors is equal to their dot product.&lt;br&gt;
Since a dot product is cheaper to compute than cosine similarity, this can make search more efficient; in pgvector, that benefit applies when you query with the inner product operator (&lt;code&gt;&amp;lt;#&amp;gt;&lt;/code&gt;) rather than &lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt;.&lt;br&gt;
Also, with all vector lengths equal, distance comparisons reflect only "differences in direction," which stabilizes search result quality.&lt;/p&gt;
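&lt;p&gt;The equivalence between cosine similarity and the dot product is easy to check by hand. The following sketch normalizes two toy vectors in pure Python; &lt;code&gt;normalize&lt;/code&gt; and &lt;code&gt;dot&lt;/code&gt; are helper names made up for this example, not part of any library used in this article.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def normalize(vector):
    """Scale a vector so that its length (L2 norm) becomes 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a = normalize([3.0, 4.0])  # length 5 before normalization
b = normalize([4.0, 3.0])  # length 5 before normalization
similarity = dot(a, b)     # no division by the norms needed; both are 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For the unnormalized vectors, cosine similarity is 24 / (5 × 5) = 0.96, which matches the plain dot product of the normalized ones up to floating-point rounding.&lt;/p&gt;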
&lt;h2&gt;
  
  
  Registering Data in a Vector Database
&lt;/h2&gt;

&lt;p&gt;Of course, data must be registered before you can search a vector database.&lt;br&gt;
Let's now look at data registration in a vector database.&lt;/p&gt;
&lt;h3&gt;
  
  
  Data Registration Flow
&lt;/h3&gt;

&lt;p&gt;Data registration in a vector database follows this flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Drop the HNSW index (to speed up registration)&lt;/li&gt;
&lt;li&gt;Vectorize text data using an embedding model and INSERT it into the database in batches (e.g., 500 records at a time)&lt;/li&gt;
&lt;li&gt;Once all data is registered, bulk-create the HNSW index&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As with the search explanation, I'll use implementation examples with Aurora PostgreSQL + pgvector and Python.&lt;/p&gt;
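&lt;p&gt;Before looking at the SQL, here is a rough sketch of how the three steps fit together. It is an illustration only: &lt;code&gt;drop_index&lt;/code&gt;, &lt;code&gt;insert_batch&lt;/code&gt;, and &lt;code&gt;create_index&lt;/code&gt; are hypothetical callables that would wrap the actual database statements and the embedding call.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def batched(records, batch_size=500):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def register_all(records, drop_index, insert_batch, create_index, batch_size=500):
    """Run the three registration steps in order and return the number of batches."""
    drop_index()                                # Step 1: drop the HNSW index
    count = 0
    for batch in batched(records, batch_size):  # Step 2: vectorize and INSERT per batch
        insert_batch(batch)
        count += 1
    create_index()                              # Step 3: build the index once at the end
    return count
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With 1,200 records and the default batch size, this would call &lt;code&gt;insert_batch&lt;/code&gt; three times (500, 500, and 200 records) between the drop and the rebuild.&lt;/p&gt;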
&lt;h3&gt;
  
  
  Table Definition on Aurora PostgreSQL + pgvector
&lt;/h3&gt;

&lt;p&gt;The following table and index are created on Aurora PostgreSQL with the pgvector extension enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Enable pgvector extension&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- embeddings table (storage for vector data)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- HNSW index (speeds up ANN search)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;idx_embeddings_embedding&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;
    &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ef_construction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;content&lt;/code&gt; column in the &lt;code&gt;embeddings&lt;/code&gt; table stores the text data, and the &lt;code&gt;embedding&lt;/code&gt; column stores the vectorized text.&lt;br&gt;
An HNSW index is then created on the &lt;code&gt;embedding&lt;/code&gt; column.&lt;/p&gt;
&lt;h3&gt;
  
  
  HNSW (Search Algorithm and Index Algorithm)
&lt;/h3&gt;

&lt;p&gt;Vector databases have indexes too, and in Aurora PostgreSQL + pgvector, you create them with the &lt;code&gt;CREATE INDEX&lt;/code&gt; statement just like regular indexes.&lt;br&gt;
Here, the &lt;code&gt;USING hnsw&lt;/code&gt; clause specifies the index algorithm.&lt;br&gt;
The index algorithm is closely tied to the search algorithm, and both are critical in vector databases.&lt;/p&gt;
&lt;h4&gt;
  
  
  Search Algorithms
&lt;/h4&gt;

&lt;p&gt;There are two main types of search methods in vector databases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Search Method&lt;/th&gt;
&lt;th&gt;Full Name&lt;/th&gt;
&lt;th&gt;Features&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KNN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;K-Nearest Neighbor&lt;/td&gt;
&lt;td&gt;Compares the query against all data exhaustively. Results are exact, but computation cost grows linearly with the amount of data, making it slow at scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ANN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Approximate Nearest Neighbor&lt;/td&gt;
&lt;td&gt;Searches approximately. Slightly lower accuracy but can search at high speed even with large volumes of data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In practical systems, ANN is used in most cases.&lt;br&gt;
KNN is fine for small-scale data of a few thousand records, but ANN becomes increasingly important as data grows to tens of thousands of records or more.&lt;/p&gt;
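&lt;p&gt;To see why exact KNN scales linearly, consider this pure-Python sketch of a brute-force search. It scores every stored row against the query, so the work grows with the number of rows. The rows and their two-dimensional vectors are toy values invented for the example, and &lt;code&gt;knn_search&lt;/code&gt; is a made-up helper name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_search(query, rows, top_k=3):
    """Exact search: score every (content, vector) row, then keep the top_k closest."""
    scored = [(cosine_distance(query, vector), content) for content, vector in rows]
    scored.sort(key=lambda pair: pair[0])
    return [content for _, content in scored[:top_k]]


rows = [
    ("temperature in Tokyo", [0.9, 0.1]),
    ("weather conditions in Kanto", [0.8, 0.3]),
    ("recipe for curry", [0.1, 0.9]),
]
nearest = knn_search([1.0, 0.0], rows, top_k=2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An ANN index avoids scoring every row by visiting only a promising subset, trading a small amount of accuracy for speed.&lt;/p&gt;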
&lt;h4&gt;
  
  
  Index Algorithms
&lt;/h4&gt;

&lt;p&gt;The data structures used to implement ANN are called index algorithms, and there are several types:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;Features&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HNSW&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Builds a hierarchical graph structure and progressively narrows the search range from upper to lower layers&lt;/td&gt;
&lt;td&gt;High accuracy and high speed. Higher memory consumption but currently the most widely used&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IVF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clusters data and performs partial search only on clusters close to the query&lt;/td&gt;
&lt;td&gt;Memory-efficient. Suitable for large-scale data but may have lower accuracy than HNSW&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
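&lt;p&gt;The IVF idea in the table can be sketched in a few lines. This is a simplified illustration, not how any particular engine implements it: the centroids are hand-picked here (a real IVF index learns them, typically with k-means), and &lt;code&gt;nprobe&lt;/code&gt; is an assumed name for the number of clusters to inspect.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_ivf(records, centroids):
    """Assign every record to the cluster of its most similar centroid."""
    clusters = {i: [] for i in range(len(centroids))}
    for doc_id, vec in records.items():
        nearest = max(range(len(centroids)), key=lambda i: cosine_similarity(vec, centroids[i]))
        clusters[nearest].append((doc_id, vec))
    return clusters


def ivf_search(query, centroids, clusters, k, nprobe=1):
    """Search only the nprobe clusters whose centroids are closest to the query."""
    ranked = sorted(
        range(len(centroids)),
        key=lambda i: cosine_similarity(query, centroids[i]),
        reverse=True,
    )
    candidates = [item for i in ranked[:nprobe] for item in clusters[i]]
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


centroids = [[1.0, 0.0], [0.0, 1.0]]  # hand-picked for the example
records = {"doc-0": [0.9, 0.1], "doc-1": [0.8, 0.3], "doc-2": [0.1, 0.9]}
clusters = build_ivf(records, centroids)
hits = ivf_search([1.0, 0.05], centroids, clusters, k=2, nprobe=1)
```

&lt;p&gt;With &lt;code&gt;nprobe=1&lt;/code&gt; only one cluster is scanned, so a true neighbor sitting just across a cluster boundary can be missed. That is the accuracy trade-off noted in the table.&lt;/p&gt;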

&lt;p&gt;Currently, the &lt;strong&gt;ANN + HNSW&lt;/strong&gt; combination is the standard for building vector databases.&lt;br&gt;
AWS offers multiple ways to build vector databases, and Aurora PostgreSQL + pgvector, OpenSearch, and MemoryDB all support HNSW.&lt;/p&gt;
&lt;h4&gt;
  
  
  HNSW Index Parameters
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- HNSW index (speeds up ANN search)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;idx_embeddings_embedding&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;
    &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector_cosine_ops&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ef_construction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The WITH clause in the index creation SQL specifies the HNSW index parameters:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Effect when increased&lt;/th&gt;
&lt;th&gt;Typical value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;m&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Connections per node&lt;/td&gt;
&lt;td&gt;Search accuracy ↑ / Memory consumption ↑ / Build time ↑&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ef_construction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Search width during construction&lt;/td&gt;
&lt;td&gt;Search accuracy ↑ / Build time ↑&lt;/td&gt;
&lt;td&gt;64–200&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
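&lt;p&gt;If these parameters are generated from configuration, it can help to validate them before sending the statement. The helper below is a hypothetical sketch: the function name is invented, and the limits it checks (&lt;code&gt;m&lt;/code&gt; from 2 to 100, &lt;code&gt;ef_construction&lt;/code&gt; from 4 to 1000 and at least twice &lt;code&gt;m&lt;/code&gt;) reflect my understanding of pgvector's limits, so please confirm them against the pgvector version you run.&lt;/p&gt;

```python
def build_hnsw_index_sql(table, column, m=16, ef_construction=64):
    """Build a CREATE INDEX statement for a pgvector HNSW index.

    The numeric limits are assumptions based on pgvector's documentation.
    """
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    if m not in range(2, 101):
        raise ValueError("m must be between 2 and 100")
    if ef_construction not in range(4, 1001):
        raise ValueError("ef_construction must be between 4 and 1000")
    if 2 * m > ef_construction:
        raise ValueError("ef_construction must be at least 2 * m")
    return (
        f"CREATE INDEX IF NOT EXISTS idx_{table}_{column} "
        f"ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


statement = build_hnsw_index_sql("embeddings", "embedding")
```

&lt;p&gt;With the defaults, &lt;code&gt;statement&lt;/code&gt; is equivalent to the SQL shown above, written on a single line.&lt;/p&gt;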
&lt;h3&gt;
  
  
  Data Registration Implementation Code
&lt;/h3&gt;

&lt;p&gt;Here is Python code that bulk-loads a large number of records into Aurora PostgreSQL + pgvector:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AuroraIngester&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Batch INSERT data into Aurora pgvector.

    Efficiently inserts vector data using batch INSERT of 500 records at a time.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;psycopg2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extensions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ingest_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_index&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Batch INSERT records in the specified range.

        Args:
            start_index: Start index (inclusive)
            end_index: End index (exclusive)

        Returns:
            Number of records inserted
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;values_parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_index&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;values_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(%s, %s::vector)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doc-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;generate_vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="n"&gt;sql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO embeddings (content, embedding) VALUES &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values_parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;cur&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;end_index&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;start_index&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ingest_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Insert all records into Aurora in batches.

        Args:
            record_count: Total number of records to insert
            batch_size: Number of records per batch (default 500)

        Returns:
            Total number of records inserted
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aurora_pgvector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;total_inserted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record_count&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MAX_RETRIES&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingest_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;total_inserted&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;
                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batch_insert_retry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;MAX_RETRIES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;batch_insert_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                        &lt;span class="k"&gt;break&lt;/span&gt;
                    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RETRY_DELAY_SECONDS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ingest_all_complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_inserted&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;total_inserted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;total_inserted&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_run_database_ingestion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index_manager&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ingester&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;record_count&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute bulk data insertion into the database.

    Args:
        index_manager: Object managing index drop and creation (implementation omitted)
        ingester: Object that inserts data in batches (described above)
        record_count: Total number of records to insert
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# ① Drop index (speeds up registration)
&lt;/span&gt;    &lt;span class="n"&gt;index_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# SQL executed internally:
&lt;/span&gt;    &lt;span class="c1"&gt;# DROP INDEX IF EXISTS embeddings_hnsw_idx;
&lt;/span&gt;    &lt;span class="c1"&gt;# TRUNCATE TABLE embeddings;
&lt;/span&gt;
    &lt;span class="c1"&gt;# ② Batch registration (500 records at a time)
&lt;/span&gt;    &lt;span class="n"&gt;ingester&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingest_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ③ Bulk index creation
&lt;/span&gt;    &lt;span class="n"&gt;index_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# SQL executed internally:
&lt;/span&gt;    &lt;span class="c1"&gt;# CREATE INDEX embeddings_hnsw_idx
&lt;/span&gt;    &lt;span class="c1"&gt;#   ON embeddings USING hnsw (embedding vector_cosine_ops)
&lt;/span&gt;    &lt;span class="c1"&gt;#   WITH (m = 16,              -- Connections per node (more = higher accuracy, more memory)
&lt;/span&gt;    &lt;span class="c1"&gt;#         ef_construction = 64); -- Search width during construction (more = higher accuracy, slower build)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The index is dropped first, the data is registered, and the index is then recreated, because inserting rows while the index exists forces every insert to update the HNSW graph as well, which slows the load and makes its duration hard to predict.&lt;br&gt;
This technique is commonly used in relational databases and applies equally to Aurora PostgreSQL + pgvector.&lt;br&gt;
For more details, see: &lt;a href="https://blog.serverworks.co.jp/database-bulk-insert-index-strategy" rel="noopener noreferrer"&gt;Index Considerations When Bulk-Inserting Large Amounts of Data into a Database (Japanese)&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Semantic Cache
&lt;/h2&gt;

&lt;p&gt;One technique for speeding up search and data retrieval is caching.&lt;br&gt;
For vector databases, there is a technology called semantic cache that differs slightly from conventional caching.&lt;/p&gt;
&lt;h3&gt;
  
  
  What Is Semantic Cache?
&lt;/h3&gt;

&lt;p&gt;Semantic cache is a mechanism that uses the embedding vector of a query as a key to cache past search results or FM (Foundation Model) responses, and quickly returns results from the cache for semantically similar queries.&lt;/p&gt;

&lt;p&gt;Comparing it with conventional caching reveals its unique characteristics:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Conventional Cache&lt;/th&gt;
&lt;th&gt;Semantic Cache&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Key&lt;/td&gt;
&lt;td&gt;Exact string match&lt;/td&gt;
&lt;td&gt;Vector similarity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hit condition&lt;/td&gt;
&lt;td&gt;Only the exact same query&lt;/td&gt;
&lt;td&gt;Semantically similar queries also hit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Example&lt;/td&gt;
&lt;td&gt;Only "weather in Tokyo" hits&lt;/td&gt;
&lt;td&gt;"Tokyo weather forecast" and "What's the weather in Tokyo today?" also hit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With conventional caching, "weather in Tokyo" and "Tokyo weather forecast" are treated as different keys, so the second query misses even though the cached answer would serve it. Semantic cache groups semantically equivalent queries under one entry, which can raise the hit rate considerably.&lt;/p&gt;
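&lt;p&gt;The difference in hit conditions can be shown with a toy example. Everything below is illustrative: the 3-dimensional "embeddings" are hand-made stand-ins for real model output, and &lt;code&gt;semantic_lookup&lt;/code&gt; is a hypothetical helper, not part of any library.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hand-made vectors; a real system would call an embedding model instead.
TOY_EMBEDDINGS = {
    "weather in Tokyo": [0.9, 0.1, 0.0],
    "Tokyo weather forecast": [0.85, 0.2, 0.05],
    "best ramen in Osaka": [0.0, 0.2, 0.95],
}

# Conventional cache: keyed by the exact query string.
exact_cache = {"weather in Tokyo": "Sunny, 25 degrees"}

# Semantic cache: keyed by the query's embedding vector.
semantic_cache = [(TOY_EMBEDDINGS["weather in Tokyo"], "Sunny, 25 degrees")]


def semantic_lookup(query, threshold=0.95):
    """Return the cached answer of the most similar entry, or None on a miss."""
    query_vec = TOY_EMBEDDINGS[query]
    best_vec, best_answer = max(
        semantic_cache, key=lambda entry: cosine_similarity(query_vec, entry[0])
    )
    if cosine_similarity(query_vec, best_vec) >= threshold:
        return best_answer
    return None


exact_result = exact_cache.get("Tokyo weather forecast")      # miss: different string
semantic_result = semantic_lookup("Tokyo weather forecast")   # hit: similar vector
unrelated_result = semantic_lookup("best ramen in Osaka")     # miss: dissimilar vector
```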
&lt;h3&gt;
  
  
  Semantic Cache Processing Flow with Amazon MemoryDB
&lt;/h3&gt;

&lt;p&gt;When implementing semantic cache on AWS, Amazon ElastiCache and Amazon MemoryDB are the typical options.&lt;br&gt;
Here, I'll introduce a semantic cache implementation using Amazon MemoryDB (hereafter, MemoryDB), referencing the following documentation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/memorydb/latest/devguide/vector-search-examples.html" rel="noopener noreferrer"&gt;Amazon MemoryDB - Vector Search Examples&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Setting aside RAG with a vector database for a moment, introducing a semantic cache in front of Foundation Model queries gives the following processing flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The application vectorizes the query using Bedrock Titan Embeddings V2&lt;/li&gt;
&lt;li&gt;It runs a cosine similarity search against MemoryDB (the cache store) with FT.SEARCH KNN&lt;/li&gt;
&lt;li&gt;If a result with similarity at or above the threshold is found (cache hit) → it returns the cached FM response&lt;/li&gt;
&lt;li&gt;If similarity is below the threshold (cache miss) → it calls the FM for inference and saves the result to the cache (HSET + EXPIRE)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxqs3myzn92recbq1r7wj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxqs3myzn92recbq1r7wj.png" alt="Semantic Cache Processing Flow" width="784" height="524"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Index Definition in MemoryDB
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: MemoryDB is a Redis-compatible key-value store and does not have "tables" like RDBs. Data is stored in Hash-type keys, and the search schema is defined as an "index" using the &lt;code&gt;FT.CREATE&lt;/code&gt; command.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this repository, the following &lt;code&gt;FT.CREATE&lt;/code&gt; command creates the index for semantic cache:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;FT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;semantic_cache_idx&lt;/span&gt;
  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;HASH&lt;/span&gt;
  &lt;span class="k"&gt;PREFIX&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="k"&gt;SCHEMA&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;    &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="n"&gt;HNSW&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
                   &lt;span class="k"&gt;TYPE&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;
                   &lt;span class="n"&gt;DIM&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;
                   &lt;span class="n"&gt;DISTANCE_METRIC&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
                   &lt;span class="n"&gt;M&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;
                   &lt;span class="n"&gt;EF_CONSTRUCTION&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;
    &lt;span class="n"&gt;query_text&lt;/span&gt;   &lt;span class="n"&gt;TAG&lt;/span&gt;
    &lt;span class="k"&gt;result&lt;/span&gt;       &lt;span class="nb"&gt;TEXT&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;   &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;
    &lt;span class="n"&gt;ttl&lt;/span&gt;          &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;embedding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VECTOR (HNSW)&lt;/td&gt;
&lt;td&gt;Query embedding vector (1024 dimensions). Target for KNN search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;query_text&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;TAG&lt;/td&gt;
&lt;td&gt;Original query text. For exact match filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;result&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;TEXT&lt;/td&gt;
&lt;td&gt;FM response result (cached answer)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;created_at&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NUMERIC&lt;/td&gt;
&lt;td&gt;Cache entry creation time (UNIX timestamp)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ttl&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NUMERIC&lt;/td&gt;
&lt;td&gt;Cache expiration time (seconds)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PREFIX 1 cache:&lt;/code&gt; means only Hashes whose key name starts with &lt;code&gt;cache:&lt;/code&gt; are indexed&lt;/li&gt;
&lt;li&gt;The HNSW parameter &lt;code&gt;EF_CONSTRUCTION=512&lt;/code&gt; is set higher than the value used for Aurora pgvector above (64). Since MemoryDB operates in-memory, build cost is relatively low, so accuracy is prioritized&lt;/li&gt;
&lt;/ul&gt;
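&lt;p&gt;Because the index declares &lt;code&gt;TYPE FLOAT32&lt;/code&gt; and &lt;code&gt;DIM 1024&lt;/code&gt;, the &lt;code&gt;embedding&lt;/code&gt; field of each Hash has to hold a binary value of 1024 × 4 = 4096 bytes. The sketch below shows one way to produce it with the standard library. It assumes the store expects raw little-endian 32-bit floats, which is how Redis-compatible vector search is commonly fed; &lt;code&gt;to_float32_bytes&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
import sys
from array import array

EMBEDDING_DIM = 1024  # matches DIM 1024 in the index definition


def to_float32_bytes(vector):
    """Pack a list of floats into raw little-endian 32-bit floats."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
    packed = array("f", vector)
    if sys.byteorder == "big":
        packed.byteswap()  # normalise to little-endian on big-endian hosts
    return packed.tobytes()


blob = to_float32_bytes([0.0] * EMBEDDING_DIM)
```

&lt;p&gt;Checking the dimension before writing is worthwhile, since a vector whose length does not match the index definition is unlikely to be indexed as intended.&lt;/p&gt;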

&lt;h3&gt;
  
  
  Semantic Cache Threshold
&lt;/h3&gt;

&lt;p&gt;The threshold for semantic cache is the minimum cosine similarity at which a cached entry counts as a hit.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Threshold&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;th&gt;Recommended Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0.95–1.0&lt;/td&gt;
&lt;td&gt;Only nearly identical queries hit&lt;/td&gt;
&lt;td&gt;Accuracy-focused. When you want to minimize the risk of returning incorrect cached responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.80–0.90&lt;/td&gt;
&lt;td&gt;Synonymous phrasing variations also hit&lt;/td&gt;
&lt;td&gt;Practical balance. Recommended for most use cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;0.70–0.80&lt;/td&gt;
&lt;td&gt;Related queries also broadly hit&lt;/td&gt;
&lt;td&gt;Hit rate-focused. However, the risk of returning unrelated results increases&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The appropriate threshold depends on business requirements, so I think it's safe to start with a high threshold around 0.95 and gradually lower it while monitoring cache hit rates.&lt;/p&gt;
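&lt;p&gt;One practical detail when applying the threshold: vector search with a cosine metric typically reports a distance (1 minus the similarity) rather than the similarity itself, so the score needs converting before it is compared. The sketch below assumes that convention; please check what your search command actually returns. &lt;code&gt;is_cache_hit&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def is_cache_hit(cosine_distance, threshold):
    """Convert a cosine distance to a similarity and compare it to the threshold."""
    similarity = 1.0 - cosine_distance
    return similarity >= threshold


near_duplicate = is_cache_hit(cosine_distance=0.03, threshold=0.95)  # similarity 0.97
loosely_related = is_cache_hit(cosine_distance=0.10, threshold=0.95)  # similarity 0.90
```

&lt;p&gt;Comparing a raw distance against a similarity threshold by mistake would invert the logic, so it is worth keeping the conversion in one place.&lt;/p&gt;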

&lt;h3&gt;
  
  
  HSET + EXPIRE
&lt;/h3&gt;

&lt;p&gt;These are not keywords specific to vector databases or semantic cache; they are commands of Redis, the engine underlying MemoryDB.&lt;/p&gt;

&lt;h4&gt;
  
  
  HSET
&lt;/h4&gt;

&lt;p&gt;A command that saves field-value pairs together in a Hash-type key.&lt;br&gt;
Multiple fields like &lt;code&gt;embedding&lt;/code&gt;, &lt;code&gt;query_text&lt;/code&gt;, &lt;code&gt;result&lt;/code&gt;, and &lt;code&gt;created_at&lt;/code&gt; can be stored as a single entry.&lt;/p&gt;

&lt;p&gt;In Redis / MemoryDB, it's conventional to use colon-separated naming like &lt;code&gt;cache:abc123&lt;/code&gt; for key names.&lt;br&gt;
This simply means "entry abc123 in the cache category"; the colon itself has no special function.&lt;br&gt;
The &lt;code&gt;PREFIX 1 cache:&lt;/code&gt; in the index definition is a setting to make only keys starting with this prefix subject to search.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://redis.io/docs/latest/commands/hset/" rel="noopener noreferrer"&gt;Redis HSET Command&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  EXPIRE
&lt;/h4&gt;

&lt;p&gt;A command that sets an expiration time (TTL) on a key. After the specified number of seconds, the key is automatically deleted. This prevents stale cache entries from accumulating.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://redis.io/docs/latest/commands/expire/" rel="noopener noreferrer"&gt;Redis EXPIRE Command&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation Code
&lt;/h3&gt;

&lt;p&gt;The implementation code got a bit long, but what it does is the same as typical cache-based data retrieval: use the cache if available, otherwise search and save the result to cache.&lt;br&gt;
I'll introduce the implementation code in three stages.&lt;/p&gt;

&lt;h4&gt;
  
  
  Query Vectorization and Cache Lookup
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# "What is AWS S3?"
&lt;/span&gt;
    &lt;span class="c1"&gt;# ① Vectorize the query (Bedrock Titan V2)
&lt;/span&gt;    &lt;span class="n"&gt;embedding_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_embedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;query_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;

    &lt;span class="c1"&gt;# ② Cache lookup via MemoryDB → FM call
&lt;/span&gt;    &lt;span class="n"&gt;cache_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;query_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# Environment variable SIMILARITY_THRESHOLD
&lt;/span&gt;        &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Environment variable CACHE_TTL
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ③ Return response (with metrics)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;statusCode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Cache Lookup Processing
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# ① Query MemoryDB cache (FT.SEARCH KNN)
&lt;/span&gt;    &lt;span class="n"&gt;search_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_similar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;search_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;search_results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# ② Cache hit → Return result from cache (no FM call)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;CacheResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                               &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# ③ Cache miss → Query FM directly and get result
&lt;/span&gt;    &lt;span class="n"&gt;fm_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_invoke_fm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ④ Save result to cache (HSET + EXPIRE)
&lt;/span&gt;    &lt;span class="nf"&gt;_store_cache_entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                       &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fm_result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ttl_seconds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;CacheResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fm_result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
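&lt;p&gt;Step ④ above mentions HSET + EXPIRE, but &lt;code&gt;_store_cache_entry&lt;/code&gt; itself is not shown. The following is a minimal sketch of what such a helper might look like, not the actual implementation. The &lt;code&gt;cache:&lt;/code&gt; key prefix, the SHA-256 key scheme, and the function name &lt;code&gt;store_cache_entry&lt;/code&gt; are assumptions for illustration; the field names follow the ones used in the search query.&lt;/p&gt;

```python
import hashlib
import time
from array import array


def store_cache_entry(redis_client, query_text, query_embedding, fm_result, ttl_seconds):
    """Save one cache entry as a hash, then give it a time-to-live."""
    # Hypothetical key scheme: a fixed prefix plus a hash of the query text.
    key = "cache:" + hashlib.sha256(query_text.encode("utf-8")).hexdigest()

    # Pack the embedding as 32-bit floats in the host's byte order
    # (assumed little-endian here, matching the packing used for search).
    embedding_bytes = array("f", query_embedding).tobytes()

    # HSET writes all fields of the entry in one call.
    redis_client.hset(
        key,
        mapping={
            "query_text": query_text,
            "result": fm_result,
            "created_at": str(int(time.time())),
            "embedding": embedding_bytes,
        },
    )
    # EXPIRE removes the entry automatically after ttl_seconds.
    redis_client.expire(key, ttl_seconds)
    return key
```

&lt;p&gt;Any client object that offers &lt;code&gt;hset&lt;/code&gt; and &lt;code&gt;expire&lt;/code&gt; (such as a redis-py client) can be passed in. The search index would also need to cover whichever key prefix is chosen.&lt;/p&gt;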



&lt;h4&gt;
  
  
  MemoryDB Cache Query
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
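&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;redis.commands.search.query&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Query&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_similar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;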
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute KNN vector search with FT.SEARCH.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;query_vec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;struct&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*=&amp;gt;[KNN &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; @embedding $query_vec AS score]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;return_fields&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;paging&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dialect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 3-second timeout
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ft&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic_cache_idx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_vec&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query_vec&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

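    &lt;span class="c1"&gt;# Convert cosine distance to similarity (similarity = 1 - distance)
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="c1"&gt;# Collect the returned hash fields for the caller
&lt;/span&gt;            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;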
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MemoryDB's FT.SEARCH command is compatible with Redis's RediSearch module and natively supports KNN vector search.&lt;br&gt;
&lt;code&gt;score&lt;/code&gt; is returned as cosine distance (&lt;code&gt;1 - cosine similarity&lt;/code&gt;, theoretically in the range 0 to 2), so &lt;code&gt;1.0 - score&lt;/code&gt; converts it back to cosine similarity.&lt;br&gt;
Titan V2's &lt;code&gt;normalize=True&lt;/code&gt; returns unit-length vectors, but normalization by itself does not narrow this range, because cosine similarity does not depend on vector length. In practice, embeddings of natural-language queries rarely point in opposite directions, so scores usually fall between 0 and 1 and the converted similarity also stays between 0 and 1.&lt;/p&gt;
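&lt;p&gt;To make the relationship between cosine distance and cosine similarity concrete, here is a small pure-Python sketch. The helper names and the two-dimensional toy vectors are made up for illustration; real Titan V2 embeddings have 1024 dimensions.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """The distance form used as the search score (range 0 to 2)."""
    return 1.0 - cosine_similarity(a, b)


def distance_to_similarity(score):
    """Invert the distance back to a similarity, as the search code does."""
    return 1.0 - score


same = cosine_distance([1.0, 0.0], [1.0, 0.0])         # 0.0: identical direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])   # 1.0: unrelated direction
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])    # 2.0: opposite direction

# Vector length does not affect the result, only direction does.
scaled = cosine_similarity([2.0, 0.0], [5.0, 0.0])     # 1.0
```

&lt;p&gt;With a threshold of 0.95, a cached entry counts as a hit when its distance score is 0.05 or less.&lt;/p&gt;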

&lt;h3&gt;
  
  
  Semantic Cache Performance
&lt;/h3&gt;

&lt;p&gt;Here are the measured results under the following conditions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;FM (Foundation Model)&lt;/td&gt;
&lt;td&gt;Claude 3 Haiku (&lt;code&gt;anthropic.claude-3-haiku-20240307-v1:0&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding Model&lt;/td&gt;
&lt;td&gt;Titan Embeddings V2 (1024 dimensions)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache Store&lt;/td&gt;
&lt;td&gt;Amazon MemoryDB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Similarity Threshold&lt;/td&gt;
&lt;td&gt;0.95&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test Query&lt;/td&gt;
&lt;td&gt;"What is AWS S3?" (same query run twice)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

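&lt;p&gt;The threshold is set high at 0.95, so only near-identical queries count as hits. Because the second run repeats the first query verbatim, a hit is expected.&lt;br&gt;
Please treat these measurements as reference values that show semantic cache is effective, not as a rigorous benchmark.&lt;/p&gt;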

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Cache Miss (1st run)&lt;/th&gt;
&lt;th&gt;Cache Hit (2nd run)&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total Response Time&lt;/td&gt;
&lt;td&gt;4,573ms&lt;/td&gt;
&lt;td&gt;279ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;94%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding Generation&lt;/td&gt;
&lt;td&gt;194ms&lt;/td&gt;
&lt;td&gt;192ms&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache Lookup&lt;/td&gt;
&lt;td&gt;4ms&lt;/td&gt;
&lt;td&gt;3ms&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FM Call&lt;/td&gt;
&lt;td&gt;4,375ms&lt;/td&gt;
&lt;td&gt;0ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On a cache hit, the FM call is skipped entirely, cutting total response time from 4,573ms to 279ms (a 94% reduction).&lt;br&gt;
The hit path needs only embedding generation (~190ms) and cache lookup (~3ms); the remaining ~84ms of the 279ms total is overhead that the table does not break down. The response therefore returns in under 0.3 seconds instead of about 4.6 seconds.&lt;br&gt;
Skipping the FM call also directly reduces API usage costs.&lt;/p&gt;
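&lt;p&gt;The percentages in the table can be reproduced from the measured values. The short calculation below uses only the numbers reported above; the function and variable names are illustrative.&lt;/p&gt;

```python
def reduction_percent(before_ms, after_ms):
    """Share of the original time that was saved, in percent."""
    return (before_ms - after_ms) / before_ms * 100


# Measured values from the table (milliseconds).
total_reduction = reduction_percent(4573, 279)   # about 93.9, shown as 94%
fm_reduction = reduction_percent(4375, 0)        # 100.0

# On a hit, embedding generation plus cache lookup do not add up to the total.
untracked_hit_ms = 279 - (192 + 3)               # 84

print(round(total_reduction), round(fm_reduction), untracked_hit_ms)
```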

&lt;h3&gt;
  
  
  RAG + Semantic Cache Processing Flow
&lt;/h3&gt;

&lt;p&gt;Semantic cache can be integrated into a RAG system.&lt;br&gt;
In that case, the processing flow would look like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Vectorize the query&lt;/li&gt;
&lt;li&gt;Search for similar queries in MemoryDB (cache)&lt;/li&gt;
&lt;li&gt;Cache hit → Immediately return the cached response&lt;/li&gt;
&lt;li&gt;Cache miss → Search Aurora pgvector (vector DB) for RAG context&lt;/li&gt;
&lt;li&gt;Call the FM with the retrieved context for inference&lt;/li&gt;
&lt;li&gt;Save the FM response to cache and return it to the user&lt;/li&gt;
&lt;/ol&gt;
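&lt;p&gt;The six steps above can be sketched as a single function. This is a simplified illustration under stated assumptions, not the article's implementation: &lt;code&gt;InMemorySemanticCache&lt;/code&gt; is a hypothetical stand-in for MemoryDB (no TTL, linear scan), and &lt;code&gt;embed&lt;/code&gt;, &lt;code&gt;retrieve_context&lt;/code&gt;, and &lt;code&gt;invoke_fm&lt;/code&gt; are placeholder callables for Bedrock Titan V2, Aurora pgvector, and the FM.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class InMemorySemanticCache:
    """Hypothetical stand-in for the MemoryDB cache."""
    threshold: float = 0.95
    entries: List[Tuple[List[float], str]] = field(default_factory=list)

    def lookup(self, embedding) -> Optional[str]:
        scored = [(dot(vec, embedding), answer) for vec, answer in self.entries]
        if not scored:
            return None
        best_sim, best_answer = max(scored, key=lambda pair: pair[0])
        return best_answer if best_sim >= self.threshold else None

    def store(self, embedding, answer):
        self.entries.append((embedding, answer))


def answer_query(query, embed, cache, retrieve_context, invoke_fm):
    embedding = embed(query)                  # 1. Vectorize the query
    cached = cache.lookup(embedding)          # 2. Search the cache
    if cached is not None:
        return cached, "cache"                # 3. Cache hit: return at once
    context = retrieve_context(embedding)     # 4. Cache miss: fetch RAG context
    answer = invoke_fm(query, context)        # 5. Call the FM with the context
    cache.store(embedding, answer)            # 6. Save to cache and return
    return answer, "fm"


# Toy stand-ins so the flow can be run without any AWS service.
fm_calls = []


def fake_embed(query):
    return [1.0, 0.0]


def fake_retrieve(embedding):
    return ["S3 is an object storage service."]


def fake_fm(query, context):
    fm_calls.append(query)
    return "Answer based on: " + context[0]


cache = InMemorySemanticCache()
first = answer_query("What is AWS S3?", fake_embed, cache, fake_retrieve, fake_fm)
second = answer_query("What is AWS S3?", fake_embed, cache, fake_retrieve, fake_fm)
```

&lt;p&gt;In this toy run, the first call goes through retrieval and the FM, while the second call is answered from the cache, so the FM stand-in is invoked only once.&lt;/p&gt;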

&lt;h4&gt;
  
  
  On Cache Hit
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5mnx1sd7ta66yaj07v3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5mnx1sd7ta66yaj07v3.png" alt="RAG + Cache Hit Flow" width="784" height="241"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  On Cache Miss
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ccx6v6vbh0j1ejx44hv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ccx6v6vbh0j1ejx44hv.png" alt="RAG + Cache Miss Flow" width="784" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this article, I covered everything from the basic concepts of vector databases to implementation on AWS and optimization with semantic cache.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector database basics&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;A vector database is a database that "searches by meaning." It handles spelling variations and synonymous expressions that traditional keyword search cannot catch&lt;/li&gt;
&lt;li&gt;Both data and search queries are vectorized using the same embedding model, and "semantic closeness" is calculated using cosine similarity&lt;/li&gt;
&lt;li&gt;ANN (approximate nearest neighbor) search with an HNSW index is the de facto standard for vector databases&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Building vector databases on AWS&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;AWS offers multiple options: Aurora PostgreSQL + pgvector, OpenSearch, S3 Vectors, MemoryDB, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aurora PostgreSQL + pgvector&lt;/strong&gt;, which can be operated with SQL and leverages existing skills, is recommended as the first step&lt;/li&gt;
&lt;li&gt;For bulk data ingestion, the "drop index → insert data → bulk create index" pattern is the go-to approach&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Semantic cache&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Caches FM responses keyed by query embedding, so semantically similar queries are answered from cache without calling the FM again&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;That's all for now.&lt;br&gt;
Thank you for reading this lengthy article!&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/bedrock/knowledge-bases/" rel="noopener noreferrer"&gt;Amazon Bedrock Knowledge Bases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.VectorDB.html" rel="noopener noreferrer"&gt;Using the pgvector Extension with Amazon Aurora PostgreSQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.serverworks.co.jp/database-bulk-insert-index-strategy" rel="noopener noreferrer"&gt;Index Considerations When Bulk-Inserting Large Amounts of Data into a Database (Japanese)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.serverworks.co.jp/aws-vector-database-benchmark-100k" rel="noopener noreferrer"&gt;Measuring Data Ingestion and Search Processing Time for 100k Records Across 3 AWS Vector Databases (Japanese)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/memorydb/latest/devguide/vector-search-examples.html" rel="noopener noreferrer"&gt;Amazon MemoryDB - Vector Search Examples (Durable Semantic Cache)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html" rel="noopener noreferrer"&gt;Amazon Titan Text Embeddings V2 - Bedrock Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/guides/embeddings" rel="noopener noreferrer"&gt;OpenAI Embeddings Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://redis.io/docs/latest/commands/" rel="noopener noreferrer"&gt;Redis Commands&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://amzn.asia/d/0gG4ViAh" rel="noopener noreferrer"&gt;Practical Introduction to Vector Search (Book)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>database</category>
      <category>vectordatabase</category>
    </item>
  </channel>
</rss>
