<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anderson Londoño</title>
    <description>The latest articles on DEV Community by Anderson Londoño (@londoso).</description>
    <link>https://dev.to/londoso</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1005623%2Fb85ff040-3c01-497b-ad07-f16908691f33.jpeg</url>
      <title>DEV Community: Anderson Londoño</title>
      <link>https://dev.to/londoso</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/londoso"/>
    <language>en</language>
    <item>
      <title>Serverless Data Streaming with Aurora DSQL</title>
      <dc:creator>Anderson Londoño</dc:creator>
      <pubDate>Mon, 24 Mar 2025 22:24:37 +0000</pubDate>
      <link>https://dev.to/londoso/streaming-de-datos-serverless-con-aurora-dsql-2hmd</link>
      <guid>https://dev.to/londoso/streaming-de-datos-serverless-con-aurora-dsql-2hmd</guid>
      <description>&lt;p&gt;Managing and processing data in real time is a growing need today. In this article, we will explore how to build a serverless solution that processes CSV files and automatically loads them into Aurora DSQL using AWS services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;br&gt;
Data streaming allows information to be processed in real time, as soon as it is generated. Here we will implement a system that detects when a CSV file is uploaded to an S3 bucket and automatically processes and loads the data into an Aurora DSQL database.&lt;/p&gt;

&lt;p&gt;This solution combines several AWS services:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Amazon S3&lt;/em&gt;&lt;/strong&gt; for file storage.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;AWS Lambda&lt;/em&gt;&lt;/strong&gt; for serverless processing.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Amazon EventBridge&lt;/em&gt;&lt;/strong&gt; for events and communication between services.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Amazon Aurora DSQL&lt;/em&gt;&lt;/strong&gt; for data storage.&lt;br&gt;
&lt;strong&gt;&lt;em&gt;AWS SAM&lt;/em&gt;&lt;/strong&gt; for deploying the infrastructure as code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution architecture&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1onpwbfdvqye3kknrzdk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1onpwbfdvqye3kknrzdk.png" alt="Architecture diagram" width="692" height="361"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An S3 bucket where the CSV files are uploaded.&lt;/li&gt;
&lt;li&gt;EventBridge, which detects the upload of new files.&lt;/li&gt;
&lt;li&gt;A Lambda function that processes the files and validates the data.&lt;/li&gt;
&lt;li&gt;Aurora DSQL, where the processed data is stored.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implementation with AWS SAM&lt;/strong&gt;&lt;br&gt;
AWS SAM (Serverless Application Model) lets us define all of our infrastructure as code. Let's look at how our template is structured:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;AWSTemplateFormatVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2010-09-09'&lt;/span&gt;
&lt;span class="na"&gt;Transform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless-2016-10-31&lt;/span&gt;
&lt;span class="na"&gt;Description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="s"&gt;streaming-dsql&lt;/span&gt;

  &lt;span class="s"&gt;Sistema serverless para streaming de datos a Aurora DSQL&lt;/span&gt;

&lt;span class="na"&gt;Globals&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Function&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;MemorySize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;128&lt;/span&gt;
    &lt;span class="na"&gt;Runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python3.12&lt;/span&gt;
    &lt;span class="na"&gt;Layers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s"&gt;arn:aws:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV3-python312-x86_64:7&lt;/span&gt;

&lt;span class="na"&gt;Parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ClusterId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Aurora DSQL Cluster Id&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;String&lt;/span&gt;
  &lt;span class="na"&gt;BucketName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;String&lt;/span&gt;
    &lt;span class="na"&gt;Description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Nombre del bucket de S3&lt;/span&gt;

&lt;span class="na"&gt;Resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;StreamingFunction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Serverless::Function&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;CodeUri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;lambdas/streaming_dsql/&lt;/span&gt;
      &lt;span class="na"&gt;Handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app.lambda_handler&lt;/span&gt;
      &lt;span class="na"&gt;Architectures&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;x86_64&lt;/span&gt;
      &lt;span class="na"&gt;Policies&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2012-10-17'&lt;/span&gt;
          &lt;span class="na"&gt;Statement&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Sid&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DsqlDataAccess&lt;/span&gt;
              &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
              &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;dsql:DbConnectAdmin&lt;/span&gt;
              &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s"&gt;arn:aws:dsql:${AWS::Region}:${AWS::AccountId}:cluster/${ClusterId}&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Sid&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;S3GetObject&lt;/span&gt;
              &lt;span class="na"&gt;Effect&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Allow&lt;/span&gt;
              &lt;span class="na"&gt;Action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;s3:GetObject&lt;/span&gt;
              &lt;span class="na"&gt;Resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s"&gt;arn:aws:s3:::${BucketName}/*&lt;/span&gt;
      &lt;span class="na"&gt;Events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;S3EventBridgeRule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;EventBridgeRule&lt;/span&gt;
          &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;Pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;aws.s3&lt;/span&gt;
              &lt;span class="na"&gt;detail-type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Object&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Created"&lt;/span&gt;
              &lt;span class="na"&gt;detail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;bucket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;BucketName&lt;/span&gt;
      &lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;POWERTOOLS_SERVICE_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;StreamingDsql&lt;/span&gt;
          &lt;span class="na"&gt;POWERTOOLS_METRICS_NAMESPACE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Powertools&lt;/span&gt;
          &lt;span class="na"&gt;LOG_LEVEL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;INFO&lt;/span&gt;
          &lt;span class="na"&gt;REGION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;AWS::Region&lt;/span&gt;
          &lt;span class="na"&gt;DSQL_CLUSTER_ENDPOINT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Sub&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${ClusterId}.dsql.${AWS::Region}.on.aws"&lt;/span&gt;
          &lt;span class="na"&gt;DATA_BUCKET&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;BucketName&lt;/span&gt;

  &lt;span class="na"&gt;Bucket&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::S3::Bucket&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;BucketName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;BucketName&lt;/span&gt;
      &lt;span class="na"&gt;NotificationConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;EventBridgeConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;EventBridgeEnabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This template defines the Lambda function with permissions to access S3 and Aurora DSQL, as well as the S3 bucket with EventBridge notifications enabled.&lt;/p&gt;
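
&lt;p&gt;To make the wiring concrete, here is a sketch (not taken from the article's repo) of the EventBridge payload that Lambda receives when S3 emits an &lt;em&gt;Object Created&lt;/em&gt; event, and how the bucket and key can be extracted. The bucket name and object key are placeholders:&lt;/p&gt;

```python
# Approximate shape of the EventBridge event for an S3 "Object Created" event.
# The bucket name and object key below are illustrative placeholders.
sample_event = {
    "version": "0",
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "my-data-bucket"},
        "object": {"key": "incoming/sales.csv", "size": 1024},
    },
}

def extract_s3_object(event: dict) -> tuple:
    """Return (bucket, key) from an S3 'Object Created' EventBridge event."""
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]

bucket, key = extract_s3_object(sample_event)
print(bucket, key)  # my-data-bucket incoming/sales.csv
```

&lt;p&gt;The &lt;code&gt;detail.bucket.name&lt;/code&gt; and &lt;code&gt;detail.object.key&lt;/code&gt; fields are what the function needs in order to fetch the file from S3.&lt;/p&gt;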

&lt;p&gt;&lt;strong&gt;Data processing with AWS Lambda and Powertools&lt;/strong&gt;&lt;br&gt;
For the Lambda function we use Powertools for AWS Lambda, a library that makes it easy to apply good practices such as data validation, metrics, and tracing.&lt;/p&gt;
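
&lt;p&gt;As a rough sketch (not the repo's actual code), the core of the handler fetches the CSV text from S3 and turns each record into a dictionary. The S3 read is stubbed out here as a plain string, and the column names are hypothetical:&lt;/p&gt;

```python
import csv
import io

def parse_csv(body: str) -> list:
    """Parse CSV text (header row + records) into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(body))
    return list(reader)

# In the real handler, `body` would come from
# s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8").
rows = parse_csv("id,name,amount\n1,widget,9.99\n2,gadget,4.50\n")
print(rows[0])  # {'id': '1', 'name': 'widget', 'amount': '9.99'}
```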

&lt;p&gt;‼️ The full repo is available on &lt;a href="https://github.com/londoso/streaming-dsql" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data validation with JSON Schema&lt;/strong&gt;&lt;br&gt;
A crucial part of the process is data validation. For this, we use JSON Schema through the validation utilities of Powertools for AWS Lambda.&lt;/p&gt;

&lt;p&gt;ℹ️ &lt;a href="https://docs.powertools.aws.dev/lambda/python/latest/utilities/validation/" rel="noopener noreferrer"&gt;Powertools for AWS Lambda: Validation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This validation guarantees that the data meets our requirements before it is inserted into the database.&lt;/p&gt;
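
&lt;p&gt;To illustrate the idea, here is a hypothetical row schema in JSON Schema form (the real schema lives in the repo). With Powertools installed, the check is a single call to &lt;code&gt;validate(event=row, schema=ROW_SCHEMA)&lt;/code&gt; from &lt;code&gt;aws_lambda_powertools.utilities.validation&lt;/code&gt;; the minimal stdlib stand-in below mimics only the required-field and type checks:&lt;/p&gt;

```python
# Hypothetical schema; the field names are placeholders, not the repo's actual columns.
ROW_SCHEMA = {
    "type": "object",
    "required": ["id", "name", "amount"],
    "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"},
        "amount": {"type": "string"},  # CSV values arrive as strings
    },
}

def check_row(row: dict, schema: dict = ROW_SCHEMA) -> None:
    """Minimal stand-in for a JSON Schema validator: required keys + string types."""
    for key in schema["required"]:
        if key not in row:
            raise ValueError("missing required field: " + key)
    for key, spec in schema["properties"].items():
        if key in row and spec["type"] == "string" and not isinstance(row[key], str):
            raise ValueError("field must be a string: " + key)

check_row({"id": "1", "name": "widget", "amount": "9.99"})  # passes silently
```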

&lt;p&gt;&lt;strong&gt;Deploying the solution&lt;/strong&gt;&lt;br&gt;
To deploy our solution, we use the AWS SAM commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Empaquetar la aplicación&lt;/span&gt;
sam build

&lt;span class="c"&gt;# Desplegar la aplicación&lt;/span&gt;
sam deploy &lt;span class="nt"&gt;--guided&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ℹ️ To install the AWS SAM CLI, follow the steps in the &lt;a href="https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html" rel="noopener noreferrer"&gt;official installation guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;During the guided deployment, we are prompted for the values of our parameters, such as the Aurora DSQL cluster ID and the S3 bucket name.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantages of this architecture&lt;/strong&gt;&lt;br&gt;
This serverless solution offers several advantages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automatic scaling:&lt;/strong&gt; AWS Lambda scales automatically with the workload.&lt;br&gt;
&lt;strong&gt;No servers to manage:&lt;/strong&gt; There is no infrastructure to administer.&lt;br&gt;
&lt;strong&gt;Real-time processing:&lt;/strong&gt; Data is processed as soon as it is uploaded.&lt;br&gt;
&lt;strong&gt;High availability:&lt;/strong&gt; The underlying AWS services are highly available.&lt;br&gt;
&lt;strong&gt;Cost efficiency:&lt;/strong&gt; You only pay for what you use.&lt;br&gt;
&lt;strong&gt;Robust validation:&lt;/strong&gt; Data validation guarantees the quality of the information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security considerations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Lambda function uses restricted IAM permissions.&lt;/li&gt;
&lt;li&gt;Connections to Aurora DSQL use temporary authentication tokens.&lt;/li&gt;
&lt;li&gt;Communication with Aurora DSQL is encrypted with SSL.&lt;/li&gt;
&lt;li&gt;Data is validated before being inserted into the database.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
Combining AWS serverless services lets us build robust solutions for data streaming. With AWS SAM, we can define all of our infrastructure as code, which simplifies deployment and maintenance.&lt;/p&gt;

&lt;p&gt;Aurora DSQL gives us a PostgreSQL-compatible database with the advantage of being serverless, allowing us to scale to our needs without worrying about the underlying infrastructure.&lt;/p&gt;

&lt;p&gt;With this architecture, we can process large volumes of data in real time, validate it, and store it efficiently, all with minimal administrative effort.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>architecture</category>
      <category>serverless</category>
      <category>spanish</category>
    </item>
    <item>
      <title>Vector Database solutions on AWS</title>
      <dc:creator>Anderson Londoño</dc:creator>
      <pubDate>Thu, 28 Mar 2024 02:14:49 +0000</pubDate>
      <link>https://dev.to/aws-builders/vector-database-solutions-on-aws-46f7</link>
      <guid>https://dev.to/aws-builders/vector-database-solutions-on-aws-46f7</guid>
      <description>&lt;p&gt;When talking about vector databases, the market offers both specialized and multi-model options. Most major database providers, such as &lt;a href="https://www.oracle.com/news/announcement/ocw-integrated-vector-database-augments-generative-ai-2023-09-19/"&gt;Oracle&lt;/a&gt;, &lt;a href="https://github.com/pgvector/pgvector/"&gt;PostgreSQL&lt;/a&gt;, or &lt;a href="https://www.mongodb.com/products/platform/atlas-vector-search"&gt;MongoDB&lt;/a&gt;, to mention a few, have integrated specific support for retrieving vector data.&lt;/p&gt;

&lt;p&gt;The key concept is &lt;a href="https://arxiv.org/abs/2005.11401"&gt;Retrieval Augmented Generation&lt;/a&gt; (RAG). Combined with Large Language Models (LLMs), it lets us use models with data that is always changing. Deploying a model trained on static data, for example a store's sales history, is one use case; but when the data changes constantly, you can provide an external knowledge base to improve the LLM's responses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://db-engines.com/en/ranking/vector+dbms/all"&gt;DB-Engines&lt;/a&gt; offers a complete index for databases &lt;a href="https://db-engines.com/en/ranking_definition"&gt;based on&lt;/a&gt; their current popularity, mentions in social networks, frequency of search in Google Trends, frequency of discussion in technical foros and number of job offers, in which is mentioned. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why is a vector database needed?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;LLMs consume data in a vector representation. The user's prompt is embedded and sent to the vector database, which searches for the documents with the highest similarity and returns the ranked results to the LLM, personalizing the response to the prompt.&lt;/p&gt;
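
&lt;p&gt;That similarity search boils down to comparing embedding vectors; a common metric is cosine similarity, sketched here in plain Python. Real vector databases index these comparisons (for example with HNSW) instead of scanning every document:&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
query = [1.0, 0.0, 1.0]
docs = {"doc_a": [1.0, 0.1, 0.9], "doc_b": [0.0, 1.0, 0.0]}
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # doc_a
```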

&lt;p&gt;&lt;strong&gt;Which services can be used on AWS?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MongoDB from the &lt;a href="https://aws.amazon.com/marketplace/pp/prodview-pp445qepfdy34?trk=40d3b1cd-4a6c-4cfe-a334-cd0f4ab9808e&amp;amp;sc_channel=el"&gt;marketplace&lt;/a&gt; or directly from the &lt;a href="https://www.mongodb.com/docs/atlas/reference/amazon-aws/"&gt;Atlas Portal&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;RDS PostgreSQL with the extension &lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/10/amazon-rds-postgresql-pgvector-hnsw-indexing/"&gt;pgvector&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Amazon OpenSearch Service with the new &lt;a href="https://aws.amazon.com/opensearch-service/serverless-vector-engine/"&gt;OpenSearch Serverless vector engine&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I am going to focus on pgvector and OpenSearch Serverless; these services are infrastructure-light, so we don't need to worry much about administrative tasks.&lt;/p&gt;

&lt;p&gt;Check this blog post &lt;a href="https://aws.amazon.com/blogs/database/building-ai-powered-search-in-postgresql-using-amazon-sagemaker-and-pgvector/"&gt;Building AI-powered search in PostgreSQL using Amazon SageMaker and pgvector&lt;/a&gt;, if you want to use the extension inside RDS for PostgreSQL.&lt;/p&gt;

&lt;p&gt;And if you are looking to go serverless, check this blog post: &lt;a href="https://aws.amazon.com/blogs/aws/vector-engine-for-amazon-opensearch-serverless-is-now-generally-available/"&gt;Vector engine for Amazon OpenSearch Serverless is now available&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>vectordatabase</category>
      <category>ia</category>
      <category>database</category>
    </item>
  </channel>
</rss>
