<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Zied Ben Tahar</title>
    <description>The latest articles on DEV Community by Zied Ben Tahar (@zied).</description>
    <link>https://dev.to/zied</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F856077%2F99a2a3de-32df-4deb-b605-bbd90d38139d.jpeg</url>
      <title>DEV Community: Zied Ben Tahar</title>
      <link>https://dev.to/zied</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zied"/>
    <language>en</language>
    <item>
      <title>Serverless RAG Chat with AppSync Events and Bedrock Knowledge Bases</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Fri, 09 May 2025 14:44:23 +0000</pubDate>
      <link>https://dev.to/aws-builders/serverless-rag-chat-with-appsync-events-and-bedrock-knowledge-bases-4kjl</link>
      <guid>https://dev.to/aws-builders/serverless-rag-chat-with-appsync-events-and-bedrock-knowledge-bases-4kjl</guid>
      <description>&lt;p&gt;When it comes to building serverless WebSocket APIs on AWS, there’s no shortage of options: API Gateway, IoT Core, AppSync GraphQL subscriptions, and now AppSync Events. Each option comes with its own level of control and complexity. I’ve found that AppSync Events to be simplest to work with.&lt;/p&gt;

&lt;p&gt;One of the interesting features of AppSync Events is its data sources capability. It lets you directly integrate with resources like DynamoDB, OpenSearch, Bedrock, and Lambda. You can interact with these data sources using AppSyncJS (AppSync’s own flavor of JavaScript). But to be totally fair, I lean toward direct Lambda integration, as it gives more control and makes the development and testing workflow more familiar, standard, and manageable. &lt;/p&gt;

&lt;p&gt;Currently, the Bedrock data source supports only the InvokeModel and Converse APIs. So if you want to integrate with Knowledge Bases, the viable approach is to create a custom data source using Lambda.&lt;/p&gt;

&lt;p&gt;And that’s exactly what this blog post is about: we’ll walk through how to build a RAG-based chat application with AppSync Events and Bedrock Knowledge Bases using Node.js, TypeScript, and Terraform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution overview
&lt;/h2&gt;

&lt;p&gt;Let’s take a look at how the whole setup fits together:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fah61erz1kn4hl14uwtd0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fah61erz1kn4hl14uwtd0.png" alt="Architecture overview" width="800" height="383"&gt;&lt;/a&gt;&lt;em&gt;Architecture overview&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Knowledge Base is configured to use PostgreSQL as its vector store where we store the embeddings as well as the associated metadata of the documents we want to index. Using Postgres gives us control over the schema, the indexing strategy, and embedding format, all of which come in handy when fine-tuning a vector-based RAG setup.&lt;/p&gt;

&lt;p&gt;We’ve got the &lt;code&gt;handleAppSyncEvents&lt;/code&gt; function directly integrated as a data source for the AppSync Events API. Its role is to process incoming events from AppSync and to invoke &lt;code&gt;retrieveAndGenerate&lt;/code&gt; on the Knowledge Base. This function is configured to be asynchronous (with the &lt;code&gt;EVENT&lt;/code&gt; invocation type), which means AppSync doesn't wait for the function to complete before returning a response to the client. Once we receive a result from Bedrock, this function publishes a response back to the client’s response channel.&lt;/p&gt;
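&lt;p&gt;As a rough sketch of what this function deals with, here’s how the incoming payload might be parsed. The payload shape, channel convention, and type names below are illustrative assumptions, not code from the actual repository:&lt;/p&gt;

```typescript
// Illustrative payload shape for an AppSync Events Lambda data source.
// The exact shape and the channel naming convention are assumptions.
type AppSyncEventsPayload = {
  channel: { path: string };
  events: string[]; // each entry is a JSON-serialized client event
};

type ChatEvent = { conversationId: string; prompt: string };

// Hypothetical convention: clients publish on /chat/inbox/{id},
// and responses are published back on /chat/responses/{id}.
export const toResponseChannel = (inboxPath: string): string => {
  const conversationId = inboxPath.split("/").pop();
  return "/chat/responses/" + conversationId;
};

export const parseChatEvents = (payload: AppSyncEventsPayload): ChatEvent[] =>
  payload.events.map((e) => JSON.parse(e) as ChatEvent);
```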

&lt;p&gt;AppSync Events supports multiple authorization methods to secure Event APIs, including API keys, Lambda authorizers, IAM, OpenID Connect, and Amazon Cognito user pools. In this setup, I’m using both Cognito user pools and IAM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web clients use Cognito for authentication&lt;/li&gt;
&lt;li&gt;And I chose IAM over an API key for publishing events from the &lt;code&gt;handleAppSyncEvents&lt;/code&gt; function to AppSync, as it offers a better security posture.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One thing I appreciate in this setup: AppSync Events supports Web ACLs. That means you can easily layer in protections like rate limiting and IP filtering. It’s a nice edge over API Gateway WebSockets, which still doesn’t offer native WAF support.&lt;/p&gt;

&lt;p&gt;And tying it all together, the browser connects via WebSocket to AppSync, giving us a real-time, bidirectional channel, ideal for sending the model’s responses back to users in conversational interfaces.&lt;/p&gt;

&lt;p&gt;Let’s dive into the details of the solution; but if you’d like to jump straight to the complete implementation, you can find it here 👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ziedbentahar/rag-chat-with-appsync-events-and-bedrock-knowledge-bases" rel="noopener noreferrer"&gt;https://github.com/ziedbentahar/rag-chat-with-appsync-events-and-bedrock-knowledge-bases&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up the knowledge base
&lt;/h2&gt;

&lt;p&gt;Let’s first take a look at how we can use Aurora PostgreSQL as a vector store.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating the vector store on Postgres
&lt;/h3&gt;

&lt;p&gt;When using PostgreSQL as a vector store, the Knowledge Base requires an Aurora Serverless cluster with the Data API enabled. The database must include a vector table with specific columns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An embedding column to store the vector representation of the content,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A chunk column for the actual text tied to each embedding,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;And a metadata column that holds references, which are useful for pointing back to the original source during retrieval.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Knowledge Base keeps this table up to date automatically whenever content is synced from the source bucket.&lt;/p&gt;

&lt;p&gt;Since I like to keep everything automated, I trigger the DB init script right after the database cluster is created. This script sets up everything we need: a role, schema, table, and indexes, all in one go, wrapped in a single transaction. It’s run by a function once the cluster is deployed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bedrockKnowledgeBaseCreds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bedrock_user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;generatePostgresPassword&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;

        &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;knowledge_base&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;vectorTable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bedrock_kb&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s2"&gt;`CREATE EXTENSION IF NOT EXISTS vector`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;`CREATE SCHEMA IF NOT EXISTS &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;`CREATE ROLE &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;bedrockKnowledgeBaseCreds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; WITH PASSWORD '&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;bedrockKnowledgeBaseCreds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;password&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;' LOGIN`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;`GRANT ALL ON SCHEMA &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; to &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;bedrockKnowledgeBaseCreds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;`CREATE TABLE IF NOT EXISTS &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;vectorTable&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; (id uuid PRIMARY KEY, embedding vector(1024), chunks text, metadata json, custom_metadata jsonb)`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;`CREATE INDEX IF NOT EXISTS bedrock_kb_embedding_idx ON &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;vectorTable&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; USING hnsw (embedding vector_cosine_ops) WITH (ef_construction=256)`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;`CREATE INDEX IF NOT EXISTS bedrock_kb_chunks_fts_idx ON &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;vectorTable&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; USING gin (to_tsvector('simple', chunks))`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;`CREATE INDEX IF NOT EXISTS bedrock_kb_custom_metadata_idx ON &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;vectorTable&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; USING gin (custom_metadata)`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s2"&gt;`GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; TO &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;bedrockKnowledgeBaseCreds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;username&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;

        &lt;span class="p"&gt;];&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;executeTransaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;databaseArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;databaseSecretArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;databaseName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;bedrockKnowledgeBaseCreds&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;executeTransaction&lt;/code&gt; leverages Aurora’s Data API to execute these statements. In this function, I also set the secret containing the dedicated database user and password, which will later be used when setting up the data source for the Knowledge Base.&lt;/p&gt;
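&lt;p&gt;For reference, here’s a minimal sketch of the begin/execute/commit flow &lt;code&gt;executeTransaction&lt;/code&gt; follows over the Data API, with the client injected as a parameter so the flow stays testable. The method names on &lt;code&gt;client&lt;/code&gt; are assumptions for illustration; in practice it would wrap &lt;code&gt;@aws-sdk/client-rds-data&lt;/code&gt;:&lt;/p&gt;

```typescript
// Sketch of a transactional statement runner over the RDS Data API.
// `client` stands in for a thin wrapper around @aws-sdk/client-rds-data;
// its method names here are illustrative assumptions.
export const executeTransaction = async (client: any, queries: string[]) => {
  // all statements run inside one transaction, as in the init script
  const txId = await client.beginTransaction();
  try {
    for (const sql of queries) {
      await client.executeStatement(sql, txId);
    }
    await client.commitTransaction(txId);
  } catch (err) {
    // roll everything back if any statement fails, then rethrow
    await client.rollbackTransaction(txId);
    throw err;
  }
};
```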

&lt;h3&gt;
  
  
  Creating the Knowledge Base
&lt;/h3&gt;

&lt;p&gt;Quite straightforward in Terraform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;    &lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_bedrockagent_knowledge_base"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

      &lt;span class="nx"&gt;name&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;application&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-kb"&lt;/span&gt;
      &lt;span class="nx"&gt;role_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kb_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;

      &lt;span class="nx"&gt;knowledge_base_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;vector_knowledge_base_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;embedding_model_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embedding_model_arn&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"VECTOR"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="nx"&gt;storage_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"RDS"&lt;/span&gt;
        &lt;span class="nx"&gt;rds_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;credentials_secret_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_secretsmanager_secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kb_creds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
          &lt;span class="nx"&gt;database_name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_rds_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rds_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;database_name&lt;/span&gt;
          &lt;span class="nx"&gt;resource_arn&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_rds_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rds_cluster&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
          &lt;span class="nx"&gt;table_name&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;db_schema&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vector_table&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
          &lt;span class="nx"&gt;field_mapping&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;primary_key_field&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"id"&lt;/span&gt;
            &lt;span class="nx"&gt;metadata_field&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"metadata"&lt;/span&gt;
            &lt;span class="nx"&gt;text_field&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"chunks"&lt;/span&gt;
            &lt;span class="nx"&gt;vector_field&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"embedding"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nx"&gt;aws_rds_cluster_instance&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rds_cluster_instance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;aws_lambda_invocation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;seed_db_function&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;aws_secretsmanager_secret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kb_creds&lt;/span&gt;
      &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_bedrockagent_data_source"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;knowledge_base_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_bedrockagent_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
      &lt;span class="nx"&gt;name&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"kb_datasource"&lt;/span&gt;

      &lt;span class="nx"&gt;vector_ingestion_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;chunking_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;chunking_strategy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"FIXED_SIZE"&lt;/span&gt;
          &lt;span class="nx"&gt;fixed_size_chunking_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;max_tokens&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
            &lt;span class="nx"&gt;overlap_percentage&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="nx"&gt;data_source_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

        &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"S3"&lt;/span&gt;
        &lt;span class="nx"&gt;s3_configuration&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

          &lt;span class="nx"&gt;bucket_arn&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_s3_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kb_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
          &lt;span class="nx"&gt;inclusion_prefixes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kb_folder&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When configuring RDS as a data source, there are a few key parameters you’ll need to provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The vector table and the field mappings that define which columns in the table should be used by the Knowledge Base.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The IAM role associated with the Knowledge Base, which must have the following permissions on the RDS cluster: &lt;code&gt;rds-data:ExecuteStatement&lt;/code&gt;, &lt;code&gt;rds-data:BatchExecuteStatement&lt;/code&gt;, and &lt;code&gt;rds:DescribeDBClusters&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The ARN of the secret holding the database credentials that we created in the previous step.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the knowledge base is deployed, here’s what it looks like in the console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjl8zzdbbaxpai8anz3tv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjl8zzdbbaxpai8anz3tv.png" alt="Knowledge base overview" width="800" height="356"&gt;&lt;/a&gt;&lt;em&gt;Knowledge base overview&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;We can already start testing it right from the console. In this example, I synced the knowledge base with a dataset containing texts about the Roman Empire.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5fi3mvnlgcxnw8yacfe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv5fi3mvnlgcxnw8yacfe.png" alt="Testing the knowledge base" width="800" height="400"&gt;&lt;/a&gt;&lt;em&gt;Testing the knowledge base&lt;/em&gt;&lt;/p&gt;
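&lt;p&gt;The same kind of test can also be run programmatically. As a sketch, here’s roughly the request shape a &lt;code&gt;RetrieveAndGenerate&lt;/code&gt; call takes; the knowledge base id and model ARN below are placeholders, and in practice this object would be passed to the Bedrock Agent Runtime client:&lt;/p&gt;

```typescript
// Builds a RetrieveAndGenerate request body. The field names follow the
// Bedrock Agent Runtime API; the ids used in examples are placeholders.
export const buildRetrieveAndGenerateInput = (
  prompt: string,
  knowledgeBaseId: string,
  modelArn: string
) => ({
  input: { text: prompt },
  retrieveAndGenerateConfiguration: {
    type: "KNOWLEDGE_BASE",
    knowledgeBaseConfiguration: { knowledgeBaseId, modelArn },
  },
});
```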

&lt;p&gt;Alright, let’s see how to set up the AppSync Events integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting up AppSync integration
&lt;/h2&gt;

&lt;p&gt;As mentioned earlier, I’ll be using Cognito User Pools as the default auth mode, along with IAM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;AppSync will handle validating both subscribe and publish requests from clients, as long as they provide a valid Cognito token.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;IAM auth will be used by the &lt;strong&gt;handleAppSyncEvents&lt;/strong&gt; function, as it needs to publish responses back to the clients&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;    &lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"awscc_appsync_api"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;application&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-events-api"&lt;/span&gt;
      &lt;span class="nx"&gt;owner_contact&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;application&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
      &lt;span class="nx"&gt;event_config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;auth_providers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;auth_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AMAZON_COGNITO_USER_POOLS"&lt;/span&gt;
            &lt;span class="nx"&gt;cognito_config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nx"&gt;aws_region&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_region&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="nx"&gt;user_pool_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_pool_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;auth_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AWS_IAM"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;connection_auth_modes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;auth_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AMAZON_COGNITO_USER_POOLS"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;default_publish_auth_modes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;auth_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AMAZON_COGNITO_USER_POOLS"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;default_subscribe_auth_modes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;auth_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AMAZON_COGNITO_USER_POOLS"&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
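&lt;p&gt;With IAM enabled as an auth provider, the &lt;code&gt;handleAppSyncEvents&lt;/code&gt; function can publish over HTTP. As a sketch, here’s how such a publish request could be assembled before being SigV4-signed with the AWS SDK’s signer; the endpoint host is a placeholder, and note that AppSync Events expects each event serialized as its own JSON string:&lt;/p&gt;

```typescript
// Assembles the HTTP publish request for an AppSync Events API.
// The host is a placeholder; the request still needs SigV4 signing
// before being POSTed to the /event endpoint.
export const buildPublishRequest = (
  httpDomain: string,
  channel: string,
  events: object[]
) => ({
  url: "https://" + httpDomain + "/event",
  method: "POST",
  headers: { "content-type": "application/json" },
  // each event must be an individually JSON-serialized string
  body: JSON.stringify({ channel, events: events.map((e) => JSON.stringify(e)) }),
});
```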



&lt;p&gt;Next up, I’ll need to create a data source and associate it with the &lt;code&gt;handleAppSyncEvents&lt;/code&gt; function. Unfortunately, this part isn’t supported in the Terraform provider &lt;em&gt;yet&lt;/em&gt;, so for now, I’m using the SDK to create these resources in a function that runs once right after the AppSync Events API resource is created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nl"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="nl"&gt;apiId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nl"&gt;dataSourceName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nl"&gt;lambdaFunctionArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nl"&gt;serviceRoleArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nl"&gt;channelName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiId&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
            &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dataSourceName&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
            &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambdaFunctionArn&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
            &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serviceRoleArn&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
            &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channelName&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SourceArn, TargetArn, RoleArn and channel name are required&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;create&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AppSyncClient&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AWS_REGION&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;createDataSourceCommand&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CreateDataSourceCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;apiId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dataSourceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AWS_LAMBDA&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;serviceRoleArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;serviceRoleArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;lambdaConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="na"&gt;lambdaFunctionArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambdaFunctionArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;

            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;createDataSourceCommand&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;createChannelCommand&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CreateChannelNamespaceCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;apiId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channelName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;subscribeAuthModes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="na"&gt;authType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AMAZON_COGNITO_USER_POOLS&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="na"&gt;publishAuthModes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="na"&gt;authType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AMAZON_COGNITO_USER_POOLS&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="na"&gt;authType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;AWS_IAM&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="na"&gt;handlerConfigs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="na"&gt;onPublish&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="na"&gt;behavior&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;DIRECT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="na"&gt;integration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="na"&gt;dataSourceName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dataSourceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="na"&gt;lambdaConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="na"&gt;invokeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;EVENT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="na"&gt;onSubscribe&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="na"&gt;behavior&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;DIRECT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="na"&gt;integration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                            &lt;span class="na"&gt;dataSourceName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dataSourceName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="na"&gt;lambdaConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                                &lt;span class="na"&gt;invokeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;EVENT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="p"&gt;},&lt;/span&gt;
                        &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;

            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;createChannelCommand&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// ... handle resource update and deletion&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the direct integration of the default channel with the function. The &lt;code&gt;handleAppSyncEvents&lt;/code&gt; function is invoked directly in &lt;code&gt;EVENT&lt;/code&gt; invocation mode.&lt;/p&gt;

&lt;p&gt;Here’s how it looks in the console once the AppSync resources are created:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm3kqekygkc8oi459hdo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm3kqekygkc8oi459hdo.png" alt="Chat channel namespace" width="800" height="150"&gt;&lt;/a&gt;&lt;em&gt;Chat channel namespace&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And here is the Lambda function defined as a data source:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbw8vnmdg8sza53op6cm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbw8vnmdg8sza53op6cm0.png" alt="Lambda data source" width="800" height="135"&gt;&lt;/a&gt;&lt;em&gt;Lambda data source&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Lambda Powertools to handle AppSync realtime events
&lt;/h3&gt;

&lt;p&gt;Now let’s get to the interesting part: handling events from AppSync and putting the knowledge base to work. Lambda Powertools offers a handy utility that makes it easier to integrate Lambda functions with AppSync Events. It lets you define clear, dedicated handler methods for publish and subscribe interactions, which means fewer messy if-else blocks. Routing is handled automatically based on namespaces and channel patterns, keeping the code clean and easy to maintain.&lt;/p&gt;

&lt;p&gt;Here’s how it works in practice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setting things up and handling subscriptions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We start by initializing the resolver using the Lambda Powertools Event Handler utility, then define a handler for new subscriptions with &lt;code&gt;app.onSubscribe&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;AppSyncEventsResolver&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;UnauthorizedException&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-lambda-powertools/event-handler/appsync-events&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AppSyncEventsResolver&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onSubscribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/chat/responses/*&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;identity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identity&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identity&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sub&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;identity&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UnauthorizedException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You cannot subscribe to this channel&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, I ensure that users can only subscribe to their own chat responses channel. This check prevents accidental or unauthorized access to other users’ channels. Since we’re using a Cognito User Pool, the subscription payload contains decoded user information, including the &lt;code&gt;sub&lt;/code&gt; (user ID) and &lt;code&gt;username&lt;/code&gt;.&lt;/p&gt;
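&lt;p&gt;The ownership check can also be factored into a small pure helper, which makes it easy to unit test in isolation. This is just a sketch with a hypothetical name (&lt;code&gt;isOwnChannel&lt;/code&gt; is not part of the actual handler code); a channel such as &lt;code&gt;/chat/responses/{sub}&lt;/code&gt; is considered owned by the user whose Cognito &lt;code&gt;sub&lt;/code&gt; is the last path segment:&lt;/p&gt;

```typescript
// Sketch only: a pure helper capturing the ownership check performed in the
// onSubscribe/onPublish handlers. The name isOwnChannel is hypothetical.
export const isOwnChannel = (channelPath: string, sub: string | undefined): boolean => {
  if (!sub) {
    // No authenticated user: never authorized.
    return false;
  }
  // Split the channel path into its segments, dropping the empty leading segment.
  const segments = channelPath.split("/").filter(Boolean);
  if (segments.length !== 3) {
    // Expect exactly three segments, e.g. ["chat", "responses", sub].
    return false;
  }
  // The last segment must match the caller's own user id.
  return segments[2] === sub;
};
```

&lt;p&gt;The same helper could back both the subscribe and the publish handlers, since the two checks are identical apart from the channel namespace.&lt;/p&gt;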

&lt;p&gt;&lt;strong&gt;Validating and handling messages&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s now see how we handle messages from users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messageSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;chat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onPublish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/chat/request/*&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;identity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identity&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;identity&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sub&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;identity&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UnauthorizedException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You cannot publish to this channel&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;messageSchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;safeParse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;I don't understand what you mean, your message format seems invalid&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Invalid message payload&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;bedrockClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RetrieveAndGenerateCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="na"&gt;retrieveAndGenerateConfiguration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;KNOWLEDGE_BASE&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="na"&gt;knowledgeBaseConfiguration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="na"&gt;knowledgeBaseId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;KB_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="na"&gt;modelArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;KB_MODEL_ARN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;signedRequest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;signRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;EVENTS_API_DNS&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/event`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`/chat/responses/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="na"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;}),&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;EVENTS_API_DNS&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/event`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;signedRequest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;signedRequest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;signedRequest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;processed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As with the subscribe handler, &lt;code&gt;app.onPublish&lt;/code&gt; is invoked when a message is published to the &lt;code&gt;/chat/request/{userId}&lt;/code&gt; channel. I extract the user’s identity from the event and enforce that they can only publish to their own channel. We then validate the payload with zod to ensure it has the expected structure, and if validation succeeds we call Bedrock’s &lt;code&gt;RetrieveAndGenerate&lt;/code&gt; endpoint. &lt;/p&gt;
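&lt;p&gt;As a rough illustration of that validation step, here is a dependency-free type guard equivalent to what the zod schema enforces. The field names are assumptions for illustration only; the actual handler uses zod:&lt;/p&gt;

```typescript
// Hypothetical shape of a chat request payload (the real handler defines
// this with a zod schema; field names here are illustrative).
type ChatRequest = { content: string; sessionId?: string };

// Returns the parsed request, or null when the payload is malformed.
export function parseChatRequest(payload: unknown): ChatRequest | null {
  if (typeof payload !== "object" || payload === null) return null;
  const p = payload as Record<string, unknown>;
  if (typeof p.content !== "string" || p.content.trim() === "") return null;
  if (p.sessionId !== undefined && typeof p.sessionId !== "string") return null;
  return { content: p.content, sessionId: p.sessionId as string | undefined };
}
```

&lt;p&gt;Rejecting malformed payloads before calling Bedrock avoids paying for a model invocation on garbage input.&lt;/p&gt;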

&lt;p&gt;Since I am using IAM auth to let the function publish the response back to the client via the &lt;code&gt;/chat/responses/{userId}&lt;/code&gt; channel, I need to sign the request with SigV4 and pass the signed request headers when calling the AppSync Events endpoint. This function needs the &lt;code&gt;appsync:EventPublish&lt;/code&gt; permission on the API’s channels.&lt;/p&gt;
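&lt;p&gt;To make the signing step concrete, here is a minimal, stdlib-only sketch of what SigV4 signing of the publish request involves. This is illustrative only (the host, region and header set are assumptions); in the real handler the AWS SDK’s signer produces these headers for you:&lt;/p&gt;

```typescript
import { createHash, createHmac } from "node:crypto";

const hmac = (key: Buffer | string, data: string) =>
  createHmac("sha256", key).update(data).digest();
const sha256Hex = (data: string) =>
  createHash("sha256").update(data).digest("hex");

// Signs a POST to the Events endpoint. Inputs are illustrative; the AWS SDK's
// SignatureV4 class does the equivalent work in the actual Lambda handler.
export function signEventPublish(opts: {
  accessKeyId: string;
  secretAccessKey: string;
  region: string;
  host: string;
  body: string;
  now?: Date;
}) {
  const now = opts.now ?? new Date();
  const amzDate = now.toISOString().replace(/[:-]|\.\d{3}/g, ""); // 20250509T144423Z
  const dateStamp = amzDate.slice(0, 8);
  const service = "appsync";
  const payloadHash = sha256Hex(opts.body);

  // Canonical request: method, path, query, sorted headers, signed header list, payload hash.
  const canonicalHeaders = `content-type:application/json\nhost:${opts.host}\nx-amz-date:${amzDate}\n`;
  const signedHeaders = "content-type;host;x-amz-date";
  const canonicalRequest = ["POST", "/event", "", canonicalHeaders, signedHeaders, payloadHash].join("\n");

  const scope = `${dateStamp}/${opts.region}/${service}/aws4_request`;
  const stringToSign = ["AWS4-HMAC-SHA256", amzDate, scope, sha256Hex(canonicalRequest)].join("\n");

  // Derive the signing key with an HMAC chain over date, region and service.
  const kDate = hmac("AWS4" + opts.secretAccessKey, dateStamp);
  const kRegion = hmac(kDate, opts.region);
  const kService = hmac(kRegion, service);
  const kSigning = hmac(kService, "aws4_request");
  const signature = createHmac("sha256", kSigning).update(stringToSign).digest("hex");

  return {
    method: "POST",
    headers: {
      "content-type": "application/json",
      host: opts.host,
      "x-amz-date": amzDate,
      authorization:
        `AWS4-HMAC-SHA256 Credential=${opts.accessKeyId}/${scope}, ` +
        `SignedHeaders=${signedHeaders}, Signature=${signature}`,
    },
    body: opts.body,
  };
}
```

&lt;p&gt;The returned &lt;code&gt;method&lt;/code&gt;, &lt;code&gt;headers&lt;/code&gt; and &lt;code&gt;body&lt;/code&gt; are then passed straight to &lt;code&gt;fetch&lt;/code&gt;, as in the handler above.&lt;/p&gt;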

&lt;p&gt;&lt;strong&gt;☝️ Some notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In this solution we wait for the full response from the model; we’re not using Bedrock’s response streaming feature. I could have used the &lt;code&gt;RetrieveAndGenerateStream&lt;/code&gt; API and published each chunk to the client incrementally. However, depending on the length of the model’s response, this could lead to many calls to the Events API and potentially increase costs (since each call counts as a separate operation). One possible solution would be to buffer the response chunks and send them in batches, striking a better balance between responsiveness and cost. Native handling of Lambda response streaming is a feature I’d love to see AppSync support in the future.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;code&gt;RetrieveAndGenerate&lt;/code&gt; endpoint returns a sessionId that we’ll need to reuse across messages within the same conversation. This sessionId is what allows Amazon Bedrock to maintain context. We simply return it along with the result and include it in every subsequent chat message to keep the conversation context-aware.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can check out the Cognito user pool terraform resource definition at the following &lt;a href="https://github.com/ziedbentahar/rag-chat-with-appsync-events-and-bedrock-knowledge-bases/blob/main/infra/modules/auth/user-pool.tf" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
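&lt;p&gt;To sketch the buffering idea from the first note above: a small, hypothetical batcher that accumulates streamed chunks and publishes them once a size threshold is reached, trading a little latency for fewer Events API operations:&lt;/p&gt;

```typescript
// Callback that publishes one batch to the Events API (e.g. the signed
// fetch call from the handler). Name and shape are illustrative.
type Publish = (batch: string) => Promise<void>;

export class ChunkBatcher {
  private buffer = "";

  constructor(private publish: Publish, private maxChars = 500) {}

  // Accumulate a streamed chunk; publish when the threshold is reached.
  async add(chunk: string) {
    this.buffer += chunk;
    if (this.buffer.length >= this.maxChars) await this.flush();
  }

  // Publish whatever is left (call once the model stream ends).
  async flush() {
    if (!this.buffer) return;
    const batch = this.buffer;
    this.buffer = "";
    await this.publish(batch);
  }
}
```

&lt;p&gt;A time-based flush (e.g. every few hundred milliseconds) could be layered on top for long pauses in the model output.&lt;/p&gt;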

&lt;p&gt;You can find the complete function code at &lt;a href="https://github.com/ziedbentahar/rag-chat-with-appsync-events-and-bedrock-knowledge-bases/blob/main/src/chat/lambda-handlers/handle-appsync-events.ts" rel="noopener noreferrer"&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a Client
&lt;/h2&gt;

&lt;p&gt;Here, I’m building a small React client to chat with the Knowledge Base via AppSync. I’m using Amplify because it makes it easy to connect to the AppSync API and handle authentication through the user pool to retrieve an access token.&lt;/p&gt;

&lt;p&gt;First, we need to configure Amplify by providing the AppSync Events API endpoint along with the Cognito User Pool and Identity Pool IDs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;    &lt;span class="nx"&gt;Amplify&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;API&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;Events&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;endpoint&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://&amp;lt;api-id&amp;gt;.appsync-api.eu-west-1.amazonaws.com/event&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;region&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;eu-west-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;defaultAuthMode&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;userPool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;Auth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;Cognito&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;userPoolId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;user-pool-id&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;userPoolClientId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;user-pool-client-id&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;identityPoolId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;identity-pool-id&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To handle real-time chat with AppSync Events, the client interacts with two channels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;events.connect('/chat/responses/${userId}')&lt;/code&gt; to &lt;strong&gt;subscribe&lt;/strong&gt; to responses&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;events.post('/chat/request/${userId}')&lt;/code&gt; to &lt;strong&gt;send&lt;/strong&gt; user messages&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;      &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;connectToChannel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/chat/responses/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subscription&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
              &lt;span class="na"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;data&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;botMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                  &lt;span class="na"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

                &lt;span class="p"&gt;};&lt;/span&gt;
                &lt;span class="nf"&gt;setMessages&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;botMessage&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
                &lt;span class="nf"&gt;setSessionId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
              &lt;span class="p"&gt;},&lt;/span&gt;
              &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;err&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;subscription error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;

            &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nx"&gt;subscription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;unsubscribe&lt;/span&gt;&lt;span class="p"&gt;?.();&lt;/span&gt;
            &lt;span class="p"&gt;};&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;connection error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;

        &lt;span class="nf"&gt;connectToChannel&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sendMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;}:{&lt;/span&gt;&lt;span class="nl"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nf"&gt;setInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;newMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
          &lt;span class="na"&gt;sender&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;

        &lt;span class="nf"&gt;setMessages&lt;/span&gt;&lt;span class="p"&gt;([...&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newMessage&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;events&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/chat/request/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;newMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the React component mounts, we use &lt;code&gt;useEffect&lt;/code&gt; to set up a connection to the &lt;code&gt;/chat/responses/{userId}&lt;/code&gt; channel, subscribing to responses coming from AppSync Events. When a response arrives, we add it to the chat and update the &lt;code&gt;sessionId&lt;/code&gt; to maintain context.&lt;/p&gt;

&lt;p&gt;To send a message, the &lt;code&gt;sendMessage&lt;/code&gt; function posts to the &lt;code&gt;/chat/request/{userId}&lt;/code&gt; channel.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;userId&lt;/code&gt; in this example is provided by Amplify’s Authenticator component, which handles user authentication and exposes the signed-in user's details.&lt;/p&gt;

&lt;p&gt;This gives us the following chat interface:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqldrtt0yxjnyz5ae67e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqldrtt0yxjnyz5ae67e.png" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you look at the browser’s WebSocket connection, you’ll see real-time responses coming through as AppSync Events pushes data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0q451mj2or4zt0hsswne.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0q451mj2or4zt0hsswne.png" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;In this post, we walked through building a serverless WebSocket API using AppSync Events, Lambda, and Bedrock Knowledge Bases. We explored how to handle real-time communication securely with Cognito and IAM. Lambda Powertools makes working with AppSync Events an absolute breeze.&lt;/p&gt;

&lt;p&gt;As usual you can find the complete repo with the solution ready to be adapted and deployed here 👉  &lt;a href="https://github.com/ziedbentahar/rag-chat-with-appsync-events-and-bedrock-knowledge-bases" rel="noopener noreferrer"&gt;https://github.com/ziedbentahar/rag-chat-with-appsync-events-and-bedrock-knowledge-bases&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I will make sure to update the repo once AppSync gets better terraform support 👌&lt;/p&gt;

&lt;p&gt;Thanks for making it all the way here!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.powertools.aws.dev/lambda/typescript/latest/features/event-handler/appsync-events/" rel="noopener noreferrer"&gt;https://docs.powertools.aws.dev/lambda/typescript/latest/features/event-handler/appsync-events/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.VectorDB.html#AuroraPostgreSQL.VectorDB.PreparingKB" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.VectorDB.html#AuroraPostgreSQL.VectorDB.PreparingKB&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/appsync/latest/eventapi/supported-datasources.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/appsync/latest/eventapi/supported-datasources.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.ranthebuilder.cloud/post/aws-appsync-events-and-powertools-for-aws-lambda" rel="noopener noreferrer"&gt;https://www.ranthebuilder.cloud/post/aws-appsync-events-and-powertools-for-aws-lambda&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>appsync</category>
      <category>bedrock</category>
      <category>rag</category>
      <category>powertools</category>
    </item>
    <item>
      <title>Scheduled queries in Amazon Timestream for LiveAnalytics</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Sun, 16 Mar 2025 17:28:01 +0000</pubDate>
      <link>https://dev.to/aws-builders/using-scheduled-queries-with-amazon-timestream-for-liveanalytics-fa</link>
      <guid>https://dev.to/aws-builders/using-scheduled-queries-with-amazon-timestream-for-liveanalytics-fa</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhgn8ztqbxs8pnu0fwdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhgn8ztqbxs8pnu0fwdw.png" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Timestream for LiveAnalytics is a serverless, purpose-built database for processing time-series data. While it efficiently ingests and stores data, querying large raw datasets frequently can be inefficient, especially for use cases like aggregations, trend analysis, or anomaly detection.&lt;/p&gt;

&lt;p&gt;Scheduled Queries is a Timestream feature that helps address this by running SQL queries at specified intervals, rolling up raw data into aggregated results, and storing them in a destination Timestream table. This approach improves query performance on the target tables and optimises storage by retaining only the aggregated data.&lt;/p&gt;

&lt;p&gt;In this post, we’ll walk through setting up Timestream Scheduled Queries to automate data rollups. We’ll also explore how this setup helps you analyze and detect trends in your data over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are we going to build ?
&lt;/h2&gt;

&lt;p&gt;As a use case, we’ll have a clickstream Kinesis data stream to which producers push events such as clicks, views, and user actions. These events will be ingested into a Timestream table and made available for downstream consumption. A scheduled query will run hourly to roll up the raw event data, which will then be used to detect trending products.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kusuac8oaukhmlnlw79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3kusuac8oaukhmlnlw79.png" alt="Solution overview" width="800" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s dive into this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Time Series store&lt;/strong&gt;: With EventBridge Pipes supporting Timestream as a direct target, integrating with a Kinesis stream source is simple. This eliminates much of the custom glue code needed for ingestion. In this setup, EventBridge Pipes polls the Kinesis stream, converts the received events into records, and writes them directly to the Timestream table.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A scheduled aggregation query&lt;/strong&gt; runs at predefined intervals to process raw events, performing an hourly rollup and storing the results in a dedicated table. Once it successfully completes, it publishes a notification event that triggers a function to query the table and detect the top N trending products, which are then published to a topic. We can imagine that these events can be used for various purposes: adjusting ad spend or fine-tuning A/B tests. Additionally, the rolled-up data can be fed into dashboards like QuickSight, Tableau, or Grafana for visualizing and monitoring product performance.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;☝️Note:&lt;/strong&gt; While I won't go into detail in this post, it's worth noting that a common way to reduce writes to the raw table is by pre-aggregating data in small batches, such as using a 5-minute tumbling window. This groups events into fixed, non-overlapping intervals before writing, effectively lowering the write frequency. &lt;a href="https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/blob/main/python/GettingStarted/README.md" rel="noopener noreferrer"&gt;Managed Apache Flink&lt;/a&gt; can help achieve this.&lt;/p&gt;
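&lt;p&gt;To illustrate the idea, here is a minimal sketch of 5-minute tumbling-window counting. Field names are assumptions for illustration; in practice Managed Apache Flink would perform this aggregation on the stream itself:&lt;/p&gt;

```typescript
// Illustrative clickstream event shape (timestamps in epoch milliseconds).
type ClickEvent = { productId: string; eventType: string; time: number };

// Counts events per (window, product, eventType). Each event falls into
// exactly one fixed, non-overlapping window of `windowMs` milliseconds.
export function rollup(events: ClickEvent[], windowMs = 5 * 60 * 1000) {
  const counts = new Map<string, number>();
  for (const e of events) {
    // Tumbling window: truncate the timestamp down to the window start.
    const windowStart = Math.floor(e.time / windowMs) * windowMs;
    const key = `${windowStart}|${e.productId}|${e.eventType}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

&lt;p&gt;Writing one aggregated record per key instead of one record per raw event is what lowers the write frequency to the raw table.&lt;/p&gt;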

&lt;h2&gt;
  
  
  Let’s see the code
&lt;/h2&gt;

&lt;p&gt;In this section, I will mainly focus on the integration with EventBridge Pipes as well as the scheduled query configuration. You can find the complete end-to-end solution at the following link 👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ziedbentahar/scheduled-queries-with-amazom-timestreamdb" rel="noopener noreferrer"&gt;https://github.com/ziedbentahar/scheduled-queries-with-amazom-timestreamdb&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Simplifying time-series data ingestion with event bridge pipes&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The events ingested into Kinesis are quite basic; they contain only the productId, pageId, eventType, and the event timestamp. Configuring a Timestream table as a target in EventBridge Pipes requires defining the mapping from the source event to the selected measurements and dimensions in the target table. This also involves specifying the time field type and format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    resource "awscc_pipes_pipe" "kinesis_to_timestream" {
      name     = "${var.application}-${var.environment}-kinesis-to-timestream"
      role_arn = awscc_iam_role.pipe.arn
      source   = var.source_stream.arn
      target   = aws_timestreamwrite_table.events_table.arn

      source_parameters = {
        kinesis_stream_parameters = {
          starting_position      = "TRIM_HORIZON"
          maximum_retry_attempts = 5
        }

        filter_criteria = {
          filters = [{
            pattern = &amp;lt;&amp;lt;EOF
    {
      "data": {
        "eventType": ["pageViewed", "productPageShared", "productInquiryRequested"]
      }
    }
    EOF
          }]
        }
      }

      target_parameters = {
        timestream_parameters = {
          timestamp_format = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          version_value    = "1"
          time_value       = "$.data.time"
          time_field_type  = "TIMESTAMP_FORMAT"

          single_measure_mappings = [{
              measure_name      = "$.data.eventType"
              measure_value     = "$.data.value"
              measure_value_type = "BIGINT"
          }]

          dimension_mappings = [
            {
              dimension_name       = "id"
              dimension_value      = "$.data.id"
              dimension_value_type = "VARCHAR"
            },
            {
              dimension_name       = "pageId"
              dimension_value      = "$.data.pageId"
              dimension_value_type = "VARCHAR"
            },
            {
              dimension_name       = "productId"
              dimension_value      = "$.data.productId"
              dimension_value_type = "VARCHAR"
            }
          ]
        }
      }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, I am filtering and ingesting only the pageViewed, productPageShared, and productInquiryRequested events into the table from the stream.&lt;/p&gt;

&lt;p&gt;We can view the details of the deployed pipe in the console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89tow6k0wb9bxpgj58x5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F89tow6k0wb9bxpgj58x5.png" alt="Kinesis to timestream, filtering events" width="800" height="243"&gt;&lt;/a&gt;&lt;em&gt;Kinesis to timestream, filtering events&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can view and edit the configured mapping for the target data model in the console… but when it comes to editing, you should always use IaC! Unless, of course, you enjoy the thrill of undocumented changes 😉&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4icbnv6fg8cdmxqon49w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4icbnv6fg8cdmxqon49w.png" alt="Configuring table data model from the console" width="800" height="591"&gt;&lt;/a&gt;&lt;em&gt;Configuring table data model from the console&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Automating the creation of the scheduled queries&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Creating the scheduled query requires defining the schedule configuration, the query statement, and the target mapping, ensuring that the results are properly mapped to the data model for insertion into the destination table. You also need to define an SNS topic to publish notifications about the execution status of scheduled queries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    resource "aws_timestreamquery_scheduled_query" "hourly_rollup" {
      name = "${var.application}-${var.environment}-hourly-rollup"
      schedule_configuration {
        schedule_expression = "rate(1 hour)"
      }

      query_string = templatefile("${path.module}/queries/hourly-rollup.sql.tmpl", {
        table = "\"${aws_timestreamwrite_table.events_table.database_name}\".\"${aws_timestreamwrite_table.events_table.table_name}\""
      })


      target_configuration {
        timestream_configuration {
          database_name = aws_timestreamwrite_database.events_db.database_name
          table_name    = aws_timestreamwrite_table.hourly_rollup.table_name
          time_column   = "time"

          dimension_mapping {
            name                 = "pageId"
            dimension_value_type = "VARCHAR"
          }
          dimension_mapping {
            name                 = "productId"
            dimension_value_type = "VARCHAR"
          }
          measure_name_column = "eventType"
          multi_measure_mappings {
            target_multi_measure_name = "eventType"
            multi_measure_attribute_mapping {
              source_column      = "sum_measure"
              measure_value_type = "BIGINT"
            }
          }
        }
      }

      execution_role_arn = aws_iam_role.scheduled_query_role.arn

      error_report_configuration {
        s3_configuration {
          bucket_name = aws_s3_bucket.error_bucket.id
          object_key_prefix = local.hourly_rollup_error_prefix
        }
      }

      notification_configuration {
        sns_configuration {
          topic_arn = aws_sns_topic.scheduled_query_notification_topic.arn
        }
      }

      depends_on = [
        aws_lambda_invocation.seed_raw_events_table
      ]
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    SELECT
        SUM(measure_value::bigint) as sum_measure,
        measure_name as eventType,
        bin(time, 1h) as time,
        productId,
        pageId
    FROM
        ${table}
    WHERE
     time BETWEEN @scheduled_runtime - (interval '2' hour) AND @scheduled_runtime
    GROUP BY
        measure_name,
        bin(time, 1h),
        productId,
        pageId
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To account for late-arriving events, the query processes data over a two-hour window, so events delayed by up to two hours are still included in the roll-up.&lt;/p&gt;
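
&lt;p&gt;To make the window arithmetic concrete, here is a small plain-JavaScript sketch (my own illustration, independent of the Timestream runtime) of which event timestamps fall inside the two-hour window and how &lt;code&gt;bin(time, 1h)&lt;/code&gt; aligns them:&lt;/p&gt;

```javascript
// Illustration of the two-hour look-back window used by the scheduled query.
// Timestream substitutes @scheduled_runtime itself; this sketch only shows
// which event timestamps fall inside the window and how bin(time, 1h) aligns them.

const HOUR_MS = 60 * 60 * 1000;

// Window: [@scheduled_runtime - 2h, @scheduled_runtime] (BETWEEN is inclusive)
const isInRollupWindow = (eventTime, scheduledRuntime) => {
  const afterWindowStart = eventTime >= scheduledRuntime - 2 * HOUR_MS;
  const notAfterRuntime = scheduledRuntime >= eventTime;
  return afterWindowStart ? notAfterRuntime : false;
};

// bin(time, 1h): truncate a timestamp down to the start of its hour
const binToHour = (time) => Math.floor(time / HOUR_MS) * HOUR_MS;

const runtime = Date.UTC(2025, 0, 1, 12, 0); // example run at 12:00 UTC

// A 10:45 event, seen at the 12:00 run, is still inside the window...
console.log(isInRollupWindow(Date.UTC(2025, 0, 1, 10, 45), runtime)); // true
// ...while a 09:59 event has missed it.
console.log(isInRollupWindow(Date.UTC(2025, 0, 1, 9, 59), runtime)); // false
// 10:45 and 10:10 land in the same hourly bin (10:00).
console.log(binToHour(Date.UTC(2025, 0, 1, 10, 45)) === binToHour(Date.UTC(2025, 0, 1, 10, 10))); // true
```

&lt;p&gt;Because every hourly run re-reads the previous two hours, an event arriving up to two hours late is still aggregated into its original hourly bin on a subsequent run.&lt;/p&gt;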

&lt;p&gt;☝️ &lt;strong&gt;Note:&lt;/strong&gt; As of the time of writing, the source table must contain data before a scheduled query can be created; otherwise, creation fails. To enable fully automated IaC deployment, I add a Lambda invocation resource that triggers immediately after the source events table is created. This Lambda function inserts dummy events into the table, ensuring that the scheduled query can correctly infer the target schema during its creation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    resource "aws_lambda_invocation" "seed_raw_events_table" {

      function_name = aws_lambda_function.seed_raw_table.function_name
      input         = jsonencode({})

      depends_on = [
        aws_lambda_function.seed_raw_table,
        aws_timestreamwrite_table.events_table
      ]

      lifecycle_scope = "CREATE_ONLY"

    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once deployed, you can head straight to the Timestream scheduled queries page. Here’s what it looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbjo1rjpkfby2wnqbtrs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbjo1rjpkfby2wnqbtrs.png" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Identifying trending products&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The “&lt;a href="https://github.com/ziedbentahar/scheduled-queries-with-amazom-timestreamdb/blob/main/src/trend-analysis/lambda-handlers/handle-hourly-roll-up.ts" rel="noopener noreferrer"&gt;handle hourly roll up&lt;/a&gt;” function subscribes to the scheduled query execution events and runs a trend analysis: it compares each product’s page views in the last hour to those of the previous hour to identify the top N products whose views have at least doubled. It then publishes an event for these products.&lt;/p&gt;
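
&lt;p&gt;The doubling rule itself is easy to illustrate on plain data. This is a simplified in-memory sketch of the comparison (my own example with made-up products; the actual function pushes this logic into SQL):&lt;/p&gt;

```javascript
// Simplified illustration of the trend rule: keep products whose views in the
// last hour are at least double the previous hour, ranked by absolute increase.
const topTrending = (lastHour, previousHour, topN) => {
  const prev = new Map(previousHour.map((r) => [r.productId, r.views]));
  return lastHour
    .map((r) => {
      const previousViews = prev.get(r.productId) ?? 0;
      return {
        productId: r.productId,
        currentViews: r.views,
        previousViews,
        increaseLastHour: r.views - previousViews,
      };
    })
    .filter((r) => r.currentViews >= r.previousViews * 2)
    .sort((a, b) => b.increaseLastHour - a.increaseLastHour)
    .slice(0, topN);
};

const last = [
  { productId: "p1", views: 40 },
  { productId: "p2", views: 12 },
  { productId: "p3", views: 5 },
];
const previous = [
  { productId: "p1", views: 10 },
  { productId: "p2", views: 11 },
];

console.log(JSON.stringify(topTrending(last, previous, 2).map((r) => r.productId)));
// → ["p1","p3"]
```

&lt;p&gt;Note that, just like the &lt;code&gt;COALESCE(p.previous_views, 0) * 2&lt;/code&gt; condition in the query, a product with no views in the previous hour always qualifies.&lt;/p&gt;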

&lt;p&gt;Below is how this function queries the Timestream table using the Timestream Query Client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    const getTopNProduct = async (params: { eventType: string; topN: number }): Promise&amp;lt;TrendingProducts&amp;gt; =&amp;gt; {
        const { eventType, topN } = params;

        const qs = `
    WITH LastHour AS (
        SELECT
            productId,
            sum_measure AS current_views
        FROM "${db}"."${table}"
        WHERE measure_name = '${eventType}'
            AND time &amp;gt; ago(1h)
    ),
    PreviousHour AS (
        SELECT
            productId,
            sum_measure AS previous_views
        FROM "${db}"."${table}"
        WHERE measure_name = '${eventType}'
            AND time &amp;gt; ago(2h)
            AND time &amp;lt;= ago(1h)
    )
    SELECT
        l.productId,
        l.current_views,
        COALESCE(p.previous_views, 0) AS previous_views,
        (l.current_views - COALESCE(p.previous_views, 0)) AS increase_last_hour
    FROM LastHour l
    LEFT JOIN PreviousHour p ON l.productId = p.productId
    WHERE (l.current_views &amp;gt;= COALESCE(p.previous_views, 0) * 2)
    ORDER BY increase_last_hour DESC
    LIMIT ${topN}
    `;

        const queryResult = await timestreamQueryClient.send(
            new QueryCommand({
                QueryString: qs,
            })
        );

        const result = parseQueryResult(queryResult);

        return {
            eventType,
            time: new Date().toISOString(),
            products: result.map((row) =&amp;gt; ({
                productId: row.productId,
                count: Number(row.current_views),
                increaseLastHour: Number(row.increase_last_hour),
            })),
        };
    };
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Timestream SDK returns query responses in a raw format. To make them more usable, I drew heavy inspiration from &lt;a href="https://github.com/awslabs/amazon-timestream-tools/blob/master/sample_apps/js/query-example.js" rel="noopener noreferrer"&gt;this code example provided in an AWS Labs repository&lt;/a&gt; to parse the query result.&lt;/p&gt;
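
&lt;p&gt;For reference, a minimal scalar-only version of such a parser might look like this (a simplified sketch inspired by the AWS Labs example; it ignores nested types such as arrays, rows, and time series, and the sample payload is made up):&lt;/p&gt;

```javascript
// Map a Timestream query response ({ ColumnInfo, Rows }) into plain objects,
// handling scalar columns only; nested Array/Row/TimeSeries types are out of scope.
const parseQueryResult = (queryResult) => {
  const columns = queryResult.ColumnInfo ?? [];
  return (queryResult.Rows ?? []).map((row) =>
    Object.fromEntries(
      row.Data.map((datum, i) => [columns[i].Name, datum.ScalarValue ?? null])
    )
  );
};

// Example shaped like a (truncated) QueryCommand response:
const sample = {
  ColumnInfo: [
    { Name: "productId", Type: { ScalarType: "VARCHAR" } },
    { Name: "current_views", Type: { ScalarType: "BIGINT" } },
  ],
  Rows: [
    { Data: [{ ScalarValue: "p1" }, { ScalarValue: "42" }] },
    { Data: [{ ScalarValue: "p2" }, { ScalarValue: "7" }] },
  ],
};

console.log(JSON.stringify(parseQueryResult(sample)));
// → [{"productId":"p1","current_views":"42"},{"productId":"p2","current_views":"7"}]
```

&lt;p&gt;Scalar values come back as strings, which is why &lt;code&gt;getTopNProduct&lt;/code&gt; converts them with &lt;code&gt;Number(...)&lt;/code&gt;.&lt;/p&gt;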

&lt;p&gt;For comparison, here’s the same query executed on both the raw events table and the hourly roll-up table for a synthetic dataset of 200,000 ingested events:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisrifn5iruffbe33qi8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fisrifn5iruffbe33qi8t.png" alt="Querying stats - raw vs aggregated" width="800" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even with this relatively small dataset and its distribution, we can observe a clear difference in both the query duration and the number of bytes scanned between the two tables.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Wrapping up&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;I hope you find this article useful! We explored how to leverage Timestream for live analytics, focusing on automating data rollups to optimise query performance.&lt;/p&gt;

&lt;p&gt;As always, you can find the full code source, ready to be adapted and deployed here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ziedbentahar/scheduled-queries-with-amazom-timestreamdb" rel="noopener noreferrer"&gt;https://github.com/ziedbentahar/scheduled-queries-with-amazom-timestreamdb&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://aws.amazon.com/fr/blogs/database/build-time-series-applications-faster-with-amazon-eventbridge-pipes-and-timestream-for-liveanalytics/" rel="noopener noreferrer"&gt;Build time-series applications faster with Amazon EventBridge Pipes and Timestream for LiveAnalytics&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://repost.aws/questions/QUZXrL3Ww8ScmZLuHcQg-8JA/how-to-simultaneously-deploy-timestream-database-table-and-scheduled-query" rel="noopener noreferrer"&gt;How to simultaneously deploy Timestream Database, Table and Scheduled Query&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>timestream</category>
      <category>timeseries</category>
      <category>scheduledqueries</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Replicate data from DynamoDB to Apache Iceberg tables using Glue Zero-ETL integration</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Mon, 30 Dec 2024 14:22:21 +0000</pubDate>
      <link>https://dev.to/aws-builders/replicate-data-from-dynamodb-to-apache-iceberg-tables-using-glue-zero-etl-integration-52h2</link>
      <guid>https://dev.to/aws-builders/replicate-data-from-dynamodb-to-apache-iceberg-tables-using-glue-zero-etl-integration-52h2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg0xi5d4wnqouqhsux7z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg0xi5d4wnqouqhsux7z.png" alt="Photo by Alexander Hafemann on Unsplash" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Analyzing data directly from Amazon DynamoDB can be tricky since it doesn’t come with built-in analytics features. One approach is to set up ETL pipelines to move the data into a data lake or a lakehouse. From there, services like Amazon Athena or EMR can take over for analysis and processing. Building and maintaining those ETL pipelines takes time and effort.&lt;/p&gt;

&lt;p&gt;AWS Glue Zero-ETL Integration provides an easy way to replicate data from DynamoDB to Apache Iceberg tables in Amazon S3. It’s particularly useful when your DynamoDB table schema isn’t complex. In such cases, it helps reduce operational overhead.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Apache Iceberg is an open-source table format designed for high performance and large-scale analytics. It is increasingly recognized as a standard in data lake architectures providing advanced features such as schema evolution, time travel, ACID transactions, and efficient metadata handling, addressing key challenges in data lakes while offering warehouse-like capabilities.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this article, I’ll walk you through setting up Glue Zero-ETL Integration using Terraform. Along the way, I’ll share my thoughts on using this service.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;You can find the complete code repository at this link 👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ziedbentahar/glue-zero-etl-dynamodb-to-apache-iceberg-table" rel="noopener noreferrer"&gt;https://github.com/ziedbentahar/glue-zero-etl-dynamodb-to-apache-iceberg-table&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution overview
&lt;/h2&gt;

&lt;p&gt;I’ll use a hypothetical Orders table to demonstrate running analytical queries with Athena across various order-related dimensions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flh5r15iusjf432en6b1i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flh5r15iusjf432en6b1i.png" alt="Using Glue Zero-ETL integration&amp;lt;br&amp;gt;
" width="800" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this example, I’m using a simplified Orders model, which has the following structure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgdpxzofi4zgwoddc7r2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgdpxzofi4zgwoddc7r2.png" alt="ddb table structure" width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We’ll look at how Zero-ETL integration handles nested fields, sets, and lists of maps, but first let’s set up the configuration.&lt;/p&gt;
&lt;h2&gt;
  
  
  Integration configuration
&lt;/h2&gt;

&lt;p&gt;Let’s walk through the steps to configure the integration.&lt;/p&gt;
&lt;h3&gt;
  
  
  1- Configuring the DynamoDB source table
&lt;/h3&gt;

&lt;p&gt;Before getting started, point-in-time recovery (PITR) must be enabled on the source table:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv45zqcbs7v7s95hlp17g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv45zqcbs7v7s95hlp17g.png" alt="Enabling ddb PITR" width="800" height="127"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We also need to configure the table’s resource policy to allow the integration to export the table’s point-in-time data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_dynamodb_resource_policy" "this" {
  resource_arn = data.aws_dynamodb_table.this.arn

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Service = "glue.amazonaws.com"
        },
        Action = [
          "dynamodb:ExportTableToPointInTime",
          "dynamodb:DescribeTable",
          "dynamodb:DescribeExport"
        ],
        Resource = "*",
        Condition = {
          StringEquals = {
            "aws:SourceAccount" = data.aws_caller_identity.current.account_id
          },
          StringLike = {
            "aws:SourceArn" = "arn:aws:glue:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:integration:*"
          }
        }
      }
    ]
  })
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2- Glue catalog database configuration
&lt;/h3&gt;

&lt;p&gt;An IAM role must be created for the Zero-ETL integration target to grant access to the Glue database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_glue_catalog_database" "this" {
  name         = "${var.application}${var.environment}db"
  location_uri = "s3://${aws_s3_bucket.database_bucket.bucket}/"
}

resource "aws_iam_policy" "integration_policy" {
  name = "${var.application}-${var.environment}-integration-policy"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "glue:CreateTable",
          "glue:GetTable",
          "glue:UpdateTable",
          "glue:GetTableVersion",
          "glue:GetTableVersions",
          "glue:GetResourcePolicy"
        ],
        Resource = [
          "arn:aws:glue:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:catalog",
          "arn:aws:glue:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:database/${aws_glue_catalog_database.this.name}",
          "arn:aws:glue:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:table/${aws_glue_catalog_database.this.name}/*"
        ]
      },
      {
        Effect = "Allow",
        Action = [
          "cloudwatch:PutMetricData",
        ],
        Resource = "*",
        Condition = {
          StringEquals = {
            "cloudwatch:namespace" = "AWS/Glue/ZeroETL"
          }
        },

      },
      {
        Effect = "Allow",
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ],
        Resource = "*"
      },
      {
        Effect = "Allow",
        Action = [
          "glue:GetDatabase",
        ],
        Resource = [
          "arn:aws:glue:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:catalog",
          "arn:aws:glue:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:database/${aws_glue_catalog_database.this.name}",
        ]
      },
      {
        Effect = "Allow",
        Action = [
          "s3:ListBucket"
        ],
        Resource = [
          aws_s3_bucket.database_bucket.arn,
        ]
      },
      {
        Effect = "Allow",
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject"
        ],
        Resource = [
          "${aws_s3_bucket.database_bucket.arn}/*",
        ]
      }

    ]
  })
}

resource "aws_iam_role" "integration_role" {
  name = "${var.application}-${var.environment}-integration-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Service = "glue.amazonaws.com"
        },
        Action = "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "integration" {
  role       = aws_iam_role.integration_role.name
  policy_arn = aws_iam_policy.integration_policy.arn
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3- Creating the integration
&lt;/h3&gt;

&lt;p&gt;Currently, neither CloudFormation nor the AWS Terraform provider supports Glue Zero-ETL. So, I’m using the AWS SDK to create the integration and configure table properties. To handle this, I rely on &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_invocation" rel="noopener noreferrer"&gt;aws_lambda_invocation&lt;/a&gt; to trigger a Lambda function that creates or deletes the integration whenever the database is created or deleted, much like a CloudFormation custom resource.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { GlueClient, CreateIntegrationCommand, CreateIntegrationResourcePropertyCommand, DeleteIntegrationCommand, CreateIntegrationTablePropertiesCommand } from "@aws-sdk/client-glue";
import { SSMClient, PutParameterCommand, GetParameterCommand } from "@aws-sdk/client-ssm";

export const handler = async (event) =&amp;gt; {

    let glueClient = new GlueClient({ region: process.env.AWS_REGION });
    let paramStore = new SSMClient({ region: process.env.AWS_REGION });

    if(event.sourceArn == null || event.targetArn == null || event.roleArn == null) {
        throw new Error("SourceArn, TargetArn and RoleArn are required");
    }

    if (event.tf.action === "create") {

        const integrationResourcePropertyResult =  await glueClient.send(new CreateIntegrationResourcePropertyCommand({
            ResourceArn: event.targetArn,
            TargetProcessingProperties: {
                RoleArn: event.roleArn
            }
        }));

        const integrationResult = await glueClient.send(new CreateIntegrationCommand({
            IntegrationName : event.integrationName,
            SourceArn : event.sourceArn,
            TargetArn : event.targetArn,

        }));


        await glueClient.send(new CreateIntegrationTablePropertiesCommand({
            ResourceArn: integrationResult.IntegrationArn,
            TableName: event.tableConfig.tableName,
            TargetTableConfig: {
                PartitionSpec: event.tableConfig.partitionSpec ? event.tableConfig.partitionSpec : undefined,
                UnnestSpec: event.tableConfig.unnestSpec ? event.tableConfig.unnestSpec : undefined,
                TargetTableName: event.tableConfig.tableName ? event.tableConfig.tableName : undefined
            }

        }));

        await paramStore.send(new PutParameterCommand({
            Name: event.integrationName,
            Value: JSON.stringify({
                integrationArn:  integrationResult.IntegrationArn,
                resourcePropertyArn: integrationResourcePropertyResult.ResourceArn
            }),
            Type: "String",
            Overwrite: true
        }));

        return;
    }

    if (event.tf.action === "delete") {
        const integrationParams = await paramStore.send(new GetParameterCommand({
            Name: event.integrationName,
        }));

        const { integrationArn } = JSON.parse(integrationParams.Parameter.Value);

        await glueClient.send(new DeleteIntegrationCommand({
            IntegrationIdentifier: integrationArn
        }));

        return;
    }

};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I’m using the &lt;a href="https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/glue/command/CreateIntegrationCommand/" rel="noopener noreferrer"&gt;@aws-sdk/client-glue&lt;/a&gt; to set up the integration, assign the target processing role, and configure table properties such as the target table name, schema unnesting options, and data partitioning for the target Apache Iceberg table. By default, the integration with DynamoDB uses the table’s primary keys.&lt;/p&gt;

&lt;p&gt;Here’s how the Lambda invocation is used; I pass the parameters needed to configure the integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_lambda_invocation" "manage_zero_etl_integration" {

  function_name = aws_lambda_function.manage_zero_etl_integration_fn.function_name
  input = jsonencode({
    integrationName = "${var.application}-${var.environment}-zero-etl-integration",
    sourceArn       = data.aws_dynamodb_table.this.arn,
    targetArn       = aws_glue_catalog_database.this.arn,
    roleArn         = aws_iam_role.integration_role.arn,
    tableConfig = {
      tableName = data.aws_dynamodb_table.this.name,
      partitionSpec = [
        {
          FieldName    = "orderDate",
          FunctionSpec = "day"
        }
      ],
      unnestSpec : "FULL"
    }

  })

  lifecycle_scope = "CRUD"

  depends_on = [aws_glue_resource_policy.this]

}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Very much a happy-path solution here — just a workaround while waiting for proper IaC support. If you’d prefer not to take this route, another option is to create the integration using the &lt;a href="https://docs.aws.amazon.com/cli/latest/reference/glue/create-integration.html" rel="noopener noreferrer"&gt;CLI&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  4- Glue resource policy
&lt;/h3&gt;

&lt;p&gt;Since I’m using the Glue catalog for the integration, I made sure to include the following permissions in the glue catalog resource policy. This allows for integration between the source DynamoDB table and the target Iceberg table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data "aws_iam_policy_document" "glue_resource_policy" {
  statement {
    effect = "Allow"

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root",
        aws_iam_role.manage_zero_etl_integration_role.arn
      ]
    }

    actions = [
      "glue:CreateInboundIntegration",
    ]

    resources = [
      "arn:aws:glue:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:catalog",
      "arn:aws:glue:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:database/${aws_glue_catalog_database.this.name}",
    ]

    condition {
      test     = "StringLike"
      variable = "aws:SourceArn"
      values   = [data.aws_dynamodb_table.this.arn]
    }
  }

  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["glue.amazonaws.com"]
    }

    actions = [
      "glue:AuthorizeInboundIntegration"
    ]

    resources = [
      "arn:aws:glue:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:catalog",
      "arn:aws:glue:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:database/${aws_glue_catalog_database.this.name}",
    ]

    condition {
      test     = "StringEquals"
      variable = "aws:SourceArn"
      values   = [data.aws_dynamodb_table.this.arn]
    }
  }

  depends_on = [
    aws_iam_role.manage_zero_etl_integration_role,
    aws_lambda_function.manage_zero_etl_integration_fn
  ]
}


resource "aws_glue_resource_policy" "this" {
  policy = data.aws_iam_policy_document.glue_resource_policy.json
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can find this configuration in the official docs &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/zero-etl-prerequisites.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Glue Zero-ETL in action
&lt;/h2&gt;

&lt;p&gt;Once you’ve deployed all the components, you can go straight to the Glue Zero-ETL list. Here’s what it looks like in the console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Forscl503su7h7kev6ppq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Forscl503su7h7kev6ppq.png" alt="Active Zero-ETL integrations list&amp;lt;br&amp;gt;
" width="800" height="118"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can view the integration details. By default, the refresh interval from the source DynamoDB table to the Iceberg table is set to 15 minutes; it is not currently editable:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53yrm1u5zm04dps8skvm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53yrm1u5zm04dps8skvm.png" alt="Glue Zero ETL integration details&amp;lt;br&amp;gt;
" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also monitor the integration operations and track the number of items inserted, updated, or deleted directly from CloudWatch Logs. The documentation for the metrics generated during each execution can be found at the following &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/zero-etl-monitoring.html#zero-etl-cloudwatch-metrics" rel="noopener noreferrer"&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpuezitp7d0zlqlsry2rg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpuezitp7d0zlqlsry2rg.png" alt="Cloudwatch Logs — number of inserted items during seed operation&amp;lt;br&amp;gt;
" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the first insert operation is successful, you can view the inferred Iceberg table schema on the data catalog page on the console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqyguhl41muaipvypozv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqyguhl41muaipvypozv.png" alt="Table schema following first seed operation&amp;lt;br&amp;gt;
" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;☝️ Note that the &lt;code&gt;shippingAddress&lt;/code&gt; was un-nested and &lt;code&gt;deliveryPreferences&lt;/code&gt; was replicated as an array. That’s very convenient. However, the &lt;code&gt;items&lt;/code&gt; property was inferred as a string. Since it’s a list of maps in DynamoDB, I expected it to map cleanly to a list of structs in Apache Iceberg, but the integration didn’t quite get the schema right.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;items&lt;/code&gt; property ends up as a plain JSON string in the DynamoDB list format. It’s not perfect, but we can work around it by using &lt;a href="https://docs.aws.amazon.com/athena/latest/ug/extracting-data-from-JSON.html" rel="noopener noreferrer"&gt;json_extract&lt;/a&gt; in Athena to parse the data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2ztmu9vuqxaix5jebfl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2ztmu9vuqxaix5jebfl.png" width="800" height="90"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Querying with Athena
&lt;/h3&gt;

&lt;p&gt;Here’s an example query using Athena to get the number of orders grouped by city:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy13syv5wo9bk7d4zq2j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy13syv5wo9bk7d4zq2j.png" alt="Example of a query: Number of orders by city&amp;lt;br&amp;gt;
" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s another example where I want to get the number of orders by city where the delivery preferences include &lt;code&gt;LeaveAtDoor&lt;/code&gt;. While this involves some extra steps with DynamoDB, it’s much easier to achieve with Iceberg tables:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1ifn5aoi5a8r2p68arj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp1ifn5aoi5a8r2p68arj.png" alt="Getting the number of orders by city where delivery prefrence contains LeaveAtDoor&amp;lt;br&amp;gt;
" width="800" height="463"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  My Wishlist
&lt;/h2&gt;

&lt;p&gt;After trying out Glue Zero-ETL, I came up with a wishlist of features and improvements I'd like to see. Since it's still relatively new (at the time of writing), I'm looking forward to potential updates and enhancements over time. I'll keep this blog post updated as things evolve:&lt;/p&gt;
&lt;h3&gt;
  
  
  IaC support
&lt;/h3&gt;

&lt;p&gt;Deploying services through the console is not my preferred approach. As mentioned earlier in this post, currently, neither CloudFormation nor the AWS Terraform provider supports Glue Zero-ETL. I used the AWS SDK to create the integration and configure table properties. While this approach works for now, it’s not ideal. I expect that support for CloudFormation and Terraform will be introduced soon.&lt;/p&gt;
&lt;h3&gt;
  
  
  Handling DynamoDB Lists of Maps
&lt;/h3&gt;

&lt;p&gt;Lists of Maps aren’t supported (yet?). Since Apache Iceberg tables can handle lists of structs, the lack of support for this feature could complicate more advanced use cases with complex table schemas. In such cases, running a custom ETL job remains a better solution.&lt;/p&gt;
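&lt;p&gt;To make this concrete, here is a hypothetical order item whose &lt;code&gt;lines&lt;/code&gt; attribute would block the integration:&lt;/p&gt;

```typescript
// A hypothetical DynamoDB order item. The `lines` attribute is a List of Maps,
// which the Zero-ETL integration cannot replicate today, even though Iceberg
// supports the equivalent "list of struct" column type.
type OrderLine = { sku: string; quantity: number };
type Order = {
  orderId: string;
  city: string;
  deliveryPreferences: string[]; // lists of scalars replicate fine
  lines: OrderLine[];            // list of maps: currently unsupported
};

const order: Order = {
  orderId: "o-1001",
  city: "Lyon",
  deliveryPreferences: ["LeaveAtDoor"],
  lines: [
    { sku: "sku-1", quantity: 2 },
    { sku: "sku-2", quantity: 1 },
  ],
};
```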
&lt;h3&gt;
  
  
  Custom partitioning configuration
&lt;/h3&gt;

&lt;p&gt;When setting up the integration, you can configure target table properties, such as partitioning the data using the primary key of the DynamoDB table or specifying a custom partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;await glueClient.send(new CreateIntegrationTablePropertiesCommand({
    ResourceArn: integrationResult.IntegrationArn,
    TableName: event.tableConfig.tableName,
    TargetTableConfig: {
        PartitionSpec: event.tableConfig.partitionSpec ? event.tableConfig.partitionSpec : undefined,
        UnnestSpec: event.tableConfig.unnestSpec ? event.tableConfig.unnestSpec : undefined,
        TargetTableName: event.tableConfig.tableName ? event.tableConfig.tableName : undefined
    }
}));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;However, while I was able to define custom partition configuration through both the console and the AWS CLI, it didn’t seem to take effect:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2be13x34f4fhw3az704.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2be13x34f4fhw3az704.png" alt="Image description" width="800" height="190"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’m not sure if this is a UI issue or if Glue Zero-ETL Integration simply doesn’t support it yet. The &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/zero-etl-sources.html" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; isn’t very clear on this point, but hopefully, it gets updated soon!&lt;/p&gt;

&lt;h3&gt;
  
  
  Support for AWS services other than DynamoDB
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu49s41830vq43lj4dnob.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu49s41830vq43lj4dnob.png" alt="Zero-ETL source types&amp;lt;br&amp;gt;
" width="800" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Glue Zero-ETL integration currently supports several sources, with DynamoDB being the only AWS service available at this point. While this is a great start, I would have preferred better alignment across AWS’s data integration offerings. For example, Amazon Kinesis Data Firehose already supports native CDC integration for RDS databases. It would have been ideal to see a more aligned approach, where Glue Zero-ETL could also support CDC from RDS and other AWS services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;I hope you found this article helpful! I’ve found the Glue Zero-ETL integration to be an interesting tool to have in your toolkit, especially for offloading undifferentiated heavy lifting and focusing on what matters most. It’s also useful for teams that aren’t familiar with writing Glue Jobs, as it makes running ad-hoc analytics queries on data originally stored in DynamoDB much easier.&lt;/p&gt;

&lt;p&gt;As usual, you can find the full source code, ready to be adapted and deployed, here 👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ziedbentahar/glue-zero-etl-dynamodb-to-apache-iceberg-table" rel="noopener noreferrer"&gt;https://github.com/ziedbentahar/glue-zero-etl-dynamodb-to-apache-iceberg-table&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thank you for reading and may your data be clean, your queries be fast, and your pipelines never break 😉&lt;/p&gt;

&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/glue/latest/dg/zero-etl-using.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/glue/latest/dg/zero-etl-using.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/reference/glue/create-integration.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cli/latest/reference/glue/create-integration.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/glue/command/CreateIntegrationCommand/" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/glue/command/CreateIntegrationCommand/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>glue</category>
      <category>dynamodb</category>
      <category>iceberg</category>
      <category>zeroetl</category>
    </item>
    <item>
      <title>Building smarter RSS feeds for my newsletter subscriptions with SES and Bedrock</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Sat, 23 Nov 2024 16:38:33 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-a-smarter-rss-feed-for-my-newsletter-subscriptions-4nf7</link>
      <guid>https://dev.to/aws-builders/building-a-smarter-rss-feed-for-my-newsletter-subscriptions-4nf7</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1400%2Fformat%3Awebp%2F0%2ALrlFqriDDuwEQDbB" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1400%2Fformat%3Awebp%2F0%2ALrlFqriDDuwEQDbB" alt="Photo by Joanna Kosinska on Unsplash" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, I’ll share a tool I recently built for my personal use, driven largely by intellectual curiosity. Although I was aware of services like &lt;a href="https://kill-the-newsletter.com/" rel="noopener noreferrer"&gt;&lt;em&gt;Kill the Newsletter!&lt;/em&gt;&lt;/a&gt;, I wanted to create a service that generates personalized RSS feeds for my newsletter subscriptions. I wanted each feed to do more than just list content: powered by an LLM that categorizes, summarizes, and extracts key information, it provides summaries of the featured articles and topics shared in the newsletters I subscribe to. This helps me stay on top of relevant updates and ensures I don’t miss out on insights from the community.&lt;/p&gt;

&lt;p&gt;To achieve this, I used a feature from Amazon Simple Email Service (SES) that allows for handling incoming emails, making it a suitable option for building event-driven automations that process messages as they arrive. In my case, I used this capability to efficiently manage the newsletters I receive, transforming them into personalized RSS feeds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1400%2Fformat%3Awebp%2F1%2ABLhSt1pprMHnrsfLroBoxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1400%2Fformat%3Awebp%2F1%2ABLhSt1pprMHnrsfLroBoxg.png" alt="captionless image" width="800" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Designing the smart RSS feed generator
&lt;/h2&gt;

&lt;p&gt;To start, I wanted to be able to set up email addresses that I can create on the fly whenever I want to subscribe to one or many newsletters. Each email address would correspond to a “virtual” inbox, tailored for a specific type of subscription. Although all emails are technically routed to the same location, this approach allows me to manage and organize my subscriptions as if they were separate inboxes, each based on specific interests and topics. I can create as many dedicated inboxes as needed. For example, one could be set up for &lt;strong&gt;&lt;em&gt;&lt;a href="mailto:awesome-serverless-community@my-domain.com"&gt;awesome-serverless-community@my-domain.com&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt;, while another could be for &lt;strong&gt;&lt;em&gt;&lt;a href="mailto:awesome-system-design@my-domain.com"&gt;awesome-system-design@my-domain.com&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt;, each handling different types of newsletters and allowing me to configure separate content filtering rules for each subscription.&lt;/p&gt;
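&lt;p&gt;Since all addresses route to the same place, telling the “virtual” inboxes apart only requires parsing the recipient’s local part. A minimal sketch, with the domain hardcoded for illustration:&lt;/p&gt;

```typescript
// Map a recipient address to its "virtual inbox" (feed) key. Anything before
// the @ identifies the feed; the domain is shared across all inboxes.
// "my-domain.com" is a placeholder for the SES-verified domain.
const feedKeyFromRecipient = (recipient: string): string | undefined => {
  const [localPart, domain] = recipient.toLowerCase().split("@");
  if (domain !== "my-domain.com" || !localPart) {
    return undefined;
  }
  return localPart;
};
```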

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fri0zcwjpswthkbu0oo1f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fri0zcwjpswthkbu0oo1f.png" alt="High-level overview of smart and personalized RSS feed generation from newsletters" width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Whenever a newsletter arrives at one of these “virtual” inboxes, the RSS feed generation starts. The first step involves verifying whether the system should handle the email. To achieve this, I implemented an allow list of trusted newsletter senders, which can be configured as needed. This ensures that only approved sources are processed, adding an extra layer of control to the system. Next, the raw email content is converted into Markdown format. An LLM is then used to create a gist of the newsletter and filter the content based on my interests. Both the filter configurations and the allow list are stored in a dedicated table, with filters configured for each feed.&lt;/p&gt;

&lt;p&gt;Some of the newsletters I subscribe to feature valuable community content, such as links to blog posts or videos. I also use the LLM to generate a structured list of these links, along with their main topics, so I can easily access the content that’s most relevant to me.&lt;/p&gt;

&lt;p&gt;Once the gist of a newsletter is ready, it is finally stored in a dedicated table. Each email address gets its own personalized RSS feed that gets served through an API, so it can be accessed by RSS feed readers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution overview
&lt;/h2&gt;

&lt;p&gt;Let’s now have a deeper look at the solution:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb0wm4qj8xrnchkbnu1i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnb0wm4qj8xrnchkbnu1i.png" alt="Smart RSS Feed generation — Solution overview" width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Incoming emails are stored in an S3 bucket. Each time a new email arrives, an event is sent to the default EventBridge bus, triggering a workflow to process the email. The &lt;strong&gt;&lt;code&gt;Process Email&lt;/code&gt;&lt;/strong&gt; function handles the entire workflow, including content conversion, filtering, and gist generation. It uses Amazon Bedrock with the Claude Sonnet model to create a structured newsletter gist, which is then stored in a DynamoDB table.&lt;/p&gt;

&lt;p&gt;For serving the RSS feeds, I built the API using the Hono web framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Some notes&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  I chose to use a single Lambda function to handle the entire process of generating the newsletter gist, keeping that part self-contained. &lt;a href="https://medium.com/gitconnected/ai-powered-video-summarizer-with-amazon-bedrock-and-anthropics-claude-9f1832f397dc" rel="noopener noreferrer"&gt;In a previous article&lt;/a&gt;, I explored another approach using a Step Function to interact with an LLM, as it avoids paying for an active Lambda function while waiting for the LLM response, but it requires a more complex setup.&lt;/li&gt;
&lt;li&gt;  The API is deployed using a Function URL (FURL). I use CloudFront Origin Access Control (OAC) to restrict access to the Lambda function URL origin.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Solution details
&lt;/h2&gt;

&lt;p&gt;I built this solution using Node.js and TypeScript for the function code and Terraform for IaC.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You will find the complete repo of this service here 👉&lt;a href="https://github.com/ziedbentahar/smart-feeds" rel="noopener noreferrer"&gt;&lt;strong&gt;https://github.com/ziedbentahar/smart-feeds&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1 — Email handling — Configuring SES
&lt;/h3&gt;

&lt;p&gt;To get started with SES handling incoming emails, we’ll add an MX record to our domain’s DNS configuration. Next, we’ll create an SES receipt rule to process all incoming emails sent to &lt;code&gt;@my-domain.com&lt;/code&gt;. This rule includes an action to deliver the raw email content to an S3 bucket.&lt;/p&gt;

&lt;p&gt;Here is how to define this in Terraform:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
resource "aws_route53_record" "email_mx_records" {
  zone_id = var.subdomain_zone.id
  name    = local.email_subdomain
  type    = "MX"
  ttl     = "600"
  records = [
    "10 inbound-smtp.us-east-1.amazonaws.com",
  ]
}
...

resource "aws_ses_receipt_rule_set" "this" {
  rule_set_name = "${var.application}-${var.environment}-newsletter-rule-set"
}
resource "aws_ses_receipt_rule" "this" {
  name          = "${var.application}-${var.environment}-to-bucket"
  rule_set_name = aws_ses_receipt_rule_set.this.rule_set_name
  recipients    = [local.email_subdomain]
  enabled       = true
  scan_enabled  = true

  s3_action {
    position          = 1
    bucket_name       = aws_s3_bucket.email_bucket.bucket
    object_key_prefix = local.emails_prefix
  }

  depends_on = [
    aws_ses_receipt_rule_set.this,
    aws_s3_bucket.email_bucket,
    aws_s3_bucket_policy.email_bucket_policy
  ]
}
resource "aws_ses_active_receipt_rule_set" "this" {
  rule_set_name = aws_ses_receipt_rule_set.this.rule_set_name
  depends_on = [aws_ses_receipt_rule_set.this]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also need to update the bucket policy to allow SES to write to the emails bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_s3_bucket_policy" "email_bucket_policy" {
  bucket = aws_s3_bucket.email_bucket.id
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect    = "Allow",
        Principal = { Service = "ses.amazonaws.com" },
        Action    = "s3:PutObject",
        Resource  = "${aws_s3_bucket.email_bucket.arn}/*",
        Condition = {
          StringEquals = {
            "aws:Referer" = data.aws_caller_identity.current.account_id
          }
        }
      }
    ]
  })
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After deployment, you will be able to view the SES email receiving receipt rule details in the console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1400%2Fformat%3Awebp%2F1%2A2Kc8xkdQXMvMrHtP7rkL_A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1400%2Fformat%3Awebp%2F1%2A2Kc8xkdQXMvMrHtP7rkL_A.png" alt="Receipt rule details" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2— Generating newsletter gist
&lt;/h3&gt;

&lt;p&gt;The ‘Process Email’ Lambda is invoked by an EventBridge rule whenever a new object is created in the inbox bucket. Let’s have a look at the different steps involved in generating the newsletter gist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export const lambdaHandler = async (event: S3ObjectCreatedNotificationEvent) =&amp;gt; {
    const rawContent = await getRawEmailContent(event.detail.object.key);
    const emailId = basename(event.detail.object.key);
    if (!rawContent) {
        throw new Error("Email content not found");
    }
    const { newsletterEmailFrom, newsletterEmailTo, html, date, subject } = await parseEmail(rawContent);
    const feedsConfigs = await getFeedConfigurationsBySenderEmail(newsletterEmailFrom);
    if (feedsConfigs.length === 0) {
        console.warn(`No feed config found for ${newsletterEmailFrom}`);
        return;
    }
    let shortenedLinks = new Map&amp;lt;string, string&amp;gt;();
    const markdown = generateMarkdown(html, {
        shortenLinks: true,
        shortener: (href) =&amp;gt; {
            let shortened = nanoid();
            shortenedLinks.set(shortened, href);
            return shortened;
        },
    });
    for (const [shortened, original] of shortenedLinks) {
        await addShortenedLink(original, shortened);
    }
    const output = await generateNewsletterGist(markdown);
    if (!output) {
        throw new Error("Failed to generate newsletter gist");
    }
    const results = await Promise.allSettled(
        feedsConfigs.map(async (feedConfig) =&amp;gt; {
            await addNewItemToFeed(feedConfig.feedId, {
                feedId: feedConfig.feedId,
                date,
                title: subject,
                emailFrom: newsletterEmailFrom!,
                id: emailId,
                ...output,
            });
        })
    );
    // allSettled never rejects, so inspect the results to surface failures
    for (const result of results) {
        if (result.status === "rejected") {
            console.error(result.reason);
        }
    }

};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;🔎 Let’s zoom-in:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  First, the raw email content is retrieved from the inbox S3 bucket. This content is then parsed to extract the HTML content, the sender, and other relevant details used downstream.&lt;/li&gt;
&lt;li&gt;  Once we confirm the sender is on the allow list, the &lt;code&gt;generateMarkdown&lt;/code&gt; function converts the email content into Markdown. During the transformation, unnecessary elements such as headers and styles are stripped from the email’s HTML content.&lt;/li&gt;
&lt;li&gt;  As I am interested in capturing relevant shared content, typically containing links to original sources, the &lt;code&gt;generateMarkdown&lt;/code&gt; function extracts these links and transforms them into short Ids. These Ids are used in the prompt instead of the full links, helping to reduce the input context length when invoking the model.&lt;/li&gt;
&lt;li&gt;  The short ids are stored in a table, linked to the original URLs, and used in the RSS feed items.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;generateNewsletterGist&lt;/code&gt; generates the prompt and invokes the model&lt;/li&gt;
&lt;li&gt;  And finally the &lt;code&gt;addNewItemToFeed&lt;/code&gt; function stores the structured output in the feeds table.&lt;/li&gt;
&lt;/ul&gt;
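&lt;p&gt;The shortening step from the list above can be sketched as follows; this is a simplified version that swaps nanoid for Node’s built-in crypto module and keeps the mapping in memory instead of a table:&lt;/p&gt;

```typescript
import { randomBytes } from "crypto";

// Mint short, URL-safe ids and remember which URL each one stands for.
// In the real flow the mapping is persisted to a DynamoDB table, not kept here.
const shortenedLinks: { [id: string]: string } = {};

const shorten = (href: string): string => {
  const id = randomBytes(8).toString("base64url"); // 11 URL-safe characters
  shortenedLinks[id] = href;
  return id;
};

const resolve = (id: string): string | undefined => shortenedLinks[id];
```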

&lt;p&gt;You can find below the details of the &lt;code&gt;generateMarkdown&lt;/code&gt; function; here I am relying on the &lt;a href="https://github.com/mixmark-io/turndown" rel="noopener noreferrer"&gt;turndown&lt;/a&gt; lib:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import TurndownService from "turndown";
const generateMarkdown = (
    html: string,
    options: 
      { shortenLinks: true; shortener: (href: string) =&amp;gt; string } |
      { shortenLinks: false }
): string =&amp;gt; {
    const turndownService = new TurndownService({});
    turndownService.addRule("styles-n-headers", {
        filter: ["style", "head", "script"],
        replacement: function (_) {
            return "";
        },
    });
    if (options.shortenLinks) {
        turndownService.addRule("shorten-links", {
            filter: "a",
            replacement: function (content, node) {
                const href = (node as HTMLElement).getAttribute("href");
                if (href) {
                    const shortened = options.shortener(href);
                    return `[${content}](${shortened})`;
                }
                return content;
            },
        });
    }
    const markdown = turndownService.turndown(html);
    return markdown;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One important part is the prompt I use to generate the newsletter gist:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const prompt = `
Process the provided newsletter issue content in markdown format and generate a structured JSON output by performing the following tasks and adhering to the constraints:
&amp;lt;tasks&amp;gt; 
    * Summarize the most important topics in this newsletter. 
    * Identify and extract the list of content shared in the newsletter, including: 
        * Key topics, extracted as paragraphs. 
        * Articles 
        * Tutorials. 
        * Key events
    * For shared content, extract the most relevant link. For each link, generate a summary sentence related to it. Do not create a link if one is not provided in the newsletter. 
    * Exclude any irrelevant content, such as unsubscribe links, social media links, or advertisements. 
    * Do not invent topics or content that is not present in the newsletter. 
&amp;lt;/tasks&amp;gt;
Here is the expected JSON schema for the output: 
&amp;lt;output-json-schema&amp;gt;
{{output_json_schema}}
&amp;lt;/output-json-schema&amp;gt;
Here is the newsletter content: 
&amp;lt;newsletter-content&amp;gt;
{{newsletter_content}}
&amp;lt;/newsletter-content&amp;gt;
`;
export const generatePromptForEmail = (newsletterContent: string, outputJsonSchema: string) =&amp;gt; {
    return prompt
        .replace("{{newsletter_content}}", newsletterContent)
        .replace("{{output_json_schema}}", outputJsonSchema);
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To ensure the LLM generates the expected result in JSON, I provide the JSON schema for the output structure. Instead of hardcoding the output schema in the prompt, I define the schema of the newsletter gist using Zod and infer both the TypeScript type and the JSON schema from it. This way, any changes to the schema are also reflected in the LLM output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { z } from "zod";
import zodToJsonSchema from "zod-to-json-schema";
const linkSchema = z.object({
    text: z.string(),
    url: z.string(),
});
const newsletterGist = z.object({
    summary: z.string(),
    topics: z.array(z.string()),
    links: z.array(linkSchema),
});
export type NewsletterGist = z.infer&amp;lt;typeof newsletterGist&amp;gt;;
export const newsletterGistSchema = zodToJsonSchema(newsletterGist);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To invoke the model, I use Bedrock Converse API, this allows me to write code once and use it with different models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const prompt = generatePromptForEmail(markdown, JSON.stringify(newsletterGistSchema), config);
const result = await bedrockClient.send(
    new ConverseCommand({
        modelId: process.env.MODEL_ID,
        system: [{ text: "You are an advanced newsletter content extraction and summarization tool." }],
        messages: [
            {
                role: "user",
                content: [
                    {
                        text: prompt,
                    },
                ],
            },
            {
                role: "assistant",
                content: [
                    {
                        text: "{",
                    },
                ],
            },
        ],
    })
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since I want to enforce a JSON output, I’ll need to prefill the assistant’s message with an opening &lt;strong&gt;&lt;code&gt;{&lt;/code&gt;&lt;/strong&gt;. This is specific to Claude models.&lt;/p&gt;
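&lt;p&gt;This means the response has to be stitched back together before parsing. A minimal sketch, with a made-up completion matching the gist schema:&lt;/p&gt;

```typescript
// Reattach the prefilled opening brace to the model's completion, then parse.
// The completion can trail with prose, so cut at the last closing brace.
const parsePrefilledJson = (completion: string) => {
  const raw = "{" + completion;
  const end = raw.lastIndexOf("}");
  if (end === -1) throw new Error("No JSON object in model output");
  return JSON.parse(raw.slice(0, end + 1));
};

// Hypothetical completion for the newsletter gist schema:
const gist = parsePrefilledJson(
  '"summary": "Weekly serverless roundup", "topics": ["lambda", "iceberg"]}'
);
```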

&lt;h3&gt;
  
  
  3 — Serving the newsletters gists as an RSS feed
&lt;/h3&gt;

&lt;p&gt;Working with &lt;a href="https://hono.dev/docs/getting-started/aws-lambda" rel="noopener noreferrer"&gt;Hono&lt;/a&gt; is a breeze. It simplifies many aspects of defining web APIs while supporting Lambda natively. This API serves multiple routes, and I chose to deploy it as a mono-lambda (AKA Lambdalith) to simplify the infrastructure definition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { Hono } from "hono";
import { handle } from "hono/aws-lambda";
import { feeds } from "./routes/feeds";
import { newsletters } from "./routes/newsletters";
import { links } from "./routes/links";
import { home } from "./routes/home";
export const app = new Hono();
app.route("/", home);
app.route("/feeds", feeds);
app.route("/links", links);
app.route("/newsletters", newsletters);
export const handler = handle(app);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Easy! The feeds route generates the RSS feed from the newsletter gists already stored in the feeds table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export const feeds = new Hono().get("/:id/rss", async (c) =&amp;gt; {
    const feedId = c.req.param("id");
    const feedConfig = await getFeedConfig(feedId);
    let feedItems: NewsletterIssueGist[] = [];
    for await (const items of getFeedItems(feedId)) {
        feedItems = [...items.map((item) =&amp;gt; item.content), ...feedItems];
    }
    const rssFeedItems = feedItems.reduce(
        (acc, item) =&amp;gt; {
            acc[item.id] = {
                item: {
                    title: item.title,
                    description: html`&amp;lt;div&amp;gt;
                        &amp;lt;section&amp;gt;📩 ${item.emailFrom}&amp;lt;/section&amp;gt;
                        &amp;lt;section&amp;gt;📝 ${item.summary}&amp;lt;/section&amp;gt;
                        &amp;lt;section&amp;gt;
                            &amp;lt;div&amp;gt;📝 Topics&amp;lt;/div&amp;gt;
                            &amp;lt;ul&amp;gt;
                                ${item.topics.map((t) =&amp;gt; {
                                    return `&amp;lt;li&amp;gt;${t}&amp;lt;/li&amp;gt;`;
                                })}
                            &amp;lt;/ul&amp;gt;
                        &amp;lt;/section&amp;gt;
                        &amp;lt;section&amp;gt;
                            &amp;lt;div&amp;gt;
                                &amp;lt;a href="https://${process.env.API_HOST}/newsletters/${item.id}"
                                    &amp;gt;📰 Open newsletter content&amp;lt;/a
                                &amp;gt;
                            &amp;lt;/div&amp;gt;
                        &amp;lt;/section&amp;gt;
                        &amp;lt;section&amp;gt;
                            &amp;lt;ul&amp;gt;
                                ${item.links.map((l) =&amp;gt; {
                                    return `
                                            &amp;lt;li&amp;gt;
                                                &amp;lt;a href="https://${process.env.API_HOST}/links/${l.url}"
                                                    &amp;gt;🔗 ${l.text}&amp;lt;/a
                                                &amp;gt;
                                            &amp;lt;/li&amp;gt;
                                        `;
                                })}
                            &amp;lt;/ul&amp;gt;
                        &amp;lt;/section&amp;gt;
                    &amp;lt;/div&amp;gt;`.toString(),
                    guid: item.id,
                    link: `https://${process.env.API_HOST}/newsletters/${item.id}`,
                    author: item.emailFrom,
                    pubDate: () =&amp;gt; new Date(item.date).toUTCString(),
                },
            };
            return acc;
        },
        {} as Record&amp;lt;string, unknown&amp;gt;
    );
    const feed = toXML(
        {
            _name: "rss",
            _attrs: {
                version: "2.0",
            },
            _content: {
                channel: [
                    {
                        title: feedConfig?.name,
                    },
                    {
                        description: feedConfig?.description,
                    },
                    {
                        link: `https://${process.env.API_HOST}/feeds/${feedId}/rss`,
                    },
                    {
                        lastBuildDate: () =&amp;gt; new Date(),
                    },
                    {
                        pubDate: () =&amp;gt; new Date(),
                    },
                    {
                        language: "en",
                    },
                    Object.values(rssFeedItems),
                ],
            },
        },
        { header: true, indent: "  " }
    );
    return c.text(feed);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here I am using the &lt;a href="https://www.npmjs.com/package/jstoxml" rel="noopener noreferrer"&gt;jstoxml&lt;/a&gt; package to convert the newsletter gist structure to the RSS feed XML format.&lt;/p&gt;

&lt;p&gt;The other routes exposed by this API include &lt;strong&gt;&lt;code&gt;/newsletters&lt;/code&gt;&lt;/strong&gt;, which renders the HTML of the received email already stored in the inbox bucket, and &lt;strong&gt;&lt;code&gt;/links&lt;/code&gt;&lt;/strong&gt;, which redirects the caller to the original content link using the short link id.&lt;/p&gt;

&lt;p&gt;And finally, here is the CloudFront OAC configuration for the API exposed via a function URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_cloudfront_origin_access_control" "this" {
  name                              = "${var.application}-${var.environment}-api-oac"
  origin_access_control_origin_type = "lambda"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}
resource "aws_cloudfront_distribution" "this" {
  origin {
    domain_name              = replace(aws_lambda_function_url.api.function_url, "/https:\\/\\/|\\//", "")
    origin_access_control_id = aws_cloudfront_origin_access_control.this.id
    origin_id                = "api"
    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }
  enabled         = true
  is_ipv6_enabled = true
  aliases = [local.api_subdomain]
  default_cache_behavior {
    allowed_methods  = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = "api"
    cache_policy_id        = data.aws_cloudfront_cache_policy.disabled.id
    viewer_protocol_policy = "allow-all"
    min_ttl                = 0
  }
  price_class = "PriceClass_200"
  restrictions {
    geo_restriction {
      restriction_type = "none"
      locations        = []
    }
  }
  viewer_certificate {
    acm_certificate_arn      = aws_acm_certificate.this.arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }
  depends_on = [aws_acm_certificate_validation.cert_validation]
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
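&lt;p&gt;A note on the &lt;code&gt;domain_name&lt;/code&gt; expression above: CloudFront expects a bare domain, while the Lambda function URL attribute includes the scheme and a trailing slash, so the &lt;code&gt;replace()&lt;/code&gt; call strips both. Here is the same transformation in JavaScript (the function URL id is made up):&lt;/p&gt;

```javascript
// Equivalent of the Terraform expression
//   replace(aws_lambda_function_url.api.function_url, "/https:\\/\\/|\\//", "")
// which removes "https://" and any "/" so that only the bare domain remains,
// the form CloudFront expects as an origin domain_name.
function toOriginDomain(functionUrl) {
  return functionUrl.replace(/https:\/\/|\//g, "");
}

// example Lambda function URL (made-up id)
const domain = toOriginDomain("https://abc123.lambda-url.eu-west-1.on.aws/");
```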



&lt;p&gt;Once the API is deployed, we can access the RSS feed at this URL: &lt;strong&gt;&lt;code&gt;https://&amp;lt;some domain&amp;gt;/feeds/&amp;lt;feed-id&amp;gt;/rss&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1400%2Fformat%3Awebp%2F1%2AmKvobK8MXX901ph8iL2MnQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A1400%2Fformat%3Awebp%2F1%2AmKvobK8MXX901ph8iL2MnQ.png" alt="Getting  raw `Awesome serverless updates` endraw  newsletters gists" width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is an example of how it renders in an RSS reader application.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonhirulw7kjj5wcbz93v.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonhirulw7kjj5wcbz93v.gif" alt=" raw `Awesome serverless` endraw  newsletters gists from an RSS reader" width="500" height="246"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pretty neat, isn’t it? This way, I can follow updates from the community all in one place! 👍&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;I had fun building this tool. In my initial iteration, I intended to leverage Bedrock’s prompt management and prompt flow features. Unfortunately, at the time of writing, these services were not mature enough. But I might explore them in the future.&lt;/p&gt;

&lt;p&gt;The email automation pattern used here isn’t limited to processing newsletters. It can be applied to a variety of other use cases, such as customer support systems or invoice and receipt handling.&lt;/p&gt;

&lt;p&gt;As always, you can find the full source code, ready to be adapted and deployed, here 👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ziedbentahar/smart-feeds" rel="noopener noreferrer"&gt;&lt;strong&gt;https://github.com/ziedbentahar/smart-feeds&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hope you enjoyed it!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/ses/latest/dg/receiving-email.html" rel="noopener noreferrer"&gt;How to receive emails using Amazon SES&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hono.dev/docs/getting-started/aws-lambda" rel="noopener noreferrer"&gt;Hono with AWS Lambda&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ses</category>
      <category>bedrock</category>
      <category>hono</category>
      <category>terraform</category>
    </item>
    <item>
      <title>An Alternative to Batch Jobs: Scheduling Events with EventBridge Scheduler</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Tue, 10 Sep 2024 10:54:48 +0000</pubDate>
      <link>https://dev.to/aws-builders/an-alternative-to-batch-jobs-scheduling-events-with-eventbridge-scheduler-596g</link>
      <guid>https://dev.to/aws-builders/an-alternative-to-batch-jobs-scheduling-events-with-eventbridge-scheduler-596g</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DKTl4EFh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/9428/0%2AGAOT-Q6MTvfw2W2c" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DKTl4EFh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/9428/0%2AGAOT-Q6MTvfw2W2c" alt="Photo by [Karsten Füllhaas](https://unsplash.com/@karsten_fuellhaas?utm_source=medium&amp;amp;utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral)" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a previous post, I wrote about EventBridge Scheduler and how we can use it to &lt;a href="https://levelup.gitconnected.com/using-aws-eventbridge-scheduler-to-build-a-serverless-reminder-application-ba3086cf8e" rel="noopener noreferrer"&gt;build a reminder application&lt;/a&gt;. In this article, we’ll explore how scheduling messages in the future can also be an alternative to batch jobs. This approach is sometimes overlooked; it offers advantages such as reducing system load and improving cost efficiency.&lt;/p&gt;

&lt;p&gt;To illustrate this pattern, we’ll design a system that needs to execute a task after a specified delay whenever a new item is created in a database. This task could involve operations such as resource creation or resource cleanup.&lt;/p&gt;

&lt;p&gt;Here are two approaches to solving this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A batch job is triggered on a schedule; it selects all newly created items that match the creation-time criteria and performs the required task for each:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcdfzxc98n7a48iziv68.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftcdfzxc98n7a48iziv68.png" alt="Batch job approach" width="800" height="91"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Or an event-based approach: when an item is created, the system schedules a one-time message for that new item to be delivered in the future. At the due time, the message is sent and the associated task is executed:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36c2m1rdis13jw1sblzr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36c2m1rdis13jw1sblzr.png" alt="Event-based approach" width="800" height="91"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are some downsides to using a batch job approach compared to an event-based approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Potential delays&lt;/strong&gt;: Batch jobs run on a fixed schedule, causing tasks to be processed at intervals. This can lead to delays, as some tasks will only be executed after the next batch run. An event-based approach triggers each task as soon as its execution time arrives, ensuring timely execution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Resource overhead and inefficiency&lt;/strong&gt;: Depending on how the data is stored, batch jobs can be resource-intensive, particularly when processing a large number of records at once. For example, if the data is stored in a DynamoDB table, it may require scanning the table to find items matching the creation-time criteria, or adjusting the table design. This can lead to sub-optimal resource usage. In contrast, an event-based approach distributes the workload more evenly over time, potentially lowering costs by eliminating the need for periodic processing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;⚠️ &lt;strong&gt;A word on cancellation in the event-based approach&lt;/strong&gt;: This pattern can become tricky when handling task cancellations. Possible solutions include deleting the existing schedule when a cancellation event occurs, or validating in the ‘Run Task’ step whether the scheduled action is still eligible for execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--7ZkEB2cC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:1400/format:webp/1%2A1PAa6pymiSSHDUXYLT2BLQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--7ZkEB2cC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://miro.medium.com/v2/resize:fit:1400/format:webp/1%2A1PAa6pymiSSHDUXYLT2BLQ.png" alt="A strategy for handling task cancellation" width="800" height="195"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Possible implementations
&lt;/h2&gt;

&lt;p&gt;In the following example, I’ll use DynamoDB as the storage layer. However, the same principles apply to other storage systems that support item-level change data capture.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Using TTL as a scheduling mechanism
&lt;/h3&gt;

&lt;p&gt;One solution is to leverage DynamoDB’s TTL item expiration. When creating the original item, add an associated item in the same transaction, which we’ll call a ‘signal’. This signal is created with a TTL that matches the desired event date. And then, configure a DynamoDB Stream and subscribe the ‘Run Task’ function to the signal &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-streams.html" rel="noopener noreferrer"&gt;item expiration event&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxy1afnrwifjqk1ybgxld.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxy1afnrwifjqk1ybgxld.png" alt="Using TTL as a schedule" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🛑 There is an issue with this approach&lt;/strong&gt;: The item expiration and deletion are not guaranteed to be immediate, so tasks for these new items might not run exactly at the due time. The DynamoDB TTL docs do not specify a precise timeframe; item expiration can occur within a few days:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--DYMCEs-c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/3876/1%2A9O3YtdYZHHpuagLKWDKi0A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--DYMCEs-c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/3876/1%2A9O3YtdYZHHpuagLKWDKi0A.png" alt="DynamoDb [TTL documentation page](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html)" width="800" height="101"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If there are no strict requirements to trigger the task at the exact expected time, this solution can be suitable. It’s a tradeoff.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ A better solution
&lt;/h3&gt;

&lt;p&gt;A better solution uses EventBridge Scheduler and its one-time schedules capability:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7ul8jixmd33zwnc15of.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7ul8jixmd33zwnc15of.png" alt="Using event bridge scheduler" width="800" height="174"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The ‘Create One-Time Schedule for New Items’ function filters for newly created items in the table’s stream and creates a new schedule for each. While I could have configured EventBridge Scheduler to trigger a Lambda function directly as the target, this approach might not be suitable when dealing with a high volume of events. In some cases, it’s better to use a queue as the target and control the ‘Run Task’ function’s concurrency. This way, the function can process messages at a manageable rate. It’s about finding the right balance between direct invocation and rate control, depending on the concrete use case at hand.&lt;/p&gt;

&lt;p&gt;On a side note, I would have loved to see EventBridge Scheduler as a target of EventBridge Pipes; this could have simplified the solution even further.&lt;/p&gt;




&lt;p&gt;Let’s see how to implement this with EventBridge Scheduler. I’ll be using Terraform for IaC and Rust for the function code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scheduling new tasks
&lt;/h2&gt;

&lt;p&gt;First, I’ll create a dedicated scheduler group to organise all scheduled tasks within a single group:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_scheduler_schedule_group" "schedule_group" {
  name = "${var.application}-${var.environment}"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The ‘Schedule Tasks’ function handles only new items created in the table. We achieve this by using the DynamoDB Stream filtering capabilities provided by the event source mapping. As a result, the function will be invoked only when new items are inserted:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_lambda_event_source_mapping" "schedule_task_lambda" {
  event_source_arn        = aws_dynamodb_table.table.stream_arn
  function_name           = aws_lambda_function.schedule_task_lambda.function_name
  starting_position       = "TRIM_HORIZON"
  function_response_types = ["ReportBatchItemFailures"]

  filter_criteria {
    filter {
      pattern = jsonencode({
        eventName = ["INSERT"]
      })
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To create new one-time schedules on EventBridge Scheduler, this function requires the &lt;code&gt;scheduler:CreateSchedule&lt;/code&gt; action on the custom scheduler group, as well as the &lt;code&gt;iam:PassRole&lt;/code&gt; action for the role used by the schedule, a role that grants permission to send messages to the Tasks SQS queue:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;...
 {
    Effect = "Allow"
    Action = [
      "scheduler:CreateSchedule"
    ]
    Resource = "arn:aws:scheduler:${region}:${account_id}:schedule/${aws_scheduler_schedule_group.schedule_group.name}/*"
  },
  {
    Effect = "Allow"
    Action = [
      "iam:PassRole"
    ]
    Resource = aws_iam_role.scheduler_role.arn
  },
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Zooming in on the ‘Schedule Task’ function code&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For each new item, we’ll create a schedule that will trigger once, two hours after the item’s creation time:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async fn process_new_item(
    new_item: &amp;amp;SomeItem,
    scheduler_client: &amp;amp;aws_sdk_scheduler::Client,
    scheduler_group_name: &amp;amp;String,
    scheduler_target_arn: &amp;amp;String,
    scheduler_role_arn: &amp;amp;String,
) -&amp;gt; Result&amp;lt;(), Error&amp;gt; {

    // as an example, we'll configure a one-time schedule two hours after the item was created
    let now = Utc::now();
    let two_hours_later = now + Duration::hours(2);
    let two_hours_later_fmt = two_hours_later.format("%Y-%m-%dT%H:%M:%S").to_string();

    let response = scheduler_client
        .create_schedule()
        .name(format!("schedule-{}", &amp;amp;new_item.id))
        .action_after_completion(ActionAfterCompletion::Delete)
        .target(
            Target::builder()
                .input(serde_json::to_string(&amp;amp;new_item)?)
                .arn(scheduler_target_arn)
                .role_arn(scheduler_role_arn)
                .build()?,
        )
        .flexible_time_window(
            FlexibleTimeWindow::builder()
                .mode(FlexibleTimeWindowMode::Off)
                .build()?,
        )
        .group_name(scheduler_group_name)
        .schedule_expression(format!("at({})", two_hours_later_fmt))
        .client_token(nanoid!())
        .send()
        .await;

    match response {
        Ok(_) =&amp;gt; Ok(()),
        Err(e) =&amp;gt; {
            error!("Failed to create schedule: {:?}", e);
            return Err(Box::new(e));
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The important bits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Some parameters are required. When defining the target, you’ll need to provide the message payload and specify the role that EventBridge Scheduler will use to invoke the target as well as the target arn, which is the arn of the SQS queue in this case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We also need to ensure that the schedule is deleted after the target invocation is successful by using &lt;code&gt;ActionAfterCompletion::Delete&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
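&lt;p&gt;A note on the schedule expression built above: EventBridge Scheduler’s one-time schedules use the &lt;code&gt;at(yyyy-mm-ddThh:mm:ss)&lt;/code&gt; format with no timezone suffix (the schedule’s timezone defaults to UTC unless configured otherwise), which is why the Rust code formats the timestamp without a trailing &lt;code&gt;Z&lt;/code&gt;. The same formatting in JavaScript, with a made-up due date:&lt;/p&gt;

```javascript
// Build the one-time schedule expression used above. EventBridge Scheduler
// expects at(yyyy-mm-ddThh:mm:ss) with no timezone suffix; the schedule's
// timezone defaults to UTC unless configured otherwise.
function oneTimeScheduleExpression(dueDate) {
  // toISOString() gives e.g. "2024-09-10T12:54:48.000Z"; keep the first 19 chars
  return `at(${dueDate.toISOString().slice(0, 19)})`;
}

// e.g. two hours after a (made-up) item creation time
const createdAt = Date.UTC(2024, 8, 10, 10, 54, 48); // 2024-09-10T10:54:48Z
const twoHoursLater = new Date(createdAt + 2 * 60 * 60 * 1000);
const expression = oneTimeScheduleExpression(twoHoursLater);
```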

&lt;p&gt;And since we configured the function response type to &lt;code&gt;ReportBatchItemFailures&lt;/code&gt;, here is how &lt;code&gt;process_new_item&lt;/code&gt; is called by the function handler:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async fn process_records(
    event: LambdaEvent&amp;lt;Event&amp;gt;,
    scheduler_client: &amp;amp;aws_sdk_scheduler::Client,
    scheduler_group_name: &amp;amp;String,
    scheduler_target_arn: &amp;amp;String,
    scheduler_role_arn: &amp;amp;String,
) -&amp;gt; Result&amp;lt;DynamoDbEventResponse, Error&amp;gt; {
    let mut response = DynamoDbEventResponse {
        batch_item_failures: vec![],
    };

    if event.payload.records.is_empty() {
        return Ok(response);
    }

    for record in &amp;amp;event.payload.records {
        let item = record.change.new_image.clone();
        let new_item: SomeItem = serde_dynamo::from_item(item)?;

        let record_processing_result = process_new_item(
            &amp;amp;new_item,
            scheduler_client,
            &amp;amp;scheduler_group_name,
            &amp;amp;scheduler_target_arn,
            &amp;amp;scheduler_role_arn,
        )
        .await;

        if let Err(error) = record_processing_result {
            error!("error processing item - {}", error);
            response.batch_item_failures.push(DynamoDbBatchItemFailure {
                item_identifier: record.change.sequence_number.clone(),
            });
        }
    }

    Ok(response)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once new tasks are correctly scheduled, they will be visible on the schedules page in the console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr11ewbpqrnfhpj1md9f6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr11ewbpqrnfhpj1md9f6.png" alt="Schedules page on the Console" width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can also view the details of the schedule’s target in the console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8a26p2rawz58xqclx3hz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8a26p2rawz58xqclx3hz.png" alt="Schedule Target configuration" width="800" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the scheduled task time is due, the message associated with the item will be sent to the SQS queue and then processed by the ‘Run Task’ function. That’s all 👌!&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;With one-time schedules, EventBridge Scheduler enables interesting patterns that can improve application architecture by reducing the overhead associated with batch jobs. But as always, choosing between an event-driven approach and batch jobs depends on your application’s needs and the complexity of the use case at hand.&lt;/p&gt;

&lt;p&gt;You can find the complete source code, this time written in Rust and Terraform, ready to adapt and deploy 👇&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/scheduling-messages-in-future-with-eventbridge-scheduler/tree/main" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - ziedbentahar/scheduling-messages-in-future-with-eventbridge-scheduler&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading — I hope you found it helpful!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Time to Live (TTL)&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://serverlessland.com/serverless/visuals/eventbridge/eventbridge-scheduler" rel="noopener noreferrer"&gt;&lt;strong&gt;Serverless Land&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=zWgqj2OEKX8&amp;amp;t=2s" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=zWgqj2OEKX8&amp;amp;t=2s&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>eventbridge</category>
      <category>eventdriven</category>
      <category>serverless</category>
      <category>rust</category>
    </item>
    <item>
      <title>RAG on media content with Bedrock Knowledge Bases and Amazon Transcribe</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Thu, 01 Aug 2024 15:45:55 +0000</pubDate>
      <link>https://dev.to/aws-builders/using-rag-on-media-content-with-bedrock-knowledge-bases-and-amazon-transcribe-2kpf</link>
      <guid>https://dev.to/aws-builders/using-rag-on-media-content-with-bedrock-knowledge-bases-and-amazon-transcribe-2kpf</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F12000%2F0%2AIN9Eb43nUR64B1Q9" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F12000%2F0%2AIN9Eb43nUR64B1Q9" alt="Photo by [Pawel Czerwinski](https://unsplash.com/@pawel_czerwinski?utm_source=medium&amp;amp;utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://levelup.gitconnected.com/ai-powered-video-summarizer-with-amazon-bedrock-and-anthropics-claude-9f1832f397dc" rel="noopener noreferrer"&gt;a previous article&lt;/a&gt;, I wrote about building an application capable of generating summaries from YouTube videos. It was fun to build. However, the solution was limited as it was only handling Youtube videos and relied on YouTube’s generated transcripts. I wanted to improve this solution and make it more versatile as well as supporting other media formats.&lt;/p&gt;

&lt;p&gt;In this article, I take a different approach: Transcribing media files with Amazon Transcribe and using the generated transcripts as a knowledge base, allowing for retrieval-augmented generation with Amazon Bedrock.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A5YKjzEm6p84XeS7Bzwcyrw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A5YKjzEm6p84XeS7Bzwcyrw.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG with Amazon Bedrock Knowledge Bases
&lt;/h2&gt;

&lt;p&gt;Retrieval-Augmented Generation (RAG) is a technique that improves model responses by combining information retrieval with prompt construction: When processing a query, the system first retrieves relevant data from custom knowledge bases. It then uses this data in the prompt, enabling the model to generate more accurate and contextually relevant answers.&lt;/p&gt;

&lt;p&gt;RAG significantly improves the model’s ability to provide informed, up-to-date answers, bridging the gap between the model’s training data and custom up-to-date information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2Aqxn1RXNCsbG7QpXKc8YlcQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2Aqxn1RXNCsbG7QpXKc8YlcQ.png" alt="Typical retrieve and generate flow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How can Bedrock help?
&lt;/h3&gt;

&lt;p&gt;Amazon Bedrock’s Knowledge Bases simplify the RAG process by handling much of the heavy lifting. This includes synchronizing content from Amazon S3, chunking it, converting it into embeddings, and storing those embeddings in a vector database. It also provides endpoints that allow applications to query the knowledge base while generating responses based on the retrieved data. By handling these tasks, Bedrock lets you focus on building AI-powered applications rather than managing infrastructure.&lt;/p&gt;

&lt;p&gt;In this article, I will use RAG for media transcripts to generate responses based on audio or video content. These contents can be meeting recordings, podcasts, conference talks, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution overview
&lt;/h2&gt;

&lt;p&gt;The architecture of a typical RAG system consists of two components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Knowledge base indexing and synchronisation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retrieval and generation&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F7812%2F1%2AQo8AbLkySnAPylgiT1KA5Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F7812%2F1%2AQo8AbLkySnAPylgiT1KA5Q.png" alt="Architecture overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Users first request an upload link by invoking the &lt;strong&gt;Request Media upload link&lt;/strong&gt; function, which generates an S3 presigned URL. The user’s request includes the media metadata, such as the topic, the link to the media, and the date. This metadata will be stored and used downstream by Bedrock to apply filtering during the retrieval phase. When the upload process completes in the media bucket, the &lt;strong&gt;Start transcription job&lt;/strong&gt; function is triggered by an event notification from the media bucket.&lt;/p&gt;

&lt;p&gt;When a transcription job state changes, EventBridge will publish job completion status events (Success or Failure). The &lt;strong&gt;Handle transcription and sync knowledge base&lt;/strong&gt; function handles only successful events, extracts the transcription content, stores the extracted text transcript in the knowledge base bucket, and triggers a knowledge base sync.&lt;/p&gt;

&lt;p&gt;The vector database is an important part of a RAG system. It stores and retrieves text representations as vectors (also known as embeddings), allowing for similarity searches when given a query. Bedrock supports various vector databases, including Amazon OpenSearch, PostgreSQL with the pgvector extension, and Pinecone. Each option has its advantages. In this solution, I chose Pinecone as it is a serverless service that allows for quick and easy setup.&lt;/p&gt;

&lt;p&gt;When using Knowledge Bases, the model used for generating embeddings can differ from the one used for response generation. For example, in this sample, I use “Amazon Titan Text embedding v2” for embedding and “Claude 3 Sonnet” for response generation.&lt;/p&gt;

&lt;p&gt;☝&lt;strong&gt;️Note&lt;/strong&gt;: In this article, I did not include the parts that handle transcription job monitoring and asynchronous job progress notifications to the requester. For insights on building asynchronous REST APIs, you can refer to &lt;a href="https://zied-ben-tahar.medium.com/serverless-asynchronous-rest-apis-on-aws-c48156d790c1" rel="noopener noreferrer"&gt;this previous article&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Alright, let’s dive deep into the implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution Details
&lt;/h2&gt;

&lt;p&gt;This time, I am taking a different approach compared to my previous articles: I will be using Rust for the Lambda code and Terraform for IaC.&lt;/p&gt;

&lt;h3&gt;
  
  
  1- Creating the Pinecone vector database
&lt;/h3&gt;

&lt;p&gt;I try to do IaC whenever possible. The good news is that Pinecone offers a Terraform provider, which simplifies managing Pinecone indexes and collections as code. First we’ll need an API Key:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5820%2F1%2Aj8rC0BmI_c1GNTyBkW5zBg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5820%2F1%2Aj8rC0BmI_c1GNTyBkW5zBg.png" alt="Creating API Key from pinecone console"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, I am using the serverless version of Pinecone. We need to set the &lt;code&gt;PINECONE_API_KEY&lt;/code&gt; environment variable to the API key we just created so that it can be used by the provider.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;h3&gt;
  
  
  2- Creating the knowledge base
&lt;/h3&gt;

&lt;p&gt;Creating the knowledge base involves defining two key components: the vector store configuration, which points to Pinecone, and the data source.&lt;/p&gt;

&lt;p&gt;The data source dictates how the content will be ingested, including the storage configuration and the content chunking strategy.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
For the data source, I am setting the chunking strategy to &lt;code&gt;FIXED_SIZE&lt;/code&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
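&lt;p&gt;Conceptually, &lt;code&gt;FIXED_SIZE&lt;/code&gt; chunking cuts the transcript into windows of a fixed maximum size, with some overlap shared between consecutive chunks so context is not lost at the boundaries. A rough word-based sketch (Bedrock actually operates on tokens and has its own parameters; the sizes below are made up):&lt;/p&gt;

```javascript
// Conceptual sketch of FIXED_SIZE chunking: cut the text into windows of at
// most `size` words, with `overlap` words shared between consecutive chunks.
// Bedrock chunks by tokens with its own parameters; these numbers are made up.
function fixedSizeChunks(text, size, overlap) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}

const chunks = fixedSizeChunks("one two three four five six seven eight", 4, 1);
```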
&lt;br&gt;
After deployment, you will be able to view the data source configuration in the console:

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4484%2F1%2A8-etm7HYsfA8C53lqLQjaA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4484%2F1%2A8-etm7HYsfA8C53lqLQjaA.png" alt="Data source configuration on the console"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  3- Requesting Media upload link
&lt;/h3&gt;

&lt;p&gt;This function is invoked by API Gateway to generate a presigned URL for media file uploads. A unique identifier is assigned to the object; it will also serve as the transcription job name and as the reference for the knowledge base document.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
The request is validated to ensure that the media metadata properties are properly defined. I am using the &lt;code&gt;serde_valid&lt;/code&gt; crate to validate the request payload. This crate is very convenient for defining schema validations using attributes.&lt;br&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
And here are the details of &lt;code&gt;generate_presigned_request_uri&lt;/code&gt;:&lt;br&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
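&lt;p&gt;The article’s handler is written in Rust; as a language-agnostic illustration of the same logic (the metadata fields and validation rules below are hypothetical), the validation and key generation boil down to:&lt;/p&gt;

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical upload-request metadata; the article validates a similar
// payload with the serde_valid crate on the Rust side
type MediaMetadata = { fileName: string; topic: string; language: string };

// Validates the request and derives the object key that will later double as
// the transcription job name and the knowledge base document reference
function prepareUpload(meta: MediaMetadata): { objectKey: string; errors: string[] } {
  const errors: string[] = [];
  if (!/\.(mp3|mp4|wav|m4a)$/i.test(meta.fileName)) errors.push("unsupported media type");
  if (meta.topic.trim().length === 0) errors.push("topic is required");
  if (!/^[a-z]{2}(-[A-Z]{2})?$/.test(meta.language)) errors.push("invalid language code");
  return { objectKey: errors.length === 0 ? randomUUID() : "", errors };
}
```

&lt;p&gt;Only a valid request gets an object key, so nothing ever lands in the bucket without well-formed metadata.&lt;/p&gt;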


&lt;h3&gt;
  
  
  4- Handling media upload and starting transcription job
&lt;/h3&gt;

&lt;p&gt;This function is triggered by an S3 event whenever a new file is successfully uploaded. As a convention, I am using the media object key as the transcription job name, which is the unique identifier of the task.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
This function needs the &lt;code&gt;transcribe:StartTranscriptionJob&lt;/code&gt; permission in order to start a transcription job.
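&lt;p&gt;For illustration (the article does this in Rust with the AWS SDK), the request the handler assembles follows the &lt;code&gt;StartTranscriptionJob&lt;/code&gt; shape, reusing the object key as the job name (the bucket names below are placeholders):&lt;/p&gt;

```typescript
// Field names follow the Amazon Transcribe StartTranscriptionJob API;
// bucket names and language handling are illustrative
function buildTranscriptionJobRequest(mediaBucket: string, objectKey: string, outputBucket: string) {
  return {
    // convention from the article: the media object key doubles as the job name
    TranscriptionJobName: objectKey,
    Media: { MediaFileUri: `s3://${mediaBucket}/${objectKey}` },
    IdentifyLanguage: true,
    OutputBucketName: outputBucket,
  };
}
```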

&lt;p&gt;Once the task has started, we can monitor the transcription job’s progress in the console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4072%2F1%2A4RJVF64sPBKSHYMIL6yrQg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4072%2F1%2A4RJVF64sPBKSHYMIL6yrQg.png" alt="Transcription job details"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  5- Subscribing to transcription success events and syncing knowledge base
&lt;/h3&gt;

&lt;p&gt;Let’s first have a look at the EventBridge rule definition in Terraform:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
Which translates to the following configuration in the AWS console:

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4312%2F1%2AucKkyXfDBNo0zfROI1B6IQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4312%2F1%2AucKkyXfDBNo0zfROI1B6IQ.png" alt="Transcription success event bridge rule"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, the &lt;strong&gt;Handle successful transcription&lt;/strong&gt; function is invoked each time a transcription completes successfully. I am only interested in the transcription job name, as I will use it as the data source object key:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
This function first retrieves the transcription result available at the &lt;code&gt;transcript_file_uri&lt;/code&gt;, extracts the relevant part, stores it along with its metadata in the knowledge base bucket, and then triggers a &lt;code&gt;start_ingestion_job&lt;/code&gt;. If the operation fails, EventBridge retries it and eventually routes it to a dead-letter queue.

&lt;p&gt;☝️ &lt;strong&gt;Note:&lt;/strong&gt; I opted against using a Step Function for this part since the Transcribe output could exceed 256 KB.&lt;/p&gt;
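&lt;p&gt;For reference, the Transcribe output JSON carries the full text under &lt;code&gt;results.transcripts&lt;/code&gt;; extracting the part worth indexing is a small pure function (sketched here in TypeScript, while the article’s version is in Rust):&lt;/p&gt;

```typescript
// Minimal slice of the Amazon Transcribe output document
type TranscribeOutput = {
  jobName: string;
  results: { transcripts: { transcript: string }[] };
};

// Keeps only the text to be stored in the knowledge base bucket; the job name
// becomes the document key, matching the convention used at upload time
function extractTranscript(output: TranscribeOutput): { documentKey: string; text: string } {
  const text = output.results.transcripts.map((t) => t.transcript).join(" ").trim();
  return { documentKey: output.jobName, text };
}
```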

&lt;h3&gt;
  
  
  6- Chatting with the knowledge base in the console
&lt;/h3&gt;

&lt;p&gt;Before building the function that queries the knowledge base, we can already test it from the console. I used this &lt;a href="https://www.youtube.com/watch?v=5Qe9mCgr6-Q" rel="noopener noreferrer"&gt;awesome Believe in Serverless podcast episode&lt;/a&gt; as a data source:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4436%2F1%2A9DrVpZGnMj_v7-vRSvsa4Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4436%2F1%2A9DrVpZGnMj_v7-vRSvsa4Q.png" alt="Testing the knowledge base from the console"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The console also provides a way to test and adjust the generation configuration, including choosing the model, using a custom prompt, and tuning parameters like temperature and top-p. This allows you to tailor the configuration to your specific use case.&lt;/p&gt;

&lt;p&gt;Alright, let’s now create the endpoint and see how we can query this knowledge base.&lt;/p&gt;

&lt;h3&gt;
  
  
  7- Querying the knowledge base
&lt;/h3&gt;

&lt;p&gt;This function requires the &lt;code&gt;bedrock:RetrieveAndGenerate&lt;/code&gt; permission to access the knowledge base and the &lt;code&gt;bedrock:InvokeModel&lt;/code&gt; permission on the Claude 3 Sonnet model ARN used during the generation phase. It returns the generated result along with the source URLs of the retrieved chunks that contributed to the output:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
The &lt;code&gt;build_retrieve_and_generate_configuration&lt;/code&gt; function prepares the necessary parameters for calling the retrieveAndGenerate endpoint.

&lt;p&gt;As an example, I am applying a retrieval filter to the topic attribute.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
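&lt;p&gt;Sketched in TypeScript for illustration (the article builds this in Rust), the configuration passed to &lt;code&gt;retrieveAndGenerate&lt;/code&gt; has roughly this shape, with the topic filter applied only when provided:&lt;/p&gt;

```typescript
// Mirrors the RetrieveAndGenerate request structure for a knowledge base,
// with an optional metadata filter on the "topic" attribute
function buildRetrieveAndGenerateConfiguration(knowledgeBaseId: string, modelArn: string, topic?: string) {
  const vectorSearchConfiguration: { numberOfResults: number; filter?: object } = {
    numberOfResults: 5, // illustrative value
  };
  if (topic) {
    // retrieval filter on the document metadata attribute "topic"
    vectorSearchConfiguration.filter = { equals: { key: "topic", value: topic } };
  }
  return {
    type: "KNOWLEDGE_BASE",
    knowledgeBaseConfiguration: {
      knowledgeBaseId,
      modelArn, // generation model, e.g. the Claude 3 Sonnet model ARN
      retrievalConfiguration: { vectorSearchConfiguration },
    },
  };
}
```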
&lt;br&gt;
Et voilà ! Let’s call our API straight from Postman:

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A4800%2Fformat%3Awebp%2F1%2AxKuFQiw0gNHd26UBdr2-bA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fresize%3Afit%3A4800%2Fformat%3Awebp%2F1%2AxKuFQiw0gNHd26UBdr2-bA.png" alt="Testing the RAG endpoint from postman"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;I’ve only scratched the surface when it comes to building RAG systems. Bedrock simplifies the process considerably. There’s still plenty of room for improvement, such as optimising retrieval methods and refining prompts. I also had fun building this in Rust; using &lt;a href="https://www.cargo-lambda.info/" rel="noopener noreferrer"&gt;Cargo Lambda&lt;/a&gt; makes creating Lambdas in Rust a breeze, check it out!&lt;/p&gt;

&lt;p&gt;As always, you can find the full source code, ready to be adapted and deployed, here:&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/rag-on-media-content-with-bedrock-and-transcribe" rel="noopener noreferrer"&gt;&lt;strong&gt;ziedbentahar/rag-on-media-content-with-bedrock-and-transcribe&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading! I hope you enjoyed it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.pinecone.io/integrations/aws" rel="noopener noreferrer"&gt;&lt;strong&gt;Amazon Web Services (AWS) - Pinecone Docs&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.cargo-lambda.info/" rel="noopener noreferrer"&gt;&lt;strong&gt;Rust functions on AWS Lambda made simple&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-create.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Create a knowledge base&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>bedrock</category>
      <category>rag</category>
      <category>rust</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Adding flexibility to your deployments with Lambda Web Adapter</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Fri, 19 Apr 2024 08:48:07 +0000</pubDate>
      <link>https://dev.to/aws-builders/adding-flexibility-to-your-deployments-with-lambda-web-adapter-42m2</link>
      <guid>https://dev.to/aws-builders/adding-flexibility-to-your-deployments-with-lambda-web-adapter-42m2</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F8510%2F0%2AXWhUspOFaIMUneQD" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F8510%2F0%2AXWhUspOFaIMUneQD" alt="Photo by [Lea L](https://unsplash.com/@leladesign?utm_source=medium&amp;amp;utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/awslabs/aws-lambda-web-adapter" rel="noopener noreferrer"&gt;Lambda Web Adapter&lt;/a&gt; (LWA) is an open-source project that enables running Web apps on Lambda functions without the need to change or adapt the code.&lt;/p&gt;

&lt;p&gt;In my opinion, LWA opens up interesting pathways for architecture evolution: while it’s a useful tool that helps lift and shift Web apps and APIs to Lambda functions without much effort, it also enables another migration path: start by deploying your application on Lambda as a Lambdalith, then transition to a classical container deployment (&lt;em&gt;e.g.&lt;/em&gt; ECS Fargate) when needed. In some scenarios, you may not have enough data to decide whether it’s better to host on Lambda or on Fargate. LWA helps by adding portability to your deployments.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore how to use LWA with CDK to simplify the deployment of your Web apps in Lambda and how to easily transition to ECS Fargate.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does Lambda Web Adapter work?
&lt;/h2&gt;

&lt;p&gt;LWA is a Lambda extension. It creates an independent process within the Lambda execution environment that listens for incoming events and forwards them to your HTTP server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3964%2F1%2ANxwb4iEtSkvmTvFGzpGjYQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3964%2F1%2ANxwb4iEtSkvmTvFGzpGjYQ.png" alt="Lambda Web Adapter"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LWA can integrate with invocations from Lambda function URLs (FURL), ALB, and API Gateway, converting invocation JSON payloads into HTTP requests that web frameworks like fastify or ASP.NET can handle. LWA also supports non-HTTP triggers (&lt;em&gt;e.g.&lt;/em&gt; SQS, S3 notifications), but in these cases, it acts as a pass-through without converting the invocation payload.&lt;/p&gt;

&lt;p&gt;LWA supports functions packaged as zip as well as Docker or OCI images.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution overview
&lt;/h2&gt;

&lt;p&gt;Let’s have a look at how we’ll create our flexible deployment using CDK. In this example, we’ll focus on deploying a public Web application using &lt;a href="https://fastify.dev/" rel="noopener noreferrer"&gt;fastify&lt;/a&gt; as the Web framework:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4272%2F1%2AMZgr0wk-x9Mn0QpP6SFJ4Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4272%2F1%2AMZgr0wk-x9Mn0QpP6SFJ4Q.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My objective is to create a CDK construct supporting two deployment strategies: &lt;strong&gt;Lambda&lt;/strong&gt; or &lt;strong&gt;ECSFargate&lt;/strong&gt;. Depending on the selected strategy, only the required components will be deployed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;When in Lambda mode, we’ll configure the function to use the LWA extension. We’ll also configure a REST API Gateway with Lambda proxy integration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the deployment mode is ECSFargate, we’ll deploy our application in an ECS Fargate service exposed via an ALB.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In both of these deployment strategies, users access the Web app through CloudFront. We will associate a WAF Web ACL to restrict access to both the API Gateway and the Application Load Balancer. These origins will only respond to requests that include a custom verification header added by the CloudFront distribution. This approach prevents bypassing the CloudFront distribution to access the origin directly.&lt;/p&gt;

&lt;p&gt;☝️&lt;strong&gt;Some notes:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When deploying on Lambda, I ruled out using FURL or HTTP API Gateway:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;With FURL, while you can setup Origin Access Control (OAC) with CloudFront, at the time of writing, &lt;a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-restricting-access-to-lambda.html#create-oac-overview-lambda" rel="noopener noreferrer"&gt;PUT and POST operations require the client to sign the request payload&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;HTTP API Gateway does not support WAF. An alternative solution would involve creating a Lambda@Edge to verify the presence of the custom header in the request.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To improve security, the custom verification header can be stored in Secrets Manager with &lt;a href="https://github.com/aws-samples/amazon-cloudfront-waf-secretsmanager" rel="noopener noreferrer"&gt;rotation&lt;/a&gt; enabled, so that the header, the origin WAF rule, and the CloudFront distribution configuration can all be updated together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s see the code
&lt;/h2&gt;

&lt;p&gt;Here are the relevant parts of the solution:&lt;/p&gt;

&lt;h3&gt;
  
  
  1- Deploying fastify Web app on Lambda using LWA
&lt;/h3&gt;

&lt;p&gt;Creating a new fastify project is a breeze; I generally go with &lt;a href="https://fastify.dev/docs/latest/Reference/TypeScript/" rel="noopener noreferrer"&gt;TypeScript&lt;/a&gt;. For the purpose of this article, I will create a super basic API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import fastify from "fastify";

const server = fastify();

server.get("/health", async () =&amp;gt; {
  return "yup ! I am healthy";
});

server.get("/where-are-you-deployed", async () =&amp;gt; {
  return {
    "i-am-deployed-on": process.env.DEPLOYED_ON,
  };
});

server.listen({ host: "0.0.0.0", port: 8080 }, (err, address) =&amp;gt; {
  if (err) {
    console.error(err);
    process.exit(1);
  }
  console.log(`Server listening at ${address}`);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;I will deploy the Lambda function as a zip archive. In order to use LWA as a Lambda extension, we’ll need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Attach the LWA layer to the function&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Set the handler to the startup command &lt;a href="https://github.com/ziedbentahar/flexible-deployment-with-lambda-web-adapter/blob/main/src/run.sh" rel="noopener noreferrer"&gt;run.sh&lt;/a&gt; script. This script starts the fastify Web app. It will be added to the zip package after the code bundling.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;And lastly, define the &lt;code&gt;AWS_LAMBDA_EXEC_WRAPPER&lt;/code&gt; environment variable to &lt;code&gt;/opt/bootstrap&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
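&lt;p&gt;Put together, the three steps above amount to the following function configuration, shown here as a plain object (the LWA layer is published by awslabs in account 753240598075; the layer name and version suffix are illustrative, so check the LWA releases page for current values):&lt;/p&gt;

```typescript
// Shape of the Lambda configuration once the LWA steps are applied;
// in the article this is expressed with CDK constructs
function lambdaWebAdapterConfig(region: string, layerVersion: number) {
  return {
    handler: "run.sh", // startup script that launches the fastify server
    layers: [`arn:aws:lambda:${region}:753240598075:layer:LambdaAdapterLayerX86:${layerVersion}`],
    environment: {
      AWS_LAMBDA_EXEC_WRAPPER: "/opt/bootstrap", // runs the adapter's bootstrap before the app
      PORT: "8080", // the port the web app listens on
    },
  };
}
```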


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_apigateway.RestApi.html" rel="noopener noreferrer"&gt;RestApi&lt;/a&gt; CDK construct simplifies exposing the Lambda function:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;After deployment, you can view the layer associated with the function in the console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F6112%2F1%2Al2-x_V0rcVbsZMb1_MXIkQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F6112%2F1%2Al2-x_V0rcVbsZMb1_MXIkQ.png" alt="Lambda with LWA layer configured"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2- Defining the alternative deployment on ECS Fargate
&lt;/h3&gt;

&lt;p&gt;CDK offers an &lt;a href="https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs_patterns.ApplicationLoadBalancedFargateService.html" rel="noopener noreferrer"&gt;L3 construct&lt;/a&gt; to deploy a load balanced ECS service. What I find interesting about this construct is that it hides all the complexity and verbosity of defining such a deployment, while allowing a level of flexibility. Another neat feature is that it can build and push &lt;a href="https://github.com/ziedbentahar/flexible-deployment-with-lambda-web-adapter/blob/main/src/Dockerfile" rel="noopener noreferrer"&gt;our container&lt;/a&gt; image.&lt;/p&gt;

&lt;p&gt;We’ll make sure to enable HTTPS; for that, we’ll create a certificate and associate it with the load balancer:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;Here, I am using the default configuration, but you will want to adapt it to your own requirements.&lt;/p&gt;

&lt;p&gt;Once deployed, the ECS service looks like this in the AWS console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F6132%2F1%2AHxmV9l15tk2gNoajI1iJIg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F6132%2F1%2AHxmV9l15tk2gNoajI1iJIg.png" alt="Solution overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can find the full definition of the construct &lt;a href="https://github.com/ziedbentahar/flexible-deployment-with-lambda-web-adapter/blob/main/infra/lib/WebAppOnECSFargate.ts" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3- Handling the two different deployment strategies
&lt;/h3&gt;

&lt;p&gt;The important bit: the CDK construct that enables flexible deployments:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;This construct handles the two deployment strategies, &lt;code&gt;Lambda&lt;/code&gt; and &lt;code&gt;ECSFargate&lt;/code&gt;.&lt;br&gt;
For each strategy, we provide a factory function that creates the required resources. These two functions are injected from a parent construct and are lazily evaluated based on the selected strategy.&lt;/p&gt;
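&lt;p&gt;The selection logic itself is tiny; stripped of CDK specifics, it is just a lazily evaluated choice between two injected factories (the names here are illustrative):&lt;/p&gt;

```typescript
type DeploymentStrategy = "Lambda" | "ECSFargate";

// Stand-in for whatever the factories create (API Gateway URL, ALB DNS name, ...)
type WebAppOrigin = { domainName: string };

// Only the factory matching the selected strategy runs, so the resources of
// the other strategy are never created
function createOrigin(
  strategy: DeploymentStrategy,
  lambdaFactory: () => WebAppOrigin,
  fargateFactory: () => WebAppOrigin
): WebAppOrigin {
  return strategy === "Lambda" ? lambdaFactory() : fargateFactory();
}
```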

&lt;p&gt;We’ll also make sure that the distribution cache policy is disabled for both of these origins.&lt;/p&gt;

&lt;p&gt;As an example, the origin of the distribution should end up looking like this when you select the Lambda deployment strategy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5172%2F1%2AkSi63nG4AUSV0HmjDg345Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5172%2F1%2AkSi63nG4AUSV0HmjDg345Q.png" alt="Configured origin on “Lambda + RestAPI” mode"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And finally, let’s see how to define the WAF WebACL with a rule that checks for the &lt;code&gt;x-origin-header&lt;/code&gt; verification header:&lt;/p&gt;
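&lt;p&gt;The rule boils down to a byte-match statement on that header. Sketched as the rule object the WebACL ends up with (the rule name, metric name, and exact property casing are illustrative; CloudFormation uses PascalCase for the same fields):&lt;/p&gt;

```typescript
// WAF v2 rule allowing only requests whose verification header carries the
// secret value injected by the CloudFront distribution; combined with a
// default action of "block" on the WebACL, everything else is rejected
function originVerificationRule(secretValue: string) {
  return {
    name: "verify-cloudfront-origin",
    priority: 0,
    action: { allow: {} },
    statement: {
      byteMatchStatement: {
        fieldToMatch: { singleHeader: { name: "x-origin-header" } },
        positionalConstraint: "EXACTLY",
        searchString: secretValue,
        textTransformations: [{ priority: 0, type: "NONE" }],
      },
    },
    visibilityConfig: {
      cloudWatchMetricsEnabled: true,
      metricName: "origin-verification",
      sampledRequestsEnabled: true,
    },
  };
}
```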


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;You can follow &lt;a href="https://github.com/ziedbentahar/flexible-deployment-with-lambda-web-adapter/blob/main/infra/lib/WebAppDeployment.ts" rel="noopener noreferrer"&gt;this link&lt;/a&gt; for the complete WebAppDeployment construct definition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Flexible deployment in action
&lt;/h2&gt;

&lt;p&gt;Before wrapping up, let’s call the &lt;code&gt;where-are-you-deployed&lt;/code&gt; endpoint defined in the sample web app for each strategy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F6162%2F1%2AFe8iQkPEecXnjLdudm_V2A.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F6162%2F1%2AFe8iQkPEecXnjLdudm_V2A.png" alt="calling the “where-are-you-deployed” endpoint from Postman"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Lambda Web Adapter is certainly not the only tool for running full-fledged web apps on Lambda functions, but it simplifies their deployment while supporting architectural evolution.&lt;/p&gt;

&lt;p&gt;In this article, we’ve seen how to build a CDK construct that deploys the same web app on two distinct platforms: we can choose a Lambda function or ECS Fargate by specifying a configuration at design time. We could extend this further by enabling the system to automatically redeploy itself, at runtime, to a specific target based on events or CloudWatch alarms!&lt;/p&gt;

&lt;p&gt;As always, you can find the full source code, ready to be adapted and deployed, here:&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/flexible-deployments-with-lambda-web-adapter" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - ziedbentahar/flexible-deployments-with-lambda-web-adapter&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading, and I hope you enjoyed it!&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/restrict-access-to-load-balancer.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Restricting access to Application Load Balancers&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/awslabs/aws-lambda-web-adapter" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - awslabs/aws-lambda-web-adapter: Run web applications on AWS Lambda&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/aws-samples/amazon-cloudfront-waf-secretsmanager" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - aws-samples/amazon-cloudfront-waf-secretsmanager: Enhance Amazon CloudFront Origin…&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>lambdawebadapter</category>
      <category>lambda</category>
      <category>fargate</category>
      <category>flexiblearchitecture</category>
    </item>
    <item>
      <title>AI powered video summarizer with Amazon Bedrock and Anthropic’s Claude</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Wed, 03 Jan 2024 10:55:25 +0000</pubDate>
      <link>https://dev.to/aws-builders/ai-powered-video-summarizer-with-amazon-bedrock-and-anthropics-claude-2j04</link>
      <guid>https://dev.to/aws-builders/ai-powered-video-summarizer-with-amazon-bedrock-and-anthropics-claude-2j04</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--brAzGI_O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/6942/0%2Au1aoh6IkniSqBqg8" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--brAzGI_O--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/6942/0%2Au1aoh6IkniSqBqg8" alt="Photo by [Andy Benham](https://unsplash.com/@benham3160?utm_source=medium&amp;amp;utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral)" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At times, I find myself wanting to quickly get a summary of a video or capture the key points of a tech talk. Thanks to the capabilities of generative AI, achieving this is entirely possible with minimal effort.&lt;/p&gt;

&lt;p&gt;In this article, I’ll walk you through the process of creating a service that summarizes YouTube videos based on their transcripts and generates audio from these summaries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--c2e9fPe3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/5244/1%2Ao2c9LbDWn59VZaz-GfscIA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c2e9fPe3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/5244/1%2Ao2c9LbDWn59VZaz-GfscIA.png" alt="AI powered youtube video summarizer" width="800" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We’ll leverage Anthropic’s Claude 2.1 foundation model through Amazon Bedrock for summary generation, and Amazon Polly to synthesize speech from these summaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution overview
&lt;/h2&gt;

&lt;p&gt;I will use a Step Functions state machine to orchestrate the different steps involved in the summary and audio generation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--aHxEXjJ---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/4494/1%2AvW39v-yN0WNZJoHiOVfHwA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--aHxEXjJ---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/4494/1%2AvW39v-yN0WNZJoHiOVfHwA.png" alt="AI powered youtube video summarizer architecture" width="800" height="685"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔍 Let’s break this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;Get Video Transcript&lt;/strong&gt; function retrieves the transcript from a specified YouTube video URL. Upon successful retrieval, the transcript is stored in an S3 bucket, ready for processing in the next step.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Generate Model Parameters&lt;/strong&gt; function retrieves the transcript from the bucket and generates the prompt and inference parameters specific to Anthropic’s Claude v2 model. These parameters are then stored in the bucket for use by the Bedrock API in the subsequent step.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Invoking the Bedrock API is achieved through Step Functions’ AWS SDK integration, enabling the execution of model inference with inputs stored in the bucket. This step generates structured JSON containing the summary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Generate audio from summary&lt;/strong&gt; relies on Amazon Polly to perform speech synthesis from the summary produced in the previous step. This step returns the final output containing the video summary in text format, as well as a presigned URL for the generated audio file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The bucket serves as state storage shared across all the steps of the state machine. In fact, we don’t know the size of the generated video transcript upfront; for lengthy videos it might exceed Step Functions’ payload size limit of 256 KB.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  On using Anthropic’s Claude 2.1
&lt;/h3&gt;

&lt;p&gt;At the time of writing, the Claude 2.1 model supports a 200K-token context window, roughly 150K words. It also provides good accuracy over long documents, making it well-suited for summarizing lengthy video transcripts.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;You will find the complete source code here 👇&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/yt-video-summarizer-with-bedrock"&gt;&lt;strong&gt;GitHub - ziedbentahar/yt-video-summarizer-with-bedrock&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I will use Node.js, TypeScript, and CDK for IaC.&lt;/p&gt;
&lt;h2&gt;
  
  
  Solution details
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1- Enabling Anthropic’s Claude v2 in your account
&lt;/h3&gt;

&lt;p&gt;Amazon Bedrock offers a range of foundation models, including Amazon Titan, Anthropic’s Claude, and Meta’s Llama 2, which are accessible through the Bedrock APIs. By default, these models are not enabled; they must be enabled through the console before use.&lt;/p&gt;

&lt;p&gt;We’ll request access to Anthropic’s Claude models. But first, we’ll need to submit use case details:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Vaq113Dz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/3200/1%2AWwRSrEGHnqbZ2CCOh-kLeA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Vaq113Dz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://cdn-images-1.medium.com/max/3200/1%2AWwRSrEGHnqbZ2CCOh-kLeA.png" alt="Request Anthropic’s Claude access" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  2- Getting transcripts from Youtube Videos
&lt;/h3&gt;

&lt;p&gt;I will rely on &lt;a href="https://github.com/Kakulukian/youtube-transcript"&gt;this lib&lt;/a&gt; for the video transcript extraction (it feels like a cheat code 😉); in fact, this library makes use of an unofficial YouTube API without relying on a headless Chrome solution. For now, it yields good results on several YouTube videos, but I might explore a more robust solution in the future:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;The extracted transcript is then stored in the S3 bucket using &lt;code&gt;${requestId}/transcript&lt;/code&gt; as a key.&lt;/p&gt;

&lt;p&gt;You can find the code for this Lambda function &lt;a href="https://github.com/ziedbentahar/yt-video-summarizer-with-bedrock/blob/main/src/lambda-handlers/get-youtube-video-transcript.ts"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3- Finding the adequate prompt and generating model inference parameters
&lt;/h3&gt;

&lt;p&gt;At the time of writing, Bedrock only supports Claude’s Text Completions API. Prompts must be wrapped in &lt;code&gt;\n\nHuman:&lt;/code&gt; and &lt;code&gt;\n\nAssistant:&lt;/code&gt; markers to let Claude understand the conversation context.&lt;/p&gt;
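&lt;p&gt;A minimal helper makes the convention concrete (the wrapping is the documented Text Completions format; the helper name is mine):&lt;/p&gt;

```typescript
// Wraps a raw prompt in the Human/Assistant markers that Claude's
// Text Completions API expects
function toClaudePrompt(prompt: string): string {
  return `\n\nHuman: ${prompt}\n\nAssistant:`;
}
```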

&lt;p&gt;Here is the prompt; I find that it produces good results for our use case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    You are a video transcript summarizer.
    Summarize this transcript in a third person point of view in 10 sentences.
    Identify the speakers and the main topics of the transcript and add them in the output as well.
    Do not add or invent speaker names if you are not able to identify them.
    Please output the summary in JSON format conforming to this JSON schema:
    {
      "type": "object",
      "properties": {
        "speakers": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "topics": {
          "type": "string"
        },
        "summary": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      }
    }

    &amp;lt;transcript&amp;gt;{{transcript}}&amp;lt;/transcript&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;🤖 &lt;strong&gt;Helping Claude produce good results&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;To clearly mark the transcript to summarize, we wrap it in &lt;code&gt;&amp;lt;transcript&amp;gt;&lt;/code&gt; XML tags. &lt;a href="https://docs.anthropic.com/claude/docs/constructing-a-prompt#mark-different-parts-of-the-prompt"&gt;Claude will specifically focus&lt;/a&gt; on the content encapsulated by these XML tags. I will be substituting the &lt;strong&gt;&lt;code&gt;{{transcript}}&lt;/code&gt;&lt;/strong&gt; placeholder with the actual video transcript.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To assist Claude in generating a reliable JSON output format, I include in the prompt the JSON schema that needs to be adhered to.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Finally, I also need to inform Claude that I want a concise JSON response without unnecessary chattiness, meaning no preamble or postscript around the JSON payload:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;\n\nHuman:{{prompt}}\n\nAssistant:{
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Note that the full prompt ends with a trailing &lt;strong&gt;&lt;code&gt;{&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;
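&lt;p&gt;Put together in code, the full prompt assembly could be sketched like this (&lt;code&gt;buildFullPrompt&lt;/code&gt; is an illustrative helper name, not from the repository):&lt;/p&gt;

```javascript
// Wrap the base prompt (with {{transcript}} already substituted) in the
// Text Completions markers. The trailing "{" puts the first character of
// the JSON object in Claude's mouth, so the completion continues the
// JSON directly.
function buildFullPrompt(basePrompt) {
  return `\n\nHuman:${basePrompt}\n\nAssistant:{`;
}
```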

&lt;p&gt;As mentioned in the section above, we will store this generated prompt as well as the model parameters in the bucket so that they can be used as input to the Bedrock API:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      const modelParameters = {
        prompt,
        max_tokens_to_sample: MAX_TOKENS_TO_SAMPLE,
        top_k: 250,
        top_p: 1,
        temperature: 0.2,
        stop_sequences: ["Human:"],
        anthropic_version: "bedrock-2023-05-31",
      };
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You can follow &lt;a href="https://github.com/ziedbentahar/yt-video-summarizer-with-bedrock/blob/main/src/lambda-handlers/generate-model-parameters.ts"&gt;this link&lt;/a&gt; for the full code of the generate-model-parameters lambda function.&lt;/p&gt;
&lt;h3&gt;
  
  
  4- Invoking Claude Model
&lt;/h3&gt;

&lt;p&gt;In this step, we’ll avoid writing a custom Lambda function to invoke the Bedrock API. Instead, we’ll use Step Functions’ direct SDK integration. This state loads from the bucket the model inference parameters that were generated in the previous step:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;☝️ Note:&lt;/strong&gt; As we instructed Claude to generate the response in JSON format, the completion API response is missing the leading &lt;strong&gt;{&lt;/strong&gt;, as Claude only outputs the rest of the requested JSON schema.&lt;/p&gt;

&lt;p&gt;We use &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-intrinsic-functions.html"&gt;intrinsic functions&lt;/a&gt; in the state’s ResultSelector to add the missing opening curly brace and format the state output into a well-formed JSON payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    ResultSelector: {
      "id.$": "$$.Execution.Name",
      "summaryTaskResult.$":
        "States.StringToJson(States.Format('\\{{}', $.Body.completion))",
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
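&lt;p&gt;Expressed in plain code, the repair performed by these intrinsic functions amounts to the following:&lt;/p&gt;

```javascript
// Equivalent of States.StringToJson(States.Format('\{{}', completion)):
// prepend the missing opening brace, then parse the result as JSON.
function parseCompletion(completion) {
  return JSON.parse("{" + completion);
}
```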


&lt;p&gt;I have to admit, it is not ideal, but it helps us get by without writing a custom Lambda function.&lt;/p&gt;
&lt;h3&gt;
  
  
  5- Generating audio from video summary
&lt;/h3&gt;

&lt;p&gt;This step is heavily inspired by this  &lt;a href="https://levelup.gitconnected.com/building-a-serverless-text-to-speech-application-with-amazon-polly-step-functions-and-websocket-56e9871730b7"&gt;previous blog post&lt;/a&gt;. Amazon Polly generates the audio from the video summary:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;p&gt;Here are the details of the synthesize function:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
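&lt;p&gt;One practical detail worth noting: Polly’s &lt;code&gt;SynthesizeSpeech&lt;/code&gt; call limits the size of the input text, so very long summaries may need to be split before synthesis. Here is a sketch of such a splitting helper (the limit value and function name are illustrative assumptions):&lt;/p&gt;

```javascript
// Split long text into chunks below a per-request size limit, cutting on
// sentence boundaries where possible so each audio chunk sounds natural.
function chunkForSynthesis(text, maxLen = 2900) {
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [text];
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    // Flush the current chunk before it would exceed the limit.
    if ((current + sentence).length > maxLen && current) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```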


&lt;p&gt;Once the audio is generated, we store it in the S3 bucket and generate a presigned URL so it can be downloaded afterwards.&lt;/p&gt;

&lt;p&gt;☝️ &lt;strong&gt;On language detection:&lt;/strong&gt; In this example, I am not performing language detection; by default, I am assuming that the video is in English. You can find in &lt;a href="https://medium.com/gitconnected/building-a-serverless-text-to-speech-application-with-amazon-polly-step-functions-and-websocket-56e9871730b7"&gt;my previous article&lt;/a&gt; how to perform such a process in speech synthesis. Alternatively, we can also leverage the Claude model’s capabilities to detect the language of the transcript.&lt;/p&gt;

&lt;h3&gt;
  
  
  6- Defining the state machine
&lt;/h3&gt;

&lt;p&gt;Alright, let’s put it all together and take a look at the CDK definition of the state machine:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;In order to invoke the Bedrock API, we’ll need to add this policy to the workflow’s role (and remember to grant the state machine read &amp;amp; write permissions on the S3 bucket):&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
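&lt;p&gt;As a reference point, the statement to allow model invocation looks roughly like the following sketch. The region and model id are example placeholders; scope the resource to the model you actually use:&lt;/p&gt;

```json
{
  "Effect": "Allow",
  "Action": "bedrock:InvokeModel",
  "Resource": "arn:aws:bedrock:eu-west-1::foundation-model/anthropic.claude-v2"
}
```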


&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;I find creating generative AI based applications to be a fun exercise; I am always impressed by how quickly we can develop such applications by combining serverless and Gen AI.&lt;/p&gt;

&lt;p&gt;Certainly, there is room for improvement to make this solution production-grade. This workflow can be integrated into a larger process, allowing the video summary to be sent asynchronously to a client, and let’s not forget robust error handling.&lt;/p&gt;

&lt;p&gt;Follow &lt;a href="https://github.com/ziedbentahar/yt-video-summarizer-with-bedrock/tree/main"&gt;this link&lt;/a&gt; to get the source code for this article.&lt;/p&gt;

&lt;p&gt;Thanks for reading, and I hope you enjoyed it!&lt;/p&gt;

&lt;h2&gt;
  
  
  Further readings
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.anthropic.com/claude/docs/put-words-in-claudes-mouth"&gt;&lt;strong&gt;Put words in Claude's mouth&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html"&gt;&lt;strong&gt;Anthropic Claude models&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html"&gt;&lt;strong&gt;What is Amazon Bedrock?&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>bedrock</category>
      <category>claude</category>
      <category>generativeai</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Step Functions distributed map and cross account S3 access</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Mon, 23 Oct 2023 16:26:06 +0000</pubDate>
      <link>https://dev.to/aws-builders/step-functions-distributed-map-and-cross-account-s3-access-5fg0</link>
      <guid>https://dev.to/aws-builders/step-functions-distributed-map-and-cross-account-s3-access-5fg0</guid>
      <description>&lt;h2&gt;
  
  
  Update 2025–02–13 : Cross account access is now natively possible when using distributed map. I will update this blog post soon 👌
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F10368%2F0%2AqBec8rG1JvZXP8bt" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F10368%2F0%2AqBec8rG1JvZXP8bt" alt="Photo by [Marcello Gennari](https://unsplash.com/@marcello54?utm_source=medium&amp;amp;utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral)" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step Functions distributed map is a powerful feature that helps build highly parallel serverless data processing workflows. It integrates well with S3, enabling efficient processing of millions of objects.&lt;/p&gt;

&lt;p&gt;This feature relies on the “Distributed” mode of the Map State in order to process, in parallel, a list of S3 Objects in the bucket:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0durlbtf9vbc5udkh2vd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0durlbtf9vbc5udkh2vd.png" alt="Map state visual on the workflow editor" width="266" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, at the time of writing, the ItemReader step of the Map state does not support S3 buckets that are in other accounts or regions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3820%2F1%2A_jaOHP3mpOEXLbFmqNs-mw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F3820%2F1%2A_jaOHP3mpOEXLbFmqNs-mw.png" alt="[Link to the ItemReader documentation](https://docs.aws.amazon.com/step-functions/latest/dg/input-output-itemreader.html#itemreader-iam-policies)" width="800" height="87"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, we will see how to work around this limitation. In fact, many solutions are possible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Using S3 bucket replication: We can replicate the source S3 bucket and sync it with a bucket in the target account where we want to run the distributed map job.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Another solution is to initiate the workflow with an initial step. This step synchronously lists the keys of objects in the source bucket and subsequently writes this list to an intermediate bucket in the target account. This file is then configured as the data source for the distributed map.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Alternatively, a third solution, similar to the second one, involves configuring an S3 inventory on the source bucket and using it to get the list of keys.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article we will focus on the second solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhasooj3r2oly41t33s61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhasooj3r2oly41t33s61.png" alt="Solution overview" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔍 Here are the relevant parts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Both of the Lambda functions “List objects in source bucket” and “Process objects” require cross-account access to the S3 bucket on the source account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“List objects in source bucket” Lambda function uses S3 &lt;code&gt;ListObjectsV2&lt;/code&gt; to get the list of the keys in the source bucket. It writes that list in JSON format in the “Object keys inventory bucket” in the target account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The map state is configured in “Distributed” mode and uses the JSON file containing the list as its source.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The distributed map state’s iterations run in parallel. Each iteration creates a child execution workflow that invokes the “Process objects” Lambda function with a batch of keys.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;You will find the complete source code here 👇&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/stepfunctions-distributed-map-cross-account-s3-access" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - ziedbentahar/stepfunctions-distributed-map-cross-account-s3-access&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this example I will use Node.js, TypeScript, and CDK for IaC.&lt;/p&gt;
&lt;h2&gt;
  
  
  Let’s see the code
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1- “List objects in source bucket” and “Process objects“ Lambda functions
&lt;/h3&gt;

&lt;p&gt;The “&lt;strong&gt;List objects in source bucket&lt;/strong&gt;” Lambda function requires two parameters: A prefix, used to list only the keys starting with it, and an output file that will contain the list of keys. These parameters are supplied by the state machine.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;The function &lt;code&gt;getKeysFromBucketByPrefix&lt;/code&gt; calls &lt;code&gt;ListObjectsV2&lt;/code&gt;. It iterates through all objects in the bucket that start with the given prefix. The loop continues until there are no more continuation tokens, indicating that all keys have been retrieved. The function then returns the list of keys in an array, which can be written to the "Object keys inventory bucket" by the &lt;code&gt;writeKeysAsJSONIntoBucket&lt;/code&gt; function.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
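&lt;p&gt;The pagination loop at the heart of &lt;code&gt;getKeysFromBucketByPrefix&lt;/code&gt; can be sketched generically like this. Here &lt;code&gt;listPage&lt;/code&gt; stands in for a call to the S3 client’s &lt;code&gt;ListObjectsV2&lt;/code&gt; and is injected as a parameter, so the loop itself is a sketch independent of the SDK:&lt;/p&gt;

```javascript
// Collect all keys across pages: keep requesting pages, passing the
// continuation token back, until no further token is returned.
// `listPage(token)` is assumed to resolve to { keys: [...], nextToken? }.
async function collectAllKeys(listPage) {
  const keys = [];
  let token;
  do {
    const page = await listPage(token);
    keys.push(...page.keys);
    token = page.nextToken; // undefined once the listing is exhausted
  } while (token);
  return keys;
}
```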


&lt;p&gt;The &lt;strong&gt;“Process Objects”&lt;/strong&gt; Lambda function will be invoked by the workflow’s map execution with a batch of item keys as its input. The size of this batch is configurable on the distributed map state. In fact, by batching items we can improve performance and reduce cost for large datasets.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;We will need to update the source account’s bucket policy to allow the two Lambda function roles in the target account to perform &lt;code&gt;ListObjectsV2&lt;/code&gt; and &lt;code&gt;GetObject&lt;/code&gt; operations, respectively.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  {
  "Version": "2012-10-17",
  "Statement": [
      {
          "Effect": "Allow",
          "Principal": {
              "AWS": "arn:aws:iam::&amp;lt;your-account-id&amp;gt;:role/&amp;lt;list-bucket-lambda-role-name&amp;gt;"
          },
          "Action": "s3:ListBucket",
          "Resource": "arn:aws:s3:::&amp;lt;source-bucket-name&amp;gt;"
      },
      {
          "Effect": "Allow",
          "Principal": {
              "AWS": "arn:aws:iam::&amp;lt;your-account-id&amp;gt;:role/&amp;lt;process-object-lambda-role-name&amp;gt;"
          },
          "Action": "s3:GetObject",
          "Resource": "arn:aws:s3:::&amp;lt;source-bucket-name&amp;gt;/*"
      }
  ]
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;⚠️ &lt;strong&gt;️Important:&lt;/strong&gt; Using a Lambda function to list objects from an S3 bucket might not be the most cost-effective solution when dealing with tens of millions of items. It’s also important to keep in mind the Lambda function’s &lt;strong&gt;15-minute execution time limit&lt;/strong&gt;. It’s worth exploring alternative solutions such as running the list objects operation as an ECS task or, as mentioned in the previous section, configuring and relying on an S3 inventory on the source bucket.&lt;/p&gt;

&lt;p&gt;You can find the complete CDK definition of these two lambda functions following &lt;a href="https://github.com/ziedbentahar/stepfunctions-distributed-map-cross-account-s3-access/blob/baa5cf7fd2090fc65555df65cc6f43d562b8a124/infra/lib/stepfunctions-distributed-map-cross-account-s3-access-stack.ts#L55" rel="noopener noreferrer"&gt;this link&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  2- State machine definition
&lt;/h3&gt;

&lt;p&gt;Alright, let’s have a look into the workflow definition:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;p&gt;Here, we use the state machine’s execution name property, provided by the context object, &lt;code&gt;$$.Execution.Name&lt;/code&gt;, as the filename to store the list of keys from the source bucket. We also pass the state machine’s input property, &lt;code&gt;$.prefix&lt;/code&gt;, to the “List objects in source bucket” Lambda function.&lt;/p&gt;

&lt;p&gt;At the time of writing, CDK does not provide a native Distributed Map state implementation. We will use &lt;code&gt;CustomState&lt;/code&gt;, passing the ASL JSON definition:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We configure the &lt;code&gt;ItemProcessor&lt;/code&gt; in &lt;code&gt;Distributed&lt;/code&gt; Mode.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We set the &lt;code&gt;ItemReader&lt;/code&gt; as a JSON file in the list S3 bucket and we use the &lt;code&gt;$$.Execution.Name&lt;/code&gt; as the Key of the JSON file to read from the bucket.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;☝️ Depending on your use case, you may want to configure the maximum number of concurrent executions as well as the maximum number of items per batch. This will have an impact on the overall execution time of the process.&lt;/p&gt;
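&lt;p&gt;Sketched in ASL, the custom state could look roughly like this. The bucket name, function name, and the batch/concurrency values are placeholders to illustrate where each knob lives:&lt;/p&gt;

```json
{
  "Type": "Map",
  "ItemReader": {
    "Resource": "arn:aws:states:::s3:getObject",
    "ReaderConfig": { "InputType": "JSON" },
    "Parameters": {
      "Bucket": "object-keys-inventory-bucket",
      "Key.$": "$$.Execution.Name"
    }
  },
  "ItemBatcher": { "MaxItemsPerBatch": 100 },
  "MaxConcurrency": 10,
  "ItemProcessor": {
    "ProcessorConfig": { "Mode": "DISTRIBUTED", "ExecutionType": "STANDARD" },
    "StartAt": "ProcessObjects",
    "States": {
      "ProcessObjects": {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {
          "FunctionName": "process-objects",
          "Payload.$": "$"
        },
        "End": true
      }
    }
  },
  "End": true
}
```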

&lt;p&gt;You can find the full state machine definition &lt;a href="https://github.com/ziedbentahar/stepfunctions-distributed-map-cross-account-s3-access/blob/baa5cf7fd2090fc65555df65cc6f43d562b8a124/infra/lib/stepfunctions-distributed-map-cross-account-s3-access-stack.ts#L194" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once you execute the state machine, you can monitor the items processing status on the Map Run page:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkrwto2nvs8qg28i4w5o0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkrwto2nvs8qg28i4w5o0.png" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Step Functions distributed map is a valuable feature to include in your toolkit. In this article, we’ve seen how to use the Step Functions distributed map with S3 buckets that are not in the same account as the state machine. Hopefully, AWS will address this limitation!&lt;/p&gt;

&lt;p&gt;You can find a complete sample application repository here:&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/stepfunctions-distributed-map-cross-account-s3-access" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - ziedbentahar/stepfunctions-distributed-map-cross-account-s3-access&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Further readings
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/use-dist-map-orchestrate-large-scale-parallel-workloads.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Using Map state in Distributed mode to orchestrate large-scale parallel workloads&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/input-output-itemreader.html#itemreader-iam-policies" rel="noopener noreferrer"&gt;&lt;strong&gt;ItemReader IAM Policies&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>stepfunctions</category>
      <category>distributedmap</category>
      <category>crossaccount</category>
      <category>s3</category>
    </item>
    <item>
      <title>Database schema migrations on AWS with Custom Resources and CDK</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Wed, 14 Jun 2023 18:36:19 +0000</pubDate>
      <link>https://dev.to/aws-builders/database-schema-migrations-on-aws-with-custom-resources-and-cdk-3e9d</link>
      <guid>https://dev.to/aws-builders/database-schema-migrations-on-aws-with-custom-resources-and-cdk-3e9d</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F8576%2F0%2A4Yin6EjOtLJ3LYAa" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F8576%2F0%2A4Yin6EjOtLJ3LYAa" alt="Photo by [Tomas Kirvėla](https://unsplash.com/ja/@tomkirvela?utm_source=medium&amp;amp;utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With AWS &lt;a href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-custom-resources.html" rel="noopener noreferrer"&gt;Custom Resources&lt;/a&gt; you can manage resources that are not natively supported by CloudFormation. You can execute application-specific provisioning logic as well as custom code during the deployment, the update or the deletion of a Stack. &lt;/p&gt;

&lt;p&gt;In this article, we will focus on using Custom Resources to handle schema migrations on an Aurora Postgres database. We will create a custom database migration resource that executes schema changes during the Stack deployment. To accomplish this, we will associate a Lambda function with our Custom Resource; this function ensures that any new changes to the database are automatically applied when necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution overview
&lt;/h2&gt;

&lt;p&gt;We’ll use Aurora Serverless V2 with Postgres engine, NodeJs and typescript for the Lambda function code and CDK for the IaC:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4458%2F1%2ARaSEXAwy2Cvk0uWEbL3ouA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4458%2F1%2ARaSEXAwy2Cvk0uWEbL3ouA.png" alt="DB schema migrations using custom resources -Solution overview "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are the details of this solution: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We’ll create the &lt;a href="https://github.com/ziedbentahar/db-schema-migration-with-custom-resources/blob/6d783560e1fedeadbfa7fb039d45c1b38e4485eb/lib/database.ts#L60" rel="noopener noreferrer"&gt;database Cluster&lt;/a&gt; on an isolated subnet as well as a secret that stores the credentials of the database. This secret is accessed by the DB schema migration Lambda function.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;DB schema migration Lambda&lt;/strong&gt; &lt;strong&gt;function&lt;/strong&gt; is responsible for executing the necessary database schema changes. It uses the “&lt;a href="https://salsita.github.io/node-pg-migrate/#/" rel="noopener noreferrer"&gt;node-pg-migrate&lt;/a&gt;” tool, which allows us to run migration scripts programmatically. We’ll need to configure this lambda function to access the Aurora resource in our VPC.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Migration scripts&lt;/strong&gt; files are included in the zip package of the lambda function: each file contains a set of changes to apply to the database. These files are stored in the same repository as the application, and each new set of changes needs to be written into a new distinct file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;DB migration custom resource&lt;/strong&gt; invokes the lambda function during the stack deployment when it detects changes to the migration scripts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;p&gt;You can find the complete repo of this solution here:&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/db-schema-migration-with-custom-resources" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - ziedbentahar/db-schema-migration-with-custom-resources&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s see the code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1 — DB schema migration Lambda
&lt;/h3&gt;

&lt;p&gt;As mentioned above, we will use “&lt;a href="https://salsita.github.io/node-pg-migrate/#/" rel="noopener noreferrer"&gt;node-pg-migrate&lt;/a&gt;” to run schema changes. &lt;/p&gt;

&lt;p&gt;One interesting aspect of this library is its flexibility in defining migration scripts: We have the option to write our migration scripts in either ES or TypeScript, allowing us to define database schemas with code. Alternatively, we can also define migration scripts as plain SQL files, providing a more traditional approach to managing database schema changes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A0Xfd8Ez_KGdNB846gUxyrQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2A0Xfd8Ez_KGdNB846gUxyrQ.png" alt="Example of a migration script directory"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This code below shows how to use node-pg-migrate to run the migration scripts from the custom resource lambda function:👇&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
☝️ &lt;strong&gt;Note&lt;/strong&gt;: This library can handle both forward and backward migrations, but our solution will only support forward migrations.
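&lt;p&gt;For a rough idea of the handler’s shape, here is a sketch. The secret field names follow the standard Aurora secret layout, and the &lt;code&gt;runner&lt;/code&gt; options shown in the comment follow node-pg-migrate’s programmatic API as I understand it; treat both as assumptions and check them against the repository:&lt;/p&gt;

```javascript
// Build a Postgres connection string from the database secret fetched
// from Secrets Manager. The password is URL-encoded since it may contain
// reserved characters.
function buildDatabaseUrl(secret) {
  const { username, password, host, port, dbname } = secret;
  return `postgres://${username}:${encodeURIComponent(password)}@${host}:${port}/${dbname}`;
}

// In the handler (sketch), the migration run would look roughly like:
//   const runner = require("node-pg-migrate").default;
//   await runner({
//     databaseUrl: buildDatabaseUrl(secret),
//     dir: "migrations",        // the directory bundled with the function
//     direction: "up",          // forward migrations only
//     migrationsTable: "pgmigrations",
//   });
```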

&lt;p&gt;Here is the CDK definition of this lambda function:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
This Lambda function requires &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html" rel="noopener noreferrer"&gt;VPC access&lt;/a&gt; as it needs to reach the Aurora database. We will also make sure to place the network interface associated with the Lambda function in a &lt;code&gt;PRIVATE_WITH_EGRESS&lt;/code&gt; subnet, as we need to access the Secrets Manager service.

&lt;p&gt;Additionally, we’ll associate the security group of the Lambda function to the security group of the Aurora Cluster:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
📦 &lt;strong&gt;On embedding migration scripts:&lt;/strong&gt; In our example, migration scripts need to be included in the Lambda function package. We use the &lt;a href="https://docs.aws.amazon.com/cdk/api/v1/docs/@aws-cdk_aws-lambda-nodejs.ICommandHooks.html" rel="noopener noreferrer"&gt;&lt;code&gt;afterBundling&lt;/code&gt;&lt;/a&gt; hook to copy the content of the migration directory to the bundle output directory:

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;afterBundling: (_, outputDir: string) =&amp;gt; {
  return [
    `mkdir -p ${outputDir}/migrations &amp;amp;&amp;amp; cp ${migrationDirectoryPath}/* ${outputDir}/migrations`,
  ];
},
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;You can find &lt;a href="https://github.com/ziedbentahar/db-schema-migration-with-custom-resources/blob/6d783560e1fedeadbfa7fb039d45c1b38e4485eb/lib/database.ts#L101" rel="noopener noreferrer"&gt;here&lt;/a&gt; the complete definition of this Lambda Function.&lt;/p&gt;
&lt;h3&gt;
  
  
  2 — Defining the custom resource
&lt;/h3&gt;

&lt;p&gt;Straightforward to define with CDK:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;☝️ One important note&lt;/strong&gt;: the &lt;code&gt;computeHash&lt;/code&gt; function computes a hash of the migration script directory content. This hash is passed as a property of the custom resource. During the stack deployment, whenever this hash changes, the lambda function gets invoked and the new migration scripts are taken into account.&lt;/p&gt;

&lt;p&gt;You will find &lt;a href="https://github.com/ziedbentahar/db-schema-migration-with-custom-resources/blob/6d783560e1fedeadbfa7fb039d45c1b38e4485eb/lib/database.ts#L185" rel="noopener noreferrer"&gt;here&lt;/a&gt; the definition of the Custom::DbSchemaMigration custom resource.&lt;/p&gt;

&lt;h3&gt;
  
  
  3 — Putting it all together
&lt;/h3&gt;

&lt;p&gt;And here is how we use this Database construct that supports schema migration:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
Et voilà! Once you make a deployment with new migration script files, you can see the DB schema migration in action via the Lambda CloudWatch logs:

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4136%2F1%2AWPgPcsyHRol84DZYmu_PsA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4136%2F1%2AWPgPcsyHRol84DZYmu_PsA.png" alt="Db schema migration logs — Lambda logs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;In this article, we have seen how to use custom resources to run database schema migrations on AWS. CDK makes it a breeze! You can find a complete sample application repository with the complete GitHub Actions workflow here:&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/db-schema-migration-with-custom-resources" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - ziedbentahar/db-schema-migration-with-custom-resources&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hope you find it useful, and thanks for reading!&lt;/p&gt;

&lt;h2&gt;
  
  
  Further readings
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/cdk/api/v1/docs/custom-resources-readme.html" rel="noopener noreferrer"&gt;&lt;strong&gt;@aws-cdk/custom-resources module · AWS CDK&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Configuring a Lambda function to access resources in a VPC&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://salsita.github.io/node-pg-migrate/#/" rel="noopener noreferrer"&gt;&lt;strong&gt;node-pg-migrate - Postgresql database migration management tool for node.js&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cdk</category>
      <category>customresources</category>
      <category>database</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Serverless Asynchronous REST APIs on AWS</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Thu, 20 Apr 2023 10:32:59 +0000</pubDate>
      <link>https://dev.to/aws-builders/serverless-asynchronous-rest-apis-on-aws-5cji</link>
      <guid>https://dev.to/aws-builders/serverless-asynchronous-rest-apis-on-aws-5cji</guid>
      <description>&lt;p&gt;&lt;a href="https://medium.com/gitconnected/building-a-serverless-text-to-speech-application-with-amazon-polly-step-functions-and-websocket-56e9871730b7" rel="noopener noreferrer"&gt;In a previous article&lt;/a&gt;, we explored how to use the API Gateway WebSocket API to provide real-time, asynchronous feedback and responses to its clients. While WebSocket APIs can be an excellent option to build real-time asynchronous APIs between a browser and a server, there are situations where asynchronous REST APIs are a better fit: for example, when you need to ensure compatibility with existing architectures or to integrate with backend systems where setting up a messaging system like a queue or an event bus is not an option.&lt;/p&gt;

&lt;p&gt;This article provides a guide on how to build a Serverless Async Request/Response REST API on AWS.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;First, Let’s design the API&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let’s consider an application that lets clients perform long-running tasks through HTTP requests. These tasks might take a long time to complete (&lt;em&gt;e.g.&lt;/em&gt; text-to-speech or audio-to-text tasks, report generation, etc.). To keep clients informed about the status of their requests, we’ll create a separate status endpoint that clients can poll at regular intervals until the task is finished.&lt;/p&gt;

&lt;p&gt;As an example, to initiate the task creation, clients &lt;code&gt;POST&lt;/code&gt; this request:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /task
{
    "someProperty": "someValue",
    "anotherProperty": "anotherValue",
    ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;The API should then return an &lt;code&gt;HTTP 202 Accepted&lt;/code&gt; response with a body containing a URI that references the task:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP 202 (Accepted)

{
 "task": {
  "href": "/task/&amp;lt;taskId&amp;gt;",
  "id": "&amp;lt;taskId&amp;gt;"
 }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Clients can then poll that endpoint to retrieve the status of the task. The API should respond with a status corresponding to the current state of the task, such as &lt;code&gt;inProgress&lt;/code&gt;, &lt;code&gt;complete&lt;/code&gt; or &lt;code&gt;error&lt;/code&gt;. Additionally, the response payload should reflect the specific status of the task.&lt;/p&gt;

&lt;p&gt;Here is an example of a request with a response containing the payload of a task that is in progress:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET /task/&amp;lt;taskId&amp;gt;

HTTP 200 (OK)

{
    "status": "inProgress",
    "taskCreationDate": "2023-04-04T19:23:42",
    "requestData": {...},
    ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
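&lt;p&gt;To make the polling contract concrete, here is a minimal TypeScript sketch of a client-side polling loop. It is an illustration only: &lt;code&gt;fetchStatus&lt;/code&gt; is a hypothetical stand-in for a &lt;code&gt;GET /task/&amp;lt;taskId&amp;gt;&lt;/code&gt; call, and a real client would wait between attempts.&lt;/p&gt;

```typescript
// Illustrative polling loop; fetchStatus stands in for a real
// "GET /task/{taskId}" call. A real client would sleep between polls
// (fixed interval or exponential backoff) instead of looping tightly.
type TaskStatus = "inProgress" | "complete" | "error";

function pollUntilDone(fetchStatus: () => TaskStatus, maxAttempts: number): TaskStatus {
  let last: TaskStatus = "inProgress";
  for (let remaining = maxAttempts; remaining > 0; remaining--) {
    last = fetchStatus();
    if (last !== "inProgress") {
      break; // task reached a terminal state: complete or error
    }
    // real clients wait here before issuing the next GET
  }
  return last;
}
```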
&lt;p&gt;For server-to-server scenarios, we’ll add a callback/webhook mechanism: When the task completes, the server updates the task state and sends a &lt;code&gt;POST&lt;/code&gt; request to a callback URL that the client provides on the initial task creation request.&lt;/p&gt;

&lt;p&gt;One solution is to include the callback URL in a request header so that the API can use it to send the response payload:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POST /task
Callback: https://foo.example.com/some-callback-resource

{
    "someProperty": "someValue",
    "anotherProperty": "anotherValue",
    ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important:&lt;/strong&gt; Security and reliability are important aspects to be taken into account when using callback URLs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;: It is important to verify that the callback URLs belong to the client who initiated the requests. We also need to ensure that the callback destination server receives legitimate data from the source server, rather than forged information from malicious actors attempting to spoof the webhook. This can be done by requiring clients to use API keys when making requests and, on the server side, associating these keys with well-known callback URLs that the clients must provide. In addition, &lt;a href="https://prismatic.io/blog/how-secure-webhook-endpoints-hmac/" rel="noopener noreferrer"&gt;HMAC signature verification&lt;/a&gt; can be used to authenticate and validate the payloads of API callbacks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Failure handling&lt;/strong&gt;: It’s important to account for scenarios where the client may be unavailable or experiencing intermittent faults resulting in errors. To address this, we’ll need to implement a retry mechanism that includes a dead-letter queue. This mechanism allows failed deliveries to be stored in the queue and resent at a later time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
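&lt;p&gt;As an illustration of the HMAC approach on the receiving end, here is a minimal sketch, assuming an HMAC-SHA256 hex digest travels alongside the payload (e.g. in a signature header) and that both parties share a secret; the encoding and transport details are assumptions, not this article’s implementation:&lt;/p&gt;

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Receiver-side sketch: recompute the HMAC-SHA256 of the raw callback body
// with the shared secret and compare it to the received hex digest in
// constant time, so the comparison itself does not leak timing information.
function isSignatureValid(rawBody: string, secret: string, receivedHex: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(receivedHex, "hex");
  if (expected.length !== received.length) {
    return false;
  }
  return timingSafeEqual(expected, received);
}
```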

&lt;p&gt;Alright, let’s see how to implement this on AWS.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;In this architecture, my goal was to use AWS integrations wherever possible to minimize the amount of glue code that needs to be written and maintained. This approach not only saves time and effort, but also helps ensure the scalability of the API:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh58zl028lyuht8v1wbaj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh58zl028lyuht8v1wbaj.png" alt="Architecture overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔍 Breaking this down:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We create a direct AWS integration between the API Gateway REST API and the DynamoDB table. In this task table we store the context of the request: the client id, the request payload, the task status and the task result once available. This table gets updated by the task processing workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Since we’re using API Gateway as a proxy for the DynamoDB table, we rely on two Lambda authorizers to authenticate API calls. The first authorizer verifies the client request headers (the client’s API key and the callback URL) to authorize &lt;code&gt;POST /task&lt;/code&gt;&lt;br&gt;
requests. The second authorizer is dedicated to the &lt;code&gt;GET /task/&amp;lt;taskId&amp;gt;&lt;/code&gt; route.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To trigger the task workflow, we rely on the table’s DynamoDB Streams. We connect the stream to the task’s state machine using EventBridge Pipes. This Pipe selects only the newly created tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The task is then run and coordinated by a Step Functions workflow. The status of the task is also updated in this workflow via a direct AWS SDK integration.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Once the task is complete, its state gets updated and the result payload gets written into the task table. We use another EventBridge Pipe to trigger the sending of the callback to the client from the stream of the completed tasks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;☝️ Note:&lt;/strong&gt; In this architecture, instead of using &lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-api-destinations.html" rel="noopener noreferrer"&gt;EventBridge API destinations&lt;/a&gt;, we’ll write a custom Lambda to send the callbacks. I did this to achieve better flexibility and control over how the task result payload is sent to the clients; for example, a client may require multiple registered callback URLs. To accomplish this, we use an EventBus as the target destination for the output Pipe.&lt;/p&gt;
&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;p&gt;You will find the complete source code, with its deployment pipeline, here 👇&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/async-rest-api-on-aws" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - ziedbentahar/async-rest-api-on-aws&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this example I will use Node.js, TypeScript and CDK for IaC.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Let’s see the code&lt;/strong&gt;
&lt;/h2&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;1- API Gateway, creating the long running task&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Let’s zoom into the &lt;code&gt;POST /task&lt;/code&gt; integration with the DynamoDB task table. We associate this route with an authorizer that requires two headers to be present in the request: &lt;code&gt;Authorization&lt;/code&gt; and &lt;code&gt;Callback&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Once the request is authorized, this integration maps the request payload to a task entry. We use the API Gateway &lt;code&gt;$context.requestId&lt;/code&gt; as the identifier of the task. Additionally, we map &lt;code&gt;$context.authorizer.principalId&lt;/code&gt; (in our case the client id) so that it can be used down the line. And we extract the callback URL from the request header using &lt;code&gt;$util.escapeJavaScript($input.params('callback'))&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
When the task is successfully added to the table, we return an &lt;code&gt;HTTP 202 Accepted&lt;/code&gt; response with a body containing the task id as well as a reference pointing to the task, built from &lt;code&gt;$context.path/$context.requestId&lt;/code&gt;, which translates to the URI &lt;code&gt;/&amp;lt;stage-name&amp;gt;/task/&amp;lt;task-id&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;You can find the complete CDK definition of the API Gateway here.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2- ‘Create Task’ authorizer Lambda&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;As mentioned above, this authorizer validates the API key and checks whether the callback URL is associated with the API Key that identifies the client:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
Here, I won’t dive into the implementation of &lt;code&gt;validateTokenWithCallbackUrl&lt;/code&gt;; this is something that can be delegated to another service that owns the responsibility of performing the proper checks.
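&lt;p&gt;For reference, a Lambda authorizer responds with an IAM policy document plus a &lt;code&gt;principalId&lt;/code&gt; (here, the client id that API Gateway later exposes as &lt;code&gt;$context.authorizer.principalId&lt;/code&gt;). A minimal sketch of building that result, with the validation itself left out:&lt;/p&gt;

```typescript
// Sketch of the authorizer's return value: API Gateway expects an IAM policy
// allowing or denying execute-api:Invoke on the requested method ARN, plus a
// principalId that it then exposes to mapping templates.
function buildAuthorizerResult(clientId: string, allowed: boolean, methodArn: string) {
  return {
    principalId: clientId,
    policyDocument: {
      Version: "2012-10-17",
      Statement: [
        {
          Action: "execute-api:Invoke",
          Effect: allowed ? "Allow" : "Deny",
          Resource: methodArn,
        },
      ],
    },
  };
}
```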

&lt;h3&gt;
  
  
  &lt;strong&gt;3- Defining the ‘create task’ EventBridge Pipe&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This EventBridge Pipe selects the INSERT events from the DynamoDB stream: the newly created tasks will trigger new state machine executions.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
We set the invocation type to &lt;code&gt;FIRE_AND_FORGET&lt;/code&gt;, as the execution of this state machine is asynchronous.
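&lt;p&gt;For illustration, the event selection can be sketched as the Pipe’s filter criteria on the DynamoDB stream records; only &lt;code&gt;INSERT&lt;/code&gt; events (the newly created tasks) pass through. The shape below follows the EventBridge Pipes &lt;code&gt;FilterCriteria&lt;/code&gt; structure, where each pattern is a JSON-encoded event pattern:&lt;/p&gt;

```typescript
// Sketch of the Pipe's source filter: only DynamoDB stream records whose
// eventName is INSERT (newly created tasks) trigger the state machine.
const insertOnlyFilterCriteria = {
  Filters: [
    { Pattern: JSON.stringify({ eventName: ["INSERT"] }) },
  ],
};
```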

&lt;p&gt;We’ll also need to set these target and destination policies:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
When deployed, this Pipe will look like this on the AWS console:

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5440%2F1%2Al2hUdKADQzqS7FAdeA9eQA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5440%2F1%2Al2hUdKADQzqS7FAdeA9eQA.png" alt="DynamoDb stream to step function Pipe"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4- Defining the ‘handle completed task’ EventBridge Pipe&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We apply the same principle as in the previous section. However, this Pipe selects the completed tasks from the DynamoDb stream, transforms them, and subsequently forwards them to the EventBus:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
We’ll need to attach this role to this Pipe:&lt;br&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
After deployment, the Pipe resource looks like this in the console:

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5440%2F1%2AaFqxBtMlQXioLLXwLCx7pQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5440%2F1%2AaFqxBtMlQXioLLXwLCx7pQ.png" alt="DynamoDb stream to EventBus Pipe"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5- Sending the Callbacks to the clients&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To send callbacks, we associate an EventBridge rule with the EventBus. This rule matches all the events and defines the ‘&lt;a href="https://github.com/ziedbentahar/async-rest-api-on-aws/blob/main/src/backend/lambdas/callback-lambda.ts" rel="noopener noreferrer"&gt;send callback lambda&lt;/a&gt;’ as the target:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
In some situations, callbacks may fail to be delivered due to temporary unavailability of the client or a network error. We configure EventBridge to retry the operation up to 10 times within a 24-hour time frame, after which the message is directed to a dead-letter queue to prevent message loss:

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5440%2F1%2AFwDc2FeOUXvjLZeHt0qyJw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5440%2F1%2AFwDc2FeOUXvjLZeHt0qyJw.png" alt="EventBridge rule configuration"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Callback lambda&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The callback lambda is a simple function that computes the HMAC of the message to be sent and then sends a POST request to the target callback URL.&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
In the &lt;code&gt;computeHMAC&lt;/code&gt; function, the &lt;code&gt;clientId&lt;/code&gt; is used to retrieve the key necessary for generating the HMAC signature from the response payload.
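&lt;p&gt;As a hedged sketch of what the sending side could look like once the client’s signing key has been retrieved (the header name and payload handling are illustrative assumptions, not the repository’s exact code):&lt;/p&gt;

```typescript
import { createHmac } from "node:crypto";

// Illustrative sender-side sketch: sign the serialized payload with the
// client's secret and attach the digest as a signature header on the
// outgoing POST. The header name "X-Signature" is an assumption.
function buildCallbackRequest(payload: object, clientSecret: string) {
  const body = JSON.stringify(payload);
  const signature = createHmac("sha256", clientSecret).update(body).digest("hex");
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Signature": signature,
    },
    body,
  };
}

// usage (sketch): await fetch(callbackUrl, buildCallbackRequest(taskResult, secret));
```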

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;In this article, we've seen how to create a serverless, asynchronous REST API on AWS. By leveraging API Gateway AWS integrations and EventBridge Pipes, it is possible to build such an API with minimal Lambda glue code. This approach not only simplifies the development process, but also allows greater flexibility and scalability.&lt;/p&gt;

&lt;p&gt;📝 You can find the complete source code and the deployment pipeline here:&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/async-rest-api-on-aws" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - ziedbentahar/async-rest-api-on-aws&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-rule-dlq.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Event retry policy and using dead-letter queues&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://learn.microsoft.com/en-us/azure/architecture/patterns/async-request-reply" rel="noopener noreferrer"&gt;&lt;strong&gt;Asynchronous Request-Reply pattern - Azure Architecture Center&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-pipes-dynamodb.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Amazon DynamoDB stream as a source&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>apigateway</category>
      <category>serverless</category>
      <category>async</category>
      <category>eventbridgepipe</category>
    </item>
    <item>
      <title>Building an AI powered and Serverless meal planner with OpenAI, AWS Step functions, AWS Lambda and CDK</title>
      <dc:creator>Zied Ben Tahar</dc:creator>
      <pubDate>Mon, 13 Mar 2023 08:16:41 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-an-ai-powered-and-serverless-meal-planner-with-openai-aws-step-functions-aws-lambda-and-cdk-2phg</link>
      <guid>https://dev.to/aws-builders/building-an-ai-powered-and-serverless-meal-planner-with-openai-aws-step-functions-aws-lambda-and-cdk-2phg</guid>
      <description>&lt;p&gt;OpenAI’s generative capabilities offer new possibilities when building applications. Combined with Serverless technologies, we can create applications faster while still maintaining the flexibility to iterate and to improve them over time.&lt;/p&gt;

&lt;p&gt;In this article, I will show you how to build an application that sends emails containing generated weekly meal plans from a set of ingredients a user provides. We will use OpenAI’s APIs along with AWS Serverless services: Step Functions, AWS Lambda, and Amazon SES.&lt;/p&gt;

&lt;p&gt;We will use the Node.js runtime and TypeScript for the Lambda code, as well as CDK for IaC.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AMLFWDhvi-YJnkegw9fQZ0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AMLFWDhvi-YJnkegw9fQZ0g.png" alt="logos of the services that were used"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What are we going to build ?
&lt;/h2&gt;

&lt;p&gt;We will create an application that allows users to submit, via an API, a request containing a set of food ingredients and an email address. It will then, asynchronously, send the user an email containing a meal plan with detailed recipes for a whole week:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F11034%2F1%2AI-kW3EmeHuj__0br6dsRlw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F11034%2F1%2AI-kW3EmeHuj__0br6dsRlw.png" alt="AI powered meal planner"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the architecture diagram of the application we are going to build:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5648%2F1%2ArE-5meExVJx0CBIaQz4bWQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F5648%2F1%2ArE-5meExVJx0CBIaQz4bWQ.png" alt="AI powered meal planner architecture overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The relevant parts of this solution:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;We use a Step Functions state machine to orchestrate the invocations of Lambda functions that send requests to OpenAI’s APIs to generate recipes from a prompt, as well as an image for each recipe.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We use an S3 bucket to store the generated recipe images. These images are served via a CloudFront distribution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The generated meal plan is then sent via email using SES. We use the &lt;a href="https://docs.aws.amazon.com/ses/latest/dg/send-personalized-email-api.html" rel="noopener noreferrer"&gt;SES templates capability&lt;/a&gt; to send personalized emails for each user.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The REST API Gateway has a POST route with an &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-api-gateway.html" rel="noopener noreferrer"&gt;integration&lt;/a&gt; to the meal planner Step Function.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;p&gt;You can find the full repository with its deployment pipeline here 👇&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/serverless-meal-planner-with-aws-and-openai" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - ziedbentahar/serverless-meal-planner-with-aws-and-openai&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Let’s deep dive into the code&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;☝️ &lt;strong&gt;Before starting&lt;/strong&gt;: In order to use OpenAI APIs, you will need to sign up and to create an API KEY. You can follow &lt;a href="https://platform.openai.com/docs/quickstart/add-your-api-key" rel="noopener noreferrer"&gt;this link&lt;/a&gt; to get started. The use of this API &lt;a href="https://openai.com/pricing" rel="noopener noreferrer"&gt;is not free&lt;/a&gt;, however new accounts get free credits (tokens) to start experimenting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AfQFzKZ0z_-uDvDxavUnenA.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2AfQFzKZ0z_-uDvDxavUnenA.png" alt="Creating API Keys on OpenAI account"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Defining the state machine
&lt;/h3&gt;

&lt;p&gt;The first step of this state machine is to generate a meal plan for a week. The second step involves generating a picture for each recipe and saving it in an S3 bucket; the processing is done in parallel for each recipe using a Map state, which reduces the overall execution time of the state machine. Finally, the last important step is sending the email containing the meal plan:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2ARH1w9ZORWgleOQysKFSUew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2000%2F1%2ARH1w9ZORWgleOQysKFSUew.png" alt="meal planner state machine"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Which translates to this fluent state machine definition in CDK:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;You can find the complete CDK definition of the state machine following &lt;a href="https://github.com/ziedbentahar/serverless-meal-planner-with-aws-and-openai/blob/main/infrastructure/lib/state-machine.ts" rel="noopener noreferrer"&gt;this link&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Defining the Lambda functions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1- Generate meal plan Lambda:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The challenging part about this step was to find the best prompt that yields good and consistent results from the OpenAI &lt;a href="https://platform.openai.com/docs/api-reference/completions" rel="noopener noreferrer"&gt;completion API&lt;/a&gt;. I used the &lt;code&gt;text-davinci-003&lt;/code&gt; GPT model (also referred to as GPT-3.5).&lt;/p&gt;

&lt;p&gt;When I tried out different prompts, the suggestions were quite good for producing interesting meal plans given a list of coherent ingredients. I was even able to request a structured result in JSON format ready to be processed by the Lambda function. I also experimented with parameters such as &lt;a href="https://platform.openai.com/docs/api-reference/completions/create#completions/create-temperature" rel="noopener noreferrer"&gt;&lt;code&gt;temperature&lt;/code&gt;&lt;/a&gt;, &lt;a href="https://platform.openai.com/docs/api-reference/completions/create#completions/create-top_p" rel="noopener noreferrer"&gt;&lt;code&gt;TopP&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://platform.openai.com/docs/api-reference/completions/create#completions/create-max_tokens" rel="noopener noreferrer"&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/a&gt; searching for the sweet spot that gets satisfying results.&lt;/p&gt;

&lt;p&gt;This prompt produces the best results given our use case:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generate a dinner meal plan for the whole week with these ingredients &amp;lt;a comma seperated list of ingredients&amp;gt; and with other random ingredients.
Result must be in json format
Each meal recipe contains a name, a five sentences for instructions and an array of ingredients
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
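&lt;p&gt;To make the knobs concrete, here is a hedged sketch of how the completion request body could be assembled around that prompt (the &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt; and &lt;code&gt;max_tokens&lt;/code&gt; values are illustrative, not the repository’s exact settings):&lt;/p&gt;

```typescript
// Illustrative sketch of the completion request body built around the prompt
// above. The temperature / top_p / max_tokens values are example knobs, not
// the values used in the repository.
function buildCompletionRequest(ingredients: string[]) {
  const prompt = [
    "Generate a dinner meal plan for the whole week with these ingredients " +
      ingredients.join(", ") +
      " and with other random ingredients.",
    "Result must be in json format",
    "Each meal recipe contains a name, a five sentences for instructions and an array of ingredients",
  ].join("\n");
  return {
    model: "text-davinci-003",
    prompt,
    temperature: 0.7,
    top_p: 1,
    max_tokens: 2048,
  };
}
```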
&lt;p&gt;And here is the code of the Lambda function that handles the generation of the meal plan:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
☝️ &lt;strong&gt;Some notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;As depicted on the diagram above, The OpenAI API key is stored on a secret. In this example we use the &lt;a href="https://docs.aws.amazon.com/secretsmanager/latest/userguide/retrieving-secrets_lambda.html" rel="noopener noreferrer"&gt;AWS parameters and secrets Lambda extension&lt;/a&gt; to read the secret value from the Lambda. You can learn more about this Lambda extension &lt;a href="https://levelup.gitconnected.com/using-aws-parameters-and-secrets-lambda-extension-e61dd6a41110" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even though the completion API was providing consistent response models in JSON, the properties on the JSON object did not always have consistent casing as I was experimenting with the API. Hence the use of the &lt;code&gt;getProperty&lt;/code&gt; helper function before returning the result; this function gets a property value from an object regardless of its casing.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
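&lt;p&gt;As an illustration of the second note, a case-insensitive lookup helper could look like this (a sketch, not necessarily the repository’s exact implementation):&lt;/p&gt;

```typescript
// Sketch of a getProperty-style helper: resolve a property by name regardless
// of its casing, so "name", "Name" and "NAME" all return the same value.
function getProperty(obj: { [key: string]: unknown }, name: string): unknown {
  const target = name.toLowerCase();
  const key = Object.keys(obj).find((k) => k.toLowerCase() === target);
  return key === undefined ? undefined : obj[key];
}
```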

&lt;p&gt;&lt;strong&gt;2- Generate recipe image Lambda:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This Lambda function is similar to the previous one. We use the recipe name that &lt;code&gt;createCompletion&lt;/code&gt; API has generated in order to create an image from it by calling &lt;a href="https://platform.openai.com/docs/guides/images/introduction" rel="noopener noreferrer"&gt;createImage&lt;/a&gt; (this API uses &lt;a href="https://openai.com/research/dall-e" rel="noopener noreferrer"&gt;DALL-E models&lt;/a&gt; for image generation) :&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;The &lt;code&gt;createImage&lt;/code&gt; API returns an array of URLs; the size of this array depends on the &lt;a href="https://platform.openai.com/docs/api-reference/images/create-variation#images/create-variation-n" rel="noopener noreferrer"&gt;number of variations of the image&lt;/a&gt; we want to generate. In our example we are interested in only a single image. The image URL expires after one hour, which is why we pass it to the &lt;a href="https://github.com/ziedbentahar/serverless-meal-planner-with-aws-and-openai/blob/main/src/backend/lambdas/upload-recipe-image-to-storage.ts" rel="noopener noreferrer"&gt;upload-recipe-image-to-storage&lt;/a&gt; Lambda, which is responsible for downloading the image and storing it in an S3 bucket.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3- Send meal plan email Lambda:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The sending of the email uses SES. But first, the Lambda function prepares the template data containing the necessary elements to generate the email:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
&lt;br&gt;
In the section below, we will see how to use CDK to create a new SES email identity as well as the email template that is used to send the meal plan.

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; You can find the CDK definitions of these Lambda functions following &lt;a href="https://github.com/ziedbentahar/serverless-meal-planner-with-aws-and-openai/blob/main/infrastructure/lib/lambdas.ts" rel="noopener noreferrer"&gt;this link&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Configuring SES
&lt;/h3&gt;

&lt;p&gt;In this example, we use a domain that is already defined in a Route 53 public hosted zone, so the SES email identity DNS validation is seamless. We also create the meal plan email template in this nested stack:&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;☝️ &lt;strong&gt;Note:&lt;/strong&gt; By default, an SES account is in sandbox mode. You are allowed to send emails only to verified identities and you can only send a limited number of emails per 24-hour period. Follow &lt;a href="https://docs.aws.amazon.com/ses/latest/dg/request-production-access.html" rel="noopener noreferrer"&gt;this link&lt;/a&gt; to understand the sandbox mode quotas and how to move out of it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integrating the API Gateway with the Step function workflow
&lt;/h3&gt;

&lt;p&gt;Creating the REST API with the Step Function integration is quite easy with CDK, although a bit verbose:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;We need to create a role that grants API Gateway permission to call &lt;code&gt;states:StartExecution&lt;/code&gt; on the Step Function. Each request gets validated with API Gateway JSON schema &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/models-mappings.html#models-mappings-models" rel="noopener noreferrer"&gt;model validation&lt;/a&gt; before the step function is executed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;In this post, we have seen how combining OpenAI APIs with a serverless architecture can help build AI-powered applications with minimal setup and configuration. The capabilities of both worlds are great enablers for building MVPs and iterating faster.&lt;/p&gt;

&lt;p&gt;This application can be improved further by taking into account food restrictions, or even by creating an AI-powered weekly newsletter.&lt;/p&gt;

&lt;p&gt;You can find the full source code of this application here:&lt;br&gt;
&lt;a href="https://github.com/ziedbentahar/serverless-meal-planner-with-aws-and-openai" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - ziedbentahar/serverless-meal-planner-with-aws-and-openai&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Further reading&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://platform.openai.com/docs/guides/completion" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenAI API Text Competion&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://platform.openai.com/docs/guides/images" rel="noopener noreferrer"&gt;&lt;strong&gt;OpenAI API Image generation&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-api-gateway.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Creating a Step Functions API Using API Gateway&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/models-mappings.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Working with models and mapping templates&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/ses/latest/dg/send-personalized-email-advanced.html" rel="noopener noreferrer"&gt;&lt;strong&gt;Advanced email personalization&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_stepfunctions-readme.html" rel="noopener noreferrer"&gt;&lt;strong&gt;aws-cdk-lib.aws_stepfunctions module · AWS CDK&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>openai</category>
      <category>serverless</category>
      <category>stepfunctions</category>
      <category>cdk</category>
    </item>
  </channel>
</rss>
