<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chirag (Srce Cde)</title>
    <description>The latest articles on DEV Community by Chirag (Srce Cde) (@srcecde).</description>
    <link>https://dev.to/srcecde</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F829247%2F6baa5a25-29da-4383-a55f-77474c74a634.png</url>
      <title>DEV Community: Chirag (Srce Cde)</title>
      <link>https://dev.to/srcecde</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/srcecde"/>
    <language>en</language>
    <item>
      <title>What is a Vector Space in Machine Learning? (With Math and Intuition)</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Thu, 08 Jan 2026 01:19:51 +0000</pubDate>
      <link>https://dev.to/aws-builders/what-is-a-vector-space-in-machine-learning-with-math-and-intuition-ajm</link>
      <guid>https://dev.to/aws-builders/what-is-a-vector-space-in-machine-learning-with-math-and-intuition-ajm</guid>
      <description>&lt;p&gt;In machine learning, data is often represented in vector spaces so that mathematical operations such as combination and scaling are possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Field: the numbers to compute with
&lt;/h2&gt;

&lt;p&gt;Let’s first understand what a field is because vector space is defined over a field.&lt;/p&gt;

&lt;p&gt;A field is defined as a set of elements that guarantees the 4 basic arithmetic operations: addition, subtraction, multiplication, and division (by non-zero elements). The result of performing these operations on elements of the set must remain within the same set; this property is called closure.&lt;/p&gt;

&lt;p&gt;In addition, a field must satisfy certain rules (called axioms) such as commutativity, associativity, distributivity, and the existence of additive and multiplicative identities and inverses (for all non-zero elements). Intuitively, a field allows us to perform “normal arithmetic”.&lt;/p&gt;

&lt;p&gt;The sets of real numbers (R), rational numbers (Q), and complex numbers (C) are fields because they satisfy all of the above properties. The integers (Z), however, are not a field because division does not stay within the set: for example, 2÷3 results in a fraction, which is not an integer, so the integers are not closed under division. The integers instead form a ring, which guarantees addition, subtraction, and multiplication, but not division.&lt;/p&gt;
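&lt;p&gt;&lt;em&gt;A quick illustrative sketch in Python (not part of the formal definition): division takes us outside the integers, while the rationals, modeled here with Python’s Fraction type, stay closed under all four operations.&lt;/em&gt;&lt;/p&gt;

```python
from fractions import Fraction

# Integers are not closed under division: 2 / 3 leaves the set of integers.
q = 2 / 3
print(isinstance(q, int))  # False: the result is a float, not an int

# Rationals stay rational under all four operations (division by non-zero).
a, b = Fraction(2), Fraction(3)
results = [a + b, a - b, a * b, a / b]
print(all(isinstance(r, Fraction) for r in results))  # True
```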

&lt;p&gt;&lt;em&gt;You can imagine a field as a kitchen where you have all the tools needed to cook anything (except dividing by zero).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A field F is a set with two operations (+,⋅) such that (F,+) is an abelian group, (F∖{0},⋅) is an abelian group, and ∀a,b,c∈F, a⋅(b+c)=a⋅b+a⋅c&lt;/p&gt;

&lt;h2&gt;
  
  
  Space: a set with a structure
&lt;/h2&gt;

&lt;p&gt;Next, let’s understand what space is.&lt;/p&gt;

&lt;p&gt;A space is defined as a set of elements together with a specific structure. The structure is simply the set of rules that specifies which operations are allowed on the elements and which properties those operations must satisfy. Different structures give rise to different types of spaces.&lt;/p&gt;

&lt;p&gt;Intuitively, we think of a vector as coordinates, arrows, or a data representation. However, formally a vector is an element of a vector space. Therefore, without defining a vector space first, a vector cannot be precisely defined.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vector space: environment for vector
&lt;/h2&gt;

&lt;p&gt;A vector space is a structure defined over a field (usually the real or complex numbers). It consists of a set of elements together with two operations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;vector addition&lt;/li&gt;
&lt;li&gt;scalar multiplication&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These operations must be closed within the set and must satisfy specific addition and scalar-multiplication axioms. A vector space must also contain a zero vector, which represents the absence of any contribution.&lt;/p&gt;

&lt;p&gt;You can imagine a vector space as a spice rack in your kitchen, and the structure as the rules that define which spices can be mixed, how they can be combined, and what counts as a valid mixture.&lt;/p&gt;

&lt;p&gt;A vector space V over a field F is a set equipped with operations + : V×V→V and ⋅ : F×V→V&lt;/p&gt;

&lt;p&gt;For all u,v∈V and α∈F , we have u+v∈V , αv∈V , u+0=u , and α(u+v)=αu+αv&lt;/p&gt;
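&lt;p&gt;As an illustrative sketch (modeling vectors in R^3 as plain Python tuples, an assumption made for demonstration, not part of the formal definition), we can check a couple of these axioms numerically:&lt;/p&gt;

```python
# Vectors in R^3 modeled as tuples of floats; scalars are floats.
def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(alpha, v):
    return tuple(alpha * a for a in v)

u, v = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
zero = (0.0, 0.0, 0.0)
alpha = 2.5

print(add(u, zero) == u)  # True: u + 0 = u
# Distributivity: alpha(u + v) = alpha*u + alpha*v
print(scale(alpha, add(u, v)) == add(scale(alpha, u), scale(alpha, v)))  # True
```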

&lt;h2&gt;
  
  
  Vector: an element of the vector space
&lt;/h2&gt;

&lt;p&gt;So what is a vector? A vector is an element of a vector space. In many common cases (like R^n), vectors can be represented as ordered lists of field elements. Vectors have many interpretations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In physics, they represent direction and magnitude&lt;/li&gt;
&lt;li&gt;In mathematics, they represent coordinates&lt;/li&gt;
&lt;li&gt;In machine learning, they represent data or information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;You can imagine a vector as a particular combination of ingredients chosen from the spice rack.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Let v∈R^n be a vector, where&lt;/em&gt; v=(v1,v2,v3,…,vn) - This is just a representation and not the definition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why vector spaces matter in ML
&lt;/h2&gt;

&lt;p&gt;It matters because vector spaces allow data to be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;added, to combine information&lt;/li&gt;
&lt;li&gt;scaled, for normalization or weighting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With additional structure, distances and angles can also be defined (we will cover this in a future blog post). Many machine learning algorithms rely on these properties. If data does not form a vector space, some ML techniques fail or require special handling.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Vector space is a mathematical environment that makes ML possible.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What is NOT a vector space?
&lt;/h2&gt;

&lt;p&gt;Not everything that looks like a collection of elements forms a vector space. For example:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Categorical values&lt;/strong&gt;&lt;br&gt;
Example: {black, white, blue}&lt;/p&gt;

&lt;p&gt;These do not form a vector space because adding and scaling categorical values is not meaningful. Such values require encoding before being used in ML.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Probability distribution&lt;/strong&gt;&lt;br&gt;
Example: &lt;code&gt;[0.2, 0.5, 0.3]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;A list of probabilities must sum to 1.0. Adding two probability vectors does not preserve that sum, and scaling by a negative scalar results in negative probabilities, which make no sense.&lt;/p&gt;
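&lt;p&gt;A short Python sketch makes the failure concrete (the values are illustrative):&lt;/p&gt;

```python
p = [0.2, 0.5, 0.3]
q = [0.1, 0.1, 0.8]

# Adding two distributions breaks the sum-to-1 constraint.
s = [a + b for a, b in zip(p, q)]
print(round(sum(s), 6))  # 2.0

# Scaling by a negative scalar produces negative "probabilities".
neg = [-1.0 * a for a in p]
print(all(x >= 0 for x in neg))  # False
```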

&lt;p&gt;&lt;strong&gt;Sets&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Example: { black, white, blue}&lt;/p&gt;

&lt;p&gt;Sets do not support vector addition or scalar multiplication. Operations like union and intersection are not equivalent to vector operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bringing it all together: the kitchen analogy
&lt;/h2&gt;

&lt;p&gt;Think of the entire setup like a kitchen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Field&lt;/strong&gt; is the kitchen with all the basic tools. It guarantees that you can perform the fundamental operations — add, subtract, multiply, and divide (except by zero) — and that everything behaves consistently. Without a field, you don’t even have reliable arithmetic to work with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Space&lt;/strong&gt; is the organization and rules of the kitchen. It defines what kind of kitchen this is and what operations are allowed. Different rules give you different kitchens — just as different structures give you different kinds of spaces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector space&lt;/strong&gt; is a specific kitchen setup where mixing and scaling ingredients is allowed and always makes sense. It guarantees that you can combine ingredients (vector addition), scale recipes up or down (scalar multiplication), and always stay within the same kitchen. There is also a “zero recipe” — doing nothing at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector&lt;/strong&gt; is a particular recipe or dish made using the ingredients and rules of that kitchen. It’s a concrete instance — a specific combination of ingredients that follows all the rules of the vector space.&lt;/p&gt;

&lt;p&gt;In machine learning, raw data is rarely a ready-made recipe. Instead, we transform data so it can live in a well-defined kitchen — a vector space — where combining, scaling, and learning all make sense.&lt;/p&gt;
&lt;h2&gt;
  
  
  A note on scope
&lt;/h2&gt;

&lt;p&gt;So far, the vector spaces discussed here only support addition and scalar multiplication. There is no notion of vector length, distance, or angle yet. Those concepts require additional structure (such as norms and inner products), which will be introduced in a future post.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The animation below builds intuition for vector spaces using a simple kitchen analogy, complementing the math discussed in this article.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/NdLFoEMbGuo"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Thanks for reading. Vector spaces are foundational in machine learning, and intuition goes a long way in understanding them.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Amazon S3 Vector Buckets Hands on - Similar Product Search Using: Image and Text Retrieval with RRF Fusion</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Sun, 03 Aug 2025 14:24:08 +0000</pubDate>
      <link>https://dev.to/aws-builders/similar-product-search-using-amazon-s3-vector-buckets-image-and-text-retrieval-with-rrf-fusion-3pik</link>
      <guid>https://dev.to/aws-builders/similar-product-search-using-amazon-s3-vector-buckets-image-and-text-retrieval-with-rrf-fusion-3pik</guid>
      <description>&lt;p&gt;In today’s data-driven landscape, business increasingly rely on the power of similarity, contextual search functionalities to enable features like visual search, product recommendations, semantic text retrieval and more. However, setting up the infrastructure to support scalable, low-latency vector search often requires specialized tools, dedicated vector databases, complex indexing strategy and deployments. This might create barriers for teams that want to experiment or integrate vector search search into their existing pipelines quickly. Recently, AWS announced Amazon S3 Vector Buckets (in preview) which address this challenge by bringing vector storage and similarity search directly into Amazon S3 — turning a familiar, scalable object store into a lightweight vector database.&lt;/p&gt;

&lt;p&gt;Amazon S3 Vector Buckets enable scalable storage and efficient retrieval of vector data. Key features include no additional infrastructure to provision, scalability (indexes grow with data), cost effectiveness, simplified API integration, and low-latency queries.&lt;/p&gt;

&lt;p&gt;In this blog, we’ll walk through a lightweight implementation of a product search workflow using Amazon S3 Vector Buckets — all within a Jupyter notebook. We’ll explore how to enable both text-based and image-based search, and demonstrate a simple approach to multimodal search by combining results using Reciprocal Rank Fusion (RRF).&lt;/p&gt;

&lt;p&gt;At the core of this workflow are vectors: numerical representations of data like text or images. We’ll use Amazon S3 Vector Buckets to create vector indexes, add vectors to those indexes, and perform similarity searches to find the most relevant matches. But before diving into implementation, let’s take a moment to understand what a vector is at a high level.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is a Vector?
&lt;/h2&gt;

&lt;p&gt;A vector is a numerical representation of something — in our case, data like images, text, or audio. These numbers are mapped into a high-dimensional space so that similar data ends up close together. This numeric representation enables similarity measurements through mathematical metrics like cosine similarity or Euclidean distance, facilitating accurate and efficient data retrieval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simple Image Example: Pixels as Vectors&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider the 3x3 grayscale image below (left) and its pixel values (right). Each pixel value ranges from 0 to 255, representing brightness. These values can be flattened into a vector.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyzogaj80h3gq0p5btd3x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyzogaj80h3gq0p5btd3x.png" alt="Vector example"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikg4nubppw5ov30au49a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikg4nubppw5ov30au49a.png" alt="Vector"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just like this (1 x 9 dimensions), modern AI models take larger and more complex images and convert them into much higher-dimensional vectors (like 512 or 1024 dimensions). These vectors capture not just color or brightness — but complex patterns, shapes, and even semantic meaning.&lt;/p&gt;
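&lt;p&gt;The flattening step can be sketched in a few lines of Python (the pixel values below are illustrative, not the exact ones from the figure):&lt;/p&gt;

```python
import numpy as np

# A 3x3 grayscale image: each pixel is a brightness value from 0 to 255.
image = np.array([
    [  0, 128, 255],
    [ 64,  32,  16],
    [200, 100,  50],
], dtype=np.uint8)

# Flatten row by row into a 1x9 vector.
vector = image.flatten()
print(vector.shape)    # (9,)
print(vector.tolist())
```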

&lt;h2&gt;
  
  
  Personalization use case: Product Search Demonstrated via Jupyter Notebook
&lt;/h2&gt;

&lt;p&gt;The example is presented as a practical tutorial in Jupyter notebook format, demonstrating individual text-based and image-based searches, and implementing multimodal search by combining text and image results using Reciprocal Rank Fusion (RRF).&lt;/p&gt;

&lt;h3&gt;
  
  
  S3 Vector bucket creation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;via AWS CLI (optionally, you can also specify the encryption configuration). Ensure the AWS CLI is updated to the latest version; I currently have 2.27.60 installed.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3vectors create-vector-bucket --vector-bucket-name "media-vector-bucket"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;via SDK (Boto3)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
s3vectors = boto3.client("s3vectors")
s3vclient = boto3.client('s3vectors')
response = s3vclient.create_vector_bucket(
    vectorBucketName='media-vector-bucket'
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Vector index creation
&lt;/h3&gt;

&lt;p&gt;In this example, we’ll create two separate vector indexes — one for images and another for text. This allows us to store and query image and text embeddings independently, while still leveraging the power of Amazon S3 Vectors for high-speed similarity search within each modality.&lt;/p&gt;

&lt;p&gt;A vector index is the structure that organizes the stored vector data and enables fast similarity search.&lt;/p&gt;

&lt;h4&gt;
  
  
  Image index
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;via AWS CLI
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3vectors create-index \
  --vector-bucket-name "media-vector-bucket" \
  --index-name "img-index" \
  --data-type "float32" \
  --dimension 768 \
  --distance-metric "cosine"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;via SDK (Boto3)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = client.create_index(
    vectorBucketName='media-vector-bucket',
    indexName='img-index',
    dataType='float32',
    dimension=768,
    distanceMetric='cosine',
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command above creates an image index named img-index within the vector bucket media-vector-bucket. We set the dimension to 768, the data type to float32 (currently the only supported type), and the distance metric to cosine for similarity search (cosine and Euclidean are currently supported).&lt;/p&gt;

&lt;p&gt;The reason we specify 768 as the dimension is because the model we’ll use to generate image embeddings — google/vit-base-patch16-224-in21k — outputs vectors of 768 dimensions. This ensures that the vectors we generate will be compatible with the index we’ve defined.&lt;/p&gt;
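&lt;p&gt;A lightweight guard like the following can catch dimension mismatches before any upload call. This is an illustrative sketch; &lt;code&gt;embedding&lt;/code&gt; is a hypothetical stand-in for one model output:&lt;/p&gt;

```python
INDEX_DIMENSION = 768  # must match the dimension set at index creation

# Hypothetical model output, shown as a plain list of floats.
embedding = [0.0] * 768

if len(embedding) != INDEX_DIMENSION:
    raise ValueError(f"expected {INDEX_DIMENSION} dimensions, got {len(embedding)}")
print("dimension check passed")
```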

&lt;h4&gt;
  
  
  Text index
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;via AWS CLI
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3vectors create-index \
  --vector-bucket-name "media-vector-bucket" \
  --index-name "txt-index" \
  --data-type "float32" \
  --dimension 384 \
  --distance-metric "cosine"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;via SDK (Boto3)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = client.create_index(
    vectorBucketName='media-vector-bucket',
    indexName='txt-index',
    dataType='float32',
    dimension=384,
    distanceMetric='cosine',
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reason we specify 384 as the dimension is because the model we’ll use to generate text embeddings — sentence-transformers/all-MiniLM-L6-v2 — outputs vectors of 384 dimensions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Important Note: &lt;/p&gt;

&lt;p&gt;Amazon S3 Vector Buckets currently support dimension values between 1 and 4096. Higher-dimensional vectors require more storage space, which may impact both performance and cost.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Data preparation
&lt;/h2&gt;

&lt;p&gt;For this demo, we’ll use the Fashion Product Images Dataset available on Kaggle. To keep things lightweight and manageable within a notebook setup, we’ll randomly select 10,000 product images from the dataset to build our image search experience.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;images.csv&lt;/code&gt; contains the filename and the image link. &lt;code&gt;styles.csv&lt;/code&gt; contains the metadata of the image. Hence, we merge both dataframes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
from pathlib import Path
import pandas as pd

base_data_path = f"{Path().resolve().parent}/data/"
storage_bucket_name = os.environ.get("STORAGE_BUCKET_NAME")

images_df = pd.read_csv(f"{base_data_path}/fashion-dataset/images.csv")
style_df = pd.read_csv(f"{base_data_path}/fashion-dataset/styles.csv", on_bad_lines='skip')

style_df["id"] = style_df["id"].astype(str)
images_df["id"] = images_df.filename.str.replace(".jpg", "")

merged_df = images_df.merge(style_df, left_on=["id"], right_on=["id"], how="left")
merged_df = merged_df[merged_df.link != "undefined"]
sampled_df = merged_df[~merged_df["productDisplayName"].isna()].sample(n=10000)

sampled_df.reset_index(drop=True, inplace=True)
sampled_df.fillna("", inplace=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Optionally, if the data is on S3, add the s3_uri to sampled_df.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sampled_df["s3_uri"] = sampled_df["filename"].apply(lambda x: f"s3://{storage_bucket_name}/fashion-dataset/images/{x}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2rsavcgcbdtqt52gi4n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2rsavcgcbdtqt52gi4n.png" alt="Sample data"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  From Embeddings to Index: Creating and Uploading Vectors
&lt;/h2&gt;

&lt;p&gt;The next step is to define a helper function that takes in raw input (either an image or a product name) and returns its corresponding vector embedding. You’re free to use any model that suits your use case (as long as it satisfies the dimension &amp;amp; data type limitations) for generating these vectors — whether it’s a proprietary model or an open-source one.&lt;/p&gt;

&lt;p&gt;For this example, we’ll use open-source models to generate embeddings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For images, we’ll use a vision model like google/vit-base-patch16-224-in21k&lt;/li&gt;
&lt;li&gt;For text, we’ll use a lightweight model like sentence-transformers/all-MiniLM-L6-v2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These models will help us convert product data into vector representations that we can store and search in Amazon S3 Vector Buckets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from transformers import CLIPProcessor, CLIPModel, AutoImageProcessor, AutoModel
from sentence_transformers.SentenceTransformer import SentenceTransformer

iprocessor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
imodel = AutoModel.from_pretrained("google/vit-base-patch16-224-in21k")
tmodel = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def get_embeddings(inputs, mode):
    if mode == "image":
        img_inputs = iprocessor(images=inputs, return_tensors="pt")
        with torch.no_grad():
            outputs = imodel(**img_inputs)
            last_hidden_state = outputs.last_hidden_state[:, 0, :]  # shape: [1, 2048, 1, 1]
            features = last_hidden_state / last_hidden_state.norm(dim=1, keepdim=True)  # L2 normalize each row
            features = features.cpu().numpy().astype(np.float32)
            return [f.flatten().tolist() for f in features]
    if mode == "text":
        embeddings = tmodel.encode(inputs)
        return [f.tolist() for f in embeddings]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The get_embeddings helper accepts batched inputs along with the mode, either image or text.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Important Note: &lt;/p&gt;

&lt;p&gt;Ensure that the returned vector embedding is a list whose elements are of type float32 or lower precision, and whose length matches the dimension defined during index creation. If you pass data with higher precision, S3 Vectors converts the values to 32-bit floating point before storing them.&lt;/p&gt;
&lt;/blockquote&gt;
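&lt;p&gt;One way to satisfy this requirement is to coerce embeddings to float32 before upload. The sketch below is illustrative; &lt;code&gt;raw&lt;/code&gt; stands in for a model output in the default float64 precision:&lt;/p&gt;

```python
import numpy as np

# Hypothetical model output in float64 (NumPy's default precision).
raw = np.random.rand(384)

# Coerce to float32, then to a plain Python list for upload.
vec = raw.astype(np.float32).tolist()
print(len(vec))  # 384, matching the txt-index dimension
```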

&lt;h3&gt;
  
  
  Creating and Uploading Vectors
&lt;/h3&gt;

&lt;p&gt;To efficiently process and upload large volumes of data, we define a function called create_and_upload_vectors. This function processes the input data in batches of 500 records.&lt;/p&gt;

&lt;p&gt;For each batch, it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Invokes the get_embeddings function to generate vector embeddings for both image and text data.&lt;/li&gt;
&lt;li&gt;Attaches metadata such as product IDs or categories to each vector for future filtering.&lt;/li&gt;
&lt;li&gt;Uploads the vectors to their respective indexes:

&lt;ul&gt;
&lt;li&gt;Image embeddings → img-index&lt;/li&gt;
&lt;li&gt;Text embeddings → txt-index&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Batching ensures the upload process remains scalable and performant, especially when dealing with thousands of vectors.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def create_and_upload_vectors(df, vector_bucket_name, image_index_name, text_index_name, batch_size=500):
    for i in range(0, len(df), batch_size):
        image_vectors, text_vectors = [], []

        batch = df.iloc[i:i+batch_size]

        images_batch = [Image.open(f"{base_data_path}/fashion-dataset/images/{f}").convert("RGB") for f in batch["filename"]]
        text_batch = batch["productDisplayName"].tolist()

        image_embeddings = get_embeddings(images_batch, mode="image")
        text_embeddings = get_embeddings(text_batch, mode="text")

        for ind, v in enumerate(batch.itertuples()):
            metadata = {"gender": v.gender, "category": v.masterCategory, "type": v.usage, "season": v.season, "productName": v.productDisplayName, "s3_uri": v.s3_uri}
            text_vectors.extend([{
                "key": f"{v.id}",
                "data": {"float32" : text_embeddings[ind]},
                "metadata": metadata}])
            image_vectors.extend([{
                "key": f"{v.id}",
                "data": {"float32" : image_embeddings[ind]},
                "metadata": metadata}])

        s3vectors.put_vectors(
            vectorBucketName=vector_bucket_name,   
            indexName=image_index_name,
            vectors=image_vectors
        )
        s3vectors.put_vectors(
            vectorBucketName=vector_bucket_name,   
            indexName=text_index_name,
            vectors=text_vectors
        )

# invocation
create_and_upload_vectors(sampled_df, vector_bucket_name, image_index_name, text_index_name)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Important Note: &lt;/p&gt;

&lt;p&gt;Amazon S3 Vector Buckets currently allow a maximum of 500 vectors per PutVectors request. This is why we process and upload the data in batches of 500 to stay within the API limit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Querying &amp;amp; retrieving similar matches
&lt;/h2&gt;

&lt;p&gt;To query the index, we define a helper function, query_s3_vectors, that retrieves the top k results.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Important Note: &lt;/p&gt;

&lt;p&gt;query_vectors can return a maximum of 30 matching results.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def query_s3_vectors(bucket_name, index_name, vector, k=5):
    response = s3vectors.query_vectors(
            vectorBucketName=bucket_name,
            indexName=index_name,
            queryVector={"float32": vector}, 
            topK=k, 
            returnDistance=True,
            returnMetadata=True
        )
    return response["vectors"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also, create the helper function to display the query and its respective results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import textwrap
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

def display_results(matches, query_text=None, query_image=None, is_rrf=False):
    max_cols = 7
    total = len(matches) + 1
    n_rows = math.ceil(total / max_cols)

    fig, axes = plt.subplots(n_rows, max_cols, figsize=(3 * max_cols, 5 * n_rows))
    axes = axes.flatten()

    query_path = os.path.join(f"{base_data_path}/fashion-dataset/test-images/{query_image}")
    if os.path.exists(query_path):
        img = mpimg.imread(query_path)
        axes[0].imshow(img)
        axes[0].axis("off")
        axes[0].set_title(f"Query Text:\n{textwrap.fill(str(query_text), width=25)}", fontsize=9)
    else:
        axes[0].text(0.5, 0.5, f"Query Text:\n{textwrap.fill(str(query_text), width=25)}", ha='center', va='center')
        axes[0].axis("off")

    for i, match in enumerate(matches):
        idx = i + 1
        match_path = os.path.join(f"{base_data_path}/fashion-dataset/images/{match['key']}.jpg")
        if os.path.exists(match_path):
            img = mpimg.imread(match_path)
            axes[idx].imshow(img)
            axes[idx].axis("off")
            product_name = match["metadata"].get("productName", "N/A")
            wrapped_name = textwrap.fill(product_name, width=20)
            if is_rrf:
                title = f"{match['key']}\n{wrapped_name}\nrrf_score: {match['rrf_score']:.4f}"
            else:
                title = f"{match['key']}\n{wrapped_name}\nDist: {match['distance']:.4f}"
            axes[idx].set_title(title, fontsize=8)
        else:
            axes[idx].text(0.5, 0.5, "Image not found", ha="center", va="center")
            axes[idx].axis("off")

    for i in range(total, len(axes)):
        axes[i].axis("off")

    plt.tight_layout()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Querying the index using an image vector.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query_image = "t-shirt.webp"
input_image = [Image.open(f"{base_data_path}/fashion-dataset/test-images/{query_image}").convert("RGB")]
i_emb = get_embeddings(input_image, mode="image")
image_match = query_s3_vectors(bucket_name=vector_bucket_name, vector=i_emb[0], k=10, index_name="img-index")
display_results(image_match, query_image=query_image)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdynfo1r5jvboaep5cqsg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdynfo1r5jvboaep5cqsg.png" alt="Image based results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Querying the index using the text.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;input_text = ["mens green polo tshirt"]
t_emb = get_embeddings(input_text, mode="text")
text_match = query_s3_vectors(bucket_name=vector_bucket_name, vector=t_emb[0], k=10, index_name="txt-index")
display_results(text_match, query_text=input_text, query_image=None)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxgnzze7cz837czzpc1k7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxgnzze7cz837czzpc1k7.png" alt="Text based results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Multimodal Search with Reciprocal Rank Fusion (RRF)
&lt;/h2&gt;

&lt;p&gt;Reciprocal Rank Fusion (RRF) is a simple rank aggregation technique that combines results from multiple search systems by rewarding items that consistently appear near the top.&lt;/p&gt;

&lt;p&gt;Combine individual search results using Reciprocal Rank Fusion:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Obtain separate ranked lists for text and image searches.&lt;/li&gt;
&lt;li&gt;Apply RRF to merge rankings, enhancing overall search accuracy.
&lt;/li&gt;
&lt;/ul&gt;
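&lt;p&gt;As a tiny worked example of the RRF formula (using k = 60 and zero-based ranks, matching the implementation that follows): an item ranked 1st by image search and 3rd by text search accumulates 1/61 + 1/63.&lt;/p&gt;

```python
# Worked RRF example: ranks are zero-based, as with Python's enumerate()
k = 60
score = 1 / (k + 0 + 1) + 1 / (k + 2 + 1)  # rank 0 (image) + rank 2 (text)
print(round(score, 6))  # 0.032266
```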

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def rrf_merge(image_results, text_results, k=60, top_k=20):
    scores = defaultdict(float)
    rank_sources = {"image": image_results, "text": text_results}

    for source_name, result_list in rank_sources.items():
        for rank, item in enumerate(result_list):
            doc_id = item["key"]
            scores[doc_id] += 1 / (k + rank + 1)
    metadata = {item["key"]: item for item in image_results + text_results}

    fused = [
        {
            "key": doc_id,
            "rrf_score": round(score, 6),
            **metadata[doc_id]
        }
        for doc_id, score in sorted(scores.items(), key=lambda x: x[1], reverse=True)
    ]

    return fused[:top_k]

fused_results = rrf_merge(image_results=image_match, text_results=text_match, top_k=20)
display_results(fused_results, query_text=input_text, query_image=query_image, is_rrf=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fay5pwms7br0qi2rem38l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fay5pwms7br0qi2rem38l.png" alt="Fused results"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance Considerations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Vector query results are typically returned in under a second (&amp;lt;1,000 ms), enabling fast lookups.&lt;/li&gt;
&lt;li&gt;Dimension size and the number of vectors within the index may affect latency — larger datasets or high-dimensional vectors may increase query time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Storage and Cost Considerations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Larger dimensions require more storage space, which can lead to increased cost.&lt;/li&gt;
&lt;li&gt;A single index can store up to 50 million vectors, making it scalable for large workloads.&lt;/li&gt;
&lt;li&gt;Vector dimensionality must be carefully managed to optimize overall costs.&lt;/li&gt;
&lt;/ul&gt;
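&lt;p&gt;A quick back-of-the-envelope sketch of how dimensionality drives storage (assuming float32 vectors at 4 bytes per dimension; per-vector keys and metadata add overhead on top of this, so treat it as a lower bound):&lt;/p&gt;

```python
# Rough storage estimate for raw vector data in an index,
# assuming float32 embeddings (4 bytes per dimension)
num_vectors = 1_000_000
dims = 1024
approx_bytes = num_vectors * dims * 4
print(f"~{approx_bytes / 1024**3:.1f} GiB for raw vector data")
```

Halving the embedding dimension roughly halves this raw-data footprint, which is why dimensionality is worth tuning for cost.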

&lt;h3&gt;
  
  
  Recall and Accuracy Considerations
&lt;/h3&gt;

&lt;p&gt;When performing vector similarity queries, average recall performance is influenced by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The quality and type of embedding model used to generate the vectors.&lt;/li&gt;
&lt;li&gt;The size of the dataset, in both vector count and dimensionality.&lt;/li&gt;
&lt;li&gt;The distribution and diversity of incoming queries.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Design Considerations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;S3 Vectors is ideal for workloads that don’t require real-time results, such as batch processing or periodic search.&lt;/li&gt;
&lt;li&gt;Partial updates to a vector or its associated metadata are not supported. Any update requires re-uploading the full vector entry.&lt;/li&gt;
&lt;li&gt;No direct support for inherently multimodal search (e.g., combining image and text vectors). You’ll need to implement external fusion techniques like Reciprocal Rank Fusion (RRF).&lt;/li&gt;
&lt;li&gt;No infrastructure setup is required — S3 Vectors is fully managed, so you can start storing and querying vectors immediately using API calls, without provisioning or managing servers.&lt;/li&gt;
&lt;/ul&gt;
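&lt;p&gt;Since partial updates are not supported, an update means re-uploading the full vector entry under the same key. A minimal sketch of building such a request payload (the bucket, index, key, and metadata values are hypothetical, and the field names follow the S3 Vectors PutVectors request shape as I understand it, so verify against the current API reference):&lt;/p&gt;

```python
def build_put_vectors_payload(bucket, index, key, embedding, metadata):
    """S3 Vectors has no partial update, so any change means re-uploading
    the complete entry (vector data + metadata) under the same key."""
    return {
        "vectorBucketName": bucket,
        "indexName": index,
        "vectors": [
            {
                "key": key,
                "data": {"float32": [float(x) for x in embedding]},
                "metadata": metadata,
            }
        ],
    }

# Hypothetical values for illustration
payload = build_put_vectors_payload(
    bucket="my-vector-bucket",
    index="img-index",
    key="product-123",
    embedding=[0.1, 0.2, 0.3],
    metadata={"color": "green"},
)
```

The payload can then be sent with the boto3 S3 Vectors client, e.g. &lt;code&gt;boto3.client("s3vectors").put_vectors(**payload)&lt;/code&gt;.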

&lt;h3&gt;
  
  
  Wrapping Up
&lt;/h3&gt;

&lt;p&gt;Amazon S3 Vector Buckets make it easier than ever to implement vector search without managing a dedicated vector database. In this tutorial, we built a multimodal product search system combining image and text similarity.&lt;/p&gt;

&lt;p&gt;By generating embeddings, creating separate indexes for image and text, and retrieving results using simple API calls, we demonstrated how to deliver a powerful and scalable search experience. We also used Reciprocal Rank Fusion (RRF) to combine results across modalities, showing how multimodal search can be layered on top with minimal effort.&lt;/p&gt;

&lt;p&gt;S3 Vector Buckets are a great fit for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Teams wanting to experiment with vector search quickly&lt;/li&gt;
&lt;li&gt;Applications where query latency under 1 second is acceptable&lt;/li&gt;
&lt;li&gt;Use cases like product recommendations, visual search, or content discovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They may not suit real-time or high-frequency update scenarios, but they shine when ease of integration, scalability, and cost-efficiency matter most.&lt;/p&gt;

&lt;p&gt;If you’d like to follow along step by step, you can refer to this video.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/BB7DucjM8T4"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Want to try it yourself?
&lt;/h3&gt;

&lt;p&gt;Download the &lt;a href="https://github.com/srcecde/s3-vector-buckets-search" rel="noopener noreferrer"&gt;notebook&lt;/a&gt; to get started.&lt;/p&gt;

&lt;p&gt;Thank you for reading!&lt;/p&gt;

</description>
      <category>s3</category>
      <category>vector</category>
      <category>recommendation</category>
      <category>s3search</category>
    </item>
    <item>
      <title>[Hands-On] AWS Lambda function URL with AWS IAM Authentication type</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Tue, 31 Oct 2023 23:11:41 +0000</pubDate>
      <link>https://dev.to/aws-builders/hands-on-aws-lambda-function-url-with-aws-iam-authentication-type-180g</link>
      <guid>https://dev.to/aws-builders/hands-on-aws-lambda-function-url-with-aws-iam-authentication-type-180g</guid>
      <description>&lt;p&gt;In this article, I am going to cover how to secure the AWS lambda function URL using AWS_IAM auth followed by how authenticated IAM users can access the lambda function via function URL.&lt;/p&gt;

&lt;p&gt;If you are not aware of what the AWS Lambda function URL is then please refer to my video on &lt;a href="https://youtu.be/NIk5ut7_5sM"&gt;How to configure the AWS Lambda function URL&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What does AWS_IAM Authentication type mean when enabled for AWS Lambda function URL?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It means that only authenticated IAM users or roles can invoke the Lambda function via the function URL. If the caller is not authenticated or lacks the necessary permissions, the request is rejected with a 403 Forbidden error.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On
&lt;/h2&gt;

&lt;p&gt;Login to the AWS Management Console to get started.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda Function
&lt;/h3&gt;

&lt;p&gt;Navigate to Lambda Management Console and create the lambda function with the configuration as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw0idjv8kinkzxeo5y1e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw0idjv8kinkzxeo5y1e.png" alt="Create lambda function" width="720" height="737"&gt;&lt;/a&gt;Create lambda function&lt;/p&gt;

&lt;p&gt;As part of the Execution role, the first option creates a role with basic permissions that allow the Lambda function to create and write logs to CloudWatch.&lt;/p&gt;

&lt;p&gt;Expand &lt;em&gt;Advanced settings&lt;/em&gt; to enable the function URL with &lt;em&gt;AWS_IAM&lt;/em&gt; as the auth type, as shown in the screenshot below, and click &lt;em&gt;Create function&lt;/em&gt;. You can also enable the function URL after creating the function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbvug6skixzcfordadg7d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbvug6skixzcfordadg7d.png" alt="Enable function URL" width="720" height="223"&gt;&lt;/a&gt;Enable function URL &lt;/p&gt;

&lt;h3&gt;
  
  
  IAM User
&lt;/h3&gt;

&lt;p&gt;The next step is to create an IAM user. I am creating a new IAM user just to demonstrate things end-to-end, but you can also experiment with an existing one.&lt;/p&gt;

&lt;p&gt;Navigate to &lt;em&gt;IAM Management Console&lt;/em&gt; → Click &lt;em&gt;Users&lt;/em&gt; from left panel → &lt;em&gt;Create User&lt;/em&gt;. Follow through the on-screen steps. Do not add/attach any permissions to that IAM user.&lt;/p&gt;

&lt;p&gt;Now, to access the Lambda function via the function URL when the AWS_IAM authentication type is enabled, we need AWS security credentials. So let’s generate an access key &amp;amp; secret access key for the IAM user.&lt;/p&gt;

&lt;p&gt;Open the IAM user → Security credentials → Scroll down to &lt;em&gt;Access keys&lt;/em&gt; → &lt;em&gt;Create access key&lt;/em&gt;. As the next step, let’s try to invoke the lambda function via the function URL using the generated security credentials with Postman.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu8j89zd9wsc3k3xuy9c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu8j89zd9wsc3k3xuy9c.png" alt="Configure security credentials" width="720" height="243"&gt;&lt;/a&gt;Configure security credentials&lt;/p&gt;

&lt;p&gt;Open Postman and paste the Lambda function URL. Under &lt;em&gt;Authorization&lt;/em&gt;, select &lt;em&gt;AWS Signature&lt;/em&gt; and fill in the &lt;em&gt;Access Key&lt;/em&gt; &amp;amp; &lt;em&gt;Secret key&lt;/em&gt; values with the IAM user credentials generated in the previous step. Under Advanced configuration, enter the appropriate region (in my case, us-east-1) and set the service name to lambda, since we are invoking the Lambda service. Finally, click Send. The request fails with 403 Forbidden because the IAM user does not yet have permission to invoke the Lambda function via the function URL.&lt;/p&gt;

&lt;p&gt;The next step is to grant that permission. There are two ways to do so: an identity-based policy or a resource-based policy, and I will show you both. The basic difference is that identity-based policies are attached to IAM users, groups, or roles (in this case, the IAM user we created), whereas resource-based policies are attached to resources and define who can access them (in this case, the Lambda function, where we define who can invoke it via the function URL).&lt;/p&gt;

&lt;p&gt;To successfully invoke the function via its URL, the calling entity must have the &lt;em&gt;lambda:InvokeFunctionUrl&lt;/em&gt; permission.&lt;/p&gt;

&lt;h4&gt;
  
  
  Permission via Identity-based policy
&lt;/h4&gt;

&lt;p&gt;Navigate to &lt;em&gt;IAM Management Console&lt;/em&gt; → &lt;em&gt;Policies&lt;/em&gt; (from left panel) → &lt;em&gt;Create policy&lt;/em&gt; → Select &lt;em&gt;JSON view&lt;/em&gt;. Copy &amp;amp; paste the below policy and create it. &lt;em&gt;Make sure to replace the ARN with the ARN of your lambda function&lt;/em&gt;. Post policy creation, attach the policy to the IAM user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunctionUrl"
            ],
            "Resource": [
                "arn:aws:lambda:your-region:your-acc-id:function:your-lambda-fun"
            ]
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above policy allows the InvokeFunctionUrl action on the Lambda function listed under Resource, for the IAM user or identity to which the policy is attached.&lt;/p&gt;

&lt;p&gt;As a next step, open Postman and invoke the function URL again. This time, it will return status code 200 along with “Hello from lambda!” as a response.&lt;/p&gt;

&lt;h3&gt;
  
  
  Permission via Resource-based policy
&lt;/h3&gt;

&lt;p&gt;In this section, we will configure the resource-based policy, which is attached to the resource itself (i.e. the Lambda function). As a first step, remove the policy that you attached to the IAM user.&lt;/p&gt;

&lt;p&gt;Open the lambda function → &lt;em&gt;Configurations&lt;/em&gt; → &lt;em&gt;Permissions&lt;/em&gt; → Scroll down to &lt;em&gt;Resource-based policy statements&lt;/em&gt; → &lt;em&gt;Add permissions&lt;/em&gt;. Configure the policy as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzv196m32jm0mq0tntcp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzv196m32jm0mq0tntcp.png" alt="Resource-based policy" width="720" height="739"&gt;&lt;/a&gt;Resource-based policy&lt;/p&gt;

&lt;p&gt;Replace the &lt;em&gt;Principal&lt;/em&gt; with the IAM user ARN → Save and test it again. You should be able to successfully invoke the lambda function via the function URL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generating &amp;amp; using temporary security credentials
&lt;/h3&gt;

&lt;p&gt;We were able to invoke the Lambda function via the function URL using the IAM user’s long-lived security credentials (i.e. the access key and secret key), but those keys might get exposed and misused, which is a risk. A safer approach is to use temporary credentials, which expire after a set time. For that, we will use AWS Security Token Service (AWS STS), which makes generating them very simple. Follow the steps below (assuming the AWS CLI is already installed).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open terminal&lt;/li&gt;
&lt;li&gt;Execute &lt;code&gt;aws configure&lt;/code&gt; &amp;amp; configure the access key, access secret key &amp;amp; region&lt;/li&gt;
&lt;li&gt;To generate temporary credentials, execute &lt;code&gt;aws sts get-session-token --duration-seconds 900&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The above command generates temporary credentials (shown below), which are valid for 900 seconds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvizjxv5oi6fnobgj7hj3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvizjxv5oi6fnobgj7hj3.png" alt="AWS temporary credentials" width="653" height="327"&gt;&lt;/a&gt;AWS temporary credentials&lt;/p&gt;

&lt;p&gt;As a next step, open Postman. Replace the &lt;em&gt;AccessKey&lt;/em&gt; &amp;amp; &lt;em&gt;SecretKey&lt;/em&gt; with the new values. Also, paste the &lt;em&gt;sessionToken&lt;/em&gt; in the relevant field under &lt;em&gt;Advanced configuration&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22310pq5e6eb5f8x4syc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F22310pq5e6eb5f8x4syc.png" alt="Postman temporary credentials configuration" width="720" height="235"&gt;&lt;/a&gt;Postman temporary credentials configuration&lt;/p&gt;

&lt;p&gt;If you invoke the URL now, you will be able to access the Lambda function successfully with status code 200. After 15 minutes, these credentials expire and need to be regenerated.&lt;/p&gt;

&lt;p&gt;I hope you learned something new today. If you’d like to follow along step by step, you can refer to this video.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/A-AEO103i98"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;You might also like reading about &lt;a href="https://medium.com/srcecde/whitelist-ip-addresses-for-lambda-function-urls-8f69c8285e0a"&gt;how to Whitelist IP addresses for Lambda function URLs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you have any questions, comments, or feedback please leave them below. A reaction &amp;amp; a follow is appreciated :) &lt;a href="https://www.youtube.com/srcecde?sub_confirmation=1"&gt;Subscribe to my channel&lt;/a&gt; for more.&lt;/p&gt;

</description>
      <category>lambda</category>
      <category>serverless</category>
      <category>security</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Pixel vs Path: Comparing Raster and Vector Images</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Thu, 26 Oct 2023 21:52:39 +0000</pubDate>
      <link>https://dev.to/aws-builders/pixel-vs-path-comparing-raster-and-vector-images-27ha</link>
      <guid>https://dev.to/aws-builders/pixel-vs-path-comparing-raster-and-vector-images-27ha</guid>
      <description>&lt;p&gt;In &lt;a href="https://medium.com/srcecde/essence-of-digital-images-building-blocks-c2c13cca2db3"&gt;Essence of Digital Images: Building blocks&lt;/a&gt; article, I covered the various aspects of digital images like building blocks of an image, color spaces, color encoding &amp;amp; its representation, compression, and more. Those topics mainly involved pixel-based images. However, in this article, I want to cover two primary categories of images.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Raster images&lt;/li&gt;
&lt;li&gt;Vector images&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Raster images
&lt;/h2&gt;

&lt;p&gt;Raster images are also known as bitmap images; when we talk about images in general, we are usually referring to raster images. A raster image is made of small squares known as &lt;a href="https://dev.to/aws-builders/essence-of-digital-images-building-blocks-3jjg"&gt;pixels - The smallest building block&lt;/a&gt;, arranged in rows and columns to form a grid expressed as width x height (for example, 1920x1080). In mathematical terms, it is a matrix. Each pixel is assigned a numerical value that represents its color intensity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3p6fiecv5i10yfeo6pf8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3p6fiecv5i10yfeo6pf8.png" alt="[Left] 8 x 8 grayscale image | [Right] 8 x 8 grayscale image with intensity values" width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;
[Left] 8 x 8 grayscale image | [Right] 8 x 8 grayscale image with intensity values



&lt;p&gt;&lt;strong&gt;[Grayscale]&lt;/strong&gt; The grid above is 8 (rows) x 8 (columns), so the image dimension is 8x8 pixels. Each square block is a pixel with an associated intensity value between 0 and 255. It has one channel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uagxlp91aw52mamlm2z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uagxlp91aw52mamlm2z.png" alt="[Left] 8 x 8 RGB image | [Right] 8 x 8 RGB image with intensity values" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;
[Left] 8 x 8 RGB image | [Right] 8 x 8 RGB image with intensity values



&lt;p&gt;&lt;strong&gt;[RGB]&lt;/strong&gt; The grid above is 8 (rows) x 8 (columns), so the image dimension is 8x8 pixels. Each square block is a pixel with an associated intensity value between 0 and 255 for each channel, stacked vertically in the order R, G, B. It has three channels.&lt;/p&gt;
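&lt;p&gt;To make the grid idea concrete, here is a minimal NumPy sketch of the two layouts (random intensities, just to show the shapes and value ranges):&lt;/p&gt;

```python
import numpy as np

# 8x8 grayscale: one channel, intensities 0-255 (uint8)
gray = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)

# 8x8 RGB: three channels, one 0-255 value per channel per pixel
rgb = np.random.randint(0, 256, size=(8, 8, 3), dtype=np.uint8)

print(gray.shape, rgb.shape)  # (8, 8) (8, 8, 3)
```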

&lt;p&gt;&lt;em&gt;The code to generate the above images can be found in this &lt;a href="https://github.com/srcecde/srcecde-articles-codebase/tree/main/digital-images/raster-vs-vector"&gt;GitHub repository&lt;/a&gt;. Please refer raster.py&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common file types that fall under raster images&lt;/strong&gt;: JPEG, PNG, GIF, BMP, TIFF, and more.&lt;/p&gt;

&lt;p&gt;In general, the quality of a raster image is directly tied to its dimensions, i.e. how large the grid is. A 1080x1080-pixel image holds more visual detail than a 512x512-pixel image viewed at the same size. However, other aspects like bit depth, compression, color mode, and file format also affect quality and file size. Higher pixel dimensions typically result in larger files.&lt;/p&gt;

&lt;p&gt;Raster images are widely used in photography, web graphics, scanning, medical imaging, and more. They are also used where capturing real detail and color gradients is important.&lt;/p&gt;

&lt;p&gt;Software like Adobe Photoshop and GIMP, among others, is generally used when working with raster images.&lt;/p&gt;

&lt;h2&gt;
  
  
  Vector images
&lt;/h2&gt;

&lt;p&gt;Vector images are made up of lines, curves, and paths with defined start and end points. These lines, paths, and shapes can be any geometric representation, and they are driven by mathematical expressions. Pixels, grids, and fixed dimensions do not apply to vector images: one can scale a vector image to any size without degrading its quality.&lt;/p&gt;
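&lt;p&gt;A minimal sketch of why vector scaling is lossless: a shape is stored as coordinates and expressions, so scaling simply multiplies the numbers rather than resampling pixels (the shape below is a made-up example):&lt;/p&gt;

```python
# A vector shape is stored as geometry (points + math), not pixels
line = [(0.0, 0.0), (1.0, 2.0)]  # start and end point of a line
scale = 10

# Scaling multiplies coordinates exactly; nothing is interpolated or lost
scaled = [(scale * x, scale * y) for x, y in line]
print(scaled)  # [(0.0, 0.0), (10.0, 20.0)]
```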

&lt;p&gt;&lt;strong&gt;Common file types that fall under vector images&lt;/strong&gt;: SVG, AI, EPS, and more.&lt;/p&gt;

&lt;p&gt;Vector images are widely used for designing logos, icons, illustrations, computer-aided drawings, engineering/architectural plans or layouts, and other graphics designs where image scalability, clarity, and precision/accuracy are important.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbtwbcnaimiwx75h8g0d.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbtwbcnaimiwx75h8g0d.gif" alt="[Left] Vector vs [Right] Raster image — when scaled at 1000%" width="1080" height="608"&gt;&lt;/a&gt;&lt;/p&gt;
[Left] Vector vs [Right] Raster image — when scaled at 1000%



&lt;p&gt;In the above example, I have attempted to show the scalability comparison between vector and raster images. The quality of the vector “@” on the left is not affected when scaled, whereas the raster “@” on the right becomes pixelated. Hence, when scalability &amp;amp; small file size are important, vector images are worth considering.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The code to generate the above images can be found in this &lt;a href="https://github.com/srcecde/srcecde-articles-codebase/tree/main/digital-images/raster-vs-vector"&gt;GitHub repository&lt;/a&gt;. Please refer raster_vs_vector_img.py&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Software like CorelDRAW and Adobe Illustrator, among others, is generally used when working with vector images.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The primary advantage of vector images over raster images is that they can be resized/scaled without losing quality, whereas a raster image gets pixelated when scaled.&lt;/li&gt;
&lt;li&gt;Raster-to-vector conversion is complex and often requires manual adjustment by an experienced user. Converting simple graphics is straightforward, but translating a detailed photograph accurately into vector format is much more challenging. In contrast, vector-to-raster conversion is simpler, though the result depends on factors like resolution, output size, and potential loss of clarity.&lt;/li&gt;
&lt;li&gt;Raster images are well suited to intricate details and complex color gradations. Vector images are better suited to illustrations with solid colors and sharp edges.&lt;/li&gt;
&lt;li&gt;In terms of file size, a raster image is typically larger since it holds per-pixel visual information across thousands or millions of pixels, whereas a vector image file is comparatively small since it only holds the mathematical expressions that define the design. However, raster images can be compressed, which I will cover in an upcoming article.&lt;/li&gt;
&lt;li&gt;Raster images can be opened with everyday applications, whereas opening vector images often requires specialized software.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The choice between raster and vector depends on the use case. For photographs where capturing real detail is important, raster is the way to go. Where scalability, precision, and accuracy are important, vectors are more suitable. &lt;em&gt;The well-known ImageNet dataset contains raster images.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Note: &lt;em&gt;In the upcoming articles, I will be mainly talking about the raster images. At the same time, it is important to understand the difference between raster and vector images.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you have any questions, comments, or feedback please leave them below. &lt;a href="https://www.youtube.com/srcecde?sub_confirmation=1"&gt;Subscribe to my channel&lt;/a&gt; for more.&lt;/p&gt;

</description>
      <category>digitalimages</category>
      <category>computervision</category>
      <category>pixels</category>
      <category>image</category>
    </item>
    <item>
      <title>Essence of Digital Images: Building blocks</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Mon, 18 Sep 2023 16:38:52 +0000</pubDate>
      <link>https://dev.to/aws-builders/essence-of-digital-images-building-blocks-3jjg</link>
      <guid>https://dev.to/aws-builders/essence-of-digital-images-building-blocks-3jjg</guid>
      <description>&lt;p&gt;In the era of digital transformation, images emerge as more than mere visual artifacts from the graphics/visuals perspective. Images/Visuals are not just pictures, but they are technological wonders which is a blend of data structures, mathematics, and algorithms. This blend gave rise to these visual experiences that we encounter daily in our lives in the form of digital images.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pixels - The smallest building block
&lt;/h2&gt;

&lt;p&gt;At the core of every digital image lies the pixel. An image is made of tiny pixels arranged in a grid, forming a matrix that determines the image’s dimensions and resolution. Consider a 3x3 8-bit image in grayscale color space (grayscale is used for simplicity). Each pixel carries a numerical value representing color intensity from 0 (black) to 255 (white), corresponding to 8 bits of data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87825jp91eoehrsx8hfj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87825jp91eoehrsx8hfj.png" alt="(Left) 3x3 8-bit grayscale image | (Right) Color intensity representation" width="720" height="334"&gt;&lt;/a&gt;&lt;/p&gt;
(Left) 3x3 8-bit grayscale image | (Right) Color intensity representation



&lt;p&gt;&lt;em&gt;Color intensity numbers in red are just for representation purposes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In RGB color space, a pixel’s intensity is represented as a triplet such as (255, 0, 0), where 255 is the Red channel intensity and 0 each for the Green and Blue channels, indicating a pure red pixel with no green or blue. Each channel’s intensity is represented on a scale from 0 to 255, giving 256 possible values per channel. Consider the 3x3 8-bit RGB image shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxink9pazntksb1umtas0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxink9pazntksb1umtas0.png" alt="(Left) 3x3 8-bit RGB image | (Right) Color intensity representation" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;
(Left) 3x3 8-bit RGB image | (Right) Color intensity representation



&lt;p&gt;&lt;em&gt;Note: Each channel of RGB is 8-bit, which together gives 24-bit (8 (R) + 8 (G) + 8 (B) = 24 bits per pixel) color depth, known as True color. The 8-bit scale provides 2⁸ = 256 possible values, ranging from 0 to 255, for each channel per pixel. Hence 256 x 256 x 256 = over 16 million possible color combinations.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The alpha channel is not included for simplicity.&lt;/em&gt;&lt;/p&gt;
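&lt;p&gt;The color-depth arithmetic above can be checked in a couple of lines:&lt;/p&gt;

```python
# 8 bits per channel gives 2**8 = 256 values; three channels multiply
values_per_channel = 2 ** 8
total_colors = values_per_channel ** 3
print(values_per_channel, total_colors)  # 256 16777216
```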

&lt;h2&gt;
  
  
  Color encoding &amp;amp; representation
&lt;/h2&gt;

&lt;p&gt;The previous section touched upon how single &amp;amp; multi-channel (RGB) colors can be represented for a given pixel. This section will cover the two different approaches to representing and encoding colors in digital images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Indexed color&lt;/strong&gt; — Indexed color, also known as palette-based color, uses a lookup table with a limited number of colors. Each pixel in the image stores a predefined index value into the color lookup table. It is memory-, storage-, and transmission-efficient.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjv1yv6pfbvd02qmjxngc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjv1yv6pfbvd02qmjxngc.png" alt="(Left) Color Lookup table | (Right) Grid image with indexed color" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;
(Left) Color Lookup table | (Right) Grid image with indexed color



&lt;p&gt;&lt;strong&gt;Direct color&lt;/strong&gt; — Direct color, also known as true color or 24-bit color, is another method of representing colors in digital images. Unlike indexed color, direct color does not use a color lookup table; instead each pixel carries its own color value as a combination of the primary colors (RGB). Each RGB channel is represented using 8 bits, resulting in 24 bits per pixel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flro2bc69sid9fkbilpj6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flro2bc69sid9fkbilpj6.png" alt="Each pixel in a grid with its respective color information" width="662" height="564"&gt;&lt;/a&gt;&lt;/p&gt;
Each pixel in a grid with its respective color information



&lt;p&gt;File formats like JPEG, PNG, and TIFF support direct color images, whereas formats like GIF and PNG-8 support indexed color.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygap8staaqt7ysy2bl3r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygap8staaqt7ysy2bl3r.png" alt="(Left) Direct color example with 24 bits per pixel &amp;amp; Indexed color example (Right) with max 8 colors" width="800" height="601"&gt;&lt;/a&gt;&lt;/p&gt;
(Left) Direct color example with 24 bits per pixel &amp;amp; Indexed color example (Right) with max 8 colors



&lt;p&gt;The color palette (above image — Right) used in the indexed color example is the same as the lookup table mentioned above as an example.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Bit depth (8, 16, 32) quantifies the number of distinct numerical color values each pixel can have. The higher the bit depth, the finer the color gradations (quality). This collectively influences the image richness and accuracy.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Color spaces
&lt;/h2&gt;

&lt;p&gt;Imagine you want to tell someone about the colors of the breathtaking sunrise image below. You can use words like red, blue, orange, and grey, but these words cannot capture all the details. Hence, there is a need for a different language that can capture the details of colors: color spaces.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffo39e27mnl5t1rg6z67u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffo39e27mnl5t1rg6z67u.png" alt="Sunset" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
Photo by Vika Jordanov on Unsplash



&lt;p&gt;A color space is a system (a mathematical model) that defines how visual colors are represented as numerical values. It is a standardized and structured way to describe colors across various applications. There are multiple color spaces, each with its own properties. The most common ones include, but are not limited to —&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RGB (Red, Green, Blue)&lt;/strong&gt; — RGB represents colors as a combination of the primary colors red, green, and blue; each channel's intensity ranges from 0 to 255 in 8-bit systems. In 16-bit systems (2¹⁶), each channel's intensity ranges from 0 to 65,535. RGB is an additive color model, where the primary colors are mixed at different intensities to create a wide range of colors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HSV (Hue, Saturation, Value)&lt;/strong&gt; — HSV represents colors by Hue (which color), Saturation (the intensity or purity of the color, i.e. how far it is from grey), and Value (the brightness of the color). Hue is expressed in degrees from 0° to 360°, whereas Saturation &amp;amp; Value are expressed as percentages. The HSV color space is close to how humans perceive color.&lt;/p&gt;
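As a rough sketch, the RGB-to-HSV mapping can be tried with Python's standard colorsys module (the scaling to degrees and percentages is my own addition):

```python
import colorsys

# colorsys works on 0-1 floats, so the 8-bit channels are scaled first;
# hue is then mapped to degrees and saturation/value to percentages.
def rgb_to_hsv_degrees(r, g, b):
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return (h * 360, s * 100, v * 100)

print(rgb_to_hsv_degrees(255, 0, 0))  # (0.0, 100.0, 100.0): pure red
```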

&lt;p&gt;&lt;strong&gt;Lab&lt;/strong&gt; — The Lab color space is used where color precision is important. The L channel represents brightness/lightness, the a channel represents the color's position on the green-to-red axis (horizontal), and the b channel represents its position on the blue-to-yellow axis (vertical).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CMY/CMYK (Cyan, Magenta, Yellow, Key/Black)&lt;/strong&gt; — CMY combines cyan, magenta, and yellow as primary colors. CMYK is an extension of CMY that adds a fourth channel, black (the key). This color space is commonly used in printing. It is a subtractive color model, where colors are layered to create a wide spectrum of colors.&lt;/p&gt;

&lt;p&gt;There are other color spaces like YUV/YCbCr, YIQ, YDbDr, sRGB, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Image formats &amp;amp; compression
&lt;/h2&gt;

&lt;p&gt;Image formats, usually identified by the file extension (jpg/jpeg, png, tiff, bmp, raw, and more), carry varying representations of the same visuals. Different file formats use compression algorithms that fall under lossless or lossy compression. Compression algorithms optimize storage and transmission by reducing file size while retaining as much detail as possible.&lt;/p&gt;

&lt;p&gt;JPEG uses lossy compression, which selectively discards some data to reduce size, while PNG uses a lossless compression algorithm. It is a trade-off between compression ratio and image quality, and the right balance between size and quality varies across applications.&lt;/p&gt;
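To illustrate the lossless principle, here is a sketch using Python's standard zlib module. Note this is plain DEFLATE, the algorithm PNG builds on, not the full PNG codec: the point is only that the original bytes come back exactly.

```python
import zlib

# Repetitive "pixel" bytes compress well, and decompression recovers
# them exactly: no information is discarded (unlike lossy JPEG).
pixel_data = bytes([10, 10, 10, 10, 200, 200, 200, 200]) * 100

compressed = zlib.compress(pixel_data)
restored = zlib.decompress(compressed)

print(len(pixel_data))         # 800
print(restored == pixel_data)  # True
```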

&lt;p&gt;This section mainly touched upon raster formats like JPEG and PNG, which deal with pixel data. However, there are also vector formats such as SVG, EPS, and AI, which are defined by points, curves, and lines rather than pixels, and are therefore resolution independent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Digital image manipulation
&lt;/h2&gt;

&lt;p&gt;Consider a photo that you want to crop, rotate, apply a filter to, or transform in some other way; this is where digital image processing techniques like filtering, image enhancement, noise reduction, and blurring come in handy. For example, to smooth out noise in a photograph, blurring techniques like Gaussian blur or median blur can be applied; in general they smooth the image by averaging pixel values under a kernel. These computational processes fine-tune images for aesthetics, clarity, and analysis.&lt;/p&gt;
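A minimal sketch of kernel-based smoothing, using a 3x3 box (mean) blur as a simplified stand-in for the Gaussian blur mentioned above:

```python
# Box blur on a grayscale grid: each interior pixel is replaced by the
# average of the 3x3 neighbourhood around it (the kernel). Border pixels
# are left unchanged for simplicity.
def box_blur(img):
    h = len(img)
    w = len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Sum the pixel and its 8 neighbours, then average.
            total = sum(img[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = total // 9
    return out

noisy = [
    [10, 10, 10],
    [10, 250, 10],   # a noisy spike in the middle
    [10, 10, 10],
]
print(box_blur(noisy)[1][1])  # 36: the spike is pulled toward its neighbours
```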

&lt;p&gt;Images are something we interact with daily for a variety of purposes, and visual data is growing at an exponential rate. Hence, it becomes crucial that new dimensions of visual analysis evolve to interpret and understand images. Advances in AI/ML have already revolutionized computer vision, enabling machines to understand images.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Digital images are a combination of mathematics, algorithms, and engineering: pixels coming together to form grids, color spaces lighting up pixels with colors, compression striking a balance between size and quality for better storage &amp;amp; transmission, filters and manipulation giving a different perspective, and AI/ML opening up a whole new space of visual analysis.&lt;/p&gt;

&lt;p&gt;In this article, I have barely scratched the surface of digital images, but I hope it gives you a holistic, high-level view of how they work.&lt;/p&gt;

&lt;p&gt;If you have any questions, comments, or feedback please leave them below. &lt;a href="https://www.youtube.com/@srcecde"&gt;Subscribe to my channel&lt;/a&gt; for more.&lt;/p&gt;

</description>
      <category>computervision</category>
      <category>digitalimages</category>
      <category>machinelearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>AWS Glue Custom Classifier | Grok | Tutorial</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Tue, 07 Feb 2023 15:28:56 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-glue-custom-classifier-grok-tutorial-2d6i</link>
      <guid>https://dev.to/aws-builders/aws-glue-custom-classifier-grok-tutorial-2d6i</guid>
      <description>&lt;p&gt;AWS Glue custom classifier enables you to catalog the data in the way you want when AWS Glue built-in classifiers cannot. It is important to catalog the data correctly and the classifier plays an important role in identifying the structure of underlying data.&lt;/p&gt;

&lt;p&gt;If the built-in classifiers do not catalog the data as you need, then there are a few options (though not limited to these). The first option is to populate or prepare the source data in a format that is supported by the AWS Glue built-in classifiers. The second option is to catalog the data manually, if that is feasible. The third option is to create a custom classifier to parse the structure of the data the way we want. Sometimes even the custom classifier fails, and in that case, consider changing the format of the data via some transformations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding the role of classifiers
&lt;/h3&gt;

&lt;p&gt;Let’s consider S3 as the data source (though it could be anything). The usual way to catalog the data is to create a crawler, which will scan the underlying data and then determine the potential columns and their data types. After identifying the structure of the data, it will populate the table definition as part of the data catalog.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiyav7hkq5v1jvtxfl1ut.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiyav7hkq5v1jvtxfl1ut.png" alt="High level flow" width="800" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How does the crawler determine the schema of the data?
&lt;/h3&gt;

&lt;p&gt;The answer is that it determines the schema using classifiers. The role of the classifier is to read the data, identify its structure, and help catalog the data correctly.&lt;/p&gt;

&lt;p&gt;A short definition of a classifier is, &lt;em&gt;A classifier is a configuration, either built-in or custom, that is leveraged as a part of the crawler to infer the source data and parse the structure or schema of the underlying data.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How does the crawler choose a classifier?
&lt;/h3&gt;

&lt;p&gt;While creating the crawler, one can attach one or more custom classifiers to infer the schema; for the built-in classifiers, no additional configuration is required. Assuming the crawler has one or more custom classifiers attached, this is how it chooses the classifier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The crawler will start the match using the custom classifiers first and subsequently leverage other classifiers if required&lt;/li&gt;
&lt;li&gt;The crawler determines whether a classifier is a good fit based on its certainty score: each classifier returns a certainty score for the match. If any classifier returns a certainty score of 1.0, the crawler decides that this classifier is a good fit and can create the correct table definition, and it stops matching against the remaining classifiers&lt;/li&gt;
&lt;li&gt;If none of the custom classifiers returns a certainty score of 1.0, then the crawler will initiate the match using the AWS Glue built-in classifiers. The match via built-in classifiers will either return 1.0 (If there is a match) or 0.0 (If no match is found)&lt;/li&gt;
&lt;li&gt;If no classifier (custom or built-in) returns a certainty score of 1.0, then it will pick the classifier with the highest certainty score&lt;/li&gt;
&lt;li&gt;If no classifier returns a certainty greater than 0.0, AWS Glue returns the default classification string of UNKNOWN&lt;/li&gt;
&lt;/ul&gt;
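The selection steps above can be sketched as a small Python function (a hypothetical helper for intuition, not an AWS API):

```python
# Custom classifiers are tried first; a certainty score of 1.0 wins
# immediately; otherwise the highest score across custom and built-in
# classifiers wins; and UNKNOWN is returned when every score is 0.0.
def pick_classifier(custom_scores, builtin_scores):
    # custom_scores / builtin_scores: dicts of classifier name -> certainty
    for name, score in custom_scores.items():
        if score == 1.0:
            return name  # perfect match: stop matching here
    merged = dict(builtin_scores)
    merged.update(custom_scores)
    best = max(merged, key=merged.get, default=None)
    if best is None or merged[best] == 0.0:
        return "UNKNOWN"
    return best

print(pick_classifier({"grok-logs": 0.7}, {"json": 0.0}))  # grok-logs
print(pick_classifier({"grok-logs": 0.0}, {"csv": 1.0}))   # csv
print(pick_classifier({}, {"json": 0.0}))                  # UNKNOWN
```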

&lt;h3&gt;
  
  
  Types of classifiers
&lt;/h3&gt;

&lt;p&gt;There are two types of classifiers: AWS Glue built-in classifiers and custom classifiers. Below is the list of built-in and custom classifiers that AWS Glue supports as of today. With built-in classifiers, no additional configuration is required to parse the structure of the data, because AWS Glue internally figures out which built-in classifier to use based on the certainty score.&lt;/p&gt;

&lt;p&gt;For custom classifiers, there are 4 types available: grok, JSON, XML, and CSV.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqi20avf4zlenmwgg7c4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqi20avf4zlenmwgg7c4w.png" alt="Classifiers" width="800" height="1109"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Classifiers can also handle files in ZIP, BZIP, GZIP, LZ4, and Snappy compression formats.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to use a custom classifier?
&lt;/h3&gt;

&lt;p&gt;When the AWS Glue built-in classifier is unable to create the expected or required table definition, then one should consider creating &amp;amp; using the custom classifier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grok custom classifier
&lt;/h3&gt;

&lt;p&gt;Grok is a tool to parse textual data using a &lt;strong&gt;grok pattern&lt;/strong&gt;. A grok pattern is a named regular expression. For example, you might define a regular expression that matches email addresses and give that pattern the name &lt;em&gt;EMAILPARSER&lt;/em&gt;; the pair (EMAILPARSER regular-expression) is the named pattern.&lt;/p&gt;

&lt;p&gt;Syntax of grok pattern: &lt;code&gt;%{PATTERNNAME:field-name:data-type}&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;PATTERNNAME&lt;/em&gt; is the name of the pattern that will match the text. (&lt;em&gt;PATTERNNAME&lt;/em&gt; could be &lt;em&gt;EMAILPARSER&lt;/em&gt; from the previous example)&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;field-name&lt;/em&gt; is the alias name and can be anything&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;data-type&lt;/em&gt; is optional. By default, the data type will be a string if not mentioned. The supported data types are byte, boolean, double, short, int, long, string, and float&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, if you want to match the month number from the text, then&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define a regular expression that matches it, and give that pattern a name&lt;/li&gt;
&lt;li&gt;Named pattern: &lt;code&gt;MONTHNUM (?:0?[1-9]|1[0-2])&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;In this case, the name of the pattern that we have given is &lt;em&gt;MONTHNUM&lt;/em&gt;, which is followed by a regular expression.&lt;/li&gt;
&lt;li&gt;Then define the grok pattern using the syntax above: a reference to the defined pattern name (&lt;em&gt;MONTHNUM&lt;/em&gt;), followed by an alias name (which can be anything), and finally a cast to the int data type&lt;/li&gt;
&lt;li&gt;Grok pattern: &lt;code&gt;%{MONTHNUM:month:int}&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
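For intuition, the MONTHNUM example can be reproduced with Python's re module (my own translation of the grok idea, not AWS Glue code):

```python
import re

# The named set becomes a reusable regex fragment, and the ":int" cast
# from the grok syntax is applied explicitly after matching.
MONTHNUM = r"(?:0?[1-9]|1[0-2])"
pattern = re.compile(r"month=(" + MONTHNUM + r")")

m = pattern.search("event logged for month=09")
print(int(m.group(1)))  # 9
```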

&lt;p&gt;AWS Glue provides built-in patterns and if required the custom pattern can be defined. For example, if AWS Glue has MONTHNUM defined as a part of the built-in patterns, then we can directly use the name of that pattern to define grok pattern. But if there is something that you want to match or parse that is not defined as a part of the AWS Glue’s built-in pattern then you have to define the custom pattern like MONTHNUM followed by the regular expression. Hence, the grok pattern can be defined using AWS Glue’s built-in patterns and custom patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grok custom classifier example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider a log file (as shown in the Log data image). Each line contains a UUID, a Windows MAC address, and an Employee Id. The requirement is to catalog this structure and create the table definition via a grok custom classifier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fad76omoku0tskqgvtbvb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fad76omoku0tskqgvtbvb.png" alt="Log data" width="800" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below is a snapshot showing a couple of the AWS Glue built-in patterns that we will use. There is a long list of AWS Glue built-in patterns, which you can refer to here. The keyword at the start is the pattern name (highlighted in the blue box), followed by the regular expression (highlighted in the purple box).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5xc6jsyyxj2o8c091it.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh5xc6jsyyxj2o8c091it.png" alt="AWS Glue built-in patterns sample" width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Starting with the UUID: the pattern to parse a UUID is already defined among the AWS Glue built-in patterns (highlighted in the red box), and the same is true for the Windows MAC address. But for the Employee Id, which is a 7-digit number, there is no built-in pattern, so a custom pattern is required. In simple words, whenever we have to define both the pattern name and the regular expression to match, it is a custom pattern.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgm7s4qtkeonem3gef8x3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgm7s4qtkeonem3gef8x3.png" alt="Grok patterns for log file" width="800" height="230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we put together all the patterns, then it will look like the grok pattern defined as a part of the Final grok pattern in the above table.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hands-On
&lt;/h3&gt;

&lt;p&gt;As a part of the hands-on, we will parse a log file that looks like the one below. I have modified this log file to include random dummy email addresses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvii489pfcxaie0q5qlhb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvii489pfcxaie0q5qlhb.png" alt="[Modified] Apache error log" width="800" height="113"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Breaking down the structure of a log line: it starts with a &lt;em&gt;Timestamp&lt;/em&gt; in square braces, then the &lt;em&gt;Log type&lt;/em&gt; (which could be error or notice), followed by the &lt;em&gt;Email&lt;/em&gt;, and finally the &lt;em&gt;Message&lt;/em&gt;. Now, let’s see which built-in and custom patterns will be used to define the grok pattern.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx7uzkjuk9hwu5niyz1h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcx7uzkjuk9hwu5niyz1h.png" alt="Log structure" width="800" height="152"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timestamp&lt;/strong&gt;: A custom pattern is defined by combining the AWS Glue built-in patterns for Day, Month, Monthday, Time &amp;amp; Year into a single entity, and the grok pattern is then defined using that custom pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Log type&lt;/strong&gt;: AWS Glue built-in pattern WORD is used. WORD pattern will match any alphanumeric characters including underscore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Email&lt;/strong&gt;: There is no built-in pattern to parse email. Hence, the custom pattern is defined with the name GETEMAIL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Message&lt;/strong&gt;: The GREEDYDATA built-in pattern is used. GREEDYDATA matches any characters (zero or more) up to the end of the line.&lt;/p&gt;

&lt;p&gt;The overall pattern is defined as a part of the &lt;em&gt;Final grok pattern&lt;/em&gt;.&lt;/p&gt;
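As a hedged sketch, the overall pattern can be approximated in plain Python re. The sample log line and the exact regular expressions (especially the email and timestamp fragments standing in for GETEMAIL and GETTIMESTAMP) are my assumptions based on the structure described:

```python
import re

# Timestamp in square braces, then log type, email, and a free-form message.
TIMESTAMP = r"\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}"
EMAIL = r"[\w.+-]+@[\w-]+\.[\w.]+"
line_re = re.compile(
    r"\[(" + TIMESTAMP + r")\] (\w+) (" + EMAIL + r") (.*)"
)

line = "[Mon Feb 06 10:15:32 2023] error jdoe@example.com File does not exist"
ts, log_type, email, message = line_re.match(line).groups()
print(log_type)  # error
print(email)     # jdoe@example.com
```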

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3w88242ofeclp8xyw4av.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3w88242ofeclp8xyw4av.png" alt="Grok pattern table" width="800" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  AWS Console
&lt;/h3&gt;

&lt;p&gt;Go to AWS Management Console → S3 Management Console → Create a new bucket or use an existing bucket to upload the log file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fetuzxo32r1097loacxxx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fetuzxo32r1097loacxxx.png" alt="Upload log file to S3" width="800" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As a next step, go to the AWS Glue console and create the database. The table definition will be created under this database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvwaicryho94r9uldf4t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvwaicryho94r9uldf4t.png" alt="Create database" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create custom classifier&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I would encourage you to first create and run the crawler without a custom classifier and check the results, to see whether the AWS Glue built-in classifiers are able to populate the correct structure. After that, go to Classifiers (under Crawlers) → Add classifier, configure the details as shown below, and create it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feujoirp2z2u8kxdll1bo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feujoirp2z2u8kxdll1bo.png" alt="Create custom classifier" width="800" height="621"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Classifier name&lt;/strong&gt;: The name of the custom classifier&lt;br&gt;
&lt;strong&gt;Classifier type&lt;/strong&gt;: Grok&lt;br&gt;
&lt;strong&gt;Classification&lt;/strong&gt;: A classification string that will be used as a label&lt;br&gt;
&lt;strong&gt;Grok pattern&lt;/strong&gt;: A grok pattern to match the structure (from the Grok pattern table above). All fields will have the string data type by default, but if you want to cast a field to another data type then add the optional data type block (separated by a colon, as per the syntax discussed earlier)&lt;br&gt;
&lt;strong&gt;Custom patterns&lt;/strong&gt;: In the above table, two custom patterns (GETTIMESTAMP &amp;amp; GETEMAIL) are created which are not available as a part of AWS Glue built-in patterns, so enter that&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create crawler&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Go to crawlers → Create crawler → Configure crawler name (Step 1) → Configure data source &amp;amp; add custom classifier(s) as shown below (Step 2) → Select IAM role (Step 3) → Select the database created earlier &amp;amp; provide the optional table prefix. In my case, it is custom- (Step 4) → Review the configuration (Step 5) → Create crawler.&lt;/p&gt;

&lt;p&gt;For detailed steps on creating a crawler, please refer to &lt;a href="https://youtu.be/0pc7GdNtEZk" rel="noopener noreferrer"&gt;this video&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F376keitqc0ubv2t5etkv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F376keitqc0ubv2t5etkv.png" alt="Add custom classifier" width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a next step, run the crawler; if the configuration and grok pattern are correct, it will populate the table under the said database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnuvtubab3mq22c2xu7eq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnuvtubab3mq22c2xu7eq.png" alt="Table details" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query the table via Athena&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Navigate to Amazon Athena → Select Query Editor → Select AwsDataCatalog as Data Source → Select the database (In my case it’s apache-demo-db) → That should list all the tables under that database → Query the table&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1ylapzsrkjha00fadaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1ylapzsrkjha00fadaf.png" alt="Query results" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I hope you gained good insights and hands-on knowledge about custom classifiers, especially the grok custom classifier. If you would like to follow along with me step by step in detail, you can refer to this video.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/rEEn7xAH_9I"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you have any questions, comments, or feedback then please leave them below. &lt;a href="https://www.youtube.com/srcecde?sub_confirmation=1" rel="noopener noreferrer"&gt;Subscribe to my channel&lt;/a&gt; for more.&lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gratitude</category>
    </item>
    <item>
      <title>Allow access to REST API Gateway from specific IP addresses | Whitelist IPs</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Wed, 01 Feb 2023 11:03:35 +0000</pubDate>
      <link>https://dev.to/aws-builders/allow-access-to-rest-api-gateway-from-specific-ip-addresses-whitelist-ips-l30</link>
      <guid>https://dev.to/aws-builders/allow-access-to-rest-api-gateway-from-specific-ip-addresses-whitelist-ips-l30</guid>
      <description>&lt;p&gt;How to allow specific IP or range of IP addresses to access our REST API endpoints?&lt;/p&gt;

&lt;p&gt;In this article, I will share how to whitelist an IP address to allow access to a REST API endpoint and deny/block all requests originating from other source IPs. This article applies only to APIs using the REST protocol within API Gateway. The mechanism we are going to use to control IP whitelisting is the Resource Policy.&lt;/p&gt;

&lt;p&gt;Here, I am going to allow/whitelist my IP address to access/invoke the API Endpoint and block the rest of the requests originating from sources other than my IP address.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting started
&lt;/h3&gt;

&lt;p&gt;To get started, create a Lambda function (&lt;strong&gt;requestService&lt;/strong&gt;) which will be the back-end integration for our REST API Gateway (which we will create in a while). The Lambda function will simply return a hard-coded response, without any business logic, whenever the endpoint (GET method) is invoked.&lt;/p&gt;
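A minimal sketch of such a handler (the exact body is an assumption; any hard-coded response in the proxy-integration shape that API Gateway expects will do):

```python
import json

# Hypothetical requestService handler: returns a fixed response with no
# business logic, in the statusCode/headers/body shape used by Lambda
# proxy integrations.
def lambda_handler(event, context):
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "Request processed successfully"}),
    }

print(lambda_handler({}, None)["statusCode"])  # 200
```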

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx88u3d5u1tt5iwaqv9y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx88u3d5u1tt5iwaqv9y.png" alt="Lambda function" width="555" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After creating the Lambda function, go to the API Gateway Management Console and create a REST API from scratch, or open any existing REST API. Next, create the resource (&lt;strong&gt;/processrequest&lt;/strong&gt;) along with a GET method. Finally, integrate the lambda function (&lt;strong&gt;requestService&lt;/strong&gt;) with the GET method. Please refer to the screenshot below for the integration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12p5asn6hn1yci9bssa8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12p5asn6hn1yci9bssa8.png" alt="API GW config" width="720" height="227"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a similar detailed step-by-step setup of these resources, you can refer to my tutorial on &lt;a href="https://youtu.be/OIUSeiZzk6k" rel="noopener noreferrer"&gt;Resources, method integration with lambda&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Whitelisting IP address via Resource Policy
&lt;/h3&gt;

&lt;p&gt;With a resource policy, we can restrict API endpoint invocation to requests originating from defined IP addresses and deny all other requests.&lt;/p&gt;

&lt;p&gt;After setting up the API Gateway and lambda function, open the API Gateway created in the step above, click &lt;strong&gt;Resource Policy&lt;/strong&gt; in the left panel, paste the policy below into the editor, and click Save.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:/*/*/*"
    },
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:/*/*/*",
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": ["YOUR IP ADDRESS", "IP CIDR BLOCK"]
        }
      }
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The policy contains two statement blocks: an Allow block and a Deny block. The first statement allows API endpoint invocations originating from any source to all the resources within our REST API.&lt;/p&gt;

&lt;p&gt;The second statement defines an explicit denial: it blocks requests from all sources to all resources, subject to a condition. The &lt;strong&gt;NotIpAddress&lt;/strong&gt; condition means the denial applies only to requests whose source IP is not listed there, so requests from the listed IP addresses are allowed through.&lt;/p&gt;

&lt;p&gt;As a next step, replace the &lt;strong&gt;YOUR IP ADDRESS&lt;/strong&gt; placeholder with the IP address for which you want to allow API endpoint invocation (you can simply search for “what is my IP” to find yours). Additionally, you can define an IP range with a CIDR block. After the modification, click Save.&lt;/p&gt;
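&lt;p&gt;&lt;em&gt;As an optional sketch (not part of the original walkthrough): the policy above can also be generated and validated programmatically with Python's standard library before pasting it into the editor. The IP addresses below are placeholders.&lt;/em&gt;&lt;/p&gt;

```python
import ipaddress
import json

def build_ip_allowlist_policy(source_ips):
    """Build the Allow + explicit-Deny resource policy shown above
    for a list of IPs / CIDR blocks, validating each entry first."""
    for ip in source_ips:
        # strict=False accepts both single IPs and CIDR ranges;
        # raises ValueError for anything malformed
        ipaddress.ip_network(ip, strict=False)
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*/*/*",
            },
            {
                "Effect": "Deny",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*/*/*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": source_ips}},
            },
        ],
    }, indent=2)

# Placeholder IPs from the documentation ranges:
policy = build_ip_allowlist_policy(["203.0.113.10", "198.51.100.0/24"])
```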

&lt;p&gt;Finally, re-deploy the API for the changes to be reflected and get the Invocation URL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing
&lt;/h3&gt;

&lt;p&gt;After deployment, copy the invocation URL, paste it into a new browser tab, make sure to append &lt;strong&gt;/processrequest&lt;/strong&gt;, and hit Enter. As a result, you should see the response coming from the lambda function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6v4mj186etwzj5mr2p4t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6v4mj186etwzj5mr2p4t.png" alt="API Invocation1" width="661" height="68"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To verify that the resource policy approach is working, replace your IP address with the localhost IP, click Save, and re-deploy the API.&lt;/p&gt;

&lt;p&gt;If you now hit the API endpoint again, it will return an error message as shown in the reference image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4qp42f09da4c0q7socj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4qp42f09da4c0q7socj.png" alt="API Invocation2" width="720" height="40"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, we have made our endpoint more secure.&lt;/p&gt;

&lt;p&gt;For a detailed step-by-step setup, you can refer to the video below.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/2ExmCFY2VqQ"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you have any questions, comments, or feedback then please leave them below. &lt;a href="https://www.youtube.com/srcecde?sub_confirmation=1" rel="noopener noreferrer"&gt;Subscribe to my channel&lt;/a&gt; for more.&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Handle SQS message failure in batch with partial batch response feature</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Mon, 30 Jan 2023 12:28:26 +0000</pubDate>
      <link>https://dev.to/aws-builders/handle-sqs-message-failure-in-batch-with-partial-batch-response-feature-429l</link>
      <guid>https://dev.to/aws-builders/handle-sqs-message-failure-in-batch-with-partial-batch-response-feature-429l</guid>
      <description>&lt;p&gt;AWS has announced, “&lt;a href="https://aws.amazon.com/about-aws/whats-new/2021/11/aws-lambda-partial-batch-response-sqs-event-source/"&gt;AWS Lambda now supports partial batch response for SQS as an event source&lt;/a&gt;”. Before we go through, how it works let understand that how SQS messages were handled before, and then we will go through how the partial batch response feature will add value.&lt;/p&gt;

&lt;h3&gt;
  
  
  1st scenario
&lt;/h3&gt;

&lt;p&gt;Assumption: The SQS trigger is configured for the lambda function. The exception is not handled in case of any message processing failure for a given batch.&lt;/p&gt;

&lt;p&gt;The lambda function is triggered with a batch of 5 messages. If it fails to process any of the messages and throws an error, all 5 messages are kept on the queue to be reprocessed after the visibility timeout period. In this case, either the batch is processed completely successfully and all messages are deleted from the SQS queue, or it fails completely and the whole batch is returned to the queue for reprocessing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frj28aluja5e8tykht6c8.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frj28aluja5e8tykht6c8.gif" alt="1st scenario" width="720" height="405"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3

def lambda_handler(event, context):
    region_name = os.environ['AWS_REGION']
    if event:
        for record in event['Records']:
            body = record['body']
            # process message
    return {
        'statusCode': 200,
        'body': json.dumps('Message processed successfully!')
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Re-processing already processed messages is wasteful, so to avoid it to some extent, let's delete each message from the queue as soon as it is processed.&lt;/p&gt;

&lt;h3&gt;
  
  
  2nd scenario
&lt;/h3&gt;

&lt;p&gt;Assumption: The SQS trigger is configured for the lambda function. The exception is not handled in case of any message processing failure, but delete functionality is added so that each message is deleted after it is processed successfully.&lt;/p&gt;

&lt;p&gt;In this case, let’s say the first 2 messages in the batch are processed successfully and deleted. The 3rd message fails and lambda returns an error, so the 3rd, 4th &amp;amp; 5th messages are set for a retry, while the 1st &amp;amp; 2nd messages have already been processed and deleted from the queue. Hence, already processed messages will not be reprocessed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0foir7tiz3wsrz6tajrr.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0foir7tiz3wsrz6tajrr.gif" alt="2nd scenario" width="" height=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import json
import boto3

def lambda_handler(event, context):
    region_name = os.environ['AWS_REGION']
    if event:
        sqs = boto3.client('sqs', region_name=region_name)
        queue_name = event['Records'][0]['eventSourceARN'].split(':')[-1]
        queue_url = sqs.get_queue_url(
                    QueueName=queue_name,
                )

        for record in event['Records']:
            body = record['body']
            print(body)
            # process message
            response = sqs.delete_message(
                        QueueUrl=queue_url['QueueUrl'],
                        ReceiptHandle=record['receiptHandle']
                    )

    return {
        'statusCode': 200,
        'body': json.dumps('Message processed successfully!')
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lambda failed to process the 3rd message in the batch, which interrupted the processing of the remaining messages. But suppose we want all messages to be attempted even if one fails, with successfully processed messages deleted and only the failed message retried.&lt;/p&gt;

&lt;h3&gt;
  
  
  3rd scenario
&lt;/h3&gt;

&lt;p&gt;Assumption: The SQS trigger is configured for the lambda function. Exception handling for failed messages is configured on top of delete functionality.&lt;/p&gt;

&lt;p&gt;Here, we will maintain a flag to track whether any message fails. Let’s say the 3rd message fails again. Since we now have error handling in place, the failure is caught and the rest of the messages in the batch are processed and then deleted. Finally, a manual exception is raised based on the flag, which causes only the failed message to be retried, since the rest of the messages were processed and deleted successfully. Even in this scenario, however, we cannot control exactly which messages lambda retries; if we want that control, the partial batch response feature is the answer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6yyz5br95d4vns22his2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6yyz5br95d4vns22his2.gif" alt="3rd scenario" width="" height=""&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import json
import boto3

def lambda_handler(event, context):
    region_name = os.environ['AWS_REGION']
    if event:
        sqs = boto3.resource('sqs', region_name=region_name)
        queue_name = event['Records'][0]['eventSourceARN'].split(':')[-1]
        queue = sqs.get_queue_by_name(QueueName=queue_name)
        failed_flag = False
        messages_to_delete = []
        for record in event['Records']:
            try:
                body = record['body']
                # process message
                messages_to_delete.append({
                    'Id': record['messageId'],
                    'ReceiptHandle': record['receiptHandle']
                })
            except RuntimeError as e:
                failed_flag = True

        if messages_to_delete:
            delete_response = queue.delete_messages(
                    Entries=messages_to_delete)
        if failed_flag:
            raise RuntimeError('Failed to process messages')

    return {
        'statusCode': 200,
        'body': json.dumps('Messages processed successfully!')
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All three scenarios showed how lambda could handle messages, depending on the requirements, before the partial batch response feature was introduced.&lt;/p&gt;

&lt;h3&gt;
  
  
  4th Scenario
&lt;/h3&gt;

&lt;p&gt;With the partial batch response feature, a lambda function can identify the failed messages in a batch and return only those messages to the queue, so that only the failed (or explicitly flagged) messages are reprocessed. This makes SQS queue processing more efficient: it removes the need for repetitive data transfer, increases throughput, and improves processing performance. On top of that, it does not come at any additional cost beyond the standard price.&lt;/p&gt;

&lt;p&gt;While using this feature, exception handling must be in place, and the lambda function has to return the message IDs of the messages that require reprocessing in the specific format given below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsx8ak4ha1p9vrqsv98uh.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsx8ak4ha1p9vrqsv98uh.gif" alt="4th scenario" width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To enable and handle partial batch failure, check the &lt;em&gt;Report batch item failures&lt;/em&gt; option under &lt;em&gt;Additional settings&lt;/em&gt; while adding the SQS trigger.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30odz92tfdk5dq8598xf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30odz92tfdk5dq8598xf.png" alt="SQS trigger config screen" width="720" height="749"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the SQS event source configuration, the response returned by the function code must follow the format below for partial batch failure reporting to work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{    
    "batchItemFailures": [          
        {             
            "itemIdentifier": "id2"         
        },         
        {             
            "itemIdentifier": "id4"         
        }     
    ] 
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
import json
import boto3

def lambda_handler(event, context):
    if event:
        messages_to_reprocess = []
        batch_failure_response = {}
        for record in event["Records"]:
            try:
                body = record["body"]
                # process message
            except Exception as e:
                messages_to_reprocess.append({"itemIdentifier": record['messageId']})

        batch_failure_response["batchItemFailures"] = messages_to_reprocess
        return batch_failure_response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
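&lt;p&gt;&lt;em&gt;A small local sketch (not from the original article) showing the shape of the partial batch response: the message-processing step is injected as a callable so the handler logic can be exercised without SQS. The event below contains only the fields the handler reads, with made-up message IDs.&lt;/em&gt;&lt;/p&gt;

```python
def handle_batch(event, process):
    """Same logic as the handler above: collect the IDs of failed
    messages and report them back under batchItemFailures."""
    messages_to_reprocess = []
    for record in event["Records"]:
        try:
            process(record["body"])
        except Exception:
            messages_to_reprocess.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": messages_to_reprocess}

# Hand-built SQS-style event with 3 messages, where one body fails:
event = {"Records": [
    {"messageId": "id1", "body": "ok"},
    {"messageId": "id2", "body": "bad"},
    {"messageId": "id3", "body": "ok"},
]}

def process(body):
    if body == "bad":
        raise RuntimeError("processing failed")

print(handle_batch(event, process))
# {'batchItemFailures': [{'itemIdentifier': 'id2'}]}
```

Only the message with ID &lt;strong&gt;id2&lt;/strong&gt; would be returned to the queue for a retry; the other two are considered successfully processed.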



&lt;p&gt;For a detailed walkthrough, please refer to the video below.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/fq1r6AMjLA0"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommendation
&lt;/h3&gt;

&lt;p&gt;For most situations/scenarios, adopting the implementation from the 4th scenario is beneficial: it offers efficient &amp;amp; fast processing and reduced repetitive data transfer, and hence increased throughput.&lt;/p&gt;

&lt;p&gt;If you have any questions, comments, or feedback then please leave them below. &lt;a href="https://www.youtube.com/srcecde?sub_confirmation=1"&gt;Subscribe to my channel&lt;/a&gt; for more.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>sqs</category>
      <category>tutorial</category>
      <category>feature</category>
    </item>
    <item>
      <title>How to map static outbound IP with the AWS Lambda function</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Sun, 29 Jan 2023 13:00:50 +0000</pubDate>
      <link>https://dev.to/aws-builders/how-to-map-static-outbound-ip-with-the-aws-lambda-function-3g11</link>
      <guid>https://dev.to/aws-builders/how-to-map-static-outbound-ip-with-the-aws-lambda-function-3g11</guid>
      <description>&lt;p&gt;In this article, I will share how to map elastic IP and make sure that, all the outgoing requests from the lambda function have the same IP address/origin.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario
&lt;/h3&gt;

&lt;p&gt;There is an external API that only accepts requests originating from specific IP addresses, which are whitelisted as part of the external API's policy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyiwau597xmvn5i612oke.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyiwau597xmvn5i612oke.png" alt="Scenario" width="551" height="181"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, we want to invoke/access the external API from the lambda function but there is a problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem
&lt;/h3&gt;

&lt;p&gt;The problem is that the AWS Lambda function does not guarantee the same outbound IP address for each invocation. Lambda runs on containers in an AWS-managed VPC, and a container may run in different execution environments over time. Different environments have different IP addresses, so whitelisting a Lambda outbound IP with the external API is not possible, since it is bound to change.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ui7hrfvcn3m7o6wzjb6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ui7hrfvcn3m7o6wzjb6.png" alt="Whitelist IP address flow" width="651" height="161"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So how can we assign a static IP to the lambda function and make sure that all requests made from it originate from the same IP address?&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;The solution is to route all outgoing traffic from the lambda function through a static (Elastic) IP by placing the function in a VPC. However, as soon as you place the lambda function in a VPC, it loses internet connectivity, and we need to invoke the external API over the internet. To restore internet connectivity, a NAT Gateway and an Internet Gateway are also required.&lt;/p&gt;

&lt;h3&gt;
  
  
  High-level steps
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Create VPC with Public &amp;amp; Private subnets&lt;/li&gt;
&lt;li&gt;Create NAT Gateway under Public subnet with Elastic IP&lt;/li&gt;
&lt;li&gt;Create Internet Gateway&lt;/li&gt;
&lt;li&gt;Map the NAT Gateway in the private subnets' route table (Destination: 0.0.0.0/0, Target: nat-gateway)&lt;/li&gt;
&lt;li&gt;Map the Internet Gateway in the public subnet's route table (Destination: 0.0.0.0/0, Target: internet-gateway)&lt;/li&gt;
&lt;li&gt;Place lambda function in VPC under Private subnet&lt;/li&gt;
&lt;/ul&gt;
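&lt;p&gt;&lt;em&gt;A hedged sketch (not from the original article) of the two route-table mappings above, expressed with boto3. The route parameters are built by a pure helper; the actual AWS call only happens inside &lt;code&gt;add_route&lt;/code&gt;, and all IDs shown are placeholders.&lt;/em&gt;&lt;/p&gt;

```python
def route_for(destination_cidr, *, nat_gateway_id=None, gateway_id=None):
    """Build create_route parameters: private subnets send 0.0.0.0/0
    to the NAT gateway, public subnets to the internet gateway."""
    route = {"DestinationCidrBlock": destination_cidr}
    if nat_gateway_id:
        route["NatGatewayId"] = nat_gateway_id
    if gateway_id:
        route["GatewayId"] = gateway_id
    return route

def add_route(route_table_id, route):
    # boto3 is assumed to be available where this actually runs;
    # it is imported lazily so the helper above stays testable offline.
    import boto3
    ec2 = boto3.client("ec2")
    return ec2.create_route(RouteTableId=route_table_id, **route)

# Private subnet route table -> NAT gateway (placeholder IDs):
private_route = route_for("0.0.0.0/0", nat_gateway_id="nat-0123456789abcdef0")
# Public subnet route table -> internet gateway (placeholder IDs):
public_route = route_for("0.0.0.0/0", gateway_id="igw-0123456789abcdef0")
```

If you create the VPC through the “VPC and more” wizard as described below, these routes are added for you; the sketch only makes the mappings explicit.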

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprzfr37e45afgmhjj0c6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprzfr37e45afgmhjj0c6.png" alt="High-level solution" width="800" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Create &amp;amp; Configure VPC
&lt;/h3&gt;

&lt;p&gt;To get started, navigate to the VPC Management Console and select the VPC and more option to create &amp;amp; configure the VPC, subnets, AZs, NAT Gateway &amp;amp; Internet Gateway on a single screen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzet9hctiixxps4ia8raq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzet9hctiixxps4ia8raq.png" alt="Create VPC" width="800" height="584"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configure the asked details as per your requirement. My configuration is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name tag auto-generation:&lt;/strong&gt; &lt;em&gt;lambda-elastic-ip&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IPv4 CIDR block:&lt;/strong&gt; &lt;em&gt;10.100.0.0/16&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IPv6 CIDR block:&lt;/strong&gt; &lt;em&gt;No IPv6 CIDR block&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tenancy:&lt;/strong&gt; &lt;em&gt;Default&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Number of Availability Zones (AZs):&lt;/strong&gt; &lt;em&gt;2&lt;/em&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First availability zone:&lt;/strong&gt; &lt;em&gt;us-east-1a&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Second availability zone:&lt;/strong&gt; &lt;em&gt;us-east-1b&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Number of public subnets:&lt;/strong&gt; &lt;em&gt;2&lt;/em&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Public subnet CIDR block in us-east-1a:&lt;/strong&gt; &lt;em&gt;10.100.0.0/20&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public subnet CIDR block in us-east-1b:&lt;/strong&gt; &lt;em&gt;10.100.16.0/20&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Number of private subnets:&lt;/strong&gt; &lt;em&gt;2&lt;/em&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Private subnet CIDR block in us-east-1a:&lt;/strong&gt; &lt;em&gt;10.100.128.0/20&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private subnet CIDR block in us-east-1b:&lt;/strong&gt; &lt;em&gt;10.100.144.0/20&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NAT gateways:&lt;/strong&gt; &lt;em&gt;In 1 AZ&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC endpoints:&lt;/strong&gt; &lt;em&gt;None&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;

&lt;p&gt;&lt;strong&gt;DNS options&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;✓ Enable DNS hostnames&lt;br&gt;
✓ Enable DNS resolution&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0akrrrqhlxydm7nz9nqc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0akrrrqhlxydm7nz9nqc.png" alt="VPC Setup" width="800" height="208"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above VPC setup will take care of adding necessary routes (NAT &amp;amp; IG mappings) in the respective route tables.&lt;/p&gt;
&lt;h3&gt;
  
  
  Create &amp;amp; configure lambda function
&lt;/h3&gt;

&lt;p&gt;Navigate to the Lambda Management Console and create a lambda function with Python 3.9 as the runtime. After creation, open the IAM role attached to the lambda function and add the &lt;strong&gt;AWSLambdaVPCAccessExecutionRole&lt;/strong&gt; policy to it.&lt;/p&gt;

&lt;p&gt;As a next step, add the snippet below as the function code for testing purposes. It will make a GET request to the API endpoint specified in &lt;strong&gt;API_URL&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import urllib3
def lambda_handler(event, context):
    http = urllib3.PoolManager()
    API_URL = "YOUR_API_URL"
    response = http.request("GET", API_URL)
    print(response.data)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Srce Cde!')
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Note: Replace &lt;strong&gt;YOUR_API_URL&lt;/strong&gt; with the actual API endpoint.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As a next step, place the lambda function in VPC under private subnets as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9aupvr646b0un0c2io94.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9aupvr646b0un0c2io94.png" alt="VPC-Lambda Setup" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing
&lt;/h3&gt;

&lt;p&gt;For testing, I have an external REST API (API Gateway) where I can control which requests to accept based on the origin (IP address). If you want a similar setup, you can follow my tutorial video on &lt;a href="https://youtu.be/2ExmCFY2VqQ"&gt;How to whitelist IP address to access API Gateway | REST&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Execution log of the lambda function before adding/whitelisting the Elastic IP address of the lambda function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12j1lw2pdms185rs1w58.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12j1lw2pdms185rs1w58.png" alt="CloudWatch log1" width="800" height="105"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Execution log of the lambda function after adding/whitelisting the Elastic IP address of the lambda function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mpbp2lqf4zz3qorylg3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mpbp2lqf4zz3qorylg3.png" alt="CloudWatch log2" width="800" height="129"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Response “&lt;strong&gt;Successful Invocation!&lt;/strong&gt;” is returned from the external API.&lt;/p&gt;

&lt;p&gt;Finally, we have successfully mapped the Elastic IP address to the AWS Lambda function.&lt;/p&gt;

&lt;p&gt;For a detailed step-by-step tutorial and implementation of the above setup, you can refer to the below video.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/h2H32_4TvPo"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you have any questions, comments, or feedback then please leave them below. &lt;a href="https://www.youtube.com/srcecde?sub_confirmation=1"&gt;Subscribe to my channel&lt;/a&gt; for more.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Send notification based on CloudWatch logs filter patterns</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Fri, 27 Jan 2023 09:44:52 +0000</pubDate>
      <link>https://dev.to/aws-builders/send-notification-based-on-cloudwatch-logs-filter-patterns-3cg5</link>
      <guid>https://dev.to/aws-builders/send-notification-based-on-cloudwatch-logs-filter-patterns-3cg5</guid>
      <description>&lt;p&gt;Multiple resources are deployed and we want a mechanism to get notified when any unusual things start to appear in logs, which as a developer we do not want to see. To get notified, we need to parse the logs to track certain Keywords, Errors, and Critical information which we think are red flags. And in this article, I will share how to send an email notification when certain Keywords, Errors appear in the CloudWatch logs.&lt;/p&gt;

&lt;p&gt;For the purpose of this tutorial, we will consider a lambda function that we want to track, so that we are notified of any errors or keywords. To track the logs, we will use a CloudWatch Logs subscription filter. The subscription filter will analyze the logs for certain errors or keywords using filter patterns (we will define them in the latter part of this article) and, on a match, invoke the destination resource (in our case, the Error Handler lambda function).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe73k2b1merpge8supmnb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe73k2b1merpge8supmnb.png" alt="Workflow" width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The above is the architecture that we will implement. Here, we have a number of lambda functions for which we want to get notified in case of any errors. The lambda functions will push logs to CloudWatch, and on the log group we will have a subscription filter along with filter patterns. It will look for keywords like ERROR or INFO in the logs and, if any of the keywords exist, it will invoke the Error Handler lambda function. The Error Handler lambda function will decode, decompress &amp;amp; parse the log data and publish a message to the SNS topic, which will further publish the message to its subscribers (in our case, an email subscriber).&lt;/p&gt;

&lt;p&gt;Here, we will start with the creation of the SNS topic along with the email subscription (to which the notification will be sent). After that, create a dummy lambda function, which will act as an error/keyword-producing function. As a next step, create the Error Handler lambda function, followed by the configuration of the CloudWatch Logs trigger (where we will define the filter patterns). Finally, update the code base and add the necessary permission to publish a message from the lambda function.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-on
&lt;/h2&gt;

&lt;h3&gt;
  
  
  SNS
&lt;/h3&gt;

&lt;p&gt;Create a standard SNS topic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7nc01o84ic10he1l5umq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7nc01o84ic10he1l5umq.png" alt="SNS topic" width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open the SNS topic that you have created and create an &lt;strong&gt;Email&lt;/strong&gt; subscription.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78613tmke1qfkb4jevoz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78613tmke1qfkb4jevoz.png" alt="Topic subscription" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After creating the subscription, you will receive the confirmation email. So, make sure to click on confirm subscription to receive email notifications.&lt;/p&gt;
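&lt;p&gt;If you prefer the CLI, the same topic and subscription can be created as sketched below. The topic name matches the one used in the IAM policy later in this article; the region, account ID and email address are placeholders you must replace.&lt;/p&gt;

```shell
# Create a standard SNS topic (prints the topic ARN)
aws sns create-topic --name publish-notification

# Subscribe an email endpoint to the topic; a confirmation mail is sent
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:publish-notification \
  --protocol email \
  --notification-endpoint you@example.com
```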

&lt;h3&gt;
  
  
  Lambda functions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Error/Keyword producer lambda function&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create the error/keyword producer lambda function with an appropriate name &amp;amp; Python 3.9 as runtime. The purpose of this lambda function is to emit certain keywords in the logs, which we want to track to validate the end-to-end flow. After creation, update its code with the snippet below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
def lambda_handler(event, context):
    logger.info("Sample INFO log")
    logger.error("Sample ERROR log")
    logger.critical("High LATENCY")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it once to check if the lambda function is running without errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error Handler lambda function&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create another lambda function with a relevant name &amp;amp; Python 3.9 as runtime. The purpose of this lambda function is to parse the error logs and publish a message to SNS.&lt;/p&gt;

&lt;p&gt;As a next step, update the source code of the function from here and deploy: &lt;a href="https://github.com/srcecde/aws-tutorial-code/blob/master/lambda/lambda_proces_cw_error_notification.py" rel="noopener noreferrer"&gt;https://github.com/srcecde/aws-tutorial-code/blob/master/lambda/lambda_proces_cw_error_notification.py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After deployment, add the environment variable SNS_TOPIC_ARN with the ARN of the SNS topic created earlier as its value.&lt;/p&gt;
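&lt;p&gt;Under the hood, the Error Handler follows the standard pattern for subscription-filter events: CloudWatch Logs delivers the payload base64-encoded and gzip-compressed. Below is a minimal sketch of that logic, not the linked repository code verbatim; the helper name &lt;em&gt;decode_awslogs_event&lt;/em&gt;, the subject line and the message format are my own choices.&lt;/p&gt;

```python
import base64
import gzip
import json
import os


def decode_awslogs_event(event):
    # CloudWatch Logs subscription filters deliver the data
    # base64-encoded and gzip-compressed under event["awslogs"]["data"]
    raw = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(raw))


def lambda_handler(event, context):
    log_data = decode_awslogs_event(event)
    messages = [e["message"] for e in log_data.get("logEvents", [])]
    body = "Log group: {}\n\n{}".format(log_data.get("logGroup"), "\n".join(messages))

    import boto3  # available in the Lambda runtime
    boto3.client("sns").publish(
        TopicArn=os.environ["SNS_TOPIC_ARN"],
        Subject="CloudWatch filter pattern matched",
        Message=body,
    )
```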

&lt;p&gt;After adding the environment variable, we need to permit the lambda function to publish messages to SNS. So, add the permission to the IAM role of the lambda function (which can be found under Configuration → Permissions → Role name). Create a new policy within IAM (IAM → Policies → Create policy → JSON) and paste the below policy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Please update the policy with your region &amp;amp; account ID.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sns:Publish",
            "Resource": "arn:aws:sns:&amp;lt;region&amp;gt;:&amp;lt;account_id&amp;gt;:publish-notification"
        }
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Attach the policy with the IAM role of the lambda function (&lt;em&gt;IAM → Roles → Open Role → Add permissions → Attach policies → select the policy that you created above → Attach policies&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CloudWatch Trigger | Filter Pattern&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open the Error Handler lambda function, click on Add Trigger, and select CloudWatch Logs from the drop-down.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Under Log Group, select the CloudWatch log group of the error/keyword producer lambda function.&lt;/li&gt;
&lt;li&gt;Enter the Filter name&lt;/li&gt;
&lt;li&gt;Under Filter pattern, enter the filter value pattern that you want to track. For ex: I want to track the logs that contain keywords like ERROR &amp;amp; LATENCY, so I will enter &lt;strong&gt;?ERROR ?LATENCY&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Similarly, you can define different patterns as per your requirements. To learn more about how to define patterns, please refer to the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html" rel="noopener noreferrer"&gt;filter and pattern syntax documentation&lt;/a&gt;&lt;/p&gt;
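&lt;p&gt;For quick reference, a few common filter pattern forms (per the CloudWatch Logs filter and pattern syntax):&lt;/p&gt;

```text
?ERROR ?LATENCY        match events containing ERROR or LATENCY
ERROR LATENCY          match events containing both ERROR and LATENCY
"Exception"            match events containing the exact term Exception
-INFO                  match events that do not contain INFO
{ $.level = "ERROR" }  match JSON events whose level field is ERROR
```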

&lt;p&gt;Post configuration, click on &lt;em&gt;Add&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87xsqeezyp95zissm3vd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87xsqeezyp95zissm3vd.png" alt="Lambda Trigger" width="800" height="609"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We are all set to test this setup. Go to the error/keyword producer lambda function and click on Test. In a while, you will receive an email notification that looks something like the one below, because the logs contain the keywords ERROR &amp;amp; LATENCY.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmi5uxg85heud9526y539.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmi5uxg85heud9526y539.png" alt="Email" width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a detailed step-by-step tutorial and implementation of the above setup, you can refer to the below video.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/7gRMy8KX-1g"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Finally, the CDK code for this use case is available in this repository: &lt;a href="https://github.com/srcecde/aws-cdk-code/tree/main/lambda-cloudwatch-pattern-notification" rel="noopener noreferrer"&gt;https://github.com/srcecde/aws-cdk-code/tree/main/lambda-cloudwatch-pattern-notification&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have any questions, comments, or feedback then please leave them below. &lt;a href="https://www.youtube.com/srcecde?sub_confirmation=1" rel="noopener noreferrer"&gt;Subscribe to my channel&lt;/a&gt; for more.&lt;/p&gt;

</description>
      <category>react</category>
      <category>javascript</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Whitelist IP addresses for Lambda function URLs</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Wed, 25 Jan 2023 04:20:14 +0000</pubDate>
      <link>https://dev.to/aws-builders/whitelist-ip-addresses-for-lambda-function-urls-3f7h</link>
      <guid>https://dev.to/aws-builders/whitelist-ip-addresses-for-lambda-function-urls-3f7h</guid>
      <description>&lt;p&gt;&lt;a href="https://youtu.be/NIk5ut7_5sM" rel="noopener noreferrer"&gt;Lambda function URLs&lt;/a&gt; feature is the recent addition to the AWS Lambda service. With a lambda function URL, one can invoke the lambda function via a unique URL similar to the invocation of any API endpoint with respective methods.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbkw9gqw9zfguka7ewkgv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbkw9gqw9zfguka7ewkgv.png" alt="Whitelist IP address for function URL" width="720" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, we will add the functionality to validate the IP address of incoming requests to the function URL. This lets us serve only the requests originating from whitelisted IP addresses and block the rest, while the Auth type is set to None.&lt;/p&gt;

&lt;p&gt;As of now, we cannot leverage a resource policy to whitelist IP addresses for lambda function URLs, since that feature is not available. So, here we will write a simple Python function to add that functionality as a part of the lambda function code base.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hands-On
&lt;/h2&gt;

&lt;p&gt;Create the lambda function and the function URL for the same.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67yvgg0esakkxz0jo8zw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F67yvgg0esakkxz0jo8zw.png" alt="Lambda function + function URL" width="720" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a next step, update the source code of the function from here and deploy: &lt;a href="https://github.com/srcecde/aws-tutorial-code/blob/master/lambda/lambda_ip_val_func_url.py" rel="noopener noreferrer"&gt;https://github.com/srcecde/aws-tutorial-code/blob/master/lambda/lambda_ip_val_func_url.py&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After deploying the code, add the environment variable &lt;strong&gt;IP_RANGE&lt;/strong&gt; with the list of IP addresses and/or CIDR blocks (for IP ranges) that need to be whitelisted. If you do not add the environment variable, then by default it will return status code 500 with the message &lt;strong&gt;&lt;em&gt;Unauthorized&lt;/em&gt;&lt;/strong&gt; for all incoming requests.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Note&lt;/strong&gt;: The status code and message can be modified within the code.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhe3aupwb820lu5led7i9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhe3aupwb820lu5led7i9.png" alt="Setting ENV variable" width="800" height="141"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The updated lambda function code will check &amp;amp; validate the origin IP address of each request against the whitelisted IP addresses in the &lt;strong&gt;IP_RANGE&lt;/strong&gt; environment variable.&lt;/p&gt;
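&lt;p&gt;The validation logic can be sketched roughly as below. This is a minimal sketch rather than the linked repository code verbatim: the helper name &lt;em&gt;is_allowed&lt;/em&gt; and the response bodies are my own, while the &lt;em&gt;requestContext.http.sourceIp&lt;/em&gt; field is where function URL events carry the caller's IP.&lt;/p&gt;

```python
import ipaddress
import os


def is_allowed(source_ip):
    # IP_RANGE is a comma-separated list of addresses and/or CIDR blocks
    ip_range = os.environ.get("IP_RANGE", "")
    if not ip_range:
        return False  # no whitelist configured: reject everything
    ip = ipaddress.ip_address(source_ip)
    for entry in ip_range.split(","):
        if ip in ipaddress.ip_network(entry.strip(), strict=False):
            return True
    return False


def lambda_handler(event, context):
    # Function URL events carry the caller IP in requestContext.http.sourceIp
    source_ip = event["requestContext"]["http"]["sourceIp"]
    if not is_allowed(source_ip):
        return {"statusCode": 500, "body": "Unauthorized"}
    return {"statusCode": 200, "body": "Hello from Lambda"}
```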

&lt;p&gt;Now, we are all set to test it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;p&gt;For testing, we will use Postman and the setup will look as below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnw1q0zlp0ywn9a3ja6m0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnw1q0zlp0ywn9a3ja6m0.png" alt="GET request" width="720" height="33"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The endpoint will return status code 500 with the message &lt;em&gt;Unauthorized&lt;/em&gt; if the IP address is not whitelisted in the IP_RANGE environment variable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wot0pycporoaz0wcqcq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wot0pycporoaz0wcqcq.png" alt="Result, before whitelisting the IP" width="720" height="92"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After whitelisting the IP address in the IP_RANGE environment variable, the endpoint will return status code 200 with an appropriate response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiibf9atd26ox55hou17b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiibf9atd26ox55hou17b.png" alt="Result, after whitelisting the IP" width="720" height="90"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, this approach has a few disadvantages.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For all invalid calls (invocations from IP addresses that are not whitelisted), the lambda function is still triggered each time, which adds cost for every unwanted call&lt;/li&gt;
&lt;li&gt;For all valid calls (invocations from whitelisted IP addresses), the IP validation logic adds to the execution time, with the associated cost&lt;/li&gt;
&lt;li&gt;Private IP addresses cannot be whitelisted (for ex: private VPC IP ranges)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a detailed end-to-end, step-by-step setup, you can refer to the video below.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/a3jvN9MS3Sw"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you have any questions, comments, or feedback then please leave them below. &lt;a href="https://www.youtube.com/srcecde?sub_confirmation=1" rel="noopener noreferrer"&gt;Subscribe to my channel&lt;/a&gt; for more.&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>react</category>
      <category>frontend</category>
      <category>welcome</category>
    </item>
    <item>
      <title>AWS Glue | CSV to Parquet transformation | Getting started</title>
      <dc:creator>Chirag (Srce Cde)</dc:creator>
      <pubDate>Tue, 24 Jan 2023 04:24:20 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-glue-csv-to-parquet-transformation-getting-started-2j13</link>
      <guid>https://dev.to/aws-builders/aws-glue-csv-to-parquet-transformation-getting-started-2j13</guid>
      <description>&lt;p&gt;AWS Glue is a fully managed serverless ETL service. It makes it easy to discover, transform and load data that would be consumed by various processes and applications. If you want to learn more about AWS Glue then please refer to the video on &lt;a href="https://youtu.be/ZSryMLGMLLw" rel="noopener noreferrer"&gt;AWS Glue Overview&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7oacipe3k061ma84hkeh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7oacipe3k061ma84hkeh.png" alt="Objective (CSV to Parquet)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, we will go through a basic end-to-end CSV to Parquet transformation using AWS Glue. We will use multiple services to implement the solution, such as IAM, S3 and AWS Glue. As a part of AWS Glue, we will use crawlers, the Data Catalog (including databases &amp;amp; tables) and ETL jobs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tk1al1zm1r726my9os6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2tk1al1zm1r726my9os6.png" alt="Workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s understand the above flow.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a crawler, which will connect to the S3 data store&lt;/li&gt;
&lt;li&gt;Post successful connection, it will infer or determine the structure of the CSV file using a built-in classifier&lt;/li&gt;
&lt;li&gt;The crawler will write the metadata in the form of a table in the AWS Glue Data Catalog&lt;/li&gt;
&lt;li&gt;After populating the data catalog, create the ETL job to transform CSV into parquet&lt;/li&gt;
&lt;li&gt;The data source for the ETL job will be the AWS Glue Data Catalog table and as a part of the transformation, we will apply the mappings. Post transformation, the data will be stored in S3 as a parquet file&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Hands-On
&lt;/h2&gt;

&lt;p&gt;Enough of text, let’s jump to the AWS Management Console&lt;/p&gt;

&lt;h3&gt;
  
  
  Create IAM role
&lt;/h3&gt;

&lt;p&gt;As a first step, create the IAM role to provide the necessary permissions to AWS Glue to access various services. (For ex: S3)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;em&gt;IAM Management Console → Roles → Create role&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Select Glue as a service &amp;amp; add &lt;em&gt;AWSGlueServiceRole&lt;/em&gt; permission&lt;/li&gt;
&lt;li&gt;Create the role&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With the above permission, AWS Glue will be able to access S3 buckets whose names contain &lt;em&gt;aws-glue&lt;/em&gt;, per the permissions defined in this managed policy. The policy also grants additional permissions for other services that Glue uses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create S3 Bucket
&lt;/h3&gt;

&lt;p&gt;Create the S3 bucket with a name that contains &lt;em&gt;aws-glue&lt;/em&gt;; otherwise, you will have to modify the policy from the step above.&lt;/p&gt;

&lt;p&gt;Post bucket creation, create the following folder structure&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

s3-bucket
├── data-store/
│   └── annual-reports/
│       └── csv_reports
└── target-data-store/
    └── parquet-reports/


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We will create the database in Glue named &lt;em&gt;annual_reports&lt;/em&gt;, and the table will be named &lt;em&gt;csv_reports&lt;/em&gt;. It is not necessary for the database &amp;amp; table to have the same names as the folders; this naming convention just makes it easier to map things.&lt;/p&gt;

&lt;p&gt;As a next step, upload the CSV file in the &lt;em&gt;csv_reports&lt;/em&gt; folder. I have used the CSV data from &lt;a href="https://stats.govt.nz/large-datasets/csv-files-for-download/" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Create database
&lt;/h3&gt;

&lt;p&gt;Go to &lt;em&gt;AWS Glue Console → Click Databases → Add database&lt;/em&gt; as follows&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjphjd86wiqsfnr270v09.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjphjd86wiqsfnr270v09.png" alt="Create database"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Crawler to populate Table
&lt;/h3&gt;

&lt;p&gt;The table can be created manually or via a crawler. The crawler is a program that connects to the data store, infers the structure/schema using a built-in or custom classifier, and writes the metadata as a table.&lt;/p&gt;

&lt;p&gt;To create a crawler → Click on &lt;em&gt;Crawlers&lt;/em&gt; from the left panel → Configure the details as follows and create the crawler&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8of5zvb0z88k4zv16ml.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8of5zvb0z88k4zv16ml.png" alt="Crawler config"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After creating the crawler, run the crawler and if everything is configured as above, it will populate the table under the selected database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create ETL job
&lt;/h3&gt;

&lt;p&gt;Once the table is populated, create the ETL job to transform &amp;amp; load the CSV file as a parquet file.&lt;/p&gt;

&lt;p&gt;To create an ETL job → Click on &lt;em&gt;Jobs&lt;/em&gt; from the left panel&lt;/p&gt;

&lt;p&gt;To create a job, there are a couple of options; we will select &lt;em&gt;Visual with a blank canvas&lt;/em&gt;. Apart from that, it also allows you to write &amp;amp; upload Python shell scripts and Spark scripts, and offers interactive development with a Jupyter notebook, so you can try those options as well.&lt;/p&gt;

&lt;p&gt;Here we have two side-by-side views, &lt;em&gt;Visual&lt;/em&gt; on the left side and its respective options on the right side.&lt;/p&gt;

&lt;p&gt;Click on Source under Visual and select the data source. In our case, it’s &lt;em&gt;AWS Glue Data Catalog&lt;/em&gt;. Select the node and configure the respective properties on the right side which are &lt;em&gt;Database&lt;/em&gt; &amp;amp; &lt;em&gt;Table&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2m4d1zylw0svwlc3f3h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2m4d1zylw0svwlc3f3h.png" alt="Config Glue Job"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The next step is the transformation: click on &lt;em&gt;Transform&lt;/em&gt; → select &lt;em&gt;Apply Mapping&lt;/em&gt;, or any other transformation based on your requirements. For the purpose of this article, &lt;em&gt;Apply Mapping&lt;/em&gt; is selected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1r9b46n86b5d4mojmuua.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1r9b46n86b5d4mojmuua.png" alt="Config Glue Job1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, click on &lt;em&gt;Target&lt;/em&gt; → select &lt;em&gt;S3&lt;/em&gt; to load the data as a target data store. Configure the properties.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6qe4zvd0870m59b8wpn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6qe4zvd0870m59b8wpn.png" alt="Config Glue Job2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After this configuration, you can check the generated script under the &lt;em&gt;Script&lt;/em&gt; tab. The script can be modified as per the requirement. As a next step, click on &lt;em&gt;Job details&lt;/em&gt; and update the job name and select the &lt;em&gt;IAM&lt;/em&gt; role. Save the job and click on &lt;em&gt;Run&lt;/em&gt;.&lt;/p&gt;
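&lt;p&gt;For orientation, the generated script typically looks roughly like the sketch below. This is a hand-written approximation, not the exact generated code: it only runs inside an AWS Glue job (the &lt;em&gt;awsglue&lt;/em&gt; library is not available locally), the column mappings are hypothetical, and the bucket name is a placeholder.&lt;/p&gt;

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: the Data Catalog table populated by the crawler
source = glue_context.create_dynamic_frame.from_catalog(
    database="annual_reports", table_name="csv_reports"
)

# Transform: rename/cast columns (hypothetical mappings)
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("year", "string", "year", "int"),
        ("industry_name", "string", "industry_name", "string"),
    ],
)

# Target: write the result to S3 as Parquet
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://your-aws-glue-bucket/target-data-store/parquet-reports/"},
    format="parquet",
)
job.commit()
```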

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7ckxochlbx7vndvxvtq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7ckxochlbx7vndvxvtq.png" alt="Job details"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check the job Run details under &lt;em&gt;Runs&lt;/em&gt; and check the CloudWatch logs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwlh0ugcfgeb0p1vf8yt3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwlh0ugcfgeb0p1vf8yt3.png" alt="Job run"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, the CSV to Parquet transformation is successful. Check the output in the &lt;em&gt;S3 Target Location&lt;/em&gt; configured in the above step.&lt;/p&gt;

&lt;p&gt;For a detailed step-by-step tutorial and the implementation of the above use case please refer to the below video.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/nMtvGkSSWRo"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;If you have any questions, comments, or feedback then please leave them below. &lt;a href="https://www.youtube.com/srcecde?sub_confirmation=1" rel="noopener noreferrer"&gt;Subscribe to my channel&lt;/a&gt; for more.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>awsglue</category>
      <category>data</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
