<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Oteng Isaac</title>
    <description>The latest articles on DEV Community by Oteng Isaac (@devoteng1).</description>
    <link>https://dev.to/devoteng1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F302964%2F41d78add-1e6b-46fa-9462-856ad47c5e05.jpeg</url>
      <title>DEV Community: Oteng Isaac</title>
      <link>https://dev.to/devoteng1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/devoteng1"/>
    <language>en</language>
    <item>
      <title>Building a Production-Ready Text-to-Text API with AWS Bedrock, Lambda &amp; API Gateway</title>
      <dc:creator>Oteng Isaac</dc:creator>
      <pubDate>Wed, 31 Dec 2025 00:20:07 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-a-production-ready-text-to-text-api-with-aws-bedrock-lambda-api-gateway-305a</link>
      <guid>https://dev.to/aws-builders/building-a-production-ready-text-to-text-api-with-aws-bedrock-lambda-api-gateway-305a</guid>
      <description>&lt;h2&gt;
  
  
  Building a Production-Ready Text-to-Text API with AWS Bedrock, Lambda &amp;amp; API Gateway
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Project Overview
&lt;/h3&gt;

&lt;p&gt;This project demonstrates how to design and deploy a production-ready text-to-text AI API using AWS Bedrock and Amazon Titan Text, exposed securely via Amazon API Gateway and powered by AWS Lambda.&lt;/p&gt;

&lt;p&gt;The goal is to show how organizations can integrate Generative AI capabilities into real business systems while maintaining security, scalability, cost control, and observability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Business Use Case
&lt;/h3&gt;

&lt;p&gt;Many organizations want to leverage Generative AI for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Internal copilots&lt;/li&gt;
&lt;li&gt;Automated content generation&lt;/li&gt;
&lt;li&gt;Text summarization&lt;/li&gt;
&lt;li&gt;Data explanations&lt;/li&gt;
&lt;li&gt;Customer support automation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;However, directly exposing foundation models to applications can introduce security, cost, and governance risks.&lt;/p&gt;

&lt;p&gt;This project solves that by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Abstracting the foundation model behind a controlled API&lt;/li&gt;
&lt;li&gt;Enforcing consistent prompts and parameters&lt;/li&gt;
&lt;li&gt;Centralizing access, logging, and cost management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a secure AI service layer that can be reused across multiple teams and applications.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1h4v5dthrxricbnc90q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1h4v5dthrxricbnc90q.png" alt=" " width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Flow:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Client sends text input to an API endpoint&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;API Gateway validates and routes the request&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lambda processes the request and invokes AWS Bedrock&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amazon Titan generates a text response&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The response is returned to the client&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  🛠️ Tools &amp;amp; Services Used
&lt;/h3&gt;
&lt;h4&gt;
  
  
  🔹 AWS Bedrock
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Fully managed service for accessing foundation models&lt;/li&gt;
&lt;li&gt;No infrastructure to manage&lt;/li&gt;
&lt;li&gt;Enterprise-grade security&lt;/li&gt;
&lt;li&gt;Pay-per-use pricing&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  🔹 Amazon Titan Text (amazon.titan-text-express-v1)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Fast, cost-efficient text generation model&lt;/li&gt;
&lt;li&gt;Ideal for text-to-text use cases&lt;/li&gt;
&lt;li&gt;Deterministic behavior with low temperature&lt;/li&gt;
&lt;li&gt;Designed for enterprise workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  🔹 AWS Lambda
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Serverless compute for business logic&lt;/li&gt;
&lt;li&gt;Handles request validation and AI invocation&lt;/li&gt;
&lt;li&gt;Scales automatically&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  🔹 Amazon API Gateway
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Securely exposes the AI service as a REST API&lt;/li&gt;
&lt;li&gt;Enables authentication, throttling, and monitoring&lt;/li&gt;
&lt;li&gt;Acts as the public interface for applications&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  🔹 Python (Boto3)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;AWS SDK for invoking Bedrock&lt;/li&gt;
&lt;li&gt;Lightweight and production-friendly&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  🧠 Why This Design Matters
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Stateless AI calls: Foundation models do not retain memory&lt;/li&gt;
&lt;li&gt;Explicit control: Prompts and parameters are centrally managed&lt;/li&gt;
&lt;li&gt;Security-first: IAM-controlled access to Bedrock&lt;/li&gt;
&lt;li&gt;Cost management: Token limits and model choice enforced&lt;/li&gt;
&lt;li&gt;Reusability: Multiple applications can consume the same API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This mirrors how AI platforms are built in regulated and enterprise environments.&lt;/p&gt;
&lt;h4&gt;
  
  
  🧩 AWS Lambda: Text-to-Text Processing Logic
&lt;/h4&gt;

&lt;p&gt;Below is an example AWS Lambda function written in Python that receives text from API Gateway, invokes AWS Bedrock (Amazon Titan Text), and returns the generated response.&lt;/p&gt;

&lt;p&gt;This Lambda acts as the controlled AI service layer between your applications and the foundation model.&lt;/p&gt;
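
&lt;p&gt;Here is a condensed sketch of what such a handler can look like. The &lt;code&gt;{"text": "..."}&lt;/code&gt; request and &lt;code&gt;{"response": "..."}&lt;/code&gt; shapes match the API example later in this post; the parameter values are assumptions on my part, and the complete version lives in the GitHub repo linked below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch only -- parameter values are assumptions; see the GitHub repo for the full handler.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def lambda_handler(event, context):
    # With Lambda proxy integration, API Gateway passes the request body as a string
    body = json.loads(event.get("body") or "{}")
    text = body.get("text", "")
    if not text:
        return {"statusCode": 400, "body": json.dumps({"error": "text is required"})}

    # Invoke Amazon Titan Text Express with centrally managed parameters
    result = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",
        body=json.dumps({
            "inputText": text,
            "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.2}
        })
    )
    output = json.loads(result["body"].read())["results"][0]["outputText"]

    return {"statusCode": 200, "body": json.dumps({"response": output})}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;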

&lt;p&gt;Let's create the function.&lt;/p&gt;

&lt;p&gt;Go to the AWS Management Console and search for AWS Lambda. &lt;br&gt;
Click on &lt;strong&gt;Create function&lt;/strong&gt; to open the function creation page. Enter a name for the function and choose &lt;strong&gt;Python&lt;/strong&gt; as the runtime. Accept all defaults and click on &lt;strong&gt;Create function&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9x1k2qwuz23ahf11tjt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi9x1k2qwuz23ahf11tjt.png" alt=" " width="800" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Replace the code in the code editor with the code shared in the &lt;a href="https://github.com/isaacotengdev/AWS-bedrock-text-to-text-API-" rel="noopener noreferrer"&gt;Github Repo&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F16zkx7jtrgrlaze88hx2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F16zkx7jtrgrlaze88hx2.png" alt=" " width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Increase the timeout to 30 seconds as shown below. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56ewhuec2y57782bz9wp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56ewhuec2y57782bz9wp.png" alt=" " width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  🔐 Required IAM Permissions
&lt;/h4&gt;

&lt;p&gt;The Lambda execution role must allow invoking Bedrock models.&lt;br&gt;
By default, Lambda has access to &lt;strong&gt;CloudWatch&lt;/strong&gt; for log writing; we also need to grant Lambda access to Bedrock and the foundation model.&lt;/p&gt;

&lt;p&gt;Go to &lt;strong&gt;Configuration&lt;/strong&gt;, then &lt;strong&gt;Permissions&lt;/strong&gt;. Click on the &lt;strong&gt;Role name&lt;/strong&gt; and update it with the policy below, which grants Lambda access to Bedrock and the foundation model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjp7u4k3x841o0jbck6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjp7u4k3x841o0jbck6a.png" alt=" " width="800" height="288"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowInvokeTitanText",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-text-express-v1"
      ]
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
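
&lt;p&gt;If you prefer to script this step instead of using the console, the same inline policy can be attached to the execution role with boto3. The role and policy names below are placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowInvokeTitanText",
        "Effect": "Allow",
        "Action": ["bedrock:InvokeModel"],
        "Resource": ["arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-text-express-v1"]
    }]
}

# Attach the policy inline to the Lambda execution role (placeholder names)
iam.put_role_policy(
    RoleName="my-bedrock-lambda-role",
    PolicyName="AllowInvokeTitanText",
    PolicyDocument=json.dumps(policy)
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;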



&lt;h4&gt;
  
  
  🌐 API Gateway Request Example
&lt;/h4&gt;

&lt;p&gt;Let's create the API using AWS API Gateway. In the AWS API Gateway service page click on &lt;strong&gt;Create API&lt;/strong&gt;. Choose &lt;strong&gt;REST API&lt;/strong&gt; as the type and click on &lt;strong&gt;Build&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkc7p32io62tjd6h5opng.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkc7p32io62tjd6h5opng.png" alt=" " width="800" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the resources page, click on &lt;strong&gt;Create resource&lt;/strong&gt;. Give the resource a name and click on &lt;strong&gt;Create resource&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6iu6wce7jqe0thmmsmrg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6iu6wce7jqe0thmmsmrg.png" alt=" " width="800" height="156"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on &lt;strong&gt;Create Method&lt;/strong&gt;. Choose method type as &lt;strong&gt;POST&lt;/strong&gt; and Integration type as &lt;strong&gt;Lambda function&lt;/strong&gt;. Check &lt;strong&gt;Lambda proxy integration&lt;/strong&gt; and select the created &lt;strong&gt;lambda function&lt;/strong&gt;. Click on &lt;strong&gt;Create method&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xiyydx66oq0i3mtd6hy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xiyydx66oq0i3mtd6hy.png" alt=" " width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on &lt;strong&gt;Deploy API&lt;/strong&gt;. Select &lt;strong&gt;&lt;em&gt;New stage&lt;/em&gt;&lt;/strong&gt; and enter a &lt;strong&gt;Stage name&lt;/strong&gt;. Click on &lt;strong&gt;Deploy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g2p2jgp37m2n0ozwv9e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g2p2jgp37m2n0ozwv9e.png" alt=" " width="601" height="519"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the &lt;strong&gt;Stage details page&lt;/strong&gt;, copy the &lt;strong&gt;Invoke URL&lt;/strong&gt;. You can use any API client like &lt;strong&gt;Postman&lt;/strong&gt; to test the API as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qrmribt7fhjwyfw77zz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6qrmribt7fhjwyfw77zz.png" alt=" " width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Test using any API client. In this demonstration, I used Postman, as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
POST https://05q0if5orb.execute-api.us-east-1.amazonaws.com/prod/text
{
    "text": "what is Amazon Bedrock"
}


✅ API Response Example


{
    "response": "\nAmazon Bedrock is the name of AWS’s managed service for managing the underlying infrastructure that powers your intelligent bot. It is a collection of services that you can use to build, deploy, and scale intelligent bots at scale. Amazon Bedrock is a managed service that makes foundation models from leading AI startup and Amazon’s own Titan models available through APIs. For up-to-date information on Amazon Bedrock and how 3P models are approved, endorsed or selected please see the provided documentation and relevant FAQs."
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Forkmz7ett4s7td201h3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Forkmz7ett4s7td201h3y.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;
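
&lt;p&gt;Outside Postman, any application can call the same endpoint. Below is a minimal Python example using only the standard library; it reuses the example invoke URL shown above, which you would replace with your own stage URL.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Calling the deployed endpoint from Python (replace the URL with your own Invoke URL)
import json
import urllib.request

url = "https://05q0if5orb.execute-api.us-east-1.amazonaws.com/prod/text"
payload = json.dumps({"text": "what is Amazon Bedrock"}).encode("utf-8")

req = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;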

&lt;h4&gt;
  
  
  🧠 Why This Lambda Design Matters
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Keeps foundation models behind a secure API&lt;/li&gt;
&lt;li&gt;Enforces consistent parameters (temperature, token limits)&lt;/li&gt;
&lt;li&gt;Prevents direct client access to Bedrock&lt;/li&gt;
&lt;li&gt;Enables logging, monitoring, and governance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This pattern is commonly used to build enterprise AI platforms.&lt;/p&gt;

&lt;h4&gt;
  
  
  📦 Example Use Cases
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Text summarization API&lt;/li&gt;
&lt;li&gt;AI-powered content generation service&lt;/li&gt;
&lt;li&gt;Analytics explanation engine&lt;/li&gt;
&lt;li&gt;Internal AI assistant backend&lt;/li&gt;
&lt;li&gt;Secure GenAI microservice&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>serverless</category>
      <category>aws</category>
      <category>ai</category>
      <category>api</category>
    </item>
    <item>
      <title>🧠 Building a Conversational Chatbot with AWS Bedrock (Amazon Titan)</title>
      <dc:creator>Oteng Isaac</dc:creator>
      <pubDate>Fri, 26 Dec 2025 23:12:24 +0000</pubDate>
      <link>https://dev.to/aws-builders/building-a-conversational-chatbot-with-aws-bedrock-amazon-titan-4kll</link>
      <guid>https://dev.to/aws-builders/building-a-conversational-chatbot-with-aws-bedrock-amazon-titan-4kll</guid>
      <description>&lt;h1&gt;
  
  
  🧠 Building a Conversational Chatbot with AWS Bedrock (Amazon Titan)
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;Large Language Models don’t magically “remember” conversations.&lt;br&gt;&lt;br&gt;
In real-world systems, &lt;strong&gt;conversation state must be explicitly managed&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this project, we build a &lt;strong&gt;deterministic, production-style conversational chatbot&lt;/strong&gt; using:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AWS Bedrock&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Amazon Titan Text (&lt;code&gt;amazon.titan-text-express-v1&lt;/code&gt;)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Python (boto3)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project demonstrates how teams can safely integrate &lt;strong&gt;foundation models into enterprise workflows&lt;/strong&gt; without giving up control, observability, or reproducibility.&lt;/p&gt;
&lt;h2&gt;
  
  
  🔷 What Is AWS Bedrock?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Amazon Bedrock&lt;/strong&gt; is a &lt;strong&gt;fully managed service&lt;/strong&gt; that provides access to multiple&lt;br&gt;&lt;br&gt;
&lt;strong&gt;foundation models (FMs)&lt;/strong&gt; via a single API — without requiring you to manage infrastructure.&lt;/p&gt;

&lt;p&gt;With Bedrock, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Invoke models securely using IAM&lt;/li&gt;
&lt;li&gt;Choose models from different providers&lt;/li&gt;
&lt;li&gt;Keep data within AWS (no model training on your prompts by default)&lt;/li&gt;
&lt;li&gt;Integrate generative AI directly into existing AWS architectures&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  🔑 Key Bedrock Characteristics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Serverless (no infrastructure management)&lt;/li&gt;
&lt;li&gt;Model-agnostic API&lt;/li&gt;
&lt;li&gt;Enterprise-grade security&lt;/li&gt;
&lt;li&gt;Pay-per-use pricing&lt;/li&gt;
&lt;li&gt;Native AWS integration&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  💼 Business Use Cases of AWS Bedrock
&lt;/h2&gt;

&lt;p&gt;AWS Bedrock is designed for &lt;strong&gt;real business workloads&lt;/strong&gt;, not just demos.&lt;/p&gt;
&lt;h3&gt;
  
  
  Common Enterprise Use Cases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; Internal chatbots &amp;amp; AI copilots
&lt;/li&gt;
&lt;li&gt; Document summarization &amp;amp; analysis
&lt;/li&gt;
&lt;li&gt; Automated reporting &amp;amp; insight generation
&lt;/li&gt;
&lt;li&gt; Semantic search over internal data
&lt;/li&gt;
&lt;li&gt; AI-assisted debugging &amp;amp; data quality analysis
&lt;/li&gt;
&lt;li&gt; Analytics narrative generation
&lt;/li&gt;
&lt;li&gt; Customer support automation
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Why Companies Choose Bedrock
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Data &lt;strong&gt;never leaves AWS&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;IAM-controlled access&lt;/li&gt;
&lt;li&gt;Works seamlessly with &lt;strong&gt;S3, Lambda, Glue, Databricks, Redshift&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No lock-in to a single model provider&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  🔶 Amazon Titan Text (&lt;code&gt;amazon.titan-text-express-v1&lt;/code&gt;)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Amazon Titan Text Express&lt;/strong&gt; is a fast, cost-efficient text generation model built by AWS.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Characteristics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Optimized for &lt;strong&gt;low-latency text generation&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Ideal for &lt;strong&gt;chatbots, summarization, and explanations&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Deterministic behavior when temperature is low&lt;/li&gt;
&lt;li&gt;Fully managed and secured by AWS&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  When to Use Titan Text Express
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Conversational assistants&lt;/li&gt;
&lt;li&gt;Structured responses&lt;/li&gt;
&lt;li&gt;Enterprise-safe workloads&lt;/li&gt;
&lt;li&gt;Cost-sensitive applications&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Titan does &lt;strong&gt;not&lt;/strong&gt; manage conversation state — which is why explicit memory handling (as shown in this project) is essential.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  🏗 Architecture Overview
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsr7fg78bcedf8nz1ehg1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsr7fg78bcedf8nz1ehg1.png" alt=" " width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The entire conversation history is sent &lt;strong&gt;on every request&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  🧠 Core Design Decisions
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1️⃣ Explicit Conversation Memory
&lt;/h3&gt;

&lt;p&gt;Amazon Titan does not track sessions.&lt;/p&gt;

&lt;p&gt;We:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store user and assistant messages&lt;/li&gt;
&lt;li&gt;Append them to a history list&lt;/li&gt;
&lt;li&gt;Inject the full history into each prompt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes the system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predictable&lt;/li&gt;
&lt;li&gt;Auditable&lt;/li&gt;
&lt;li&gt;Easy to debug&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  2️⃣ Role-Based Prompt Formatting
&lt;/h3&gt;

&lt;p&gt;Conversation is formatted as:&lt;/p&gt;

&lt;p&gt;User: ...&lt;br&gt;
Assistant: ...&lt;/p&gt;

&lt;p&gt;This significantly improves response quality and consistency.&lt;/p&gt;
&lt;h3&gt;
  
  
  3️⃣ Stop Sequences
&lt;/h3&gt;

&lt;p&gt;We configure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"stopSequences"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"User:"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents the model from hallucinating the next user message.&lt;/p&gt;

&lt;h3&gt;
  
  
  4️⃣ Deterministic Generation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Low temperature&lt;/li&gt;
&lt;li&gt;Explicit assistant cue&lt;/li&gt;
&lt;li&gt;Token limits&lt;/li&gt;
&lt;/ul&gt;
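
&lt;p&gt;The sketch below ties design decisions 1 to 4 together in a single helper. The prompt layout and parameter values are assumptions on my part; the full script lives in the repo linked below.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch only -- prompt layout and parameter values are assumptions; see the repo for the full script.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
history = []  # explicit conversation memory: list of (role, message) pairs

def ask(user_message):
    history.append(("User", user_message))
    # Role-based prompt: replay the full history on every request, then cue the assistant
    prompt = "\n".join(f"{role}: {msg}" for role, msg in history) + "\nAssistant:"
    result = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",
        body=json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 512,
                "temperature": 0.2,          # low temperature keeps replies near-deterministic
                "stopSequences": ["User:"]   # stop before the model invents the next user turn
            }
        })
    )
    reply = json.loads(result["body"].read())["results"][0]["outputText"].strip()
    history.append(("Assistant", reply))
    return reply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;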

&lt;h2&gt;
  
  
  How to Run the Project
&lt;/h2&gt;

&lt;p&gt;Clone the GitHub repo: &lt;a href="https://github.com/isaacotengdev/Amazon_Bedrock_Chatbot" rel="noopener noreferrer"&gt;AWS Bedrock Chatbot (Titan)&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;AWS credentials configured
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Install Dependencies
pip install boto3

Run the Chatbot
python chatbot.py


Type exit to quit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also experiment with other models available on the AWS Bedrock service page.&lt;/p&gt;

&lt;p&gt;On the AWS Bedrock service page, click on &lt;strong&gt;Model Catalog&lt;/strong&gt;. Here you have access to other model providers such as Meta, Anthropic, and Mistral AI, and you can search for models from the same or different providers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F327zm73j0ow90yft3z8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F327zm73j0ow90yft3z8t.png" alt=" " width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Click on a model, read its documentation to understand how to use it in your project, and copy the model's ID.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmfrlox8l9bolahqwujq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffmfrlox8l9bolahqwujq.png" alt=" " width="800" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
    </item>
    <item>
      <title>AWS Glue ETL Jobs: Transform Your Data at Scale</title>
      <dc:creator>Oteng Isaac</dc:creator>
      <pubDate>Sun, 07 Dec 2025 16:18:23 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-glue-etl-jobs-transform-your-data-at-scale-2l4n</link>
      <guid>https://dev.to/aws-builders/aws-glue-etl-jobs-transform-your-data-at-scale-2l4n</guid>
      <description>&lt;h1&gt;
  
  
  AWS Glue ETL Jobs: Transform Your Data at Scale
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://dev.to/devoteng1/data-cataloguing-in-aws-34ob"&gt;First part: AWS Data Cataloguing&lt;/a&gt;&lt;br&gt;
Even though the AWS Glue Crawler creates your Data Catalog automatically, some projects require a transformation step. This is where AWS Glue ETL Jobs come in. Glue ETL allows you to clean, transform, standardize, and enrich your raw datasets using PySpark at scale.&lt;/p&gt;

&lt;p&gt;In this section, we will build a simple but production-ready Glue ETL script that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reads data from the raw S3 bucket using the Data Catalog&lt;/li&gt;
&lt;li&gt;Performs basic cleaning (renaming, casting types, dropping fields)&lt;/li&gt;
&lt;li&gt;Converts it into a structured format (Parquet recommended)&lt;/li&gt;
&lt;li&gt;Writes the output into the Clean Zone in S3&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🏗 Step 1: Create a Glue Job
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Open AWS Glue Console → Click on &lt;strong&gt;ETL Jobs&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fko9gfyd6tk9t1280aw6t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fko9gfyd6tk9t1280aw6t.png" alt=" " width="800" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can start the job creation process from a blank canvas, a notebook, or a script editor, as described below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4foc3b2matgnb7ar76ct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4foc3b2matgnb7ar76ct.png" alt="AWS Glue" width="800" height="109"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual ETL&lt;/strong&gt;&lt;br&gt;
Choose Visual ETL to start with an empty canvas. Use this option when you want to create a job that has multiple data sources or if you want to explore the available data sources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Author using an interactive code notebook&lt;/strong&gt;&lt;br&gt;
Choose Notebook to start with a blank Notebook to create jobs in Python using the Spark or Ray kernel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Author code with a script editor&lt;/strong&gt;&lt;br&gt;
Choose Script editor to start with only Python boilerplate text added to your job script, or to upload your own script. If you choose to upload your own script, you can select only Python files or files with the extension .scala from your local file system. Use this option if you have a job script you want to import into AWS Glue Studio, or if you prefer writing your own ETL job.&lt;br&gt;
In this demonstration, I will choose the &lt;strong&gt;Script editor&lt;/strong&gt;.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click &lt;strong&gt;Script editor&lt;/strong&gt;, select &lt;strong&gt;Spark&lt;/strong&gt; as the engine and &lt;strong&gt;Start fresh&lt;/strong&gt; as the option, then click &lt;strong&gt;Create script&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezcaisfyfxqq8qfr98tu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fezcaisfyfxqq8qfr98tu.png" alt=" " width="800" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the &lt;strong&gt;Script editor&lt;/strong&gt; replace the default code with the code below&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1349qvnfbuucuigwk4g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1349qvnfbuucuigwk4g.png" alt=" " width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  💻 Glue ETL Job Script
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#Imports Python's system module to access command-line arguments.
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;

&lt;span class="c1"&gt;#Imports all AWS Glue transformation functions (though none are explicitly used in this script).
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.transforms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;

&lt;span class="c1"&gt;#Imports utility to parse job parameters passed to the Glue job.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;getResolvedOptions&lt;/span&gt;

&lt;span class="c1"&gt;#Imports GlueContext, which wraps SparkContext and provides Glue-specific functionality.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GlueContext&lt;/span&gt;

&lt;span class="c1"&gt;#Imports Job class for managing Glue job lifecycle and bookmarking.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.job&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;

&lt;span class="c1"&gt;#Imports DynamicFrame, Glue's data structure that handles schema variations better than Spark DataFrames.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.dynamicframe&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DynamicFrame&lt;/span&gt;

&lt;span class="c1"&gt;#Imports SparkContext, the entry point for Spark functionality.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SparkContext&lt;/span&gt;

&lt;span class="c1"&gt;#Imports PySpark SQL functions for data transformations.
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql.functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;when&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_timestamp&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ---------------------------------------------------------------------------------
# Initialize Glue Job
# ---------------------------------------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getResolvedOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JOB_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;sc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SparkContext&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;glueContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GlueContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;spark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spark_session&lt;/span&gt;
&lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JOB_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ---------------------------------------------------------------------------------
# Read raw CSV data from S3 using the Glue Data Catalog table
# ---------------------------------------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;raw_dyf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_dynamic_frame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_catalog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;database&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders_db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medallion_orders_2025_12_17&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# update with your catalog table name
&lt;/span&gt;    &lt;span class="n"&gt;transformation_ctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw_dyf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ---------------------------------------------------------------------------------
# Column Standardization
# ---------------------------------------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_dyf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toDF&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Standardize column names (Spark-friendly)
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toDF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# ---------------------------------------------------------------------------------
# Clean &amp;amp; Transform Data
# ---------------------------------------------------------------------------------
&lt;/span&gt;
&lt;span class="c1"&gt;# Trim whitespace
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;column&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="c1"&gt;# Convert datatypes
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;to_timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yyyy-MM-dd HH:mm:ss&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;int&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Remove invalid rows
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isNotNull&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;customer_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isNotNull&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Fix negative values (replace with null or filter)
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;when&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;otherwise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="c1"&gt;# Create derived columns
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_price_in_USD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;total_amount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Remove duplicates
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropDuplicates&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# ---------------------------------------------------------------------------------
# Convert back to DynamicFrame
# ---------------------------------------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;final_dyf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DynamicFrame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromDF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_dyf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ---------------------------------------------------------------------------------
# Write to Clean S3 Zone (Partitioned)
# ---------------------------------------------------------------------------------
&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://medallion-orders-2025-12-17/clean/orders/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# update with your path
&lt;/span&gt;
&lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write_dynamic_frame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;final_dyf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;connection_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;connection_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;partitionKeys&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;transformation_ctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;datasink&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure you replace the S3 destination path and the AWS Glue database and table names with your own.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on &lt;strong&gt;Job details&lt;/strong&gt;. In the &lt;strong&gt;Name&lt;/strong&gt; field, enter a name for the job. Choose an &lt;strong&gt;IAM role&lt;/strong&gt; that has access to the data sources (S3 in this case). Leave all other defaults and click on &lt;strong&gt;Save&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwkuitouiqs3vv5eeuox.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwkuitouiqs3vv5eeuox.png" alt=" " width="800" height="565"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Save the script and click on &lt;strong&gt;Run&lt;/strong&gt; to start the job&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5br7wfaoxsg58687exb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5br7wfaoxsg58687exb.png" alt=" " width="800" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The screenshot below shows the job running&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fol78syvgvp444u2mgrj2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fol78syvgvp444u2mgrj2.png" alt=" " width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The screenshot below shows the output folder &lt;strong&gt;clean&lt;/strong&gt; created by the job, alongside the input folder &lt;strong&gt;raw&lt;/strong&gt; that contained the source data
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5lb6uu85hupv2f3ghlf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5lb6uu85hupv2f3ghlf.png" alt=" " width="800" height="110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🧼 What This Script Actually Does
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1️⃣ Ingestion
&lt;/h3&gt;

&lt;p&gt;Reads the raw CSV using the Data Catalog entry created by the crawler.&lt;/p&gt;

&lt;h3&gt;
  
  
  2️⃣ Cleaning
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Renames inconsistent column names&lt;/li&gt;
&lt;li&gt;Drops irrelevant fields&lt;/li&gt;
&lt;li&gt;Converts data types&lt;/li&gt;
&lt;li&gt;Normalizes the schema&lt;/li&gt;
&lt;li&gt;Removes duplicates&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3️⃣ Writing to Clean Zone
&lt;/h3&gt;

&lt;p&gt;Outputs the cleaned, structured dataset to an S3 Clean Bucket in Parquet format, ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Athena&lt;/li&gt;
&lt;li&gt;Redshift Spectrum&lt;/li&gt;
&lt;li&gt;Quicksight&lt;/li&gt;
&lt;li&gt;Machine learning workflows&lt;/li&gt;
&lt;/ul&gt;
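
&lt;p&gt;As one illustration of consuming the clean zone, here is a small boto3 sketch that runs an Athena query against it. It assumes the clean Parquet data has already been crawled into a catalog table; the &lt;code&gt;orders_clean&lt;/code&gt; table name and the results location below are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Start a query against the (hypothetical) table crawled from the clean zone
run = athena.start_query_execution(
    QueryString="SELECT order_status, COUNT(*) AS orders FROM orders_clean GROUP BY order_status",
    QueryExecutionContext={"Database": "orders_db"},
    ResultConfiguration={"OutputLocation": "s3://medallion-orders-2025-12-17/athena-results/"}
)

print("QueryExecutionId:", run["QueryExecutionId"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;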

&lt;p&gt;&lt;a href="https://github.com/isaacotengdev/AWS_GLUE_ETL_PIPELINE" rel="noopener noreferrer"&gt;Github Repo&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dataengineering</category>
      <category>etl</category>
      <category>awsbigdata</category>
    </item>
    <item>
      <title>Data Cataloguing in AWS</title>
      <dc:creator>Oteng Isaac</dc:creator>
      <pubDate>Wed, 03 Dec 2025 13:44:41 +0000</pubDate>
      <link>https://dev.to/aws-builders/data-cataloguing-in-aws-34ob</link>
      <guid>https://dev.to/aws-builders/data-cataloguing-in-aws-34ob</guid>
      <description>&lt;h1&gt;
  
  
  AWS Data Cataloguing
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Cataloguing Data in AWS Using Glue Crawlers: A Practical Guide for Data Engineers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;In modern data engineering, one of the most overlooked but powerful capabilities is data cataloguing. Without a clear understanding of what data exists, where it lives, its schema, and how it changes over time, no ETL architecture can scale. In this guide, I walk through how to catalogue data using AWS Glue Crawlers, and how to structure your metadata layer when working with raw and cleaned datasets stored in Amazon S3.&lt;/p&gt;

&lt;p&gt;This tutorial uses a simple CSV file in an S3 raw bucket and walks through how AWS Glue automatically discovers its structure and builds a searchable, query-ready data catalog. You can replicate every step in your own AWS Console; screenshots are included throughout to keep the walkthrough visual and practical.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Data Cataloguing?
&lt;/h2&gt;

&lt;p&gt;Data cataloguing is the process of creating a structured inventory of all your data assets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A good data catalog contains:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dataset name&lt;/li&gt;
&lt;li&gt;Schema (columns, data types, partitions)&lt;/li&gt;
&lt;li&gt;Location (e.g., S3 path)&lt;/li&gt;
&lt;li&gt;Metadata (size, owner, last updated)&lt;/li&gt;
&lt;li&gt;Tags, classifications, lineage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it as the "index" of your data ecosystem - similar to how a library catalog helps readers find books quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Makes data discoverable across teams&lt;/li&gt;
&lt;li&gt;Reduces manual documentation&lt;/li&gt;
&lt;li&gt;Ensures schema consistency across pipelines&lt;/li&gt;
&lt;li&gt;Enables data validation and quality checks&lt;/li&gt;
&lt;li&gt;Fuels self-service analytics&lt;/li&gt;
&lt;li&gt;Supports governance and compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data Cataloguing in ETL Pipelines
&lt;/h2&gt;

&lt;p&gt;ETL pipelines depend heavily on metadata. Before transforming any dataset, the pipeline must understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What columns exist&lt;/li&gt;
&lt;li&gt;Which data types to enforce&lt;/li&gt;
&lt;li&gt;What partitions to use&lt;/li&gt;
&lt;li&gt;What schema evolution has happened&lt;/li&gt;
&lt;li&gt;How to map raw → cleaned → curated layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A strong data catalog ensures that:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ETL jobs run reliably&lt;/li&gt;
&lt;li&gt;Glue/Spark scripts do not break due to schema drift&lt;/li&gt;
&lt;li&gt;Downstream BI tools (Athena, QuickSight, Superset, Power BI) can read data instantly&lt;/li&gt;
&lt;li&gt;Data lineage and documentation stay updated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS Glue Data Catalog acts as the central metadata store for all your structured and semi-structured data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Below is the architecture we'll demonstrate:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flw1spq7hh10jj2jmb8at.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flw1spq7hh10jj2jmb8at.png" alt=" " width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The project walkthrough will show how Glue Crawlers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scan an S3 bucket&lt;/li&gt;
&lt;li&gt;Detect the schema (headers, types, formatting)&lt;/li&gt;
&lt;li&gt;Generate metadata&lt;/li&gt;
&lt;li&gt;Store the metadata as a table in the Data Catalog&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This metadata is then queryable through Amazon Athena, interoperable with Glue ETL Jobs, and usable by analytics tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Amazon S3, AWS Glue Crawler, and the Glue Data Catalog
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Amazon S3 (Simple Storage Service)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Amazon S3 is a fully managed object storage service that allows you to store any type of data at scale—CSV files, logs, JSON, Parquet, images, and more.&lt;br&gt;&lt;br&gt;
It is highly durable, cost-effective, and integrates seamlessly with AWS analytics services. In most modern data engineering architectures (including the Medallion architecture), S3 serves as the &lt;strong&gt;landing&lt;/strong&gt;, &lt;strong&gt;raw&lt;/strong&gt;, and &lt;strong&gt;processed&lt;/strong&gt; layers where data is ingested and stored before further transformation.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;AWS Glue Crawler&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;An AWS Glue Crawler is an automated metadata discovery tool that scans data stored in Amazon S3 and other sources.&lt;br&gt;&lt;br&gt;
When the crawler runs, it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reads the file structure and content
&lt;/li&gt;
&lt;li&gt;Detects the data format (CSV, JSON, Parquet, etc.)
&lt;/li&gt;
&lt;li&gt;Infers column names and data types
&lt;/li&gt;
&lt;li&gt;Identifies partitions
&lt;/li&gt;
&lt;li&gt;Classifies datasets using built-in or custom classifiers
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The crawler then automatically creates or updates table metadata without you having to define schemas manually.&lt;/p&gt;
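
&lt;p&gt;For readers who prefer scripting over the console, a crawler can also be defined from the AWS CLI. The snippet below is only a minimal sketch, not the exact setup used later in this walkthrough: the crawler name, IAM role, database, and S3 path are placeholders you would replace with your own values.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Define a crawler that scans an S3 prefix and writes metadata to a Glue database
# (all names below are illustrative placeholders)
aws glue create-crawler \
  --name my-sample-crawler \
  --role AWSGlueServiceRole-demo \
  --database-name my_sample_db \
  --targets '{"S3Targets": [{"Path": "s3://my-sample-bucket/raw/"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;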

&lt;h3&gt;
  
  
  &lt;strong&gt;AWS Glue Data Catalog&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Glue Data Catalog is a centralized metadata repository for all your datasets within AWS.&lt;br&gt;&lt;br&gt;
It stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table definitions
&lt;/li&gt;
&lt;li&gt;Schema information
&lt;/li&gt;
&lt;li&gt;Partition details
&lt;/li&gt;
&lt;li&gt;Metadata used by analytics services
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the Glue Crawler finishes scanning an S3 bucket, it writes the discovered schema and table information into the Glue Data Catalog.&lt;br&gt;&lt;br&gt;
This metadata can then be queried by services such as &lt;strong&gt;Athena&lt;/strong&gt;, &lt;strong&gt;EMR&lt;/strong&gt;, &lt;strong&gt;Redshift Spectrum&lt;/strong&gt;, and &lt;strong&gt;AWS Glue ETL&lt;/strong&gt; jobs.&lt;/p&gt;

&lt;p&gt;In short, the workflow is:&lt;br&gt;
&lt;strong&gt;S3 → Glue Crawler scans files → Schema is inferred → Metadata is stored in Glue Data Catalog → Data becomes queryable.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-Step Workflow
&lt;/h2&gt;

&lt;p&gt;Below is the workflow we'll follow, documented step by step with screenshots.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Upload Your CSV File to Amazon S3
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Create an S3 bucket named: &lt;code&gt;medallion-orders-2025-12-17&lt;/code&gt;  &lt;strong&gt;(Replace with your bucket name)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create an S3 bucket (basic settings)&lt;/span&gt;
aws s3api create-bucket &lt;span class="nt"&gt;--bucket&lt;/span&gt; medallion-orders-2025-12-17 &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Upload your sample CSV file (e.g., &lt;code&gt;orders.csv&lt;/code&gt;)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Upload the CSV file to the bucket&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;orders.csv s3://medallion-orders-2025-12-17/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Upload to a folder (prefix)&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;orders.csv s3://medallion-orders-2025-12-17/raw/orders.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyx9aub8z7kirkgq7wdho.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyx9aub8z7kirkgq7wdho.png" alt=" " width="800" height="117"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Create a Glue Database
&lt;/h3&gt;

&lt;p&gt;In the Glue Console:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Go to &lt;strong&gt;Data Catalog → Databases&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click &lt;strong&gt;Add database&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j4o3d5q4oa8tjf6am2y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7j4o3d5q4oa8tjf6am2y.png" alt=" " width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name it &lt;code&gt;orders_db&lt;/code&gt; and click on &lt;strong&gt;Create database&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2372n1fjh6bi8wevi2yz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2372n1fjh6bi8wevi2yz.png" alt=" " width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;
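
&lt;p&gt;If you prefer to script this step, the same database can be created from the AWS CLI. A minimal sketch using the database name from this walkthrough:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create the Glue database that will hold the table metadata discovered by the crawler
aws glue create-database --database-input '{"Name": "orders_db"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;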

&lt;h3&gt;
  
  
  3. Create an AWS Glue Crawler
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Navigate to &lt;strong&gt;Glue → Crawlers&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click on &lt;strong&gt;Create crawler&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Provide a name (e.g., &lt;code&gt;orders_crawler&lt;/code&gt;) and click &lt;strong&gt;Next&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwv9ga3mf5zsfcipni7cs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwv9ga3mf5zsfcipni7cs.png" alt=" " width="800" height="232"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click on  &lt;strong&gt;Add a data source&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ok62gctstg18gh54yfu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ok62gctstg18gh54yfu.png" alt=" " width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose &lt;strong&gt;S3&lt;/strong&gt; as the data store and click &lt;strong&gt;Browse S3&lt;/strong&gt; to select the S3 bucket&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcslnrj5fahnk538gaj2p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcslnrj5fahnk538gaj2p.png" alt=" " width="800" height="569"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On the next screen, choose a role (a Glue-created role or a custom IAM role), then click &lt;strong&gt;Next&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffo6m4xj6gqju75ge9aey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffo6m4xj6gqju75ge9aey.png" alt=" " width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select your database. For the crawler schedule, choose &lt;strong&gt;On demand&lt;/strong&gt;, click &lt;strong&gt;Next&lt;/strong&gt;, then &lt;strong&gt;Create crawler&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobqjprz89rdy2up1okx5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobqjprz89rdy2up1okx5.png" alt=" " width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run the crawler&lt;/strong&gt; and wait until status shows &lt;strong&gt;complete&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
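
&lt;p&gt;The crawler can also be started and monitored from the CLI. A small sketch, assuming the crawler name used above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Start the crawler created above
aws glue start-crawler --name orders_crawler

# Poll the crawler state (READY means the run has finished)
aws glue get-crawler --name orders_crawler --query 'Crawler.State'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;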

&lt;h3&gt;
  
  
  4. Run the Crawler &amp;amp; Generate Metadata
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbvem33g2jj3q5prcb2r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbvem33g2jj3q5prcb2r.png" alt=" " width="800" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the crawler completes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It will create a table inside your Glue Data Catalog database&lt;/li&gt;
&lt;li&gt;Open the table to view:

&lt;ul&gt;
&lt;li&gt;Columns&lt;/li&gt;
&lt;li&gt;Data types&lt;/li&gt;
&lt;li&gt;S3 location&lt;/li&gt;
&lt;li&gt;Classification (csv)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
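
&lt;p&gt;The same metadata can be inspected from the CLI. This is a sketch; the table name is whatever the crawler generated (typically derived from your bucket or folder name), so replace it with yours:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List the tables the crawler created in the database
aws glue get-tables --database-name orders_db --query 'TableList[].Name'

# Inspect one table's columns, classification, and S3 location
aws glue get-table --database-name orders_db --name medallion_orders_2025_12_17
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;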

&lt;h3&gt;
  
  
  5. Query the Table Using Amazon Athena
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Open Athena&lt;/li&gt;
&lt;li&gt;Select your Glue database&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fel6runz52koz3snrnqrq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fel6runz52koz3snrnqrq.png" alt=" " width="800" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run a simple &lt;code&gt;SELECT * FROM "AwsDataCatalog"."orders_db"."medallion_orders_2025_12_17" LIMIT 10;&lt;/code&gt; (replace the table name with your own table)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltx61p14er5h5sq9exj8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltx61p14er5h5sq9exj8.png" alt=" " width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;
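
&lt;p&gt;Athena queries can also be submitted from the CLI. The sketch below assumes a query-results bucket that you would replace with your own; the table name should match the one in your catalog:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Submit the query (the output location bucket is a placeholder)
aws athena start-query-execution \
  --query-string 'SELECT * FROM medallion_orders_2025_12_17 LIMIT 10' \
  --query-execution-context Database=orders_db \
  --result-configuration OutputLocation=s3://my-athena-results-bucket/

# Fetch the results using the QueryExecutionId returned by the previous command
aws athena get-query-results --query-execution-id &amp;lt;query-execution-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;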

&lt;h2&gt;
  
  
  Final Outcome
&lt;/h2&gt;

&lt;p&gt;After completing the steps, you will have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A fully indexed representation of your raw data&lt;/li&gt;
&lt;li&gt;A searchable table in Glue Data Catalog&lt;/li&gt;
&lt;li&gt;A metadata-driven foundation for ETL jobs&lt;/li&gt;
&lt;li&gt;A structure ready for transformation into a cleaned bucket and eventually a curated analytics layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This sets the stage for my next article:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Building ETL pipelines using Glue ETL Jobs and writing cleaned data back into S3."&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Data cataloguing is a foundational step in any scalable data engineering architecture. AWS Glue Crawlers make it easy to automate metadata extraction from raw data sources, reduce manual schema definition, and keep your ETL pipelines schema-aware and resilient.&lt;/p&gt;

&lt;p&gt;By the end of this project, you'll have a practical, AWS-native setup that you can build on for data cleaning, transformations, and analytical workloads.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dataengineering</category>
      <category>tutorial</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Medallion Architecture On AWS</title>
      <dc:creator>Oteng Isaac</dc:creator>
      <pubDate>Mon, 01 Dec 2025 16:02:04 +0000</pubDate>
      <link>https://dev.to/aws-builders/medallion-architecture-on-aws-2ngm</link>
      <guid>https://dev.to/aws-builders/medallion-architecture-on-aws-2ngm</guid>
      <description>&lt;h1&gt;
  
  
  Medallion Architecture On AWS
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Building Modern Data Lakes on AWS S3 with the Medallion Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotravqyztnez39indpst.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fotravqyztnez39indpst.jpg" alt=" " width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data constitutes the foundation of contemporary enterprises. However, as the volume, velocity, and variety of data grow, organizations face a critical challenge: &lt;strong&gt;how to store, manage, and analyze data efficiently at scale&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;data lake&lt;/strong&gt; is a centralized repository used to store structured, semi-structured, and unstructured data in its raw form. When combined with &lt;strong&gt;AWS S3&lt;/strong&gt; and the &lt;strong&gt;Medallion architecture&lt;/strong&gt;, it provides a &lt;strong&gt;scalable, reliable, and layered approach&lt;/strong&gt; for transforming raw data into insights ready for analysis.&lt;/p&gt;

&lt;p&gt;In this post, we'll explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Why S3 is the go-to storage for modern data lakes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Medallion architecture and its layers (Bronze, Silver, Gold)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Practical design patterns, best practices, and real-world use cases&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How AWS services integrate seamlessly with S3-based data lakes&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;1. Why AWS S3 for Data Lakes?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Amazon S3 (Simple Storage Service) is &lt;strong&gt;object storage&lt;/strong&gt; that offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unlimited storage&lt;/strong&gt; -- scale from gigabytes to petabytes of data without disruption&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High durability&lt;/strong&gt; -- 11 nines of durability (99.999999999%)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flexible storage classes and formats&lt;/strong&gt; -- Standard, Infrequent Access, Glacier, and other classes; data can be stored in CSV, JSON, Parquet, ORC, and other formats&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Secure and compliant&lt;/strong&gt; -- encryption at rest and in transit (SSE-S3/SSE-KMS), IAM policies, and fine-grained access control&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integration with analytics and AI/ML services&lt;/strong&gt; -- integrates seamlessly with AWS services such as Glue, Athena, Redshift Spectrum, EMR, SageMaker, Kinesis, and MSK&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why S3 is perfect for a data lake:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Any format&lt;/strong&gt; can be stored: CSV, JSON, Parquet, Avro, ORC, images, audio, logs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Decouples compute and storage&lt;/strong&gt; -- separates computation from storage so that different analytics engines can &lt;br&gt;
access the same raw data without having to move it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Supports schema-on-read&lt;/strong&gt; -- The schema is defined during querying, not during writing&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Overview of the Medallion Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Medallion Architecture&lt;/strong&gt; is a &lt;strong&gt;layered approach&lt;/strong&gt; to organising your data lake. It arranges data into progressively &lt;strong&gt;refined layers&lt;/strong&gt; to enhance &lt;strong&gt;data quality, governance, and performance&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The layers are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.1 Bronze Layer -- Raw Data&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Ingest all raw data exactly as it was obtained from sources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Semi-structured or unstructured&lt;/li&gt;
&lt;li&gt;  May contain duplicates, errors, or missing values&lt;/li&gt;
&lt;li&gt;  Timestamped to monitor ingestion&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Audit trail&lt;/li&gt;
&lt;li&gt;  Unprocessed logs and events&lt;/li&gt;
&lt;li&gt;  Origin of downstream transformations&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Example in S3:&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;s3://your-datalake/bronze/customers/&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;s3://your-datalake/bronze/orders/&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2.2 Silver Layer -- Data that has been conformed and cleaned&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Create clean, standardised, and enriched datasets from raw data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Deduplicated&lt;/li&gt;
&lt;li&gt;  Corrected data types&lt;/li&gt;
&lt;li&gt;  Enhanced using lookups or joins&lt;/li&gt;
&lt;li&gt;  Consistent timestamps and formats&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Intermediate analytics&lt;/li&gt;
&lt;li&gt;  Providing data to BI dashboards&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Example in S3:&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;s3://your-datalake/silver/customers/&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;s3://your-datalake/silver/orders/&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Typical transformations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Filter out bad or null records&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Standardise currencies and timestamps.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Join with reference tables (like product categories)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Verify using business rules&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2.3 Gold Layer -- Business-Level / Analytics Data&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; Create &lt;strong&gt;analytics-ready, aggregated, or curated data&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Fully cleansed, trustworthy, and aggregated&lt;/li&gt;
&lt;li&gt;  Optimized for reporting or machine learning&lt;/li&gt;
&lt;li&gt;   For quicker queries, data is frequently stored in &lt;strong&gt;columnar formats&lt;/strong&gt; (Parquet, ORC).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  BI dashboards and reports&lt;/li&gt;
&lt;li&gt;  ML training datasets&lt;/li&gt;
&lt;li&gt;  KPI calculation and trend analysis&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Example in S3:&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;s3://your-datalake/gold/sales_summary/&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;s3://your-datalake/gold/customer_lifetime_value/&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Typical transformations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Aggregation at daily, weekly, or monthly granularity&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Join many silver tables to build fact tables&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Compute metrics like revenue, churn, or retention&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. AWS Services that Complement S3 Data Lakes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Building a Medallion architecture on S3 works best when combined with&lt;br&gt;
AWS analytics services:&lt;/p&gt;




&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Role in Data Lake&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Glue&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ETL/ELT jobs to transform Bronze → Silver → Gold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon Athena&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Query S3 data directly using SQL without moving it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon Redshift Spectrum&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Query S3 data as external tables in Redshift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon EMR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distributed Spark/Hadoop processing for large-scale transformations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Lake Formation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Centralized access control, data catalog, and governance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon QuickSight&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;BI dashboards on curated Gold data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;




&lt;p&gt;&lt;strong&gt;4. Practical S3 Design Patterns&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4.1 Partitioning&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Organize large datasets for query performance&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Common partitions: year=2025/month=11/day=30/&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Example:&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;s3://my-datalake/silver/orders/year=2025/month=11/day=30/orders.parquet&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
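
&lt;p&gt;As a rough sketch of what this layout looks like in practice (the bucket and crawler names are illustrative), writing files into Hive-style prefixes and then refreshing the catalog keeps partition pruning working in Athena and Glue:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Write a file into a Hive-style partitioned prefix (names are placeholders)
aws s3 cp orders.parquet s3://my-datalake/silver/orders/year=2025/month=11/day=30/orders.parquet

# Re-run the crawler so the new partition is registered in the Data Catalog
aws glue start-crawler --name silver-orders-crawler
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;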

&lt;p&gt;&lt;strong&gt;4.2 File Formats&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bronze:&lt;/strong&gt; raw JSON, CSV, log files&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Silver:&lt;/strong&gt; Parquet or ORC (columnar)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gold:&lt;/strong&gt; Parquet with compression (Snappy)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4.3 Naming Conventions&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;s3://&amp;lt;bucket&amp;gt;/&amp;lt;layer&amp;gt;/&amp;lt;entity&amp;gt;/year=YYYY/month=MM/day=DD/&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Helps Athena, Glue Crawlers, and partition pruning&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Example Data Flow (End-to-End)&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bronze Layer:&lt;/strong&gt; S3 ingests raw data from sources (e.g., Kafka,&lt;br&gt;
APIs, IoT)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Glue ETL:&lt;/strong&gt; Cleanses, deduplicates, and standardizes → Silver&lt;br&gt;
Layer&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Silver Layer:&lt;/strong&gt; Curated, conformed tables available for analytics&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gold Layer:&lt;/strong&gt; Aggregations, business KPIs, and ML datasets&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Analytics:&lt;/strong&gt; Athena queries, Redshift reports, QuickSight&lt;br&gt;
dashboards&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3djzxfd4kul5s52yyug2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3djzxfd4kul5s52yyug2.png" alt=" " width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Best Practices for Medallion Data Lakes on S3&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use separate buckets or prefixes per layer&lt;/strong&gt; (Bronze/Silver/Gold)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Partition and compress data&lt;/strong&gt; for performance and cost savings&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enforce data validation rules&lt;/strong&gt; in Silver ETL&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Track metadata&lt;/strong&gt; with Glue Catalog or Lake Formation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Secure access&lt;/strong&gt; using IAM policies, S3 bucket policies, and KMS&lt;br&gt;
encryption&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use consistent naming conventions&lt;/strong&gt; across layers and datasets&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Version your data&lt;/strong&gt; if necessary (append date/time to files for&lt;br&gt;
auditability)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
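
&lt;p&gt;As an example of practice 5, default encryption and a public access block can be applied to a data lake bucket from the CLI. This is a minimal sketch; the bucket name and KMS key alias are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Enforce SSE-KMS encryption by default on the bucket (names are illustrative)
aws s3api put-bucket-encryption \
  --bucket my-datalake \
  --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": "alias/my-datalake-key"}}]}'

# Block all public access to the bucket
aws s3api put-public-access-block \
  --bucket my-datalake \
  --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;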

&lt;p&gt;&lt;strong&gt;7. Real-World Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;E-Commerce Data Lake:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bronze: raw JSON order events from web or mobile apps&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Silver: deduplicated, validated orders joined with product catalog&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Gold: aggregated revenue by category, daily sales metrics, customer&lt;br&gt;
LTV&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Analytics: QuickSight dashboards for executives, Athena queries for marketing&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Combining &lt;strong&gt;AWS S3 and the Medallion architecture&lt;/strong&gt; provides a &lt;strong&gt;scalable, structured, and reliable foundation&lt;/strong&gt; for modern data analytics.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;S3 gives &lt;strong&gt;unlimited storage and flexibility&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Medallion layers ensure &lt;strong&gt;data quality, governance, and analytics&lt;br&gt;
readiness&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Integration with &lt;strong&gt;Glue, Athena, Redshift, and QuickSight&lt;/strong&gt; enables&lt;br&gt;
&lt;strong&gt;end-to-end insights&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By implementing this architecture, organizations can build&lt;br&gt;
&lt;strong&gt;enterprise-grade data lakes&lt;/strong&gt;, reduce time-to-insight, and empower&lt;br&gt;
data-driven decision-making.&lt;/p&gt;

</description>
      <category>data</category>
      <category>aws</category>
      <category>dataengineering</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Data ingestion using AWS Services, Part 2</title>
      <dc:creator>Oteng Isaac</dc:creator>
      <pubDate>Wed, 25 Dec 2024 02:19:56 +0000</pubDate>
      <link>https://dev.to/aws-builders/data-ingestion-using-aws-services-part-2-2pg7</link>
      <guid>https://dev.to/aws-builders/data-ingestion-using-aws-services-part-2-2pg7</guid>
      <description>&lt;p&gt;&lt;strong&gt;Data ingestion using AWS Services, Part 2&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Querying AWS S3 data from AWS Athena using SQL.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AWS Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. In this second part of the tutorial, we are going to crawl the migrated data in AWS S3, create table definitions in the Glue Data Catalog using AWS Glue, and query the data using AWS Athena. AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.&lt;/p&gt;

&lt;p&gt;Before you proceed with this hands-on tutorial, make sure you have completed the &lt;a href="https://medium.com/@otengcode/data-ingestion-using-aws-services-part-1-266f061a2f60" rel="noopener noreferrer"&gt;first part&lt;/a&gt; of the tutorial, &lt;strong&gt;Data Ingestion using AWS Services, Part 1&lt;/strong&gt;. Below is an architectural diagram of the full project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9k9qvb0rjq706do7cg9c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9k9qvb0rjq706do7cg9c.png" width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Search for and select &lt;strong&gt;AWS Glue&lt;/strong&gt; in the top search bar of the AWS console.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click on &lt;strong&gt;Crawler&lt;/strong&gt; and then &lt;strong&gt;Create crawler&lt;/strong&gt;. A crawler accesses your data store (e.g., AWS S3), extracts metadata, and creates table definitions in the AWS Glue Data Catalog.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnydmtval46wa6v8h97od.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnydmtval46wa6v8h97od.png" width="800" height="280"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enter a descriptive name for the crawler job and click &lt;strong&gt;Next.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx0cu8s0dp0vkdvxq670.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx0cu8s0dp0vkdvxq670.png" width="800" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on &lt;strong&gt;Add data source&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljf21pp5fj13w7ghvk9r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljf21pp5fj13w7ghvk9r.png" width="800" height="287"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Under &lt;strong&gt;Data source&lt;/strong&gt;, select &lt;strong&gt;S3&lt;/strong&gt;. Click on &lt;strong&gt;Browse S3&lt;/strong&gt; to choose the AWS S3 bucket containing the data we want to query. Leave all defaults and click on &lt;strong&gt;Add an S3 data source&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2fi2rpbxqq7apd3hruq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd2fi2rpbxqq7apd3hruq.png" width="800" height="778"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verify and click on &lt;strong&gt;Next.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcohnc6y96i04hxis6bk9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcohnc6y96i04hxis6bk9.png" width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create or select an IAM role under &lt;strong&gt;Existing IAM role&lt;/strong&gt; and click &lt;strong&gt;Next&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwchmm24rup6jzny5jle.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwchmm24rup6jzny5jle.png" width="800" height="290"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on &lt;strong&gt;Add database&lt;/strong&gt; under &lt;strong&gt;Target database&lt;/strong&gt; or select a database in the dropdown. Let's create a database called &lt;strong&gt;testdb&lt;/strong&gt;. Click &lt;strong&gt;Create database.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6rf5i3fomp5v4oj8uuv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6rf5i3fomp5v4oj8uuv.png" width="800" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;For frequency, select &lt;strong&gt;On demand&lt;/strong&gt;. This is used to define a time-based schedule for crawlers and jobs in AWS Glue. Click &lt;strong&gt;Next.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsouiftjkp6fcuuc18xf8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsouiftjkp6fcuuc18xf8.png" width="800" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check all settings and click &lt;strong&gt;Create crawler&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygazp3tsyedr5yifuatx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fygazp3tsyedr5yifuatx.png" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;After the crawler has been created successfully, click on &lt;strong&gt;Run crawler&lt;/strong&gt; to start the crawler job.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw26xy25hy4ji4s3w7ttr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw26xy25hy4ji4s3w7ttr.png" width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;To check the status of a crawler, click on &lt;strong&gt;Crawlers&lt;/strong&gt;, the name of the crawler, and then &lt;strong&gt;Crawler runs.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvklybwltynd235l2vtp4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvklybwltynd235l2vtp4.png" width="800" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verify the table and the database by clicking on &lt;strong&gt;Tables&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmf1pylx02x1zjso2phen.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmf1pylx02x1zjso2phen.png" width="800" height="192"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Search for and select &lt;strong&gt;AWS Athena&lt;/strong&gt; in the top search bar of the AWS console.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click &lt;strong&gt;Query editor&lt;/strong&gt;. In the query editor, click &lt;strong&gt;Settings&lt;/strong&gt;, then &lt;strong&gt;Manage&lt;/strong&gt;. Under &lt;strong&gt;Manage settings&lt;/strong&gt;, select &lt;strong&gt;Browse S3&lt;/strong&gt; to choose an AWS S3 bucket that will serve as the location of the query results. Click &lt;strong&gt;Save&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nrx98z8ap8mpmkko99s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nrx98z8ap8mpmkko99s.png" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In the query editor, enter the following SQL statement: &lt;code&gt;SELECT * FROM testbucketformysqldata123_raw LIMIT 10;&lt;/code&gt;. The query selects all the data migrated into the bucket. Note: Substitute the table name with the name of your table.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj2kjd8lsa2aq2n9s1ux5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj2kjd8lsa2aq2n9s1ux5.png" width="800" height="276"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;strong&gt;Query results&lt;/strong&gt; tab shows the results of the query.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94wrae3aacw7jd460shr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94wrae3aacw7jd460shr.png" width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This ends the hands-on project on data ingestion using AWS DMS. Next in the series is SaaS data ingestion using Amazon AppFlow.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Data ingestion using AWS Services, Part 1</title>
      <dc:creator>Oteng Isaac</dc:creator>
      <pubDate>Wed, 25 Dec 2024 02:17:45 +0000</pubDate>
      <link>https://dev.to/aws-builders/data-ingestion-using-aws-services-part-1-ige</link>
      <guid>https://dev.to/aws-builders/data-ingestion-using-aws-services-part-1-ige</guid>
      <description>&lt;p&gt;&lt;strong&gt;Data ingestion using AWS Services, Part 1&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data ingestion is the process of collecting, importing, and transferring raw data from various sources to a storage or processing system where it can be further analyzed, transformed, and used for various purposes. The goal is to bring in data from various sources and make it available for analysis and decision-making. Data ingestion is usually a crucial first step in a data pipeline.&lt;/p&gt;

&lt;p&gt;Data ingestion can be either in batches where data is brought over in bulk at regular intervals, called &lt;strong&gt;BATCH DATA INGESTION&lt;/strong&gt;, or in near real-time, called &lt;strong&gt;STREAM DATA INGESTION&lt;/strong&gt;, where data is brought over as soon as it is generated.&lt;/p&gt;

&lt;p&gt;The first part of my data ingestion tutorial covers batch data ingestion using AWS services, which I will cover in separate articles. Specifically, I will cover hands-on tutorials on the following topics:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Data ingestion using AWS Data Migration Service (DMS)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SaaS data ingestion using Amazon AppFlow&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data ingestion using AWS Glue&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transferring data into AWS S3 using AWS DataSync&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Before we delve into the hands-on, let’s create a data lake in AWS Simple Storage Service (S3). All the ingested data in this hands-on tutorial will first be brought over into a bucket in AWS S3, which is a preferred choice for building data lakes in AWS. Amazon S3 is a scalable object storage service offered by AWS. It is designed to store and retrieve any amount of data from anywhere on the web. Data in S3 is organized into containers called &lt;strong&gt;&lt;em&gt;buckets&lt;/em&gt;&lt;/strong&gt;. A bucket is similar to a directory or folder and must have a globally unique name across all of AWS. Let’s create a bucket in AWS S3, which will serve as a destination for all ingested data.&lt;/p&gt;

&lt;p&gt;To create a bucket, we can use the AWS CLI available through the AWS management console called the &lt;strong&gt;&lt;em&gt;CloudShell.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Log into the AWS console using an administrative user.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;2.&lt;/strong&gt; Search for and select &lt;strong&gt;S3&lt;/strong&gt; in the top search bar of the AWS console.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.&lt;/strong&gt; Click on CloudShell on the top bar of the AWS console. AWS CloudShell is a browser-based shell that can quickly run scripts with the AWS Command Line Interface (CLI) and experiment with service APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating AWS S3 bucket&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gkmiiv0ae8hfcwk5c61.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7gkmiiv0ae8hfcwk5c61.png" width="800" height="105"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will create a command-line environment with an AWS CLI preinstalled.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enter the following command to create a bucket: &lt;strong&gt;&lt;em&gt;aws s3 mb s3://&amp;lt;name of bucket&amp;gt;&lt;/em&gt;&lt;/strong&gt; and hit enter. The bucket name must be globally unique across all AWS regions. Note: Replace the name of the bucket with a descriptive and unique name.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkkxer9lskrwr4p0xzz9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzkkxer9lskrwr4p0xzz9.png" width="800" height="58"&gt;&lt;/a&gt;&lt;/p&gt;
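
&lt;p&gt;To confirm the bucket was created, you can list the buckets in your account from the same CloudShell session:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List all buckets in the account; the new bucket should appear in the output
aws s3 ls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;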

&lt;p&gt;Alternatively, you can create a bucket using the AWS console bucket creation wizard.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on &lt;strong&gt;Create bucket&lt;/strong&gt;, keep all defaults, enter a globally unique name for the bucket and click &lt;strong&gt;Create bucket.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fif8a4ua55wz4pced4thk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fif8a4ua55wz4pced4thk.png" width="800" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;After the bucket has been created successfully, click on the name of the bucket and go to the &lt;strong&gt;Properties&lt;/strong&gt; tab. Copy the ARN of the created bucket and save it somewhere; it will be needed in a later step.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61on6uifz5svv74asa8y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61on6uifz5svv74asa8y.png" width="798" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This ends the creation of the bucket, which will be used as the destination of all ingested data in this tutorial. Next, we want to set permissions on the created bucket to allow AWS DMS access to the bucket to perform operations on the bucket using AWS Identity and Access Management (IAM). AWS IAM is a web service by AWS that helps you securely control access to AWS resources. It enables you to manage users, groups, and permissions within your AWS environment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating AWS IAM Policy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1.&lt;/strong&gt; Search for and select &lt;strong&gt;IAM&lt;/strong&gt; in the top search bar of the AWS console.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2.&lt;/strong&gt; Click on &lt;strong&gt;Policies&lt;/strong&gt; and then click on &lt;strong&gt;Create policy&lt;/strong&gt;. A policy is an object in AWS that defines permissions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5yaz5m78uazhs48zfs7y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5yaz5m78uazhs48zfs7y.png" width="769" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;By default, the &lt;strong&gt;Visual editor&lt;/strong&gt; tab is selected, so click on &lt;strong&gt;JSON&lt;/strong&gt; to change to the &lt;strong&gt;JSON&lt;/strong&gt; tab. In the &lt;strong&gt;actions section&lt;/strong&gt;, grant all actions on &lt;strong&gt;S3&lt;/strong&gt; and in the &lt;strong&gt;resource section,&lt;/strong&gt; provide the &lt;strong&gt;ARN&lt;/strong&gt; of the AWS S3 bucket created in the previous steps. Note that in a production environment, the scope of the permission must be limited.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvf1sn03ocv52sqn8emye.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvf1sn03ocv52sqn8emye.png" width="760" height="283"&gt;&lt;/a&gt;&lt;/p&gt;
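
&lt;p&gt;The same policy can be created from the CLI. The sketch below mirrors the broad permissions described above (all S3 actions on the bucket); the policy name and bucket ARN are placeholders, and the object-level ARN is included so object operations are covered. As noted, scope this down in a production environment.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create the policy from the CLI (policy name and bucket ARN are placeholders)
aws iam create-policy \
  --policy-name dms-s3-access-policy \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": ["arn:aws:s3:::my-ingestion-bucket", "arn:aws:s3:::my-ingestion-bucket/*"]
    }]
  }'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;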

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Click on &lt;strong&gt;Next&lt;/strong&gt;, provide a descriptive policy name for the policy and click &lt;strong&gt;Create policy.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the left-hand menu, click on &lt;strong&gt;Roles&lt;/strong&gt; and then &lt;strong&gt;Create role&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For the &lt;strong&gt;Trusted entity type&lt;/strong&gt;, choose &lt;strong&gt;AWS service,&lt;/strong&gt; and for &lt;strong&gt;Use case&lt;/strong&gt; select &lt;strong&gt;DMS&lt;/strong&gt; then Next.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2317t7zhqd1dt3ml1xf1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2317t7zhqd1dt3ml1xf1.png" width="767" height="226"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Select the policy created in the earlier step and click Next.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Provide a descriptive role name and click &lt;strong&gt;Create role&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point, we have created an AWS S3 bucket and an IAM role that allows AWS DMS to perform operations on the bucket.&lt;/p&gt;
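
&lt;p&gt;For completeness, here is a rough CLI sketch of the same role setup: a role that DMS can assume, with the policy from the previous step attached. The role name, policy ARN, and account ID are placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create a role that the DMS service can assume (role name is a placeholder)
aws iam create-role \
  --role-name dms-s3-target-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "dms.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

# Attach the S3 access policy created earlier (replace with your policy ARN)
aws iam attach-role-policy \
  --role-name dms-s3-target-role \
  --policy-arn arn:aws:iam::123456789012:policy/dms-s3-access-policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;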

&lt;p&gt;&lt;strong&gt;Data ingestion using AWS Data Migration Service&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So far, we have created an AWS S3 bucket to host our ingested data. At this point, we will create an AWS Relational Database Service (RDS) MySQL instance, database, and table and populate it with sample data, which will be migrated to AWS S3 as the storage layer using the AWS DMS. We will then crawl the S3 bucket using the Glue Crawler and use the Glue Data Catalog as the metadata repository. We will then query the data using AWS Athena. Below is the architecture of the project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fer9ofxm8tvczr6icoc60.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fer9ofxm8tvczr6icoc60.png" width="800" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Summary of AWS services used.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;AWS Relational Database Service is a collection of managed services that makes it simple to set up, operate, and scale databases in the cloud.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AWS Database Migration Service is a managed migration and replication service that helps move your database and analytics workloads to AWS quickly, securely, and with minimal downtime and zero data loss.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AWS S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AWS Glue (Glue Crawler, Glue Data Catalog) is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AWS Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Creating AWS RDS MySQL DATABASE&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search for and select &lt;strong&gt;AWS RDS&lt;/strong&gt; in the top search bar of the AWS console.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8uz1laf8g5joblsm1m8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8uz1laf8g5joblsm1m8.png" width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on &lt;strong&gt;Databases&lt;/strong&gt; and select &lt;strong&gt;Create database&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02b7wljnrky7q1213mpd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F02b7wljnrky7q1213mpd.png" width="800" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Select &lt;strong&gt;Easy create&lt;/strong&gt; for the database creation method. Choosing easy create enables best-practice configuration options that can be changed later.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9t1cxl8o60z26fr10sh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs9t1cxl8o60z26fr10sh.png" width="800" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Choose &lt;strong&gt;MySQL&lt;/strong&gt; under Configuration.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jxqca5kclofnaeb4fjq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1jxqca5kclofnaeb4fjq.png" width="800" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Choose &lt;strong&gt;Dev/Test&lt;/strong&gt; under &lt;strong&gt;DB instance size&lt;/strong&gt;. Note: You can also select &lt;strong&gt;Free tier&lt;/strong&gt; for this tutorial.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bg85jsoj3b1ga3l4hik.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bg85jsoj3b1ga3l4hik.png" width="800" height="215"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Accept all defaults, enter a &lt;strong&gt;Master password&lt;/strong&gt;, and click &lt;strong&gt;Create database&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ol&gt;
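
&lt;p&gt;If you prefer to script this step instead of clicking through the console, the following is a minimal boto3 sketch of how a similar RDS MySQL instance could be created. The region, instance class, identifier, and credentials here are placeholder assumptions, so replace them with your own values.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

# Placeholder region and credentials; adjust to your own environment.
rds = boto3.client("rds", region_name="us-east-1")

response = rds.create_db_instance(
    DBInstanceIdentifier="database-2",         # placeholder instance identifier
    Engine="mysql",
    DBInstanceClass="db.t3.micro",             # a small Dev/Test instance class
    AllocatedStorage=20,                       # storage in GiB
    MasterUsername="admin",
    MasterUserPassword="YourStrongPassword1",  # placeholder master password
    PubliclyAccessible=True,                   # so an external SQL client can reach it
)
print(response["DBInstance"]["DBInstanceStatus"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;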

&lt;p&gt;&lt;strong&gt;Connecting to AWS RDS MySQL instance using HeidiSQL Client&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s populate the database with sample data by connecting to the AWS RDS MySQL instance using an SQL client. In this tutorial, we are using the free HeidiSQL client, but you can use any SQL client of your choice.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If you are using the HeidiSQL client, enter the following details, as shown in the screenshot below: Select &lt;strong&gt;Network type&lt;/strong&gt; as &lt;strong&gt;MySQL on RDS&lt;/strong&gt;. For &lt;strong&gt;Hostname/IP&lt;/strong&gt;, copy and paste the RDS instance &lt;strong&gt;endpoint&lt;/strong&gt;. For &lt;strong&gt;User&lt;/strong&gt; and &lt;strong&gt;Password&lt;/strong&gt;, enter the username and password of the RDS database. Enter &lt;strong&gt;3306&lt;/strong&gt; as the &lt;strong&gt;Port&lt;/strong&gt;, which is the default for MySQL databases. Note: Make sure the instance’s security group allows inbound traffic on port 3306. Click on &lt;strong&gt;Open&lt;/strong&gt; to establish a connection.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzo0u3696yf49ec49glf4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzo0u3696yf49ec49glf4.png" width="800" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;After a successful connection, enter the highlighted SQL, as shown in the screenshot below. This creates a database called &lt;strong&gt;testdb&lt;/strong&gt;, sets it as the default database and shows all the created databases with the new database highlighted.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhen0uutwnii0jo0q6abj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhen0uutwnii0jo0q6abj.png" width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Enter the following SQL statement below to create a &lt;strong&gt;table&lt;/strong&gt; in the database.&lt;/p&gt;

&lt;p&gt;CREATE TABLE IF NOT EXISTS actor (&lt;br&gt;
actor_id smallint unsigned NOT NULL AUTO_INCREMENT,&lt;br&gt;
first_name varchar(45) NOT NULL,&lt;br&gt;
last_name varchar(45) NOT NULL,&lt;br&gt;
last_update timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,&lt;br&gt;
PRIMARY KEY (actor_id));&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can also use the &lt;strong&gt;SHOW TABLES&lt;/strong&gt; statement to confirm the creation of the table, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyidzf6idi822w2w1eu42.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyidzf6idi822w2w1eu42.png" width="800" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Let’s use the &lt;strong&gt;INSERT INTO&lt;/strong&gt; statement to load sample data into the database.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzbohwzpbdm8o7yv62rr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzbohwzpbdm8o7yv62rr.png" width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Let’s use &lt;strong&gt;SELECT * FROM actor&lt;/strong&gt; to query and verify the inserts.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uab6kpo24x9nfu4wh0i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1uab6kpo24x9nfu4wh0i.png" width="800" height="269"&gt;&lt;/a&gt;&lt;/p&gt;
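
&lt;p&gt;If you would rather script the HeidiSQL steps, here is a minimal Python sketch using the PyMySQL library (an assumption on my part; any MySQL client library works). The endpoint, credentials, and sample rows are placeholders, and the table definition matches the SQL shown above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pymysql

# Placeholder connection details; use your RDS endpoint, username and password.
conn = pymysql.connect(
    host="database-2.xxxxxxxx.us-east-1.rds.amazonaws.com",
    user="admin",
    password="YourStrongPassword1",
    database="testdb",
    port=3306,
)

with conn.cursor() as cur:
    # Same CREATE TABLE statement as above, including the primary key.
    cur.execute(
        """CREATE TABLE IF NOT EXISTS actor (
               actor_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
               first_name VARCHAR(45) NOT NULL,
               last_name VARCHAR(45) NOT NULL,
               last_update TIMESTAMP NOT NULL
                   DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
               PRIMARY KEY (actor_id))"""
    )
    # Load a few placeholder rows, then read them back to verify the inserts.
    cur.executemany(
        "INSERT INTO actor (first_name, last_name) VALUES (%s, %s)",
        [("PENELOPE", "GUINESS"), ("NICK", "WAHLBERG"), ("ED", "CHASE")],
    )
    conn.commit()
    cur.execute("SELECT * FROM actor")
    for row in cur.fetchall():
        print(row)

conn.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;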

&lt;p&gt;&lt;strong&gt;MIGRATE DATA TO S3 USING AWS DATABASE MIGRATION SERVICE&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Search for and select &lt;strong&gt;AWS Database Migration Service&lt;/strong&gt; in the top search bar of the AWS console and click on &lt;strong&gt;Replication instances&lt;/strong&gt;. AWS DMS uses a replication instance to connect to your source data store, read the source data, and format the data for consumption by the target data store.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfr78iah4gu329pr93om.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfr78iah4gu329pr93om.png" width="800" height="229"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click on &lt;strong&gt;Create replication instance&lt;/strong&gt; to create a replication instance.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsshohzoq98dwglusexth.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsshohzoq98dwglusexth.png" width="800" height="192"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Provide a descriptive &lt;strong&gt;name&lt;/strong&gt; and an optional &lt;strong&gt;descriptive Amazon Resource Name (ARN)&lt;/strong&gt; and select an &lt;strong&gt;instance class&lt;/strong&gt; as shown below. Leave all options at their default values, click on &lt;strong&gt;Create replication instance&lt;/strong&gt; and wait for a successful creation of the instance.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0fspqczbfktalo5ngek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0fspqczbfktalo5ngek.png" width="800" height="736"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Next, create an &lt;strong&gt;Endpoint&lt;/strong&gt;. An endpoint provides connection, data store type, and location information about your data store. AWS DMS uses this information to connect to the data store and migrate data from the source endpoint to the target endpoint. We will create two endpoints.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Source endpoint&lt;/strong&gt;: This allows AWS DMS to read data from a database (on-premises or in the cloud) or other data sources, such as AWS S3.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Target endpoint&lt;/strong&gt;: This allows AWS DMS to write data to a database, or other stores such as AWS S3 or Amazon DynamoDB.&lt;/p&gt;

&lt;p&gt;Choose &lt;strong&gt;Source endpoint&lt;/strong&gt;, check &lt;strong&gt;Select RDS DB instance&lt;/strong&gt; since our source is an Amazon RDS database, and choose the created AWS RDS database from the RDS drop-down menu. Provide a unique identifier for the endpoint and select &lt;strong&gt;MySQL&lt;/strong&gt; as the &lt;strong&gt;source engine&lt;/strong&gt;. Under &lt;strong&gt;Access to endpoint database&lt;/strong&gt;, select &lt;strong&gt;Provide access information manually&lt;/strong&gt; and enter the username and password of the AWS RDS MySQL instance. Keep all defaults and click on &lt;strong&gt;Create endpoint&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh35mle7tl5y36ehfvbf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh35mle7tl5y36ehfvbf.png" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Click on &lt;strong&gt;Create endpoint&lt;/strong&gt; again to create a target endpoint. Choose &lt;strong&gt;Target endpoint&lt;/strong&gt;. Provide a &lt;strong&gt;label&lt;/strong&gt; for the endpoint and select &lt;strong&gt;Amazon S3&lt;/strong&gt; as the &lt;strong&gt;target engine&lt;/strong&gt;. Under &lt;strong&gt;Amazon Resource Name (ARN) for the service access role,&lt;/strong&gt; provide the ARN of the role created earlier (DMSconnectRole). For the &lt;strong&gt;Bucket name&lt;/strong&gt;, enter the name of the bucket created earlier and enter a descriptive name to be used as a folder to store the data under &lt;strong&gt;Bucket folder&lt;/strong&gt;. Click on &lt;strong&gt;Create endpoint&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Let’s confirm both endpoints.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn15wng5mgoobjrgpdm7r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn15wng5mgoobjrgpdm7r.png" width="800" height="121"&gt;&lt;/a&gt;&lt;/p&gt;
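
&lt;p&gt;The same two endpoints can also be defined programmatically. Below is a rough boto3 sketch; the server name, account ID, role ARN, and bucket and folder names are placeholder assumptions standing in for the values used above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Source endpoint: the AWS RDS MySQL instance (placeholder values).
source = dms.create_endpoint(
    EndpointIdentifier="database-2",
    EndpointType="source",
    EngineName="mysql",
    ServerName="database-2.xxxxxxxx.us-east-1.rds.amazonaws.com",
    Port=3306,
    Username="admin",
    Password="YourStrongPassword1",
)

# Target endpoint: the Amazon S3 bucket, written through the DMS service role.
target = dms.create_endpoint(
    EndpointIdentifier="testbucketformysqldata123-raw-data",
    EndpointType="target",
    EngineName="s3",
    S3Settings={
        "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/DMSconnectRole",
        "BucketName": "testbucketformysqldata123",
        "BucketFolder": "raw-data",
    },
)
print(source["Endpoint"]["EndpointArn"])
print(target["Endpoint"]["EndpointArn"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;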

&lt;ol&gt;
&lt;li&gt;Click on &lt;strong&gt;Database migration tasks&lt;/strong&gt;, then click &lt;strong&gt;Create task&lt;/strong&gt; to create a database migration task. This is where all the work happens. You specify what tables (or views) and schemas to use for your migration and any special processing, such as logging requirements, control table data, and error handling. Under &lt;strong&gt;Task identifier,&lt;/strong&gt; enter a unique identifier for the task (mysql-to-s3-task). Under &lt;strong&gt;Replication instance&lt;/strong&gt;, select the created replication instance (mysql-to-s3-replication-vpc-0b6a338c0396b19) in the dropdown. For &lt;strong&gt;Source database endpoint,&lt;/strong&gt; select the created source endpoint (database-2). For &lt;strong&gt;Target database endpoint&lt;/strong&gt;, select the created target endpoint (testbucketformysqldata123-raw-data) from the drop-down list. For &lt;strong&gt;Migration type&lt;/strong&gt;, select &lt;strong&gt;Migrate existing data&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwt4eoudpm6dz6r53und.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwt4eoudpm6dz6r53und.png" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Under &lt;strong&gt;Table mappings&lt;/strong&gt;, click on &lt;strong&gt;Add new selection rule&lt;/strong&gt;. Under &lt;strong&gt;Schema&lt;/strong&gt;, select &lt;strong&gt;Enter a schema&lt;/strong&gt;. Under &lt;strong&gt;Source name&lt;/strong&gt;, enter the name of the created AWS RDS MySQL database between the percent signs. Keep all defaults and click on &lt;strong&gt;Create task&lt;/strong&gt;. This starts the migration task automatically.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After a successful migration, the &lt;strong&gt;Status&lt;/strong&gt; shows &lt;strong&gt;Load complete&lt;/strong&gt;, and the &lt;strong&gt;progress bar&lt;/strong&gt; reaches 100%.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtwin6e0be82o3r5mkhj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtwin6e0be82o3r5mkhj.png" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;
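
&lt;p&gt;For reference, the table-mapping selection rule and the migration task described above could look roughly like this in boto3. The task identifier and the three ARNs are placeholders, and unlike the console flow, the API requires starting the task explicitly once it is ready.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Selection rule equivalent to the console step: include every table
# in the testdb schema of the source MySQL database.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-testdb",
            "object-locator": {"schema-name": "testdb", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="mysql-to-s3-task",          # placeholder identifier
    SourceEndpointArn="arn:aws:dms:placeholder-source",    # placeholder ARNs
    TargetEndpointArn="arn:aws:dms:placeholder-target",
    ReplicationInstanceArn="arn:aws:dms:placeholder-instance",
    MigrationType="full-load",                             # "Migrate existing data"
    TableMappings=json.dumps(table_mappings),
)

# Wait until the task is ready, then start it.
task_arn = task["ReplicationTask"]["ReplicationTaskArn"]
dms.get_waiter("replication_task_ready").wait(
    Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
)
dms.start_replication_task(
    ReplicationTaskArn=task_arn,
    StartReplicationTaskType="start-replication",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;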

&lt;ol&gt;
&lt;li&gt;Let’s confirm the loaded data in the target &lt;strong&gt;Amazon S3 bucket&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxivc4ymfeoicpeimkzeq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxivc4ymfeoicpeimkzeq.png" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the end of &lt;strong&gt;Data ingestion using AWS Services Part 1&lt;/strong&gt;. Look out for &lt;a href="https://medium.com/@otengcode/data-ingestion-using-aws-services-part-2-56e51bae36f6" rel="noopener noreferrer"&gt;Part 2&lt;/a&gt; of my data ingestion series.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>aws</category>
      <category>awsbigdata</category>
      <category>awsdatalake</category>
    </item>
    <item>
      <title>kinesis Data streams Projects</title>
      <dc:creator>Oteng Isaac</dc:creator>
      <pubDate>Thu, 19 Jan 2023 04:17:44 +0000</pubDate>
      <link>https://dev.to/aws-builders/kinesis-data-streams-projects-5927</link>
      <guid>https://dev.to/aws-builders/kinesis-data-streams-projects-5927</guid>
      <description>&lt;p&gt;&lt;strong&gt;Kinesis Data Streams Project&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this project, we will create an &lt;strong&gt;Amazon Kinesis application&lt;/strong&gt; and use the &lt;strong&gt;AWS CLI&lt;/strong&gt; to put records into the stream and then read them back to verify the records in the stream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Kinesis Data Streams&lt;/strong&gt; is a serverless streaming data service that makes it easy to capture, process, and store data streams at any scale.&lt;/p&gt;

&lt;p&gt;Some features&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data inserted into a Kinesis data stream can’t be &lt;strong&gt;deleted&lt;/strong&gt; (the &lt;strong&gt;default data retention period is 24 hours&lt;/strong&gt;, but it can be increased)&lt;/li&gt;
&lt;li&gt;Records with the same partition key go into the same shard&lt;/li&gt;
&lt;li&gt;Producers: &lt;strong&gt;AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Consumers: &lt;strong&gt;AWS SDK, Kinesis client Library (KCL), Kinesis Data Firehose, Kinesis Data Analytics&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Amazon Kinesis Data Streams Capacity Modes&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Provisioned mode&lt;/strong&gt;: You choose the number of shards and scale manually or via the API; each shard can take in 1 MB/s of data and serve 2 MB/s of reads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;On-demand mode&lt;/strong&gt;: No need to provision or manage capacity; the stream scales automatically, with a default maximum of 200 MiB/second write capacity and 400 MiB/second read capacity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1zgiuvij4933me1w7b1.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1zgiuvij4933me1w7b1.PNG" alt="flow" width="800" height="306"&gt;&lt;/a&gt;&lt;/p&gt;
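
&lt;p&gt;To make the two capacity modes concrete, here is a small boto3 sketch of how a stream could be created in either mode. The stream names and region are example assumptions only.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Provisioned mode: you pick the shard count and scale it yourself.
kinesis.create_stream(
    StreamName="DemoStream",
    ShardCount=1,
    StreamModeDetails={"StreamMode": "PROVISIONED"},
)

# On-demand mode: no shard count to manage; capacity scales automatically.
kinesis.create_stream(
    StreamName="DemoStreamOnDemand",
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)

summary = kinesis.describe_stream_summary(StreamName="DemoStream")
print(summary["StreamDescriptionSummary"]["StreamStatus"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;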

&lt;p&gt;&lt;strong&gt;Instructions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create Amazon Kinesis Application&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Head to the Amazon Kinesis service dashboard to create a Kinesis Data Stream.&lt;br&gt;
We have three options: &lt;strong&gt;Kinesis Data Stream&lt;/strong&gt;, &lt;strong&gt;Kinesis Data Firehose&lt;/strong&gt; and &lt;strong&gt;Kinesis Data Analytics&lt;/strong&gt;.&lt;br&gt;
Select &lt;strong&gt;Kinesis Data Stream&lt;/strong&gt;&lt;br&gt;
and then click on &lt;strong&gt;Create data stream&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwoz0o2k95370zmw8ncoj.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwoz0o2k95370zmw8ncoj.PNG" alt="k2" width="800" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enter a name for your &lt;strong&gt;Kinesis Stream&lt;/strong&gt;. In this tutorial, we will call our stream &lt;strong&gt;"DemoStream"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmrioklt31oa2p79zw3ep.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmrioklt31oa2p79zw3ep.PNG" alt="k3" width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the &lt;strong&gt;capacity mode&lt;/strong&gt;, select &lt;strong&gt;Provisioned mode&lt;/strong&gt;&lt;br&gt;
and set the number of shards to 1, which can be increased or decreased later.&lt;/p&gt;

&lt;p&gt;Click on &lt;strong&gt;Create data stream&lt;/strong&gt; to create the stream. It should take only a few seconds for the stream to be created.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywzu3hiyajrj2ve5dt2r.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywzu3hiyajrj2ve5dt2r.PNG" alt="k4" width="800" height="616"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8st0pmbnmljhrjxi7wyt.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8st0pmbnmljhrjxi7wyt.PNG" alt="k5" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Afterwards our stream should be in the &lt;strong&gt;"Active" state&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create a producer to put records into the Kinesis Data Stream&lt;/strong&gt;&lt;br&gt;
In this tutorial we are going to use the AWS CLI to communicate with the Amazon Kinesis Service&lt;/p&gt;

&lt;p&gt;This tutorial assumes you have already &lt;strong&gt;installed&lt;/strong&gt; and &lt;strong&gt;configured&lt;/strong&gt; the AWS CLI.&lt;br&gt;
You can follow this documentation [&lt;a href="https://docs.aws.amazon.com/cli/v1/userguide/install-windows.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/cli/v1/userguide/install-windows.html&lt;/a&gt;] to install and configure the AWS CLI for your OS. We are using Windows 10 for this tutorial. &lt;/p&gt;

&lt;p&gt;To confirm that the AWS CLI has been installed and configured on your computer, you can check the installed version by typing "&lt;strong&gt;aws --version&lt;/strong&gt;", which should print the version of the AWS CLI that is installed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08ouqrn29fi2l96cred6.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08ouqrn29fi2l96cred6.PNG" alt="k7" width="800" height="173"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can view a list of &lt;strong&gt;available commands&lt;/strong&gt; by typing &lt;strong&gt;aws kinesis help&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj88g1ar2o1rbkdxtzglh.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj88g1ar2o1rbkdxtzglh.PNG" alt="k8" width="800" height="607"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's check the list of created Kinesis Data Streams in our account by typing  &lt;strong&gt;aws kinesis list-streams&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03snbfrwcgp8vumng48m.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03snbfrwcgp8vumng48m.PNG" alt="k9" width="800" height="215"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let’s put some records into the stream.&lt;br&gt;
We can use &lt;strong&gt;PutRecord&lt;/strong&gt; to put one record into the stream or &lt;strong&gt;PutRecords&lt;/strong&gt; to put many records into the stream at once.&lt;/p&gt;

&lt;p&gt;We are going to simulate data coming from a bank ATM whose data has to be streamed into Kinesis Data Streams for further processing.&lt;/p&gt;

&lt;p&gt;We can use the command below to send a record to the stream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;C:\Users\FIXya TECH&amp;gt;aws kinesis put-record --stream-name DemoStream --partition-key 1 --cli-binary-format raw-in-base64-out  --data "{'trans_id': 1, 'trans_type': 'ATM', 'amt': 200}"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;aws kinesis &lt;strong&gt;put-record&lt;/strong&gt;: put-record is used to put a single record into the stream&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;--stream-name &lt;strong&gt;DemoStream&lt;/strong&gt;: the name of our created stream, in this case &lt;strong&gt;DemoStream&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;--partition-key &lt;strong&gt;1&lt;/strong&gt;: the partition key for this put operation&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;--cli-binary-format &lt;strong&gt;raw-in-base64-out&lt;/strong&gt;: a flag that specifies how binary input parameters should be interpreted, in this case &lt;strong&gt;raw-in-base64-out&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;--data &lt;strong&gt;"{'trans_id': 1, 'trans_type': 'ATM', 'amt': 200}"&lt;/strong&gt;: the data blob to be put into the stream&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this example we are using the &lt;strong&gt;PutRecord API&lt;/strong&gt; to put the following 4 records into the stream one by one. It’s also possible to use the &lt;strong&gt;PutRecords API&lt;/strong&gt; to write many records at once into the stream, as sketched after the commands below.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;aws kinesis put-record --stream-name DemoStream --partition-key 1 --cli-binary-format raw-in-base64-out --data "{'trans_id': 1, 'trans_type': 'ATM', 'amt': 200}"&lt;/li&gt;
&lt;li&gt;aws kinesis put-record --stream-name DemoStream --partition-key 1 --cli-binary-format raw-in-base64-out --data "{'trans_id': 2, 'trans_type': 'ATM', 'amt': 400}"&lt;/li&gt;
&lt;li&gt;aws kinesis put-record --stream-name DemoStream --partition-key 1 --cli-binary-format raw-in-base64-out --data "{'trans_id': 3, 'trans_type': 'ATM', 'amt': 600}"&lt;/li&gt;
&lt;li&gt;aws kinesis put-record --stream-name DemoStream --partition-key 1 --cli-binary-format raw-in-base64-out --data "{'trans_id': 4, 'trans_type': 'ATM', 'amt': 900}"&lt;/li&gt;
&lt;/ol&gt;
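
&lt;p&gt;As mentioned above, the &lt;strong&gt;PutRecords API&lt;/strong&gt; can write all four records in a single call. Here is a minimal boto3 sketch of that batch write; the region is an assumption.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

transactions = [
    {"trans_id": 1, "trans_type": "ATM", "amt": 200},
    {"trans_id": 2, "trans_type": "ATM", "amt": 400},
    {"trans_id": 3, "trans_type": "ATM", "amt": 600},
    {"trans_id": 4, "trans_type": "ATM", "amt": 900},
]

# PutRecords sends all four records to DemoStream in one API call.
response = kinesis.put_records(
    StreamName="DemoStream",
    Records=[
        {"Data": json.dumps(t).encode("utf-8"), "PartitionKey": "1"}
        for t in transactions
    ],
)
print("Failed records:", response["FailedRecordCount"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;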

&lt;p&gt;After a successful &lt;strong&gt;PutRecord API&lt;/strong&gt; operation, we should get a response object containing the &lt;strong&gt;ShardID&lt;/strong&gt; and the &lt;strong&gt;Sequence Number&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3deybch63gzvfezi4t5g.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3deybch63gzvfezi4t5g.PNG" alt="k12" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;Shard&lt;/strong&gt; is a uniquely identified group of data records in a Kinesis data stream.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Sequence Number&lt;/strong&gt; is the identifier associated with every record ingested in the stream, and is assigned when a record is put into the stream. Each stream has one or more shards.&lt;/p&gt;

&lt;p&gt;Next, we want to read records from the stream.&lt;/p&gt;

&lt;p&gt;First, let’s get the &lt;strong&gt;shard iterator&lt;/strong&gt;, which specifies the shard position from which to start reading data records sequentially.&lt;/p&gt;

&lt;p&gt;aws kinesis &lt;strong&gt;get-shard-iterator&lt;/strong&gt;: gets an Amazon Kinesis shard iterator. The iterator expires 5 minutes after it is returned to the requester.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;--stream-name &lt;strong&gt;DemoStream&lt;/strong&gt;: the name of the created stream, in this case &lt;strong&gt;DemoStream&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;--shard-id &lt;strong&gt;shardId-000000000000&lt;/strong&gt;: the &lt;strong&gt;shardId&lt;/strong&gt; of the shard in our stream&lt;/li&gt;
&lt;li&gt;--shard-iterator-type &lt;strong&gt;TRIM_HORIZON&lt;/strong&gt;: the shard iterator type, which can be &lt;strong&gt;AT_TIMESTAMP&lt;/strong&gt;, &lt;strong&gt;TRIM_HORIZON&lt;/strong&gt; or &lt;strong&gt;LATEST&lt;/strong&gt;. In this project we use &lt;strong&gt;TRIM_HORIZON&lt;/strong&gt;, which makes the shard iterator point to the last untrimmed record in the shard (the oldest data record in the shard)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can use the command below to get the shard iterator&lt;br&gt;
&lt;strong&gt;aws kinesis get-shard-iterator --stream-name DemoStream --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnc70ha62o7bfd962lfu.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnc70ha62o7bfd962lfu.PNG" alt="k14" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can now use the &lt;strong&gt;GetRecords API&lt;/strong&gt; to get data records from a Kinesis data stream's shard by specifying the &lt;strong&gt;shardIterator&lt;/strong&gt;. &lt;br&gt;&lt;br&gt;
aws kinesis get-records --shard-iterator AAAAAAAAAAE/fk1qZTWLE74jIwqLR/N1OqYDsi9d7KHPhTtk7XIF42kEAJdwg0x0oXlZK/5SC7LciGiW5M3IEHdl/WH4cVYvNO1vvTTNra21WQgOUbgODyGfSeDMhd74BGi7z4l/X0Mi9O98Nexx2uSJx5ZHweKaZzEyRm4wkAYHJ4cmzwV1o2W+h/XBXrjdFB1bAKrj4/fYTGDRwvAVuA79qMoWB9vvq6ZhvYUAOLrQXGEK/sjH9g==&lt;/p&gt;

&lt;p&gt;After a successful &lt;strong&gt;GetRecords&lt;/strong&gt; API call, it returns an object containing the &lt;strong&gt;four data records&lt;/strong&gt; put into the Kinesis data stream and the &lt;strong&gt;next shard iterator&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1chsfvhbgsfkg54n7h40.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1chsfvhbgsfkg54n7h40.PNG" alt="k17" width="800" height="337"&gt;&lt;/a&gt;&lt;/p&gt;
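
&lt;p&gt;The same read path can be scripted as well. Below is a hedged boto3 sketch that fetches a &lt;strong&gt;TRIM_HORIZON&lt;/strong&gt; shard iterator for DemoStream and reads the records back; the region and the Limit value are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# TRIM_HORIZON starts reading from the oldest untrimmed record in the shard.
iterator = kinesis.get_shard_iterator(
    StreamName="DemoStream",
    ShardId="shardId-000000000000",
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

result = kinesis.get_records(ShardIterator=iterator, Limit=10)
for record in result["Records"]:
    print(record["SequenceNumber"], record["Data"])

# The response also includes NextShardIterator for the next read.
print(result["NextShardIterator"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;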

&lt;p&gt;Another alternative to send test data to your &lt;strong&gt;Amazon Kinesis stream&lt;/strong&gt; or &lt;strong&gt;Amazon Kinesis Firehose delivery stream&lt;/strong&gt; is the &lt;strong&gt;Amazon Kinesis Data Generator&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s9q4zsak72mwp4stmrv.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s9q4zsak72mwp4stmrv.PNG" alt="generatot" width="800" height="188"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I hope to do another tutorial on this in the future.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>datascience</category>
      <category>cloudskills</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Hosting a static website using Amazon S3</title>
      <dc:creator>Oteng Isaac</dc:creator>
      <pubDate>Thu, 19 Jan 2023 03:38:12 +0000</pubDate>
      <link>https://dev.to/aws-builders/hosting-a-static-website-using-amazon-s3-5fkn</link>
      <guid>https://dev.to/aws-builders/hosting-a-static-website-using-amazon-s3-5fkn</guid>
      <description>&lt;p&gt;&lt;strong&gt;Amazon Simple Storage Service (Amazon S3)&lt;/strong&gt; is an object storage service that offers industry-leading scalability, data availability, security, and performance.Using Amazon S3 you can store and retrieve any amount of data from anywhere and can store any type of data.To store data in Amazon S3, you work with resources known as buckets and objects. A bucket is a container for objects. An object is any type of file and any metadata that describes that file.&lt;br&gt;
In this tutorial, we would use Amazon S3 bucket to host a static website. In Amazon S3, the static website can sustain any conceivable level of traffic, at a very modest cost, without the need to set up, monitor, scale, or manage any servers. All the files of the website are upload to Amazon S3. We can configure any of our s3 buckets as a static website.&lt;br&gt;
When an S3 bucket is configured for website hosting, the bucket is assigned a URL. When request is made to this URL, Amazon S3 returns the HTML file, known as the root object, that has been set for the bucket.&lt;br&gt;
To enable public access to the bucket or objects, permissions must be configured that allows access. To configure these permissions, we would use BUCKET POLICY which is is a resource-based AWS Identity and Access Management (IAM) policy which grants other AWS accounts or IAM users access permissions for the bucket and the objects in it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We will follow the steps below to complete the tutorial.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s go to the &lt;strong&gt;AWS management console&lt;/strong&gt; and search for &lt;strong&gt;Amazon S3&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqeasjpbt9nq89uskko1i.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqeasjpbt9nq89uskko1i.PNG" alt="AWS console" width="800" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;strong&gt;bucket section&lt;/strong&gt; click on &lt;strong&gt;Create bucket&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb262iiu55xamzox1p71y.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb262iiu55xamzox1p71y.PNG" alt="create bucket" width="800" height="224"&gt;&lt;/a&gt;&lt;br&gt;
In the &lt;strong&gt;General configuration&lt;/strong&gt; section, enter a &lt;strong&gt;unique name&lt;/strong&gt; for the bucket and choose a region, preferably one close to your location. In this tutorial, I have typed &lt;strong&gt;'tutorotengcode123'&lt;/strong&gt; as my bucket name. &lt;strong&gt;Note&lt;/strong&gt;: This name must be unique among all created buckets on the AWS platform.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkvruq7xtbvtaikacq6v.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkvruq7xtbvtaikacq6v.PNG" alt="bucket name" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;strong&gt;Object Ownership&lt;/strong&gt; section, choose &lt;strong&gt;ACLs enabled&lt;/strong&gt; and &lt;strong&gt;Object writer&lt;/strong&gt;. By selecting &lt;strong&gt;ACLs enabled&lt;/strong&gt;, we allow objects in this bucket to be owned by other AWS accounts, with access specified using ACLs. By selecting &lt;strong&gt;Object writer&lt;/strong&gt;, the account that writes an object remains the object owner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3l8gofffxlsddcjuib0.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3l8gofffxlsddcjuib0.PNG" alt="Object ownership" width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;strong&gt;Block Public Access settings&lt;/strong&gt; for this bucket, deselect the &lt;strong&gt;Block all public access&lt;/strong&gt; option and select the &lt;strong&gt;check box&lt;/strong&gt; to acknowledge that &lt;strong&gt;block all public access&lt;/strong&gt; is being turned off.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkouv4psybtleiehz4kg9.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkouv4psybtleiehz4kg9.PNG" alt="block access" width="800" height="678"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: &lt;strong&gt;Buckets&lt;/strong&gt;, &lt;strong&gt;access points&lt;/strong&gt;, and &lt;strong&gt;objects&lt;/strong&gt; do not allow public access by default. By deselecting Block all public access and acknowledging the current settings, we are allowing public access.&lt;br&gt;
Scroll down, keep the &lt;strong&gt;default settings&lt;/strong&gt; for &lt;strong&gt;encryption&lt;/strong&gt;, and click on &lt;strong&gt;Create bucket&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ucgw380t434pp14toiz.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ucgw380t434pp14toiz.PNG" alt="defaults keep" width="800" height="532"&gt;&lt;/a&gt;&lt;br&gt;
We can view details of the created bucket by clicking on &lt;strong&gt;View details&lt;/strong&gt; in the &lt;strong&gt;Success alert&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff93azgx63uokpgyoldi9.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff93azgx63uokpgyoldi9.PNG" alt="Success Alert" width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on the name of the created bucket and in the &lt;strong&gt;Objects tab&lt;/strong&gt;, click on the &lt;strong&gt;Upload&lt;/strong&gt; button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29do3euegbhadhk3cdb5.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29do3euegbhadhk3cdb5.PNG" alt="Upload" width="800" height="302"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;strong&gt;Files and folders&lt;/strong&gt; section, click on &lt;strong&gt;Add files&lt;/strong&gt;. &lt;br&gt;
At this stage we can add all the files and folders that make up our static website. Select all the &lt;strong&gt;files and folders&lt;/strong&gt; and click on &lt;strong&gt;Upload&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F97x00tycuy03v82c24yv.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F97x00tycuy03v82c24yv.PNG" alt="addfiles" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After a successful upload of files, the Success alert should show &lt;strong&gt;Upload succeeded&lt;/strong&gt;.&lt;br&gt;
Let’s review the uploaded files. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8g7mi95kmybg1ca97sk.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8g7mi95kmybg1ca97sk.PNG" alt="review" width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;
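
&lt;p&gt;If your website files live in a local folder, the upload can also be scripted. Here is a rough boto3 sketch, assuming a hypothetical local folder named &lt;strong&gt;site&lt;/strong&gt; that contains index.html and the rest of the website files.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import mimetypes
import os
import boto3

s3 = boto3.client("s3")
bucket = "tutorotengcode123"   # the bucket created above
site_dir = "site"              # hypothetical local folder with the website files

# Upload every file in the folder, guessing a Content-Type so browsers render it.
for root, _dirs, files in os.walk(site_dir):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, site_dir).replace(os.sep, "/")
        content_type = mimetypes.guess_type(path)[0] or "binary/octet-stream"
        s3.upload_file(path, bucket, key, ExtraArgs={"ContentType": content_type})
        print("uploaded", key)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;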

&lt;p&gt;Click on the &lt;strong&gt;Properties Tab&lt;/strong&gt; and scroll down to the &lt;strong&gt;Static website hosting&lt;/strong&gt; section. &lt;br&gt;
&lt;strong&gt;Static website hosting&lt;/strong&gt; is disabled by &lt;strong&gt;default&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8edgpl2qw6hmd9xoqtfk.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8edgpl2qw6hmd9xoqtfk.PNG" alt="disabled" width="800" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on &lt;strong&gt;Edit&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0cuk6vpjnruubqlgyii.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk0cuk6vpjnruubqlgyii.png" alt="Edit" width="800" height="162"&gt;&lt;/a&gt;&lt;br&gt;
For &lt;strong&gt;Static website hosting&lt;/strong&gt;, select &lt;strong&gt;Enable&lt;/strong&gt;; for &lt;strong&gt;Hosting type&lt;/strong&gt;, select &lt;strong&gt;Host a static website&lt;/strong&gt;; and for &lt;strong&gt;Index document&lt;/strong&gt;, enter &lt;strong&gt;index.html&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug7njqfye8u6pyc903s0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug7njqfye8u6pyc903s0.png" alt="index" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on the &lt;strong&gt;Permissions tab&lt;/strong&gt;, make sure &lt;strong&gt;Block all public access&lt;/strong&gt; is set to &lt;strong&gt;Off&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cce831uls855npuar4f.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6cce831uls855npuar4f.PNG" alt="off" width="800" height="328"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click on &lt;strong&gt;Edit&lt;/strong&gt; in the &lt;strong&gt;Bucket policy section&lt;/strong&gt;. Under &lt;strong&gt;Bucket ARN&lt;/strong&gt;, click to copy the &lt;strong&gt;bucket’s ARN&lt;/strong&gt;. In the &lt;strong&gt;Policy editor&lt;/strong&gt;, copy and paste the bucket policy below, replacing &lt;strong&gt;“Your_Bucket_ARN”&lt;/strong&gt; with your bucket’s ARN.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "Id": "MyPolicy",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3GetObjectAllow",
      "Action": [
        "s3:GetObject"
      ],
      "Effect": "Allow",
      "Resource": "Your_Bucket_ARN/*",
      "Principal": "*"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9412zhb9r0ytiyxgix22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9412zhb9r0ytiyxgix22.png" alt="grant" width="663" height="928"&gt;&lt;/a&gt;&lt;br&gt;
The policy grants the &lt;code&gt;s3:GetObject&lt;/code&gt; permission to any anonymous public user. Now click on &lt;strong&gt;Save changes&lt;/strong&gt;. &lt;/p&gt;
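
&lt;p&gt;The bucket policy above and the static website hosting setting we enabled earlier can also be applied with boto3. A minimal sketch, assuming the bucket name used in this tutorial:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3

s3 = boto3.client("s3")
bucket = "tutorotengcode123"

# Same policy as above: allow anonymous s3:GetObject on every object in the bucket.
policy = {
    "Id": "MyPolicy",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3GetObjectAllow",
            "Action": ["s3:GetObject"],
            "Effect": "Allow",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Principal": "*",
        }
    ],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

# Enable static website hosting with index.html as the index document.
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={"IndexDocument": {"Suffix": "index.html"}},
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;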

&lt;p&gt;Let’s go back to the &lt;strong&gt;Properties Tab&lt;/strong&gt; and scroll down to &lt;strong&gt;Static website hosting&lt;/strong&gt; section. In the &lt;strong&gt;Static website hosting section&lt;/strong&gt;, under &lt;strong&gt;Hosting type&lt;/strong&gt;, check to ensure that &lt;strong&gt;bucket hosting&lt;/strong&gt; is set. Under the &lt;strong&gt;Bucket website endpoint&lt;/strong&gt;, click the copy icon to copy the &lt;strong&gt;Bucket website endpoint&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fup3w1b66sptczp1c02p6.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fup3w1b66sptczp1c02p6.PNG" alt="congratulations" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Paste the copied bucket website endpoint in your browser’s &lt;strong&gt;address bar&lt;/strong&gt; and press &lt;strong&gt;enter&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy02agafsdq4jlihgu5zx.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy02agafsdq4jlihgu5zx.PNG" alt="final" width="800" height="545"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Congratulations! We have successfully hosted a static website on Amazon S3.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>career</category>
      <category>sideprojects</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
