<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matt Houghton</title>
    <description>The latest articles on DEV Community by Matt Houghton (@mattdevdba).</description>
    <link>https://dev.to/mattdevdba</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F497647%2F5b057ee3-a70e-41a2-8652-dcd4b6d8a36c.jpeg</url>
      <title>DEV Community: Matt Houghton</title>
      <link>https://dev.to/mattdevdba</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mattdevdba"/>
    <language>en</language>
    <item>
      <title>Battle of the LLMs – How the Rise of DeepSeek Changes the Competitive Landscape</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Mon, 11 Aug 2025 16:14:36 +0000</pubDate>
      <link>https://dev.to/aws-builders/battle-of-the-llms-how-the-rise-of-deepseek-changes-the-competitive-landscape-1hgn</link>
      <guid>https://dev.to/aws-builders/battle-of-the-llms-how-the-rise-of-deepseek-changes-the-competitive-landscape-1hgn</guid>
      <description>&lt;p&gt;CDL Head of Architecture, Matt Eisengruber, and Data &amp;amp; AI Architect, Matt Houghton, assess the implications of DeepSeek as it emerges from China as a major player in the LLM space.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: Industry Coverage and Benchmarks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;In the rapidly evolving world of artificial intelligence, DeepSeek has emerged as a formidable contender in the landscape of Large Language Models (LLMs). Developed with a focus on high-performance reasoning and multilingual capabilities, DeepSeek gained traction for its open-source transparency and competitive benchmark results. As organisations increasingly rely on LLMs for automation, analytics, and customer engagement, DeepSeek’s rise signals a shift toward more accessible and customisable AI solutions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Open-source LLMs like DeepSeek are democratising access to cutting-edge AI, enabling innovation beyond the walls of Big Tech.” — Dr. Andrew Ng, AI Pioneer&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Industry Coverage
&lt;/h3&gt;

&lt;p&gt;The LLM space is currently dominated by models from OpenAI (GPT-4), Anthropic (Claude), Google DeepMind (Gemini), and Meta (LLaMA). However, the emergence of open-weight and open-source alternatives like Mistral and DeepSeek is reshaping the competitive landscape.&lt;/p&gt;

&lt;p&gt;DeepSeek, developed by a Chinese AI research group, has positioned itself as a high-performing, multilingual model with strong reasoning capabilities. It supports both instruction-following and code generation tasks, making it versatile for enterprise use.&lt;/p&gt;

&lt;p&gt;Key differentiators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excellent price-for-performance ratio&lt;/li&gt;
&lt;li&gt;Open-source availability (Apache 2.0 licence)&lt;/li&gt;
&lt;li&gt;Multilingual support, including Chinese and English&lt;/li&gt;
&lt;li&gt;Strong performance on reasoning benchmarks like MATH and GSM8K&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Benchmarks
&lt;/h3&gt;

&lt;p&gt;DeepSeek has demonstrated impressive results across several industry-standard benchmarks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4i8w2py09z7if5c9plg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl4i8w2py09z7if5c9plg.png" alt=" " width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffkctf7aozau1pcisy1x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffkctf7aozau1pcisy1x.png" alt=" " width="800" height="273"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Source: DeepSeek GitHub, llm-stats.com&lt;/p&gt;

&lt;h3&gt;
  
  
  Expert Opinions
&lt;/h3&gt;

&lt;p&gt;At the 2025 AI Frontiers Conference, DeepSeek was highlighted as a “breakthrough in open source reasoning”, with several researchers praising its balance of performance and accessibility.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“DeepSeek’s performance on multilingual and mathematical reasoning tasks is a game-changer for global enterprises.” — Dr. Fei-Fei Li, Stanford University&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Part 2: CDL’s approach to testing LLMs
&lt;/h2&gt;

&lt;p&gt;When testing Generative AI (GenAI) systems, "non-deterministic" refers to the inherent challenge that the same input can produce different outputs on repeated runs, making it difficult to reliably test and verify the system's behaviour.&lt;/p&gt;

&lt;p&gt;At CDL, we have introduced new testing approaches and tools to address these non-deterministic challenges in GenAI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run multiple tests with the same input and analyse the distribution of outputs to understand the range of possible results.&lt;/li&gt;
&lt;li&gt;Develop specific metrics to measure different aspects of the output like coherence, accuracy, relevance, and bias to assess quality even with variations in responses.&lt;/li&gt;
&lt;li&gt;Use a wide variety of input prompts to test the model's ability to handle different contexts and situations.&lt;/li&gt;
&lt;li&gt;Continuously monitor the performance of the model in real-world scenarios and use feedback to refine the training data and improve its accuracy.&lt;/li&gt;
&lt;li&gt;Move to intent-based testing where we focus on evaluating whether the output aligns with the intended meaning or purpose of the prompt rather than just checking for exact matches.&lt;/li&gt;
&lt;/ul&gt;
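The first bullet - running the same prompt repeatedly and analysing the spread of outputs - can be sketched in a few lines of Python. This is a minimal illustration rather than CDL's actual tooling; invoke_model is a placeholder for whatever client calls the LLM under test.

```python
from collections import Counter

def output_distribution(invoke_model, prompt, runs=10):
    """Call the model repeatedly with the same prompt and
    summarise the spread of answers it produces."""
    answers = [invoke_model(prompt) for _ in range(runs)]
    counts = Counter(answers)
    modal_answer, freq = counts.most_common(1)[0]
    return {
        "unique_answers": len(counts),
        "modal_answer": modal_answer,
        "consistency": freq / runs,  # 1.0 means fully deterministic
    }
```

In practice the summary would feed the quality metrics described above (coherence, accuracy, relevance, bias) rather than a simple exact-match count.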

&lt;h3&gt;
  
  
  Testing Methodology
&lt;/h3&gt;

&lt;p&gt;Our process for testing a model is first to define a set of questions, shown in the diagram as the prompts. These live in a JSON Lines file containing the question, the expected answer (also known as the ground truth data) and a category.&lt;/p&gt;
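As an illustration, one record of such a JSON Lines file might look like the sketch below. The field names are hypothetical; the post describes the fields but not the exact keys used.

```python
import json

# Hypothetical key names: the post names the fields (question,
# ground truth, category) but not the exact JSON keys used.
record = {
    "question": "What is an insurance excess?",
    "ground_truth": "The amount the policyholder pays towards a claim.",
    "category": "policy-terms",
}

# Each line of the .jsonl file is one self-contained JSON object.
line = json.dumps(record)
parsed = json.loads(line)
print(parsed["category"])  # prints policy-terms
```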

&lt;p&gt;The tests, known as evaluations, are run in a couple of modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;First, we run an automated LLM-as-a-judge evaluation. This is where we ask our LLM under test to answer the question, then pass that answer along with the ground truth data to a second LLM and ask it to grade the first LLM's response.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The second mode is a human evaluation. We take the same prompts and ground truth data, but this time we ask a team of people to evaluate the model.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
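The LLM-as-a-judge mode boils down to constructing a grading prompt from the question, the candidate answer and the ground truth. A simplified sketch follows; the wording and rubric are illustrative assumptions, not the actual prompts used.

```python
def build_judge_prompt(question, candidate_answer, ground_truth):
    """Assemble the instruction sent to the second (judge) LLM.
    A production rubric would be richer; this shows the shape."""
    return (
        "You are grading another model's answer.\n"
        "Question: " + question + "\n"
        "Candidate answer: " + candidate_answer + "\n"
        "Reference answer: " + ground_truth + "\n"
        "Reply CORRECT if the candidate matches the reference in meaning, "
        "otherwise reply INCORRECT, with a one-line justification."
    )
```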

&lt;p&gt;Evaluation output is shown in the Bedrock console and all output and results are stored in S3 for further analysis. We have also enabled these tests as part of our CI / CD Pipelines, you can &lt;a href="https://matthoughton.cloud/markdown-template.html?md=/blog/2025/2025-05-12_CI%20CD%20For%20Bedrock%20Evaluations.md" rel="noopener noreferrer"&gt;read Matt Houghton’s blog&lt;/a&gt; on how this was completed.&lt;/p&gt;

&lt;p&gt;We utilise the Amazon Bedrock evaluations feature to assess the performance and effectiveness of the model.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock computes our required performance metrics, such as the semantic robustness of a model and the correctness of a knowledge base in retrieving information and generating responses.&lt;/p&gt;

&lt;p&gt;For model evaluations, we use both automatic evaluations and a team of human workers to rate the outputs and provide their input for the evaluation. This approach gives us flexibility, such as utilising both company employees and industry subject-matter experts - in this case, from the insurance industry. We can also include and assess retrieval-augmented generation (RAG) workloads to validate that knowledge bases retrieve highly relevant information and generate useful, appropriate responses.&lt;/p&gt;
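Bedrock evaluation jobs can also be created programmatically through boto3's bedrock client and its create_evaluation_job call. The sketch below only assembles a request body; every name, ARN, model identifier and S3 URI is a placeholder, and the parameter shapes should be checked against the current API reference before use.

```python
# Sketch of a request body for bedrock.create_evaluation_job.
# All names, ARNs and S3 URIs below are placeholders.
request = {
    "jobName": "deepseek-rag-eval",
    "roleArn": "arn:aws:iam::123456789012:role/BedrockEvalRole",
    "evaluationConfig": {
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "QuestionAndAnswer",
                    "dataset": {
                        "name": "insurance-prompts",
                        "datasetLocation": {"s3Uri": "s3://my-bucket/prompts.jsonl"},
                    },
                    "metricNames": ["Builtin.Accuracy"],
                }
            ]
        }
    },
    "inferenceConfig": {
        "models": [{"bedrockModel": {"modelIdentifier": "placeholder-model-id"}}]
    },
    "outputDataConfig": {"s3Uri": "s3://my-bucket/eval-output/"},
}
# client = boto3.client("bedrock"); client.create_evaluation_job(**request)
```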

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bixl2v72l2ri93h6hdf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9bixl2v72l2ri93h6hdf.jpg" alt=" " width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Test Results
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bt6vxicer2xokiol7cy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bt6vxicer2xokiol7cy.jpg" alt=" " width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87jg91ghl8qg3rwksxql.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87jg91ghl8qg3rwksxql.jpg" alt=" " width="800" height="449"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;As the tables above show, DeepSeek holds its own and lives up to its hype against other models as a chatbot when fielding insurance-specific queries in a RAG-based architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3: Closing Thoughts on the Results and Possible Implications on the Insurance Industry
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Summary of Findings
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;DeepSeek R1 is a top-tier open-source LLM that could be used in the insurance industry as a chatbot.&lt;/li&gt;
&lt;li&gt;It performs competitively with proprietary models in most benchmarks.&lt;/li&gt;
&lt;li&gt;Its low hallucination rate and high accuracy make it suitable for enterprise applications.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implications for the Insurance Industry
&lt;/h3&gt;

&lt;p&gt;DeepSeek’s capabilities open new possibilities for insurers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Risk Assessment: Automating underwriting with accurate, explainable reasoning.&lt;/li&gt;
&lt;li&gt;Fraud Detection: Analysing patterns in claims with multilingual support.&lt;/li&gt;
&lt;li&gt;Customer Service: Deploying chatbots that understand complex queries.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;“LLMs like DeepSeek can transform how insurers interact with customers and assess risk, especially in multilingual markets.” — Insurance AI Journal, May 2025&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Future Outlook
&lt;/h3&gt;

&lt;p&gt;As DeepSeek continues to evolve, we anticipate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Larger context windows for document-heavy industries&lt;/li&gt;
&lt;li&gt;Integration with retrieval-augmented generation (RAG) for real-time data access&lt;/li&gt;
&lt;li&gt;Domain-specific fine-tuning for insurance, legal, and healthcare sectors&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;DeepSeek is not just another LLM; it’s a signal of the growing power and potential of open-source AI. For the insurance industry, it represents a cost-effective, high-performance alternative to proprietary models. As we continue to explore its applications, stay tuned for future posts where we dive into other models as they emerge.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>deepseek</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Frugal SQL data access with Athena and Blue / Green support</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Tue, 12 Mar 2024 21:21:03 +0000</pubDate>
      <link>https://dev.to/aws-builders/frugal-sql-data-access-with-athena-and-blue-green-support-1ool</link>
      <guid>https://dev.to/aws-builders/frugal-sql-data-access-with-athena-and-blue-green-support-1ool</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this post I look at a frugal architecture for SQL-based data access.&lt;/p&gt;

&lt;p&gt;The prompt for writing this blog post came from a recent discussion on an application a team were looking to migrate to the cloud. &lt;/p&gt;

&lt;p&gt;The main requirement for the migration was the ability to run SQL against the data, which was very small in volume (&amp;lt;200 MB).&lt;/p&gt;

&lt;p&gt;During the discussion I turned to Athena, which is one of my favourite AWS services. Athena offers JDBC drivers, so I suggested we could swap it in for the MySQL database that was going to be provided by RDS.&lt;/p&gt;

&lt;p&gt;I was also asked how I would handle a blue / green style deployment with Athena. The specific requirement was that each time the application was deployed the database would be replaced with a new version including all data.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQL Setup
&lt;/h2&gt;

&lt;p&gt;With Athena there is no visible database resource to create like there is with RDS. The steps to allow SQL access to data are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An S3 bucket to store the data.&lt;/li&gt;
&lt;li&gt;A Glue database and table that define the structure of the data held in S3.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A quick way to test this out is to use a tool like &lt;a href="https://mockaroo.com" rel="noopener noreferrer"&gt;Mockaroo&lt;/a&gt; to generate some test data and then have a Glue Crawler analyse the data in S3 and create the required data catalog entries.&lt;/p&gt;

&lt;p&gt;Here is the sample schema definition in Mockaroo.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlal58w5mqwe738e1yo2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlal58w5mqwe738e1yo2.png" alt=" " width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From here I created two S3 buckets. One would hold data for my 'Blue' deployment and one for the 'Green' deployment. I called the buckets myapp.sql.blue and myapp.sql.green.&lt;/p&gt;

&lt;p&gt;In Glue I created a database called MyApp just to provide a logical separation between this and any other databases I may have in the same account.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgmn9rx2283tvk0y1p1c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgmn9rx2283tvk0y1p1c.png" alt=" " width="800" height="271"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I downloaded two sets of data from Mockaroo and uploaded a file to each of the S3 buckets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2dqropsjc53sdgknrdhb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2dqropsjc53sdgknrdhb.png" alt=" " width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I then created a Glue Crawler for each bucket. Here is an example.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcd6qion0vqpyoonpq8bo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcd6qion0vqpyoonpq8bo.png" alt=" " width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Running both crawlers populates the Glue data catalog with two tables.&lt;/p&gt;
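If you would rather not run a crawler, the equivalent catalog entry can be declared by hand with an external table in Athena. The columns below are hypothetical stand-ins for whatever your generated Mockaroo schema contains.

```sql
-- Hypothetical columns; match these to your generated CSV.
CREATE EXTERNAL TABLE myapp_sql_blue (
  id INT,
  first_name STRING,
  last_name STRING,
  email STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://myapp.sql.blue/'
TBLPROPERTIES ('skip.header.line.count' = '1');
```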

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3madp6619x14tv03ydq1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3madp6619x14tv03ydq1.png" alt=" " width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8c9tuq8a0663uowj47kq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8c9tuq8a0663uowj47kq.png" alt=" " width="800" height="665"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point I'm able to run SQL in Athena against each of the tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5rutzepwteaw5i7785x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5rutzepwteaw5i7785x.png" alt=" " width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Athena Blue / Green
&lt;/h2&gt;

&lt;p&gt;For the Blue / Green component I utilise a view created in Athena. Just as in RDS, views can be created over one or more tables in Athena using an SQL query. We don't need anything too complex here.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create view myapp_sql as select * from myapp_sql_blue;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the data catalog the view appears alongside the tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx03f9mqitjwqmmurt0mb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx03f9mqitjwqmmurt0mb.png" alt=" " width="600" height="958"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives me a consistent name for my application to point to. When I want to switch over the data being used, I can simply recreate the view pointing at either the blue or green bucket's data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create or replace view myapp_sql as select * from myapp_sql_green;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;p&gt;Let's create a Lambda function to test this out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import pyathena


def lambda_handler(event, context):

    connection = pyathena.connect(
      s3_staging_dir="s3://athena.myapp.work/",
      region_name="us-east-1"
    )

    cursor = connection.cursor()

    query = "SELECT * FROM myapp.myapp_sql"

    cursor.execute(query)

    results = cursor.fetchall()

    print(results)

    return {
        'statusCode': 200
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running this gives the output below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4laeuf2rf782gasx9u0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4laeuf2rf782gasx9u0k.png" alt=" " width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A note on IAM: the Lambda function will require permissions for Athena and S3. For testing purposes I attached the AmazonAthenaFullAccess and AmazonS3FullAccess managed policies. In production you should scope the permissions down to the least privilege required.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Deployment Switchover
&lt;/h2&gt;

&lt;p&gt;Let's now imagine that I have a pipeline that has loaded up my new data to my second bucket. In the pipeline I can run the following step to switch the view to the latest data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export TABLE_SUFFIX=green
aws athena start-query-execution --query-string "create or replace view myapp_sql as select * from myapp_sql_$TABLE_SUFFIX" --result-configuration "OutputLocation=s3://athena.myapp.work" --query-execution-context "Database=myapp"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Considerations
&lt;/h2&gt;

&lt;p&gt;This setup should work well for simple SQL-based access to data where volumes are not too high. You can optimise queries further within Athena by using columnar data formats such as Parquet.&lt;/p&gt;

&lt;p&gt;Storage costs, assuming the S3 standard tier, are ~$0.023 per GB-month. Querying via Athena costs $5.00 per TB of data scanned. We only pay when we run a query, unlike RDS, which we pay for even when we are not running SQL.&lt;/p&gt;
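Plugging the post's ~200 MB dataset into those prices gives a feel for the numbers. This is a rough sketch: prices vary by region, and Athena bills a 10 MB minimum per query.

```python
# Rough cost sketch for a ~200 MB dataset (prices as quoted above).
storage_gb = 0.2                        # ~200 MB in S3 standard
storage_per_month = storage_gb * 0.023  # USD 0.023 per GB-month

scanned_tb = 0.2 / 1000                 # one full-table scan, in TB
cost_per_scan = scanned_tb * 5.00       # USD 5.00 per TB scanned

print(round(storage_per_month, 4))      # prints 0.0046
print(round(cost_per_scan, 4))          # prints 0.001
```

Even at hundreds of queries a day, this stays far below the cost of an always-on RDS instance.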

&lt;p&gt;As long as the access characteristics of your application are a match for the performance of the AWS services used, S3-based SQL access via Athena is a tough one to beat for those looking to be &lt;a href="http://thefrugalarchitect.com" rel="noopener noreferrer"&gt;The Frugal Architect&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>database</category>
      <category>sql</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Billing for SaaS with EMF and CloudWatch Metric Streams</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Fri, 08 Mar 2024 16:24:38 +0000</pubDate>
      <link>https://dev.to/aws-builders/billing-for-saas-with-emf-and-cloudwatch-metric-streams-4i6p</link>
      <guid>https://dev.to/aws-builders/billing-for-saas-with-emf-and-cloudwatch-metric-streams-4i6p</guid>
      <description>&lt;p&gt;In this post I'm looking at how Software as a Service (SaaS) providers running on AWS can use a few AWS Services to build out a mechanism for collecting billing/metering metrics from their software and process them in order to bill a customer based on usage.&lt;/p&gt;

&lt;p&gt;The main services I will cover are use of AWS CloudWatch embedded metric format (EMF) together with AWS CloudWatch Metric Streams.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is EMF?
&lt;/h2&gt;

&lt;p&gt;The CloudWatch embedded metric format allows you to generate custom metrics asynchronously in the form of logs written to CloudWatch Logs. You can embed custom metrics alongside detailed log event data, and CloudWatch automatically extracts the custom metrics so that you can visualize and alarm on them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is CloudWatch Metric Streams?
&lt;/h2&gt;

&lt;p&gt;You can use metric streams to continually stream CloudWatch metrics to a destination of your choice, with near-real-time delivery and low latency. Supported destinations include AWS destinations such as Amazon Simple Storage Service and several third-party service provider destinations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using EMF in your Application
&lt;/h2&gt;

&lt;p&gt;Imagine a sample Python application returning "hello world" to simulate a successful call. Each call to the application is captured for billing purposes using EMF. Lambda &lt;a href="https://docs.powertools.aws.dev/lambda/python/latest/" rel="noopener noreferrer"&gt;Powertools&lt;/a&gt; is used to reduce the amount of code we need to write.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;metrics.add_metric(name="SuccessfulGet", unit=MetricUnit.Count, value=1)
metrics.add_dimension(name="Customer", value="MattHoughton")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These two lines output the required billing metrics.&lt;/p&gt;

&lt;p&gt;The SuccessfulGet metric name can be customised for your application. The value should be a sensible identifier for the chargeable action. For example, in the world of insurance you may have actions such as CreatePolicy, CreateQuote, UpdateCar etc.&lt;/p&gt;

&lt;p&gt;On the Lambda function configuration the following environment variables also need to be set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;POWERTOOLS_SERVICE_NAME: SuggestTheNameOfYourSoftware
POWERTOOLS_METRICS_NAMESPACE: SuggestSomethingLikeBilling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the sample Lambda function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit

metrics = Metrics()

@metrics.log_metrics
def lambda_handler(event, context):

    #do something of value

    metrics.add_metric(name="CreatePolicy", unit=MetricUnit.Count, value=1)
    metrics.add_dimension(name="Customer", value="MattHoughton")  # example only - don't hard-code this; source it from the request payload

    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": "hello world"
        }),
    }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Testing the function in the console you should get this response.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "statusCode": 200,
  "body": "{\"message\": \"hello world\"}"
}

START RequestId: xxxx Version: $LATEST
{
  "_aws": {
    "Timestamp": 1709911806737,
    "CloudWatchMetrics": [
      {
        "Namespace": "DemoBilling",
        "Dimensions": [
          [
            "Customer",
            "service"
          ]
        ],
        "Metrics": [
          {
            "Name": "CreatePolicy",
            "Unit": "Count"
          }
        ]
      }
    ]
  },
  "Customer": "MattHoughton",
  "service": "DemoProductName",
  "CreatePolicy": [
    1
  ]
}
END RequestId: xxxx
REPORT RequestId: xxxx  Duration: 1.42 ms   Billed Duration: 2 ms   Memory Size: 128 MB Max Memory Used: 37 MB  Init Duration: 177.72 ms    
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the Lambda is executed the billing metrics get stored in CloudWatch Logs and are visible in CloudWatch Metrics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg71p23otoflsurx3qpzr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg71p23otoflsurx3qpzr.png" alt=" " width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Costs for EMF are based on CloudWatch Logs ingestion, which in EU-WEST-1 is $0.57 per GB. When I was testing with the example 624-byte payload generated by Powertools, the costs came out as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each metric above stored costs $0.0000003&lt;/li&gt;
&lt;li&gt;One million metrics stored costs: $0.33&lt;/li&gt;
&lt;li&gt;Ten million metrics stored costs: $3.31&lt;/li&gt;
&lt;/ul&gt;
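Those figures can be reproduced directly from the 624-byte payload and the $0.57 price, assuming CloudWatch bills per binary GB ingested:

```python
GB = 1024 ** 3            # assuming billing per binary GB ingested
payload_bytes = 624       # example EMF payload size from above
price_per_gb = 0.57       # CloudWatch Logs ingestion, EU-WEST-1

def ingestion_cost(metric_count):
    """USD cost of ingesting metric_count EMF payloads."""
    return metric_count * payload_bytes / GB * price_per_gb

print(round(ingestion_cost(1), 7))           # prints 3e-07
print(round(ingestion_cost(1_000_000), 2))   # prints 0.33
print(round(ingestion_cost(10_000_000), 2))  # prints 3.31
```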

&lt;h2&gt;
  
  
  Collecting and Processing the Billing Metrics
&lt;/h2&gt;

&lt;p&gt;To pull out all of the EMF metrics relating to billing, we will set up a metric stream to send them to an S3 bucket.&lt;/p&gt;

&lt;p&gt;Under CloudWatch in the console, select Metric Streams and create a metric stream. We will walk through the Quick setup for S3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhmn4i84sfk3yilo8l5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhmn4i84sfk3yilo8l5h.png" alt=" " width="800" height="716"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Under metrics to be streamed limit this to only the metrics related to billing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzsta6uk5wxm06ignmxlg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzsta6uk5wxm06ignmxlg.png" alt=" " width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looking at the metric stream that is created you will see details for the other components created for you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiw2wr5h265096p7n10o4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiw2wr5h265096p7n10o4.png" alt=" " width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon Data Firehose&lt;/li&gt;
&lt;li&gt;IAM Roles&lt;/li&gt;
&lt;li&gt;S3 Bucket&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you run the Lambda function a few more times and then view the Data Firehose, you will see the metrics being delivered.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxsa4iarx1r59a37uv8r8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxsa4iarx1r59a37uv8r8.png" alt=" " width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now if you look in the S3 bucket you will find objects created. By default, they are partitioned by year/month/day/hour.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz995pciteil253fa4rcg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz995pciteil253fa4rcg.png" alt=" " width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is sample content published to S3.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"metric_stream_name":"DemoBillingMetricStream","account_id":"xxxx","region":"us-east-1","namespace":"DemoBilling","metric_name":"CreatePolicy","dimensions":{"Customer":"MattHoughton","service":"DemoProductName"},"timestamp":1709913900000,"value":{"max":1.0,"min":1.0,"sum":9.0,"count":9.0},"unit":"Count"}
{"metric_stream_name":"DemoBillingMetricStream","account_id":"xxxx","region":"us-east-1","namespace":"DemoBilling","metric_name":"CreatePolicy","dimensions":{"Customer":"MattHoughton","service":"DemoProductName"},"timestamp":1709913960000,"value":{"max":1.0,"min":1.0,"sum":30.0,"count":30.0},"unit":"Count"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
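&lt;p&gt;Because each delivered object is newline-delimited JSON like the sample above, downstream code can parse it with nothing more than the standard library. A minimal sketch:&lt;/p&gt;

```python
import json
from datetime import datetime, timezone

# Two records copied from the sample S3 object above (account id elided as in the post)
sample = """\
{"metric_stream_name":"DemoBillingMetricStream","account_id":"xxxx","region":"us-east-1","namespace":"DemoBilling","metric_name":"CreatePolicy","dimensions":{"Customer":"MattHoughton","service":"DemoProductName"},"timestamp":1709913900000,"value":{"max":1.0,"min":1.0,"sum":9.0,"count":9.0},"unit":"Count"}
{"metric_stream_name":"DemoBillingMetricStream","account_id":"xxxx","region":"us-east-1","namespace":"DemoBilling","metric_name":"CreatePolicy","dimensions":{"Customer":"MattHoughton","service":"DemoProductName"},"timestamp":1709913960000,"value":{"max":1.0,"min":1.0,"sum":30.0,"count":30.0},"unit":"Count"}
"""

records = [json.loads(line) for line in sample.splitlines() if line.strip()]
total_calls = sum(r["value"]["sum"] for r in records)
first_seen = datetime.fromtimestamp(records[0]["timestamp"] / 1000, tz=timezone.utc)
print(len(records), total_calls, first_seen.isoformat())
# 2 39.0 2024-03-08T16:05:00+00:00
```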



&lt;h2&gt;
  
  
  Further Processing
&lt;/h2&gt;

&lt;p&gt;From this point we have a lot of flexibility in how we can choose to process this data.&lt;/p&gt;

&lt;p&gt;We can trigger a Lambda function that sends these metric payloads to an accounting / invoicing system.&lt;/p&gt;
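&lt;p&gt;As a rough sketch of that idea (the invoice schema, unit price and downstream call here are all hypothetical, not part of the metric stream itself), such a Lambda could map each metric record to a billing line item:&lt;/p&gt;

```python
import json

UNIT_PRICE = 0.01  # hypothetical price per billed API call

def metric_to_invoice_line(record: dict) -> dict:
    """Map one streamed metric record to a billing line item (hypothetical schema)."""
    quantity = record["value"]["sum"]
    return {
        "customer": record["dimensions"]["Customer"],
        "service": record["dimensions"]["service"],
        "operation": record["metric_name"],
        "quantity": quantity,
        "amount": round(quantity * UNIT_PRICE, 2),
    }

def lambda_handler(event, context):
    # A real handler would fetch the Firehose-delivered object from S3 first;
    # this sketch assumes the newline-delimited JSON body is already in the event.
    lines = [metric_to_invoice_line(json.loads(l))
             for l in event["body"].splitlines() if l.strip()]
    # send_to_invoicing(lines)  # hypothetical downstream call
    return {"lines": len(lines)}
```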

&lt;p&gt;We can also continue to use AWS Services. As the data is in S3 we can easily add this to a Glue data catalog and query it using Athena. We could even start to build dashboards and reports using QuickSight.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>sass</category>
      <category>data</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Using Snowflake data hosted in GCP with AWS Glue</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Mon, 29 Jan 2024 18:37:51 +0000</pubDate>
      <link>https://dev.to/aws-builders/using-snowflake-data-hosted-in-gcp-with-aws-glue-3eb0</link>
      <guid>https://dev.to/aws-builders/using-snowflake-data-hosted-in-gcp-with-aws-glue-3eb0</guid>
      <description>&lt;p&gt;This post covers a use case of accessing data held in a Snowflake database hosted in GCP within an AWS Glue ETL job.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Snowflake?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.snowflake.com" rel="noopener noreferrer"&gt;Snowflake&lt;/a&gt; is a cloud-based data warehousing platform that provides a fully managed and scalable solution for storing and analyzing large volumes of data. It is not a traditional relational database but rather a data warehouse as a service. Snowflake is designed to work with cloud platforms like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).&lt;/p&gt;

&lt;h2&gt;
  
  
  What is AWS Glue?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/glue" rel="noopener noreferrer"&gt;AWS Glue&lt;/a&gt; is a fully managed extract, transform, and load (ETL) service provided by Amazon Web Services (AWS). It is designed to make it easy for users to prepare and load their data for analysis. AWS Glue simplifies the process of building and managing ETL workflows by providing a serverless environment for running ETL jobs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Snowflake Prerequisites
&lt;/h2&gt;

&lt;p&gt;As I wasn't a user of Snowflake already I &lt;a href="https://signup.snowflake.com/" rel="noopener noreferrer"&gt;signed up for a free trial&lt;/a&gt; in order to work on this use case.  The process was simple and I didn't need to provide any form of payment to try it out. Well done Snowflake!&lt;/p&gt;

&lt;h2&gt;
  
  
  Test Data
&lt;/h2&gt;

&lt;p&gt;I generated some test data to load into Snowflake using &lt;a href="https://mockaroo.com" rel="noopener noreferrer"&gt;Mockaroo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndzn7rhbnhbjig8bjbny.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndzn7rhbnhbjig8bjbny.png" alt=" " width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Snowflake Setup
&lt;/h2&gt;

&lt;p&gt;Within Snowflake I created a &lt;a href="https://docs.snowflake.com/en/sql-reference/ddl-database" rel="noopener noreferrer"&gt;database, schema&lt;/a&gt; and a &lt;a href="https://docs.snowflake.com/en/user-guide/warehouses-overview" rel="noopener noreferrer"&gt;warehouse&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4awpt2spgjmbugqty089.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4awpt2spgjmbugqty089.png" alt=" " width="800" height="231"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffyj4f1x86kjcsie37yt4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffyj4f1x86kjcsie37yt4.png" alt=" " width="800" height="284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2oginair0h516pz2g677.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2oginair0h516pz2g677.png" alt=" " width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With these things in place I loaded the data from my Mockaroo generated JSON file into a new table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu88j1nqytgol5j1exf5n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu88j1nqytgol5j1exf5n.png" alt=" " width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8v6otmg8enoh0ctiqp5t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8v6otmg8enoh0ctiqp5t.png" alt=" " width="800" height="562"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnniwjn0mu8av0jenaw14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnniwjn0mu8av0jenaw14.png" alt=" " width="598" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The data can now be queried.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrja9xrvjf5ab2r9wlul.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrja9xrvjf5ab2r9wlul.png" alt=" " width="800" height="406"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next I created the user and role that Glue will use to connect to Snowflake.&lt;/p&gt;

&lt;p&gt;Create a role called car_sales.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci41nk648u3azxn347o3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fci41nk648u3azxn347o3.png" alt=" " width="397" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Execute the SQL below to assign privileges to the new role.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;grant select on table matt.matt.car_sales to role car_sales;

grant usage on database matt to role car_sales;

grant usage on schema matt to role car_sales;

grant usage on warehouse matt to role car_sales;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
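&lt;p&gt;If you need to repeat this for several tables or roles, the same grants can be generated programmatically. A small sketch (the helper name is mine; it simply mirrors the statements above):&lt;/p&gt;

```python
def snowflake_grants(role: str, database: str, schema: str,
                     warehouse: str, table: str) -> list[str]:
    """Generate the read-only grants a Glue service user needs (mirrors the SQL above)."""
    return [
        f"grant select on table {database}.{schema}.{table} to role {role};",
        f"grant usage on database {database} to role {role};",
        f"grant usage on schema {schema} to role {role};",
        f"grant usage on warehouse {warehouse} to role {role};",
    ]

for statement in snowflake_grants("car_sales", "matt", "matt", "matt", "car_sales"):
    print(statement)
```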



&lt;p&gt;Create a user for Glue to connect as - ensure you set the default warehouse and assign the car_sales role.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlodnsk4xygvuzai4gvs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlodnsk4xygvuzai4gvs.png" alt=" " width="551" height="839"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Glue Setup
&lt;/h2&gt;

&lt;p&gt;Create an IAM role for the Glue job to execute. Note Glue will need to be able to access Secrets Manager.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhk1ezyzwk7cy6kwdncln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhk1ezyzwk7cy6kwdncln.png" alt=" " width="800" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Secrets Manager create a secret.&lt;/p&gt;

&lt;p&gt;Select "Other type of secret". For the keys, use sfUser, sfPassword and sfWarehouse.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6l43s27mxd6iopsibg9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6l43s27mxd6iopsibg9w.png" alt=" " width="800" height="838"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdflt3upqf9m45zvuq2ts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdflt3upqf9m45zvuq2ts.png" alt=" " width="800" height="893"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now in Glue, create a data connection to Snowflake.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhbucd8pi4flp383qtmee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhbucd8pi4flp383qtmee.png" alt=" " width="800" height="630"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw70na04o1baq1equcsny.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw70na04o1baq1equcsny.png" alt=" " width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1t2i2u8yy0lsnpawv6i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1t2i2u8yy0lsnpawv6i.png" alt=" " width="800" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can find your Snowflake URL in Snowflake by &lt;br&gt;
selecting Admin &amp;gt; Accounts &amp;gt; . . . Manage URLs&lt;/p&gt;

&lt;p&gt;For the AWS Secret, select the one you created earlier.&lt;/p&gt;

&lt;p&gt;Give your connection a sensible name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftf929xeexydvgm1b6vf3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftf929xeexydvgm1b6vf3.png" alt=" " width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Extract Snowflake Data with Glue ETL
&lt;/h2&gt;

&lt;p&gt;Create a new Visual ETL job in Glue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fro931eanaop3xm2x08tp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fro931eanaop3xm2x08tp.png" alt=" " width="800" height="201"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From Sources, select Snowflake.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jc99d0n6jj4pft7g8s2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jc99d0n6jj4pft7g8s2.png" alt=" " width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Under Data source properties, complete the details as shown.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s4qp25esaunysllhbk4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5s4qp25esaunysllhbk4.png" alt=" " width="800" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Test that the connection is working by selecting the Snowflake connection in the visual editor. This opens the Data preview window. Select the IAM role created earlier; a data preview will start and display a sample of the data from Snowflake. This will take a few minutes to run.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdumq3bvxdx5sa5s40ru.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdumq3bvxdx5sa5s40ru.png" alt=" " width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Within Activity in Snowflake, you will see that the query has been executed by the GLUE user.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtsxs3cca3ie0akwjdax.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtsxs3cca3ie0akwjdax.png" alt=" " width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can continue to complete your ETL job within Glue. For example, if I only want cars made by Porsche, I could add an SQL transformation step.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z03ld2xiijnzsdxgz4d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6z03ld2xiijnzsdxgz4d.png" alt=" " width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Selecting the Script tab provides the Glue ETL code for the job.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue import DynamicFrame


def sparkSqlQuery(glueContext, query, mapping, transformation_ctx) -&amp;gt; DynamicFrame:
    for alias, frame in mapping.items():
        frame.toDF().createOrReplaceTempView(alias)
    result = spark.sql(query)
    return DynamicFrame.fromDF(result, glueContext, transformation_ctx)


args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node Snowflake
Snowflake_node1706530098901 = glueContext.create_dynamic_frame.from_options(
    connection_type="snowflake",
    connection_options={
        "autopushdown": "on",
        "dbtable": "car_sales",
        "connectionName": "snowflake_glue_connection",
        "sfDatabase": "matt",
        "sfSchema": "matt",
    },
    transformation_ctx="Snowflake_node1706530098901",
)

# Script generated for node SQL Query
SqlQuery0 = """
select * from myDataSource
where car_make = 'Porsche'
"""
SQLQuery_node1706531170324 = sparkSqlQuery(
    glueContext,
    query=SqlQuery0,
    mapping={"myDataSource": Snowflake_node1706530098901},
    transformation_ctx="SQLQuery_node1706531170324",
)

job.commit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The cloud provider hosting Snowflake makes little difference to how easily Snowflake integrates as a data source in Glue ETL jobs.&lt;/p&gt;

&lt;p&gt;The low code / no code interface for building ETL jobs with Glue makes it simple to gather data from many sources once initial IAM, secrets and connections are in place.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>glue</category>
      <category>snowflake</category>
    </item>
    <item>
      <title>Scaling ML Education With AWS DeepRacer</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Wed, 30 Aug 2023 13:41:06 +0000</pubDate>
      <link>https://dev.to/aws-builders/scaling-ml-education-with-aws-deepracer-3kbn</link>
      <guid>https://dev.to/aws-builders/scaling-ml-education-with-aws-deepracer-3kbn</guid>
      <description>&lt;p&gt;In this blog post I will outline how we used DeepRacer to scale Machine Learning (ML) education across our organisation.&lt;/p&gt;

&lt;p&gt;The blog will cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;How to acquire the cars, track and build the associated event space.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The various roles on our Pit Crew and how each contributes to the success of the event.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;How AWS can support you in running your event along with links to training resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The timeline for events leading up to our championship race and how to keep your team engaged.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The costs for the event and the results we saw for the investment.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/deepracer/" rel="noopener noreferrer"&gt;AWS DeepRacer&lt;/a&gt; is a service offered by Amazon Web Services (AWS) that combines machine learning, cloud computing, and robotics to provide a platform for learning and experimenting with reinforcement learning. &lt;/p&gt;

&lt;p&gt;Reinforcement learning is a type of machine learning where agents learn to make decisions by interacting with an environment and receiving feedback in the form of rewards.&lt;/p&gt;

&lt;p&gt;In the context of DeepRacer, the environment is a physical or virtual racetrack, and the agent is an autonomous racing car. Participants can train and develop the car's racing skills using reinforcement learning techniques, allowing the car to learn how to navigate the track and optimize its racing performance over time.&lt;/p&gt;

&lt;p&gt;Overall, AWS DeepRacer is an educational and hands-on way for people to dive into reinforcement learning and gain practical experience in training AI models to perform specific tasks, such as autonomous racing.&lt;/p&gt;

&lt;p&gt;Between April 2023 and July 2023, I put together a team to organise and run a DeepRacer event at &lt;a href="https://www.cdl.co.uk" rel="noopener noreferrer"&gt;CDL&lt;/a&gt; with the aim to scale Machine Learning education across our organisation.&lt;/p&gt;

&lt;h1&gt;
  
  
  Equipment Required
&lt;/h1&gt;

&lt;h2&gt;
  
  
  The Track
&lt;/h2&gt;

&lt;p&gt;We used the 2018 DeepRacer track. This is the smallest track available, and it provided us with some flexibility on location. The track is now referred to as the A-Z Speedway.&lt;/p&gt;

&lt;p&gt;The spec for the track is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track size: 26’ x 17’ (7.9248 m x 5.1816 m)&lt;/li&gt;
&lt;/ul&gt;
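&lt;p&gt;If you are checking whether a room is big enough, the imperial-to-metric conversion is easy to verify:&lt;/p&gt;

```python
FT_TO_M = 0.3048  # exact definition of the international foot

track_width_m = 26 * FT_TO_M  # 26 feet
track_depth_m = 17 * FT_TO_M  # 17 feet
print(round(track_width_m, 4), round(track_depth_m, 4))  # 7.9248 5.1816
```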

&lt;p&gt;The track should have the following colours:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Field ("the green") = PMS 3395C&lt;/li&gt;
&lt;li&gt;Road surface and AWS logo = PMS 432C&lt;/li&gt;
&lt;li&gt;Dotted center line ("yellow") = PMS 137C&lt;/li&gt;
&lt;li&gt;Track boundaries ("white side lines") = CMYK 0-0-2-0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We were advised that carpet-based tracks have fewer issues with light reflection than their vinyl alternatives.&lt;/p&gt;

&lt;p&gt;The spec in Illustrator Format for the track is available to download from the &lt;a href="https://docs.aws.amazon.com/deepracer/latest/developerguide/samples/deepracer-A-to-Z-speedway-basic.ai.zip" rel="noopener noreferrer"&gt;AWS Documentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We ordered the track in July 2022 and were given a figure of £1314.02 for us to get this printed to carpet using &lt;a href="https://www.bannerworld.co.uk/product/custom-printed-carpet/" rel="noopener noreferrer"&gt;bannerworld.co.uk&lt;/a&gt;. It came to us in three sections. &lt;/p&gt;

&lt;h2&gt;
  
  
  The Barrier
&lt;/h2&gt;

&lt;p&gt;After looking at this &lt;a href="https://www.amazon.com/dp/B0BVB14J6T/ref=s9_acsd_al_bw_c2_x_1_i?pf_rd_m=ATVPDKIKX0DER&amp;amp;pf_rd_s=merchandised-search-2&amp;amp;pf_rd_r=N4YSSF0DK2N7F0NE83ZD&amp;amp;pf_rd_t=101&amp;amp;pf_rd_p=d3e8afec-d1ed-4329-a3d1-3f70b155cc7f&amp;amp;pf_rd_i=32957528011" rel="noopener noreferrer"&gt;A to Z Speedway Printed Wall Barrier for AWS DeepRacer Race Track&lt;/a&gt;, I felt the cost and the issues around delivery to the UK were going to be problematic, so I decided to try and build my own.&lt;/p&gt;

&lt;p&gt;Spoiler alert: although the barrier worked, it is probably the main thing I would change for future events.&lt;/p&gt;

&lt;p&gt;The materials used were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Green plastic ‘Corex’ board, 4mm thick, A1 in size. I ordered 30 sheets from Vesey Gallery via their &lt;a href="https://www.ebay.co.uk/itm/152138450975?hash=item236c28881f:g:VNoAAOSwjVVVsAHX" rel="noopener noreferrer"&gt;eBay shop&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To join the corex board together I used plastic Shower/Bath screen joiner/seals. They came from Shower Seal UK Ltd. I ordered them from their eBay store. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;8 x 2M lengths of &lt;a href="https://www.ebay.co.uk/itm/283819171966?var=586298035256" rel="noopener noreferrer"&gt;Straight Seals&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;2 x 2M lengths of &lt;a href="https://www.ebay.co.uk/itm/283819171966?var=586298035248" rel="noopener noreferrer"&gt;Corner Seals&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I cut these to the length required to fit the A1 corex plastic sheets with a small handsaw.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgusnn1sfbg3871mp6t8u.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgusnn1sfbg3871mp6t8u.JPG" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To provide additional support to the barrier I added some desk drawer units at regular intervals. The barrier has a couple of issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It's too flexible in the gaps without a desk drawer. This meant that the cars could crash into it and cause the corex board to detach from the jointing strips.&lt;/li&gt;
&lt;li&gt;Where there was no gap, the car would occasionally get damaged when it hit a desk drawer at high speed. We had to fix the car shells during our event with black gaffer tape. We also had to secure the cameras with elastic bands to prevent them from coming loose.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8whxaetvp9nere6slu2.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8whxaetvp9nere6slu2.JPG" alt=" " width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The company logo and DeepRacer event branding were added by ordering large stickers from &lt;a href="https://www.stickermule.com/" rel="noopener noreferrer"&gt;Sticker Mule&lt;/a&gt;. Five copies of our logo, printed using the Wall graphics option and sized at 508 mm x 173 mm, cost a total of $63.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cars
&lt;/h2&gt;

&lt;p&gt;We ordered 2 x &lt;a href="https://www.amazon.com/dp/B081GZSJVL/ref=s9_acsd_al_bw_c2_x_1_i?pf_rd_m=ATVPDKIKX0DER&amp;amp;pf_rd_s=merchandised-search-1&amp;amp;pf_rd_r=H8EJQE5FF9K1QGXMXYJB&amp;amp;pf_rd_t=101&amp;amp;pf_rd_p=2ae26ae6-11bb-49e1-a151-d10ee4f68e61&amp;amp;pf_rd_i=32957528011" rel="noopener noreferrer"&gt;DeepRacer EVOs&lt;/a&gt;. I already owned one, so we had three cars in total to work with.&lt;/p&gt;

&lt;p&gt;Each car comes in two boxes. The original DeepRacer car and then the EVO upgrade kit.&lt;/p&gt;

&lt;p&gt;We used &lt;a href="https://www.shipito.com/en/" rel="noopener noreferrer"&gt;Shipito&lt;/a&gt; to get the cars shipped to the UK.&lt;/p&gt;

&lt;p&gt;For racing events we do not use the EVO Kit. This matches what AWS do at their events.&lt;/p&gt;

&lt;p&gt;When racing the cars, we learnt a couple of tips about battery placement. It's much easier to secure the motor battery to the top of the compute battery using the Velcro strip. This allows much faster swapping of batteries during races.&lt;/p&gt;

&lt;p&gt;The cars need to be calibrated following the instructions within the &lt;a href="https://docs.aws.amazon.com/deepracer/latest/developerguide/deepracer-set-up-vehicle-test-drive.html" rel="noopener noreferrer"&gt;DeepRacer Car console&lt;/a&gt;. In practice we found that calibration mode would sometimes not work. When this happens, we learned that the following actions are helpful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When working with the cars during calibration place them on something so the wheels don't touch the surface. The masking tape included in the car box works well for this.&lt;/li&gt;
&lt;li&gt;Cycle the power on the car.&lt;/li&gt;
&lt;li&gt;Put the car into manual mode and spin the wheels forward and then backwards multiple times.&lt;/li&gt;
&lt;li&gt;Check the compute battery; if it has two bars or fewer, replace it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Car Parts
&lt;/h2&gt;

&lt;p&gt;You will need an iPad or similar to control the car speed trackside. We had two iPad minis to ensure we always had one charged up.&lt;/p&gt;

&lt;p&gt;We ordered spare batteries for the cars:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.amazon.co.uk/dp/B08Y8YN283?psc=1&amp;amp;ref=ppx_yo2ov_dt_b_product_details" rel="noopener noreferrer"&gt;URGENEX 2S Lipo Battery 7.4v Lipo with JST Plug&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These batteries come with USB charging cables. We ordered &lt;a href="https://www.amazon.co.uk/gp/product/B07T7GLJYJ/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;amp;psc=1" rel="noopener noreferrer"&gt;this charger&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.amazon.co.uk/dp/B08SFHT1VV?psc=1&amp;amp;ref=ppx_yo2ov_dt_b_product_details" rel="noopener noreferrer"&gt;Compute Battery&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For storage of batteries we ordered a number of these &lt;a href="https://www.amazon.co.uk/gp/product/B097259MRS/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;amp;psc=1" rel="noopener noreferrer"&gt;protective bags&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As the cars ship with US plugs we ordered 4 of these &lt;a href="https://www.amazon.co.uk/dp/B0085IZQ4E?psc=1&amp;amp;ref=ppx_yo2ov_dt_b_product_details" rel="noopener noreferrer"&gt;Combo Travel Adaptors By SKROSS&lt;/a&gt; to allow us to keep up with our battery charging requirements.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://www.amazon.co.uk/dp/B07QB6D2Z5?psc=1&amp;amp;ref=ppx_yo2ov_dt_b_product_details" rel="noopener noreferrer"&gt;battery tester&lt;/a&gt; is also essential to check quickly that the battery is in a good state.&lt;/p&gt;

&lt;p&gt;A few tips on battery health:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure the compute battery shows more than two bars. If it doesn't, swap it out for a fresh battery.&lt;/li&gt;
&lt;li&gt;Rotate the motor batteries regularly - we wouldn't race a car unless it showed green on the DeepRacer Car console.&lt;/li&gt;
&lt;li&gt;Have a system within the Pit Crew for rotating batteries. We used a large bowl to put used batteries in and someone would pick these up and get them on charge for later use.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Timer
&lt;/h2&gt;

&lt;p&gt;So that lap times could be accurately captured, we added a pressure-sensor-based timer to the track. By the final race we had two versions of this, as a prefabricated module became available.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timer Option 1
&lt;/h3&gt;

&lt;p&gt;For the first timer I mostly followed the instructions on &lt;a href="https://github.com/davidfsmith/deepracer-timer" rel="noopener noreferrer"&gt;David Smith's GitHub Repo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For the parts, I ordered the following items:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.amazon.co.uk/gp/product/B07Q1BYDS7/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;amp;psc=1" rel="noopener noreferrer"&gt;Sound Microphone Sensor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.amazon.co.uk/gp/product/B07PM5PTPQ/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&amp;amp;psc=1" rel="noopener noreferrer"&gt;Thin Film Pressure Sensor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.amazon.co.uk/dp/B087X2ZHHR?psc=1&amp;amp;ref=ppx_yo2ov_dt_b_product_details" rel="noopener noreferrer"&gt;Electrical Terminal Blocks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.amazon.co.uk/dp/B074P726ZR?ref=ppx_yo2ov_dt_b_product_details&amp;amp;th=1" rel="noopener noreferrer"&gt;Jumper Wire Set&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thepihut.com/products/prototyping-wire-spool-set?variant=36945199825" rel="noopener noreferrer"&gt;Wire Spool Set&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thepihut.com/products/heat-shrink-pack?variant=27740410385" rel="noopener noreferrer"&gt;Heat Shrink Pack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxz7qsdumnl9rsqgd84lo.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxz7qsdumnl9rsqgd84lo.JPG" alt=" " width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After removing the microphone sensor, I soldered on a terminal block so the pressure sensor cables could be screwed into place.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4n65hglbe7bsg2gc2w7.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4n65hglbe7bsg2gc2w7.JPG" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I soldered two metres of cable to each pressure sensor.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmrlzuq9swni6phpful7q.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmrlzuq9swni6phpful7q.JPG" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final connection to the Raspberry Pi is shown below. I used off-the-shelf jumper cables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwiiaihlo1tthbeaf9aii.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwiiaihlo1tthbeaf9aii.JPG" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;
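&lt;p&gt;For anyone curious how the sensor events become lap times, here is a minimal sketch of the timing logic. It is an illustration only; the real sensor and GPIO handling live in David Smith's repo, and the debounce window below is an assumed value.&lt;/p&gt;

```python
import time

class LapTimer:
    """Turn start/finish-line sensor triggers into lap times.

    Illustration only: the real hardware handling is in
    davidfsmith/deepracer-timer. A car crossing the line can fire
    the pressure sensor several times, so triggers inside an
    assumed debounce window are ignored.
    """

    def __init__(self, debounce_s=2.0):
        self.debounce_s = debounce_s
        self.last_cross = None
        self.laps = []

    def crossing(self, t=None):
        """Register a sensor trigger at time t (seconds).

        Returns the completed lap time, or None if this trigger
        started the first lap or was debounced."""
        if t is None:
            t = time.monotonic()
        if self.last_cross is None:
            self.last_cross = t  # first crossing starts lap one
            return None
        gap = t - self.last_cross
        if gap > self.debounce_s:
            self.laps.append(gap)
            self.last_cross = t
            return gap
        return None  # double-trigger from the same crossing

    def best_lap(self):
        return min(self.laps) if self.laps else None
```

&lt;p&gt;Feeding it timestamps of 0.0, 0.5 and 12.3 seconds records one 12.3-second lap, with the 0.5-second trigger discarded as a double hit.&lt;/p&gt;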

&lt;p&gt;To make the pressure sensor look like the finish line of a race track, I bought some thin card, double-sided tape and some &lt;a href="https://www.amazon.co.uk/dp/B0BSDYMY2L?psc=1&amp;amp;ref=ppx_yo2ov_dt_b_product_details" rel="noopener noreferrer"&gt;chequered flag&lt;/a&gt; tape.&lt;/p&gt;

&lt;p&gt;The pressure sensors were attached to the card with double-sided tape, and the chequered flag tape was added to the card to create the finish-line effect. The wires were run under the carpet track and stuck down using &lt;a href="https://www.amazon.co.uk/dp/B002TOL45K?psc=1&amp;amp;ref=ppx_yo2ov_dt_b_product_details" rel="noopener noreferrer"&gt;green gaffer tape&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fej35pfc3mdcusqmf8yga.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fej35pfc3mdcusqmf8yga.JPG" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is a video of the timer being tested.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/Uy3SP-BIAfg"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Timer Option 2
&lt;/h3&gt;

&lt;p&gt;Shortly before our final race the &lt;a href="https://digitalracingkings.com/products/unofficial-deepracer-timer-for-raspberry-pi-3-5" rel="noopener noreferrer"&gt;Digital Racing Kings Unofficial DeepRacer Timer&lt;/a&gt; became available. We ordered one of these to replace the adapted microphone sensors. We found it much better, and it required much less calibration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfld5iuvwd19vaz03e9v.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpfld5iuvwd19vaz03e9v.JPG" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Schedule of Events
&lt;/h1&gt;

&lt;p&gt;We built up to the final race on the physical track through a number of virtual and in-person events.&lt;/p&gt;

&lt;h2&gt;
  
  
  Engagement Sessions
&lt;/h2&gt;

&lt;p&gt;Our AWS account team arranged for &lt;a href="https://twitter.com/davidfsmith/" rel="noopener noreferrer"&gt;David Smith&lt;/a&gt;, a Solutions Architect within AWS ML Thought Leadership, to deliver our initial training. David is the person you will often see at AWS Summits leading the Pit Crew.&lt;/p&gt;

&lt;p&gt;David delivered a number of sessions with the CDL team.&lt;/p&gt;

&lt;p&gt;The first session covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Intro to DeepRacer, Machine Learning and the reward functions. (26 Mins)&lt;/li&gt;
&lt;li&gt;First Model and Training (36 Mins)&lt;/li&gt;
&lt;li&gt;Q&amp;amp;A (21 Mins)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After the Q&amp;amp;A we had a break and returned a few hours later for further Q&amp;amp;A once everyone had got the chance to build a model.&lt;/p&gt;

&lt;p&gt;This session attracted 105 attendees. After the first session David shared a number of follow-on resources to help our team continue their DeepRacer learning.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.youtube.com/watch?v=43zqI0n4D7A" rel="noopener noreferrer"&gt;DeepRacer Video&lt;/a&gt; to learn more about Deep Racer.
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/deepracer/getting-started/?nc=sn&amp;amp;loc=6" rel="noopener noreferrer"&gt;Getting started With DeepRacer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/deepracer/latest/developerguide/deepracer-get-started-training-model.html" rel="noopener noreferrer"&gt;Train Your First Model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/deepracer/latest/developerguide/deepracer-reward-function-input.html" rel="noopener noreferrer"&gt;Input parameters available&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/deepracer/racing-tips/" rel="noopener noreferrer"&gt;Racing tips&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://deepracing.io/#home" rel="noopener noreferrer"&gt;Deepracer community&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/oscarYCL/deepracer-waypoints-workshop/blob/main/Waypoint%20Map/reinvent_base(new_2018).png" rel="noopener noreferrer"&gt;Waypoints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws-deepracer-community.github.io/deepracer-for-cloud/" rel="noopener noreferrer"&gt;DeepRacer for cloud&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://catalog.workshops.aws/deepracer-200l/en-US" rel="noopener noreferrer"&gt;DeepRacer workshop&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;PRO TIP: Try setting the Discount Factor hyperparameter to 0.95.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://bit.ly/drl200console" rel="noopener noreferrer"&gt;Workshop console dive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://bit.ly/drl200ws" rel="noopener noreferrer"&gt;Workshop pre-recorded session&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
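&lt;p&gt;To make the resources above concrete, here is a minimal reward function in the shape the DeepRacer console expects: a Python function that receives the documented params dictionary and returns a float. The centreline bands below are the usual starting-point example, not a tuned racing model.&lt;/p&gt;

```python
def reward_function(params):
    """Minimal centreline-following reward for AWS DeepRacer.

    params is the dictionary the DeepRacer service passes in; only
    three of its documented keys are used here.
    """
    if not params['all_wheels_on_track']:
        return 1e-3  # near-zero reward for leaving the track

    # 0.0 at the centreline, 0.5 at the track edge.
    marker = params['distance_from_center'] / params['track_width']

    if marker > 0.5:
        return 1e-3  # centre of the car is off the track
    if marker > 0.25:
        return 0.1
    if marker > 0.1:
        return 0.5
    return 1.0  # hugging the centreline
```

&lt;p&gt;The service evaluates this for every step of every training episode, so even small changes to the bands (or to hyperparameters such as the discount factor) can noticeably change behaviour.&lt;/p&gt;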

&lt;p&gt;About a month after the initial engagement session we had a couple of check-in sessions with AWS. The first was a 'pro' session for racers who wanted to do a deep dive on particular topics. The second session was for people who were still getting started with DeepRacer and wanted to ask more basic questions.&lt;/p&gt;

&lt;p&gt;About a month before the final race, we had our track in place and a timer working so we started to offer weekly check-in sessions trackside. These sessions allowed anyone to bring their model along and a member of our Pit Crew would help the team race. We also offered a remote service where people could send a message with their model, and we would record the car going round the track.&lt;/p&gt;

&lt;p&gt;This time on track proved to be one of the most popular sessions, and we were often oversubscribed. Having the chance to tinker with the track, play about and crash the cars also helped train our Pit Crew for the final race.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Engagement Tips
&lt;/h2&gt;

&lt;p&gt;In between the engagement sessions we tried a number of things to keep people interested and build the momentum towards the final race.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Blog Posts: One of the Pit Crew wrote about their experience of training models and offered hints and tips.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Race Chat: We used MS Teams to host a chat with the racers. This helped build up a healthy competition between the teams, offer support, answer questions and debug issues. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We took our models to the in-person London Meetup to test them out.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We took our models to the AWS London Summit to try them out.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;We also used the chat to post links to the &lt;a href="https://www.twitch.tv/thetrackboss" rel="noopener noreferrer"&gt;live feeds&lt;/a&gt; of the various AWS Deep Racer events that were happening at the summits across the world.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Virtual League. Within the DeepRacer section of the AWS console we created a CDL league. This allowed our racers to submit their models to the virtual league over the three months build up to the final race.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8efpwj9i5snr4ffcibqt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8efpwj9i5snr4ffcibqt.png" alt=" " width="800" height="621"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The Final
&lt;/h1&gt;

&lt;h2&gt;
  
  
  The Pit Crew
&lt;/h2&gt;

&lt;p&gt;The Pit Crew are the hardest working people during the DeepRacer event. Do not underestimate how vital they are to keep the event running smoothly. Our event had 26 teams / 72 people taking part.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track Boss: Monitoring the car as it races on the track and putting it back on the track if it crashes. This is hard work. Rotate the person after each race.&lt;/li&gt;
&lt;li&gt;Mechanics: Charging batteries and swapping them out within the cars and doing the calibration. Fixing the cars if any parts fail.&lt;/li&gt;
&lt;li&gt;Lap Timer: Working with the track boss to capture complete laps and disqualify any invalid laps.&lt;/li&gt;
&lt;li&gt;Team Liaison: Finding the next team to race, explaining how to use the DeepRacer Car console via the iPad and getting them ready to race.&lt;/li&gt;
&lt;li&gt;Commentary: Helping the event flow for those watching in person or via live stream by making announcements, doing interviews or general comments on the action.&lt;/li&gt;
&lt;li&gt;Merchandise: Dealing with the free swag.&lt;/li&gt;
&lt;li&gt;Audio Visual: Connecting AV equipment, sound checking and checking on the live stream.&lt;/li&gt;
&lt;li&gt;Official Photographer: Capturing the joy / pain of the winners / losers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Racing Structure
&lt;/h2&gt;

&lt;p&gt;Our Pit Crew also established the running order for the event. Each race is three minutes, within which each team can complete as many laps as they are able to. The fastest of their laps establishes their position on the leaderboard.&lt;/p&gt;
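&lt;p&gt;The ranking rule above (best single lap across a team's session) can be sketched as follows; the team names and lap times are made up for illustration.&lt;/p&gt;

```python
def leaderboard(session_laps):
    """Rank teams by their single fastest lap (seconds).

    session_laps maps team name to the lap times completed in that
    team's three-minute session; teams with no complete laps are
    left off the board.
    """
    best = {team: min(laps) for team, laps in session_laps.items() if laps}
    return sorted(best.items(), key=lambda entry: entry[1])

# Hypothetical session results:
laps = {
    'Team A': [11.2, 10.4, 10.9],
    'Team B': [10.1, 12.5],
    'Team C': [],  # crashed out without a complete lap
}
```

&lt;p&gt;With this sample data, Team B tops the board on 10.1 seconds despite completing fewer laps than Team A.&lt;/p&gt;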

&lt;p&gt;As we knew the number of teams taking part, we did some estimates of how long it would take to put the car on the track, get each team in position and complete three minutes of racing. We validated with AWS that 5-7 minutes per team was about right. This did work out correctly for us, but the Pit Crew didn't get any time to relax. We're told this is normal for AWS-run events too, and there is an unwritten rule that everyone on the Pit Crew should be active at all times.&lt;/p&gt;

&lt;p&gt;We held two racing sessions with an hour break for lunch. Starting at 10AM we got all teams to complete two three-minute races by around 3:30PM.&lt;/p&gt;
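&lt;p&gt;As a rough sanity check on the schedule, the lower end of that 5-7 minute estimate reproduces our actual day. The numbers below are back-of-envelope arithmetic, not an official AWS formula.&lt;/p&gt;

```python
teams = 26
races_per_team = 2
minutes_per_slot = 5  # low end of the 5-7 minute estimate validated with AWS

racing_minutes = teams * races_per_team * minutes_per_slot  # 260 minutes
lunch_minutes = 60
total_hours = (racing_minutes + lunch_minutes) / 60  # roughly 5.3 hours

# A 10AM start plus roughly 5.3 hours lands close to our ~3:30PM finish.
```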

&lt;p&gt;There was some debate amongst teams on whether they should try two different models. Whilst we didn't reach any firm conclusions, we did notice the following real-world effects.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Environmental factors can impact model performance. We did some prior testing with lighting for example and went with what we found to be most reliable.&lt;/li&gt;
&lt;li&gt;Healthy competition amongst teams led to a feeling that certain cars were more reliable than others.&lt;/li&gt;
&lt;li&gt;We tried to support teams by advising them to start their models at a slower speed and experiment with the speed adjustment via the iPad over the three minutes.&lt;/li&gt;
&lt;li&gt;Once a speed benchmark is established further adjustments can be made around corners and straight sections of the track.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Event Space
&lt;/h2&gt;

&lt;p&gt;In addition to the track, we added screens so people could keep an eye on our leader board and watch the racing. We had two cameras on the track and connected these back to a &lt;a href="https://www.blackmagicdesign.com/products/atemmini" rel="noopener noreferrer"&gt;Blackmagic ATEM Mini Extreme&lt;/a&gt;. This allowed us to record all camera and graphics feeds for editing after the event and also live stream via MS Teams to the whole company.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F594gy2rjtf522ywi3d2d.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F594gy2rjtf522ywi3d2d.JPG" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwfkz5qm6mxe8yo557hn.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwfkz5qm6mxe8yo557hn.JPG" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/EQwkaKl5Ino"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;In preparation for our event I attended a couple of DeepRacer events at AWS Summits and also through the &lt;a href="https://www.meetup.com/aws-deepracer-community-uk-ireland/" rel="noopener noreferrer"&gt;UK DeepRacer Meetup&lt;/a&gt;. The latter was hosted by JP Morgan who have been doing events for a long time. Having in person conversations with the community members, JP Morgan and AWS was really helpful to me.&lt;/p&gt;

&lt;h2&gt;
  
  
  DeepRacer Event Manager (DREM)
&lt;/h2&gt;

&lt;p&gt;DREM is an AWS built application that is the missing piece for any DeepRacer event. If you have ever raced at an AWS Summit this is the system in use at those events.&lt;/p&gt;

&lt;p&gt;At the time of writing, it is working its way towards a public beta, but we were fortunate enough to get our hands on it via our AWS account team. DREM made the event run much more smoothly and has some awesome features that we could not find in the public AWS or open-source tools.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allows racers to upload their models via login to their DREM account.&lt;/li&gt;
&lt;li&gt;Car management, which includes loading models to the cars. Effectively you have a fleet of cars registered in DREM and can send the model to any car. Doing this manually via separate logins to each car console now feels painful after using DREM.&lt;/li&gt;
&lt;li&gt;Integrates with the lap timer device recording all key lap data.&lt;/li&gt;
&lt;li&gt;Uses lap timer data to create a live scoreboard for the event.&lt;/li&gt;
&lt;li&gt;Provides on-screen graphics showing the current racer and their lap time stats such as current and best.&lt;/li&gt;
&lt;li&gt;Provides a console for the Track Boss / Pit Crew to invalidate laps e.g. when the car leaves the track and crosses the pressure sensor incorrectly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I can't wait for the beta version to be available and recommend that anyone doing a DeepRacer event speaks to AWS about availability of DREM.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prizes
&lt;/h2&gt;

&lt;p&gt;We offered a prize to the team for the fastest lap and another prize for a 'highly commended' category. The latter was intended to be a way for us to incentivise people to take part in the final race even if they had not done that well in the virtual league.&lt;/p&gt;

&lt;p&gt;In practice we found the virtual race to be very different to the physical race. There are far more variables in real life and the virtual race is more of a dust-free lab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23h5klo1q3bxm645j6hh.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F23h5klo1q3bxm645j6hh.JPG" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Swag
&lt;/h2&gt;

&lt;p&gt;As an added incentive to take part in the racing, we arranged a selection of DeepRacer swag. On the morning of the event, we only allowed our racers to pick up swag. In the afternoon we opened up the swag to everyone.&lt;/p&gt;

&lt;p&gt;To capture feedback on the event, we made swag pickup conditional on completing a survey. See the results later on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrzgtjuxpww76fiecu7g.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrzgtjuxpww76fiecu7g.JPG" alt=" " width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fue9ceebsdzgrb4mn7mz5.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fue9ceebsdzgrb4mn7mz5.JPG" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The T-Shirts were kindly arranged and provided by our AWS Account team. They also designed the logo for the event.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklsqb05k5vqrgbakv4hw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fklsqb05k5vqrgbakv4hw.png" alt=" " width="800" height="851"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The rest of the Swag came through contacts in the &lt;a href="https://join.slack.com/t/aws-ml-community/shared_invite/zt-226ch5s9i-tZ_5Ggqbimn3YuOGJ9~bcw" rel="noopener noreferrer"&gt;DeepRacer Community&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;This Slack workspace is where the DeepRacer community gets together to talk about all things DeepRacer. The community was an invaluable resource for ideas and support in the run-up to our event. A special thanks to &lt;a href="https://twitter.com/breadcentric" rel="noopener noreferrer"&gt;Tomasz Ptak&lt;/a&gt;, who put me in contact with &lt;a href="https://www.promoveritas.com" rel="noopener noreferrer"&gt;Promo Veritas&lt;/a&gt;, who sent us DeepRacer hoodies, caps and socks.&lt;/p&gt;

&lt;h1&gt;
  
  
  Costs
&lt;/h1&gt;

&lt;p&gt;Overall, we had capital costs of £4,973.22 &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The two DeepRacer cars: £1,507.48&lt;/li&gt;
&lt;li&gt;Various cables and a dedicated 4G router: £474.59&lt;/li&gt;
&lt;li&gt;DeepRacer AWS charges not covered by our APN credits: £623.66&lt;/li&gt;
&lt;li&gt;The barrier: £799.47&lt;/li&gt;
&lt;li&gt;Track and lap timer: £1,415.79&lt;/li&gt;
&lt;li&gt;Prizes: £152.23&lt;/li&gt;
&lt;/ul&gt;
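&lt;p&gt;For anyone budgeting a similar event, the line items above sum exactly to the headline figure:&lt;/p&gt;

```python
costs_gbp = {
    'Two DeepRacer cars': 1507.48,
    'Cables and dedicated 4G router': 474.59,
    'AWS charges beyond APN credits': 623.66,
    'Barrier': 799.47,
    'Track and lap timer': 1415.79,
    'Prizes': 152.23,
}
total = round(sum(costs_gbp.values()), 2)  # 4973.22
```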

&lt;p&gt;We estimated that our AWS bill for DeepRacer would come to $3,719. This was covered by AWS Innovation Sandbox Credits, which are designed to help partners integrate AWS services into a solution or launch a product to general availability by offsetting AWS usage costs incurred during development. This benefit is available to AWS Partners that build or offer services and solutions.&lt;/p&gt;

&lt;h1&gt;
  
  
  Results
&lt;/h1&gt;

&lt;p&gt;We had 71 people respond to our survey, all of whom had signed up to race. After the event we also held a retrospective to help us form suggestions for the future of DeepRacer at CDL.&lt;/p&gt;

&lt;p&gt;94% said they would take part in another CDL track event with 90% wanting to take part again in the virtual aspect of DeepRacer.&lt;/p&gt;

&lt;p&gt;All respondents thought that DeepRacer encouraged or improved collaboration within our teams. 96% of people thought the ~£5k expenditure was good value for money.&lt;/p&gt;

&lt;p&gt;87.4% of people reported learning more about Machine Learning by taking part.&lt;/p&gt;

&lt;p&gt;We also captured free text feedback:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Great event, would like to see this on an annual basis."&lt;/p&gt;

&lt;p&gt;"Found training on the physical track invaluable.  There are quite a few differences in behaviour between the virtual and physical track."&lt;/p&gt;

&lt;p&gt;"To properly explore the different factors that go into a successful model takes time.  Would happily have continued exploring different models and different approaches to the models with more hours of training."&lt;/p&gt;

&lt;p&gt;"Small team, so was easy to engage with each other and bounce ideas off each other."&lt;/p&gt;

&lt;p&gt;"All in all, a very enjoyable way to get into AI and machine learning."&lt;/p&gt;

&lt;p&gt;"Great use of space. Fantastic collaboration opportunity. Great team building. Fun." &lt;/p&gt;

&lt;p&gt;"Cracking thing to do. Encourages a very social twist on a tech thing, it improves diversification with gamified learning which a lot of people benefit from." &lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  What's Next?
&lt;/h1&gt;

&lt;p&gt;We are currently looking at running future DeepRacer events. We are particularly interested in running an event with one or more of our customers. We are also looking at contributing to the community initially through a DeepRacer event for a local college or university.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>ai</category>
    </item>
    <item>
      <title>SSL For RDS With Glue Python Job and AWS SDK For Pandas</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Sun, 06 Nov 2022 10:40:36 +0000</pubDate>
      <link>https://dev.to/aws-builders/ssl-for-rds-with-glue-python-job-and-aws-sdk-for-pandas-2cf6</link>
      <guid>https://dev.to/aws-builders/ssl-for-rds-with-glue-python-job-and-aws-sdk-for-pandas-2cf6</guid>
      <description>&lt;p&gt;This blog post is the result of a recent interaction with AWS Support. As always they were very helpful in resolving the issue.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS SDK For Pandas
&lt;/h2&gt;

&lt;p&gt;Recently AWS renamed the AWS Data Wrangler Python library to the &lt;a href="https://aws-sdk-pandas.readthedocs.io/en/stable/#" rel="noopener noreferrer"&gt;AWS SDK for Pandas&lt;/a&gt;. This is an AWS Professional Services open-source Python initiative that extends the power of the Pandas library to AWS, connecting DataFrames and AWS data-related services.&lt;/p&gt;

&lt;p&gt;Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute usual ETL tasks like load/unload data from Data Lakes, Data Warehouses and Databases.&lt;/p&gt;

&lt;p&gt;I was looking to use the integration with AWS Glue to use a glue connection within some Python ETL code. The connection in my case was to an Amazon RDS PostgreSQL database.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;awswrangler&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;wr&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;con_postgresql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;postgresql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My-RDS-PostgreSQL-Connection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;con_postgresql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The theory was that the connection could be defined once in Glue and used by multiple AWS Glue ETL jobs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon RDS Ready - Encryption Requirements
&lt;/h2&gt;

&lt;p&gt;The purpose of the &lt;a href="https://aws.amazon.com/rds/partners/?blog-posts-cards.sort-by=item.additionalFields.modifiedDate&amp;amp;blog-posts-cards.sort-order=desc&amp;amp;partner-solutions-cards.sort-by=item.additionalFields.partnerNameLower&amp;amp;partner-solutions-cards.sort-order=asc&amp;amp;awsf.partner-solutions-filter-partner-type=*all&amp;amp;awsf.partner-solutions-filter-product=*all&amp;amp;awsf.partner-solutions-filter-location=*all" rel="noopener noreferrer"&gt;Amazon Relational Database Service (RDS) Ready Program&lt;/a&gt; is to recognise AWS Partner products that support the use of an Amazon RDS database as a backend for business applications, whether deployed within a customer’s AWS account or provided as SaaS deployed in the APN Partner’s AWS account.&lt;/p&gt;

&lt;p&gt;This program requires that products follow AWS security, availability, reliability, performance and other architecture best practices while integrating with Amazon RDS.&lt;/p&gt;

&lt;p&gt;At CDL our software has been accredited as Amazon RDS Ready, and we apply these standards when developing new solutions. Specifically on data encryption, the Amazon RDS Ready program states:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DBCONN-004 - Data Encryption:&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;For business applications where data encryption is a requirement for security compliance, the product must support encryption of data at rest and in transit for Amazon RDS.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;At CDL we ensure that data in transit to RDS is encrypted by setting the rds.force_ssl parameter to 1. See &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Concepts.General.SSL.html#PostgreSQL.Concepts.General.SSL.Requiring" rel="noopener noreferrer"&gt;Using SSL with a PostgreSQL DB instance - Amazon Relational Database Service&lt;/a&gt;.&lt;/p&gt;
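The rds.force_ssl parameter lives in the instance's DB parameter group. As an illustration (the parameter group name here is an assumption, not from the original setup), it can be set from the AWS CLI like this:

```shell
# Set rds.force_ssl=1 in a custom DB parameter group (group name is hypothetical)
aws rds modify-db-parameter-group \
  --db-parameter-group-name my-postgres-params \
  --parameters "ParameterName=rds.force_ssl,ParameterValue=1,ApplyMethod=immediate"
```

After this, the instance rejects any client connection that does not negotiate SSL.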
&lt;h2&gt;
  
  
  Attempting an SSL Connection From Glue To RDS
&lt;/h2&gt;

&lt;p&gt;A Glue connection is created to an RDS database that has rds.force_ssl set.&lt;/p&gt;

&lt;p&gt;This is done via the legacy Glue connection screen in the console, as this allows us to test the connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff38cjxt2o51ud7az6ndk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff38cjxt2o51ud7az6ndk.png" alt="Glue Connection" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the connection test succeeds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj83y03bx6ibxcxqfygss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj83y03bx6ibxcxqfygss.png" alt="Glue Connection Test OK" width="800" height="275"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Next we try to use that connection in an AWS Glue Python job utilising the AWS SDK for Pandas.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;awswrangler&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;wr&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;con_postgresql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;postgresql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My-RDS-PostgreSQL-Connection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;con_postgresql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running the job returns errors about SSL. I got a couple of different errors while debugging different versions of the code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fft6w1vh9e2bk0cmqwbjg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fft6w1vh9e2bk0cmqwbjg.png" alt="SSL Error" width="800" height="164"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finvwx0qeuh1b0o8t7q8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Finvwx0qeuh1b0o8t7q8x.png" alt="SSL Error" width="800" height="224"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After some back and forth with AWS Support to debug the issue, the service team identified the following:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Currently, awswrangler loads and uses the default SSL configuration when creating boto3 session clients.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It was clear from the errors we received that this default did not include the Amazon RDS root CA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To override the default configuration, it’s possible to use the connect() function in awswrangler, which allows an SSL context to be passed in.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We need to download the RDS root certificate and point the SSL context at it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;awswrangler&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;wr&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ssl&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;urllib.request&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;download_rds_root_ca&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Downloading RDS CA root cert…&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;urllib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;urlretrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Downloaded RDS CA root cert.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_rds_ssl_context&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;cafile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/tmp/rds-ca-2019-root.pem&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="nf"&gt;download_rds_root_ca&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cafile&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ssl_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SSLContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PROTOCOL_TLS&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;ssl_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;verify_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ssl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CERT_REQUIRED&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;ssl_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_verify_locations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cafile&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cafile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ssl_context&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Connecting to RDS database…&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rds_ssl_context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_rds_ssl_context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;con_postgresql&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;wr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;postgresql&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;My-RDS-PostgreSQL-Connection&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ssl_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;rds_ssl_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Successfully connected to RDS database.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
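As an aside, on recent Python versions an equivalent verifying context can be built with ssl.create_default_context, which enables certificate verification and hostname checking out of the box; a minimal sketch (the cafile path shown in the comment is the one downloaded above):

```python
import ssl

# create_default_context() returns a context with verify_mode=CERT_REQUIRED
# and check_hostname=True already set
ssl_context = ssl.create_default_context()

# Pointing it at the downloaded RDS bundle would then be a one-liner:
# ssl_context.load_verify_locations(cafile="/tmp/rds-ca-2019-root.pem")

print(ssl_context.verify_mode == ssl.CERT_REQUIRED)
```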



&lt;h2&gt;
  
  
  Run With SSL
&lt;/h2&gt;

&lt;p&gt;Running the job again with the correct SSL certificate in place, we get a successful execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cepiajb4td9dp0ozx8v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4cepiajb4td9dp0ozx8v.png" alt="Job Run Ok" width="800" height="151"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqe734omy8x9v4hlv0b8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjqe734omy8x9v4hlv0b8.png" alt="Jon Run Logs" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>glue</category>
      <category>python</category>
      <category>etl</category>
    </item>
    <item>
      <title>Using Athena Views As A Source In Glue</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Wed, 16 Feb 2022 17:03:02 +0000</pubDate>
      <link>https://dev.to/aws-builders/using-athena-views-as-a-source-in-glue-k09</link>
      <guid>https://dev.to/aws-builders/using-athena-views-as-a-source-in-glue-k09</guid>
      <description>&lt;p&gt;Whilst working with AWS Glue recently I noticed that I was unable to use a view created in Athena as a source for an ETL job in the same way that I could use a table that had been cataloged.&lt;/p&gt;

&lt;p&gt;The error I received was this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;An&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="n"&gt;occurred&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;calling&lt;/span&gt; &lt;span class="n"&gt;o73&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getCatalogSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;No&lt;/span&gt; &lt;span class="n"&gt;classification&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;mydatabase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v_my_view&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rather than try to recreate the view using a new PySpark job, I used the Athena JDBC driver as a custom JAR in a Glue job to be able to query the view I wanted to use.&lt;/p&gt;

&lt;p&gt;This blog post contains my notes on how this works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Drivers
&lt;/h2&gt;

&lt;p&gt;Create or reuse an existing S3 bucket to store the Athena JDBC driver JAR file. The JAR files are available to &lt;a href="https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html" rel="noopener noreferrer"&gt;download from AWS&lt;/a&gt;. I used the latest version, which at the time of writing was the JDBC Driver with AWS SDK, AthenaJDBC42_2.0.27.1000.jar (compatible with JDBC 4.2; requires JDK 8.0 or later).&lt;/p&gt;

&lt;h2&gt;
  
  
  IAM
&lt;/h2&gt;

&lt;p&gt;The Glue job will need not only Glue service privileges but also IAM privileges to access the S3 buckets and the AWS Athena service.&lt;/p&gt;

&lt;p&gt;For Athena, this would provide Glue with full permissions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Sid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"VisualEditor1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"logs:CreateLogStream"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"logs:AssociateKmsKey"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"athena:*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"logs:CreateLogGroup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"logs:PutLogEvents"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:athena:*:youraccount:workgroup/*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:athena:*:youracccont:datacatalog/*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:logs:*:*:/aws-glue/*"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create Glue ETL Job
&lt;/h2&gt;

&lt;p&gt;My use case for the Glue job was to query the view I had and save the results into Parquet format to speed up future queries against the same data.&lt;/p&gt;

&lt;p&gt;The following code allows you to query an Athena view as a source for a data frame.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.transforms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;getResolvedOptions&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SparkContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GlueContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.job&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.dynamicframe&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DynamicFrame&lt;/span&gt;

&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getResolvedOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JOB_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;sc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SparkContext&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;glueContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GlueContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;spark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spark_session&lt;/span&gt;
&lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JOB_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;athena_view_dataframe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jdbc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;driver&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;com.simba.athena.jdbc.Driver&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AwsCredentialsProviderClass&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;com.simba.athena.amazonaws.auth.InstanceProfileCredentialsProvider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jdbc:awsathena://athena.eu-west-1.amazonaws.com:443&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbtable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AwsDataCatalog.yourathenadatabase.yourathenaview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S3OutputLocation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://yours3bucket/temp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;athena_view_dataframe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;printSchema&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key things to be aware of in this code snippet are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;driver&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;com.simba.athena.jdbc.Driver&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are telling Glue which class within the JDBC driver to use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AwsCredentialsProviderClass&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;com.simba.athena.amazonaws.auth.InstanceProfileCredentialsProvider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This uses the IAM role assigned to the Glue job to authenticate to Athena. You can use other authentication methods, like AWS_ACCESS_KEY or federated authentication, but I think using IAM makes the most sense for an ETL job that will most likely run on a schedule or event.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jdbc:awsathena://athena.eu-west-1.amazonaws.com:443&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I am using Athena in Ireland (eu-west-1); if you are using a different region, update this accordingly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbtable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AwsDataCatalog.yourathenadatabase.yourathenaview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fully qualified name of the view in your Athena catalog, in the format 'AwsDataCatalog.Database.View'. For example, for this query run in Athena:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;"AwsDataCatalog"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"vehicles"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;"v_electric_cars"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You would set the dbtable option to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbtable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AwsDataCatalog.vehicles.v_electric_cars&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
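If the catalog, database, and view names are parameterised in the job, a small helper (hypothetical, not part of the driver or the original script) keeps the dbtable value in the expected format:

```python
def athena_dbtable(database: str, view: str, catalog: str = "AwsDataCatalog") -> str:
    """Build the fully qualified catalog.database.view name the JDBC driver expects."""
    return f"{catalog}.{database}.{view}"

print(athena_dbtable("vehicles", "v_electric_cars"))
# AwsDataCatalog.vehicles.v_electric_cars
```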



&lt;p&gt;The last option tells Glue which S3 location to use as temporary storage for the data returned from Athena.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S3OutputLocation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://yours3bucket/temp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point you can test that it works. When running the job, you need to tell Glue the location of the Athena JDBC driver JAR file that was uploaded to S3.&lt;/p&gt;

&lt;p&gt;If you are working in the AWS Glue console, the parameter to set can be found under Job Details --&amp;gt; Advanced --&amp;gt; Dependent JARs path.&lt;/p&gt;

&lt;p&gt;The parameter needs to be set to the full path and filename of the JAR file, for example s3://yours3bucket/jdbc-drivers/AthenaJDBC42_2.0.27.1000.jar.&lt;/p&gt;

&lt;p&gt;Setting this in the console ensures that the correct argument is passed into the Glue job.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nt"&gt;--extra-jars&lt;/span&gt; s3://yours3bucket/jdbc-drivers/AthenaJDBC42_2.0.27.1000.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
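If the job is created from the AWS CLI rather than the console, the same argument can be supplied via --default-arguments; a sketch, where the job name, role, and script location are assumptions:

```shell
# Create the Glue job with the Athena JDBC driver on its classpath
# (job name, role, and script path are hypothetical)
aws glue create-job \
  --name athena-view-to-parquet \
  --role MyGlueJobRole \
  --command Name=glueetl,ScriptLocation=s3://yours3bucket/scripts/job.py \
  --default-arguments '{"--extra-jars": "s3://yours3bucket/jdbc-drivers/AthenaJDBC42_2.0.27.1000.jar"}'
```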



&lt;p&gt;The final code, including the conversion to Parquet format, looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.transforms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;getResolvedOptions&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SparkContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GlueContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.job&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Job&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglue.dynamicframe&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DynamicFrame&lt;/span&gt;

&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getResolvedOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JOB_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;sc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SparkContext&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;glueContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GlueContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;spark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;spark_session&lt;/span&gt;
&lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Job&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JOB_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;athena_view_dataframe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jdbc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;driver&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;com.simba.athena.jdbc.Driver&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AwsCredentialsProviderClass&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;com.simba.athena.amazonaws.auth.InstanceProfileCredentialsProvider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jdbc:awsathena://athena.eu-west-1.amazonaws.com:443&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbtable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AwsDataCatalog.vehicles.v_electric_cars&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;S3OutputLocation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://yours3bucket/temp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;athena_view_dataframe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;printSchema&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;athena_view_datasource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DynamicFrame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromDF&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;athena_view_dataframe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;athena_view_source&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pq_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write_dynamic_frame&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;athena_view_datasource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;connection_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;glueparquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;connection_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://yourotherS3Bucket/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;partitionKeys&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;format_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compression&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;snappy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;transformation_ctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ParquetConversion&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>aws</category>
      <category>sql</category>
      <category>etl</category>
      <category>glue</category>
    </item>
    <item>
      <title>What's New in ML? re:Invent ML Keynote</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Tue, 08 Dec 2020 19:14:53 +0000</pubDate>
      <link>https://dev.to/aws-builders/what-s-new-in-ml-re-invent-ml-keynote-5fol</link>
      <guid>https://dev.to/aws-builders/what-s-new-in-ml-re-invent-ml-keynote-5fol</guid>
      <description>&lt;p&gt;It's week two of re:Invent and that includes the first ever dedicated keynote for Machine Learning. Here are the features that I found interesting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fj83lw6di4lflms7d7a7m.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fj83lw6di4lflms7d7a7m.jpeg" alt="Alt Text" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Aurora ML
&lt;/h1&gt;

&lt;p&gt;This feature aims to bring ML to SQL without you needing to learn ML. When you run a query, Amazon Aurora handles the integration with AWS ML services for you.&lt;/p&gt;

&lt;p&gt;Aurora exposes ML models as SQL functions, allowing you to use standard SQL to build applications that call ML models, pass data to them, and return predictions as query results. The models can include ones you trained in SageMaker, Comprehend or models offered by AWS partners.&lt;/p&gt;
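
&lt;p&gt;As a rough sketch of what that looks like (the table, columns, and function name here are hypothetical, not from the announcement), a query calling a model might read:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical example: score churn risk with an ML-backed SQL function
SELECT customer_id,
       predict_churn(age, tenure_months, monthly_spend) AS churn_score
FROM customers
ORDER BY churn_score DESC;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;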

&lt;p&gt;&lt;a href="https://aws.amazon.com/rds/aurora/machine-learning/" rel="noopener noreferrer"&gt;https://aws.amazon.com/rds/aurora/machine-learning/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhareunyx95gun2t3i0va.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhareunyx95gun2t3i0va.png" alt="Alt Text" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Lookout For Metrics
&lt;/h1&gt;

&lt;p&gt;Amazon Lookout for Metrics uses machine learning to automatically detect and diagnose anomalies in business and operational time series data.&lt;/p&gt;

&lt;p&gt;You can connect data stores like S3, RDS and Redshift, as well as SaaS applications, and monitor the metrics that are important to your business.&lt;/p&gt;

&lt;p&gt;Amazon Lookout simplifies the process by automatically inspecting and preparing the data and building a custom ML model. It's powered by the experience Amazon has built up doing this internally.&lt;/p&gt;

&lt;p&gt;Sign up for the preview here &lt;a href="https://aws.amazon.com/lookout-for-metrics/" rel="noopener noreferrer"&gt;https://aws.amazon.com/lookout-for-metrics/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0reopyijlrqz31ew8vsc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0reopyijlrqz31ew8vsc.png" alt="Alt Text" width="715" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Neptune ML
&lt;/h1&gt;

&lt;p&gt;Amazon Neptune is a fully managed graph database service designed to work with highly connected datasets. The new ML feature brings predictions to graphs. &lt;/p&gt;

&lt;p&gt;For large graphs with billions of relationships, it’s hard to discover insights using queries based only on human intuition. For this reason, you can use ML on graphs to automatically reveal new insights and make predictions.&lt;/p&gt;

&lt;p&gt;Using graph neural networks (GNNs), a machine learning (ML) technique purpose-built for graphs, you can improve the accuracy of most graph predictions by over 50%. &lt;/p&gt;

&lt;p&gt;Neptune ML uses the Deep Graph Library (DGL), an open-source library to which AWS contributes, making it easy to develop and apply GNN models on graph data.&lt;/p&gt;

&lt;p&gt;Read the AWS Database blog on the announcement here &lt;a href="https://aws.amazon.com/blogs/database/announcing-amazon-neptune-ml-easy-fast-and-accurate-predictions-on-graphs/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/database/announcing-amazon-neptune-ml-easy-fast-and-accurate-predictions-on-graphs/&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;There is also a getting started guide here &lt;a href="https://aws.amazon.com/blogs/database/how-to-get-started-with-neptune-ml/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/database/how-to-get-started-with-neptune-ml/&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fyhftukdkrgz3tdu2spuo.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fyhftukdkrgz3tdu2spuo.jpeg" alt="Alt Text" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Redshift ML
&lt;/h1&gt;

&lt;p&gt;Like Aurora ML, this feature aims to bring ML to Redshift using SQL.&lt;/p&gt;

&lt;p&gt;The CREATE MODEL SQL command is used in Redshift to specify your training data. Redshift ML will then compile and import the trained model inside the Redshift data warehouse and prepare a SQL function for use in SQL queries.&lt;/p&gt;
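
&lt;p&gt;A minimal sketch of that workflow (the table, column, and function names here are hypothetical) might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical example: train on query results, then call the generated function
CREATE MODEL customer_churn_model
FROM (SELECT age, tenure_months, monthly_spend, churned FROM customer_history)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftML'
SETTINGS (S3_BUCKET 'yours3bucket');

SELECT customer_id, predict_churn(age, tenure_months, monthly_spend) AS churn_score
FROM customers;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;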

&lt;p&gt;See the product page for more details and to get started &lt;a href="https://aws.amazon.com/redshift/features/redshiftML/" rel="noopener noreferrer"&gt;https://aws.amazon.com/redshift/features/redshiftML/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fv3k24itbsqr5g1lylxc6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fv3k24itbsqr5g1lylxc6.png" alt="Alt Text" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  HealthLake
&lt;/h1&gt;

&lt;p&gt;HealthLake aims to take medical data and provide the tools and machine learning to make it available for analytics in a way that is 'HIPAA-eligible' and supports the industry-standard Fast Healthcare Interoperability Resources (FHIR) format.&lt;/p&gt;

&lt;p&gt;Using NLP and Comprehend Medical processing, the data is made available for search and query using QuickSight, SageMaker and third-party applications.&lt;/p&gt;

&lt;p&gt;Sign up for the preview here &lt;a href="https://aws.amazon.com/healthlake/" rel="noopener noreferrer"&gt;https://aws.amazon.com/healthlake/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Watch the two-minute intro video.&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/HM-7YkMt9Y4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>aws</category>
      <category>analytics</category>
      <category>datascience</category>
    </item>
    <item>
      <title>re:Invent Week 2: Data Sessions</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Mon, 07 Dec 2020 20:56:05 +0000</pubDate>
      <link>https://dev.to/aws-builders/re-invent-week-2-data-sessions-2ge0</link>
      <guid>https://dev.to/aws-builders/re-invent-week-2-data-sessions-2ge0</guid>
<description>&lt;p&gt;Week 2 starts soon, so here are my picks of the data-related sessions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Machine Learning Keynote&lt;/li&gt;
&lt;li&gt;Using Amazon QLDB as a system-of-trust database for core business apps&lt;/li&gt;
&lt;li&gt;Get started with Amazon SageMaker in minutes&lt;/li&gt;
&lt;li&gt;What’s new in Amazon RDS for SQL Server&lt;/li&gt;
&lt;li&gt;Fast distributed training and near-linear scaling with PyTorch on AWS&lt;/li&gt;
&lt;li&gt;Building a successful inventory planning solution with Amazon Forecast&lt;/li&gt;
&lt;li&gt;Get deep insights about your ML models during training&lt;/li&gt;
&lt;li&gt;Amazon Aurora Serverless v2: Instant scaling for demanding workloads&lt;/li&gt;
&lt;li&gt;Paving the way toward automated driving with BMW Group&lt;/li&gt;
&lt;li&gt;Migrating databases to Amazon DocumentDB (with MongoDB compatibility)&lt;/li&gt;
&lt;li&gt;Train large models with billions of parameters in TensorFlow 2.0 &lt;/li&gt;
&lt;li&gt;How New Relic is migrating its Apache Kafka cluster to Amazon MSK&lt;/li&gt;
&lt;li&gt;Deliver viewing experiences for super fans with Amazon Personalize&lt;/li&gt;
&lt;li&gt;Running Apache Cassandra workloads with Amazon Keyspaces&lt;/li&gt;
&lt;li&gt;Harness the power of data with AWS analytics&lt;/li&gt;
&lt;li&gt;What’s new with Amazon Redshift&lt;/li&gt;
&lt;li&gt;Power modern serverless applications with GraphQL and AWS AppSync&lt;/li&gt;
&lt;li&gt;How Amazon Redshift powers large-scale analytics for Amazon.com&lt;/li&gt;
&lt;li&gt;New use cases for Amazon Redshift&lt;/li&gt;
&lt;li&gt;Beyond AWS DMS: Programs and partners to ace your migration&lt;/li&gt;
&lt;li&gt;Amazon.com’s use of AI/ML to enhance the customer experience&lt;/li&gt;
&lt;li&gt;Migrating a legacy data warehouse to Amazon Redshift&lt;/li&gt;
&lt;li&gt;What’s new with Amazon EMR&lt;/li&gt;
&lt;li&gt;Choose the right machine learning algorithm in Amazon SageMaker&lt;/li&gt;
&lt;li&gt;Understanding AWS Lambda streaming events&lt;/li&gt;
&lt;li&gt;Infrastructure Keynote (ok not strictly data services but it's always interesting to see the scale of what is powering them)&lt;/li&gt;
&lt;li&gt;Serverless data preparation with AWS Glue&lt;/li&gt;
&lt;li&gt;Deep dive on Amazon Aurora with MySQL compatibility&lt;/li&gt;
&lt;li&gt;Under the hood: How Amazon uses AWS for analytics at petabyte scale&lt;/li&gt;
&lt;li&gt;Building real-time applications using Apache Flink&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>database</category>
      <category>datascience</category>
      <category>analytics</category>
    </item>
    <item>
      <title>What's New In Data: re:Invent Andy Jassy Keynote</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Tue, 01 Dec 2020 21:41:05 +0000</pubDate>
      <link>https://dev.to/aws-builders/whats-new-in-data-re-invent-andy-jassy-keynote-7fa</link>
      <guid>https://dev.to/aws-builders/whats-new-in-data-re-invent-andy-jassy-keynote-7fa</guid>
      <description>&lt;p&gt;It's a different experience this year. The chat with my teammates is a mixture of discussion about new features and pictures of good times in Vegas from previous re:Invent conferences.&lt;/p&gt;

&lt;p&gt;Andy Jassy has finished the first keynote of 2020 and I was not disappointed. Lots of great new features that we have use cases for.&lt;/p&gt;

&lt;p&gt;Here are my favourite data related features announced during the Andy Jassy re:Invent keynote.&lt;/p&gt;

&lt;h1&gt;
  
  
  Glue Elastic Views
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fx0zplildtbtsrpw0u0yt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fx0zplildtbtsrpw0u0yt.png" alt="Alt Text" width="800" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most data teams and customers I work with have data in multiple places. You might have a CRM system, an accounts system, document management etc. Bringing all this data together and keeping it up to date in a 'single customer view' for analytics workloads is something data engineers spend a lot of time thinking about.&lt;/p&gt;

&lt;p&gt;I've used Materialised Views heavily in the past to convert transactional data models into views more suitable for reporting queries.&lt;/p&gt;

&lt;p&gt;Glue Elastic Views looks like a great feature for when you have data in multiple types of databases and want to apply Change Data Capture (CDC) and materialised-view-style functionality.&lt;/p&gt;

&lt;p&gt;I can't wait to get hands-on with the preview. You can sign up today at &lt;a href="https://aws.amazon.com/glue/features/elastic-views/" rel="noopener noreferrer"&gt;https://aws.amazon.com/glue/features/elastic-views/&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Quicksight Q
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1euua12m0dd76drmhbw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1euua12m0dd76drmhbw9.png" alt="Alt Text" width="735" height="380"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I was already a fan of QuickSight due to the pay per session pricing. It works really well when you consider the minimum user licensing for some other data visualisation tools.&lt;/p&gt;

&lt;p&gt;I also like the features for embedding QuickSight dashboards into your applications.&lt;/p&gt;

&lt;p&gt;The newly announced ability to ask questions of your data in natural language makes it even easier for end users of your applications to benefit from analytics in a much more consistent and integrated way.&lt;/p&gt;

&lt;p&gt;The Q feature is in preview and you can sign up at &lt;a href="https://aws.amazon.com/quicksight/q/?nc=sn&amp;amp;loc=4" rel="noopener noreferrer"&gt;https://aws.amazon.com/quicksight/q/?nc=sn&amp;amp;loc=4&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check out the blog on QuickSight Q here &lt;a href="https://aws.amazon.com/blogs/aws/amazon-quicksight-q-to-answer-ad-hoc-business-questions/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/aws/amazon-quicksight-q-to-answer-ad-hoc-business-questions/&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  New gp3 EBS Volumes
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fap1xnvcloycckgu34so7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fap1xnvcloycckgu34so7.png" alt="Alt Text" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can now scale your storage volume performance independent of storage capacity. Oh and it's up to 20% cheaper than gp2. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2020/12/introducing-new-amazon-ebs-general-purpose-volumes-gp3/" rel="noopener noreferrer"&gt;https://aws.amazon.com/about-aws/whats-new/2020/12/introducing-new-amazon-ebs-general-purpose-volumes-gp3/&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Aurora Serverless v2
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1m23z1wrnlz1v83cjfb7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1m23z1wrnlz1v83cjfb7.png" alt="Alt Text" width="800" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;v2 claims to scale almost instantly, in a fraction of a second. The scaling is adjusted in fine-grained increments to provide just the right amount of database resources for the application. &lt;/p&gt;

&lt;p&gt;The preview currently covers MySQL compatibility and includes Aurora features like Global Database, Multi-AZ deployment and read replicas.&lt;/p&gt;

&lt;p&gt;Sign up for the preview at &lt;a href="https://aws.amazon.com/rds/aurora/serverless/" rel="noopener noreferrer"&gt;https://aws.amazon.com/rds/aurora/serverless/&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Babelfish for PostgreSQL
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjilqcuz8crsjh9hmurjy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjilqcuz8crsjh9hmurjy.png" alt="Alt Text" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I've seen quite a number of database workload migrations to the cloud. Often these also include moving from a commercial database engine to an open-source engine like PostgreSQL. Tools like AWS DMS and Qlik Replicate do a good job of handling the data migration and conversion of data types. What is often more time consuming is the migration of database code such as PL/SQL to the open-source equivalent.&lt;/p&gt;

&lt;p&gt;Babelfish looks to address the database code migration problem for MS SQL to PostgreSQL migrations.&lt;/p&gt;

&lt;p&gt;Babelfish adds an endpoint to PostgreSQL that understands the SQL Server wire protocol, Tabular Data Stream (TDS), as well as commonly used T-SQL commands.&lt;/p&gt;

&lt;p&gt;With Babelfish enabled, you don’t have to swap out database drivers or take on the significant effort of rewriting and verifying all of your applications’ database requests.&lt;/p&gt;

&lt;p&gt;Check out the AWS Open Source blog on Babelfish here &lt;a href="https://aws.amazon.com/blogs/opensource/want-more-postgresql-you-just-might-like-babelfish/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/opensource/want-more-postgresql-you-just-might-like-babelfish/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AWS plans to open source Babelfish in Q1 2021; until then you can sign up for the Amazon Aurora preview. You can also check out the Babelfish community here &lt;a href="https://babelfish-for-postgresql.github.io/babelfish-for-postgresql/" rel="noopener noreferrer"&gt;https://babelfish-for-postgresql.github.io/babelfish-for-postgresql/&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  SageMaker Data Wrangler
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5au1o6o48oz0p3rfw32e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F5au1o6o48oz0p3rfw32e.png" alt="Alt Text" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In some industries, up to 92% of analytics project time is spent on data wrangling (sourcing, ETL, cleaning, etc.) just to get ready for the actual machine learning and analytics workloads.&lt;/p&gt;

&lt;p&gt;Amazon SageMaker Data Wrangler claims to reduce the time it takes to aggregate and prepare data for machine learning. It simplifies data preparation and feature engineering, letting you complete each step of the workflow, including data selection, cleansing, exploration, and visualisation, from a single visual interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/sagemaker/data-wrangler/" rel="noopener noreferrer"&gt;https://aws.amazon.com/sagemaker/data-wrangler/&lt;/a&gt; &lt;/p&gt;

&lt;h1&gt;
  
  
  SageMaker Feature Store
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fr2i6v7y7hcggf001x28p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fr2i6v7y7hcggf001x28p.png" alt="Alt Text" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Like data wrangling, feature engineering can be a time-consuming process. Once it is completed, it makes sense to share the results with other people who might be developing machine learning workloads based on the same datasets.&lt;/p&gt;

&lt;p&gt;Just as a data catalog enables an organisation to discover data assets, the new Feature Store in SageMaker provides a repository where you can store and access features, so it’s much easier to name, organise, and reuse them across teams.&lt;/p&gt;

&lt;p&gt;Check out the details here &lt;a href="https://aws.amazon.com/sagemaker/feature-store/" rel="noopener noreferrer"&gt;https://aws.amazon.com/sagemaker/feature-store/&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  SageMaker Pipelines
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmh2fz3wsdqdknjubsi1t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmh2fz3wsdqdknjubsi1t.png" alt="Alt Text" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bringing CI/CD to machine learning workloads, SageMaker Pipelines helps you automate different steps of the ML workflow, including data loading, data transformation, training and tuning, and deployment.&lt;/p&gt;

&lt;p&gt;Check out the details here &lt;a href="https://aws.amazon.com/sagemaker/pipelines/" rel="noopener noreferrer"&gt;https://aws.amazon.com/sagemaker/pipelines/&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  What's Next?
&lt;/h1&gt;

&lt;p&gt;It's been a great start to re:Invent. I can't wait to see what else they have in store for us.&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>analytics</category>
      <category>database</category>
      <category>aws</category>
    </item>
    <item>
      <title>re:Invent: Data Sessions Week 1</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Sun, 29 Nov 2020 23:14:05 +0000</pubDate>
      <link>https://dev.to/aws-builders/re-invent-data-sessions-week-1-4dka</link>
      <guid>https://dev.to/aws-builders/re-invent-data-sessions-week-1-4dka</guid>
<description>&lt;p&gt;Here is my list of week one re:Invent sessions focussed on data and analytics.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to use fully managed Jupyter notebooks in Amazon SageMaker&lt;/li&gt;
&lt;li&gt;What’s new with Amazon S3&lt;/li&gt;
&lt;li&gt;How BMW Group uses AWS serverless analytics for a data-driven ecosystem&lt;/li&gt;
&lt;li&gt;Innovate faster with applications on AWS storage&lt;/li&gt;
&lt;li&gt;Embed analytics in your applications with Amazon QuickSight&lt;/li&gt;
&lt;li&gt;Discovering insights from customer surveys at McDonald’s&lt;/li&gt;
&lt;li&gt;What’s new in Amazon ElastiCache&lt;/li&gt;
&lt;li&gt;How FINRA operates PB-scale analytics on data lakes with Amazon Athena&lt;/li&gt;
&lt;li&gt;How Zynga modernized mobile analytics with Amazon Redshift RA3&lt;/li&gt;
&lt;li&gt;Implementing MLOps practices with Amazon SageMaker&lt;/li&gt;
&lt;li&gt;Break down data silos: Build a serverless data lake on Amazon S3&lt;/li&gt;
&lt;li&gt;Amazon DocumentDB (with MongoDB compatibility) Deep Dive&lt;/li&gt;
&lt;li&gt;Gameloft: A zero downtime data lake migration deep dive&lt;/li&gt;
&lt;li&gt;Nationwide’s journey to a governed data lake on AWS&lt;/li&gt;
&lt;li&gt;BI at hyperscale: Quickly build and scale dashboards with Amazon QuickSight&lt;/li&gt;
&lt;li&gt;Building for the future with AWS databases&lt;/li&gt;
&lt;li&gt;How the NFL builds computer vision training datasets at scale&lt;/li&gt;
&lt;li&gt;What’s new in Amazon RDS&lt;/li&gt;
&lt;li&gt;Serverless analytics at Equinox Media: Handling growth during disruption&lt;/li&gt;
&lt;li&gt;Data modeling with Amazon DynamoDB – Part 1&lt;/li&gt;
&lt;li&gt;Dive deep into AWS Schema Conversion Tool and AWS DMS&lt;/li&gt;
&lt;li&gt;How Vyaire uses AWS analytics to scale ventilator production&lt;/li&gt;
&lt;li&gt;From POC to production: Strategies for achieving machine learning at scale&lt;/li&gt;
&lt;li&gt;The right tool for the job: Enabling analytics at scale at Intuit&lt;/li&gt;
&lt;li&gt;Secure and compliant machine learning for regulated industries&lt;/li&gt;
&lt;li&gt;How Disney+ uses fast data ubiquity to improve the customer experience&lt;/li&gt;
&lt;li&gt;Deep dive on Amazon Aurora with PostgreSQL compatibility&lt;/li&gt;
&lt;li&gt;Train and tune ML models to the highest accuracy using Amazon SageMaker&lt;/li&gt;
&lt;li&gt;How Nielsen built a multi-petabyte data platform using Amazon EMR&lt;/li&gt;
&lt;li&gt;How Goldman Sachs uses an Amazon MSK backbone for its Transaction Banking Platform&lt;/li&gt;
&lt;li&gt;Data modeling with Amazon DynamoDB – Part 2&lt;/li&gt;
&lt;li&gt;How Disney+ scales globally on Amazon DynamoDB&lt;/li&gt;
&lt;li&gt;Productionizing R workloads using Amazon SageMaker, featuring Siemens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a full list of sessions and to register visit &lt;a href="https://reinvent.awsevents.com" rel="noopener noreferrer"&gt;https://reinvent.awsevents.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>datascience</category>
      <category>analytics</category>
      <category>database</category>
    </item>
    <item>
      <title>Unify data silos with AWS AppSync</title>
      <dc:creator>Matt Houghton</dc:creator>
      <pubDate>Tue, 24 Nov 2020 13:14:52 +0000</pubDate>
      <link>https://dev.to/aws-builders/unify-data-silos-with-aws-appsync-2gkl</link>
      <guid>https://dev.to/aws-builders/unify-data-silos-with-aws-appsync-2gkl</guid>
      <description>&lt;h1&gt;
  
  
  Silos
&lt;/h1&gt;

&lt;p&gt;Most organisations that process data will have experienced the concept of data in silos. This is where an application is built for a particular purpose and tied to a data store. While this may solve a particular business problem, as time passes developers and engineers may start to spend time extracting data from these silos for other purposes such as analytics and machine learning.&lt;/p&gt;

&lt;p&gt;If you are lucky, your teams might have provided APIs to access the data, but what if that API is missing two key fields that you need, or returns too much data?&lt;/p&gt;

&lt;p&gt;For older software that uses a relational database as its data store, it's more likely that the application talks SQL over JDBC/ODBC and no API is available.&lt;/p&gt;

&lt;p&gt;Pulling disparate datasets together to present them for new projects can be time-consuming. Engineers also have to deal with application modernisation projects, such as breaking up monoliths as part of a cloud migration. Keeping the lights on whilst providing a path to making your architecture cloud friendly is a delicate balancing act.&lt;/p&gt;

&lt;p&gt;This post looks into GraphQL, specifically the AWS implementation via AppSync and how it can be used to help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provide a flexible API for developers&lt;/li&gt;
&lt;li&gt;Join data from silos together&lt;/li&gt;
&lt;li&gt;Provide a migration path for application modernisation by moving some data into DynamoDB while keeping some in an RDBMS&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's GraphQL?
&lt;/h2&gt;

&lt;p&gt;Organizations choose to build APIs with &lt;a href="https://graphql.org" rel="noopener noreferrer"&gt;GraphQL&lt;/a&gt; because it gives developers the ability to query multiple databases, microservices, and APIs with a single GraphQL endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's AppSync?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/appsync/" rel="noopener noreferrer"&gt;AWS AppSync&lt;/a&gt; is a fully managed service that makes it easy to develop GraphQL APIs. Out of the box it allows connections to data sources like AWS DynamoDB, Lambda, and more.&lt;/p&gt;

&lt;h1&gt;
  
  
  Data Sources
&lt;/h1&gt;

&lt;p&gt;In this example we will provide a unified API that is able to query data from the following data stores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DynamoDB - Representing a fairly new cloud native application.&lt;/li&gt;
&lt;li&gt;RDS - Representing a traditional 3 tier app that has been migrated to the cloud.&lt;/li&gt;
&lt;li&gt;Lambda - Representing a serverless application.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Throughout we'll use dummy/test vehicle data that we want to bring together.&lt;/p&gt;

&lt;h2&gt;
  
  
  DynamoDB
&lt;/h2&gt;

&lt;p&gt;Create a table named vehicle. The key is vehicle_id (string).&lt;/p&gt;

&lt;p&gt;Add some test data by adding a couple of items.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpwv5x8imozex2xishyro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpwv5x8imozex2xishyro.png" alt="Alt Text" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Lambda
&lt;/h2&gt;

&lt;p&gt;We now create a quick Lambda function that mocks returning some data for a given vehicle_id.&lt;/p&gt;

&lt;p&gt;The lambda code is shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json

print('Loading function')

def lambda_handler(event, context):
    print (json.dumps(event))
    print (context)

    vehicle_id={}
    vehicle_id=event['source']['vehicle_id']
    print(vehicle_id)

    vehicles = {
        "123456" : { "vehicle_id" : "123456", "fuel" : "electric", "category": "SUV" },
        "987654321" : { "vehicle_id" : "987654321", "fuel": "hybrid", "category": "Saloon"}
    }

    print(vehicles[vehicle_id])
    return (vehicles[vehicle_id])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
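&lt;p&gt;The handler can be sanity-checked locally before wiring it into AppSync by invoking it with the event shape AppSync sends, where the parent object appears under source. A minimal sketch (a trimmed copy of the handler is repeated here so the snippet runs standalone):&lt;/p&gt;

```python
import json

# Trimmed copy of the handler above, so this snippet is self-contained.
def lambda_handler(event, context):
    vehicle_id = event['source']['vehicle_id']
    vehicles = {
        "123456": {"vehicle_id": "123456", "fuel": "electric", "category": "SUV"},
        "987654321": {"vehicle_id": "987654321", "fuel": "hybrid", "category": "Saloon"}
    }
    return vehicles.get(vehicle_id)

# AppSync invokes the function with the parent Vehicle object under 'source'.
event = {"source": {"vehicle_id": "123456"}}
result = lambda_handler(event, None)
print(json.dumps(result))  # {"vehicle_id": "123456", "fuel": "electric", "category": "SUV"}
```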



&lt;h2&gt;
  
  
  RDS (Aurora PostgreSQL)
&lt;/h2&gt;

&lt;p&gt;Out of the box, AppSync supports Aurora Serverless RDS instances. Create an RDS Aurora PostgreSQL instance named vehicle-accident.&lt;/p&gt;

&lt;p&gt;It's important to enable the Data API feature which is a connectionless Web Service API for running SQL queries against the database.&lt;/p&gt;
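&lt;p&gt;Outside AppSync, the same Data API can be called from any SDK. Here is a hedged sketch (the ARNs, database name, and client are placeholders supplied by the caller) with a small helper that flattens the Data API record format into plain rows:&lt;/p&gt;

```python
# Sketch only: resource_arn, secret_arn, and database are placeholders for your own values.
def query_accidents(client, resource_arn, secret_arn, database, vehicle_id):
    """Run one SQL statement through the RDS Data API (no open connection required)."""
    response = client.execute_statement(
        resourceArn=resource_arn,
        secretArn=secret_arn,
        database=database,
        sql='select damage, cost from accident where vehicle_id = :vid',
        parameters=[{'name': 'vid', 'value': {'stringValue': vehicle_id}}],
    )
    # The Data API returns rows as lists of typed value dicts; flatten to plain values.
    return [
        [list(field.values())[0] for field in record]
        for record in response['records']
    ]
```

&lt;p&gt;With boto3 the client would be boto3.client('rds-data'), passing the cluster ARN and the Secrets Manager ARN described later in this post.&lt;/p&gt;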

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fi8sw5kjawa3h7y5h59rt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fi8sw5kjawa3h7y5h59rt.png" alt="Alt Text" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the instance has been created, connect to it using the RDS query editor and run the following SQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create table accident (
vehicle_id varchar,
accident_date date,
damage varchar,
cost integer);

insert into accident values ('123456', '2020-11-23 18:00:00', 'windscreen smashed', 100);
insert into accident values ('987654321', '2020-11-24 18:00:00', 'dent in front passenger door', 600);
commit;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In order for AppSync to connect to RDS later we need to store database credentials in AWS Secrets Manager.&lt;/p&gt;

&lt;p&gt;Create a file named creds.json containing the database credentials.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "username": "xxxxxxxxxxxxxx",
    "password": "xxxxxxxxxxxxxx"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the credentials using the AWS CLI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws secretsmanager create-secret --name HttpRDSSecret --secret-string file://creds.json --region eu-west-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make a note of the ARN returned as this is needed later.&lt;/p&gt;

&lt;h1&gt;
  
  
  Create The GraphQL API
&lt;/h1&gt;

&lt;p&gt;From the AppSync console select build from scratch.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F14vwjmgbxjeasafiah65.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F14vwjmgbxjeasafiah65.png" alt="Alt Text" width="800" height="191"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Give your API a name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhs8em8f9o6dsju2verd2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fhs8em8f9o6dsju2verd2.png" alt="Alt Text" width="800" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Schema
&lt;/h2&gt;

&lt;p&gt;Click edit schema.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9hvz8xz2wof3o9ll45m2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F9hvz8xz2wof3o9ll45m2.png" alt="Alt Text" width="800" height="182"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add the following schema.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type Query {
    #Get a single vehicle.
    singleVehicle(vehicle_id: String): Vehicle
}

type Vehicle {
    vehicle_id: String
    model: String
    year: String
    colour: String
    make: String
    fuel: String
    category: String
    accident_date: String
    accident_damage: String
    accident_cost: String
}

schema {
    query: Query
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
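&lt;p&gt;With this schema in place, a client asks for exactly the fields it needs. For example, a query against singleVehicle might look like this (the field selection is entirely up to the caller):&lt;/p&gt;

```graphql
query GetVehicle {
    singleVehicle(vehicle_id: "123456") {
        vehicle_id
        make
        model
        fuel
        accident_damage
        accident_cost
    }
}
```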



&lt;h2&gt;
  
  
  Data Sources
&lt;/h2&gt;

&lt;p&gt;Next we define the three data sources: DynamoDB, RDS, and Lambda. Click Data Sources and add them one by one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxqavleww0bq9srm37f9t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxqavleww0bq9srm37f9t.png" alt="Alt Text" width="800" height="898"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwf4bdmaxblu2fbrwqh34.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwf4bdmaxblu2fbrwqh34.png" alt="Alt Text" width="800" height="596"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fu6dnkxroxnxu0gk7dk2p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fu6dnkxroxnxu0gk7dk2p.png" alt="Alt Text" width="800" height="870"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Resolvers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  DynamoDB
&lt;/h3&gt;

&lt;p&gt;Back on the Schema screen, select Attach for the resolver of "singleVehicle(...): Vehicle".&lt;/p&gt;

&lt;p&gt;Select vehicle_ddb as the data source and add the following for the request mapping template.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "version": "2017-02-28",
    "operation": "GetItem",
    "key": {
        "vehicle_id": $util.dynamodb.toDynamoDBJson($ctx.args.vehicle_id)
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the following for the response template.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Pass back the result from DynamoDB.
$util.toJson($ctx.result)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxyutwcnae6o7edkrap59.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fxyutwcnae6o7edkrap59.png" alt="Alt Text" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At this point, some of the fields defined in the schema can already be queried. You can check this on the query screen in AppSync.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fn0ahhbixrrk6csbvyc72.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fn0ahhbixrrk6csbvyc72.png" alt="Alt Text" width="800" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Lambda
&lt;/h3&gt;

&lt;p&gt;On the schema definition screen scroll down to the fuel field and click attach.&lt;/p&gt;

&lt;p&gt;Select the Lambda function created earlier, enable the response mapping template, and add the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$util.toJson($context.result.get("fuel"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzlec3xhmgbkarehtbz44.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzlec3xhmgbkarehtbz44.png" alt="Alt Text" width="800" height="658"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repeat these steps for the category field. The response mapping template should be defined as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$util.toJson($context.result.get("category"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbep94gmu98px2v0p7mby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fbep94gmu98px2v0p7mby.png" alt="Alt Text" width="800" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  RDS
&lt;/h3&gt;

&lt;p&gt;On the schema definition screen scroll down to the accident_date field and click attach.&lt;/p&gt;

&lt;p&gt;Select the RDS database created earlier. Configure the request mapping template as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "version": "2018-05-29",
    "statements": [
            $util.toJson("select accident_date from accident WHERE vehicle_id = '$ctx.source.vehicle_id'")
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Specify the response mapping template as below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#if($ctx.error)
    $util.error($ctx.error.message, $ctx.error.type)
#end
#set($output = $utils.rds.toJsonObject($ctx.result)[0])
## Make sure to handle instances where fields are null
## or don't exist according to your business logic
#foreach( $item in $output )
    #set($accident_date = $item.get('accident_date'))
#end
$util.toJson($accident_date)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjzux8bs04x23qxcambl4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjzux8bs04x23qxcambl4.png" alt="Alt Text" width="800" height="734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repeat these steps for the accident_damage and accident_cost fields. The request and response mapping templates are shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "version": "2018-05-29",
    "statements": [
            $util.toJson("select damage from accident WHERE vehicle_id = '$ctx.source.vehicle_id'")
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#if($ctx.error)
    $util.error($ctx.error.message, $ctx.error.type)
#end
#set($output = $utils.rds.toJsonObject($ctx.result)[0])
## Make sure to handle instances where fields are null
## or don't exist according to your business logic
#foreach( $item in $output )
    #set($damage = $item.get('damage'))
#end
$util.toJson($damage)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "version": "2018-05-29",
    "statements": [
            $util.toJson("select cost from accident WHERE vehicle_id = '$ctx.source.vehicle_id'")
    ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#if($ctx.error)
    $util.error($ctx.error.message, $ctx.error.type)
#end
#set($output = $utils.rds.toJsonObject($ctx.result)[0])
## Make sure to handle instances where fields are null
## or don't exist according to your business logic
#foreach( $item in $output )
    #set($cost = $item.get('cost'))
#end
$util.toJson($cost)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Query
&lt;/h1&gt;

&lt;p&gt;The three data sources are now in place to resolve all the fields for our API. Go back to the query screen and check that the fields all get populated when you run a query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ftbhg5nx2abhlx7osd94y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ftbhg5nx2abhlx7osd94y.png" alt="Alt Text" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fw4y9hgmq6i804zoxr9zu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fw4y9hgmq6i804zoxr9zu.png" alt="Alt Text" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Tips
&lt;/h1&gt;

&lt;p&gt;Turn on CloudWatch Logs so you can see details of any errors. You can do this under settings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvtqyd3hrjqxqqcse9nv9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fvtqyd3hrjqxqqcse9nv9.png" alt="Alt Text" width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following webpages were useful to me when getting started with this demo.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/appsync/latest/devguide/resolver-mapping-template-reference-programming-guide.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/appsync/latest/devguide/resolver-mapping-template-reference-programming-guide.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://adrianhall.github.io/cloud/2019/01/03/early-return-from-graphql-resolvers/" rel="noopener noreferrer"&gt;https://adrianhall.github.io/cloud/2019/01/03/early-return-from-graphql-resolvers/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://stackoverflow.com/questions/58031076/aws-appsync-rds-util-rds-tojsonobject-nested-objects" rel="noopener noreferrer"&gt;https://stackoverflow.com/questions/58031076/aws-appsync-rds-util-rds-tojsonobject-nested-objects&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/xai1983kbu/apollo-server/blob/pulumi_appsync_2/bff_pulumi/graphql/resolvers/Query.message.js" rel="noopener noreferrer"&gt;https://github.com/xai1983kbu/apollo-server/blob/pulumi_appsync_2/bff_pulumi/graphql/resolvers/Query.message.js&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>graphql</category>
      <category>database</category>
      <category>analytics</category>
    </item>
  </channel>
</rss>
