<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: nadyaf</title>
    <description>The latest articles on DEV Community by nadyaf (@nadyaf).</description>
    <link>https://dev.to/nadyaf</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F485170%2F0417e665-86be-4c11-824d-ca3bf8120439.png</url>
      <title>DEV Community: nadyaf</title>
      <link>https://dev.to/nadyaf</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nadyaf"/>
    <language>en</language>
    <item>
      <title>#CloudGuruChallenge – Event-Driven Python on AWS</title>
      <dc:creator>nadyaf</dc:creator>
      <pubDate>Thu, 08 Oct 2020 11:56:20 +0000</pubDate>
      <link>https://dev.to/nadyaf/cloudguruchallenge-event-driven-python-on-aws-7</link>
      <guid>https://dev.to/nadyaf/cloudguruchallenge-event-driven-python-on-aws-7</guid>
      <description>&lt;p&gt;I accidentally came across this CloudGuru challenge while I was browsing their website. I love challenges, and after reading the description I thought I could give it a go for a few reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it would give me good practical experience with a range of AWS technologies&lt;/li&gt;
&lt;li&gt;it seemed interesting&lt;/li&gt;
&lt;li&gt;it wasn't too time-consuming&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About the challenge
&lt;/h2&gt;

&lt;p&gt;The challenge was to implement an ETL process on AWS that pulls US COVID-19 data from two different sources every day and processes it, and then to create a dashboard for the data.&lt;/p&gt;

&lt;p&gt;The full description is &lt;a href="https://acloudguru.com/blog/engineering/cloudguruchallenge-python-aws-etl"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  My approach
&lt;/h2&gt;

&lt;p&gt;I have some experience with AWS and recently started learning Python, so I had an idea of where to start.&lt;/p&gt;

&lt;p&gt;Here's the architecture diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ey7Qf-Td--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/uex14ykava0vqzj2xtaz.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ey7Qf-Td--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/uex14ykava0vqzj2xtaz.jpg" alt="Project architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ETL
&lt;/h2&gt;

&lt;p&gt;I used Pandas to read the data from the CSV files into DataFrames. The data is merged, validated and loaded into DynamoDB. I considered having three Lambdas, one for each part of the process, but since the data set is small, I decided against it. &lt;br&gt;
I chose DynamoDB to keep the project serverless.&lt;br&gt;
Once the data is in the database, a copy of it is stored as a CSV file in S3, and a notification with the ETL result is sent to SNS.&lt;/p&gt;
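
&lt;p&gt;To illustrate the merge and validation step, here is a minimal sketch in Pandas. It is not the code from the repo: the column names (date, cases, deaths, recovered), the sample rows and the helper names load_and_merge and validate are all assumptions made for the example.&lt;/p&gt;

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the two CSV sources.
CASES_CSV = """date,cases,deaths
2020-10-01,100,2
2020-10-02,120,3
2020-10-03,130,4
"""
RECOVERED_CSV = """date,recovered
2020-10-01,50
2020-10-02,60
"""


def load_and_merge(cases_csv, recovered_csv):
    """Read both sources and keep only the dates present in both."""
    cases = pd.read_csv(io.StringIO(cases_csv), parse_dates=["date"])
    recovered = pd.read_csv(io.StringIO(recovered_csv), parse_dates=["date"])
    # An inner join drops dates that one of the sources is missing.
    return cases.merge(recovered, on="date", how="inner")


def validate(frame):
    """Reject the whole batch if any count is missing or negative."""
    numeric = frame[["cases", "deaths", "recovered"]]
    if numeric.isna().any().any():
        raise ValueError("missing values in merged data")
    if numeric.lt(0).any().any():
        raise ValueError("negative counts in merged data")
    return frame


merged = validate(load_and_merge(CASES_CSV, RECOVERED_CSV))
print(merged)
```

&lt;p&gt;In this sketch the third day is dropped by the inner join because the second source has no row for it. Whether to drop such rows or fail the run is a design choice that depends on the data.&lt;/p&gt;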

&lt;h2&gt;
  
  
  Dashboard
&lt;/h2&gt;

&lt;p&gt;I had never used QuickSight before, and playing with it was the most fun part of the project. &lt;br&gt;
Here are my visualisations (unfortunately, QuickSight has no option to share them publicly, so these are only screenshots):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xNheOvVE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/xmlom1hrw021tkvadr4l.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xNheOvVE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/xmlom1hrw021tkvadr4l.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Z3Wv_ud4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ahlcb59uqrq9mlba8v9b.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Z3Wv_ud4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/ahlcb59uqrq9mlba8v9b.JPG" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My learnings
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Lambda.&lt;/strong&gt; When a Lambda function has external dependencies that are not part of the AWS Linux environment it runs on, you need to zip them up with your code. I developed on a Windows computer, and when I installed the libraries from the requirements.txt file, I got the Windows versions of them. So when I deployed the package to the cloud, it didn't work. It took me several hours to figure out what the problem was.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CloudFormation.&lt;/strong&gt; I had never worked with it before, and I'm happy I had this opportunity to learn it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visualisation.&lt;/strong&gt; When I was planning my solution, I knew that a Glue crawler works with S3 and DynamoDB. I also knew that Athena works with Glue catalogs. So I assumed that if I stored the data in DynamoDB, Athena would be able to query the table generated by the Glue crawler. When I got to this point in the implementation, with everything else working, it turned out that Athena can't query a Glue table that was generated from DynamoDB. To solve this, I decided to save a copy of the DynamoDB data as a CSV file in S3 and have the Glue crawler create the catalog from that file. Considering the small size of the data, I think this is appropriate here. It's probably not the best solution, though, and it's something to improve in the future.&lt;/p&gt;
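
&lt;p&gt;The DynamoDB-to-S3 copy described above could look roughly like the sketch below. It is an illustration under assumptions, not the code from the repo: the function names, the column list and the sample items are hypothetical. The table and s3_client arguments are expected to behave like a boto3 DynamoDB Table resource and a boto3 S3 client, so the sketch itself needs only the standard library.&lt;/p&gt;

```python
import csv
import io
from decimal import Decimal


def scan_all(table):
    """Collect every item from a DynamoDB table, following scan pagination."""
    response = table.scan()
    items = list(response["Items"])
    # A scan returns LastEvaluatedKey while more pages remain.
    while "LastEvaluatedKey" in response:
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])
    return items


def items_to_csv(items, columns):
    """Serialise items to CSV text, sorted by the first column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for item in sorted(items, key=lambda row: row[columns[0]]):
        writer.writerow({name: item[name] for name in columns})
    return buffer.getvalue()


def export_table(table, s3_client, bucket, key, columns):
    """Write a CSV copy of the table to S3 so a Glue crawler can catalog it."""
    body = items_to_csv(scan_all(table), columns)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return body


# Hypothetical items; the boto3 resource API returns numbers as Decimal.
sample_items = [
    {"date": "2020-10-02", "cases": Decimal("120"), "deaths": Decimal("3")},
    {"date": "2020-10-01", "cases": Decimal("100"), "deaths": Decimal("2")},
]
csv_text = items_to_csv(sample_items, ["date", "cases", "deaths"])
print(csv_text)
```

&lt;p&gt;With boto3 installed, table would typically come from boto3.resource("dynamodb").Table(...) and s3_client from boto3.client("s3"). A full scan is only reasonable here because the data set is small.&lt;/p&gt;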

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Although I've completed all the tasks required in the challenge, I think there are things that can be improved or worked on further, e.g. creating a CI/CD pipeline.&lt;/p&gt;

&lt;p&gt;I'd like to say a big thank you to &lt;a href="https://acloudguru.com/blog/author/forrest-brazeal"&gt;Forrest Brazeal&lt;/a&gt; for setting up this challenge. I've learned a lot from it and I'm looking forward to doing more challenges!&lt;/p&gt;

&lt;p&gt;GitHub repo: &lt;a href="https://github.com/nadyaf/cloudguru-092020-challenge"&gt;https://github.com/nadyaf/cloudguru-092020-challenge&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>python</category>
      <category>etl</category>
      <category>cloudguru</category>
    </item>
  </channel>
</rss>
