<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Christian Bonzelet</title>
    <description>The latest articles on DEV Community by Christian Bonzelet (@cremich).</description>
    <link>https://dev.to/cremich</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F175725%2Fc5cd6a51-4b84-4d60-b27b-af7ede3f5a8a.jpeg</url>
      <title>DEV Community: Christian Bonzelet</title>
      <link>https://dev.to/cremich</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cremich"/>
    <language>en</language>
    <item>
      <title>Scaling content delivery while saving costs? Making the most out of Amazon Cloudfront</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Thu, 12 Oct 2023 12:06:30 +0000</pubDate>
      <link>https://dev.to/aws-builders/scaling-content-delivery-while-saving-costs-making-the-most-out-of-amazon-cloudfront-4055</link>
      <guid>https://dev.to/aws-builders/scaling-content-delivery-while-saving-costs-making-the-most-out-of-amazon-cloudfront-4055</guid>
      <description>&lt;p&gt;In the domain of media and entertainment, every byte of data and every millisecond of latency counts. As professionals in this space, we're not just delivering content; we're crafting experiences. And while we strive for excellence in quality, we're also constantly on the lookout for ways to optimize costs. After all, a well-architected cloud solution isn't just about performance and scalability—it's also about financial efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/cloudfront/" rel="noopener noreferrer"&gt;Amazon CloudFront&lt;/a&gt;, AWS's global content delivery network, is an essential service for media &amp;amp; entertainment, offering global reach through its extensive network of edge locations. But with great power comes... well, costs. The good news? There are several strategies and features within CloudFront designed specifically to help you save money. This guide will walk you through the nuances of CloudFront's pricing, from the basics of the Free Tier to the intricacies of custom pricing. So, grab a coffee, settle in, and let's dive into cost savings with Amazon CloudFront.&lt;/p&gt;

&lt;h2&gt;
  
  
  💰 How Does AWS Charge for CloudFront?
&lt;/h2&gt;

&lt;p&gt;When it comes to cloud services, understanding the pricing model is half the battle. And with Amazon CloudFront, it's no different. At its core, CloudFront's pricing is a reflection of the service's versatility and global reach. But, as with any service, the more you know about its pricing intricacies, the better equipped you are to make cost-effective decisions.&lt;/p&gt;

&lt;p&gt;At a high level, AWS charges for CloudFront based on several factors. In terms of data transfer, the two most significant are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Transfer Out Rates&lt;/strong&gt;: This is the cost associated with the amount of data that CloudFront delivers to your viewers. It's important to note that these rates vary depending on the geographic region of your viewers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;HTTP/HTTPS Request Rates&lt;/strong&gt;: Every time a viewer makes a request (be it HTTP or HTTPS), there's a charge. Again, these rates differ based on the viewer's region.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you want to dive deeper into the nuances of CloudFront pricing, please check out the &lt;a href="https://aws.amazon.com/cloudfront/pricing/" rel="noopener noreferrer"&gt;official pricing page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One of the unique aspects of CloudFront's pricing is its variability. The cost isn't just about how much data you're transferring or how many requests you're handling. It's also about where your viewers are located. AWS has divided the world into different regions, and each region has its own set of rates for both data transfer and requests.&lt;/p&gt;

&lt;p&gt;For instance, data transfer out rates for viewers in North America or Europe might differ from rates for viewers in Asia or South America. This regional variability is something to keep in mind, especially if your media content has a global audience.&lt;/p&gt;
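&lt;p&gt;To make the regional variability concrete, here is a small Python sketch that totals a monthly bill from per-region usage. The rates in it are illustrative placeholders, not actual AWS prices; always pull current numbers from the official pricing page.&lt;/p&gt;

```python
# Illustrative sketch of how regional rates drive the CloudFront bill.
# The per-region rates below are made-up placeholders, NOT current AWS
# pricing -- check the official pricing page for real numbers.

ILLUSTRATIVE_RATES = {
    # region: (data transfer out, USD per GB; HTTPS requests, USD per 10,000)
    "north_america": (0.085, 0.0100),
    "asia_pacific": (0.114, 0.0120),
}

def estimate_monthly_cost(usage):
    """usage maps region -> (GB transferred out, number of HTTPS requests)."""
    total = 0.0
    for region, (gb, requests) in usage.items():
        gb_rate, req_rate_per_10k = ILLUSTRATIVE_RATES[region]
        total += gb * gb_rate + (requests / 10_000) * req_rate_per_10k
    return round(total, 2)

# 500 GB / 2M requests in North America, 100 GB / 400k requests in APAC
print(estimate_monthly_cost({
    "north_america": (500, 2_000_000),
    "asia_pacific": (100, 400_000),
}))
```

&lt;p&gt;Even with identical usage volumes, shifting traffic between regions changes the total, which is exactly why audience geography matters for your bill.&lt;/p&gt;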

&lt;h2&gt;
  
  
  🆓 Diving into the Free Tier
&lt;/h2&gt;

&lt;p&gt;Ah, the Free Tier. It's like the appetizer before the main course, giving you a taste of what's to come without the commitment. For those new to AWS or those wanting to experiment with CloudFront without immediately incurring costs, the Free Tier is a godsend.&lt;/p&gt;

&lt;p&gt;Amazon CloudFront's Free Tier is not just a marketing gimmick; it's an offering that can provide significant value, especially when you're in the initial stages of setting up or testing your media delivery.&lt;/p&gt;

&lt;p&gt;Here's what you get with the Free Tier:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;1 TB Data Transfer Out Each Month&lt;/strong&gt;: This is a generous amount, especially for small to medium-sized projects or for those in the testing phase. It allows you to deliver content to your viewers without incurring any costs for the first 1 TB each month.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;10 Million HTTP/HTTPS Requests Each Month&lt;/strong&gt;: Again, this is a substantial number. For many websites or applications in their early stages, this can cover a significant portion, if not all, of their monthly traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;2,000,000 CloudFront Function Invocations Each Month&lt;/strong&gt;: Plenty of headroom for lightweight edge logic such as header manipulation or URL rewrites.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Free SSL Certificates&lt;/strong&gt;: TLS for your distributions at no extra charge.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's essential to note that these CloudFront allowances are part of the AWS Always Free tier, so they don't expire 12 months after you create your AWS account. Standard CloudFront charges apply only to usage beyond these monthly amounts.&lt;/p&gt;
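&lt;p&gt;A quick way to sanity-check whether a workload fits inside these allowances is to compare monthly usage against the published limits. Here is a minimal Python sketch; the limit values mirror the list above, but verify them against the AWS Free Tier page before relying on them:&lt;/p&gt;

```python
# Self-check against the CloudFront free-tier allowances listed above
# (1 TB data transfer out, 10M requests, 2M CloudFront Function
# invocations per month). A sketch only -- verify current limits with AWS.

FREE_TIER = {
    "data_transfer_gb": 1024,        # 1 TB out per month
    "requests": 10_000_000,          # HTTP/HTTPS requests per month
    "function_invocations": 2_000_000,
}

def exceeds_free_tier(usage):
    """Return the metrics where monthly usage exceeds the free allowance."""
    return {metric: usage[metric] - limit
            for metric, limit in FREE_TIER.items()
            if usage.get(metric, 0) > limit}

print(exceeds_free_tier({
    "data_transfer_gb": 1500,
    "requests": 8_000_000,
    "function_invocations": 500_000,
}))   # only data transfer is over the limit, by 476 GB
```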

&lt;p&gt;The Free Tier is not just about saving money (though that's a big part of it). It's also about learning, experimenting, and iterating. It provides a risk-free environment to test your media delivery, understand CloudFront's features, and optimize your setup before scaling up.&lt;/p&gt;

&lt;p&gt;Moreover, for startups or individual content creators in the media &amp;amp; entertainment space, every penny counts. The Free Tier can be a financial relief, allowing you to allocate resources to other critical areas while still delivering a top-notch viewer experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  🌐 Price Classes: Tailoring Your Costs
&lt;/h2&gt;

&lt;p&gt;In the landscape of media delivery, not all regions are created equal. Some areas might be more expensive to deliver content to, while others might be more cost-effective. CloudFront's Price Classes are designed to give you control over where your content is delivered from, allowing you to strike a balance between cost and performance.&lt;/p&gt;

&lt;p&gt;At its core, &lt;a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PriceClass.html" rel="noopener noreferrer"&gt;Price Classes&lt;/a&gt; allow you to specify which of CloudFront's global edge locations you want your content to be served from. By default, CloudFront aims to minimize latency by delivering content from its entire global network. However, this might mean you're paying more to deliver content to certain regions where AWS's costs are higher.&lt;/p&gt;

&lt;p&gt;Here's a breakdown:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Price Class All (Default)&lt;/strong&gt;: This uses all of CloudFront's global edge locations, ensuring the lowest latency but potentially higher costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Price Class 200&lt;/strong&gt;: This excludes the South America, Australia, and New Zealand regions, offering a balance between cost and performance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Price Class 100&lt;/strong&gt;: This further narrows down the edge locations to only North America, Europe, and Israel, focusing on cost-effective delivery at the price of potentially higher latency for some users.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0ebvt9jp3f6ytozdycn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0ebvt9jp3f6ytozdycn.png" alt="Amazon Cloudfront Price classes" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Choosing the right Price Class is a strategic decision. Here are some considerations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audience Geography&lt;/strong&gt;: Where is the bulk of your audience located? If most of your viewers are in regions covered by Price Class 100 or 200, then opting for one of these might make sense.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quality of Experience&lt;/strong&gt;: Are you willing to compromise slightly on latency for certain users to save on costs? If yes, then a more restrictive Price Class might be the way to go.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Budget Constraints&lt;/strong&gt;: If you're working with a tight budget, especially in the early stages of a project, opting for a more cost-effective Price Class can be a smart move.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Imagine you're delivering high-definition video content primarily to viewers in North America and Europe. By choosing &lt;strong&gt;Price Class 100&lt;/strong&gt;, you can ensure optimal delivery to these regions while saving on costs by excluding more expensive regions. However, if you have a growing viewer base in Asia, you might need to evaluate if the cost savings outweigh the potential increase in latency for these users.&lt;/p&gt;
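&lt;p&gt;In practice, the price class is a single field on the distribution configuration. The helper below is a hypothetical sketch that sets it on a &lt;code&gt;DistributionConfig&lt;/code&gt; dict; the commented boto3 calls show the usual get-modify-update flow but need real AWS credentials and a distribution ID.&lt;/p&gt;

```python
# Hypothetical helper: the price class is a single "PriceClass" field on a
# CloudFront DistributionConfig. Valid API values are PriceClass_All,
# PriceClass_200, and PriceClass_100.

VALID_PRICE_CLASSES = {"PriceClass_All", "PriceClass_200", "PriceClass_100"}

def with_price_class(distribution_config, price_class):
    """Return a copy of the DistributionConfig dict with the price class set."""
    if price_class not in VALID_PRICE_CLASSES:
        raise ValueError(f"unknown price class: {price_class}")
    updated = dict(distribution_config)
    updated["PriceClass"] = price_class
    return updated

# Typical boto3 flow (needs credentials and a real distribution ID, so it
# is shown as comments only):
#   cf = boto3.client("cloudfront")
#   resp = cf.get_distribution_config(Id="EDFDVBD6EXAMPLE")
#   config = with_price_class(resp["DistributionConfig"], "PriceClass_100")
#   cf.update_distribution(Id="EDFDVBD6EXAMPLE",
#                          DistributionConfig=config,
#                          IfMatch=resp["ETag"])
```

&lt;p&gt;The same setting appears as &lt;code&gt;PriceClass&lt;/code&gt; in CloudFormation and &lt;code&gt;price_class&lt;/code&gt; in the CDK, so whichever tool you use, switching classes is a one-line change.&lt;/p&gt;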

&lt;h2&gt;
  
  
  🛡️ Savings Bundles: More Than Just Cost-Saving
&lt;/h2&gt;

&lt;p&gt;In the quest to optimize costs, CloudFront's &lt;a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/savings-bundle.html" rel="noopener noreferrer"&gt;security savings bundle&lt;/a&gt; emerges as a compelling option. But it's not just about the savings; it's about enhancing security while keeping costs in check.&lt;/p&gt;

&lt;p&gt;The CloudFront security savings bundle is a blend of cost-saving and enhanced security. When you opt for this bundle, you're not just committing to a consistent monthly amount; you're also getting credits for AWS WAF, a web application firewall that fortifies your CloudFront distribution against common web threats.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby4d3w33zs3gc9qjr9zf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby4d3w33zs3gc9qjr9zf.png" alt="Amazon Cloudfront Saving Bundles" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a breakdown:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Commitment&lt;/strong&gt;: By purchasing a savings bundle, you agree to a fixed monthly amount for CloudFront for one year. This commitment ensures you have a predictable budget, and in return, you get credits that offset your CloudFront charges.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Up to 30% Discount&lt;/strong&gt;: The value of these credits can result in up to a 30% discount on CloudFront's standard pricing. It's like getting premium service at a discounted rate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS WAF Credits&lt;/strong&gt;: In addition to CloudFront credits, you receive credits for AWS WAF. This can offset up to 10% of the monthly CloudFront commitment, providing an added layer of security without additional costs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's say your typical CloudFront charges amount to $600 per month. By committing to $420 each month for a year (a 30% reduction), CloudFront provides you with $600 worth of credits monthly. In essence, you're paying $420 for services worth $600. Plus, you get an additional $42 in AWS WAF credits. Over a year, this can lead to substantial savings.&lt;/p&gt;
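&lt;p&gt;The arithmetic above generalizes neatly: credits are worth your commitment divided by 0.7 (the 30% discount), plus AWS WAF credits worth 10% of the commitment. A short Python sketch reproduces the example numbers:&lt;/p&gt;

```python
# Savings-bundle arithmetic: a monthly commitment of C buys CloudFront
# credits worth C / (1 - 0.30) and AWS WAF credits worth 0.10 * C.
# These percentages mirror the example above, not a personalized quote.

def savings_bundle_credits(monthly_commitment, discount=0.30, waf_share=0.10):
    cloudfront_credits = monthly_commitment / (1 - discount)
    waf_credits = monthly_commitment * waf_share
    return round(cloudfront_credits, 2), round(waf_credits, 2)

def annual_savings(monthly_commitment, discount=0.30):
    cf_credits, _ = savings_bundle_credits(monthly_commitment, discount)
    return round(12 * (cf_credits - monthly_commitment), 2)

print(savings_bundle_credits(420))  # (600.0, 42.0) -> $600 CloudFront, $42 WAF
print(annual_savings(420))          # 2160.0 -> $2,160 saved per year
```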

&lt;p&gt;Key points to consider when using the savings bundle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Credits Apply Account-Wide&lt;/strong&gt;: These credits aren't restricted to a specific distribution. They apply across all CloudFront usage in your AWS account.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Credits Cover All CloudFront Usage&lt;/strong&gt;: Whether it's data transfer charges, request charges, or Lambda@Edge charges, the credits offset all types of CloudFront usage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unused Credits&lt;/strong&gt;: Remember, credits are use-it-or-lose-it. If you don't utilize all the credits in a billing period, they don't roll over to the next.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Exceeding Credit Amount&lt;/strong&gt;: If your usage surpasses the available credits, you'll be billed the difference at standard rates.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  💼 Custom Pricing for the Heavy Hitters
&lt;/h2&gt;

&lt;p&gt;In media &amp;amp; entertainment, scale is often the name of the game. As your content reaches a broader audience and your traffic surges, standard pricing models might not always be the most economical. That's where Custom Pricing steps in, offering tailored solutions for those with substantial data transfer needs.&lt;/p&gt;

&lt;p&gt;Custom Pricing isn't for everyone. It's designed for users who are ready to commit to a minimum of 10 TB of data transfer per month for at least 12 months. If you fit the bill, here's what's in store:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tailored Discounts&lt;/strong&gt;: The discounts vary based on the volume of your commitment. The more you commit, the better the rates you can secure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Organization-Wide Application&lt;/strong&gt;: If you're managing multiple AWS accounts within an organization, the custom pricing applies across the board. This ensures consistent savings, irrespective of which account is handling the traffic.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Why should you consider custom pricing?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Predictability&lt;/strong&gt;: With custom pricing, you're entering a commitment. This means you have a clear forecast of your costs, allowing for better budgeting and financial planning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Economies of Scale&lt;/strong&gt;: As your traffic grows, the per-unit cost of delivery can decrease with custom pricing, ensuring that your success doesn't lead to disproportionate costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Negotiation Power&lt;/strong&gt;: Custom pricing discussions with AWS give you a platform to negotiate terms based on your specific needs and projected growth.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you believe Custom Pricing is right for you, the first step is to reach out to AWS. The process involves discussions to understand your requirements, after which AWS provides a tailored pricing proposal. Remember, the emphasis here is on partnership. AWS understands the challenges of delivering high-quality media content at scale and is often willing to work closely with users to find the best pricing solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Conclusion
&lt;/h2&gt;

&lt;p&gt;Navigating AWS pricing can sometimes feel like charting a course through uncharted waters. But with the right knowledge and tools at your disposal, you can ensure smooth sailing. Amazon CloudFront, with its cost-saving options, enables media &amp;amp; entertainment professionals to deliver top-notch content without breaking the bank.&lt;/p&gt;

&lt;p&gt;From the Free Tier's generous offerings for newcomers to the tailored solutions of Custom Pricing for the big players, there's a cost-saving strategy for everyone. With features like Price Classes and Saving Bundles, you have the flexibility to fine-tune your expenses based on your specific needs and audience demographics.&lt;/p&gt;

&lt;p&gt;But remember, while cost-saving is essential, it's just one piece of the puzzle. The ultimate goal is to deliver exceptional experiences to your audience, and CloudFront provides the tools to achieve that without compromising on quality.&lt;/p&gt;

&lt;p&gt;As you continue your journey in the cloud, always keep an eye out for new features and pricing options. AWS is continually evolving, and there might be new opportunities around the corner to optimize both performance and costs. And for those intricate details or when in doubt, the &lt;a href="https://aws.amazon.com/documentation/cloudfront/" rel="noopener noreferrer"&gt;official AWS documentation&lt;/a&gt; is an invaluable resource.&lt;/p&gt;

&lt;p&gt;Here's to building in the cloud, crafting exceptional viewer experiences, and making every penny count!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloudfront</category>
      <category>cost</category>
    </item>
    <item>
      <title>AWS re:Invent uncovered - TOP TIPS FOR FIRST-TIME ATTENDEES</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Fri, 22 Sep 2023 08:51:50 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-reinvent-uncovered-top-tips-for-first-time-attendees-5bic</link>
      <guid>https://dev.to/aws-builders/aws-reinvent-uncovered-top-tips-for-first-time-attendees-5bic</guid>
      <description>&lt;p&gt;Ah, AWS re:Invent! It's not just another tech conference; it's &lt;strong&gt;THE&lt;/strong&gt; tech conference. A sprawling, bustling hub of innovation, learning, and networking, re:Invent is where the AWS world converges every year. If this is your maiden voyage into the vast sea of re:Invent, you might be feeling a mix of excitement and, let's admit it, a tad bit of overwhelm. But fear not! Just as every seasoned traveler once took their first step, every re:Invent pro was once in your shoes (hopefully, comfortable ones, but we'll get to that).&lt;/p&gt;

&lt;p&gt;The sheer scale and depth of re:Invent can be daunting. With countless sessions, expos, and networking events spread across multiple venues, it's easy to feel like a kid in a candy store, eyes wide, not knowing where to start. But here's the good news: with a bit of preparation and some insider tips, you can navigate this colossal event like a pro, soaking in the knowledge, making meaningful connections, and yes, snagging some cool swag along the way.&lt;/p&gt;

&lt;p&gt;So let's uncover the secrets of AWS re:Invent together and set you on a path to make the most of this incredible experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  🗓️ Planning is everything
&lt;/h2&gt;

&lt;p&gt;The early bird doesn't just get the worm; it gets the best sessions, the optimal seats, and a smoother overall experience. Think of re:Invent as a vast amusement park. Without a map and a plan, you might end up wandering aimlessly, missing out on the best rides.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start Early with the Session Catalog:&lt;/strong&gt; Weeks before the event kicks off, AWS releases a detailed session catalog. This is your treasure map. Dive into it, explore the myriad sessions, and mark your favorites. Whether you're into deep technical dives, visionary keynotes, or hands-on workshops, there's something for everyone. But remember, the most sought-after sessions fill up fast. So, once the session reservation window opens, be swift to secure your spot.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Walk-In Strategy:&lt;/strong&gt; Missed out on reserving a seat for that session you were eyeing? Don't fret. Every session has walk-in lines. But here's the catch: they can get long, especially for popular sessions. If you're determined not to miss out, ensure you allocate enough time to queue up. Arriving early can make the difference between being part of the action inside or hearing about it later.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  👟 The Importance of Comfort
&lt;/h2&gt;

&lt;p&gt;Imagine this: It's day three of re:Invent. You've attended back-to-back sessions, explored the expo, and networked like a champ. But there's a nagging pain in your feet, and every step feels like a marathon. Don't let this be you!&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Choose Your Footwear Wisely:&lt;/strong&gt; AWS re:Invent is colossal. And when I say colossal, I mean it. The venues are expansive, and the events are spread out. You'll be walking—a lot. While those stylish shoes might look fantastic, they might not be your best friend by the end of the day. Opt for comfort over style. Trust me, your feet will thank you.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  🏨 Hop-on/Hop-off is good for sightseeing, not for re:Invent
&lt;/h2&gt;

&lt;p&gt;Las Vegas is a city of grandeur, and the venues for re:Invent are no exception. They're vast, they're opulent, and they're... well, quite far from each other. Navigating between them can be a bit of a trek, especially if you're hopping from one hotel to another for different sessions.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Avoid the Hotel Hop:&lt;/strong&gt; While it might be tempting to jump between hotels to catch various sessions, it's a time-consuming endeavor. The distances between hotels can be deceptive, and even with the shuttle services, you might find yourself spending more time in transit than you'd like. Check the session catalog: many popular sessions are repeated at different venues throughout the week. If you can, try to cluster your sessions by location each day. It'll save you time and energy.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Shuttle Services Are Your Friend:&lt;/strong&gt; If you do need to switch venues, make the most of the shuttle services provided. They're efficient, regular, and a great way to move between locations without the hassle of navigating Vegas traffic.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  ⏰ Keep Your Schedule Flexible
&lt;/h2&gt;

&lt;p&gt;While it's essential to plan, it's equally crucial to leave some room for spontaneity. AWS re:Invent isn't just about the sessions; it's about the community, the unexpected conversations, and those serendipitous moments that can spark new ideas or friendships. Seasoned attendees call this “the magic of the hallway track”.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Embrace Community and Networking Events:&lt;/strong&gt; Beyond the official sessions, re:Invent is teeming with community-led events, meetups, and networking opportunities. These are goldmines for making connections, sharing experiences, and even having a bit of fun. Keep an eye on the re:Invent page and session catalog for these events. Whether it's a casual meetup at a local bar or a more formal networking dinner, these events can be some of the most rewarding parts of your re:Invent experience.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Take a Breath:&lt;/strong&gt; It's easy to get caught up in the whirlwind of back-to-back sessions and events. But remember, it's okay to take a step back. Schedule some downtime. Whether it's a leisurely coffee break, a stroll around the venue, or just some quiet time to process what you've learned, these moments can be incredibly refreshing and give you the energy to dive back in with renewed vigor.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  🎤 Decipher Session Types
&lt;/h2&gt;

&lt;p&gt;AWS re:Invent is a smorgasbord of learning opportunities, and not all sessions are created equal. Understanding the different session types can help you tailor your experience to your learning style and objectives.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Beyond the Breakout:&lt;/strong&gt; While breakout sessions are informative, they're often recorded and available for viewing post-event on platforms like YouTube. If you're looking for a more interactive experience, consider other formats.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Chalk Talks &amp;amp; Builder Sessions:&lt;/strong&gt; These are smaller, more intimate settings where AWS experts dive deep into specific topics. The beauty of these sessions? They're interactive. You can ask questions, engage in discussions, and get feedback on your specific challenges.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Code Talks - The New Kid on the Block:&lt;/strong&gt; Making its debut this year, Code Talks promises to be an exciting addition. Tailored for developers, these sessions are all about diving deep into code, exploring best practices, and getting hands-on with AWS services.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Remember, the goal is to maximize your learning. Choose sessions that align with your interests, offer interactive opportunities, and provide value beyond the event itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  📊 Pick The Right Session Levels
&lt;/h2&gt;

&lt;p&gt;AWS re:Invent caters to a diverse audience, from cloud novices to seasoned experts. The sessions are categorized into different levels to help attendees choose the right fit for their expertise and interests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Know Your Levels:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 100:&lt;/strong&gt; These are introductory sessions, providing a broad overview of a topic. Ideal for those new to AWS or a specific service.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 200:&lt;/strong&gt; Intermediate sessions that delve a bit deeper, offering a more detailed look at specific AWS services or solutions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 300:&lt;/strong&gt; Advanced sessions, perfect for those with a good grasp of AWS. They dive deep into specific topics, often involving complex architectures and solutions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Level 400:&lt;/strong&gt; The deep dive. These are for the pros, covering intricate details, best practices, and advanced architectures.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tailor Your Experience:&lt;/strong&gt; If you're like me, with multiple production workloads already humming on AWS, you might find levels 100 and 200 less enlightening. They're great for building foundational knowledge, but if you're looking for advanced insights, aim for levels 300 and 400. However, if a topic is entirely new to you, don't shy away from starting at a lower level to build a solid understanding.&lt;/p&gt;

&lt;p&gt;The key is to strike a balance. Mix and match session levels based on your familiarity with the topics and where you want to deepen your knowledge.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚖️ The Art of Balancing Time
&lt;/h2&gt;

&lt;p&gt;AWS re:Invent is not just a conference; it's an experience. And like any grand experience, it's essential to find a rhythm that allows you to soak in the knowledge, connect with peers, and also take moments for yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pause and Reflect:&lt;/strong&gt; With so much information coming your way, it's easy to feel overwhelmed. Schedule pockets of time to pause, reflect, and process what you've learned. Whether it's jotting down notes, discussing with peers, or simply taking a quiet moment to think, these breaks can enhance your understanding and retention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Listen to Your Body:&lt;/strong&gt; It's a marathon, not a sprint. While the excitement can keep you going, it's essential to listen to your body. If you're feeling drained, it's okay to skip a session or take a longer break. Remember, the goal is to leave re:Invent enriched, not exhausted.&lt;/p&gt;

&lt;h2&gt;
  
  
  🥤 Nutrition and Hydration
&lt;/h2&gt;

&lt;p&gt;Amidst the whirlwind of sessions, keynotes, and networking, it's easy to overlook the basics: eating well and staying hydrated. But remember, re:Invent is a marathon, and you'll need to fuel your body and mind to keep going.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stay Energized:&lt;/strong&gt; With the adrenaline rush of the event, you might be tempted to skip meals or grab quick, less-nutritious options. But to stay sharp and attentive, it's crucial to nourish yourself with balanced meals. Thankfully, re:Invent offers a variety of food options. Whether you're grabbing a lunch box from the catering or exploring local eateries, prioritize meals that give you sustained energy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hydration is Key:&lt;/strong&gt; Las Vegas can be dry, and with all the walking and talking, you'll need to stay hydrated. Don't rely solely on caffeine (tempting, I know). Make use of the numerous water dispensers scattered throughout the venue. Pro tip: During your re:Invent check-in, you'll receive a refillable bottle. Keep it handy and refill it regularly. It's an eco-friendly way to ensure you're always hydrated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Take Breaks:&lt;/strong&gt; Amidst sessions, find spots to relax and enjoy a snack or a drink. One of my favorite places is outside the Caesars Forum. It's a great spot to catch some fresh air, bask in the sun, and recharge before diving back into the action.&lt;/p&gt;

&lt;h2&gt;
  
  
  🎪 Explore the Expo
&lt;/h2&gt;

&lt;p&gt;The expo at re:Invent is a microcosm of the cloud computing universe. With a plethora of vendors showcasing cutting-edge products, solutions, and innovations, it's a must-visit for every attendee.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dive into the AWS Ecosystem:&lt;/strong&gt; The expo is not just about flashy booths and swag (though there's plenty of that!). It's a chance to dive deep into the AWS ecosystem, explore new tools, and discover solutions that can elevate your cloud game.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Engage with AWS Experts:&lt;/strong&gt; The AWS Village is a gem within the expo. Here, you'll find dedicated booths for almost every AWS service or group of services. It's a golden opportunity to engage with AWS experts, ask questions, clarify doubts, and even provide feedback or feature requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connect with the Community:&lt;/strong&gt; Don't miss out on the AWS Heroes Lounge and the AWS Community Lounge. These spaces are buzzing with energy, offering a chance to interact with AWS Heroes, join DevChat sessions, and immerse yourself in the vibrant AWS community.&lt;/p&gt;

&lt;h2&gt;
  
  
  🎁 The Allure of re:Invent Swag
&lt;/h2&gt;

&lt;p&gt;Ah, swag! It's one of those delightful perks of attending tech conferences, and re:Invent takes it to a whole new level. From quirky t-shirts to innovative gadgets, there's a treasure trove of goodies waiting for you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Be sustainable:&lt;/strong&gt; While it's tempting to grab everything in sight, be selective. Think about what you'll genuinely use or cherish. Some attendees even bring an extra bag just for swag, but remember, quality over quantity. Be sustainable!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Eco-Friendly Choices:&lt;/strong&gt; Many vendors are now offering sustainable swag options. Whether it's a reusable water bottle, eco-friendly tote bags, or bamboo tech accessories, make choices that are kind to our planet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Share the Love:&lt;/strong&gt; If you end up with swag that's not quite your style or duplicates, consider sharing with colleagues, friends, or even donating. It's a great way to spread the re:Invent spirit beyond the event.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Final words
&lt;/h2&gt;

&lt;p&gt;As we draw this guide to a close, I hope these tips offer you a clearer path to navigate the vast landscape of AWS re:Invent. Remember, while the sessions, keynotes, and workshops are invaluable, it's the connections you make, the conversations you have, and the experiences you gather that truly define your re:Invent journey.&lt;/p&gt;

&lt;p&gt;If you haven't already, now's the time to take the plunge. Register for AWS re:Invent, mark your calendar, and gear up for an unforgettable cloud adventure. Whether you're there to deepen your technical knowledge, network with like-minded professionals, or simply soak in the vibrant atmosphere, re:Invent promises a transformative experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ready to dive into the world's biggest tech conference?&lt;/strong&gt; &lt;a href="https://reinvent.awsevents.com/register/" rel="noopener noreferrer"&gt;&lt;strong&gt;Register for AWS re:Invent now&lt;/strong&gt;&lt;/a&gt; and set yourself up for a week of learning, networking, and inspiration.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>conference</category>
      <category>beginners</category>
      <category>reinvent</category>
    </item>
    <item>
      <title>Engaging football fans with mobile push notifications</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Wed, 04 Jan 2023 09:30:00 +0000</pubDate>
      <link>https://dev.to/aws-builders/engaging-football-fans-with-mobile-push-notifications-ceo</link>
      <guid>https://dev.to/aws-builders/engaging-football-fans-with-mobile-push-notifications-ceo</guid>
      <description>&lt;p&gt;One of the key capabilities of (mobile) sports applications is, to inform fans about what is happening during the game. This includes for example notifications about important events like kick-offs, goals or cards. &lt;strong&gt;The majority of mobile sports applications use push notifications as the primary channel to keep their fans up to date&lt;/strong&gt;. Depending on the application platforms, notifications are sent directly to the devices with either the Apple Push notification service (APNS) or the Google Cloud Messaging (GCM) service.&lt;/p&gt;

&lt;p&gt;In AWS you have multiple options to send native push notifications to your mobile applications - Amazon SNS or Amazon Pinpoint. But which one should you use? In this article, I will describe &lt;strong&gt;why I prefer to use Amazon Pinpoint&lt;/strong&gt; to solve common &lt;strong&gt;marketing and engagement challenges&lt;/strong&gt; and how this service compares to Amazon SNS.&lt;/p&gt;

&lt;h2&gt;
  
  
  👀 Fan engagement has multiple perspectives
&lt;/h2&gt;

&lt;p&gt;When we talk about fan engagement and all the engagement-related challenges of marketers these days, we quickly come to the point that solving these challenges goes beyond just sending a notification or message.&lt;/p&gt;

&lt;p&gt;Companies that are able to segment their audience put themselves in the position to create and communicate specific targeted marketing messages that align with the interests and emotions of specific customer groups. According to the &lt;a href="https://www.mckinsey.com/capabilities/growth-marketing-and-sales/our-insights/the-value-of-getting-personalization-right-or-wrong-is-multiplying" rel="noopener noreferrer"&gt;"Next in Personalization 2021"&lt;/a&gt; report, &lt;strong&gt;72% of consumers expect brands to demonstrate they know them on a personal level.&lt;/strong&gt; &lt;a href="https://www.forbes.com/sites/forbesagencycouncil/2021/03/09/why-trust-defines-success-in-customer-engagement/?sh=6c09c02b1ec2" rel="noopener noreferrer"&gt;Trust defines customer engagement.&lt;/a&gt; Personalization is not just a recommendation engine. Personalization is relevant for a variety of domains. &lt;strong&gt;It is a commitment to streamlining your activities according to many customer demands.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And the future of fan engagement is omnichannel: targeting the right fans, at the right time, using the right (and often multiple) channels. This requires, as always, the right people, the right processes and the right technology.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Investments in omnichannel are improving, but still have a long way to go. [...]&lt;/p&gt;

&lt;p&gt;Only 35% of companies feel they are successfully achieving omnichannel personalization, up from 24% in 2021.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://segment.com/state-of-personalization-report/" rel="noopener noreferrer"&gt;https://segment.com/state-of-personalization-report/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Leveraging notification channels for sports applications can be seen from two perspectives. Those perspectives each come with different technical, functional and non-functional requirements. They also put different kinds of KPIs in the focus of decision-making and success evaluation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ho99klk9lhzq3syq3w6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ho99klk9lhzq3syq3w6.png" alt="Flow of notifications - Match schedule as the main driver of engagement" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚽️ The first perspective: the matchday
&lt;/h3&gt;

&lt;p&gt;Sports in general, and football matches in particular, can write awesome stories. It is not surprising that the things that happen on the pitch are usually the main driver of engagement. There is not much to add if your favorite club scores the opening goal in your local derby. Or if your favorite player gets substituted in and takes your club to a draw or even a win after being behind the whole match. &lt;strong&gt;In this case, you might not necessarily look at the open rates of your match-event-focused push notifications.&lt;/strong&gt; Those notifications can be seen as an additional layer of engagement. Whether fans open those notifications depends on many factors: it becomes unlikely if fans watch a match live, and more likely if they don't have access to watch the match live for any reason. In both cases, you gain a lot of additional opportunities depending on fan behavior.&lt;/p&gt;

&lt;p&gt;1. 📺 &lt;strong&gt;For fans that watch a match live,&lt;/strong&gt; those notifications can provide additional or "unseen" information. If a fan watches a match live, the information that a goal happened is already transported via the big screen. Storytelling can be extended to further increase engagement by mixing real-time sports data into such marketing channels.&lt;/p&gt;

&lt;p&gt;2. 📲 &lt;strong&gt;For fans that are NOT able to watch a match live,&lt;/strong&gt; latency can be an important USP of your product. We want our fans to cheer first, to be the king or queen in a group of people. Imagine you are the first to get the notification that your favorite team won the championship. This will be the ultimate hugging guarantee from the Bundesliga. Try it out!&lt;/p&gt;

&lt;p&gt;This means: &lt;strong&gt;match-related notifications are highly contextual&lt;/strong&gt;. Combined with the fact that sports events can produce a lot of notifications within the timeframe of a single match, you have to think about strategies to prevent fan churn. Think about segmentation and what kind of fans you want to target and provide value to. Otherwise, keep in mind that your fans might simply ignore your notifications (and the effort you spent sending them out) or &lt;a href="https://www.businessofapps.com/marketplace/push-notifications/research/push-notifications-statistics/" rel="noopener noreferrer"&gt;leave your platform&lt;/a&gt; and uninstall your application for several reasons.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Segmentation increases the likelihood that customers will engage with the brand, and reduces the potential for communications fatigue — that is, the disengagement of customers who feel like they’re receiving too many messages that don’t apply to them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://aws.amazon.com/blogs/messaging-and-targeting/use-machine-learning-to-target-your-customers-based-on-their-interest-in-a-product-or-product-attribute/" rel="noopener noreferrer"&gt;Target your customers with ML based on their interest in a product or product attribute&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  😴 The second perspective: between the matchdays
&lt;/h3&gt;

&lt;p&gt;Or, as I call it: the fan wake-up call. Extending the matchday experience between matches can be very important to keep your engagement and retention stable. &lt;strong&gt;It is a typical pattern to see high engagement on matchdays, followed by a drop in your engagement-related KPIs once a matchday is over.&lt;/strong&gt; Several marketing strategies and campaigns help you extend the so-called matchday experience: either post-match, by sending out notifications about highlight clips, interviews or match reports; or pre-match, by engaging your fans with potential line-ups, injured players or relevant background information about the upcoming matches. In this case, more classic metrics like open rates or session metrics are very valuable KPIs to measure engagement and success.&lt;/p&gt;

&lt;h2&gt;
  
  
  🏟️ From pitch to push notification
&lt;/h2&gt;

&lt;p&gt;What happens behind the scenes when a goal is scored and you want to use this as a trigger to send a push notification? Let us zoom out a bit and take a closer look at a real-life example from the professional football league in Germany: the &lt;a href="https://www.bundesliga.com" rel="noopener noreferrer"&gt;Bundesliga&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8se43zjuhre214ch0d8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8se43zjuhre214ch0d8.png" alt="From pitch to push notification - very high level architecture" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The DFL subsidiary - &lt;a href="https://www.linkedin.com/company/sportec-solutions-ag/" rel="noopener noreferrer"&gt;Sportec Solutions AG&lt;/a&gt; - is the official data provider for all data around Bundesliga and Bundesliga 2 football matches. From here, we get all events like goals, substitutions, cards and fouls in real time. This enables the Bundesliga to build great digital products like the official Bundesliga App.&lt;/p&gt;

&lt;p&gt;When a player scores a goal, this information is pushed to a match data processing service. The main responsibility of this service is to receive and process all events that occur during a match and to decide how to act on them.&lt;/p&gt;

&lt;p&gt;One example of how to act on events like goals, cards or kickoffs is to send out a push notification using Amazon Pinpoint. This especially engages those fans who are not actively using the Bundesliga apps. These events are not simply broadcast to all fans: the relevant target segments are selected based on the event type, the associated match and the fans' interest in receiving specific events. This results in &lt;strong&gt;more specific targeting, sending notifications only to fans that have an explicitly defined interest in receiving them&lt;/strong&gt;. Everything is fully automated.&lt;/p&gt;
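&lt;p&gt;As a minimal sketch - with a hypothetical event shape and wording, not the actual Bundesliga implementation - mapping match events to notification payloads could look like this:&lt;/p&gt;

```javascript
// Hypothetical sketch: map an incoming match event to a push notification
// payload. Event fields and wording are illustrative assumptions.
function buildPushNotification(event) {
  switch (event.type) {
    case "goal":
      return {
        title: `⚽️ GOAL! ${event.match}`,
        body: `${event.player} scores for ${event.club} (${event.score})`,
      };
    case "card":
      return {
        title: `🟨 Card in ${event.match}`,
        body: `${event.player} (${event.club}) is booked`,
      };
    case "kickoff":
      return { title: `🏟 Kickoff: ${event.match}`, body: "The match is live!" };
    default:
      return null; // unknown event types trigger no notification
  }
}
```

&lt;p&gt;The real service would additionally resolve the target segments and hand the payload over to Amazon Pinpoint for delivery.&lt;/p&gt;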

&lt;p&gt;Defining the right level of segmentation is an important success factor in your engagement and marketing story. This can result in building several layers of segments that allow you to target your fans using very specific characteristics like&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;⚽️ a match,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🗓 a matchday,&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🐐 a club or&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🏟 individual events.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Amazon Pinpoint gives you the capabilities to create such segments and the flexibility to adapt them at any given time.&lt;/p&gt;
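&lt;p&gt;Such layered segmentation can be sketched as a simple selection function. The segment names and event fields below are illustrative assumptions, not the real naming scheme:&lt;/p&gt;

```javascript
// Hypothetical sketch: derive the segment names an event should target,
// layered from broad (matchday) to narrow (event-type subscribers).
function targetSegments(event) {
  const segments = [
    `match-${event.matchId}`,     // fans following this specific match
    `matchday-${event.matchday}`, // fans following the whole matchday
  ];
  for (const club of event.clubs) {
    segments.push(`club-${club}`); // fans of each participating club
  }
  if (event.type === "goal") {
    segments.push("event-goals");  // fans subscribed to goal alerts
  }
  return segments;
}
```

&lt;p&gt;A notification would then be sent to the intersection of these layers that matches your targeting rules for the event.&lt;/p&gt;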

&lt;p&gt;If you want to know more about the official Bundesliga match data, I can highly recommend the following video which explains the whole process in detail.&lt;br&gt;
&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/DSAYcek__ic"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  🔔 Why should you use Amazon Pinpoint?
&lt;/h2&gt;

&lt;p&gt;Generally speaking, there are two recommended ways to send push notifications: Amazon SNS or Amazon Pinpoint. Although both services have similar capabilities - like sending push notifications, emails or SMS - they have different intentions. &lt;strong&gt;Yes, we can say they are united in the cloud but divided by purpose.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqj282o9k9q6v74ebmfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqj282o9k9q6v74ebmfk.png" alt="United in cloud, divided by purpose - Comparing Amazon SNS and Amazon Pinpoint" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📢 You will find Amazon SNS - the Simple Notification Service - in the category of “Application Integration”. &lt;strong&gt;From my perspective, its main purpose is more related to purely technical use cases.&lt;/strong&gt; It is often used when you need to implement messaging scenarios or event-driven architectures, or want to decouple components and systems. In a nutshell, Amazon SNS is an implementation of the &lt;a href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/PublishSubscribeChannel.html" rel="noopener noreferrer"&gt;Publish-Subscribe&lt;/a&gt; pattern and does not give you a rich feature set for solving marketing-related challenges. In any case, please be aware of the &lt;a href="https://docs.aws.amazon.com/general/latest/gr/sns.html#limits_sns" rel="noopener noreferrer"&gt;limits and quotas&lt;/a&gt;. Amazon SNS has several hard limits that influence your integration and architecture, especially when it comes to topic subscriptions and unsubscriptions.&lt;/p&gt;

&lt;p&gt;🎯 Amazon Pinpoint, on the other hand, is named a “Multichannel Marketing Communication Service” and is located in the “Business Applications” category. It is &lt;strong&gt;NOT&lt;/strong&gt; just about sending messages over a given channel. &lt;strong&gt;It is about building business use cases for marketing and engagement over multiple channels.&lt;/strong&gt; Hence, Amazon Pinpoint provides a lot more features and capabilities than Amazon SNS. It is hard to pin down one central pattern: viewed from the outside, the centerpiece of Amazon Pinpoint is an implementation of a &lt;a href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/RecipientList.html" rel="noopener noreferrer"&gt;Recipient List&lt;/a&gt; combined with a &lt;a href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/ProcessManager.html" rel="noopener noreferrer"&gt;Process Manager&lt;/a&gt;. Looking more closely, it is a composition of integration patterns to engage with your audience and solve a broad range of marketing business problems.&lt;/p&gt;

&lt;p&gt;You can achieve the same result with both, but you have to look from different perspectives to find the right service for the right job.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wrap up
&lt;/h3&gt;

&lt;p&gt;Both services - Amazon SNS and Amazon Pinpoint - are capable of sending push notifications. But they have different intentions, tradeoffs and strengths. &lt;strong&gt;You can start with Amazon SNS if you just want to send out a (transactional) message.&lt;/strong&gt; As soon as you move into more advanced marketing and engagement scenarios and need segmentation and personalized messages, you will hit limitations with Amazon SNS. How do you measure the impact of your notifications? How do you know your messages targeted the right fans? Keep in mind that using Amazon SNS will force you to build a lot of custom tooling around the pure notification part.&lt;/p&gt;

&lt;p&gt;With Amazon Pinpoint you will make the shift from a pure message-focused approach to a real marketing and engagement-focused approach. &lt;strong&gt;Amazon Pinpoint enables you to build omnichannel customer experiences that go beyond just pure messaging.&lt;/strong&gt; The service gives you features to define your audience, create dynamic segments and target your audience, while giving you options to measure the impact of your marketing efforts. This enables you to put your fans at the center of the engagement, not just the raw message. It also gives marketers options to create segments based on recent trends.&lt;/p&gt;

&lt;p&gt;And what about measuring your KPIs? Amazon Pinpoint comes with a whole analytics integration and provides features out of the box to analyze your campaign performance and important engagement-related KPIs. Have you ever tried this with Amazon SNS? &lt;strong&gt;I can tell you: it won't scale and you won't close&lt;/strong&gt; &lt;a href="https://segment.com/pdfs/State-of-Personalization-Report-Twilio-Segment-2022.pdf" rel="noopener noreferrer"&gt;&lt;strong&gt;the omnichannel gap&lt;/strong&gt;&lt;/a&gt;&lt;strong&gt;!&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Are you interested in how we at Bundesliga are able to &lt;strong&gt;send out a goal notification to a hundred thousand fans in real time during a match&lt;/strong&gt;? Reach out to me and I will be happy to share &lt;strong&gt;bits, bytes and insights&lt;/strong&gt; about our journey leveraging Amazon Pinpoint for fan engagement.&lt;/p&gt;




&lt;p&gt;About the author: &lt;/p&gt;

&lt;p&gt;👋 Hi, my name is Christian. I am working as an AWS Solution Architect at &lt;a href="https://www.dfl.de/en/about/subsidiaries/about-dfl-digital-sports-gmbh/" rel="noopener noreferrer"&gt;DFL Digital Sports GmbH&lt;/a&gt;, based in Cologne with my beloved wife and two kids. I am interested in all things around ☁️ (cloud), 👨‍💻 (tech) and 🧠 (AI/ML).&lt;/p&gt;

&lt;p&gt;With 10+ years of experience in several roles, I have a lot to talk about and love to share my experiences. I worked as a software developer in several companies in the media and entertainment business, as well as a solution engineer in a consulting company. &lt;/p&gt;

&lt;p&gt;I love the challenge of providing highly scalable systems for millions of users. And I love to collaborate with lots of people to design systems in front of a whiteboard.&lt;/p&gt;

&lt;p&gt;You can find me on &lt;a href="https://www.linkedin.com/in/christian-bonzelet/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://cremich.cloud" rel="noopener noreferrer"&gt;read my blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>mongodb</category>
      <category>database</category>
      <category>discuss</category>
    </item>
    <item>
      <title>AWS Step Function vs. AWS Lambda - Part 2</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Thu, 25 Nov 2021 13:49:46 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-step-function-vs-aws-lambda-part-2-1dgk</link>
      <guid>https://dev.to/aws-builders/aws-step-function-vs-aws-lambda-part-2-1dgk</guid>
      <description>&lt;p&gt;Do you also feel the tension of the cover image? It is time for a battle again. 🥊 After I published &lt;a href="https://dev.to/aws-builders/aws-step-function-vs-aws-lambda-benchmark-54hj"&gt;the first part&lt;/a&gt; of my comparison, I was overwhelmed about the amount of feedback I received. May it be comments on my post, discussions on twitter or LinkedIn.&lt;/p&gt;

&lt;p&gt;The fact that the initial post triggered a lot of inspiring discussions is very valuable. While reading through your feedback, it became obvious that there is a need for a second part.&lt;/p&gt;

&lt;p&gt;I received a lot of feedback about optimizations for AWS Lambda, and people were curious how these affect performance in comparison to our state machine. We will also take a closer look at costs to get a more complete view of how the services differ.&lt;/p&gt;

&lt;p&gt;Here we are. &lt;/p&gt;

&lt;p&gt;As in the first part, all experiments are triggered using &lt;a href="https://httpd.apache.org/docs/2.4/programs/ab.html" rel="noopener noreferrer"&gt;Apache Bench&lt;/a&gt; with the following parameters.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ab -n 15000 -c 1 https://hash.execute-api.eu-central-1.amazonaws.com/.../&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-n&lt;/code&gt; configures the total number of requests that are triggered - in our case 15,000&lt;br&gt;
&lt;code&gt;-c&lt;/code&gt; is the number of concurrent requests - in our setup 1&lt;/p&gt;

&lt;p&gt;⚠️ &lt;strong&gt;IMPORTANT:&lt;/strong&gt; keep in mind that the results from Apache Bench are not 100% accurate. The measured throughput depends on the hardware and network capabilities of my local workstation. For upcoming benchmarks, I am considering using something like AWS CloudShell. But Apache Bench gives some very early feedback and potential indications. Hence we use these results in combination with the Lambda duration and the Step Functions execution duration.&lt;/p&gt;
&lt;h2&gt;
  
  
  🔋 Optimizing our Lambda function
&lt;/h2&gt;

&lt;p&gt;So what is the goal of our upcoming experiments? We want to apply some optimizations to our Lambda function with a clear focus on decreasing latency. Based on the feedback I got, there were two main approaches for optimization:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reusing downstream http connections by activating keep-alive settings.&lt;/li&gt;
&lt;li&gt;Improving overall execution performance by increasing the allocated memory.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Reusing Connections with Keep-Alive in Node.js
&lt;/h3&gt;

&lt;p&gt;For short-lived operations, such as - in our case - writing to and reading from S3, the latency overhead of setting up a TCP connection might be greater than the operation itself. To activate HTTP keep-alive, you simply set an environment variable in your Lambda function configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Environment:
  Variables:
    AWS_NODEJS_CONNECTION_REUSE_ENABLED: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you already use v3 of the AWS JS SDK, this setting is &lt;a href="https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/node-reusing-connections.html" rel="noopener noreferrer"&gt;enabled by default&lt;/a&gt;. For v2 you have to &lt;a href="https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/node-reusing-connections.html" rel="noopener noreferrer"&gt;explicitly activate it&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let us deploy the change and run our first test, starting with the Apache Bench reports. The complete reporting is available on &lt;a href="https://github.com/cremich/aws-sf-lambda-benchmark/tree/main/benchmark" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Here are some highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Lambda function processed all requests 43 seconds faster than the state machine.&lt;/li&gt;
&lt;li&gt;Both the state machine and the Lambda function processed roughly 7 requests per second.&lt;/li&gt;
&lt;li&gt;The mean time per request was 131ms for the Lambda function and 134ms for the state machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Looking at these results, the little tweak of activating TCP keep-alive helped a lot to speed up the Lambda function. In terms of end-to-end performance and latency, both solutions are now very close to each other.&lt;/p&gt;

&lt;p&gt;Let us take a closer look into CloudWatch and X-Ray to confirm the observations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faysa1t8hlox1s73vb6gd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faysa1t8hlox1s73vb6gd.png" alt="latencies with keep-alive" width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The average execution time of the state machine is 46.4ms, while the Lambda function performs at 49ms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnt9uhcwylc7ujpjnr8ow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnt9uhcwylc7ujpjnr8ow.png" alt="x-ray service map with keep alive" width="800" height="275"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here things still look interesting. The Lambda function duration still shows some ups and downs during the test run, while the duration of the state machine is stable. Both solutions show some cold-start behavior, while it seems that the state machine needs less time to become "warm".&lt;/p&gt;

&lt;p&gt;But overall, the impact on the Lambda function's performance is very impressive compared to the results of the first part.&lt;/p&gt;

&lt;h3&gt;
  
  
  Give the Lambda function some RAM
&lt;/h3&gt;

&lt;p&gt;But the question is: how much memory does my Lambda function need? The range is quite large, from 128 MB to 10,240 MB. There is an awesome open-source tool called "&lt;a href="https://github.com/alexcasalboni/aws-lambda-power-tuning" rel="noopener noreferrer"&gt;Lambda Power Tuner&lt;/a&gt;" that helps you determine your memory setting based on different strategies like speed, cost or balanced.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you use "cost" the state machine will suggest the cheapest option (disregarding its performance), while if you use "speed" the state machine will suggest the fastest option (disregarding its cost). When using "balanced" the state machine will choose a compromise between "cost" and "speed"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Source: &lt;a href="https://serverlessrepo.aws.amazon.com/applications/arn:aws:serverlessrepo:us-east-1:451282441545:applications~aws-lambda-power-tuning" rel="noopener noreferrer"&gt;Lambda Power Tuner @ AWS Serverless Application Repository&lt;/a&gt;&lt;/p&gt;
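&lt;p&gt;An execution of the power-tuning state machine takes an input document roughly like the following (field names as documented in the project's README; the ARN and values here are placeholders):&lt;/p&gt;

```javascript
// Example input for the power-tuning state machine execution.
// The ARN is a placeholder; tune the values for your own workload.
const powerTuningInput = {
  lambdaARN: "arn:aws:lambda:eu-central-1:123456789012:function:my-function",
  powerValues: [128, 256, 512, 1024, 2048, 3008], // memory settings to test
  num: 50,                  // invocations per memory setting
  payload: "{}",            // event passed to each test invocation
  parallelInvocation: true, // run test invocations in parallel
  strategy: "balanced",     // "cost" | "speed" | "balanced"
};
```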

&lt;p&gt;In my case the "Lambda Power Tuner" suggested 256 MB as "Best cost" and 2048 MB as "Best Time". &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0v4xxyzdlvi7x79gfvc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0v4xxyzdlvi7x79gfvc.png" alt="lambda-power-tuner-output" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Awesome, now we have a good start for the final tests.&lt;/p&gt;
&lt;h4&gt;
  
  
  Best time setting
&lt;/h4&gt;

&lt;p&gt;As we aim to reduce latency, let us start with the proposed "Best Time" setting of 2048 MB of memory and have a look at the Apache Bench metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Lambda function processed all requests 81 seconds faster than the state machine.&lt;/li&gt;
&lt;li&gt;Both the state machine and the Lambda function processed roughly 8 requests per second.&lt;/li&gt;
&lt;li&gt;The mean time per request was 121ms for the Lambda function and 127ms for the state machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compared to our first test there are some improvements, but they seem marginal on average. Let us try to get some more insights using CloudWatch and X-Ray.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1mf2qilnwy8s087eeldh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1mf2qilnwy8s087eeldh.png" alt="cloudwatch-latencies-2048" width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the most part, the duration of the Lambda function is just below the execution time of the state machine. The average execution time of the state machine is 45.1ms, while Lambda shines with 41.8ms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma1m8l4tv1x4m9emw9q4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fma1m8l4tv1x4m9emw9q4.png" alt="xray-service-map-2048" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What would happen if we set our memory configuration to the "Best cost" setting? Let us review the results in the next section.&lt;/p&gt;
&lt;h4&gt;
  
  
  Best cost setting
&lt;/h4&gt;

&lt;p&gt;In short, again our Apache Bench metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Lambda function processed all requests 155 seconds faster than the state machine.&lt;/li&gt;
&lt;li&gt;The state machine processed 7.5 requests per second, while the Lambda function processed 8 requests per second.&lt;/li&gt;
&lt;li&gt;The mean time per request was 122ms for the Lambda function and 132ms for the state machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CloudWatch and X-Ray confirm very similar results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvlbnp90x0645ljlc2jjq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvlbnp90x0645ljlc2jjq.png" alt="cloudwatch-256" width="800" height="319"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The average execution time of the state machine is 54.8ms, with Lambda just in the lead at 50.5ms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjifyz7mgjxujcivhhse.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxjifyz7mgjxujcivhhse.png" alt="xray-256" width="800" height="475"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  💰 Cost comparison
&lt;/h2&gt;

&lt;p&gt;At the scale of my test, the AWS Cost Explorer was not really helpful, as the load I generated was too low. The AWS Pricing Calculator is a helpful tool to better compare the costs of both services.&lt;/p&gt;

&lt;p&gt;The estimate is &lt;a href="https://calculator.aws/#/estimate?id=16d3b9fd0f064aac0f7d743fb47ad2b2044ad91e" rel="noopener noreferrer"&gt;publicly available&lt;/a&gt; if you want to have a detailed look. &lt;/p&gt;

&lt;p&gt;I calculated with 5 million invocations per month per service. Based on our test results, I was able to determine very precise values for the parameters that influence pricing, such as Lambda invocation duration, state machine execution time, and consumed memory. The monthly costs are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;8 USD for AWS Lambda with 2048MB memory (Best time)&lt;/li&gt;
&lt;li&gt;1.83 USD for AWS Lambda with 256MB memory (Best cost)&lt;/li&gt;
&lt;li&gt;5.52 USD for the AWS Step Functions Express workflow&lt;/li&gt;
&lt;/ul&gt;
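&lt;p&gt;These numbers can be roughly reproduced with the public Lambda pricing formula. The following sketch assumes the eu-central-1 on-demand rates at the time of writing (0.20 USD per million requests plus 0.0000166667 USD per GB-second of billed duration); the calculator's exact figures may differ slightly:&lt;/p&gt;

```javascript
// Rough sketch of the Lambda pricing formula (rates as of writing, may change):
// 0.20 USD per 1M requests + 0.0000166667 USD per GB-second of billed duration.
function monthlyLambdaCostUsd(invocations, avgDurationMs, memoryMb) {
  const requestCost = (invocations / 1e6) * 0.2;
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMb / 1024);
  return requestCost + gbSeconds * 0.0000166667;
}

// 5 million invocations at ~42ms with 2048MB lands close to the 8 USD estimate:
console.log(monthlyLambdaCostUsd(5_000_000, 42, 2048).toFixed(2)); // "8.00"
```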
&lt;h2&gt;
  
  
  💡 Conclusion
&lt;/h2&gt;

&lt;p&gt;In this part we covered some important aspects, such as options to improve the performance of a Lambda function. I think it is again very important to mention that this benchmark should not be interpreted as "use Step Functions whenever you can".&lt;/p&gt;

&lt;p&gt;My goal was rather to raise a discussion about the importance of not basing your decisions on hypotheses or rumors. Base your decision on data to make the best decision you can.&lt;/p&gt;

&lt;p&gt;I would again like to point out a quote from &lt;a href="https://twitter.com/edjgeek" rel="noopener noreferrer"&gt;Eric Johnson&lt;/a&gt; at the &lt;a href="https://www.youtube.com/watch?v=zdmCYPvOHoo" rel="noopener noreferrer"&gt;serverless office hours&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use Lambda to transform not to transport&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Or in my words: the best code is the code that is never written.&lt;/p&gt;

&lt;p&gt;☝️ And here comes the thing that is very important to keep in mind:&lt;/p&gt;

&lt;p&gt;BOTH SERVICES ARE AWESOME.&lt;/p&gt;

&lt;p&gt;If you need to write a Lambda function, you will be able to solve a lot of problems. But depending on what you want to achieve, Step Functions gives you a lot of power to get the same results without writing a single line of code, and without having to think about things like TCP keep-alive or figuring out the best memory setting. In all tests, AWS Lambda showed the well-known cold-start behavior, which is something you should keep in mind. AWS Step Functions also needs some warm-up time, but it is not really comparable to AWS Lambda cold starts. There was an interesting discussion around this on Twitter:&lt;/p&gt;

&lt;p&gt;&lt;iframe class="tweet-embed" id="tweet-1453733187666857985-364" src="https://platform.twitter.com/embed/Tweet.html?id=1453733187666857985"&gt;
&lt;/iframe&gt;&lt;/p&gt;

&lt;p&gt;It only remains to say: happy coding AND happy orchestrating! 🥳 I really hope that my analysis and the approach to decision-making help you decide for or against one of these services for your individual use cases.&lt;/p&gt;




&lt;p&gt;About the author: &lt;/p&gt;

&lt;p&gt;👋 Hi, my name is Christian. I am working as an AWS Solution Architect at &lt;a href="https://www.dfl.de/en/about/subsidiaries/about-dfl-digital-sports-gmbh/" rel="noopener noreferrer"&gt;DFL Digital Sports GmbH&lt;/a&gt;, based in Cologne with my beloved wife and two kids. I am interested in all things around ☁️ (cloud), 👨‍💻 (tech) and 🧠 (AI/ML).&lt;/p&gt;

&lt;p&gt;With 10+ years of experience in several roles, I have a lot to talk about and love to share my experiences. I worked as a software developer in several companies in the media and entertainment business, as well as a solution engineer in a consulting company. &lt;/p&gt;

&lt;p&gt;I love the challenge of providing highly scalable systems for millions of users. And I love to collaborate with lots of people to design systems in front of a whiteboard.&lt;/p&gt;

&lt;p&gt;You can find me on &lt;a href="https://www.linkedin.com/in/christian-bonzelet/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://www.twitter.com/cremich" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Cover Image by &lt;a href="https://tenor.com/users/miguelangelvenegasgordillo" rel="noopener noreferrer"&gt;miguelangelvenegasgordillo&lt;/a&gt; on &lt;a href="https://tenor.com/view/civilwar-captainamerica-gif-14387051" rel="noopener noreferrer"&gt;Tenor&lt;/a&gt;&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>aws</category>
      <category>lambda</category>
      <category>stepfunction</category>
    </item>
    <item>
      <title>AWS Step function vs. AWS Lambda benchmark</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Wed, 27 Oct 2021 10:58:34 +0000</pubDate>
      <link>https://dev.to/aws-builders/aws-step-function-vs-aws-lambda-benchmark-54hj</link>
      <guid>https://dev.to/aws-builders/aws-step-function-vs-aws-lambda-benchmark-54hj</guid>
      <description>&lt;p&gt;Looking into the AWS ecosystem of serverless services, AWS Step Functions is one of my personal most favorite services. I recently had a chat with some colleagues about a potential use case of Step functions in favor of AWS Lambda. While we discussed the general concept of AWS Step Functions, one of my beloved colleagues argued towards the usage of AWS Lambda like&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let us use AWS Lambda because a workflow described as a state machine sounds like it is much slower.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I could neither substantiate this statement nor could I contradict it. So I started to examine the original assumption "Step Functions is slower than Lambda" with facts. Time for a benchmark!&lt;/p&gt;

&lt;p&gt;For me the results were crystal clear 😆 &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxw85ufu7wo9z42awxr2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxw85ufu7wo9z42awxr2.jpg" alt="One does not simply" width="651" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just kidding! Let us first get a common understanding of what AWS Step Functions and AWS Lambda are. If you are familiar with these services, you can jump right to the section about the test setup and results. &lt;/p&gt;

&lt;p&gt;By the way: the source code is also available for you &lt;a href="https://github.com/cremich/aws-sf-lambda-benchmark" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  🤹 What is AWS Step Functions?
&lt;/h2&gt;

&lt;p&gt;AWS Step Functions was launched in 2016 as a serverless orchestration service. I think the following definition explains very well what kind of problems AWS Step Functions solves:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Step Functions is a serverless orchestration service that lets you combine […] AWS services to build business-critical applications. Through Step Functions’ graphical console, you see your application’s workflow as a series of event-driven steps. &lt;/p&gt;

&lt;p&gt;Step Functions is based on state machines and tasks. A state machine is a workflow. A task is a state in a workflow that represents a single unit of work that another AWS service performs. Each step in a workflow is a state. &lt;br&gt;
Source: &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html" rel="noopener noreferrer"&gt;What is AWS Step Functions? - AWS Step Functions&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;State machines can be invoked both asynchronously and synchronously. Step Functions itself offers several ways to invoke your state machine, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;via an explicit &lt;code&gt;StartExecution&lt;/code&gt; call using your favourite AWS SDK,&lt;/li&gt;
&lt;li&gt;on each http request hitting your AWS API Gateway,&lt;/li&gt;
&lt;li&gt;as a destination in your Amazon EventBridge event bus&lt;/li&gt;
&lt;/ul&gt;
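&lt;p&gt;For illustration, an explicit &lt;code&gt;StartExecution&lt;/code&gt; call with the JavaScript SDK v2 could look like the following sketch (the state machine ARN and payload are placeholders, and the actual client call is commented out as it assumes the aws-sdk package):&lt;/p&gt;

```javascript
// Sketch of an explicit StartExecution call; ARN and payload are placeholders.
// Building the request parameters is plain data:
function buildStartExecutionParams(stateMachineArn, payload) {
  return {
    stateMachineArn,
    input: JSON.stringify(payload), // Step Functions expects the input as a JSON string
  };
}

const params = buildStartExecutionParams(
  "arn:aws:states:eu-central-1:111111111111:stateMachine:my-benchmark",
  { requestId: "test-1" }
);

// With a configured client (requires the aws-sdk package):
// const AWS = require("aws-sdk");
// const stepfunctions = new AWS.StepFunctions();
// const { executionArn } = await stepfunctions.startExecution(params).promise();

console.log(params.input); // {"requestId":"test-1"}
```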

&lt;p&gt;Typical use cases for AWS Step Functions cover data processing, machine learning, microservices orchestration, or governance and security automation. Since the launch of the AWS SDK service integrations, you can use out-of-the-box integrations with every service that is supported by the AWS SDK. This offers a huge number of new opportunities to integrate with AWS services without writing a single line of code.&lt;/p&gt;

&lt;p&gt;While creating a new state machine, you can decide between two execution types named “Standard” and “Express”. Each type has its own characteristics and strengths. While Standard workflows are a good fit for long-running workflows, Express workflows are a good fit for high-traffic workloads, data streaming, or mobile application backends.&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚡️ What is AWS Lambda?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Lambda is a compute service that lets you run code without provisioning or managing servers. Lambda runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code monitoring and logging. With Lambda, you can run code for virtually any type of application or backend service.&lt;br&gt;
Source: &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/welcome.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/lambda/latest/dg/welcome.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Don’t get me wrong, I am also a big fan of AWS Lambda. But since AWS announced the game-changing SDK service integrations for Step Functions, I have started to think more about what the typical use cases for AWS Lambda are, so that I can use AWS Lambda more for the things it is amazing at.&lt;/p&gt;

&lt;p&gt;Or to quote &lt;a href="https://twitter.com/edjgeek" rel="noopener noreferrer"&gt;Eric Johnson&lt;/a&gt; at the &lt;a href="https://www.youtube.com/watch?v=zdmCYPvOHoo" rel="noopener noreferrer"&gt;serverless office hours&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use Lambda to transform not to transport&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  ⏰ Benchmarking latencies
&lt;/h2&gt;

&lt;p&gt;The goal of this benchmark is not to say that service A is better or worse than service B. Each service has its strengths and weaknesses. &lt;br&gt;
What we want to achieve is a better understanding of what kind of latencies we can measure for AWS Step Functions, and how this compares to a similar integration based on AWS Lambda.&lt;/p&gt;
&lt;h3&gt;
  
  
  General setup
&lt;/h3&gt;

&lt;p&gt;We want to measure the time it takes to read from and write data to Amazon S3 both from a state machine and an AWS Lambda function. &lt;/p&gt;

&lt;p&gt;We test the behavior in two different versions. Version 1 simply writes to S3. Version 2 extends this by executing a &lt;code&gt;GetObject&lt;/code&gt; operation afterwards. The code of the Lambda function (version 2 is shown below) is written in JavaScript.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;AWSXRay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aws-xray-sdk-core&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;AWSXRay&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;captureAWS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aws-sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;S3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bucketName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DestinationBucketName&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambdaHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;EVENT: &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;lambda/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requestContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putObject&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getObject&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;isBase64Encoded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The state machine workflow is similarly straight forward and chains the same Amazon S3 calls as the AWS Lambda function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsk9xpfqw08wqcpaq6jhe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsk9xpfqw08wqcpaq6jhe.png" alt="State machine graph" width="550" height="650"&gt;&lt;/a&gt;&lt;/p&gt;
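&lt;p&gt;For illustration, such a workflow could be expressed roughly like the following Amazon States Language sketch using the S3 SDK integrations. This is a simplified, hypothetical definition (the real one is available in the linked GitHub repository); the bucket name and key layout are placeholders:&lt;/p&gt;

```json
{
  "Comment": "Write and then read an object via the S3 SDK integrations",
  "StartAt": "PutObject",
  "States": {
    "PutObject": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:s3:putObject",
      "Parameters": {
        "Bucket": "my-benchmark-bucket",
        "Key.$": "States.Format('stepfunction/{}', $.requestId)",
        "Body.$": "$$.State.EnteredTime"
      },
      "Next": "GetObject"
    },
    "GetObject": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "Parameters": {
        "Bucket": "my-benchmark-bucket",
        "Key.$": "States.Format('stepfunction/{}', $.requestId)"
      },
      "End": true
    }
  }
}
```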

&lt;p&gt;Both the AWS Lambda function and the state machine can be invoked via an API Gateway. All experiments are triggered using &lt;a href="https://httpd.apache.org/docs/2.4/programs/ab.html" rel="noopener noreferrer"&gt;Apache Bench&lt;/a&gt; with the following parameters.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ab -n 15000 -c 1 https://hash.execute-api.eu-central-1.amazonaws.com/Prod/invoke-lambda/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-n&lt;/code&gt; configures the total number of requests that are triggered - in our case 15,000&lt;br&gt;
&lt;code&gt;-c&lt;/code&gt; is the number of concurrent requests - in our setup 1&lt;/p&gt;

&lt;p&gt;I decided to use this setting because I want to generate a moderate stream of load for both integrations.&lt;/p&gt;

&lt;p&gt;X-Ray is activated on all integration layers so that we are able to get a complete trace from the API-Gateway down to S3.&lt;/p&gt;

&lt;h3&gt;
  
  
  Experiment 1 - Writing to S3
&lt;/h3&gt;

&lt;p&gt;The first experiment focuses only on the execution of a &lt;code&gt;PutObject&lt;/code&gt; call without reading the files afterwards. The automatic Amazon CloudWatch dashboards for AWS Lambda, Amazon API Gateway and AWS Step Functions are a good starting point and provide us with valuable insights.&lt;/p&gt;

&lt;p&gt;Let us start by analyzing the Apache Bench reports. The complete reporting is available on &lt;a href="https://github.com/cremich/aws-sf-lambda-benchmark/tree/main/benchmark" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Here are some highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The state machine processed all requests 539 seconds faster compared to the Lambda function.&lt;/li&gt;
&lt;li&gt;The state machine processed 2.07 more requests per second.&lt;/li&gt;
&lt;li&gt;The mean time per request for the state machine was 35.92ms lower than for the Lambda-based integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  API Gateway latencies
&lt;/h4&gt;

&lt;p&gt;A closer look into the Amazon CloudWatch dashboard underlines what Apache Bench tells us. Observing the complete length of the benchmark, we see that the average latency of Step Functions is constantly below that of AWS Lambda.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qznji6l8jzqx52ucd79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qznji6l8jzqx52ucd79.png" alt="Average latencies on API Gateway" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both integration types show an initial drop in latencies, indicating some kind of cold-start behavior, although the drop is on average more significant for Step Functions than for AWS Lambda.&lt;/p&gt;

&lt;p&gt;When we take a closer look at the 99th percentile, we see some more spikes, but in general a similar result over time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7yk8jegp7nvyovssc1t8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7yk8jegp7nvyovssc1t8.png" alt="99 percentile latencies on API Gateway" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Statemachine and AWS Lambda function execution
&lt;/h4&gt;

&lt;p&gt;Let us now jump into the next integration layer and take a look at the duration of the AWS Lambda function and the state machine itself. It is not very surprising that the state machine is much faster - in the end, around 60% faster compared to the duration of the Lambda function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xayxeq95s5j9dc5xong.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xayxeq95s5j9dc5xong.png" alt="Statemachine and lambda execution" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AWS Lambda function runs with the default memory setting of 128MB and a default timeout of 3 seconds. Depending on the concrete use case, fine-tuning your memory settings might have a significant impact on the Lambda metrics.&lt;/p&gt;

&lt;h4&gt;
  
  
  Downstream service latencies
&lt;/h4&gt;

&lt;p&gt;I was very surprised to see that the connection between Step Functions and S3 seems to be much more efficient. Looking at our X-Ray service map and traces, the average latency between Lambda and S3 is 63ms, compared to 28ms for the Step Functions integration. It may be a coincidence that the relative difference is also almost 60%. Or it might reveal that Step Functions does some &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/connect-supported-services.html" rel="noopener noreferrer"&gt;optimization handling the AWS client SDK&lt;/a&gt; under the hood.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8njv85gh8ed4f1f9yjc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8njv85gh8ed4f1f9yjc.png" alt="X-Ray service map experiment 1" width="800" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Experiment 2 - Write and read from S3
&lt;/h3&gt;

&lt;p&gt;I was interested to know whether the amount of work a state machine has to cover impacts latencies and execution times compared to my AWS Lambda function. Hence, we extended our experiment to also read data from S3 after writing it.&lt;/p&gt;

&lt;p&gt;Again, let us first check the report from Apache Bench:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The state machine processed all requests 1287 seconds faster compared to the Lambda function.&lt;/li&gt;
&lt;li&gt;The state machine processed 3.01 more requests per second.&lt;/li&gt;
&lt;li&gt;The mean time per request for the state machine was 85.83ms lower than for the Lambda-based integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  API Gateway latencies and execution duration
&lt;/h4&gt;

&lt;p&gt;Long story short, the results are comparable to those from the first experiment. But it is interesting to see that the gap between the state machine and the Lambda function is getting bigger. Some factors will influence this, such as the chosen implementation and runtime of the AWS Lambda function.&lt;/p&gt;

&lt;p&gt;💡 Please check out the awesome article by my AWS Community Builder fellow Alexandr Filichkin about a &lt;a href="https://filia-aleks.medium.com/aws-lambda-battle-2021-performance-comparison-for-all-languages-c1b441005fd1" rel="noopener noreferrer"&gt;performance comparison of the different Lambda runtimes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The AWS Lambda function is not able to get closer to the latency behavior of the state machine implementation. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk73t86370dqbc533vhtr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk73t86370dqbc533vhtr.png" alt="API Gateway latencies experiment 2" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AWS Lambda function needs almost double the amount of time to write and read data to/from S3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpmufyzfb17eu68lco3p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpmufyzfb17eu68lco3p.png" alt="execution duration experiment 2" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is also interesting to see that the latency between my AWS Lambda function and Amazon S3 seems to increase slightly on average compared to the first experiment. AWS Step Functions keeps on optimizing the connection to Amazon S3 🤩.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwt7tnow50s2yczxmvduv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwt7tnow50s2yczxmvduv.png" alt="xray service map experiment 2" width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;Based on the things I learned, what would I answer now if someone states:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let us use AWS Lambda because a workflow described as a state machine sounds like it is very much slower.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My general answer would be: measure first. My specific answer on the comparison of AWS Step Functions and an AWS Lambda function is that this is not true in all cases. Our little experiment revealed some interesting insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Step Functions scales well and is much faster in our setup compared to my AWS Lambda function.&lt;/li&gt;
&lt;li&gt;In this experiment, the state machine shows more efficient communication with S3 compared to my custom code implementation.&lt;/li&gt;
&lt;li&gt;When we compare the Step Functions implementation with AWS Lambda, it is obvious that we do not have to write custom code to achieve the same results.&lt;/li&gt;
&lt;li&gt;The new capabilities of the Step Functions Workflow Studio and the SDK service integrations lower the barrier to achieving the same result in this use case while reducing time-to-market.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But be cautious about generalizing the test results. There is a lot you can do to &lt;a href="https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-introduction/" rel="noopener noreferrer"&gt;optimize your AWS Lambda functions&lt;/a&gt; for performance efficiency, and your results might differ in other use cases. These results should not dissuade you from creating additional benchmarks covering your specific use cases to measure what is important to you.&lt;/p&gt;

&lt;p&gt;Please also question whether you really have to optimize for synchronous performance, and consider whether it is possible to implement your use case asynchronously instead.&lt;/p&gt;




&lt;p&gt;About the author: &lt;/p&gt;

&lt;p&gt;👋 Hi, my name is Christian. I am working as an AWS Solution Architect at &lt;a href="https://www.dfl.de/en/about/subsidiaries/about-dfl-digital-sports-gmbh/" rel="noopener noreferrer"&gt;DFL Digital Sports GmbH&lt;/a&gt;, based in Cologne with my beloved wife and two kids. I am interested in all things around ☁️ (cloud), 👨‍💻 (tech) and 🧠 (AI/ML).&lt;/p&gt;

&lt;p&gt;With 10+ years of experience in several roles, I have a lot to talk about and love to share my experiences. I worked as a software developer in several companies in the media and entertainment business, as well as a solution engineer in a consulting company. &lt;/p&gt;

&lt;p&gt;I love the challenge of providing highly scalable systems for millions of users. And I love to collaborate with lots of people to design systems in front of a whiteboard.&lt;/p&gt;

&lt;p&gt;You can find me on &lt;a href="https://www.linkedin.com/in/christian-bonzelet/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://www.twitter.com/cremich" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Cover Image by &lt;a href="https://unsplash.com/@wacalke?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Mateusz Wacławek&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/marvel?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>aws</category>
      <category>lambda</category>
      <category>stepfunction</category>
    </item>
    <item>
      <title>AWS Step function vs. AWS Lambda benchmark</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Wed, 27 Oct 2021 10:49:46 +0000</pubDate>
      <link>https://dev.to/cremich/aws-step-function-vs-aws-lambda-benchmark-4f41</link>
      <guid>https://dev.to/cremich/aws-step-function-vs-aws-lambda-benchmark-4f41</guid>
      <description>&lt;p&gt;Looking into the AWS ecosystem of serverless services, AWS Step Functions is one of my personal most favorite services. I recently had a chat with some colleagues about a potential use case of Step functions in favor of AWS Lambda. While we discussed the general concept of AWS Step Functions, one of my beloved colleagues argued towards the usage of AWS Lambda like&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let us use AWS Lambda because a workflow described as a state machine sounds like it is much slower.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I could neither substantiate this statement nor could I contradict it. So I started to examine the original assumption "Step Functions is slower than Lambda" with facts. Time for a benchmark!&lt;/p&gt;

&lt;p&gt;For me the results were crystal clear 😆 &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxw85ufu7wo9z42awxr2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxw85ufu7wo9z42awxr2.jpg" alt="One does not simply" width="651" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just kidding! Let us first get a common understanding of what AWS Step Functions and AWS Lambda are. If you are already familiar with these services, you can jump right to the section about the test setup and results. &lt;/p&gt;

&lt;p&gt;By the way: the source code is also available for you &lt;a href="https://github.com/cremich/aws-sf-lambda-benchmark" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  🤹 What is AWS Step Functions?
&lt;/h2&gt;

&lt;p&gt;AWS Step Functions was launched in 2016 as a serverless orchestration service. I think the following definition explains very well what kind of problems AWS Step Functions solves:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Step Functions is a serverless orchestration service that lets you combine […] AWS services to build business-critical applications. Through Step Functions’ graphical console, you see your application’s workflow as a series of event-driven steps. &lt;/p&gt;

&lt;p&gt;Step Functions is based on state machines and tasks. A state machine is a workflow. A task is a state in a workflow that represents a single unit of work that another AWS service performs. Each step in a workflow is a state. &lt;br&gt;
Source: &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html" rel="noopener noreferrer"&gt;What is AWS Step Functions? - AWS Step Functions&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;State machines can be invoked both asynchronously and synchronously. Step Functions offers several ways to invoke your state machine, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;via an explicit &lt;code&gt;StartExecution&lt;/code&gt; call using your favourite AWS SDK,&lt;/li&gt;
&lt;li&gt;on each HTTP request hitting your Amazon API Gateway,&lt;/li&gt;
&lt;li&gt;as a target of your Amazon EventBridge event bus&lt;/li&gt;
&lt;/ul&gt;
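&lt;p&gt;As a minimal sketch of the first option, the following plain JavaScript builds the parameters for an explicit &lt;code&gt;StartExecution&lt;/code&gt; call. The helper name and the state machine ARN are placeholders, not part of the benchmark code.&lt;/p&gt;

```javascript
// Hypothetical helper: builds the parameters for a StartExecution call.
// With the AWS SDK for JavaScript v2 they would be passed like:
//   new AWS.StepFunctions().startExecution(params).promise();
function buildStartExecutionParams(stateMachineArn, input) {
  return {
    stateMachineArn,
    // Step Functions expects the execution input as a JSON string
    input: JSON.stringify(input),
  };
}

const params = buildStartExecutionParams(
  "arn:aws:states:eu-central-1:123456789012:stateMachine:benchmark", // placeholder ARN
  { requestId: "demo" }
);
```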

&lt;p&gt;Typical use cases for AWS Step Functions cover data processing, machine learning, microservices orchestration, or governance and security automation. Since the launch of the AWS SDK service integrations, you can use out-of-the-box integrations with every service that is supported by the AWS SDK. This opens up a huge number of new opportunities to integrate with AWS services without writing a single line of code. &lt;/p&gt;

&lt;p&gt;When creating a new state machine, you can choose between two workflow types: Standard and Express. Each type has its own characteristics and strengths. While Standard workflows are a good fit for long-running workflows, Express workflows are a good fit for high-traffic workloads, data streaming, or mobile application backends. &lt;/p&gt;

&lt;h2&gt;
  
  
  ⚡️ What is AWS Lambda?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Lambda is a compute service that lets you run code without provisioning or managing servers. Lambda runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code monitoring and logging. With Lambda, you can run code for virtually any type of application or backend service.&lt;br&gt;
Source: &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/welcome.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/lambda/latest/dg/welcome.html&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Don’t get me wrong, I am also a big fan of AWS Lambda. But since AWS announced the game-changing SDK service integrations for Step Functions, I have started to think more about what the typical use cases for AWS Lambda are, so that in the future I can use AWS Lambda for the things it is amazing at. &lt;/p&gt;

&lt;p&gt;Or to quote &lt;a href="https://twitter.com/edjgeek" rel="noopener noreferrer"&gt;Eric Johnson&lt;/a&gt; at the &lt;a href="https://www.youtube.com/watch?v=zdmCYPvOHoo" rel="noopener noreferrer"&gt;serverless office hours&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Use Lambda to transform, not to transport&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  ⏰ Benchmarking latencies
&lt;/h2&gt;

&lt;p&gt;The goal of this benchmark is not to claim that service A is better or worse than service B. Each service has its strengths and weaknesses. &lt;br&gt;
What we want to achieve is a better understanding of what kind of latencies we can measure for AWS Step Functions, and how this compares to a similar integration based on AWS Lambda. &lt;/p&gt;
&lt;h3&gt;
  
  
  General setup
&lt;/h3&gt;

&lt;p&gt;We want to measure the time it takes to write data to and read data from Amazon S3, both from a state machine and from an AWS Lambda function. &lt;/p&gt;

&lt;p&gt;We test the behavior in two different versions. Version 1 simply writes to S3. Version 2 extends this by executing a &lt;code&gt;GetObject&lt;/code&gt; operation afterwards. The code of the Lambda function is written in JavaScript.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;AWSXRay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aws-xray-sdk-core&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;AWSXRay&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;captureAWS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;aws-sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;S3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bucketName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DestinationBucketName&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lambdaHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;EVENT: &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;lambda/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requestContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putObject&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getObject&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;isBase64Encoded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The state machine workflow is similarly straightforward and chains the same Amazon S3 calls as the AWS Lambda function.&lt;/p&gt;
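&lt;p&gt;To give an impression of what such an SDK service integration looks like, here is a minimal sketch in Amazon States Language. It is not the exact definition used in the benchmark (that one is in the GitHub repository); the bucket name and the input paths are placeholders.&lt;/p&gt;

```json
{
  "StartAt": "PutObject",
  "States": {
    "PutObject": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:s3:putObject",
      "Parameters": {
        "Bucket": "my-benchmark-bucket",
        "Key.$": "$.requestId",
        "Body.$": "$$.State.EnteredTime"
      },
      "Next": "GetObject"
    },
    "GetObject": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:s3:getObject",
      "Parameters": {
        "Bucket": "my-benchmark-bucket",
        "Key.$": "$.requestId"
      },
      "End": true
    }
  }
}
```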

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsk9xpfqw08wqcpaq6jhe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsk9xpfqw08wqcpaq6jhe.png" alt="State machine graph" width="550" height="650"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both the AWS Lambda function and the state machine can be invoked via an API Gateway. All experiments are triggered using &lt;a href="https://httpd.apache.org/docs/2.4/programs/ab.html" rel="noopener noreferrer"&gt;Apache Bench&lt;/a&gt; with the following parameters.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ab -n 15000 -c 1 https://hash.execute-api.eu-central-1.amazonaws.com/Prod/invoke-lambda/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-n&lt;/code&gt; configures the total number of requests that are triggered - in our case 15,000&lt;br&gt;
&lt;code&gt;-c&lt;/code&gt; is the number of concurrent requests - in our setup 1&lt;/p&gt;

&lt;p&gt;I decided to use this setting because I want to generate a moderate stream of load for both integrations.&lt;/p&gt;

&lt;p&gt;X-Ray is activated on all integration layers so that we are able to get a complete trace from API Gateway down to S3.&lt;/p&gt;

&lt;h3&gt;
  
  
  Experiment 1 - Writing to S3
&lt;/h3&gt;

&lt;p&gt;The first experiment focuses only on the execution of a &lt;code&gt;PutObject&lt;/code&gt; call without reading the file afterwards. The automatic Amazon CloudWatch dashboards for AWS Lambda, Amazon API Gateway and AWS Step Functions are a good starting point and provide us valuable insights.&lt;/p&gt;

&lt;p&gt;Let us first start with analyzing the Apache Bench reports. The complete reporting is available on &lt;a href="https://github.com/cremich/aws-sf-lambda-benchmark/tree/main/benchmark" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Here are some highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The state machine was able to process all requests 539 seconds faster than the Lambda function.&lt;/li&gt;
&lt;li&gt;The state machine was able to process 2.07 more requests per second.&lt;/li&gt;
&lt;li&gt;The mean time per request for the state machine is 35.92 ms lower than for the Lambda-based integration.&lt;/li&gt;
&lt;/ul&gt;
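&lt;p&gt;These numbers are consistent with each other: at a concurrency of 1, the difference in mean time per request multiplied by the number of requests should roughly equal the difference in total runtime.&lt;/p&gt;

```javascript
// Consistency check on the Apache Bench highlights above (concurrency of 1).
const totalRequests = 15000;
const meanDiffMs = 35.92; // mean time per request: Lambda minus state machine

// 15000 requests * 35.92 ms = 538,800 ms, i.e. roughly the 539 s total difference
const totalDiffSeconds = (totalRequests * meanDiffMs) / 1000;
```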

&lt;h4&gt;
  
  
  API Gateway latencies
&lt;/h4&gt;

&lt;p&gt;A closer look into the Amazon CloudWatch dashboard underlines what Apache Bench tells us. Observing the complete run of the benchmark, we see that the average latency of Step Functions stays consistently below that of AWS Lambda.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qznji6l8jzqx52ucd79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qznji6l8jzqx52ucd79.png" alt="Average latencies on API Gateway" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both integration types show an initial drop in latencies, indicating some kind of cold-start behavior, although the drop is on average more significant for Step Functions than for AWS Lambda. &lt;/p&gt;

&lt;p&gt;When we take a closer look at the 99th percentile, we see some more spikes, but in general a similar result over time. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7yk8jegp7nvyovssc1t8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7yk8jegp7nvyovssc1t8.png" alt="99 percentile latencies on API Gateway" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  State machine and AWS Lambda function execution
&lt;/h4&gt;

&lt;p&gt;Let us now jump into the next integration layer and take a look at the duration of the AWS Lambda function and the state machine itself. It is not very surprising that the state machine is much faster - in the end, around 60% faster than the Lambda function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xayxeq95s5j9dc5xong.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xayxeq95s5j9dc5xong.png" alt="Statemachine and lambda execution" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AWS Lambda function runs with the default memory setting of 128 MB and a default timeout of 3 seconds. Depending on the concrete use case, fine-tuning your memory settings might have a significant impact on the Lambda metrics. &lt;/p&gt;

&lt;h4&gt;
  
  
  Downstream service latencies
&lt;/h4&gt;

&lt;p&gt;I was very surprised to see that the connection between Step Functions and S3 seems to be much more efficient. Looking at our X-Ray service map and traces, the average latency between Lambda and S3 is 63 ms, compared to 28 ms for the Step Functions integration. It may be a coincidence that the relative difference is also almost 60%. Or it might reveal that Step Functions does some &lt;a href="https://docs.aws.amazon.com/step-functions/latest/dg/connect-supported-services.html" rel="noopener noreferrer"&gt;optimization handling the AWS client SDK&lt;/a&gt; under the hood.&lt;/p&gt;
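&lt;p&gt;The relative difference can be checked quickly from the two trace averages:&lt;/p&gt;

```javascript
// Relative difference of the downstream latencies measured in the X-Ray traces.
const lambdaToS3Ms = 63;
const stateMachineToS3Ms = 28;

// 35 / 63 = 0.5555..., i.e. a bit over 55%, close to the ~60% duration gap
const relativeDifference = (lambdaToS3Ms - stateMachineToS3Ms) / lambdaToS3Ms;
```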

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8njv85gh8ed4f1f9yjc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw8njv85gh8ed4f1f9yjc.png" alt="X-Ray service map experiment 1" width="800" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Experiment 2 - Write and read from S3
&lt;/h3&gt;

&lt;p&gt;I was interested to know whether the amount of work a state machine has to cover impacts latencies and execution times compared to my AWS Lambda function. Hence we extended our experiment to also read the data from S3 after writing it. &lt;/p&gt;

&lt;p&gt;Again, let us first check the report from Apache Bench:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The state machine was able to process all requests 1287 seconds faster than the Lambda function.&lt;/li&gt;
&lt;li&gt;The state machine was able to process 3.01 more requests per second.&lt;/li&gt;
&lt;li&gt;The mean time per request for the state machine is 85.83 ms lower than for the Lambda-based integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  API Gateway latencies and execution duration
&lt;/h4&gt;

&lt;p&gt;Long story short, the results are comparable to the ones from the first experiment. But it is interesting to see that the gap between the state machine and the Lambda function is getting bigger. Some factors will influence this, like the chosen implementation and runtime of the AWS Lambda function. &lt;/p&gt;

&lt;p&gt;💡 Please check out the awesome article by my AWS Community Builder fellow Alexandr Filichkin about a &lt;a href="https://filia-aleks.medium.com/aws-lambda-battle-2021-performance-comparison-for-all-languages-c1b441005fd1" rel="noopener noreferrer"&gt;performance comparison of the different Lambda runtimes&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;The AWS Lambda function is not able to get closer to the latency behavior of the state machine implementation. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk73t86370dqbc533vhtr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk73t86370dqbc533vhtr.png" alt="API Gateway latencies experiment 2" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AWS Lambda function needs almost twice the time to write data to and read it from S3.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpmufyzfb17eu68lco3p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpmufyzfb17eu68lco3p.png" alt="execution duration experiment 2" width="800" height="262"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It is also interesting to see that the average latency between my AWS Lambda function and Amazon S3 seems to increase slightly compared to the first experiment, while AWS Step Functions keeps on optimizing its connection to Amazon S3 🤩. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwt7tnow50s2yczxmvduv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwt7tnow50s2yczxmvduv.png" alt="xray service map experiment 2" width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusions
&lt;/h2&gt;

&lt;p&gt;Based on the things I learned, what would I answer now if someone states:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let us use AWS Lambda because a workflow described as a state machine sounds like it is very much slower.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My general answer would be: measure first. My specific answer on the comparison of AWS Step Functions and an AWS Lambda function is that this is not true in all cases. Our little experiment revealed some interesting insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Step Functions scales well and is much faster in our setup than my AWS Lambda function.&lt;/li&gt;
&lt;li&gt;In this experiment, the state machine shows more efficient communication with S3 than my custom code implementation.&lt;/li&gt;
&lt;li&gt;When we compare the Step Functions implementation with AWS Lambda, it is obvious that we do not have to write custom code to achieve the same results.&lt;/li&gt;
&lt;li&gt;The new capabilities of the Step Functions Workflow Studio and the SDK service integrations lower the barrier to achieving the same result in this use case while reducing time-to-market.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But be cautious about generalizing these test results. There is a lot you can do to &lt;a href="https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-introduction/" rel="noopener noreferrer"&gt;optimize your AWS Lambda functions&lt;/a&gt; for performance efficiency, and your results might differ in other use cases. These results should not dissuade you from creating additional benchmarks that cover your specific use cases and measure what is important to you.&lt;/p&gt;

&lt;p&gt;Please also consider whether you really have to optimize for performance at all, and whether it is possible to implement your use case asynchronously instead.&lt;/p&gt;




&lt;p&gt;About the author: &lt;/p&gt;

&lt;p&gt;👋 Hi, my name is Christian. I work as an AWS Solution Architect at &lt;a href="https://www.dfl.de/en/about/subsidiaries/about-dfl-digital-sports-gmbh/" rel="noopener noreferrer"&gt;DFL Digital Sports GmbH&lt;/a&gt;, and I am based in Cologne with my beloved wife and two kids. I am interested in all things around ☁️ (cloud), 👨‍💻 (tech) and 🧠 (AI/ML).&lt;/p&gt;

&lt;p&gt;With 10+ years of experience in several roles, I have a lot to talk about and love to share my experiences. I worked as a software developer at several companies in the media and entertainment business, as well as a solution engineer at a consulting company. &lt;/p&gt;

&lt;p&gt;I love the challenge of providing highly scalable systems for millions of users. And I love collaborating with lots of people to design systems in front of a whiteboard.&lt;/p&gt;

&lt;p&gt;You can find me on &lt;a href="https://www.linkedin.com/in/christian-bonzelet/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://www.twitter.com/cremich" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;Cover Image by &lt;a href="https://unsplash.com/@wacalke?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Mateusz Wacławek&lt;/a&gt; on &lt;a href="https://unsplash.com/s/photos/marvel?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>aws</category>
      <category>lambda</category>
      <category>stepfunction</category>
    </item>
    <item>
      <title>Why you should explore your data before feeding Amazon Personalize</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Wed, 20 Oct 2021 08:30:09 +0000</pubDate>
      <link>https://dev.to/cremich/why-you-should-explore-your-data-before-feeding-amazon-personalize-1f5n</link>
      <guid>https://dev.to/cremich/why-you-should-explore-your-data-before-feeding-amazon-personalize-1f5n</guid>
      <description>&lt;p&gt;Alexa...set a timer for 15 minutes. ⏳&lt;/p&gt;




&lt;p&gt;In my &lt;a href="https://dev.to/cremich/automate-provisioning-of-sagemaker-notebooks-using-the-aws-cdk-3p4l"&gt;previous blog post&lt;/a&gt;, I showed you how to automate the provisioning of SageMaker notebook instances. Let us now use this notebook instance for data exploration and data analysis as part of the &lt;a href="https://github.com/cremich/personalize-kickstart/" rel="noopener noreferrer"&gt;Amazon Personalize Kickstart&lt;/a&gt; project.&lt;/p&gt;

&lt;p&gt;The goal of this project is to give you a kickstart for your personalization journey when building a recommendation engine based on Amazon Personalize. It serves as a reference implementation from which you can learn both the concepts and the integration aspects of Amazon Personalize. &lt;/p&gt;

&lt;h2&gt;
  
  
  🕵️ Data exploration is an essential part of your machine learning development process
&lt;/h2&gt;

&lt;p&gt;Before you simply import your historical data, it is recommended to gather knowledge, both about your data and about your business domain. Every recommendation engine project is somewhat unique in terms of the data we have to process and the way the business works. In a very first step, during a proof-of-concept phase, it is all about finding answers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What data can we use?&lt;/li&gt;
&lt;li&gt;What data do we need?&lt;/li&gt;
&lt;li&gt;Is our data quality sufficient?&lt;/li&gt;
&lt;li&gt;How do we access the required data?&lt;/li&gt;
&lt;li&gt;How do we identify users, interactions or items we want to recommend?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Collaborative sessions with subject matter experts help us build an optimal solution within the given circumstances. &lt;strong&gt;Making decisions is easy. Making the right decision is the challenge.&lt;/strong&gt; In my opinion, data exploration is one of the most important parts of your machine learning development process. &lt;/p&gt;

&lt;p&gt;To formulate it a bit more drastically: without data analysis and exploration, you can only do the right thing by accident. &lt;/p&gt;

&lt;h2&gt;
  
  
  🏁 What do we want to achieve?
&lt;/h2&gt;

&lt;p&gt;We want to build a recommendation engine covering all features of Amazon Personalize. The dataset we will use is the publicly available MovieLens dataset.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;GroupLens Research has collected and made available rating data sets from the MovieLens web site (&lt;a href="https://movielens.org" rel="noopener noreferrer"&gt;https://movielens.org&lt;/a&gt;). The data sets were collected over various periods of time, depending on the size of the set. Before using these data sets, please review their README files for the usage licenses and other details.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Source: &lt;a href="https://grouplens.org/datasets/movielens/" rel="noopener noreferrer"&gt;https://grouplens.org/datasets/movielens/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The MovieLens dataset contains 25 million movie ratings and a rich set of movie metadata. We will use this data to provide an initial version of our recommendation engine based on historical data. &lt;/p&gt;

&lt;p&gt;My goal is not to reinvent the wheel, but to bring the relevant analyses together in one place to help us judge whether our data is fit to be used for a recommendation engine based on Amazon Personalize.&lt;/p&gt;

&lt;p&gt;Those analyses are inspired both by my personal experiences and by a lot of cool work from the open source community, like the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://notebook.community/harishkrao/DSE200x/Mini%20Project/Analysis%20on%20the%20Movie%20Lens%20dataset" rel="noopener noreferrer"&gt;Analysis on the Movie Lens dataset using pandas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://personalization-immersionday.workshop.aws/en/" rel="noopener noreferrer"&gt;Amazon Personalize immersion day&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/jneupane12/analysis-of-movielens-dataset-beginner-sanalysis" rel="noopener noreferrer"&gt;Analysis of MovieLens dataset (Beginner'sAnalysis)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=8kElv1sticI" rel="noopener noreferrer"&gt;Data Analysis using the MovieLens dataset with pandas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/comprehensive-data-explorations-with-matplotlib-a388be12a355" rel="noopener noreferrer"&gt;Comprehensive Data Visualization with Matplotlib&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📊 From data to answers
&lt;/h2&gt;

&lt;p&gt;Before you start your analysis, it is recommended to define some key questions you would like to answer. You can then use the insights and knowledge you gain to discuss them with subject matter experts. &lt;/p&gt;

&lt;p&gt;In our kickstart project, unfortunately, there are no subject matter experts available right now. So let us start with what we have: 🤖 data and a 📖 &lt;a href="https://files.grouplens.org/datasets/movielens/ml-25m-README.html" rel="noopener noreferrer"&gt;README&lt;/a&gt;! &lt;/p&gt;

&lt;p&gt;By analyzing the MovieLens datasets, we want to answer some very specific questions about our movie business:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What are the top 10 most rated movies?&lt;/li&gt;
&lt;li&gt;Are ratings in general more positive or negative?&lt;/li&gt;
&lt;li&gt;Is there a correlation between genres?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So let us get started and dive into our datasets. 🤿&lt;/p&gt;

&lt;h2&gt;
  
  
  🗺 Data exploration samples
&lt;/h2&gt;

&lt;p&gt;For a complete overview of all analysis results, please check the complete &lt;a href="https://github.com/cremich/personalize-kickstart/blob/main/notebooks/data-exploration.ipynb" rel="noopener noreferrer"&gt;Jupyter notebook on github&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Before we start, let us do some basic setup like importing libraries, downloading the sample data and loading them into dataframes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;data_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;movielens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;dataset_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data_dir&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/ml-latest-small/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;mkdir&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;data_dir&lt;/span&gt;

&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;data_dir&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;wget&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;grouplens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;org&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;movielens&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;ml&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;latest&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;small&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;data_dir&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;unzip&lt;/span&gt; &lt;span class="n"&gt;ml&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;latest&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;small&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;

&lt;span class="n"&gt;raw_ratings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_dir&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/ratings.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;raw_movies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_dir&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/movies.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;movie_rating&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_ratings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_movies&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;how&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;left&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;right_on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;movieId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;left_on&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;movieId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What are the top 10 most rated movies?
&lt;/h3&gt;

&lt;p&gt;We want to better understand which movies receive the most ratings in our system. We use the merged dataframe of movies and ratings, group it by title, and sort by the number of rows per movie to get the top 10 movies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;top_ten_movies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;movie_rating&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;top_ten_movies&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;barh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahxlaeo1onh2sx3uk90r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fahxlaeo1onh2sx3uk90r.png" alt="Top rated movies" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we build our recommender system based on ratings, we have to check whether we have some bias in our data. It could happen that frequently rated movies end up being recommended more often than rarely rated ones. This is something to discuss with subject matter experts to set clear expectations.&lt;/p&gt;
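&lt;p&gt;To make that discussion concrete, the popularity bias can be quantified by checking what share of all rating events the most-rated titles capture. The following is a minimal sketch using only the Python standard library; the titles and counts are hypothetical toy data, not taken from MovieLens:&lt;/p&gt;

```python
from collections import Counter

# Hypothetical toy data: one list entry per rating event, keyed by movie title.
rated_titles = (
    ["Forrest Gump"] * 300
    + ["Shawshank Redemption"] * 250
    + ["Pulp Fiction"] * 200
    + ["Obscure Indie Film"] * 5
    + ["Another Niche Title"] * 2
)

counts = Counter(rated_titles)
top_two = counts.most_common(2)

# Share of all rating events captured by the two most-rated titles.
top_share = sum(n for _, n in top_two) / len(rated_titles)
print(f"Top 2 titles capture {top_share:.0%} of all ratings")
```

&lt;p&gt;A very high share like this would be a strong signal to raise the popularity-bias question before training.&lt;/p&gt;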

&lt;h3&gt;
  
  
  Are ratings in general more positive or negative?
&lt;/h3&gt;

&lt;p&gt;We want to know more about the distribution of ratings. Our hypothesis is that recommending low-rated videos might not be a good user experience. On the other hand, we should not be too aggressive, as ignoring those low-rated videos entirely can lead to biased recommendations. Maybe there are users who are still interested in low-rated videos because they fit their favorite genre. Who knows?&lt;/p&gt;

&lt;p&gt;Let us first visualize the distribution of all ratings. In a next step, we will categorize ratings of 3.0 or lower as negative and all other ratings as positive.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;raw_ratings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rating&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sort_index&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;barh&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtpaq2fej7m5n74rzph0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtpaq2fej7m5n74rzph0.png" alt="Screenshot 2021-10-08 at 10.55.59" width="776" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We now map each rating greater than 3.0 to a positive sentiment and all other ratings to a negative sentiment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;rating_sentiment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_ratings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;rating_sentiment&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rating_sentiment&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;positive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;3.0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negative&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rating_sentiment&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;barh&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6w0enh62rl9sinuxf8dp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6w0enh62rl9sinuxf8dp.png" alt="Rating sentiment" width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We now get an idea that the majority of ratings are "positive".&lt;/p&gt;
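&lt;p&gt;The same threshold rule can be expressed without pandas. This is a minimal sketch over a hypothetical sample of ratings on the MovieLens 0.5-5.0 scale; as in the notebook, only ratings strictly greater than 3.0 count as positive:&lt;/p&gt;

```python
from collections import Counter

# Hypothetical sample of ratings on the MovieLens 0.5-5.0 scale.
ratings = [0.5, 2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 4.0, 1.5, 5.0]

# Strictly greater than 3.0 is "positive"; 3.0 and below is "negative".
sentiment = Counter("positive" if r > 3.0 else "negative" for r in ratings)
print(sentiment)
```

&lt;p&gt;Note that a rating of exactly 3.0 lands in the negative bucket, matching the mapping used above.&lt;/p&gt;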

&lt;h3&gt;
  
  
  How many videos are released per year?
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;movies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_movies&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;release_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\((\d{4})\)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;expand&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;movies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;release_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;release_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;int64&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(.*?)\s*\(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expand&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;movie_year&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;movies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;release_year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;movie_year&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inplace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;movie_year&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;release_year&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;legend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;xlabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Release year&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ylabel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Number of movies&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzdadeclfzoou4qda3wh9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzdadeclfzoou4qda3wh9.png" alt="Released videos per year" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Release dates range from 1902 to 2018. From around 1980 onward, the number of released movies increases more steeply. There is an interesting drop in releases around 2012; in 2018, roughly as many movies were released as at the end of the 1970s.&lt;/p&gt;

&lt;p&gt;If subject matter experts were in place, this analysis might raise some very interesting questions to better understand the drivers of both the increase in the 1980s and the drop after 2010.&lt;/p&gt;
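&lt;p&gt;The release year above is pulled out of the movie title with the regular expression &lt;code&gt;\((\d{4})\)&lt;/code&gt;. The same pattern can be checked in isolation with Python's re module; &lt;code&gt;extract_year&lt;/code&gt; below is a hypothetical helper for illustration, not part of the notebook:&lt;/p&gt;

```python
import re

# MovieLens encodes the release year in the title, e.g. "Toy Story (1995)".
YEAR_PATTERN = re.compile(r"\((\d{4})\)")

def extract_year(title):
    """Return the four-digit release year from a MovieLens title, or None."""
    match = YEAR_PATTERN.search(title)
    return int(match.group(1)) if match else None

print(extract_year("Toy Story (1995)"))
print(extract_year("From Dusk Till Dawn"))
```

&lt;p&gt;Because the pattern requires exactly four digits inside parentheses, alternative titles such as "Seven (a.k.a. Se7en) (1996)" still resolve to the correct year, while titles without a year yield None and are dropped by the dropna step above.&lt;/p&gt;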

&lt;h2&gt;
  
  
  💡 Conclusions
&lt;/h2&gt;

&lt;p&gt;Data exploration and the learnings you gain from your data put you in an excellent position. Ideally, you formulated the business problem you want to solve upfront and defined some relevant KPIs you want to improve. Based on your learnings, you can now dive deeper into what is possible in your situation. Challenge your KPI definitions or define additional hypotheses that will guide you on your journey. &lt;/p&gt;




&lt;p&gt;Alexa says, time is over...see you next time. &lt;/p&gt;

&lt;p&gt;I am not a data scientist and would never claim to have deep knowledge in this field. But I have to admit that I am getting a bit obsessed with data science, data exploration, analysis, and data-driven decisions. I observe a lot and try to be the sponge that soaks up everything in this area. &lt;/p&gt;

&lt;p&gt;Hence I am really interested in your feedback, experience and thoughts in the comments. 👋 &lt;/p&gt;




&lt;p&gt;Cover Image by Andrew Neel - &lt;a href="https://unsplash.com/photos/z55CR_d0ayg" rel="noopener noreferrer"&gt;https://unsplash.com/photos/z55CR_d0ayg&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>personalize</category>
    </item>
    <item>
      <title>Automate provisioning of Sagemaker Notebooks using the AWS CDK</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Sat, 02 Oct 2021 21:30:26 +0000</pubDate>
      <link>https://dev.to/cremich/automate-provisioning-of-sagemaker-notebooks-using-the-aws-cdk-3p4l</link>
      <guid>https://dev.to/cremich/automate-provisioning-of-sagemaker-notebooks-using-the-aws-cdk-3p4l</guid>
      <description>&lt;p&gt;Alexa...set a timer for 15 minutes. ⏳&lt;/p&gt;




&lt;p&gt;In this post I want to highlight two things:&lt;br&gt;
1.) Announce the Amazon Personalize kickstart project&lt;br&gt;
2.) Show you how to automate the provisioning of SageMaker notebooks for your data exploration tasks&lt;/p&gt;
&lt;h2&gt;
  
  
  🚀 What is the Amazon Personalize kickstart project?
&lt;/h2&gt;

&lt;p&gt;The goal of this project is to provide you with a kickstart for your personalization journey when building a recommendation engine based on Amazon Personalize. It will serve as a reference implementation so you can learn both the concepts and the integration aspects of Amazon Personalize. &lt;/p&gt;

&lt;p&gt;You can also use it to build your own recommendation engine on best practices and production-ready components based on the &lt;a href="https://aws.amazon.com/de/cdk/" rel="noopener noreferrer"&gt;AWS Cloud Development Kit&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;It contains all the best practices I have collected over the last year while designing and building recommendation engines: from implementing A/B testing strategies to orchestrating and automating the training workflow of Amazon Personalize, but also providing a good developer experience through sandbox stacks and automated provisioning of SageMaker notebook instances.&lt;/p&gt;

&lt;p&gt;The kickstart project is open source and available via Github:&lt;br&gt;
&lt;a href="https://github.com/cremich/personalize-kickstart/" rel="noopener noreferrer"&gt;https://github.com/cremich/personalize-kickstart/&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  💻 Wait, you are talking about an AI service. Why do I need to get in touch with Sagemaker?
&lt;/h2&gt;

&lt;p&gt;Independent of which layer of the AWS ML stack you operate on: &lt;br&gt;
Before you just import your historical data, it is recommended to gather knowledge, both about your data and about your business domain. Every recommendation engine project is somewhat unique in terms of the data we have to process and the way the business works. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ke1s6w8qe22k3dx8exu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ke1s6w8qe22k3dx8exu.png" alt="Screenshot 2021-10-01 at 14.45.39" width="800" height="119"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Your &lt;a href="https://dev.to/cremich/learnings-from-creating-recommendation-engines-with-amazon-personalize-o24"&gt;process should start&lt;/a&gt; with defining the business problem you want to solve. Followed by defining KPIs you want to improve and framing your ML problem definition. Then start with data exploration and analysis. &lt;/p&gt;

&lt;p&gt;A managed Jupyter notebook by Amazon SageMaker is an excellent starting point to &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ingest and analyze your data,&lt;/li&gt;
&lt;li&gt;prepare, clean and transform your data,&lt;/li&gt;
&lt;li&gt;start to train and tune your recommendation model candidates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Amazon Personalize kickstart project helps you automate the provisioning of individual SageMaker notebooks. It also ensures that a notebook is deleted once you delete your stack, to save costs. &lt;/p&gt;
&lt;h2&gt;
  
  
  🚧 The Sagemaker notebook construct
&lt;/h2&gt;
&lt;h3&gt;
  
  
  The problem it solves
&lt;/h3&gt;

&lt;p&gt;Here is my observation from recent projects if you do not automate this kind of setup:&lt;/p&gt;

&lt;p&gt;You kicked off your machine learning project and want to use AI services to solve it. Your team members want to do some data exploration and analysis in the early days. So everyone who owns this task provisions a SageMaker notebook instance, some default IAM execution roles, and a bunch of S3 buckets to store the data that needs to be explored. &lt;/p&gt;

&lt;p&gt;Your project ends, and those manually provisioned resources are usually forgotten but still cost you money. They might also introduce vulnerabilities due to outdated libraries until you stop your notebook sessions or restart the instances.&lt;/p&gt;
&lt;h3&gt;
  
  
  The solution it offers
&lt;/h3&gt;

&lt;p&gt;The SageMaker notebook construct provides an automatically provisioned Amazon SageMaker notebook instance for your data analysis and exploration tasks. Provisioning a notebook is optional and not required in all stages and cases. In centrally provisioned dev, staging, or production accounts, a SageMaker notebook is not strictly necessary.&lt;/p&gt;

&lt;p&gt;But it will help you in your developer sandbox accounts or stacks. We encapsulate the resources that are needed to operate and run a Sagemaker notebook in a reusable construct.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;The construct consists of the actual SageMaker notebook instance, an S3 bucket to put your raw data in, as well as an &lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html" rel="noopener noreferrer"&gt;IAM execution role&lt;/a&gt;. The role grants the SageMaker service access to get and update data in the S3 bucket. It further includes the managed policy that grants SageMaker full access. &lt;/p&gt;

&lt;p&gt;Construct parameters allow you to set the name of the notebook instance, the required EBS volume size, and the required EC2 instance type. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ Please keep in mind that some property changes, like changing the instance name, will result in a resource replacement. According to the CloudFormation resource documentation, a change of the EBS volume size or the instance type won't replace your notebook instance. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each developer who uses this construct now benefits from an automated and consistent process, while ensuring that all resources for data exploration are deleted once they are no longer needed. &lt;/p&gt;

&lt;p&gt;Of course, there is always room for additional features, for example if you require a more fine-grained networking setup. Consider this a conceptual starting point and extend it to your needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automate everything
&lt;/h2&gt;

&lt;p&gt;The benefit of automating this process is that you avoid ending up with orphaned resources. In my experience, it is easy to forget over time that some notebooks are still running or raw data is still sleeping in S3 buckets.&lt;/p&gt;

&lt;p&gt;So save your credit card and automate everything 😊&lt;/p&gt;

&lt;p&gt;It also enables some other interesting options, like uploading initial data for exploration or training alongside provisioning your construct, or connecting a git repository to your notebook instance to provide shared notebooks to all your data scientists. &lt;/p&gt;




&lt;p&gt;Alexa says, time is over...see you next time. What additional ideas or options do you have in mind? What are your experiences how to automate this process? Happy to get your feedback, experience and thoughts in the comments. 👋 &lt;/p&gt;

</description>
      <category>aws</category>
      <category>machinelearning</category>
      <category>personalize</category>
      <category>cdk</category>
    </item>
    <item>
      <title>Automated rotating of AWS access keys in Bitbucket pipelines</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Fri, 20 Aug 2021 12:52:21 +0000</pubDate>
      <link>https://dev.to/cremich/automated-rotating-of-aws-access-keys-in-bitbucket-pipelines-1jfi</link>
      <guid>https://dev.to/cremich/automated-rotating-of-aws-access-keys-in-bitbucket-pipelines-1jfi</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;⚠️ I am sorry if gists are sometimes rendered in the wrong order. It still seems to be an open issue on the forem platform: &lt;a href="https://github.com/forem/forem/issues/14428" rel="noopener noreferrer"&gt;https://github.com/forem/forem/issues/14428&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AWS access keys enable programmatic access to your AWS cloud infrastructure. You can use those access keys to verify your identity and permissions in the AWS CLI and all AWS SDKs, but also in your CI/CD pipeline to deploy your application with toolsets like SAM, the CDK, or the raw AWS CLI. &lt;/p&gt;

&lt;p&gt;When using access keys, automated rotation is an important security practice and is also embedded in the &lt;a href="https://docs.aws.amazon.com/wellarchitected/latest/security-pillar/identity-management.html" rel="noopener noreferrer"&gt;security pillar&lt;/a&gt; of the &lt;a href="https://aws.amazon.com/en/architecture/well-architected/" rel="noopener noreferrer"&gt;AWS Well-Architected Framework&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this post, I will describe a solution to automatically rotate access keys that are used in your Bitbucket CI/CD pipeline using the Bitbucket API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best practices for managing AWS access keys
&lt;/h2&gt;

&lt;p&gt;Besides the regular rotation of your access keys, AWS describes some additional &lt;a href="https://docs.aws.amazon.com/general/latest/gr/aws-access-keys-best-practices.html" rel="noopener noreferrer"&gt;best practices&lt;/a&gt; when managing your AWS access keys, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;protect or don't create your root user access key&lt;/li&gt;
&lt;li&gt;don't embed access keys into your &lt;a href="https://medium.com/swlh/aws-access-keys-leak-in-github-repository-and-some-improvements-in-amazon-reaction-cc2e20e89003" rel="noopener noreferrer"&gt;source code&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;rotate access keys periodically&lt;/li&gt;
&lt;li&gt;remove unused access keys&lt;/li&gt;
&lt;li&gt;configure MFA for your most sensitive operations&lt;/li&gt;
&lt;li&gt;use IAM roles instead of long-term access keys &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But how do I rotate access keys that I configured in a CI/CD SaaS solution like GitHub, Bitbucket or GitLab? &lt;/p&gt;

&lt;p&gt;The good news: all major providers offer APIs to enhance automation. Let's have a look at how this can work for Bitbucket.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rotate, deactivate, delete, repeat
&lt;/h2&gt;

&lt;p&gt;What do we want to achieve? We want an automated process to&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rotate our access keys every 90 days&lt;/li&gt;
&lt;li&gt;Deactivate unused access keys every 100 days&lt;/li&gt;
&lt;li&gt;Delete unused access keys every 110 days&lt;/li&gt;
&lt;li&gt;Update deployment variables in our Bitbucket pipeline configuration after every rotation. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Okay. Challenge accepted. Hold my beer and here we go:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8m03n0ifqqvf8nrtw9ci.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8m03n0ifqqvf8nrtw9ci.png" alt="Screenshot 2021-08-20 at 12.38.08" width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In order to trigger our process, we configure scheduled CloudWatch events to invoke Lambda functions every 90/100/110 days. Each Lambda function has a dedicated responsibility: create a new access key, deactivate an unused one, or delete an already deactivated one. &lt;/p&gt;

&lt;p&gt;The rotation Lambda is straightforward. It creates a new access key and writes the credentials to a secret provisioned in AWS Secrets Manager. &lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
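&lt;p&gt;A minimal sketch of such a rotation Lambda in Python. The IAM user name and secret name are illustrative, and boto3 is imported inside the handler so the module also loads outside of AWS:&lt;/p&gt;

```python
import json

# Illustrative names - adjust to your own IAM user and secret.
USER_NAME = "pipeline-deploy-user"
SECRET_NAME = "bitbucket/pipeline-deploy-user"


def build_secret_payload(access_key: dict) -> str:
    """Serialize the relevant fields of a freshly created access key."""
    return json.dumps({
        "AccessKeyId": access_key["AccessKeyId"],
        "SecretAccessKey": access_key["SecretAccessKey"],
    })


def handler(event, context):
    import boto3  # provided by the AWS Lambda Python runtime

    iam = boto3.client("iam")
    secrets = boto3.client("secretsmanager")

    # Create a fresh access key for the deploy user...
    new_key = iam.create_access_key(UserName=USER_NAME)["AccessKey"]

    # ...and store it in the secret that acts as the source of truth.
    secrets.put_secret_value(
        SecretId=SECRET_NAME,
        SecretString=build_secret_payload(new_key),
    )
```

&lt;p&gt;The deactivation and deletion Lambdas follow the same shape, using &lt;code&gt;update_access_key&lt;/code&gt; and &lt;code&gt;delete_access_key&lt;/code&gt; from the IAM API.&lt;/p&gt;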


&lt;p&gt;The secret will be the source of truth for the active access key that is also used in our Bitbucket pipeline configuration. In the next chapter, we take a deeper look at how to sync the secret with Bitbucket. &lt;/p&gt;

&lt;h2&gt;
  
  
  Sync credentials with Bitbucket
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8795dgx9ywoulwgeaeys.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8795dgx9ywoulwgeaeys.png" alt="Screenshot 2021-08-20 at 12.38.58" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bitbucket provides a &lt;a href="https://developer.atlassian.com/bitbucket/api/2/reference/resource/" rel="noopener noreferrer"&gt;REST API&lt;/a&gt; to access data or trigger operations on repositories, workspaces or pipeline configurations. In order to update environment variables in our deployment configurations in Bitbucket, we need to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the name of the workspace,&lt;/li&gt;
&lt;li&gt;the name of the repository,&lt;/li&gt;
&lt;li&gt;the uuid of the deployment environment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With all this information in place, we can use the Bitbucket API to update the value of our deployment variables. &lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
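&lt;p&gt;A sketch of that update call in Python, using only the standard library. The endpoint path follows the Bitbucket REST API for deployment variables; the workspace, repository slug and UUIDs are placeholders, and you should verify the path against the current API reference:&lt;/p&gt;

```python
import json
import urllib.request

API_BASE = "https://api.bitbucket.org/2.0"


def deployment_variable_url(workspace, repo_slug, environment_uuid, variable_uuid):
    """Build the endpoint for a single pipeline deployment variable."""
    return (
        f"{API_BASE}/repositories/{workspace}/{repo_slug}"
        f"/deployments_config/environments/{environment_uuid}"
        f"/variables/{variable_uuid}"
    )


def update_deployment_variable(url, key, value, token):
    """PUT the new value, overwriting the existing secured variable."""
    body = json.dumps({"key": key, "value": value, "secured": True}).encode()
    request = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```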


&lt;p&gt;We can then use a CloudWatch event to capture all &lt;code&gt;PutSecretValue&lt;/code&gt; events via CloudTrail and invoke the sync Lambda function that is responsible for updating your Bitbucket deployment variables. If you use the &lt;a href="https://aws.amazon.com/en/cdk/" rel="noopener noreferrer"&gt;AWS CDK&lt;/a&gt;, for example, configure the CloudWatch event like this:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
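&lt;p&gt;Whatever infrastructure-as-code tool you use, the rule boils down to the same EventBridge event pattern, which matches the CloudTrail record of a &lt;code&gt;PutSecretValue&lt;/code&gt; call. Sketched here as a plain dictionary:&lt;/p&gt;

```python
# Event pattern matching PutSecretValue calls recorded by CloudTrail.
# A CDK Rule or a CloudFormation AWS::Events::Rule would carry this
# same pattern and target the sync Lambda function.
put_secret_value_pattern = {
    "source": ["aws.secretsmanager"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["secretsmanager.amazonaws.com"],
        "eventName": ["PutSecretValue"],
    },
}
```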


&lt;p&gt;If you are more of a fan of raw CloudFormation or SAM, the same setup is also possible in your favorite infrastructure-as-code tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about IAM roles?
&lt;/h2&gt;

&lt;p&gt;If you are looking for a solution that does not rely on access keys but instead uses IAM roles for authorization, I can highly recommend the solution to deploy on &lt;a href="https://support.atlassian.com/bitbucket-cloud/docs/deploy-on-aws-using-bitbucket-pipelines-openid-connect/" rel="noopener noreferrer"&gt;AWS using Bitbucket pipelines OIDC&lt;/a&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In order to use OpenID Connect on AWS, you will need to configure Pipelines as a Web Identity Provider, create an IAM role, and configure the build to assume the created role prior to running your build.&lt;br&gt;
Web Identity Providers allow the system to receive an authentication token, and then use or exchange that token for temporary security credentials in AWS. These temporary security credentials map to an IAM role with permissions to use the resources in your AWS account. &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_oidc.html" rel="noopener noreferrer"&gt;Learn more about Web Identity Providers from AWS&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Source: &lt;a href="https://support.atlassian.com/bitbucket-cloud/docs/deploy-on-aws-using-bitbucket-pipelines-openid-connect/" rel="noopener noreferrer"&gt;https://support.atlassian.com/bitbucket-cloud/docs/deploy-on-aws-using-bitbucket-pipelines-openid-connect/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Alexa says, time is over...see you next time. Happy to get your feedback, experience and thoughts in the comments. 👋 &lt;/p&gt;




&lt;p&gt;Cover Image: &lt;a href="https://unsplash.com/photos/DoWZMPZ-M9s" rel="noopener noreferrer"&gt;https://unsplash.com/photos/DoWZMPZ-M9s&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bitbucket</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>Learnings from creating recommendation engines with Amazon Personalize</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Thu, 08 Jul 2021 12:23:03 +0000</pubDate>
      <link>https://dev.to/cremich/learnings-from-creating-recommendation-engines-with-amazon-personalize-o24</link>
      <guid>https://dev.to/cremich/learnings-from-creating-recommendation-engines-with-amazon-personalize-o24</guid>
      <description>&lt;p&gt;Alexa...set a timer for 15 minutes. ⏳&lt;/p&gt;

&lt;p&gt;In the last year of working in my role as an AWS Solution Architect, I was involved in several projects to improve products by providing a more personalized experience. Recommendation engines are one good way to achieve such a personalized product experience. &lt;/p&gt;

&lt;p&gt;I want to give you some insights about my personal learnings on building recommendation engines using Amazon Personalize.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Amazon Personalize?
&lt;/h2&gt;

&lt;p&gt;As part of the AI Services, Amazon Personalize is a managed service that provides personalization and recommendations based on the same technology used at Amazon.com. The marketing claim further adds: "...with no ML experience required". In general that is true, but my personal experience is: having ML skills on board will boost your project. Let's dive deeper into this in the next chapters. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1xxbpdosjd7njms05l5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1xxbpdosjd7njms05l5.png" alt="AWS ML Stack" width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Amazon Personalize you get convenient APIs to solve a specific personalization and recommendation business problem. The trade-off of more convenience is less flexibility in the end. This does not have to be bad at all, but it is good to keep in mind when choosing which level of the AWS ML stack you want to focus your work on. &lt;/p&gt;

&lt;p&gt;Amazon Personalize is a fully managed service. It generates highly relevant recommendations using deep learning techniques and builds custom, private ML models using your own data. Private ML models means: your data and models are not shared to improve the Amazon Personalize service itself. Which makes GDPR compliance a bit easier 🥳&lt;/p&gt;

&lt;h2&gt;
  
  
  Three simple steps to your recommendation engine
&lt;/h2&gt;

&lt;p&gt;The cool thing about Amazon Personalize: it includes a fully managed ML pipeline to identify features, select the right hyperparameters, train, optimize and host your model, and provide a real-time feature store.&lt;/p&gt;

&lt;p&gt;Your main tasks as a consumer are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define your target schema of user interactions, item metadata and user metadata&lt;/li&gt;
&lt;li&gt;Preprocess or transform historical data to match your desired target schema&lt;/li&gt;
&lt;li&gt;Import your data into Amazon Personalize using a batch import&lt;/li&gt;
&lt;li&gt;Deploy your auto-trained recommendation model, resulting in an API endpoint&lt;/li&gt;
&lt;li&gt;Infer recommendations for your users via that endpoint &lt;/li&gt;
&lt;/ol&gt;
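&lt;p&gt;For step 1, the target schema is defined in Avro format. A minimal interactions schema, covering only the three mandatory fields, looks like this (the schema name and the boto3 registration are illustrative):&lt;/p&gt;

```python
import json

# The minimal interactions schema in the Avro format Amazon Personalize
# expects. USER_ID, ITEM_ID and TIMESTAMP are the mandatory interaction
# fields; additional fields such as EVENT_TYPE are optional.
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

schema_json = json.dumps(interactions_schema)

# Registering the schema needs AWS credentials, so it is only sketched here:
# personalize = boto3.client("personalize")
# personalize.create_schema(name="interactions-schema", schema=schema_json)
```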

&lt;p&gt;Sounds easy. But having worked on three recommendation engine projects in the past, I can tell you: there are a lot more things to consider besides those 5 steps. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F839blxz4kord42m40lnd.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F839blxz4kord42m40lnd.jpg" alt="Personalize ML workflow" width="800" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Amazon Personalize covers three foundational personalization use cases: user personalization, similar items and personalized ranking. &lt;/p&gt;
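&lt;p&gt;Once a recommendation model is deployed behind a campaign, inference is a single API call. A sketch with boto3, where the campaign ARN and user id are placeholders:&lt;/p&gt;

```python
def extract_item_ids(response: dict) -> list:
    """Pull the recommended item ids out of a get_recommendations response."""
    return [item["itemId"] for item in response["itemList"]]


def recommend(campaign_arn: str, user_id: str, num_results: int = 10) -> list:
    import boto3  # imported lazily; the call needs AWS credentials to run

    runtime = boto3.client("personalize-runtime")
    response = runtime.get_recommendations(
        campaignArn=campaign_arn,
        userId=user_id,
        numResults=num_results,
    )
    return extract_item_ids(response)
```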

&lt;h2&gt;
  
  
  From PoC to MVP
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Data analysis and feature engineering
&lt;/h3&gt;

&lt;p&gt;Before you just import your historical data, it is recommended to gather knowledge, both about your data and about your business domain. Every recommendation engine project is somewhat unique if we look at the data we have to process and the way the business works. &lt;/p&gt;

&lt;p&gt;In a very first step during a proof-of-concept phase, it is all about finding answers to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What data do we need?&lt;/li&gt;
&lt;li&gt;How do we get that data?&lt;/li&gt;
&lt;li&gt;How do we identify a user?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Amazon Personalize is able to collect data from three main data sources: user metadata, item metadata and user interactions. Personalize needs at least interaction data to operate. Item and user data are not strictly required, but having them will improve the recommendation output. &lt;/p&gt;

&lt;p&gt;So even if you do not need data science skills to use Amazon Personalize, it is really helpful to have those skills on your team for data analysis and preparation. &lt;/p&gt;

&lt;p&gt;Strong collaboration within your team enables you to get quick answers to the above questions. Existing web analytics or user tracking data is often a very interesting source of interaction data, at least as a starting point for creating an initial, production-ready recommendation model. &lt;/p&gt;

&lt;p&gt;Data analysis might also reveal additional challenges. If data quality is poor, providing good recommendations from the start will likely not be an easy task. Making wrong assumptions might lead to biased datasets, which will influence recommendations. In a nutshell: data analysis and the preceding feature engineering are key to understanding the context you are in. &lt;/p&gt;

&lt;h3&gt;
  
  
  Define your KPIs
&lt;/h3&gt;

&lt;p&gt;Without knowing the exact business value you want to provide and how to measure success, it can be very hard to estimate whether the solution you provide is successful. Personalized recommendations mean: you get different content than I do. Looking just at my result, it is very hard to debug and judge whether the recommendations I or one of my colleagues get are sufficient. Simply trusting that Amazon Personalize will do a good job might also blur your results. &lt;/p&gt;

&lt;p&gt;You have to define your KPIs upfront. What makes your recommendation engine successful? Is it an increase in session duration? An increase in read articles per user per session? An increase in sold products? Is retention the right KPI?&lt;/p&gt;

&lt;p&gt;Defining the right KPI is not an easy task, and doing it too late makes this process even harder. So my recommendation is: do it as the very first step and align with your business stakeholders on which KPIs make sense. &lt;/p&gt;

&lt;h3&gt;
  
  
  A/B testing
&lt;/h3&gt;

&lt;p&gt;A/B testing is mandatory when you consider building a recommendation engine, and it should be prepared right from the start of a project. Otherwise, things can get unnecessarily complex if we blindly pick an out-of-the-box solution without looking at what is really needed for a given workload.&lt;/p&gt;

&lt;p&gt;A/B testing is a common technique for comparing the efficiency of different recommendation strategies, and it should support us in getting answers based on our KPIs. By applying A/B testing you are able to compare different kinds of user journeys:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;user journeys with recommendations vs. without recommendations, to measure the impact of recommendations on your KPIs.&lt;/li&gt;
&lt;li&gt;two recommendation models against each other, to decide which one performs better.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My personal conclusion after the past months of working in this field: &lt;a href="https://dev.to/cremich/all-you-need-is-love-and-data-317l"&gt;data is key&lt;/a&gt;. If you want to use data to make better decisions quickly, you have to work on the accessibility, semantics and quality of your data. &lt;/p&gt;

&lt;p&gt;Analytics has to scale, and here I do not mean the technical side of scaling. It has to scale organizationally, along business requirements and the frequency of change. Focusing on AI/ML use cases introduces a lot of change, as you have to explore and experiment with data to measure your success. The more and better data you have, the better your foundation for experiments. Introducing experiments and A/B testing might also require a change in organizational culture or people's mindsets. &lt;/p&gt;

&lt;h2&gt;
  
  
  Kick start your proof of concept
&lt;/h2&gt;

&lt;p&gt;During my learning path, I collected several resources that helped me a lot to understand more about Personalize and to use existing implementations to set up a working recommendation engine in less than an hour. This is very helpful in a proof-of-concept phase to get fast results. &lt;/p&gt;

&lt;p&gt;Below you find a curated list of helpful resources on Github and the AWS blog.&lt;/p&gt;

&lt;h3&gt;
  
  
  Github
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/aws-samples/amazon-personalize-data-conversion-pipeline" rel="noopener noreferrer"&gt;Amazon Personalize data conversion pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws-samples/amazon-personalize-automated-retraining" rel="noopener noreferrer"&gt;Amazon Personalize automated retraining&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws-samples/amazon-personalize-ingestion-pipeline" rel="noopener noreferrer"&gt;Amazon Personalize ingestion pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws-samples/amazon-personalize-monitor" rel="noopener noreferrer"&gt;Amazon Personalize Monitor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws-samples/amazon-personalize-samples" rel="noopener noreferrer"&gt;Amazon Personalize samples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws-samples/amazon-personalize-samples/blob/master/PersonalizeCheatSheet2.0.md" rel="noopener noreferrer"&gt;Amazon Personalize cheat sheet&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AWS Blog
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/de/blogs/machine-learning/selecting-the-right-metadata-to-build-high-performing-recommendation-models-with-amazon-personalize/" rel="noopener noreferrer"&gt;Selecting the right metadata to build high-performing recommendation models with Amazon Personalize&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/de/blogs/machine-learning/using-a-b-testing-to-measure-the-efficacy-of-recommendations-generated-by-amazon-personalize/" rel="noopener noreferrer"&gt;Using A/B testing to measure the efficacy of recommendations generated by Amazon Personalize&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/de/blogs/machine-learning/automating-amazon-personalize-solution-using-the-aws-step-functions-data-science-sdk/" rel="noopener noreferrer"&gt;Automating an Amazon Personalize solution using the AWS Step Functions Data Science SDK&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/de/blogs/architecture/automating-recommendation-engine-training-with-amazon-personalize-and-aws-glue/" rel="noopener noreferrer"&gt;Automating Recommendation Engine Training with Amazon Personalize and AWS Glue&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Workshops
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://personalization-immersionday.workshop.aws/en/" rel="noopener noreferrer"&gt;Amazon Personalize Immersion Day&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.sagemakerworkshop.com/personalize/" rel="noopener noreferrer"&gt;Amazon Sagemaker/Personalize Workshop&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  My top three learnings
&lt;/h2&gt;

&lt;p&gt;Amazon Personalize is an awesome service, and I am very impressed by the results. But Personalize can only provide good recommendations if you keep some things in mind. Here are my top three learnings. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scope out the business problem you want to solve&lt;/li&gt;
&lt;li&gt;Define your KPIs to measure success and implement A/B testing&lt;/li&gt;
&lt;li&gt;Analyze your data and gather as much knowledge as you can about the business domain and the business problem. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Alexa says, time is over...see you next time. Happy to get your feedback, experience and thoughts in the comments. 👋 &lt;/p&gt;




&lt;p&gt;Image Sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://personalization-immersionday.workshop.aws/en/getting-started.html" rel="noopener noreferrer"&gt;Cover Image&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/de/blogs/machine-learning/using-a-b-testing-to-measure-the-efficacy-of-recommendations-generated-by-amazon-personalize/" rel="noopener noreferrer"&gt;Personalize ML workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.toAWS%20Immersion%20day%20slides"&gt;AWS ML Stack&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>personalize</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>All you need is love ❤️ and data 🤖</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Thu, 10 Jun 2021 13:08:12 +0000</pubDate>
      <link>https://dev.to/cremich/all-you-need-is-love-and-data-317l</link>
      <guid>https://dev.to/cremich/all-you-need-is-love-and-data-317l</guid>
      <description>&lt;p&gt;Alexa...set a timer for 15 minutes. ⏳&lt;/p&gt;

&lt;p&gt;A few days ago I had a chat with a former colleague that inspired me to write my first official lightning post article about it. He asked me via WhatsApp:&lt;/p&gt;

&lt;p&gt;Hey Christian, how are you? You must be very familiar with Lambda as an AWS solution architect? I have a question. &lt;/p&gt;

&lt;p&gt;The question was targeted towards a service quota of lambda for the maximum number of parallel executions. Giving the answer on the exact question was very easy. I said:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Yeah there is a default quota of 1000 concurrent executions. But you can increase it if you need to via a support ticket.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Well, that could have been the end of the story, but I was interested to know more about why they hit those limits, and we got into a very interesting conversation. &lt;/p&gt;

&lt;h2&gt;
  
  
  What about data 🤖 ?
&lt;/h2&gt;

&lt;p&gt;The conversation revealed a lot about what I spend my professional day on.&lt;/p&gt;

&lt;p&gt;My job as an AWS Solution Architect at DFL Digital Sports is to help teams find an optimal solution to a (business) problem. Or as Maria Ane Dias (Solution Architect at AWS) describes it very aptly on &lt;a href="https://www.instagram.com/p/CPbMGkojzC0/?utm_source=ig_web_copy_link" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I help customers understand how they can innovate by working backwards from their challenges, drawing and reviewing architectures, and being a strong support and advisory to them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Bam 💥&lt;/p&gt;

&lt;p&gt;And one of the most exciting challenges in my work: for every problem there are many possible solutions.&lt;/p&gt;

&lt;p&gt;To make an informed decision about possible solutions, I need a lot of background information. I have to put myself in the shoes of others and ask a lot of questions to understand things and contexts.&lt;/p&gt;

&lt;p&gt;The reason my buddy was actually evaluating AWS Lambda was that they have trouble handling traffic peaks after sending out push notifications to users. And my buddy seemed kind of disappointed that a serverless approach using AWS Lambda did not solve the problem out of the box. &lt;/p&gt;

&lt;p&gt;So our discussion afterwards was a lot about topics apart from AWS Lambda. I asked a lot of questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you have enough data to predict peak loads?&lt;/li&gt;
&lt;li&gt;Can the response somehow be cached?&lt;/li&gt;
&lt;li&gt;What percentage of users are logged in on average?&lt;/li&gt;
&lt;li&gt;Is it possible to segment users and send the notification in bulks?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, in order to be able to show solution options, &lt;br&gt;
a) I have to have a good technical understanding,&lt;br&gt;
b) but more importantly, I have to understand exactly how the problem is structured.&lt;/p&gt;

&lt;p&gt;In the end, the discussion was less about AWS Lambda. It was more about the context of the application. About background information on the problem. &lt;/p&gt;

&lt;p&gt;I asked myself 🤔:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maybe AWS Lambda is a less than optimal choice of technology here?&lt;/li&gt;
&lt;li&gt;Are there other approaches in terms of integrating the systems in charge?&lt;/li&gt;
&lt;li&gt;How solid is the data basis for making informed decisions here?&lt;/li&gt;
&lt;li&gt;Is the problem a general one or does it only occur sporadically?&lt;/li&gt;
&lt;li&gt;What influences that the problem occurs?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data and knowledge help me in my daily work to support my colleagues in the best possible way: to make data-driven decisions and reduce the number of hypotheses. Technology must never be an end in itself. &lt;/p&gt;

&lt;p&gt;In my eyes, the choice of technology must always be based on supporting a specific problem. If we've achieved that, we've done everything right. 💡&lt;/p&gt;

&lt;p&gt;Coming back to the problem of my former colleague, I recommended looking at whether the newly released capability of &lt;a href="https://aws.amazon.com/de/blogs/compute/scaling-your-applications-faster-with-ec2-auto-scaling-warm-pools/" rel="noopener noreferrer"&gt;EC2 warm pools&lt;/a&gt; could help before refactoring towards AWS Lambda. Or whether they can reduce the pressure by caching responses at edge locations. Or whether something like &lt;a href="https://aws.amazon.com/de/blogs/aws/new-predictive-scaling-for-ec2-powered-by-machine-learning/" rel="noopener noreferrer"&gt;predictive scaling&lt;/a&gt; of EC2 could also work for them. Maybe it is "shooting at sparrows with cannons", but in this short time it was not possible for me to dig deeper into the status quo of his current architecture and future plans. &lt;/p&gt;

&lt;h2&gt;
  
  
  And what about love ❤️ ?
&lt;/h2&gt;

&lt;p&gt;I love what I do. I believe that is another important factor to inspire people. I am a passionate software developer, software architect and fan of technology. I don't care if the software is written in JavaScript, TypeScript, Java or PHP. Likewise whether it runs serverless, container-based or on premise. &lt;/p&gt;

&lt;p&gt;I am constantly trying to broaden my horizon. This is the only way I can neutrally look at a wide range of options and then come to an informed decision. &lt;/p&gt;

&lt;p&gt;Passion for what you do is everything! &lt;/p&gt;

&lt;p&gt;Alexa says, time is over...see you next time. Happy to get your feedback and thoughts in the comments. 👋 &lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>serverless</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Introducing lightning posts</title>
      <dc:creator>Christian Bonzelet</dc:creator>
      <pubDate>Tue, 08 Jun 2021 11:44:27 +0000</pubDate>
      <link>https://dev.to/cremich/introducing-lightning-posts-2m5e</link>
      <guid>https://dev.to/cremich/introducing-lightning-posts-2m5e</guid>
      <description>&lt;p&gt;Alexa...set a timer for 15 minutes. ⏳&lt;/p&gt;

&lt;p&gt;Admittedly, I've made many attempts to blog in the past. Ultimately, in almost all cases, it has failed due to lack of time. But somehow, some past encounters with different people have managed to make me try again. &lt;/p&gt;

&lt;p&gt;But this time I want to approach the subject differently. What I certainly can't achieve: gaining more time. But I dare to try different techniques to divide my time better. &lt;/p&gt;

&lt;p&gt;The first encounter that made me want to blog was during a job interview. I saw that the applicant was blogging and asked him: &lt;strong&gt;"What is your motivation to blog?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;He told me, meaningfully: &lt;strong&gt;"I use so much from open source communities every day. It's time to give something back".&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Wow! That immediately triggered me. 🤩&lt;/p&gt;

&lt;p&gt;But how can I "give back" meaningful content on the one hand and not sacrifice several hours of my time for it on the other hand? &lt;/p&gt;

&lt;p&gt;I am daring an attempt that I call "lightning posts". &lt;/p&gt;

&lt;p&gt;It is based on the technique of "lightning talks", which I already know from various conferences: I set a timer for 15 minutes and start writing. &lt;/p&gt;

&lt;h2&gt;
  
  
  What are lightning talks? ⚡️
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Lightning_talk" rel="noopener noreferrer"&gt;Wikipedia&lt;/a&gt; says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A lightning talk is a very short presentation lasting only a few minutes, given at a conference or similar forum. Several lightning talks will usually be delivered by different speakers in a single session, sometimes called a data blitz.&lt;br&gt;
Some formats of lightning talk, including PechaKucha and Ignite, involve a specific number of slides that are automatically advanced at fixed intervals. Lightning talks are often referred to as ignite talks. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In my own words, it is about reducing a specific issue, topic or problem to its essence and explaining it to an audience in a short time. &lt;/p&gt;

&lt;p&gt;Of course, I have thought about a topic in advance. And there are already many ideas circulating in my head about how to structure the topic accordingly. &lt;/p&gt;

&lt;p&gt;So the biggest challenge is to focus and just get started. At the same time, I prepare myself for the fact that not everything will run perfectly from the start. I'm also adjusting to the fact that the amount of content per post probably won't be as detailed. But is that necessarily a bad thing? When I read blog posts, I tend to like ones that get to the point quickly. Because even when I'm reading, I feel that the time I can invest is simply limited.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inspired by the crazy 🎱
&lt;/h2&gt;

&lt;p&gt;Another source of inspiration was a design sprint I recently participated in. There we used the brainstorming method "&lt;a href="https://conceptboard.com/de/blog/crazy-8s-brainstorming-template/" rel="noopener noreferrer"&gt;crazy 8&lt;/a&gt;". Each team member has eight minutes to visualize possible solutions to a problem on eight fields of a sheet of paper. I thought that was very cool! The method ensured maximum focus on a concrete problem within those eight minutes. &lt;/p&gt;

&lt;p&gt;The obvious thing about both formats (lightning talks and crazy 8): they use the technique of "timeboxing". I try to get the maximum out of a defined time box. It does not necessarily have to be perfect; the journey is the goal, and you can always improve later. &lt;/p&gt;

&lt;p&gt;I'm curious myself how the experiment is going. And I look forward to your feedback in the comments. If you come across things that I should formulate more precisely, please let me know.&lt;/p&gt;

&lt;p&gt;👋&lt;/p&gt;

</description>
      <category>agile</category>
      <category>blogging</category>
    </item>
  </channel>
</rss>
