<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chat Rathnayake</title>
    <description>The latest articles on DEV Community by Chat Rathnayake (@chat_r).</description>
    <link>https://dev.to/chat_r</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2802585%2F355868e9-5620-4b3c-b95d-7142cc1ebafd.jpg</url>
      <title>DEV Community: Chat Rathnayake</title>
      <link>https://dev.to/chat_r</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chat_r"/>
    <language>en</language>
    <item>
      <title>Designing a Scalable Shuffle Service for Big Data on AWS</title>
      <dc:creator>Chat Rathnayake</dc:creator>
      <pubDate>Wed, 05 Feb 2025 21:46:30 +0000</pubDate>
      <link>https://dev.to/chat_r/designing-a-scalable-shuffle-service-for-big-data-on-aws-58mo</link>
      <guid>https://dev.to/chat_r/designing-a-scalable-shuffle-service-for-big-data-on-aws-58mo</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Shuffle operations are a critical component of distributed data processing frameworks like Apache Spark. The "Magnet: Push-based Shuffle Service" paper introduces an optimized shuffle service for large-scale Spark workloads, reducing shuffle overhead and improving efficiency. In this post, we design a fully AWS-native approach to solving similar shuffle performance bottlenecks using Amazon EMR, S3, and other AWS services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the Shuffle Problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In distributed computing, shuffle is the process of transferring intermediate data between map and reduce tasks. Traditional Spark shuffle operations suffer from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Disk and Network Bottlenecks:&lt;/strong&gt; Reading and writing shuffle data incurs high I/O costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability Issues:&lt;/strong&gt; Large-scale clusters experience significant delays in shuffle-heavy jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Skew:&lt;/strong&gt; Uneven distribution of keys across partitions leads to straggler tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "Magnet" approach improves shuffle efficiency by merging small shuffle data files into large blocks and proactively co-locating data closer to compute resources. We aim to achieve similar optimizations using AWS-native solutions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Choosing the Right AWS Architecture&lt;/strong&gt;&lt;br&gt;
AWS offers various services that can be leveraged to build a scalable shuffle service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EMR:&lt;/strong&gt; A managed big data platform for running Apache Spark, Presto, and Hadoop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon S3:&lt;/strong&gt; Object storage that can serve as external shuffle storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon FSx for Lustre:&lt;/strong&gt; A high-performance distributed file system that can reduce shuffle read/write overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Glue:&lt;/strong&gt; A serverless ETL service with optimized data processing pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EC2 Placement Groups:&lt;/strong&gt; Reduce network latency by ensuring physical proximity of compute nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For this design, we will use &lt;strong&gt;Amazon EMR with S3-backed shuffle storage and FSx for Lustre for high-speed temporary data handling.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Configuring Amazon EMR for Optimized Shuffle&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;a) Selecting the Right EC2 Instances&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For shuffle-heavy workloads, we recommend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compute-Optimized (C5, C6g) Instances:&lt;/strong&gt; High CPU throughput for computation-heavy stages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory-Optimized (R5, R6g) Instances:&lt;/strong&gt; Larger heaps reduce disk spill during shuffle operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage-Optimized (I3, I4i) Instances:&lt;/strong&gt; Provide NVMe SSD storage for fast access to temporary shuffle data.&lt;/li&gt;
&lt;/ul&gt;
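
&lt;p&gt;As a rough sketch (the instance types, counts, name, and release label below are illustrative rather than a tested sizing), these choices can be expressed directly as EMR instance groups:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative sizing: compute-optimized master, storage-optimized
# core nodes whose NVMe SSDs absorb shuffle spill
aws emr create-cluster \
    --name shuffle-heavy-demo \
    --release-label emr-6.15.0 \
    --applications Name=Spark \
    --use-default-roles \
    --instance-groups \
      InstanceGroupType=MASTER,InstanceType=c5.xlarge,InstanceCount=1 \
      InstanceGroupType=CORE,InstanceType=i3.2xlarge,InstanceCount=4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;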

&lt;p&gt;&lt;strong&gt;b) Using FSx for Lustre as Shuffle Storage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Amazon EMR can be configured to use &lt;strong&gt;FSx for Lustre&lt;/strong&gt; instead of local disks for temporary shuffle data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws emr create-cluster \
    --applications Name=Spark \
    --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \
    --instance-type c5.4xlarge \
    --instance-count 5 \
    --use-default-roles \
    --configurations '[{"Classification":"spark-defaults","Properties":{"spark.local.dir":"/fsx"}}]'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pointing &lt;code&gt;spark.local.dir&lt;/code&gt; at the FSx mount redirects temporary shuffle data to &lt;strong&gt;FSx for Lustre&lt;/strong&gt;, reducing local-disk bottlenecks.&lt;/p&gt;
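
&lt;p&gt;The file system itself is not mounted by the command above. A minimal bootstrap-action sketch for that step, assuming Amazon Linux 2 (the file system ID, region, and mount name are placeholders for your own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/bash
# Hypothetical bootstrap action: install the Lustre client and mount
# an existing FSx for Lustre file system at /fsx. fs-XXXXXX, the
# region, and MOUNTNAME are placeholders for your file system.
sudo amazon-linux-extras install -y lustre
sudo mkdir -p /fsx
sudo mount -t lustre -o relatime,flock \
    fs-XXXXXX.fsx.us-east-1.amazonaws.com@tcp:/MOUNTNAME /fsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;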

&lt;p&gt;&lt;strong&gt;c) Enabling Dynamic Resource Allocation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dynamic resource allocation optimizes resource usage by scaling executors based on load:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.minExecutors 2
spark.dynamicAllocation.maxExecutors 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Optimizing Data Locality and Network Efficiency&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;a) Using Cluster Placement Groups&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By placing EMR instances in &lt;strong&gt;cluster placement groups&lt;/strong&gt;, we minimize inter-node network latency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 create-placement-group --group-name spark-shuffle --strategy cluster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Attach this placement group when launching EC2 instances for the EMR cluster.&lt;/p&gt;
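
&lt;p&gt;As an illustration, a standalone EC2 node can be launched into the group like this (the AMI ID is a placeholder; pick one appropriate for your region):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# ami-XXXXXX is a placeholder AMI ID
aws ec2 run-instances \
    --image-id ami-XXXXXX \
    --instance-type c5.4xlarge \
    --count 1 \
    --placement GroupName=spark-shuffle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;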

&lt;p&gt;&lt;strong&gt;b) Using S3 for External Shuffle Storage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Spark's shuffle files themselves stay on local storage, but EMRFS and the S3A committers let job output and intermediate datasets be written to &lt;strong&gt;Amazon S3&lt;/strong&gt; rather than kept on instance disks, reducing the impact of node failures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spark.hadoop.fs.s3a.committer.magic.enabled true
spark.shuffle.io.preferDirectBufs true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These settings speed up &lt;strong&gt;writes to S3&lt;/strong&gt; by avoiding slow copy-based rename commits, and keep shuffle network transfers in off-heap buffers.&lt;/p&gt;
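
&lt;p&gt;The same settings can also be supplied per job at submit time; a brief sketch (the bucket and application path are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# s3://my-bucket/... paths are placeholders
spark-submit \
    --conf spark.hadoop.fs.s3a.committer.magic.enabled=true \
    --conf spark.hadoop.fs.s3a.committer.name=magic \
    --conf spark.shuffle.io.preferDirectBufs=true \
    s3://my-bucket/jobs/etl_job.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;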

&lt;p&gt;&lt;strong&gt;Step 4: Monitoring and Auto-Scaling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;a) Enabling CloudWatch for Shuffle Monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Set up &lt;strong&gt;CloudWatch metrics&lt;/strong&gt; to monitor shuffle-related performance issues:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws cloudwatch put-metric-alarm \
    --alarm-name ShuffleDiskUsageHigh \
    --metric-name DiskReadBytes \
    --namespace AWS/EMR \
    --statistic Average --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --evaluation-periods 2 \
    --period 300 \
    --actions-enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This raises an alarm when utilization averages above 80% for two consecutive five-minute periods.&lt;/p&gt;
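
&lt;p&gt;To actually receive a notification, create an SNS topic, subscribe to it, and pass its ARN to the alarm's &lt;code&gt;--alarm-actions&lt;/code&gt; flag (the topic name, region, account ID, and email below are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Placeholder topic name, region, account ID, and email address
aws sns create-topic --name shuffle-alerts
aws sns subscribe \
    --topic-arn arn:aws:sns:us-east-1:123456789012:shuffle-alerts \
    --protocol email \
    --notification-endpoint ops@example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;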

&lt;p&gt;&lt;strong&gt;b) Auto-Scaling EMR Cluster&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cluster capacity can be adjusted on the fly. A one-off manual resize of an instance group looks like this (managed scaling, shown below, automates the adjustment):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws emr modify-instance-groups --cluster-id j-XXXXXX --instance-groups InstanceGroupId=ig-XXXXXX,InstanceCount=50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
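
&lt;p&gt;To automate this instead of resizing by hand, EMR managed scaling adjusts capacity within bounds you set (the cluster ID and capacity limits are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Placeholder cluster ID and capacity bounds
aws emr put-managed-scaling-policy \
    --cluster-id j-XXXXXX \
    --managed-scaling-policy \
    ComputeLimits='{MinimumCapacityUnits=2,MaximumCapacityUnits=50,UnitType=Instances}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;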



&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;br&gt;
By leveraging AWS services like Amazon EMR, FSx for Lustre, and S3, we can design a highly scalable shuffle service that minimizes disk bottlenecks and improves network efficiency. This approach provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost savings&lt;/strong&gt; by dynamically scaling compute resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved performance&lt;/strong&gt; through data locality and optimized shuffle operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resilience&lt;/strong&gt; by keeping durable data in S3 rather than on failure-prone local disks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This AWS-native solution effectively addresses large-scale shuffle challenges faced in big data workloads. 🚀&lt;/p&gt;

&lt;p&gt;Would you implement this in your next Spark project? Let’s discuss in the comments! 👇&lt;/p&gt;

</description>
      <category>aws</category>
      <category>spark</category>
      <category>bigdata</category>
      <category>cloudnative</category>
    </item>
    <item>
      <title>Amazon Aurora DB Architecture at a Glance</title>
      <dc:creator>Chat Rathnayake</dc:creator>
      <pubDate>Wed, 05 Feb 2025 00:05:20 +0000</pubDate>
      <link>https://dev.to/chat_r/amazon-aurora-db-architecture-at-a-glance-mkk</link>
      <guid>https://dev.to/chat_r/amazon-aurora-db-architecture-at-a-glance-mkk</guid>
      <description>&lt;p&gt;Amazon Aurora is Amazon Web Services' (AWS) relational database service tailored for Online Transaction Processing (OLTP) workloads. Its architecture addresses the shift in performance bottlenecks from compute and storage to network constraints in high-throughput data processing. By offloading redo log processing to a multi-tenant, scale-out storage service, Aurora reduces network traffic and enhances performance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj7628shvmlf9fakm6ijf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj7628shvmlf9fakm6ijf.jpg" alt="Aurora Architecture Bird Eye View" width="474" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decoupling Compute and Storage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In traditional database systems, compute and storage are tightly integrated, leading to potential bottlenecks. Aurora decouples these components, allowing for independent scaling and improved resilience. This separation enables efficient handling of tasks such as replacing faulty hosts, adding replicas, and scaling database instances.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optimized Redo Log Processing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Aurora's architecture focuses on efficient redo log processing. By pushing redo processing to a distributed storage service, it minimizes network I/O operations. This approach not only reduces traffic but also allows for rapid crash recovery and seamless failovers without data loss.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07br1yuwkdcp7xu5y3km.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07br1yuwkdcp7xu5y3km.png" alt="Move logging and storage off the database engine" width="444" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quorum-Based Storage for Durability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To ensure data durability, Aurora employs a quorum-based storage system. Data is replicated across multiple nodes, and consensus is achieved through an efficient asynchronous protocol. This design ensures resilience against failures and maintains data integrity.&lt;/p&gt;
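
&lt;p&gt;Concretely, the Aurora paper uses V = 6 copies of each data segment, two in each of three Availability Zones, with a write quorum of Vw = 4 and a read quorum of Vr = 3. These satisfy the standard quorum conditions Vr + Vw &amp;gt; V (3 + 4 &amp;gt; 6, so any read quorum overlaps the latest write) and Vw &amp;gt; V/2 (4 &amp;gt; 3, so two conflicting writes cannot both succeed). As a result, Aurora can still read, and repair itself, after losing an entire AZ plus one more node, and can keep writing after losing any two nodes.&lt;/p&gt;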

&lt;p&gt;&lt;strong&gt;Eliminating Multi-Phase Synchronization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional databases often rely on multi-phase synchronization protocols like two-phase commit, which can introduce latency and complexity. Aurora eliminates the need for such protocols by leveraging its distributed storage architecture, resulting in faster transaction commits and reduced system complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lessons from Production Deployment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Operating Aurora as a production service has provided valuable insights into modern cloud application requirements. Key takeaways include the importance of efficient metadata management for databases with numerous tables, the need to support high numbers of concurrent connections, and the necessity of facilitating frequent schema migrations with minimal downtime.&lt;/p&gt;

&lt;p&gt;In summary, Amazon Aurora's design addresses the challenges of high-throughput, cloud-native relational databases by decoupling compute and storage, optimizing redo log processing, ensuring data durability through quorum-based storage, and eliminating the need for complex synchronization protocols.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Further Reading&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Verbitski, A., et al. (2017). Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases. SIGMOD '17. &lt;a href="https://assets.amazon.science/dc/2b/4ef2b89649f9a393d37d3e042f4e/amazon-aurora-design-considerations-for-high-throughput-cloud-native-relational-databases.pdf" rel="noopener noreferrer"&gt;Amazon Science&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>aurora</category>
      <category>database</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
