Deep Dive on Amazon Elastic MapReduce Service Platform with Amazon EC2 Instance

“I went through the AWS documentation to take a deep dive into the Amazon Elastic MapReduce (Amazon EMR) platform running on Amazon EC2 instances. In terms of cost, you pay for the EMR service, for S3 storage and the data transferred in and out of the bucket, and for the Amazon EC2 instances.”

Amazon Elastic MapReduce (Amazon EMR) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. Using these frameworks and related open-source projects, you can process data for analytics and business intelligence workloads. Amazon EMR also lets you transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

In this post, you will take a deep dive into the Amazon EMR platform running on Amazon EC2 instances. For this walkthrough, I created an Amazon EMR cluster together with IAM roles, a key pair, and an S3 bucket.

Architecture Overview

The architecture diagram shows the overall deployment architecture and data flow across the Amazon EMR service, the S3 bucket, the IAM service role, and the EC2 instances.

Solution overview

The blog post consists of the following phases:

  1. Create an Amazon EMR Cluster with the Required Configurations

  2. Submit a Spark Application as a Step and Review the EMR Cluster Output

Phase 1: Create an Amazon EMR Cluster with the Required Configurations

  1. Create a key pair, IAM roles, and an S3 bucket containing the required data. Open the Amazon EMR console and create a cluster using the "Amazon EMR running on Amazon EC2" option. Name the cluster Emr Cluster and set the required parameters: applications, instance type, EBS volume, networking, S3 location for cluster logs, key pair, EMR service role, and EC2 instance profile for EMR.
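The console choices above can also be sketched in code. The following is a minimal sketch, not the exact configuration used in this post: it builds a request shaped for the boto3 EMR client's `run_job_flow` call. The release label, instance counts, role names (`EMR_DefaultRole`, `EMR_EC2_DefaultRole`), and the bucket and key pair names are assumptions for illustration; only the cluster name and the m5.xlarge instance type come from this post.

```python
import json


def build_cluster_request(log_bucket, key_pair_name, subnet_id=None):
    """Return keyword arguments shaped for boto3's EMR run_job_flow call."""
    instances = {
        "InstanceGroups": [
            {
                "Name": "Primary",
                "InstanceRole": "MASTER",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,  # assumption: one primary node
                "Market": "ON_DEMAND",
            },
            {
                "Name": "Core",
                "InstanceRole": "CORE",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,  # assumption: one core node
                "Market": "ON_DEMAND",
            },
        ],
        "Ec2KeyName": key_pair_name,
        # Keep the cluster running so that steps can be added later.
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    }
    if subnet_id:
        instances["Ec2SubnetId"] = subnet_id  # networking choice

    return {
        "Name": "Emr Cluster",
        "ReleaseLabel": "emr-6.15.0",  # assumption: pick the release you need
        "Applications": [{"Name": "Spark"}],  # choice of application
        "Instances": instances,
        "LogUri": f"s3://{log_bucket}/logs/",  # cluster logs S3 location
        "ServiceRole": "EMR_DefaultRole",  # assumption: EMR service role name
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumption: EC2 instance profile
        "VisibleToAllUsers": True,
    }


def create_cluster(request, region_name):
    """Send the request to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without boto3

    emr = boto3.client("emr", region_name=region_name)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    # Hypothetical bucket and key pair names; replace with your own.
    request = build_cluster_request("my-emr-bucket", "my-key-pair")
    print(json.dumps(request, indent=2))
```

Keeping the request builder separate from the AWS call makes it possible to review the configuration before any billable resource is created.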

[Screenshots: creating and configuring the cluster in the Amazon EMR console]

Phase 2: Submit a Spark Application as a Step and Review the EMR Cluster Output
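Submitting a Spark application as a step can be sketched as follows. This is a minimal sketch under stated assumptions: the script location, arguments, and step name are hypothetical, since the post does not name the application. The step uses `command-runner.jar` to run `spark-submit`, and the dictionary is shaped for the boto3 EMR client's `add_job_flow_steps` call.

```python
def build_spark_step(script_uri, args=(), name="Spark application"):
    """Return one step definition that runs spark-submit on the cluster."""
    return {
        "Name": name,
        # Assumption: keep the cluster alive if the step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode",
                "cluster",
                script_uri,
                *args,
            ],
        },
    }


def submit_step(cluster_id, step, region_name):
    """Add the step to a running cluster. Requires boto3 and credentials."""
    import boto3  # imported lazily so the builder works without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


if __name__ == "__main__":
    # Hypothetical S3 paths; replace with your own script and data locations.
    step = build_spark_step(
        "s3://my-emr-bucket/scripts/job.py",
        args=["--output_uri", "s3://my-emr-bucket/output/"],
    )
    print(step)
```

Once the step completes, its output and logs can be checked in the S3 locations passed to the script and configured for the cluster.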

[Screenshots: submitting the Spark application as a step and reviewing the output]

Clean-up

Delete the Amazon EMR cluster, IAM roles, S3 bucket, and key pair.
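The clean-up can also be scripted. The sketch below is a hedged illustration rather than a complete teardown: it takes already-created boto3 clients as parameters and covers the cluster, key pair, and S3 bucket. It assumes the bucket is not versioned, and it leaves out IAM roles, which first need their policies detached and instance profiles removed. The cluster ID, key pair name, and bucket name in the usage example are hypothetical.

```python
def empty_and_delete_bucket(s3, bucket):
    """Delete all objects in a non-versioned bucket, then the bucket itself."""
    while True:
        # Each call returns at most 1,000 keys, which is also the
        # maximum number of keys delete_objects accepts per request.
        page = s3.list_objects_v2(Bucket=bucket)
        contents = page.get("Contents", [])
        if not contents:
            break
        s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": obj["Key"]} for obj in contents]},
        )
    s3.delete_bucket(Bucket=bucket)


def clean_up(emr, ec2, s3, cluster_id, key_pair_name, bucket):
    """Terminate the cluster, then remove the key pair and the bucket."""
    emr.terminate_job_flows(JobFlowIds=[cluster_id])
    ec2.delete_key_pair(KeyName=key_pair_name)
    empty_and_delete_bucket(s3, bucket)
    # IAM roles are intentionally not handled here (see the note above).


if __name__ == "__main__":
    import boto3  # requires boto3 and valid AWS credentials

    clean_up(
        boto3.client("emr"),
        boto3.client("ec2"),
        boto3.client("s3"),
        cluster_id="j-XXXXXXXXXXXXX",  # hypothetical cluster ID
        key_pair_name="my-key-pair",  # hypothetical key pair name
        bucket="my-emr-bucket",  # hypothetical bucket name
    )
```

Terminating the cluster first matters most for cost, since the EMR and EC2 charges stop accruing only once the instances are gone.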

Pricing

Here is the pricing and estimated cost of this example.

Cost of Amazon EMR = $0.048 per hour for EMR m5.xlarge x 1.086 hours = $0.05

Cost of Amazon Elastic Compute Cloud (Amazon EC2) = $0.192 per On-Demand Linux m5.xlarge instance hour x 1.334 hours = $0.26

Cost of Amazon Simple Storage Service (Amazon S3) = $0.00

Total Cost = $0.05 + $0.26 + $0.00 = $0.31
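The arithmetic above can be reproduced with a few lines of Python. This is a small sketch that takes the hourly rates and usage figures from this post as given and treats the usage figures as hours. Actual rates vary by region and over time.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates and usage as listed in this post (usage read as hours).
EMR_RATE = Decimal("0.048")   # USD per hour, EMR charge for m5.xlarge
EMR_HOURS = Decimal("1.086")
EC2_RATE = Decimal("0.192")   # USD per hour, On-Demand Linux m5.xlarge
EC2_HOURS = Decimal("1.334")
S3_COST = Decimal("0.00")


def to_cents(amount):
    """Round a dollar amount to two decimal places."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


emr_cost = to_cents(EMR_RATE * EMR_HOURS)  # 0.052128 rounds to 0.05
ec2_cost = to_cents(EC2_RATE * EC2_HOURS)  # 0.256128 rounds to 0.26
total = emr_cost + ec2_cost + S3_COST

print(f"EMR: ${emr_cost}  EC2: ${ec2_cost}  S3: ${S3_COST}  Total: ${total}")
```

`Decimal` is used instead of floats so that the rounding to cents is exact.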

Summary

In this post, I showed how to take a deep dive into the Amazon EMR platform running on Amazon EC2 instances.

For more details on Amazon EMR, check out Get started with Amazon EMR and open the Amazon EMR console. To learn more, read the Amazon EMR documentation.

Thanks for reading!

Connect with me: LinkedIn