Amazon EMR Summary

What is Amazon EMR

EMR stands for Elastic MapReduce. It helps you create Hadoop clusters for big data workloads on AWS.

This allows you to analyze and process vast amounts of data.
Whenever you see anything related to big data clusters or Hadoop clusters on AWS, think Amazon EMR.

These clusters have to be provisioned and they can be made of hundreds of EC2 instances.

Why would you use EMR?

EMR comes bundled with a lot of tools that big data specialists use, for example Apache Spark, HBase, Presto, and Apache Flink.
These tools can be very difficult to set up yourself, so Amazon EMR takes care of provisioning and configuring them for you.
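
As a rough illustration of what "bundled" means in practice, the tools you want are simply named when the cluster is requested. The sketch below builds the `Applications` part of a request in the shape that boto3's `run_job_flow` call accepts. The helper name `build_applications`, the cluster name, and the release label are assumptions made for this example, and nothing is sent to AWS.

```python
def build_applications(names):
    """Return the Applications list for an EMR cluster request."""
    return [{"Name": name} for name in names]


# Hypothetical request fragment: the cluster name and release label
# are assumed values chosen for illustration only.
request_fragment = {
    "Name": "demo-cluster",
    "ReleaseLabel": "emr-6.15.0",
    "Applications": build_applications(["Spark", "HBase", "Presto", "Flink"]),
}

print(request_fragment["Applications"])
```

A real request would also need instance and IAM role settings before it could be passed to `run_job_flow`.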

You can also auto-scale your cluster, and EMR integrates with Spot Instances so you can benefit from lower prices.
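
One way to express scaling limits is an EMR managed scaling policy, which is an assumption here since the post does not say which scaling mechanism it has in mind. The sketch below only builds the policy dictionary in the shape accepted by boto3's `put_managed_scaling_policy`. The function name and the numbers are illustrative, and no AWS call is made.

```python
def build_managed_scaling_policy(min_units, max_units, max_on_demand_units):
    """Build a managed scaling policy dict (illustrative helper, not an AWS API)."""
    if not 0 < min_units <= max_units:
        raise ValueError("expected 0 < min_units <= max_units")
    if not 0 <= max_on_demand_units <= max_units:
        raise ValueError("expected 0 <= max_on_demand_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
            # Capacity above this limit is requested as Spot capacity.
            "MaximumOnDemandCapacityUnits": max_on_demand_units,
        }
    }


# Assumed example values: scale between 2 and 10 instances,
# with at most 4 of them On-Demand.
policy = build_managed_scaling_policy(2, 10, 4)
print(policy["ComputeLimits"])
```

Keeping the On-Demand limit below the maximum is what lets the extra capacity come from cheaper Spot Instances.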

The use cases of Amazon EMR

You can use Amazon EMR for data processing, machine learning, web indexing, and other big data workloads, all built on big data technologies such as Hadoop, Spark, HBase, Presto, Flink, and so on.

Amazon EMR Components

An Amazon EMR cluster is made of EC2 instances, and there are different kinds of nodes.

Master Node
The Master Node manages the cluster: it coordinates your other nodes and manages their health. It must be long running.

Core Node
Core Nodes are there to run tasks and also store data, and they must be long running as well.

Task Node
Task Nodes are there just to run tasks. They are optional, and you can usually use Spot Instances for them.
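
The three node types above can be sketched as instance groups, in the shape that boto3's `run_job_flow` accepts under `Instances["InstanceGroups"]`. This is a minimal sketch and not a complete cluster definition: the helper name, group names, and instance type are assumptions for illustration, and nothing is sent to AWS.

```python
def build_instance_groups(core_count, task_count=0, instance_type="m5.xlarge"):
    """Return instance groups for one master, some core, and optional task nodes."""
    groups = [
        # The master node is long running, so it stays On-Demand.
        {"Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core nodes store data as well as run tasks, so they stay On-Demand too.
        {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task nodes only run tasks, so Spot Instances are a common choice.
        groups.append(
            {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": instance_type, "InstanceCount": task_count}
        )
    return groups


groups = build_instance_groups(core_count=2, task_count=4)
print([(g["InstanceRole"], g["Market"]) for g in groups])
```

Leaving `task_count` at its default of 0 omits the task group entirely, which matches task nodes being optional.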


