DEV Community

Nowsath for AWS Community Builders

Differences between primary, core and task nodes in Amazon EMR cluster

The key differences between primary, core, and task nodes in an Amazon EMR cluster are:

Primary Node (also known as Master Node):

  • The primary node is responsible for coordinating the cluster and managing the execution of jobs.
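  • It runs the main cluster management services, such as the YARN ResourceManager and the HDFS NameNode.
  • By default, an EMR cluster has a single primary node; starting with Amazon EMR release 5.23.0, you can launch a cluster with three primary nodes for high availability.
  • The primary node must stay up for the lifetime of the cluster: in a cluster with a single primary node, if that node fails or is terminated, the cluster terminates.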
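As a rough sketch of how the primary node appears in cluster configuration, the Python function below builds the primary instance group as a plain dictionary in the shape that boto3's EMR `run_job_flow` accepts under `Instances["InstanceGroups"]` (the EMR API still uses the role name `MASTER`). The function name, the default `m5.xlarge` instance type, and the choice of On-Demand capacity are assumptions for illustration, not requirements.

```python
from typing import Any, Dict


def primary_instance_group(instance_type: str = "m5.xlarge",
                           high_availability: bool = False) -> Dict[str, Any]:
    """Return a primary (MASTER) instance group definition.

    Hypothetical helper: the dict matches the InstanceGroups entries that
    boto3's EMR run_job_flow accepts; the defaults are illustrative only.
    """
    # One primary node by default; three for high availability
    # (supported starting with Amazon EMR release 5.23.0).
    count = 3 if high_availability else 1
    return {
        "Name": "Primary",
        "InstanceRole": "MASTER",  # the EMR API's name for the primary role
        "Market": "ON_DEMAND",     # assumption: avoid Spot interruptions on this node
        "InstanceType": instance_type,
        "InstanceCount": count,
    }


if __name__ == "__main__":
    print(primary_instance_group())
    print(primary_instance_group(high_availability=True)["InstanceCount"])  # 3
```

To actually launch a cluster, you would pass this dictionary (alongside core and task groups) to `boto3.client("emr").run_job_flow(...)`, together with the other arguments a real launch needs, such as `Name`, `ReleaseLabel`, `JobFlowRole`, and `ServiceRole`.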

Core Nodes:

  • Core nodes host the Hadoop Distributed File System (HDFS) and run the HDFS DataNode and YARN NodeManager services.
  • They are responsible for storing and processing data in the cluster.
  • Core nodes cannot be removed from the cluster without risking data loss, because they hold the data stored in HDFS.
  • Size core nodes for the capacity your cluster requires until it completes.
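Following the same approach, here is a minimal sketch of a core instance group for boto3's `run_job_flow`. Because core nodes hold HDFS data, this hypothetical helper keeps them On-Demand and at a fixed count; the function name, default instance type, and the "at least one core node" check are assumptions of this sketch, not EMR rules.

```python
from typing import Any, Dict


def core_instance_group(count: int,
                        instance_type: str = "m5.xlarge") -> Dict[str, Any]:
    """Return a CORE instance group definition for boto3's run_job_flow.

    Hypothetical helper: core nodes store HDFS data, so this sketch keeps
    them On-Demand and fixed at the capacity needed until the cluster
    completes; burst capacity belongs on task nodes instead.
    """
    if count < 1:
        # Assumption of this sketch: it is only used for clusters that need HDFS.
        raise ValueError("expected at least one core node")
    return {
        "Name": "Core",
        "InstanceRole": "CORE",
        "Market": "ON_DEMAND",  # assumption: Spot interruptions here could lose HDFS data
        "InstanceType": instance_type,
        "InstanceCount": count,
    }


if __name__ == "__main__":
    print(core_instance_group(3))
```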

Task Nodes:

  • Task nodes are used for running tasks and do not host HDFS. They can be added to or removed from the cluster as needed, without the risk of data loss.
  • Task nodes are ideal for handling temporary or burst workloads, as you can launch task instance fleets on Spot Instances to increase capacity while minimizing costs.
  • When managed scaling is enabled, the cluster never scales below the minimum capacity set in the managed scaling policy.
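The sketch below pairs a Spot task instance group with a managed scaling policy. It uses instance groups rather than the instance fleets mentioned above (with fleets, Spot capacity is set through `TargetSpotCapacity` on a fleet whose `InstanceFleetType` is `TASK`). The `managed_scaling_policy` dict follows the `ManagedScalingPolicy` argument of boto3's `put_managed_scaling_policy`; the helper names are hypothetical, and `clamp_target` only illustrates the minimum/maximum rule from the bullet, not EMR's actual scaling algorithm.

```python
from typing import Any, Dict, Optional


def task_instance_group(count: int, instance_type: str = "m5.xlarge",
                        bid_price: Optional[str] = None) -> Dict[str, Any]:
    """Return a TASK instance group on Spot (hypothetical helper)."""
    group: Dict[str, Any] = {
        "Name": "Task (Spot)",
        "InstanceRole": "TASK",
        "Market": "SPOT",  # task nodes hold no HDFS data, so interruptions lose no data
        "InstanceType": instance_type,
        "InstanceCount": count,
    }
    # BidPrice is optional; when omitted, EMR uses the On-Demand price as the cap.
    if bid_price is not None:
        group["BidPrice"] = bid_price
    return group


def managed_scaling_policy(min_units: int, max_units: int) -> Dict[str, Any]:
    """Return a ManagedScalingPolicy dict counting capacity in instances."""
    if not 1 <= min_units <= max_units:
        raise ValueError("need 1 <= min_units <= max_units")
    return {
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        }
    }


def clamp_target(requested: int, policy: Dict[str, Any]) -> int:
    """Hypothetical illustration: keep a requested size within the policy limits."""
    limits = policy["ComputeLimits"]
    return max(limits["MinimumCapacityUnits"],
               min(requested, limits["MaximumCapacityUnits"]))


if __name__ == "__main__":
    policy = managed_scaling_policy(2, 10)
    print(task_instance_group(4))
    print(clamp_target(0, policy))   # 2: never below the minimum
    print(clamp_target(50, policy))  # 10: never above the maximum
```

In a real cluster, the policy dict would be passed as `ManagedScalingPolicy=` to `boto3.client("emr").put_managed_scaling_policy(ClusterId=...)`, and EMR itself decides the target size within those limits.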

Here's a table summarizing the key differences:

[Image: EMR nodes summary table]


For more details, see:

  1. Selecting and deploying an Amazon EMR cluster
  2. Estimating Amazon EMR cluster capacity
