Nowsath for AWS Community Builders

Differences between primary, core and task nodes in Amazon EMR cluster

The key differences between primary, core, and task nodes in an Amazon EMR cluster are:

Primary Node (also known as Master Node):

  • The primary node is responsible for coordinating the cluster and managing the execution of jobs.
  • It runs the main cluster management services, such as the YARN ResourceManager and the HDFS NameNode.
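  • By default, a cluster has a single primary node. With Amazon EMR 5.23.0 and later, you can launch a cluster with three primary nodes for high availability.
  • The primary node cannot be removed while the cluster is running. In a cluster with a single primary node, the cluster terminates if the primary node fails, so it is a single point of failure.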
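As a rough illustration, the primary node corresponds to the `MASTER` instance role in the EMR API. The sketch below builds the primary-node entry of the `Instances["InstanceGroups"]` list that boto3's EMR `run_job_flow` call accepts, assuming an instance-groups (not instance-fleets) cluster. The helper name `primary_instance_group` and the instance type are hypothetical, and no AWS call is made.

```python
from typing import Any, Dict


def primary_instance_group(instance_type: str, high_availability: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: return an InstanceGroups entry for the primary node.

    Assumption: a standard cluster uses 1 primary node; a high-availability
    cluster (Amazon EMR 5.23.0 and later) uses 3.
    """
    return {
        "Name": "Primary",
        "InstanceRole": "MASTER",  # the EMR API still names the primary role MASTER
        # This sketch keeps the primary on On-Demand capacity, because losing it
        # ends a single-primary cluster.
        "Market": "ON_DEMAND",
        "InstanceType": instance_type,
        "InstanceCount": 3 if high_availability else 1,
    }


if __name__ == "__main__":
    print(primary_instance_group("m5.xlarge"))
    print(primary_instance_group("m5.xlarge", high_availability=True)["InstanceCount"])  # 3
```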

Core Nodes:

  • Core nodes host the Hadoop Distributed File System (HDFS) and run the HDFS DataNode and YARN NodeManager services.
  • They are responsible for storing and processing data in the cluster.
  • Core nodes cannot be removed from the cluster without risking data loss, because they store the cluster's HDFS data blocks.
  • You should reserve core nodes for the capacity that is required until your cluster completes.
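To make the data-loss risk concrete, here is a minimal sketch; it is not an EMR API. It assumes the default `dfs.replication` values described in the Amazon EMR HDFS configuration documentation (1 for fewer than four core nodes, 2 for fewer than ten, 3 otherwise) and flags a resize that would leave fewer core nodes than the replication factor, so HDFS could no longer keep that many copies of each block. The helpers `default_hdfs_replication`, `core_instance_group`, and `is_safe_core_resize` are hypothetical, and the check is only a necessary condition: the remaining nodes also need enough disk to absorb re-replicated blocks.

```python
from typing import Any, Dict


def default_hdfs_replication(core_node_count: int) -> int:
    """Assumed EMR default for dfs.replication, based on the core node count."""
    if core_node_count < 4:
        return 1
    if core_node_count < 10:
        return 2
    return 3


def core_instance_group(instance_type: str, count: int) -> Dict[str, Any]:
    """Hypothetical helper: InstanceGroups entry for the core node group."""
    return {
        "Name": "Core",
        "InstanceRole": "CORE",
        # Core nodes hold HDFS blocks, so this sketch avoids Spot interruptions.
        "Market": "ON_DEMAND",
        "InstanceType": instance_type,
        "InstanceCount": count,
    }


def is_safe_core_resize(replication: int, new_core_count: int) -> bool:
    """Hypothetical check: each block needs `replication` distinct DataNodes,
    so shrinking the core group below that count is flagged as unsafe."""
    return new_core_count >= replication


if __name__ == "__main__":
    rep = default_hdfs_replication(10)
    print(rep)                          # 3
    print(is_safe_core_resize(rep, 2))  # False
```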

Task Nodes:

  • Task nodes run tasks but do not host HDFS (they run the YARN NodeManager without a DataNode). They can be added to or removed from the cluster as needed without the risk of HDFS data loss.
  • Task nodes are ideal for handling temporary or burst workloads, as you can launch task instance fleets on Spot Instances to increase capacity while minimizing costs.
  • With managed scaling, the cluster never scales below the minimum capacity set in the managed scaling policy.
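A minimal sketch of the Spot task capacity and managed scaling limits described above, written as plain dictionaries in the shape boto3's EMR client takes for `run_job_flow` (`Instances["InstanceFleets"]` and `ManagedScalingPolicy`). It assumes an instance-fleet cluster, since a cluster uses either instance fleets or instance groups for all of its nodes, not both. The helper names are hypothetical, and `bounded_capacity` is a simplified model of the minimum/maximum constraint, not EMR's actual scaling algorithm.

```python
from typing import Any, Dict, List


def task_spot_fleet(instance_types: List[str], target_spot_units: int) -> Dict[str, Any]:
    """Hypothetical helper: an InstanceFleets entry for a Spot-only task fleet."""
    return {
        "Name": "Task-Spot",
        "InstanceFleetType": "TASK",
        "TargetOnDemandCapacity": 0,
        "TargetSpotCapacity": target_spot_units,
        # Listing several instance types gives EMR more Spot capacity pools to draw from.
        "InstanceTypeConfigs": [
            {"InstanceType": t, "WeightedCapacity": 1} for t in instance_types
        ],
    }


def managed_scaling_policy(minimum: int, maximum: int, max_core: int) -> Dict[str, Any]:
    """Hypothetical helper: ComputeLimits in instance fleet units.

    MaximumCoreCapacityUnits caps the core fleet, so capacity above it goes to task nodes.
    """
    if not 0 < minimum <= maximum:
        raise ValueError("require 0 < minimum <= maximum")
    if not 0 < max_core <= maximum:
        raise ValueError("require 0 < max_core <= maximum")
    return {
        "ComputeLimits": {
            "UnitType": "InstanceFleetUnits",
            "MinimumCapacityUnits": minimum,
            "MaximumCapacityUnits": maximum,
            "MaximumCoreCapacityUnits": max_core,
        }
    }


def bounded_capacity(requested: int, limits: Dict[str, Any]) -> int:
    """Simplified model: the resulting capacity stays within the policy limits."""
    return max(limits["MinimumCapacityUnits"], min(requested, limits["MaximumCapacityUnits"]))


if __name__ == "__main__":
    limits = managed_scaling_policy(minimum=4, maximum=20, max_core=4)["ComputeLimits"]
    print(bounded_capacity(1, limits))   # 4: never below the minimum
    print(bounded_capacity(50, limits))  # 20: never above the maximum
```

Setting `MaximumCoreCapacityUnits` equal to the minimum keeps the HDFS-bearing core fleet fixed, so scale-out and scale-in happen on the task fleet, which matches the guidance to reserve core nodes for baseline capacity.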

Here's a table summarizing the key differences:

[Table image: EMR nodes summary]


For more details, see:

  1. Selecting and deploying an Amazon EMR cluster
  2. Estimating Amazon EMR cluster capacity
