DEV Community

Johny
Johny

Posted on

Dynamic Allocation Issues On Spark 2.4.8 (Possible Issue with External Shuffle Service?)

Hey Team,

I am having some issue with dynamic Allocation for spark 2.4.8. I have setup a cluster using your clemlab distribution (https://www.clemlab.com/) . Spark jobs are now running fine. The issue is when I try to use dynamicAllocation options. I am thinking the problems could be due to External Shuffle Service but I feel like it should be setup properly from what I have.

From the resource manager logs we can see that the container goes from ACQUIRED to RELEASED resources which is weird. It does not go to RUNNING state.

I am out of ideas at this point how to make the dynamic Allocation work. So I am turning to you in hope that you may have some insight in the matter.

There are no issues if I do not use dynamic Allocation and spark jobs work just fine but I really want to make dynamic allocation work.

Thank you for the assistance and apologies for the long message but just wanted to supply all details possible.

Here are setting I have in ambari related to it:

Yarn:

Image description

Checking the directories here I can find necessary jar on all nodemanager hosts in the right directory:
/usr/odp/1.2.2.0-138/spark2/yarn/spark-2.4.8.1.2.2.0-138-yarn-shuffle.jar
/usr/odp/current/spark2-cient/yarn/spark-2.4.8.1.2.2.0-138-yarn-shuffle.jar ( I believe there is symbolic link to the above jar)

Spark2:

Image description

 In the spark log I can see this message continuously spamming:

24/10/13 16:38:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:38:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:38:46 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:01 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Speedy emails, satisfied customers

Postmark Image

Are delayed transactional emails costing you user satisfaction? Postmark delivers your emails almost instantly, keeping your customers happy and connected.

Sign up

Top comments (1)

Collapse
 
mr_boom_boom profile image
Johny

I found the issue in one of the container logs:
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
It looks like when using dynamic allocation it was looking for spark_shuffle while I had spark2_shuffle.
Also need to adjust these settings in ambari:

yarn.nodemanager.aux-services: spark_shuffle (remove spark2_shuffle if it is present)
yarn.nodemanager.aux-services.spark_shuffle.classpath: {{stack_root}}/current/spark2-client/yarn/* (point to correct jar location

yarn.nodemanager.aux-services.spark_shuffle.class: org.apache.spark.network.yarn.YarnShuffleService
And that fixed the issue, now I can use dynamic allocation.

A Workflow Copilot. Tailored to You.

Pieces.app image

Our desktop app, with its intelligent copilot, streamlines coding by generating snippets, extracting code from screenshots, and accelerating problem-solving.

Read the docs

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay