DEV Community

Johny

Dynamic Allocation Issues On Spark 2.4.8 (Possible Issue with External Shuffle Service?)

Hey Team,

I am having an issue with dynamic allocation on Spark 2.4.8. I have set up a cluster using your clemlab distribution (https://www.clemlab.com/). Spark jobs run fine; the issue appears only when I enable the dynamicAllocation options. I suspect the External Shuffle Service, but as far as I can tell it is set up properly.

From the ResourceManager logs we can see that the container goes from ACQUIRED straight to RELEASED, which is odd. It never reaches the RUNNING state.

I am out of ideas on how to make dynamic allocation work, so I am turning to you in the hope that you have some insight into the matter.

Without dynamic allocation there are no issues and Spark jobs work just fine, but I really want to make dynamic allocation work.

Thank you for the assistance, and apologies for the long message; I just wanted to supply all the details I could.

Here are the related settings I have in Ambari:

Yarn:

[Screenshot of the YARN settings in Ambari; image not included]

Checking the directories, I can find the necessary jar on all NodeManager hosts in the right directory:
/usr/odp/1.2.2.0-138/spark2/yarn/spark-2.4.8.1.2.2.0-138-yarn-shuffle.jar
/usr/odp/current/spark2-client/yarn/spark-2.4.8.1.2.2.0-138-yarn-shuffle.jar (I believe this is a symbolic link to the jar above)

Spark2:

[Screenshot of the Spark2 settings in Ambari; image not included]

In the Spark log I can see this message repeating continuously:

24/10/13 16:38:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:38:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:38:46 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:01 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:16 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
24/10/13 16:39:31 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Top comments (1)

Johny

I found the issue in one of the container logs:
Caused by: org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:spark_shuffle does not exist
With dynamic allocation enabled, Spark was looking for an auxiliary service named spark_shuffle, while I had it registered as spark2_shuffle.
I needed to adjust these settings in Ambari:

yarn.nodemanager.aux-services: spark_shuffle (remove spark2_shuffle if it is present)
yarn.nodemanager.aux-services.spark_shuffle.classpath: {{stack_root}}/current/spark2-client/yarn/* (point to the correct jar location)
yarn.nodemanager.aux-services.spark_shuffle.class: org.apache.spark.network.yarn.YarnShuffleService

That fixed the issue, and now I can use dynamic allocation.
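
For anyone who wants to check for the same mismatch before digging through container logs, here is a minimal Python sketch that reads a Hadoop-style yarn-site.xml and reports whether spark_shuffle is registered with the class quoted above. The helper names (load_properties, check_spark_shuffle) and the sample XML are made up for illustration and are not taken from my cluster; only the property keys and the class name come from the settings listed above.

```python
import xml.etree.ElementTree as ET

# Property key and class name as quoted in the settings above.
AUX_SERVICES = "yarn.nodemanager.aux-services"
SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"

# Illustrative sample only (an assumption, not a real cluster config):
# the service is registered under the name spark2_shuffle.
SAMPLE_XML = """
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark2_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark2_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Parse <configuration><property> XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_spark_shuffle(props, service="spark_shuffle"):
    """Return a list of problems; an empty list means nothing was found."""
    problems = []
    raw = props.get(AUX_SERVICES, "")
    services = [s.strip() for s in raw.split(",") if s.strip()]
    if service not in services:
        problems.append(
            f"{service} is not listed in {AUX_SERVICES} (found: {services})"
        )
    class_key = f"{AUX_SERVICES}.{service}.class"
    if props.get(class_key) != SHUFFLE_CLASS:
        problems.append(f"{class_key} is not set to {SHUFFLE_CLASS}")
    return problems


# Report every problem found in the sample configuration.
for problem in check_spark_shuffle(load_properties(SAMPLE_XML)):
    print("PROBLEM:", problem)
```

To run it against a real file, pass the contents of the yarn-site.xml from a NodeManager host to load_properties instead of SAMPLE_XML. The check covers only the service name and class; it does not verify that the jar on the classpath actually exists.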
