Discussion on: Creating a Spark Standalone Cluster with Docker and docker-compose(2021 update)

sunmoon4ever

My Spark job (Scala, reading from S3) worked fine for a few runs on the standalone cluster with spark-submit, but after a few runs it started giving the error below. There were no code changes. It does establish a connection to spark-master, but the application is immediately killed with the reason "All masters are unresponsive! Giving up".

Error

22/03/20 05:33:39 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
22/03/20 05:33:39 INFO TransportClientFactory: Successfully created connection to spark-master/xx.x.x.xxx:7077 after 42 ms (0 ms spent in bootstraps)
22/03/20 05:33:59 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
22/03/20 05:34:19 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
22/03/20 05:34:39 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
22/03/20 05:34:39 WARN StandaloneSchedulerBackend: Application ID is not initialized yet.
22/03/20 05:34:39 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33139.
22/03/20 05:34:39 INFO NettyBlockTransferService: Server created on a1326e4ae4bb:33139
22/03/20 05:34:39 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
22/03/20 05:34:39 INFO SparkUI: Stopped Spark web UI at xxxxxxxxxxxxx:4040
22/03/20 05:34:39 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, a1326e4ae4bb, 33139, None)
22/03/20 05:34:39 INFO StandaloneSchedulerBackend: Shutting down all executors
22/03/20 05:34:39 INFO BlockManagerMasterEndpoint: Registering block manager a1326e4ae4bb:33139 with 1168.8 MiB RAM, BlockManagerId(driver, a1326e4ae4bb, 33139, None)
22/03/20 05:34:39 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
22/03/20 05:34:39 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, a1326e4ae4bb, 33139, None)
22/03/20 05:34:39 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, a1326e4ae4bb, 33139, None)
22/03/20 05:34:39 WARN StandaloneAppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
22/03/20 05:34:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/03/20 05:34:39 INFO MemoryStore: MemoryStore cleared
22/03/20 05:34:39 INFO BlockManager: BlockManager stopped
22/03/20 05:34:39 INFO BlockManagerMaster: BlockManagerMaster stopped
22/03/20 05:34:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/03/20 05:34:40 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
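
For readers hitting the same symptom: the TransportClientFactory line shows the TCP connection to spark-master:7077 succeeding, so it is the registration RPC that keeps timing out. In a docker-compose setup that usually points at the master process itself having become unresponsive (check the master log and Master UI, or simply restart the master container), or at a Spark version mismatch between the spark-submit client and the master image. Below is a minimal connectivity check that takes the job's own Scala/S3 code out of the equation; it uses the same spark://spark-master:7077 URL as the log, and the app name and trivial action are just placeholders:

import org.apache.spark.sql.SparkSession

// Minimal sketch: connect to the standalone master and run a trivial action.
// If even this fails with "All masters are unresponsive", the problem is on
// the master/cluster side rather than in the application code.
object MasterConnectivityCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("master-connectivity-check") // placeholder name
      .master("spark://spark-master:7077")  // same master URL as in the log above
      .getOrCreate()

    // Forces application registration and executor allocation.
    println(spark.range(10).count())
    spark.stop()
  }
}

If this check also hangs at "Connecting to master", a version mismatch between the client's Spark jars and the master image is a common remaining cause, since the registration message is then silently dropped.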

purnima1612

I am unable to connect the Spark worker to the master; I'm getting the error below:
22/07/14 01:07:56 INFO Worker: Started daemon with process name: 557@spark-worker
22/07/14 01:07:56 INFO SignalUtils: Registering signal handler for TERM
22/07/14 01:07:56 INFO SignalUtils: Registering signal handler for HUP
22/07/14 01:07:56 INFO SignalUtils: Registering signal handler for INT
22/07/14 01:07:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/07/14 01:07:56 INFO SecurityManager: Changing view acls to: glue_user
22/07/14 01:07:56 INFO SecurityManager: Changing modify acls to: glue_user
22/07/14 01:07:56 INFO SecurityManager: Changing view acls groups to:
22/07/14 01:07:56 INFO SecurityManager: Changing modify acls groups to:
22/07/14 01:07:56 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(glue_user); groups with view permissions: Set(); users with modify permissions: Set(glue_user); groups with modify permissions: Set()
22/07/14 01:07:56 INFO Utils: Successfully started service 'sparkWorker' on port 40843.
22/07/14 01:07:56 INFO Worker: Worker decommissioning not enabled, SIGPWR will result in exiting.
22/07/14 01:07:56 INFO Worker: Starting Spark worker 172.24.0.3:40843 with 8 cores, 23.8 GiB RAM
22/07/14 01:07:56 INFO Worker: Running Spark version 3.1.1-amzn-0
22/07/14 01:07:56 INFO Worker: Spark home: /home/glue_user/spark
22/07/14 01:07:56 INFO ResourceUtils: ==============================================================
22/07/14 01:07:56 INFO ResourceUtils: No custom resources configured for spark.worker.
22/07/14 01:07:56 INFO ResourceUtils: ==============================================================
22/07/14 01:07:57 INFO log: Logging initialized @1340ms to org.sparkproject.jetty.util.log.Slf4jLog
22/07/14 01:07:57 INFO Server: jetty-9.4.37.v20210219; built: 2021-02-19T15:16:47.689Z; git: 27afab2bd37780d179836e313e0fe11bc4fa0ce9; jvm 1.8.0_322-b06
22/07/14 01:07:57 INFO Server: Started @1447ms
22/07/14 01:07:57 INFO AbstractConnector: Started ServerConnector@582f7291{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
22/07/14 01:07:57 INFO Utils: Successfully started service 'WorkerUI' on port 8080.
22/07/14 01:07:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@73ee79ce{/logPage,null,AVAILABLE,@Spark}
22/07/14 01:07:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6d7ea67a{/logPage/json,null,AVAILABLE,@Spark}
22/07/14 01:07:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@54bf8430{/,null,AVAILABLE,@Spark}
22/07/14 01:07:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@20de6297{/json,null,AVAILABLE,@Spark}
22/07/14 01:07:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@118e7b91{/static,null,AVAILABLE,@Spark}
22/07/14 01:07:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7e83cf73{/log,null,AVAILABLE,@Spark}
22/07/14 01:07:57 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://spark-worker:8080
22/07/14 01:07:57 INFO Worker: Connecting to master spark-master:7077...
22/07/14 01:07:57 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@394bb5d9{/metrics/json,null,AVAILABLE,@Spark}
22/07/14 01:07:57 ERROR TransportClientFactory: Exception while bootstrapping client after 169 ms
java.lang.RuntimeException: java.lang.IllegalArgumentException: Authentication failed.
at org.apache.spark.network.crypto.AuthRpcHandler.doAuthChallenge(AuthRpcHandler.java:125)
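
The last two lines are the informative ones: the SecurityManager entry above shows "authentication enabled" on the worker, and AuthRpcHandler.doAuthChallenge rejects the handshake when the worker's spark.authenticate.secret does not match the master's. For a standalone cluster, the master and worker daemons need the same secret, typically set in each container's spark-defaults.conf or via SPARK_MASTER_OPTS / SPARK_WORKER_OPTS; a driver joining the cluster needs the same pair of settings. A minimal Scala sketch of the application-side configuration, with the secret value obviously a placeholder:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only: the same two properties (with an identical secret string) must
// also be configured for the master and worker daemons, e.g. in their
// spark-defaults.conf, otherwise AuthRpcHandler rejects the handshake exactly
// as in the stack trace above.
object AuthSecretCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("auth-secret-check")
      .setMaster("spark://spark-master:7077")
      .set("spark.authenticate", "true")
      .set("spark.authenticate.secret", "change-me-shared-secret") // placeholder value

    val spark = SparkSession.builder().config(conf).getOrCreate()
    println(spark.range(5).count())
    spark.stop()
  }
}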