Michael Staszel

Originally published at mikestaszel.com

S3A on Spark 3.3 in 2023

An update to my post from almost 3 years ago! The world has moved on to Spark 3.3, and so have the JARs you need to access S3 from Spark.

Run these commands to download JARs for Spark 3.3.2:

wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.426/aws-java-sdk-bundle-1.12.426.jar -P "$SPARK_HOME/jars/"
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.2/hadoop-aws-3.3.2.jar -P "$SPARK_HOME/jars/"

That’s all there is to it. The s3a:// prefix should work now for reading and writing data using Spark 3.3.2.
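
If s3a:// paths still fail with a class-not-found error, a quick sanity check is to confirm both JARs actually landed in Spark's jars directory. Here is a minimal sketch of such a check; the function name check_s3a_jars is a hypothetical helper of my own, not something that ships with Spark, and it assumes the two file names from the download commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): reports whether each expected JAR
# exists in the given directory. Returns non-zero if any JAR is missing.
check_s3a_jars() {
  jars_dir="$1"
  status=0
  # File names assumed to match the wget downloads for Spark 3.3.2.
  for jar in aws-java-sdk-bundle-1.12.426.jar hadoop-aws-3.3.2.jar; do
    if [ -f "$jars_dir/$jar" ]; then
      echo "found: $jar"
    else
      echo "missing: $jar"
      status=1
    fi
  done
  return "$status"
}
```

With SPARK_HOME set, calling check_s3a_jars "$SPARK_HOME/jars" prints one line per JAR and exits non-zero if either is missing.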