Apache Spark: Read Data from S3 Bucket



Accessing S3 Bucket through Spark

  1. Edit the spark-defaults.conf file
    Add the following three lines, which set your S3 access key, secret key, and the S3A file system implementation. The key values below are placeholders; substitute your own credentials, and note that values in spark-defaults.conf are not quoted.

spark.hadoop.fs.s3a.access.key your_access_key
spark.hadoop.fs.s3a.secret.key your_secret_key
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
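
If you would rather not edit the config file, the same properties can be set when building the session. Here is a minimal sketch in Scala; the app name and credential values are placeholders, not part of the original post:

import org.apache.spark.sql.SparkSession

// Minimal sketch: set the S3A properties programmatically instead of
// editing spark-defaults.conf. The credential values are placeholders.
val spark = SparkSession.builder()
  .appName("S3ReadExample")
  .config("spark.hadoop.fs.s3a.access.key", "your_access_key")
  .config("spark.hadoop.fs.s3a.secret.key", "your_secret_key")
  .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .getOrCreate()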

  2. Launch spark-shell with the AWS packages
    Start the shell with the aws-java-sdk and hadoop-aws packages so the S3A connector is on the classpath. Match the hadoop-aws version to your Hadoop distribution; hadoop-aws 2.7.x pairs with aws-java-sdk 1.7.4.

./spark-shell --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3
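
If you skipped editing spark-defaults.conf, the same S3A properties can instead be supplied at launch with --conf flags; again, the key values here are placeholders:

./spark-shell \
  --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 \
  --conf spark.hadoop.fs.s3a.access.key=your_access_key \
  --conf spark.hadoop.fs.s3a.secret.key=your_secret_key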

spark.read.parquet("S3 Bucket URL")

spark.read.parquet("s3a://your_path_to_bucket/")



