Apache Spark: Read Data from S3 Bucket

Reading Time: 2 minutes

Well, anyone working with Spark is familiar with the usual ways of reading a file, whether from a local path, a table, or HDFS.
But do you know how tricky it can be to read data into Spark from an S3 bucket?

So, this blog gives you a step-by-step walkthrough of how to read data from an S3 bucket.

Before moving to our actual topic, we should know what an S3 bucket is.

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.

So, in short, S3 is an object store in which you can keep any type of data.

Accessing S3 Bucket through Spark

Now, coming to the actual topic: how to read data from an S3 bucket into Spark. You cannot do it simply by adding the spark-core dependency to your project and calling spark.read; Spark also needs the Hadoop S3A connector and your AWS credentials before it can reach the bucket.

So, to read data from S3, follow the steps below:

1. Edit the spark-defaults.conf file
You need to add the three lines below, which consist of your S3 access key, secret key, and file-system implementation:
spark.hadoop.fs.s3a.access.key AKIBJEKY6UIV6M32JXAQ
spark.hadoop.fs.s3a.secret.key IWT3f8BjqUFTZlbVXx+3Tk7eSUUHLj6CIRLWSP5lz0
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
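If you cannot edit spark-defaults.conf, the same three properties can also be set programmatically on the SparkSession's Hadoop configuration. Below is a minimal sketch, assuming the credentials are available in the standard AWS environment variables rather than hard-coded (the app name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("S3ReadExample") // illustrative name
  .getOrCreate()

// Same properties as in spark-defaults.conf, set at runtime.
// Reading them from environment variables keeps secrets out of source control.
val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
```

Either way, the properties end up in the same Hadoop configuration that the S3A connector reads.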

2. Start Spark with the AWS SDK package
Add the aws-java-sdk package along with the hadoop-aws package to your spark-shell, as in the command below.

./spark-shell --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3
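If you are building a standalone application instead of working in spark-shell, the same packages can be declared as build dependencies. A sketch of the equivalent build.sbt entries, using the versions from the command above (the Spark artifact and version are assumptions; match them to your cluster):

```scala
// build.sbt -- hadoop-aws and aws-java-sdk versions match the --packages flags above
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-sql"    % "2.4.0" % Provided, // assumed Spark version
  "org.apache.hadoop"  % "hadoop-aws"   % "2.7.3",
  "com.amazonaws"      % "aws-java-sdk" % "1.7.4"
)
```

Note that hadoop-aws must match the Hadoop version your Spark build was compiled against, or you may hit class-compatibility errors at runtime.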

3. Now read data from S3.
Let's say the S3 bucket contains your Parquet data; to read it, do as below:

spark.read.parquet("S3 Bucket URL")

Example:

spark.read.parquet("s3a://AKIBJEKY6UIV6M32JXAQ:IWT3f8BjqUFTZlbVXx+3Tk7eSUUHLj6CIRLWSP5lz0@tham.omniture/wzhou/data/2019_9months/")
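Note that embedding the access and secret key inside the URL, as in the example above, is deprecated in recent Hadoop versions, since the secret can leak into logs. With the credentials already set in spark-defaults.conf (step 1), a plain bucket path is enough. A minimal sketch, with a hypothetical bucket name and prefix:

```scala
// Credentials come from the configuration in step 1, so the URL stays clean.
// "my-bucket" and the prefix are placeholders -- substitute your own.
val df = spark.read.parquet("s3a://my-bucket/data/2019_9months/")

df.printSchema() // inspect the columns read from S3
df.show(5)       // preview the first few rows
```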

This is how you can access data from S3 Bucket through Spark.

If you like this blog, please show your appreciation by hitting the like button and sharing it. Also, drop a comment with any feedback or suggested improvements. Till then, HAPPY LEARNING.

 

Written by 

Divyansh Jain is a Software Consultant with one year of experience. He has a deep understanding of Big Data technologies, Hadoop, Spark, and Tableau, as well as web development. He is an amazing team player with self-learning skills and a self-motivated professional. He has also worked as a freelance web developer. He loves to explore real-time problems and Big Data. In his leisure time, he prefers LAN gaming and watching movies.
