How To Connect AWS Redshift with S3

Hello Readers! Today in this blog we'll see how to connect AWS Redshift with S3. First we'll go through a short introduction to AWS Redshift and S3, and then we'll walk through how to connect the two. Stick with me till the end to learn something interesting about Redshift. I hope you find it helpful.

Let's get started.

INTRODUCTION

AWS Redshift

Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more.

The first step in creating a data warehouse is to launch a set of nodes, called an Amazon Redshift cluster. After you provision your cluster, you can upload your data set and then run data analysis queries.

Regardless of the size of the data set, Amazon Redshift offers fast query performance using the same SQL-based tools and business intelligence applications that you use today.

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. It also provides management features so you can optimise, organise, and configure access to your data to meet your specific business, organisational, and compliance requirements.

Connect AWS Redshift with S3

Now we will see how to connect AWS Redshift with S3 so that Redshift can work with data stored in S3. Follow the steps below:

Step 1: Create Redshift cluster

Log in to your AWS Console, choose Redshift as the service, and choose the option to create a cluster.

Here we can choose the node_type, the number_of_nodes, and the database configuration (admin username and admin password).

Click the Create cluster option to create the Redshift cluster, then wait until its status shows as available.
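If you prefer to script this step instead of using the console, the same choices (node type, number of nodes, admin username and password) map onto the boto3 `create_cluster` call. The snippet below is only a rough sketch: the cluster identifier, database name, node type and credentials are placeholder values that are not from this walkthrough, and `create_cluster` needs boto3 plus valid AWS credentials to actually run.

```python
def build_cluster_params(cluster_id, node_type, number_of_nodes,
                         db_name, admin_user, admin_password):
    """Assemble keyword arguments for the boto3 Redshift create_cluster call."""
    params = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,
        "DBName": db_name,
        "MasterUsername": admin_user,
        "MasterUserPassword": admin_password,
    }
    if number_of_nodes > 1:
        params["ClusterType"] = "multi-node"
        params["NumberOfNodes"] = number_of_nodes
    else:
        # A single-node cluster is created without the NumberOfNodes argument.
        params["ClusterType"] = "single-node"
    return params


def create_cluster(params, region="us-east-1"):
    """Create the cluster and wait until it is available."""
    import boto3  # imported here so the helper above works without boto3

    client = boto3.client("redshift", region_name=region)
    client.create_cluster(**params)
    client.get_waiter("cluster_available").wait(
        ClusterIdentifier=params["ClusterIdentifier"]
    )


# Placeholder values for illustration only (assumptions, not from this post).
example_params = build_cluster_params(
    cluster_id="redshift-demo",
    node_type="dc2.large",
    number_of_nodes=2,
    db_name="dev",
    admin_user="awsuser",
    admin_password="REPLACE_ME",
)
print(example_params["ClusterType"], example_params["NumberOfNodes"])
```

Keeping the parameter assembly separate from the AWS call makes it easy to review what will be sent before any cluster is actually created.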

Step 2: Create IAM Role

Once the Redshift cluster is available, an IAM role has to be created to provide the connection between Redshift and S3.

Choose the IAM service in the AWS console.

The steps to create the IAM role are as follows:

Choose the AWS service option so that Redshift can access other AWS services, then choose Redshift as the use case.

Then choose the permission policy. If you want both read and write access to S3, choose the AmazonS3FullAccess policy. If you only need to load data from S3, the AmazonS3ReadOnlyAccess policy is enough.

As a final step, give the role a name and create it.
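The console steps above boil down to two things: a trust policy that lets the Redshift service assume the role, and a managed S3 policy attached to it. Here is a minimal sketch of the same idea with boto3, assuming you have IAM permissions to create roles. The function names are made up for this example, and the role name is whatever you pass in.

```python
import json

# AWS managed policy chosen in the console step above.
S3_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonS3FullAccess"


def redshift_trust_policy():
    """Trust policy that allows the Redshift service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_redshift_s3_role(role_name, policy_arn=S3_FULL_ACCESS_ARN):
    """Create the role, attach the S3 policy, and return the role ARN."""
    import boto3  # needs boto3 and AWS credentials

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(redshift_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role["Role"]["Arn"]


# Print the trust policy so it can be compared with what the console generates.
print(json.dumps(redshift_trust_policy(), indent=2))
```

The ARN returned by `create_redshift_s3_role` is the value that later goes into the `iam_role` parameter of the COPY statement.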

Step 3: Associating IAM role with Redshift

Here, we need to associate the IAM role with Redshift. On the Redshift cluster we created, under Actions, there is a Manage IAM roles option. There we can associate the role we created by selecting its name.
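The association can also be done from code. The sketch below uses the boto3 `modify_cluster_iam_roles` call and then reads the cluster description to see whether the role is attached. The helper names, account ID and role name are hypothetical and only there to illustrate the shape of the data.

```python
def build_role_arn(account_id, role_name):
    """Build the ARN of an IAM role from the account ID and role name."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def role_is_in_sync(cluster, role_arn):
    """Check a describe_clusters entry for the role with status 'in-sync'."""
    return any(
        role.get("IamRoleArn") == role_arn and role.get("ApplyStatus") == "in-sync"
        for role in cluster.get("IamRoles", [])
    )


def associate_role(cluster_id, role_arn, region="us-east-1"):
    """Attach the role to the cluster (needs boto3 and AWS credentials)."""
    import boto3

    client = boto3.client("redshift", region_name=region)
    client.modify_cluster_iam_roles(
        ClusterIdentifier=cluster_id, AddIamRoles=[role_arn]
    )
    cluster = client.describe_clusters(ClusterIdentifier=cluster_id)["Clusters"][0]
    # Right after the call the status is usually still 'adding', so this
    # may return False until the change has been applied.
    return role_is_in_sync(cluster, role_arn)


# Hypothetical account ID and role name, for illustration only.
print(build_role_arn("123456789012", "redshift-s3-role"))
```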

Step 4: Connecting with query editor

The Redshift dashboard has a query editor, which is used to query tables. To use the query editor, connect to the database that was created when we launched the Redshift cluster.

When connecting, give the database name along with the username.

Once you click the Connect button, the editor shows that the database is connected.

Step 5: Creating table in Query editor

Create the table by specifying the columns that are present in our input file.

Run a query in the query editor to create the table, named orders.
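The exact columns of the orders table depend on your input file, so the column list below is purely hypothetical. As a rough sketch, this small helper prints a CREATE TABLE statement that can be pasted into the query editor. Adjust the names and types to match the header row of your own file.

```python
# Hypothetical columns: this post does not list them, so change the names
# and types to match your own input file.
ORDERS_COLUMNS = [
    ("order_id", "INTEGER"),
    ("customer_name", "VARCHAR(100)"),
    ("order_date", "DATE"),
    ("amount", "DECIMAL(10,2)"),
]


def create_table_sql(table, columns):
    """Render a CREATE TABLE statement from (name, type) pairs."""
    column_lines = ",\n".join(
        f"    {name} {sql_type}" for name, sql_type in columns
    )
    return f"CREATE TABLE {table} (\n{column_lines}\n);"


print(create_table_sql("orders", ORDERS_COLUMNS))
```

The column order matters here, because COPY loads the fields of each line into the table columns in the order they were declared.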

Create an S3 bucket and upload a .txt file to it:

Once you select the Create bucket option, your bucket is created.

Upload the .txt file into that S3 bucket.
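If you do not have an input file yet, the sketch below writes a small pipe-delimited .txt file with a header row and uploads it with the boto3 `upload_file` call. The column names and sample rows are invented for illustration. The pipe character is used because it is the default delimiter of the Redshift COPY command.

```python
import csv


def write_sample_orders(path, rows, delimiter="|"):
    """Write a header row followed by data rows to a delimited text file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        # Hypothetical header; it should match the columns of your table.
        writer.writerow(["order_id", "customer_name", "order_date", "amount"])
        writer.writerows(rows)


def upload_to_s3(path, bucket, key):
    """Upload the local file to the bucket (needs boto3 and AWS credentials)."""
    import boto3

    boto3.client("s3").upload_file(path, bucket, key)


# Invented sample rows, for illustration only.
SAMPLE_ROWS = [
    [1, "Asha", "2023-01-05", "120.50"],
    [2, "Ravi", "2023-01-06", "75.00"],
]
```

A call such as `write_sample_orders("orders.txt", SAMPLE_ROWS)` followed by `upload_to_s3("orders.txt", bucket, key)` would then place the file in your bucket, where `bucket` and `key` are your own bucket name and object name.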

Step 6: Copy S3 data to Redshift

We are at the final step. Now we copy the data from S3 to Redshift using the following command:

COPY orders
FROM 's3://deeku-redshift/redshift file.txt'
IAM_ROLE '<your IAM role ARN>'
REGION 'us-east-1'
IGNOREHEADER 1;
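The same COPY statement can also be put together in code, which helps when the role ARN or file name changes between runs. This is a minimal sketch, not the only way to do it: `build_copy_sql` and `run_sql` are names made up for this example, the role ARN in the example is a fake one, and `run_sql` assumes the Redshift Data API (the boto3 `redshift-data` client) is available to you.

```python
def build_copy_sql(table, s3_uri, iam_role_arn, region=None,
                   delimiter=None, ignore_header=0):
    """Build a Redshift COPY statement with optional parameters."""

    def quote(value):
        # Wrap in single quotes and double any embedded single quote.
        return "'" + value.replace("'", "''") + "'"

    parts = [
        f"COPY {table}",
        f"FROM {quote(s3_uri)}",
        f"IAM_ROLE {quote(iam_role_arn)}",
    ]
    if region:
        parts.append(f"REGION {quote(region)}")
    if delimiter:
        parts.append(f"DELIMITER {quote(delimiter)}")
    if ignore_header:
        parts.append(f"IGNOREHEADER {int(ignore_header)}")
    return "\n".join(parts) + ";"


def run_sql(sql, cluster_id, database, db_user, region="us-east-1"):
    """Submit the statement through the Redshift Data API; returns its ID."""
    import boto3  # needs boto3 and AWS credentials

    client = boto3.client("redshift-data", region_name=region)
    response = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )
    return response["Id"]


# The role ARN below is a fake placeholder, not a real role.
print(build_copy_sql(
    "orders",
    "s3://deeku-redshift/redshift file.txt",
    "arn:aws:iam::123456789012:role/redshift-s3-role",
    region="us-east-1",
    ignore_header=1,
))
```

Optional parameters are only added when they are set, so the generated statement stays as short as the one typed by hand.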

In the COPY statement:

iam_role must be passed as a parameter. Its value is the ARN of the role, which can be copied from the IAM dashboard.
region must be passed when the S3 bucket is in a different AWS Region from the Redshift cluster.

We can specify two other parameters as options:

If the file uses a delimiter other than the default pipe character (|), specify it with the DELIMITER parameter.
If you want to skip the header row in the file while loading, use the IGNOREHEADER parameter.

I hope this gives you more insight into AWS Redshift, IAM and S3.