Amazon Redshift - Provisioned clusters

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It is designed to handle large datasets and perform complex queries on structured data. Redshift is based on PostgreSQL, and it is optimized for high-performance analysis and reporting.

Prerequisites

  1. An active AWS account with access to Redshift, S3 and IAM roles/policies
  2. An active Redshift Provisioned cluster deployment
  3. An Amazon S3 bucket used for data loading

Setup guide

Part 1 - Set up S3 bucket

  • Create an S3 bucket where Weld will upload the data in GZIP CSV format before loading it into Redshift

Part 2 - Redshift Setup

For the following steps, you will need to be able to run queries against Redshift with admin permissions. You can do this either by opening the built in Query Editor in the Redshift console, or via a 3rd party database management tool.

  • Step 1 - Create database

CREATE DATABASE "weld";

  • Step 2 - Create user for database:

CREATE USER "weld_user" PASSWORD DISABLE;

  • Step 3 - Grant permissions to user:

GRANT ALL PRIVILEGES ON DATABASE "weld" TO "weld_user";

Note: You can use any database name and username, but you will need to ensure that the IAM role is set up correctly in the next step.

Part 3 - IAM Setup

  • Step 1 - Create an IAM policy with the following permissions (edit in JSON mode):
{
	"Version": "2012-10-17",
	"Statement": [
  {
  	"Sid": "s3Access",
  	"Effect": "Allow",
  	"Action": [
  		"s3:PutObject",
  		"s3:GetObject",
  		"s3:GetEncryptionConfiguration",
  		"s3:ListBucketMultipartUploads",
  		"s3:AbortMultipartUpload",
  		"s3:ListBucket",
  		"s3:DeleteObject",
  		"s3:GetBucketLocation",
  		"s3:ListMultipartUploadParts"
  	],
  	"Resource": [
  		"arn:aws:s3:::{{YOUR_S3_BUCKET_NAME}}/*",
  		"arn:aws:s3:::{{YOUR_S3_BUCKET_NAME}}"
  	]
  },
  {
  	"Sid": "redshiftAccess",
  	"Effect": "Allow",
  	"Action": [
  		"redshift:GetClusterCredentials"
  	],
  	"Resource": [
  		"{{REDSHIFT_USER_ARN}}",
  		"{{REDSHIFT_DB_ARN}}"
  	]
  },
 ]
}

Note: {{REDSHIFT_USER_ARN}} and {{REDSHIFT_DB_ARN}} are the ARNs of the Redshift user and database respectively. These cannot be found in the Redshift console, but can be easily constructed:

arn:aws:redshift:{{AWS_REGION}}:{{AWS_ACCOUNT_ID}}:dbuser:{{REDSHIFT_CLUSTER_NAME}}/{{REDSHIFT_USERNAME}}

arn:aws:redshift:{{AWS_REGION}}:{{AWS_ACCOUNT_ID}}:dbname:{{REDSHIFT_CLUSTER_NAME}}/{{REDSHIFT_DATABASE_NAME}}

  • Step 2 - Create an IAM role and add the following trust relationships to ensure that Weld and Redshift can assume the role:

    {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Principal": {
                  "AWS": "arn:aws:iam::887019001183:root"
              },
              "Action": "sts:AssumeRole",
              "Condition": {
                  "StringEquals": {
                      "sts:ExternalId": "{{YOUR_CHOSEN_EXTERNAL_ID}}"
                  }
              }
          },
          {
              "Effect": "Allow",
              "Principal": {
                  "Service": "redshift.amazonaws.com"
              },
              "Action": "sts:AssumeRole"
          }
      ]
    }
    
  • Step 3 - Attach IAM policy to IAM role

Part 4 - IAM role association with Redshift

  • Open the Redshift cluster, and click on the Actions button.
  • Click on Manage IAM Roles
  • Select the previously created role from the dropdown and press Associate IAM Role. After this, save the changes.

Part 4 - Weld Setup

  • Fill out the details in the Weld UI for the Redshift data warehouse connection
    • For the host and port, use the Endpoint from your Redshift cluster details. Note that the Endpoint will be in the format {{HOST}}:{{PORT}}/{{DATABASE}}
    • For the database, use the Database that you created in Part 2 - Step 1. If you didn't change the name, it will be called weld
    • For the user, use the username that you created in Part 2 - Step 2. If you didn't change the name, it will be called weld_user
    • For External ID you can use any string, but ensure that it is the same as the one in the IAM Role's trust relationship.
    • For the S3 Bucket, input your the name of the S3 bucket that you created in Part 1.
  • If using SSH, enable it and configure the SSH credentials
  • Click Connect

Was this page helpful?