Amazon Redshift

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It is designed to handle large datasets and perform complex queries on structured data. Redshift is based on PostgreSQL, and it is optimized for high-performance analysis and reporting.

Prerequisites

  1. An active AWS account with access to Redshift, S3 and IAM roles/policies
  2. An active Redshift Serverless deployment
  3. An Amazon S3 bucket used for data loading

Setup guide

Part 1 - Set up S3 bucket

  • Create an S3 bucket where Weld will upload the data in GZIP NDJSON format before loading it into Redshift

    Make sure that the bucket is not publicly accessible.

    Weld will not expire the items in the bucket by default. If you would like to set up lifecycle rules to automatically delete old data, you can do so via the S3 console. This will only remove raw data from the bucket, it will not delete the data from Redshift.

    You can also use this S3 bucket as a data lake as it will always have the raw data available. An example Redshift query that can be used to copy data from the S3 bucket to Redshift is:

    COPY "{{SCHEMA_NAME}}"."{{TABLE_NAME}}" FROM '{{S3_PATH}}' iam_role '{{IAM_ROLE_ARN}}' GZIP JSON as 'auto ignorecase' dateformat 'auto'

    Where {{IAM_ROLE_ARN}} is the ARN of the IAM role that you will set up in Part 2.

Part 2 - IAM Setup

  • Step 1 - Create an IAM policy with the following permissions (edit in JSON mode)

    These S3 permissions ensure that Weld is able to upload data to S3 and Redshift is able to read from S3. The Redshift Serverless permissions ensure that Weld can get temporary credentials for Redshift Serverless. Weld only supports temporary credentials for Redshift as this is the most secure way to access Redshift.

    {
      "Version": "2012-10-17",
      "Statement": [
      	  {
      	  	"Sid": "s3Access",
      	  	"Effect": "Allow",
      	  	"Action": [
      	  		"s3:PutObject",
      	  		"s3:GetObject",
      	  		"s3:GetEncryptionConfiguration",
      	  		"s3:ListBucketMultipartUploads",
      	  		"s3:AbortMultipartUpload",
      	  		"s3:ListBucket",
      	  		"s3:DeleteObject",
      	  		"s3:GetBucketLocation",
      	  		"s3:ListMultipartUploadParts"
      	  	],
      	  	"Resource": [
      	  		"arn:aws:s3:::{{YOUR_S3_BUCKET_NAME}}/*",
      	  		"arn:aws:s3:::{{YOUR_S3_BUCKET_NAME}}"
      	  	]
      	  },
      	  {
      	  	"Sid": "redshiftServerlessAccess",
      	  	"Effect": "Allow",
      	  	"Action": [
      	  		"redshift-serverless:GetCredentials"
      	  	],
      	  	"Resource": [
      	  		"{{YOUR_REDSHIFT_SERVERLESS_WORKGROUP_ARN}}"
      	  	]
      	  }
        ]
    }
    
  • Step 2 - Create an IAM role and add the following trust relationships to ensure that the Weld AWS account (887019001183) and Redshift can assume the role:

    {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Principal": {
                  "AWS": "arn:aws:iam::887019001183:root"
              },
              "Action": "sts:AssumeRole",
              "Condition": {
                  "StringEquals": {
                      "sts:ExternalId": "{{EXTERNAL_ID}}"
                  }
              }
          },
          {
              "Effect": "Allow",
              "Principal": {
                  "Service": "redshift.amazonaws.com"
              },
              "Action": "sts:AssumeRole"
          }
      ]
    }
    

    For {{EXTERNAL_ID}} you can use any string. You will need to use the same string in the Weld UI when configuring the Redshift connection.

  • Step 3 - Attach IAM policy to IAM role

    On the IAM role page, click on Add permissions then select Attach policies. Search for the policy that you created in Part 2 - Step 1.

Part 3 - Redshift Setup

  • Step 1 - Associate the IAM role with Redshift
    • Open the Redshift namespace, and navigate to the Security and Encryption tab.
    • Click on Manage IAM Roles
    • Click on Associate IAM Roles and select the role created in the previous steps

For the following steps, you will need to be able to run queries against Redshift with admin permissions. You can do this either by opening the built in Query Editor in the Redshift console, or via a 3rd party database management tool.

  • Step 2 - Create database

    CREATE DATABASE "weld";

  • Step 3 - Create user for database:

    CREATE USER "IAMR:{{IAM_ROLE_NAME}}" PASSWORD DISABLE;

  • Step 4 - Grant permissions to user:

    GRANT ALL PRIVILEGES ON DATABASE "weld" TO "IAMR:{{IAM_ROLE_NAME}}";

Checklist

Ensure that you have completed the following steps before proceeding to the next part:

  • Create S3 bucket
  • Create IAM policy with S3 and Redshift permissions
  • Create IAM role with trust relationships for Weld and Redshift
  • Attach IAM policy to IAM role
  • Associate IAM role with Redshift
  • Create database and IAMR user
  • Grant all privileges for database to IAMR user

Part 4 - Weld Setup

  • Fill out the details in the Weld UI for the Redshift data warehouse connection
    • For the host and port, use the Endpoint from your Redshift Workgroup details. Note that the Endpoint will be in the format {{HOST}}:{{PORT}}/{{DATABASE}}
    • For the database, use the Database that you created in Part 3 - Step 1. If you didn't change the name, it will be called weld
    • For External ID you can use any string, but ensure that it is the same as the one in the IAM Role's trust relationship.
    • For the S3 Bucket, input your the name of the S3 bucket that you created in Part 1.
  • If using SSH, enable it and configure the SSH credentials
  • Click Connect

Was this page helpful?