Amazon Redshift
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It is designed to handle large datasets and perform complex queries on structured data. Redshift is based on PostgreSQL, and it is optimized for high-performance analysis and reporting.
Weld supports Redshift Serverless and Redshift Provisioned clusters.
For provisioned clusters use the Redshift Cluster integration and follow this guide.
Prerequisites
- An active AWS account with access to Redshift, S3 and IAM roles/policies
- An active Redshift Serverless deployment
- An Amazon S3 bucket used for data loading
Setup guide
Part 1 - Set up S3 bucket
-
Create an S3 bucket where Weld will upload the data in GZIP NDJSON format before loading it into Redshift
Make sure that the bucket is not publicly accessible.
Weld will not expire the items in the bucket by default. If you would like to set up lifecycle rules to automatically delete old data, you can do so via the S3 console. This will only remove raw data from the bucket, it will not delete the data from Redshift.
You can also use this S3 bucket as a data lake as it will always have the raw data available. An example Redshift query that can be used to copy data from the S3 bucket to Redshift is:
COPY "{{SCHEMA_NAME}}"."{{TABLE_NAME}}" FROM '{{S3_PATH}}' iam_role '{{IAM_ROLE_ARN}}' GZIP JSON as 'auto ignorecase' dateformat 'auto'Where
{{IAM_ROLE_ARN}}is the ARN of the IAM role that you will set up in Part 2.
Part 2 - IAM Setup
-
Step 1 - Create an IAM policy with the following permissions (edit in JSON mode)
These S3 permissions ensure that Weld is able to upload data to S3 and Redshift is able to read from S3. The Redshift Serverless permissions ensure that Weld can get temporary credentials for Redshift Serverless. Weld only supports temporary credentials for Redshift as this is the most secure way to access Redshift.
{ "Version": "2012-10-17", "Statement": [ { "Sid": "s3Access", "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:GetEncryptionConfiguration", "s3:ListBucketMultipartUploads", "s3:AbortMultipartUpload", "s3:ListBucket", "s3:DeleteObject", "s3:GetBucketLocation", "s3:ListMultipartUploadParts" ], "Resource": [ "arn:aws:s3:::{{YOUR_S3_BUCKET_NAME}}/*", "arn:aws:s3:::{{YOUR_S3_BUCKET_NAME}}" ] }, { "Sid": "redshiftServerlessAccess", "Effect": "Allow", "Action": [ "redshift-serverless:GetCredentials" ], "Resource": [ "{{YOUR_REDSHIFT_SERVERLESS_WORKGROUP_ARN}}" ] } ] } -
Step 2 - Create an IAM role and add the following trust relationships to ensure that the Weld AWS account (887019001183) and Redshift can assume the role:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::887019001183:root" }, "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "sts:ExternalId": "{{EXTERNAL_ID}}" } } }, { "Effect": "Allow", "Principal": { "Service": "redshift.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }For
{{EXTERNAL_ID}}you can use any string. You will need to use the same string in the Weld UI when configuring the Redshift connection. -
Step 3 - Attach IAM policy to IAM role
On the IAM role page, click on
Add permissionsthen selectAttach policies. Search for the policy that you created in Part 2 - Step 1.
Part 3 - Redshift Setup
- Step 1 - Associate the IAM role with Redshift
- Open the Redshift namespace, and navigate to the
Security and Encryptiontab. - Click on
Manage IAM Roles - Click on
Associate IAM Rolesand select the role created in the previous steps
- Open the Redshift namespace, and navigate to the
For the following steps, you will need to be able to run queries against Redshift with admin permissions. You can do this either by opening the built in Query Editor in the Redshift console, or via a 3rd party database management tool.
-
Step 2 - Create database
CREATE DATABASE "weld"; -
Step 3 - Create user for database:
CREATE USER "IAMR:{{IAM_ROLE_NAME}}" PASSWORD DISABLE; -
Step 4 - Grant permissions to user:
GRANT ALL PRIVILEGES ON DATABASE "weld" TO "IAMR:{{IAM_ROLE_NAME}}";
Checklist
Ensure that you have completed the following steps before proceeding to the next part:
- Create S3 bucket
- Create IAM policy with S3 and Redshift permissions
- Create IAM role with trust relationships for Weld and Redshift
- Attach IAM policy to IAM role
- Associate IAM role with Redshift
- Create database and IAMR user
- Grant all privileges for database to IAMR user
Part 4 - Weld Setup
- Fill out the details in the Weld UI for the Redshift data warehouse connection
- For the host and port, use the
Endpointfrom your Redshift Workgroup details. Note that theEndpointwill be in the format{{HOST}}:{{PORT}}/{{DATABASE}} - For the database, use the Database that you created in Part 3 - Step 1. If you didn't change the name, it will be called
weld - For
External IDyou can use any string, but ensure that it is the same as the one in the IAM Role's trust relationship. - For the
S3 Bucket, input your the name of the S3 bucket that you created in Part 1.
- For the host and port, use the
- If using SSH, enable it and configure the SSH credentials
- Click Connect