Amazon Redshift - Provisioned clusters
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It is designed to handle large datasets and perform complex queries on structured data. Redshift is based on PostgreSQL, and it is optimized for high-performance analysis and reporting.
Weld supports Redshift Serverless and Redshift Provisioned clusters.
For Redshift Serverless use the Redshift integration and follow this guide.
Prerequisites
- An active AWS account with access to Redshift, S3 and IAM roles/policies
- An active Redshift Provisioned cluster deployment
- An Amazon S3 bucket used for data loading
Setup guide
Part 1 - Set up S3 bucket
- Create an S3 bucket where Weld will upload the data in GZIP CSV format before loading it into Redshift
Part 2 - Redshift Setup
For the following steps, you will need to be able to run queries against Redshift with admin permissions. You can do this either by opening the built in Query Editor in the Redshift console, or via a 3rd party database management tool.
- Step 1 - Create database
CREATE DATABASE "weld";
- Step 2 - Create user for database:
CREATE USER "weld_user" PASSWORD DISABLE;
- Step 3 - Grant permissions to user:
GRANT ALL PRIVILEGES ON DATABASE "weld" TO "weld_user";
Note: You can use any database name and username, but you will need to ensure that the IAM role is set up correctly in the next step.
Part 3 - IAM Setup
- Step 1 - Create an IAM policy with the following permissions (edit in JSON mode):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "s3Access",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:GetEncryptionConfiguration",
"s3:ListBucketMultipartUploads",
"s3:AbortMultipartUpload",
"s3:ListBucket",
"s3:DeleteObject",
"s3:GetBucketLocation",
"s3:ListMultipartUploadParts"
],
"Resource": [
"arn:aws:s3:::{{YOUR_S3_BUCKET_NAME}}/*",
"arn:aws:s3:::{{YOUR_S3_BUCKET_NAME}}"
]
},
{
"Sid": "redshiftAccess",
"Effect": "Allow",
"Action": [
"redshift:GetClusterCredentials"
],
"Resource": [
"{{REDSHIFT_USER_ARN}}",
"{{REDSHIFT_DB_ARN}}"
]
},
]
}
Note: {{REDSHIFT_USER_ARN}} and {{REDSHIFT_DB_ARN}} are the ARNs of the Redshift user and database respectively.
These cannot be found in the Redshift console, but can be easily constructed:
arn:aws:redshift:{{AWS_REGION}}:{{AWS_ACCOUNT_ID}}:dbuser:{{REDSHIFT_CLUSTER_NAME}}/{{REDSHIFT_USERNAME}}
arn:aws:redshift:{{AWS_REGION}}:{{AWS_ACCOUNT_ID}}:dbname:{{REDSHIFT_CLUSTER_NAME}}/{{REDSHIFT_DATABASE_NAME}}
-
Step 2 - Create an IAM role and add the following trust relationships to ensure that Weld and Redshift can assume the role:
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::887019001183:root" }, "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "sts:ExternalId": "{{YOUR_CHOSEN_EXTERNAL_ID}}" } } }, { "Effect": "Allow", "Principal": { "Service": "redshift.amazonaws.com" }, "Action": "sts:AssumeRole" } ] } -
Step 3 - Attach IAM policy to IAM role
Part 4 - IAM role association with Redshift
- Open the Redshift cluster, and click on the
Actionsbutton. - Click on
Manage IAM Roles - Select the previously created role from the dropdown and press
Associate IAM Role. After this, save the changes.
Part 4 - Weld Setup
- Fill out the details in the Weld UI for the Redshift data warehouse connection
- For the host and port, use the
Endpointfrom your Redshift cluster details. Note that theEndpointwill be in the format{{HOST}}:{{PORT}}/{{DATABASE}} - For the database, use the Database that you created in Part 2 - Step 1. If you didn't change the name, it will be called
weld - For the user, use the username that you created in Part 2 - Step 2. If you didn't change the name, it will be called
weld_user - For
External IDyou can use any string, but ensure that it is the same as the one in the IAM Role's trust relationship. - For the
S3 Bucket, input your the name of the S3 bucket that you created in Part 1.
- For the host and port, use the
- If using SSH, enable it and configure the SSH credentials
- Click Connect