BigQuery Partitioned Tables

BigQuery supports partitioned tables. Partitioning makes your data easier to manage and improves query performance. It can also significantly reduce the cost of writing to and querying your BigQuery warehouse, because queries that filter on the partitioning column only scan the matching partitions instead of the whole table.
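As a sketch of how partition pruning reduces scanned data, consider a hypothetical orders table partitioned daily on a created_at TIMESTAMP column (the table and column names here are illustrative, not part of Weld's setup):

-- Scans only the partition for 2024-01-01, not the entire table
SELECT order_id, amount
FROM `my-project.my_dataset.orders`
WHERE DATE(created_at) = '2024-01-01';

Without the filter on the partitioning column, BigQuery would scan every partition and bill for the full table size.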

For more information about partitioned tables, see BigQuery's partitioned table documentation.

Supported Partitioning Types

Weld supports the following types of table partitioning in BigQuery:

  1. Tables partitioned by date/timestamp/datetime: Tables that are partitioned based on a specific TIMESTAMP, DATE, or DATETIME column in your source data.
  2. Tables partitioned by integer range: Tables that are partitioned based on an INTEGER column.
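As an example of the integer-range type, a hypothetical customer_id column could be bucketed into partitions of 10,000 IDs each (the column name and bounds are illustrative):

-- Partitions: [0, 10000), [10000, 20000), ... up to 1,000,000
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 1000000, 10000))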

Granularity and Primary Keys

Weld supports time partitioning based on daily (DAY) granularity.

  • Primary Keys: If your partition key is part of your source's primary key, Weld automatically optimizes how we merge data into BigQuery.
  • Non-Primary Keys: If you choose a partition key that is not part of the primary key, make sure it is a column whose value never changes after the row is created; otherwise, merges may produce duplicates. For example, partitioning by a last_updated timestamp is not recommended, because an updated row can land in a different partition than its previous version, and the merge may then fail to replace the old version.
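If you suspect that a mutable partition key has already produced duplicates, a query along these lines can surface them (this assumes a hypothetical primary-key column named id; substitute your table's actual key):

-- Primary keys that appear more than once indicate duplicate row versions
SELECT id, COUNT(*) AS row_count
FROM `[schema-name].[table-name]`
GROUP BY id
HAVING COUNT(*) > 1;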

Converting an Existing Table to a Partitioned Table

If you have an existing non-partitioned table created by Weld and wish to convert it to a partitioned table, you must manually migrate the table in BigQuery.

Follow the steps below to ensure no data is lost during the conversion.

Step 1: Stop the Weld Sync

Before making structural changes to your BigQuery destination, you must stop Weld from writing data.

  1. Log in to your Weld dashboard.
  2. Navigate to the ELT section and select the relevant Sync.
  3. Toggle the status to STOPPED.
  4. Wait for any currently running sync jobs to complete.

Step 2: Run Migration SQL in BigQuery

To convert an existing table, you will rename the original table to a backup name, create a new partitioned table under the original name using the backup's data, and finally drop the backup once you have verified the copy.

Open the BigQuery Console, select your project/dataset, and execute the SQL statement matching your desired partition type below.

Option A: Partition by TIMESTAMP

Use this if your column is a TIMESTAMP type.

-- 1. Rename the original table to a backup name
ALTER TABLE `[schema-name].[table-name]`
RENAME TO `[table-name]_backup`;

-- 2. Create the new partitioned table with data from the backup
CREATE TABLE `[schema-name].[table-name]`
PARTITION BY DATE([timestamp-column])
AS SELECT * FROM `[schema-name].[table-name]_backup`;

-- 3. (Optional) Drop the backup table once you verify the data
DROP TABLE `[schema-name].[table-name]_backup`;
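Note that for a TIMESTAMP column, daily granularity can also be written explicitly with TIMESTAMP_TRUNC, which is equivalent to DATE() for DAY partitioning:

-- Equivalent daily partitioning using explicit granularity
PARTITION BY TIMESTAMP_TRUNC([timestamp-column], DAY)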

Option B: Partition by DATE

Use this if your column is a DATE type.

-- 1. Rename the original table to a backup name
ALTER TABLE `[schema-name].[table-name]`
RENAME TO `[table-name]_backup`;

-- 2. Create the new partitioned table with data from the backup
CREATE TABLE `[schema-name].[table-name]`
PARTITION BY [date-column]
AS SELECT * FROM `[schema-name].[table-name]_backup`;

-- 3. (Optional) Drop the backup table once you verify the data
DROP TABLE `[schema-name].[table-name]_backup`;

Option C: Partition by DATETIME

Use this if your column is a DATETIME type.

-- 1. Rename the original table to a backup name
ALTER TABLE `[schema-name].[table-name]`
RENAME TO `[table-name]_backup`;

-- 2. Create the new partitioned table with data from the backup
CREATE TABLE `[schema-name].[table-name]`
PARTITION BY DATETIME_TRUNC([datetime-column], DAY)
AS SELECT * FROM `[schema-name].[table-name]_backup`;

-- 3. (Optional) Drop the backup table once you verify the data
DROP TABLE `[schema-name].[table-name]_backup`;

Option D: Partition by Integer Range

Use this to partition by an ID or integer bucket. You must define the start, end, and interval.

-- 1. Rename the original table to a backup name
ALTER TABLE `[schema-name].[table-name]`
RENAME TO `[table-name]_backup`;

-- 2. Create the new partitioned table with data from the backup
CREATE TABLE `[schema-name].[table-name]`
PARTITION BY RANGE_BUCKET([integer-column], GENERATE_ARRAY([start], [end], [interval]))
AS SELECT * FROM `[schema-name].[table-name]_backup`;

-- 3. (Optional) Drop the backup table once you verify the data
DROP TABLE `[schema-name].[table-name]_backup`;
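Whichever option you used, it is worth confirming that the copy succeeded before dropping the backup, for example by comparing row counts between the two tables:

-- Both counts should match before you drop the backup
SELECT
  (SELECT COUNT(*) FROM `[schema-name].[table-name]`) AS new_row_count,
  (SELECT COUNT(*) FROM `[schema-name].[table-name]_backup`) AS backup_row_count;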

Step 3: Resume Weld Sync

Once the table has been recreated with the correct partitioning:

  1. Return to your Weld dashboard.
  2. Set the Sync status toggle back to STARTED.
  3. Weld will detect the existing table and continue syncing data into the new partitions.
