BigQuery Partitioned Tables
Google supports partitioned tables in BigQuery. Partitioning your tables makes it easier to manage data and improves query performance. Furthermore, partitioning can significantly decrease the costs of writing to and querying from your BigQuery warehouse by reducing the amount of data scanned.
For more information about partitioned tables, see BigQuery's partitioned table documentation.
Supported Partitioning Types
Weld supports the following types of table partitioning in BigQuery:
- Tables partitioned by date/timestamp/datetime: Tables that are partitioned based on a specific
TIMESTAMP,DATE, orDATETIMEcolumn in your source data. - Tables partitioned by integer range: Tables that are partitioned based on an
INTEGERcolumn.
Weld does not currently support ingestion-time partitioning. You must select a specific column from your source table to act as the partition key. If you choose ingestion-time partitioning, Weld will not be able to use partitioning to optimize data merges, which may lead to increased costs and slower performance.
Granularity and Primary Keys
Weld supports time partitioning based on daily (DAY) granularity.
- Primary Keys: If your partition key is part of your source's primary key, Weld automatically optimizes how we merge data into BigQuery.
- Non-Primary Keys: If you choose a partition key that is not part of the primary key, then make sure to choose a partition key that doesn't update as it might result in duplicates. For example, partitioning by a last_updated timestamp is not recommended - as we might skip the last version when trying to merge the new version in.
Converting an Existing Table to a Partitioned Table
If you have an existing non-partitioned table created by Weld and wish to convert it to a partitioned table, you must manually migrate the table in BigQuery.
Follow the steps below to ensure no data is lost during the conversion.
Step 1: Stop the Weld Sync
Before making structural changes to your BigQuery destination, you must stop Weld from writing data.
- Log in to your Weld dashboard.
- Navigate to the ELT section and select the relevant Sync.
- Toggle the status to STOPPED.
- Wait for any currently running sync jobs to complete.
Step 2: Run Migration SQL in BigQuery
To convert an existing table, you will essentially create a new partitioned table, copy the data from the old table, and then swap the tables.
Open the BigQuery Console, select your project/dataset, and execute the SQL statement matching your desired partition type below.
Replace [schema-name] with your dataset name and [table-name] with your
actual table name.
Option A: Partition by TIMESTAMP
Use this if your column is a TIMESTAMP type.
-- 1. Rename the original table to a backup name
ALTER TABLE `[schema-name].[table-name]`
RENAME TO `[table-name]_backup`;
-- 2. Create the new partitioned table with data from the backup
CREATE TABLE `[schema-name].[table-name]`
PARTITION BY DATE([timestamp-column])
AS SELECT * FROM `[schema-name].[table-name]_backup`;
-- 3. (Optional) Drop the backup table once you verify the data
DROP TABLE `[schema-name].[table-name]_backup`;
Option B: Partition by DATE
Use this if your column is a DATE type.
-- 1. Rename the original table to a backup name
ALTER TABLE `[schema-name].[table-name]`
RENAME TO `[table-name]_backup`;
-- 2. Create the new partitioned table with data from the backup
CREATE TABLE `[schema-name].[table-name]`
PARTITION BY [date-column]
AS SELECT * FROM `[schema-name].[table-name]_backup`;
-- 3. (Optional) Drop the backup table once you verify the data
DROP TABLE `[schema-name].[table-name]_backup`;
Option C: Partition by DATETIME
Use this if your column is a DATETIME type.
-- 1. Rename the original table to a backup name
ALTER TABLE `[schema-name].[table-name]`
RENAME TO `[table-name]_backup`;
-- 2. Create the new partitioned table with data from the backup
CREATE TABLE `[schema-name].[table-name]`
PARTITION BY [datetime-column]
AS SELECT * FROM `[schema-name].[table-name]_backup`;
-- 3. (Optional) Drop the backup table once you verify the data
DROP TABLE `[schema-name].[table-name]_backup`;
Option D: Partition by Integer Range
Use this to partition by an ID or integer bucket. You must define the start, end, and interval.
-- 1. Rename the original table to a backup name
ALTER TABLE `[schema-name].[table-name]`
RENAME TO `[table-name]_backup`;
-- 2. Create the new partitioned table with data from the backup
CREATE TABLE `[schema-name].[table-name]`
PARTITION BY RANGE_BUCKET([integer-column], GENERATE_ARRAY([start], [end], [interval]))
AS SELECT * FROM `[schema-name].[table-name]_backup`;
-- 3. (Optional) Drop the backup table once you verify the data
DROP TABLE `[schema-name].[table-name]_backup`;
Step 3: Resume Weld Sync
Once the table has been recreated with the correct partitioning:
- Return to your Weld dashboard.
- Set the Sync status toggle back to STARTED.
- Weld will detect the existing table and continue syncing data into the new partitions.