MySQL CDC (Change Data Capture)
MySQL CDC is currently under development and will be available soon. Reach out to us to get notified when it's live!
Change Data Capture (CDC) continuously streams row-level inserts, updates, and deletes from your MySQL database to Weld. Instead of periodically scanning whole tables or polling with a cursor timestamp on a schedule, CDC reads changes from MySQL's binary log (binlog) over a replication connection. This yields lower latency, less load on the primary, and reliable propagation of deletes: every deleted row is recorded in the binlog, whereas a timestamp cursor cannot see rows that no longer exist.
With Weld's MySQL CDC connector, changes are captured from the binlog using row-based events. Weld consumes the change stream and applies it to your destination in near real time.
Prerequisites
Before enabling CDC in Weld, ensure the following are true in your MySQL environment.
Network access and authentication for Weld to connect
CREATE USER 'weld_cdc_user' IDENTIFIED BY '<set password here>';
-- Grant replication and metadata access for reading binlog
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'weld_cdc_user';
-- Grant read access for initial snapshots and backfills
GRANT SELECT ON *.* TO 'weld_cdc_user';
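Before moving on, you can confirm that the account has the expected privileges. CREATE USER without a host part creates 'weld_cdc_user'@'%', so that is the account inspected below; adjust the host if you created the user differently.

```sql
-- List the privileges held by the CDC account.
-- Expect SELECT, REPLICATION SLAVE and REPLICATION CLIENT on *.* among the results.
SHOW GRANTS FOR 'weld_cdc_user'@'%';
```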
You have two options: create a lightweight watermark table, or enable GTIDs. The recommended option is to create a table that stores high watermarks during incremental snapshots, and grant the CDC user read, write, and delete access so it can manage these markers safely. If you can't create this table, skip this step and enable GTIDs instead.
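The watermark table can be created in any database on the same MySQL server. Select that database first (for example with USE <database>;), since the statements below refer to the table without a database prefix.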
CREATE TABLE weld_watermark (
id VARCHAR(255) PRIMARY KEY,
type VARCHAR(255) NOT NULL,
data TEXT
);
GRANT SELECT, INSERT, UPDATE, DELETE ON weld_watermark TO 'weld_cdc_user'@'%';
FLUSH PRIVILEGES;
1) Server parameters support row-based binlog
Self-hosted MySQL
For a self-hosted MySQL instance, ensure the following server variables are set (my.cnf/my.ini) and restart if needed:
-- Must be ON to produce binlogs
SHOW VARIABLES LIKE 'log_bin';
-- Must be ROW for CDC to capture changes reliably
SHOW VARIABLES LIKE 'binlog_format';
-- Recommended: FULL for complete row images on updates/deletes
SHOW VARIABLES LIKE 'binlog_row_image';
-- Must be non-zero and unique in the replication topology
SHOW VARIABLES LIKE 'server_id';
-- Binlog retention window in seconds
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
If your MySQL server was configured with a legacy or OS-specific timezone name that doesn't exist in the IANA Timezone Database, the CDC connector may be unable to interpret timestamps correctly. In that case, set the server's timezone to the equivalent IANA-compliant identifier.
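To see which time zone MySQL is currently using, you can query the server before changing anything. This sketch assumes MySQL 8.0 (for SET PERSIST) and uses Europe/Copenhagen purely as an example IANA identifier; substitute your own. Named zones only work once the mysql time zone tables have been loaded, for example with the mysql_tzinfo_to_sql utility.

```sql
-- 'SYSTEM' here means MySQL falls back to the OS zone reported in @@system_time_zone,
-- which is often an abbreviation such as CEST rather than an IANA name.
SELECT @@global.time_zone, @@system_time_zone;

-- Example only: set an IANA-compliant zone and persist it across restarts.
SET PERSIST time_zone = 'Europe/Copenhagen';
```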
Target values:
- log_bin=ON
- binlog_format=ROW
- binlog_row_image=FULL
- server_id is set (non-zero and unique in the replication topology)
- binlog_expire_logs_seconds=604800 (7 days in seconds)
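On MySQL 8.0 and later, most of these target values can be applied without editing the option file by using SET PERSIST, which changes the running value and records it in mysqld-auto.cnf so it survives restarts. This is a minimal sketch assuming MySQL 8.0+ and an account with the SYSTEM_VARIABLES_ADMIN (or SUPER) privilege; the server_id value is only an example.

```sql
-- Row-based logging with full before/after row images.
SET PERSIST binlog_format = 'ROW';
SET PERSIST binlog_row_image = 'FULL';

-- Keep binlogs for 7 days (604800 seconds).
SET PERSIST binlog_expire_logs_seconds = 604800;

-- Example value only: needed just if server_id is 0 or collides with another server.
SET PERSIST server_id = 1;

-- Global changes to binlog_format and binlog_row_image apply to new sessions only;
-- existing connections keep their session value until they reconnect.

-- log_bin is read-only at runtime. Binary logging is on by default in MySQL 8.0;
-- otherwise enable it with log_bin in my.cnf/my.ini and restart the server.
```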
Enable GTIDs only if you can't create the weld_watermark table. Please ask your IT team how to enable GTIDs, especially if your database has read replicas. Note that gtid_mode = ON requires enforce_gtid_consistency = ON.
gtid_mode = ON
enforce_gtid_consistency = ON
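If your team decides to enable GTIDs on a running self-hosted server, the MySQL manual describes an online procedure that steps gtid_mode through intermediate modes rather than switching it directly. The condensed sketch below follows that procedure; treat it as a starting point for your IT team, not a replacement for the manual section "Enabling GTID Transactions Online", especially when replicas are involved.

```sql
-- Run each step on every server in the replication topology before moving to the next.
SET @@GLOBAL.enforce_gtid_consistency = WARN;
-- Let your normal workload run and check the error log for GTID consistency warnings.
SET @@GLOBAL.enforce_gtid_consistency = ON;

SET @@GLOBAL.gtid_mode = OFF_PERMISSIVE;
SET @@GLOBAL.gtid_mode = ON_PERMISSIVE;

-- Wait until this reports 0 on every server before the final step.
SHOW STATUS LIKE 'Ongoing_anonymous_transaction_count';

SET @@GLOBAL.gtid_mode = ON;

-- Finally, add gtid_mode=ON and enforce_gtid_consistency=ON to my.cnf/my.ini
-- so the settings survive a restart.
```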
Amazon RDS / Aurora MySQL
Create a dedicated DB cluster parameter group for the Weld CDC setup: give it a unique name/description, pick the aurora-mysql8.0 family, and ensure the type is set to DB Cluster Parameter Group. After it is created, edit the group so the following settings are applied:
binlog_format = ROW
binlog_row_metadata = FULL
read_only = 0
Enable GTIDs only if you can't create the weld_watermark table. Please ask your IT team how to enable GTIDs, especially if your database has read replicas.
enforce_gtid_consistency = ON
gtid-mode = ON
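If you use Amazon RDS for MySQL rather than Aurora, configure the equivalent parameters in a DB parameter group. In either case, associate the parameter group with your cluster or instance and reboot it so the changes take effect, then set the binlog retention appropriately.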
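After the reboot, you can confirm from a SQL session that the new parameter values are in effect. This is a simple check using standard SHOW VARIABLES; the two GTID variables only matter if you chose the GTID option.

```sql
-- Run this on the instance Weld will connect to.
SHOW VARIABLES
WHERE Variable_name IN (
  'log_bin',
  'binlog_format',
  'binlog_row_metadata',
  'read_only',
  'gtid_mode',
  'enforce_gtid_consistency'
);
```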
Use the following command to set the binlog retention window in hours (RDS for MySQL allows at most 168 hours, i.e. 7 days):
-- Retain binary logs for 168 hours (7 days).
CALL mysql.rds_set_configuration('binlog retention hours', 168);
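To confirm the retention setting took effect, RDS and Aurora MySQL provide a companion stored procedure that lists the current configuration values.

```sql
-- Shows the configured values, including 'binlog retention hours'.
CALL mysql.rds_show_configuration;
```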
2) All CDC tables have a primary key or a unique index
CDC requires a stable row identifier so updates/deletes can be applied correctly downstream. Ensure each CDC table has a primary key or a unique index.
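To find tables that would not meet this requirement, you can query information_schema for base tables that have neither a PRIMARY KEY nor a UNIQUE constraint. This is a sketch: it excludes MySQL's system schemas, and you may want to restrict it further to the schemas you plan to replicate. Keep in mind that a unique index over nullable columns still allows multiple rows containing NULL, so a primary key is the safer choice.

```sql
-- Base tables without a primary key or unique index (system schemas excluded).
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
  ON c.table_schema = t.table_schema
 AND c.table_name = t.table_name
 AND c.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
  AND c.constraint_name IS NULL
ORDER BY t.table_schema, t.table_name;
```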
Enable CDC in Weld
Step 1: Connect MySQL in Weld
- Create or open your MySQL connection in Weld.
Step 2: Select tables
Pick the tables to replicate and enable CDC for them.
Step 3: Configure destination
- Choose a sync frequency. CDC reads the binlog continuously; this setting controls how often changes are applied to the destination.
- Provide a destination dataset/schema and naming pattern.
Weld will begin consuming the MySQL binlog and applying changes to your destination.
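Once the sync is running, one way to confirm from the MySQL side that a binlog reader is connected is to look for replication threads in the process list. This sketch assumes Weld connects with the weld_cdc_user account created above and that you run the query as an administrative account with the PROCESS privilege (without it, you only see your own threads). Binlog clients appear with a command of Binlog Dump or Binlog Dump GTID, depending on how they request the stream.

```sql
-- Replication connections currently streaming the binlog from this server.
SELECT id, user, host, command, time, state
FROM information_schema.processlist
WHERE command LIKE 'Binlog Dump%'
  AND user = 'weld_cdc_user';
```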
Housekeeping and binlog retention
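If you stop or permanently delete an existing CDC sync, MySQL does not hold back binlogs for it; they expire according to your server's binlog retention settings. Configure retention high enough to cover any unexpected downtime, because changes in binlogs that expire before Weld has read them can no longer be captured, which results in data loss.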
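To see how much binlog history is currently available, and therefore roughly how long a stopped sync could be down before changes are lost, you can list the binlog files the server still holds. The retention query below applies to self-hosted MySQL 8.0+; on RDS and Aurora, use the binlog retention hours setting described earlier.

```sql
-- Binlog files still on disk; the oldest one marks the earliest point a reader can resume from.
SHOW BINARY LOGS;

-- Configured retention window, converted from seconds to days.
SELECT @@global.binlog_expire_logs_seconds / 86400 AS retention_days;
```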