MySQL CDC (Change Data Capture)

Change Data Capture (CDC) continuously streams row-level inserts, updates, and deletes from your MySQL database to Weld. Instead of scanning whole tables or polling a cursor timestamp on a schedule, CDC reads changes from MySQL's binary log (binlog) over a replication connection. This yields lower latency, reduced load on the primary, and reliable propagation of deletes.

With Weld's MySQL CDC connector, changes are captured from the binlog using row-based events. Weld consumes the change stream and applies it to your destination in near real time.


Prerequisites

Before enabling CDC in Weld, ensure the following are true in your MySQL environment.

Network access and authentication for Weld to connect

CREATE USER 'weld_cdc_user' IDENTIFIED BY '<set password here>';
-- Grant replication and metadata access for reading binlog
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'weld_cdc_user';
-- Grant read access for initial snapshots and backfills
GRANT SELECT ON *.* TO 'weld_cdc_user';

Weld also needs a way to track high watermarks during incremental snapshots. You have two options: create a lightweight watermark table, or enable GTIDs. For the first option, create the table below and grant the CDC user read, write, and delete access so it can manage these markers safely. If you can't create this table, skip this step and enable GTIDs instead.

CREATE TABLE weld_watermark (
    id VARCHAR(255) PRIMARY KEY,
    type VARCHAR(255) NOT NULL,
    data TEXT
);

GRANT SELECT, INSERT, UPDATE, DELETE ON weld_watermark TO 'weld_cdc_user'@'%';

FLUSH PRIVILEGES;
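
As a quick check of the setup above, the grants can be inspected from an admin session. This is a minimal sketch; it assumes the user was created with the default host '%', as in the statements above, so adjust the host if you restricted it.

```sql
-- List the privileges held by the CDC user.
-- Expect REPLICATION SLAVE, REPLICATION CLIENT, and SELECT on *.*,
-- plus SELECT/INSERT/UPDATE/DELETE on weld_watermark if you created it.
SHOW GRANTS FOR 'weld_cdc_user'@'%';
```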

1) Server parameters support row-based binlog

Self-hosted MySQL

For a self-hosted MySQL instance, ensure the following server variables are set (my.cnf/my.ini) and restart if needed:

-- Must be ON to produce binlogs
SHOW VARIABLES LIKE 'log_bin';
-- Must be ROW for CDC to capture changes reliably
SHOW VARIABLES LIKE 'binlog_format';
-- Recommended: FULL for complete row images on updates/deletes
SHOW VARIABLES LIKE 'binlog_row_image';
-- Set a unique server_id for the primary (required for replication)
SHOW VARIABLES LIKE 'server_id';
-- Set the binlog retention window in seconds
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';

If your MySQL server was configured with a legacy or OS-specific timezone name that doesn’t exist in the IANA Timezone Database, the CDC connector may be unable to interpret timestamps correctly. In that case, set the server’s timezone to the equivalent IANA-compliant identifier.
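
To see which timezone names the server currently reports, a query along these lines may help. It is only a sketch for inspection and does not change anything.

```sql
-- time_zone: the server's configured zone ('SYSTEM' means it follows the OS).
-- system_time_zone: the name inherited from the OS at startup (for example 'CEST').
SELECT @@global.time_zone, @@global.system_time_zone;
```

Note that setting a named zone such as 'Europe/Copenhagen' generally requires the MySQL time zone tables to be loaded on the server.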

Target values:

  • log_bin=ON

  • binlog_format=ROW

  • binlog_row_image=FULL

  • server_id is set (non-zero and unique in the replication topology)

  • binlog_expire_logs_seconds=604800 (7 days, in seconds)
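
The target values can also be checked in a single query instead of five separate SHOW statements. The sketch below assumes MySQL 8.0 or later (binlog_expire_logs_seconds and SET PERSIST do not exist in older versions) and an account allowed to change global system variables; treat the SET statements as optional.

```sql
-- Read all five settings at once and compare them against the target values.
SELECT @@global.log_bin,
       @@global.binlog_format,
       @@global.binlog_row_image,
       @@global.server_id,
       @@global.binlog_expire_logs_seconds;

-- Optional: persist the dynamic settings without editing my.cnf.
-- log_bin cannot be changed at runtime; it still needs a config change and restart.
-- A changed binlog_format only applies to sessions opened afterwards.
SET PERSIST binlog_format = 'ROW';
SET PERSIST binlog_row_image = 'FULL';
SET PERSIST binlog_expire_logs_seconds = 604800;
```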

Enable GTIDs only if you can't create the weld_watermark table. Please consult your IT team on how to enable GTIDs, especially if your database has read replicas.

  • enforce_gtid_consistency = ON

  • gtid_mode = ON
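
If you go the GTID route, the current state can be confirmed with a read-only query such as the one below. MySQL only accepts gtid_mode = ON when enforce_gtid_consistency is also ON, so the sketch reads both.

```sql
-- Both values should report ON when GTIDs are fully enabled.
SELECT @@global.gtid_mode, @@global.enforce_gtid_consistency;
```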

Amazon RDS / Aurora MySQL

Create a dedicated DB cluster parameter group for the Weld CDC setup: give it a unique name/description, pick the aurora-mysql8.0 family, and ensure the type is set to DB Cluster Parameter Group. After it is created, edit the group so the following settings are applied:

  • binlog_format = ROW
  • binlog_row_metadata = FULL
  • read_only = 0

Enable GTIDs only if you can't create the weld_watermark table. Please consult your IT team on how to enable GTIDs, especially if your database has read replicas.

  • enforce_gtid_consistency = ON
  • gtid-mode = ON

Configure the equivalent parameters in the DB parameter group. Also set binlog retention appropriately. You will need to restart the instance after changing the parameters.

Use the following command to set the binlog retention window in hours (7 days is the maximum):

-- Retain binary logs on the DB instance for 168 hours (7 days, the maximum).
CALL mysql.rds_set_configuration('binlog retention hours', 168);
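
To confirm the retention setting took effect, RDS provides a companion stored procedure. A minimal check, run as a user allowed to call the mysql.rds_* procedures, might look like this:

```sql
-- Shows the current RDS-managed settings, including 'binlog retention hours'.
CALL mysql.rds_show_configuration;
```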

2) All CDC tables have a primary key or a unique index

CDC requires a stable row identifier so updates/deletes can be applied correctly downstream. Ensure each CDC table has a primary key or a unique index.
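
One way to find tables that do not yet meet this requirement is to query information_schema. The query below is a sketch, not part of Weld's tooling. It lists base tables outside the system schemas that have neither a primary key nor a unique constraint; narrow the schema filter to the databases you plan to sync.

```sql
-- Tables with no PRIMARY KEY and no UNIQUE constraint.
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type IN ('PRIMARY KEY', 'UNIQUE')
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql', 'sys', 'information_schema', 'performance_schema')
  AND c.constraint_name IS NULL
ORDER BY t.table_schema, t.table_name;
```

Any table returned here needs a primary key or unique index added before it is enabled for CDC.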


🔧 Enable CDC in Weld

Step 1: Connect MySQL in Weld

  1. Create or open your MySQL connection in Weld.

Step 2: Select tables

Pick the tables to replicate and enable CDC for them.

Step 3: Configure destination

  1. Choose sync frequency/latency targets (CDC reads the binlog continuously; this setting controls how often changes are applied downstream).
  2. Provide a destination dataset/schema and naming pattern.

Weld will begin consuming the MySQL binlog and applying changes to your destination.


Housekeeping and binlog retention

If you stop or permanently delete an existing CDC sync, binlogs will expire based on your server's binlog retention settings. Set the retention window long enough to cover any unexpected downtime to avoid data loss.
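
To see how much binlog history the server is currently holding, the following statements can be run from an admin session. This is a sketch for self-hosted MySQL 8.0 or later; on Amazon RDS, retention is controlled by the 'binlog retention hours' setting instead.

```sql
-- Binlog files currently kept on the server, with their sizes.
SHOW BINARY LOGS;

-- Configured retention window in seconds.
SELECT @@global.binlog_expire_logs_seconds;
```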
