PostgreSQL CDC (Change Data Capture)
Change Data Capture (CDC) continuously streams row-level inserts, updates, and deletes from your PostgreSQL database to Weld. Instead of using the traditional approach of scanning whole tables or using XMIN to receive the updates on a schedule, CDC reads changes from PostgreSQL's write-ahead log (WAL) using logical replication. This yields lower latency, reduced load on the primary, and reliable propagation of deletes.
With Weld's PostgreSQL CDC connector, changes are captured via a logical replication slot and a publication that define which tables to stream. Weld consumes the change stream and applies it to your destination in near real time.
Prerequisites
Before enabling CDC in Weld, ensure the following are true in your PostgreSQL environment.
We first need network access (so Weld can reach your PostgreSQL server) and authentication (a database user with appropriate permissions) for Weld to connect and read from the replication slot.
CREATE USER weld_cdc_user WITH PASSWORD '<set password here>';
GRANT USAGE ON SCHEMA "public" TO weld_cdc_user;
GRANT SELECT ON ALL TABLES IN SCHEMA "public" TO weld_cdc_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA "public" GRANT SELECT ON TABLES TO weld_cdc_user;
If you are running PostgreSQL 13 or below, you should add a lightweight heartbeat table to prevent WAL (Write-Ahead Log) space accumulation.
Without a heartbeat table, PostgreSQL's logical replication slots can accumulate WAL files when there's no regular database activity, as the replication slot needs to track changes even during idle periods. This WAL accumulation can significantly increase your database size and consume disk space. Importantly, WAL accumulation affects all databases on the same PostgreSQL server, not just the database you're replicating, since WAL is shared across the entire server instance.
The heartbeat table generates periodic writes that keep the replication slot advancing, preventing WAL files from accumulating unnecessarily. This is especially important for low-traffic databases where natural write activity may be infrequent.
Create the table and grant the CDC user full access (only for PostgreSQL 13 or below). You can customize the schema and table name if you prefer something other than public.weld_heartbeat:
CREATE TABLE IF NOT EXISTS public.weld_heartbeat (
id TEXT PRIMARY KEY,
watermark TEXT
);
GRANT ALL PRIVILEGES ON TABLE public.weld_heartbeat TO weld_cdc_user;
1) Server parameters support logical replication
For a self hosted Postgres you need to run
ALTER SYSTEM SET wal_level = logical;
To avoid unbounded WAL growth caused by replication slots, set a cap via max_slot_wal_keep_size. For example, to allow up to 50GB:
ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
For Amazon RDS or Aurora Postgres
You need to set rds.logical_replication=1 to the parameter group associated with the database. After this change a database reboot is required.
It is recommented to set max_slot_wal_keep_size=51200 (equivalent to 50GB) in the parameter group to cap WAL retained by replication slots.
Finally, you need to run
GRANT rds_replication TO weld_cdc_user;
so that the assigned user has replication privileges.
You can validate wal_level quickly:
SHOW wal_level;
that should return logical.
2) A logical and non-temporary replication slot exists
Weld requires a logical replication slot that is not temporary. You can validate with:
SELECT (slot_type = 'logical') AS is_logical,
(NOT temporary) AS drop_on_stop_false
FROM pg_replication_slots;
If no appropriate slot exists, create one (with pgoutput):
-- Create a logical replication slot named weld_cdc
SELECT *
FROM pg_create_logical_replication_slot('weld_cdc', 'pgoutput');
3) A publication is configured for the tables to capture
Create either a publication for all user tables or a scoped publication for specific tables used for CDC.
CREATE PUBLICATION weld_publication FOR TABLE
public.table1,
public.table2;
You can inspect publications with:
SELECT p.pubname, p.puballtables, pt.schemaname, pt.tablename
FROM pg_publication p
LEFT JOIN pg_publication_tables pt ON pt.pubname = p.pubname
ORDER BY p.pubname, pt.schemaname, pt.tablename;
4) All CDC tables have a primary key or a unique index
CDC requires a stable row identifier so updates/deletes can be applied correctly downstream. Ensure each table in the publication has a primary key or a unique index.
5) Replica identity is not set to NONE
Replica identity should not be NONE because tables must expose enough identifying columns for UPDATE/DELETE messages.
Validate it with the query below; the value should not be n (or NONE).
In most cases, d (DEFAULT, use primary key) is the right setting.
SELECT CONCAT(n.nspname, '.', c.relname) AS table_name,
c.relreplident
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
AND c.relkind = 'r'
ORDER BY 1;
π§ Enable CDC in Weld
Step 1 β Connect PostgreSQL in Weld
- Create or open your PostgreSQL connection in Weld.
Step 2 β Choose CDC as the sync mode
In the PostgreSQL connector settings in Weld, provide:
- Replication slot name (e.g.,
weld_cdc) - Publication name (e.g.,
weld_publication)
Step 3 β Select tables
Pick the tables to replicate and enable the CDC option to them
Step 4 β Configure destination
- Choose sync frequency/latency targets (CDC runs frequently; the setting controls apply frequency downstream).
- Provide a destination dataset/schema and naming pattern.
Weld will begin consuming the WAL stream via the logical slot and applying changes to your destination.
Housekeeping and WAL retention
If you stop or permanently delete an existing Elt sync, make sure the associated logical replication slot is also removed. Leaving an unused slot behind can force PostgreSQL to retain WAL indefinitely, consuming disk space.
- List existing replication slots:
SELECT *
FROM pg_replication_slots;
- Drop an unused replication slot (replace with your slot name, e.g.
weld_cdc):
SELECT pg_drop_replication_slot('weld_cdc');
To protect production databases from unbounded WAL retention, set a cap via max_slot_wal_keep_size, which limits how much WAL a replication slot can force the server to keep. The default is -1 (unlimited). Set an appropriate bound for your environment, for example:
In RDS enviroments this is configured through the parameter group associated with the database.