Data sources (ELT)

Data sources are used to extract data from your connectors and load it into your data warehouse on a scheduled sync.

Setting up a data source

  1. First, connect the tool you want to sync data from in your workspace settings. Once the connector is authenticated, you will see the tables available for sync.

  2. Next, choose which tables to include in or exclude from the sync. You can also expand a table to view its available columns.

    Columns can either be hashed to hide sensitive data or entirely removed from the sync. Tables and columns can be added or removed even after the initial sync is set up.

  3. After selecting the tables you want to include, you will be prompted to name the dataset.

  4. Finally, you can select the frequency at which you want the sync to run.
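The four setup steps can be pictured as a single configuration object. The sketch below is illustrative only: SyncConfig, TableSelection, apply_selection, the dataset name "hubspot_raw" and the use of SHA-256 for column hashing are all hypothetical assumptions, not Weld's API or its actual hashing scheme. It shows what happens to a row when its table is excluded, or when columns are hashed or removed.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class TableSelection:
    """Hypothetical per-table settings (step 2): columns to hash or remove."""
    hashed_columns: set[str] = field(default_factory=set)
    removed_columns: set[str] = field(default_factory=set)


@dataclass
class SyncConfig:
    """Hypothetical data source settings: dataset name (step 3),
    included tables (step 2) and sync frequency (step 4)."""
    dataset: str
    tables: dict[str, TableSelection]
    frequency_minutes: int


def hash_value(value) -> str:
    # SHA-256 is an assumed hashing scheme, used here for illustration only.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def apply_selection(config: SyncConfig, table: str, row: dict) -> dict | None:
    """Return the row as it would be loaded, or None if the table is excluded."""
    selection = config.tables.get(table)
    if selection is None:
        return None  # table was not included in the sync
    loaded = {}
    for column, value in row.items():
        if column in selection.removed_columns:
            continue  # column is removed from the sync entirely
        if column in selection.hashed_columns:
            loaded[column] = hash_value(value)  # sensitive value is hidden
        else:
            loaded[column] = value
    return loaded


config = SyncConfig(
    dataset="hubspot_raw",  # hypothetical dataset name
    tables={"contacts": TableSelection(hashed_columns={"email"}, removed_columns={"phone"})},
    frequency_minutes=60,
)
print(apply_selection(config, "contacts", {"id": 1, "email": "a@example.com", "phone": "555-0100"}))
```

Because hashing is deterministic, a hashed column can still be used for joins and distinct counts even though the raw value never reaches the warehouse.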

Usage

See a monthly overview of your row usage, including both total rows and Active Rows, at the connector and table level.

Syncing data

Weld uses ELT (extract, load, transform) to move data from its source into your data warehouse, where it is then transformed.
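The defining feature of ELT is the order of the steps: raw data is loaded into the warehouse first and modeled there afterwards. A minimal sketch of that order, using Python's built-in sqlite3 module as a stand-in warehouse; the payment records and the raw_payments and paid_revenue names are hypothetical.

```python
import sqlite3


def extract():
    # Extract: hypothetical records pulled from a source system's API.
    return [
        {"id": 1, "amount_cents": 1250, "status": "paid"},
        {"id": 2, "amount_cents": 800, "status": "refunded"},
        {"id": 3, "amount_cents": 4000, "status": "paid"},
    ]


def load(conn, records):
    # Load: write the raw records unchanged into the warehouse.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_payments "
        "(id INTEGER PRIMARY KEY, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO raw_payments VALUES (:id, :amount_cents, :status)",
        records,
    )


def transform(conn):
    # Transform: model the raw data with SQL inside the warehouse itself.
    conn.execute(
        """
        CREATE VIEW IF NOT EXISTS paid_revenue AS
        SELECT SUM(amount_cents) / 100.0 AS revenue
        FROM raw_payments
        WHERE status = 'paid'
        """
    )
    return conn.execute("SELECT revenue FROM paid_revenue").fetchone()[0]


conn = sqlite3.connect(":memory:")
load(conn, extract())
print(transform(conn))  # 52.5
```

Keeping the raw table intact means the transformation can be changed and re-run later without extracting the data from the source again.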

Initial syncs

Initial syncs can take a long time to complete because they process all available historical data. Data cannot be queried until the entire table has finished syncing.

Several factors influence the duration of an initial sync, including:

  • Volume of data: The primary factor affecting sync speed is the sheer amount of data. Syncing tables with millions of rows will take significantly longer than processing a few hundred.
  • Source system limits: If the source system enforces rate limits, it can slow down the data extraction process.
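Both factors can be combined into a rough lower bound on extraction time. The sketch below assumes a source that returns a fixed number of rows per API request and allows a fixed number of requests per second; the page size and rate limit are hypothetical and vary from source to source. Latency, retries and load time are ignored, so real initial syncs take longer.

```python
import math


def estimate_initial_sync_seconds(total_rows: int, rows_per_request: int,
                                  max_requests_per_second: float) -> float:
    """Lower bound on extraction time under a request rate limit.

    Ignores network latency, retries and warehouse load time.
    """
    requests_needed = math.ceil(total_rows / rows_per_request)
    return requests_needed / max_requests_per_second


# 5,000,000 rows, 100 rows per request, 10 requests per second:
seconds = estimate_initial_sync_seconds(5_000_000, 100, 10)
print(seconds / 60)  # about 83 minutes

# A few hundred rows under the same limits finish almost immediately.
print(estimate_initial_sync_seconds(300, 100, 10))
```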

Incremental syncs

Incremental syncs update only new or modified data.

After the initial sync, Weld connectors sync most tables using incremental updates. We use a variety of mechanisms to capture the changes in the source data, depending on how the source provides change data. During incremental syncs, Weld maintains an internal set of pointers, which let us track the exact point where our last successful sync left off.
Incremental syncs are efficient because they update only the changed data, instead of re-importing whole tables.
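One common change-capture mechanism is a timestamp cursor: the stored pointer is the latest updated_at value seen, and each sync fetches only rows modified after it. The sketch below assumes that mechanism and a hypothetical id/updated_at schema, with a dict standing in for the warehouse table; the actual mechanism Weld uses depends on the source.

```python
from datetime import datetime


def incremental_sync(source_rows, destination, state):
    """Upsert rows changed since the stored cursor, then advance the cursor.

    source_rows: dicts with "id" and "updated_at" (hypothetical schema).
    destination: dict keyed by primary key, standing in for a warehouse table.
    state: dict holding the last cursor value (the pointer).
    """
    cursor = state.get("cursor")
    changed = [r for r in source_rows if cursor is None or r["updated_at"] > cursor]
    for row in changed:
        destination[row["id"]] = row  # insert new rows, overwrite modified ones
    if changed:
        state["cursor"] = max(r["updated_at"] for r in changed)
    return len(changed)


source = [
    {"id": 1, "name": "Ada", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "Linus", "updated_at": datetime(2024, 1, 2)},
]
dest, state = {}, {}
print(incremental_sync(source, dest, state))  # 2 (no cursor yet, so every row is read)

source[0] = {"id": 1, "name": "Ada L.", "updated_at": datetime(2024, 1, 3)}
print(incremental_sync(source, dest, state))  # 1 (only the modified row)
```

The strict comparison keeps the sketch short, but it can miss a row that is committed later with the same timestamp as the cursor; production connectors typically compare with >= and deduplicate on the primary key instead.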

Incremental syncs do not remove rows that have been deleted in the source. To remove deleted rows from a table, run a manual full sync.
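Deleted rows persist because an incremental load only writes rows that the source reports as new or changed, and a deleted row is never reported. A full sync rebuilds the table from what the source currently holds. A minimal sketch, assuming a change feed that does not surface deletions; upsert and full_sync are hypothetical helper names and a dict stands in for the warehouse table.

```python
def upsert(destination: dict, changed_rows: list) -> None:
    """Incremental-style load: only rows reported as new or changed are written."""
    for row in changed_rows:
        destination[row["id"]] = row


def full_sync(source_rows: list) -> dict:
    """Full-sync-style load: rebuild the table from everything the source holds now."""
    return {row["id"]: row for row in source_rows}


source = [{"id": 1}, {"id": 2}, {"id": 3}]
destination = full_sync(source)  # initial sync: ids 1, 2 and 3

source = [r for r in source if r["id"] != 2]  # row 2 is deleted in the source
upsert(destination, [])  # no new or changed rows, so nothing is written
print(sorted(destination))  # [1, 2, 3] (row 2 is still in the warehouse)

destination = full_sync(source)  # manual full sync
print(sorted(destination))  # [1, 3]
```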

Protected tables

Table protection is an optional feature for incremental syncs that gives you additional control over schema changes and data integrity. When table protection is enabled, you are notified of incompatible schema changes and can manually trigger a full sync to capture data with the new schema. This is particularly useful when the source does not guarantee data retention, because it reduces the risk of losing historical data during a full sync.
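The decision table protection introduces can be sketched as a schema comparison. Everything here is an assumption for illustration: schemas are modeled as {column: type} dicts, dropped columns and changed types are treated as incompatible while added columns are not, and an unprotected table is assumed to go straight to a full sync. The function names are hypothetical, not part of Weld.

```python
def incompatible_changes(old_schema: dict, new_schema: dict) -> list:
    """List incompatible changes between two {column: type} schemas.

    Assumption: removed columns and changed types are incompatible;
    newly added columns are not.
    """
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            problems.append(f"column '{column}' was removed")
        elif new_schema[column] != old_type:
            problems.append(f"column '{column}' changed from {old_type} to {new_schema[column]}")
    return problems


def handle_schema_change(old_schema: dict, new_schema: dict, protected: bool) -> str:
    problems = incompatible_changes(old_schema, new_schema)
    if not problems:
        return "continue incremental sync"
    if protected:
        # Keep the existing data and wait for a manually triggered full sync.
        return "notify: " + "; ".join(problems)
    # Assumed behavior for unprotected tables: re-sync everything with the new schema.
    return "run full sync"


old = {"id": "INTEGER", "amount": "INTEGER", "note": "TEXT"}
new = {"id": "INTEGER", "amount": "TEXT", "created_at": "TIMESTAMP"}
print(handle_schema_change(old, new, protected=True))
# notify: column 'amount' changed from INTEGER to TEXT; column 'note' was removed
```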

Full syncs

If a connector does not support incremental syncs for a table, full syncs of that table are run instead.
Full syncs can also be triggered manually when there is a significant structural change to the data or the data models, or when all the data needs to be reprocessed because of data quality issues. Full syncs can be resource-intensive and time-consuming, especially for large amounts of data.

Sync modes and Active Rows

The mode in which you run your sync affects your Active Rows count. Your Active Rows limit depends on your plan, but only a small percentage of users with high data syncing needs will reach it.
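To see why the sync mode matters, the sketch below assumes, purely for illustration, that an Active Row is a distinct row (table plus primary key) inserted or updated at least once in a calendar month; the Fair Usage Policy defines the actual rules. Under that assumption, a daily incremental sync counts only the rows that changed, while a daily full sync rewrites, and therefore counts, every row in the table. The orders table and its change pattern are hypothetical.

```python
from collections import defaultdict


def count_active_rows(sync_batches):
    """Count distinct (table, primary key) pairs written per month.

    sync_batches: list of (month, table, [primary keys written by that sync]).
    A row written by several syncs in the same month is counted once.
    """
    seen = defaultdict(set)
    for month, table, keys in sync_batches:
        for key in keys:
            seen[month].add((table, key))
    return {month: len(rows) for month, rows in seen.items()}


# A 1,000-row table where a different 10 rows change each day,
# synced once a day for 30 days in the same month.
table_keys = list(range(1000))
incremental = [("2024-05", "orders", list(range(day * 10, day * 10 + 10))) for day in range(30)]
full = [("2024-05", "orders", table_keys) for _ in range(30)]

print(count_active_rows(incremental))  # {'2024-05': 300}
print(count_active_rows(full))  # {'2024-05': 1000}
```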

If you're unsure of your data syncing needs, are getting close to the limit, or need to increase it, you can always reach out to our support team for assistance and guidance, either via the chat in your workspace or on our website.

Learn more about the usage limits of Active Rows in our Fair Usage Policy.

The first 14 days after you set up a connector are not included in your Active Rows count.
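The 14-day exclusion amounts to a simple date cutoff. A minimal sketch, assuming the window covers the day the connector is created plus the following 13 days, so counting starts on day 15; the exact boundary Weld applies is an assumption here, and counted_sync_dates is a hypothetical helper.

```python
from datetime import date, timedelta

EXCLUDED_PERIOD = timedelta(days=14)


def counted_sync_dates(connector_created: date, sync_dates: list) -> list:
    """Keep only syncs whose rows would count toward Active Rows.

    Assumption: the creation day is day 1 of the 14 excluded days.
    """
    cutoff = connector_created + EXCLUDED_PERIOD
    return [d for d in sync_dates if d >= cutoff]


created = date(2024, 5, 1)
syncs = [date(2024, 5, 1), date(2024, 5, 14), date(2024, 5, 15), date(2024, 5, 20)]
print(counted_sync_dates(created, syncs))
# [datetime.date(2024, 5, 15), datetime.date(2024, 5, 20)]
```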
