Slowly Changing Dimensions (SCD): A Complete Guide to Tracking Historical Data

What Are Slowly Changing Dimensions?

Here's a scenario most data teams have run into: a customer updates their billing address, a product gets repriced, or an account gets reassigned to a new sales rep. The change happens, the database updates, and the old value just... disappears.

Now imagine someone asks, "What was this customer's address when we shipped that order last quarter?" If you overwrote it, you can't answer that question. The data is gone.

This is exactly the problem that slowly changing dimensions (SCD) solve. SCD is a set of data warehousing techniques for handling attributes that change over time -- but not frequently. Think customer addresses, product prices, employee titles, or account ownership. These values do change, but slowly compared to transactional data like orders or clicks.

The core question SCD answers is straightforward: when a dimension value changes, what should happen to the old value? The answer depends on your use case, and that's where the different SCD types come in.

Overview of slowly changing dimension types: SCD Type 1 overwrites, Type 2 adds versioned rows, Type 3 stores previous values, Type 4 uses a separate history table

Why This Matters for Your Analytics

Without an SCD strategy, your analytics can only describe the present. You can see what is, but never what was. That's a significant limitation when you need to answer questions like:

  • Revenue attribution: Which sales region owned this account when the deal actually closed?
  • Price analysis: How has this SKU's price changed over the past 12 months -- and did revenue follow?
  • Customer journey: When did this customer move from "prospect" to "active," and what triggered it?
  • Compliance & audit: What was the exact state of this record on a specific date?
  • Trend detection: Are inventory levels at this location trending down over time?

These aren't edge cases. They come up constantly in finance, operations, and any team running reports that compare time periods. SCD is a foundational concept in dimensional modeling (the Kimball methodology) and applies to every major data warehouse -- BigQuery, Snowflake, Redshift, and Databricks included.


SCD Types Explained

There are several SCD types, each making a different trade-off between simplicity, storage cost, and how much history you actually keep. Let's walk through them.

SCD Type 1 -- Overwrite (No History)

This is the default behavior in most databases: when a value changes, you overwrite it in place. Simple. Fast. And the old value is permanently gone.

Before the change:

customer_idnamecity
101AliceNY

After Alice moves to SF:

customer_idnamecity
101AliceSF

NY is gone forever. There's no record that Alice ever lived there.

This is fine for correcting typos or updating non-critical fields. But if the attribute matters for reporting -- say, you're segmenting revenue by customer location -- overwriting it will silently distort your historical numbers.

Use SCD Type 1 when:

  • You're fixing data-entry errors
  • The attribute genuinely doesn't matter historically
  • Storage and simplicity are top priorities

SCD Type 2 -- Row Versioning (Full History)

This is the most widely used SCD approach, and for good reason. Each time a value changes, a new row is added with timestamps marking when that version was valid. The old row stays untouched.

customer_idcityvalid_fromvalid_tois_current
101NY2022-01-012024-03-15false
101SF2024-03-159999-12-31true

Now you can answer both questions:

  • "Where did customer 101 live on 2023-06-01?" -- NY (the row where valid_from <= 2023-06-01 < valid_to)
  • "Where do they live now?" -- SF (where is_current = true)

The trade-off is that your table grows over time. For dimensions with millions of rows and frequent changes, this row bloat can start to impact query performance. But for most use cases, it's the gold standard.

Use SCD Type 2 when:

  • You need full audit trails or compliance reporting
  • You're doing time-series analysis on dimension attributes
  • Anyone on your team might need to "go back in time"

SCD Type 3 -- Previous Value Column (Limited History)

Instead of adding rows, SCD Type 3 adds columns. You keep the current value and the previous value side by side.

customer_idcurrent_cityprevious_citycity_changed_at
101SFNY2024-03-15

It's quick to query and keeps the schema simple. But there's an obvious limit: you can only see one level back. If Alice moves from NY to SF to LA, the NY record is lost completely.

Use SCD Type 3 when:

  • You only care about the most recent change ("before and after")
  • Schema simplicity matters more than deep history
  • You need fast "did this value change?" comparisons

SCD Type 4 -- History Table (Separate Tracking)

SCD Type 4 takes a different approach entirely. Instead of modifying the main table, you keep it clean with only current values and maintain a separate history table that logs every change as an append-only record.

Main table (always shows the current state):

customer_idnamecity
101AliceSF

History table (append-only -- every change is recorded):

customer_idnamecitysynced_at
101AliceNY2022-01-01
101AliceSF2024-03-15

This gives you the best of both worlds: your main table stays fast and compact for day-to-day queries, while the history table holds the full timeline for audit, analysis, or debugging.

This is the approach Weld uses for history tables -- we'll cover the details below.

Use SCD Type 4 when:

  • You need fast queries on current data and full historical access
  • Main table performance is important (no row bloat)
  • Audit and compliance requirements exist
  • You want "current state" analytics separated from "historical state" analysis

SCD Types Compared

Here's a quick reference for choosing the right type:

SCD TypeKeeps History?MethodBest ForDrawback
Type 1NoOverwrite in placeError corrections, non-critical fieldsHistory is permanently lost
Type 2FullNew row per changeAudit trails, time-travel queriesTable grows over time
Type 3Last change onlyExtra column(s)Before/after comparisonsOnly one level of history
Type 4Full (separate table)Append-only history tablePerformance + auditabilityTwo tables to manage

In practice, most teams end up using a mix. Type 1 for non-critical fields, Type 2 or Type 4 for the dimensions that actually matter. The key is being intentional about it rather than letting every change silently overwrite history.


What Are History Tables?

A history table is the practical implementation of SCD Type 4. It's a separate, append-only table in your data warehouse that captures every version of a record over time.

The key properties:

  • Same schema as the main table, plus a timestamp column (synced_at, updated_at, etc.)
  • Append-only -- rows are never updated or deleted, only added
  • Protected -- the table is managed automatically; you can't accidentally modify historical records
  • Complete timeline -- every sync captures a snapshot of changed records

This means you can always answer: "What did this record look like on date X?" -- which is surprisingly hard to do without a deliberate strategy.

How History Tables Compare to Change Data Capture (CDC)

If you've worked with change data capture (CDC), history tables might sound familiar. They solve a similar problem -- tracking what changed -- but they work quite differently in practice:

History Tables (SCD Type 4)CDC
TriggerPeriodic sync snapshotsReal-time event stream
GranularityFull record snapshot per syncIndividual field-level changes
LatencyMinutes (sync interval)Seconds (event-driven)
ComplexityLow -- just enable the tableHigh -- requires log parsing, schema registry
Best forAnalytics, reporting, auditReal-time pipelines, event sourcing

For most analytics and reporting use cases, history tables hit the sweet spot: they're dead simple to set up, give you full record-level snapshots, and don't require any infrastructure beyond your existing data warehouse.


Using History Tables in Weld

Weld makes setting up history tables straightforward across supported connectors. There's no code to write and no schema to configure -- you just select the tables you want.

Here's how to enable them:

  1. Open your connector settings:
    Go to Settings -> Connectors and select your data source.

  2. Search for history tables:
    In the table selection panel, type history. All available history streams will appear.

  3. Select and save:
    Check the history tables you want (e.g., product_history, order_history, customer_history) and save.

Enabling history tables in Weld - searching for history in the table selection panel and selecting append-only tables

That's it. Weld will start appending historical snapshots on the next sync. Each history table runs in append mode and is marked as protected, so existing rows are never overwritten or deleted. Every sync captures the current state of changed records, building a complete timeline automatically.


Example: Shopify History Tables

As an example, Weld supports 19 history tables for Shopify across all major entities:

CategoryHistory TableWhat It Tracks
Checkoutabandoned_checkout_historyAbandoned checkout changes over time
Collectioncollection_historyProduct collection updates
Companycompany_historyB2B company profile changes
Customercustomer_historyCustomer profile updates (address, email, tags)
Customercustomer_metafield_historyCustom customer metafield changes
Discountdiscount_code_historyDiscount code modifications
Draft Orderdraft_order_historyDraft order changes before completion
Fulfillmentfulfillment_order_historyFulfillment status transitions
Inventoryinventory_item_historyInventory item attribute changes
Inventoryinventory_level_historyStock level changes across locations
Line Itemline_item_group_historyLine item group modifications
Orderorder_historyOrder lifecycle changes (status, amounts, tags)
Orderorder_metafield_historyCustom order metafield changes
Orderorder_return_historyReturn status and detail changes
Price Ruleprice_rule_historyPricing rule updates
Productproduct_historyProduct detail changes (title, description, status)
Productproduct_image_historyProduct image additions and updates
Productproduct_metafield_historyCustom product metafield changes
Productproduct_variant_historyVariant-level changes (price, SKU, weight, inventory)

Tip: You can join any history table back to its main table to compare current vs. past values. For example: "Show me all products where the price changed in the last 30 days" -- a query that's trivial with history tables and impossible without them.


Practical Example: Tracking Product Price Changes

To make this concrete, here's how you'd use a Shopify product_variant_history table to find every product whose price has changed:

1with
2    price_changes as (
3        select
4            product_id
5          , title
6          , price
7          , synced_at
8          , lag(price) over (
9                partition by
10                    product_id
11                order by
12                    synced_at
13            ) as previous_price
14        from
15            shopify.product_variant_history
16    )
17select
18    *
19from
20    price_changes
21where
22    price != previous_price
23order by
24    synced_at desc

The LAG() window function grabs the previous price for the same product, so each row shows both the new price and what it was before. This kind of time-aware query is exactly what history tables unlock -- and it's something that's simply not possible when your data pipeline overwrites records in place.

You could extend this to calculate price change percentages, alert on drops above a threshold, or feed the output into a dashboard tracking pricing trends across your catalog.


Wrapping Up

Slowly changing dimensions aren't a new concept -- they've been a core part of data warehousing since Ralph Kimball formalized them in the 1990s. But they're more relevant than ever as companies rely on analytics for decisions that span time periods, not just snapshots.

The right SCD strategy depends on what you actually need:

  • SCD Type 1 when history genuinely doesn't matter
  • SCD Type 2 when you need full row-level versioning in a single table
  • SCD Type 3 when you only care about the most recent change
  • SCD Type 4 (history tables) when you want fast current-state queries and a complete audit trail, without compromising on either

Weld's history tables implement SCD Type 4 across supported connectors -- giving you append-only historical tracking with zero configuration beyond selecting the tables you want. No schema changes, no custom code, no maintenance.

If you've been overwriting your dimension data and hoping nobody asks about the past, now's a good time to stop doing that.


Sources & Further Reading