Building dbt Models Based on Snapshots

Yesterday I was building a dimension staging table in dbt Cloud. I anticipated that the contents of the table will change overtime, so my original plan was to implement an incremental strategy. But, I also wanted a way to keep track of the table's history. To achieve both objectives, I built my staging table off a snapshot of the table in order to create a Slowly Changing Type 2 (SCD 2) table.

Slowly Changing Type 2 Tables

An SCD 2 table is a method in data warehousing that preserves the complete history of changes to dimension data by creating a new row for each change, rather than overwriting the existing one.

Let's say we have a table of names and street addresses.

First Name Last Name Street Address
John Doe 123 Sunset Blvd
Jane Doe 145 Princeton Cir

If any of the addresses get updated, we may want to preserve the history depending on our business needs. By creating an SCD 2 table, we would be able to maintain history of changes in address, while being able to distinguish the current address for each person. SCD 2 tables typically achieve this by adding Valid_From and Valid_To columns.

First Name Last Name Street Address Valid_From Valid_To
John Doe 123 Sunset Blvd 27-11-25 null
Jane Doe 145 Princeton Cir 27-01-25 null
John Doe 873 Hickory St 27-01-25 27-11-24

In the above example, we can see that John Doe changed address on 27-11-24, hence the entry for the '873 Hickory St' address has a Valid_To date. His new address, however, has a null Valid_To date, indicating that the '123 Sunset Blvd' address is his current address.

Building Snapshots (SCD 2 Tables) in dbt

If we were to try to build SCD 2 tables in SQL or Python, we might need to build convoluted logic to both maintain history while inserting new data. dbt, however, makes it extremely easy to build SCD 2 tables - or Snapshots. All it requires is a YAML file.

I built two snapshots for Mailchimp data. The Snapshot above was built for my members dimension table. Details for members will probably not change frequently, but I still wanted to preserve history when it does change while being able to add new members. Whenever details for a member changes, this Snapshot table will add a new row and update the Valid_To and Valid_From columns so that we'd be able to differentiate between historical and current records.

Above is what my schema for the members Snapshot looks like. dbt updates the four fields in red.

Building a Staging Table Based on a Snapshot

Now, we can build a staging table based on this Snapshot.

with source as (
    select *
    from {{ ref('snapshot_mailchimp_members') }}
    where dbt_valid_to is null
)

Above, I built my members staging table by referencing the snapshot. To filter out old records, I used a where filter to only grab rows where dbt_valid_to is null, which indicates that the row is current.

We can build on this staging table to merge with fact tables based on date ranges, so that each entry in a fact table is merged with the record that was current at the time of events.

In short, dbt makes it extremely convenient to build SCD 2 tables and base downstream models on them. This allows you to maintain history and incrementally update downstream models.

Author:
Charles Yi
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2025 The Information Lab