📊 Mastering SCD Type 2: Handling Historical Changes in SQL
Slowly Changing Dimensions (SCD) are a crucial part of data warehousing and analytics. Among the different types of SCD, Type 2 is particularly interesting as it allows us to track historical changes in dimensions such as customer data, product information, and more.
In a recent project, I had the opportunity to implement an SCD Type 2 solution using SQL, and it was a game-changer for historical data analysis.
lets say we have table customer
#|customerid|address |current|effective_date|end_date |
-+———-+——– —+——-+————–+———-+
1| 1 |Denver |false | null |2018-02-01|
2| 1 |New York |true | 2018-02-01| |
3| 2 |Budapest |true | 2018-02-01| |
4| 3 |London |true | 2018-02-01| |
customer_incremental table as
#|customerid|address |effective_date|
-+———-+————-+————–+
1| 1|San Francisco| 2020-01-20|
2| 3|London | 2020-01-20|
3| 4|Berlin | 2020-01-20|
to achieve an SCD type 2 :
step1 : first identify the records which as dual effect , it means insert a new record as well update the existing record in master table.
select ci.customerId,ci.address,ci.effective_date from customer_incremental ci join customer c on ci.customerId = c.customerId where current = true and ci.address <> c.address;
step2: union all the records from customer_incremental also the output from above query as
select null as merge_key,ci.customerId,ci.address,ci.effective_date from customer_incremental ci join customer c on ci.customerId = c.customerId where current = true and ci.address <> c.address
union all
select customerId as merge_key,ci.* from customer_incremental ci;
we are using merge_key column to get to know which record has dual effect lets say in our case customerId 1 has dual effect , so we must have two records from above query for customerId 1 , one with null to do insert in master table ans one with customerId as merge_key to do upsert.
step3: finally do an merge with an master table.
merge into customer as c using
(
select null as merge_key,ci.customerId,ci.address,ci.effective_date from customer_incremental ci join customer c on ci.customerId = c.customerId
where current = true and ci.address <> c.address
union all
select customerId as merge_key,ci.* from customer_incremental ci)a
on c.customerId = a.merge_key
when matched and current = true and c.address <> a.address
then update set end_date = a.effective_date , current = false
when not matched then insert (customerid, address, current, effective_date, end_date) values(customerid, address, true, effective_date, null);
Adding an pictorial view of all table queries to have better understanding.

Leave a comment