The Simplest Way to Understand SCD in Data Warehouse
What is SCD?
SCD stands for Slowly Changing Dimensions.
It is very important in Data Warehouse.
As we know, ETL (Extract, Transform, Load) is between data sources and data warehouse.
When ETL runs, it will pick up all records and update them in the Dimension tables.
Why we need SCD?
Because we have some problems in updating data in Data Warehouse when data in data sources are changing.
In the dimension tables, if we want to keep some old records, how we can do this?
Using SCD.
Types of SCD:
Note:The types of SCD are defined on Column level, not on the table level.
There are two popular types in SCD.
Type 1: Overwrite
An old record is updated by the new record. It means covering the old records.
Type 2: Store history by creating another row
Type 2 is to add new records rather than covering the old records.
As long as we have a type 2 in tables, we must have two extra values: ‘StartDate’ and ‘EndDate’. The ‘EndDate’ is when the changes happen, as the end date of historical data.
We can have a third one, ‘IsCurrent’, to identify or mark the current record.
We don’t need to update the records, and we just need to update ‘StartDate’ and ‘EndDate’.
If you are interested in or have any problems with fact tables and dimension tables, feel free to contact me .
Originally published at http://jacquiwu.com on September 1, 2019.