The Simplest Way to Understand SCD in Data Warehouse

Image for post
Image for post

SCD stands for Slowly Changing Dimensions.

It is very important in Data Warehouse.

As we know, ETL (Extract, Transform, Load) is between data sources and data warehouse.

When ETL runs, it will pick up all records and update them in the Dimension tables.

Image for post
Image for post

Because we have some problems in updating data in Data Warehouse when data in data sources are changing.

In the dimension tables, if we want to keep some old records, how we can do this?

Using SCD.

Note:The types of SCD are defined on Column level, not on the table level.

There are two popular types in SCD.

Type 1: Overwrite

An old record is updated by the new record. It means covering the old records.

Type 2: Store history by creating another row

Type 2 is to add new records rather than covering the old records.

As long as we have a type 2 in tables, we must have two extra values: ‘StartDate’ and ‘EndDate’. The ‘EndDate’ is when the changes happen, as the end date of historical data.

We can have a third one, ‘IsCurrent’, to identify or mark the current record.

We don’t need to update the records, and we just need to update ‘StartDate’ and ‘EndDate’.

If you are interested in or have any problems with fact tables and dimension tables, feel free to contact me .

Originally published at http://jacquiwu.com on September 1, 2019.

Written by

A current Data Analyst in a subsidiary under Webjet, with experience in applying data science techniques to business.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store