First of all, a concept of Data Warehouse is supposed to be clear. Data Warehouse is not a copy of source database with a name prefixed with ‘DW’.
DW is we can store data from multiple data sources to be used for historical or analysis report. Some data sources are .csv format, some are google docs, etc. So we need to aggregate them into one data source.
This is called Data Warehouse.
How to design it? The procedure is Dimensional Modelling.
What is the difference between relational and dimensional?
It is same as the difference like normalised VS denormalised.
- Minimal data redundancy
- Optimised for fast read and fast write
- Current data only
- Realtime data
- Redundancy data storage for performance
- Fast read only
- Non-realtime data
- Current and historical data
The next part is about fact table and dimension table in dimensional design.
- Data that can be measured
- Contains surrogate key, linking the associated measures or facts
Some types of dimensional models:
- Accumalating snapshot table
- Aggregate fact
- Fact table
- Factless Fact table
- Snapshot table
If you are interested in or have any problems with Dimensional Modelling, feel free to contact me.
Or you can connect with me through my LinkedIn.
Originally published at http://jacquiwu.com on January 13, 2020.