Image for post
Image for post

First of all, a concept of Data Warehouse is supposed to be clear. Data Warehouse is not a copy of source database with a name prefixed with ‘DW’.

DW is we can store data from multiple data sources to be used for historical or analysis report. Some data sources are .csv format, some are google docs, etc. So we need to aggregate them into one data source.

This is called Data Warehouse.

How to design it? The procedure is Dimensional Modelling.

What is the difference between relational and dimensional?

It is same as the difference like normalised VS denormalised.

Normalised:

  • Minimal data redundancy
  • Optimised for fast read and fast write
  • Current data only
  • Realtime data

Denormalised:

  • Redundancy data storage for performance
  • Fast read only
  • Non-realtime data
  • Current and historical data

The next part is about fact table and dimension table in dimensional design.

Fact table:

  • Data that can be measured
  • Contains surrogate key, linking the associated measures or facts

Dimension table:

Some types of dimensional models:

  • Accumalating snapshot table
  • Aggregate fact
  • Fact table
  • Factless Fact table
  • Snapshot table

If you are interested in or have any problems with Dimensional Modelling, feel free to contact me.

Or you can connect with me through my LinkedIn.

Originally published at http://jacquiwu.com on January 13, 2020.

Written by

A current Data Analyst in a subsidiary under Webjet, with experience in applying data science techniques to business.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store