Looking at my previous write up on BI solution purpose, there are 4 main requirements that can be expected out of a dimension table:
- They should obviously be linked to the facts or the business process.
- Since we are getting data from so many different back end source systems, we need to keep track of from where the data is coming.
- Data should be consistent and reliable. For example, if it is a date, all dates should have the same format.
- We might need to keep a history of the changes to the data. I have written a separate article on dimension types for more details.
- Apart from the above data in a dimensional table should be understandable by any non IT person and should be presentable. For example – ProductDiscontinued column should have the values Yes/No/N/A instead of 1 or 0 in software engineering terms.
- The dimension table can be highly normalised and may contain many duplicate values.
We maintain the following types of keys to function the above requirements:
- Surrogate keys
- Business keys
Surrogate key – This is simply a primary key for the dimension table.
- Should be a simple key:
- In most cases it should be some kind of an integer, int, small int, big int, depending on the number of records your dimension table can have.
- Non descriptive
- This number should be able to auto increment.
- This will improve the performance.
- In rare cases like date dimension, it can be a descriptive key; ex-20180402.
- Refrain from making the primary key of the source table the primary key of the dimension table. This is because, you might be getting data from multiple source systems. Maybe at first it is just 1 source system, but with time this can change. Avoid problems late.
- This will protect the warehouse from operational changes
- Unwanted data being deleted in the source table will not have an impact on the warehouse.
- ID\’s re-assigned to new data in source table – warehouse will always have a new key.
- Source tables usually do not keep data for years and years – but warehouse will.
- Should handle null or unknown conditions
- Example – have an additional row for employee table as N/A. So that any patients coming to the hospital who get self registered can be assigned to N/A for the person who attended that patient.
Business Key – Primary key of the source system
- Maintain the primary key of the source system in a different column in the dimension table.
- Best practice is to append the primary key with the name of the source system name: ex: EPIC001.
- This can identify which source system the data came from and what source data row it is.
Leave a reply to Anonymous Cancel reply