One of the tasks we were given during the BI training is to compare and contrast two different date dimension generators.
Given the complexity of what this code looks like, my preferred method would have been to write up a date generator from scrap, so that I know what is in it than going through someone else\’s code. But for the sake of the assignment, I browsed and got hold of several generators. There were quite interesting results, things I never thought of having inside a date dimension table.
Sample 1 was quite simple and straightforward, that I would have written on my own if I had to write this.
Sample 1
- Have a table (date dimension) containing the general date information like date Id, year, month, date, day, month number, quarter, day of the week, financial calendar details
- Assign a start date and an end date
- Maintain a date counter
- Set the table attributes by calling date T-SQL functions inside a while loop
- Increment the date counter by 1 day.
Sample 2 was complex, the table structure was bigger with a lot of extra columns to hold different holidays of each year.
Sample 2
- Create the date dimension table
- Set the first date and the number of years to generate the dates
- declare separate table variables weekly and annual holidays and insert the holiday name against a date id. This table contains only the date and the description. Holidays for 2013, 2014 and 2015 have been inserted.
- Set the table attributes by calling date T-SQL functions inside a while loop
- Increment the date counter by 1 day.
In both samples, the codes have not handled a fail scenario, if the user inputs dates which are already covered.
There are several differences I see in the above 2 samples:
Sample 1
- does not contain holiday information
- start and end dates are clearly defined. Does not contain a variable to capture how many years/months the dates should be generated
Sample 2
- Contains holiday information
- Only start date is defined and the user must say how many months he wants to generate the data.
Apart from the above differences, the major difference I can see is the addition of complexity, performance and non extendibility to the code in Sample 2.
1. Table variables are created for holidays and the holiday against each date id is inserted for 2013, 2014 and 2015. If the user select 1st of January, 2000 and 15 years to generate the dimension table, the years prior to 2015 will not have any holiday data. Hence, the user will have to manually add the following records for each year.
declare @NZAnnulPublicHolidays table([Date] int,[Description] varchar(50)) — int in YYYYMMDD format
insert into @NZAnnulPublicHolidays([Date],[Description]) values(20130101,\’New Year\’\’s Day\’)
insert into @NZAnnulPublicHolidays([Date],[Description]) values(20130102,\’Day after New Year\’\’s Day\’)
insert into @NZAnnulPublicHolidays([Date],[Description]) values(20130206,\’Waitangi Day\’)
insert into @NZAnnulPublicHolidays([Date],[Description]) values(20130329,\’Good Friday\’)
insert into @NZAnnulPublicHolidays([Date],[Description]) values(20130401,\’Easter Monday\’)
insert into @NZAnnulPublicHolidays([Date],[Description]) values(20130425,\’ANZAC Day\’)
insert into @NZAnnulPublicHolidays([Date],[Description]) values(20130603,\’Queen\’\’s Birthday\’)
insert into @NZAnnulPublicHolidays([Date],[Description]) values(20131028,\’Labour Day\’)
insert into @NZAnnulPublicHolidays([Date],[Description]) values(20131225,\’Christmas Day\’)
insert into @NZAnnulPublicHolidays([Date],[Description]) values(20131226,\’Boxing Day\’)
Same is done for australia public holidays and to determine if it is a weekday or weekend.
In addition, creating a table variable, creates a space in the memory, which could affect the performance.
2. When iterating along the loop to insert records for each day, above holidays are inserted as follows:
case when datepart(WEEKDAY,@CurrentDate) in (select [weekday] from @WeeklyHolidays) then 0 else 1 end as [IsWeekDayCode],
case when datepart(WEEKDAY,@CurrentDate) in (select [weekday] from @WeeklyHolidays) then \’Weekend\’ else \’Weekday\’ end as [IsWeekDayDescription],
case when convert(int,convert(varchar(8),@CurrentDate,112)) in (select [Date] from @NZAnnulPublicHolidays) then 1 else 0 end as [NZIsPublicHolidayCode],
case when convert(int,convert(varchar(8),@CurrentDate,112)) in (select [Date] from @NZAnnulPublicHolidays) then \’Holiday\’ else \’Non Holiday\’ end as [IsNZPublicHolidayDescription],
case when convert(int,convert(varchar(8),@CurrentDate,112)) in (select [Date] from @NZAnnulPublicHolidays) then (select [Description] from @NZAnnulPublicHolidays where [Date]=convert(int,convert(varchar(8),@CurrentDate,112))) else \’N/A\’ end as [NZPublicHolidayFullDescription],
case when convert(int,convert(varchar(8),@CurrentDate,112)) in (select [Date] from @AUAnnulPublicHolidays) then 1 else 0 end as [AUIsPublicHolidayCode],
case when convert(int,convert(varchar(8),@CurrentDate,112)) in (select [Date] from @AUAnnulPublicHolidays) then \’Holiday\’ else \’Non Holiday\’ end as [IsAUPublicHolidayDescription],
case when convert(int,convert(varchar(8),@CurrentDate,112)) in (select [Date] from @AUAnnulPublicHolidays) then (select [Description] from @AUAnnulPublicHolidays where [Date]=convert(int,convert(varchar(8),@CurrentDate,112))) else \’N/A\’ end as [AUPublicHolidayFullDescription],
If they are calling this function for 10 years, 8 case statements in which there is an inner select statement that run for 365 * 10 times, which adds a performance hit to the function.
Suggestion
An alternative that will even simplify the above would be to move the above out of the loop and do a direct update since there are only about 10 distinct types of holidays. Example:
UPDATE [dbo].[dim_date]
SET [IsNZPublicHolidayDescription] = 1,
SET [NZPublicHolidayFullDescription] = \’Christmas Day\’
WHERE Day = 25 and MonthNumber = 12;
UPDATE [dbo].[dim_date]
SET [IsWeekDayCode] = 1
WHERE DATEPART(DW, @CurrentDate) IN (2,7)
- It is done outside a loop without case statements – hence less expensive
- Since it is not year specific, if a annual public holiday is added, it is added for all the years in the table. It saves the user from manually adding records for each year in a table variable.
Leave a comment