Thursday, March 12, 2009

How To Populate Your TIME Dimension

Yesterday I wrote about the Datawarehousing TIMES table. Today I populated it. I modified it for my own purposes naturally, specifically removing the fiscal components (thankfully calendar year equals fiscal year). I think that saved me days of figuring out how to calculate certain fields.

I remember the first time trying to do this took about 20 different SQL statements. I generated the key (date, time_id) and then calculate that either loop or perform another SQL statement using specific dates out of the table (last day of the month for instance). This time I managed to do it in a single SQL statement. Again, I left out some of the more complicated calculations since I did not have to worry about fiscal requirements.

I also got a short refresher course in the date format models which always come in handy and learned a new function, NEXT_DAY.
NEXT_DAY returns the date of the first weekday named by char that is later than the date date. The return type is always DATE, regardless of the datatype of date. The argument char must be a day of the week in the date language of your session, either the full name or the abbreviation. The minimum number of letters required is the number of letters in the abbreviated version. Any characters immediately following the valid abbreviation are ignored. The return value has the same hours, minutes, and seconds component as the argument date.
Here's the SQL to populate your TIME Dimension:
SELECT 
TRUNC( sd + rn ) time_id,
TO_CHAR( sd + rn, 'fmDay' ) day_name,
TO_CHAR( sd + rn, 'D' ) day_number_in_week,
TO_CHAR( sd + rn, 'DD' ) day_number_in_month,
TO_CHAR( sd + rn, 'DDD' ) day_number_in_year,
TO_CHAR( sd + rn, 'W' ) calendar_week_number,
( CASE
WHEN TO_CHAR( sd + rn, 'D' ) IN ( 1, 2, 3, 4, 5, 6 ) THEN
NEXT_DAY( sd + rn, 'SATURDAY' )
ELSE
( sd + rn )
END ) week_ending_date,
TO_CHAR( sd + rn, 'MM' ) calendar_month_number,
TO_CHAR( LAST_DAY( sd + rn ), 'DD' ) days_in_cal_month,
LAST_DAY( sd + rn ) end_of_cal_month,
TO_CHAR( sd + rn, 'FMMonth' ) calendar_month_name,
( ( CASE
WHEN TO_CHAR( sd + rn, 'Q' ) = 1 THEN
TO_DATE( '03/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 2 THEN
TO_DATE( '06/30/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 3 THEN
TO_DATE( '09/30/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 4 THEN
TO_DATE( '12/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
END ) - TRUNC( sd + rn, 'Q' ) + 1 ) days_in_cal_quarter,
TRUNC( sd + rn, 'Q' ) beg_of_cal_quarter,
( CASE
WHEN TO_CHAR( sd + rn, 'Q' ) = 1 THEN
TO_DATE( '03/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 2 THEN
TO_DATE( '06/30/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 3 THEN
TO_DATE( '09/30/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 4 THEN
TO_DATE( '12/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
END ) end_of_cal_quarter,
TO_CHAR( sd + rn, 'Q' ) calendar_quarter_number,
TO_CHAR( sd + rn, 'YYYY' ) calendar_year,
( TO_DATE( '12/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
- TRUNC( sd + rn, 'YEAR' ) ) days_in_cal_year,
TRUNC( sd + rn, 'YEAR' ) beg_of_cal_year,
TO_DATE( '12/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' ) end_of_cal_year
FROM
(
SELECT
TO_DATE( '12/31/2002', 'MM/DD/YYYY' ) sd,
rownum rn
FROM dual
CONNECT BY level <= 6575
)
/
You can find the table definition and the INSERT statement here as well.

Any feedback good, bad or ugly is welcome.

13 comments:

  1. Hi!

    Just a quick observation: As you are always calculating sd+rn it might be convenient to put that in the driving query.

    ReplyDelete
  2. Good point Chris.

    I was "rushing" through it and gave it a brief thought, but didn't really pursue it...

    ReplyDelete
  3. Hi

    I must say that It is a good source first of all.

    I think for estimating week_ending_date SUNDAY must be used instead of Saturday.

    ReplyDelete
  4. @Saim

    That would depend on your requirements of course...I used Oracle's default which is day 1 of the week is Sunday and day 7 is Saturday, it's fairly trivial to adjust according to your needs.

    chet

    ReplyDelete
  5. Thank you very much for making it clear.

    In estimation of another field days_in_cal_year +1 is needed I think.

    ( TO_DATE( '12/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
    - TRUNC( sd + rn, 'YEAR' ) ) days_in_cal_year : It gives 364 or 365 days for a day. I used it like this:

    ( TO_DATE( '12/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
    - TRUNC( sd + rn, 'YEAR' ) + 1 ) days_in_cal_year

    and got the desired results.

    Have a good day or night depending on where you are :)

    ReplyDelete
  6. Interesting article.

    If I can add my two cents based on our 80TB data warehouse and one single period table ...

    1. I would recommend to add all date periods (not just days) but also MONTHS, QUARTERS, YEARS, ... and corresponding PERIOD_TYPE_CD which would be DAY / MONTH / QUARTER / YEAR ... this way your period table really covers all periods and can be used in all snapshot fact tables even e.g. monthly snapshot tables (for example PERIOD_KEY = 111 would correspond to 201004 - whole month).

    2. in general, I always felt that PERIOD table is nice to have for anything I can predict (e.g. is next day Monday? type questions), it is critical for non-predictable information e.g. Federal Holidays, Retail Calendar, "is my servicer sending data today", ... that's where you get real interest from user community and where people stop asking why you replaced all dates with key.

    3. small comment about name (I know I am too picky), I would not call it TIME dimension but PERIOD or something like that. It really tracks days, not time (hours, minutes, seconds, ...)

    ReplyDelete
  7. @Chet: great stuff... nice starting point for anyone that needs to do this.

    @Jiri:

    You said: "I would recommend to add all date periods (not just days) but also MONTHS, QUARTERS, YEARS, ... and corresponding PERIOD_TYPE_CD which would be DAY / MONTH / QUARTER / YEAR ... this way your period table really covers all periods and can be used in all snapshot fact tables even e.g. monthly snapshot tables (for example PERIOD_KEY = 111 would correspond to 201004 - whole month)."

    I couldn't disagree more. The number one rule in dimensional modeling is to never mix grains in the same table. If you really need a "shrunken dimension"... then create a new dimension table called, called MONTH_DIM for instance, that is not at the grain of an individual day, but instead at the grain of a month. This is also easy to do with a view over the daily dimension table. That's also why I think the date dimension table should always be called DATE_DIM. A dimension table name should always specify the grain of the table.

    If you go with your PERIOD_TYPE approach... every single join of a fact table to the PERIOD table would require a filter on PERIOD table. Why do that? It just begs for a user to make a mistake and overallocate. Keep it simple... use another table or view.

    However... if you use Oracle, why go through the process of building aggregate tables at all? The database will do this for you with fairly little effort. Search for "oracle query rewrite" and see where that takes you.

    ReplyDelete
  8. Great stuff,
    just one comment, I don't see "SEMESTER", was that just an oversight?

    ReplyDelete
  9. @anonymous

    No oversight.

    Should be fairly simple to add in there though.

    chet

    ReplyDelete
  10. Hi, I know this is an old post, but you mentioned you Removed the complicated fiscal calculations. I infact need just those.

    Do you by any chance still have the script that figured fiscal years into the calculation? Been looking for one everywhere!

    ReplyDelete
  11. Hi,
    I am currently having big trouble trying to create a Time dimension table With fiscal year support (not starting in January). You seem to have removed just that!

    I know this is an old post, but do you by any chance have the script or the link to one, that I can refer to?

    Thanks in advance!

    ReplyDelete
  12. @nilay9999

    I don't have it available, but I might be able to find it.

    How is your fiscal year defined? July - June?

    ReplyDelete