I am thoroughly exhausted! Margy followed up day one with another day packed with information.

We went through the various types of Slowly Changing Dimensions (SCD) and even talked about Type 6 SCDs, which I’ll get to in a bit. We then rolled right into mini dimensions and other related concepts. After that, we talked about the modeling process, different types of fact tables, and then finally how to deal with a mature data warehouse.

Slowly Changing Dimensions

Although I’ve posted a few entries (1 2) on the SCD Types, I’ll go over them once again here, but hopefully in a slightly new light. We talked specifically about:

Type 1: Overwrite the Attribute
In short, when an attribute’s value changes, we overwrite the old value with the new value. This technique does not preserve history. While it might be easy to implement, its use must be fully justified! In most every case, we will want to keep some sort of history in our dimensions.
Type 2: Preserve History / Add a New Record
The most common type, this technique preserves the old value by inserting a copy of the current row with the updated attribute values into the Dimension. The old row is marked as archived (typically by using active and inactive dates and a status indicator).
Type 3: Alternate Reality / Add a New Attribute
New attributes are added to the dimension to represent the old and new value. In some cases this is a handy technique because you can preserve the old value along side the new value. This technique is best used when there is a massive, dimension-wide change that affects many rows. For example, if a voter’s representative district changes during redistricting, we might want to see what election results would have been if the old districts were used in the analysis.
Type 6: Advanced Hybrid
This is a hybrid approach that combines the other 3 types (1 + 2 + 3 = 6). According to Margy: “We’re creating new rows to capture change (Type 2), adding attributes to reflect an alternative view of the world (Type 3), which are overwritten for all earlier dimension rows for a given product (Type 1).” Read more about this here and be sure to check out this article by Margy and Ralph for more information. I hope to have some examples posted in the coming weeks as well.

We didn’t talk about inferred dimensions (although Margy did mention the late arriving kind). I suppose the idea of an inferred dimension (or Type 1.5) might be discussed with Warren in the coming days.

The Mini

After all that SCD talk, we talked about dealing with very large dimensions. Especially those that have Type 2 attributes. Margy introduced the concept of the Mini-Dimension, and stated that for extremely large dimensions, using Type 2 may not be a good idea for attributes that might change often. For example, suppose you wanted to store an airline passenger’s ‘age’ in your Passenger dimension. If the passenger flies 7 times over an 8 year period, you could have as many as 7 rows reflecting that change.

Instead, create a mini-dimension with attributes that change frequently and link them to the Fact table using a separate foreign key. It is hard to explain this in any detail without a picture. I promise once I get back from Amsterdam I will prepare a better example for this (with pictures!).

The Modeling Process

I was happy to see a segment dedicated to the overall modeling process. When it first came up, I felt it was out of place (perhaps it could have been discussed before we dove into sketching our case study models?), but in the end, it seemed to fit the flow.

The modeling process includes the following steps: Preparation, a high level design session, detailed model development, Dimensional model review and validation, and final documentation to be handed to the ETL team. It is important to note that this process is iterative.

Next we went over the various participants that need to be involved: the data modeler, business analyst, power users, BI application developer, data steward, source experts, and the ETL team.

During this process, we’ll also start to conduct our first rounds of data profiling. I find this part of the process to be most crucial because it is at this level where you can start to determine the feasibility of implementing a particular business process.

Types of Fact Tables

After a brief case study, Margy took a little more time to go over the different types of Fact table grains. These include the transaction, periodic snapshot, and accumulating snapshot. For more information on the different types, please read my post “Fact Tables“.

Some additional notes on Accumulating Snapshots: They generally represent some sequential, short lived, and predictable process that needs to be expressed by various milestones. These milestones are most often dates. These fact tables often span multiple business processes as well. For example, an order is placed, it is released, shipped, and delivered. These are separate milestones that have some serious significance to the business for analysis.

Lastly…

We finished the day discussing mature BI projects and how to handle existing warehouses that might not be in great shape. Margy went over several important common disorders (ranging from lack of business sponsor to infrastructure issues), some symptoms, and finally she proposed a treatment plan. Every slide in this section was packed with good, solid advice. When I can get my notes together on this, I plan to write about some of my own experiences with these disorders.

Tomorrow Warren will take over. I trust he’ll be as good as Margy!