In my post, Fact Tables, I discussed three types of fact tables as defined by their grain: transactional, accumulating, and periodic. In this post, I’ll go over the basic types of dimension tables typically found in a dimensional model.
What are Dimensions?
A Dimension table is a denormalized entity in a dimensional model that contains detailed information about a fact stored in a fact table. Dimensions are independent of each other, joined through the fact table or in the case of an outrigger, to other specialized dimensions that are not connected to facts. This type of design facilitates simple many-to-many analysis where an operator can pick attributes from a broad set of dimensions and “slice-and-dice” through them via the fact table.
Every dimension contains a single-part surrogate primary key field. This primary key is used to relate to one or more fact tables in the schema. Natural keys (like an employee id) are kept in the dimension, but are not used in joins. This technique ensures a unique value for each record, simplifies joins, and keeps indexes small.
A zero-key row represents the “null” or “unknown” condition in a dimension. The fact table, if the data dimension is unknown, will use a foreign key of zero, and still link to the dimension. This facilitates “did not happen” or “does not exist” queries and related analysis on missing or incomplete data.
Dimensions can take on different forms in different contexts. The grain of the fact table dictates if a particular dimension can be a part of the star schema involving the fact. A Star schema represents a single business process and its facts are stored at a particular grain. Multiple Star schemas can exist in a single database, where dimensions are shared among business processes. A Product dimension might be used in both the “procurement” and “order” Stars for example — provided the facts are at compatible grain. A time dimension, which I discuss below, will most likely be shared by every Star.
Types of Dimensions
The following list is not exhaustive, but should give you an idea of the types dimensions that can be found in a typical dimensional model:
Primary Dimension: Primary dimensions represent the major elements of the business process. For example, in an order system, you will find Order, Salesperson, Product, and Shipper dimensions (among others). Primary dimensions are familiar to business users because they represent some tangible or recognizable aspect of the business (typically the “nouns”). Most ad-hoc queries and reports will use attributes from Primary Dimensions as predicates and in Select and Order By clauses. It follows that significant time will be devoted to end-user training on the structure and content of these Primary Dimensions.
Time Dimension: Date and time dimension are crucial for the data warehouse and to derive Business Intelligence. Most business processes are tracked and analyzed over time; these points in time are stored in the fact table along with the appropriate metrics. Instead of embedding a date in the fact table, dimensional modelers create a date dimension that stores information about each day of the year, for how ever many years is relevant to the business.
Time can be stored in a separate table as well, that is, if it makes business sense. Typically, though I see time stored as a degenerate dimension in the fact table as in integer representing seconds since midnight.
I’ll spend more time on time dimensions in a future post. The uses of this type of table extend beyond the Dimensional Model!
Junk, or “Mystery” Dimension: A Junk Dimension is a dimension that stores extra “junk” data about the fact. A junk dimension can take the place of degenerate dimensions, as well as store other semi-useful information about a particular fact. Transaction codes, operation stamps, yes/no flags, and free text attributes are good candidates for junk dimensions.
Causal Dimensions: These dimensions allow you to store, and therefore analyze external events which can measure causality of a business process. Causal data gives an extra dimension (no pun intended) to your existing data. These dimensions can help explain why a fact exits in the first place! For example, storing weather conditions might be beneficial to a retailer so that last month’s increase in uumbrella sales can be explained (it rained more!). Other causal dimensions would include things like rate specials, store promotions, a newspaper ad, or some major world event.
Outriggers: A type of dimension, outriggers are not directly related to a single fact but rather to a dimension. An example of an outrigger might be postal code, where postal and delivery point information is retrieved from the USPS in order to link and verify address data stored in an Employee, Customer, or perhaps Shipper dimension.
…
For more information about Dimensions, feel free to download my code examples and presentation slides from FoxForward. Also, bookmark this page as I plan to expand upon these ideas in future posts. You can also read my articles on the subject at Advisor.com.