<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: ETL Subsystem 10: Surrogate Key Generator</title>
	<atom:link href="http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/</link>
	<description>Supporting decisions through sound data management</description>
	<lastBuildDate>Tue, 23 Aug 2011 03:10:04 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Tod McKenna</title>
		<link>http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/comment-page-1/#comment-15478</link>
		<dc:creator>Tod McKenna</dc:creator>
		<pubDate>Thu, 15 Oct 2009 05:01:20 +0000</pubDate>
		<guid isPermaLink="false">http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/#comment-15478</guid>
		<description>Hi Nick, thanks for the great comment. Partitioning on the date key is quite natural, and made easy by using the ISO format. It is one of the best arguments for using the date as the key. 

The only thing I would add is that once you use the ISO format, you no longer have a surrogate key; you have an ISO date that can be used as you would use a degenerate dimension. That&#039;s what makes the date dimension exceptional (for the reasons I listed). Otherwise, you can use a sequential number (a real surrogate) and force your DBA to do some math and date mapping in his/her partition scheme (keys 1-31 = partition 1, keys 32-59 = partition 2, etc.). 

When building ROLAP queries, you can gain some nominal performance benefits and simplify some queries by not joining on the date dimension when unnecessary. The ETL developer, the DBA, and the analyst are all happy for various reasons. There is always a trade-off between convention (fact/dimension surrogate key system) and usability/performance. The date dimension / foreign key is one of those cases where the normal rules can be bent (IMHO).</description>
		<content:encoded><![CDATA[<p>Hi Nick, thanks for the great comment. Partitioning on the date key is quite natural, and made easy by using the ISO format. It is one of the best arguments for using the date as the key. </p>
<p>The only thing I would add is that once you use the ISO format, you no longer have a surrogate key; you have an ISO date that can be used as you would use a degenerate dimension. That&#8217;s what makes the date dimension exceptional (for the reasons I listed). Otherwise, you can use a sequential number (a real surrogate) and force your DBA to do some math and date mapping in his/her partition scheme (keys 1-31 = partition 1, keys 32-59 = partition 2, etc.). </p>
<p>When building ROLAP queries, you can gain some nominal performance benefits and simplify some queries by not joining on the date dimension when unnecessary. The ETL developer, the DBA, and the analyst are all happy for various reasons. There is always a trade-off between convention (fact/dimension surrogate key system) and usability/performance. The date dimension / foreign key is one of those cases where the normal rules can be bent (IMHO).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick Galemmo</title>
		<link>http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/comment-page-1/#comment-15461</link>
		<dc:creator>Nick Galemmo</dc:creator>
		<pubDate>Tue, 13 Oct 2009 21:45:12 +0000</pubDate>
		<guid isPermaLink="false">http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/#comment-15461</guid>
		<description>In ETL Subsystem #10 you gave the following reasons for using YYYYMMDD as the primary key for a Date Dimension table:

&quot;You can treat Date dimensions a little differently. Instead of making the first row in your Date Dimension start at 1, use the ISO 8601 date standard YYYYMMDD format. This makes sense for a few reasons. First, it is easier to look at a row in a Fact table and have the date available without a join to the Date dimension or one of its role players. Second, a Date Dimension is treated as a Type 1 SCD, meaning, you would never have the same date repeated with one row being active and the other inactive as in TYPE 2 SCDs (hence throwing off the ISO date format). Third, because Fact tables generally contain several dates, using the integer date key (which is an int in the ISO date format) instead of fetching the formatted date from the Date Dimension may have some performance benefits because you can easily format the date later on in the presentation layer. Lastly, you can usually predict with certainty the total number of rows that the Date Dimension will have, and then pre-populate all the dates before you even do your first integration. This removes the Date Dimension from the surrogate key processing all together.&quot;

I agree that YYYYMMDD should be used as the primary key; however, it is NOT for any of the reasons you listed. In fact, ANY end-user interpretation of a foreign key in a fact table violates the whole idea of surrogate keys.

The reason date dimensions are an exception to the rule is to make life easy for DBAs when they need to define partitions for fact tables. Date is, by far, the most common partitioning criterion used in physical database design. If the value of the date key column is known in advance, the DBA can do his thing with the DDL to lay out partitions to support data distribution, archiving, and a host of other functions. If the date dimension uses a true surrogate key, it would require the dimension to be built in advance and remain unchanged before the rest of the physical database is constructed. Doing it that way creates a lot of headaches.

Even though the key is in a known format, you should never shortcut the dimension process and try to directly interpret the foreign key value in a data warehouse query.</description>
		<content:encoded><![CDATA[<p>In ETL Subsystem #10 you gave the following reasons for using YYYYMMDD as the primary key for a Date Dimension table:</p>
<p>&#8220;You can treat Date dimensions a little differently. Instead of making the first row in your Date Dimension start at 1, use the ISO 8601 date standard YYYYMMDD format. This makes sense for a few reasons. First, it is easier to look at a row in a Fact table and have the date available without a join to the Date dimension or one of its role players. Second, a Date Dimension is treated as a Type 1 SCD, meaning, you would never have the same date repeated with one row being active and the other inactive as in TYPE 2 SCDs (hence throwing off the ISO date format). Third, because Fact tables generally contain several dates, using the integer date key (which is an int in the ISO date format) instead of fetching the formatted date from the Date Dimension may have some performance benefits because you can easily format the date later on in the presentation layer. Lastly, you can usually predict with certainty the total number of rows that the Date Dimension will have, and then pre-populate all the dates before you even do your first integration. This removes the Date Dimension from the surrogate key processing all together.&#8221;</p>
<p>I agree that YYYYMMDD should be used as the primary key; however, it is NOT for any of the reasons you listed. In fact, ANY end-user interpretation of a foreign key in a fact table violates the whole idea of surrogate keys.</p>
<p>The reason date dimensions are an exception to the rule is to make life easy for DBAs when they need to define partitions for fact tables. Date is, by far, the most common partitioning criterion used in physical database design. If the value of the date key column is known in advance, the DBA can do his thing with the DDL to lay out partitions to support data distribution, archiving, and a host of other functions. If the date dimension uses a true surrogate key, it would require the dimension to be built in advance and remain unchanged before the rest of the physical database is constructed. Doing it that way creates a lot of headaches.</p>
<p>Even though the key is in a known format, you should never shortcut the dimension process and try to directly interpret the foreign key value in a data warehouse query.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ETL Subsystem 14: Surrogate Key Manager &#171; Tod means Fox</title>
		<link>http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/comment-page-1/#comment-13337</link>
		<dc:creator>ETL Subsystem 14: Surrogate Key Manager &#171; Tod means Fox</dc:creator>
		<pubDate>Thu, 21 May 2009 19:42:43 +0000</pubDate>
		<guid isPermaLink="false">http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/#comment-13337</guid>
		<description>[...] tracked. Surrogates are also desirable because they are single-part and fast for joins. I&#8217;ve already discussed the need for surrogates in several posts. Here, I&#8217;ll talk about some techniques for replacing [...]</description>
		<content:encoded><![CDATA[<p>[...] tracked. Surrogates are also desirable because they are single-part and fast for joins. I&#8217;ve already discussed the need for surrogates in several posts. Here, I&#8217;ll talk about some techniques for replacing [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ETL Subsystem 15: Bridge Table Builder &#171; Tod means Fox</title>
		<link>http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/comment-page-1/#comment-13336</link>
		<dc:creator>ETL Subsystem 15: Bridge Table Builder &#171; Tod means Fox</dc:creator>
		<pubDate>Thu, 21 May 2009 19:42:10 +0000</pubDate>
		<guid isPermaLink="false">http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/#comment-13336</guid>
		<description>[...] weighting factors. For the keys, you can use the same techniques described in my posts about Subsystem 10 and Subsystem 14. For weightings, I&#8217;ve implemented weighting algorithms using both stored [...]</description>
		<content:encoded><![CDATA[<p>[...] weighting factors. For the keys, you can use the same techniques described in my posts about Subsystem 10 and Subsystem 14. For weightings, I&#8217;ve implemented weighting algorithms using both stored [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tod means Fox &#124; 34 Subsystems of ETL Data Integration</title>
		<link>http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/comment-page-1/#comment-1844</link>
		<dc:creator>Tod means Fox &#124; 34 Subsystems of ETL Data Integration</dc:creator>
		<pubDate>Mon, 16 Jun 2008 09:49:25 +0000</pubDate>
		<guid isPermaLink="false">http://blog.todmeansfox.com/2008/04/15/etl-subsystem-10-surrogate-key-generator/#comment-1844</guid>
		<description>[...] Surrogate Key Generator [...]</description>
		<content:encoded><![CDATA[<p>[...] Surrogate Key Generator [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

