<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: ETL Subsystem 1: Data Profiling</title>
	<atom:link href="http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/</link>
	<description>Supporting decisions through sound data management</description>
	<lastBuildDate>Tue, 23 Aug 2011 03:10:04 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: ETL Subsystem 23: Backup System &#171; Tod means Fox</title>
		<link>http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/comment-page-1/#comment-13331</link>
		<dc:creator>ETL Subsystem 23: Backup System &#171; Tod means Fox</dc:creator>
		<pubDate>Thu, 21 May 2009 18:38:35 +0000</pubDate>
		<guid isPermaLink="false">http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/#comment-13331</guid>
		<description>[...] just spent the better part of the year designing a perfect ETL system - from data profiling to data propagation. You&#8217;re in production, and all jobs are running on schedule in what seems [...]</description>
		<content:encoded><![CDATA[<p>[...] just spent the better part of the year designing a perfect ETL system &#8211; from data profiling to data propagation. You&#8217;re in production, and all jobs are running on schedule in what seems [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tod McKenna</title>
		<link>http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/comment-page-1/#comment-12034</link>
		<dc:creator>Tod McKenna</dc:creator>
		<pubDate>Sun, 19 Apr 2009 10:32:25 +0000</pubDate>
		<guid isPermaLink="false">http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/#comment-12034</guid>
		<description>Hi Andrea,

Sorry about the confusion. I&#039;ll try to make a better distinction between data profiling and data quality. 

Think of data profiling as a precursor to everything else; an introduction. Before you design for or use any data in any of the disparate systems in your enterprise, you need to understand it completely. Data profiling is that process. You will build a ton of useful metadata that you can rely on in later phases of ETL and DW/BI development. Data profiling, consequently, is also a precursor to data quality management. 

If you think of data profiling as part of data quality, then you may be inclined to skip a thorough data profile at the start of a new project. The reasoning might be: &quot;I&#039;ll learn the source system while developing the ETL and mappings, and I&#039;ll dig even deeper once the business analysts tell me what to look for regarding potential quality issues. I&#039;ll profile as I go.&quot;

In my experience, this is a death knell. Profile first and thoroughly. When you move into data quality-related activities (which include the Extract and Transform functions of ETL, as well as complete data quality and monitoring systems you might develop as part of your DW/BI layer), you will have a complete catalog of metadata compiled during your initial profiling to turn to. 

Once I&#039;ve been introduced to the data, I am free to test the quality of the relationship. During data quality screenings, I may conduct additional profile-like activities and compile more statistics. I wouldn&#039;t call this a new &quot;data profile&quot;, but rather, simply a part of my data quality research. 

Another difference is specificity. Data profiles are broad and should touch upon each and every element you intend to work with. Data quality management tends to be more specific, as you build rules and logic to question the integrity of particular attributes and relationships in the data, oftentimes crossing multiple business processes and systems. 

I&#039;d be interested to hear your additional thoughts on this!

-Tod</description>
		<content:encoded><![CDATA[<p>Hi Andrea,</p>
<p>Sorry about the confusion. I&#8217;ll try to make a better distinction between data profiling and data quality. </p>
<p>Think of data profiling as a precursor to everything else; an introduction. Before you design for or use any data in any of the disparate systems in your enterprise, you need to understand it completely. Data profiling is that process. You will build a ton of useful metadata that you can rely on in later phases of ETL and DW/BI development. Data profiling, consequently, is also a precursor to data quality management. </p>
<p>If you think of data profiling as part of data quality, then you may be inclined to skip a thorough data profile at the start of a new project. The reasoning might be: &#8220;I&#8217;ll learn the source system while developing the ETL and mappings, and I&#8217;ll dig even deeper once the business analysts tell me what to look for regarding potential quality issues. I&#8217;ll profile as I go.&#8221;</p>
<p>In my experience, this is a death knell. Profile first and thoroughly. When you move into data quality-related activities (which include the Extract and Transform functions of ETL, as well as complete data quality and monitoring systems you might develop as part of your DW/BI layer), you will have a complete catalog of metadata compiled during your initial profiling to turn to. </p>
<p>Once I&#8217;ve been introduced to the data, I am free to test the quality of the relationship. During data quality screenings, I may conduct additional profile-like activities and compile more statistics. I wouldn&#8217;t call this a new &#8220;data profile&#8221;, but rather, simply a part of my data quality research. </p>
<p>Another difference is specificity. Data profiles are broad and should touch upon each and every element you intend to work with. Data quality management tends to be more specific, as you build rules and logic to question the integrity of particular attributes and relationships in the data, oftentimes crossing multiple business processes and systems. </p>
<p>I&#8217;d be interested to hear your additional thoughts on this!</p>
<p>-Tod</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: DavidL</title>
		<link>http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/comment-page-1/#comment-11867</link>
		<dc:creator>DavidL</dc:creator>
		<pubDate>Wed, 15 Apr 2009 12:15:15 +0000</pubDate>
		<guid isPermaLink="false">http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/#comment-11867</guid>
		<description>Hi Andrea,

When Tod says data profiling is separate from data quality auditing, I think that is the case. 

Data profiling is the process of examining the data available in existing data sources (e.g. databases, files...) and collecting statistics about it. Data quality entails more than helping companies get correct data into their information systems; it also means getting rid of bad, corrupted, or duplicate data.

Of course they are related, but they are not complementary IMHO.

Hope this helped!</description>
		<content:encoded><![CDATA[<p>Hi Andrea,</p>
<p>When Tod says data profiling is separate from data quality auditing, I think that is the case. </p>
<p>Data profiling is the process of examining the data available in existing data sources (e.g. databases, files&#8230;) and collecting statistics about it. Data quality entails more than helping companies get correct data into their information systems; it also means getting rid of bad, corrupted, or duplicate data.</p>
<p>Of course they are related, but they are not complementary IMHO.</p>
<p>Hope this helped!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: andrea</title>
		<link>http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/comment-page-1/#comment-11103</link>
		<dc:creator>andrea</dc:creator>
		<pubDate>Wed, 25 Mar 2009 12:00:39 +0000</pubDate>
		<guid isPermaLink="false">http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/#comment-11103</guid>
		<description>Hi Tod,

Thanks very much.
Your answer is very helpful.
One thing confuses me, though: you said that data profiling is separate from data quality auditing/management.
Do you agree that data profiling is the first, and an important, phase of enterprise data improvement?
What do you think about the relationship between data profiling and data quality?
How do you define data quality management? In my opinion, data quality management of a DW/BI project includes data profiling, correcting, standardizing, and monitoring.</description>
		<content:encoded><![CDATA[<p>Hi Tod,</p>
<p>Thanks very much.<br />
Your answer is very helpful.<br />
One thing confuses me, though: you said that data profiling is separate from data quality auditing/management.<br />
Do you agree that data profiling is the first, and an important, phase of enterprise data improvement?<br />
What do you think about the relationship between data profiling and data quality?<br />
How do you define data quality management? In my opinion, data quality management of a DW/BI project includes data profiling, correcting, standardizing, and monitoring.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tod McKenna</title>
		<link>http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/comment-page-1/#comment-11079</link>
		<dc:creator>Tod McKenna</dc:creator>
		<pubDate>Tue, 24 Mar 2009 12:52:59 +0000</pubDate>
		<guid isPermaLink="false">http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/#comment-11079</guid>
		<description>Hi Andrea,

Data profiling is separate from data quality auditing/management. Profiling is a design and development tool (and process!) that will allow you to examine the content, context, and quality of data coming into your data warehouse. The results of a good data profile will tell you how to build your staging area, your dimensional models, and your data quality procedures.

Currently, I store all data profiling results in relational tables as metadata. You can read more here: http://blog.todmeansfox.com/2007/07/03/data-profiling/ and here: http://blog.todmeansfox.com/2007/07/10/getting-started-with-data-profiling/. 

Regarding your business rules question: Data profiling is an exploratory process helping you to explain the data you are receiving. In my experience, there is no room or reason to include business logic in a data profile. You would, however, build these things into a data quality system -- preferably one that runs in your staging area or one that executes on data as it is loaded into your data warehouse.

Hope this helps!</description>
		<content:encoded><![CDATA[<p>Hi Andrea,</p>
<p>Data profiling is separate from data quality auditing/management. Profiling is a design and development tool (and process!) that will allow you to examine the content, context, and quality of data coming into your data warehouse. The results of a good data profile will tell you how to build your staging area, your dimensional models, and your data quality procedures.</p>
<p>Currently, I store all data profiling results in relational tables as metadata. You can read more here: <a href="http://blog.todmeansfox.com/2007/07/03/data-profiling/" rel="nofollow">http://blog.todmeansfox.com/2007/07/03/data-profiling/</a> and here: <a href="http://blog.todmeansfox.com/2007/07/10/getting-started-with-data-profiling/" rel="nofollow">http://blog.todmeansfox.com/2007/07/10/getting-started-with-data-profiling/</a>. </p>
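<p>A minimal sketch of what storing profile results as relational metadata could look like, assuming SQLite and Python. The <code>column_profile</code> table, its columns, the <code>vehicle</code> sample table, and the helper names are hypothetical and are not taken from the linked posts:</p>

```python
import sqlite3

# Hypothetical metadata table; the column names are illustrative assumptions.
PROFILE_DDL = """
CREATE TABLE IF NOT EXISTS column_profile (
    table_name     TEXT NOT NULL,
    column_name    TEXT NOT NULL,
    row_count      INTEGER,
    null_count     INTEGER,
    distinct_count INTEGER,
    min_value      TEXT,
    max_value      TEXT,
    profiled_at    TEXT DEFAULT CURRENT_TIMESTAMP
)
"""


def quote_ident(name):
    """Quote an SQL identifier, since identifiers cannot be bound as parameters."""
    return '"' + name.replace('"', '""') + '"'


def profile_column(conn, table, column):
    """Return basic statistics for one column: counts, NULLs, distinct values, min, max."""
    tbl, col = quote_ident(table), quote_ident(column)
    total, non_null, distinct, min_v, max_v = conn.execute(
        f"SELECT COUNT(*), COUNT({col}), COUNT(DISTINCT {col}), "
        f"MIN({col}), MAX({col}) FROM {tbl}"
    ).fetchone()
    return {
        "row_count": total,
        "null_count": total - non_null,
        "distinct_count": distinct,
        "min_value": None if min_v is None else str(min_v),
        "max_value": None if max_v is None else str(max_v),
    }


def store_profile(conn, table, column, stats):
    """Persist one column's statistics as a row of metadata."""
    conn.execute(PROFILE_DDL)
    conn.execute(
        "INSERT INTO column_profile (table_name, column_name, row_count, "
        "null_count, distinct_count, min_value, max_value) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (table, column, stats["row_count"], stats["null_count"],
         stats["distinct_count"], stats["min_value"], stats["max_value"]),
    )


# Usage with made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (vhcl_type TEXT, vhcl_color TEXT)")
conn.executemany(
    "INSERT INTO vehicle VALUES (?, ?)",
    [("bus", "red"), ("bus", "blue"), ("truck", None),
     ("truck", None), ("truck", "white")],
)
for column in ("vhcl_type", "vhcl_color"):
    store_profile(conn, "vehicle", column, profile_column(conn, "vehicle", column))
```

<p>Keeping one row per table and column means later ETL and data quality work can query the profile like any other table.</p>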
<p>Regarding your business rules question: Data profiling is an exploratory process helping you to explain the data you are receiving. In my experience, there is no room or reason to include business logic in a data profile. You would, however, build these things into a data quality system &#8212; preferably one that runs in your staging area or one that executes on data as it is loaded into your data warehouse.</p>
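<p>One possible reading of that split, sketched with SQLite and Python under stated assumptions: the first function only describes the data (an exploratory statistic), while the second enforces a rule in staging. The <code>stg_vehicle</code> table, the <code>vhcl_id</code> key, and the rule "every truck must have a color" are hypothetical; only <code>vhcl_type</code> and <code>vhcl_color</code> come from the question:</p>

```python
import sqlite3


def null_rate_by_type(conn):
    """Exploratory statistic: share of NULL vhcl_color values per vhcl_type."""
    rows = conn.execute(
        "SELECT vhcl_type, COUNT(*), "
        "SUM(CASE WHEN vhcl_color IS NULL THEN 1 ELSE 0 END) "
        "FROM stg_vehicle GROUP BY vhcl_type ORDER BY vhcl_type"
    ).fetchall()
    return {vhcl_type: nulls / total for vhcl_type, total, nulls in rows}


def screen_truck_color(conn):
    """Data quality screen (assumed rule): list ids of trucks with no color."""
    rows = conn.execute(
        "SELECT vhcl_id FROM stg_vehicle "
        "WHERE vhcl_type = 'truck' AND vhcl_color IS NULL "
        "ORDER BY vhcl_id"
    ).fetchall()
    return [vhcl_id for (vhcl_id,) in rows]


# Usage with made-up staging data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stg_vehicle (vhcl_id INTEGER PRIMARY KEY, "
    "vhcl_type TEXT, vhcl_color TEXT)"
)
conn.executemany(
    "INSERT INTO stg_vehicle VALUES (?, ?, ?)",
    [(1, "bus", "red"), (2, "bus", "blue"), (3, "truck", None),
     (4, "truck", None), (5, "truck", "white")],
)
rates = null_rate_by_type(conn)        # what the data looks like
failures = screen_truck_color(conn)    # which rows break the assumed rule
```

<p>The statistic belongs in the profile because it makes no judgment; the screen belongs in the data quality system because it encodes what the business considers acceptable.</p>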
<p>Hope this helps!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: andrea</title>
		<link>http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/comment-page-1/#comment-10672</link>
		<dc:creator>andrea</dc:creator>
		<pubDate>Sat, 07 Mar 2009 16:50:36 +0000</pubDate>
		<guid isPermaLink="false">http://blog.todmeansfox.com/2007/12/07/etl-subsystem-1-data-profiling/#comment-10672</guid>
		<description>Hi Tod,
When you finish a data profiling effort, how do you store and present the results: Excel, Word, relational tables? 
If they are in relational tables, do you report them to users through an interface? 

I have tried to document my results in MS Word, but I found that too many items need to be recorded, including explanations of the bad data given by the BA.

Do you agree that data profiling sometimes refers to business rules, or should it be a purely technical process?
For example, take two fields named vhcl_type and vhcl_color.
The BA told me that:
when vhcl_type=&#039;bus&#039;, the vhcl_color data quality is quite good, and 
when vhcl_type=&#039;truck&#039;, there are too many &#039;NULL&#039; values in vhcl_color.
How would you handle this?

Thanks for your advice.</description>
		<content:encoded><![CDATA[<p>Hi Tod,<br />
When you finish a data profiling effort, how do you store and present the results: Excel, Word, relational tables?<br />
If they are in relational tables, do you report them to users through an interface? </p>
<p>I have tried to document my results in MS Word, but I found that too many items need to be recorded, including explanations of the bad data given by the BA.</p>
<p>Do you agree that data profiling sometimes refers to business rules, or should it be a purely technical process?<br />
For example, take two fields named vhcl_type and vhcl_color.<br />
The BA told me that:<br />
when vhcl_type='bus', the vhcl_color data quality is quite good, and<br />
when vhcl_type='truck', there are too many 'NULL' values in vhcl_color.<br />
How would you handle this?</p>
<p>Thanks for your advice.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

