Live from Kimball University: Day 1 (planning, requirements, dimensional models)
As you may know, I am currently in Amsterdam attending the Kimball University (organized by Quest for Knowledge / Q4K) training on the Data Warehouse Lifecycle. Margy Ross and Warren Thornthwaite are here and so far so good. Margy is teaching for the first two days, while Warren will take the final two days. I am not disappointed thus far.
Admittedly, much of the material in this course is not new to me. I’ve been exposed to dimensional modeling for some time now, and with a few projects under my belt, I’m starting to feel like I know what I’m talking about (which could be dangerous, I know). I’ve read the books, the articles, the tips, and have even written my own articles and blogged on many subjects. Attending these four day sessions will be primarily to validate and shore-up what I already know and hopefully correct some of my misconceptions and mistakes. With that said, I am getting a lot out of each session so far.
Project and Program Planning
Margy started out with a brief introduction, detailing the roots of the Kimball Group. Its pretty interesting to hear how they started out as 3 2-person companies and essentially merged into one: The Kimball Group. These six people have fundamentally changed the way data modelers and architects think about and approach data warehousing. What sets them apart, I believe, is that their ideas and theories stem from needs of the user.
Tremendous emphasis is placed on getting the requirements right. Margy pointed out that involving the users early and often is a key to success. Oftentimes, she said, users are involved early on and then forgotten once development begins. Three or Four months later, what is delivered might not be exactly what the end user had in mind. As the development team works, they uncover more and more hidden issues that must be resolved. The best person to help resolve many of these issues is the business user — not the data modeler or ETL architect.
In short, planning for a data warehouse must involve the business community. You need a strong and charismatic sponsor who has vision and influence, a data steward who cares about convention and quality, and a business analyst who can straddle the fence between the green grassy fields of IT and the barren wasteland of the business (kidding…).
Margy also developed a list of warning signs that ranged from failing to fill the various shoes I mentioned in the previous paragraph to acknowledging “BI/DW success is tied to user acceptance”. One of her points, not to underestimate the data cleansing workload, is finally starting to become less common. But it still remains one of the main reasons why data warehousing projects fail.
The Bus Architecture
Here’s something you don’t get by reading a book: Margy and the group didn’t quite like Ralph’s name for what we know as the Bus Architecture. I don’t have an electrical engineering background, but Ralph does.
I suppose he felt that the way business processes and conformed dimensions related together reminded him of how a computer’s bus architecture worked. Essentially, being able to swap in and out components (If you have an electrical engineering background, or are just plain smarter than me on this, then please correct me if I’m off-base!).
I asked Margy what she would have called it, and she didn’t really know. I guess that’s why Ralph won-out. He was the only one with an idea!
For our purposes, the Bus Architecture “provides a standardized master set of conformed dimensions and conformed facts used throughout the data warehouse.” Click the image for a larger picture. This image comes from a SQL Server Mag article “Dimensional Modeling Basics” by Joy Mundy and Warren Thornthwaite.
Dimensional Modeling and Conformed Dimensions
I’ve talked at length in my articles and posts about dimensional modeling. Margy went over the basics today, starting out with a discussion on the differences between the Kimball approach and the Corporate Information Factory (CIF) approach pushed by Inmon. Although the dust between the two camps has settled in recent years, Margy was quite adamant about the benefits of dimensional modeling over the enterprise data warehouse concepts. Margy detailed these, but until I get my notes together, I’ll leave it at that.
She also stressed the reliance on the star schema, and warned of snowflaking designs. Snowflaking occurs mainly when you attempt to normalize a dimension, usually by breaking the dimension up by its hierarchal data. For example, you might be tempted to break the “Store” dimension, which contains city, district, region, and state into separate City, District, Region, and State dimensions. Of course, doing so creates a snowflake and makes the model much more complex than it needs to be.
Conformed dimensions and facts are really the key to a successful dimensional modeling — and by extension data warehousing — project or program. They allow you to conduct drill-across queries, where you query data across business processes, and ultimately conformed dimensions and facts give your end users a consistent version of the data no matter what process they are looking at. Using the Bus Architecture, dimensions that are used across business processes are easy to spot.
Comments and Questions
I am also paying close attention to many of the questions and comments being made during the Q&A moments. The room is roughly composed of about 75% of people who are somewhat new to data warehousing and business intelligence. So it is a great opportunity to hear some of the questions the group has had for Margy.
Some questions: a.) How do you link a promotion to a product to show all the products that did not sell during a promotion? b.) Can you add the time to the date dimension? c.) Why should I conduct interviews during requirement gathering when I know the business well enough myself? d.) When determining the feasibility of implementing a business process, are there other important factors outside of data availability to consider?
I’ll elaborate more on all of these answers (and more) in future postings. In short, the answer key is simply: a.) Use a factless fact between product and promotion; b.) No, use either a Time dimension or store a timestamp in the Fact table; c.) You don’t know as much as you think you know and interviews will form the basis for future documentation; d.) available resources and technology issues can be overcome, perhaps the next biggest factor is dependencies between business processes.
And Lastly…
Almost everyone in attendance is a consultant. I am still getting used to this. I’ve seen and met more consultants in my short time here in Europe (about 5 months so far) than I have my entire life. Margy was also surprised. She did make a very interesting observation though. Consultants, who usually pay their own way through courses and seminars, typically stay away when times are tough. When the economy is good and/or they have plenty of work, consultants attend more conferences. Interesting. I suppose this could mean that the consultant business is doing quite well in this part of Europe.
…
I couldn’t get everything into this single post. I’ll be sure to post more notes as time goes on. For now, it’s on to tomorrow!
I'm a Quant Technical Specialist (Data Warehousing and Business Intelligence), with expertise in business analysis, data modeling, and data integration. I have extensive experience developing vertical and integrated desktop, Internet, and BI applications spanning municipal, clinical, and financial industries.

May 6th, 2008 at 6:39 pm
Muy agradecido de tus post desde el curso,
mucha suerte,
Saludos,
Diego