Tod means Fox

Business Intelligence, Data Warehousing, SQL, Visual FoxPro.

Archive for ‘Data Warehousing’


Published May 30th, 2008

A Data Warehouser’s Vocabulary (Part 1)

Partly inspired by a post entitled “The most important thing I know about Analytics is that no-one agrees what it means” by James Taylor and partly inspired by the section “Slowly Changing Vocabulary” in the book “Data Warehouse Lifecycle Toolkit 2nd Edition“, I have decided to compile a glossary of terms and concepts that I feel have some relevance to the data warehousing and business intelligence world. I’ll break this list into several postings, and I reserve the right to refine, enhance, clarify, and augment a definition at any time! When finished, I’ll make them a permanent feature of TmF.

With this list, I am not attempting to resolve any debates, nor am I attempting to invalidate or discredit a definition you may be using. These are the definitions I use. Also be aware that certain terms might hold different meanings under different contexts. If I need to use one of those ambiguous terms, I try my best to put a good context around it. For example, when I refer to “Data Mart”, I specifically mean “Atomic Business Process Dimensional Model”. However, there are times when what I mean is to describe a separate (perhaps normalized) database for a specific user or department (i.e. a throw-away sandbox for the big kids).

Each of these definitions has a citation; I am using the XHTML “cite” tag with each. If you would like to see the source, view the source! Also, when I finish this list, and put these all together on a single page, I’ll be sure to include a reference link section as well.

So, without further ado, I give you the first group of many to come (A-Z):

Business Intelligence (BI)
A generic term to describe leveraging the organizations’ internal and external information assets for making better business decisions.
Business Process
The complete response that a business makes to an event. A business process entails the execution of a sequence of one or more process steps. It has a clearly defined deliverable or outcome. A Business Process is defined by the business event that triggers the process, the inputs and outputs, all the operational steps required to produce the output, the sequential relationship between the process steps, the business decisions that are part of the event response, and the flow of material and/or information between process steps.
Changed Data Capture (CDC)
Changed Data Capture (CDC) is a method of identifying changes made to a source database or file for the purposes of integrating the data into the data warehousing pipeline. CDC reduces data volume and processing needed for the data warehouse.
Data Mart
A business process dimensional model.
Data Profiling
Data profiling is a method of assessing source data in a systematic and analytical way. The goal of data profiling is to build an exhaustive inventory detailing the content, context, and quality of source data. It entails much more than reviewing a diagram or running a few SQL statements. Data profiling leads to better data integration, which leads to better data quality.
Data Quality
Assurances that the integrated data is consistent, complete, and fit to publish to the business community.
Data Warehouse Database
The largest possible union of queryable presentation data in a DW/BI System.
ETL
A set of processes that prepare source data for a Data Warehouse, adding value and confidence along the way. These processes include extraction, transformations (cleans & conform), and load operations. Note that the order in which ETL processes occur can be varied based on the situation. Some sources refer to the ET or just the E broadly as “Data Acquisition”.
Master Data Management (MDM)
Centralized facilities designed to hold master copies of shared entities, such as Customer and Product.
Metadata
All the information that defines and describes the structures, operations, and contents of a BI/DW system.
Operational Data Store (ODS)
A physical set of tables sitting between the operational systems and the data warehouse, or a specially administered hot partition of the data warehouse itself. The main purpose of an ODS is to provide immediate reporting of operational results if neither the operational system or the data warehouse can provide satisfactory access.
Staging
Physical workspace for data during the ETL process. Some data is temporarily staged, while other data may persist.

Published May 26th, 2008

ETL Subsystem 12: Special Dimensions Manager

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

Each dimensional model that you create will have special design characteristics and requirements. This typically means that you may need to rely on specialized dimension tables. Not all dimensions will be “Product” or “Customer”. In fact, you will likely have more specialized dimensions in your models than actual normal, business dimensions.

There are several categories of special dimensions. They are:

  • Date and Time Dimensions
  • Bridges to support many-to-many relationships
  • Special Indicators and Flags Dimensions
  • Study or Research Group Dimensions
  • Mini-Dimensions
  • “Current” Dimensions
  • Lookup and other Static Dimensions
  • Administration Dimensions (such as special logs, monitors, and audits)

Much of the above has already been discussed in my posts to date (date dimensions, bridges, and lookup dimensions for example). So not to beat a dead horse, I’ll spend the next few paragraphs explaining those I haven’t really talked much about yet.

Special Indicators and Flags Dimensions
These dimensions, also referred to under some circumstances as “junk” dimensions, exist to store extra information about the fact when it occurred. I typically use these special dimensions to hold flags (for example, if a tax payer has a history of delinquency, if a certain diagnosis and age creates greater risk for death, etc..). I usually store these flags as a tiny int with three settings: -1 null, 0 off, and 1 on. For researchers, this table acts as a way to screen large amounts of data and focus efforts on a subset of events with certain characteristics.
Study or Research Group Dimensions
Whether I’m working with Qualitative Analysts or Quantitative Analysts, they always seem to need specialized research databases (and they all seem to like using SAS too!). In the past, I got in the habit of creating copies of the required business processes dimensional models for their use, filtered and sampled appropriately. These databases would be installed in an isolated environment. This isolation allowed them to develop models, conduct what-if analysis, and to manipulate the data to test theories. This gave the researches their own data to play with and everyone was happy. Another solution — a better solution — which I have only recently begun to implement with more regularity is creating a special study group and/or research dimension which defines the research being conducted (”Infection Research”, “buying patterns of customers in Alaska in January”, etc..). A bridge table between the subject (Customer? Product? Stock? Web Page?) links the study with the sample used. All queries for that study simply use the new special dimension as an additional filter. And everyone’s happy!
Mini-Dimensions
Mini-dimensions are necessary when a subset of attributes (typically related) need Type 2 change behavior tracking, the changes happen frequently, and the dimension is rather large (such as customer, webpage hits, or stock index characteristics). A specialized mini dimension is created with the required attributes, stored at a grain that represents a combination of attributes. This mini dimension does not contain a link back to the parent dimension, nor does it contain the natural or primary key of the dimension. Each row is instead given a unique key that is referenced alongside the parent dimension in the fact table. For a large Customer dimension, you may create a mini-dimension with income bands, year-to-date behaviors, job status, and other related financial and purchase information. This “profile” might be shared by hundreds or thousands of customers and change frequently. Therefore, the data you store would not be identifiable on its own to any single customer, but to all customers who share the characteristics.
“Current” Dimensions
There are times when large Type 2 SCD dimensions become too unwieldy for querying and other analysis; or, what you really want is a Type 1 dimension to do most of your work, but need to maintain Type 2 changes over time to accommodate some special need (like compliance). The solution is to copy out the “current” rows of a dimension (using a current flag or date range) into a second, special Type 1 dimension. I’ve had success using this technique using three methods: (a) create a role-playing view on the dimension using only the current rows (one challenge with this method is that you need to store a special key in the parent dimension so that you can use the view to relate to a fact table row); (b) by creating a new Type 1 dimension through ETL and placing a new foreign key to my table in the fact; and (c) by using the Type 2 dimension as a bridge table, linking to the special dimension through its natural key (which can be a view or a physical table created through ETL).

SQL Server 2005 Integration Services (SSIS)

Other than building the dimensions, there is nothing special from an SSIS point of view. You will generally handle the creation of the special dimensions in separate packages (dates and lookups for example) or as part of normal dimensional processing (which would be the case for mini-dimensions and current dimensions for example).

Hand Coding with Visual FoxPro (VFP9)

Same as with SSIS. Nothing special. If you can process a dimension, you can process a special dimension. Most of the onus falls on business requirements, data modeling, and overall data warehouse design. Non of which are inhibited by Visual FoxPro.

From Here

This subsystem is deigned to be a place-holder for handling special dimensions. I have found that a great deal of value found in a DW/BI solution actually stem from these. All those special flags, scoring results, quality assessments, and additional features that we’ve painstakingly added through ETL are now available.

In my next post in this series, I’ll discuss lucky subsystem 13: Fact Table Builders.

Published May 20th, 2008

ETL Subsystem 11: Hierarchy Manager

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

This subsystem exists to ensure that hierarchies are appropriately translated and represented in the dimensional model. There are two types of hierarchies that you’ll need to contend with: fixed and ragged.

Managing hierarchies can be a complex process especially when you have hierarchies that are extremely ragged (for example, manufacturers’ parts or an organizational chart). You’ll also run into complexities when a dimension entity (like a single customer or a single product) exists in simultaneous hierarchies. I’ll talk more about these later in this post. In contrast, the fixed variety is easier to work with.

Hierarchies as Attributes of a Dimension

Generally speaking, you can think of hierarchies as many-to-one relationships. In the dimensional modeling world, these relationships are represented in a single table. This would include stores to regions to state, children to parents to grandparents, and greens to vegetables to produce. ETL Subsystem 11 seeks to maintain the integrity of these relationships.

To simplify even further, instead of thinking in “tree” structures, think in “lines” from the child up the hierarchy to the parent. This will help you build the dimensions to accommodate the relationships. In the simplest form: A single store is in a single region in a single state. This is actually a very interesting topic from a modeling perspective — one in which I’ll get to in more detail in a future posting.

state_region_storeDimensions are denormalized structures, which means that you will have many repeated elements. This is normal and desired. For example, a store region will be repeated for each store and state will be repeated for each region all in the same dimension (look left). This is normal and desirable in a dimension. The trick to getting this to work correctly is that the hierarchy must be represented as a single value with the dimension row’s primary (surrogate) key.

Data modelers who are used to 3rd Normal Form might look at the above and cringe. But remember: normalized models are for preventing data anomalies in a transaction environment. In a data warehousing environment, the rules are different. First, there are no opportunities for data anomalies due to data integration controls. Secondly, normalizing the data warehouse makes absolutely no sense from a usability perspective: it only complicates and slows down reports and queries.

The Ragged Kind

So fixed hierarchies are easiest to work with: got it. It is not so easy to work with variable and ragged hierarchies because of their variable depth. The classic example is an organization chart, where any employee can be at the top or at the bottom of the hierarchy. Knowing how deep the organization runs from any point is a challenge that usually requires self joins and bridge tables to represent the relationships.

I have always solved these types of hierarchy “problems” using “helper tables” in a dimensional model. Ralph Kimball wrote a great article a decade ago on this subject. Check it out for more details.

Helper tables look like bridge tables that sit between the fact and the dimension. They facilitate the representation complex hierarchal information. This design complicates user queries though, so be sure that helper tables are absolutely needed. It might be, for example, you only really need the manager’s name and not the entire chain of command with each employee. You don’t need a helper table for that (see the following code)!

SELECT
   c.FirstName AS empFirstName,
   c.LastName AS empLastName,
   e.title AS empTitle,
   COALESCE(m.mgrFirstName,'N/A'),
   COALESCE(m.mgrLastName,'N/A'),
   COALESCE(m.mgrTitle,'I am the boss')
FROM
    HumanResources.Employee e 
JOIN Person.contact c ON c.ContactId = e.EmployeeID 
LEFT JOIN (
   SELECT
      e2.EmployeeID,
      c2.FirstName AS mgrFirstName,
      c2.LastName AS mgrLastName,
      e2.title AS mgrTitle 
   FROM
      HumanResources.Employee e2 
   JOIN Person.contact c2 ON c2.ContactId = e2.EmployeeID  ) 
  AS m ON m.EmployeeID = e.ManagerID

I wrote a while ago on ragged hierarchies from a programming perspective. Take a look at that post for more details.

Date Hierarchies

For another example, let’s look at a common hierarchy we can all relate to: year, quarter, month, and week. You can see how this hierarchy is modeled by looking at any date dimension. As you’ll see, each day contains information about how it is grouped on the calendar. The information is repeated for each day until one of the groups change. When designing reports that allow your users to drill down into the data, it is a common approach to start at the highest group (sales by year) and then look a bit deeper as necessary (sales by quarter or sales by month, for example).

Also, in the date dimension, weeks don’t line up too well within a year, quarter, or month. This is a classic example of how one group can fit entirely or partly into another. You see this often when a sales region crosses multiple states, or when an employee serves multiple roles within the company. These are all situations that must be accounted for by your hierarchy manager. For the date dimension, one technique I’ve adopted is to include the week of the year number, week of the quarter, and week of the month in the date dimension to give users the ability to drill into weekly data much easier (for example, you might want to measure holiday sales in the US from the 3rd week of November to the 4th week of December).

Snowflakes and Hierarchies

SnowflakeSnowflakes are usually a sign that a hierarchy has been normalized. This is bad. Don’t fall into the trap! In order to keep the dimensional model as simple as possible, you should avoid snowflaking; however, snowflake designs are perfectly legal under certain circumstances. Carefully examine the reasons though. If you are snowflaking to accommodate a hierarchy, hold your horses. Hierarchies are a natural part of a dimension. In fact, most things in life are categorized and need to be grouped in some way. Remember that one of the primary purposes of delivering a denormalized dimension is to remove almost all complexity from the user’s perspective. This usually means hierarchies.

SQL Server 2005 Integration Services (SSIS)

Denormalizing hierarchies using SSIS is not difficult. The hardest part is usually in writing the SQL that correctly fetches the right data. In a Data Flow, you can use a series of Lookup and Merge components. In some cases, especially for the more complex ragged hierarchies, your best bet is to use SQL statement in an Execute SQL Task.

Although I won’t get into details in this post, CTEs (Common Table Expressions) are excellent for working with recursion and hierarchies.

The real joy of hierarchy management using the SQL Server Business Intelligence suite comes at the end when you want to start using your hierarchies to allow users to drill down into the data. Even if you have complex ragged hierarchies structured with bridge tables in your model, Analysis (SSAS) and Reporting (SSRS) Services can really make sense of it all. But I’m digressing; this series of posts is about ETL and not presentation tools!

Hand Coding with Visual FoxPro (VFP9)

It’s easy to write recursive functions in Visual FoxPro. Recursion is one of the “secretes” to flattening out hierarchal structures. For more details on writing recursive functions in FoxPro, check out my post “Ragged Hierarchy Alert” or even better, visit the FoxWiki page on the subject.

From Here

In the end, as I continue to compare SSIS with VFP on data integration, I find that hierarchy management is about equal between the two. FoxPro seems to perform much better but I have no benchmarking (yet) to prove it. One advantage with SSIS is being able to utilize CTEs.

This was a long one, but a heavy subject. In my next post, I’ll discuss the Special Dimensions Manager, ETL Subsystem 12. Not as heavy and hopefully not as long!

Published May 15th, 2008

Mums the word: KU Alumni Group in the works for LinkedIn

It turns out that the Kimball Group is working on a new Kimball University Alumni group for LinkedIn. This is pretty cool because as far as I know, there is no real good way to get in touch with Kimball University “graduates”. The plan is to announce the group officially at some point in the near future (I believe on the next design tip).

According to my “source” (can you tell I’m really trying to sound like a reporter with some really important scoop?), they even have a logo ready to go. As some of you can attest to, the logo can be the hardest thing to settle on!

ETL Subsystems

I’ll be getting back on track with my dissection of the 34 Subsystems of ETL (in which I give my thoughts on each subsystem through SSIS and Visual FoxPro). I took a few weeks off while in Amsterdam, but should be ready to start again. Next on the list is ETL Subsystem 11: Hierarchy Manager. I expect to have it ready by early next week.

LinkedIn

By the way, if you’re on LinkedIn, and you read this blog, feel free to invite me into your network. You likely are a FoxPro user which means I’m going to like you, or you are involved in some way with data warehousing, which means I’m going to like you! Just let me know you’re a reader!

Published May 10th, 2008

Live from Kimball University: Day 4 (ETL, Addresses, Applications, BI Toolkit?)

Ok, so I’m a day late! I had to catch a train back to Brussels, and considering I hadn’t seen my family since Monday morning, I didn’t spend last night blogging! Besides, the four days were fairly taxing. Whether you think you know everything about dimensional modeling or not, the material was fairly dense and Warren and Margy both moved fast.

If you are serious about data warehousing, and call yourself a data warehousing professional (especially if you are a “dimensional modeler”), then attending this and other sessions is a must. I learned more Warren and Margy directly than I could have from their books. And hearing about their various experiences added an extra dimension (no pun) to the education.

Now, if we can just convince them to create a certification exam

ETL System

The entire morning was focused on ETL Design and Development. The first step in designing the ETL processes is to meet a basic set of requirements. These would include having all business requirements fleshed out; a “solid” dimensional model ready to go; data quality, archiving, lineage, and management infrastructures defined; compliance, latency, and security issues addressed; and finally a good understanding of all application requirements. With these in place, it is possible to create the ETL System.

Basically, the ETL system is in charge of converting source system data to the data warehouse database. Don’t get hung up on the acronym though. Depending on the circumstances, the process may feel more like ELT, or ELTL, or E&T, more T, some L, and more T.

Warren recommends using an ETL vendor tool mainly because the tools handle technical metadata requirements and likely have a lot of other very helpful functionality built in (logging, standard tasks, connection managers, dimension management, surrogate key lookup capability, in-memory pipelining, etc..).

As many of you know, I am an advocate of hand-coding ETL, especially using Visual FoxPro. VFP makes it easy to build in all the things that other ETL tools do with little effort. In addition, VFP is data centric and object-oriented, and if the ETL processes are designed correctly can be totally metadata driven. With that said, ETL tools are getting better and better. If you have a couple of developers in house that are experienced with FoxPro or another data-centric language, then by all means go for it. If not, then using an ETL tool is likely your best bet.

Regardless if you hand-code or buy, Warren defined the following steps that are needed to develop the ETL system:

  1. Create a high-level plan for the target model
  2. Install and learn the ETL tool!
  3. Determine your default strategies
  4. Drill down by target table
  5. Develop the historical load for dimensions
  6. Populate historical facts
  7. Dimension table incremental processing
  8. Fact table incremental processing
  9. Aggregate and OLAP loads
  10. ETL operation and automation

Nothing earth-shattering, but I have seen first-hand how ETL systems can stall and flutter along because one or more steps were ignored or done out of order. The Kimball Group has spent a long time together developing their best-practice and practical techniques so that we can all benefit. This list is just another example; it’s a simple pattern to follow that will likely lead to greater successes.

Name and Address Fun

As part of our ETL discussion, we spoke of data cleansing and conforming. One major challenge that data integrators have is dealing with very dirty data from multiple source systems. One good example is the challenges associated with name and address matching.

Warren showed us a series of slides with scanned address labels delivered (miraculously in some cases if you saw the slides) to his house. His name was butchered in every which way. Some had his name as Thorn. Another had him as a woman. It goes to show that data quality and name and address cleansing is very hard to do. When you consider that some companies spend millions on their mailings, this is a big deal: And they can’t even get it right!

BI Applications

After lunch, we dove right into a discussion on Business Intelligence (BI) Applications. One thing Warren said really stuck out to me, and it is something I wrote in big letters on my copy of the presentation slides: BRANDING.

BI Applications are in fact a way for us — as DW/BI professionals — to brand our services. We create our image through the BI Application layer. Despite all the background and expertise we need to do all other phases of the LifeCycle, the executives, managers, and users will see us through our applications.

I admit, I had an “ah-ha” moment when this came up. Everything from Portal design to reporting templates to our logo (yes, we need a logo too!) is equally as important as cleansing and conforming data, designing a perfect data model, and conducting hours upon ours of requirement-gathering interviews.

I only wish that we had more time to spend on this topic. An entire day would have been great.

BI Toolkit

We need a BI Toolkit. A vendor-neutral, general discussion on what it means to deliver BI Applications to our users. I talked with both Margy and Warren about this and both seemed to agree (as does their publisher). It’s been in their sights for some time now, I suppose. While the other toolkit books have been an invaluable reference for the entire community — this book could literally change the way data warehousing teams and businesses consider BI, BI Applications, and the various possibilities that BI offers.

Personally, I would like this book to take more time delving into user requirements. I don’t want it to tell me why I need one p-value over another, nor do I want it to tell me to use Business Objects because it can support feature X or Y. I need a reference that reminds me that developing a scorecard requires multiple business processes, that building a reporting template to pass around to business users can be a great tool to flesh out design decisions, and that perhaps a “BI Matrix” needs to be developed so that each BI Application can be resolved against each underlying business process dimensional model. Ideally, it would be a book that a business sponsor can read and digest, while providing the Data Warehousing team with enough technical knowledge to appropriately evaluate tools, build strategies, and formulate roll out plans that give the user the biggest and quickest return on their investments.

Lastly…

It’s a bummer that Warren had to race through the deployment and maintenance materials. We really did have a good class though, and a lot of great questions were asked. You can tell that most people in the room really “get it”, and their questions all reflect that. Of course, this led to a few diversions and a tangent or two. We paid for that in the last section. Thankfully, I have a brand new, 2cnd Edition Lifecylce Toolkit book to refer to!

This was a great experience for me, and I met a lot of great people. If you haven’t been to one of these courses, I recommend it no matter the cost.

If you have questions or comments for me, don’t be shy. I will be writing more and more about these topics in future postings, and I’ll also be picking up where I left off with my postings on the ETL Subsytems 1 through 34. So if there is something you would like me to address first, let me know!

Published May 8th, 2008

Live from Kimball University: Day 3 (Architecture, Metadata, Real Time BI)

Today’s session was packed with practical design and implementation advice and best practices for the technical architecture design, product selection, and physical design of the data warehouse. Warren has a great wit and produced a lively session out of material that most normal humans would fall asleep to. We all managed to stay awake though — even past our 5 O’Clock end time.

So far, out of the three days, I learned the most today.

Technical Architecture

“Most data warehouse projects don’t take the time up-front to create a viable technical architecture.” That was the bullet on the Power Point slide. Fortunately, I’ve been on projects that did. In one group we had a CORD (Concept of Operations & Requirements Document) together with several other documents detailing the ETL architecture choices and standards. In another, we had a full LPAD (Logical and Physical Architecture Document) which clearly laid out the hardware, metadata, and flow of the expected system.

Designing such a document is unchartered territory for me, even though I’ve penned several (more than I care to recount) technical documents. The framework and requirements of the final data warehouse technical architecture document is quite demanding and just plain hard to do. You must include business requirements, standards, implementation details for the business requirements, ETL service details, Subsystem strategies, data access strategies, metadata maintenance plans, data storage, backup and recovery, deployment, interview summaries, and many more categories.

To create this plan, Warren detailed an 8 step process:

  1. Prepare
  2. Gather architecture-related requirements
  3. Draft document
  4. Develop architectural models
  5. Determine implementation phases
  6. Spec the Subsystems
  7. Develop the final document
  8. Review and finalize the plan

As you can see, preparing this document is not trivial but an integral part of the design process. Most of what is done at this level can be implemented across the entire data warehousing “Program” (not just the “project”). Individual projects might have some specific requirements, so during each iteration of the Lifecycle, it might be necessary to refine or update the document.

Metadata

We talked at length throughout the sessions today about metadata. In our environment, there are really three types we truly care about: business, technical, and process metadata. The following sections will give you an idea of what is expected for each category.

Business Metadata: Describes the contents of the data warehouse in more user-acceptable terms:

  • Data quality
  • Data dictionaries
  • Logical data mappings
  • Business rules

Technical Metadata: Defines the objects and processes of the warehouse:

  • System inventory
  • Source descriptions
  • ETL Job logic
  • Backup and security

Process Metadata: Describes the results of various warehouse operations:

  • ETL operations statistics
  • Audit results
  • Quality results

Metadata management is really about following the 80/20 Principle. Warren even talked about striving for a realistic 10% metadata coverage instead of shooting for the holy grail, entirely integrated metadata repository. There are a lot of reasons for holding back efforts. Perhaps the biggest reason is that it is a maintenance headache with very little or spotty business value. Warren stressed that vendor tools should provide metadata and it’s about time we started demanding better of them.

Presentation Server

The presentation server(s) contains the atomic business process dimensional models. The server(s) acts as the access point for all BI applications and is fed by the ETL system. The dimensional models are complete with atomic-level data, aggregates, and conformed dimensions and facts. The server’s configuration and design is based on the requirements established during planning. The dimensions will contain combinations of SCD types, multi-valued hierarchies, and surrogate keys. Fact tables will be at transaction, accumulating snapshot, and periodic snapshot grains.

All of these things are planned for and conceived before the ETL and DBA groups get their hands on the documentation.

Real Time Data Warehousing

Real-time is defined (at least for this discussion) as any processing which requires access to data in the presentation servers less than 24 hours after the event occurs. Although the buzz of real-time has subsided — in some cases the term “right-time” has taken over — there is still a need to process data quickly. Warren went over a few different solutions, including the design of a “Real Time Layer”, which is a parallel structure to the data warehouse. This structure is in a dimensional form that matches (conforms, I guess you could say) to the dimensional models in the presentation servers.

The real-time layer would contain facts and their related dimensions only for the business events that have occurred since the last reconciliation with the data warehouse. Sounds simple? Not so fast. There are various issues with Type 1 and Type 2 changes, as well as the added complexity in maintaining the layer. But, as Warren stresses, if there is a compelling business need to have real-time access to dimensional data, then this might be a good way to go. Other options include building operational data stores, or building some sort of federated reporting system.

Finally…

We talked a lot about different vendor products. I may talk about these in some future post, but for now I don’t really see much value in it. There have been so many mergers and acquisitions, that it is very difficult to keep track of who is who. Warren even admits that he can’t keep up with some of these marketing departments as they re-brand and re-package many of the same products.

Tomorrow is the last day. I feel pretty good about what I’ve learned so far. More importantly, I feel that the work I have been doing to this point is valid. It’s nice to get reassurance once in a while!