This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.
In my previous post, I discussed OLAP, the Analytics Server, and Cubes as a primer to this post. If you’re new to this topic, please have a look there for more details. This post is specific to building cubes from the ETL environment.
Cube building is simply the process of populating the analytical server database with data from the relational database. The relational database can contain relational or dimensional models.
You will want to re-build your analytical (OLAP) database after loading data into your dimensional model. This will keep your analytical data stored in the analytical database in sync with your dimensional models stored in the relational database. From the ETL point-of-view, the basic process works like this:
- Extract data from disparate sources for a business process
- Cleanse and conform the data
- Load Dimensions
- Load Facts, rebuilding any aggregate fact tables
- Build cubes and other analytical server objects (like aggregates and dimensions)
These steps are repeated for each atomic business process dimensional model in the data warehouse. The first four are familiar, as I have discussed them at length. The last one should seem natural: You need to devise a way to alert the analytical database that the data warehouse has just been updated. The best place to do this is directly in the ETL system. How you handle the rebuild process is dependent upon how the OLAP vendor’s database works (I offer some suggestions below).
One thing to consider before building cube-updating code in your ETL workflow is latency. Latency, in this context, describes the time delay between the moment the atomic business process dimensional model is up-to-date, and the moment the related cube is up-to-date. It might seem natural to keep this in sync as close as possible. Certainly placing the code to update the cubes right after processing the fact tables in the ETL system is the way to go. But check your requirements. You may only need to have data refreshed once a day, in the early mornings for example (whereas your ETL system might run every 4 hours). You may also have a research environment where OLAP data need only be one week fresh. You can save yourself a lot of processing time by getting these requirements right.
To accommodate this and other scenarios, I highly recommend putting any cube-building or OLAP-related update code in a separate ETL package or module. As with fact and dimension processing, a master package or program (depending on your tool) will “call” the child packages or modules. You could embed some logic in the final step to only process cubes after the morning ETL run, for example.
I do not recommend processing OLAP cubes on a fixed schedule unless you can predict with certainty when your dimensional models will be ready. Normally, it makes most sense to kick off this process as soon as the facts have been loaded.
SQL Server 2005 Integration Services (SSIS)
If you’re using SSIS to update your cubes, then you are likely using SQL Server Analysis Services as your analytical database. If not, and you use SSIS, I’d like to know what you’re using for your analytic server and how you’re building your cubes!
SSIS comes with a powerful task that handles cube-building for you. It’s called “Analysis Services Processing Task” and can do the following: (a) connect to a SSAS database through the Analysis Services connection; (b) specify various objects in the database to update (measure groups, partitions, dimensions, cubes, mining models, mining structures, and databases); (c) define batch-mode or sequence mode to process objects together or one after another; (d) perform an impact analysis on objects associated with the cube; and (e) specify whether dependent and affected objects are also processed.
The Analysis Services Execute DDL Task also allows you to work with cubes by giving you the ability to run DDL statements, as defined by the Analysis Services Scripting Language (ASSL). These statements are included in an XMLA command (more on XMLA below). Although this task can’t process a cube, it can manipulate it. For details on the ASSL language, check out the Analysis Services Scripting Language Reference.
Hand Coding with Visual FoxPro (VFP9)
FoxPro, like all programming languages, just isn’t an analytical database engine so there is no native cube or cube builder. So a lot here depends on what type of technology you are using to handle your cubes.
Your first stop should be to check with the vendor to see if there is an api or technique available to update and rebuild cubes. Even if this functionality exists, it might not be the best option.
For your second stop, look into running XML for Analysis (XMLA) scripts against the database. XMLA is a SOAP-based protocol for interacting with OLAP databases. Through web services, you simply issue commands (like “process”) and receive responses. For more information on XMLA, please check out the XML for Analysis website.
It is normal for an ETL developer to be involved in generating cubes for analysis services, so this is a skill you should try to foster. Understanding dimensional modeling will give you a huge head start.
Next post (I’m getting closer and closer to the magic 34 — I just hope Ralph and the gang don’t decide to renumber them on me or something), I’ll go through ETL Subsystem 21: Data Propagation Manager!