Tod means Fox

Business Intelligence, Data Warehousing, SQL, Visual FoxPro.

Archive for September, 2008


Published September 16th, 2008

ETL Subsystem 21: Data Propagation Manager

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

It will often become necessary for you to move data from the data warehouse to other applications, databases, and servers. This is a little different then scaling out your data warehouse to support multiple servers (an integration server, relational database server, reporting server(s), analytic server(s), etc.); typically, distributing data in this way is done via replication techniques. But this subsystem addresses another important need.

I usually refer to this need as “Data Provisioning“. Provisioning is a term borrowed from the telecommunications industry meant to describe the process of providing various telecom services to a customer. A data provision is simply a service that provides data to some client. The client can be an Excel sheet, XML file, application, operational system, or perhaps a master (MDM) database.

This subsystem is not the only place that you’ll be provisioning data to various client applications. BI Applications often allow various extracts and downloads that can be made available to the same list of clients. This usually comes by way of a downloadable report or extract. However, the data propagation subsystem provides a few additional benefits above what BI Applications can usually deliver. First, as an ETL process, you can utilize the full power of the ETL system to perform additional transformations on the data before delivery. Second, propagating data can be part of the ETL pipeline so that provisions can be made available as soon as the dimensional models are loaded (highly desirable if the provision is meant to update an operational or master data system).

Consider though that as an ETL function, these data provisions are best suited for well-defined, almost-never-changing requests for data. You would not use this subsystem to provide data in an ad hoc way to an analyst! You would, however, use this subsystem to provide a daily extract to your risk department for compliance reasons.

SQL Server 2005 Integration Services (SSIS)

The only challenge that you’re bound to face is providing the data in the various formats required by the client. I’ve found that Excel is particularly tricky to work with and as a consequence, I usually push my clients to accept CSV or Tab delimited files instead!

Hand Coding with Visual FoxPro (VFP9)

You’ll have no problems propagating data with FoxPro. In many cases, you can use SQL to get the data you need and simply write it to a table or file on disk. The COPY TO command supports different formats and it isn’t difficult to create outputs in other formats such as XML. Using the same techniques you would employ to connect to various servers (using ODBC drivers for example), you can push data into almost any environment. This is a strength of FoxPro that sets it apart from other ETL scripting languages.

From Here

There’s no real magic to propagating data. It’s an ETL function with the typical extract, transform, and load operations. You can treat it as part of your current ETL system; perhaps running provisions after your facts are loaded. Alternatively, provisions can be run on a schedule independent of normal ETL processing or can be run on demand as needed.

This post wraps up the “Data Delivery” function in the ETL environment. In my next post, I’ll move on to “ETL Management” functions including scheduling, backup, security and metadata.

Published September 6th, 2008

Book Review: Wordpress Theme Design

Wordpress Theme DesignOver the past few months I’ve had the book “WordPress Theme Design” on my desk ready to be reviewed (it’s another book from Packt publishing). I finally got to it earlier this week. You can read my review over at Amazon. I gave it 3 stars mainly because it wasn’t complete and lacked some of the details I’d like to see in a “guide” book. But I can’t say I didn’t learn anything.

One of the reasons I picked up this title was because I’ve been itching to redesign Tod means Fox. I use WordPress (and like it very much) and have all the tools (and some questionable skills) for a redesign. But don’t worry, any redesign will have the same glorious orange that helps remind me of FoxPro, Firefox, ING, Packt, and Holland! But honestly, It’s been a while and I’m restless. I also want a redesign to be done entirely on my new iMac. If I do it quick enough, it will be my first real work performed on the new machine.

It turns out though that the book wasn’t very helpful. I’m sure it will sit on my shelf until some day in the distant future when its time to “clean up” the office. I’ll likely be on revision n of Tod means Fox, and who knows where WordPress will be. But I suppose that’s the nature of buying technology books.

Very few tech books stand the test of time. Those that do are usually pure reference and theory books and not how-to books.

I keep reference books (like the PHP Cookbook, The Complete Reference: MySQL, and of course the Hacker’s Guide to Visual FoxPro) at arms length. These books I go to often. Then there are the theory books. These are books I keep further away but access often — and usually not in front of the computer. Books like Kimball’s The Data Warehouse Toolkit and Zeldman’s Designing with Web Standards fit into this category.

OK, enough babbling. I’d be interested to hear your thoughts on a redesign, WordPress, time-tested tech books, or the fabulous color orange.

Published September 5th, 2008

SQL Server and FoxPro on September 9 MS Security Bulletin

According to Microsoft’s advanced notification system for security updates, slated to be released on September 9, SQL Server and Visual FoxPro are among the affected software. A webcast will likely be held the day after the bulletin is released. The critical issue has to do with a vulnerability allowing remote code execution.

Specifically, these FoxPro versions as installed on Microsoft Windows 2000 Service Pack 4:

  • Microsoft Visual FoxPro 8.0 Service Pack 1
  • Microsoft Visual FoxPro 9.0 Service Pack 1
  • Microsoft Visual FoxPro 9.0 Service Pack 2

For SQL Server:

  • SQL 2000 Reporting Services Service Pack 2
  • SQL Server 2005 Service Pack 2
  • SQL Server 2005 x64 Edition Service Pack 2
  • SQL Server 2005 for Itanium-based Systems Service Pack 2

Be sure to check out the full report when issued on September 9th if this concerns you (.NET, MS Office, and IE users, you had better have a look too!).

Security Bulletin website

Published September 4th, 2008

ETL Subsystem 20: OLAP Cube Builder

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

In my previous post, I discussed OLAP, the Analytics Server, and Cubes as a primer to this post. If you’re new to this topic, please have a look there for more details. This post is specific to building cubes from the ETL environment.

Cube building is simply the process of populating the analytical server database with data from the relational database. The relational database can contain relational or dimensional models.

You will want to re-build your analytical (OLAP) database after loading data into your dimensional model. This will keep your analytical data stored in the analytical database in sync with your dimensional models stored in the relational database. From the ETL point-of-view, the basic process works like this:

  1. Extract data from disparate sources for a business process
  2. Cleanse and conform the data
  3. Load Dimensions
  4. Load Facts, rebuilding any aggregate fact tables
  5. Build cubes and other analytical server objects (like aggregates and dimensions)

These steps are repeated for each atomic business process dimensional model in the data warehouse. The first four are familiar, as I have discussed them at length. The last one should seem natural: You need to devise a way to alert the analytical database that the data warehouse has just been updated. The best place to do this is directly in the ETL system. How you handle the rebuild process is dependent upon how the OLAP vendor’s database works (I offer some suggestions below).

One thing to consider before building cube-updating code in your ETL workflow is latency. Latency, in this context, describes the time delay between the moment the atomic business process dimensional model is up-to-date, and the moment the related cube is up-to-date. It might seem natural to keep this in sync as close as possible. Certainly placing the code to update the cubes right after processing the fact tables in the ETL system is the way to go. But check your requirements. You may only need to have data refreshed once a day, in the early mornings for example (whereas your ETL system might run every 4 hours). You may also have a research environment where OLAP data need only be one week fresh. You can save yourself a lot of processing time by getting these requirements right.

To accommodate this and other scenarios, I highly recommend putting any cube-building or OLAP-related update code in a separate ETL package or module. As with fact and dimension processing, a master package or program (depending on your tool) will “call” the child packages or modules. You could embed some logic in the final step to only process cubes after the morning ETL run, for example.

I do not recommend processing OLAP cubes on a fixed schedule unless you can predict with certainty when your dimensional models will be ready. Normally, it makes most sense to kick off this process as soon as the facts have been loaded.

SQL Server 2005 Integration Services (SSIS)

If you’re using SSIS to update your cubes, then you are likely using SQL Server Analysis Services as your analytical database. If not, and you use SSIS, I’d like to know what you’re using for your analytic server and how you’re building your cubes!

SSIS comes with a powerful task that handles cube-building for you. It’s called “Analysis Services Processing Task” and can do the following: (a) connect to a SSAS database through the Analysis Services connection; (b) specify various objects in the database to update (measure groups, partitions, dimensions, cubes, mining models, mining structures, and databases); (c) define batch-mode or sequence mode to process objects together or one after another; (d) perform an impact analysis on objects associated with the cube; and (e) specify whether dependent and affected objects are also processed.

The Analysis Services Execute DDL Task also allows you to work with cubes by giving you the ability to run DDL statements, as defined by the Analysis Services Scripting Language (ASSL). These statements are included in an XMLA command (more on XMLA below). Although this task can’t process a cube, it can manipulate it. For details on the ASSL language, check out the Analysis Services Scripting Language Reference.

Hand Coding with Visual FoxPro (VFP9)

FoxPro, like all programming languages, just isn’t an analytical database engine so there is no native cube or cube builder. So a lot here depends on what type of technology you are using to handle your cubes.

Your first stop should be to check with the vendor to see if there is an api or technique available to update and rebuild cubes. Even if this functionality exists, it might not be the best option.

For your second stop, look into running XML for Analysis (XMLA) scripts against the database. XMLA is a SOAP-based protocol for interacting with OLAP databases. Through web services, you simply issue commands (like “process”) and receive responses. For more information on XMLA, please check out the XML for Analysis website.

From Here

It is normal for an ETL developer to be involved in generating cubes for analysis services, so this is a skill you should try to foster. Understanding dimensional modeling will give you a huge head start.

Next post (I’m getting closer and closer to the magic 34 — I just hope Ralph and the gang don’t decide to renumber them on me or something), I’ll go through ETL Subsystem 21: Data Propagation Manager!