Tod means Fox

Business Intelligence, Data Warehousing, SQL, Visual FoxPro.

Archive for March, 2008


Published March 14th, 2008

Site Update: Resource Page Additions

This quick post is to alert you to some additions to my resource page. Some of the new additions include links to DM Review, TDAN, SQL Server Magazine, and FoxRockX Magazine.

I’ve been rummaging through my bookmarks and favorites and should have more pretty soon. If you know of a good (or even better: excellent) resource in either Visual FoxPro, Business Intelligence, or Database categories please contact me.

Published March 12th, 2008

The Future of Open Source in BI

At last Thursday’s TDWI Benelux Chapter meeting, Davy Nys of Pentaho gave an overview of how open source could/might/will change the face of Business Intelligence. He gave a reasonably good vendor-neutral presentation (important for TDWI events). His session was a nice compliment to the “BI Trends” presentation given by Steve Hoberman an hour earlier.

Perhaps there is a trend for organizations to turn to more open-source software solutions for BI projects. After all, Davy’s company and others like Talend and CloverETL are making great strides in competing for market share.

As the big players in BI continue to merge and consolidate, it is pretty exciting to see several open source vendors and tools emerge. Is this a reflection of the community’s general dissatisfaction regarding commercial software? Are the open source solutions better? Is this truly a trend to be reckoned with? Should MS and others be worried?

Open Source Considerations

Davy stressed the importance of reducing the TCO of BI software. Without licensing fees, open-source can do just that. As Rick Sherman predicts in an article for DM Review, TCO will become a much more significant factor in the adoption of any and all BI trends. Licensing costs could impact TCO in such a dramatic way that a company can save a significant amount on their investment by switching to open source.

TCO isn’t the only consideration. Before evaluating open-source software, Davy suggests to examine the vibrancy of the community. A vibrant community with contributors and enthusiasts is a good sign for future product development and support.

Licensing is yet another very important consideration. As part of his presentation, Davy initiated a discussion on viral verses non-viral licensing. here’s is how I understand it: In a “viral” agreement, any source code changes to the product must be returned back to the public. Non-viral agreements allow companies to modify the source as they see fit without having to report back to the community. With viral licensing, I would have tremendous concerns about intellectual property and protecting business practices and methods.

What I found strange about the Q&A session and roundtable discussion that followed his presentation was the focus on the Pentaho business model. The concern of some BI professionals is thus: How can the economics associated with running an open source software company be sustainable over a long period of time? The question is relevant because as BI Professionals, we need to supply solutions that will be supportable, scalable, and usable in the future. The concern is that a company — like Pentaho — might not live and thrive long enough to meet the long-term needs of the business. I am not qualified to answer this question, but I admit, I wonder myself. Davy did an excellent job of presenting the case, however. If you need details on how Pentaho and other open source organizations make their money, it would be best to contact them directly. These are valid concerns and should be part of tool analysis that should go on early in a project’s planning.

I am open to the possibilities that open source can provide for BI applications. I use open source software all the time (from Wordpress to MySQl to Codeplex), but for a mission critical business initiative? Before making a decision like that, I would certainly need to have more information and a project to try it out on.

To learn more about Davy, you can check out his LinkedIn profile or visit the Pentaho website.

Published March 10th, 2008

Trends in Business Intelligence

I had the pleasure of attending an excellent TDWI Benelux Chapter meeting in Antwerpen, Belgium last Thursday. Presenting were Steve Hoberman and Davy Nys. Steve discussed the latest trends in business intelligence, including the drivers that influence them. Davy’s presentation, which I’ll talk about in a future posting, discussed the place for open source technologies in a BI environment.

Steve Hoberman

I have been reading Steve’s work for a few years now. I first discovered him in print when I picked up his book “Data Modeler’s Workbench: Tools and Techniques for Analysis and Design”. Unfortunately (for me), I lent the book to a former colleague some time ago, and will likely not get it back! It looks like I’ll need to re-order. You can keep up with Steve at The Data Administrator’s Newsletter website, where Steve is a columnist. His writings are excellent. He offers a great deal of insight and experience into data modeling and related topics.

So what are the trends?

Steve identified five drivers and five trends. I don’t have access to his Power Point slides, so I will do my best in regurgitating what I heard based on my notes and memory. First, the drivers, which include:

  • Cheaper storage and computing;
  • businesses are more BI savvy and demand more of IT;
  • increased pressure for business and IT to perform well;
  • struggling global economy; and
  • scarce IT resources.

These drivers have contributed to the following five trends:

  • The merging of data warehouses and operational data stores;
  • increased integration efforts;
  • excitement around unstructured data;
  • pressure to deliver solutions correctly the first time; and
  • the need for BI personnel to wear multiple hats.

Certainly, Steve explained the above with more elegance and grace. Hopefully, I haven’t butchered his thoughts too much.

The point though, is that BI is heading in a particular direction. I’ve had a sense of this for the past couple of years:

  • I have witnessed firsthand the merging of DW and ODS.
  • I have seen a rise in integration efforts — including application and data.
  • Unstructured data and text mining are certainly talked about now as integral to future BI initiatives.
  • Because of the fact that business users are now more involved in BI projects (it used to be that IT had to struggle to obtain business sponsorship), the ideas come faster than the ability of IT to implement them, meaning that IT must strive to get them right the first time.
  • Lastly, I do wear multiple hats: DBA, developer, integration architect, researcher, business analyst, project manager, etc.

I would be interested in hearing your thoughts on this subject. Feel free to write me at Tod at Blackstone Providence dot com, or simply just make a comment to this post. You can also reach Steve at his website, or visit TDWI for more details on events that might be coming in your area.

Published March 5th, 2008

Gary Gygax, Co-creator of D&D, Has Died

It’s a sad week for Dungeons & Dragons fans, players, and enthusiasts. Gary Gygax, its co-creator and lead cheerleader, has died at the age of 69.

Although it has been a few years since I’ve rolled twenty-sided dice, I’m still very much attached to D&D. We just moved to Belgium and while setting up my office, I set aside a shelf specifically for my more than 30 books, boxed sets, and notebooks. For the past 4 years, these things have been boxed up.

As one article puts it: “Dungeons & Dragons formed a bridge between the noninteractive world of books and films and the exploding interactive video game industry”. Created in 1974, the role-playing fantasy game was a huge success (mostly among geeky teenage and twenty-something boys).

D&D is a complex game. It takes months to learn (although new players can start playing with some experienced players almost immediately), and you need hours to play per sitting. But, the main rule is to have fun and enjoy its social and collaborative nature. D&D is also a game that stresses and embraces imagination.

I’ll always be thankful for Mr. Gygax and his partner Dave Arneson for their invention. It taught me how to be a team player, collaborate, and most of all, use my imagination. The complex rules also got me focused, and with the myriad tables and charts, I felt well prepared for my career as a software developer and business analyst. In fact, my very first FoxPro program was a D&D Character database, where I stored all of my player characters and produced a few interesting reports about them. It wasn’t perfect, but it gave me plenty of motivation to learn Fox2.6. I was also a “Dungeon Master” and created several hundred adventures across multiple campaigns. I wrote a lot and I studied history, mythology, and religion.

The game is still in production today (Hasbro, in my home State of Rhode Island is keeping it alive) even though online gaming has taken a large percentage of its audience.

I think I might just go order a pizza and roll some 20-sided dice for fun now…

Published March 3rd, 2008

SSIS Debugging: Find in Files, Naming Conventions, and a Problem Solved!

I recently inherited several complex SSIS packages that are now being promoted from the development environment (ALPHA) to testing (BETA). There was one lingering issue though that took me a few hours to figure out.

We’ve implemented custom logging and auditing in the packages. In the Alpha environment, we set up Agent jobs to run groups of related tasks. For some odd reason, a log entry was being inserted into our audit table that was from a previous job step and not from the one we expected.

To give a simplified example: There are 3 SSIS packages being called by 3 job steps. Each loads a CSV file into a staging area for dimension processing. Step 1 reports that file A was processed. Step 2 reports on file B, and step 3 on file C. The step 3 job was reporting that file A was processed, but that it was the step 3 job that did it.

This wasn’t noticed until recently for several reasons, but primarily because we generally don’t rely on this particular log entry to determine if our files have been staged. So no one looks at it. In fact, this log entry merely serves as a bracket in the log table for other log entries posted in the same job. Nevertheless, I wanted to fix it.

So, where was the problem? All the logging is done using an Execute SQL Task. Every task checked out OK. The correct variables were being used in all cases. I was about ready to blame SSIS or corruption in the package when I stumbled upon a valuable clue: The name of the exact SQL Task Causing the problem!

I found the pesky task by doing a “Find in Files” search (BIDS -> Edit -> Find and Replace -> Find in Files) on the exact message being inserted into my audit table. You can use regular expressions and wildcards in this dialog so I was sure to find at least a clue. It turned out though that my exact search found the phrase; it was in a task named “Execute SQL Task”.

Red Flag!

I meticulously went through each package last month and renamed each and every task using the naming conventions posted on Jamie Thomson’s blog. This is a great best-practice and I certainly advise all SSIS developers to use this (or similar) convention. Anyway, the developers before me did a fairly good job of renaming tasks so they had meaningful names, but it was nearly impossible to look into sysdtslog90 or our custom audit tables and identify what kind of task the record was referring to. So I felt that while these packages were still in development, it was worth the few days effort to make these changes. All I did was add the suggested prefix before each task and renamed a few that I thought needed better descriptive names (sometimes shortening them in the process).

I know for sure that I didn’t miss any! Right?

Well, yes. I did. And so did the developers before me. When the package was being developed, the developers used cut & paste to move similar tasks from an existing package into this one. Then, a series of Sequence containers and Loop containers were added and tasks were moved around. A Sequence container was sized over the troublesome “Execute SQL Task” – to be forever (almost) lost in the Control Flow maze. I first saw it in the Package Explorer view and then went back to the Control Flow to see where it had gone. The task ran an SQL statement that updated the audit table.

Advice

Take care when using cut & paste to move items from one package to another. Not only could you run into some simple problems like the one detailed here, you could also forget to update a variable or an expression – producing odd or destructive results on your database. For us, this was just a simple audit log entry with zero business impact. Are there other cases buried in the packages? Need to check right now…

Also, use the naming convention best practice as discussed by Jamie. There are a lot of great reasons to do so (ease in development, debugging, readability).

I have never been in the habit of using the “Find in Files” function. But it sure saved me here. If you haven’t been using it in your packages, then give it a try!