Archive for May, 2009

SOX to Go the Way of Gitmo Bay?

While the Obama administration is making a lot of changes, with foreign policy and the economy taking center stage, the business world is eyeing the US Supreme Court’s latest decision to review the constitutionality of the Sarbanes-Oxley Act of 2002.

I wrote about this topic last year for Advisor Media, in “Auditing Your Warehouse For Sox Compliance”, as well as in my post “Formula 409: Private Companies Must Comply with SOX”. In that blog post, I said that innovation would take a serious hit due to Section 409’s mandate that companies “must disclose material change events that would impact their financial condition or operations”. These material changes could include failed R&D projects, which is not good for a public company looking to experiment. Imagine, for a second, what the technology world would look like if Bell Labs in the middle of the 20th century had faced these restrictions. Or Apple, or Microsoft, et al.

So will SOX be repealed? Is it unconstitutional? Time will tell, but I’m sure many CEOs are keeping their fingers crossed. Investors still need protection, but SOX just isn’t quite right.

Reorganization at TmF

Unlike the real reorganization going on at my work, this one should have a minimal impact. As you know, I have recently updated Tod means Fox to the latest version of WordPress, as well as updated the look and feel of the entire place. While I’ll continue to tweak the graphics, layout, and functionality, I think that I have finally finished reorganizing the content.

What did I do? Well, in the old version of WordPress, I had no ability to add tags (without the help of a widget, which I avoided). I was forced to rely on categories instead. A single post might end up filed under multiple categories. In theory and practice, this isn’t a big deal. Many people do it, and there are many reasons why a blog post might fit into multiple categories.

But as I wrote more over 2.5 years, my list of categories began to grow in odd directions, partly as a result of bad planning and partly because my focus here has moved away from software development and toward data management. The old categories were too restrictive in some places and too broad in others. Now I’ve carefully thought about what I want to talk about, and I’ve created categories that reflect that.

So instead of using categories to drill into topics, I’ll use tags. But I’ll be careful: I think about the tags as much as I think about the categories. The tags will be relevant and consistent across my posts.

New Categories

The following new categories will now be used:

Business & IT Issues
Issues and topics that relate to business and IT decisions, information management, project management, and planning. Common tags: Marketing, Productivity, Security, Compliance.
Data Management
Data management comprises all the disciplines related to managing data. Common tags: Data Warehousing, Data Integration, ETL, MDM, Data Quality, Data Profiling, SQL.
Decision Support
Everything and anything related to business intelligence that is not covered under Data Management. Common tags: Analysis, Analytical Databases, OLAP, Aggregates.
Events and Trainings
Educational and informational events, sessions, conferences, and trainings that are worth writing about. Common tags: Conference, Training.
In The News
The postings in this category come directly from current events, dispatches, and other blog posts that I find interesting. Common tags: Globalization, Green, Security, VFP.
Personal
Posts of a personal nature, covering topics not included elsewhere on Tod means Fox. Common tags: TmF, Family, Twitter.
Reviews
My criticism of technology and business books and articles, in which I examine each work’s content, style, and merit. Common tags: Social Science, Open Source.
Software Development
This category includes posts having to do with the research, new development, modification, reuse, re-engineering, and maintenance of software. Common tags: Visual Studio, VFP, Codeplex, SQL, C#, Debugging.

Other tags exist in addition to the ones mentioned above. I’ll be installing a tag cloud a bit later to help navigate through the new structure. I’ll also be updating my sitemap.

Consequences

Changing a post’s category has consequences, I know. If you linked to an old category, either on the Web or through RSS, your link will break. The best I can suggest is to either subscribe to one of the new categories or follow a tag if you want very specific content from me (e.g. BI, DW, VFP, MDM, SOA).

Thanks for reading!

ETL Subsystem 31: Paralleling and Pipelining

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL”. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand-coded in Visual FoxPro.

Of all the subsystems that I’ve discussed so far, this one required the most research. I had to (a) learn more about how paralleling works and (b) experiment with my environment to better understand it.

Honestly, I’ve taken this subsystem for granted over the years. And for VFP, I’ve done little exploration in this arena. For SSIS, I have tended to adjust the settings I can adjust (more on this below), monitor the results, and tweak my performance as needed. In some environments, this lackluster approach will get you by just fine. If you have very small load windows and performance is critical, then you’ll need to make a better effort.

So what is Paralleling and Pipelining?

Lumped together into a single subsystem, these two performance techniques are related but different. They’re cousins, I suppose. Running your ETL processes in parallel means that your ETL system carries out multiple operations simultaneously. Pipelining your ETL processes means that you can start new operations before the previous ones complete.

Paralleling and Pipelining are quite desirable. And depending on your tool of choice, taking advantage of them can be painless or painful.

How can you apply them?

You can achieve parallel processing by utilizing the CPUs on a single machine, or by utilizing multiple machines. The first option is easier to set up, and the results can be quite good. With multiple CPUs (or cores), you are actually running code (programs, algorithms) simultaneously on the same box; going from one CPU to two can potentially double performance. Alternatively, you can scale out (i.e., scale horizontally) your ETL processes by adding computers (see What is distributed computing? by Kirk Pearson), allowing you to take advantage of the CPUs, RAM, and I/O of each machine. This second option has some significant design implications, but it is well worth it if your environment needs it.
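To put a rough number on “potentially double”: Amdahl’s law, a standard rule of thumb that this post doesn’t otherwise cite, bounds the speedup S when only a fraction p of the work can run in parallel on n CPUs. The value p = 0.8 below is made up for illustration, not measured from any ETL system.

$$
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}
$$

$$
p = 0.8:\quad S(2) = \frac{1}{0.2 + 0.4} \approx 1.67, \qquad S(4) = \frac{1}{0.2 + 0.2} = 2.5, \qquad \lim_{n \to \infty} S(n) = \frac{1}{0.2} = 5
$$

In other words, a second CPU only comes close to doubling throughput when nearly all of the load (p close to 1) can actually be split up.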

Pipelining increases throughput. Unlike parallelism, it does not make any individual operation run faster; rather, it permits downstream processes to start before the upstream process finishes. A great analogy is an assembly line, where parts are added to the whole as it travels down the line.
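To make the assembly-line idea concrete, here is a minimal Python sketch (not SSIS or VFP code). The stage functions `extract`, `transform`, and `load`, the dict-shaped rows, and the tiny buffer size are illustrative assumptions. Each stage runs on its own thread and passes rows through a bounded `queue.Queue`, so the load stage is already writing rows while extraction is still producing them. In CPython, threads only truly overlap while a stage waits on I/O (a database, a file), so treat this as a picture of the flow rather than a performance recipe.

```python
import queue
import threading

END = object()  # sentinel marking the end of the row stream

def extract(rows, out_q):
    """Upstream stage: hands rows downstream one at a time."""
    for row in rows:
        out_q.put(row)  # blocks while the buffer is full
    out_q.put(END)

def transform(in_q, out_q):
    """Middle stage: starts as soon as the first row arrives."""
    while True:
        row = in_q.get()
        if row is END:
            out_q.put(END)
            return
        out_q.put({"id": row["id"], "name": row["name"].strip().upper()})

def load(in_q, target):
    """Downstream stage: writes rows while upstream is still running."""
    while True:
        row = in_q.get()
        if row is END:
            return
        target.append(row)

def run_pipeline(rows, buffer_size=2):
    extracted = queue.Queue(maxsize=buffer_size)
    transformed = queue.Queue(maxsize=buffer_size)
    target = []
    stages = [
        threading.Thread(target=extract, args=(rows, extracted)),
        threading.Thread(target=transform, args=(extracted, transformed)),
        threading.Thread(target=load, args=(transformed, target)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return target

source = [{"id": 1, "name": " alice "}, {"id": 2, "name": "bob"}]
print(run_pipeline(source))
# [{'id': 1, 'name': 'ALICE'}, {'id': 2, 'name': 'BOB'}]
```

The bounded queues act like the conveyor belt: when a downstream stage falls behind, `put` blocks and the upstream stage waits instead of piling up rows in memory.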

Getting parallelism and pipelining to work together is the Holy Grail of ETL performance. While certain performance techniques are available at all phases of data integration (from Extraction and CDC, to surrogate key handling and using partitions for fast loading), none can compare to the gains you can realize with this subsystem.
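One way to sketch parallelism and pipelining working together, again in Python and under assumed names (`run_parallel_pipeline`, `transform_worker`, and the `amount`/`rate` fields are all hypothetical): keep an extract-transform-load pipeline, but run several transform workers in parallel on the middle stage. Each worker receives its own stop signal, and the loader finishes only after every worker has reported in. With CPython threads this mainly pays off for I/O-bound steps; CPU-heavy transforms would need processes.

```python
import queue
import threading

STOP = object()  # sentinel: "no more rows"

def extract(rows, out_q, n_workers):
    """Upstream stage: emit rows, then one STOP per transform worker."""
    for row in rows:
        out_q.put(row)
    for _ in range(n_workers):
        out_q.put(STOP)

def transform_worker(in_q, out_q):
    """Parallel middle stage: several copies of this run at once."""
    while (row := in_q.get()) is not STOP:
        out_q.put({**row, "amount_usd": round(row["amount"] * row["rate"], 2)})
    out_q.put(STOP)  # tell the loader this worker is finished

def load(in_q, target, n_workers):
    """Downstream stage: runs until every worker has sent STOP."""
    finished = 0
    while finished < n_workers:
        row = in_q.get()
        if row is STOP:
            finished += 1
        else:
            target.append(row)

def run_parallel_pipeline(rows, n_workers=4, buffer_size=8):
    extracted = queue.Queue(maxsize=buffer_size)
    transformed = queue.Queue(maxsize=buffer_size)
    target = []
    threads = [
        threading.Thread(target=extract, args=(rows, extracted, n_workers)),
        threading.Thread(target=load, args=(transformed, target, n_workers)),
    ]
    threads += [
        threading.Thread(target=transform_worker, args=(extracted, transformed))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target

rows = [{"id": i, "amount": float(i), "rate": 2.0} for i in range(100)]
loaded = run_parallel_pipeline(rows)
print(len(loaded))  # 100
```

Because several workers feed the loader, rows land in the target in completion order rather than source order, so anything downstream should rely on keys rather than position.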

You should also keep in mind that CPU multitasking is different from parallel processing, and multithreading is different from pipelining. A multitasking process shares CPU resources, giving the illusion of parallelism (although one man’s illusion can be another man’s reality). Multithreaded applications share the same memory but run on separate threads (i.e., subtasks). Multitasking and multithreading, like pipelining, increase throughput and also play an important role in performance tuning. I’ll talk a little more about this below in my section about FoxPro. If anyone is interested, I can try to elaborate in another post.
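To illustrate the shared-memory point, here is a minimal Python sketch; `RowCounter` and `count_rows` are hypothetical names, not part of any ETL tool. Several threads update the same counter object, so updates go through a `threading.Lock`. Nothing here makes the work faster; it only shows that threads share state and have to coordinate access to it.

```python
import threading

class RowCounter:
    """Shared state: every thread works with this same object in memory."""

    def __init__(self):
        self.total = 0
        self._lock = threading.Lock()

    def add(self, n):
        # Without the lock, two threads could read the same old total
        # and one of the updates would be lost.
        with self._lock:
            self.total += n

def count_rows(batch, counter):
    """Hypothetical per-thread task: 'process' a batch, then record its size."""
    counter.add(len(batch))

batches = [list(range(100)) for _ in range(8)]
counter = RowCounter()
threads = [threading.Thread(target=count_rows, args=(b, counter)) for b in batches]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.total)  # 800
```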

Where can this best be utilized?

Here are some ideas on where you can focus your efforts:

  • When loading historical data or retrieving data from multiple similar sources, execute the same package for different date ranges at the same time (in SSIS, for example, use multiple Execute Package Tasks, or run the same package multiple times concurrently, as Jamie Thomson has explored). Alternatively, design your historical load packages to break the data apart into separate threads.
  • Spread out UPDATE statements. This can be really handy if you have a few accumulating snapshot fact tables.
  • Spread out complex algorithms and routines that can operate on a subset of data.
  • Load staging tables while downstream processing loads your dimensional model.
  • Do lookups (especially surrogate key lookups) in parallel.
  • Distribute your conformed dimensions to other machines, data marts, etc. in parallel.
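The first idea, running the same load for different date ranges at the same time, can be sketched in Python as follows. This is not SSIS: `load_range` is a hypothetical stand-in for one package run, and a thread pool stands in for multiple Execute Package Tasks. Threads are a reasonable fit when each range spends most of its time waiting on the database; CPU-bound work in CPython would call for `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date

def month_ranges(start, end):
    """Split the half-open interval [start, end) into monthly chunks."""
    ranges = []
    current = start
    while current < end:
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        ranges.append((current, min(next_month, end)))
        current = next_month
    return ranges

def load_range(range_start, range_end):
    """Hypothetical stand-in for one package run over one date range.
    A real version would extract, transform, and load the rows dated
    in [range_start, range_end); this one just counts the days."""
    return (range_end - range_start).days

def historical_load(start, end, max_workers=4):
    ranges = month_ranges(start, end)
    # Each worker thread handles one date range; ranges run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda r: load_range(*r), ranges))
    return dict(zip(ranges, results))

summary = historical_load(date(2008, 1, 1), date(2009, 1, 1))
print(len(summary), sum(summary.values()))  # 12 366
```

Half-open ranges (`[start, end)`) keep adjacent chunks from overlapping or leaving gaps, which matters once each range really is loading rows.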

 

SQL Server 2005 Integration Services (SSIS)

As you know, I use SSIS and VFP for ETL (not at the same time or on the same project, though). With SSIS, I can quickly create complex routines that automatically take advantage of multiple processors. The native support for buffers, execution trees, and parallelism makes my job pretty easy (which, I suppose, is why I’ve taken this subsystem for granted over the years). Simply understand how SSIS works, adjust the settings you need to adjust, monitor your performance, and tweak as needed.

To get a grip, the following resources are invaluable:

 

Hand Coding with Visual FoxPro (VFP9)

While SSIS and SQL Server have built-in mechanisms to manage most of the paralleling and pipelining responsibilities for you, FoxPro does not. You can achieve some very good results using VFP and multithreading, but you have to be extremely creative in how you handle paralleling and pipelining. If you think otherwise, I’d love to hear how you achieve paralleling and pipelining with VFP!

Of course, the VFP community is — and has always been — quite creative. As with most of this sort of thing, Calvin Hsia is near the front of the line. MTmyVFP (True VFP multi-threading) on CodePlex is a creative example using Hsia’s Multithreading class. For more information and a ton of details, check out:

As I’ve stated before, multithreading is not parallelism, nor is it pipelining. But if you utilize MTmyVFP (or a similar solution) in your VFP ETL system, you will realize many performance benefits. Lastly, there was a pretty interesting, albeit short, online discussion on this issue.

From here

This post might have come across as a bit long-winded, but there were quite a few important points to make. I hope that I’ve been able to distill what I’ve learned and that, in the end, it all makes sense. In my next ETL post, I’ll talk about ETL Subsystem 32: Security.

The Tod means Fox facelift

After almost 2 years, I’ve finally updated the design of Tod means Fox.

I’m not sure why today was the day. I suppose the stars were aligned. As I spend the next few weeks tweaking it, I’m hoping to get some feedback. My goals for the redesign are:

  • To have TmF look and feel more modern,
  • To simplify the layout (and code),
  • To be up-to-date with the latest from WordPress, and
  • To move away from solid orange.

What are your initial impressions? How does it load? Am I missing anything important?

20-Year-Old FoxPro Marketing Video

I love this stuff. FoxPro 2.0 was revolutionary and ground-breaking in many ways: the focus on both Mac and PC, the graphical interface, the data access speed, memo fields, form view, debugging. I can really see why FoxPro developers simply are not willing to let go (myself included).

It really is a shame that Microsoft didn’t keep this up. For those looking for some nostalgia, take a look (or a second look) at this marketing video from around 1990 or 1991:

FoxPro in the News

Here’s a nice article in Computerworld, uncharacteristically highlighting the tool of choice: Visual FoxPro. It’s titled “A greener environment through better data management. Data supporting farmers to improve productivity”:

The database software, which is written in Microsoft’s Visual FoxPro, gathers data from interactive PDFs and Web site entries. According to database developer Anne Allen, once the data is received from the farmers, the Visual FoxPro software delivers a report which in turn is sent back to the farmer in an email.

I’d like to see more good press on the VFP front!
