Tod means Fox

Business Intelligence, Data Warehousing, SQL, Visual FoxPro.

Archive for ‘Programming’


Published June 6th, 2008

Fun with Serial Dates

I ran into a bit of an oddity today while working with serial dates in SQL Server 2005 and MS Excel 2003. I’ve dealt with this before with Excel using FoxPro, but don’t recall the specifics. Anyway…

A serial date is a sequential number, starting on January 1st, 1900 (or if you want compatibility with a Mac serial date, 1904) that represents the number of days since any date to that point in time (you can also represent time by using a decimal, but I won’t get into that here). Excel has a bug (or a “Lotus 123 compatibility feature” as some like to call it) in it that counts Feb 29, 1900 as an actual date, when in fact 1900 was not a leap year.

Hence the first bit of fun. Run this TSQL code:

SELECT DATEDIFF(dd,'19000101','20080606')

The above query returns ‘39603′. So far so good. In SQL Server, you can now derive the date from the serial value by using convert:

SELECT CONVERT(DATETIME,DATEDIFF(dd,'19000101','20080606'))

…which gives me ‘2008-06-06 00:00:00.000′. So far so good.

The reason we’re using serial dates in the first place is to accommodate some models developed in Excel. The data is fed in directly from the database and calculations are performed.

When I take value 39603, put it into an Excel cell and re-format the column from General to Date, I get ‘June 4, 2008′! Two days off from what I get from SQL Server.

I already mentioned that Excel counts Feb 29, 1900. That’s one day. So why am I off by two? Here’s some more fun. In SQL Server, try the following:

SELECT CONVERT(DATETIME,0)

You get ‘1900-01-01 00:00:00.000′. But January 1st is supposed to be day one. Not day zero (must be a zero-based array sort of thing). When you try the same in Excel, you get a more appropriate (but equally odd) value of ‘January 0, 1900′. You’d think I’m dealing with scientific ephemeris or something.

SELECT CONVERT(DATETIME,1)

So, in SQL Server, we’re off by a day. The first day shouldn’t result in the second day (’1900-01-02 00:00:00.000′) of the year as the above TSQL shows.

This helps to explain the 2 day offset between Excel and SQL Server.

Come on Redmond! Work with me here!

To solve (read: work around) the problem, we’ve decided to stick with keeping Excel happy (against my recommendation, but at the end of the day, this is a business decision). Our data integration packages add two days to the serial date we calculate from a YYYYMMDD value. When data is moved to Excel from the data warehouse, no transformation is done on the serial value and Excel calculates as expected. When we use the serial date in a SQL Server context, we need to subtract 2 before the convert.

This is not a very good design and will undoubtedly cause someone problems some day.

I recommend storing the ISO YYYYMMDD format instead. But Excel won’t format a value such as 20080606 to a date. What a shame! If anyone knows of a way, or can enlighten me more about this subject — do not hesitate!

Published April 21st, 2008

Book Review: SOA Approach to Integration

I posted a review of the book “SOA Approach to Integration” by Matjaz, B. Juric, Ramesh Loganathan, Dr. P., and G Sarang (published by Packt Publishing) over at Amazon this past weekend. Please check it out if you get the chance. Unlike my last review, this one is more favorable!

I wanted to read more about SOA for two reasons: curiosity and to round-out my knowledge of various integration strategies. Those who know me, know me as a “data guy”. I like to design data models, create databases, normalize things, and sketch integration strategies in UML. Boring. I know.

I suppose this comes directly from my background as a VFP application developer. In the nineties, I developed a dozen or so customized, vertical applications that existed for the most part in departmental islands. Their purpose was to solve business problems, usually at the process level. I soon began writing code to integrate these applications, the fancy term is “Enterprise Application Integration (EAI)”, but I never really called it that. Using Remote Procedure Calls (RPC) and shared objects, I was able to build point-to-point bridges allowing these islands to communicate with one another.

When I had the chance to start developing data warehouses, I jumped. I no longer write applications, instead, I do a lot of data modeling and I write code and design workflows to integrate data from any number of disparate applications spread out across an enterprise. I find this work more than just “satisfying”.

SOA is a different approach to integrating an enterprise. It is like EAI in some ways, but overall, the SOA approach is more advanced and scalable. Up until I read this book, I could not easily draw the line between exposing a few functions in a peer-to-peer api/RPC scenario, to this “Enterprise Service Bus” that coordinates and orchestrates entire business processes off in some far off place using XML and web services.

As you know from my postings and articles, I talk a lot about “Business Processes” in regards to dimensional modeling. This book brought me greater insight into what a “process” is and what it could be. In Dimensional Modeling, we take a bottom-up approach to building an enterprise database. Using conformed dimensions, we start process-by-process to construct a complete data warehouse. Unlike what some detractors and skeptics conclude (are there really any of those still?), we’re not creating new silos or islands, but rather an integrated, highly valuable data warehouse organized by business process, facilitated by the use of conformed dimensions. SOA looks at the business process in much the same way, but while the data warehouse typically gets a hold of a transaction after it occurs, SOA is part of the transaction. They’re two pees in the same pod.

While I agree that SOA is necessary for real-time transactional and document-related (”doc-literal”) integration, I don’t feel that data warehouses are threatened by the emergence of this “technology”. SOA solves a “business logic” problem, where business logic is spread out across an organization. Data warehousing solves reporting, analytical, and data exploration problems. A fully integrated organization will rely on SOA and data warehousing.

To buy this book, click here.

Check out these other reviews as well:

Tech Initiatives
Ken Guest’s online diary
Enterprise Architecture SOA and More

Published March 3rd, 2008

SSIS Debugging: Find in Files, Naming Conventions, and a Problem Solved!

I recently inherited several complex SSIS packages that are now being promoted from the development environment (ALPHA) to testing (BETA). There was one lingering issue though that took me a few hours to figure out.

We’ve implemented custom logging and auditing in the packages. In the Alpha environment, we set up Agent jobs to run groups of related tasks. For some odd reason, a log entry was being inserted into our audit table that was from a previous job step and not from the one we expected.

To give a simplified example: There are 3 SSIS packages being called by 3 job steps. Each loads a CSV file into a staging area for dimension processing. Step 1 reports that file A was processed. Step 2 reports on file B, and step 3 on file C. The step 3 job was reporting that file A was processed, but that it was the step 3 job that did it.

This wasn’t noticed until recently for several reasons, but primarily because we generally don’t rely on this particular log entry to determine if our files have been staged. So no one looks at it. In fact, this log entry merely serves as a bracket in the log table for other log entries posted in the same job. Nevertheless, I wanted to fix it.

So, where was the problem? All the logging is done using an Execute SQL Task. Every task checked out OK. The correct variables were being used in all cases. I was about ready to blame SSIS or corruption in the package when I stumbled upon a valuable clue: The name of the exact SQL Task Causing the problem!

I found the pesky task by doing a “Find in Files” search (BIDS -> Edit -> Find and Replace -> Find in Files) on the exact message being inserted into my audit table. You can use regular expressions and wildcards in this dialog so I was sure to find at least a clue. It turned out though that my exact search found the phrase; it was in a task named “Execute SQL Task”.

Red Flag!

I meticulously went through each package last month and renamed each and every task using the naming conventions posted on Jamie Thomson’s blog. This is a great best-practice and I certainly advise all SSIS developers to use this (or similar) convention. Anyway, the developers before me did a fairly good job of renaming tasks so they had meaningful names, but it was nearly impossible to look into sysdtslog90 or our custom audit tables and identify what kind of task the record was referring to. So I felt that while these packages were still in development, it was worth the few days effort to make these changes. All I did was add the suggested prefix before each task and renamed a few that I thought needed better descriptive names (sometimes shortening them in the process).

I know for sure that I didn’t miss any! Right?

Well, yes. I did. And so did the developers before me. When the package was being developed, the developers used cut & paste to move similar tasks from an existing package into this one. Then, a series of Sequence containers and Loop containers were added and tasks were moved around. A Sequence container was sized over the troublesome “Execute SQL Task” – to be forever (almost) lost in the Control Flow maze. I first saw it in the Package Explorer view and then went back to the Control Flow to see where it had gone. The task ran an SQL statement that updated the audit table.

Advice

Take care when using cut & paste to move items from one package to another. Not only could you run into some simple problems like the one detailed here, you could also forget to update a variable or an expression – producing odd or destructive results on your database. For us, this was just a simple audit log entry with zero business impact. Are there other cases buried in the packages? Need to check right now…

Also, use the naming convention best practice as discussed by Jamie. There are a lot of great reasons to do so (ease in development, debugging, readability).

I have never been in the habit of using the “Find in Files” function. But it sure saved me here. If you haven’t been using it in your packages, then give it a try!

Published November 5th, 2007

VFPBuild to build Visual FoxPro Applications Using MSBuild

As I mentioned last week, I have been using MSBuild for some time now to build and deploy complex data warehousing applications. Many parts of this project involve Visual FoxPro. I wrote that I had not yet experimented doing builds with VFP using MSBuild. Alan Stevens quickly pointed me towards his CodePlex project “VfpMsBuildTarget” (aka:VFPBuild).

The goals of the project are simple:

  1. Provide a GUI for including projects/files in a scheduled build, and set their appropriate properties
  2. Create a VFP build target to be used by the MSBuild engine, the build system found in Visual Studio.

While the first goal might be a ways off, the second is close to being complete.

I had a chance to look at this project, test it, debug it a little, and compile a short list of improvements I would like to see (or add myself through CodePlex). The bugs were minor and easy to fix, and my list of improvements is pretty short. It should be noted that the body of work gone into this Task already is excellent. With little effort, I was able to get it going. And with a little more effort, it can be used in production. Note: Before you can have a go at it, you will need to compile vfpprojectbuilder.dll in VFP and VfpBuild.dll in C#.

Documentation

We’ll need some documentation. Alan knows this (and has promised some soon). Even without though, I made it through with little problem. I think I am at an advantage because I (a) know VFP well, (b) know C# well, and (c) know MSBuild well. We would need instructions on how to compile the dlls as well as how to construct the MSBuild XML. If you’re new to MSBuild, you might struggle with setting up the xml. For a simple test, here is a sample screenshot. I would have provided the code but the XML would have been eaten alive by the browser:

Visual FoxPro MSBuild VFPBuild sample


To execute the above, your cmd would look similar to the following:

C:>msbuild projectx.msbuild

Functionality

It doesn’t seem to be too much effort to add support for lRebuildAll, lShowErrors, and lBuildNewGUIDs of the project.build() method. Other than that, it gives you the ability to set build options and to specify a project to load.

From Here

I would like to see goal #2 of this project buttoned up soon. I have a production environment (actually, a few) that could really utilize a stable assembly like this. The build I downloaded was a bit buggy (there were references to class properties that did not exist, for example this.nLevel and this.nBuildAction). Fixing these, plus some enhanced bullet-proofing (defensive coding) and documentation would get us there I believe.

I’ll post more as I go. For now, you can get the active VFPMSBuildTarget project files here.

Published October 30th, 2007

Errors Eating My TEXTMERGE

A few days ago I decided to refactor a 6-year old data integration program I wrote in VFP7 (or was it VFP6?) that moves data from a legacy tax system to an operational data warehouse for reporting (SQL Server). The program was self-contained (could be run as a compiled exe from the command line or through the IDE). To make a long story short, I decided to up the code to VFP9 and add some TRY/CATCH statements and additional parameter checking.

The code in this program makes heavy use of TEXTMERGE, utilizing the “\” and “\\” commands.

I wrote a few test cases to exercise my new error handling. What I discovered was that some of my logging stopped working after I added a few TRY…CATCH blocks. Hugh? Here’s a sample:

SET TEXTMERGE ON
SET TEXTMERGE TO test ADDITIVE NOSHOW 
\hello
TRY
    USE xxx
CATCH TO oErr
    \Error: <<oErr.Message>>
ENDTRY
\world
SET TEXTMERGE TO
MODIFY FILE test.txt

If you run the above, you only see “hello” printed in the file test.txt:

hello

I went back to my original code without any TRY/CATCH blocks and to my surprise, got the same result. Only ‘hello’ printed. But WAIT… I’m running in VFP9 SP1.

When I run the original code in VFP7:

ON ERROR \Error: <<MESSAGE()>>
SET TEXTMERGE ON
SET TEXTMERGE TO test ADDITIVE NOSHOW 
\hello
USE xxx
\world
SET TEXTMERGE TO
ON ERROR
MODIFY FILE test.txt

I get the expected output:

hello
Error: 'xxx.dbf' does not exist.
world

It appears as if program errors are killing my TEXTMERGE in VFP9 SP1. I’ll have to try this out on my laptop with VFP9 SP2 to see if it magically got fixed.

To get around the problem in VFP9, you can put the line “SET TEXTMERGE TO test ADDITIVE NOSHOW” in the CATCH block above just before the “\” command. If an error occurs, you remind TEXTMERGE that it needs to keep working. Perhaps it figures it needs some time off.

Recommendations

Unless someone has some insight or additional thoughts, I would suggest searching through your source code for cases where you are using TEXTMERGE with TRY/CATCH/FINALLY blocks. If so, insert the SET TEXTMERGE TO … line in your CATCH block. Alternately, you could use STRTOFILE() to append logging information to a file. This is the final approach I took in this program.

STRTOFILE("hello"+CHR(13),"test.txt",1)
TRY
    USE xxx.dbf
CATCH TO oErr
    STRTOFILE("Error: " + oErr.Message+CHR(13),"test.txt",1)
ENDTRY
STRTOFILE("world"+CHR(13),"test.txt",1)
MODIFY FILE test.txt

I hope this helps someone, or maybe someone who knows more about this particular issue can help me!

Published October 25th, 2007

Getting Busy with MSBuild :: Deploying Heavy Projects

For some time now I have been using MSBuild exclusively for deploying data warehouse applications. Because the deployment process is often time-consuming and can touch many components and elements of a data warehouse, using MSBuild is a smart choice.

Consider an integration project that contains a dozen SSIS (dts) packages, some reports (rdl files), a few C# assemblies, an ASP.NET interface, schema changes, new database functions, new user roles, expanded constraints, and updated stored procedures. Now expand this deployment across development, quality, and production environments. Introduce deltas (incremental updates of varying components) and you can begin to understand why some companies pay two or more people just to deploy applications.

But not here. MSBuild (and other similar build/deploy projects like Apache ANT or GNU MAKE) is an excellent solution to the problem. With some planning up front, and a little… ehem… trial and error, you can simplify the entire deployment process so much it will make your head spin.

MSBuild not only builds .NET projects (like C#.NET and VB.NET), but it can also run any number of additional “tasks”, such as:

  1. Create directories and log files
  2. Do a VSS Get by Label
  3. Compile a FoxPro project (see Rick Borup’s article in Advisor: “Automate Microsoft Visual FoxPro“)
  4. Compare directory contents
  5. Upload SSIS packages
  6. Run test scripts
  7. Execute stored procedures
  8. Etc…

I think you get the idea. There is not a single task MSBuild cannot wrap (at least nothing that I’ve come across). And with its ability to handle dependencies, talk with source control, and run homegrown assemblies, the limit is only limited to your needs and imagination.

I am starting to sound like a commercial, I know.

But the point is simple: The deployment of a large or ‘heavy’ project needs special attention, planning, and resource allocation. Using MSBuild (or ANT, or MAKE, etc.) can alleviate the burden on developers and DBAs and provide consistent and predictable build and deployment scenarios.

I will share some additional insights and lessons learned on using MSBuild over the next few weeks, especially details on how to deploy multi-pronged projects for different environments. I will also try to share some tips on how to handle delta deployment scenarios. In the meantime, if you are new to MSBuild, take a look at the MSBuild Team Blog and the MSBuild Wiki.

As a side note, I have never used MSBuild for deploying VFP applications in the past. This is all about to change…