Tod means Fox

Business Intelligence, Data Warehousing, SQL, Visual FoxPro.

Archive for ‘Programming’


Published October 25th, 2007

Getting Busy with MSBuild :: Deploying Heavy Projects

For some time now I have been using MSBuild exclusively for deploying data warehouse applications. Because the deployment process is often time-consuming and can touch many components and elements of a data warehouse, using MSBuild is a smart choice.

Consider an integration project that contains a dozen SSIS (dts) packages, some reports (rdl files), a few C# assemblies, an ASP.NET interface, schema changes, new database functions, new user roles, expanded constraints, and updated stored procedures. Now expand this deployment across development, quality, and production environments. Introduce deltas (incremental updates of varying components) and you can begin to understand why some companies pay two or more people just to deploy applications.

But not here. MSBuild (and other similar build/deploy projects like Apache ANT or GNU MAKE) is an excellent solution to the problem. With some planning up front, and a little… ehem… trial and error, you can simplify the entire deployment process so much it will make your head spin.

MSBuild not only builds .NET projects (like C#.NET and VB.NET), but it can also run any number of additional “tasks”, such as:

  1. Create directories and log files
  2. Do a VSS Get by Label
  3. Compile a FoxPro project (see Rick Borup’s article in Advisor: “Automate Microsoft Visual FoxPro“)
  4. Compare directory contents
  5. Upload SSIS packages
  6. Run test scripts
  7. Execute stored procedures
  8. Etc…

I think you get the idea. There is not a single task MSBuild cannot wrap (at least nothing that I’ve come across). And with its ability to handle dependencies, talk with source control, and run homegrown assemblies, the limit is only limited to your needs and imagination.

I am starting to sound like a commercial, I know.

But the point is simple: The deployment of a large or ‘heavy’ project needs special attention, planning, and resource allocation. Using MSBuild (or ANT, or MAKE, etc.) can alleviate the burden on developers and DBAs and provide consistent and predictable build and deployment scenarios.

I will share some additional insights and lessons learned on using MSBuild over the next few weeks, especially details on how to deploy multi-pronged projects for different environments. I will also try to share some tips on how to handle delta deployment scenarios. In the meantime, if you are new to MSBuild, take a look at the MSBuild Team Blog and the MSBuild Wiki.

As a side note, I have never used MSBuild for deploying VFP applications in the past. This is all about to change…

Published October 19th, 2007

Use .sql Files with SQLEXEC for Better Script Management

The SQLEXEC command in FoxPro allows you to sends SQL statements to a database where they can be executed, returning one or more result sets. You might make a call like this in your application:

SQLEXEC(lnConnHand,'SELECT * FROM Customer')

Of course, using SQLEXEC in this way isn’t a great architectural choice, although serviceable. It is difficult to manage code in this way and difficult to support varying back end databases. I would never code customer applications in this way. Instead, I would load an external SQL file and execute that instead.

SQLEXEC(lnConnHand,FILETOSTR("c:\sqlserver2005\script1.sql"))

This approach not only decouples the proprietary SQL code from my application, but it also allows me to test and review the scripts in the native application (and likely get syntax coloring and intellisense). For example, script1.sql might contain TSQL code that I can review (and test) in SSMS. Even better, if I have a lone client running SQL Server 2000, I can use a different script if a new command (like INTERSECT, for example) is not available. This approach will prove even more valuable as some clients upgrade to SQL Server 2008. I will be able to take advantage of some of the new, faster features of SQL Server 2008 while still supporting older versions.

Using Metadata

To take this a step further, I have built metadata to manage these scripts. Consider an application that supports 4 different back-end databases: Visual FoxPro, SQL Server 2000, SQL Server 2005, and MySQL. Create the following table and store all SQL for each back-end database in the appropriate memo field:

CREATE TABLE sql_meta FREE (;
  query_id character(15),;
  sql_vfp m,;
  sql_ss2005 m,;
  sql_ss2000 m,;
  sql_mysql m)

Include this table in your project. In your beforebuild event in your project hook, import or update the sql scripts from your development folders into this table when you build your application.

Now, when you want to execute a script, your code would look a little like this:

*-- create the query/sql management object and execute the code
oMQ = CREATEOBJECT( "manage_query" ) 
SQLEXEC(lnConnHand, oMQ.get_sql('SQL_ss2005','CUSTOMER'))
 
*-- ok, so not very bullet-proof but you get the idea:
DEFINE CLASS manage_query AS Custom
 
  PROCEDURE get_sql
    LPARAMETERS tcBackEnd, tcQueryId	
    SELECT sql_meta 
    LOCATE FOR query_id = tcQueryId 
    RETURN sql_meta.&tcBackEnd
  ENDPROC 
 
ENDDEFINE

I’m a big fan of writing readable code. In fact, it is a priority in my development tasks. I’m also a big fan of making testing easier. This approach allows me to view and test each SQL script outside of my business and data access objects. It also makes for more readable and manageable code.

I’ve left out some examples on how you could use parameters or SQLPREPARE using this approach. I’ll be sure to make this a topic in a future entry.

Published October 5th, 2007

Sys(2017) and Comparing Data

I’ve been showing how I use intersect and the symmetric difference to compare data in two datasets. What’s nice about these approaches is that they are extremly flexible. You can, for example, pick just a few columns to compare from each table. So if you had two similar tables, you could easily detect which primary key + select attributes are either in both sets or not in both sets, returning the result into a third table or cursor. I’ve used these techniques in the following scenarios:

  1. to build master lists from client tables
  2. test code changes (run a procedure once, save the results A, make a code change, save the results B, compare differences)
  3. for data integration projects, especially for Changed Data Capture (CDC or incremental update) operations
  4. to check the status of metadata
  5. to do some down-and-dirty comparisons between two or more tables

Sys(2017)

One limitation of the SQL approach that I’ve been demonstrating is that you need to use a GROUP BY clause, which insists that you identify each attribute used in the select list (true for VFP, T-SQL, and other ANSI compliant SQL-enabled languages). This can be a lot of work, especially if you have a handful of tables with dozens of attributes.

The solution? Use a CRC (Cyclical Redundancy Check):

SELECT p_key, crc ;
  FROM (;
    SELECT p_key, PADR(Sys(2017, '', 1, 1),10) as crc FROM CursorA;
    UNION ALL ;
    SELECT p_key, PADR(Sys(2017, '', 1, 1),10) as crc  FROM CursorB;
  ) AS tmp1 ;
  GROUP BY p_key, crc  ;
  HAVING COUNT(*) = 1

SYS(2017) will Return a checksum value based on each row in the table. The second parameter is an exclusion list of attributes that you do not want included in the algorithm. The third parameter is the seed value, ignored when using the CRC32 version of function. The last parameter allows you to set flags to include memos or to use the CRC32 algorithm. In my example above, I am using CRC32 without memo support.

The resulting table shows me the primary keys that intersect (when HAVING COUNT(*)=2) or do not intersect (when HAVING COUNT(*)=1).

I hope you find this useful!

PS: It’s worth noting that you should avoid SYS(2017) in FoxPro versions 8 and below. There is a memory leak (fixed in VFP9) that could crash your app!

Published September 28th, 2007

Log4Fox Revival: Instrumenting VFP Applications

Lisa Slater Nicholls has re-published her 2006 FoxTalk article “Log4Fox: A Logging API for Visual FoxPro” over at Spacefold. At FoxForward, some of us were introduced to the concept of instrumenting applications, and Dave Bernard gave us some great ideas to get started. Log4Fox — an article I missed in January 2006 — is a VFP instrumenting approach:

Although Visual FoxPro’s SET COVERAGE, SET DEBUGOUT, and SET EVENTTRACKING features have been steadily enhanced in each version, VFP doesn’t offer native support for fine-tuning the levels and types of trace data as completely as many other environments do. No problem; VFP’s OOP and output-handling resources make it easy to provide class libraries to handle all the required tasks.

If you’re interested in instrumentation (I certainly am), then please take a moment to read (or re-read) this article. Lisa does a marvelous job explaining how the class works and why you might want it.

Published September 24th, 2007

Delete Duplicates from a Table

The need to delete duplicate records from a table rises up all the time, whether you are importing data from a third party, integrating systems, or cleaning dirty data. Duplicate data is a reoccurring quality issue.

A duplicate records does not necessarily mean an exact copy of another record (as if you pressed CTRL+Y with SET CARRY ON in VFP). It could mean that all of the important identifying attributes are duplicated but other data, such as a surrogate key or a last updated timestamp are not. So here is a technique that I’ve used to delete a row if it is in fact a duplicate based on attributes in the table:

DELETE FROM [tablename] ;
  WHERE [primary key] NOT IN ;
    (SELECT MAX([primary key]) FROM [tablename] ;
      GROUP BY [attribute1], [attribute2], [attribute3])

To apply this to a Customer entity, for example, you can try the following (the first half of this code is setup, scroll down to see the delete):

*-- example table
CREATE TABLE Customer FREE   (;
    cust_key integer         ,;  && a surrogate key
    cust_id character(3)     ,;  && natural key
    cust_fname varchar(20)   ,;  && first name
    cust_lname varchar(20)   ,;  && last name
    cust_phone character(14) ,;  && phone number
    cust_lupd datetime       ,;  && last updated
    cust_user character(15)   )   && user last updated
 
*-- need data 
INSERT INTO Customer VALUES (1,"AM1","Andrew","MacNeill","555-5551",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (2,"AK1","Andy","Kramek","555-5552",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (3,"CC1","Cesar","Chalom","555-5553",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (4,"CB1","Craig","Baily","555-5554",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (5,"DC1","Dave","Crozier","555-5555",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (6,"DS1","David","Stevenson","555-5556",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (7,"GF1","Garrett","Fitzgerald","555-5557",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (8,"KR1","Kevin","Ragsdale","555-5558",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (9,"KR1","Kevin","Ragsdale","555-5558",DATETIME(),"ZorroMeansFox")
INSERT INTO Customer VALUES (10,"RS1","Rick","Schummer","555-5559",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (11,"RS2","Rick","Strahl","555-5510",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (12,"RS2","Rick","Strahl","555-5510",DATETIME(),"ZorroMeansFox")
INSERT INTO Customer VALUES (13,"RS2","Rick","Strahl","555-5510",DATETIME(),"FanakMeansFox")
INSERT INTO Customer VALUES (14,"SB1","Stephen","Bodnar","555-5511",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (15,"SD1","Stuart","Dunkeld","555-5512",DATETIME(),"TodMeansFox")
INSERT INTO Customer VALUES (16,"TR1","Ted","Roche","555-5513",DATETIME(),"TodMeansFox")
 
*-- delete duplicates (we don't care about the surrogate 
*-- key or the audit information).
DELETE FROM Customer ;
  WHERE cust_key NOT IN ;
    (SELECT MAX(cust_key) FROM Customer ;
      GROUP BY cust_id, cust_fname, cust_lname, cust_phone)
 
*-- results
SET DELETED OFF
BROWSE NORMAL

This works by selecting the most recent issuance of the primary key (using MAX) and discarding the others using the subquery. If you run the subquery on its own, you can see how SELECT MAX plus the GROUP BY clause gives you the most recent version of the row that is distinct from the others (’Rick’ and ‘Kevin’ only show once) by leaving out fields that don’t identify the row (like auditing information or perhaps de-identified attributes). Next, we delete everything that is not in this list.

This works in all ANSI SQL environments were subqueries are supported in the DELETE statement.

Published September 10th, 2007

Live from FoxForward: The Last Day

Yesterday was the last day of the conference, and I’m sorry it’s ended. I learned a lot and met a lot of great Fox developers (who are people too, by the way). As the sessions were winding down, many of us hung out outside the conference rooms chatting about all things Fox (and a little about cars). I spoke for a second time today (materials here), and was able to catch a few other sessions. Here are some of my notes:

GDIPlusX library

Bo spoke on the GDIPlusX Library. After a brief overview and some instructions on how to download and get the System object in memory, he took a moment to discuss how system.prg works. What a great piece of architecture. If you haven’t checked out the GDIPlusX Library, you should at least download it and review the system.prg file (and then all the others too). GDIPlusX looks a lot like the .NET implementation on purpose, and works the same too.

Of course, at the moment, my head is spinning on all the other implementations for this type of architecture (Math, IO, etc.).

Now here’s a great tip. Declared DLLs take precedent over function calls. Therefore, if you name your own function the same name as a declared library, your code won’t run — the library’s code will. Bo showed us how he and his team developed a technique to take advantage of this behavior. When you need a dll function, call a VFP function with the same name that DECLAREs the DLL and then calls it (passing in all relevant parameters). Now the DLL function is in memory, and next time you need it, it will use it instead of the function. This means that you only need to declare your DLLs when needed. It also means that CLEAR DLLS ALL commands won’t kill your app if you need a DLL after that call (by the way, this is something I lost sleep over a while ago when some 3rd party function issued this command). Additionally, you now only DECLARE the DLLs when needed. I think Bo mentioned that this technique was first used by Rick Strahl, but on a smaller scale.

Bo also talked about how the GDIPlusX help is coming along. It looks like we’ll have more help for this library than we’ll ever need. Bo and his team are doing an amazing job with it.

A great resource for GDIPlusX, other than the official CodePlex/VFPX site, is Cesar Chalom’s Blog. Cesar has a lot of examples and is very active on the project.

How to OGLE Your Application

Alen Stevens taught us about OGLE: “Optimized, Generic, Layer, Extensions”. The goals of OGLE apps is to be future proof, data source agnostic, location independent, and .NET accessible. OGLE relies on physical n-Tier development techniques, where XML is passed among layers. Using XML in this way helps applications to be future proof. After the session, I talked with Alan about the architecture and he reiterated the importance of using XML to pass data between layers. We’re also reminded that VFP is a state-of-the-art XML parser — faster and more reliable than other available technologies. Therefore, using XML to pass data back and forth is a natural fit for OGLE.

The presentation was a perfect compliment to the whitepaper Alan created. The slides helped shape out the logic in the paper. Now I’ve got my head around it. He walked through lots of code examples, demonstrating the basics of the architecture. I haven’t used com as a middle tier before, so I was glued to this session. I felt that Alan gave a great overview of the process and now I feel like I’m up to speed. He showed us how to set up COMATTRIB attribues, which specifies type library attributes and values for PEMs. This applies to OLEPUBLIC classes only. You can do things like insert descriptions and read-only attributes. These are exposed and can be read by other technologies.

OGLEing apps seem like a smart design move. The separation and isolation of layers in each tier are more dramatic and defined. Although I missed Alan’s test-driven-development (TDD) session, I got the feeling that OGLE is also quite friendly for TDD as well. Whether using FoxUnit (Alan loves this thing) or if you’re just writing stubs to test apps, the architecture makes it simple.

I’ll be sure to recap the entire event after my wife and I get out of the Tennessee mountains on Tuesday. I’ll also be at AFUG Tuesday night.

More on FoxForward over at Kevin’s blog: http://cully.biz/. Alan said he might post too, so check out http://netcave.org/.