Posts Tagged Cleanse

Sys(2017) and Comparing Data

I’ve been showing how I use intersect and the symmetric difference to compare data in two datasets. What’s nice about these approaches is that they are extremly flexible. You can, for example, pick just a few columns to compare from each table. So if you had two similar tables, you could easily detect which primary key + select attributes are either in both sets or not in both sets, returning the result into a third table or cursor. I’ve used these techniques in the following scenarios:

  1. to build master lists from client tables
  2. test code changes (run a procedure once, save the results A, make a code change, save the results B, compare differences)
  3. for data integration projects, especially for Changed Data Capture (CDC or incremental update) operations
  4. to check the status of metadata
  5. to do some down-and-dirty comparisons between two or more tables

Sys(2017)

One limitation of the SQL approach that I’ve been demonstrating is that you need to use a GROUP BY clause, which insists that you identify each attribute used in the select list (true for VFP, T-SQL, and other ANSI compliant SQL-enabled languages). This can be a lot of work, especially if you have a handful of tables with dozens of attributes.

The solution? Use a CRC (Cyclical Redundancy Check):

SELECT p_key, crc ;
  FROM (;
    SELECT p_key, PADR(Sys(2017, '', 1, 1),10) as crc FROM CursorA;
    UNION ALL ;
    SELECT p_key, PADR(Sys(2017, '', 1, 1),10) as crc  FROM CursorB;
  ) AS tmp1 ;
  GROUP BY p_key, crc  ;
  HAVING COUNT(*) = 1

SYS(2017) will Return a checksum value based on each row in the table. The second parameter is an exclusion list of attributes that you do not want included in the algorithm. The third parameter is the seed value, ignored when using the CRC32 version of function. The last parameter allows you to set flags to include memos or to use the CRC32 algorithm. In my example above, I am using CRC32 without memo support.

The resulting table shows me the primary keys that intersect (when HAVING COUNT(*)=2) or do not intersect (when HAVING COUNT(*)=1).

I hope you find this useful!

PS: It’s worth noting that you should avoid SYS(2017) in FoxPro versions 8 and below. There is a memory leak (fixed in VFP9) that could crash your app!

Tags: , , ,

3 Comments

Web Analysis with VFP: Download, Parse, and Stage

My Web Analysis application is beginning to take shape. At the moment, I have created a little road map document (which I’ll share shortly), defined all my base classes (the entire application will be made available in the coming months), and proved the general concept (see below).

To recap, this project is about building a tool to help me make better business decisions on and using the Internet. While there are a ton of canned products on the market, I feel that I can get what I need with VFP (and get it better). In general, I want to be able to download a variety of data from my server, analyze it, and use that data to make decisions (everything from what IPs to ban to how I might redesign a certain page). See my post “Business Intelligence Through Web Analysis” for more details.

The following zip file contains all the code necessary to prove the concept. It (a) establishes a database, (b) downloads the log files from the FTP server, and (c) parses the file, storing it in a stage directory, which I can then use to create a Dimensional Model for analysis purposes. Here’s the programs:

web_analytics.zip

In the zip, you will find:

  • wa_main.prg (the setup program, check the path and run this once)
  • functions.prg (a place to store my functions, this will become a class soon)
  • download.prg (set your server information up and run, it calls load_logfile(), which lives in functions.prg)
  • ftp/ftp.prg (FTP services class written by Robert Abram many moons ago — still works great, requires wininet.dll)

Now that the concept has been proved, and I can visually analyze my log files, I will start to build an application in VFP9 around it. Stay tuned!

Tags: , , , , , ,

No Comments