Conforming data and building master data are two very important aspects of data warehousing, migrating legacy systems, and federating reports. A common task is to compare a string to a known format and return the matching value. For example, you may need to take parcel identifier information from multiple towns and do some smart parsing to obtain atomic map, blocks, and lots. Town A might store their Parcel ID in the format MMM-BBB/LLL, while another in the format MM/LL.BBB. You could store all known formats in a table, do some magic to convert all parts to a format code, and finally find the match in the formats table.

In VFP, there are many ways to compare the contents of a string to a particular format. TRANSFORM() and LIKE() come to mind. Using TRANSFORM, you can do something like the following:

lcTestVal = "SOMETEST_VAL"
? TRANSFORM(lcTestVal,"XXXXXXXX_XXX") == lcTestVal     && returns .T.
? TRANSFORM(lcTestVal,"XXXXXXX_XX") == lcTestVal    && returns .F.

LIKEwise, you can do some comparisons with LIKE:

lcTestVal = "SOMETEST_VAL"
? LIKE("????????_???",lcTestVal)    && returns .T.
? LIKE("???????_??",lcTestVal)    && returns .F.

But in some cases, it might be necessary to compare a single string against multiple formats, returning the format that the string matches (a common task when conforming data for data warehousing). There are a lot of ways to handle this, but simply using TRANSFORM or LIKE won’t do it alone.

Take the following example, where we have three different account number formats that come from three disparate source systems. You could manage this easily in FoxPro:

lcFormat1 = "NCCCCCN"
lcFormat2 = "CCCNNNNNN"
lcFormat3 = "NNNNNNNCCC"
DIMENSION aTestValues [10]
aTestValues[1] = "3UUFGP7"
aTestValues[2] = "MCK000989"
aTestValues[3] = "0090892LLG"
aTestValues[4] = "9ABCD2"
aTestValues[5] = "JKL230945"
aTestValues[6] = "JKLM00989"
aTestValues[7] = "JJ2000989"
aTestValues[8] = "3U99G7"
aTestValues[9] = "3UUFGP"
aTestValues[10] = "7XYZA1"
FOR x = 1 TO 10
    ? ICASE( lcTestValMasked == lcFormat1 , "String " + aTestValues[x] + " matches format 1",;
            lcTestValMasked == lcFormat2 , "String " + aTestValues[x] + " matches format 2",;
            lcTestValMasked == lcFormat3 , "String " + aTestValues[x] + " matches format 3",;
            "String " + aTestValues[x] + " does not match any format")

In the above example, there are three acceptable format codes that we’re using to compare against 10 values. Using ICASE, we can easily determine which codes match the format string. The inner CHRTRAN converts all character values to the letter “C”, while the outer CHRTRAN converts numbers to “N”. If we did the numbers first, our “N”s would become “C”s! Of course, you can use “X” and 9, “A” and “#”, or whatever letters you want to use as your mask.

Ideally, the format codes would belong to a table which would contain additional information about the code, such as where it came from and its status (active or inactive). Then, a loop through the table, or a seek on the format column would point you to the corresponding record.

I would love to hear some additional solutions on this. If you have any, please comment!