There are many ways to parse or tokenize data in VFP. I like using STREXTRACT and GETWORDNUM for example. But there are cases in which the actual parsing needs to be a bit smarter than what these common methods offer. Take a concatenated name field for example. Common to many legacy (and unfortunately, some modern) systems, is a ‘name’ or ‘address1′ field that stores a person or company’s full name. This method is not looked upon favorably by data integrators or people who like atomic, normalized data (like me!). These types of fields don’t work well with indexes either.

Of course, the design solution is to create 5 or more fields that represent a person’s name: prefix / first / middle / last / suffix. These atomic values can then be arranged as needed (”last, first middle”; “first last”, initials only, etc.) in reports and for display. But oftentimes, making this decision now is too late.

An alternative is to write a smart parser that can place each name element into its appropriate box. In the following code example, I provide a solution that can handle this tasks. My only assumption for this code is that the name is stored in prefix / first / middle / last / suffix order. By data profiling, you can determine how true and consistent this assumption is on your data. You may discover that 95% of the file is stored in lastname / firstname order instead. Whatever the case may be, you can use the method below as a starting point.

The following snippet gets us started, and in the end, loops through the names and prints the parsed results to screen:

#DEFINE PREFIX  1
#DEFINE FIRSTNAME  2
#DEFINE MIDDLENAME  3
#DEFINE SURNAME  4
#DEFINE SUFFIX  5
 
CLEAR
 
*-- some names to demonstrate
DIMENSION aMyNames[13]
aMyNames[1] = "Andrew MacNeill"
aMyNames[2] = "Andy Kramek"
aMyNames[3] = "Bernard Bout"
aMyNames[4] = "Calvin Hsia"
aMyNames[5] = "Cesar Chalom"
aMyNames[6] = "Craig Baily"
aMyNames[7] = "Craig Berntson"
aMyNames[8] = "Emerson Santon Reed"
aMyNames[9] = "Eric den Doop"
aMyNames[10] = "gonzomaximus"
aMyNames[11] = "Tod J. McKenna II"
aMyNames[12] = "Dr. Ralph Kimball"
aMyNames[13] = "Miss Lucy Jones"
 
*-- placeholder for tokenized name, passed by reference into split_name
DIMENSION aParts[1]
 
FOR x = 1 TO 13
  cLastFirst = split_name(aMyNames[x], @aParts)
  ? cLastFirst
  ? "firstname: " + aParts[FIRSTNAME] + CHR(9)
  ?? "lastname: " + aParts[SURNAME]
NEXT

Here’s the code in function “split_name”

FUNCTION split_name
LPARAMETERS tcName , taParts
 
*-- variables and arrays I will use
LOCAL cPart AS Character
LOCAL x , nTotParts AS Integer
LOCAL lIsSuffix , lIsPrefix AS Logical
LOCAL ARRAY aTemp[1]
EXTERNAL ARRAY taParts
 
*-- get the data from tcName into an array for easy looping
tcName = STRTRAN(UPPER(ALLT(tcName))," ",CHR(13))
DIMENSION taParts[5]
STORE "" TO taParts
DIMENSION aTemp[1]
FOR x = 1 TO MEMLINES(tcName)
  cPart = ALLTRIM(MLINE(tcName,x))
  IF !EMPTY(cPart)
    aTemp[ALEN(aTemp,1)] = cPart
    IF x < MEMLINES(tcName)
      DIMENSION aTemp[ALEN(aTemp,1)+1]
    ENDIF
  ENDIF
NEXT
 
*-- now loop through temp array and put each name segment into a smart little box
nTotParts = ALEN(aTemp,1)
FOR x = 1 TO nTotParts
  IF VARTYPE(aTemp[x])!="C"
    LOOP
  ENDIF
  *-- get rid of any periods used in the part, and convert to uppercase
  cPart = STRTRAN(ALLTRIM(UPPER(aTemp[x])),".","")
  IF nTotParts = 1
    *-- Madonna? gonzomaximus? Treat single names as a lastname
    taParts[SURNAME] = cPart
  ELSE
    *-- Note: A data profile on a name file I parsed a few years back revealed
    *-- the following common name prefixes. Beef this list up if you have others!
    lIsPrefix = INLIST(cPart,"MR","MRS","MS","MISS","DR","PROF","SIR",;
                  "MASTER","REV","REVERAND","FATHER","SISTER","BROTHER",;
                  "BR","SIS","ATTORNEY","COL","COLONOL","REP","PRES")
    IF lIsPrefix
      taParts[PREFIX] = cPart
      LOOP
    ENDIF
    *-- Note: The same profile mentioned above revealed the following suffixes
    lIsSuffix = INLIST(cPart,"PHD","II","III","IV","V","JR","SR","ESQ","CPA","CEA","DD","MD","JD")
    IF lIsSuffix
      taParts[SUFFIX] = cPart
      LOOP
    ENDIF
    IF x = nTotParts
      IF EMPTY(taParts[SURNAME])
        taParts[SURNAME] = cPart
      ELSE
        IF EMPTY(taParts[FIRSTNAME])
          taParts[FIRSTNAME] = cPart
        ELSE
          taParts[MIDDLENAME] = cPart
        ENDIF
      ENDIF
    ELSE
      IF EMPTY(taParts[FIRSTNAME])
        taParts[FIRSTNAME] = cPart
      ELSE
        IF EMPTY(aParts[MIDDLENAME])
          taParts[MIDDLENAME] = cPart
        ELSE
          taParts[SURNAME] = cPart
        ENDIF
      ENDIF
    ENDIF
  ENDIF
NEXT
 
RETURN ALLTRIM(ALLTRIM(taParts[SURNAME]) + ", " +;
         ALLTRIM(taParts[FIRSTNAME]) + " " + ;
         ALLTRIM(taParts[MIDDLENAME]) )
ENDFUNC

So there you have it. The code above looks at the tcName variable and attempts to dump its parts into the taParts array. Some defined constants exist to help ease the “where the heck am I in this array” confusions (a practice I use for stuff like this all the time). As an added bonus, the code returns the name in last / first middle order, which is usually a good format for display purposes.