Parsing firstname / middlename / lastname
There are many ways to parse or tokenize data in VFP. I like using STREXTRACT and GETWORDNUM for example. But there are cases in which the actual parsing needs to be a bit smarter than what these common methods offer. Take a concatenated name field for example. Common to many legacy (and unfortunately, some modern) systems, is a ‘name’ or ‘address1′ field that stores a person or company’s full name. This method is not looked upon favorably by data integrators or people who like atomic, normalized data (like me!). These types of fields don’t work well with indexes either.
Of course, the design solution is to create 5 or more fields that represent a person’s name: prefix / first / middle / last / suffix. These atomic values can then be arranged as needed (”last, first middle”; “first last”, initials only, etc.) in reports and for display. But oftentimes, making this decision now is too late.
An alternative is to write a smart parser that can place each name element into its appropriate box. In the following code example, I provide a solution that can handle this tasks. My only assumption for this code is that the name is stored in prefix / first / middle / last / suffix order. By data profiling, you can determine how true and consistent this assumption is on your data. You may discover that 95% of the file is stored in lastname / firstname order instead. Whatever the case may be, you can use the method below as a starting point.
The following snippet gets us started, and in the end, loops through the names and prints the parsed results to screen:
#DEFINE PREFIX 1 #DEFINE FIRSTNAME 2 #DEFINE MIDDLENAME 3 #DEFINE SURNAME 4 #DEFINE SUFFIX 5 CLEAR *-- some names to demonstrate DIMENSION aMyNames[13] aMyNames[1] = "Andrew MacNeill" aMyNames[2] = "Andy Kramek" aMyNames[3] = "Bernard Bout" aMyNames[4] = "Calvin Hsia" aMyNames[5] = "Cesar Chalom" aMyNames[6] = "Craig Baily" aMyNames[7] = "Craig Berntson" aMyNames[8] = "Emerson Santon Reed" aMyNames[9] = "Eric den Doop" aMyNames[10] = "gonzomaximus" aMyNames[11] = "Tod J. McKenna II" aMyNames[12] = "Dr. Ralph Kimball" aMyNames[13] = "Miss Lucy Jones" *-- placeholder for tokenized name, passed by reference into split_name DIMENSION aParts[1] FOR x = 1 TO 13 cLastFirst = split_name(aMyNames[x], @aParts) ? cLastFirst ? "firstname: " + aParts[FIRSTNAME] + CHR(9) ?? "lastname: " + aParts[SURNAME] NEXT
Here’s the code in function “split_name”
FUNCTION split_name LPARAMETERS tcName , taParts *-- variables and arrays I will use LOCAL cPart AS Character LOCAL x , nTotParts AS Integer LOCAL lIsSuffix , lIsPrefix AS Logical LOCAL ARRAY aTemp[1] EXTERNAL ARRAY taParts *-- get the data from tcName into an array for easy looping tcName = STRTRAN(UPPER(ALLT(tcName))," ",CHR(13)) DIMENSION taParts[5] STORE "" TO taParts DIMENSION aTemp[1] FOR x = 1 TO MEMLINES(tcName) cPart = ALLTRIM(MLINE(tcName,x)) IF !EMPTY(cPart) aTemp[ALEN(aTemp,1)] = cPart IF x < MEMLINES(tcName) DIMENSION aTemp[ALEN(aTemp,1)+1] ENDIF ENDIF NEXT *-- now loop through temp array and put each name segment into a smart little box nTotParts = ALEN(aTemp,1) FOR x = 1 TO nTotParts IF VARTYPE(aTemp[x])!="C" LOOP ENDIF *-- get rid of any periods used in the part, and convert to uppercase cPart = STRTRAN(ALLTRIM(UPPER(aTemp[x])),".","") IF nTotParts = 1 *-- Madonna? gonzomaximus? Treat single names as a lastname taParts[SURNAME] = cPart ELSE *-- Note: A data profile on a name file I parsed a few years back revealed *-- the following common name prefixes. Beef this list up if you have others! lIsPrefix = INLIST(cPart,"MR","MRS","MS","MISS","DR","PROF","SIR",; "MASTER","REV","REVERAND","FATHER","SISTER","BROTHER",; "BR","SIS","ATTORNEY","COL","COLONOL","REP","PRES") IF lIsPrefix taParts[PREFIX] = cPart LOOP ENDIF *-- Note: The same profile mentioned above revealed the following suffixes lIsSuffix = INLIST(cPart,"PHD","II","III","IV","V","JR","SR","ESQ","CPA","CEA","DD","MD","JD") IF lIsSuffix taParts[SUFFIX] = cPart LOOP ENDIF IF x = nTotParts IF EMPTY(taParts[SURNAME]) taParts[SURNAME] = cPart ELSE IF EMPTY(taParts[FIRSTNAME]) taParts[FIRSTNAME] = cPart ELSE taParts[MIDDLENAME] = cPart ENDIF ENDIF ELSE IF EMPTY(taParts[FIRSTNAME]) taParts[FIRSTNAME] = cPart ELSE IF EMPTY(aParts[MIDDLENAME]) taParts[MIDDLENAME] = cPart ELSE taParts[SURNAME] = cPart ENDIF ENDIF ENDIF ENDIF NEXT RETURN ALLTRIM(ALLTRIM(taParts[SURNAME]) + ", " +; ALLTRIM(taParts[FIRSTNAME]) + " " + ; ALLTRIM(taParts[MIDDLENAME]) ) ENDFUNC
So there you have it. The code above looks at the tcName variable and attempts to dump its parts into the taParts array. Some defined constants exist to help ease the “where the heck am I in this array” confusions (a practice I use for stuff like this all the time). As an added bonus, the code returns the name in last / first middle order, which is usually a good format for display purposes.
I'm a Quant Technical Specialist (Data Warehousing and Business Intelligence), with expertise in business analysis, data modeling, and data integration. I have extensive experience developing vertical and integrated desktop, Internet, and BI applications spanning municipal, clinical, and financial industries.
