Tod means Fox

Business Intelligence, Data Warehousing, SQL, Visual FoxPro.

Archive for December, 2006


Published December 12th, 2006

Sample Standard Deviation (and SQL)

The CALCULATE command contains a range of statistical operations that can be utilized in FoxPro. These commands work on fields in a table (or cursor) and can be filtered using a FOR clause and SCOPE operator. Calculating the mean, standard deviation, and variance is rather straightforward:

CALCULATE AVG(fieldname) TO variable
CALCULATE STD(fieldname) TO variable
CALCULATE VAR(fieldname) TO variable

And some other useful functions:

CALCULATE MIN(fieldname) TO variable
CALCULATE MAX(fieldname) TO variable
CALCULATE CNT(fieldname) TO variable
CALCULATE SUM(fieldname) TO variable

In a previous blog entry I wrote a program to calculate the median from a set of data. VFP does not have a native median function.

Population vs. Sample Data.

Visual FoxPro doesn’t have a built-in standard deviation formula for a sample. The functions CALCULATE STD() is a population function. The same is true of variance. The question then becomes, how can I get a sample standard deviation? I’m glad you asked! Here’s how:

FUNCTION std_devs
    LPARAMETERS tcTableField, tcPredicates
    LOCAL nRet, nsavg , nscnt AS float  
 
    SELECT(JUSTSTEM(tcTableField))
 
    IF !EMPTY(tcPredicates)
        CALCULATE AVG(&tcTableField) FOR &tcPredicates TO nsavg
        CALCULATE CNT(&tcTableField) FOR &tcPredicates TO nscnt
        SELECT SUM( (&tcTableField- nsavg) ^ 2 ) as ssum ;
            FROM JUSTSTEM(tcTableField)
            WHERE &tcPredicates
            INTO CURSOR crTmpSDEVS2
    ELSE
        CALCULATE AVG(&tcTableField) TO nsavg
        CALCULATE CNT(&tcTableField) TO nscnt
        SELECT SUM( (&tcTableField- nsavg) ^ 2 ) as ssum ;
            FROM JUSTSTEM(tcTableField)
            INTO CURSOR crTmpSDEVS2
    ENDIF
 
    nRet = SQRT(crTmpSDEVS2.ssum / (nscnt-1) )
 
    USE IN crTmpSDEVS2  RETURN nRet
 
ENDFUNC

What about STD or VAR in an SQL statement? Again, glad you asked!

SELECT DISTINCT std_devp("crtmp.value") AS stddev ;
    FROM crtmp INTO CURSOR cResult
SELECT DISTINCT std_devs("crtmp.value") AS stddev ;
    FROM crtmp INTO CURSOR cResult
 
*-- here is the population version fit for SQL:
FUNCTION std_devp
    LPARAMETERS tcTableField, tcPredicates
    LOCAL nRet AS float
 
    IF !EMPTY(tcPredicates)
        CALCULATE STD(&tcTableField) ;
            FOR &tcPredicates TO nRet IN JUSTSTEM(tcTableField)
    ELSE
        CALCULATE STD(&tcTableField) ;
            TO nRet IN JUSTSTEM(tcTableField)
    ENDIF
 
    RETURN nRet 
 
ENDFUNC

A big issue with a custom function in an SQL statement is performance. The sample standard deviation function I wrote above is certainly not going to win any sprints (but might just hold out during a marathon). Some environmental tests also need to be done in the above code (for example, resetting the alias back and checking for deleted would be nice improvements). For what it’s worth, this should be a good stepping stone for those in need of a good sample standard deviation formula in pure FoxPro. In addition, using it as a function call in an SQL statement increases its usability and worth.