Sample Standard Deviation (and SQL)
The CALCULATE command contains a range of statistical operations that can be utilized in FoxPro. These commands work on fields in a table (or cursor) and can be filtered using a FOR clause and SCOPE operator. Calculating the mean, standard deviation, and variance is rather straightforward:
CALCULATE AVG(fieldname) TO variable CALCULATE STD(fieldname) TO variable CALCULATE VAR(fieldname) TO variable
And some other useful functions:
CALCULATE MIN(fieldname) TO variable CALCULATE MAX(fieldname) TO variable CALCULATE CNT(fieldname) TO variable CALCULATE SUM(fieldname) TO variable
In a previous blog entry I wrote a program to calculate the median from a set of data. VFP does not have a native median function.
Population vs. Sample Data.
Visual FoxPro doesn’t have a built-in standard deviation formula for a sample. The functions CALCULATE STD() is a population function. The same is true of variance. The question then becomes, how can I get a sample standard deviation? I’m glad you asked! Here’s how:
FUNCTION std_devs LPARAMETERS tcTableField, tcPredicates LOCAL nRet, nsavg , nscnt AS float SELECT(JUSTSTEM(tcTableField)) IF !EMPTY(tcPredicates) CALCULATE AVG(&tcTableField) FOR &tcPredicates TO nsavg CALCULATE CNT(&tcTableField) FOR &tcPredicates TO nscnt SELECT SUM( (&tcTableField- nsavg) ^ 2 ) as ssum ; FROM JUSTSTEM(tcTableField) WHERE &tcPredicates INTO CURSOR crTmpSDEVS2 ELSE CALCULATE AVG(&tcTableField) TO nsavg CALCULATE CNT(&tcTableField) TO nscnt SELECT SUM( (&tcTableField- nsavg) ^ 2 ) as ssum ; FROM JUSTSTEM(tcTableField) INTO CURSOR crTmpSDEVS2 ENDIF nRet = SQRT(crTmpSDEVS2.ssum / (nscnt-1) ) USE IN crTmpSDEVS2 RETURN nRet ENDFUNC
What about STD or VAR in an SQL statement? Again, glad you asked!
SELECT DISTINCT std_devp("crtmp.value") AS stddev ; FROM crtmp INTO CURSOR cResult SELECT DISTINCT std_devs("crtmp.value") AS stddev ; FROM crtmp INTO CURSOR cResult *-- here is the population version fit for SQL: FUNCTION std_devp LPARAMETERS tcTableField, tcPredicates LOCAL nRet AS float IF !EMPTY(tcPredicates) CALCULATE STD(&tcTableField) ; FOR &tcPredicates TO nRet IN JUSTSTEM(tcTableField) ELSE CALCULATE STD(&tcTableField) ; TO nRet IN JUSTSTEM(tcTableField) ENDIF RETURN nRet ENDFUNC
A big issue with a custom function in an SQL statement is performance. The sample standard deviation function I wrote above is certainly not going to win any sprints (but might just hold out during a marathon). Some environmental tests also need to be done in the above code (for example, resetting the alias back and checking for deleted would be nice improvements). For what it’s worth, this should be a good stepping stone for those in need of a good sample standard deviation formula in pure FoxPro. In addition, using it as a function call in an SQL statement increases its usability and worth.
Data Warehousing and Business Intelligence consultant, with expertise in business analysis, data modeling, and data integration. Extensive experience developing vertical and integrated desktop and Internet applications spanning municipal, clinical, and financial industries.
January 5th, 2008 at 2:32 pm
Hi Todd,
The Std (standard deviation) function can be done entirely in VFP SQL.
Can be combined with GROUP BY which can save a lot of code.
Assumin a numerical column ’score’ in a table ‘Scores’:
SELECT SQRT(AVG(score^2)-AVG(score)^2) FROM Scores
A Samples STD can also be calculated:
SELECT SQRT(COUNT(*)*(AVG(score^2)-AVG(score)^2)/(COUNT(*)-1+0.000001)) FROM Scrores
or a Variance
SELECT COUNT(*)*(AVG(score^2)-AVG(score)^2)/(COUNT(*)-1+0.000001) FROM Scores
The +0.000001 getsa around a possible divide by zero error.
Cheers
Anders
Anders Altberg
Pragam