A few months back I wrote an entry on how to achieve an INTERSECT using VFP. That post was inspired by the INTERSECT command that exists in SQL Server 2005. In set theory, the intersect can be defined as follows: “x is an element of sets A and B if and only if x is an element of A and x is an element of B.”
But what about returning non-intersecting data from two sets? This would be like a union, but that we only want a list of unique rows within each table, that is, the difference. The difference between two sets could be defined as: “x is an element of set A or B when x is not an element of A and B.” In set theory, this is called a “symmetric difference” (like XOR for x-basers!).
The SQL to accomplish this is almost identical to the Intersect query. Here is the general form in VFP:
SELECT [key field], [attributes..] ; FROM (; SELECT [key field], [attributes..] FROM A ; UNION ALL ; SELECT [key field], [attributes..] FROM B ; ) AS tmp ; GROUP BY [key field], [attributes..] ; HAVING COUNT(*) = 1
The change is in the HAVING clause, where instead of checking to see if COUNT(*) = 2, we check to see if COUNT(*) = 1. This gives us the “Not Intersect” of two sets: perfect for determining how two tables are different! The result is a complete list of non-intersecting data.
Here’s an actual example in VFP (followed by an example using T-SQL):
*-- two sets CREATE CURSOR crTempL (keyfld i) CREATE CURSOR crTempR (keyfld i) *-- test data INSERT INTO crTempL VALUES(1) INSERT INTO crTempL VALUES(3) INSERT INTO crTempL VALUES(5) INSERT INTO crTempL VALUES(7) INSERT INTO crTempL VALUES(9) INSERT INTO crTempR VALUES(3) INSERT INTO crTempR VALUES(6) INSERT INTO crTempR VALUES(9) INSERT INTO crTempR VALUES(12) INSERT INTO crTempR VALUES(15) *-- the difference SELECT keyfld ; FROM (; SELECT keyfld FROM crTempL; UNION ALL ; SELECT keyfld FROM crTempR; ) AS tmp1 ; GROUP BY keyfld ; HAVING COUNT(*) = 1
Now, here is the equivalent in T-SQL.
-- two sets CREATE TABLE #crTempL (keyfld INTEGER) CREATE TABLE #crTempR (keyfld INTEGER) -- test data INSERT INTO #crTempL VALUES(1) INSERT INTO #crTempL VALUES(3) INSERT INTO #crTempL VALUES(5) INSERT INTO #crTempL VALUES(7) INSERT INTO #crTempL VALUES(9) INSERT INTO #crTempR VALUES(3) INSERT INTO #crTempR VALUES(6) INSERT INTO #crTempR VALUES(9) INSERT INTO #crTempR VALUES(12) INSERT INTO #crTempR VALUES(15) -- the query SELECT keyfld FROM ( SELECT keyfld FROM #crTempL UNION ALL SELECT keyfld FROM #crTempR ) AS tmp1 GROUP BY keyfld HAVING COUNT(*) = 1
As you can see, the syntax is almost identical. Thank you ANSI SQL standards!
Can anyone come up with a better or more efficient way to find the Symmetric Difference between two tables in TSQL, VFP, or any other DB?