Ragged Hierarchy Alert
So this week I was faced with writing yet another hierarchy script. I say this with affection, of course. I find that developing efficient hierarchies can sometimes be quite a challenge depending on the data set. When there are only a few records (a hundred or so) almost any method will work (SQL, procedures, or a mixture). As the data sets grow, the method becomes more and more important. Couple this with the hierarchy requirements (can a child have more than one parent? Is there a maximum depth? Will the result be used to populate a tree control? Do I need to process each branch now or can I just get the top level nodes? Etc.) and the tidy and efficient code you imagined is now becoming bloated and difficult to debug.
There are two ways to store hierarchies in a database: Using a recursive pointer or a bridge table. The recursive pointer is possibly the most widely used. Each record has a primary key and a foreign key that points to the primary key of another record in the same table (more on this in a bit). This approach is simple to maintain and intuitive. The bridge table provides much more flexibility (for example, a child can have more than one parent), but is more difficult to maintain (you do have another table to update, and there could be issues with time stamps and sequences if a child can have multiple parents). The discussion on bridge table hierarchies ends here. I may address it in a future blog, however.
The key to ragged hierarchies is that any record in the table can participate to any depth or degree. Organizational hierarchies (which I will demonstrate in a moment) are very common and take on a ragged frame; they are also perfect for the recursive pointer. The following figure will give you an idea of how a ragged hierarchy might look:
Like I said earlier in this post, there are lots of ways to snake through this diagram: from using the OVER clause in SQL Server 2005 to writing a procedural program in FoxPro, the possibilities are as varied as the structure. In my VFP applications, I always find that a mixture of SQL, some procedural code, and a recursive function provides the most flexibility and greatest performance. To demonstrate, consider the following:
CREATE CURSOR Employee (emp_id i, emp_name c(25), sup_id i) INSERT INTO Employee VALUES (1,"Martha Jones",0) INSERT INTO Employee VALUES (2,"Dan Brown",1) INSERT INTO Employee VALUES (3,"Ed Smith",2) INSERT INTO Employee VALUES (4,"Sarah Parker",2) INSERT INTO Employee VALUES (5,"Henry Johnson",1) INSERT INTO Employee VALUES (6,"Samual Smyth",5) INSERT INTO Employee VALUES (7,"Ali Jennah",1) INSERT INTO Employee VALUES (8,"Tori Heart",7) INSERT INTO Employee VALUES (9,"Bob Jones",7) INSERT INTO Employee VALUES (10,"Kelly Robinson",7) INSERT INTO Employee VALUES (11,"Manny Diaz",7) INSERT INTO Employee VALUES (12,"Jack White",11) INSERT INTO Employee VALUES (13,"Melinda Jo",11) CREATE CURSOR crTree (; top_level l,; this_id i,; description c(25),; parent_id i,; deep n(2)) *-- run the recursive function SELECT Employee LOCATE snake(0,0) && start at the top FUNCTION snake (tnParenID, tnDeep) LOCAL lcAlias lcAlias = "crTemp_" + TRANSFORM(tnParenID) SELECT * FROM Employee WHERE sup_id = tnParenID INTO CURSOR (lcAlias) tnDeep = tnDeep + 1 SELECT (lcAlias) SCAN put_record(&lcAlias..emp_id, &lcAlias..emp_name, &lcAlias..sup_id, tnDeep) snake(&lcAlias..emp_id, tnDeep) ENDSCAN USE IN (lcAlias) ENDFUNC FUNCTION put_record (tnThis_id, tcDescription , tnParent_id, tnDeep) INSERT INTO crTree (; top_level ,; this_id ,; description,; parent_id ,; deep ) ; VALUES (; tnParent_id=0,; tnThis_id ,; tcDescription,; tnParent_id ,; tnDeep ) ENDFUNC
I start off by creating a dummy Employee table, followed by a tree cursor (crTree)
to hold the results of my work. Next I call the recursive function ’snake’, whose job it is to find each child of the passed in parent, tracking its depth as it goes. The assumption in this program is that there is at least one zero sup_id key for the top-most level. This code works well to populate treeview controls (you could easily add a unique character sequence ID to each record). You can also output a text representation:
FUNCTION show_tree() SELECT crTree LOCATE SCAN ? REPL("-",crTree.deep) + " " + ALLT(crTree.description) ENDSCAN ENDFUNC
There you have it! For 1000 employees, the snake function takes about .8 seconds to run. Anything more than that and you should consider populating the tree only to a certain level (perhaps all parents and one child deep) and filling in the rest as needed. Another approach would be to turn this cursor into a table, relate it to your Employee table and update it as part of a save routine.
I'm a Quant Technical Specialist (Data Warehousing and Business Intelligence), with expertise in business analysis, data modeling, and data integration. I have extensive experience developing vertical and integrated desktop, Internet, and BI applications spanning municipal, clinical, and financial industries.

May 20th, 2008 at 1:36 pm
[…] wrote a while ago on ragged hierarchies from a programming perspective. Take a look at that post for more […]