Tod means Fox

Business Intelligence, Data Warehousing, SQL, Visual FoxPro.

Archive for July, 2008


Published July 30th, 2008

Aggregate Facts and Dimensions

Aggregate — picture from http://www.stonetohome.comAggregate fact tables are special fact tables in a data warehouse that contain new metrics derived from one or more aggregate functions (AVERAGE, COUNT, MIN, MAX, etc..) or from other specialized functions that output totals derived from a grouping of the base data. These new metrics, called “aggregate facts” or “summary statistics” are stored and maintained in the data warehouse database in special fact tables at the grain of the aggregation. Likewise, the corresponding dimensions are rolled up and condensed to match the new grain of the fact.

These specialized tables are used as substitutes whenever possible for returning user queries. The reason? Speed. Querying a tidy aggregate table is much faster and uses much less disk I/O than the base, atomic fact table, especially if the dimensions are large as well. If you want to wow your users, start adding aggregates. You can even use this “trick” in your operational systems to serve as a foundation for operational reports. I’ve always done this for any report referred to by my users as a “Summary”. (As an aside, there is a difference between an “aggregate” and a “summary”. I’ll explore these differences in my next post.)

For example, take the “Orders” business process from an online catalog company where you might have customer orders in a fact table called FactOrders with dimensions Customer, Product, and OrderDate. With possibly millions of orders in the transaction fact, it makes sense to start thinking about aggregates.

To further the above example, assume that the business is interested in a report: “Monthly orders by state and product type”. While you could generate this easily enough using the FactOrders fact table, you could likely speed up the data retrieval for the report by at least half (but likely much, much more) using an aggregate.

Here, using the atomic transaction FactOrders table:

SELECT c.STATE, p.product_type, t.YEAR, t.MONTH, SUM(f.order_amount)
    FROM FactOrders f
    JOIN DimCustomer c ON c.CustomerID = f.CustomerID
    JOIN DimProduct p ON p.ProductID = f.ProductID
    JOIN DimTime t ON 
        t.YEAR = DATEPART(yy, f.YearMonthID) AND 
        t.MONTH = DATEPART(mm, f.DateID)
    GROUP BY c.STATE, p.product_type, t.YEAR, t.MONTH

The aggregate is querying much less data and queries against time are now much simpler. In my non-scientific tests, the following query ran many times faster (a few seconds compared to about 30 seconds!).

SELECT c.STATE, p.product_type, t.YEAR, t.MONTH, SUM(f.order_amount)
    FROM FactOrders_Agg1 f
    JOIN DimCustomerState c ON c.CustomerStateID = f.CustomerStateID
    JOIN DimProductType p ON p.ProductTypeID = f.ProductTypeID
    JOIN DimMonth t ON t.YearMonthID = f.YearMonthID
    GROUP BY c.STATE, p.product_type, t.YEAR, t.MONTH

Creating the Fact and Dimensions

To implement, you will need to roll up your fact table by the hierarchies found in your dimensions. The result will be a new fact table, a set of new accompanying dimensions at the grain of the fact, and all new foreign keys for mapping. I usually name the fact table the same as the base fact with some meaningful suffix appended to the end. In SSMS, this keeps the aggregates with the fact in my object explorer. Dimensions usually get new names (like CustomerState and ProductType) and should be conformed so that they can be reused across business processes. You could even create views instead of new dimensions, but this does not eliminate the need to regenerate new surrogate keys.

When rolling up dimensions, you are provided with an excellent opportunity to perform aggregate functions on the dimension itself and store the results as new attributes. For example, you may want to know how many customers are living in each state. This could be used as the denominator in some population calculation you plan to use against the aggregate fact. Your new dimension might therefore look like the following:

SELECT Cust_Country, Cust_Region, Cust_State, COUNT(Cust_ID) 
FROM DimCustomer 
GROUP BY Cust_Country, Cust_Region, Cust_State

The most obvious aggregate function to use is COUNT, but depending on the type of data you have in your dimensions, other functions may prove useful. Just be warned: If you find that aggregate functions are being used a lot on your dimensions, you may need to revisit your design. There may be opportunities to pull out those metrics into existing or new fact tables!

Generating aggregates is largely an incremental process, where you examine query and reporting usage looking for places to improve performance. Aggregates stored in the RDBMS are maintained through ETL and/or your OLAP engine.

A Note About OLAP Aggregates

Theoretically, storing aggregates in a fact table in a RDBMS is the same as storing them in an OLAP cube. In OLAP storage, aggregates are precalculated summaries of data from the different dimensions of the cube, such that a query that seeks to know the aggregate (sum) of some metric (order amount) for X (customer state) and Y (product type) over T (monthly orders) would only need to look inside the cube at those exact coordinates to get the answer. I won’t pretend to know how this stuff is physically stored, but OLAP engines across the board offer better performance, management, and navigation mechanisms for the aggregations than is available through the RDBMS (even when using Indexed or Materialized Views).

Next post, I’ll write some thoughts on the differences between “Summaries” and “Aggregates”!

Published July 25th, 2008

Business Casual

So today I am wearing jeans and a faded (but once pleasant) button-down shirt. No tie, although my shoes are nice and I am wearing dark socks. A few others around me are similarly dressed, which is typical for summer Fridays. Then there are those who are in full suits, as if preparing for a job interview or some important sales meeting. Others are in suits but without tie which is in-and-of-itself a very strange practice; I call these “half-suits”.

You can draw two lines in the sand separating all three groups of Friday dressers. You have the “workers” – those actively engaged in business operations like myself; the middle management-type who are no longer “workers” and who aspire to delegate more and more activities, you know the type – it’s just like them not to wear ties with their suits; lastly, there are those who on the one end of the spectrum make only infrequent strategic decisions of the C-level type and on the other end those who work for mostly commission and must rely on their superior charisma (and sharp suits) to get ahead.

My colleague and I are working feverishly at the moment to improve the performance of one of our BI applications: A process we hope to improve from roughly 15 minutes to about 7 minutes (so half-suit and full-suit can have more time at the cooler). In addition, I am troubleshooting a foreign key violation in one of our ETL loads, and my partner-in-crime is hunting down the results of some replication testing in our production environment. Meanwhile, a full suit is currently browsing an online golf store; the half-suits are centered around the water cooler.

This, symbolically, highlights the problems with business and IT alignment in general — especially in large organizations. I find that IT is normally of the first variety – willing to dress down whenever possible to add a little comfort to an otherwise fast-paced existence full of responsibility and accountability. Dressing down in no-way implies a dress down of activities or a dumbing down of skills.

As you can tell, this is a bit of a rant and a fallacious attempt at tossing my colleagues into generalized buckets. But one thing is very true: business and IT need to get in sync. I would like to think that my team is above average in this regard. The immediate team consists of business and IT personnel - all of which are fighting for a successful project.

‘d like to hear your thoughts on this matter….

Published July 18th, 2008

ETL Subsystem 18: Fact Table Provider

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

This subsystem mirrors the previous one I discussed: ETL Subsystem 17: Dimension Manager. It exists to ensure that fact tables are properly maintained and delivered to the organization. The functionality to accomplish this is done through ETL (with some help by other technologies, such as replication). The functions include (but are not limited to) the following significant tasks:

Note: I’ll discuss Aggregates more in my next post.

SQL Server 2005 Integration Services (SSIS)

Fact distribution can be done using ETL or through replication. While other methods exist (RPCs, Web Services, and other SOA-type solutions for example), ETL and replication are by far the most widespread and documented. SSIS works in both conditions just fine. For more thoughts on this, refer back to ETL Subsytem 17.

Hand Coding with Visual FoxPro (VFP9)

Nothing new to report! VFP is limited, without a native replication feature, on its own. It can move data in distributed environments quite well though. So using ETL as a solution is certainly high on the list. If replication is a must, then the VFP database backend will need to be replaced with SQL Server or similar RDBMS product.

From Here

As stated, in the next subsystem, I’ll talk about aggregates. I love aggregates (my wife wants a divorce), so my next post looks to be quite interesting!

Published July 14th, 2008

ETL Subsystem 17: Dimension Manager

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

This subsystem is largely a management task. It involves ensuring that conformed dimensions (see my post “ETL Subsystem 8: Data Conformance“) are properly published to the organization. Conformed Dimensions are one of the most important and fundamental aspects of data warehousing. In order to deliver these conformed dimensions to business users spread across the organization — including geographically — a system must be in place to deal them out.

In simple environments (a single RDBMS server that services a handful of analysts, for example), dimension management is as easy as inserting new rows through ETL and updating data model metadata by the DBA on a single server. In more complex environments — where servers and applications are scaled up and out, across large geographies — dimension management becomes more of a challenge.

It is common in these distributed environments (horizontally and vertically) to have copies of dimensions sitting at various locations in the enterprise. Don’t confuse this with “Mart Madness” or “Silos”. These copies are managed by a single ETL process which not only handles normal ETL, but likely handles the distribution as well (replication, as I’ll talk about in a moment, is another option).

Why not have remote users connect to a single, centralized database? A Dimensional Model exists to facilitate analytics and query performance. To this end, it is necessary at times to move data closer to the analyst. Not only is this necessary, but it is also smart. If you have presentation servers in Brazil, Costa Rica, Singapore, and Sri Lanka, then guess what: You will distribute your data directly to these locations. How you do it is more of a question of technology and policy, rather than a process tied to ETL directly.

SQL Server 2005 Integration Services (SSIS)

If you choose to distribute dimensions using ETL, then SSIS will work just fine. This is nothing more than a set of special packages that you design to select changes from the master dimensions (stored on your central integration server), multicast the data, and load to the destination dimension or dimensions. This incremental approach is desirable when you are processing a small amount of changes to medium and large dimensions.

Alternately, if your dimensions are small enough, it might make sense to drop and recreate them on the destination servers. The overhead associated with collecting the changes from the master dimensions might not be worth the effort. But before you think this method the best, be aware that all constraints to this dimension (including those connected through the fact) will need to be dropped. Also, the database will be unavailable to the users during this operation.

Another good option is to simply use Replication to propagate the changes to the various distributed dimensions. I am currently using replication now as my standard, but only because my situation currently dictates it. I have large data volumes, which are updated frequently (several times a day, with sensitivity on global timezones).

Hand Coding with Visual FoxPro (VFP9)

I am starting to sound like a broken record: VFP is not an ETL tool, but an OOP language. Dimension distribution will need to be done manually by either making copies of dimensions and pushing them out to the client systems, or designing a mechanism for the client systems to pull changed data from the master dimensions. I have found this to be a major challenge using VFP as the sole solution. Obviously, if your backend database is SQL Server or another RDBMS, you likely have a replication option. That being said, it certainly is not an impossible task, but one in which you need to prepare yourself for the various complexities. I would go out on a limb and suggest that this is true for all hand-coded solutions.

From Here

In my next post, I will discuss ETL Subsystem 18: Fact Table Provider. As with this subsystem, the Fact Table Provider is largely a management task designed to bring facts to the user community.

Published July 10th, 2008

ETL Subsystem 16: Late Arriving Data Handler

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

Sometimes, not all dimension data is available when a transaction (i.e. business event, such as an order) comes into the Warehouse. This is a fact of data warehousing life. Under normal circumstances, you process dimensions first so that you have all your keys ready for inserting into the fact. When dimensional data is unavailable, missing, or incomplete you end up with a bit of a problem. NULL foreign keys in the fact table are not allowed.

Missed the trainOther times, your fact comes in late. In this scenario, you have all your dimensions, but now you need to insert a fact into the fact table and link it to existing dimension rows that were active at the time of the fact. Late arriving facts are easier to work with than late arriving dimensions, in my opinion. I already discussed some techniques for handling this in my post “ETL Subsystem 14: Surrogate Key Manager“.

I find late arriving dimensions (also called “early arriving facts”) to be a more difficult problem. There are a few of good solutions though, all of which are handled during ETL:

  • Hold onto the fact until all dimensions arrive
  • Create a dimension called ‘unknown’ or ‘not available’ with a primary key of -1 and use that
  • Insert a dummy row in the dimension and populate it with whatever you can

None of these options are terribly appealing across all scenarios; however, each has a place under certain circumstances.

Hold Onto the Fact

Holding onto the fact in your staging area may be a great option if, for example, you expect the late arriving data to be coming in soon. Perhaps you process data from multiple source systems. System A contains the transactions, while systems B, C, and D combine to give you your dimensions. Because of your extraction windows, you may not be able to get data from system D until later in the morning, meaning that the business events from system A are missing some context. Holding onto these events and waiting for the data from system D is the appropriate action!

Even if the situation is not predictable, holding onto facts might be the best option. If you are loading daily stocks, for example, and for some reason a particular company is not available (a new company perhaps), but the stock is showing up in your extracts, you should hold onto that stock in your staging area until the company is available. You don’t want that stock in the warehouse until the full details of the company comes in (which could be available in a separate extract, or may need to be manually entered).

As you can see, deciding to work this way has implications on the ETL processes and staging area. Planning for these circumstances up front is a must.

The ‘Unknown’ Dimension

Creating an empty row in the dimension called ‘Unknown’ is also an option that is good under some circumstances. If you are processing orders and no information about a promotion comes in, then it would be safe to link the fact to a special row in the dimension that denotes that no promotional information was available. Depending on your dimension and business requirements, you could have many different levels of ‘unknown’.

Here are a few suggestions:

  • -1, ‘none’
  • -2, ‘unknown’
  • -3, ‘not applicable’

If you use key -1 in your fact row for the foreign key to the promotion table, you allow your users to write queries and reports for all orders that were made when there was no promotion (promotion.name = ‘none’). The same is true for all facts that have an unknown promotion status, meaning there could have been a promotion, but the data was incomplete. Lastly, there are cases — if you do online orders for example — where promotions simply don’t apply.

You can expand these keys heading backwards with lots of detailed descriptions telling you why a dimension might not exist for an event. Don’t go overboard though! Having a few options is a good idea for data quality and analysis purposes. But too many may overload users with too many choices all too similar to be useful.

This approach works well when the likelihood of acquiring the context later is slim-to-none. If you use this method, it gets very complex if you later do in fact find that there was some context (i.e. a promotion). Your only option then is to revisit your facts and look for, and update, the fact rows corresponding to the late arriving data. On the bright side, if you’ve already tagged these rows with a foreign key of -2, for example, you may be able to focus only on those facts — saving you some process time.

Inferring a Dimension

In the company stock example I gave above, you have another option. If the company details of a stock is not known — and all you have is a SEDOL or Ticker — then simply insert a new row into the dimension with all of the information you know to be true about the dimension. This only works if you have enough details about the dimension to construct the natural key. Without this, you would never (although, I’ve learned to never say never) be able to go back and update this dimension row with complete attributes.

To make this work, and if you are using Type 2 SCDs, you will need another attribute in your dimension. Call it “inferred” or similar, make it a bit or char(1), and mark it as true (1, or “Y”).

This has a major implication on your dimensional processing. Now, before you process any Type 2 changes on any dimension that contains an attribute called “inferred”, you need to check if the row you are ready to work on is in fact inferred. If so, treat the entire row as Type 1. Because it is inferred, we don’t have the details anyway. If we went along and processed the row as a Type 2, then your fact will continue to point to an essentially empty row.

SQL Server 2005 Integration Services (SSIS)

The SCD Component handles the inferred dimension approach quite nicely. In fact, “Inferred Dimension” is a term introduced by SSIS (I believe). I talked briefly about the SCD Component and inferred dimensions in my post “ETL Subsystem 9: Slowly Changing Dimensions (SCD)” so I won’t repeat myself here. The bottom line is that SSIS has a very elegant solution to the late arriving dimension problem. If you have them — and almost every data warehouse has late arriving dimensions — then using the SCD Component and the inferred dimension option is must. Also check out this external link for more inspiration: “Handling Late-Arriving Facts“.

Hand Coding with Visual FoxPro (VFP9)

While SSIS has built-in capabilities to handle the inferred rows, VFP does not. This is not surprising of course because VFP is a language and not an ETL tool! With that said, writing SQL to accommodate the “inferred” flag in a dimension row is not a big deal. The processing logic for a SCD Type 2 dimension then becomes (in pseudo-code):

 
For each row
  If any attribute marked as 'Type 2' changes AND inferred = "N" Then
    update status of active/current row
    insert into dimension a new row
  End If
  For each attributes marked as a 'Type 1' change OR inferred = "Y" Then
    If attribute is subtype 1
        update all rows that match the key
    Else
        update only the current/active row that matches the key
    End if
  End For
End For

I discussed the above pseudo-code in more detail in my post “Loading the Slowly Changing Dimension“. Check that out for more insight into some of the logic. I should mention again that this is a very basic structure and not intended to be comprehensive or de jure.

From Here

Enough about late arriving data! Late arriving data, changed data capture, and surrogate key lookups might be the most challenging aspects of ETL — especially when it comes to performance and efficiency. It will take you some time to get these right. You may — like me — be forever tweaking these areas to squeeze every last second out of the process.

Next, I’ll move on to discuss Dimension Management. A much lighter topic!

Published July 2nd, 2008

ETL Subsystem 15: Bridge Table Builder

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

Before I can discuss this subsystem, I need to discuss bridge tables. This is a topic I haven’t covered much until now. So to help me illustrate how bridge tables are used in a dimensional model, I’ll present a case for a real-estate property transaction.

Sometimes the defined grain of a fact has multiple dimension rows that characterize it. These scenarios are referred to as many-to-many or multi-valued dimensions. A patient encounter at a hospital might have multiple procedures. Multiple bank accounts might belong to a single person. In the case that follows, I’ll show how multi-valued dimensions help support property sales (i.e. a real estate transaction).

I would like to make one important point before I begin: If you can thwart the need for bridge tables by redefining the grain of your fact table to be more atomic, then you should do so. Bridge tables add complexity to the model, ETL process, and the end-user application layer. With that said, multi-values bridge tables in many environments are unavoidable and absolutely necessary.

Case Study: A Real Estate Transaction

In the case of a property transaction (aka a property “sale”), you are potentially dealing with several many-to-many relationships. First of all, you may not be buying alone (one-to-many buyers). You may also not be buying your house from a single person (one-to-many sellers). Your house may be on two parcels of land or no land at all if you are buying an apartment or condo (none-to-many parcels). A barn or in-law apartment may be included or perhaps you are just buying land to build your dream home later (none-to-many buildings). These and other complexities make the case for bridge tables to represent the many-to-many relationships inherent in a complex transaction such as this.

When you buy land or a house (in the US, at least!) you must “record” the sale with the clerk of the municipality in which the property exists. This recording becomes part of the chain of ownership and tax record for the property. This recording can accommodate multiple grantors (the people or organizations selling the property), grantees (those who are buying the property), and properties (multiple parcels, land lines, lots, buildings, outbuildings, units, etc). This recording is given a book and page and filed away in a database. A hard copy is typically stored in a physical book in the clerks’ vault as well. It’s pretty fun (for some people who are into history) to go to your clerk’s office and peruse the vault. Anyway…

What follows is a simplified ER model from an OLTP system I designed several years ago. This model represents the complexities of the above case in a transaction environment. It has been simplified to only show the more important business entities:

Property Sales Schema - OLTP

Note that in the above OLTP system, table Real represents land and building values from tax billing software. These figures have been aggregated into a single row (i.e. total land value and total building value, number of buildings, total acres, etc.). For the above model, knowing the exact building or land line is not important. For the Data Warehouse, though, we want and need this atomic detail for some detailed analysis. The Deed table represents the actual recording (book, page, date, time, etc.). The bridge table ND_link contains grantor and grantee information. NR_link represents ownership history between the names and entities in Namefile and the aggregate property information in table Real.

The business process dimensional model could therefore be modeled as follows:

Business Process Dimensional Model for a Property Sale

As you can see, there are bridge tables between the SaleFact and DimBuilding, SaleFact and DimLand, and SaleFact and DimPerson dimensions. I typically name these tables by including the word “Group” in its name (LandGroup, BuildingGroup, etc.) but you could use the word “Bridge” or any other term that is consistent throughout your models. These bridges define the many-to-many relationships that are normal for a Real Estate transaction. Because a property sale is a natural business event, and cannot be logically defined to be any more atomic, multi-value bridge tables are a must.

In the above schema, a particularly astute and observant modeler will notice the many-to-many relationship between the keys in the fact and bridge. FoxPro, which is where I put together these examples, will not allow a many-to-many join between two keys. So, I Photoshopped the graphic in order to show you the logical representation! In FoxPro, and in many other database systems, you will need another table in between the Group and the Fact with a single attribute primary key. This would represent the physical implementation of the above design. You would therefore need two physical tables to represent the many-to-many relationship. I try my best to avoid this. In FoxPro, for example, I would simply not create the relation at all! Or in some cases, the bridge group would be unique for each event, making the foreign key in the fact table always unique. Being unique, it can be defined as a candidate key and then happily joined to the bridge table (more on this in the section below on “special considerations”).

Bridge Table Characteristics

Bridge tables are not just janitors or pianos (well, they do have all they keys!). Most bridge tables also contain attributes. For the above case, I include two attributes in each bridge: weight and primary indicator.

Weights are used to allocate metrics in the fact. Weights always add up to 1 across the group. If you need to allocate the sale price across all buildings involved in the sale, you would use the weight. Weights are calculated as part of ETL process and are an important part of the mechanisms of Subsystem 15. Sometimes, it is perfectly OK to divide 1 by the number of group records to get the weight (applying any fraction to the first row perhaps). But more often than not, there is a more complex algorithm to determining the weight. For land, you might base the weight on lot size or price code. For buildings, you may base the weight on the replacement cost of the building, or you might use square footage. All of these are defined by the business and applied during ETL.

I use primary indicators because some towns only care about the primary grantor, grantee, building, or parcel for analysis. One row in the bridge table will be identified as ‘primary’. During ETL, the primary row is identified by using a sequence number (number 1 = primary) or by determining the highest valued property. Sometimes, its arbitrary, in that you just mark the first one encountered as primary. These decisions are part of your business requirements and must be carefully implemented during data transformation.

Other Considerations

Guess what? Your bridge table is really a type of fact table. In the above schema, LandGroup is a sort of fact with some semi-additive metrics that represent the relationship between a parcel of land and a sale.

In my example, a sale is a static event. The buildings, land, and people involved will change over time, but the groupings used at the time of the sale will remain the same. For this reason, there is no need to make the group tables Type2 dimensions. If we ever need to change a value in one of these bridge tables, then it is likely to correct a data problem and not to represent a changing dimension. Additionally, in this scenario, it would be a safe assumption — especially because there are weighting factors and primary indicators involved in each bridge — that each grouping is unique to the fact. For modeling purposes, you could avoid the many-to-many relationship issue discussed in the previous paragraphs by making the bridge keys in the fact table candidate keys. There would be a unique set of group keys for these bridge tables. But be cautioned: The volume of property sales is small in any given year for any given municipality. You’ll never see a million transactions in a year. This makes the candidate key approach attractive. If you are dealing with patient diagnosis across multiple hospitals then your bridge tables will become too large. In these cases, you’ll need to consider re-using groups. The consequence of reusing groups is that the key in the fact table can no longer be a candidate.

To be clear: Bridge tables can also be a Type 2 SCD. If this is needed, and tracking bridge table changes over time is important, then you will need to combine bridge building with your SCD logic. You will need activity dates (from and to), and a current row indicator.

As I said at the top: If you can thwart the need for bridge tables by redefining the grain of your fact table to be more atomic, then you should do so. Bridge tables are not for the faint of heart!

SQL Server 2005 Integration Services (SSIS)

Bridge table building is largely a modeling and business rules problem. There is nothing in SSIS that will prohibit you from implementing even the most complex bridge table design. Most difficulties will come when managing the keys and implementing weighting factors. For the keys, you can use the same techniques described in my posts about Subsystem 10 and Subsystem 14. For weightings, I’ve implemented weighting algorithms using both stored procedures (Execute SQL Task) and VB (Script Component) when appropriate.

Hand Coding with Visual FoxPro (VFP9)

The above case is taken from an actual implementation I created in Visual FoxPro. I must say, it was rather simple once I ironed out the nuances and had my weighting algorithms in place. Developing the bridge table builder subsystem was a simple matter of inserting and linking the fact with the multi-valued entity and applying the weight and primary indicator logic. The resulting analytics that can be derived now far outshines that which could be done with the original OLTP model. Not to mention, the dimensional model has atomic detail (not like the aggregate Real table shown in the first schema) which allows for more slicing and dicing. But I digress!

From Here

Next post, I’ll discuss Late Arriving Data (Subsystem 16). That’s a big topic, so I’ll distill it as much as possible to keep it brief!