Most of what I discuss on Tod means Fox (and especially in my 34 Subsystem series) has to do with relational database engines (SQL Server 2005 and Visual FoxPro in particular). For most data integration projects, desktop and web solutions, and data warehouses, the relational database is all you’ll need. It can be used to create and manage both relational and dimensional models. But in Business Intelligence applications, chances are that you’ll need an additional, more specialized form of multidimensional data storage and retrieval. Analytical databases cover this need. In fact, Dr. Codd, the inventor of the relational database, often explained how analytical databases are a “necessary companion” to relational databases.
The OLAP Story
OLAP stands for Online Analytical Processing. It is a terrible name to describe multidimensional analytical databases. Instead of a more meaningful name, like FASMI (Fast Analysis of Shared Multidimensional Information) proposed by Nigel Pendse of “OLAP Report” we seem to be stuck with OLAP. With great pain, I will continue to discuss analytical databases using the OLAP terminology.
Basically, an OLAP (*sigh*) database has certain characteristics that set it apart from relational databases. And, coincidentally, Pendse included all characteristics in the FASMI acronym. Check out the OLAP Report for more details. If you’re just getting into business analytics, or are in the process of evaluating analytical database software – you must visit this site!
The Smartness Factor
I like to think of the relational database as “dumb” and the analytic database as “smart”. Let me explain:
The primary focus of Analytic servers is to get the (often pre-calculated) data out of the database as quickly as possible, allowing the user to zoom in and out along different hierarchies. Contrast this with the Codd’s relational model which seeks to eliminate data anomalies at transaction time through normalization: data retrieval is slow and often complex. With this in mind, you can get a feel for how a database engine can store, catalog, and retrieve data differently. (In fact, this is what makes dimensional modeling in general so favorable for querying and analytics – it is not bound by the restrictiveness imposed by normalization. I’d also like to note that as with dimensional models, OLAP databases are multidimensional.)
While the relational database plays a very important role, so does the multidimensional (OLAP) database. It’s simply built differently to service a different need.
Primarily, analytic servers can manage aggregates, aggregate navigation, and complex calculations and summaries across hierarchies at blazing speeds. These skills were borrowed from both navigational database designs (think of the DOM or a Wiki) and hierarchal designs (trees with leaves and branches). A lot of this has to do with how the data is stored.
Analytics servers offer a different type of storage. MOLAP, or Multidimensional OLAP, can be much more efficient than relational engines. Some tools (like SQL Server Analysis Services) allow you to store your analytical databases in a relational way (ROLAP) or using a hybrid approach (HOLAP). Personally, I see no benefit at all with ROLAP, aside from real-time systems where using the relational database is a must because of the latency involved with updating the OLAP database. (Not to digress too much here, but even real-time environments can use MOLAP storage by using partitions and caches correctly.)
MOLAP is more natural and faster. This may differ depending on your tool choice, but I beg someone to tell me differently. For a detailed discussion on how MOLAP and ROLAP engines store data, you can try this resource “Analysis Services 2005 Performance Guide” (which defines the way Analysis Services does it).
HOLAP storage is an interesting option and could actually perform well –even better than MOLAP in some instances — with less disk usage. But a lot of this depends on how many aggregates are defined and how often the system needs to query the relational database. It might be a good consideration, but if you’re not sure and you have plenty of disk space (disk space is cheap), then go with MOLAP.
The smartness factor2 (the MDX language)
I like to think of SQL as “dumb” and MDX as “smart”. Let me explain:
The MultiDimensional eXpression (MDX) language was created about 10 years ago by Microsoft to work specifically with multidimensional data stored in analytical servers. This OLAP-specific language improves upon the SQL syntax, removing much of the bulkiness associated with the language. MDX is an elegant and highly relevant partner to the analytical database. While you could get out what you need using SQL, most every SQL statement will be a challenging one. It’s like cutting down a tree with a hand saw as opposed to using a chain saw. MDX is painless and intuitive. Concepts such as dimensions, hierarchies, and sets are built into the syntax. For more on MDX and the mdXML standard, please visit XML For Analysis.
Central to the MDX language is the cube concept, which deserves a proper introduction:
OLAP data is stored in structures called cubes. As you know, a cube is a 3-dimensional solid that, given a point on three of its faces which form a vertex, can take you to a precise point somewhere within the cube itself. This point represents the aggregate metric you want to view. The faces represent the different dimensions (like Product, Customer, and Time) that are used to find the point. The dimensions are further broken down by its hierarchies on each face.
Cubes are sometimes — and more aptly — called hypercubes (or a tesseract or “n-cube”) because analysis often makes it necessary to examine more than three points. As soon as you look at more than three faces, you need higher dimensionality.
I admit, when I first heard about cubes I was a bit intimidated. I felt that they were too complex and advanced. Now, working with cubes is natural. Notice though that the concept of a cube is almost the same thing as a star schema. The only difference (at least that I can really think of) is that a star schema generally stores atomic data, and barring any usable aggregate fact table, calculations need to be done on the fly. The cube theoretically stores the results of these calculations inside the cube.
If you read my blog, then you know I advocate dimensional models for proper data warehousing. If you also advocate and use dimensional models, then the concepts discussed so far will fit you well.
What’s great about using Dimensional Models in your relational database is that your OLAP database will almost mirror the design, making the development and deployment of your cubes so much easier. Not only that, but the cubes will likely load faster and be easier to maintain when built off of the dimensional model. Note you can still achieve most of what you can do without an analytics server, just by using a properly constructed set of business process dimensional models. The OLAP database is just smarter, making several tasks (navigating hierarchies, security, and calculations to name a few) easier.