I had the pleasure of speaking with Andrew MacNeill of The FoxShow last week for an interview. We discussed database normalization and denormalization in relational databases. I’d like to thank Andrew for the opportunity to discuss this very important database design topic. He contacted me after a listener asked for a discussion on normalization, and having seen my recent posting “The Role of Good Database Design and Normalization for Data Integration Projects“, Andrew felt that I would be up to the challenge.
You can listen to the Interview here: FoxShow #47: Data Normalization with FoxPro and Tod McKenna.
I promised during the interview to present the 4th and 5th normal forms here on Tod means Fox because trying to explain these forms without pictures and examples is very difficult. I would also like to confess (shame on me!) that I didn’t take the time ahead of time to prepare solid examples for these forms, so my explanation of these forms in the interview was a bit lacking. I hope to clarify what I said here.
4th Normal Form
In my previous post, I defined the 4th Normal Form as a form that seeks to “remove all multi-valued dependencies”. So what exactly does this mean? A multi-valued attribute is an attribute that helps to describe the entity. In order to represent this in a table, the attribute could be repeated several times. Consider this real-world scenario:

The town, Region, and (Congressional) District together form the candidate key. In this example, Providence has a split congressional district and exists in multiple regions. Lincoln also exists in multiple regions but only has a single district. The above table meets Boyce-Codd Normal Form because all three attributes are part of the candidate key (read more about BCNF here).
This type of table is called a ‘cross-product’, where data is repeated. It is not in 4th normal form because of this repeating data scenario. To fix, you would create two separate tables:

In 4th normal form, you reduce the need for inserts, updates and deletes and therefore you eliminate the possibility for data anomalies. It is reasonable to strive for 4th normal form as a baseline in your designs. In some cases, your database will take up less space. The cost for normalizing this far is only in query performance, as you’ll need additional joins to report on the data. But in my opinion, data integrity (especially if you plan on re-purposing your data for data warehousing and data integration projects) should always take precedence in OLTP systems. With that said, the 5th Normal Form goes even further:
5th Normal Form
The 5th Normal Form is a “projection-based normalization approach where every nontrivial join dependency is implied by a candidate key”. In the previous example, I showed how one table became two, reducing the risk of data anomalies. 5th normal form could split a single table into three or more.
Let’s say you have a table with store, supplier, and products. The table contains a row for each store, which is repeated for each supplier the store uses, and again for each product that the supplier provides to the particular store (A store might not sell all the products that a supplier provides). All fields in this table are valuable and necessary to represent the row and is in 4th Normal Form. In fact, this is a common type of table that could represent a many-to-many-to-many relationship among stores, products, and suppliers. But, notice that there is repeating data:

In 5NF, this table would become 3: a store_to_prod table, a supp_to_prod table, and finally a store_to_supp.

When a store picks up a new supplier, a new row is added to the store_to_supp table, eliminating the redundancy of extra rows for product. When a store sells a new product, the supp_to_prod table will let you know where to get it. A simple example, yes — but it demonstrates 5th normal form.
As you can see, you increase the amount of relations and tables as you normalize deeper and deeper.
Denormalizing
As discussed in the interview, at a certain point it is logical to push back and denormalize. You would do this for performance and manageability purposes, or because the design of the database dictates it. In the previous example, it might not be practical from a development standpoint to bring all tables to 5th form. However, you should weigh the decision to denormalize carefully. Consider the purpose of the database. If it is a transaction-based database with many data access points, you will want to normalize as much as possible to ensure the integrity of the data. If the primary purpose of the database is for ad-hoc queries, then backing off might be well received.
As I said in my previous post, normalization is an art form. I find it difficult to explain at times, but relatively easy to implement. I hope that this post is helpful in some way, and if I have made any errors, or if you don’t agree with any part of this discussion please feel free to contact me!