In this, and in the next series of posts, I will be exploring the 34 subsystems of ETL Data Integration (note: there used to be 38) as defined by the Kimball Group. I introduce the subsystems in this post, and then I will discuss how each fits (or does not fit) into both a SQL Server environment and in a hand-coded environment using VFP9.

The subsystem concept is a best-practice initiative formulated by The Kimball Group to help organizations design effective and efficient Data Integration environments for Data Warehousing using the Dimensional Model. As you’ll see, the subsystems are well-defined, battle-tested, and represent a relatively complete picture of what is needed for ETL Integration.

The Kimball Group categorizes the subsystems into 4 distinct groups: Data Extraction, Cleansing and Conforming Tasks, Data Delivery, and Management. Management is an often forgotten aspect of ETL (If the acronym was ETLM, we’d be OK!). You’ll notice that there are quite a few management-specific subsystems.

    Data Extraction

  1. Data Profiling
  2. Change Data Capture (CDC)
  3. Extraction
  4. Cleansing and Conforming Tasks

  5. Data Cleansing Subsystem
  6. Error Event Management
  7. Auditing
  8. Removing Duplicates
  9. Data Conformance
  10. Data Delivery

  11. Slowly Changing Dimensions (SCD)
  12. Surrogate Key Generator
  13. Hierarchy Manager
  14. Special Dimensions Manager
  15. Fact Table Builders
  16. Surrogate Key Management
  17. Bridge Table Builder
  18. Late Arriving Data Handler
  19. Dimension Manager
  20. Fact Table Provider
  21. Aggregate Generation
  22. OLAP Cube Builder
  23. Data Propagation Manager
  24. Management

  25. Scheduler
  26. Backup System
  27. Recovery and Restart
  28. Version Control
  29. Version Migration
  30. Work flow Monitor
  31. Sorting
  32. Data Lineage and Dependency
  33. Problem Escalation
  34. Paralleling and Pipelining
  35. Security
  36. Compliance Manager
  37. Metadata Repository

Data Integration architects have a lot to be thankful for the work and effort put forth by the Kimball Group. As you can probably guess, to do ETL right, you’ll need adequate time and resources. It is no wonder that data integration is by far the most time-consuming aspect of data warehousing.

In subsequent articles I will define each of these systems, provide examples, and discuss how they can be accomplished in SQL Server as well as VFP9. I have a lot of work to do, so stay tuned!