Posts Tagged Quality

Chaos Theory and the Data Warehouse

Have you ever considered the Data Warehouse as a chaotic system? The work of the Data Warehouse team is never complete: new requirements trickle in every day, and user feedback gets more and more sophisticated as time passes. Chaos Theory can help explain this, and in the end, offer us some insight into how we can better plan Data Warehouse development, deployment, and maintenance.

The Data Warehouse is a process that forms the center of an information supply chain, with several inputs and several outputs. Each input and each output is subject to change based on factors such as vendor upgrades, new interfaces, expanded interfaces, and, perhaps most importantly, end-user (client) evolution. All of these changes happen continuously. As people use the Data Warehouse, they become more inquisitive. They want their output and analysis rolled up or down in different ways. Predicting (and therefore planning for) Data Warehouse change can be as difficult as predicting (and therefore planning for) the weather. This environment of ever-changing needs fits neatly into the confines of Chaos Theory. But what is chaos in this context? What is Chaos Theory exactly?

From the book “Chaos Theory Tamed”, author Garnett P. Williams writes:

Chaos is sustained and disorderly-looking long-term evolution that satisfies certain mathematical criteria and that occurs in a deterministic non-linear system. Chaos theory is the principles and mathematical operations underlining chaos. (pg 9)

Meteorologist Edward Lorenz in the 1960s determined that even the tiniest differences in an initial measurement can have a huge impact on an outcome. In other words, as his butterfly effect posits, a butterfly flapping its wings in Africa can affect weather patterns in North America. Weather is a system which has a highly sensitive dependence on its initial inputs.
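To make that sensitivity concrete, here is a minimal sketch. It does not use Lorenz's weather model; it uses the logistic map, a much simpler textbook system that shows the same behavior. The parameter r = 4, the starting value 0.2, and the step counts are arbitrary choices for illustration.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map x -> r * x * (1 - x) and return all values."""
    values = [x0]
    for _ in range(steps):
        values.append(r * values[-1] * (1.0 - values[-1]))
    return values


def max_divergence(x0, delta, r=4.0, steps=50):
    """Largest gap between two trajectories whose starts differ by delta."""
    a = logistic_trajectory(x0, r, steps)
    b = logistic_trajectory(x0 + delta, r, steps)
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    # Two starting points that differ by one part in a billion.
    for steps in (5, 50):
        gap = max_divergence(0.2, 1e-9, steps=steps)
        print(f"steps={steps:2d}  largest gap={gap:.6f}")
```

Over a handful of steps the two runs are indistinguishable; over fifty steps the gap grows to the same order as the values themselves, even though the rule is fully deterministic.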

The foundation of the Data Warehouse is only as stable as your control over the tiniest changes to the inputs into the information structure. Like weather, it too has a highly sensitive dependence on inputs. One tiny change to a source system can have almost catastrophic effects on the Data Warehouse.
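One way to catch a tiny source change before it reaches the warehouse is to compare each source's current structure against what the ETL expects. The sketch below is only an illustration: the function name, the column names, and the idea of describing a schema as a plain column-to-type mapping are all assumptions, not features of any particular tool.

```python
def schema_drift(expected, actual):
    """Compare two {column: type} mappings and describe every difference."""
    issues = []
    for column, expected_type in expected.items():
        if column not in actual:
            issues.append(f"missing column: {column}")
        elif actual[column] != expected_type:
            issues.append(
                f"type changed: {column} {expected_type} -> {actual[column]}"
            )
    for column in actual:
        if column not in expected:
            issues.append(f"new column: {column}")
    return issues


if __name__ == "__main__":
    # Hypothetical source table, before and after a vendor upgrade.
    expected = {"customer_id": "int", "postal_code": "char(6)"}
    actual = {"customer_id": "int", "postal_code": "varchar(10)", "region": "int"}
    for issue in schema_drift(expected, actual):
        print(issue)
```

Run before each load, a check like this turns a silent upstream change into an explicit message that someone can act on.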

Finding Order

However, despite the chaos, we should be able to find some order. This is what Lorenz and scientists after him tried to do. The first step in this process is understanding that even seemingly random changes are not always as random as they seem. If we can understand that changes to our Data Warehouse are not random, then we can build a better Data Warehouse.

There are a few things you can do to tame the chaos:

  • Be consistent and systematic. The more predictable you and your Data Warehouse team are, the easier it will be to handle change. In other words, control any and all variables that you can.
  • Adopt proven analysis and development methodologies that others have had success with. This is not to say that some level of adaptation to your environment, team skills, and situation is not required, but rather, start off with a good foundation and follow along where it makes sense.
  • Keep the team close. Quality and frequent interaction among the people who make and run the DWH is essential.
  • Stay in the groove like an improvisational jazz band. If your data modelers are not in tune with your decision-support analysts who are not in tune with your DBA, then you can’t expect to handle the challenges of chaos.
  • Feedback and evolution are two very important aspects of Data Warehousing. Keep your ear to the ground and try to anticipate changes before they occur. This takes practice, but (back to the improvisational jazz band analogy) practice makes perfect.
  • Keep in step. In the Data Warehouse world, change is natural and will come in waves. More significantly, if changes cannot be implemented quickly, your clients will lose confidence in your ability to keep up.
  • Think and act quickly. The longer you debate, the longer your client must wait. While they wait, they construct workarounds or look elsewhere. If you’re lucky and they do wait for you, their change may become outdated and no longer relevant; an opportunity might have been missed (and you’ve essentially failed them).
  • Don’t be afraid to be wrong. The consequence of acting quickly is that you might get something wrong. Just be agile enough to respond and deliver new change with urgency.

I’ll post more thoughts on this over the next weeks. I’m particularly interested in how users of the Data Warehouse become more and more sophisticated as they use its tools and applications.


The Three Faces of a Good ETLer

Hiring a “data integration expert” or consultant for your next, greatest, data warehousing project? Don’t take it lightly. ETL personnel are critical to the success or failure of your project.

The following are what I deem to be essential technology-related aspects, or faces, of a good ETL developer and/or architect (herein referred to as an ETLer for lack of creativity). While you need to consider business and industry knowledge, personality, and experience in your team-building process, you should start by checking off the following on your interview sheet:

First Face: the technologist

Programming must come naturally to an ETLer. Objects, logical constructs, expression construction, program flow, and the like must be well understood. The truth is that no matter how much your vendor proclaims that their tool does it all, chances are excellent that some hand coding will be required. On top of that, ETL tools work a lot like procedural programs. Technologists are very good at putting their best foot forward, and will generally think of ways to make the ETL flow perform better. They also think about logging, auditing, and exception handling, all of which are important.
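As a rough illustration of what logging, auditing, and exception handling look like when hand coded, here is a minimal sketch. The function name `run_step`, the logger name, and the sample rows are all made up for this example; a real ETL tool would provide its own equivalents.

```python
import logging

logger = logging.getLogger("etl")


def run_step(name, rows, transform):
    """Apply transform to each row; log rejects and return audit counts."""
    loaded, rejected = [], []
    for row in rows:
        try:
            loaded.append(transform(row))
        except (KeyError, ValueError, TypeError) as exc:
            # Keep the bad row and the reason so it can be reviewed later.
            logger.warning("step=%s rejected row=%r reason=%s", name, row, exc)
            rejected.append((row, str(exc)))
    audit = {
        "step": name,
        "read": len(rows),
        "loaded": len(loaded),
        "rejected": len(rejected),
    }
    logger.info("audit %s", audit)
    return loaded, rejected, audit


def parse_amount(row):
    """Hypothetical transform: convert the 'amount' field to a float."""
    return {"amount": float(row["amount"])}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    sample = [{"amount": "19.99"}, {"amount": "n/a"}, {}]
    run_step("parse_amount", sample, parse_amount)
```

The point of the design is that a bad row never stops the load and never disappears: it is set aside with a reason, and the audit counts always add up to the number of rows read.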

Second Face: the theorist

But a solid programming background is not enough. Knowledge of Data Integration theory and best practices is equally important. While I believe in and use Kimball’s methodologies for integrating data into a dimensional data warehouse, other methodologies exist that may be more suitable to your business and integration needs. Following a proven methodology, with slight modifications to suit your environment, will get you further, faster. Having little or no theory behind what you’re doing gets you somewhere, slower. Identify your methodology, and then find someone who understands it.

Third Face: the specialist

Knowing the ins and outs of your ETL tool (SSIS, OWB, Datastage, Talend Open Studio, etc.) is essential. I would venture to guess that a solid programmer who has a great understanding of ETL theory will be able to get by using most tools with little learning curve. What I worry about (and you should too) are the nuances in the tooling that can stump even the best. These nuances (SSIS, my tool of *ahem* choice, has many of them; sorry, I needed to clear my throat) can cost you many project hours and force rewrites if blocking issues are encountered. Tool knowledge is also essential for knowing when it is appropriate to forgo the tool because of I/O issues, because hierarchical data is better handled elsewhere, or because business logic is best not bundled within a data flow.

About Face

While junior members of your data integration team can be one- or two-faced (that came out funny), senior members and architects must have more meat on the bone.

I suppose this is why good ETLers are difficult to come by. The ETLer needs to have a healthy mix of programming talent, a disciplined approach, and tool knowledge. Trained DBAs and software developers might have a lot to offer, as might a troop of certified tool jocks and method junkies, but to get your project in on time and within budget, don’t settle.


10 Commandments of Data Integration

  1. You shall compile and document all requirements and mappings; segregate the work by business process. You may have more than one of these business processes, some of which may come before others.
  2. Do not begin without first conducting a thorough data profile; otherwise, you will be punished for your iniquities, as will the generations that come after you.
  3. Do not think commandments one or two are in vain, lest you become overrun by the dead line, scope creepers, and a great exodus of people from your tribe; if this happens to you, do not swear or curse, for you have been warned.
  4. Remember that latency and timeliness are equal in importance to non-volatility and having a traceable lineage; a staging area may lead you to this promised land.
  5. Honor the rules of data conformance.
  6. Do not kill dirty data: you shall clean them, or take them back to their sources for retribution.
  7. Do not commit the worst data integration transgression of all and ignore data quality; your ignorance will not be forgiven.
  8. Do not be shy about stealing your neighbor’s work, for his trials have led to best practices that you can make equally good use of.
  9. Do not rely solely on business keys; surrogates are your friend and will permit you to engage in slowly changing your dimensions.
  10. You shall covet a proper audit and log system; for on the day of judgment, you will need proof of your compliance.
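Commandment nine is the easiest one to show in code. The sketch below is one possible reading of it: a Type 2 slowly changing dimension held in a plain Python list, where each version of a member gets its own surrogate key. The function name, the field names, and the sample customer are assumptions made for illustration; a real dimension would live in a database table.

```python
from datetime import date


def apply_scd2(dimension, business_key, attributes, effective):
    """Apply a Type 2 change to a list of dimension rows (dicts).

    Returns the surrogate key of the row that is current after the call.
    """
    current = next(
        (r for r in dimension
         if r["business_key"] == business_key and r["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return current["surrogate_key"]  # nothing changed
        current["valid_to"] = effective      # expire the old version
    new_key = max((r["surrogate_key"] for r in dimension), default=0) + 1
    dimension.append({
        "surrogate_key": new_key,
        "business_key": business_key,
        "attributes": dict(attributes),
        "valid_from": effective,
        "valid_to": None,
    })
    return new_key


if __name__ == "__main__":
    customers = []
    apply_scd2(customers, "C100", {"city": "Ottawa"}, date(2009, 1, 1))
    apply_scd2(customers, "C100", {"city": "Toronto"}, date(2009, 3, 1))
    for row in customers:
        print(row)
```

Because facts reference the surrogate key rather than the business key, a sale recorded while the customer lived in Ottawa keeps pointing at the Ottawa version after the move.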


MDM is a Capability, Not a Product

I had bookmarked, and finally just read, an article by Loraine Lawson of IT Business Edge titled “Consultant: Master Data Management Can Pay off During M&As”, which referred to this blog post from Evan Levy, “MDM and M&A”.

MDM is an interesting topic, and one that has a lot of relevance in my work environment. M&As are also interesting and can have a huge impact on a great many people. But while reading these articles, I was reminded of an important MDM axiom.

Evan writes:

MDM provides a company the capability to link the data content from disparate systems within and across companies.

Remember that MDM is a capability and not a technology. You cannot buy MDM, but you can build an MDM strategy. This strategy will likely cross several technologies and platforms. It may consist of data warehousing elements, SOA, and SaaS applications. It will surely consist of certain disciplines such as data governance, data quality management, and data integration.

Vendors will continue to push their MDM solutions, but be careful not to trap yourself into thinking that the job is done once you’ve installed. Vendors can wrap most technologies necessary for MDM into a single package, but they cannot provide you with a strategy or the personnel to make it work for your organization.

MDM is a capability you create, and not a product you can buy.


Maintaining a DW/BI Environment

In the posts “Version Control” and “Version Migration”, I glossed over one of the more complex and challenging aspects of data warehousing: once the DW/BI environment is in production, how do you maintain and update it?

Some Thoughts

While I do not have a blueprint for you, I do have a few thoughts on the subject.

First, you must consider the Data Warehouse as a living and breathing organism. Not only will it be growing in size as your carefully constructed ETL packages churn, but it will also likely be growing in scope and importance (if not, then you may have to re-think your DW/BI marketing approach and/or find a new sponsor).

Second, you have to realize from the beginning that deploying a data warehouse is both an iterative and incremental process. Iterative in that you will build and rebuild as you get deeper into the project; incremental in that different parts of the warehouse will be constructed and delivered at varying rates. This is in direct contrast to the normal waterfall approach to releasing software applications and systems. It is not realistic, practical, or advised to attempt to deliver a DW/BI project in one shot. You may as well use your bullet for something else!

Passing the Baton

Your DW/BI team must plan up front for the often complex handoffs between development and maintenance amid the ever-turning wheel of the DW Lifecycle (detailed here).

Those handoffs are critical. If you’re in a large organization (like me now), then you will be literally handing off the maintenance of the project to a completely different group. This group will need documentation and escalation procedures to monitor and respond to various exceptions that may occur in the production environment. While you can’t plan for every exception, you should prepare enough documentation so that all the basics are covered (for example, what should the maintenance team do if a SQL Agent Job reports a failure?). While some things will fall back on the Dev team, many of the maintenance tasks can be handled by a well-trained support group.
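Escalation procedures are easier to hand over when they are written down as data rather than kept in someone's head. The sketch below shows one hypothetical way to do that. The event names, the instructions, the retry limits, and the contact names are all placeholders invented for this example; your own procedures will differ.

```python
# Hypothetical runbook: every entry here is a placeholder, not a recommendation.
RUNBOOK = {
    "sql_agent_job_failed": {
        "first_response": "Check the job history, then rerun the failed step once.",
        "max_retries": 1,
        "escalate_to": "dev-team-on-call",
    },
    "data_quality_check_failed": {
        "first_response": "Hold the downstream reports and log the failing rule.",
        "max_retries": 0,
        "escalate_to": "data-steward",
    },
}


def respond(event, retries_so_far):
    """Return the next action for the support group, or an escalation."""
    entry = RUNBOOK.get(event)
    if entry is None:
        # Undocumented exceptions always go back to the Dev team.
        return "escalate to dev-team-on-call: no documented procedure"
    if retries_so_far >= entry["max_retries"]:
        return f"escalate to {entry['escalate_to']}"
    return entry["first_response"]


if __name__ == "__main__":
    print(respond("sql_agent_job_failed", retries_so_far=0))
    print(respond("sql_agent_job_failed", retries_so_far=1))
    print(respond("disk_full", retries_so_far=0))
```

Keeping the table separate from the lookup logic means the support group can extend it without touching code, and anything missing from it falls back to the Dev team by default.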

If you’re in a small organization (like I used to be), then you are the maintenance team. You have the added responsibility of maintaining a high-priority production system while you continue to build new pieces for the next release. This can be exhausting and stressful, so be sure you automate as much of the maintenance part as you can! And be sure to allocate enough “business as usual” time in your schedule each week. Make sure your manager and/or project leader is cognizant of this added pressure and responsibility.

Back to the Start

Once you’ve passed the baton, you can go back to the beginning of the DW Lifecycle and start over. You may improve some processes (iterative development) while adding on new functionality (incremental development).

For example, in your team’s first run through the lifecycle, you implemented a single business process dimensional model (retail sales) that allowed you to also produce a series of reports for your sales department. During this first run through, you accomplished quite a bit: You installed and learned how to use the toolsets, you built the dimensional model and ETL architecture, you designed a rudimentary web portal, you created a few useful reports, and you released all the pieces to your users. So far so good.

Now, this first version is out and the business users have some feedback. An issue log and wish list are compiled (perhaps a few new reports, an updated report template, and some user-defined filters for the portal). In addition, you want to include a new business process, inventory, so you can expand the usage of the data warehouse in your business.

You and your team get back to work. The modeler begins to construct the new dimensional model while the ETL team works on the integration packages. This process takes the longest, and may be interrupted occasionally by the production team reporting some data quality issues or some failed packages.

In a separate thread, your BI application developers are busy with implementing many of the suggestions brought forward by the sales department. When finished, they release version 1.1 of the portal. Immediately, they begin working on some new inventory reports.

This scenario repeats with each cycle, and each thread lasts anywhere from 3 weeks to 3 months.

Rosy Picture?

The above scenario really paints a rosy picture of the process. In reality, it doesn’t always work smoothly. But it does work if managed correctly. And it can be quite exciting when each iteration completes and the handover goes according to plan.

I would like to talk more about this subject in future posts. For now I hope my thoughts on the matter have left you with some insight!

 


Data Quality (For the sake of the children!)

Here is a somewhat humorous example of how data quality can impact more than just your analysis or reporting. This example is good for me in particular because for so many years I have been dealing with municipal data quality, tax assessment, and billing. It alarms me how little data quality is discussed, and alarms me more when business talks about data quality because they think they need to, yet they don’t fully understand what it takes to implement and maintain good data stewardship. Anyway, perhaps a video like this can at least raise some eyebrows:
