Posts Tagged MDM

Scoping Data Warehouse Initiatives

Data warehousing is a complex operation. From start to finish (if there is a finish), project teams are faced with many challenges. In all phases of the lifecycle, there are opportunities for derailment. The best way to mitigate potential issues and stay on time and within budget is to carefully define and manage scope. Managing scope can be an ongoing struggle (especially if requirements are not clearly defined or justified). While this is really a PM 101 type of topic, I feel there are some fine points in a DW/BI environment that are not mentioned enough.

Consider the following:

Programs versus projects

I won’t get into a deep PM discussion here, but it is important to point out that data warehousing (or business intelligence, master data management, etc.) initiatives should be thought of as programs and not projects. This mindset will help in scoping.

A program (which might also be called a “project portfolio” in some circles) is basically just a set of related projects. With a program, the emphasis is on organizing, prioritizing, and allocating resources to the right projects. Program scope is more strategic, and answers long-term questions about what type of value the organization hopes to achieve from the initiative.

A project, on the other hand, is much more specific — with a set number of deliverables and goals that have a high immediate impact. The scope at the project level is therefore more tactical in nature: high impact, fast delivery. Be aware that some projects may never be given the green light (for example, if there is a low business impact or if there is a low feasibility rating because of data source or data quality complications).

What I find odd is that organizations still choose to tackle immense data warehousing initiatives in one or two shots, trying to deliver everything at once over a period of 18 or more months. This is the wrong approach (here’s why). Break this large initiative into individual projects and try to deliver functionality every 6 to 8 weeks.

The business process

The best way to break down data warehousing programs into high-impact projects is along business process lines. A business process, as defined here, is:

The complete response that a business makes to an event. A business process entails the execution of a sequence of one or more process steps. It has a clearly defined deliverable or outcome. A Business Process is defined by the business event that triggers the process, the inputs and outputs, all the operational steps required to produce the output, the sequential relationship between the process steps, the business decisions that are part of the event response, and the flow of material and/or information between process steps.

Some examples of the above: inventory tracking, Internet sales, retail sales, marketing, tax assessment, tax collection, pitching, batting.

In any data warehousing environment, you can expect to have several business processes to model. Each business process you tackle will have elements touching upon different aspects of the data warehouse, including infrastructure, middleware, data modeling, ETL, business logic development, presentation elements, and so on. If you scope each project to the business process, you can deliver complete solutions in the shortest amount of time. (It should be obvious that the very first business process you implement will take the longest, as the team works out the core infrastructure. Most of this infrastructure will be reused by other business processes.)

Avoid scoping to a data source

Do not fall into the trap of scoping to a data source. Scoping to a data source is almost guaranteed to deliver mediocre outcomes. These projects typically include many unfinished or inadequate business processes all delivered at once some time in the distant future and long after the excitement over the initiative has subsided.

While it is true that only one or two data sources might exist in some organizations, it is not true that inventory, customers, sales, procurement, shipping, and other business processes need to be taken on at once. Create a single project for each business process, prioritize based on impact and feasibility, and then badabing badaboom, you deliver. Next.
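The prioritization step can be sketched in a few lines of Python. This is only an illustration: the `CandidateProject` class, the 1-to-5 scoring scale, the cutoff, and the scores themselves are all assumptions made up for the example, and multiplying impact by feasibility is just one simple way to rank.

```python
from dataclasses import dataclass


@dataclass
class CandidateProject:
    """One business process proposed as a data warehouse project."""
    business_process: str
    impact: int       # hypothetical 1-5 score: business value if delivered
    feasibility: int  # hypothetical 1-5 score: data availability and quality


def prioritize(candidates, minimum_score=6):
    """Rank candidates by impact x feasibility; drop those below the cutoff."""
    viable = [c for c in candidates if c.impact * c.feasibility >= minimum_score]
    return sorted(viable, key=lambda c: c.impact * c.feasibility, reverse=True)


# Made-up scores for illustration only.
backlog = [
    CandidateProject("retail sales", impact=5, feasibility=4),
    CandidateProject("inventory tracking", impact=4, feasibility=2),
    CandidateProject("procurement", impact=2, feasibility=2),
    CandidateProject("shipping", impact=3, feasibility=5),
]

for project in prioritize(backlog):
    print(project.business_process, project.impact * project.feasibility)
```

With these made-up numbers, procurement falls below the cutoff and never gets the green light, while retail sales becomes the first project to deliver.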

Along the same lines, do not adjust your scope if the data source is unavailable, uncooperative, or lacking in quality. Instead, bring the fight to the data source (here is where a good, preferably C-level, business sponsor can come in handy) and set things right. This is obviously a project risk, and also an organizational risk. If you are having problems extracting inventory data, then maybe it’s time to put down your data warehousing gloves and get a new inventory system.

Last thoughts

Scoping the data warehouse is a difficult problem. Troubles start early with the initial idea, continue through requirements gathering, and carry into the development phase of the lifecycle. There is not a lot of good advice in this area for data warehousing (if you happen to know of a good source, please send me a link or title). But I do find that if you work towards business processes, think in terms of programs and projects, and avoid the data source trap, scoping decisions will settle into the real needs of the business.


MDM is a Capability, Not a Product

I had bookmarked, and finally just read, an article by Loraine Lawson of IT Business Edge titled “Consultant: Master Data Management Can Pay off During M&As” which referred to this blog post from Evan Levy, “MDM and M&A“.

MDM is an interesting topic, and one that has a lot of relevance in my work environment. M&As are also interesting and can have a huge impact on a great many people. But while reading these articles, I was reminded of an important MDM axiom.

Evan writes:

MDM provides a company the capability to link the data content from disparate systems within and across companies.

Remember that MDM is a capability and not a technology. You cannot buy MDM, but you can build an MDM strategy. This strategy will likely cross several technologies and platforms. It may consist of data warehousing elements, SOA, and SaaS applications. It will surely consist of certain disciplines such as data governance, data quality management, and data integration.

Vendors will continue to push their MDM solutions, but be careful not to trap yourself into thinking that the job is done once you’ve installed one. Vendors can wrap most of the technologies necessary for MDM into a single package, but they cannot provide you with a strategy or the personnel to make it work for your organization.

MDM is a capability you create, and not a product you can buy.


ETL Subsystem 21: Data Propagation Manager

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

It will often become necessary for you to move data from the data warehouse to other applications, databases, and servers. This is a little different than scaling out your data warehouse to support multiple servers (an integration server, relational database server, reporting server(s), analytic server(s), etc.); typically, distributing data in this way is done via replication techniques. But this subsystem addresses another important need.

I usually refer to this need as “Data Provisioning“. Provisioning is a term borrowed from the telecommunications industry meant to describe the process of providing various telecom services to a customer. A data provision is simply a service that provides data to some client. The client can be an Excel sheet, XML file, application, operational system, or perhaps a master (MDM) database.

This subsystem is not the only place that you’ll be provisioning data to various client applications. BI Applications often allow various extracts and downloads that can be made available to the same list of clients. This usually comes by way of a downloadable report or extract. However, the data propagation subsystem provides a few additional benefits above what BI Applications can usually deliver. First, as an ETL process, you can utilize the full power of the ETL system to perform additional transformations on the data before delivery. Second, propagating data can be part of the ETL pipeline so that provisions can be made available as soon as the dimensional models are loaded (highly desirable if the provision is meant to update an operational or master data system).

Consider though that as an ETL function, these data provisions are best suited for well-defined, almost-never-changing requests for data. You would not use this subsystem to provide data in an ad hoc way to an analyst! You would, however, use this subsystem to provide a daily extract to your risk department for compliance reasons.

SQL Server 2005 Integration Services (SSIS)

The only challenge that you’re bound to face is providing the data in the various formats required by the client. I’ve found that Excel is particularly tricky to work with and as a consequence, I usually push my clients to accept CSV or Tab delimited files instead!

Hand Coding with Visual FoxPro (VFP9)

You’ll have no problems propagating data with FoxPro. In many cases, you can use SQL to get the data you need and simply write it to a table or file on disk. The COPY TO command supports different formats and it isn’t difficult to create outputs in other formats such as XML. Using the same techniques you would employ to connect to various servers (using ODBC drivers for example), you can push data into almost any environment. This is a strength of FoxPro that sets it apart from other ETL scripting languages.

From Here

There’s no real magic to propagating data. It’s an ETL function with the typical extract, transform, and load operations. You can treat it as part of your current ETL system; perhaps running provisions after your facts are loaded. Alternatively, provisions can be run on a schedule independent of normal ETL processing or can be run on demand as needed.
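To show how little magic is involved, here is a minimal sketch of a provision in Python rather than SSIS or FoxPro. Everything in it is an assumption for illustration: the `run_provision` helper, the `fact_trades` table, the sample rows, and the in-memory SQLite database standing in for the warehouse.

```python
import csv
import io
import sqlite3


def run_provision(conn, query, out_stream, delimiter=","):
    """Extract rows with SQL and write them, header first, as delimited text."""
    cursor = conn.execute(query)
    writer = csv.writer(out_stream, delimiter=delimiter, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    row_count = 0
    for row in cursor:
        writer.writerow(row)
        row_count += 1
    return row_count


# Stand-in for the warehouse: an in-memory SQLite database with one fact table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_trades (trade_date TEXT, desk TEXT, exposure REAL)"
)
warehouse.executemany(
    "INSERT INTO fact_trades VALUES (?, ?, ?)",
    [
        ("2008-03-03", "rates", 125000.0),
        ("2008-03-03", "credit", 80000.0),
        ("2008-03-04", "rates", 99000.0),
    ],
)

# The "transform" here is a simple aggregation done in the extract query.
daily_extract = io.StringIO()
rows_written = run_provision(
    warehouse,
    "SELECT trade_date, SUM(exposure) AS total_exposure "
    "FROM fact_trades GROUP BY trade_date ORDER BY trade_date",
    daily_extract,
)
print(daily_extract.getvalue())
```

Passing `delimiter="\t"` would produce a tab-delimited file instead, and swapping the `StringIO` for an open file handle writes the extract to disk. Calling the function at the end of the fact load, from a scheduler, or on demand covers the three ways of running a provision.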

This post wraps up the “Data Delivery” function in the ETL environment. In my next post, I’ll move on to “ETL Management” functions including scheduling, backup, security and metadata.


Business Processes and the Integrated Enterprise

It’s time to think about business processes.

In a recent post, I defined a business process as “the complete response that a business makes to an event”. Because this is such an important topic for data warehousing, I thought I’d share some additional thoughts.

Business processes include such activities as accounts receivable, orders, sales, and inventory management. Each process has a specific event (or goal) that defines the process and in many cases allows us to gauge the health of that process. For example, an order is an event within the orders business process. Inventory movement is an event within the inventory management process. And so on.

For a few years now, there has been a significant push, mainly by service-oriented architecture (SOA) and data warehousing architects, to get businesses to think more about business processes and not about departments, applications, and technologies. Traditionally, most organizations have structured IT around specific software purchases and departmental needs. Integrating these disparate systems later becomes a significant challenge for business intelligence, performance management, and master data initiatives.

James Gibson, in his research piece “A Research Strategy for Investigating Business Process Management Approaches”, wrote that it’s time to start thinking about process and process processing rather than data and data processing (I had to read that more than once too!). The key is that the business process — which is tied to a specific event — is a driver that can lead all other initiatives along. Actionable insights (typically what you hope to derive from your Business Intelligence and Performance Management initiatives) are only useful if they’re tied to a process that can be improved.

Thinking more about business processes, and developing architectures to support them, leads to a more integrated enterprise

Data Warehousing with dimensional modeling is solely focused on the business process. In fact, you cannot develop a true dimensional model without modeling it around some business event. And it should be clear that a single business event can span multiple source systems and departments. The dimensional model pulls all this together.
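As a rough sketch of what modeling around a business event looks like, the example below builds a tiny star schema for the order event in an in-memory SQLite database. The table names, columns, and sample rows are hypothetical; a real model would carry many more attributes and would be loaded by ETL from several source systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);
    CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, calendar_date TEXT);
    -- One row per order line: the business event being measured.
    CREATE TABLE fact_orders (
        date_key     INTEGER REFERENCES dim_date (date_key),
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        product_key  INTEGER REFERENCES dim_product (product_key),
        quantity     INTEGER,
        amount       REAL
    );
""")

# Made-up rows; the customers might come from a CRM and the products from an ERP.
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(20080301, "2008-03-01")])
conn.executemany(
    "INSERT INTO fact_orders VALUES (?, ?, ?, ?, ?)",
    [
        (20080301, 1, 1, 10, 100.0),
        (20080301, 1, 2, 1, 25.0),
        (20080301, 2, 1, 4, 40.0),
    ],
)

# The event (an order line) is sliced by any dimension that describes it.
sales_by_customer = conn.execute(
    "SELECT c.customer_name, SUM(f.amount) "
    "FROM fact_orders f "
    "JOIN dim_customer c ON c.customer_key = f.customer_key "
    "GROUP BY c.customer_name ORDER BY c.customer_name"
).fetchall()
print(sales_by_customer)
```

The fact table only makes sense because there is an event to record; the dimensions are simply the context in which that event happened.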

On the transactional and operational side of the fence, SOA is the right approach to take. Essentially, SOA provides a standard way to access myriad resources across a network through RPC, Web Services, and APIs (among other techniques). One application can communicate with another in real-time.

Developing an SOA and a Data Warehouse one-process-at-a-time is smart. I will talk more about this in a future posting, but the idea is simple: start with a single business process that will make the most impact and is most feasible. Then, in an iterative way, expand into additional processes. This allows development to quickly turn over key functionality while leaving room to resolve business process volatility issues and political ramblings. If you are lucky enough to be starting both data warehousing and SOA programs simultaneously, it makes most sense for the same business process to be the subject for both!

Master Data Management is about data governance and forms a core part of the integrated enterprise. Through SOA, applications can access master copies of shared entities, such as Customer and Product. Master data might be derived partly from a data warehouse using ETL and partly by operational applications in a transactional environment through SOA. When it is time to embark on an MDM initiative, it makes a lot of sense to start thinking about business processes, conformed dimensions, and how to maintain this critical data.

So imagine for a moment an enterprise with dozens of departments all using different tools and software solutions to manage their day-to-day operations. Through SOA, these applications can all talk to each other so that when a customer checks on an order, the clerk can also see who took the order, where the product currently is in transit, the customer’s order history and much more. The data is not integrated, but the processes are. At the end of the day, when the regional salespeople need their numbers, the data warehouse — which has integrated data arranged around various business events — provides the results quickly giving all subscribers a complete and integrated view of all relevant business processes.

Adopt architectures that facilitate the business’s natural orientation towards the business process. Business Intelligence, Performance Management, Business Process Reengineering, and Master Data Management initiatives will benefit tremendously. I’ve been saddled with the department-oriented mentality of business for too long. Better IT/Business alignment in this area will create more opportunities for defining clear business processes, which in turn will lead to a better integrated organization.


A Data Warehouser’s Vocabulary (Part 1)

Partly inspired by a post entitled “The most important thing I know about Analytics is that no-one agrees what it means” by James Taylor and partly inspired by the section “Slowly Changing Vocabulary” in the book “Data Warehouse Lifecycle Toolkit 2nd Edition“, I have decided to compile a glossary of terms and concepts that I feel have some relevance to the data warehousing and business intelligence world. I’ll break this list into several postings, and I reserve the right to refine, enhance, clarify, and augment a definition at any time! When finished, I’ll make them a permanent feature of TmF.

With this list, I am not attempting to resolve any debates, nor am I attempting to invalidate or discredit a definition you may be using. These are the definitions I use. Also be aware that certain terms might hold different meanings under different contexts. If I need to use one of those ambiguous terms, I try my best to put a good context around it. For example, when I refer to “Data Mart”, I specifically mean “Atomic Business Process Dimensional Model”. However, there are times when what I mean is to describe a separate (perhaps normalized) database for a specific user or department (i.e. a throw-away sandbox for the big kids).

Each of these definitions has a citation; I am using the XHTML “cite” tag with each. If you would like to see the source, view the source! Also, when I finish this list, and put these all together on a single page, I’ll be sure to include a reference link section as well.

So, without further ado, I give you the first group of many to come (A-Z):

Business Intelligence (BI)
A generic term to describe leveraging the organization’s internal and external information assets for making better business decisions.
Business Process
The complete response that a business makes to an event. A business process entails the execution of a sequence of one or more process steps. It has a clearly defined deliverable or outcome. A Business Process is defined by the business event that triggers the process, the inputs and outputs, all the operational steps required to produce the output, the sequential relationship between the process steps, the business decisions that are part of the event response, and the flow of material and/or information between process steps.
Changed Data Capture (CDC)
Changed Data Capture (CDC) is a method of identifying changes made to a source database or file for the purposes of integrating the data into the data warehousing pipeline. CDC reduces data volume and processing needed for the data warehouse.
Data Mart
A business process dimensional model.
Data Profiling
Data profiling is a method of assessing source data in a systematic and analytical way. The goal of data profiling is to build an exhaustive inventory detailing the content, context, and quality of source data. It entails much more than reviewing a diagram or running a few SQL statements. Data profiling leads to better data integration, which leads to better data quality.
Data Quality
Assurances that the integrated data is consistent, complete, and fit to publish to the business community.
Data Warehouse Database
The largest possible union of queryable presentation data in a DW/BI System.
ETL
A set of processes that prepare source data for a Data Warehouse, adding value and confidence along the way. These processes include extraction, transformation (clean and conform), and load operations. Note that the order in which ETL processes occur can vary based on the situation. Some sources refer to the ET or just the E broadly as “Data Acquisition”.
Master Data Management (MDM)
Centralized facilities designed to hold master copies of shared entities, such as Customer and Product.
Metadata
All the information that defines and describes the structures, operations, and contents of a BI/DW system.
Operational Data Store (ODS)
A physical set of tables sitting between the operational systems and the data warehouse, or a specially administered hot partition of the data warehouse itself. The main purpose of an ODS is to provide immediate reporting of operational results if neither the operational system nor the data warehouse can provide satisfactory access.
Staging
Physical workspace for data during the ETL process. Some data is temporarily staged, while other data may persist.
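To make the Changed Data Capture entry above a little more concrete, here is a minimal Python sketch of one way to do it: comparing yesterday’s snapshot of a source table with today’s. Snapshot comparison is only one CDC technique (timestamp columns and database logs are others), and the function names and sample rows are assumptions for illustration.

```python
import hashlib


def row_hash(row):
    """Hash a row's column values so a change in any of them is detectable."""
    payload = "|".join(f"{name}={row[name]}" for name in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key; return inserts, updates, deletes."""
    inserts = [key for key in current if key not in previous]
    deletes = [key for key in previous if key not in current]
    updates = [
        key for key in current
        if key in previous and row_hash(current[key]) != row_hash(previous[key])
    ]
    return inserts, updates, deletes


# Made-up customer rows, keyed by the source system's primary key.
yesterday = {
    101: {"name": "Acme", "city": "Boston"},
    102: {"name": "Globex", "city": "Albany"},
    103: {"name": "Initech", "city": "Austin"},
}
today = {
    101: {"name": "Acme", "city": "Boston"},
    102: {"name": "Globex", "city": "Buffalo"},
    104: {"name": "Hooli", "city": "Denver"},
}

inserts, updates, deletes = capture_changes(yesterday, today)
print(inserts, updates, deletes)
```

Only the three changed keys move on through the ETL pipeline; the unchanged row for 101 is skipped, which is where the reduction in volume and processing comes from.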


Formula 409: Private Companies Must Comply with SOX

I’ve been doing a lot of research on Sarbanes-Oxley (SOX) compliance lately in part because I am now working in the financial industry and in part because I am preparing an article on the topic for Advisor Media.

SOX compliance is both complex and vague. There is no official compliance checklist, only various guidelines and advice from agencies, accountants, and vendors. Businesses are left to implement control frameworks, introduce new segregation of duties, add auditing and logging to existing systems, and rely on the advice and expertise of consultants and vendors who promise to deliver various solutions.

And if there is a misstep, the CEO could go to jail.

Section 409

One area I don’t hear a lot of discussion about from the IT world is the implications of Section 409. Not to say that there is no discussion, but that the vast majority of IT articles on SOX compliance focus on Sections 302 and 404. The reality is that Section 409 doesn’t easily translate to any specific IT implementation or control structure.

But it certainly has significant implications for a public company’s IT/R&D department. Here is the text of the Sarbanes-Oxley Act, Section 409:

Section 13 of the Securities Exchange Act of 1934 (15 U.S.C. 78m), as amended by this Act, is amended by adding at the end the following:

“(l) REAL TIME ISSUER DISCLOSURES. - Each issuer reporting under section 13(a) or 15(d) shall disclose to the public on a rapid and current basis such additional information concerning material changes in the financial condition or operations of the issuer, in plain English, which may include trend and qualitative information and graphic presentations, as the Commission determines, by rule, is necessary or useful for the protection of investors and in the public interest.”.

Basically, a public company must disclose material change events that would impact their financial condition or operations. And Big Brother wants pictures!

As an investor, this is great news; for the sake of innovation though, not so much.

Material changes

What is a material change? No clue. Well, I do have some clue, but there is no official definition of a material change in relation to Section 409 compliance. The only requirement seems to be that it is any change that impacts a company’s finances or operations. I suppose outsourcing a project to IBM, laying off a few dozen employees, or significantly cutting supplier costs all apply. Any change in an organization that could change profitability is a candidate. This includes a failed research and development project.

Yes, a failed R&D project.

Innovation takes a hit

The prospect of reporting failure likely makes CEOs a bit weak in the knees. Competitors will sniff the SOX box to find out what their rivals are doing — or not doing, for that matter. This in turn will force public companies to think twice about taking R&D risks. If you like innovation and continuous improvement, this doesn’t bode well.

As a result (directly or indirectly), we’ve seen a flurry of big-time acquisitions. Instead of developing new technologies in-house, companies are more inclined than ever to acquire them from smaller companies. To restate: the prospect of a failed innovative R&D project is forcing large companies to purchase private companies with proven ideas and technologies.

One of many examples

Take Microsoft’s acquisition of Stratature, an MDM vendor, last year. Stratature was recognized as the fastest growing private company in the Southeast in both 2005 and 2006. Microsoft bought them in 2007. Certainly Microsoft could have developed their own MDM solution. Right?

It is my feeling that the purchase had to do in part with Section 409. Microsoft could have started R&D on their own MDM solution. But MDM is complex and evolving. There is no one clear solution. If Microsoft embarked on this path, there would have been a chance they would have failed. Stratature was already a big success. The price was high, but worth it.

Opportunities for the rest of us

It is clear that Section 409 presents an interesting opportunity to small, private companies. If you invent an idea and grow and market it, it is more likely today than ever before that a larger company would seek to acquire you. Larger companies don’t want to take the risk of exposing themselves (and their failed project initiatives) under the “material event” clause of SOX. Besides, larger companies buy up smaller companies anyway: it is good business and often fits their strategic interests. Section 409 merely gives them an additional reason to do so.

Therefore, SOX compliance for all

Now you have a great product, and you have some interest from a larger public company looking to acquire you. But you have no internal control structures in place, no financial audit trail, and your IT department has broad access to all of your data. Because of this, the purchasing company will need to do a lot of work getting your business in shape for public life.

Not only that, but partnering with a public company may force you into compliance as well.

Lastly, your valuation will be higher if you comply with SOX (check out the Aberdeen Group’s “SOX Compliance and Automation: A Benchmark Report”, which can be downloaded from the Compliance Library at Ultimate Software). Private companies that comply with SOX, especially Sections 302 and 404, operate better, are trusted, and are more attractive to potential buyers.

Unless you have no plans to be acquired by, or to partner with, a public company, it seems foolish not to start the process of meeting the requirements of SOX, especially if you are an innovative company doing one or more progressive research projects.


What is MDM Anyway?

What is master data and what is the master data domain? What does it cover? Where is the business value in MDM? Is MDM a data warehousing function? How can business users be sold on the MDM investment?

While the industry dukes it out over answering these questions (you can’t get two experts or vendors to give you the same answer to any of those questions), I thought I’d share my thoughts. I admit that my MDM experience to this point is minimal, having only been involved in one official MDM project (and only at the planning and data modeling level). But nevertheless, I do have some of my own ideas.

The bottom line, though, is that MDM is important and should not be overlooked when planning for new business intelligence or enterprise integration initiatives.

What MDM is

Master Data Management, an information activity, is the process of ensuring that an organization’s data (including its metadata) is consistent, reliable, accessible, distinctive and well defined. This master data, once it meets these requirements, can then be used by the enterprise as a system of record for key business entities and attributes.

There are different types of MDM: Analytical (A-MDM), Operational (O-MDM), and Enterprise (E-MDM). I see it like this: A-MDM is related to data warehousing and would therefore be implemented solely as a complement to (or a reaction from) a data warehouse/business intelligence project (I don’t agree that simply having conformed dimensions constitutes “having” A-MDM, though). O-MDM is about collecting, managing, and redistributing master data to be used by operational systems (which is largely an effort in synchronization). E-MDM is the Holy Grail, and would be a combination of both O-MDM and A-MDM. E-MDM reminds me of Enterprise Data Modeling (EDM), in that you truly need a 360-degree view of the business to make it work (READ: get business and IT together for tea).

Data governance and stewardship, as well as data quality management, data integration, and service-oriented architecture (SOA) are all functions and processes related in some way to Master Data.
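To illustrate the "collecting" half of O-MDM, here is a minimal Python sketch that merges customer records from two systems into one master record each, keeping a cross-reference of every system’s local identifier. The matching rule (same e-mail address), the first-source-wins handling of the name, and all of the names and data are assumptions for the example; real matching and survivorship rules come out of data governance, not out of code like this.

```python
def match_key(record):
    """Hypothetical matching rule: same e-mail address, ignoring case and spaces."""
    return record["email"].strip().lower()


def build_master(sources):
    """Merge records from several systems into one master record per match key."""
    master = {}
    for system_name, records in sources.items():
        for record in records:
            # First source to supply a record wins the name (an assumed rule).
            entry = master.setdefault(
                match_key(record), {"name": record["name"], "source_ids": {}}
            )
            # Cross-reference: remember each system's local identifier.
            entry["source_ids"][system_name] = record["id"]
    return master


# Made-up records from two hypothetical operational systems.
sources = {
    "crm": [{"id": "C-17", "name": "Ann Lee", "email": "Ann.Lee@example.com"}],
    "billing": [
        {"id": 9001, "name": "A. Lee", "email": " ann.lee@example.com"},
        {"id": 9002, "name": "Bo Chen", "email": "bo.chen@example.com"},
    ],
}

master = build_master(sources)
print(len(master))
print(master["ann.lee@example.com"]["source_ids"])
```

The cross-reference is what makes redistribution possible: each operational system can be handed the master copy using its own identifier.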

What MDM is Not

MDM is not a technology. It is a business function. You are either managing master data, or you are not. It is unlikely that any single approach or software technology will solve the Master Data problem. It drives me nuts to hear about how vendors sell MDM (and BI, for that matter), but I digress…

The work of developing and maintaining master data is not a data warehousing function. To restate: Data warehouses are not developed to create master data (even A-MDM). Some will disagree with me on this. In my opinion/observation, MDM is a separate, highly important activity that works in conjunction with the data warehouse and all other applications and processes that exist in the enterprise.

Other Thoughts

I think that a common mistake made when planning for and implementing MDM is that there is not enough emphasis on the data quality gains that will be realized. In addition, the data integration process for data warehousing projects will be infinitely easier, as the complex work of conforming dimensions and attributes will already be complete (at least in definition, with plenty of good metadata to use as a reference).

Master data, in many ways, provides a new way of looking at data as an asset with tangible, strategic value.

I’ll post more on this topic as time rolls on. As always, thoughts, questions, and criticisms welcome!
