Posts Tagged Deployment

SQL Server Deployment and Performance Tips and Advice

In line with my previous post, I wanted to share with you an instructional video I found some time ago on YouTube. Kevin Kline (author of SQL In A Nutshell) walks through several important topics and challenges in deploying and maintaining good performance for SQL Server. Although he uses SQL Server 2005, most (if not all) of this information applies to SQL Server 2008 as well.

It’s a bit long, but well worth it.

 

Tags: , ,

1 Comment

Maintaining a DW/BI Environment

In posts Version Control and Version Migration I glossed over one of the more complex and challenging aspects of data warehousing: Once the DW/BI environment is in a production environment, how do you maintain and update it?

Some Thoughts

While I do not have a blueprint for you, I do have a few thoughts on the subject.

First, you must consider the Data Warehouse as a living and breathing organism. Not only will it be growing in size as your carefully constructed ETL packages churn, but it will also likely be growing in scope and importance (if not, then you may have to re-think your DW/BI marketing approach and/or find a new sponsor).

Second, you have to realize from the beginning that deploying a data warehouse is both an iterative and incremental process. Iterative in that you will build and rebuild as you get deeper into the project; incremental in that different parts of the warehouse will be constructed and delivered at varying rates. This is in direct contrast to the normal waterfall approach to releasing software applications and systems. It is not realistic, practical, or advised to attempt to deliver a DW/BI project in one shot. You may as well use your bullet for something else!

Passing the Baton

Pass the BatonYour DW/BI team must plan up front for the often complex handoffs between development and maintenance amid the ever-turning wheel of the DW Lifecycle (detailed here).

Those handoffs are critical. If you’re in a large organization (like me now), then you will be literally handing off the maintenance of the project to a complete different group. This group will need documentation and escalation procedures to monitor and respond to various exceptions that may occur in the production environment. While you can’t plan for every exception, you should prepare enough documentation so that all the basics are covered (for example, what should the maintenance team do if a SQL Agent Job reports a failure?). While some things will fall back on the Dev team, many of the maintenance tasks can be handled by a well trained support group.

If you’re in a small organization (like I used to be in), then you are the maintenance team. You have the added responsibility of maintaining a high-priority production system while you continue to build new pieces for the next release. This can be exhausting and stressful so be sure you automate as much of the maintenance part as you can! And be sure to allocate enough “business as usual” time in your schedule each week. Make sure your manager and/or project leader is cognizant of this added pressure and responsibility.

Back to the Start

Once you’ve passed the baton, you can go back to the beginning of the DW Lifecycle and start over. You may improve some processes (iterative development) while adding on new functionality (incremental development).

DW LifecycleFor example, in your team’s first run through the lifecycle, you implemented a single business process dimensional model (retail sales) that allowed you to also produce a series of reports for your sales department. During this first run through, you accomplished quite a bit: You installed and learned how to use the toolsets, you built the dimensional model and ETL architecture, you designed a rudimentary web portal, you created a few useful reports, and you released all the pieces to your users. So far so good.

Now, this first version is out and the business users have some feedback. An issue log and wish list is compiled (perhaps a few new reports, an updated report template, and some user-defined filters for the portal). In addition, you want to also include a new business process, inventory, so you can expand the usage of the date warehouse in your business.

You and your team get back to work. The modeler begins to construct the new dimensional model while the ETL team works on the integration packages. This process takes the longest, and may be interrupted occasionally by the production team reporting some data quality issues or some failed packages.

In a separate thread, your BI application developers are busy with implementing many of the suggestions brought forward by the sales department. When finished, they release version 1.1 of the portal. Immediately, they begin working on some new inventory reports.

This scenario repeats with each cycle for each thread lasting anywhere from 3 weeks to 3 months.

Rosy Picture?

The above scenario really paints a rosy picture of the process. In reality, it doesn’t always work smoothly. But it does work if managed correctly. And it can be quite exciting when each iteration completes and the handover goes according to plan.

I would like to talk more about this subject in future posts. For now I hope my thoughts on the matter have left you with some insight!

 

Tags: , , , , ,

1 Comment

ETL Subsystem 24: Recovery and Restart System

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

As eluded to in my previous post (Backup System), bad things happen to good people. There are a ton of reasons to justify having a good backup procedure. A lot can go wrong with the hardware and software that make up your system. From network failures to bad service pack installs, a backup system is a must. But the best backup system in the world only gets you so far if you don’t have a system to recover or restart a failed process.

This is an ETL / Data Integration topic, so I’ll leave out discussion on restore capabilities found in backup tools. Of course, there may be times when restoring entire files, intermediate results, and staged data is necessary, but this is a job for your IT department. I’ll focus instead on how the ETL system should handle exceptions and failures.

At its most basic level, a recovery and restart system must allow you to stop, start, and retry your ETL processes.

man-pulling-hair-out-2.jpgSome might argue (and I have been in the room with these debaters) that it is simpler and more manageable to simply restart the jobs from the beginning and not bother with the added complexities of recovery and restart. While this might be true for simple environments or small jobs, it is certainly not true for larger, more critical projects. Restarting, for example, saves you from having to re-acquire data from your sources. This is a huge benefit, as some sources are only available to you during certain load windows. In some cases — like in my environment — some source files are overwritten each day! Other times, your source data might come from an FTP site. Downloading the same (large) data over and over takes time.

Keep in mind that your recovery and restart systems must be smart enough to clean up partial or damaged work. You’ll need the ability to back out of processes that partially finished, or even to rollback committed changes in order to ensure continued quality and integrity of your data warehouse and staging area. This is not an easy task and requires a lot of planning. Make your life easier by planning well ahead of time and try to link your recovery and restart procedures directly to the times when you write data to disk (”staging”).

SQL Server 2005 Integration Services (SSIS)

To start, use transactions in your packages. Transactions are an often underused capability of SSIS. They should be a part of your package designs so that blocks of tasks can be handled together as logical units. Transactions can be defined on the Task and container level, giving you the flexibility to nest and work with transactions in parallel. To get this working, you will need to set TransactionOption on the parent container to “Required”. This will initiate a new transaction if its parent hasn’t already done so. Child Tasks can participate by also setting TransactionOption to “Required” or “Supported”. A common technique you could employ: Use a Sequence Container to group related Tasks together, setting the Sequence Container’s TransactionOption to “Required” and all of its contained Tasks TransactionOption to “Supported”.

But transactions only get you part of the way there, and are not suited for every situation.

SSIS comes with a powerful and also underused feature called “Checkpoints”. Checkpoints allow you to restart failed packages from the point of failure, instead of rerunning the whole package again. The process is rather simple. SSIS will record information about its current execution to a special log file (this log file is overwritten each time the package is started from the beginning). When the package fails, and is subsequently rerun, the package will read the checkpoint log and attempt to pick up where it left off.

What’s particularly handy about SSIS’s implementation is that you can control checkpointing on Tasks and containers using the FailPackageOnFailure property. Set this property to True for each Task you want to participate. Furthermore, checkpointing is smart enough to recognize and handle transactions.

Hand Coding with Visual FoxPro (VFP9)

VFP supports transactions for both database tables and free tables (the latter works in VFP9 if you use the MAKETRANSACTABLE() function). This is nice, but doesn’t allow you — without some creative coding — to wrap transactions in the same way SSIS does with its containers and tasks.

You also don’t get built-in functionality to resume processes in Visual FoxPro after a failure. You might be able to use the RETRY command in combination with an error handler to fix problems as they are occurring, but for the sake of this subsystem, you need a “retry” mechanism that can span multiple sessions (i.e. restarting a process after the executable has already exited). The consequence is that you will need to design your ETL system in such a way that it can pick up where it left off, or go back to a previous step. You will need to:

  • Create “markers” in the workflow of your code (the most natural place for any marker is right after data has been staged to disk, but you could also create intermediate markers as well)
  • Log successful passes through each marker so the system will know where to pick up where it left off
  • Make each marker smart enough to clean up any partial work left over by a previous failed run

The easiest way to implement the third point above is to use transactions in combination with cursors for all temporary work. If a job fails (the lights go out) then your cursors will be lost, but you should be able to restart from the last good marker. Since the cursors didn’t do anything permanent, there is nothing to clean up. This creates a natural and easy-to-manage recovery and restart system.

The bottom line: You’ll need to be creative when implementing a recovery and restart system using VFP (or any other hand coded solution).

From Here

Next, I’ll talk about version control (Subsystem 25) and version migration (Subsystem 26). Both of these subsystems deal with deployment and have tripped up many teams (I’ve been a part of such teams and have hopefully learned from the mistakes).

Tags: , , , , , ,

No Comments

Busy Stretch

Man I’ve been busy! Not only has my posting been limited in the past 3 weeks, but almost all of my extra-curricular activities have been put on hold as well (it feels like months ago I last played my guitar). There are a few things going on currently. Topmost on the list is the rollout of the first real significant upgrade to our DW/BI Project here at ING. When I mull over the “feature” list, it amazes me how our team of a few developers and analysts (all quant’s), and 2 contractors (an SSIS developer and a QA Manager) got it done. What’s more, it looks like we’ll be right on schedule with the release with a few days to spare. Of course, I hope now I haven’t jinxed us!

I joined this project in mid-stride back in January when I was hired as an external consultant. I was coming over fresh from the United States, to a new and exciting land, Belgium (where the waffles are hot and the chocolate is worth dying for) and a new industry, financials. After a year and a half in healthcare at Cardinal Health, I felt I was ready to enter the financial world — particularly investment management. And so I joined the quant team in the quantitative research department at ING Investment Management.

I was hired as a BI/DW consultant (functioning primarily as an Integration Architect) and was immediately charged with supporting the in-place data integration (ETL) processes for the then un-released DW/BI system. I began my work by documenting (some functional, bust mostly technical) the entire system. In a few weeks I had uncovered many issues and offered a long list of improvements to the team. Many items on my list were directly related to many of the ETL Subsystems that I’ve been discussing. Taking these to heart, we began the long process of refactoring, renaming, and redeveloping some of the processes that were in place. The focus was on future maintainability and performance. I’ve chronicled some of these in a few of my recent posts.

In addition to this, I slowly took over the project management duties and have now been managing the development activities for the entire team. By mid-summer, and due in part to several converging circumstances, I was hired as a full time member of the quant team. Now, my stake in this project’s success is even greater!

I expect to be just as busy over the next two weeks as the final touches are added to this release of the project. I’ve also started to compile a list of “lessons learned”, most of which will be relevant-enough to share with you all. I think that some of these might surprise you!

This week I’ll continue my posts on the ETL Subsystems. I’ll also likely start working on another article for Advisor (topic yet unknown). Lastly, planning for the next release of our DW/BI system is already underway, with development starting as early as November!

PS: If you ever find yourself feeling a bit overwhelmed or too engaged in your work, I highly recommend re-reading Walden (ahh, come on — everyone’s read Walden, right?!?). It has always managed to level me, reminding me how important it is to just sit back and relax once in a while.

Tags:

5 Comments

ETL Subsystem 21: Data Propagation Manager

This article is part of a series discussing the Kimball Group’s “34 Subsystems of ETL“. The Subsystems are a group of “Best Practices” for delivering a BI/DW solution. In my articles, I discuss how each Subsystem can be implemented in SSIS or hand coded in Visual FoxPro.

It will often become necessary for you to move data from the data warehouse to other applications, databases, and servers. This is a little different then scaling out your data warehouse to support multiple servers (an integration server, relational database server, reporting server(s), analytic server(s), etc.); typically, distributing data in this way is done via replication techniques. But this subsystem addresses another important need.

I usually refer to this need as “Data Provisioning“. Provisioning is a term borrowed from the telecommunications industry meant to describe the process of providing various telecom services to a customer. A data provision is simply a service that provides data to some client. The client can be an Excel sheet, XML file, application, operational system, or perhaps a master (MDM) database.

This subsystem is not the only place that you’ll be provisioning data to various client applications. BI Applications often allow various extracts and downloads that can be made available to the same list of clients. This usually comes by way of a downloadable report or extract. However, the data propagation subsystem provides a few additional benefits above what BI Applications can usually deliver. First, as an ETL process, you can utilize the full power of the ETL system to perform additional transformations on the data before delivery. Second, propagating data can be part of the ETL pipeline so that provisions can be made available as soon as the dimensional models are loaded (highly desirable if the provision is meant to update an operational or master data system).

Consider though that as an ETL function, these data provisions are best suited for well-defined, almost-never-changing requests for data. You would not use this subsystem to provide data in an ad hoc way to an analyst! You would, however, use this subsystem to provide a daily extract to your risk department for compliance reasons.

SQL Server 2005 Integration Services (SSIS)

The only challenge that you’re bound to face is providing the data in the various formats required by the client. I’ve found that Excel is particularly tricky to work with and as a consequence, I usually push my clients to accept CSV or Tab delimited files instead!

Hand Coding with Visual FoxPro (VFP9)

You’ll have no problems propagating data with FoxPro. In many cases, you can use SQL to get the data you need and simply write it to a table or file on disk. The COPY TO command supports different formats and it isn’t difficult to create outputs in other formats such as XML. Using the same techniques you would employ to connect to various servers (using ODBC drivers for example), you can push data into almost any environment. This is a strength of FoxPro that sets it apart from other ETL scripting languages.

From Here

There’s no real magic to propagating data. It’s an ETL function with the typical extract, transform, and load operations. You can treat it as part of your current ETL system; perhaps running provisions after your facts are loaded. Alternatively, provisions can be run on a schedule independent of normal ETL processing or can be run on demand as needed.

This post wraps up the “Data Delivery” function in the ETL environment. In my next post, I’ll move on to “ETL Management” functions including scheduling, backup, security and metadata.

Tags: , , , , , ,

1 Comment

Getting Busy with MSBuild :: Deploying Heavy Projects

For some time now I have been using MSBuild exclusively for deploying data warehouse applications. Because the deployment process is often time-consuming and can touch many components and elements of a data warehouse, using MSBuild is a smart choice.

Consider an integration project that contains a dozen SSIS (dts) packages, some reports (rdl files), a few C# assemblies, an ASP.NET interface, schema changes, new database functions, new user roles, expanded constraints, and updated stored procedures. Now expand this deployment across development, quality, and production environments. Introduce deltas (incremental updates of varying components) and you can begin to understand why some companies pay two or more people just to deploy applications.

But not here. MSBuild (and other similar build/deploy projects like Apache ANT or GNU MAKE) is an excellent solution to the problem. With some planning up front, and a little… ehem… trial and error, you can simplify the entire deployment process so much it will make your head spin.

MSBuild not only builds .NET projects (like C#.NET and VB.NET), but it can also run any number of additional “tasks”, such as:

  1. Create directories and log files
  2. Do a VSS Get by Label
  3. Compile a FoxPro project (see Rick Borup’s article in Advisor: “Automate Microsoft Visual FoxPro“)
  4. Compare directory contents
  5. Upload SSIS packages
  6. Run test scripts
  7. Execute stored procedures
  8. Etc…

I think you get the idea. There is not a single task MSBuild cannot wrap (at least nothing that I’ve come across). And with its ability to handle dependencies, talk with source control, and run homegrown assemblies, the limit is only limited to your needs and imagination.

I am starting to sound like a commercial, I know.

But the point is simple: The deployment of a large or ‘heavy’ project needs special attention, planning, and resource allocation. Using MSBuild (or ANT, or MAKE, etc.) can alleviate the burden on developers and DBAs and provide consistent and predictable build and deployment scenarios.

I will share some additional insights and lessons learned on using MSBuild over the next few weeks, especially details on how to deploy multi-pronged projects for different environments. I will also try to share some tips on how to handle delta deployment scenarios. In the meantime, if you are new to MSBuild, take a look at the MSBuild Team Blog and the MSBuild Wiki.

As a side note, I have never used MSBuild for deploying VFP applications in the past. This is all about to change…

Tags: , , , ,

7 Comments