ID Lookup Custom Transform (Part 1 of 3)
As promised, I’ll discuss in this post some of the details of my custom transform component I wrote in C# for SSIS. Unfortunately, I cannot publish the component because it contains business logic and company secrets! I will, however, talk about how I went about it and detail some of its finer and not-so-finer points.
This topic will be presented in multiple parts because of its length. In Part 1, I’ll discuss the ProvideComponentProperties method in depth. Part 2 will focus on AcquireConnections and PreExecute methods. Finally, I’ll look at ProcessInput and give some closing thoughts.
Before reading on, please have a look at my previous post to get some background on the problem.
Setup and Constraints
Looking up ID Numbers in our environment is rather complex due to the number of valid search combinations and the disparity and reliability of the source files. Because all source data comes from many different vendors and affiliates, the structure and content is often provided on a “best effort” basis. Therefore, extra care must be taken when accepting a file for import.
The current environment makes use of staging tables, which I will continue to use. Data without our ID is staged, the ID is added in a Data Flow (or if it is not found, it is flagged as unmatched), and then staged again for later processing. This made implementing the component a breeze. Essentially, I deleted every component in the original Data Flow between the Source and Destination(s) and dropped in the new component.
I also had to focus on recreating the logic in C# that existed in the package, and do so without loosing performance. Gaining a degree of maintainability was not good enough: It needed to perform better too. Thankfully, the team before me went through all the hard work of turning the “business” problem of determining an ID into an easily followed data flow in SSIS. Decomposing the pieces and turning them into C# was infinitely easier as a result.
Settin up the ProvideComponentProperties method
The ProvideComponentProperties method is called when the component is dropped on the Data Flow view. Its purpose is to initialize the component’s ComponentMetaData. The ComponentMetaData contains the custom properties, inputs, and outputs of the component.
To set this up, I’ve done the following in the ProvideComponentProperties method:
// although unlikely that this function would be called more // than once, it is a good idea to start out clean RemoveAllInputsOutputsAndCustomProperties() // define specific properties of ComponentMetaData ComponentMetaData.ValidateExternalMetadata = false; ComponentMetaData.UsesDispositions = true; ComponentMetaData.Name = "IDLookup"
ValidateExternalMetadata, when set to false, tells the component not to perform any validations against external sources at design time. UsesDispositions tells the component to include (true) or not to include (false) error output. I also added additional metadata (Description, ContactInfo, etc..) not shown in the snippet above.
// add custom properties IDTSCustomProperty90 propAllow3rdPartyLookup = ComponentMetaData.CustomPropertyCollection.New(); // set properties on the new, um, property propAllow3rdPartyLookup.Name = "Allow3rdPartyLookup"; propAllow3rdPartyLookup.Description = "Also search the external database"; propAllow3rdPartyLookup.ExpressionType = DTSCustomPropertyExpressionType.CPET_NOTIFY;
The above chunk of code adds a new property to my component called “Allow3rdPartyLookup”. This property’s value will be used later on to determine if I need to look up the ID in a different database. The ExpressionType property gives me the ability to allow (CPET_NOTIFY) or disallow (CPET_NONE) expressions for the property’s value. I’m not certain if there are any general rules-of-thumb for setting this property: I always seem to allow expressions and have never disallowed them. This provides more flexibility and allows property settings to be dynamic, both good things.
I have added an additional 6 properties (not shown here) that are used to further customize the behavior of the component.
Finally, I set up the inputs and outputs. The component will accept a single input and direct rows to one of five outputs (including the error output if needed). Here is a sample view showing the input and outputs:
The following code sets up the Input:
IDTSInput90 RuntimeInput = ComponentMetaData.InputCollection.New(); RuntimeInput.Name = "IDLookupInput"; RuntimeInput.ErrorRowDisposition = DTSRowDisposition.RD_FailComponent;
For the Input, the ErrorRowDisposition tells the component how to handle errors. In this case, I want to fail the component.
ID Lookup is synchronous, which means that it processes buffers as it gets them from the upstream component. To put it another way: Its output is processed with its input. An asynchronous component, like a Sort, sends the output at a different time than the input.
Setting up the outputs is rather straightforward. First, create a new IDTSOutput90 object and assign it a name. To be able to feed the input to the output, set the SynchronousInputID of the output to the ID of the input:
// Specify each output IDTSOutput90 RuntimeOutput = ComponentMetaData.OutputCollection.New(); RuntimeOutput.Name = "IDLookupOut"; RuntimeOutput.SynchronousInputID = RuntimeInput.ID; RuntimeOutput.ExclusionGroup = 1; // Add a new column to the Output to store the ID IDTSOutputColumn90 outputColum1 = ComponentMetaData.OutputCollection["IDLookupOut"].OutputColumnCollection.New(); outputColum1.Name = "FoundID"; outputColum1.SetDataTypeProperties(DataType.DT_STR, 12, 0, 0, 1252);
Note: The above is repeated for the remaining 3 outputs: IDAddToMasterData, IDLookupDuplicateOut, and IDLookupNoMatchOut. I’ve not included those here for brevity.
Setting ExclusionGroup to 1 on all non-error Outputs tells the object that I want to send rows — exclusively and at my discretion — to a specific output (or to more than one as you’ll see in Part 3). This is done manually using the DirectRow method in the ProcessInput method.
…
In my next post, I’ll take you through AcquireConnections and PreExecute methods. In the meantime, your questions, comments, or concerns are welcome!
Data Warehousing and Business Intelligence consultant, with expertise in business analysis, data modeling, and data integration. Extensive experience developing vertical and integrated desktop and Internet applications spanning municipal, clinical, and financial industries.
April 23rd, 2008 at 9:02 am
[…] part 1 of this series, I discussed the ProvideComponentProperties method of my custom ID Lookup component […]
April 25th, 2008 at 7:18 am
[…] part 1 of this series, I discussed the ProvideComponentProperties method of my custom ID Lookup component […]