
The Three Faces of a Good ETLer

Hiring a “data integration expert” or consultant for your next, greatest, data warehousing project? Don’t take it lightly. ETL personnel are critical to the success or failure of your project.

The following are what I deem to be essential technology-related aspects, or faces, of a good ETL developer and/or architect (herein referred to as an ETLer for lack of creativity). While you need to consider business and industry knowledge, personality, and experience in your team-building process, you should start by checking off the following on your interview sheet:

First Face: the technologist

Programming must come naturally to an ETLer. Objects, logical constructs, expression construction, program flow, and the like must be well understood. The truth is that no matter how much your vendor proclaims that their tool does it all, chances are excellent that some hand coding will be required. On top of that, ETL tools work a lot like procedural programs. Technologists are very good at putting their best foot forward, and will generally think of ways to make the ETL flow perform better. They also think about logging, auditing, and exception handling, all of which are important.

Second Face: the theorist

But a solid programming background is not enough. Knowledge of data integration theory and best practices is equally important. While I believe in and use Kimball’s methodologies for integrating data into a dimensional data warehouse, other methodologies exist that may be more suitable to your business and integration needs. Following a proven methodology, with slight modifications to suit your environment, will get you further, faster. Having little or no theory behind what you’re doing gets you somewhere, slower. Identify your methodology, and then find someone who understands it.

Third Face: the specialist

Knowing the ins and outs of your ETL tool (SSIS, OWB, DataStage, Talend Open Studio, etc.) is essential. I would venture to guess that a solid programmer who has a great understanding of ETL theory will be able to get by using most tools with little learning curve. What I worry about (and you should too) are the nuances in the tooling that can stump even the best. These nuances can cost you many project hours and force rewrites if blocking issues are encountered. (SSIS, my tool of *ahem* choice, has plenty of them; sorry, I needed to clear my throat.) Tool knowledge also tells you when it is appropriate to forgo the tool: because of I/O issues, because hierarchical data is better handled elsewhere, or because business logic is best not bundled within a data flow.

About Face

While junior members of your data integration team can be one- or two-faced (that came out funny), senior members and architects must have more meat on the bone.

I suppose this is why good ETLers are difficult to come by. The ETLer needs to have a healthy mix of programming talent, methodological discipline, and tool knowledge. Trained DBAs and software developers might have a lot to offer, as might a troop of certified tool jocks and method junkies, but to get your project in on time and within budget, don’t settle.

Naming Conventions and the Underscore

I’ve seen and worked with a lot of naming conventions. When I start a new development project, I always, without exception, document how I intend to name the new items I create, whether they are physical objects such as tables and fields in a database, or names used in code for objects, variables, and the like. This document is shared with the team, adjusted where necessary, and adopted as the standard for the project or group of projects.

The important thing in this effort is consistency, and not the technique we adopt. If all single-part surrogate key fields are to end in “Key”, and all single-part varchar business key fields are to end in “ID”, then the rule must be obeyed by all developers and DBAs.
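A rule like this is easy to check automatically. The following is only a minimal sketch: the `SUFFIX_RULES` mapping, the role labels, and the sample columns are all hypothetical, and a real project would pull column names and roles from its own data dictionary.

```python
# Hypothetical suffix rules: column role -> required name suffix.
SUFFIX_RULES = {"surrogate_key": "Key", "business_key": "ID"}


def naming_violations(columns):
    """Return the names of columns whose suffix breaks the agreed rule.

    `columns` is an iterable of (name, role) pairs.
    Roles that have no rule are ignored.
    """
    violations = []
    for name, role in columns:
        suffix = SUFFIX_RULES.get(role)
        if suffix is not None and not name.endswith(suffix):
            violations.append(name)
    return violations


# Made-up sample columns for illustration only.
columns = [
    ("ProductKey", "surrogate_key"),
    ("CustomerID", "business_key"),
    ("ProductNumber", "business_key"),  # breaks the "ID" rule
    ("OrderDate", "attribute"),         # no rule for this role
]
print(naming_violations(columns))
```

Run against a schema during code review, a check like this keeps the convention from depending on everyone's memory.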

A major point of contention that seems to crop up every single time naming conventions are discussed is how and when to use an underscore. Should the field be “CustomerID” or “Customer_ID”? “ProductKey” or “Product_Key”? “SSNumber” or “SS_Number”?

CamelCase

I have a general rule: if the container (database, property window, etc.) can remember case, then only use an underscore when combining an acronym with additional information, and only when that information comes after the acronym. Everywhere else, “CamelCase” already separates the words, and I find it more readable.

Here are a few examples, showing how I would normally handle underscores when case is remembered:

  • CustomerID and not Customer_ID
  • ProductKey and not Product_Key
  • PhoneNumber and not Phone_Number
  • SS_Number and not SSNumber
  • ISO_Date and not ISODate
  • WP_Theme and not WPTheme
  • ValueEPS and not Value_EPS
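The rule behind these examples can be sketched in a few lines of Python. This is an illustration rather than a complete naming tool: `join_name` and `is_acronym` are hypothetical helpers, and I am assuming that an acronym is any part of two or more characters written entirely in upper case.

```python
def is_acronym(part):
    # Assumption: an acronym is two or more characters, all upper case.
    return len(part) > 1 and part.isupper()


def join_name(parts):
    """Join the parts of a name, adding an underscore only after an acronym.

    `parts` must be a non-empty list of already-capitalized name parts.
    """
    name = parts[0]
    for previous, current in zip(parts, parts[1:]):
        separator = "_" if is_acronym(previous) else ""
        name += separator + current
    return name


print(join_name(["Customer", "ID"]))  # CustomerID
print(join_name(["SS", "Number"]))    # SS_Number
print(join_name(["Value", "EPS"]))    # ValueEPS
```

Note that the underscore depends on the part that comes before the join, not the part that follows it, which is why “ValueEPS” gets none.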

And while I’m at it, avoid redundancy: VIN (Vehicle Identification Number) is not VIN_Number; GICS (Global Industry Classification Standard) is not GICS_Classification; and DG (Disease Group) is not DG_Group!
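If your team keeps a glossary of acronyms, this kind of redundancy can be flagged mechanically. The sketch below assumes a hypothetical `EXPANSIONS` glossary and only looks at names of the form `ACRONYM_Word`; anything more elaborate would need a smarter parser.

```python
# Hypothetical glossary mapping each acronym to its expansion.
EXPANSIONS = {
    "VIN": "Vehicle Identification Number",
    "GICS": "Global Industry Classification Standard",
    "DG": "Disease Group",
}


def is_redundant(name):
    """True if the word after the acronym repeats a word the acronym stands for."""
    acronym, _, rest = name.partition("_")
    expansion_words = {word.lower() for word in EXPANSIONS.get(acronym, "").split()}
    return rest.lower() in expansion_words


print(is_redundant("VIN_Number"))  # True
print(is_redundant("VIN"))         # False
```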

No Case Support?

If the container does not remember case, then always use an underscore to separate the parts of a name. Examples:

  • customer_id and not customerid
  • product_key and not productkey
  • phone_number and not phonenumber
  • ss_number and not ssnumber
  • iso_date and not isodate
  • wp_theme and not wptheme
  • value_eps and not valueeps
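One nice side effect of the acronym rule is that names written for a case-remembering container convert cleanly to this form. Here is a minimal sketch; `to_snake_case` is a hypothetical helper, and it assumes the input already follows the conventions described earlier (an underscore after any acronym that is followed by more text).

```python
import re


def to_snake_case(name):
    """Convert a CamelCase name to lower-case, underscore-separated form."""
    # Insert an underscore wherever a lower-case letter or digit
    # is immediately followed by an upper-case letter.
    with_breaks = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return with_breaks.lower()


print(to_snake_case("CustomerID"))  # customer_id
print(to_snake_case("SS_Number"))   # ss_number
print(to_snake_case("ValueEPS"))    # value_eps
```

A name like “SSNumber” would come out as “ssnumber”, because nothing marks where the acronym ends. That is one more argument for writing it as “SS_Number” in the first place.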

If you can be strict about naming, project members and future generations will have an easier time understanding your code, objects, and documentation. I find that the underscore — even within a good naming system — is often used inconsistently. I also find that the underscore can make a major visual impact if used correctly. So it pays to pay special attention to ASCII character 95!

As an aside, naming conventions also play an important role in data warehouse conformity. So while it is important to have good standards, you will also need good governance to be sure that your names are meaningful and consistent.
