If you are not interested in a technical peculiarity that occurs in aggregated data sets, just ignore this post.
I am often asked what my thoughts are about selecting the single best attributes (e.g., best name and best address) when multiple attributes are known. I always respond with, “truth is in the eye of the beholder.”
This came as a hard lesson. In the mid-1990s, I built a data warehouse that was fed daily by over 4,000 disparate operational systems belonging to a handful of widely recognized consumer brands. The goal was to better understand the customer by recognizing when the same person was transacting across different brands all held by the same holding company. The underlying motivation: the more fully the customer is understood, the more you can sell to the customer.
There I sat with a number of marketing VPs, each representing their brand’s interests. And while everyone worked for the same parent company, there was one question no one could agree upon: When a consumer has transacted with all of the brands, each time using a slightly different name or new address, which name and address should be considered the enterprise-wide GOLD standard? As it turns out, there is no such thing as a single version of truth.
The name and address supplied to a human resources system by an employee is the best name and address for an IRS filing, even if a different name and address has become available from another system. And a hotel statement had better be sent to the address supplied by the guest when he or she checks out of the hotel – not some other address deemed “best” because of its perceived currency and reliability from some other data source. For a direct marketing piece, a name and address from a loyalty club program is generally better than the hotel reservation data provided over the phone. Why? Because loyalty club data is more reliable, as consumers want to receive their points statement in the mail.
Thus, the definition of best varies based on who is asking the question. So when I am asked how to determine the single best version of truth I recommend being prepared to deliver every version of truth -- for truth is in the eye of the beholder.
Truth on demand … so to speak.
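For readers who think in code, here is one rough way the idea could be sketched: keep every observed name and address along with its source, and pick a "best" one only when a specific consumer asks. The source labels, purpose names, and preference order below are hypothetical, loosely based on the examples above, and not a description of any real system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Observation:
    source: str    # hypothetical source-system label
    name: str
    address: str
    as_of: date    # when this source last supplied the values


# Hypothetical per-purpose source preferences, most preferred first.
PREFERENCES = {
    "tax_filing": ["hr"],
    "hotel_statement": ["hotel_checkout"],
    "direct_marketing": ["loyalty_club", "hotel_reservation"],
}


def best_for(purpose: str, observations: list) -> Optional[Observation]:
    """Return the observation preferred for this purpose, or None."""
    for source in PREFERENCES.get(purpose, []):
        candidates = [o for o in observations if o.source == source]
        if candidates:
            # Within one source, take the most recently supplied values.
            return max(candidates, key=lambda o: o.as_of)
    return None


# Every version is retained; nothing is overwritten by a "gold" record.
observations = [
    Observation("hr", "Robert Smith", "12 Elm St", date(2005, 1, 10)),
    Observation("loyalty_club", "Bob Smith", "40 Oak Ave", date(2005, 9, 2)),
    Observation("hotel_checkout", "R. Smith", "7 Pine Rd", date(2006, 2, 14)),
]
```

With this data, `best_for("tax_filing", observations)` returns the HR record and `best_for("direct_marketing", observations)` returns the loyalty club record, even though the hotel checkout record is the most recent of the three.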
I used to work for a data recognition company. I called this concept "Quantum Recognition". Basically, a person existed in a number of different states until you provided enough information to determine what "version" of the consumer you wanted.
Good to see someone else thinking along the same lines.
Posted by: Tanton Gibbs | March 20, 2006 at 08:59 AM
It would be more accurate to say that these particular entities have n attributes - like an n-dimensional vector - and that truth is not in the eye of the beholder (the values for each dimension are fixed at any one time), but rather each user defines what is important to him (his "truth") as one (or multiple), but not all, of the dimensions.
Posted by: Alex Simonelis | November 28, 2007 at 08:13 AM
Why is the industry always pushing one version of the truth?
I agree "truth is in the eye of the beholder".
We could not agree on a recent MDM effort and scrapped it. But a separate app-to-app integration project I was on adopted an open-spec approach we found on SourceForge called Jumper metamodel. This gave us the best of MDM with much needed flexibility.
Posted by: Kumar | January 28, 2008 at 08:07 AM
great minds...I had a series of brainstorms in early 2006 that paralleled this discussion...
http://existentialprogramming.blogspot.com/2006/06/original-epiphanies-of-existential.html
It has led me to undertake a study of the Philosophy of Identity, and to attempt to put into a book for general software developers the 2500 year old conversation that Philosophers have had on this topic. [Thesis: Because their bread-and-butter activity involves modeling the world, ALL software developers need to know about Philosophy/Metaphysics rather than only exotic post-graduate researchers.]
Posted by: Bruce Wallace | February 16, 2009 at 07:11 AM
It seems to me that this issue comes down to the use of terms to refer to concepts.
The terms "name" and "address" can refer to many different concepts. If these concepts were properly identified and differentiated from each other and instances of these concepts identified and differentiated from each other then the conflicts described in the post would not happen.
We need to build systems that differentiate between concepts, not just terms, and have systems for documenting concepts that are as unambiguous as possible.
I find the guidance given by the Australian Institute of Health and Welfare's data modelling team to be quite useful for providing some rigor and rules for this:
http://www.aihw.gov.au/publications/hwi/nhddv14/nddeb08.pdf pages 19-23
Posted by: Euan Cochrane | February 14, 2011 at 05:49 PM