Semantic reconciliation is possibly the most fundamental building block required to make intelligent systems intelligent. When I say "semantic reconciliation," I mean: "Recognizing when two objects are the same despite having been described differently." Or put more simply, this is about counting like things.
In disease research one would need to know the difference between six reported cases of Lupus versus one case reported six times. A 911 operator receives emergency calls from six people, each reporting the sound of gunshots. Is this one incident, six separate incidents, or somewhere in between?
I stayed at a W Hotel a few weeks ago. They asked me if I was in the loyalty club program. I did not know, so I had them look. It turns out I am in the loyalty club program three times. They think I am three different customers when in fact I am one. They don’t know me! (Ironically, I checked into a different W Hotel last night and they could not find any loyalty club records for me whatsoever.)
If all data collected contained globally unique identifiers (e.g., a bar-coded serial number), then semantic reconciliation would be trivial. But the world collects different features in different ways from the same object. Some systems record me as Jeff Jonas and others as Jeffrey Jonas. Sometimes I share a frequent flyer number and no date of birth, and in other places I share a date of birth and passport number. So how many Jeff Jonases are there? Organizations that cannot count unique objects make suboptimal decisions. In the case of my multiple loyalty club accounts, the hotel may be denying a decent customer decent rewards, rewards he would have earned had all the points been recognized as belonging to one account!
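One common way to sketch this counting problem is as entity resolution: normalize the features each record carries, link records that share enough of them, and treat each connected group of records as one real-world entity. The Python below is a minimal illustration under loudly stated assumptions. The nickname table, the matching rule (same normalized name plus at least one shared identifier of the same type), and every sample record are hypothetical and invented for this example. It is not the author's method; real systems use far richer and often probabilistic matching.

```python
"""Toy entity resolution: count people, not records (a sketch, not a product)."""
from collections import defaultdict
from dataclasses import dataclass, field

# HYPOTHETICAL nickname table; real systems need far larger, curated tables.
NICKNAMES = {"jeff": "jeffrey", "bill": "william", "bob": "robert"}


@dataclass
class Record:
    record_id: str
    first_name: str
    last_name: str
    # Identifier type -> value, e.g. {"email": "...", "passport": "..."}
    identifiers: dict = field(default_factory=dict)


def normalize_name(first: str, last: str) -> tuple:
    """Lower-case both names and map known nicknames to a canonical form."""
    f = first.strip().lower()
    return (NICKNAMES.get(f, f), last.strip().lower())


class UnionFind:
    """Disjoint-set forest; each final set is one resolved entity."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve(records):
    """ASSUMED rule: link two records if their normalized names match AND they
    share at least one identifier of the same type. Links are transitive.
    Returns a list of sets of record ids, one set per resolved entity."""
    uf = UnionFind()
    first_holder = {}  # (name, id type, value) -> first record id seen with it
    for r in records:
        uf.find(r.record_id)  # register the record even if it links to nothing
        name = normalize_name(r.first_name, r.last_name)
        for id_type, value in r.identifiers.items():
            key = (name, id_type, value)
            if key in first_holder:
                uf.union(first_holder[key], r.record_id)
            else:
                first_holder[key] = r.record_id
    groups = defaultdict(set)
    for r in records:
        groups[uf.find(r.record_id)].add(r.record_id)
    return list(groups.values())


# HYPOTHETICAL sample data: three loyalty accounts for one person,
# plus a second "Jeff Jonas" who shares only a name.
records = [
    Record("r1", "Jeff", "Jonas", {"loyalty": "W-1001", "email": "jj@example.com"}),
    Record("r2", "Jeffrey", "Jonas", {"loyalty": "W-2002", "email": "jj@example.com"}),
    Record("r3", "Jeff", "Jonas", {"loyalty": "W-3003", "phone": "555-0100"}),
    Record("r4", "Jeffrey", "Jonas", {"email": "jj@example.com", "phone": "555-0100"}),
    Record("r5", "Jeff", "Jonas", {"passport": "P-777"}),
]
loyalty_accounts = {r.identifiers["loyalty"] for r in records if "loyalty" in r.identifiers}
print(len(loyalty_accounts))  # 3
print(sorted(sorted(g) for g in resolve(records)))  # [['r1', 'r2', 'r3', 'r4'], ['r5']]
```

Counting loyalty numbers says three customers; resolution says records r1 through r4 are one person, linked transitively through a shared email and phone even though r3 and r1 share nothing directly. Whether r5 is the same Jeff Jonas is something this deliberately conservative rule cannot decide, since a matching name alone never links records here. Because the result is the set of connected components of the "shares a feature" graph, this sketch yields the same grouping whatever order the records arrive in, a small taste of the order-insensitivity the post calls sequence neutrality; harder cases, such as undoing an earlier merge when new data contradicts it, are not attempted.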
It is important to address semantic reconciliation before other analytical processes (e.g., statistical analysis, market segmentation, link analysis, etc.). This is a "first things first" principle because semantic reconciliation makes secondary analytic and computational problems that much easier and that much more accurate.
And, while my primary focus over the years has been the semantic reconciliation of identities (people and organizations) with attention to massive scale and subtle little nuances like sequence neutrality, similar techniques are possible for many other things (e.g., in Las Vegas the Starbucks on the corner of Sahara and Maryland Parkway happens to be the same as the Starbucks at 2595 S. Maryland Parkway).
If one cannot count discrete objects, one cannot properly construct context. And when organizations make decisions without context, brace yourself for bad decisions and say hello to more Enterprise Amnesia!
RELATED POSTS:
Accumulating Context: Now or Never
Federated Discovery vs. Persistent Context – Enterprise Intelligence Requires the Latter
Huh. Sounds distinctly like a way to defeat people's attempts to remain obscure to corporations and governments. I'm not in love.
Posted by: Jim Harper | April 28, 2007 at 01:44 PM
There is a trick you are missing that I wrote about in my blog entries here...
http://existentialprogramming.blogspot.com/search?q=superman
The point of those entries was that the philosophy of language shows that resolving different "names" of a single entity to the correct single entity is not sufficient.
Different names may well refer to different aspects of the same entity that are not interchangeable.
E.g., Superman and Clark Kent both resolve to the same single physical entity, but attributes of Clark Kent (e.g., work address, favorite suit, needs glasses) have different values than the corresponding attributes of Superman. [Different names for different aspects.]
E.g., Shakespeare-the-historical-figure might not always be considered the same entity as Shakespeare-the-author-of-Hamlet. [Same name (Shakespeare) for different aspects *which might not even be recognized as separate aspects at the time the original "data" is collected*.]
Posted by: Bruce Wallace | February 16, 2009 at 08:30 AM