This month's issue of IEEE Security & Privacy (November/December 2006) includes an article I wrote that describes, in relatively plain English, the key principles of "Identity Resolution" and "Relationship Resolution."
Here is a link to a PDF version of this story: Threat and Fraud Intelligence – Las Vegas Style
In a nutshell, here are the essential objectives:
- Sequence Neutrality
- Relationship Awareness
- Perpetual Analytics
- Context Accumulation
- Knowledge-based Name Evaluations
This story also makes the case that probabilistic identity matching systems skew over time as the underlying data changes. After 23 years of work on identity disambiguation at scale, I have concluded that starting with deterministic matching and then tuning probabilistically is far superior, especially for large data sets where the system cannot be retrained, or the data reloaded, at any reasonable interval (e.g., quarterly).
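
To make the "deterministic first, then tune probabilistically" idea a little more concrete, here is a minimal Python sketch. It is only an illustration, not the method from the article: the `Record` fields, the rule in `deterministic_match`, the 0.85 threshold, and the use of a plain string-similarity score as a stand-in for a real probabilistic model are all assumptions made up for this example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    """Hypothetical identity record; the fields are illustrative only."""
    name: str
    dob: str    # ISO date string, e.g. "1970-01-31"
    phone: str  # empty string when unknown


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def deterministic_match(a: Record, b: Record) -> bool:
    """Assumed fixed rule: same date of birth and same (known) phone number."""
    return bool(a.phone) and a.phone == b.phone and a.dob == b.dob


def name_similarity(a: Record, b: Record) -> float:
    """Similarity score in [0, 1]; a simple stand-in for a probabilistic model."""
    return SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()


def resolve(a: Record, b: Record, threshold: float = 0.85) -> str:
    """Apply the fixed rule first; fall back to the tunable score only if it fails."""
    if deterministic_match(a, b):
        return "match (deterministic)"
    # The threshold is the only knob that gets tuned; the rule above stays put.
    if a.dob == b.dob and name_similarity(a, b) >= threshold:
        return "match (scored)"
    return "no match"


if __name__ == "__main__":
    known = Record("John Smith", "1970-01-31", "555-0100")
    print(resolve(known, Record("Jon Smith", "1970-01-31", "555-0100")))
    print(resolve(known, Record("Jon Smith", "1970-01-31", "")))
    print(resolve(known, Record("Jon Smith", "1982-06-15", "")))
```

The point of the layering is that the deterministic rule gives the same answer no matter how the rest of the data drifts, while the scored fallback is confined to the cases the rule cannot decide and can be adjusted through one threshold without reloading anything.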