
November 29, 2006




Neat article, I'm glad you wrote it.

What kind of controls do organizations put in place to keep people from lying about (or just manipulating) their personal information? For example, someone trying to beat the system could use a pay-as-you-go cell phone number instead of a home number, or a PO box instead of their home address. It seems like that would be an effective way of blocking the identity and relationship resolution process.

Do organizations end up building unique components or procedures to verify different types of data? For example, one system for SSNs, another for credit card numbers, a third for phone numbers?

Would obfuscated identities reveal themselves in some other way, such as tending to have more generic components to their identities?

Is the problem just not worth worrying about? Or will smart attackers looking for large payoffs try to confuse the identity resolution system?

Ray Garcia

Jeff, the technique you describe also applies to international trade, where compliance rules require spotting blacklisted people and entities on what is referred to as the Denied Parties list. I worked on this problem many years ago and used a modified version of the Double Metaphone algorithm to deal with variations in international names. I also extended the technique to work with international addresses.

You mentioned Soundex in the article. It is very poor at phonetic matching and has been supplanted by Metaphone, although none of the database vendors have advanced their products to replace Soundex yet.
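To make the Soundex complaint concrete, here is a minimal sketch of classic American Soundex in plain Python. The function name `soundex` is only illustrative, and this is not the modified Double Metaphone approach mentioned above (that is too long to reproduce here). It simply shows how Soundex keys on the first letter and a coarse consonant code.

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name (illustrative sketch)."""
    # Consonant groups share a digit; vowels, Y, H and W have no digit.
    codes = {}
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    for letters, digit in groups:
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]                # the first letter is kept verbatim
    prev = codes.get(letters[0], "")   # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W do not separate equal codes
        code = codes.get(ch, "")       # vowels (and unknown letters) map to ""
        if code and code != prev:
            result += code
        prev = code                    # a vowel resets the "previous" code
    return (result + "000")[:4]        # pad or truncate to four characters


# Names that sound alike but start with a different letter get different codes.
print(soundex("Catherine"), soundex("Katherine"))
print(soundex("Knight"), soundex("Night"))
```

Because the first letter is never encoded, spelling variants such as Catherine/Katherine or Knight/Night fall into different buckets, which is one reason a Metaphone-style encoder tends to do better on real names.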

Ray Garcia

The constraint of using an identity structure that can be constructed as information is captured makes sense in this context. Humans can already conceive of the various ways someone might try to trick the system to avoid being caught, so establishing a set of rules against the probable structure of the data and relationships may work for this specific class of problems.

The strategy may be worth trying for other classes of problems as well, where analysis and prediction have been difficult. A similar strategy to the one described in the article might be to construct a fuzzy ontology and fuzzy action semantics to capture information as it becomes available. The information can then be analyzed as a partial representation and given fuzzy treatment when matching it and relating it to other aspects of the knowledge being captured.

This approach provides a sensible balance between attempting to fully structure the data versus the difficulty of making sense out of purely unstructured data.

The above likely cannot be done with a traditional SQL database and would require an RDFS or OWL repository that is modified to support the fuzzy knowledge.
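As a rough illustration of matching on partial information, the sketch below scores two records using only the fields both of them have, and reports how much of the possible evidence was actually available. It is a toy in plain Python, not a fuzzy ontology or an RDFS/OWL store; the field names, weights, and the functions `field_similarity` and `partial_match` are all assumptions made up for the example.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] between two field values."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def partial_match(rec_a: dict, rec_b: dict, weights: dict) -> tuple:
    """Compare two partially filled records.

    Returns (score, coverage): score is the weighted similarity over the
    fields present in both records; coverage is the share of total weight
    those fields represent.
    """
    total = 0.0
    weight_used = 0.0
    for field, weight in weights.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue                      # field not captured yet: skip it
        total += weight * field_similarity(a, b)
        weight_used += weight
    if weight_used == 0:
        return 0.0, 0.0                   # nothing in common to compare
    return total / weight_used, weight_used / sum(weights.values())


# Hypothetical weights and records, for illustration only.
weights = {"name": 2.0, "phone": 3.0, "address": 1.0}
known = {"name": "John Smith", "phone": "555-0100", "address": "12 Elm St"}
seen = {"name": "Jon Smith", "phone": "555-0100"}   # address not yet captured

score, coverage = partial_match(known, seen, weights)
print(round(score, 3), round(coverage, 3))
```

Keeping the coverage separate from the score lets a downstream rule distinguish a strong match on little evidence from a strong match on most of the record.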

Ray Garcia

A related area of research that can help detect the subversion of internal controls is Data Provenance, in particular how data lineage is addressed by the models that support it. Dr. Sudha Ram at the University of Arizona in Tucson has done some excellent work in this area. See
for a visual example of what Data Provenance is and how it might be used.


Identity is the simple root for finding an address: once we have the identity, we can easily look up the address.
