February 19, 2007

To Anonymize or Not to Anonymize, That is the Question

I see a future in which organizations planning to transfer sensitive information from one system of record to some other destination will first ask themselves the question: "Can our data be shared in an anonymized form while achieving results materially similar to those we would get if the data were transferred in clear text?" And if the answer to this question is "yes," I would then ask, "Why would that organization ever share that sensitive information any other way?"

A new class of technology, "Analytics in the Anonymized Data Space", is making this possible. With this type of technology, information can be anonymized before being transferred between parties while still permitting sophisticated analysis, even though the data is in a non-human-readable and irreversible (i.e., anonymized) form.

I think this will become a best practice. When? I don’t know, maybe two years, five years or maybe even twenty years, but someday for sure. It will start with early adopters (already beginning to happen), its use will grow, and finally at some point in time anonymization-based analytics will achieve a critical mass. Thereafter, anonymization will likely be viewed as a best practice. From that moment on, if an organization is not handling its data in such a manner, I would submit they could be considered negligent.

Here is an anonymization scenario:

To stay competitive, banks must understand their customers at least as well as their competitors do. So, banks send their customer information to data aggregators. The data aggregators then match the bank’s customer data with their private collection of demographics (e.g., marital status) and lifestyle data (e.g., magazine subscriptions). This information is appended to the original file and returned to the bank (thus this practice is often called "database marketing appends"). The bank then uses this new information to profile its customers, applying the newly found knowledge to improve its customer acquisition and retention programs.

But transferring all customer data to a secondary party causes organizational heartburn. In the example above, the bank’s management recognizes that sending its customer data to another party comes with some risk: What if an employee at the data aggregator makes an illegal copy of the customer file and secretly sells it? What if a hacker breaks into the data aggregator’s systems and extracts all or portions of the bank’s customer file? What if an employee at the aggregator uses the bank’s customer file to answer very specific questions asked by "outsiders" about specific people? What if the aggregator quietly retains portions of the bank’s customer file for later use in unanticipated ways?

As gut-wrenching as these risks are, most banks find themselves doing this anyway in an effort to remain competitive.

Emerging innovations that enable advanced analytics to be performed on encrypted or anonymized data will allow the bank to pass non-human-readable customer data to the data aggregator. The data aggregator will then match the bank’s anonymized customer data with its own records while the bank’s customer records remain anonymized! The demographic and lifestyle data would then be passed back to the bank keyed by a non-personally-identifying value (e.g., a customer number).
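
A minimal sketch of this flow in Python may make it concrete. It is only an illustration under stated assumptions, not a description of any particular product: the keyed hash (HMAC-SHA256), the shared secret, the field names (customer_no, name, dob, attributes), and the helper functions are all hypothetical choices made for the example. Both parties run the same one-way function over normalized identifiers, the aggregator matches on the resulting tokens, and the appended data comes back keyed by customer number.

```python
import hashlib
import hmac


def normalize(name, dob):
    # Collapse case and whitespace so trivially different spellings match.
    return " ".join(name.lower().split()) + "|" + dob.strip()


def anonymize(secret, name, dob):
    # One-way keyed hash; the secret is an assumed value agreed out of band.
    message = normalize(name, dob).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def bank_export(secret, customers):
    # The bank ships only a customer number and an irreversible token.
    return [
        {"customer_no": c["customer_no"],
         "token": anonymize(secret, c["name"], c["dob"])}
        for c in customers
    ]


def aggregator_append(secret, bank_rows, aggregator_records):
    # The aggregator anonymizes its own records the same way, then matches
    # token to token without ever seeing the bank's clear-text file.
    index = {
        anonymize(secret, r["name"], r["dob"]): r["attributes"]
        for r in aggregator_records
    }
    return {
        row["customer_no"]: index[row["token"]]
        for row in bank_rows
        if row["token"] in index
    }
```

Note that in this simplified form anyone holding the secret can still guess at values, which is exactly the weakness a hardened design has to address.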

What is gained? In short, if the data is stolen by a hacker or an agent of the aggregator, the thief learns nothing useful. A corrupt employee at the data aggregator cannot peruse the customer file for selected information. The aggregator does not learn new information, like a new address or phone number, that the bank knew but the aggregator did not.

What are the risks? Well, there are lots of risks, especially in this simplified embodiment (e.g., something called a dictionary attack, in which an attacker anonymizes a list of guessed values and looks for matches). But the basic principle is this: if one is going to share information in clear text anyway, then even this simple model reduces, to some degree, the risk of unintended disclosure.
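
To see why a dictionary attack matters, here is a small Python sketch. It assumes the weakest possible scheme, an unsalted and unkeyed SHA-256 hash of a phone number; the function names and the made-up 555-01xx number range are hypothetical and exist only for this illustration. Because the space of plausible values is small, an attacker can hash every candidate and look the stolen tokens up.

```python
import hashlib


def naive_token(phone):
    # Unkeyed, unsalted hash: anyone can recompute it for any guess.
    return hashlib.sha256(phone.encode("utf-8")).hexdigest()


def dictionary_attack(stolen_tokens, candidates):
    # Hash every guess once, then reverse the stolen tokens by lookup.
    lookup = {naive_token(c): c for c in candidates}
    return {t: lookup[t] for t in stolen_tokens if t in lookup}


# A hypothetical, deliberately tiny candidate space of 100 numbers.
candidates = [f"555-01{n:02d}" for n in range(100)]
stolen = [naive_token("555-0142"), naive_token("555-0199")]
recovered = dictionary_attack(stolen, candidates)
```

The hash itself is never "broken" here; the attack works because the inputs are guessable, which is why keyed or multi-party schemes are needed on top.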

Luckily, there are a variety of cryptographic and architectural extensions one can use to harden this information sharing model against many different kinds of attacks. [Techie interjection: Commutative encryption, for example, makes it more difficult for any one user to mount a dictionary attack against the anonymized values.]
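
As a rough illustration of the commutative property, the Python sketch below uses modular exponentiation over a prime modulus, one well-known way to get encryption where the order of the two keys does not matter. Everything specific here is an assumption for the example: the toy-sized modulus, the way values are hashed into the group, and the helper names. It is not meant as a secure or complete protocol.

```python
import hashlib
import math
import secrets

# Toy-sized prime modulus (2**127 - 1 is a Mersenne prime). A real system
# would need a much larger, carefully chosen group.
P = 2**127 - 1


def to_group(value):
    # Map a normalized identifier to an integer in [2, P - 2].
    digest = hashlib.sha256(value.strip().lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 3) + 2


def new_key():
    # Pick a secret exponent that is invertible modulo P - 1.
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def encrypt(element, key):
    # E_k(x) = x^k mod P, so E_a(E_b(x)) == E_b(E_a(x)).
    return pow(element, key, P)


bank_key, aggregator_key = new_key(), new_key()
x = to_group("jane.doe@example.com")

# Each party applies its own key; the order of application is irrelevant.
bank_then_aggregator = encrypt(encrypt(x, bank_key), aggregator_key)
aggregator_then_bank = encrypt(encrypt(x, aggregator_key), bank_key)
```

Because a matchable value exists only after both keys have been applied, neither party alone can generate candidate tokens to compare against, which is what raises the cost of a dictionary attack.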

[Another technical note: Anonymization systems that prevent any possible re-identification (e.g., by keeping no pointers to the original record) come with additional risks, like the inability to fully audit the system and the inability to correctly process deletions. This being the case, I think certain classes of anonymization-based systems must include Source Attribution and Data Tethering, so that the original holder of the data can control whether any re-identification is permitted within law and policy.]

OTHER RELATED POSTS:

Advanced Analytics in the Anonymized Data Space

Today’s FCW Story About My Anonymization Work

Comments

The other consideration here is to pass analytic insight around instead of data. This has been the basis for the scoring industry for years but sometimes companies could do this for themselves.
That said, I do agree that anonymizing is going to become a best practice sooner rather than later.

It's particularly interesting when the aggregator provides anonymized analytics to the bank's customer, not the bank:

http://blog.wesabe.com/index.php/2007/02/23/safeguarding-your-data-the-privacy-wall/

IMHO the name you've started out with for this technology, 'Anonymization,' is way too wonky to be understood as valuable by Joe Average User.

Anonymity is a term that is loaded with all kinds of associations, some of them negative. Better to use a simple descriptive term or phrase. Something like "Personal Protection Layer," while vague, at least is clearly positive and presents itself as something unambiguously beneficial. Or maybe "Privacy Protection System."

You may be able to come up with something better than these. Improving on "Anonymizing Technology" should be easy.
