Having designed a lot of systems over the years, I have found that more often than not the customer says they plan on performing periodic purges of historical data. This always seems logical at the time. But it turns out that once you have data, it becomes hard to justify its destruction. And if anyone actually destroys data … one is at the same time eliminating any accountability whatsoever (not to mention other adverse consequences).
Data decommissioning is a double-edged sword.
After a number of personal missteps over the years, I have revised my thinking about data decommissioning. Today I imagine a process where accountability is maximized while the risks of unintended disclosure, misuse, and repurposing are minimized. The goal is to write accountability data into storage de-optimized for information retrieval … thereby rendering retrieval practical only for infrequent, forensic inspection. In simple terms, think paper tape, think hard copy reports, or think microfiche. Alternatively, in more sophisticated settings, I suspect immutable audit logs optimized only for investigative/forensic-specific information retrieval might be useful too. [More detail about this line of thinking is available in this paper that Peter Swire and I penned on behalf of the Markle Foundation.] Obviously, at some point in time when there is no longer any reasonable expectation of information accountability, repeatability, etc., wholesale data decommissioning makes sense (burn the microfiche).
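To make the "immutable audit log" idea concrete, here is a minimal sketch (my own illustration, not the design from the Markle paper) of an append-only, hash-chained log in Python: each entry carries the hash of the previous entry, so any after-the-fact alteration of history is detectable on verification, while nothing about the structure is optimized for routine querying.

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only log; each entry is chained to the previous one by hash,
    so tampering with history is detectable when the chain is verified."""

    def __init__(self):
        self.entries = []
        self._last_hash = "0" * 64  # genesis value

    def append(self, event: dict) -> None:
        record = {
            "ts": time.time(),
            "event": event,
            "prev_hash": self._last_hash,
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self._last_hash = record["hash"]
        self.entries.append(record)

    def verify(self) -> bool:
        """Recompute the chain; returns False if any entry was altered."""
        prev = "0" * 64
        for record in self.entries:
            body = {k: v for k, v in record.items() if k != "hash"}
            if body["prev_hash"] != prev:
                return False
            payload = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != record["hash"]:
                return False
            prev = record["hash"]
        return True
```

A real deployment would also anchor the chain externally (say, periodic hash checkpoints written to write-once media), but the chaining itself is what preserves accountability.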
I arrived at this revised thinking, in part, through the following series of events.
Many years ago, I deployed a system designed to address a single, very specific threat. Several years later, I concluded that the aggregated data set had probably lived on long after that threat was over. I would not have thought twice about the privacy and civil liberties implications of this had I not started to engage in conversations with privacy advocates. Following these conversations, I decided that there are some scenarios in which data decommissioning should be "baked in."
Subsequently, with this in mind, when a pro bono opportunity to assist with a humanitarian disaster relief effort presented itself, I proposed a data destruction caveat for the contract. While the customer didn’t seem to care much one way or the other, I was excited when they agreed to the wholesale destruction of the aggregated data set upon project closure. And delete it all we did.
A small victory for privacy it seemed – that is, until a few years later when I realized that I could no longer prove what was done, right or wrong. In fact, had there been any after-the-fact disputes about incorrect action taken based on the recommendations of the technology, I would have had to say, "We destroyed the evidence!"
In summary, when designing systems that require strong audit, accountability, and repeatability processes … very careful consideration must be given to delete processes.
Deeper Technical Points:
1. Record changes can present the same challenges as processing deletes. The problem arises when a system overwrites changes rather than keeping each incremental record state and its temporal relevance. When overwriting a change, one is deleting the previous value; it is this de facto deletion that compromises audit and accountability processes.
2. A further complicating factor is that not all changes are the same. Some changes are corrections, i.e., the earlier value was incorrect, e.g., wrong driver’s license number or a missing apartment number in an address. Another type of change is one where a value supersedes a previous value, e.g., when recording a married name, new email address, or new cell phone number. Further complicating matters, most systems of record do not have a mechanism to capture the difference between corrections and updates – forcing system designers to make some assumptions.
3. When synchronizing data across information sharing environments, propagating deletes through the ecosystem forces each receiving party into this same accountability dilemma (a sketch touching on all three of these points follows below).
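To make these points concrete, here is a minimal sketch, in Python, of an append-only record history that never overwrites: each change is stamped with when it was recorded and whether it is a correction or a superseding update (point 2), and a delete becomes a tombstone that can be propagated to receiving parties without destroying anyone's audit trail (point 3). The class and field names are my own illustration, not any particular system's schema.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Version:
    value: object        # the attribute value asserted at this point in time
    recorded_at: float   # when this state was captured (temporal relevance)
    change_type: str     # "original", "correction", "supersede", or "tombstone"

@dataclass
class Record:
    """A record history that is only ever appended to, never overwritten."""
    history: list = field(default_factory=list)

    def set(self, value, change_type="supersede"):
        # "correction" = the earlier value was wrong (e.g., a typo);
        # "supersede"  = the earlier value was right at the time (point 2).
        self.history.append(Version(value, time.time(), change_type))

    def delete(self):
        # A tombstone marks deletion without erasing history (point 3):
        # it can be propagated to receiving parties, each of which keeps
        # its own auditable trail of what was once asserted.
        self.history.append(Version(None, time.time(), "tombstone"))

    def current(self):
        last = self.history[-1] if self.history else None
        return None if last is None or last.change_type == "tombstone" else last.value

# Example: a cell phone number that is updated, corrected, then "deleted"
phone = Record()
phone.set("555-0100", change_type="original")
phone.set("555-0199")                            # new number supersedes the old
phone.set("555-0198", change_type="correction")  # previous entry was a typo
phone.delete()                                   # tombstone; history survives
```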
Related Trivia:
1. When data actually does get purged, it is often prompted by a forcing function. The three purge scenarios I have seen are: (a) all the ancient history is compromising performance; (b) there is no interest in paying for more storage; and (c) "oops - we shouldn’t have been collecting that!"
2. With all the countless copies of data being made, how can one be sure it is ever all deleted anyway?
RELATED POSTS:
Out-bound Record-level Accountability in Information Sharing Systems
Re: With all the countless copies of data being made, how can one be sure it is ever all deleted anyway?
Yeah, this is a hard problem. Encrypting data at rest can provide a small part of one possible solution. It reduces the volume of data you need to dispose of. Instead of needing to track down every backup tape, you can just destroy the keys used to encrypt the tapes.
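For illustration, a minimal sketch of that key-destruction idea (often called "crypto-shredding"), using the widely available Python `cryptography` library; this is a sketch of the concept, not any particular product's mechanism:

```python
from cryptography.fernet import Fernet  # pip install cryptography

# Encrypt each data set under its own key *before* it fans out to backups.
key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(b"aggregated historical records ...")

# The ciphertext can now be copied freely: tapes, replicas, offsite backups.
# "Deleting" the data set later means destroying every copy of the key;
# the scattered ciphertext copies become unreadable without it.
del key  # in reality: securely erase the key from the key-management system
```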
Posted by: Brian | January 20, 2008 at 08:47 PM
Jeff,
These are, in my opinion, the main facts and goals you are dealing with in data decommissioning:
a) Data to keep and data to be trashed/decommissioned live together on the same storage resources (hard disks and backup assets).
[QUESTION: Will your technology solve the problem of erasing data while it is kept on "cold" backup assets? Will your technology immediately erase that data on the fly if a recovery process has to be performed using "cold" backup assets?]
b) Structured and unstructured data live together. Data to be kept and data to be decommissioned could be part of the same unstructured document, e.g., personal data included in a word-processor document.
c) According to the European Privacy Directive, personal data should be cancelled when it is no longer needed, which means access to the data is blocked while it is not necessary (in your scenario, after the project closing date), but the data should remain accessible in case it is needed (in your scenario, an eventual trial, evidence of tax obligations, ...).
Once all the periods expire during which you are obliged to keep information, whether by law (including periods during which fulfillment of obligations could be demanded by any party to an agreement) or by voluntary agreement, the blocked data (with reversible access in case it is needed) should be definitively decommissioned and deleted, unless anonymization is applied to the personal data so that neither you nor any third party could identify the people behind the anonymized data with reasonable time and effort.
Bear in mind that anonymization is exactly what re-identification researchers have demonstrated is far from being achieved, and it is becoming more difficult every day due to the amount of data being provided publicly by human beings and included in public or cheap-and-easy-to-access databases, as well as improvements in processing and storage technology (33bits.org, Paul Ohm, ...).
d) The legal terms that force you to maintain data for different purposes, mainly as evidence of obligations fulfilled, are spread over many different laws and jurisdictions.
e) Furthermore, with unstructured data you could find yourself in a situation where you must preserve information indefinitely, e.g., for statistical or historical purposes, while at the same time removing certain personal data and personally identifiable information (note that, according to re-identification researchers, all data could be considered personally identifiable information).
My conclusion is that:
If anonymization is very difficult to achieve, or could be reversed in the short or medium term (i.e., people could be re-identified), or is apparently even impossible to reach, then anonymization cannot serve as a substitute for decommissioning, which means decommissioning, the secure erasure of data, should be the only rule, the exclusive option.
If there are no standardized periods after which decommissioning is mandatory (especially if you are a global player), and data to keep and data to erase cohabit in unstructured documents and/or on shared storage resources, then as an average person without advanced technological knowledge I do not see how technology could securely and irreversibly delete certain specific data out of a huge data spectrum in which other data must be preserved.
Looking forward to being surprised ;-p
Thanks for all your knowledge sharing and privacy-by-design research (protecting not only privacy, but other civil liberties and fundamental rights as well).
Regards
Posted by: Álvaro Del Hoyo | March 31, 2011 at 06:23 AM