I was invited to deliver a short keynote about "big data" at today's OECD roundtable focused on the economics of personal data and privacy. My presentation is available here.
Most big data flows by design. But when big data leaks, the consequences can be wicked.
That said, protecting big data from wicked leaks is not going to be easy. Defending against external cyber penetrations and defending against insider threats are both hard problems.
Now, with the Wikileaks disclosures, it is clear the game has changed. Historically, public disclosure of classified data has been limited and infrequent; let’s even say, to a degree, tolerable. Contrast that with the scope of the recently leaked cables. At some point this may become intolerable, in which case I will not be surprised if a number of governments around the world attempt to enact new, sweeping anti-leak legislation directed not only at those engaged in the initial theft of the data, but also at the distribution points (e.g., Wikileaks) and the publishers (e.g., the media). The principle being that anyone who knowingly receives and benefits from stolen property is an accomplice. The pendulum could swing so far (backwards) that the future sees far fewer media leaks than the historical (tolerable) volumes, i.e., this whole fiasco could end up producing less transparency and accountability.
BTW: I suppose it could have been worse. What if the 250,000 classified cables had been selectively and quietly passed around to various foreign intelligence services, while the US still thought they were secret? Imagine believing one has a certain security posture … when one does not. Would that be worse?
Organizations with big data worth protecting must employ extraordinary controls to reduce the risk of unintended disclosure. On that note, I closed with a few ideas related to protecting big data from wicked leaks, including:
Central indexes. There are actually a number of scenarios where a single, central catalog of pointers is better than lots of copies of the same data scattered all over the place. The advantages: fewer copies of the data, plus uniform access controls and audit logs.
Anonymization. Despite the imperfections of data anonymization, when it comes to reducing the risk of unintended disclosure, most would agree that data anonymization is still better than clear text.
Immutable audit logs. Tamper-resistant audit logs can help prove that the users of the system are complying with law and policy.
Real-time active audits. It is now going to be essential that user activity be more rigorously analyzed, in real time, for inappropriate behavior. Audit logs have actually been part of the problem: just another big pile of data, with evidence of misuse hiding in plain sight against a backdrop of millions and millions of benign audit records.
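To make the last two ideas a little more concrete, here is a minimal sketch of an append-only audit log that is checked as records arrive. It assumes SHA-256 hash chaining as the tamper-evidence technique (each record embeds the hash of the previous one, so editing any earlier record breaks verification) and a simple sliding-window volume rule as the "real-time" check. The class name `AuditLog` and the parameters `window_seconds` and `max_actions` are hypothetical, chosen for illustration only; this is not the approach presented in the keynote. Note that hash chaining makes tampering detectable rather than impossible: someone able to rewrite the entire chain could recompute every hash, so in practice the latest hash would also need to be anchored somewhere the insider cannot reach.

```python
import hashlib
import json
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the very first record
FIELDS = ("ts", "user", "action", "target", "prev")


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only log; each record embeds the hash of its predecessor."""
    window_seconds: float = 60.0  # hypothetical sliding window for the real-time rule
    max_actions: int = 100        # hypothetical per-user action limit within the window
    records: list = field(default_factory=list)
    _recent: dict = field(default_factory=lambda: defaultdict(deque), init=False, repr=False)

    def append(self, user: str, action: str, target: str, ts: float = None) -> bool:
        """Append a record and return True if this user should be flagged right now."""
        ts = time.time() if ts is None else ts
        prev = self.records[-1]["hash"] if self.records else GENESIS
        body = {"ts": ts, "user": user, "action": action, "target": target, "prev": prev}
        self.records.append({**body, "hash": _digest(body)})
        return self._check(user, ts)

    def _check(self, user: str, ts: float) -> bool:
        """Real-time rule: too many actions by one user inside the window."""
        window = self._recent[user]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and window[0] <= ts - self.window_seconds:
            window.popleft()
        return len(window) > self.max_actions

    def verify(self) -> bool:
        """Recompute the chain; any edited, reordered, or removed record breaks it."""
        prev = GENESIS
        for rec in self.records:
            body = {k: rec[k] for k in FIELDS}
            if rec["prev"] != prev or rec["hash"] != _digest(body):
                return False
            prev = rec["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog(window_seconds=60, max_actions=3)
    flags = [log.append("analyst1", "read", f"cable-{i}", ts=1000 + i) for i in range(5)]
    print(flags)           # the 4th and 5th reads within 60 seconds are flagged
    print(log.verify())    # chain is intact
    log.records[1]["target"] = "cable-999"  # simulate an insider editing one record
    print(log.verify())    # tampering is now detectable
```

The volume rule is deliberately crude; the point is only that the check runs at write time, so suspicious activity surfaces as it happens instead of sitting unread among millions of benign records.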
So your position is: tread lightly and ignore war crimes and corruption, lest we provoke those at the top of the power and wealth chain into taking MORE of our liberties away?
UNACCEPTABLE.
Posted by: Conspiracy2Riot | December 01, 2010 at 12:30 PM
It appears Wikileaks is on the threshold of a monumental exposure of internal communications of a US bank, rumored to be Bank of America. I suspect such a release of a company's internal operational records will generate substantially more effort to quash Wikileaks than to reconsider the risks of big data stores. It will also be another sad demonstration that national security often takes second place to market risks.
Ironically, this week the Department of Homeland Security subordinated its own national security concerns to commercial counterfeit and copyright interests by leading an unprecedented mass seizure of domain names without due process or recourse.
I worry that Wikileaks will spur more support for passage of the profoundly ill-considered Combating Online Infringement and Counterfeits Act, putting the entire internet DNS network security system at risk.
Posted by: R_macdonald | December 01, 2010 at 09:40 PM
I would think there would be significant pushback from the public and the media in Western democracies if governments tried to introduce new legislation. Never mind the Freedom of Information laws these governments would have to fight against.
This particular leak was started by a 22-year-old Army recruit who had total access to the system. The people who put in place the permissions that allowed a 22-year-old to access the system are the real cause of the problem. You have to ask yourself who authorized those permissions in the first place.
People want greater transparency from their governments, and surveys have consistently shown that trust between people and their governments has been declining alarmingly over the years.
Posted by: Dinesh Vadhia | December 02, 2010 at 03:53 AM
Information is data in context.
Therefore, data has and defines behavior, which categorical, attribute-based systems do not use. All the logging in the world will not help; they have to think about "smarter" systems.
Case in point: I replaced my SpamAssassin installation, which was using more and more CPU time, with a network behavioral system. Less spam, less CPU usage, and I hardly ever look at the logs; the system cleans and manages itself.
Posted by: ronald | December 03, 2010 at 09:49 AM
Thanks for adding "real-time audits" to your list. I've been hammering this nail every chance I get, including last week in India. Hopefully we'll see this become a key component in security architecture sooner rather than later. Keep up the good work, Jeff.
Posted by: Jeffreycarr | December 04, 2010 at 05:40 PM
What's the difference between gossip, news, and leaks? Isn't one person's gossip another person's leak? Even before the printing press, when publishing was restricted to the wealthy, leaking could not be controlled. How can anyone control leaks with 1, 2, or 3 billion people online with instant access to Internet publishing?
No doubt governments will try, but in that remedy lies a state of Orwellian control we should all greatly fear.
Posted by: Steven Adler | December 19, 2010 at 12:40 PM
I agree wholeheartedly with you about the effects of wicked data leaks and, on the IBM Mastering Data Management blog, predicted the "waterproofing" that will take place during 2011 (http://bit.ly/fPqZSt). I have no doubt that the pendulum is going to swing the other way, not just on disclosure but, to the extreme, on process as well. The Air Force has already taken steps to ban the use of removable media on certain servers that have no connectivity to the internet. Unfortunately, this now impacts the good guys along with the bad: defense contractors are being cut off from the data they need. My favorite quote on all this: "They were asking us to build homes before. Now they are taking away our hammers." (http://bit.ly/fRGm5E)
Posted by: JGoldfed | January 07, 2011 at 12:21 PM