I kept thinking I should title this post, "Knock. Knock. Who’s there?" But I scrubbed that idea.
This post is related to national security, intelligence, classification and information sharing. If this is not your domain, my comments below will make no sense.
Back in pre-9/11 days, those holding classified data used to apply the "Need to Know" model when considering information sharing. Following 9/11, there has been a call for improved information sharing, which has resulted in the new mantra "Need to Share."
What do you think is the difference between "need to know" and "need to share"? When push comes to shove, most people I speak with in the mission cannot quite articulate the difference. Not good.
In part, "need to share" involves a new mindset. This new mindset was highlighted in the third report issued by the Markle Foundation’s Task Force on National Security in the Information Age. For example, in this report we called for an increased use of tearline reporting and a decreased use of ORCON designations. (see report pages 44-48)
There is another aspect, however, that, while referenced in our report (see report pages 46 and 61), may be lacking the attention it deserves. And this is the subject of data indices. These are so fundamental to implementing a functional information sharing program, that I might hazard to say that without data indices … there is little to no hope information sharing will ever be solved. Let me explain.
If someone is the custodian of a highly relevant data item how will they "know who needs to know?" And conversely, if someone else is in need of this highly relevant data item how will they "know whom to ask?" Basically the problem is: who needs to know what? Example: How will the folks working on counter-proliferation know they have a record that is directly related to another team specializing in anti-money laundering? The chances these two groups (even if working in the same building ... ouch) will actually recognize they have related data points is close to Z E R O. If there were just these two groups, the problem would be trivial and could be worked out. But in the real world, organizations may have hundreds of isolated data sets. On whose door shall I knock?
In this earlier post I introduced the Information Sharing Paradox. This paradox basically states that if everyone cannot share everything with everyone else, and everyone cannot ask everyone else every question every day … then how is someone going to find something?
The answer, of course, is that one must first solve "discovery," i.e., knowing whom to ask for what. All large scale discovery problems are solved by central indexes (data registries with pointers). Be advised, discovery is not solved by a federated search where one broadcasts searches across the enterprise. And if you hear that federated search is the solution, be afraid, be very afraid. [I explain this in some detail in this post here.]
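To make the contrast concrete, here is a minimal sketch in Python. The custodian names, record IDs and the single "who" field are all made up for illustration. It only counts how many custodians must be contacted to answer one question under each approach; it is not a model of any real system.

```python
from collections import defaultdict

# Hypothetical custodians, each holding its own isolated records.
custodians = {
    "counter_proliferation": {"rec-1": {"who": "J. Doe"}},
    "anti_money_laundering": {"rec-7": {"who": "J. Doe"}},
    "visa_office": {"rec-3": {"who": "A. Smith"}},
}

def federated_search(who):
    """Broadcast the question to every custodian; return (hits, custodians contacted)."""
    hits = []
    contacted = 0
    for name, records in custodians.items():
        contacted += 1  # every custodian is asked, relevant or not
        hits += [(name, rid) for rid, rec in records.items() if rec["who"] == who]
    return sorted(hits), contacted

# Central index: built ahead of time from limited metadata, holding only pointers.
index = defaultdict(list)
for name, records in custodians.items():
    for rid, rec in records.items():
        index[rec["who"]].append((name, rid))

def index_lookup(who):
    """Ask the index first; only custodians that hold a match need to be contacted."""
    pointers = sorted(index.get(who, []))
    contacted = len({name for name, _ in pointers})
    return pointers, contacted
```

Both functions return the same pointers for "J. Doe", but the broadcast touches all three custodians while the index lookup points at only the two that matter. With hundreds of data sets, that gap is the difference between knocking on every door and knowing which door to knock on.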
In order for "need to share" to fulfill its full potential, data custodians must first publish (limited) metadata to the central index. More precisely, when I say "publish data," in actuality they will need to use data tethering to ensure all adds, changes and deletes are properly reflected in the index. At libraries, index metadata about new documents includes subject, title and author. In your business this limited metadata is more likely to be something like who, what, where, when, etc.
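A rough sketch of what this publishing could look like, in Python. The class name `CentralIndex`, the `apply` method and the operation names are my own assumptions for illustration and do not describe any particular product. The point is only that the index stores limited metadata plus a pointer back to the custodian, and that every add, change and delete at the source is replayed against it.

```python
class CentralIndex:
    """Hypothetical index: limited metadata keyed by a pointer (custodian, record_id)."""

    def __init__(self):
        self._pointers = {}  # (custodian, record_id) -> limited metadata

    def apply(self, custodian, op, record_id, metadata=None):
        """Replay one source-system event so the index stays tethered to the source."""
        key = (custodian, record_id)
        if op in ("add", "change"):
            # Store a copy of the limited metadata only, never the full record.
            self._pointers[key] = dict(metadata or {})
        elif op == "delete":
            self._pointers.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")

    def find(self, **criteria):
        """Return pointers whose metadata matches every criterion (who, where, ...)."""
        return sorted(
            key
            for key, meta in self._pointers.items()
            if all(meta.get(field) == value for field, value in criteria.items())
        )


index = CentralIndex()
index.apply("aml_team", "add", "rec-7", {"who": "J. Doe", "where": "Lisbon"})
index.apply("aml_team", "change", "rec-7", {"who": "J. Doe", "where": "Madrid"})
index.apply("cp_team", "add", "rec-1", {"who": "J. Doe", "where": "Madrid"})
index.apply("cp_team", "delete", "rec-1")
```

After these four events, a search on who="J. Doe" returns a single pointer to the anti-money laundering record, with its location already corrected to Madrid. Without the change and delete events, the index would be pointing people at stale or nonexistent records.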
As central indexes will be the means by which information discovery challenges are solved, this becomes a way to begin focusing the privacy and civil liberties debate.
One privacy-related tension will be defining exactly what kind of data should be discoverable, i.e., placed in the index. For example, in counter-terrorism information sharing programs, there would be significant controversy over, say, including pharmaceutical prescription information of all US citizens; whereas, including foreigners banned from traveling to the US would probably cause little to no concern. The subject of discoverability (i.e., selecting which data will live in the central index) deserves much debate.
On the good news front, solving discoverability via central indexes brings with it a few useful privacy protections, including: a) the urge to share more data with more parties is replaced by transferring less information to one place (the central index), b) who is searching for what and what they found can be logged (e.g., using immutable audit logs) in a consistent manner, thus facilitating better accountability and oversight, c) information sharing between parties is now reduced to just the records that they need to know and need to share (sharing less by sharing only information that must be shared), and d) it is now possible to make the index anonymized (see: Anonymized Semantic Indexes), which means the risk of unintended disclosure of even the limited metadata in the index is drastically reduced.
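Points b) and d) can be sketched together in a few lines of Python. Everything here is an assumption made for illustration: the keyed hash (HMAC-SHA256) standing in for anonymization, the hash-chained list standing in for an immutable audit log, and the names `AuditedIndex`, `anonymize` and `SECRET`. It is not necessarily how the anonymized index referenced above works.

```python
import hashlib
import hmac
import json

SECRET = b"demo-key"  # hypothetical key; a real deployment would protect and manage this


def anonymize(value):
    """One-way, keyed transform so the index never stores the clear-text value."""
    normalized = value.strip().lower().encode()
    return hmac.new(SECRET, normalized, hashlib.sha256).hexdigest()


class AuditedIndex:
    def __init__(self):
        self._index = {}  # anonymized token -> list of (custodian, record_id) pointers
        self.log = []     # hash-chained audit entries

    def publish(self, who, custodian, record_id):
        self._index.setdefault(anonymize(who), []).append((custodian, record_id))

    def search(self, searcher, who):
        """Look up pointers and record who searched for what and what they found."""
        token = anonymize(who)
        found = list(self._index.get(token, []))
        prev = self.log[-1]["hash"] if self.log else ""
        entry = {"searcher": searcher, "token": token, "found": found, "prev": prev}
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self.log.append(entry)
        return found

    def log_is_intact(self):
        """Recompute the chain; any edited or removed entry breaks it."""
        prev = ""
        for entry in self.log:
            body = {k: v for k, v in entry.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

A searcher still gets back pointers telling them whose door to knock on, but the index itself holds only tokens, and every search leaves a record that an overseer can check for tampering.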
Whether living in the "need to know" world or the "need to share" world, one must first be able to answer the question "who" and "what"; otherwise, this dog won’t hunt.
RELATED POSTS:
Discoverability: The First Information Sharing Principle
Information Sharing: Got Directory?
No Need to "Over Share" – Thoughts on Information Sharing
It’s All About the Librarian! New Paradigms in Enterprise Discovery and Awareness
Intelligent Organizations – Assembling Context and The Proof is in the Chimp!
Federated Discovery vs. Persistent Context – Enterprise Intelligence Requires the Later
I think you hit the nail on the head when you said that "need to share" involves a new mindset. What is really needed to advance information sharing is an entirely new paradigm - one where marking and stewardship of information are not based solely on a need to protect sensitive information. I think the "need to share" paradigm is one where people share information with a mindset that the positive outcome to be had by sharing outweighs the potential negatives of reduced "security." Security is important, but I think we have realized that keeping all of our information locked in a box doesn't help - and frustrates those who want to help us.
Cheers,
Dave
Posted by: David Sobyra | May 01, 2007 at 12:05 PM
"Need to Know" and "Need to Share" are not appropriate contrasting terms. "Need to Know" is often used as a content-producer label to mark data assets asserting that the data could not be shared unless the user/consumer met certain criteria.
"Need to Share" is a concept, a precept, and an objective. So many data assets are created and they need to be visible, accessible, and understandable so that they can be shared. It may be difficult to share your data with someone who could benefit from your data, if they don't know enough about the document to assert their need to know.
Think about the counter label to "Need to Know" as "Need to Hide". Be more overt in putting the onus on the data producers as to why there is a "need to hide". Think about the producer qualifying the need to hide for a specific reason, whether it be for proprietary reasons, HCFA, sources and methods, etc. If the user/consumer has not yet seen the document, how would he know what the need to know was, until after he accessed the document?
Just a thought posted to an interesting blog. Thanks Jeff for putting your commentary out for public consumption.
Posted by: Clay | May 08, 2007 at 10:50 AM
An alternative to a data index would be drilling into people's heads that it is their responsibility to get data they have into people's hands who need it. Kind of like the video clerk who sent the video of the Fort Dix plotters to the FBI. Instead of indexing data, you'd have to index agencies and their requirements.
Posted by: a517dogg | May 08, 2007 at 07:22 PM
Clearly there is an appropriate tension between the "need to know" concept and the need to share philosophy being directed by the Director of National Intelligence. And this new approach is basically asking the Intelligence Community to change its culture (a hard thing to ask) so that its basic ethos is the responsibility to provide information. But who do we want to provide it to? To those who need to know it, correct? And, as your argument articulated, that is the crux of it. To me, the change will be this: share information, and pull back (by exception) those pieces which need to be restricted and let the data provider (via a given community of interest) determine if the individual has the criteria to access it. Thus, need to know is by default if a given user is operating in a given virtual environment.
Posted by: Jesse | September 08, 2007 at 01:11 PM