I hear from time to time the idea that one giant picture containing all known links between people and/or events could provide the analyst the visual stimulation needed to discover the next big clue. I cannot visualize this … or I should say visualize this as being useful.
Almost immediately following September 11th the FBI began carefully disseminating a terrorist watch list across corporate America because they needed immediate and wide-scale assistance with their investigation. This made some news as it can be hard to keep the list current and out of the wrong hands (e.g., the sensitive list ended up on a web site in South America).
During this time, my little company (SRD) offered to help by donating our NORA (Non-Obvious Relationship Awareness) software and our time to help a few companies accurately match the FBI watch list they received against their internal databases. Without our help, they had little chance of producing any accurate results whether by human searches or automated algorithms. For example, how would they account for the 100+ spelling variations of Mohammed?
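To make the spelling-variation problem concrete, here is a minimal sketch of fuzzy name matching. It is not how NORA works; it is only an illustration using Python's standard library. The function names (`normalize`, `similarity`, `match_watch_list`) and the 0.7 threshold are assumptions chosen for this example.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, strip accents and non-letters, collapse repeated letters."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    letters = re.sub(r"[^a-z]", "", ascii_only.lower())
    # "mohammed" and "mohamed" both become "mohamed"
    return re.sub(r"(.)\1+", r"\1", letters)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_watch_list(watch_list, records, threshold=0.7):
    """Return (watch_name, record_name, score) for pairs at or above threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    hits = []
    for watch_name in watch_list:
        for record_name in records:
            score = similarity(watch_name, record_name)
            if score >= threshold:
                hits.append((watch_name, record_name, round(score, 2)))
    return hits


if __name__ == "__main__":
    watch = ["Mohammed Atta"]
    internal = ["Mohamed Atta", "John Smith"]
    for hit in match_watch_list(watch, internal):
        print(hit)
```

A naive exact-string search would miss "Mohamed Atta" here. Even this sketch is far from sufficient for real watch list matching, which also has to deal with transliteration, name order, dates of birth and so on.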
Simultaneously, the investigative journalists began publishing link charts of how the terrorists were connected to Mohammed Atta and how Atta was connected to Osama bin Laden. Then some folks started suggesting the shapes of these networks held clues, telltale signatures now detectable by observing the unique shape of a network cluster.
My experience suggests otherwise. As one amasses larger and larger sets of data, the shape of the network becomes less and less relevant (at least when hunting for bad guys). Think of traveling sports teams or family reunions: might these networks look like Atta’s network? In large populations of data, I believe the false alarm rate makes this “pattern-based” network analysis virtually useless.
What matters is the entrance point into the network. For example, starting with a known bad guy or a communication from an Al Qaeda safe house? Observing the network from such a vantage point is useful.
So I crafted a picture (drawing only from press clippings and other public sources) about how the network actually looked when starting from such an entrance point – in this case, Nawaf al Hazmi and Khalid al Mihdhar, known terrorists believed to be in the United States at this time. [This is well documented on pages 271 and 272 of the 9/11 Commission Report.]
I created this specific picture to demonstrate that one did not need vast oceans of medical, financial and communications data to disrupt the 9/11 attacks. Rather, what was needed was concentrated scrutiny on a small network, one isolated as interesting by starting with a few known bad guys.
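The idea of isolating a small network from a known entrance point can be sketched as a breadth-first walk outward from a few seed nodes, keeping only what lies within a small number of hops. This is an illustration only, not the method used to build the chart; the function names, the `max_hops` parameter and the placeholder node labels are all assumptions made for the example.

```python
from collections import deque


def neighborhood(edges, seeds, max_hops=2):
    """Return {node: hop_distance} for every node within max_hops of a seed."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Multi-source breadth-first search: every seed starts at distance 0.
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def subnetwork(edges, seeds, max_hops=2):
    """Keep only the edges whose endpoints are both near a seed."""
    keep = neighborhood(edges, seeds, max_hops)
    return [(a, b) for a, b in edges if a in keep and b in keep]


if __name__ == "__main__":
    # Placeholder labels only; these are not real people or real links.
    links = [
        ("seed_1", "person_a"),
        ("person_a", "person_b"),
        ("person_b", "person_c"),
        ("team_x", "team_y"),  # unrelated cluster, never reached from the seed
    ]
    print(subnetwork(links, ["seed_1"], max_hops=2))
```

The unrelated cluster is never examined at all, whatever its shape, which is the point: the starting vantage point does the filtering, not the pattern.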
Shortly thereafter, my work depicting the September 11th terrorist network as seen from this vantage point appeared in various policy papers (e.g., page 28 of the Markle Foundation: Protecting America’s Freedom in the Information Age) and media accounts (e.g., Newsweek: Geek War on Terror).
This is the back story behind the 9/11 link chart I created. And I share this perspective every time I hear that people are spending time and money on technology to present gigantic graphs to users with the notion that somehow they will be able to navigate the chart and discover the next big clue.
On a more subtle technical point: Even when observing a network from a specific vantage point, what data one uses to construct the network becomes critical. As it turns out, a lot of data in this world is not helpful for this mission. I intend to post more on this subject including some basic rules about what data is and is not useful in link analysis.
Even if you could have every bit of human knowledge at your fingertips, we don't yet have the computing power to do much that is useful with it.
Most of us really don't even have the computing power to analyze vast amounts of Internet traffic in real time, as certain government agencies do.
But as you said, to pull useful information out of the network, you really don't need to know everything, and knowing too much gets in the way.
I'm looking forward to seeing what else you have to say, as I've been thinking about how to adapt data mining techniques to the problem of blog spam prevention. From the view of one weblog, only limited information is available, but by making more effective use of more information than others use, I've made a pretty good dent in the blog spam problem. And since I have to analyze in real time, I can't use a lot of information anyway.
Posted by: Michael Hampton | May 08, 2006 at 10:20 PM
Jeff blogged...
"What matters is the entrance point into the network. For example, starting with a known bad guy or a communication from an Al Qaeda safe house? Observing the network from such a vantage point is useful."
Right on, Jeff! We are in total agreement.
I wrote the original of this white paper in 2002 when I also was asked "how do you predict this from patterns?"
http://www.orgnet.com/prevent.html
Posted by: Valdis Krebs | May 13, 2006 at 07:24 PM