From time to time I hear the idea that one giant picture containing all known links between people and/or events could give the analyst the visual stimulation needed to discover the next big clue. I cannot visualize this … or, I should say, I cannot visualize this being useful.
Almost immediately following September 11th the FBI began carefully disseminating a terrorist watch list across corporate America because they needed immediate and wide-scale assistance with their investigation. This made some news as it can be hard to keep the list current and out of the wrong hands (e.g., the sensitive list ended up on a web site in South America).
During this time, my little company (SRD) offered to help by donating our NORA (Non-Obvious Relationship Awareness) software and our time to help a few companies accurately match the FBI watch list they received against their internal databases. Without our help, they had little chance of producing any accurate results whether by human searches or automated algorithms. For example, how would they account for the 100+ spelling variations of Mohammed?
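To make the spelling-variation problem concrete, here is a minimal sketch of why an exact string comparison fails on names. It is not how NORA works; the variant table, the function names and the 0.85 threshold are all assumptions made up for illustration, and a real system would need far more than this.

```python
from difflib import SequenceMatcher

# Hypothetical, hand-made variant table (an assumption for this sketch).
# A real table would hold many more spellings and many more names.
VARIANTS = {
    "mohammed": {"mohammed", "muhammad", "mohamed", "mohammad", "muhammed", "mohamad"},
}

def canonical(token):
    """Map one name token to a canonical spelling if it is a known variant."""
    token = token.lower().strip(".,")
    for canon, forms in VARIANTS.items():
        if token in forms:
            return canon
    return token

def normalize(name):
    """Lower-case a full name and canonicalize each token."""
    return " ".join(canonical(t) for t in name.split())

def names_match(a, b, threshold=0.85):
    """True if the normalized names are equal or nearly equal.

    The similarity threshold is an arbitrary illustrative value.
    """
    na, nb = normalize(a), normalize(b)
    return na == nb or SequenceMatcher(None, na, nb).ratio() >= threshold
```

With this sketch, names_match("Muhammad Atta", "Mohamed Atta") succeeds even though the raw strings differ, which is exactly the case a naive database lookup would miss.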
Simultaneously, investigative journalists began publishing link charts of how the terrorists were connected to Mohammed Atta and how Atta was connected to Osama bin Laden. Then some folks started suggesting that the shapes of these networks held clues: telltale signatures detectable by observing the unique shape of a network cluster.
My experience suggests otherwise: as one amasses larger and larger sets of data, the shape of the network becomes less and less relevant (at least when hunting for bad guys). Think of traveling sports teams or family reunions. Might these networks look like Atta’s network? In large populations of data, I believe the false alarm rate makes this “pattern-based” network analysis virtually useless.
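The false-alarm point can be put into rough arithmetic. The sketch below uses invented numbers (ten genuinely bad clusters, a shape detector that catches 99% of them and wrongly flags 1% of innocent ones); none of these figures come from real data. It only shows how the share of correct alarms collapses as the population grows.

```python
def precision(population, true_bad, sensitivity, false_positive_rate):
    """Fraction of flagged clusters that are actually bad.

    All inputs are hypothetical values chosen for illustration.
    """
    true_pos = true_bad * sensitivity
    false_pos = (population - true_bad) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Same detector, same ten bad clusters, ever-larger haystacks.
for population in (1_000, 100_000, 1_000_000):
    share = precision(population, true_bad=10,
                      sensitivity=0.99, false_positive_rate=0.01)
    print(f"{population:>9,} clusters: {share:.2%} of alarms are real")
```

Under these assumed rates, about half the alarms are real in a population of one thousand clusters, and fewer than one in a thousand are real in a population of one million. The sports teams and family reunions swamp the signal.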
What matters is the entrance point into the network: for example, starting with a known bad guy or a communication from an Al Qaeda safe house. Observing the network from such a vantage point is useful.
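One way to picture the entrance-point idea is as a bounded walk outward from a few seed nodes rather than a rendering of the whole graph. The sketch below is an assumption-laden illustration, not the method used to build any actual chart: the adjacency-dict representation, the function name and the hop limit are all hypothetical.

```python
from collections import deque

def neighborhood(graph, seeds, max_hops):
    """Return {node: hops} for every node within max_hops of any seed.

    graph is an adjacency dict {node: [neighbors]}; this is a plain
    breadth-first search that stops expanding at the hop limit.
    """
    seen = {s: 0 for s in seeds if s in graph}
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return seen

# Toy graph with made-up labels; "x" and "y" are unrelated to the seed.
toy = {
    "seed": ["a", "b"], "a": ["seed", "c"], "b": ["seed"],
    "c": ["a", "d"], "d": ["c"], "x": ["y"], "y": ["x"],
}
small_network = neighborhood(toy, ["seed"], max_hops=2)
```

Everything not reachable from the seed within the hop limit, however interesting its shape, simply never enters the picture.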
So I crafted a picture (drawing only from press clippings and other public sources) of how the network actually looked when starting from such an entrance point, in this case Nawaf al Hazmi and Khalid al Mihdhar, known terrorists believed to be in the United States at the time. [This is well documented on pages 271 and 272 of the 9/11 Commission Report.]
I created this specific picture to demonstrate that one did not need vast oceans of medical, financial and communications data to disrupt the 9/11 attacks. Rather, what was needed was concentrated scrutiny on a small network, one isolated as interesting by starting with a few known bad guys.
Shortly thereafter, my work depicting the September 11th terrorist network as seen from this vantage point appeared in various policy papers (e.g., page 28 of the Markle Foundation: Protecting America’s Freedom in the Information Age) and media accounts (e.g., Newsweek: Geek War on Terror).
This is the back story behind the 9/11 link chart I created. And I share this perspective every time I hear that people are spending time and money on technology to present gigantic graphs to users with the notion that somehow they will be able to navigate the chart and discover the next big clue.
On a more subtle technical point: Even when observing a network from a specific vantage point, what data one uses to construct the network becomes critical. As it turns out, a lot of data in this world is not helpful for this mission. I intend to post more on this subject including some basic rules about what data is and is not useful in link analysis.