Data mining means different things to different people and, quite frankly, has become an overused term. After having seen quite a few data mining definitions, I have concluded that the longer the definition, the greater the confusion. So what might be the shortest possible definition?
Data Mining = Prediction.
Marketing organizations use data mining to create promotional offerings (“junk mail” to you and me) targeting selected individuals they “predict” will have a higher propensity to transact. In this scenario, better prediction means a higher promotional response rate, and thus improved sales and savings in promotional costs. False positives (here, offers sent to people who were never going to respond) have a benign consequence in this domain.
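To make the response-rate point concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the customer records, the `score` field (standing in for whatever propensity a model might predict), and the helper names `response_rate` and `target_top`. It is not how any particular marketing system works.

```python
def response_rate(mailed):
    """Fraction of mailed customers who actually responded."""
    responders = sum(1 for c in mailed if c["responded"])
    return responders / len(mailed)


def target_top(customers, k):
    """Pick the k customers with the highest predicted propensity."""
    return sorted(customers, key=lambda c: c["score"], reverse=True)[:k]


# Hypothetical customers. "score" is an assumed model output and
# "responded" is what actually happened; all values are made up.
customers = [
    {"id": 1, "score": 0.9, "responded": True},
    {"id": 2, "score": 0.8, "responded": True},
    {"id": 3, "score": 0.7, "responded": False},
    {"id": 4, "score": 0.4, "responded": False},
    {"id": 5, "score": 0.3, "responded": True},
    {"id": 6, "score": 0.2, "responded": False},
    {"id": 7, "score": 0.1, "responded": False},
    {"id": 8, "score": 0.1, "responded": False},
]

# Mailing everyone: 3 responders out of 8 pieces of mail.
baseline_rate = response_rate(customers)

# Mailing only the top 3 by score: 2 responders out of 3 pieces of mail.
targeted = target_top(customers, 3)
targeted_rate = response_rate(targeted)

# The false positives are the targeted customers who did not respond.
false_positives = [c["id"] for c in targeted if not c["responded"]]

print(baseline_rate, targeted_rate, false_positives)
```

In this toy data the targeted mailing costs three stamps instead of eight, and the one false positive (customer 3) simply receives an offer they ignore.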
And while I am on record about the negative Consequences of False Positives in Government Surveillance Systems, there are some very powerful uses of data mining in government settings.
Introducing: Data Mining for Predicate Triage
When a government is faced with an overwhelming number of predicates (i.e., subjects of investigative interest), data mining can be quite useful for triaging (prioritizing) which subjects should be pursued first. One example: the hundreds of thousands of people currently in the United States with expired visas. The student studying virology from Saudi Arabia holding an expired visa might be more interesting than the holder of an expired work visa from Japan writing game software.
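As a rough sketch of what I mean by triage, the Python below only reorders a list of predicates that already exists. The `Predicate` class, the `score` field, and the case labels are all hypothetical; how a score would be produced is deliberately left out, since that is exactly where the hard policy questions live.

```python
from dataclasses import dataclass


@dataclass
class Predicate:
    case_id: str
    score: float  # hypothetical priority score; higher means review sooner


def triage(predicates):
    """Return every predicate, ordered from highest to lowest priority.

    Nothing is added and nothing is dropped: triage decides the order
    of review, not who is on the list in the first place.
    """
    return sorted(predicates, key=lambda p: p.score, reverse=True)


# Invented example cases with made-up scores.
queue = triage([
    Predicate("case-A", 0.12),
    Predicate("case-B", 0.87),
    Predicate("case-C", 0.45),
])

order = [p.case_id for p in queue]
print(order)
```

The design point is that the input list is the set of predicates and the output is the same set in a different order. Data mining used this way prioritizes existing subjects of interest rather than generating new ones.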
Applying this line of thinking to the recently reported NSA warrantless surveillance debate: if the surveillance always starts with a predicate (in this setting, phone calls from known Al Qaeda training camps), and data mining is then used for predicate triage … then we are talking about a very useful form of data mining.
So what constitutes a viable (legal and useful) predicate? That is the question of the day!
I think "predicate triage" is a misnomer. You mean "suspect triage" - you're trying to pick which among many suspects to pursue, not which among many predicates to pursue, aren't you? If they're suspects, then you can do whatever you want as far as picking 'em up - go alphabetically, at random, or using data mining. It doesn't matter in terms of civil liberties if you're legally authorized to nab 'em.
Posted by: Jim Harper | April 04, 2006 at 01:25 PM
FYI
Democratic senators want agency data-mining reports
By Winter Casey, National Journal's Technology Daily
The government's mining of information from public- and private-sector databases for clues to terrorism and crime is widespread and federal agencies should regularly report to Congress on such activities, lawmakers said Wednesday.
"The overwhelming majority of these data-mining programs use, collect, and analyze personal information about ordinary American citizens," Senate Judiciary Committee Chairman Patrick Leahy, D-Vt., said during a hearing on balancing privacy and security. "We need look no further than the government's own terrorist watch list, which now contains the names of more than 300,000 individuals -- including infants, nuns and even members of Congress -- to understand the inefficiencies that can result from data mining and government dragnets."
Leahy said that "at least 52 different federal agencies are currently using data-mining technology," adding that there are "at least 199 different government data-mining programs operating or planned throughout the federal government." Despite its widespread use, Leahy said questions remain about how effective data mining is in preventing terrorism.
Full story: http://www.govexec.com/story_page.cfm?articleid=35846&dcn=e_gvet
Posted by: Stephen Taylor | January 11, 2007 at 07:54 AM