Data mining means different things to different people and, quite frankly, has become an overused term. After seeing quite a few definitions of data mining, I have concluded that the longer the definition, the greater the confusion. So what might be the shortest possible definition?
Data Mining = Prediction.
Marketing organizations use data mining to create promotional offerings (“junk mail” to you and me) targeting the individuals they “predict” will have a higher propensity to transact. In this scenario, better prediction means a higher promotional response rate, and thus improved sales and lower promotional costs. False positives (here, people predicted to respond who do not) have a benign consequence in this domain: a wasted mailing.
And while I am on record about the negative Consequences of False Positives in Government Surveillance Systems, there are some very powerful uses of data mining in government settings.
Introducing: Data Mining for Predicate Triage
When a government is faced with an overwhelming number of predicates (i.e., subjects of investigative interest), data mining can be quite useful for triaging (prioritizing) which subjects should be pursued first. One example: the hundreds of thousands of people currently in the United States with expired visas. A virology student from Saudi Arabia holding an expired visa might be more interesting than a game-software developer from Japan holding an expired work visa.
Now apply this line of thinking to the recently reported debate over NSA warrantless surveillance. If the surveillance always starts with a predicate (in this setting, phone calls from known Al Qaeda training camps), and data mining is then used for predicate triage … then we are talking about a very useful form of data mining.
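For readers who think in code, here is a minimal sketch of the two-step idea: start from a predicate, then rank only what matched. Everything in it is an assumption made for illustration. The `Subject` record, the `matches_predicate` flag, the `score` field and the `triage` function are hypothetical names, and the scores are made-up numbers standing in for whatever a real predictive model would produce.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    subject_id: str
    matches_predicate: bool  # hypothetical flag: tied to an existing predicate
    score: float             # hypothetical model output, higher = look at first


def triage(subjects, top_n=None):
    """Rank predicate-matching subjects by score, highest first."""
    # Step 1: the predicate comes first. Nothing is scored into the
    # work queue unless it already matched a predicate.
    candidates = [s for s in subjects if s.matches_predicate]
    # Step 2: data mining is used only to prioritize those candidates.
    ranked = sorted(candidates, key=lambda s: s.score, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Made-up example data.
subjects = [
    Subject("A", True, 0.2),
    Subject("B", False, 0.9),  # high score, but no predicate: never considered
    Subject("C", True, 0.7),
    Subject("D", True, 0.5),
]

first_two = [s.subject_id for s in triage(subjects, top_n=2)]
```

Here `first_two` is `["C", "D"]`. Subject B has the highest score but is never ranked, because the filter on the predicate runs before any prioritization. That ordering of steps is the whole point of predicate triage.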
So what constitutes a viable (legal and useful) predicate? That is the question of the day!