
July 15, 2010



Matt Devost

As always, a very insightful post.

Of course, one of the barriers is being able to give the tool an appropriate definition of "bad guy," which means making lists of bad guys that are accurate and sustainable. It seems your approach was subject-based as opposed to transaction-based, so how do you adapt this sort of model for transaction-based assessments of "bad guys"?

Lewis Shepherd

Great post. The additional caveat against 10,000 principles made me think of the CYC project, which I think exemplifies the tangle of problems to which that train of thought can lead....


I think there is confusion over rules before you get the data vs filters after you get the data - thanks for the good read this morning.


This sounds like a good argument for why template-based link analysis can outperform "needle in a haystack" style datamining. Namely, you use human expertise to seed the searches with known/likely attack vectors, which would likely result in a better ROC curve than purely unsupervised methods. Matching templates in a hit could also form the basis of a more human understandable explanation, which undoubtedly would be helpful for evaluating, creating, and generalizing rules.

As for rule generalization, wouldn't the major issue be the standard bias vs. variance tradeoff seen in machine learning? The more general you make a rule, the better it can accommodate new situations, but because it's "looser" it could let more false positives in as well.
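To make the tradeoff concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the transactions are invented, and the single "flag anything at or above this amount" rule is a stand-in for whatever rules a real system would use. Lowering the threshold makes the rule more general, and both the catch rate and the false-positive rate go up.

```python
# Hypothetical labeled transactions: (amount, is_fraud).
# The data is invented purely to illustrate the tradeoff.
TRANSACTIONS = [
    (9500, True), (8700, True), (7200, False), (6400, True),
    (5100, False), (4300, False), (3900, True), (2500, False),
    (1200, False), (800, False),
]


def evaluate_rule(threshold, transactions=TRANSACTIONS):
    """Flag every transaction with amount >= threshold.

    Returns (true_positive_rate, false_positive_rate).
    """
    frauds = [amount for amount, is_fraud in transactions if is_fraud]
    legit = [amount for amount, is_fraud in transactions if not is_fraud]
    tpr = sum(amount >= threshold for amount in frauds) / len(frauds)
    fpr = sum(amount >= threshold for amount in legit) / len(legit)
    return tpr, fpr


def roc_points(thresholds):
    """One (threshold, tpr, fpr) point per rule setting, strictest first."""
    return [(t, *evaluate_rule(t)) for t in sorted(thresholds, reverse=True)]


if __name__ == "__main__":
    for threshold, tpr, fpr in roc_points([8000, 6000, 3000]):
        print(f"threshold={threshold}: catches {tpr:.0%} of fraud, "
              f"flags {fpr:.0%} of legitimate transactions")
```

With this toy data, the strict rule (8000) catches half the fraud with no false alarms, while the loose rule (3000) catches all of it but also flags half of the legitimate transactions.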

Maintenance Man

Did some programmer sit down and code each of the 10k scenarios? Or did a human detect a bunch of fraud, and then the system started learning and created the rules itself? The former sounds like a maintenance nightmare. The latter is smart.

Dave M

I got my start building expert systems. Being new and enthusiastic I was a rule generating machine. It quickly became apparent that the more rules I added the more complex and unstable the system became. Very quickly the entire system became untrustworthy. The results were not intuitive based on the inputs and frantic rule ordering and tuning were required for it to make sense. Any new rules made it worse not better. Strangely, Symbolics machines along with development environments like ART and KEE no longer exist. Of course the ultimate in rules is CYC. Cycorp is still in business so I'm assuming they have something worth selling but haven't seen it in a while.

What you are really talking about in this post is simplicity vs complexity and the 80 percent rule. Systems that are supposed to alert you to bad things don't need to be exact. They have to reduce the complexity of the underlying data to the point where a halfway decent analyst will be able to see the relevant connection and fill in the details. In your example NORA didn't actually find collusion; it just pointed out that they shared a phone number at one point. Simple sets of rules will take you at least 80% of the way to the conclusion you want while leaving your system simple enough to maintain and operate. The system will not reach any definitive conclusions (e.g. "A man with a bunch of PETN in his boxers will board a flight in London on Christmas Day") but that is the human's job anyway. Humans are much better at complex reasoning than computers. They are just not good at processing terabytes of raw data.
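As a rough illustration of how small such a rule can be, here is a sketch in Python of the "shared a phone number" check. This is not how NORA works internally (I have no knowledge of that); the function name, record layout, and sample names are all hypothetical. It only surfaces the connection and leaves the conclusion to the analyst.

```python
from collections import defaultdict
from itertools import combinations


def shared_phone_pairs(records):
    """Return (person_a, person_b, phone) for every pair sharing a phone.

    `records` is an iterable of (person, phone) tuples. The layout is an
    assumption made for this sketch, not a real system's schema.
    """
    people_by_phone = defaultdict(set)
    for person, phone in records:
        people_by_phone[phone].add(person)

    pairs = []
    for phone, people in people_by_phone.items():
        # Every pair of distinct people on the same number is a lead.
        for a, b in combinations(sorted(people), 2):
            pairs.append((a, b, phone))
    return sorted(pairs)


if __name__ == "__main__":
    # Invented sample records.
    records = [
        ("dealer_17", "555-0101"),
        ("player_42", "555-0101"),
        ("player_43", "555-0199"),
    ]
    for a, b, phone in shared_phone_pairs(records):
        print(f"{a} and {b} have both used {phone}")
```

The output is a lead, not a verdict: it says two people once shared a number, and a human decides whether that means collusion.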

Now I'm a data architect and the current bane of my existence is people who are in love with very complex ontologies. Same problem, different day.

Sam Tetruashvili

I wonder if there are algorithms that can learn new principles in an online fashion?

D Earley

To hell with rules and to hell with the wrong questions???

It appears to me that they were asking the question "How do I stop fraud?" and their answer was "I implement rules to find it."

Your question was "How do you find bad guys?" and your answer was "I detect patterns and context to expose them and their associates."

My question to you is: how do you get to the right question, especially with more and more data?
