Not long ago I found myself at a major financial institution talking about one of their fraud detection systems. Over the course of the conversation I stumbled onto the fact that they have over 10,000 rules in place to detect fraud ... and oh so proud they were.
On the surface that might sound “powerful and amazing.” Nonetheless, it struck me funny. 10,000 rules … WOW! That must be brittle, expensive, and one giant liability, I thought to myself. Such a detection system would catch exactly 10,000 things, nothing more, nothing less. Every new discovery would lead to new rules. Over time, as the rule library bloats further, it would get harder to manage and probably slower and slower. And by the way, how many people actually understand all those rules and their interrelationships? Then, as those people move on, how hard is it to get new people trained up on all those rules? Will they still be bragging about their extensive rule library when they have 20,000 rules?
Imagine telling your kid one day to quit throwing rocks at cars, only to realize the next day you have to tell them to quit throwing rocks at SUVs. Then, in the coming days, you realize you must also tell your kid not to throw rocks at trucks, fire engines, and ambulances. Ummm … 4,172 rules later you must come up with new rules like “don’t throw cans of Dr. Pepper at trolley cars.”
How about: “Don’t throw things at other people’s stuff.”
As parents quickly discover, teaching a principle like this is a much better course of action. While certainly not a perfect principle, at least it rolls up hundreds of explicit rules and catches countless conditions you never thought of. And yes, maybe this simple principle needs to be extended, e.g., “unless they are bad people doing bad things and they need to be stopped.” That way, if someone is coming at them on a skateboard with a knife, they know it is okay to throw a chair at them.
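To make the contrast concrete, here is a toy sketch in Python (everything in it is invented for illustration): the rule-based version enumerates cases and misses anything unlisted, while the principle-based version is a single predicate over properties of the situation.

```python
# Toy illustration only; all names and cases are invented.

# Rule-based: every newly discovered case needs its own entry.
RULES = [
    lambda thing, target: thing == "rock" and target == "car",
    lambda thing, target: thing == "rock" and target == "SUV",
    lambda thing, target: thing == "rock" and target == "truck",
    # ... 4,169 more rules, and still nothing about cans of Dr. Pepper ...
]

def rule_based_violation(thing, target):
    return any(rule(thing, target) for rule in RULES)

# Principle-based: one predicate over properties, not enumerated cases.
def principle_based_violation(target_owner, thrower):
    # "Don't throw things at other people's stuff."
    return target_owner is not None and target_owner != thrower

print(rule_based_violation("can of Dr. Pepper", "trolley car"))    # False: missed
print(principle_based_violation("transit authority", "your kid"))  # True: caught
```

The rule list grows without bound; the principle evaluates situations no one ever enumerated.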
Now back to the real world and a real example from my past. Circa 1993 we were building the first NORA (Non-Obvious Relationship Awareness) system for a casino. In this system, the first relevance rule was basically: “Tell me when the bad guy is the good guy.” This one rule was created to detect and alert on such things as: the slot club loyalty card member who is banned from gaming (on the Nevada Gaming Control Board’s Excluded Persons List), or the job applicant who is a known gaming felon.
The second relevance rule was: “Tell me when the bad guy knows the good guy.”
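For the curious, here is a minimal sketch of how these two principles might be expressed over a handful of records. This is my toy reconstruction in Python, not NORA’s actual implementation: the source names, people, and phone numbers are invented, and real entity resolution is far more sophisticated than the exact-match grouping used here.

```python
from itertools import combinations

# Hypothetical records drawn from sources with "good" or "bad" context.
records = [
    {"name": "jane roe",  "phone": "702-555-0142", "source": "employee_applications"},
    {"name": "john doe",  "phone": "702-555-0142", "source": "arrest_records"},
    {"name": "sam smith", "phone": "702-555-0101", "source": "excluded_persons"},
    {"name": "sam smith", "phone": "702-555-0101", "source": "loyalty_club"},
]
GOOD = {"employee_applications", "loyalty_club"}
BAD = {"arrest_records", "excluded_persons"}

# Crude stand-in for entity resolution: same name + same phone = same person.
entities = {}
for r in records:
    entities.setdefault((r["name"], r["phone"]), []).append(r["source"])

# Principle 1: "the bad guy IS the good guy" -- one resolved identity
# appears in both a good-context source and a bad-context source.
for (name, phone), sources in entities.items():
    if set(sources) & GOOD and set(sources) & BAD:
        print(f"ALERT (is): {name} appears in {sorted(set(sources))}")

# Principle 2: "the bad guy KNOWS the good guy" -- two distinct resolved
# identities are linked by a shared identifier (here, a phone number).
for (k1, s1), (k2, s2) in combinations(entities.items(), 2):
    good_bad = (set(s1) & GOOD and set(s2) & BAD) or (set(s1) & BAD and set(s2) & GOOD)
    if k1[1] == k2[1] and good_bad:
        print(f"ALERT (knows): {k1[0]} <-> {k2[0]} share phone {k1[1]}")
```

Note that neither loop mentions dealers, cheaters, or job applications; any source tagged good or bad, and any shared identifier, feeds the same two principles.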
With just these two rules, the system started kicking out all kinds of valuable, unanticipated insight, including one of my favorites: An alert surveillance room operator noticed a dude cheating at a roulette table … making bets after the ball fell (called “past posting”). Dealers are supposed to watch for this. But somehow, that day, this dealer kept missing this obvious scam. Casino security detained the cheater. The dealer said, “I can’t believe this happened to me, I am so embarrassed, you surveillance folks are sure doing a good job, it won’t happen again.” During the arrest processing, the cheating player provided a last name and address different from those the dealer had on file. Fortunately, the cheater provided his real home phone number, which happened to be the same number the dealer had used on her original employment application.
The dealer, who up to this point had been pretending not to know the player, rolled over in an instant and confessed when NORA popped off a real-time alert: “The cheater is related to the dealer.”
Behind the scenes this was data finds data followed by relevance finds the user. Relevance, in this case, was based on the principle: alert when the bad guy knows the good guy.
Had we deployed a traditional rules-based alert system, there is a good chance the specific rule (“if the employee’s job application phone number matches a phone number on an arrest record”) would never have been written. But because NORA was engineered around principles, we caught this colluding roulette dealer. Notably, we would also have detected the pair had they been connected via an emergency contact phone number, or had the original address on the player’s loyalty club signup (since changed) matched the address on the employee’s original job application (even though it was not present on her current payroll record).
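Said another way, the principle compares identifiers by type, wherever and whenever they appear, instead of enumerating one rule per field pair. A hypothetical sketch of that idea, sticking with Python (the field names and values are made up):

```python
def identifier_set(identity):
    """Flatten every identifier an identity has ever used into (type, value) pairs."""
    ids = set()
    for field, values in identity.items():
        id_type = field.split("_")[-1]   # compare by type, not source field name
        for value in values:             # includes old, since-changed values
            ids.add((id_type, value))
    return ids

dealer = {
    "application_phone":   ["702-555-0142"],
    "application_address": ["9 Oak Ave"],   # original, later superseded
    "payroll_address":     ["44 Pine St"],  # current record
}
cheater = {
    "arrest_phone":    ["702-555-0142"],
    "loyalty_address": ["9 Oak Ave", "7 Birch Rd"],  # signup address + current
}

# One principle covers phone-vs-phone, address-vs-address, old-vs-new, etc.
print(identifier_set(dealer) & identifier_set(cheater))
# -> {('phone', '702-555-0142'), ('address', '9 Oak Ave')}
```

With field-pair rules, each of those matches would have needed someone to anticipate it; with the principle, new sources and new identifier types just flow in.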
Data triage systems, especially those that must detect ever-changing, crafty adversaries, should be principle-based where possible; otherwise, you won’t be merely one step behind. You will be two or more steps behind!
Principle-based decisioning systems may surprise you … in a good way.
MISC NOTES
1. Maybe some classes of systems, like the space shuttle program, really do need a zillion rules. But that is outside my field, so I don’t know.
2. The notion that “principles outperform rules” probably applies to most, if not all, decisioning processes. For example, I would prefer to see feature extraction, entity resolution, relevance detection, filtering, and insight publishing algorithms leverage principles over rules wherever possible.
3. Just to be fair, many systems will still have to have some very specific rules, like “any cash transaction over $10,000 must be reported to FinCEN”; it’s the law. This is not much different from telling your child they have to be home by 9pm on school nights, period.
4. And if you get to 10,000 principles, you might want to focus on more abstraction.
OTHER RELATED POSTS:
You Won’t Have to Ask -- Data Will Find Data and Relevance Will Find the User
As always, a very insightful post.
Of course, one of the barriers is being able to provide the tool with the appropriate definition of "bad guy" - which means making lists of bad guys that are accurate and sustainable. Seems like your approach was subject-based as opposed to transaction-based - so how do you adapt this sort of model for transaction-based assessments of "bad guys"?
Posted by: Matt Devost | July 16, 2010 at 01:08 PM
Great post. The additional caveat against 10,000 principles made me think of the CYC project, which I think exemplifies the tangle of problems to which that train of thought can lead....
Posted by: Lewis Shepherd | July 16, 2010 at 09:17 PM
I think there is confusion over rules before you get the data vs filters after you get the data - thanks for the good read this morning.
Posted by: Mecredy | July 17, 2010 at 09:18 AM
This sounds like a good argument for why template-based link analysis can outperform "needle in a haystack" style datamining. Namely, you use human expertise to seed the searches with known/likely attack vectors, which would likely result in a better ROC curve than purely unsupervised methods. Matching templates in a hit could also form the basis of a more human understandable explanation, which undoubtedly would be helpful for evaluating, creating, and generalizing rules.
As for rule generalization, wouldn't the major issue be the standard bias vs. variance tradeoff seen in machine learning? The more general you make a rule, the better it can accommodate new situations, but because it's "looser" it could let more false positives in as well.
Posted by: Eric | July 20, 2010 at 12:29 PM
Did some programmer sit down and code each of the 10k scenarios? Or did a human detect a bunch of fraud, and then the system started learning and created the rules itself? The former sounds like a maintenance nightmare. The latter is smart.
Posted by: Maintenance Man | July 27, 2010 at 04:16 PM
I got my start building expert systems. Being new and enthusiastic, I was a rule-generating machine. It quickly became apparent that the more rules I added, the more complex and unstable the system became. Very quickly the entire system became untrustworthy. The results were not intuitive based on the inputs, and frantic rule ordering and tuning were required for it to make sense. Any new rules made it worse, not better. Strangely, Symbolics machines along with development environments like ART and KEE no longer exist. Of course the ultimate in rules is CYC. Cycorp is still in business so I'm assuming they have something worth selling, but I haven't seen it in a while.
What you are really talking about in this post is simplicity vs. complexity and the 80 percent rule. Systems that are supposed to alert you to bad things don't need to be exact. They have to reduce the complexity of the underlying data to the point where a halfway decent analyst will be able to see the relevant connection and fill in the details. In your example, NORA didn't actually find collusion; it just pointed out that they shared a phone number at one point. Simple sets of rules will take you at least 80% of the way to the conclusion you want while leaving your system simple enough to maintain and operate. The system will not reach any definitive conclusions (e.g. "A man with a bunch of PETN in his boxers will board a flight in London on Christmas Day") but that is the human's job anyway. Humans are much better at complex reasoning than computers. They are just not good at processing terabytes of raw data.
Now I'm a data architect and the current bane of my existence is people who are in love with very complex ontologies. Same problem, different day.
Posted by: Dave M | August 01, 2010 at 04:34 PM
I wonder if there are algorithms that can learn new principles in an online fashion?
Posted by: Sam Tetruashvili | October 27, 2010 at 01:20 PM
To hell with rules and to hell with the wrong questions???
It appears to me that they were asking the question "How do I stop fraud?" and their answer was to implement rules to find it.
Your question was "How do you find bad guys?" and your answer was to detect patterns and context to expose them and their associates.
My question to you is: how do you get to the right question, especially with more and more data?
Posted by: D Earley | October 25, 2011 at 09:47 AM