
January 26, 2006

Sequence Neutrality in Information Systems

When I ask investigators or analysts what technology improvements they would most appreciate, one of their top requests is invariably “to get answers to their questions faster.” This has always struck me as funny. What if the question being asked today is not a smart question until next Thursday? How can we expect analysts to ask every smart question every day? In short, answering questions faster is kind of like climbing a tree to get to the moon. You can always inch further up, but that is never really going to get you where you need to go.

Systems that produce different answers based on the order of events lack a property I refer to as “sequence neutrality.” Sequence neutrality means that regardless of the order in which data or queries arrive, the end state, once all data points are known, is the same. A sequence-neutral system spares its users from having to ask every smart question every day.

Here’s an example. Today, when a bank searches for “Billy the Kid,” the answer depends on whether such a record already existed at the time of the search. With sequence neutrality, however, the moment “Billy the Kid” opens a bank account, regardless of when that occurs, the user who made the original query can be notified. Furthermore, if “Billy the Kid” is added months later to the OFAC list (people and organizations that financial institutions are banned from doing business with), the bank is alerted immediately.
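The “Billy the Kid” example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any real product: the `SequenceNeutralIndex` class, the `replay` helper, and the source names `new_account` and `ofac_list` are all hypothetical, and matching is reduced to a case-insensitive exact name comparison. The idea it shows is that queries are stored alongside records, so each new arrival is checked against everything seen before it.

```python
from collections import defaultdict
from itertools import permutations


class SequenceNeutralIndex:
    """Hypothetical store that remembers both records and standing queries."""

    def __init__(self):
        self.records = defaultdict(set)  # name -> sources holding a record
        self.queries = defaultdict(set)  # name -> users who asked about it
        self.alerts = set()              # (user, name, source) matches so far

    def add_record(self, name, source):
        key = name.strip().lower()
        self.records[key].add(source)
        # New data is checked against every question asked earlier.
        for user in self.queries[key]:
            self.alerts.add((user, key, source))

    def add_query(self, name, user):
        key = name.strip().lower()
        self.queries[key].add(user)
        # A new question is checked against every record seen earlier.
        for source in self.records[key]:
            self.alerts.add((user, key, source))


def replay(events):
    """Feed events to a fresh index in the given order; return its alerts."""
    index = SequenceNeutralIndex()
    for kind, name, who in events:
        if kind == "record":
            index.add_record(name, who)
        else:
            index.add_query(name, who)
    return index.alerts


events = [
    ("query", "Billy the Kid", "analyst"),
    ("record", "Billy the Kid", "new_account"),
    ("record", "Billy the Kid", "ofac_list"),
]

# Replay the same three events in all six possible orders.
end_states = {frozenset(replay(order)) for order in permutations(events)}
print(len(end_states))  # prints 1: every ordering ends in the same set of alerts
```

Whether the analyst asks before or after the account is opened, the final set of alerts is identical, which is the property described above. A real system would also need fuzzy name matching and a way to deliver each alert at the moment it is raised.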

As another example, government entities perform background checks on individuals seeking “top secret” clearances. What happens if one of the systems used to favorably qualify a person later receives a record suggesting that the applicant deserves additional scrutiny? Say a record shows up in a registered (and public) sex offender database shortly after the person is granted a clearance. How will the government learn of this new data point? One option would be to ask every question every day, which is obviously impractical. So to address this scenario, the US Government performs background checks every five years. But that means a glaring problem in the data may not be discovered until the question is asked again, potentially years later. In a system designed for sequence neutrality, the moment a relevant record comes into existence, it is published (pushed) to the relevant system or user.

When sequence neutrality is applied to information systems, a very interesting effect is created: the “data finds the data.” This means that as the system observes each new piece of data, it considers how that data relates to all previously observed data points, without waiting for a user to ask a question. While this can benefit a single system, it is even more powerful when applied across heterogeneous systems. Suddenly, very interesting insight is possible.

How does a company recognize that its accounts payable manager shares the same phone number as its largest vendor (a relationship that can violate company policy if undisclosed)? 
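One way to picture the accounts payable question is an index keyed on a shared attribute, consulted every time a record arrives. The sketch below is a minimal illustration and nothing more: the `PhoneIndex` class, the system names `vendors` and `hr`, and the people and phone numbers are all made up, and phone normalization is reduced to keeping the digits.

```python
from collections import defaultdict


class PhoneIndex:
    """Hypothetical index that links entities across systems by phone number."""

    def __init__(self):
        self.by_phone = defaultdict(set)  # normalized phone -> {(system, name)}

    @staticmethod
    def normalize(phone):
        # Keep digits only so formatting differences do not hide a match.
        return "".join(ch for ch in phone if ch.isdigit())

    def observe(self, system, name, phone):
        """Store one entity and return links to entities in other systems
        that share its phone number."""
        key = self.normalize(phone)
        entity = (system, name)
        links = sorted(
            (entity, other)
            for other in self.by_phone[key]
            if other[0] != system
        )
        self.by_phone[key].add(entity)
        return links


index = PhoneIndex()
index.observe("vendors", "Acme Supply", "(555) 010-2000")  # nothing to link yet
hits = index.observe("hr", "Pat Doe, AP manager", "555-010-2000")
print(hits)
```

The second call returns a link between the HR record and the vendor record without anyone having asked about either one. Loading the two records in the opposite order surfaces the same relationship, just from the vendor side.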

When the “data finds the data,” such insight and awareness is not only possible, it is fundamental to creating market-differentiating services. Whether an organization is focused on managing customer relationships, credentialing parties, evaluating credit risk or handling investigations, building in sequence neutrality opens up unusually powerful possibilities.

Comments

Jeff,

I strongly agree with your "aggregate vs sequence results" perspective. The need to have systems neutralize the challenges posed by traditional runtime "race controls" and other nondeterministic factors inherent in distributed systems is key to solving many of the representative problems you cite.

With respect to the "temporal nature of request..." I again strongly support your position on the need to monitor and broadcast "substantial" change in result(s) based on parametric data. I would add that there's considerable value in keeping some persistent data about the users / organizations that "used" information that has since changed, per the above discussion / scenario. Decision makers may want to consider these insights in addition to leveraging the latest & greatest (temporal) perspective on the available data set.

I recognize this is probably covered in other discussion threads; but I have to bring up the importance of dis-ambiguation. The importance of applying a plethora of text analytics technology to minimize ambiguities is key to these challenges.

Regards, from a long-time SRD (and now IBM Entity Analytics) fan, Fred M-D

Dear Jeff James,

I am a Brazilian journalist and I would like to interview you. I can explain the aim of the interview by e-mail. Could you give me your e-mail address?

Best Regards,

Solange

I actually implemented this concept in an Arabic search engine I wrote for a school project in my data mining class. But it had a lot more to do with the fact that it was more efficient to implement it so that the data was run past the queries rather than the queries run on top of the data. But I used the same argument you present here while convincing my professor that I should get a decent grade for it.

Man that was a tough class.

Thanks for giving me a word for it. I've been striving for "sequence neutrality" in my music aggregator, Grabb.it, and now that I can name it it's a lot easier to whiteboard.

Jeff, isn't sequence neutrality as described in your blog the same as a "declare an interest in something and ask to be informed when it happens" type of setup?
