When a call center takes a first-time caller who happens to be the daughter of its largest customer, does the call center employee know? An employee moves and then submits an address change to the payroll department. Will the organization’s internal fraud investigators notice that the new address is related to a current fraud investigation? In both cases, the answer is no. Organizations don’t know what they know.
The real problem is … Data happens.
Every organization has an ocean of historical data, and with each passing moment new data continues to stream in via one channel or another. Different streams lead to different silos, and these silos are organized to serve different missions … rarely are any two silos alike.
When organizations elect to increase their understanding (e.g., unearth discoveries otherwise isolated across their disparate information silos), they implement secondary analytic systems. Whether the organization calls these systems data mining, business intelligence, or predictive modeling, these technologies generally work the same way: they periodically extract data from the silos and then process this data using specialized algorithms. At the end of this process a new data set is created that encapsulates what has been learned/discovered, e.g., men spend “x” percent less on shampoo than women.
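To make the batch pattern concrete, here is a toy sketch of the extract-and-recompute cycle. The silo contents, column names, and numbers below are made up purely for illustration:

```python
# Toy illustration of the periodic "re-boil the ocean" batch model: every run
# re-reads ALL historical records from every silo and rebuilds the derived
# data set from scratch. Silo contents here are made-up stand-ins.
from collections import defaultdict

silos = {
    "retail_pos": [{"gender": "F", "shampoo_spend": 11.50},
                   {"gender": "M", "shampoo_spend": 8.25}],
    "web_orders": [{"gender": "F", "shampoo_spend": 9.00},
                   {"gender": "M", "shampoo_spend": 7.75}],
}

def monthly_batch_run():
    spend = defaultdict(float)
    for records in silos.values():        # full scan of every silo, every run
        for row in records:
            spend[row["gender"]] += row["shampoo_spend"]
    # The output is a brand-new derived data set,
    # e.g., "men spend x percent less on shampoo than women".
    pct_less = 100 * (1 - spend["M"] / spend["F"])
    return dict(spend), round(pct_less, 1)

print(monthly_batch_run())  # insight arrives only when the scheduled run happens, e.g., month end
```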
As the number of silos and records increases, so does the computational effort. In my view this is like re-boiling the ocean every time insight is desired. Not only does this approach not scale, it also prevents an organization from achieving real-time situational awareness. In the batch world of analytics, insights arrive chunky – meaning insight is only made available at certain intervals, e.g., month end. And while this delay in awareness is not critical to some missions (e.g., direct marketing, actuarial work), it is absolutely critical to other missions (e.g., fraud detection, border control systems).
So there is an ocean of historical data and it is raining, which is to say new data keeps being introduced. Re-boiling the ocean every month to discover what is knowable is old think. Perpetual analytics is new think.
“Perpetual analytics” is the term I use to describe the process of performing real-time analytics on data streams. Think of this like “directing the rain drops” as they fall into the ocean – placing each drop in the right place and measuring the ripples (i.e., finding relationships and relevance to the historical knowledge). Discovery is made during ingestion and relevant insight is published at that magical moment.
Not only is this approach more effective in that new observations (e.g., bank account openings) are of immediate use, but it is also more efficient in computational terms. In most cases, less computational effort is required to construct understanding when applied to net-change data streams (e.g., adds, changes, and deletes) versus the batch-based, re-boil-the-ocean model. Put another way, systems will achieve the greatest awareness per unit of computational effort when the incremental – not batch – learning model is applied. No surprise, as this is how Mother Nature designed us too.
In a system designed for perpetual analytics, as data changes in source systems (e.g., an employee updates his address), a message is fired off to the analytics engine and this new observation is integrated into the collective knowledge. In this way, the “data finds the data.” Should this incremental knowledge result in insight (e.g., the employee is related to an open fraud investigation), such discovery can be published to the appropriate user (in this case, the fraud investigator).
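Here is a rough sketch of that flow. The message shape, the watch list, and the function names are illustrative placeholders only, not a description of any particular engine or product:

```python
# Illustrative sketch only: each new observation is matched against accumulated
# context as it arrives ("directing the raindrops"), and any resulting insight
# is published at that moment rather than waiting for a batch run.

watchlist_addresses = {"123 Elm St, Springfield"}  # e.g., addresses tied to open fraud cases
known_addresses = {}                               # person -> last known address

def publish_alert(person, address):
    # In a real system this would route to the appropriate user, e.g., the fraud investigator.
    print(f"ALERT: {person} now at {address}, which is linked to an open investigation")

def on_observation(event):
    """Handle one net-change message (add/change/delete) from a source system."""
    if event["type"] == "address_change":
        person, address = event["person"], event["new_address"]
        known_addresses[person] = address          # integrate into the collective knowledge
        if address in watchlist_addresses:         # the ripple: new data meets old data
            publish_alert(person, address)

# Example: the payroll address change arrives as a single message, not a month-end batch.
on_observation({"type": "address_change",
                "person": "employee-4711",
                "new_address": "123 Elm St, Springfield"})
```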
In summary, perpetual analytics is vital to achieve real-time situational awareness – a capability whereby the “data actively finds the data.”
Related post: Sequence Neutrality in Information Systems
Jeff,
Again, I am definitely a major fan of the SRD/NORA (now IBM Entity Analytics) technology being discussed. I'd like to bring up the topic of NONDETERMINISTIC discovery, versus discovery through (deterministic) existing, predictive models...
There's tremendous value in using predictive, existing models -- which one may calibrate by changing the attributes and ranges in the associated model parameters. There are many alarms and discoveries that can be made with this dynamic, yet pre-determined approach.
Then there's the nondeterministic discovery model -- finitely bound only by the set of all possible permutations, aggregating information via numerous affinity analysis heuristics (e.g., temporal clustering, co-occurrence clustering, and other "discovered" affinities -- i.e., common attributes not already present in the existing predictive models). Clearly, the computational complexity of discovery heuristics has been prohibitive in the past. But advances in algorithm optimization, as well as increased computational power and storage efficiencies, make this discovery approach more and more feasible today.
My experience is that one needs both: computers to discover affinities / clustering models; and humans to pragmatically view these results and select the clusterings that truly make sense for a given domain / problem set. These can then be added to the predictive models -- to make better decisions, etc.
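For illustration, a toy sketch of one such affinity heuristic (co-occurrence clustering over made-up events), with a human left to judge which surfaced pairs are meaningful:

```python
# Toy sketch of a "nondeterministic" affinity heuristic: entities that repeatedly
# show up in the same events are surfaced as candidate affinities for a human to
# review and, if sensible, promote into the predictive models. Data is made up.
from collections import Counter
from itertools import combinations

events = [
    {"evt": "wire-001", "parties": {"acct-A", "acct-B"}},
    {"evt": "wire-002", "parties": {"acct-A", "acct-B", "acct-C"}},
    {"evt": "wire-003", "parties": {"acct-A", "acct-B"}},
    {"evt": "wire-004", "parties": {"acct-C", "acct-D"}},
]

pair_counts = Counter()
for e in events:
    for pair in combinations(sorted(e["parties"]), 2):
        pair_counts[pair] += 1

# Surface the strongest co-occurrences as "discovered" affinities for human review.
for pair, n in pair_counts.most_common(3):
    print(pair, n)   # e.g., ('acct-A', 'acct-B') 3 -- not present in any pre-built model
```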
Thoughts / comments / foreshadowing you can share?
Thanks in advance!
Fred M-D
Posted by: Fred M-D | February 05, 2006 at 01:55 PM
"As data changes, the new observation is integrated into the collective knowledge" with the incremental knowledge resulting in insight. Does this assume that each new observation generates all the concomitant relationships explicitly, or that inference is used to implicitly draw from what is explicit in the collective knowledge? I suspect it's the latter or the integration would become onerous. So, in other words, if you know that A=B and the new observation that B=C is integrated, if you are making use of an inference engine, you now automatically know that A=C. One new observation (that C cannot=D) can yield a lot of additional knowledge (A cannot=D, B cannot=D).
Posted by: VZ Farrell | November 02, 2006 at 01:14 PM
Jeff, I've always valued this post. Now it's one I'm recommending to all my CTO friends in the federal space. The Xmas terror attack is going to cause lots of folks to rethink our systems, and, although I'm sure that smart humans can find ways to defeat any systems we throw at this, I'm also certain that we can optimize our systems to do better by following the model you lay out here.
Cheers,
Bob
Posted by: Bob Gourley | January 02, 2010 at 05:42 PM