When a call center takes a first-time caller who happens to be the daughter of its largest customer, does the call center employee know? An employee moves and submits an address change to the payroll department. Will the internal fraud investigators notice that this new address is tied to an open fraud investigation? In both cases, the answer is no. Organizations don’t know what they know.
The real problem is … Data happens.
Every organization has an ocean of historical data, and with each passing moment new data streams in through one channel or another. Different streams feed different silos, and these silos are organized to serve different missions … rarely are any two silos alike.
When organizations elect to increase their understanding (e.g., unearth discoveries otherwise isolated across their disparate information silos), they implement secondary analytic systems. Whether the organization calls these systems data mining, business intelligence, or predictive modeling, the technologies generally work the same way: periodically extract data from the silos, then process it with specialized algorithms. At the end of this process a new data set is created that encapsulates what has been learned/discovered, e.g., men spend “x” percent less on shampoo than women.
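The batch pattern just described can be sketched in a few lines. The silo contents and the shampoo statistic below are invented purely to make the extract-process-publish flow concrete:

```python
from statistics import mean

# Hypothetical extracts from two silos: each record is (customer_gender, shampoo_spend).
crm_silo = [("F", 14.0), ("M", 9.0), ("F", 12.0)]
pos_silo = [("M", 8.0), ("F", 13.0), ("M", 10.0)]

def monthly_batch_job(silos):
    """Re-process every record from every silo and emit a new 'learned' data set."""
    records = [row for silo in silos for row in silo]   # extract
    spend = {"F": [], "M": []}
    for gender, amount in records:                      # specialized algorithm (here, a simple aggregate)
        spend[gender].append(amount)
    f_avg, m_avg = mean(spend["F"]), mean(spend["M"])
    # The resulting insight: "men spend x percent less on shampoo than women"
    return {"pct_less_men_vs_women": round(100 * (f_avg - m_avg) / f_avg, 1)}

print(monthly_batch_job([crm_silo, pos_silo]))  # → {'pct_less_men_vs_women': 30.8}
```

Note that every run re-reads every record from every silo, which is the scaling problem at issue.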
As the number of silos and records increases, so does the computational effort. In my view this is like re-boiling the ocean every time insight is desired. Not only does this approach fail to scale, it prevents an organization from achieving real-time situational awareness. In the batch world of analytics, insight arrives chunky – meaning all insight is made available only at certain intervals, e.g., month end. And while this delay in awareness is not critical to some missions (e.g., direct marketing, actuarial work), it is absolutely critical to others (e.g., fraud detection, border control systems).
So there is an ocean of historical data and it is raining, which is to say new data keeps being introduced. Re-boiling the ocean every month to discover what is knowable is old think. Perpetual analytics is new think.
“Perpetual analytics” is the term I use to describe the process of performing real-time analytics on data streams. Think of it like “directing the raindrops” as they fall into the ocean – placing each drop in the right place and measuring the ripples (i.e., finding relationships and relevance to the historical knowledge). Discovery is made during ingestion, and relevant insight is published at that magical moment.
Not only is this approach more effective, in that new observations (e.g., bank account openings) are of immediate use, it is also more efficient in computational terms. In most cases, less computational effort is required to construct understanding from net-change data streams (e.g., adds, changes, and deletes) than from the batch-based, re-boil-the-ocean model. Put another way, systems achieve the greatest awareness per unit of computational effort when the incremental – not batch – learning model is applied. No surprise, as this is how Mother Nature designed us too.
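The cost difference can be illustrated with a toy running aggregate. The class and names here are illustrative, not any particular product’s API: the incremental version does constant work per observation, while the batch version re-reads all history on every run.

```python
class IncrementalAverage:
    """Keeps awareness current with O(1) work per net-change observation."""
    def __init__(self):
        self.n = 0
        self.total = 0.0

    def observe(self, amount, op="add"):
        # Net-change stream: adds and deletes adjust the summary in place.
        if op == "add":
            self.n += 1
            self.total += amount
        elif op == "delete":
            self.n -= 1
            self.total -= amount

    @property
    def average(self):
        return self.total / self.n if self.n else 0.0

def batch_average(history):
    """Re-boil the ocean: O(n) over the entire history on every run."""
    return sum(history) / len(history) if history else 0.0

history = []
live = IncrementalAverage()
for amount in [10.0, 20.0, 30.0]:
    history.append(amount)
    live.observe(amount)
    assert live.average == batch_average(history)  # same answer, far less work per update
```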
In a system designed for perpetual analytics, as data changes in a source system (e.g., an employee updates his address), a message is fired off to the analytics engine and the new observation is integrated into the collective knowledge. In this way, the “data finds the data.” Should this incremental knowledge result in insight (e.g., the employee is related to an open fraud investigation), the discovery can be published to the appropriate user (in this case, the fraud investigator).
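A minimal sketch of that message flow, with an invented watchlist and addresses, might look like this:

```python
# Source-system change -> analytics engine -> published discovery.
# The investigation data and names are fabricated for illustration only.

open_investigations = {"442 Elm St": "CASE-0017"}  # address tied to an open fraud case
collective_knowledge = {}                          # accumulated observations (toy stand-in)
alerts = []

def publish(user, message):
    """Deliver a discovery to the appropriate user (here, just collect it)."""
    alerts.append((user, message))

def on_address_change(employee, new_address):
    """Fired when a source system (e.g., payroll) records an address change."""
    collective_knowledge[employee] = new_address   # integrate the new observation
    case = open_investigations.get(new_address)    # measure the ripples
    if case:
        publish("fraud_investigator",
                f"{employee} now at {new_address}, linked to {case}")

on_address_change("J. Smith", "442 Elm St")
print(alerts)
```

Discovery happens at ingestion: the insight is published the moment the change arrives, rather than waiting for a month-end batch run.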
In summary, perpetual analytics is vital to achieve real-time situational awareness – a capability whereby the “data actively finds the data.”
Related post: Sequence Neutrality in Information Systems