Next generations of information management systems will not principally rely on users dreaming up smart questions to ask computers. Rather, this new breed of technology will make it possible for data to find itself and for relevant discoveries to find the consumer (e.g., a user) – all in real time, of course. While this will bring with it new policy debates, like which data will be permitted to find which data and who is notified of what relevance, I am going to stay focused in this post on what this technology will enable.
So, here are some examples of what Perpetual Analytics can do (a minimal sketch of the pattern common to all four follows the list):
1. Guest convenience. After tossing and turning in bed all night in a hotel room, the guest finally decides at 7am to call for a late checkout and schedule a wake-up call at noon. Shortly after sinking into a deep sleep, disaster strikes: the maid carelessly knocks on the door to clean the room. No hotel I know of has solved this most basic inconvenience. When the data finds the data, the late checkout and wake-up call requests converge with maid scheduling information. This triggers a relevant discovery, which warrants notifying the maid – e.g., via a text message advising that this room not be cleaned until after 2pm.
2. Customer service. Interested in a soon-to-be-released book, a user searches Amazon for the title … to no avail. The user decides to check every month until the book is released. Unfortunately, the next time the user looks, they find the book is not only sold out but now on back order – awaiting a second printing. When the data finds the data, the moment this book is available, this data point finds the user’s original query. As a relevant discovery, the user is immediately notified (e.g., sent a text message or email) about the availability of the book.
3. Improved child safety. A parent keen to ensure their young children are safe while walking to school searches the community web site to confirm no registered sex offenders live along this same route. Will they check this site every day? When the data finds the data, should a sex offender become registered on their kids’ route to school, this new data will immediately connect with the earlier query. As a relevant discovery, the parent is immediately notified.
4. Cross-compartment exploitation. The government uses "compartments" to intentionally isolate data. Isolating data helps prevent highly sensitive data from escaping. So despite the Presidential mandates for Information Sharing, the Information Sharing Paradox prevents the government from discovering when two such compartments (picture this: on the same floor, three doors away) are dealing with the same subject. For example, imagine one unit working on counter-terrorism and another on counter-narcotics. Of course, if there were just two compartments, this would be easy; but with thousands of compartments across the government, the odds of locating the data one requires are remote, since one never knows who has what information. When the data finds the data, the moment a record of relevance to the counter-terrorism unit (e.g., data involving the same person) is added to the counter-narcotics database, the two converge. This is a relevant discovery, and thus notification is immediately published to the appropriate user.
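All four examples share one underlying pattern: a query persists after it is asked, and each new record is matched against the standing queries the moment it arrives. Below is a minimal sketch of that pattern in Python; the names (StandingQuery, PerpetualAnalytics, the sample email address) are illustrative assumptions, not an actual implementation.

```python
# Minimal sketch of the "data finds data" pattern shared by all four
# examples: queries persist, and each new record is matched against them
# the moment it arrives. Names and structures here are illustrative.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StandingQuery:
    owner: str                            # who to notify on a match
    predicate: Callable[[Dict], bool]     # what counts as relevant
    notify: Callable[[str, Dict], None]   # delivery channel (SMS, email, ...)


class PerpetualAnalytics:
    def __init__(self) -> None:
        self.queries: List[StandingQuery] = []

    def register(self, query: StandingQuery) -> None:
        """Persist a query so future data can find it."""
        self.queries.append(query)

    def ingest(self, record: Dict) -> None:
        """New data arrives; let it find every query it is relevant to."""
        for q in self.queries:
            if q.predicate(record):
                q.notify(q.owner, record)  # publish the relevant discovery


# Use case 2: notify the reader the moment the book is back in stock.
engine = PerpetualAnalytics()
engine.register(StandingQuery(
    owner="reader@example.com",
    predicate=lambda r: r.get("title") == "The Awaited Title" and bool(r.get("in_stock")),
    notify=lambda who, r: print(f"Notify {who}: '{r['title']}' is available"),
))
engine.ingest({"title": "The Awaited Title", "in_stock": True})
```

Note the inversion of control: the query owner never polls. The work shifts from users repeatedly asking to the system matching on ingest, which is what makes the "real time" claim plausible.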
This is not far-fetched. It is imminent. It will work, and it will be Scalable and Sustainable. Centralized data catalogs operating with Sequence Neutrality will be at the center of these solutions, and Anonymization, Immutable Audit Logs, and other privacy-enhancing technologies will (hopefully) play an important role. And while there are endless ways such capabilities will be used to deliver exceptional corporate and consumer advantage, when the government deploys such technology – especially with private sector data (e.g., bio-surveillance) – we had better have really clear policies, oversight and accountability, and enough transparency to Avoid Consumer Surprise.
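As an aside on those privacy-enhancing technologies: one common way to realize an Immutable Audit Log is as a hash chain, where each entry's hash covers its predecessor, so any retroactive edit is detectable. The sketch below assumes SHA-256 chaining; it illustrates the idea, not the actual design referenced above.

```python
# Tamper-evident audit log sketched as a hash chain: each entry's hash
# covers the previous entry's hash, so altering any past entry (or its
# order) breaks verification. Illustrative only.

import hashlib
import json
import time

GENESIS = "0" * 64


class ImmutableAuditLog:
    def __init__(self) -> None:
        self.entries = []
        self._last_hash = GENESIS

    def append(self, actor: str, action: str) -> None:
        entry = {
            "ts": time.time(),
            "actor": actor,
            "action": action,
            "prev": self._last_hash,
        }
        digest = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self.entries.append(entry)
        self._last_hash = digest

    def verify(self) -> bool:
        """Recompute the chain; any edited entry fails verification."""
        prev = GENESIS
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev:
                return False
            if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True


log = ImmutableAuditLog()
log.append("analyst_7", "queried counter-narcotics record")
assert log.verify()
```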
Jeff,
I continue to dig into issues like this, including reading as much research as I can coming out of some of the big schools, but I can't find anyone who puts this so well. So thanks. Please keep writing on these topics.
I have one somewhat minor comment, which applies to your concept of data finding data and relevance finding users. I've come to believe that one of the most important parts of any architecture that tries to do this is the human part. For at least our lifetime, the human brain is going to remain the greatest processor on earth, and it is very definitely going to be the processor best positioned to evaluate relevance in complex situations. Of course I'm just stating the obvious. But the not-so-obvious thing that needs to be built into any architecture is feedback on relevance from the user. That puts the great brain processor in charge.
In your four use cases above, relevance feedback would take the form of (see the sketch after this list):
1) Happy sleeper likes hotel and tells front desk
2) Paying customers buy more books
3) Parents notify the system of false alerts and also confirm accurate data, so the system becomes better trained.
4) Relevant discoveries are noted in ways that train the system to produce more. Irrelevant discoveries are ignored in ways that train the system to stop reporting them.
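A minimal sketch of this human-in-the-loop idea: each user verdict nudges a per-rule relevance score, and rules whose score falls too low are muted. The scoring scheme (an exponentially weighted average) and all names here are illustrative assumptions, not a specific algorithm.

```python
# Sketch of the relevance-feedback loop: user verdicts on past
# discoveries adjust a learned score per rule, muting rules that
# keep misfiring. The scoring scheme is an illustrative assumption.

class FeedbackAwareNotifier:
    def __init__(self, mute_below: float = 0.2) -> None:
        self.scores = {}                  # rule id -> learned relevance
        self.mute_below = mute_below

    def should_notify(self, rule_id: str) -> bool:
        """Suppress discoveries from rules users keep marking irrelevant."""
        return self.scores.get(rule_id, 0.5) >= self.mute_below

    def feedback(self, rule_id: str, relevant: bool) -> None:
        """Exponentially weighted update from the user's verdict."""
        old = self.scores.get(rule_id, 0.5)
        self.scores[rule_id] = 0.8 * old + 0.2 * (1.0 if relevant else 0.0)


notifier = FeedbackAwareNotifier()
for _ in range(5):                        # five false alerts in a row...
    notifier.feedback("route_alert", relevant=False)
print(notifier.should_notify("route_alert"))  # ...rule is now muted: False
```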
So I'm wondering, does the mantra now become:
"create systems where data finds the data, relevant information finds the user, and the user assesses relevance."
Cheers,
Bob
Posted by: Bob Gourley | December 29, 2007 at 12:41 PM
I've found your content over the years to be extremely interesting. I keep coming back to this one in particular.
I'm wondering how the practical issues of software engineering are handled. Taking your first example, there could be a plethora of reasons to reschedule the maid: for example, if you'd just received room service in the last few minutes, or if the phone line is active.
When data finds data, how can we avoid a combinatorial explosion of application code to handle all these discovery events?
Posted by: Jason Watkins | August 22, 2008 at 10:00 PM