I was invited to speak at the Web 2.0 Summit last week in San Francisco. Believe it or not I actually presented 41 charts in less than 10 minutes. This kind of general session presentation was called a Show Me/High Order Bits. That’s right, the essence of my life’s work in just 10 minutes ... the thrill!
[Note: The formal title was: "Cops and Robbers Las Vegas Style."]
If you did not make this most amazing summit with a most amazing cast of attendees or were there and missed my auctioneer-inspired delivery, here are the key points I covered:
0. I first showed a picture of a fire breather from my last New Year's Eve party – but that is not important right now.
1. I showed a surveillance video of a casino scam involving a corrupt dealer – resulting in a $250,000 loss in 15 minutes. If the dealer had the same address in the payroll system as the "high roller" had in the loyalty club and comp systems (free rooms, meals, etc.) … who would know?
2. I introduced the concept of "Corporate Amnesia." This occurs when one part of the organization makes a decision which very clearly did not account for other key data sitting elsewhere in the enterprise, e.g., your marketing department is mailing offers to a person currently in jail for stealing from you!
3. "Perception Isolation" is the leading cause of Corporate Amnesia. Think of each operational system as a distinct enterprise perception. Notably, each perception is isolated from the others.
4. Enterprise intelligence requires persistent context. There is no way to get smart if perceptions are not integrated. When perceptions are integrated and stored in a database … this is persistent context. Think of this like a brain. You need a brain to be smart … duh!
5. I gave a simple demonstration of how context can be constructed and persisted and how this enables the enterprise discovery that otherwise would be missed (more corporate amnesia).
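The kind of context construction and discovery described in point 5 can be sketched minimally. Everything below (the dictionary shapes, field names, and the idea of keying on address, borrowed from the casino scam in point 1) is illustrative and hypothetical, not the actual system:

```python
# A minimal sketch of persistent context: perceptions from separate
# operational systems (payroll, loyalty club) are folded into a shared
# context and resolved when they share a key attribute (here, address).
# All names and structures are illustrative.

context = {}  # address -> list of perceptions sharing that address

def contextualize(perception):
    """Fold a new perception into persistent context; report discoveries."""
    related = context.setdefault(perception["address"], [])
    # Enterprise discovery: the same address seen in payroll AND loyalty
    # systems is exactly the kind of link "corporate amnesia" misses.
    sources = {p["source"] for p in related}
    discovery = None
    if related and perception["source"] not in sources:
        discovery = (perception, list(related))
    related.append(perception)
    return discovery

contextualize({"source": "payroll", "name": "Dealer D", "address": "12 Elm St"})
hit = contextualize({"source": "loyalty", "name": "High Roller H", "address": "12 Elm St"})
print(hit is not None)  # True: payroll and loyalty records share an address
```

Without the shared `context` dictionary (the "brain"), each call would see only its own record and the cross-system link would go unnoticed.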
6. Then, treat data as a query. Thus I introduced the 1st principle for enterprise intelligence: If you do not process every new piece of key data (perception) first like a query … then you will not know if it matters … until someone asks.
7. Treating data like a query beats periodically boiling the ocean when attempting to achieve real time intelligence.
8. Then, also treat queries as data. This means if one wishes to have a query persist, it must be persisted in the same data space as the data itself. Which leads to the 2nd principle for enterprise intelligence: Treat queries like data to avoid having to ask every question every day.
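The two principles above (data as a query, queries as data) can be sketched together. The structures below are made up for illustration, assuming only that queries and records share one data space:

```python
# Sketch of two principles: (1) treat every new record as a query against
# what is already known, and (2) persist standing queries in the same space
# so future data can answer them without re-asking every day.

records = []   # accumulated perceptions
standing = []  # persisted queries: (predicate, callback)

def ask(predicate, on_match):
    """Persist a query: match existing data now, and future data forever."""
    for r in records:
        if predicate(r):
            on_match(r)
    standing.append((predicate, on_match))

def ingest(record):
    """Treat the new record as a query: does it matter to anyone, right now?"""
    records.append(record)
    for predicate, on_match in standing:
        if predicate(record):
            on_match(record)

alerts = []
ask(lambda r: r.get("watchlist"), alerts.append)  # nothing matches yet
ingest({"name": "X", "watchlist": True})          # arrives later, still fires
print(len(alerts))  # 1
```

Note the symmetry: `ingest` runs data against persisted queries, and `ask` runs a query against persisted data, which is the "equal rights" idea in miniature and avoids periodically boiling the ocean.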
9. The moment context is constructed (the real-time receipt of perceptions from across the different operational systems) happens to be the ideal time for the librarian function to exhibit enterprise awareness. Which leads to the 3rd principle for enterprise intelligence: Enterprise intelligence is computationally most efficient when performed at the moment the observation is perceived.
10. This is the world I sometimes refer to as "Perpetual Analytics." A world where the "data finds the data … and the relevance finds the user."
11. And this stuff really works … and at scale. In fact, in a benchmark center this was found to scale to over 3 billion historical observations while handling the real-time ingestion of more than 2,000 perceptions a second.
12. This has privacy consequences. For example: (a) What perceptions can or should be placed into context (in one brain)?; (b) What if perceptions are contextualized for one mission, then re-purposed later for another?; (c) What if someone steals the brain?; and (d) What if the librarian is corrupt?
13. I worry about these things. And I spend about 40% of my time thinking about the privacy and civil liberties consequences of such systems. Which prompted one of my more recent inventions: a new class of technology I call "Analytics in the Anonymized Data Space." Basically, instead of transferring perceptions from the various senses (an organization’s operational systems) that are human readable … the perceptions are anonymized first before being handed to the librarian for contextualization in the brain. The Reader’s Digest explanation of anonymization is basically this: if you take a pig and a grinder and make a sausage, even if I give you the sausage and the grinder you are not going to be able to make a pig. The cool thing about this new technology is that the librarian can still construct and persist context and discover relevance without actually handling human meaningful data.
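The pig-and-grinder analogy can be illustrated with a one-way cryptographic hash. This is only a toy sketch: real anonymized-matching schemes involve much more (salting, value standardization, handling of fuzzy variants), and none of the details below come from the actual technology:

```python
# The "sausage" idea sketched with a one-way hash: identifying values are
# irreversibly digested before the librarian sees them, yet two anonymized
# perceptions can still be matched for context. Exact-match only; real
# systems are far more sophisticated.
import hashlib

def anonymize(value):
    """One-way digest: given the sausage and the grinder, no pig comes back."""
    return hashlib.sha256(value.strip().lower().encode()).hexdigest()

payroll_addr = anonymize("12 Elm St")
loyalty_addr = anonymize("12 ELM ST ")  # different formatting, same address
print(payroll_addr == loyalty_addr)     # True: match found, no readable data
```

The librarian holding only digests can still discover that two perceptions refer to the same address, without ever handling human-meaningful data.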
14. So I summarized the main thinking on enterprise intelligence -- (a) Without persistent context … you have no brain; (b) Treat data and queries with equal rights to improve awareness; (c) More intelligence is possible when thinking is based on streaming perceptions; and (d) From a privacy perspective: more or fewer perceptions, that is the question (there is an important policy discussion that needs to take place about just how many perceptions – more versus fewer – should be permitted to be put in the brain).
15. While this approach to enterprise intelligence was born in Las Vegas ... today it plays a role in national security, financial services, health care, etc. And much of the focus of my current activity is towards using this technology to deliver new threat and fraud intelligence solutions in these and other areas.
To my shock at this point I had completed 36 charts and still had 1.5 minutes left. As I thought this was in fact a possibility, I quickly moved into what I called the bonus section!
Bonus Picture 1. I showed a picture of a chimpanzee with the words "99.4 percent human." The point being: If a .6% difference matters this much … no wonder traditional information systems lack so much intelligence! Net net, in intelligence systems very tiny little increments of accuracy make the entire difference between being dumb and smart.
Bonus Picture 2. And it may go without saying, that in such systems as this … the more observations one has the better the context. In fact, many times new observations will contain the evidence to improve or fix earlier contextualizations.
Bonus Picture 3. And this brings us to the crucial concept of "Sequence Neutrality." Meaning that regardless of the order of the observations (records A, B, C received in that order versus arriving as C, B, A), the end state is the same. If you cannot process information with sequence neutrality, you get "data drift" – meaning you hold contradictory content which must eventually be reconciled or accuracy erodes. This is a common reason data warehouses must be reloaded. Almost no systems possess this sequence neutrality property. Notably, it is virtually essential at scale, because it eventually becomes impossible to tear very large databases down and reload them every week, month, or quarter.
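One way to get a feel for sequence neutrality is a union-find merge: records that share keys collapse into one identity, and the resulting grouping is the same under any arrival order. The class and record shapes below are hypothetical, chosen only to make the order-independence visible:

```python
# Sketch of sequence neutrality via union-find: the final grouping of
# records A, B, C, D is identical whether they arrive forward or reversed.

class Context:
    def __init__(self):
        self.parent = {}

    def _find(self, k):
        self.parent.setdefault(k, k)
        while self.parent[k] != k:
            k = self.parent[k]
        return k

    def perceive(self, *keys):
        """Merge all keys appearing in one record into the same identity."""
        roots = [self._find(k) for k in keys]
        for r in roots[1:]:
            self.parent[r] = roots[0]

    def groups(self):
        """Canonical end state, independent of arrival order."""
        roots = {self._find(k) for k in self.parent}
        return frozenset(
            frozenset(k for k in self.parent if self._find(k) == root)
            for root in roots)

records = [("A", "B"), ("B", "C"), ("D",)]
forward, backward = Context(), Context()
for r in records:
    forward.perceive(*r)
for r in reversed(records):
    backward.perceive(*r)
print(forward.groups() == backward.groups())  # True: same end state
```

Real contextualization is far harder than key merging, which hints at the closing thought below: some late-arriving records force so much re-merging that it cannot be absorbed on the fly.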
Closing thought. After working on designing sequence neutrality into my technologies, I have discovered there are some cases where a new record (perception) will necessitate so much recontextualization that it cannot be done in real time. Drats! That means the system must either be periodically reloaded or alternatively go offline into a maintenance mode (i.e., deep sleep) to remedy the situation. But alas, that is why humans sleep too – deep recontextualization that could not be handled on the fly. Our dreams are the byproduct of this necessary re-shuffling. Or so I have concluded!
This post is now the shortest read about my enterprise intelligence information theory.
I plan on blogging about "why perception isolation is the leading cause of corporate amnesia" very soon.
Jeff, once again you have got me thinking. Of course you and I - well, we have a lot to think about. Or should I say contextualize on... Anyhow, here are some basic thoughts I have about contextualization that relate to data attribution. But first, I must say: I completely agree that data should be cleared from privacy and ethics concerns before being centralized by the librarian. However, going back to some of our previous conversations on form, function, the human mind... I'd like to discuss context and data attribution.
As you know I've worked on a data architecture which is centered around the notion of "key" information - that is to say, there is a single idea, thought, word, or action on which it is focused (like an index). This key is meaningless by itself - or better yet, for different librarians holds different meanings. For instance, the mention of a simple date: July 12th, 1954 - may hold relevance to many different individuals, yet by itself only signifies a point in time.
However esoteric it may seem, a point in time in and of itself is meaningless to a theoretical observer not within the confines of these boundaries. But I ramble. The first recognition of "context" for specific information has got to be the key, the whole key, and nothing but the key... Getting the librarian to recognize "keys" versus "descriptive information" would, I think, lead to some interesting findings in the foundational construction of a library system.
Now, if I may be so bold as to build on this concept... The KEY (once recognized) can become a building block for context. That is, if we step back and over-simplify the notion of context to mean the description surrounding the key. For instance, July 12th, 1954 - someone says: I got married; someone else says: I drove my first car; someone else says: the day was cloudy and gray.
All of these provide the foundations of context for a date (one example of a key). From this we span out into the establishment of a) association between keys and b) information related to just the key, describing the key, and c) information describing the relationship of more than one key.
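The key-plus-descriptions idea in the comment above might be rendered as a toy structure, with all names invented for illustration:

```python
# Toy sketch of the commenter's key/description architecture: the key is a
# bare index (a date), and context accumulates as descriptions from
# different observers attach to it. Names and shapes are illustrative.

library = {}  # key -> list of (observer, description) pairs

def describe(key, observer, description):
    """Attach one observer's descriptive information to a bare key."""
    library.setdefault(key, []).append((observer, description))

describe("1954-07-12", "observer 1", "I got married")
describe("1954-07-12", "observer 2", "I drove my first car")
describe("1954-07-12", "observer 3", "the day was cloudy and gray")
print(len(library["1954-07-12"]))  # 3 descriptions now contextualize one key
```

Associations between keys (the (a) and (c) cases above) would then be a second structure relating keys to each other, with its own descriptive attachments.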
Beyond keys and relationships comes the real work - understanding or assigning relevance or meaning. In the real-time world, this notion is like a fight-or-flight syndrome - a very simplistic assignment of the "value" and "danger" of information. Many times, as you have said above, some of this information (if not all of it) is processed during sleeping hours and re-shuffled. I believe it is at these times when we really decide how this information is related to our experiences in the past, and we assign it to categories and establish context on the basis of pre-existing "knowledge".
To finish with the contextual piece: Keys surrounded by "descriptive information" based on arrival timings are what change our perspectives. Mining this information for different trends occurs during the night, these trends are utilized by the librarian during the day or real-time to determine fight or flight, and initial relevance of information.
Now, regarding sequence neutrality, the only assumptions I can make are: a) the librarian must be smart enough to separate the keys from the descriptions; b) all data or information flowing into this system must be stamped with arrival time - not to say this is the order in which they arrived, but to have some form of reference for later assimilation; c) the librarian is responsible for categorizing - there must be a master librarian sequence for establishing meaning and definition within the categories as well as across categories, making sense of a specific perspective viewpoint, and assigning relevance based on what the end-user has asked for (i.e., tracking queries as data).
Once the tracking mechanism (queries as data) has been put in place, recategorization (mining) can become self-sustaining over time, learning more and more about what and how these pieces of information should be connected, and how they are utilized.
The real trick is then figuring out what might be "not yet seen" that would solve one of the questions posed, or not yet asked.
Does any of this make any sense? My two cents anyhow...
Cheers, and let's chat.
Dan Linstedt
Posted by: Dan Linstedt | November 17, 2006 at 06:33 PM
Perhaps you and/or IBM might be interested in some of the proprietary data analysis methodologies that we have developed at Strategic IT Security LLC ( www.StrategicITSecurity.com ). The starting point was Discovery Informatics, and has progressed/matured far beyond DI. In a nutshell, we address a number of interdisciplinary approaches, including areas such as: 1a. Where A+B may or may not equal B+A. 2a. Where A+B+C may or may not equal B+A+C. 3a. And other factors that, as we understand it, have not been fully considered or automated. Thank You, SG
Posted by: SG | April 21, 2010 at 03:07 PM