I spent more than ten hours on this post – more than on any other single post – and unfortunately, despite that effort, I feel it still deserves substantially more work.
Operating on a datum without first placing it into context is a risky proposition. Whether you are interested in mitigating risk or maximizing opportunity, no surprise: Context is King. And thus, from my point of view, determining context is the most significant technical hurdle to delivering the next generation of business intelligence.
So, if you must have context, the next question is: "How do you get some?"
The construction of context primarily depends upon: A) the features available in an observation, B) the ability to extract the essential features from the observation, and C) the ability to use the extracted features to determine how the new observation relates to one’s historical observations.
Features, features, features. Without ample features, establishing context is hopeless. Take for example these two observations:
Observation #1: There were fewer fish.
Observation #2: March ‘07 was warmer than usual.
BTW, there would be even less context if the second observation had been recorded in untranslatable Mayan symbols! But, had observation #1 stated "There were fewer fish in March, 2007," we would have some temporal proximity, and, if both observations also included the phrase "in the San Francisco Bay," we would also have geospatial proximity. As more features overlap across more observations, more context emerges.
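To make the overlap idea concrete, here is a minimal sketch (Python, with invented field names – not any particular engine's schema) of how shared temporal and geospatial features let two otherwise unrelated observations land next to each other:

```python
# Minimal sketch: two observations only become mutually relevant once they
# share extracted features (here: month and place). Field names are invented.

obs_1 = {"text": "There were fewer fish",
         "month": "2007-03", "place": "San Francisco Bay"}
obs_2 = {"text": "March '07 was warmer than usual",
         "month": "2007-03", "place": "San Francisco Bay"}


def shared_features(a, b, keys=("month", "place")):
    """Return the feature keys on which two observations agree."""
    return [k for k in keys if a.get(k) is not None and a.get(k) == b.get(k)]


print(shared_features(obs_1, obs_2))  # ['month', 'place'] -> temporal and geospatial proximity
```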
Want context? Step 1: Get features.
Features matter. But not every feature matters. For example, in observation #1 above, 6/7ths of the vowels are the letter "e." The essential features needed to construct context are generally: A) those features that enable Semantic Reconciliation (i.e., recognizing like objects – e.g., same document, same person, same thing, etc.), and B) those features that enable an understanding of relationships between objects (e.g., like documents, former roommates, occurring in the same place and at the same time, etc.).
Some observations include features that make semantic reconciliation a breeze (e.g., an RFID chip in a passport), but more often than not there is ambiguity. The same goes for recognizing relationships between observations – some observations present an explicit relationship (e.g., traveling partners), but more often than not relationships must be inferred (e.g., two people always entering the same building together).
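A toy contrast between the two cases – an unambiguous identifier versus a fuzzy score, and an explicit relationship versus an inferred one. The scoring rule and the 0.9 threshold are invented for illustration, not a description of any real resolution engine:

```python
# Toy contrast for the paragraph above. The scoring rule and the 0.9
# threshold are invented for illustration only.

def reconcile(obs_a, obs_b):
    """Decide whether two observations describe the same person."""
    # The easy case: an unambiguous identifier (e.g., a passport number) matches.
    if obs_a.get("passport") and obs_a.get("passport") == obs_b.get("passport"):
        return True
    # The common, ambiguous case: fall back to a score over weaker features.
    score = 0.0
    if obs_a.get("name") == obs_b.get("name"):
        score += 0.6
    if obs_a.get("dob") == obs_b.get("dob"):
        score += 0.4
    return score >= 0.9


def infer_relationship(entries_a, entries_b, min_cooccurrences=3):
    """Infer a relationship never stated explicitly: two people repeatedly
    entering the same building at the same recorded times."""
    together = set(entries_a) & set(entries_b)
    return "possible associates" if len(together) >= min_cooccurrences else None


print(reconcile({"passport": "X123"}, {"passport": "X123"}))            # True
print(reconcile({"name": "A. Smith"}, {"name": "A. Smith"}))            # False (too weak alone)
print(infer_relationship(
    [("HQ", "08:00"), ("HQ", "08:05"), ("Lab", "13:00")],
    [("HQ", "08:00"), ("HQ", "08:05"), ("Lab", "13:00")]))              # possible associates
```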
Because context construction is dependent on features, "key feature extraction" is where "the rubber meets the road."
Big breakthroughs in context-accumulating systems are going to first require big breakthroughs in feature extraction.
DEEPER TECHNICAL THINK:
1. Temporal and geospatial features (when and where) are possibly the two most helpful features for establishing context. And while useful for establishing historical context, temporal and geospatial features also provide essential context when determining what, if any, action is warranted now. [See: Responsible Innovation: Designing for Human Rights and Source Attribution, Don’t Leave Home Without It].
2. Context engines, at least in the Perpetual Analytics class I have been pounding my head on, cannot scale if every observation is simply treated probabilistically. I have concluded that high-speed, real-time contextualization (at least on today’s technology) requires that, when assimilating an observation, some assertions must be made. In short, if confidence is very high … assert it as true! Unfortunately, future observations may invalidate earlier assertions. Thus, context engines must constantly be on the lookout for new observations that change earlier assertions – and if a new observation provides such evidence, the invalidated assertions from the past must be remedied. This is Sequence Neutrality, and it is absolutely critical to context engines (a toy sketch of this assert-and-remediate behavior appears after this list). Notably, this is very hard to do on real-time data feeds at scale.
3. When I refer to Persistent Context on this blog, I mean the physical information space (database) where all historical observations are assembled in context. And one cannot bulk load such a database with a "rack and stack" mentality and expect to get persistent context. To get persistent context, one must learn the past. That means taking the historical observations and streaming them into the engine, the engine then assembling how each observation relates to the others. Pop quiz: Do you think … the order in which one loads the historical data matters? Answer: If it did matter, you would be hosed. This is exactly why such systems must be sequence neutral (the sketch after this list also checks this order-independence). Therefore, the reason these systems have to be so screaming fast is not to keep up with the present … rather, to learn the past. Hence, my excitement about our recent performance breakthroughs.
4. Other extractable features, like Source Attribution, while often not essential to constructing momentary context (rendering a decision now), are nonetheless absolutely required to achieve perpetual context (e.g., think about the ability to correct or forget a misreported fact – a minimal sketch of this appears after this list). [Related post: Data Tethering]
5. I have come to the conclusion that the process of extracting features from observations is greatly improved when past experience (historical learnings, persistent context, or whatever you want to call it) is taken into account. In fact, it is my speculation that feature extractors will request substantially more bytes from the persistent context data store than they will report down to the context engine. I further speculate that when contextualizing such highly refined feature sets, both context accuracy and throughput will improve. Leveraging accumulated context during feature extraction will significantly improve such things as entity extraction from unstructured documents and object recognition from video (a toy sketch of a context-consulting extractor appears after this list).
6. Self-learning systems will promote new features of interest to feature extractors (akin to keeping an eye out for something in particular). The inverse is true as well: self-learning systems will demote (eliminate) interest in specific features. I think of this as intentional sensory deprivation (the same sketch after this list includes a promote/demote interest list). We all do this too – for example, right at this moment you (the reader) are blocking out that "background hum" … just stop for a second and listen. Right?
7. Now call me crazy, but have you ever had someone speaking to you – while you are sitting there thinking, "I know they are speaking English" – and you just could not decode it? Then, like a miracle, you replay it – the entire statement, word for word, in your head. And presto, it is all clear now. (Would someone please admit this happens to you, too?) Using this caching mechanism for a replay, we throw some additional attention (more CPU) at the observation, error correction improves, and we get a useful decoding of the key features. Cool! Notably, context engines will benefit from this too (a deliberately simple sketch of this two-pass idea appears after this list).
8. Expect convergence in the likely places. Unstructured with structured. Biographic and demographic with biometric. Audio and video with text. Efforts to make greater sense of available observations will entail cross-sensor fusion. Just the way we work.
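To make points 2 and 3 a bit more concrete, here is a toy sketch (Python; the match rule is invented and far simpler than a real context engine) of assert-when-confident, remediate-when-contradicted behavior, plus a check that the final context is the same no matter what order the observations arrive in:

```python
import itertools

# Toy "context engine" for points 2 and 3. The match rule (shared identifier)
# and the data structures are invented and far simpler than a real engine.


class ToyContextEngine:
    def __init__(self):
        # each entity: a set of identifiers plus the observations behind it
        self.entities = []

    def assimilate(self, obs):
        """When confidence is very high (an exact identifier overlap),
        assert "same entity"; otherwise start a new entity."""
        keys = set(obs["identifiers"])
        matches = [e for e in self.entities if e["keys"] & keys]
        if not matches:
            self.entities.append({"keys": keys, "obs": [obs["id"]]})
            return
        target = matches[0]
        for other in matches[1:]:
            # The new observation glues together entities we earlier asserted
            # were separate: remediate the now-invalidated assertion by merging.
            target["keys"] |= other["keys"]
            target["obs"] += other["obs"]
            self.entities.remove(other)
        target["keys"] |= keys
        target["obs"].append(obs["id"])

    def snapshot(self):
        """An order-insensitive summary of the accumulated context."""
        return sorted(tuple(sorted(e["obs"])) for e in self.entities)


observations = [
    {"id": "o1", "identifiers": ["phone:555-0100"]},
    {"id": "o2", "identifiers": ["passport:X123"]},
    {"id": "o3", "identifiers": ["phone:555-0100", "passport:X123"]},  # the glue
]

# Sequence neutrality check: every load order must yield the same final context.
snapshots = set()
for perm in itertools.permutations(observations):
    engine = ToyContextEngine()
    for obs in perm:
        engine.assimilate(obs)
    snapshots.add(tuple(engine.snapshot()))

print(snapshots)  # {(('o1', 'o2', 'o3'),)} -- one entity, regardless of order
```

Here remediation happens to be a merge; a real engine must equally be able to split an earlier, over-eager merge when new evidence says two things are not the same after all.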
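And a minimal sketch of point 4: if every feature stays tethered to the source that reported it, a later correction or retraction at the source can be honored. The data structures are invented for illustration:

```python
from collections import defaultdict

# Every feature stays tethered to the source that reported it, so a later
# correction or retraction at the source can be honored. Invented structures.

knowledge = defaultdict(list)          # entity -> list of (feature, source)


def report(entity, feature, source):
    knowledge[entity].append((feature, source))


def retract(source):
    """A source withdraws or corrects its data: forget everything it said."""
    for entity in knowledge:
        knowledge[entity] = [(f, s) for (f, s) in knowledge[entity] if s != source]


report("person-42", "address: 10 Main St", "dmv-feed")
report("person-42", "on watch list",       "misreporting-feed")
retract("misreporting-feed")
print(knowledge["person-42"])          # only the dmv-feed fact survives
```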
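Points 5 and 6 in one toy sketch: the feature extractor consults the accumulated context before committing to an interpretation, and an interest list controls which features are promoted or demoted. The names and the interest mechanism are invented:

```python
# A feature extractor that (a) consults the accumulated context before
# committing to an interpretation and (b) honors a promote/demote interest
# list. The names and the interest mechanism are invented for illustration.

persistent_context = {
    # a tiny stand-in for the (much larger) store of historical learnings
    "known_places": {"San Francisco Bay", "Monterey Bay"},
}

interest = {"place": True, "vowel_counts": False}   # promoted / demoted


def extract_features(text):
    features = {}
    if interest.get("place"):
        # Lean on prior context: prefer place names we have already learned.
        for place in persistent_context["known_places"]:
            if place.lower() in text.lower():
                features["place"] = place
    if interest.get("vowel_counts"):
        # Demoted feature: deliberately not computed (intentional sensory deprivation).
        features["vowel_counts"] = {v: text.lower().count(v) for v in "aeiou"}
    return features


print(extract_features("Fewer fish were seen in the san francisco bay in March 2007."))
# {'place': 'San Francisco Bay'} -- a small report, informed by a larger context read
```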
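Finally, point 7's replay amounts to keeping the raw observation around long enough to spend more compute on it when the cheap first pass fails. A deliberately simple sketch – the two "decoders" are stand-ins, not real signal processing:

```python
# Cache the raw observation; if the cheap first-pass decode is not confident,
# replay it through a slower, more careful pass ("throw more CPU at it").
# Both "decoders" are stand-ins, not real signal processing.

def fast_decode(raw):
    # cheap pass: only succeeds on clean input
    return (raw.strip(), 0.9) if "~" not in raw else (None, 0.2)


def careful_decode(raw):
    # expensive pass: spend more effort cleaning up the signal
    return raw.replace("~", "").strip(), 0.8


replay_buffer = []


def assimilate(raw, confidence_floor=0.7):
    decoded, conf = fast_decode(raw)
    if conf < confidence_floor:
        replay_buffer.append(raw)             # keep the raw bytes for a replay
        decoded, conf = careful_decode(raw)   # the replay, with more attention
    return decoded, conf


print(assimilate("there were fewer fish"))       # clean -> first pass suffices
print(assimilate("there ~were~ fewer ~fish~"))   # noisy -> replayed and recovered
```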
PREDICTION:
When next generation feature extraction engines and next generation context accumulating engines converge, these systems are going to be the underpinnings of very, very smart systems. Add real-time and relevance detection … and you have more than situational awareness … you begin to approach the cognitive domain.
PRIVACY RAMIFICATIONS:
All this adds up to a double-edged sword from a privacy perspective.
The good news is: more context means fewer false positives and fewer false negatives – which is especially good news as it relates to government watch lists. [More about this here.]
The bad news is: If more data makes for more context, and everyone wants more context to ensure they are making the best possible decisions … everyone is going to want more data! [More about this here.]
RELATED POSTS:
Enterprise Intelligence – My Presentation at the third Annual Web 2.0 Summit
Enterprise Intelligence: Conference Proceedings from TTI/Vanguard (December 2006)
Intelligent Organizations – Assembling Context and The Proof is in the Chimp!
Sensing Importance: Now or Never
Accumulating Context: Now or Never
Federated Discovery vs. Persistent Context – Enterprise Intelligence Requires the Latter
More Data is Better, Proceed With Caution
It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You
Streaming Analytics vs. Perpetual Analytics (Advantages of Windowless Thinking)
It’s All About the Librarian! New Paradigms in Enterprise Discovery and Awareness
Scalability and Sustainability in Large Information Sharing Systems
Re: using context to improve decoding accuracy
Yeah, it happens to me all the time, with all kinds of data. Like you, I notice it most with spoken language. Using context for decoding is part of why hidden Markov models are good at speech recognition.
Posted by: Brian | July 03, 2007 at 10:44 AM
Good point about data demands. Perhaps persisting only "significant" features could reduce storage requirements? Source attribution for the features could be pushed off to cheaper, higher latency storage media.
What do you think about the points made by http://www.identityresolutiondaily.com/70/making-a-case-for-living-context-in-identity-resolution/ ? It seems like an efficient / scalable framework for incorporating "live" updates is a real challenge...
Posted by: beecaver | July 10, 2007 at 06:18 AM