If the goal is to make substantially more sense of data – the only way forward is CONTEXT ACCUMULATION.
This is so important … I have been feverishly looking for a better way to explain in plain English how real-time, streaming, contextualization works. The very best analogy I have come up with to date is that of assembling jigsaw puzzles. The parallels are uncanny.
A pile of puzzle pieces is in a box. You cannot look at the picture on the cover of the box, as that would be cheating and not like the real world.
You grab a piece out of the box and look at your table space. It is the first piece, so it really does not matter where you place it. The second piece is now in hand. Does it relate to the first piece? Probably not. So it is placed by itself elsewhere on the table. This was an assertion. You decided this second piece is not related to the first piece.
Soon there are many free-standing pieces scattered across the table space; none at this point have been associated (snapped together). Now you have the next piece in hand, eager to find its mate. Do you attempt to physically match it with every possible piece, in what would be a brute-force and very time-consuming manner? No. You notice the piece in hand has at least one discriminating feature: some red and white on one of its edges. Glancing over the table space, you look for pieces with a similar distinguishing feature. You find three such candidates. Your attention is now narrowly focused on just these three pieces.
Comparing the piece in hand to each candidate, you are assessing confidence. At the end of this process you come to a decision point: a new assertion. The piece is either (A) a match, (B) not a match but possibly a member of a family of pieces that hopefully will converge as new puzzle pieces arrive, or (C) at this time, apparently unrelated to any other piece.
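The candidates, confidence, assertion loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual system: the `Piece` and `Table` classes, the feature labels, the Jaccard-overlap score, and both thresholds are hypothetical stand-ins chosen to keep the example small.

```python
from collections import defaultdict
from dataclasses import dataclass

MATCH_THRESHOLD = 0.9   # assumed: snap together only when nearly certain
FAMILY_THRESHOLD = 0.4  # assumed: "probably related, cannot connect yet"

@dataclass(frozen=True)
class Piece:
    piece_id: int
    features: frozenset  # hypothetical labels such as {"red", "white"}

def confidence(a: Piece, b: Piece) -> float:
    """Toy score (an assumption): Jaccard overlap of the two feature sets."""
    union = a.features | b.features
    return len(a.features & b.features) / len(union) if union else 0.0

class Table:
    def __init__(self):
        self.pieces = {}                    # piece_id -> Piece
        self.by_feature = defaultdict(set)  # feature -> ids of pieces showing it

    def candidates(self, piece):
        """Only pieces sharing at least one feature; no scan of the whole table."""
        ids = set()
        for feature in piece.features:
            ids |= self.by_feature.get(feature, set())
        return [self.pieces[i] for i in sorted(ids)]

    def place(self, piece):
        """Score the candidates, then assert match, family, or unrelated."""
        scored = [(confidence(piece, c), c.piece_id) for c in self.candidates(piece)]
        best_score, best_id = max(scored, default=(0.0, None))
        # Whatever the assertion, the piece stays on the table for later arrivals.
        self.pieces[piece.piece_id] = piece
        for feature in piece.features:
            self.by_feature[feature].add(piece.piece_id)
        if best_score >= MATCH_THRESHOLD:
            return ("match", best_id)       # outcome (A)
        if best_score >= FAMILY_THRESHOLD:
            return ("family", best_id)      # outcome (B)
        return ("unrelated", None)          # outcome (C)

table = Table()
first = table.place(Piece(1, frozenset({"blue", "sky"})))
second = table.place(Piece(2, frozenset({"red", "white"})))
third = table.place(Piece(3, frozenset({"red", "white"})))
fourth = table.place(Piece(4, frozenset({"red", "white", "green"})))
```

The feature index plays the role of glancing over the table: the piece in hand is compared only with pieces that share a distinguishing feature, and the high match threshold is what favors the false negative.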
You only connect two pieces when you are sure. You would never get out a hammer and force the piece. In this regard you are favoring the false negative – only connecting the pieces if you have a high degree of certainty.
Let’s say this latest piece finds a match. You are now sitting there with a new puzzle unit (two pieces). Looking this new unit over you ask yourself, “Now that I know this, are there some other puzzle pieces that can now be associated with this unit?” This involves the same process: candidates, confidence, and assertion. There comes a point where an assertion or set of assertions has been made and it is time to move on to the next piece.
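A rough sketch of that "now that I know this" step, under assumptions of my choosing: each unit is modeled as a set of member ids plus the union of its members' features, and two units are connected only when they share at least `MIN_SHARED` features. The function name `add_piece` and the feature labels are hypothetical.

```python
MIN_SHARED = 2  # assumed certainty bar: connect only on two or more shared features

def add_piece(units, piece_id, features):
    """Place one piece, then keep re-checking the grown unit against the rest.

    `units` is a list of (member_ids, features) tuples and is updated in place.
    """
    members, feats = {piece_id}, set(features)
    merged = True
    while merged:                      # repeat: candidates, confidence, assertion
        merged = False
        for other in units:
            if len(feats & other[1]) >= MIN_SHARED:
                units.remove(other)    # safe: we break out of the loop right away
                members |= other[0]
                feats |= other[1]      # the unit now knows more than either part
                merged = True
                break
    units.append((members, feats))
    return units

units = []
add_piece(units, "C", {"c", "d"})
add_piece(units, "A", {"a", "b", "c"})   # shares only "c" with C: stays apart
add_piece(units, "B", {"a", "b", "d"})   # joins A, and the A+B unit then reaches C
```

Neither A nor B alone clears the bar against C, but the combined unit does, which is the point of looking the new unit over before moving on to the next piece.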
Some pieces produce remarkable epiphanies. You grab the next piece, which appears to be just some chunk of grass - obviously no big deal. But wait … you discover this innocuous piece connects the windmill scene to the alligator scene! This innocent little new piece turned out to be the glue.
You can change the approach, operate on a hunch and become curious. Sometimes you decide there is an opportunity to resolve some uncertainty. For example, you have a cluster of pieces all appearing to be related – each having some red and white expressed. With this new interest in mind you don’t grab the next random piece but rather shuffle through the pieces in the box looking specifically for red and white pieces. The goal: with the right few pieces this whole portion of the puzzle may resolve.
Luckily you have only been snapping pieces together when you are sure. Otherwise, your puzzle would not only be a mess but, more importantly, would not be evolving towards any degree of clarity. But no matter how careful you have been … every now and then you grab the next piece out of the box and the home you find for it causes you to realize that two earlier pieces should not have been connected at all. It catches you off guard for a second, but upon closer inspection you realize you stuck these pieces together inappropriately. Shame, shame. But luckily, this past error (a false positive) was discovered and corrected – thanks to the new observations.
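One simple way to picture such a correction in code, offered as a sketch only: assume every edge slot on a piece can hold exactly one neighbor, and keep the confidence score alongside each connection. When a new piece fits an occupied slot with higher confidence, the earlier connection is retracted. The `Workspace` class, the slot naming, and the scores below are all hypothetical.

```python
class Workspace:
    """Keeps asserted connections together with the confidence behind them."""

    def __init__(self):
        self.links = {}      # (piece_id, side) -> (neighbor_id, score)
        self.retracted = []  # earlier assertions later found to be false positives

    def assert_link(self, slot, neighbor_id, score):
        existing = self.links.get(slot)
        if existing is None:
            self.links[slot] = (neighbor_id, score)
            return "connected"
        if score > existing[1]:
            # The new observation is a better fit for this slot, so the
            # earlier connection must have been wrong: undo it.
            self.retracted.append((slot, existing[0]))
            self.links[slot] = (neighbor_id, score)
            return "corrected"
        return "rejected"

ws = Workspace()
earlier = ws.assert_link(("p1", "right"), "p2", 0.92)  # looked sure at the time
later = ws.assert_link(("p1", "right"), "p7", 0.99)    # new piece exposes the error
```

Keeping the score with each assertion is the design choice that matters here: without a record of why two pieces were joined, there is nothing to compare a later observation against.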
The puzzle you are working on today happens to be a pretty big puzzle with what appears to be thousands of pieces. And without the cover of the box you are unsure of its final size (e.g., 1’x 1’ or 3’x 3’, or bigger). One thing is for sure: you have to leave some space between pieces that do not connect; therefore, you need a table bigger than the final puzzle size. Notably, after a great deal of work, there is a point when the in-progress puzzle reaches a maximum of required workspace. After this tipping point, new pieces have a higher likelihood of finding mates and consolidation than not.
As the working space of the puzzle begins to collapse, not only does context become richer, but the computational effort of figuring out where the next piece belongs becomes more efficient despite the fact there are more pieces on the table than ever. Assertions become faster and more certain. So much so … those last few pieces are as fast and easy as the first few pieces! [I have seen this behavior in one of my systems … an absolutely phenomenal event with extraordinary ramifications!]
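The rise and collapse of the workspace can be reproduced with a toy simulation. The sketch below assumes a rectangular grid puzzle whose pieces arrive in random order and snap onto any true neighbor already on the table; it counts the free-standing units after each arrival using a union-find structure. The function name `simulate_workspace` and the grid model are assumptions, and the simulation illustrates only the unit count, not the computational-cost observation.

```python
import random

def simulate_workspace(rows, cols, seed=0):
    """Return the number of free-standing units after each piece arrives."""
    rng = random.Random(seed)
    order = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(order)                 # pieces come out of the box in random order

    parent = {}                        # union-find forest over pieces on the table

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    units = 0
    history = []
    for r, c in order:
        parent[(r, c)] = (r, c)
        units += 1                     # a new piece starts as its own unit
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb in parent:           # true neighbor is already on the table
                a, b = find((r, c)), find(nb)
                if a != b:
                    parent[a] = b      # snap the two units together
                    units -= 1
        history.append(units)
    return history

history = simulate_workspace(20, 20)
peak_units = max(history)              # the most workspace ever needed
peak_at = history.index(peak_units)    # arrival number where the peak occurs
```

Plotting `history` shows the count of separate units climbing, peaking, and then falling to a single unit when the last piece lands.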
But is it really this easy? No.
There may be more than one puzzle in the box, some puzzles having nothing to do with others. There may be duplicate pieces, pieces that disagree with each other, and missing pieces. Some pieces may have been shredded and are now unusable. Other pieces are mislabeled and/or are exceptionally well-crafted lies.
Nonetheless, you will never know what you know ... unless you contextualize your observations!
On a Slightly More Technical Level:
1. When I speak of Persistent Context, this is synonymous with the “work-in-progress puzzle” where the net sum of all previous observations and assertions co-exist.
2. Semantic reconciliation (e.g., identity resolution) is but one of several steps of contextualization processing, albeit one of the most important: it asserts “same” or “not same.”
3. Contextualizing observations is entirely dependent on one’s ability to extract and classify features from the observations. Feature extractors fall woefully short today. I’ve hinted at what I think will fix this in previous posts, and I will say more about this later.
4. Using new observations to correct earlier assertions is an essential property I have been referring to as Sequence Neutrality. When systems favor the false negative, sequence neutrality most frequently discovers earlier false negatives, while discoveries of earlier false positives are few and far between.
5. Non-training-based, context-accumulating systems with sequence neutrality have this behavior: the puzzle can be assembled in a first pass, without brute force, where the computational cost of placing the last pieces is as low as that of the first pieces, with no knowledge beforehand of what the picture looks like, and regardless of the order in which the pieces are received.
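The order-independence in points 4 and 5 can be checked on a toy example. The sketch below is an assumption-laden illustration, not the real system: pieces are hypothetical feature sets, units connect when they share at least `MIN_SHARED` features, and every newly formed unit is re-evaluated against the rest of the table. It covers only the correction of earlier "not related" assertions (false negatives), not the undoing of false positives.

```python
from itertools import permutations

MIN_SHARED = 2  # assumed certainty bar for connecting two units

# Hypothetical pieces; "grass_chunk" is the innocuous piece that acts as glue.
PIECES = {
    "windmill": {"sky", "blade", "grass"},
    "alligator": {"water", "scales", "reeds"},
    "grass_chunk": {"grass", "blade", "reeds", "water"},
    "stray": {"brick", "window"},
}

def assemble(order):
    """Assemble in one pass and return the final grouping of piece ids."""
    units = []  # list of (member_ids, features)
    for piece_id in order:
        members, feats = {piece_id}, set(PIECES[piece_id])
        changed = True
        while changed:                 # re-evaluate after every new connection
            changed = False
            for unit in list(units):   # iterate over a copy while removing
                if len(feats & unit[1]) >= MIN_SHARED:
                    units.remove(unit)
                    members |= unit[0]
                    feats |= unit[1]
                    changed = True
        units.append((members, feats))
    return frozenset(frozenset(m) for m, _ in units)

# Try every arrival order and collect the distinct final pictures.
outcomes = {assemble(order) for order in permutations(PIECES)}
```

In this toy, all 24 arrival orders end in the same picture: the windmill and alligator scenes joined through the grass piece, with the stray piece left on its own. When the windmill and alligator arrive before the glue piece, they are first asserted as unrelated, and that assertion is revised once the grass piece shows up.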
This is what I am working on these days. It is very exciting and I enjoy talking about such things. Feel free to comment or email me questions. I answer every email I get.
OTHER RELATED POSTS: