If the goal is to make substantially more sense of data – the only way forward is CONTEXT ACCUMULATION.
This is so important … I have been feverishly looking for a better way to explain in plain English how real-time, streaming contextualization works. The very best analogy I have come up with to date is that of assembling jigsaw puzzles. The parallels are uncanny.
A pile of puzzle pieces sits in a box. You cannot look at the picture on the cover of the box, as that would be cheating and not like the real world.
You grab a piece out of the box and look at your table space. It is the first piece. So it really does not matter where you place it. The second piece is now in hand. Does this relate to the first piece? Probably not. So it is placed stand alone elsewhere on the table. This was an assertion. You decided this second piece is not related to the first piece.
Soon there are many free standing pieces scattered across the table space - none at this point have been associated (snapped together). Now you have the next piece in hand, eager to find its mate. Do you attempt to physically match it with every possible piece in what would be a very brute force and time-consuming manner? No. You notice the piece in hand has at least one discriminating feature, some red and white on one of the puzzle edges. Glancing over the table space you look for pieces with a similar distinguishing feature. You find three such candidates. Your attention is now narrowly focused on just these three pieces.
Comparing the piece in hand to each candidate, you are assessing confidence. And at the end of this process you have come to a decision point: a new assertion. It is either (A) a match, (B) not a match but possibly a member of a family of pieces that hopefully will converge as new puzzle pieces arrive, or (C) at this time, has no apparent relation whatsoever to any other pieces.
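In data terms, this candidates-confidence-assertion step resembles what record-linkage systems call blocking followed by scoring. Here is a minimal sketch of the idea; the names and the Jaccard similarity score are my illustrative choices, not taken from any particular system:

```python
from collections import defaultdict

# Index every known piece by its discriminating features ("blocking"),
# so the piece in hand is only compared against plausible candidates,
# never brute-forced against everything on the table.
feature_index = defaultdict(set)

def add_piece(piece_id, features):
    """Register a piece on the table under each of its features."""
    for f in features:
        feature_index[f].add(piece_id)

def candidates(features):
    """All pieces sharing at least one feature with the piece in hand."""
    found = set()
    for f in features:
        found |= feature_index[f]
    return found

def confidence(features_a, features_b):
    """Toy similarity score: Jaccard overlap of two feature sets."""
    a, b = set(features_a), set(features_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

The index is the red-and-white glance: the piece in hand is only ever scored against the handful of plausible candidates, not everything on the table.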
You only connect two pieces when you are sure. You would never get out a hammer and force the piece. In this regard you are favoring the false negative – only connecting the pieces if you have a high degree of certainty.
Let’s say this latest piece finds a match. You are now sitting there with a new puzzle unit (two pieces). Looking this new unit over you ask yourself, “Now that I know this, are there some other puzzle pieces that can now be associated to this unit?” This involves the same process: candidates, confidence, and assertion. There comes a point where an assertion or set of assertions are made and now it is time to move on to the next piece.
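The “now that I know this” re-check can be sketched as a small resolve loop: after any merge, the enlarged unit is scored again against everything remaining, since its pooled features may unlock further matches. The threshold, data structures, and Jaccard score below are illustrative assumptions, not a description of any real product:

```python
THRESHOLD = 0.8  # a high bar: favor the false negative, only merge when sure

def confidence(a, b):
    """Toy similarity score: Jaccard overlap of two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def resolve(units, new_unit):
    """Fold new_unit into units. After every merge, loop again with the
    enlarged feature set: one match can unlock further matches."""
    merged = True
    while merged:
        merged = False
        for unit in list(units):
            if confidence(unit["features"], new_unit["features"]) >= THRESHOLD:
                # Assertion (A): a match. Pool features and re-check.
                new_unit["features"] |= unit["features"]
                new_unit["pieces"] += unit["pieces"]
                units.remove(unit)
                merged = True
    units.append(new_unit)
    return units
```

The high threshold is the hammer rule in code: below it, the piece is left free-standing (assertion B or C) rather than forced into place.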
Some pieces produce remarkable epiphanies. You grab the next piece, which appears to be just some chunk of grass - obviously no big deal. But wait … you discover this innocuous piece connects the windmill scene to the alligator scene! This innocent little new piece turned out to be the glue.
You can change the approach, operate on a hunch and become curious. Sometimes you decide there is an opportunity to resolve some uncertainty. For example, you have a cluster of pieces all appearing to be related – each having some red and white expressed. With this new interest in mind you don’t grab the next random piece but rather shuffle through the pieces in the box looking specifically for red and white pieces. The goal: with the right few pieces this whole portion of the puzzle may resolve.
Luckily you have only been snapping pieces together when you are sure. Otherwise, your puzzle would not only be a mess; more importantly, it would not be evolving toward any degree of clarity. But no matter how careful you have been … every now and then you grab the next piece out of the box and the home you find for it causes you to realize that two earlier pieces should not have been connected at all. It catches you off guard for a second, but upon closer inspection you realize you stuck these pieces together inappropriately. Shame, shame. But luckily, this past error (a false positive) was discovered and corrected – thanks to the new observations.
The puzzle you are working on today happens to be a pretty big puzzle with what appears to be thousands of pieces. And without the cover of the box you are unsure of its final size (e.g., 1’x 1’ or 3’x 3’, or bigger). One thing is for sure: you have to leave some space between pieces that do not connect; therefore, you need a table bigger than the final puzzle size. Notably, after a great deal of work, there is a point when the in-progress puzzle reaches a maximum of required workspace. After this tipping point, new pieces have a higher likelihood of finding mates and consolidation than not.
As the working space of the puzzle begins to collapse, not only does context become richer, but the computational effort of figuring out where the next piece belongs becomes more efficient despite the fact there are more pieces on the table than ever. Assertions become faster and more certain. So much so … those last few pieces are as fast and easy as the first few pieces! [I have seen this behavior in one of my systems … an absolutely phenomenal event with extraordinary ramifications!]
But is it really this easy? No.
There may be more than one puzzle in the box, some puzzles having nothing to do with others. There may be duplicate pieces, pieces that disagree with each other, and missing pieces. Some pieces may have been shredded and are now unusable. Other pieces are mislabeled and/or are exceptionally well crafted lies.
Nonetheless, you will never know what you know ... unless you contextualize your observations!
On a Slightly More Technical Level:
1. When I speak of Persistent Context, this is synonymous with the “work-in-progress puzzle” where the net sum of all previous observations and assertions coexist.
2. Contextualizing observations is entirely dependent on one’s ability to extract and classify features from the observations. Feature extractors fall woefully short today. I’ve hinted in previous posts at what I think will fix this, and more about this later.
3. Using new observations to correct earlier assertions is an essential property I have been referring to as Sequence Neutrality. When systems favor the false negative, sequence neutrality most frequently discovers false negatives, while discoveries of previous false positives are few and far between.
4. Non-training-based, context-accumulating systems with sequence neutrality have this behavior: the puzzle can be assembled first pass, without brute force, where the computational cost of the last pieces is as low as that of the first pieces, while having no knowledge of what the picture looks like beforehand, and regardless of the order in which the pieces are received.
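Sequence neutrality and the order-independent, first-pass assembly described above can be illustrated with a toy accumulator: feed the same observations in two different orders and the final clusters come out the same, because a later “glue” observation (the chunk of grass) can merge clusters that formed separately earlier. This is a deliberately simplified sketch; it joins on any shared feature and cannot yet undo a bad merge, which a real sequence-neutral system must also do:

```python
def accumulate(observations):
    """Fold (id, features) observations into clusters one at a time.
    A new observation joins every cluster it shares a feature with, so
    a later "glue" observation can merge clusters that formed earlier.
    The result is the connected components of the observation-feature
    graph, which does not depend on the order pieces arrive in."""
    clusters = []  # each cluster: {"ids": set of ids, "features": set}
    for obs_id, features in observations:
        features = set(features)
        related = [c for c in clusters if c["features"] & features]
        merged = {"ids": {obs_id}, "features": features}
        for c in related:
            merged["ids"] |= c["ids"]
            merged["features"] |= c["features"]
            clusters.remove(c)
        clusters.append(merged)
    return clusters
```

Feeding the windmill, alligator, and grass pieces in either order yields the same single cluster. What this toy omits is the un-merge path: the ability to split a cluster when a new observation exposes an earlier false positive.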
This is what I am working on these days. It is very exciting and I enjoy talking about such things. Feel free to comment or email me questions. I answer every email I get.
There is a new book out called The Numerati by Stephen Baker. In a nutshell, this book describes machines that are making sense of more and more data about people. As a result, these machines are making more and more decisions for people. The “Numerati” are the people who build these machines and their algorithms.
This book brings to light our journey: machines measuring and directing people and how, as this continues to evolve, it will lead to some pretty interesting consequences … good and bad.
Anyway, Baker names names in this book such as Rayid Ghani (Accenture), Josh Gotbaum (Spotlight Analysis), James Schatz (National Security Agency), Eric Dishman (Intel) and a handful of others as Numerati.
I have a bit of a cameo in the book too (pages 131-37, 140-41, 150-53 and a few other mentions). Also, in this NPR interview (starting at 13:30) Baker describes me as “the dissenting Numerati” which I still think is a good thing.
And after careful inspection of the references to my work, I must say the author nailed it. I love that – as there is nothing worse than something untrue making its way into print.
If someone asked you in 2007 to estimate the likelihood that Morgan Stanley, WaMu, Countrywide, Merrill Lynch, Wachovia and other titans of financial services would fall to their knees within a year … what odds would you have made? Might the experts have said “impossible?”
So here is a thought for the day: What if our Earth math and assumptions used to determine the “frequency of rare events” in financial markets happen to be the same kind of Earth math and assumptions used to determine the risk of playing with black holes?
Wouldn’t it be funny if we earthlings never have time to realize that all the existing black holes in the universe are really just the residual evidence of other civilizations that had advanced enough to build their own colliders.
On a more serious note … they are looking for yet-undiscovered particles produced by near-light-speed collisions. What intrigues me is this: what kind of sensor strategy does one use when one doesn’t know exactly what one is looking for? That is cool!