My latest big, bold, new invention is two years old today!
What is G2?
When I speak about Context Accumulation, Data Finds Data and Relevance Finds You, and Sensemaking, I am describing various aspects of G2.
In simple terms, G2 software is designed to integrate diverse observations (data) as they arrive, in real time. G2 does this incrementally, piece by piece, much the same way you would put a puzzle together at home. And just like at home, the more puzzle pieces integrated into the puzzle, the more complete the picture. The more complete the picture, the better the ability to make sense of what has happened in the past, what is happening now, and what may come next. Users of G2 technology will be more efficient, deliver higher quality outcomes, and ultimately be more competitive.
Early adopters seem to be especially interested in one specific use case: using G2 to help organizations better direct the attention of their finite workforces. With the workforce now focusing on the most important things first, G2 is then used to improve the quality of analysis while at the same time reducing the amount of time such analysis takes. The bigger the organization and the bigger the observation space, the more essential sensemaking becomes.
About Sensemaking
One of the things G2 can already do pretty darn well – considering she just turned two years old – is “Sensemaking.” Imagine a system capable of paying very close attention to every observation that comes its way, each observation incrementally improving the picture, and using this emerging picture in real time to make higher quality business decisions; for example, selecting the perfect ad for a web page (in sub-200 milliseconds, as the user navigates to the page) or raising an alarm for human inspection (an alarm sufficiently important to be placed at the top of the queue). G2, when used this way, enables Enterprise Intelligence.
Of course, there is no magic. Sensemaking engines are limited by their available observation space. If a sentient being could not make sense of the situation from the available observation space, neither can G2. I am not talking about Fantasy Analytics here.
From Insight to Relevance
Rarely does a single observation contain sufficient information to trigger a high quality, immediate reaction.
Imagine looking out your kitchen window only to witness your neighbors in an epic fight. A few days later, you witness the husband purchasing a handgun. A few days later, while trying to fall asleep, you hear a somewhat muffled “bang” from next door. The next morning, as you leave the house for work, you see the husband dragging a sleeping bag loaded with something very heavy toward his pickup truck.
Insights add up.
From a G2 | Sensemaking perspective, the notion is “Insight to Relevance.” Of course, determining what is a bona fide insight, and which combinations of insights become relevant (for action), requires domain knowledge, modeling, and other real work. However, unlike brittle rule-based systems, G2 (like any good Sensemaking engine) performs best when taught principles.
General Purpose Context Accumulation
After recently sizing up G2 in action, I pondered what set of features makes General Purpose Context Accumulation so unique.
Complete Context: General purpose context accumulation systems must be indifferent to diverse observations (e.g., those originating from such sources as structured data, semi-structured data, unstructured text, social media, pictures, and video), each containing events and transactions with temporal, geospatial, identifier, biographic, biometric, and other features. To the extent features can be extracted from an observation and delivered to such engines, these engines contextualize the diverse observations at the same time, in the same data space – every individual observation having an equal opportunity to find and benefit from all other observations.
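To make that concrete, here is a minimal C++ sketch of the idea. It is my own toy illustration, with assumed feature types and source names, not G2’s actual data structures: observations from any source reduce to typed features, and every feature lands in one shared index.

```cpp
// Toy model of a source-indifferent observation space (illustrative only).
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A feature is a (type, value) pair, e.g. ("PHONE", "555-0100").
using Feature = std::pair<std::string, std::string>;

struct Observation {
    int id;
    std::string source;             // "structured", "social", "video", ...
    std::vector<Feature> features;  // whatever the extractors could recover
};

int main() {
    // One shared data space: every feature from every source lands in the
    // same index, so any observation can find any other observation.
    std::map<Feature, std::vector<int>> featureIndex;

    std::vector<Observation> observations = {
        {1, "structured", {{"NAME", "J. Smith"}, {"PHONE", "555-0100"}}},
        {2, "social",     {{"HANDLE", "@jsmith"}, {"PHONE", "555-0100"}}},
        {3, "video",      {{"PLATE", "ABC-123"}, {"GEO", "36.10,-115.20"}}},
    };

    for (const auto& o : observations)
        for (const auto& f : o.features)
            featureIndex[f].push_back(o.id);

    // The social media observation finds the structured one through the
    // phone number they share: same data space, equal opportunity.
    for (int id : featureIndex[{"PHONE", "555-0100"}])
        std::cout << "PHONE 555-0100 appears in observation " << id << "\n";
}
```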
Current Context: New observations are incrementally integrated with the whole (the puzzle) in real time. As such, G2 is fully capable of recognizing opportunity and risk at the split second such relevance becomes knowable.
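A toy sketch of what ingest-time relevance looks like, reusing the neighbor story from earlier (again, my illustration under assumed fact names, not G2 internals): each observation is folded into the accumulating context and evaluated the instant it lands, never in a later batch.

```cpp
// Incremental integration with relevance checked per observation (toy only).
#include <iostream>
#include <map>
#include <set>
#include <string>

// entity -> accumulated facts about that entity (the growing puzzle)
std::map<std::string, std::set<std::string>> context;

void onObservation(const std::string& entity, const std::string& fact) {
    context[entity].insert(fact);  // integrated on arrival, not batched

    // Relevance is evaluated at the split second the last piece lands.
    const auto& f = context[entity];
    if (f.count("epic_fight") && f.count("bought_handgun") && f.count("muffled_bang"))
        std::cout << "ALERT: " << entity << " warrants human attention, now\n";
}

int main() {
    onObservation("neighbor", "epic_fight");      // day 1: no alarm yet
    onObservation("neighbor", "bought_handgun");  // day 4: still no alarm
    onObservation("neighbor", "muffled_bang");    // day 7: the alarm fires here
}
```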
Conflicting Context: There is no such thing as a single version of truth when it comes to general purpose context accumulation. Context accumulating engines let dissent fester. When you search Google and it says “Did you mean ____?”, the same principle is at work. Google is not looking in a static dictionary. No, Google is remembering everyone’s errors. If Google did not remember the errors, it would not be so smart. In the same way, context accumulating systems let disagreement coexist; otherwise, new emerging trends and weak signals will never have a chance to add up to anything.
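Here is a tiny sketch of what letting dissent fester can look like in code (an assumed design, not how G2 actually stores assertions): every asserted value is retained, with a tally, instead of being overwritten by a single winner.

```cpp
// Conflicting assertions coexist; nothing is collapsed to one "truth" (toy).
#include <iostream>
#include <map>
#include <string>

int main() {
    // attribute -> (asserted value -> times asserted); nothing is overwritten
    std::map<std::string, std::map<std::string, int>> assertions;

    assertions["name"]["Jonathan Smith"] += 3;  // most sources agree
    assertions["name"]["Jonathon Smith"] += 1;  // the dissent is kept, usefully

    // A Google-style "did you mean?" falls out of the remembered variants.
    for (const auto& [value, n] : assertions["name"])
        std::cout << value << " asserted " << n << " time(s)\n";
}
```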
Self-Correcting Context: This is the most essential ingredient of context accumulation. I believe few, if any, analytics in the world can do this – especially at our scale. Imagine having already seen and contextualized billions of historical observations, and now the next record arrives. At this moment one must not only decide where to place this new observation, one must also decide whether this new observation (had it been known at the beginning of time, i.e., first) warrants the reversal of any of the billions of previous assertions. And if so, fix them. The effect: a new observation can reverse earlier assertions. Smart Systems Flip-Flop. Doing this in real time over billions of observations is non-trivial. This happens to be the single most technically sophisticated aspect of context accumulation, and well worth the last 10 years of effort we have spent researching, tweaking, and tuning our technique. The new G2 method should be more than an order of magnitude more efficient than our previous method when dealing with some of the nastiest scenarios one stumbles upon when implementing algorithms that can change their minds about the past. In any case, it is this behavior alone that delivers what I call “Big Data. New Physics.” Essentially, as the observation space widens, natural variability in the data (errors) starts to become your friend: false positives and false negatives both begin to self-correct, and at the same time the computational effort required to make sense of the next observation begins to DECREASE as the observation space grows! Translation: as the database grows, predictions become more accurate while computational effort decreases. Absolute magic.
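Since this is the part people ask about most, here is a deliberately tiny flip-flop sketch. It is my simplification under assumed thresholds and features; G2’s real algorithm, and the engineering that makes it fast at scale, is not shown. A shared phone number acts as glue between records until enough sightings reveal it is generic (a switchboard, say), at which point the earlier merge is reversed.

```cpp
// Toy flip-flop: a new observation reverses a previously made assertion.
#include <cstddef>
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Record { int id; std::string name; std::string phone; };

std::vector<Record> db;
std::map<std::string, std::set<int>> byPhone;  // phone -> record ids
std::map<int, int> entityOf;                   // record id -> entity id

const std::size_t GENERIC_THRESHOLD = 3;  // assumed cutoff: phones shared this
                                          // widely stop acting as glue

void resolve(const std::string& phone) {
    const auto& ids = byPhone[phone];
    if (ids.size() < GENERIC_THRESHOLD) {
        // Assert: records sharing a rare phone are the same entity.
        int entity = *ids.begin();
        for (int id : ids) entityOf[id] = entity;
    } else {
        // Flip-flop: the phone turned out to be generic, so REVERSE the
        // earlier merges; each record stands alone again.
        for (int id : ids) entityOf[id] = id;
    }
}

void ingest(const Record& r) {
    db.push_back(r);
    byPhone[r.phone].insert(r.id);
    // Only the neighborhood touched by this observation is re-evaluated,
    // never the billions of untouched assertions.
    resolve(r.phone);
}

int main() {
    ingest({1, "A. Lee", "555-0100"});
    ingest({2, "A. Li",  "555-0100"});  // merged with record 1: rare shared phone
    std::cout << "after 2 records: 1 and 2 are "
              << (entityOf[1] == entityOf[2] ? "the same" : "different") << "\n";
    ingest({3, "B. Cho", "555-0100"});  // third sighting: phone now looks generic
    std::cout << "after 3 records: 1 and 2 are "
              << (entityOf[1] == entityOf[2] ? "the same" : "different") << "\n";
}
```

Note that re-evaluation touches only the records sharing a feature with the newcomer, the affected neighborhood, which is part of the intuition behind the claim that per-observation effort need not grow with the size of the history.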
To tell you the truth, one of the most exciting things for me is this: when considering the wide range of unrelated domains I have explored with G2 (from maritime domain awareness and anti-money laundering to genealogy work using the 1880 census) – believe it or not, I think G2 could perform context accumulation over all of them at the same time, on the same computer, in the same database schema, using the same configuration. Not that anyone would want to, mind you, or should. Nonetheless, this idea that G2 is so general purpose quickens my pulse simply because this has never been possible before. You see, I am hopeful that this means G2 will (one day) be able to readily make almost anything smarter – from managing my personal calendar (while better protecting my privacy) to helping researchers head off some of the most debilitating diseases, like Alzheimer’s.
Exciting Times Ahead
There is a lite version of G2 on the market now, deeply embedded in an off-the-shelf predictive modeling product. I continue to hand-select special projects for the big Sensemaking version of G2 to press and stress this technology in new ways – using these “sea trials” to drive her development. I can’t share details about these efforts just yet, but boy, she is growing up quickly. And just to be clear: I am not saying this is easy, far from it, especially in these formative years. Make no mistake; this is very hard work, requiring the team and me to work crazy long hours.
I am a dreamer with a long list of specific engineering tasks to further advance G2 – things like “selective curiosity,” where G2 thinks of its own questions and knows whom to ask (e.g., a Jeopardy! champion), and a “hint service” that, among other things, may help feature extraction algorithms perform much better (without traditional training data).
G2 is different from other software, very different. One more example, saving the best for last: we have built a series of features into G2 for enhanced privacy and civil liberties protections. These socially responsible features were factored into the original blueprints – from conception – a design approach called Privacy by Design (PbD).
Finally: While I may be driving the G2 vision, I am certainly not building it. I have an amazing group of engineers who actually do all the real work. We also have some very patient customers willing to tolerate all the ups and downs that come with being early adopters. And without both of them none of this would be possible. So to them I say … thank you.
[TECH NOTES]
A few technical comments for my more technically minded friends. Entirely coded in C++. Schemas designed specifically for primary key row stores (although capable of running against a SQL back-end). One index per table, never two. Application-aware sharding (supporting heterogeneous data stores, table partitioning, and uniform distribution of records over the available grid nodes). Most tested on DB2. Eager to finish testing against other popular SQL back-ends. Hoping to demonstrate near linear scale against a NoSQL row store very soon. G2 does not run on Hadoop MapReduce (because G2 is not batch), although Hadoop-based systems will certainly benefit from G2’s accumulated context.
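For the curious, the routing idea behind application-aware sharding can be sketched in a few lines. This is an illustrative toy with made-up node names, not G2’s actual scheme: the application hashes each record’s primary key and picks the grid node itself, so records spread uniformly and the back-end store needs no cross-node coordination.

```cpp
// Toy application-level router: hash the primary key, pick a grid node.
#include <functional>
#include <iostream>
#include <string>
#include <vector>

int main() {
    // Four grid nodes standing in for the real cluster (names assumed).
    const std::vector<std::string> nodes = {"node-0", "node-1", "node-2", "node-3"};

    // The application, not the database, decides where a record lives.
    auto route = [&](const std::string& primaryKey) {
        return nodes[std::hash<std::string>{}(primaryKey) % nodes.size()];
    };

    const std::vector<std::string> keys = {"person:42", "person:43", "vessel:7"};
    for (const auto& key : keys)
        std::cout << key << " -> " << route(key) << "\n";
}
```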
Talk to G2 via HTTP, flat file, SPSS Modeler stream jobs, or compile it into InfoSphere Streams for stream computing over big data. No user interface work is being done at this time. Using our standard and very simple XML-based Universal Message Format (UMF), G2 eats inbound observations and spits out interesting chunks of the puzzle, aka resumes, when it has something useful to say.
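To give a feel for the exchange, here is its shape: an XML observation in and, when the engine has something useful to say, an XML resume out. To be clear, every tag name below is a made-up stand-in of mine; I am not reproducing the actual UMF schema.

```cpp
// Illustrative message shapes only; these are NOT real UMF tag names.
#include <iostream>

int main() {
    // Hypothetical inbound observation.
    const char* observation = R"(
      <UMF>
        <OBS SOURCE="DMV" ID="42">
          <NAME>J. Smith</NAME>
          <PHONE>555-0100</PHONE>
        </OBS>
      </UMF>)";

    // Hypothetical outbound resume: an interesting chunk of the puzzle,
    // volunteered when the engine has something useful to say.
    const char* resume = R"(
      <UMF>
        <RESUME ENTITY="e-17">
          <MEMBER OBS="42"/>
          <MEMBER OBS="7"/>  <!-- an earlier observation this one matched -->
        </RESUME>
      </UMF>)";

    std::cout << "in:" << observation << "\nout:" << resume << "\n";
}
```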
G2 does not do entity/feature extraction on unstructured data. G2 does not do pattern discovery or anomaly detection. Hopefully G2 will one day help such things perform better.
Just to be clear, G2 is far from perfect – upon close inspection she clearly has some moles and warts and even still makes poopy pants from time to time. And I am already fretting the teen years. Nonetheless, despite these distractions, I can say with certainty that this is a journey I am deeply committed to; it is the right direction, hence my ongoing and relentless focus and motivation.
RELATED VIDEOS (In my own words, Courtesy of Redbooks)
Enterprise Amnesia versus Enterprise Intelligence
Using Entity Analytics to Greatly Increase the Accuracy of Your Models Quickly and Easily
RELATED POSTS:
On A Smarter Planet … Some Organizations Will Be Smarter-er Than Others
What Came First, the Query or the Data?
Puzzling: How Observations Are Accumulated Into Context
Accumulating Context: Now or Never
Asserting Context: A Prerequisite for Smart, Sensemaking Systems
Smart Sensemaking Systems, First and Foremost, Must be Expert Counting Systems
Sensemaking on Streams – My G2 Skunk Works Project: Privacy by Design (PbD)
G2 | Sensemaking – One Year Birthday Today. Cognitive Basics Emerging.
General Purpose Sensemaking Systems and Information Colocation
Smart Systems Flip-Flop
Big Data. New Physics.
There Is No Such Thing As A Single Version of Truth
Master Data Management (MDM) vs. Sensemaking
It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You
Federated Discovery vs. Persistent Context – Enterprise Intelligence Requires the Latter
Enterprise Intelligence – My Presentation at the Third Annual Web 2.0 Summit
Self-Correcting False Positives/Negatives: Exonerate the Innocent
Privacy by Design in the Era of Big Data
Responsible Innovation: Designing for Human Rights