Following two and a half years of incubation, G2 was revealed four years ago yesterday. You ask, “What is G2?” Well, it is specialized Context Computing software, of course. “What is Context Computing?” you now ask. Well, check out this six-minute video I made to explain Context Computing!
G2 is seeing some exciting successes. Successes that my team and I are quite proud of. For example:
- In conjunction with the Singaporean government and in partnership with Singapore Technologies, we used G2 to help better focus human attention. In this case, G2 is helping the government decide which vessel traveling through the Malacca and Singapore Straits is of the greatest interest, right this second. With half the world’s oil supply and one-third of the world’s commodities passing through these waterways, this is a non-trivial, mission-critical task (see video). One reason we picked this project to “sea trial” G2 is the substantial amount of geospatial data – data about where vessels are, and when and how they move. Combining geospatial data with other structured and unstructured sources dramatically improves the ability to discover truly interesting insights.
- In conjunction with The Pew Charitable Trusts, we used G2 to help modernize voter registration in America, enabling several states to get hundreds of thousands of new voters registered, as well as ensuring that their voter lists are more accurate and up-to-date. Using “Selective Anonymization,” one of our baked-in Privacy by Design (PbD) features, the system keeps private data private: member states do not share any human-readable Personally Identifiable Information (PII) when they send their data to the data center. The ability to perform Context Computing over anonymized data to discover insights was essential to this system’s adoption and subsequent success (see video). When organizations can share anonymized data and get a materially similar result, why would they ever share data any other way? It is for this reason I am very optimistic that Selective Anonymization is going to become rather popular.
- Over 1,000 customers have downloaded G2 via our “Entity Analytics” and/or “Entity Analytics Unleashed” feature that ships inside of SPSS Modeler Premium, a data mining tool (see video and paper). This super easy-to-use feature helps organizations quickly determine who is who and who is related to whom. The enhanced context that comes with reconciling and relating entities helps organizations discover higher quality models. Higher quality models mean better business outcomes. One example: At a recent conference someone walked up to me and thanked me for the technology – he said G2 discovered over a hundred thousand falsely enrolled students, unraveling a huge scam in his country’s education system.
- Most recently, we have been heads down developing and deploying a new end-to-end solution that will soon hit the market called “Sensemaking for Anti-Money Laundering (AML).” This G2-based system helps banks better triage the massive number of AML leads that their existing transaction monitoring engines generate – primarily false leads that misdirect analyst attention. G2 insights help inform the analyst of critical data points. More informed analysts are then able to be more productive (higher quality work) and more efficient (significantly less time) on each case they investigate. While the solution is very lightweight and easy to attach to existing infrastructure and investments, it is nonetheless a heavily logged and fully reconciling system, as one would expect from such a highly regulated activity. We have seen fantastic results at customer one. Other financial institutions are now lining up to achieve similar gains.
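For the techies wondering how matching over anonymized data (as in the voter registration example above) can work at all, here is a minimal sketch of the general idea. To be clear, this is not G2's actual Selective Anonymization implementation: the keyed-hash approach, the `SHARED_KEY` value, and the function names `normalize`, `anonymize`, and `anonymize_record` are all assumptions made for illustration. Each party normalizes and hashes its PII fields with a shared secret before sending, so equal values still compare as equal without being human readable.

```python
import hashlib
import hmac

# Hypothetical shared secret; an assumption for this sketch only.
SHARED_KEY = b"hypothetical-shared-secret"


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial variations hash alike."""
    return " ".join(value.strip().lower().split())


def anonymize(value: str, key: bytes = SHARED_KEY) -> str:
    """Return a keyed SHA-256 digest (HMAC) of the normalized value."""
    return hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, pii_fields: set) -> dict:
    """Hash only the PII fields; leave the other fields as they are."""
    return {k: anonymize(v) if k in pii_fields else v for k, v in record.items()}


# Two parties describe the same person with slightly different formatting.
pii = {"name", "dob"}
state_a = {"name": "Pat  Smith", "dob": "1970-01-02", "state": "A"}
state_b = {"name": "pat smith ", "dob": "1970-01-02", "state": "B"}

anon_a = anonymize_record(state_a, pii)
anon_b = anonymize_record(state_b, pii)

# The receiving side compares digests only; it never sees the raw values.
same_person = all(anon_a[f] == anon_b[f] for f in pii)
```

Exact-match hashing like this only catches values that normalize to the same string, so a real system would need far more than this to get a “materially similar result” on messy data.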
A few technical details for my techie friends.
While we can run on a SQL engine (Oracle, DB2), we are optimized for Key Value data stores. We have a new Key Value data store we are testing with that runs on Remote Direct Memory Access (RDMA). This is scaling linearly, and due to its ultra-low latency, it is the first data store we have ever seen where the bottleneck moves from I/O to CPU! Damn exciting.
Thanks to the underlying high speed streaming engine called InfoSphere Streams, we are able to compute Space Time Boxes (STB) and Hang Outs (places things dwell) at a rate of 200k per second per core. I’ve long been fascinated with geospatial data. Recently, we used our STBs to forecast asteroid vs. asteroid interactions over the next 25 years – an astronomy paper is forthcoming. More details here in this speech: Asteroid Hunting & Other Stealthy Things. Geospatial data related to people is going to be extraordinarily useful, but it will also come with major privacy issues. We are envisioning that anonymized STBs (using our Selective Anonymization feature) will greatly reduce the risk of unintended disclosure and/or misuse of geospatial data too.
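For a feel of what a Space Time Box and a Hang Out might look like in code, here is a toy sketch. It is an illustration under stated assumptions, not how G2 or InfoSphere Streams computes them: the fixed latitude/longitude grid, the `cell_deg` and `window_s` parameters, and the `min_windows` threshold are all hypothetical choices. An observation is quantized into a (cell, time window) key, and an entity is flagged as hanging out wherever it shows up in the same cell across several distinct time windows.

```python
import math
from collections import defaultdict


def space_time_box(lat, lon, ts, cell_deg=0.01, window_s=3600):
    """Quantize a (lat, lon, timestamp) observation into a box key.

    cell_deg and window_s are hypothetical resolutions for this sketch.
    """
    return (
        math.floor(lat / cell_deg),  # grid row
        math.floor(lon / cell_deg),  # grid column
        int(ts // window_s),         # time window index
    )


def hangouts(observations, min_windows=3, cell_deg=0.01, window_s=3600):
    """Find (entity, cell) pairs seen in at least min_windows distinct windows.

    observations is an iterable of (entity_id, lat, lon, ts) tuples.
    Simplification: the windows do not have to be consecutive.
    """
    windows = defaultdict(set)
    for entity, lat, lon, ts in observations:
        row, col, win = space_time_box(lat, lon, ts, cell_deg, window_s)
        windows[(entity, row, col)].add(win)
    return {k: sorted(v) for k, v in windows.items() if len(v) >= min_windows}


# A made-up vessel track: three hourly fixes in one cell, then it moves on.
track = [
    ("V1", 1.2655, 103.8201, 0),
    ("V1", 1.2661, 103.8207, 3600),
    ("V1", 1.2658, 103.8203, 7200),
    ("V1", 1.3000, 104.0000, 10800),
]
dwell = hangouts(track)
```

Because each box key is just a small tuple, it could also be passed through an anonymizing hash before being shared, which is the spirit of the anonymized STBs mentioned above.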
Near-term focus is now on reducing CPU in the G2 core and getting data in, out, and onto Hadoop File Systems (HDFS) for those organizations landing more and more data and analytics on Big Data – on premises and/or in the Cloud.
Many more exciting things to come including features I refer to as Selective Curiosity and the Hint Service.
2015 will be a big year for G2.
FYI: My new title over here at IBM is: Chief Scientist, Context Computing. Fitting, as this does happen to be my obsession.