I spoke at Defrag 2010 earlier today and introduced what I am describing as the new physics of big data.
Having designed and deployed a number of multi-billion-row, context-accumulating systems over the last 14 years, I cannot help but notice some very interesting, very exciting phenomenology. Not research. Not theory. Real.
1. Better prediction. Simultaneously lower false positives and lower false negatives. A bit more about this here: Prediction: Channel Consolidation and Puzzling: How Observations Are Accumulated Into Context.
2. Bad data good. More specifically, natural variability in data including spelling errors, transposition errors, and even professionally fabricated lies: all helpful. A bit more about this here: It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You and There Is No Such Thing As A Single Version of Truth.
3. More data faster. Less compute effort as the database gets bigger. A bit more about this most exciting phenomenon here: The Fast Last Puzzle Piece. (A toy sketch of all three effects follows this list.)
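Since these three effects are easier to see in code than in prose, here is a deliberately tiny, hypothetical sketch of a context-accumulating store. It is an illustration only, not any actual system described above; the feature scheme and one-shared-feature match rule are made up for brevity. Features serve as their own index, a misspelled name becomes an extra match key rather than dirt to scrub out, and each new observation resolves via an index probe rather than a scan:

```python
from collections import defaultdict

class ContextStore:
    """A toy context accumulator: observations pile up into entities."""

    def __init__(self):
        self.entities = []             # entity id -> set of feature strings
        self.index = defaultdict(set)  # feature -> entity ids: data as its own index

    def resolve(self, features):
        """Attach an observation to an existing entity, or start a new one."""
        # Candidate lookup is an index probe, not a scan, so per-observation
        # work stays flat (or shrinks) as the database grows: "more data faster".
        candidates = set()
        for f in features:
            candidates |= self.index[f]
        if candidates:
            # Toy match rule: any shared feature links the observation to an
            # existing entity. A real system would weigh evidence.
            eid = candidates.pop()
        else:
            eid = len(self.entities)
            self.entities.append(set())
        for f in features:
            self.entities[eid].add(f)  # variants and typos accumulate as
            self.index[f].add(eid)     # extra match keys: "bad data good"
        return eid

store = ContextStore()
a = store.resolve(["name:bob smith", "phone:555-1212"])
b = store.resolve(["name:bob smyth", "phone:555-1212"])  # typo rides in on the phone number
c = store.resolve(["name:bob smyth", "addr:12 elm st"])  # the typo itself now finds bob
assert a == b == c  # fewer false negatives as context accumulates: "better prediction"
```

Note what the assert demonstrates: the misspelled record from one channel is precisely what lets the third observation find its entity. Clean the typo away and that match is lost.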
And more good news: as context accumulates, a better sense develops of when and where to place one's attention (apply compute effort), including (1) very smart observation filters and (2) a fully automated ability to determine very specific, very relevant questions, the answers to which it may decide to go fetch itself. A system that Googles itself? I have not blogged about this thinking yet, but hopefully will one of these days.
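Here, equally hypothetical, is what such attention-directing machinery might look like on top of the toy store above; the watchlist and required-attribute scheme are invented for illustration, and the post does not describe how a real system would decide this:

```python
# Hypothetical attention direction on top of ContextStore from the sketch above.

WATCHLIST = {"name:bob smyth"}               # features that make an entity interesting
REQUIRED = {"name", "phone", "addr", "dob"}  # attribute kinds a complete entity needs

def interesting(store, eid):
    """Smart observation filter: does this entity touch anything we care about?"""
    return bool(store.entities[eid] & WATCHLIST)

def open_questions(store, eid):
    """Very specific, very relevant questions: which required attributes are missing?"""
    present = {f.split(":", 1)[0] for f in store.entities[eid]}
    return [f"fetch {kind} for entity {eid}" for kind in REQUIRED - present]

store = ContextStore()
eid = store.resolve(["name:bob smyth", "phone:555-1212", "addr:12 elm st"])
if interesting(store, eid):                  # spend attention only where context warrants
    for q in open_questions(store, eid):
        print(q)                             # -> fetch dob for entity 0
```

The design point: the filter spends compute only on entities the accumulated context says matter, and the question generator turns gaps in that context into specific fetches. A system that, in effect, queries itself.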
Anyway, imagine that: as the database grows, fewer CPU cycles are needed for better predictions, and you never really wanted to clean all that data up in the first place.
I also took this keynote opportunity to share my latest skunk works project – a project my team and I have been working on for almost two years now. Yes, it’s true, I am building something – a sensemaking engine, designed to fully harness this big data phenomenon. Among other exciting properties, this system will also have an unprecedented number of privacy-enhancing features baked into it. Internally I have been calling this little skunk works effort “G2.” And when this little girl grows up I have big hopes for her. For example, maybe she will help cancer researchers find a cure.
My Defrag 2010 MS PowerPoint presentation here.
RELATED POSTS:
Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel
Smart Sensemaking Systems, First and Foremost, Must be Expert Counting Systems
Puzzling: How Observations Are Accumulated Into Context
Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food!
“Macro Trends: The Privacy and Civil Liberties Consequences … and Comments on Responsible Innovation” – My DHS DPIAC Testimony, September 2008
Awesome presentation at Defrag yesterday. Thanks!
Posted by: Dan Lynn | November 19, 2010 at 10:10 AM
Jeff,
I have read your blog; I agree with everything you are describing and have implemented it. Everything works very much as you describe. The system is called the Hilbert Engine. I am the inventor of the technology, and my name is Bjorn Gruenwald.
Through the Hilbert Engine, all data are coordinate-transformed into a numeric space. Because of their quantitative nature, they provide a well-defined context and require far less processor effort to analyze. Since the data are numeric, they serve as their own index. The system accumulates context: data not only find data, they also create new emergent properties, which are added to the vectors. All of this gives the overall system a much smaller footprint than traditional implementations, making Hilbert ideally suited for real-time operation.
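For concreteness, the name suggests a Hilbert space-filling curve, though the comment does not specify the transform: nearby points in a quantized feature space map to nearby positions along one curve, so the resulting number can serve as its own sortable index. A minimal sketch of that general idea (the classic xy2d algorithm, not the Hilbert Engine itself):

```python
def hilbert_index(x, y, order=16):
    """Position of integer point (x, y) along an order-`order` Hilbert curve.

    Classic xy2d algorithm; a sketch of the general idea only.
    """
    n = 1 << order                    # the grid is n x n cells
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0        # which quadrant at this scale?
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # offset of that quadrant along the curve
        if ry == 0:                   # rotate/flip so the curve stays continuous
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d

# Order-1 demo: the four cells of a 2x2 grid, visited in curve order.
print([hilbert_index(x, y, order=1) for x, y in [(0, 0), (0, 1), (1, 1), (1, 0)]])
# -> [0, 1, 2, 3]
```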
Our company, Hilbert Technology, Inc., is working on a number of high-level government projects, here and abroad, directed at the most complex problems.
I'd be interested in meeting with you, together with our CEO. I can be contacted through my private e-mail, [email protected]. Looking forward to talking about both our ideas in much more detail, with an eye toward developing new, innovative solutions embodying our mutual ideas.
Bjorn
Posted by: Bjorn Gruenwald | December 03, 2010 at 10:32 AM
Jeff,
So I see the topic of "Big Data" hitting Computerworld this week, 11/7/2011: "Why Big Data is a Big Deal ... a new group of data mining technologies promises to forever change the way we sift through our vast stores of data." I think the key word here is "sift". It seems like another attempt to throw yet more hardware at the problem. We still seem to be missing the basic point of your analysis. It seems so obvious to me. What are we missing? Is it some piece of software that will accelerate the process of implementing solutions that embody your vision? Is there any progress in this arena?
Paul
Posted by: Paul Brewster | November 09, 2011 at 12:56 PM