
November 29, 2006

IEEE Paper: Threat & Fraud Intelligence – Las Vegas Style

This month's IEEE Security and Privacy (November/December 2006) includes an article I wrote that describes, in relatively plain English, the key principles of "Identity Resolution" and "Relationship Resolution."

Here is a link to a PDF version of this story: Threat and Fraud Intelligence – Las Vegas Style

In a nutshell, here are the essential objectives:

This story also makes the case that probabilistic-based identity matching systems skew over time as the underlying data changes. I have 23 years of work in the area of identity disambiguation at scale. This has led me to the conclusion that starting with deterministic matching and tuning probabilistically is far superior, especially in large data sets that cannot be retrained or reloaded in any reasonable interval (e.g., quarterly).

November 13, 2006

Dumb and Dumber: Consequences of the 2006 Silverman Triathlon

Yesterday I did the Las Vegas Silverman triathlon – this is known as the toughest Ironman in North America. Talk about suffering. Whoever dreamt up this course is a sick puppy.

The 112 mile bike course involved over 9,700 feet of climbing, which is just over a third of the way up Mount Everest! The 26.2 mile run course involved over 2,000 feet of climbing.

To boot, it was really windy during the swim and the bike. This made the 2.4 mile swim in Lake Mead very choppy – my most difficult swim ever. And the first 50 miles of biking were into something like a 15 mile per hour headwind – my most difficult biking ever. Good thing I did not try this race on my mountain bike, as I once did in an Ironman; otherwise, I could still be out there.

The swim took me 1 hour 29 minutes. The bike took another 7 hours 18 minutes. This means after getting into my running gear I had already been racing for nearly 9 hours. No better time to start a marathon run, huh? Well, the run took me just over 5 hours.

The funny thing about these events is that you get progressively dumber as the day goes on. On the bike I attempted to simply count cyclists I was passing. Not possible – it just took too much concentration to perform this mathematical feat. Then on the run I distinctly remember hearing that it was 6pm and given that the race started at 6:30am, one would think computing the total number of elapsed hours would be easy. Computing the 11 hours and 30 minutes of elapsed time in my head was exceedingly difficult – in fact, I gave up at least twice before deciding to try one more time!

At about 13 hours into the race, I turned left into a dead end dirt lot with a fence. I saw another athlete coming straight on and politely asked if he was lost. He said "No". As I was concluding that he was simply dazed and confused I looked up to see I was running into a fence. Turns out I was the one off course. So I blurted out … "I guess I am lost!"

The whole race took me 14 hours and 10 minutes.

Well, it is the day after. My legs hurt. It is hard to walk. I’m tired. Still thirsty. And I am still suffering from an IQ deficiency.

Enterprise Intelligence – My Presentation at the Third Annual Web 2.0 Summit (November 2006)

I was invited to speak at the Web 2.0 Summit last week in San Francisco. Believe it or not I actually presented 41 charts in less than 10 minutes. This kind of general session presentation was called a Show Me/High Order Bits. That’s right, the essence of my life’s work in just 10 minutes ... the thrill!

[Note: The formal title was: "Cops and Robbers Las Vegas Style."]

If you did not make this most amazing summit with a most amazing cast of attendees or were there and missed my auctioneer-inspired delivery, here are the key points I covered:

0. I first showed a picture of a fire breather from my last New Year’s Eve party – but that is not important right now.

1. I showed a surveillance video of a casino scam involving a corrupt dealer – resulting in a $250,000 loss in 15 minutes. If the dealer had the same address in the payroll system as the "high roller" had in the loyalty club and comp systems (free rooms, meals, etc.) … who would know?

2. I introduced the concept of "Corporate Amnesia." This occurs when one part of the organization makes a decision that very clearly did not account for other key data sitting elsewhere in the enterprise, e.g., your marketing department is mailing offers to a person currently in jail for stealing from you!

3. "Perception Isolation" is the leading cause of Corporate Amnesia. Think of each operational system as a distinct enterprise perception. Notably, each perception is isolated from the others.

4. Enterprise intelligence requires persistent context. There is no way to get smart if perceptions are not integrated. When perceptions are integrated and stored in a database … this is persistent context. Think of this like a brain. You need a brain to be smart … duh!

5. I gave a simple demonstration of how context can be constructed and persisted and how this enables the enterprise discovery that otherwise would be missed (more corporate amnesia).

6. Then treat data as a query. And thus I introduced a 1st principle for enterprise intelligence: If you do not process every new piece of key data (perception) first like a query … then you will not know if it matters … until someone asks.

7. Treating data like a query beats periodically boiling the ocean when attempting to achieve real time intelligence.

8. Then, also treat queries as data. This means if one wishes to have a query persist, it must be persisted in the same data space as the data itself. Which leads to the 2nd principle for enterprise intelligence: Treat queries like data to avoid having to ask every question every day.

9. The moment context is being constructed (real time receipt of perceptions from across the different operational systems) happens to be the ideal time for the librarian function to exhibit enterprise awareness. Which leads to the 3rd principle for enterprise intelligence: Enterprise intelligence is computationally most efficient when performed at the moment the observation is perceived.

10. This is the world I sometimes refer to as "Perpetual Analytics." A world where the "data finds the data … and the relevance finds the user."

11. And this stuff really works … and at scale. In fact, in a benchmark center this was found to scale to over 3 billion historical observations while handling the real-time ingestion of more than 2,000 perceptions a second.

12. This has privacy consequences. For example: (a) What perceptions can or should be placed into context (in one brain)?; (b) What if perceptions are contextualized for one mission, then re-purposed later for another?; (c) What if someone steals the brain?; and (d) What if the librarian is corrupt?

13. I worry about these things. And I spend about 40% of my time thinking about the privacy and civil liberties consequences of such systems. Which prompted one of my more recent inventions: a new class of technology I call "Analytics in the Anonymized Data Space." Basically, instead of transferring human-readable perceptions from the various senses (an organization’s operational systems) … the perceptions are anonymized before being handed to the librarian for contextualization in the brain. The Reader’s Digest explanation of anonymization is basically this: if you take a pig and a grinder and make a sausage, even if I give you the sausage and the grinder you are not going to be able to make a pig. The cool thing about this new technology is that the librarian can still construct and persist context and discover relevance without actually handling human-meaningful data.

14. So I summarized the main thinking towards enterprise intelligence -- (a) Without persistent context … you have no brain; (b) Treat data and queries with equal rights to improve awareness; (c) More intelligence is possible when thinking based on streaming perceptions; and (d) From a privacy perspective: More or less perceptions, that is the question (there is an important policy discussion that needs to take place about just how many – more versus less – perceptions should be permitted to be put in the brain).

15. While this approach to enterprise intelligence was born in Las Vegas ... today it plays a role in national security, financial services, health care, etc. And much of the focus of my current activity is towards using this technology to deliver new threat and fraud intelligence solutions in these and other areas.
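Points 6 through 9 above (data as a query, queries as data, and acting at the moment of perception) can be illustrated with a small sketch. This is only a toy under stated assumptions, not the system described in the talk: the `ContextStore` class, its method names, the exact-match comparison on attribute values, and the sample records are all hypothetical.

```python
from collections import defaultdict


class ContextStore:
    """Toy 'persistent context': records and standing queries share one index."""

    def __init__(self):
        # (attribute, value) -> list of (kind, source_or_name, payload) entries
        self.index = defaultdict(list)

    def add_record(self, source, record):
        """Treat data as a query: look up each attribute before storing it."""
        alerts = []
        for attribute, value in record.items():
            key = (attribute, value)
            for kind, other, _payload in self.index[key]:
                if kind == "record" and other != source:
                    alerts.append(f"{source} shares {attribute} with {other}")
                elif kind == "query":
                    alerts.append(f"standing query '{other}' matched by {source}")
            self.index[key].append(("record", source, record))
        return alerts

    def add_query(self, name, attribute, value):
        """Treat queries as data: persist the question next to the records."""
        key = (attribute, value)
        hits = [other for kind, other, _ in self.index[key] if kind == "record"]
        self.index[key].append(("query", name, None))
        return hits


# Hypothetical data echoing the dealer / high roller example in point 1.
store = ContextStore()
store.add_record("payroll", {"name": "Dealer D", "address": "123 Main St"})
store.add_query("watch-phone", "phone", "555-0100")
alerts = store.add_record(
    "loyalty",
    {"name": "High Roller H", "address": "123 Main St", "phone": "555-0100"},
)
for alert in alerts:
    print(alert)
```

Because the standing query lives in the same index as the records, nobody has to re-ask it: the loyalty record finds both the payroll record and the query the instant it arrives.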

To my shock at this point I had completed 36 charts and still had 1.5 minutes left. As I thought this was in fact a possibility, I quickly moved into what I called the bonus section!

Bonus Picture 1. I showed a picture of a chimpanzee with the words "99.4 percent human." The point being: If a 0.6% difference matters this much … no wonder traditional information systems lack so much intelligence! Net net, in intelligence systems very tiny increments of accuracy make the entire difference between being dumb and smart.

Bonus Picture 2. And it may go without saying that in systems such as this … the more observations one has, the better the context. In fact, many times new observations will contain the evidence to improve or fix earlier contextualizations.

Bonus Picture 3. And this brings us to the crucial concept of "Sequence Neutrality." This means that regardless of the order of the observations (records A, B, C received in that order versus arriving in the order C, B, A) the end state is the same. If you cannot process information with sequence neutrality then you get "data drift" – meaning you hold contradictory content which must be reconciled eventually or accuracy erodes. This is a common reason data warehouses must be reloaded. Almost no systems possess this sequence neutrality property. Notably, it is virtually essential at scale because it eventually becomes impossible to tear very large databases down to reload them every week, month, or quarter.
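The sequence neutrality property can be demonstrated with a small sketch. It assumes a deliberately simple matching rule (two records belong together if they share any exact attribute value) and uses a union-find structure so that a late-arriving record can bridge two entities that were previously separate. The `resolve` function and the sample records are hypothetical, and the sketch only merges; it does not cover the harder case where a new observation requires splitting an earlier match.

```python
from itertools import permutations


def resolve(records):
    """Group record ids that share any (attribute, value) pair, in arrival order."""
    parent = {}      # union-find forest over record ids
    first_seen = {}  # (attribute, value) -> first record id that carried it

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for rec_id, attrs in records:
        parent[rec_id] = rec_id
        for pair in attrs.items():
            if pair in first_seen:
                # The new record may bridge two existing entities; merge them.
                parent[find(rec_id)] = find(first_seen[pair])
            else:
                first_seen[pair] = rec_id

    groups = {}
    for rec_id in parent:
        groups.setdefault(find(rec_id), set()).add(rec_id)
    return {frozenset(group) for group in groups.values()}


# Hypothetical records: B is the only link between A and C.
records = [
    ("A", {"phone": "555-0100"}),
    ("B", {"phone": "555-0100", "email": "x@example.com"}),
    ("C", {"email": "x@example.com"}),
    ("D", {"phone": "555-0199"}),
]

# Every arrival order should produce the same end state.
end_states = {frozenset(resolve(list(order))) for order in permutations(records)}
print(len(end_states))
```

A system that only compared each arriving record against existing entities, without merging those entities afterwards, would keep A and C apart whenever they arrived before B. That leftover disagreement is the "data drift" described above.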

Closing thought. After working on designing sequence neutrality into my technologies, I have discovered there are some cases where a new record (perception) will necessitate so much recontextualization, it cannot be done in real time. Drats! That means the system must either be periodically reloaded or alternatively go offline into a maintenance mode (i.e., deep sleep) to remedy the situation. But alas, that is why humans sleep too – deep recontextualization that could not be handled on the fly. Our dreams are the byproduct of this necessary re-shuffling. Or so I have concluded!

This post is now the shortest read about my enterprise intelligence information theory.

I plan on blogging about "why perception isolation is the leading cause of corporate amnesia" very soon.

November 06, 2006

Discoverability: The First Information Sharing Principle

As I mentioned in my last blog post, entitled "Information Sharing: Got Directory?", enterprise information must be registered in the card catalog or it cannot be located in any efficient manner.

Here are two key points about this process.

What goes in the catalog? The short answer is metadata. For example, at the library it is subject, title and author. Maybe your mission needs who, what, where and when. This is actually one of the hard parts ... deciding what to include. If you make this too robust out of the gate, you will be doomed by various complexities. So if you have not already done this, then I recommend selecting just the most basic attributes first. Generally one wants to include: (a) enough attributes to determine when future objects (e.g., documents, people, things) are the same -- semantic reconciliation being the technical term for this, (b) attributes that help relate the object to other related objects (e.g., addresses can play a role in relating people), and (c) the attribution/pedigree (e.g., source attribution) including date/time and location. There are some other categories, but this is a good starter kit.

What is going to prompt a system custodian to give you any catalog metadata in the first place? The answer should be because they care about the enterprise mission. But that generally won’t cut it. So what you will probably need is "policy" followed by budget authority. Here is the approach I would try. If a system is placing metadata in the card catalog then its information will be discoverable across the enterprise at large. This is good. If it does not put metadata into the card catalog, its information is not discoverable. This is bad. Therefore, using the metric "percent of data registered in the directory", a budget authority can quantify which systems have the greater potential for enterprise value. Budget authorities then use budget to "reward" systems with the most discoverability as they will have the greater potential to create enterprise value.
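The "percent of data registered in the directory" metric is simple arithmetic, and a short sketch shows how a budget authority might tabulate it. The system names and record counts below are made up for illustration, and both function names are hypothetical.

```python
def percent_registered(systems):
    """Return {system name: percent of its records registered in the directory}."""
    return {
        name: (100.0 * counts["registered"] / counts["total"]) if counts["total"] else 0.0
        for name, counts in systems.items()
    }


def rank_by_discoverability(systems):
    """List system names from most to least discoverable."""
    pct = percent_registered(systems)
    return sorted(pct, key=pct.get, reverse=True)


# Hypothetical inventory: how many records each system holds, and how many
# of those have metadata in the card catalog.
systems = {
    "loyalty": {"registered": 150, "total": 600},
    "payroll": {"registered": 900, "total": 1000},
    "surveillance": {"registered": 0, "total": 400},
}

for name in rank_by_discoverability(systems):
    print(f"{name}: {percent_registered(systems)[name]:.1f}% registered")
```

The metric says nothing about the quality of the metadata, only its coverage, so it is best read as a first-pass indicator.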

The role of directories in information sharing is nothing new of course. For example, the Markle Foundation’s Task Force on National Security in the Information Age discusses the importance of directories in its third report entitled "Mobilizing Information to Prevent Terrorism" (pages 59-61). [Truth in advertising: I am a proud member of this Markle Task Force.]

Directories play a key role in reducing the amount of data flowing in the network. Because only limited attributes (metadata) are transferred to the directory, most data remains with its original custodian – which has decent privacy ramifications. And coincidentally this directory-based architecture is in my opinion the only technically viable solution to address large scale information sharing initiatives.

Related Posts: The Information Sharing Paradox, It’s All About the Librarian! New Paradigms in Enterprise Discovery and Awareness

November 03, 2006

Information Sharing: Got Directory?

Think about the library. Think about its every floor, hallway and shelf as silos of information excellence. Valuable information tucked away … just begging to be shared.

Now exactly how would such sharing occur?

I introduced the Information Sharing Paradox a few months back. This basically highlights the fact that one must construct and use a central directory to efficiently determine who has what. At the library the card file is used to point to the location of a book. A Google search does not scour the Earth for the results. Instead, a pre-constructed index is searched and the results -- pointers to the real documents -- are returned to the inquirer.

This is the only model that scales. And as the Librarian tasked with keeping this central catalog current gets more sophisticated, enterprise awareness and Perpetual Analytics begin to unfold.
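The card-file idea can be sketched as a tiny directory that stores only a few metadata attributes plus a pointer back to the custodian system. This is an illustrative sketch, not a description of any particular product: the `Pointer` and `Directory` names, the case-insensitive exact-match lookup, and the sample entries are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    custodian: str  # system that holds the full record
    locator: str    # how to find the record there (shelf, key, ...)


class Directory:
    """Central index: metadata in, pointers out. The full data stays put."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, pointer, **metadata):
        """Record a handful of attributes for an item held by a custodian."""
        for attribute, value in metadata.items():
            self._index[(attribute, str(value).lower())].add(pointer)

    def lookup(self, attribute, value):
        """Search the pre-built index only; no custodian system is contacted."""
        hits = self._index.get((attribute, str(value).lower()), ())
        return sorted(hits, key=lambda p: (p.custodian, p.locator))


# Hypothetical registrations from two operational systems.
directory = Directory()
directory.register(Pointer("hr", "emp/1042"), name="Pat Example", phone="555-0100")
directory.register(Pointer("loyalty", "member/77"), phone="555-0100")

for pointer in directory.lookup("phone", "555-0100"):
    print(pointer.custodian, pointer.locator)
```

The inquirer gets back only where to look. Fetching the actual record is a separate step against the custodian, which is where access policy can be enforced.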

So, in short, information sharing is a second base affair. You cannot get there without going by first base, first. And first base is discovery.

From a policy standpoint, if enterprise information assets are not registered in the central index (directory, card file, or whatever you want to call the thingee) then this information is hardly an enterprise asset at all … because it is virtually undiscoverable.

My next blog post will be: "Discoverability: The First Information Sharing Principle"

November 02, 2006

Delusions of Advocacy

Sometimes you don’t know what you are not … until you meet a real one.

I actually thought I was a privacy advocate for about a month. This belief proved short-lived after a lengthy conversation with EPIC’s general counsel David Sobel, now at EFF. His perspective was so deep and substantial that I immediately and publicly announced a self-demotion to simply that of a "student" of privacy and civil liberties protections.

The other life tension in this area is that I design real systems for real organizations with real privacy implications. And this means that as I learn more as a student of privacy, sometimes I see things in my twenty-plus year rear view mirror that I "could have done better." This is especially true in the area of process and policy. So the best I hope for in these circumstances is to not make the same mistake twice.

As more technologists engage the privacy community, one hoped-for outcome will be more responsible innovations. But I also expect these same technologists will have to wrestle with past creations. In my view, this is an important and necessary area of personal reinvention, especially for practitioners in the technology field.