
August 29, 2006

Be Anyone in Las Vegas, Get Help Creating a Cover Story Here

The saying goes “What Happens in Vegas Stays in Vegas.”  And just to be safe, why not travel to Vegas with a cover story?  Heck, the opportunists do, and you can too.  In fact, the Las Vegas Convention Authority would like to help you.

Be Anyone in Las Vegas

Use this site to establish a name, a cover, get business cards and so on.

While the professionals surely have more sophisticated identity factories … at least you can get started here!

August 20, 2006

Accumulating Context: Now or Never

Sensing importance across a sea of dynamic systems with constantly changing data requires the accumulation and persistence of context.  (I am using the term persistence here to mean storing/saving what one has observed and learned – in a database for example.)

If a system does not assemble and persist context as it comes to know it … the computational cost of reconstructing that context after the fact is too high.  Therefore, a system will be more intelligent when it persists context on data streams … and less intelligent when it does not.

[Sidebar: After explaining this to my lawyer friend Peter Swire he said this is nothing new.  He explained, “That is just like the ‘touch it once’ principle from the One Minute Manager book!”  Yes, I had to confess, it is that basic – as is everything I conjure up.  And, since when have lawyers become so concise?]

It is True: Context at Ingestion is Computationally Preferred

The highest degree of context attainable, per computational unit of effort, is achieved by determining and accumulating context at ingestion.  This is achieved by taking every new data point (observation) received and first querying historical observations to determine how this new data point relates.  And once this is determined, what has been learned (i.e., how the new data point relates to other known data points) is saved with the new data point.
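The ingestion step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: observations are plain dictionaries, and "relates" means nothing more than sharing an attribute value. The names `ContextStore` and `ingest` are hypothetical and do not come from any real product.

```python
from collections import defaultdict


class ContextStore:
    """Toy store that determines and persists context at ingestion time."""

    def __init__(self):
        self.observations = {}             # obs_id -> attribute dict
        self.related = defaultdict(set)    # obs_id -> ids of related observations
        self._index = defaultdict(set)     # (attribute, value) -> obs_ids

    def ingest(self, obs_id, attributes):
        # Step 1: treat the new observation like a query against history.
        matches = set()
        for key_value in attributes.items():
            matches |= self._index[key_value]

        # Step 2: persist what was learned together with the new observation.
        self.observations[obs_id] = dict(attributes)
        for other in matches:
            self.related[obs_id].add(other)
            self.related[other].add(obs_id)
        for key_value in attributes.items():
            self._index[key_value].add(obs_id)
        return matches


store = ContextStore()
store.ingest("a", {"phone": "555-0100", "name": "Bill"})
store.ingest("b", {"phone": "555-0199", "name": "William"})
new_links = store.ingest("c", {"phone": "555-0100", "name": "William"})
print(sorted(new_links))  # ['a', 'b']
```

Because the relationships are saved as each observation arrives, nobody has to rescan history later to learn that "c" ties "a" and "b" together.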

Smart biological systems do this too.  For example, as we humans “sense” the surrounding environment, we assemble these streaming data observations (sights, sounds, etc.) into context at that exact moment.  And we do this with Sequence Neutral processing – whereby the final context is the same regardless of the order in which observations are processed – at least for the most part.
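To make Sequence Neutrality concrete, here is a small Python sketch that groups records sharing any identifier and then checks that every arrival order produces the same final grouping. The matching rule (exact shared identifier, merged with union-find) is an assumption chosen because it is order-independent by construction; real context engines use richer rules, and that is exactly where sequence neutrality gets hard. The function name `resolve` and the sample records are hypothetical.

```python
from itertools import permutations


def resolve(records):
    """Group records that share any identifier, using union-find."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for rec_id, identifiers in records:
        parent.setdefault(("rec", rec_id), ("rec", rec_id))
        for ident in identifiers:
            parent.setdefault(("id", ident), ("id", ident))
            union(("rec", rec_id), ("id", ident))

    groups = {}
    for rec_id, _ in records:
        groups.setdefault(find(("rec", rec_id)), set()).add(rec_id)
    return frozenset(frozenset(g) for g in groups.values())


records = [
    (1, {"Billy the Kid"}),
    (2, {"William Antrim"}),
    (3, {"Billy the Kid", "William Antrim"}),  # links records 1 and 2
    (4, {"Pat Garrett"}),
]

# Process the same records in every possible order and collect the outcomes.
outcomes = {resolve(list(order)) for order in permutations(records)}
print(len(outcomes))  # 1
```

All 24 orderings converge on the same two groups: records 1, 2 and 3 together, and record 4 alone.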

Not to get too abstract here, but while I have been harping on the importance of creating Sequence Neutral processes – no trivial feat in real-time context engines – I am coming to the conclusion that a few aspects of Sequence Neutrality cannot be handled on data streams at ingestion!  While this gives me a sinking feeling about the consequences for Scalability and Sustainability (i.e., no reloading, no batch processing), I am somewhat comforted by the fact that smart biological systems at the top of the food chain themselves go off-line for batch processing (i.e., sleep).  I’m theorizing that dreams are in fact a species’ effort to re-contextualize information that could not be ingested with Sequence Neutrality.  Because if humans could do this while awake, from a survival and evolutionary standpoint, we would!

With all of this in mind, I believe that many architectures, systems and processes which have originated from the batch world probably will have a hard time emerging as high context, intelligent systems.  Further, I think next generation intelligent systems will be designed to assemble context on streams.  But we have a long way to go towards intelligence on streams before we must resort to off-line processing.

August 04, 2006

Sensing Importance: Now or Never

If you do not process every new piece of enterprise data like a query, then you will not know if you hold content that matters … or at least not until someone asks.

This has a performance consequence.  No longer will performance requirements be measured by (a) how fast data can be loaded and (b) how fast queries can be processed.  The new performance requirement will simply become “How fast are the queries?”

Data and queries are going to converge – the line between the two will blur.

Therefore, such systems must be screaming fast.
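One way to picture data and queries converging is a store where both live in the same index: every new record is checked against the questions already asked, and every question is remembered so later data can find the asker. The Python below is a minimal sketch of that idea, assuming exact-term matching; `DataQueryStore`, `add_record` and `ask` are hypothetical names.

```python
from collections import defaultdict


class DataQueryStore:
    """Toy store in which data and queries share one index."""

    def __init__(self):
        self.data = defaultdict(set)      # term -> record ids holding it
        self.queries = defaultdict(set)   # term -> users who asked about it

    def add_record(self, record_id, terms):
        """Store a record and return the users it matters to right now."""
        interested = set()
        for term in terms:
            self.data[term].add(record_id)
            interested |= self.queries.get(term, set())
        return interested

    def ask(self, user, term):
        """Answer a query and remember it, so later data can find the user."""
        self.queries[term].add(user)
        return set(self.data.get(term, set()))


store = DataQueryStore()
print(store.ask("analyst", "123 Main Street"))       # set()
print(store.add_record("r1", ["123 Main Street"]))   # {'analyst'}
```

The analyst's question came back empty when asked, yet the record that arrives afterwards still finds the analyst. Loading data has become a query, which is why load speed and query speed collapse into a single number.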

Related posts:

It’s All About the Librarian! New Paradigms in Enterprise Discovery and Awareness
You Won’t Have to Ask -- Data Will Find Data and Relevance Will Find the User
What Came First, the Query or the Data?
What Do You Know? Introducing Perpetual Analytics

August 03, 2006

It’s All About the Librarian! New Paradigms in Enterprise Discovery and Awareness

For the moment let’s say that directories, indices and catalogs are all the same thing – a thing used to locate other things.  Some examples include the card catalog at the library, phone directories, Google, eBay and so on.  In each case, these are locator services – they return reference information (e.g., pointers) after being provided one or more search terms. 

But not all directories are created equal.  For example, there is a big difference between traditional “context-less” directories and directories capable of “accumulating context.”  And it is these high-context directories that will deliver the next generation of really smart enterprise discovery and situationally aware information systems.

To explain this, I’m going to draw on the library analogy.  Picture an old-fashioned library.  If you are young, this may be difficult for you to imagine, but there were these file drawers, each containing sorted white cards (3x5 cards to be exact).  One was sorted by book title, another by author, and another by subject.  There may have been others but I forget.  It was fast and easy to search these files alphabetically to locate relevant index cards, each card revealing the location of a book.  So, drawing on this reasonable enterprise discovery metaphor, these are the key terms I am about to use.

Document – Original content (e.g., book) having a known static location (e.g., physical location at the library).

Document Attributes – The descriptive features of a Document that distinguish it from other Documents (e.g., Subject, Title, Author, Abstract, etc.). 

Index Card – A single record (i.e., 3x5 card in a manual system) representing a Document Attribute (e.g., the book title, “The Art of War”) with reference to the Document’s physical location (e.g., aisle 17, shelf 4).

Card File – A sorted collection of Index Cards sharing the same Document Attribute (e.g., a physical drawer of Index Cards entitled “Sorted by Author”).

Card Catalog – Used to describe all the Card Files (e.g., collectively the three physical drawers entitled “Sorted by Author”, “Sorted by Title”, and “Sorted by Subject”).

Index(ing/ed) – The activity of placing Index Cards into the appropriate Card File (e.g., alphabetizing an Index Card into the “Sorted by Author” Card File).

Librarian – The administrative function responsible for managing the Card Catalog (e.g., ensuring new Documents are properly Indexed in a timely fashion).

The most common type of Card Catalog is context-less – the Librarian Indexes all new Documents with indifference to all other Index Cards.  In other words, the Librarian blindly updates the Card Catalog without observing how new Index Cards relate to any overall context.  Context-less directories are designed to provide users with the most basic ability to locate Documents (e.g., all books related to “Billy the Kid”). 
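Using the terms defined above, a context-less Card Catalog might be sketched in Python as follows. This is an illustration only: the class name `ContextlessCardCatalog` is hypothetical, and the shelf locations for the Billy the Kid and William Antrim books are made up for the example.

```python
from bisect import insort
from collections import namedtuple

# One Index Card: a Document Attribute value plus the Document's location.
IndexCard = namedtuple("IndexCard", ["value", "location"])


class ContextlessCardCatalog:
    """A Card Catalog whose Librarian files each card with no regard to the others."""

    def __init__(self, attributes=("author", "title", "subject")):
        # One Card File (a sorted list of Index Cards) per Document Attribute.
        self.card_files = {name: [] for name in attributes}

    def index_document(self, location, **document_attributes):
        # Indexing: slot each new card into its sorted Card File.
        for name, value in document_attributes.items():
            insort(self.card_files[name], IndexCard(value, location))

    def lookup(self, attribute, value):
        # Exact match only; synonyms and aliases are invisible here.
        return [c.location for c in self.card_files[attribute] if c.value == value]


catalog = ContextlessCardCatalog()
catalog.index_document("aisle 17, shelf 4", title="The Art of War")
catalog.index_document("aisle 3, shelf 1", subject="Billy the Kid")    # made-up location
catalog.index_document("aisle 9, shelf 2", subject="William Antrim")   # made-up location

print(catalog.lookup("title", "The Art of War"))    # ['aisle 17, shelf 4']
print(catalog.lookup("subject", "Billy the Kid"))   # ['aisle 3, shelf 1']
```

The second lookup misses the book filed under “William Antrim” because nothing at Indexing time noticed that the two subjects might refer to the same person.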

“Semantically Reconciled Directories” are Card Catalogs with improved context because synonyms are automatically accounted for. This means users looking for one thing (e.g., “Billy the Kid”) … automatically find other “same” things (e.g., “William Antrim,” one of his aliases).  Semantically reconciled directories recognize when Document Attributes and/or Documents reference the same thing even though they are being described differently (e.g., Bill=William; Amex=American Express; 123 Main Street=123 S. Main St).  This type of Card Catalog allows users to locate Documents that would otherwise be completely missed.  Master patient indexes are an example of this type of directory.  These are used to enable health care professionals to locate health care records across organizations and systems despite wide variability in how each patient has been identified in each health care system.
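A semantically reconciled Card File can be sketched by normalizing every value to a canonical form both when Indexing and when searching. The Python below assumes a small hand-written synonym table built from the examples in this post; a real system would derive such equivalences through entity resolution rather than a fixed dictionary. `CANONICAL`, `normalize` and `ReconciledCardFile` are hypothetical names, and the shelf locations are made up.

```python
from collections import defaultdict

# Hypothetical synonym table: variant spelling -> canonical form.
CANONICAL = {
    "bill": "william",
    "amex": "american express",
    "william antrim": "billy the kid",
}


def normalize(value):
    """Lowercase, collapse whitespace, then map known synonyms to one form."""
    key = " ".join(value.lower().split())
    return CANONICAL.get(key, key)


class ReconciledCardFile:
    """Card File that files and finds cards by canonical value."""

    def __init__(self):
        self._cards = defaultdict(list)   # canonical value -> locations

    def index(self, value, location):
        self._cards[normalize(value)].append(location)

    def lookup(self, value):
        return list(self._cards.get(normalize(value), []))


subjects = ReconciledCardFile()
subjects.index("Billy the Kid", "aisle 3, shelf 1")    # made-up location
subjects.index("William Antrim", "aisle 9, shelf 2")   # made-up location
print(subjects.lookup("billy the kid"))  # ['aisle 3, shelf 1', 'aisle 9, shelf 2']
```

A search under either name now returns both Documents, which is the behavior a context-less catalog cannot offer.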

“Semantically Reconciled and Relationship Aware Directories” are a type of Card Catalog that provides an even higher degree of enterprise context by allowing users to locate additional Documents, for example, those related by intimate association (e.g., while Billy the Kid is also known as William Antrim, it may also be important to understand that there was a real William Antrim, who happened to be Billy the Kid’s stepfather).  The NORA (Non-Obvious Relationship Awareness) technology I invented in the early ’90s (now owned by IBM) is an example of this type of directory.  This technology leverages such context for the real-time discovery of highly actionable alerts … in an effort to help focus an organization’s finite investigatory resources (e.g., the surveillance team charged with protecting a casino’s assets).
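The Billy the Kid example can be sketched as a directory that keeps aliases and relationships side by side. To be clear, this is not how NORA works; it is a toy Python illustration with hypothetical names (`RelationshipAwareDirectory`, the entity ids, the shelf locations) that assumes entities and their relationships have already been worked out.

```python
from collections import defaultdict


class RelationshipAwareDirectory:
    """Toy directory that returns same-entity and related-entity Documents."""

    def __init__(self):
        self.alias_to_entities = defaultdict(set)  # lowercased name -> entity ids
        self.documents = defaultdict(list)         # entity id -> Document locations
        self.relations = defaultdict(set)          # entity id -> {(relation, entity id)}

    def add_entity(self, entity_id, names):
        # A name may map to several entities (two people called William Antrim).
        for name in names:
            self.alias_to_entities[name.lower()].add(entity_id)

    def index(self, entity_id, location):
        self.documents[entity_id].append(location)

    def relate(self, entity_id, relation, other_id):
        self.relations[entity_id].add((relation, other_id))

    def lookup(self, name):
        results = []
        for entity in sorted(self.alias_to_entities.get(name.lower(), ())):
            for location in self.documents[entity]:
                results.append(("same entity", entity, location))
            for relation, other in sorted(self.relations[entity]):
                for location in self.documents[other]:
                    results.append((relation, other, location))
        return results


directory = RelationshipAwareDirectory()
directory.add_entity("billy", ["Billy the Kid", "William Antrim"])
directory.add_entity("antrim_sr", ["William Antrim"])
directory.index("billy", "aisle 3, shelf 1")        # made-up location
directory.index("antrim_sr", "aisle 9, shelf 2")    # made-up location
directory.relate("billy", "stepfather", "antrim_sr")

for hit in directory.lookup("Billy the Kid"):
    print(hit)
```

Searching for “Billy the Kid” returns his own Document plus the stepfather's, each labeled with why it was returned, so the user can tell an alias match from a relationship.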

Smart Librarians are responsible for the creation and management of high-context Card Catalogs.  I would also like to point out that the Librarian will typically be the first (and the computationally least expensive) to notice when new observations (e.g., Documents) are of enough relevance to be published to a consumer (i.e., user).

When I talk about solving the Information Sharing Paradox and about Perpetual Analytics (a world where the “data finds the data” and the “relevance finds the user”), the gatekeeper for such enterprise intelligence and situational awareness is none other than the Librarian!

Because Context is King, I strongly recommend investing in “context-caring” Librarians.

And a few more technical points:

1. Any type of directory can be managed in batch or real-time modes.

2. Any type of directory can be clear text, encrypted, anonymized or some combination thereof. 

3. If scalability and sustainability matter, then context must be constructed at ingestion (not constructed just-in-time).  And real-time Indexing with attention to Sequence Neutral processing is vital; otherwise data drift will necessitate periodic database reloads – a non-starter at scale.