For the moment let’s say that directories, indices and catalogs are all the same thing – a thing used to locate other things. Some examples include the card catalog at the library, phone directories, Google, eBay and so on. In each case, these are locator services – they return reference information (e.g., pointers) after being provided one or more search terms.
But not all directories are created equal. For example, there is a big difference between traditional “context-less” directories and directories capable of “accumulating context.” And it is these high-context directories that will deliver the next generation of really smart enterprise discovery and situationally aware information systems.
To explain this, I’m going to draw on the library analogy. Picture an old-fashioned library. If you are young, this may be difficult for you to imagine, but there were these file drawers, each containing sorted white cards (3x5 cards to be exact). One was sorted by book title, another by author, and another by subject. There may have been others but I forget. It was fast and easy to search these files alphabetically to locate relevant index cards, each card revealing the location of a book. So, drawing on this reasonable enterprise discovery metaphor, these are the key terms I am about to use.
Document – Original content (e.g., book) having a known static location (e.g., physical location at the library).
Document Attributes – The descriptive features of a Document that distinguish it from other Documents (e.g., Subject, Title, Author, Abstract, etc.).
Index Card – A single record (i.e., 3x5 card in a manual system) representing a Document Attribute (e.g., the book title, “The Art of War”) with reference to the Document’s physical location (e.g., aisle 17, shelf 4).
Card File – A sorted collection of Index Cards sharing the same Document Attribute (e.g., a physical drawer of Index Cards entitled “Sorted by Author”).
Card Catalog – The complete set of Card Files (e.g., collectively the three physical drawers entitled “Sorted by Author”, “Sorted by Title”, and “Sorted by Subject”).
Index(ing/ed) – The activity of placing Index Cards into the appropriate Card File (e.g., alphabetizing an Index Card into the “Sorted by Author” Card File).
Librarian – The administrative function responsible for managing the Card Catalog (e.g., ensuring new Documents are properly Indexed in a timely fashion).
The most common type of Card Catalog is context-less – the Librarian Indexes all new Documents with indifference to all other Index Cards. In other words, the Librarian blindly updates the Card Catalog without observing how new Index Cards relate to any overall context. Context-less directories are designed to provide users with the most basic ability to locate Documents (e.g., all books related to “Billy the Kid”).
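To make the context-less case concrete, here is a minimal Python sketch of the terms defined above. The class and method names (IndexCard, CardCatalog, index, lookup) are illustrative assumptions of mine, not part of any real product. Note that index() never looks at the cards already on file, which is exactly what makes this Librarian context-less.

```python
from bisect import insort
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class IndexCard:
    """One card: a single Document Attribute value plus the Document's location."""
    attribute_value: str   # e.g., the title "The Art of War"
    location: str          # e.g., "aisle 17, shelf 4"


class CardCatalog:
    """All Card Files together, one sorted Card File per Document Attribute."""

    def __init__(self, attributes=("title", "author", "subject")):
        self.card_files = {name: [] for name in attributes}

    def index(self, document_attributes, location):
        # The context-less Librarian: file each card in sorted position
        # without comparing it to any card already in the drawer.
        for name, value in document_attributes.items():
            insort(self.card_files[name], IndexCard(value, location))

    def lookup(self, attribute, value):
        # Exact match only; a differently spelled name finds nothing.
        return [card.location
                for card in self.card_files[attribute]
                if card.attribute_value == value]


catalog = CardCatalog()
catalog.index(
    {"title": "The Art of War", "author": "Sun Tzu", "subject": "Strategy"},
    "aisle 17, shelf 4",
)
print(catalog.lookup("title", "The Art of War"))
```

A lookup for “Billy the Kid” in this catalog would miss anything filed under one of his aliases, because nothing in the structure knows the two names are related.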
“Semantically Reconciled Directories” are Card Catalogs with improved context because synonyms are automatically accounted for. This means users looking for one thing (e.g., “Billy the Kid”) … automatically find other “same” things (e.g., “William Antrim,” one of his aliases). Semantically reconciled directories recognize when Document Attributes and/or Documents reference the same thing even though they are being described differently (e.g., Bill=William; Amex=American Express; 123 Main Street=123 S. Main St). This type of Card Catalog allows users to locate Documents that would otherwise be completely missed. Master patient indexes are an example of this type of directory. These are used to enable health care professionals to locate health care records across organizations and systems despite wide variability in how each patient has been identified in each health care system.
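As a rough illustration of semantic reconciliation, the Python sketch below files every card under a canonical form of its term, so that a search on one variant finds cards filed under another. The hand-built SYNONYMS table and the names canonical and ReconciledCardFile are assumptions for illustration only; real systems such as master patient indexes use far richer matching than a lookup table.

```python
from collections import defaultdict

# Hypothetical, hand-maintained equivalences taken from the examples above.
SYNONYMS = {
    "william antrim": "billy the kid",
    "amex": "american express",
}


def canonical(term):
    """Normalize case and spacing, then map known variants to one form."""
    key = " ".join(term.lower().split())
    return SYNONYMS.get(key, key)


class ReconciledCardFile:
    """A Card File keyed on canonical terms instead of raw spellings."""

    def __init__(self):
        self._cards = defaultdict(list)

    def index(self, term, location):
        self._cards[canonical(term)].append(location)

    def lookup(self, term):
        return list(self._cards.get(canonical(term), []))


subjects = ReconciledCardFile()
subjects.index("Billy the Kid", "aisle 3, shelf 1")
subjects.index("William Antrim", "aisle 9, shelf 2")
print(subjects.lookup("billy  the kid"))
```

Searching on either name returns both locations, which is the behavior a context-less Card File cannot provide.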
“Semantically Reconciled and Relationship Aware Directories” are a type of Card Catalog that provides an even higher degree of enterprise context by allowing users to locate additional Documents, for example, those related by intimate association (e.g., while Billy the Kid is also known as William Antrim, it may also be important to understand there was a real William Antrim, who happened to be Billy the Kid’s stepfather). The NORA (Non-Obvious Relationship Awareness) technology I invented in the early ’90s (now owned by IBM) is an example of this type of directory. This technology leverages such context for the real-time discovery of highly actionable alerts … in an effort to help focus an organization’s finite investigatory resources (e.g., the surveillance team charged with protecting a casino’s assets).
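The difference between an alias and a relationship can be sketched in Python as follows. This is a toy model under my own assumptions (the class RelationshipAwareDirectory, the entity ids "E1" and "E2", and the relationship label are all hypothetical) and it is in no way a description of how NORA works internally. Names resolve to entities, Documents are filed per entity, and relationships link distinct entities.

```python
from collections import defaultdict


class RelationshipAwareDirectory:
    def __init__(self):
        self.names = defaultdict(set)       # lowercased name -> entity ids
        self.documents = defaultdict(list)  # entity id -> Document locations
        self.related = defaultdict(set)     # entity id -> {(label, other id)}

    def add_entity(self, entity_id, *names):
        for name in names:
            self.names[name.lower()].add(entity_id)

    def index(self, entity_id, location):
        self.documents[entity_id].append(location)

    def relate(self, entity_a, entity_b, label):
        # Stored in both directions so either party leads to the other.
        self.related[entity_a].add((label, entity_b))
        self.related[entity_b].add((label, entity_a))

    def lookup(self, name):
        """Return (direct locations, [(label, related entity, location), ...])."""
        ids = sorted(self.names.get(name.lower(), ()))
        direct = [loc for e in ids for loc in self.documents.get(e, [])]
        associated = []
        for e in ids:
            for label, other in sorted(self.related.get(e, ())):
                if other in ids:
                    continue  # already reported as a direct match
                for loc in self.documents.get(other, []):
                    associated.append((label, other, loc))
        return direct, associated


directory = RelationshipAwareDirectory()
directory.add_entity("E1", "Billy the Kid", "William Antrim")  # the outlaw and his alias
directory.add_entity("E2", "William Antrim")                   # the real stepfather
directory.index("E1", "aisle 3, shelf 1")
directory.index("E2", "aisle 9, shelf 2")
directory.relate("E1", "E2", "stepfather-stepson")
print(directory.lookup("Billy the Kid"))
```

A search on “Billy the Kid” returns his own Documents directly and surfaces the stepfather’s Documents as related, while a search on “William Antrim” matches both entities because the name is ambiguous.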
Smart Librarians are responsible for the creation and management of high-context Card Catalogs. I would also like to point out that the Librarian will typically be the first (and the computationally least expensive) to notice when new observations (e.g., Documents) are of enough relevance to be published to a consumer (i.e., user).
When I talk about solving the Information Sharing Paradox and Perpetual Analytics (a world where the “data finds the data” and the “relevance finds the user”), the gatekeeper for such enterprise intelligence and situational awareness is none other than the Librarian!
Because Context is King, I strongly recommend investing in “context-caring” Librarians.
And a few more technical points:
1. Any type of directory can be managed in batch or real-time modes.
2. Any type of directory can be clear text, encrypted, anonymized or some combination thereof.
3. If scalability and sustainability matter, then context must be constructed at ingestion (not constructed just-in-time). And real-time Indexing with attention to Sequence Neutral processing is vital; otherwise data drift will necessitate periodic database reloads – a non-starter at scale.
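One way to picture the third point is the deliberately simplified Python sketch below. It assumes that two records describe the same entity whenever they share any feature, and it uses a union-find structure so that context is built as each record arrives. The names ContextStore, ingest and entities, and the sample records, are my own illustrative assumptions. Real Sequence Neutral processing has to handle much more than this (for example, revisiting earlier decisions in light of new data), so treat this only as a demonstration of the property: the final state is the same regardless of arrival order, so no periodic reload is needed.

```python
from itertools import permutations


class ContextStore:
    def __init__(self):
        self.parent = {}         # record id -> parent record id (union-find)
        self.feature_owner = {}  # feature -> first record id seen carrying it

    def _find(self, record_id):
        # Follow parent links to the root, shortening the path as we go.
        while self.parent[record_id] != record_id:
            self.parent[record_id] = self.parent[self.parent[record_id]]
            record_id = self.parent[record_id]
        return record_id

    def ingest(self, record_id, features):
        # Context is built here, at ingestion: the new record is joined to
        # every existing entity it shares a feature with, merging entities
        # immediately when the new record bridges them.
        self.parent[record_id] = record_id
        for feature in features:
            owner = self.feature_owner.setdefault(feature, record_id)
            root_a, root_b = self._find(owner), self._find(record_id)
            if root_a != root_b:
                self.parent[root_b] = root_a

    def entities(self):
        groups = {}
        for record_id in self.parent:
            groups.setdefault(self._find(record_id), set()).add(record_id)
        return {frozenset(group) for group in groups.values()}


# Hypothetical records; R3 is the one that links R1 and R2.
RECORDS = [
    ("R1", ("name:bill smith", "phone:555-0100")),
    ("R2", ("name:william smith", "addr:123 main st")),
    ("R3", ("phone:555-0100", "addr:123 main st")),
    ("R4", ("name:jane doe",)),
]

outcomes = set()
for ordering in permutations(RECORDS):
    store = ContextStore()
    for record_id, features in ordering:
        store.ingest(record_id, features)
    outcomes.add(frozenset(store.entities()))

print(len(outcomes))  # 1: every arrival order produced the same entities
```

Whether R3 arrives first, last, or in the middle, R1, R2 and R3 end up as one entity and R4 stays separate.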
I point to you here:
http://www.redmonk.com/jgovernor/archives/002015.html
Posted by: James Governor | August 04, 2006 at 01:55 AM
Jeff,
Just discovered your blog via James Governor. He speaks highly of you...
It seems we have two things in common, an interest in long distance cycling, and privacy. I'd be interested in finding out more about IBM's privacy research as this is directly related to my PhD, and getting you to join our little jaunt up Mont Ventoux in September....!-)
There is one thing you miss in your library analogy: the good old Dewey Decimal system. It remains a useful categorization framework, and I'm surprised it hasn't morphed somehow into the online world.
As a youngster I remember heading to the 960 section...
Posted by: Thomas Otter | August 09, 2006 at 12:58 AM