February 28, 2006

Where Possible … Avoid Consumer Surprise

Today eWEEK published a story about IBM and privacy entitled “Encryption: How to Save Privacy, Businesses and Borders.”  In this piece, I am cited as saying “consumers really hate being surprised.”

When news breaks that causes consumers (citizens, in a government context) to say “I had no idea,” and the surprise involves them, their data, or their privacy … it makes for a bad hair day.

Of course, corporations and governments must protect things such as trade secrets and sources and methods.  But when these “secrets” become broad scale, involve personally identifiable information and/or consumer transactional data, and come with a surprise, count on some degree of revolt and consequence – for example, a negative effect on an organization’s stock price or congressional outcry or both.

The chief antidote to consumer surprise is Transparency.  On this subject, David Brin’s book “The Transparent Society” makes for a very interesting read.

And when systems are deployed with virtually no transparency (particularly government systems), not only are policy, oversight and accountability necessary … additional safeguards are often needed as well, for example, privacy-enhancing technologies such as Immutable Audit Logs and anonymization.

February 23, 2006

Responsible Innovation: Designing for Human Rights

Technology innovations often become the commonplace standards of tomorrow.  We all end up sleeping in the bed we have made.  In hindsight, will we look back and believe we innovated in a responsible manner?  I worry about this. 

In my podcast earlier this month with IBM Chief Privacy Officer Harriet Pearson, I mentioned the Universal Declaration of Human Rights and posed the question “what if we are creating technologies that kind of go in the face of something like this?”  In other words, what if systems are designed without the essential characteristics needed to support basic privacy and civil liberties principles?

For example, let’s take Articles 9, 12, 15 and 17 of this Declaration, each of which incorporates an arbitrariness test that serves to protect certain human rights.

Article 9

No one shall be subjected to arbitrary arrest, detention or exile.

Article 12

No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honor and reputation.  Everyone has the right to the protection of the law against such interference or attacks.

Article 15

(1) Everyone has the right to a nationality.

(2) No one shall be arbitrarily deprived of his nationality nor denied the right to change his nationality.

Article 17

(1) Everyone has the right to own property alone as well as in association with others.

(2) No one shall be arbitrarily deprived of his property.

In my view, the arbitrariness test cannot be satisfied if any of the identified deprivations rely upon data whose pedigree cannot be demonstrated.  For example, if technologies play a role in “arrest, detention, exile, interference, attacks or deprivation,” they must support disclosure of the source upon which such invasions are predicated.

In thinking further about this, I would call for the following minimum design characteristics in any system that could affect one’s privacy or civil liberties:

  • Every data point is associated with its data source
  • Every data point is associated with its author
  • Every data point is associated with a specific timeframe
  • Every data point is associated with a specific location

In addition, such a system should also include mechanisms that assure some degree of accuracy, currency and context.
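
One way to make the four minimum characteristics concrete is a record type that refuses to exist without them. The Python sketch below is an illustration only: the `DataPoint` class, its field names and the `pedigree` helper are hypothetical, and the sketch covers only the four attributes listed above, not the accuracy, currency and context mechanisms.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical record type: one observation plus its four provenance attributes."""
    value: str
    source: str            # system or document the value came from
    author: str            # person or process that recorded it
    observed_at: datetime  # when the observation applies (a single timestamp here)
    location: str          # where the observation was made

    def __post_init__(self):
        # Reject any data point that arrives without its full pedigree.
        for name in ("source", "author", "location"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing provenance field: {name}")


def pedigree(point: DataPoint) -> str:
    """Disclose where a data point came from, e.g., to the person it affects."""
    return (f"{point.value!r} from {point.source}, recorded by {point.author} "
            f"at {point.observed_at.isoformat()} in {point.location}")
```

Because the dataclass is frozen, a data point's pedigree cannot be silently edited after it is recorded; a correction would be a new data point with its own source, author, time and place.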

Responsible technology design is no panacea for ensuring our human rights.  But, if we start thinking about aligning innovation with humanitarian principles, maybe the future will be brighter.

February 14, 2006

No Need to “Over Share” – Thoughts on Information Sharing

“Information sharing” is a hot topic these days.  What drives this interest is the desire to improve decision making by ensuring that users are aware of enterprise content that has otherwise been trapped in isolated information silos.  The objective is to construct robust Context for enterprise optimization.  Having the right information at the right time in the right place matters a lot – whether the mission is to enhance customer service, detect identity theft or fraud, improve health care or secure our nation.

Picture in your mind ten different operational systems, each with its own mission-specific database (i.e., “isolated information silos”).  What would information sharing really look like in this enterprise?  Does information sharing mean every system must transfer all of its data to each other system?  Or does information sharing mean that every system must constantly query all other systems in an effort to locate new context?  And if Sequence Neutrality matters, which I think it does, how could either of the above sharing models deliver accurate, real-time situational awareness (Perpetual Analytics)?  They cannot.

Over Sharing.  In this model, where every system broadcasts all of its data to all other systems, the show stoppers include enormous network bandwidth requirements, difficulty in maintaining information currency, inconsistent data protection schemes, inconsistent audit and access control mechanisms and, when dealing with sensitive identity data, legitimate privacy concerns.  And for entirely different reasons, data owners hate this model too.

Go Fish.  In this model where every system asks every other system every question every day, again the show stoppers are substantial, including the inability of source systems to efficiently process unfamiliar queries, wrong answers caused by off-line systems, recursive processing required every time a query discovers something new, high latency, unacceptable network traffic and inconsistent audit and access control mechanisms.  This model is similarly untenable because nearly every operational system on the planet would first need to be re-engineered to enable a functional Go Fish model.

Catalogs.  What is better, in many cases, is the catalog model.  Think of this like the card catalog at the library.  In this analogy every aisle of the library is the equivalent of an isolated information silo.  It would be unimaginable to roam the aisles expecting to efficiently find a relevant document (book).  Rather, the card catalog provides a user with pointers to documents … i.e., directions where to go (who to ask).  So instead of Over Sharing and Go Fish, with the Catalog model one can efficiently discover what needs to be shared.  Data transfer is minimized in this model and scalability more certain – just look to Google for that case study.  And as data owners fully control their own content, they can determine when to release what data to whom, and under what authority.
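
The catalog idea can be sketched as an index that stores only pointers of the form (system, record id) and never the underlying content. The Python below is a minimal, hypothetical illustration; the `Catalog` class, the key format and the system names are assumptions, and a real catalog would also need access control and key normalization.

```python
from collections import defaultdict


class Catalog:
    """A card catalog: maps a lookup key to pointers, never to the data itself."""

    def __init__(self):
        # key -> set of (system_name, record_id) pointers
        self._index = defaultdict(set)

    def register(self, key: str, system: str, record_id: str) -> None:
        """A source system announces that it holds a record for this key."""
        self._index[key].add((system, record_id))

    def withdraw(self, key: str, system: str, record_id: str) -> None:
        """A source system removes its pointer (e.g., the record was deleted)."""
        self._index[key].discard((system, record_id))
        if not self._index[key]:
            del self._index[key]

    def who_to_ask(self, key: str) -> list:
        """Return sorted pointers; the caller then asks each data owner directly."""
        return sorted(self._index.get(key, set()))
```

After `register("phone:555-0100", "billing", "B-881")` and `register("phone:555-0100", "claims", "C-42")`, a call to `who_to_ask("phone:555-0100")` returns directions to both owners, who still decide what to release, to whom and under what authority.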

Information discovery is a critical precursor to information sharing.  And Catalogs are one proven pattern for enterprise discovery.  Once information is discovered, information sharing becomes more particular – because you know who to ask for what.  Then to ensure policy is being followed one might implement various controls including Immutable Audit Logs.

What’s next?  Anonymized and semantically reconciled catalogs.

February 11, 2006

Dehydration Science Project in Palm Springs

My friend Joe and I did a 101-mile bicycle ride today called the Tour de Palm Springs.  In the cycling world, these are referred to as “centuries.”  Because I am planning on doing a few Ironman triathlon races this year, it is important to do some cycling training.  And since I never have enough time to train like real athletes do, I devise foolish schemes to enhance my endurance and ability to suffer through such fun.

It was a fairly hot day; I’m guessing it was in the 80s.  I have never seen more flat tires on a century.  Literally hundreds of people got flats.  A group of 10 of us got lost while my friend Joe was leading the pack.  My excuse was that I was dehydrated.  So our distance ended up being about 107 miles.

How did I enhance the suffer factor, you might ask?  I did the entire race without any liquids.  During this kind of science project one learns to maximize finite resources, e.g., various fluid conservation tricks.  Riding the last few miles while staring at my unloved water bottles, without taking a swig, took unusual levels of determination.  This is not a training tip; don’t ever try this.

February 10, 2006

Podcast: The Future of Privacy

Earlier this week IBM released a podcast entitled, “IBM and the Future of Privacy.”  In this interview Harriet Pearson, IBM’s Chief Privacy Officer, and I share thoughts about the future of information technology and privacy.

http://www.ibm.com/investor/viewpoint/podcast/09-02-06-1.phtml

Having been at IBM for just over a year, I am really impressed with the organization, its people, its technology and its continued interest in developing privacy-enhancing technologies.  For example, I have a privacy strategist, John Bliss, working directly for me.  The notion is that when I am conceiving of next-generation technology or mapping out an architecture for a specific customer problem, having a privacy strategist right there in the weeds with me makes it possible to innovate with high levels of privacy and civil liberty protections in mind. 

It is nice to work for a company that considers privacy not just as a differentiator but as a responsibility.

February 09, 2006

Immutable Audit Logs (IALs)

Today the Markle Foundation Task Force on National Security in the Information Age released a paper entitled “Using Immutable Audit Logs to Increase Security, Trust, and Accountability.”

http://www.markle.org/downloadable_assets/nstf_IAL_020906.pdf

Peter Swire and I were the lead authors on this paper.  We share with our colleagues serving on the Task Force a great hope for the role such a technology may play in building better oversight, accountability and trust.
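
Immutable Audit Logs can be built in several ways, and the paper above should be consulted for its actual recommendations. One common tamper-evidence technique is hash chaining, sketched below in Python as an assumption rather than as the design the paper specifies: each entry stores the SHA-256 hash of the previous entry, so altering any earlier record breaks verification from that point on. The `HashChainedLog` class and the record fields are hypothetical.

```python
import hashlib
import json


class HashChainedLog:
    """Append-only log in which each entry commits to the hash of the one before it."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []  # each entry: {"record": dict, "prev": str, "hash": str}

    @staticmethod
    def _digest(prev_hash: str, record: dict) -> str:
        # Canonical JSON (sorted keys) so identical records always hash identically.
        payload = prev_hash + json.dumps(record, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry_hash = self._digest(prev, record)
        self.entries.append({"record": record, "prev": prev, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every link; any edited or removed entry breaks the chain."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["record"]):
                return False
            prev = entry["hash"]
        return True
```

A hash chain by itself is tamper-evident, not tamper-proof: someone able to rewrite the entire log could recompute every hash, so in practice the most recent hash would need to be stored or published somewhere the log's operator cannot alter.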

February 07, 2006

Sometimes Math and Science Just Don’t Matter

Under the heading, “amazing things just happen,” I have a friend who developed a huge tumor on her neck last year, was diagnosed with cancer, and was given three months to live. 

She happens across her deceased mother’s wedding ring and slips it onto her finger.  Hours later the swelling, tumor and breathing difficulties are completely gone.  The doctors are mystified to say the least. 

Later, when she takes the ring off, the tumor almost immediately begins to resurface.  Put the ring back on, and “poof” the tumor magically disappears.  Last night while I was visiting with her she suddenly noticed the ring was not on her finger.  She began to swell up.  Today I called to make sure she found the ring.  She had. And her neck was back to normal. 

I am not making this up.

What do I make of this?  It makes me realize that reality and science are far from being reconciled.  Sometimes what is true cannot be proven by math or science.

February 06, 2006

Consequences of False Positives in Government Surveillance Systems

Yesterday The Washington Post ran a story entitled, “Surveillance Net Yields Few Suspects.”   I was quoted in this story making the point that technologies that produce too many false positives run the risk of becoming civil liberty infringement engines.

While I am working on a formal paper in this area, let me quickly say a few things.

In the direct marketing business, false positives have one fairly minimal consequence – the wasted expense of the mail piece and its postage.  In the law enforcement and national security business, false positives have other more serious consequences, namely:

1. Overwhelming analysts with dead-end leads that waste resources; and

2. Civil liberties infringements.

Our 4th Amendment requires “reasonable and particular” government searches and seizures.  It stands to reason that the higher the false-positive rate, the less “reasonable and particular” the process must have been.
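
The arithmetic behind this point is easy to check. The Python sketch below computes expected outcomes when an entire population is screened; every number used with it here (population size, number of true targets, sensitivity and false-positive rate) is hypothetical and chosen only to illustrate the effect.

```python
def screening_outcomes(population: int, true_targets: int,
                       sensitivity: float, false_positive_rate: float) -> dict:
    """Expected outcomes when every member of a population is screened.

    sensitivity: fraction of true targets the system flags.
    false_positive_rate: fraction of everyone else the system flags by mistake.
    """
    innocents = population - true_targets
    true_positives = true_targets * sensitivity
    false_positives = innocents * false_positive_rate
    flagged = true_positives + false_positives
    return {
        "flagged": flagged,
        "true_positives": true_positives,
        "false_positives": false_positives,
        # Precision: the share of flagged people who are actually of interest.
        "precision": true_positives / flagged if flagged else 0.0,
    }
```

With a hypothetical population of 10,000,000 containing 100 true targets, 99% sensitivity and a 0.1% false-positive rate, `screening_outcomes(10_000_000, 100, 0.99, 0.001)` flags roughly 10,000 innocent people alongside 99 true targets, so fewer than 1 in 100 flagged individuals is actually of interest. Cutting the false-positive rate to 0.001% brings the innocent hits down to about 100 and the precision up to roughly one half.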

The good news is that analysts and investigators hate all the false positives too … so if we create systems that minimize false positives, everyone wins.

February 02, 2006

The Phone Call is Coming From Your House! Context is King.

I have a lot to say about the importance of context. But what exactly does this word “context” mean in the information management space? The Oxford American Dictionary has this entry for context:

Context noun

1. Parts that surround and clarify a word or phrase

2. Relevant circumstances

The word “bat” by itself is ambiguous. “He swung the bat” or “the bat-mobile turned left” bring context to the term bat. Similarly, in the information management domain context means the association of related data points in such a manner as to yield the highest possible degree of understanding. When a record is being evaluated for some potential action while at the same time other related data points exist but are not made available, context is missing. Decisions made without available context run the risk of being poor decisions.

Last year I checked into the St. Regis hotel in New York at 1am, requested a wake-up call for 10am and ordered breakfast for a 10:30am delivery. The maid woke me up as she knocked on my door at 9am. She was not aware of these additional data points – had no context – and as a consequence interfered with my experience. Curious about the state of information management at one of the finest hotels on earth, I queried the morning manager. He very nicely conveyed the fact that guests are expected to place the “do not disturb” placard on the door. True, this is one form of context! After having thought about this I realized that, to the best of my knowledge, no hotel in the world has an automated system in place to handle this relatively easy context problem.
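
The hotel example reduces to checking one pending action against the other data points already on file for the same room. The Python sketch below is hypothetical: the `RoomContext` fields, the `may_knock` rule and the 30-minute grace period are assumptions for illustration, not a description of any hotel’s system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RoomContext:
    """Related data points the hotel already holds for one room."""
    do_not_disturb: bool = False
    wake_up_call: Optional[datetime] = None
    room_service_delivery: Optional[datetime] = None


def may_knock(now: datetime, ctx: RoomContext,
              grace: timedelta = timedelta(minutes=30)) -> bool:
    """Return True only if no known data point says the guest should be left alone."""
    if ctx.do_not_disturb:
        return False
    # A wake-up call still pending implies the guest expects to sleep until then.
    if ctx.wake_up_call is not None and now < ctx.wake_up_call:
        return False
    # Leave time for a scheduled room-service order to arrive and be eaten.
    if ctx.room_service_delivery is not None and now < ctx.room_service_delivery + grace:
        return False
    return True
```

With a 10am wake-up call and a 10:30am breakfast on file, `may_knock` returns False at 9am and True at 11:30am; the 9am knock would have been suppressed by exactly the context the maid did not have.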

Enterprise context translates directly into better customer service and a more efficient labor force. While information integration is a core component, real-time enterprise context requires Perpetual Analytics with Sequence Neutrality.

Context is a "main thing" and as such there is so much more to be said. Stay tuned.

February 01, 2006

What Do You Know? Introducing Perpetual Analytics

When the call center takes a call from a first-time caller who happens to be the daughter of its largest customer, does the call center employee know?  An employee moves and then submits an address change to the payroll department.  Will the organization’s internal fraud investigators notice that this new address is related to a current fraud investigation?  In both cases, the answer is no.  Organizations don’t know what they know.

The real problem is … Data happens.

Every organization has an ocean of historical data and with each passing moment new data continues to stream in via one channel or another.  Different streams lead to different silos and these silos are organized to serve different missions … rarely are any two silos alike. 

When organizations elect to increase their understanding (e.g., unearth discoveries otherwise isolated across their disparate information silos) they implement secondary analytic systems.  Whether the organization calls these systems data mining, business intelligence, or predictive modeling, the way these technologies generally work is that they periodically extract data from these silos and then process this data using specialized algorithms.  At the end of this process a new data set is created that encapsulates what has been learned/discovered, e.g., men spend “x” percent less on shampoo than women.

As the number of silos and records increases, so does the computational effort.  In my view this is like re-boiling the ocean every time insight is desired.  Not only does this approach not scale, it prevents an organization from achieving real-time situational awareness.  In the batch world of analytics, information insights arrive chunky, meaning all insight is only made available at certain intervals, e.g., month end.  And while this delay in awareness is not critical to some missions (e.g., direct marketing, actuarial analysis), it is absolutely critical to others (e.g., fraud detection, border control systems).

So there is an ocean of historical data and it is raining, which is to say new data keeps being introduced.  Re-boiling the ocean every month to discover what is knowable is old think.  Perpetual analytics is new think.

“Perpetual analytics” is the term I use to describe the process of performing real-time analytics on data streams.  Think of this like “directing the rain drops” as they fall into the ocean – placing each drop in the right place and measuring the ripples (i.e., finding relationships and relevance to the historical knowledge).  Discovery is made during ingestion and relevant insight is published at that magical moment. 

Not only is this approach more effective in that new observations (e.g., bank account openings) are of immediate use, but it is also more efficient in computational terms.  In most cases, less computational effort is required to construct understanding when applied to net change data streams (e.g., adds, changes and deletes) versus the batch-based, re-boil the ocean model.  Put another way, systems will achieve greatest awareness per unit of computational effort when the incremental – not batch – learning model is applied.  No surprise, as this is how Mother Nature designed us too. 
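
To illustrate the difference between re-boiling the ocean and applying net change, the Python sketch below maintains per-group averages (echoing the shampoo example) from add, change and delete events, doing constant work per event, next to a batch function that rescans every row. The class name, event format and group labels are hypothetical; this is a toy aggregate, not a model of any particular analytics engine.

```python
from collections import defaultdict


class IncrementalAverages:
    """Per-group averages maintained from net-change events, not full rescans."""

    def __init__(self):
        self.rows = {}                   # record_id -> (group, amount)
        self.total = defaultdict(float)  # group -> running sum of amounts
        self.count = defaultdict(int)    # group -> running number of records

    def _remove(self, record_id: str) -> None:
        group, amount = self.rows.pop(record_id)
        self.total[group] -= amount
        self.count[group] -= 1

    def apply(self, op: str, record_id: str, group: str = "", amount: float = 0.0) -> None:
        """Apply one 'add', 'change' or 'delete' event; work is constant per event."""
        if op not in ("add", "change", "delete"):
            raise ValueError(f"unknown operation: {op}")
        if op in ("change", "delete"):
            self._remove(record_id)      # retract the old version of the record
        if op in ("add", "change"):
            self.rows[record_id] = (group, amount)
            self.total[group] += amount
            self.count[group] += 1

    def average(self, group: str) -> float:
        n = self.count.get(group, 0)
        return self.total.get(group, 0.0) / n if n else 0.0


def batch_average(rows: dict, group: str) -> float:
    """The batch alternative: rescan every stored row on each request."""
    amounts = [amount for g, amount in rows.values() if g == group]
    return sum(amounts) / len(amounts) if amounts else 0.0
```

Both approaches give the same answer; the difference is that the batch function touches every stored row on each call, while `apply` touches only the record that changed.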

In a system designed to handle perpetual analytics, as data changes in source systems (e.g., an employee updates his address) a message is fired off to the analytics engine and this new observation is integrated into the collective knowledge.  In this way, the “data finds the data”.  Should this incremental knowledge result in insight (e.g., the employee is related to an open fraud investigation) such discovery can be published to the appropriate user (e.g., in this case the fraud investigator).
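
The “data finds the data” flow for the address-change example might be sketched as follows. In this hypothetical Python illustration, each observation is matched against an index of existing records at the moment it arrives, and any relationship to a watched entity is published immediately. The `PerpetualMatcher` class, the watch list and the crude address normalization are all assumptions; real entity resolution would need far more robust matching.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


class PerpetualMatcher:
    """Integrates each observation as it arrives and publishes relevant links at once."""

    def __init__(self, publish: Callable[[str], None]):
        self.by_address: Dict[str, Set[str]] = defaultdict(set)  # address -> entity ids
        self.current: Dict[str, str] = {}    # entity id -> its current normalized address
        self.watchlist: Dict[str, str] = {}  # entity id -> subscriber who wants to know
        self.publish = publish

    @staticmethod
    def _normalize(address: str) -> str:
        # Deliberately crude: uppercase, drop commas, collapse whitespace.
        return " ".join(address.upper().replace(",", " ").split())

    def watch(self, entity_id: str, subscriber: str) -> None:
        self.watchlist[entity_id] = subscriber

    def ingest(self, entity_id: str, address: str) -> None:
        """Integrate one new observation (e.g., an address change) on arrival."""
        key = self._normalize(address)
        old = self.current.get(entity_id)
        if old is not None:
            self.by_address[old].discard(entity_id)  # the entity moved away
        self.current[entity_id] = key
        related = self.by_address[key] - {entity_id}  # who is already at this address?
        self.by_address[key].add(entity_id)
        for other in sorted(related):
            self._notify(watched=other, other_id=entity_id, key=key)
            self._notify(watched=entity_id, other_id=other, key=key)

    def _notify(self, watched: str, other_id: str, key: str) -> None:
        if watched in self.watchlist:
            self.publish(f"{self.watchlist[watched]}: {other_id} shares address "
                         f"'{key}' with {watched}")
```

When `employee-12` submits an address that normalizes to the same value as the watched `suspect-77`, the investigator’s alert is published during ingestion; no later batch job has to rediscover the relationship.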

In summary, perpetual analytics is vital to achieve real-time situational awareness – a capability whereby the “data actively finds the data.”

Related post: Sequence Neutrality in Information Systems