January 05, 2008

Data Decommissioning – Destruction of Accountability

I have designed a lot of systems over the years, and more often than not the customer says they plan on performing periodic purges of historical data. This always seems logical at the time. But it turns out that once you have data, it becomes hard to justify its destruction. And if anyone actually does destroy data … they are at the same time eliminating any accountability whatsoever (not to mention other adverse consequences).

Data decommissioning is a double-edged sword.

After a number of personal missteps over the years, I have revised my thinking about data decommissioning. Today I imagine a process where accountability is maximized while the risks of unintended disclosure, misuse, and repurposing are minimized. The goal is to write accountability data into storage de-optimized for information retrieval … thereby rendering retrieval practical only for infrequent, forensic inspection. In simple terms, think paper tape, think hard copy reports, or think microfiche. Alternatively, in more sophisticated settings, I suspect immutable audit logs optimized only for investigative/forensic-specific information retrieval might be useful too. [More detail about this line of thinking is available in this paper that Peter Swire and I penned on behalf of the Markle Foundation.] Obviously, at some point in time when there is no longer any reasonable expectation of information accountability, repeatability, etc., wholesale data decommissioning makes sense (burn the microfiche).

I arrived at this revised thinking in part through the following series of events.

Many years ago, I deployed a system designed to address a single, very specific threat. Then, several years later I concluded that long after that threat was over, the aggregated data set had probably lived on. I would not have thought twice about the privacy and civil liberties implications of this had I not started to engage in conversations with privacy advocates. Following these conversations, I decided that there are some scenarios in which data decommissioning should be "baked in."

Subsequently, with this in mind, when a pro bono opportunity to assist with a humanitarian disaster relief effort presented itself, I proposed a data destruction caveat for the contract. While the customer didn’t seem to care much one way or another, I was excited to learn the customer agreed to the wholesale destruction of the aggregated data set upon project closure. And delete it all we did.

A small victory for privacy it seemed – that is, until a few years later when I realized that I could no longer prove what was done, right or wrong. In fact, had there been any after-the-fact disputes about incorrect action taken based on the recommendations of the technology, I would have had to say, "We destroyed the evidence!"

In summary, when designing systems which require strong audit, accountability and repeatability processes … very careful consideration must be given to delete processes.

Deeper Technical Points:

1. Record changes can raise the same issues as deletes. This occurs when a system overwrites changes rather than keeping each incremental record state and its temporal relevance. When overwriting changes, one is deleting previous values; it is this de facto deletion that compromises audit and accountability processes.

2. A further complicating factor is that not all changes are the same. Some changes are corrections, i.e., the earlier value was incorrect, e.g., wrong driver’s license number or a missing apartment number in an address. Another type of change is one where a value supersedes a previous value, e.g., when recording a married name, new email address, or new cell phone number. Further complicating matters, most systems of record do not have a mechanism to capture the difference between corrections and updates – forcing system designers to make some assumptions.

3. When synchronizing data across information sharing environments, propagating deletes through this ecosystem forces each receiving party into this same accountability dilemma.
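The first two points above can be illustrated with a minimal sketch of an append-only attribute history. The class names and the "initial" / "correction" / "update" labels are hypothetical, chosen only for illustration; the sketch assumes values are recorded in chronological order and is not a description of any particular system of record.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

KINDS = {"initial", "correction", "update"}  # hypothetical labels


@dataclass(frozen=True)
class ValueVersion:
    value: str
    recorded_on: date  # when the system learned this value
    kind: str          # why the value was recorded


class AttributeHistory:
    """Append-only history for one attribute; nothing is overwritten."""

    def __init__(self) -> None:
        self._versions: List[ValueVersion] = []

    def record(self, value: str, recorded_on: date, kind: str = "initial") -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        self._versions.append(ValueVersion(value, recorded_on, kind))

    def current(self) -> Optional[str]:
        return self._versions[-1].value if self._versions else None

    def as_of(self, when: date) -> Optional[str]:
        """What the system believed on a given date (supports repeatability)."""
        known = [v for v in self._versions if v.recorded_on <= when]
        return known[-1].value if known else None

    def ever_valid(self) -> List[str]:
        """Values that were true at some point: a correction replaces the
        value it fixes, while an update adds a new value after the old one."""
        valid: List[str] = []
        for v in self._versions:
            if v.kind == "correction" and valid:
                valid[-1] = v.value
            else:
                valid.append(v.value)
        return valid

    def audit_trail(self) -> Tuple[ValueVersion, ...]:
        """Every recorded state, including values later found to be wrong."""
        return tuple(self._versions)


# A superseding update: both phone numbers were genuinely his at some time.
phone = AttributeHistory()
phone.record("555-0100", date(2007, 1, 5))
phone.record("555-0199", date(2007, 6, 1), kind="update")

# A correction: the first license number was simply a typo.
license_no = AttributeHistory()
license_no.record("D123", date(2007, 1, 5))
license_no.record("D128", date(2007, 2, 1), kind="correction")
```

Here `phone.as_of(date(2007, 3, 1))` still answers with the old number, and the audit trail for the license keeps the mistyped value even though it was never true. Overwriting in place would lose both.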

Related Trivia:

1. When data actually does get purged, it is often prompted by a forcing function. The three purge scenarios I have seen are: (a) all the ancient history is compromising performance; (b) there is no interest in paying for more storage; and (c) "oops – we shouldn’t have been collecting that!"

2. With all the countless copies of data being made, how can one be sure it is ever all deleted anyway?

RELATED POSTS:

Data Tethering

Out-bound Record-level Accountability in Information Sharing Systems

Information Incontinence

Immutable Audit Logs (IAL’s)

How Many Copies of Your Data? Is Somewhat Like Asking: How Many Licks to the Center of the Tootsie Pop?

January 02, 2008

Information Incontinence

I was on a call the other day working on a family project when the other party asked for my cell phone number. I handed it over on two conditions: (1) that she throw it away after the project was completed, and (2) that she swear not to enter my cell phone number into any computer. Immediately following this conversation my girlfriend overheard me muttering, "Computers are dangerous." Let me explain.

When it comes to preventing information leakage … the best rule is: "Don’t ever let the data be placed into digital form."

Then for extra protection it is best not to ever speak it. And, in coming years, it will be best not to ever think it either. (See P300 post below.)

RELATED POSTS:

P300 "Brain Fingerprinting": A Very Freaky Future Indeed

How Many Copies of Your Data? Is Somewhat Like Asking: How Many Licks to the Center of the Tootsie Pop?

November 16, 2007

Van Halen, Risk Management and Breaking the Law (Allegedly)

One of the freedoms we have is the freedom (ability) to knowingly bend or break a law.

While in New York this week, I discovered that Van Halen was playing Madison Square Garden Tuesday, November 13th! Back in the day when I used to play guitar, Eddie Van Halen was like a super hero to me. Unfortunately, the concert was sold out.

Sold out or not – I decided I was going, one way or another. After checking Craigslist without luck and checking with the hotel concierge who found a pair for $1400.00, I decided to take matters into my own hands.

9:02pm - Madison Square Garden

I arrive at the curbside with a load of cash on hand, looking for a scalper. The police are everywhere. I stumble immediately into an interesting character who claims to have one ticket. When I ask him how much, he says $350. I say "deal!" And with great disregard for scalping laws and the countless police all about, I pull out my wad of $20 bills and count them off … all in plain sight.

Allegedly, of course.

I inspect the ticket for signs of being a forgery and accept it. He pockets the cash, and then pulls out his wallet while saying "I have something else for you." I briefly wonder if I have lucked into an undercover policeman! Nope, handing me his card he says "Call me anytime you want a ticket here." Then he says, "Heck, for the price you just paid, I'll walk you to the front door."

9:15pm – I'm in the concert!

Allegedly.

November 10, 2007

Found: An Immutable Audit Log

An immutable audit log is a tamper-resistant recording of how a system has been used – everything from when data arrives, changes, and departs, to how users interacted with the system. Each event is recorded in an indelible manner – even the database administrator with the highest level of system privileges cannot alter the past … kinda like the paper tape on an adding machine, or something etched in stone … only more high-tech.
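One common way to approximate this property in software is a hash chain, where each entry's hash covers the entry before it. The sketch below is only an illustration of that general technique under my own assumptions (the class name, field names and sample events are hypothetical); it says nothing about how any particular vendor builds its product.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, event: Dict[str, str]) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class HashChainedLog:
    """Append-only log in which every entry commits to all earlier entries."""

    def __init__(self) -> None:
        self.entries: List[Dict] = []

    def append(self, event: Dict[str, str]) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append(
            {"event": dict(event), "prev": prev, "hash": _digest(prev, event)}
        )

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            if entry["hash"] != _digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True


log = HashChainedLog()
log.append({"user": "analyst7", "action": "search", "target": "record 42"})
log.append({"user": "dba", "action": "export", "target": "table customers"})
log.append({"user": "analyst7", "action": "view", "target": "record 42"})
ok_before = log.verify()

# Someone with full privileges quietly rewrites history ...
log.entries[1]["event"]["action"] = "nothing to see here"
ok_after = log.verify()  # ... and the chain no longer checks out
```

A chain like this makes tampering detectable rather than impossible: a privileged user who rewrites every later hash could still forge a consistent log, so in practice the most recent hash would need to be held somewhere that user cannot reach.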

I think (and hope) tamper-resistant audits will become commonplace in settings ranging from health care patient records to government surveillance systems. The primary value is twofold:

a) Accountability. Enable policy folks charged with oversight and accountability to validate that a computer system has been used within policy and law; and

b) Deterrence. The "chilling effect" caused by the knowledge that a tamper-resistant audit log is in place – deterring a corrupt person or two from bad behavior.

Well, good news. I stumbled onto a software company in Spain called Kinamik which has been dedicating its technical resources towards the creation of … a tamper-resistant audit log!

Now what? What if no one wants to pay for one? Will tamper-resistant audit logs need to be built in to commercial off-the-shelf systems to reach the market? If so, will organizations actually pay for the additional disk space and processing requirements to turn such a log on? Or will they simply turn the feature off?

This is important technology and one that really needs to see the light of day, especially in conjunction with non-transparent government systems.

If any of my readers have thoughts as to what kind of incentives or levers will be needed to make such audit logs a reality, I would love to hear from you. As well, if you discover any other companies selling tamper-resistant logs, please let me know. I would like to compile a list.

RELATED POSTS:

Yesterday’s Technology Review Story: Blinding Big Brother, Sort of

Immutable Audit Logs (IAL’s)

October 05, 2007

Six Ticks till Midnight: One Plausible Journey from Here to a Total Surveillance Society

The ACLU has recently announced a Surveillance Society Clock which depicts, in their view, how close we are to a total surveillance society. At the time of this writing the clock sits at 11:54pm – just six minutes from midnight!

This clock got me thinking about what series of plausible events might lead up to total surveillance. Unfortunately, such an exercise turned out to be spooky because I quickly concluded that a total surveillance society is not only possible but a certainty. It will happen through a series of fairly quick small steps, it will be irreversible, and the real shocker is that I suspect consumers will find it "irresistible!"

The Six Ticks till Midnight

11:54pm – All cell phones are GPS enabled

Consumers love all of the location-based services. They’ll know that Starbucks is just ahead on the left. The kids just made it home. To avoid the traffic accident at I-15 and Central Parkway, try Pierre Avenue instead. As the prices drop for GPS cell phones, everyone wants one. Manufacturers decide there is no point in making cell phones that don’t have GPS.

Tick.

11:55pm – RFID chips everywhere

RFID becomes so cheap that objects of all sizes and shapes are embedded with these little transmitters, each announcing what it is … to nearby receivers. RFIDs find their way into your car, keys, sunglasses, prescription bottles and underwear. They also happen to be in everything else ranging from your dinner plates to your casino chips. While manufacturers need this to improve supply chains and lower costs, consumers applaud the new conveniences, e.g., faster check-out lines, simplified warranty service and merchandise returns, etc.

Tick.

11:56pm – Biometric user authentication is added to cell phones

Recognizing that cell phones contain so much information, manufacturers start integrating biometric user authentication (e.g., fingerprint). Consumers cannot seem to live without this feature because it prevents information loss if the phone is stolen and, better yet, now that phones can be tied to specific owners, consumers are able to use the cell phone to pay for goods and services without having to even take out their wallet. Predictably, there is less identity theft. Everyone is a winner! Responding to market demand, manufacturers add biometric user authentication to all cell phones.

Tick.

11:57pm – Cell phones become RFID readers

In a natural convergence of two very useful technologies, cell phones are designed to also be RFID readers. Cell phones can now probe nearby objects, recording "what" things (e.g., your Dolce & Gabbana sunglasses), "when" things (e.g., 7:35pm last night) and "where" things (e.g., at your friend Bill’s house). Consumers absolutely love this feature because it makes it so easy to manage all their stuff, e.g., where were my sunglasses last seen? So many nifty services are now possible that user demand for RFID-enabled cell phones goes through the roof. Consumers can’t seem to live without it.

Tick.

11:58pm – Cash is replaced by cell phone debit

Why go to the ATM or manage all those plastic cards when you can move cash via your cell phone? No more losing money. No more stolen credit cards. Consumers also appreciate the improved transaction speeds, and retailers like the fact that many cashier errors are eliminated. The cashless society emerges because it is preferred.

Tick.

11:59pm – All persons carry cell phones at all times

By this point in time, most everybody will be hard pressed to ever separate themselves from their cell phone. In fact, consumers will be incentivized to keep it with them at all times. For example, insurance companies may offer lower rates for those consumers who agree to always carry their cell phone, as the GPS will help determine driving habits. Furthermore, since cell phones contain important life saving data like emergency contact info, current medical prescriptions and blood type, the value of marrying a cell phone to every person becomes obvious. Between personal benefit, corporate benefit, state and federal services, health and safety issues, immigration and national security, it becomes a no-brainer to mandate legislatively that every person over the age of six carry their cell phone. Instead of having to have a social security number or carry some form of ID, your cell phone will do.

Tick.

12:00am – Welcome to the Total Surveillance Society

Total? How total? I guess one might argue that my made-up sequence of events results in a lot of surveillance but not total surveillance. Maybe total surveillance would require that every bathroom have cameras covering every angle and that people wear skull caps with mind reading instrumentation (coming?). My argument is simply this: there comes a degree of surveillance under which everything that matters will be digitally recorded – one’s location, communications, transactions, associations to others, and one’s proximity to things.

Oh yeah, one more thing, no more need for facial recognition (a very hard problem many years off anyway). In this coming world, all that useless video being collected can now be efficiently recalled because GPS data provides the missing link … who was where when?

While the exact technologies or the exact sequence of events may unfold quite differently, nonetheless such a future is coming. And this future is being created by us consumers, not the government!

Consumers are funding the surveillance economy, with the blistering pace of this extraordinary surveillance being driven by ordinary people who relish all the technological advances and are willing to entirely trade in their information and privacy as they optimize their lives.

Now what?

Well, if this is the future, then I think here are some key considerations:

1. Under what condition and authority can an actor (i.e., a person, an organization, a government) look at what data, and when?

2. How will we know when an actor is breaking the rules?

3. Will oversight and accountability be easier in a total surveillance society?

4. How do we make sure that access to extraordinary knowledge is not limited to a few? And, how do we ensure that data about us is knowable by us?

5. For the few people that resist being plugged into the matrix – will they be less employable, less trustworthy, or suspected of hiding criminal activity?

With all this in mind, it seems ever more important that the technology community better engage the privacy community – there simply is not enough conversation going on between these two camps – and time is of the essence. [See: Responsible Innovation: Staying Engaged with the Privacy Community]

Why are more people not working on privacy-preserving technology, e.g., anonymization, immutable audit, selective revelation, data masking, data expiration and destruction services, etc. – and, more importantly, why are more organizations not starting to take advantage of these emerging privacy-enhancing alternatives?

Closing Thought: Will virtual reality be the only remaining place one can enjoy anonymity and freedom of action?

RELATED POSTS:

Ubiquitous Sensors? You Have Seen Nothing Yet

Responsible Innovation: Designing for Human Rights

Responsible Innovation: Some Things are Best Left Un-invented

Responsible Innovation: Staying Engaged with the Privacy Community

September 18, 2007

More Death Cheaper in Future

The difficulty and cost of delivering death and mayhem are dropping so fast, there will come a time in which the ill-will of a few evil men could ruin the day for millions.

Technological advances in physics, engineering and biology coupled with the Internet and the dynamics of Web 2.0 have contributed to unprecedented social progress and overall improvement of the human condition. In many ways … and in most places … it is better now than ever before; hence my recent post "The World is Not a More Dangerous Place." At the same time, these same phenomena are accelerating the lethality potential per unit of human effort.

Example 1: Building and delivering the first few 10-kiloton nuclear devices in the 1940’s involved 130,000 people and cost two billion dollars ($23B in 2007 dollars). Today, graduate students are building viable detonation systems … albeit lacking the enriched uranium or plutonium. But unlike the 1940’s when enriched uranium did not exist – every ounce having to be produced – today this nuclear material exists in stockpiles all over the world.

Example 2: Recent biological advances have made it possible to reanimate the 1918 Spanish Influenza. Did I say "possible?" Sorry, I meant to say "this has already been done!" Between a couple of tissue samples left over in a military hospital and a deceased Alaskan Eskimo preserved in the permafrost, the virus has been successfully reconstructed and its genome sequenced. Researchers then proceeded to inject this virus into mice with the human immune system. The result – unprecedented death – the most deadly flu virus ever tested. [story here] While nuclear material is hard to acquire, I was told the genetic sequence of the 1918 Spanish Influenza was already in the public domain. Hard to believe, so I asked a friend in the biological community for a copy of this sequence. So it appears that I now have a copy on my laptop, but what would I know!

While advances in technology are a big part of this trend, other factors contribute as well including population density, dependence on mobility, the tightly coupled interdependencies in which the world operates (e.g., from just-in-time supply chains to your just-in-time access to cash and food) and media-driven sensationalism. Factors such as these have a force multiplying and amplification effect even upon traditional means for mayhem. For example, consider the death and mayhem created by Malvo and Muhammad, the two Washington DC-area gunmen. They were able to turn an investment of a few thousand dollars (car, gas, gun, bullets) into an instrument of terror which not only killed a number of people but also created so much panic the regional economy lost an estimated half a billion dollars ($500,000,000).

And so it seems, as time marches forward fewer people are able to create more damage cheaper and faster.

RELATED POSTS:

The World is Not a More Dangerous Place

The Only Way to Actually Win the (Long) War on Terror

Web 2.0 – Al Qaeda’s Most Effective Force Multiplier

August 08, 2007

How Many Copies of Your Data? Is Somewhat Like Asking: How Many Licks to the Center of the Tootsie Pop?

I get asked from time to time how data flows. But what people really mean is: How many places does the data land? After explaining this a few times I decided to blog it for easy future reference.

If you give a company your name and address, how many copies of this data might there be twelve months later? Many might be surprised to discover that there could easily be in excess of 1,000 copies!

So roughly speaking it looks something like this …

When data first arrives it is likely to be stored in an operational system – sometimes called the "system of record." This is the first instance.

Systems of record are frequently mission critical systems and are therefore candidates for robust backup policies. While different organizations have different backup policies, one common strategy involves creating one backup every day and keeping each daily backup for seven days. This is a rolling strategy where every Monday overwrites last Monday’s backup. An end-of-week backup (e.g., every Sunday night) might be kept for five rolling weeks. Month-end backups might be kept for twelve rolling months. And year-end backups are likely to be kept for something like seven years.

So at the end of twelve months it is possible that there are now an additional 24 copies of the data (7+5+12). The good news is that backups are well protected; the bad news is that the greater the number of backups the greater the chances one turns up missing -- which happens. [Example here]

Structure governs function. [More on this here.] This is important because how the data is structured in the original system of record is specific to its mission. This means if an organization wants to use the data internally for other reasons (e.g., secondary operational systems like a fraud detection system, statistical analysis, marketing, etc.) this data is copied into each additional system.

Along this line, many organizations create a reporting copy that can be used for ad hoc analysis without affecting operational systems. Some copy the data into an operational data store (ODS). Another copy of the data is often moved into the enterprise data warehouse. Copies from data warehouses are often used to populate data marts. How many data marts might there be? Who knows; one, two, three, or maybe more?

So if an organization has only one reporting copy, one ODS, one enterprise data warehouse and three data marts, then this would add up to six more copies. And these copies are likely to have backups made of them as well, especially when significant computational effort was involved in moving the data (e.g., pre-processing, translation, standardization and integration/co-mingling with secondary data sets). If the same backup strategy is used, this could result in 6*24 or 144 more copies.

So now we are at 1 + 24 + 6 + 144 = 175 copies.
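The running tally can be laid out as a short calculation. This is only a sketch of the arithmetic described so far, with variable names of my own choosing. Note that it counts the six secondary copies themselves in addition to their 144 backups, which brings the total to 175.

```python
# Rolling backup retention described above (year-end backups are left
# out, matching the 7 + 5 + 12 count in the text).
DAILY, WEEKLY, MONTHLY = 7, 5, 12
backups_per_system = DAILY + WEEKLY + MONTHLY

system_of_record = 1

# One reporting copy, one ODS, one enterprise data warehouse, three data marts.
secondary_systems = 1 + 1 + 1 + 3

# Assumes every secondary system follows the same backup strategy.
secondary_backups = secondary_systems * backups_per_system

total_internal_copies = (
    system_of_record
    + backups_per_system
    + secondary_systems
    + secondary_backups
)

print(backups_per_system, secondary_backups, total_internal_copies)
```

Changing a single assumption, such as the number of data marts, moves the total by 25 copies per system once its backups are included.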

But wait, there is more. Many of these systems likely have some form of audit logging – maybe both at the application and database level. Often additional "one-time data snapshots" are made over the course of a year for such things as pre- and post-maintenance and conversion (e.g., application or database upgrades), specialty analysis projects, audit snapshots, and so on. Then there are complete copies made for testing purposes (e.g., to ensure the scheduled upgrade is going to work as planned) and training systems (yes, sometimes training systems are created with real data). These may be backed up as well!

Furthermore, high availability mission critical systems can be expected to have one or more fully synchronized copies of the database strategically dispersed across the landscape for workload distribution and/or disaster recovery purposes.

And then there are many odd little places data can get parked including sensor-side caching (e.g., at the slot machine or cash register itself), in-transit caches (e.g., cell phone towers), message queues, local and central search engines, performance enhancing indices, and so on.

Sorry, but I’ve lost count. So let’s just say over a hundred copies are made … internally. Now, what about the copies of the data which travel beyond the organization that originally collected the data?

Let’s say you are applying for credit. In this case, you have likely authorized a credit report. Getting your credit report involves sending your information to one (or all three) of the credit bureaus. This information request now sits in their system of record; their audit logs; their data warehouses and data marts; their backups and so on. But wait, there is more!

These secondary recipients of your data may in turn further disseminate this information. This is especially true if the organization is a data aggregator/data broker. This data is combined with other information, assembled, scored and sold. These tertiary recipients then make their own mission-centric copies, data warehouses, backups, etc. And, in some cases, it is repackaged and sold again.

Care to guess how many copies of the data are out there now?

  • No copies              You better hope not
  • >10 copies             Almost certainly
  • >100 copies            Very likely
  • >1,000 copies          Quite possible in certain settings
  • >100,000 copies        Sometimes
  • >1,000,000 copies      Not out of the question

Replication beyond 100,000 copies can come into play with such information as phone service records (phone books), credit applications and, believe it or not, even those warranty cards you have been filling out!

What does all this mean?

1. Keeping data current in the eco-system is not trivial. [See: Data Tethering]

2. Protecting this many copies of the data is not trivial.

3. The more data you see, the more you realize most data is duplicative.

And this leads to an area I have been thinking about for about five years which I sometimes refer to collectively as "Data Reduction Strategies." More about some progress I have made in this area on some future date.

Oh … and my Perpetual Analytics stuff is going to need one more copy (with its own particular database schema) since Enterprise Intelligence requires Persistent Context. And, of course, it would be wise to back this up too.

July 30, 2007

The World is Not a More Dangerous Place

Back in the days when I had my company, Systems Research & Development (SRD), I prevented anyone from pitching my software using "the world is a more dangerous place" as the set up pitch.

Two reasons: (A) I think it is safer to be alive now than ever before and (B) I hate the idea of using the "fear card" to sell.

Before you call me crazy, consider the following: In the 1300’s the Black Death killed an estimated 75 million people – including a third to two thirds of Europe’s population. The 1918 Spanish Flu killed 50 – 100 million in just 18 months, making it by far the most destructive pandemic on record.

The average life span at the end of the nineteenth century in Western Europe was thirty-seven. Today the average life span in the world is sixty-seven! [Ref: Life Expectancy]

In short, you are more likely to grow older today than at any time in the history of man.

Here is another point of reference: Even if America sank into the ocean, the 300 million deaths would be ~4.5% of the world’s current population (~6.7B). The 75 million lives lost to the Black Death amounted to ~17.4% of the world’s population at that time (~432MM). Thus, if you were standing in America and discovered it was going to suddenly fall off into the ocean in the next few minutes, although this makes for a very bad day for you personally, overall the world still would be a less dangerous place as compared to the mid-1300’s.
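For anyone who wants to check the comparison, here is the arithmetic as a tiny sketch. The figures are the estimates quoted above; the function name is simply my own.

```python
def share_of_population(deaths: float, population: float) -> float:
    """Deaths as a percentage of the population alive at the time."""
    return 100.0 * deaths / population


# Hypothetical loss of all of America against today's world population.
america_today = share_of_population(300e6, 6.7e9)

# Black Death against the estimated world population of the mid-1300's.
black_death = share_of_population(75e6, 432e6)

print(f"America today: {america_today:.1f}%")
print(f"Black Death:   {black_death:.1f}%")
```

On these estimates the Black Death took a share of humanity almost four times larger.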

Nukes complicate this equation. The two primary nuke scenarios are: a) one-se-two-se nuclear detonations carried out by stateless criminals; and b) a full scale global nuclear war causing the annihilation of mankind.

While periodic unscheduled 10-kiloton nuclear detonations would be very, very bad, until such events exceed a few a year (or they go thermonuclear), in the grand scheme of things we Earthlings are still safer than in the 1300’s. (True, if all of these events happened in a single geography, then while the world at large would still not be a more dangerous place, that specific geography would certainly be a more dangerous place!)

The scenario involving a full-scale nuclear exchange of large numbers of thermonuclear weapons deserves special attention. True, the risk of global nuclear annihilation was absolute zero before the 1900’s and today this risk is no longer zero. But this risk ebbs and flows. One way to consider how this risk changes over time is the Doomsday Clock. Remember that? The idea is that the closer this clock is to midnight, the greater the risk of global annihilation. Its keepers calculated the time period 1953-60 as the closest the world has yet come to a doomsday event (2 minutes till midnight). [Note: the Doomsday Clock was not adjusted in 1962 during the Cuban Missile Crisis as this incident came and went faster than the group reconvened and reset the clock.] Then from 1991 to 1995 the Doomsday Clock was rolled back to 17 minutes until midnight, suggesting times were the safest since the inception of this clock in 1947. Notably, the clock shows that since 1995 the safety of the world has been declining. Nonetheless, at this point in time even when considering nuclear Armageddon, the world is less dangerous today than in 1953-1960.

So in all fairness, when considering whether the world is a more dangerous place one would also have to ask "as compared to when?" and "as compared to where?" For example, if you called Chernobyl home on April 27th, 1986, you were definitely in a more dangerous place.

And one more thing … when the world seems like an incredibly dangerous place ... you can probably thank some of the media for that. The media’s ability to take every bad thing that happens on the planet and package it up for maximum sensation plays a huge role in spreading fear. It’s not their fault, of course; it is yours (and mine), as sensational news is what draws us into the media. And as our attention gives them higher ratings they justifiably work even harder at finding, packaging and delivering up even more of this bad news for us. [So, I propose we fix this by only directing our attention to "good news" stories from now on, ok? Wait, is that a smoke plume on CNN … get out of my way … I gotta see this!]

Honestly, if you could pick another time to live, would you really trade living in this age for an earlier century? I wouldn’t. Oh, and I wouldn’t want to trade it for 100 years in the future either – I think the future has a chance of being really messy.

These could be the golden years!

PS: Before you get too excited one way or the other about this post, take this into account: This is my Yin post. Stay tuned for my forthcoming Yang post which will be entitled something like "More Death in Future Cheaper."

RELATED POSTS:

The Only Way to Actually Win the (Long) War on Terror

Web 2.0 – Al Qaeda’s Most Effective Force Multiplier

July 10, 2007

How to Use a "Glue Gun" to Catch a Liar

"People lie. How are you going to account for that?"

This question used to make me crazy. I always wanted to blurt out, "And the sun is going to consume the earth someday – deal with it!"

I never said this, of course.

Anyway, I have a more thoughtful response these days.

Try this on for size. Yep. People are going to falsify information. In fact, you may have experienced this in your life. Let’s say you had a friend – or so you thought. Over time you discovered that this person was in fact dishonest. How did you discover this? The answer is simple: you collected more observations over time.

Observations add up.

I have seen this play out in real data. For example, there was this very big database (billions of table rows describing hundreds of millions of unique people). In this particular database there was this one fellow who was repeatedly lying about his identity. He did a good job, in fact such a good job that despite Semantic Reconciliation processing he appeared to be six different people.

The guy was a liar and no one knew ... that is until future observations (created by his own actions) flushed him out.

[Skip this next paragraph, if you are speed reading or want to stay out of the weeds.]

Here is how this happened. Imagine six apparently discrete identities. Some name similarity, but that never matters at this scale. Then one day this fellow decided to use one of these identities (using previously reported features, e.g., same name, phone, SSN, date of birth, etc.), except this time he introduced a new address, one that had never been previously associated with this identity. So this new record was identity resolved to the existing identity (the identity he wanted to present). This caused context accumulation – in this case the new address enhanced what was known about the person he was being that day. Sequence Neutrality processing then fired up to make sure earlier identity resolution events were still valid. During this process another identity was located that shared the new address (the one just learned) and other matching features (e.g., similar names and more). The identity he was trying to be had now become conjoined to one of his other identities – one he was trying to distance himself from. [Technical note: I am specifically using the term conjoined as opposed to merged. Think of conjoined like being rubber-banded together versus merged where two records become one. This is essential for many reasons, e.g., retaining the ability to change one’s mind later. More about this in a future post.]

When two identities collapse into one, the new conjoined identity has more context. Because something new has just been learned, sequence neutral processing immediately determines whether there are any further assertions of the past to fix (e.g., more identities that can be conjoined, or in some cases, disjoined).

Long and short, his six discrete identities collapsed into one … thanks to the arrival of two new records.
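For readers who think in code, here is a minimal sketch of that cascade. It is only an illustration under loud assumptions, not the actual engine: the `Entity` class, the `add_record` function, and especially the matching rule (two identities match when they share at least `MIN_SHARED` features) are all hypothetical stand-ins for far richer resolution logic. What it does show is the shape of the idea: records stay separate inside an entity (conjoined, not merged), and every time an entity learns something new, earlier decisions are re-checked.

```python
from dataclasses import dataclass, field

MIN_SHARED = 2  # hypothetical threshold: shared features needed to match


@dataclass(eq=False)  # compare entities by object identity, not by content
class Entity:
    # Records are kept side by side ("rubber-banded"), never fused into one,
    # so a conjoin decision could be undone later.
    records: list = field(default_factory=list)

    def features(self):
        """Union of every feature seen across this entity's records."""
        combined = set()
        for record in self.records:
            combined |= record
        return combined


def matches(a, b):
    """Toy matching rule (an assumption): enough features in common."""
    return len(a & b) >= MIN_SHARED


def add_record(entities, record):
    """Resolve one new record, then re-check earlier resolutions."""
    record = frozenset(record)
    # Step 1: attach the record to the first entity it matches, else start one.
    target = next((e for e in entities if matches(e.features(), record)), None)
    if target is None:
        target = Entity()
        entities.append(target)
    target.records.append(record)  # context accumulates here

    # Step 2: the entity now knows more, so revisit every other entity and
    # conjoin any that match. Repeat until nothing changes, because each
    # conjoin adds context that may expose yet another match.
    changed = True
    while changed:
        changed = False
        for other in entities:
            if other is not target and matches(target.features(), other.features()):
                target.records.extend(other.records)
                entities.remove(other)
                changed = True
                break  # the list changed; restart the scan
    return target
```

In this toy version, three identities that share only one feature with each other stay apart until a single new record arrives carrying a familiar name, phone and SSN plus a new address. That record attaches to one identity, the new address pulls in a second, and the combined context of those two then pulls in a third.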

Knowing this, one starts to think about which data sources are better than others. Some data sources are so good … they work like "glue guns."

From a national security and privacy point of view, it is the above behavior that makes it so important to debate what perceptions (observations) are fair game for context construction, and when.

RELATED POSTS:

More Data is Better, Proceed with Caution

Ubiquitous Sensors? You Have Seen Nothing yet

Accumulate Context: Now or Never

To Know Semantic Reconciliation is to Love Semantic Reconciliation

June 09, 2007

Transparency, Privacy and Responsibility

I was recently interviewed by Jeff Ubois of the Giannino Bassetti Foundation. The transcript of this interview is available here:

Transparency, Privacy and Responsibility: An Interview with Jeff Jonas

This is the most comprehensive collection of my thoughts about privacy to date. And as you might expect, I covered all the usual suspects like responsible innovation, designing in support of the Universal Declaration of Human Rights (UDHR), limits of predictive data mining for counterterrorism, anonymization, immutable audit logs, watch list redress and so on.

But I also covered a number of new ideas that I have not yet had the chance to blog about including:

  • The one-way watch list – A special watch list exists that you can put yourself on but, by design, cannot take yourself off
  • Trapped by your data trail – Ubiquitous access to historical data is making it impossible to escape one’s past (a dwindling freedom)
  • More death in future cheaper – the cost and execution risk to wreak havoc is dropping (e.g., the killer 1918 Spanish Influenza has been brought back from the dead)
  • Pinhole vision caused by lens crafters – as we continue to look to technology to triage data, we are really calling for custom crafted lenses to intentionally narrow our perceptions. That is an important fact to remember.

RELATED POSTS:

Responsible Innovation: Designing for Human Rights

Responsible Innovation: Some Things are Best Left Un-invented

Responsible Innovation: Staying Engaged with the Privacy Community