For some strange reason we have all come to believe that there is data and then there are queries. Over the last few years, I have come to conclude that this is not only odd, but also a mindset that is preventing information systems from being substantially more useful.
When did we start thinking that queries are not data? When a user conducts a search, there is an underlying assumption that the data being looked for is both a) known and b) already posted. That is a pretty significant assumption, and in some settings the odds that it holds may be no better than 50/50. Could that mean half the analytic answers generated by systems are incorrect?
One of the more significant blog postings I have made (at least in my mind) is about the significance of Sequence Neutrality in Information Systems. In the context of search this means that while we traditionally expect queries to find data, in sequence neutral systems the data must have an equal ability to find the earlier query. And the best way to deliver this at scale is to treat queries as the data itself. And when I say “treat” I mean manipulate, process and store queries in the same way as the data.
Then … when queries are treated like data, one discovers that queries also find queries. And this is cool because this allows a system to recognize that two users have asked the same or related questions, despite the fact there was not any underlying “data.”
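To make this concrete, here is a minimal sketch of a store that persists and indexes queries exactly like data, so that whatever arrives later can find whatever arrived earlier. Everything in it is an assumption for illustration: the names `Record` and `SequenceNeutralStore` are hypothetical, and "matching" is reduced to one record's terms being a subset of the other's, which is far cruder than anything a real system would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str   # "data" or "query"; both kinds are stored the same way
    owner: str  # who posted the data or asked the question
    text: str


class SequenceNeutralStore:
    """Toy store in which queries are persisted and indexed like data."""

    def __init__(self):
        self.records = []
        self.index = defaultdict(set)  # term -> positions of records containing it

    def add(self, record):
        """Store a record and return the earlier records it matches.

        Matching rule (an assumption): the terms of one record are a
        subset of the terms of the other.
        """
        terms = set(record.text.lower().split())

        # Use the inverted index to find earlier records sharing any term.
        candidates = set()
        for term in terms:
            candidates |= self.index[term]

        matches = []
        for pos in sorted(candidates):
            other = self.records[pos]
            other_terms = set(other.text.lower().split())
            if terms <= other_terms or other_terms <= terms:
                matches.append(other)

        # Persist and index the new record, whether it is data or a query.
        pos = len(self.records)
        self.records.append(record)
        for term in terms:
            self.index[term].add(pos)
        return matches


store = SequenceNeutralStore()
first = Record("query", "alice", "john smith")
second = Record("query", "bob", "John Smith")
late_data = Record("data", "feed", "John Smith arrested in Reno")

print(store.add(first))      # nothing has been stored yet
print(store.add(second))     # a query finds an earlier query
print(store.add(late_data))  # late-arriving data finds both earlier queries
```

The point of the sketch is only that `add` makes no distinction between the two kinds of record: the order in which data and queries arrive stops mattering because both go through the same store and the same index.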
There are already systems where data and queries are working together to give users significantly better intelligence. Two examples that come to mind are Google and Amazon. Google notices what people have searched for (and selected) in the past to better order their search results for you. And Amazon makes tailored suggestions for you using the search (and purchase) interests of others.
And while Google and Amazon lack the property of Sequence Neutrality … no need to worry, as there are probably not too many users of these services where lives or millions of dollars are at stake. However, in mission critical systems where analytics make or break the enterprise, one would want to know if yesterday’s answer is now believed to be entirely wrong. And you would want to know right now!
I think that the next generation of business intelligence systems (e.g., Perpetual Analytics) is going to build on this notion that the queries are the data. Thus the answer to the question “what came first, the data or the query” will be moot.
What does it mean for queries to find queries? I can think of 3 different notions:
1. The queries themselves are identical, for instance by string match.
2. The queries are logically equivalent. (Is this even easy to find for, say, SQL queries?)
3. The queries are different themselves but return the same data.
As for relatedness, consider a Google search query: "andrew johnson president". Is this related to "Andrew Johnson OR president"? The two have different results, but match on some of the terms.
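A rough sketch of how some of these notions could be told apart in code, purely as an illustration. The function names are hypothetical, queries are treated as whitespace-separated terms, the boolean operator `OR` is simply dropped, and logical equivalence (notion 2) is not attempted at all.

```python
def terms(query):
    """Lowercased search terms, ignoring the boolean operator OR (an assumption)."""
    return {t.lower() for t in query.split() if t != "OR"}


def identical(q1, q2):
    """Notion 1: the query strings match exactly."""
    return q1 == q2


def same_results(results1, results2):
    """Notion 3: different queries that returned the same set of results."""
    return set(results1) == set(results2)


def term_overlap(q1, q2):
    """Relatedness as Jaccard overlap of terms: 0.0 is disjoint, 1.0 is the same terms."""
    a, b = terms(q1), terms(q2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


q1 = "andrew johnson president"
q2 = "Andrew Johnson OR president"
print(identical(q1, q2))
print(term_overlap(q1, q2))
```

Under these assumptions the two example queries are not identical as strings, yet they share every term, so a term-based measure would flag them as related even though their result sets differ.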
Posted by: Bob | May 05, 2006 at 12:25 PM
A symmetrical discussion could be made for all data being queries, including data about data. Even a single fact, "what is your birthday", captured in a single column record in a database, gets disassociated from the original query, including the means and context under which that query was presented by the inquiring person or system and the person object under review.
Might the corpus of queries have its own ontology? When thought of this way the discussion may be one of knowledge representation and its inference instead of data and queries.
Posted by: Ray Garcia | January 20, 2008 at 10:01 AM
Hi Jeff,
Interesting thoughts. I would tend to speculate that queries are, in fact, unstructured data. It follows that, being unstructured data, they can be mined. But the information in the queries also needs to be stemmed, stop-worded, run through natural language ontologies, and so on.
Once the data about the query is persisted, along with data about what structures were hit, new correlations can be assessed. I'd even go so far as to speculate that the correlative analysis based on ontological study could in fact either "correct" the next query, OR better yet, adapt the model underneath to better meet the needs of the question being asked.
To your point - in other words, help the queries find queries, help the data find the queries. As usual, form and function MUST be glued together.
My two cents anyhow. I'm now blogging on "modeling architecture" and its impacts at my new site: http://www.BetterDataModel.com
Respectfully,
Daniel Linstedt
Posted by: Dan Linstedt | May 04, 2008 at 07:57 PM