“It is a capital mistake to theorize before one has data”

~ Sir Arthur Conan Doyle

Over the years, folks have often asked me what kind of math I am using to create large-scale, real-time, context-accumulating systems (*e.g.,* NORA). Some, fond of Bayesian methods, speculate I am using Bayesian techniques. Some ask if I am using neural networks or heuristics. A math professor said I was doing advanced work in the field of Set Theory.

My answer is always, “I don’t know any math. I didn’t finish high school. But I can explain how it works, step-by-step, and it is really quite simple.”

That reminds me of a related funny story. After IBM acquired my company, SRD, in 2005, I began touring IBM’s impressive research facilities around the world. During a visit to one of IBM's research labs I explained my techniques to a room full of researchers. A few months later, to my surprise, they sent me a technical paper expressing my work … using math. Fascinating, I thought. The idea that my algorithms were now expressed in math terms was really exciting. Could it be? I was so curious.

So I asked them to humor me and take me through the paper very slowly on a conference call. It was actually a bit embarrassing. I started out by asking, “What does an equal sign mean when a colon is in front of it?” Symbol by symbol I asked for an explanation. Then I asked about this thing shaped like the letter “U” … what does that mean? (Union, as it turns out.)

Anyway, I was able to follow the math and it all made sense until about halfway through the paper, when I spotted an obvious error. So I said, “Um, the math here is inconsistent with my technique.” I suggested a fix. The phone went quiet for a minute, and then about 45 days later they came back with a new and improved paper. Continuing where we left off, I found a similar discrepancy further down the page and then provided some more specifics about my technique. Unfortunately, I never received another draft. Clearly, they could have produced one. But honestly, I suspect they simply lost interest in having to teach me math.

I wish we had finished that paper, as then folks trained in formal methods would better understand what I am doing and seeing.

One of the things demonstrated by this mathy paper might have been the notion that “data beats math” – at least when it comes to *Assertion Algorithms*. Based on the available observation space, can an assertion be made? Yes or no. In short, there comes a point where sufficient evidence exists such that an assertion can be made as a “no-brainer” without feeling compelled to split hairs with probability math.

Here is a practical example. Imagine being presented with two identity records:

Record #1

Name: Mark Smith

Date of Birth: 05/12/1987

SSN: 555-00-1122

Record #2

Name: Mark Smith

Date of Birth: May 1987

D/L: 0099912334

Are they the same person? It is certainly possible. Using population statistics and some math, someone could compute a reasonably accurate probability. I say to heck with using math to guess. I’d rather ask: where can I find some glue around here? For example, a record like this:

Record #3

Name: Mark K Smith

Date of Birth: May 12, 1987

D/L: 0099912334

SSN: 555-00-1122

So the point is: I’d rather look for corroborating and/or dissenting evidence than look to math for estimated probabilities. And if a really important outcome might come from such an assertion, I would continue to seek observations until it was so obvious you could show the board of directors and they would say “duh.” If you run out of available observations and you are still not sure … then you have a few choices: 1) locate and collect the kinds of observations you need, 2) wait until you luck into a future observation related to the assertion in question (letting the existing ambiguity fester), or 3) pound on it with math. But I say only pound on it with math if it is going to be worth the additional effort/compute (*e.g.,* you are playing high-stakes poker in Vegas).
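To make the glue idea concrete, here is a minimal sketch in Python of how Record #3 ties Records #1 and #2 together. This is an illustration only, not my actual algorithm: the `Record` class and the `shared_identifiers` and `resolve` functions are hypothetical names invented for this example, and the rule that "one shared SSN or D/L is enough to link two records" is a simplifying assumption. A real system would also weigh dissenting evidence, such as a conflicting date of birth.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical identity record; only the strong identifiers are compared."""
    record_id: int
    name: str
    ssn: Optional[str] = None
    dl: Optional[str] = None


def shared_identifiers(a: Record, b: Record) -> List[str]:
    """Return the strong identifiers (SSN, D/L) on which two records agree."""
    shared = []
    if a.ssn and a.ssn == b.ssn:
        shared.append("ssn")
    if a.dl and a.dl == b.dl:
        shared.append("dl")
    return shared


def resolve(records: List[Record]) -> List[List[int]]:
    """Group records that are linked, directly or through other records,
    by at least one shared identifier (assumed rule, for illustration)."""
    parent = {r.record_id: r.record_id for r in records}

    def find(x: int) -> int:
        # Follow parent pointers to the group's representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if shared_identifiers(a, b):
                parent[find(a.record_id)] = find(b.record_id)

    groups = {}
    for r in records:
        groups.setdefault(find(r.record_id), []).append(r.record_id)
    return sorted(sorted(g) for g in groups.values())


r1 = Record(1, "Mark Smith", ssn="555-00-1122")
r2 = Record(2, "Mark Smith", dl="0099912334")
r3 = Record(3, "Mark K Smith", ssn="555-00-1122", dl="0099912334")

print(resolve([r1, r2]))      # no shared identifier: [[1], [2]]
print(resolve([r1, r2, r3]))  # Record #3 is the glue: [[1, 2, 3]]
```

Notice there is no probability anywhere in this sketch. With only Records #1 and #2 the honest answer is "not enough evidence yet"; once Record #3 shows up, the assertion becomes a no-brainer.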

My gripe, if any, is that way too many people have been chipping away at hard problems for decades without making material gains (*e.g.,* entity extraction and classification) … when what they actually need is more data. Not more of the same data, by the way. No, they more likely need orthogonal data – data from a different sensor sharing some of the same domain, entities and features (*e.g.,* name and driver’s license number).

When the quality of mathematical predictions starts to flatten out, I recommend increasing your observation space. Hence the above reference to this awesome quote:

“It is a capital mistake to theorize before one has data”

~ Sir Arthur Conan Doyle

RELATED POSTS:

Accumulating Context: Now or Never

Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel

How to Use a Glue Gun to Catch a Liar

It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You

Smart Sensemaking Systems, First and Foremost, Must be Expert Counting Systems