The notion that more data means a slower system ain’t always true.
My favorite way to explain this very important phenomenon involves the familiar process of assembling a jigsaw puzzle.
The first piece you take out of the box and place on the work surface requires very little computational effort. The second and third pieces require almost equally insignificant mental effort. Then, as the number of pieces on the table grows, the effort to determine where the next piece goes increases as well. But there is a tipping point after which placing the next piece gets easier and easier … despite the fact that the number of puzzle pieces on the table continues to grow.
Well, isn’t it interesting, although obvious, that those last few puzzle pieces take nearly as little effort as the first few!
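A toy simulation makes the shape of this curve easier to picture. What follows is a minimal sketch under loud assumptions, not a measurement of any real system: the puzzle is a square grid, pieces land in their correct slots in random order, and the number of "open edges" (a placed piece sitting next to an empty slot) stands in as a rough proxy for how many places you have to consider for the next piece. The function name `open_edge_counts` and its parameters `side` and `seed` are made up for this illustration.

```python
import random

def open_edge_counts(side=20, seed=0):
    """Place the pieces of a side x side puzzle in random order and record,
    after each placement, how many edges join a placed piece to an empty slot.
    Assumption: open edges are only a proxy for placement effort."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(side) for c in range(side)]
    rng.shuffle(cells)
    placed = set()
    open_edges = 0
    counts = []
    for r, c in cells:
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < side and 0 <= nc < side:
                if (nr, nc) in placed:
                    open_edges -= 1  # neighbor already placed: this edge is now joined
                else:
                    open_edges += 1  # neighbor still missing: a new open edge
        placed.add((r, c))
        counts.append(open_edges)
    return counts

counts = open_edge_counts()
peak = counts.index(max(counts))
print(f"after first piece: {counts[0]}, "
      f"peak after piece {peak + 1} of {len(counts)}, "
      f"after last piece: {counts[-1]}")
```

In this model every shared edge is opened once and closed once, so the count has to return to zero when the last piece goes in. The rise and fall in between is the part that resembles the puzzle experience. Real puzzles (and real systems) are messier, so treat this as a picture of the idea rather than evidence for it.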
I have witnessed this.
This has a slew of ramifications.
This does not apply to
all domains. This behavior requires: (a) observations from the same
universe; (b) observations with enough features to enable contextualization;
(c) observations in which these features can be extracted, enhanced and
classified; (d) sufficient saturation of the observational space; and (e)
enough smarts to stitch these puzzle pieces together.
Context-accumulating systems, fed appropriate observations, can be expected to exhibit this behavior.
RELATED POSTS:
More Data is Better, Proceed With Caution
Context: A Must-Have and Thoughts on Getting Some …
To Know Semantic Reconciliation is to Love Semantic Reconciliation
Big Breakthrough in Performance: Tuning Tips for Incremental Learning Systems
I like using the notion of perimeter. It starts at zero and grows to a tipping point, after which each new piece shrinks the perimeter rather than growing it, and then it falls off rapidly. Software tends to be similar IMHO, not just in searching information but also in the programmer effort of using APIs. Designing software so that it has maximum utility with minimum perimeter is an ideal I aspire to.
Posted by: Jason Watkins | September 29, 2008 at 08:45 PM
This happens because the system takes the longest to process data when its memory is full, which is in the middle, when it contains the most data. I really like your comparison to a jigsaw puzzle, because that makes tons of sense.
Posted by: Jigsaw Free | January 09, 2009 at 05:25 PM
The puzzle example is good.
It is simple Probability Theory.
As you add pieces to the puzzle, the probability of finding the solution decreases, because the number of possible solutions grows at an exponential rate. This continues until a finite point where the number of possible solutions starts decreasing, because there are only so many pieces of the puzzle left. Then the number of possible solutions shrinks at an exponential rate.
The series:
1/x
1/x*1/x
1/x*1/x*1/x
1/x*1/x*1/x*1/x
1/x*1/x*1/x*1/x*1/x ... to infinity.
Then decreasing:
1/x*1/x*1/x*1/x*1/x
1/x*1/x*1/x*1/x
1/x*1/x*1/x
1/x*1/x
1/x
Just speaking, Scientist to Scientist!!!
:-)
And this is why we need to build up Information Assurance teams (Strategic Information Assurance Cybersecurity people; see my presentation!!!) to handle all of the knowledge management teams. Information Assurance comprises Governance, Risk Management, Auditing, Compliance and Counterintelligence (GRACC). These knowledge management people are not process management people of the kind you normally find in Business, Government or the Military, who generally think in a linear and logical manner.
Knowledge management people, such as programmers, mathematicians, scientists, auditors, statisticians, etc., are not linear thinkers. They jump from place to place rather than proceed from point to point in a logical manner. These two groups are managed differently. So it is important to put them under a Strategic Information Assurance (Cybersecurity) person who works in a collaborative management style rather than a command-and-control one.
g.
Thanks.
Respectfully,
Gary S. Elliott, M.S., PMP, NSA 4011
Information Systems Security Officer (ISSO- Former)
Project Manager Professional Certification (PMP)
National Security Agency (NSA/CNSS 4011)
Certification in Information Assurance
Washington, DC 20001
United States of America
Secured Digital: 202 657-5502
Private: [email protected]
----------------------------
Posted by: GlobusProject | August 27, 2009 at 09:58 PM