Manipulating Statistics with a Small Sample Pool


"8 of 10 cats preferred Whiskas"

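That statement looks pretty conclusive, but what if the sample was only 10 cats and the result was just a statistical anomaly? Small samples can easily produce misleading results. It's the flip side of the Law of Large Numbers: an observed proportion settles close to the true figure only when the sample is large.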
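
To get a feel for how easily a sample of 10 can throw up an "8 of 10" result, here is a rough sketch. It assumes (and these are our assumptions, not anything the advert tells us) that each cat chose between just two foods, had no real preference, and chose independently of the others. The function name chance_of_at_least is made up for this illustration.

```python
from math import comb

def chance_of_at_least(k, n):
    """Probability of k or more 'successes' in n independent 50/50 choices.

    Assumption for illustration: every cat is indifferent, so each of the
    2**n possible outcomes is equally likely.
    """
    # Count the outcomes in which k, k+1, ..., n cats pick the food.
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return favourable / 2 ** n

print(chance_of_at_least(8, 10))  # 56 / 1024 = 0.0546875
```

Under those assumptions, about 1 panel in 18 would show 8 or more of 10 indifferent cats "preferring" the food purely by luck. Run enough small panels and one of them will hand you the headline.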

That aside, there are other questions. What did those cats prefer the cat food to? How many other options were offered? What if they were hungry or greedy cats who would eat the first thing plonked down in front of them? How did they even measure preference? However it was done, it looks like the figure was manipulated.

Beware Small Sample Pools When Reviewing Claims

Using a small sample pool is a common way to achieve the results you need. Let's imagine I was trying to sell you a potion that guaranteed heads every time you tossed a coin:
"Guaranteed head every time you toss.
See, a head.
Right, who wants some of this potion?"
Clearly, you wouldn't buy the potion until you'd seen the vendor flick at least two dozen heads back to back. (Even then, you'd start questioning whether it was sleight of hand.) In this scenario, you wouldn't trust such a small sample because you fully understand the situation. However, we're far less likely to question a statistic when we are on unfamiliar ground. Lots of companies know this and are prepared to present statistics from small samples.

A number of beauty companies selling products to make you look younger have been caught using this ploy, as have companies in the homeopathy circus. In fact, beauty-product marketing is riddled with claims that have been drawn from tiny sample sizes. The evidence supporting my "heads-every-time potion" is really no different from that supporting many of the big-name beauty products being peddled. The difference is that buyers are usually not aware of the sample size, they trust the scientists' claims, they want the claims to be true, and they don't have a good grasp of what the components in the product are.
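
Going back to the coin-toss potion, a few lines of arithmetic show why one head proves nothing while two dozen in a row would be hard to wave away. This is only a sketch: it assumes a fair coin and independent tosses, and chance_of_streak is a name invented for the example.

```python
def chance_of_streak(heads_in_a_row):
    """Probability that a fair coin lands heads this many times running.

    Assumption for illustration: each toss is independent and 50/50.
    """
    return 0.5 ** heads_in_a_row

for tosses in (1, 2, 5, 10, 24):
    odds = round(1 / chance_of_streak(tosses))
    print(f"{tosses:>2} heads in a row: 1 in {odds:,}")
```

A single head turns up half the time with no potion at all, whereas 24 in a row would happen by luck only about once in 16.8 million attempts.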

Look for the P-Value in Statistics

If you read a scientific paper describing intellectually rigorous and repeatable (a key point) tests, you might come across something called a P-value. A P-value gives the probability of seeing a result at least as striking as the one observed if chance alone were at work (at least that is a good enough definition for our purposes). Generally, the lower the P-value, the stronger the evidence. Quite simply, a P-value of 1% means that there is a 1 in 100 chance of pure "luck" producing a result like that. Because of the way the Law of Large Numbers works, when an effect is real, a larger sample will tend to give a lower P-value.
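
As a rough illustration of how sample size feeds into a P-value, the sketch below computes a one-sided P-value for the same 80% "preference" seen in samples of 10, 100 and 1,000. It assumes a simple binomial model in which "chance alone" means a 50/50 choice; the function name one_sided_p_value and the sample sizes are ours, chosen for the example, and real studies may use quite different tests.

```python
from math import comb

def one_sided_p_value(successes, n, p_chance=0.5):
    """Probability of at least `successes` out of n if only chance is at work.

    Assumption for illustration: each trial is independent and succeeds
    with probability p_chance when there is no real effect.
    """
    return sum(
        comb(n, i) * p_chance ** i * (1 - p_chance) ** (n - i)
        for i in range(successes, n + 1)
    )

# The same 80% result, observed in ever larger samples.
for n in (10, 100, 1000):
    successes = n * 8 // 10
    print(f"{successes} of {n}: P-value = {one_sided_p_value(successes, n):.3g}")
```

The observed proportion is identical each time, yet the P-value collapses as the sample grows. Note that this only happens because 80% stays put: if the cats really had no preference, a bigger sample would tend to drift back towards 50% and the P-value would not shrink.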