Understanding Confounding Variables in Statistics
In simple terms, an argument is constructed as shown in the diagram. However, all too often, a company presenting a statistic will omit (either deliberately or unknowingly) at least one premise that ought to be included in its argument. As a result of leaving out this premise, the inference (i.e., how the company got to its conclusion) is likely to be meaningless, and its conclusion is very likely to be false.
Tomatoes Used to Be Poisonous (Confounding Variable Example)
Let's look at this example about tomatoes. Clearly this is nonsense. Common sense (and a basic grasp of maths and human mortality) tells you that old age killed off the pre-1910 tomato eaters, not the tomatoes. The premise "Most humans don't live beyond 100" has been omitted from this argument. That missing premise is known as a "confounding variable". A confounding variable is one that distorts the relationship between two other variables. Put simply, when people present statistics, they don't always consider everything they should.
Marijuana Is the Leading Cause of Traffic Accidents (Confounding Variable Example)
Sometimes, however, the confounding variables are harder to spot:
"During the summer holidays, one in three drivers involved in traffic accidents tested positive for marijuana. Therefore, marijuana is causing people to drive recklessly."
There are several significant confounding variables in this example, but here is a key one:
People can test positive for marijuana for up to 30 days after taking it. So, if I took marijuana three weeks ago and had an accident, I would be one of those drivers who tested positive, but the marijuana would almost certainly have had nothing to do with the accident. However, if an upstanding member of society believes marijuana is a pernicious influence on today's youth, he can use this statistic to blame marijuana for a high percentage of car crashes and to support his argument for stiffer sentencing. If this were a real example, we would have cause and effect linked not by scientifically supported evidence but by an illusion of evidence using statistics.
The point here is that simple statistics in support of a bold conclusion are easily challenged. You can almost go on forever challenging statistics like these. I bet almost 90% of those involved in the traffic accidents would have proved positive for coffee. What are we supposed to conclude from that? Coffee causes more accidents than marijuana? What about the reverse logic? Two-thirds of those involved in the accidents did not prove positive for marijuana. Therefore, logically, these statistics could show it is twice as safe to drive under the influence of marijuana as not. That's nonsense of course. To understand the significance of the statistics presented, we would need to know a few baseline figures, like the percentage of the driving population that would test positive for marijuana routinely (i.e., before having a crash).
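To see why the baseline matters, here is a minimal Python sketch. The one-in-three crash figure is taken from the example above; the baseline shares and the function name relative_crash_risk are made up purely for illustration, so treat the output as a demonstration of the logic, not as real findings.

```python
def relative_crash_risk(share_in_crashes: float, share_in_drivers: float) -> float:
    """How many times more likely a positive-testing driver is to crash
    than a negative-testing driver.

    share_in_crashes: fraction of crash-involved drivers who test positive.
    share_in_drivers: fraction of ALL drivers who would test positive
                      (the baseline the original statistic leaves out).
    """
    if not (0 < share_in_crashes < 1 and 0 < share_in_drivers < 1):
        raise ValueError("shares must be strictly between 0 and 1")
    # By Bayes' rule, the risk ratio equals the odds of testing positive
    # among crash-involved drivers divided by the odds among all drivers.
    odds_in_crashes = share_in_crashes / (1 - share_in_crashes)
    odds_in_drivers = share_in_drivers / (1 - share_in_drivers)
    return odds_in_crashes / odds_in_drivers


share_in_crashes = 1 / 3  # "one in three drivers" from the example

# Hypothetical baselines: nobody in the example tells us the real one.
for baseline in (0.10, 1 / 3, 0.50):
    ratio = relative_crash_risk(share_in_crashes, baseline)
    print(f"baseline {baseline:.0%}: relative crash risk = {ratio:.2f}")
```

With these assumed baselines, the same one-in-three statistic points to a raised risk (if only 10% of drivers would routinely test positive), no effect at all (if a third would), or even a lowered risk (if half would). The headline figure on its own cannot tell you which.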
This is a simple example, and, already, we've found fault with the inference and the conclusion by identifying other confounding variables or by highlighting why the inference is biased (e.g., marijuana is a cause when 33% proved positive, but coffee isn't a cause when 90% proved positive).
Statistics can be attacked easily, and one of the best ways to do it is to identify the confounding variables that the originator left out.
"More Crimes Are Committed During a Full Moon" (Confounding Variable Example)
There's a theory out there that more crimes are committed during a full moon than during other phases of the moon. It's certainly a statistic that is believed by lots of policemen on the beat, and it's been backed up on more than one occasion by crime-database analysis. Surely, there can only be one explanation: it's the inner werewolf in us all. A full moon obviously makes us all go a little bit crazy. It is, after all, where the word lunatic comes from. Well, can a full moon make us all go a bit mad?
Soldiers will tell you that going on night-time patrol during a full moon is a bad thing. Well, it's a good thing for seeing where you are going, but it's a bad thing for remaining undetected. Statisticians who have studied this lunar effect (or the Transylvanian Effect as it's also known) are divided as to whether the rise in crime rate during a full moon is a statistical anomaly (probably caused by a small dataset) or because more criminals are seen plying their trade in the increased moonlight.
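The better-light explanation can be sketched in a few lines of Python. Every number here is a hypothetical assumption chosen for illustration (there is no real crime data behind it), as is the helper name recorded_crimes. The sketch assumes criminals commit exactly the same number of crimes every night and that moonlight changes only the share that gets witnessed.

```python
def recorded_crimes(committed: int, detection_rate: float) -> float:
    """Expected number of committed crimes that end up in the crime database."""
    return committed * detection_rate


committed_per_night = 100    # hypothetical: identical on every night
detection_dark = 0.20        # hypothetical: share of crimes witnessed on a dark night
detection_full_moon = 0.25   # hypothetical: share witnessed in bright moonlight

dark = recorded_crimes(committed_per_night, detection_dark)
full = recorded_crimes(committed_per_night, detection_full_moon)

print(f"recorded on a dark night: {dark:.0f}")
print(f"recorded on a full moon:  {full:.0f}")
print(f"apparent rise in crime:   {full / dark - 1:.0%}")
```

Under these assumptions, the database would show a 25% rise on full-moon nights even though not a single extra crime was committed. The moonlight changes what is recorded, not what happens.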
Now, if someone stood up publicly and presented the "inner werewolf" idea and backed it up with some very comprehensive statistics collected across every police force on the planet for the last hundred years, his presentation would be debunked as soon as you raised the better-light-conditions idea. That's the power of finding confounding variables in others' statistics. It's also one of the dangers of spinning statistics to support your arguments.