Confounding Variables

What Are Confounding Variables?

In statistics, a confounding variable is one that distorts the relationship between two other variables. Put simply, when people present the findings of their data analysis, they don't always consider everything they should.
Here is another way to think of confounding variables. In critical thinking, an argument is usually built from premises, an inference, and a conclusion.

In the analysis of statistics, the premises are the results of analysing datasets and the environmental factors. The inference describes the relationship between the premises. In this structure, a confounding variable often takes the form of an absent premise.

All too often, a company presenting a conclusion will omit (either deliberately or unknowingly) at least one premise that ought to be included in its argument. As a result of leaving out this premise, the inference (i.e., how the company got to its conclusion) is likely to be meaningless, and its conclusion is highly likely to be false.

Tomatoes Used to Be Poisonous (Confounding Variable Example)

Let's look at this example about tomatoes: the claim that tomatoes used to be poisonous. Clearly this is nonsense. Common sense (and a basic grasp of maths and human mortality) tells you that old age killed off the pre-1910 tomato eaters, not the tomatoes. The premise "Most humans don't live beyond 100" has been omitted from this argument. That missing premise is known as a "confounding variable". A confounding variable is one that distorts the relationship between two other variables. Put simply, when people present statistics, they don't always consider everything they should.

Marijuana Is the Leading Cause of Traffic Accidents (Confounding Variable Example)

Usually, the confounding variables are far harder to spot:

"During the summer holidays, one in three drivers involved in traffic accidents tested positive for marijuana. Therefore, marijuana is causing people to drive recklessly."

There are several significant confounding variables in this example, but here is a key one:

People can test positive for marijuana for up to 30 days after taking it. So, if I took marijuana three weeks ago and had an accident, I would be one of those drivers who tested positive, but the marijuana would almost certainly have had nothing to do with the accident. However, if an upstanding member of society believes marijuana is a pernicious influence on today's youth, he can use this statistic to blame marijuana for a high percentage of car crashes and to support his argument for stiffer sentencing. If this were a real example, we would have cause and effect linked not by scientifically supported evidence but by an illusion of evidence using statistics.

The point here is that simple statistics in support of a bold conclusion are easily challenged. You can go on almost forever challenging statistics like these. I bet almost 90% of those involved in the traffic accidents would have tested positive for coffee. What are we supposed to conclude from that? Coffee causes more accidents than marijuana? What about the reverse logic? Two thirds of those involved in the accidents did not test positive for marijuana. Therefore, by the same logic, these statistics could show it is twice as safe to drive under the influence of marijuana as not. That's nonsense of course. To understand the significance of the statistics presented, we would need to know a few baseline figures, like the percentage of the driving population that would routinely test positive for marijuana (i.e., before having a crash).
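The role of the baseline figure can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name `relative_risk` and the three baseline percentages are assumptions invented for the example, not figures from any study. It uses Bayes' rule to turn "share of crash drivers who tested positive" plus "share of all drivers who would test positive" into the crash risk of positive-testing drivers relative to negative-testing ones.

```python
def relative_risk(share_of_crashes_positive, share_of_drivers_positive):
    """Crash risk of positive-testing drivers relative to negative-testing drivers.

    By Bayes' rule, P(crash | positive) is proportional to
    P(positive | crash) / P(positive), and likewise for negative drivers.
    The common factor P(crash) cancels when the two are divided.
    """
    for share in (share_of_crashes_positive, share_of_drivers_positive):
        if not 0 < share < 1:
            raise ValueError("shares must be strictly between 0 and 1")

    positive_ratio = share_of_crashes_positive / share_of_drivers_positive
    negative_ratio = (1 - share_of_crashes_positive) / (1 - share_of_drivers_positive)
    return positive_ratio / negative_ratio


# One in three crash drivers tested positive (the figure from the example).
crash_share = 1 / 3

# Hypothetical baselines: the share of ALL drivers who would test positive.
for baseline in (0.10, 1 / 3, 0.50):
    risk = relative_risk(crash_share, baseline)
    print(f"baseline {baseline:.0%}: relative risk {risk:.2f}")
```

With these made-up baselines, the same one-in-three statistic gives a relative risk of 4.5, 1.0, or 0.5. In other words, the headline figure on its own cannot tell you whether positive-testing drivers crash more often, equally often, or less often than everyone else.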

This is a simple example, and, already, we've found fault with the inference and the conclusion by identifying other confounding variables or by highlighting why the inference is biased (e.g., marijuana is a cause when 33% tested positive, but coffee isn't a cause when 90% tested positive).

Statistics can be attacked easily, and one of the best ways to do it is to identify the confounding variables that the originator left out.

"More Crimes Are Committed During a Full Moon" (Confounding Variable Example)

There's a theory out there that more crimes are committed during a full moon than during other phases of the moon. It's certainly a statistic that is believed by lots of policemen on the beat, and it's been backed up on more than one occasion by crime-database analysis. Surely, there can only be one explanation: it's the inner werewolf in us all. A full moon obviously makes us all go a little bit crazy. It is, after all, where the word lunatic comes from. Well, can a full moon make us all go a bit mad?

Soldiers will tell you that going on night-time patrol during a full moon is a bad thing. Well, it's a good thing for seeing where you are going, but it's a bad thing for remaining undetected. Statisticians who have studied this lunar effect (or the Transylvanian Effect as it's also known) are divided as to whether the rise in crime rate during a full moon is a statistical anomaly (probably caused by a small dataset) or because more criminals are seen plying their trade in the increased moonlight.
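The better-light explanation can be sketched in a few lines. Everything numeric here is a hypothetical assumption chosen for illustration (the nightly crime count and both detection rates), and `recorded_crimes` is a made-up helper. The point is only that recorded crime can rise under a full moon even when the number of crimes actually committed stays exactly the same.

```python
def recorded_crimes(crimes_committed, detection_rate):
    """Crimes that end up in the database: committed crimes that were also seen."""
    return crimes_committed * detection_rate


# Hypothetical assumptions: criminals are equally busy on every night,
# but more of them are spotted when the moonlight is brighter.
CRIMES_PER_NIGHT = 100
DETECTION_DARK = 0.30
DETECTION_FULL_MOON = 0.40

dark_night = recorded_crimes(CRIMES_PER_NIGHT, DETECTION_DARK)
full_moon = recorded_crimes(CRIMES_PER_NIGHT, DETECTION_FULL_MOON)
apparent_rise = (full_moon - dark_night) / dark_night

print(f"recorded on a dark night: {dark_night:.0f}")
print(f"recorded on a full moon:  {full_moon:.0f}")
print(f"apparent rise in crime:   {apparent_rise:.0%}")
```

Under these assumptions the database shows about a third more crime on full-moon nights, yet the moon changed nothing about behaviour. Visibility, the confounding variable, accounts for the whole difference.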

Now, if someone stood up publicly and presented the "inner werewolf" idea and backed it up with some very comprehensive statistics collected across every police force on the planet for the last hundred years, his presentation would be debunked as soon as you raised the better-light-conditions idea. That's the power of finding confounding variables in others' statistics. It's also one of the dangers of spinning statistics to support your arguments.
