Last changed 21 Feb 2012. Length about 800 words (8,000 bytes).
(Document started on 15 Feb 2005.)
This is a WWW document maintained by
Steve Draper, installed at http://www.psy.gla.ac.uk/~steve/best/correlation.html.
You may copy it.
How to refer to it.
Correlation and causation
By
Steve Draper,
Department of Psychology,
University of Glasgow.
Correlation is not causation (but it sure is a hint).
The possible causal relationships
Correlation and causation: if A and B are correlated, any one of these different
causal relationships may underlie it:
1. A ⇒ B
2. B ⇒ A
3. C ⇒ A, B
4. A ⇔ B, so doing either one will increase the other. Bi-directional
causality.
5. A ≡ B. Tautology / identity. Non-causal.
Correlation is a big hint about causality, but it is ambiguous, and mistakes
are frequently made.
If A is correlated with B, then all five of these relationships are equally
possible, given only that evidence.
1. A causes B.
2. B causes A.
3. A third factor C causes both A and B, not necessarily at the same time
(the electrical discharge of lightning causes both flash and boom, with light
and sound arriving at different times).
4. A and B each increase (cause) the other, as in any positive feedback loop
(vicious circle). For instance, two adjacent blocks of explosive: if one goes
off, it will set off the other; if person A annoys B, B is likely to
retaliate; if a student's motivation is high they are more likely to learn,
but if they succeed at learning their motivation will rise (so motivation is
often an effect, a symptom, not a prime mover); if A sees B as beautiful A is
more likely to be attracted to B, but if A loves B then A is more likely to
see B as beautiful.
5. A ≡ B. Tautology / identity. A and B have to occur together
because they turn out to be the same by definition. (See section below.)
Caused by the same third factor
Number 3 above may be the most troublesome.
It is particularly misleading when the time delays involved are consistent
with one direction of causality, but not the other; yet a third factor
is actually prior to and causing both.
It is also misleading when such cases are reported without any explicit
statement about causality, leaving almost all readers to draw the false, or at
least unwarranted, conclusion the writer wanted.
Case 1.
School children who are involved with employers (e.g. in work experience)
before they leave school are more likely to end up employed.
(But the factors that make a child more likely to participate in these schemes
may cause both participation and, later, success at job seeking: e.g. liking
work, being stimulated by environments outside the home, not having to stay
home to care for relatives.)
Case 2.
Big budget movies which are promoted at the Superbowl gross about 40% more
than those which are not.
(But having more money for promotion predicts success; and so perhaps does
appealing to the kind of audience that watches the Superbowl.)
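The third-factor pattern in this section can be illustrated with a small simulation. This is only a sketch under invented assumptions: C is a hidden standard-normal factor, A and B are each C plus independent noise, and neither A nor B influences the other. The names `pearson` and `residuals` are helper functions written for this illustration, not part of any library. In theory the correlation between A and B should come out at around 0.5, and should fall to about zero once the straight-line effect of C is removed from both.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting the best straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [y - mean_y - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
c = [rng.gauss(0, 1) for _ in range(n)]   # hidden third factor C (assumed)
a = [ci + rng.gauss(0, 1) for ci in c]    # A depends on C only, never on B
b = [ci + rng.gauss(0, 1) for ci in c]    # B depends on C only, never on A

r_ab = pearson(a, b)
r_ab_without_c = pearson(residuals(a, c), residuals(b, c))

print(f"corr(A, B)                = {r_ab:.2f}")
print(f"corr(A, B) with C removed = {r_ab_without_c:.2f}")
```

In real cases such as the two above, the third factor is usually not measured, so this kind of check is often not available; the sketch only shows that a solid correlation between A and B can arise with no causal link between them at all.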
Other possible problems (The possible non-causal relationships)
A ≡ B. Tautology / identity.
E.g. If my paternal grandfather's only son is called 'Martin' then my
father is called 'Martin'. If the temperature is zero Centigrade then it is
32 degrees Fahrenheit. One doesn't cause the other: it is another way of
referring to or describing the same thing. They will be perfectly correlated,
not because of causation, but due to another kind of determination.
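Conversely, there can be complete determination by definition, yet zero
correlation, because correlation measures only linear relationships.
E.g. in the equation y = x × x (that is, y = x^2), y is completely determined
by x; yet if the values of x are spread symmetrically about zero, the
correlation between x and y is zero.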
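The y = x × x example can be checked numerically. The following is a minimal sketch, assuming x takes evenly spaced values from -5 to 5; the helper `pearson` is written for this illustration. It also computes the correlation for the non-negative half of the same range, where the relationship is still curved but no longer symmetric, to show that the zero depends on the symmetry about zero rather than on the curve alone.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# x from -5.0 to 5.0 in steps of 0.1, symmetric about zero (an assumption)
xs = [i / 10 for i in range(-50, 51)]
ys = [x * x for x in xs]          # y is completely determined by x

r_symmetric = pearson(xs, ys)

# Same rule, but keeping only the non-negative values of x
xs_pos = [x for x in xs if x >= 0]
ys_pos = [x * x for x in xs_pos]
r_positive = pearson(xs_pos, ys_pos)

print(f"x in [-5, 5]: r = {r_symmetric:.3f}")
print(f"x in [0, 5]:  r = {r_positive:.3f}")
```

Over the symmetric range the coefficient is zero apart from floating-point rounding, while over the non-negative range it is close to 1, even though the rule linking x and y is identical in both cases.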
The slogans
Correlation does not entail causation.
(Or as it is more often expressed, correlation does not necessarily imply
causation. Or as it is a little carelessly put even more often "correlation
does not imply causation", even though in fact some of the most important
scientific advances have come precisely because scientists did investigate that
implication.)
As Tufte observes (following David Hume), it's more accurate to say:
"Empirically observed covariation is a necessary but not sufficient condition
for causality"
or
"Correlation is not causation but it sure is a hint"
A variation: traits (and correlations over time)
A similar tendency to faulty inference occurs around time scales and states
vs. traits.
Just because a property of a person (or thing) is "reliable", i.e. strongly
correlated over time in test-retest measures, this doesn't tell you
anything about how easy it might be to change it; but the temptation is to
label it a trait.
A person may be poor for years, but one windfall can change that forever
overnight.
If you insist people express a preference for visual over audio materials
for learning, they will do so in a moderately "reliable" way. But this is not
predictive of how well they learn with each kind of material, even though that
inference is drawn by large numbers of published papers.
For a long period in the UK, few girls studied science subjects and it
was assumed that strong forces were at work. When enough pressure was applied
to teachers to make them change their advice, then the numbers changed in less
than a year in the schools where that pressure had been applied. It turned
out there were no large forces on or within the girls preventing it, despite
it being a big effect, and highly reliable up to that time.
On the other hand:
Many smokers believe they can stop any time, and that the predictability
and stability of their habit is misleading. The evidence is against that.
Human body weight is extremely resistant to dietary change (past weight
is a good predictor of future weight), because there are many feedback
mechanisms that adjust to cancel out changes in food input.
Contrary to what was believed at one time, sexual orientation and identity
can be very resistant to even extreme external social pressure.
Reliability (i.e. correlation over time) is no predictor of how easy or likely
something is to change.
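One reason a reliability figure cannot speak to changeability is arithmetical: a correlation coefficient ignores any shift applied equally to everyone. The sketch below uses invented numbers throughout (scores with mean 50, retest noise, and a hypothetical one-off shift of 25 points) and a helper `pearson` written for this illustration. It shows a high test-retest correlation that is left exactly as it was when every score is then moved by a large amount.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 2000
first_scores = [rng.gauss(50, 10) for _ in range(n)]          # invented measure
second_scores = [s + rng.gauss(0, 3) for s in first_scores]   # retest with noise

reliability = pearson(first_scores, second_scores)

shift = 25.0  # hypothetical effect of a one-off event applied to everyone
after_scores = [s + shift for s in second_scores]

r_after = pearson(first_scores, after_scores)
change = mean(after_scores) - mean(first_scores)

print(f"test-retest correlation: {reliability:.2f}")
print(f"correlation after shift: {r_after:.2f}")
print(f"average change in score: {change:.1f}")
```

The coefficient only records whether people keep their relative positions, so the same high value is compatible with a property that never moves and with one that moves overnight.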