Contrast between Correlation and Causation

Understanding the difference between correlation and causation isn’t just for data nerds or those who love semantics. It’s a key to thinking more clearly in general.

By the end of this essay, you’ll understand how these two ideas are fundamentally different, and why so many people mistakenly think correlation entails causation.

After all, we’ve seen credentialed “experts” misinterpret correlation as causation, even on topics that sit squarely within their own field of expertise.

So what’s going on? How does this happen to sincere, well-educated people who have spent their lives studying a topic?

Well, the answer is rather simple: we can be suckers for meaning and stories, as we will see throughout this article, which is grounded in the ideas of Nobel Prize winner Daniel Kahneman and best-selling author and risk analyst Nassim Nicholas Taleb.

Let’s get started. Or if you’re feeling impatient, you can jump down to the “Key Takeaways” section.

What Is Correlation?

Correlation indicates how much two or more variables change together. The more consistently they move together, the more strongly correlated they are. If the variables tend to increase together, we have a positive correlation, whereas if one variable tends to decrease as the other increases, we have a negative correlation. So, correlation is essentially a measure of association.
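To make the idea of "moving together" concrete, here is a minimal sketch that computes the Pearson correlation coefficient, one common measure of linear association. The data (hours studied, exam scores, hours of TV) is made up purely for illustration and does not come from any study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical numbers, invented for this example.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 61, 70, 74]   # tends to rise with hours studied
hours_of_tv = [9, 7, 6, 4, 2]       # tends to fall as hours studied rise

r_positive = pearson(hours_studied, exam_score)
r_negative = pearson(hours_studied, hours_of_tv)

print(f"positive correlation: {r_positive:+.2f}")  # close to +1
print(f"negative correlation: {r_negative:+.2f}")  # close to -1
```

The coefficient ranges from -1 to +1. Its sign tells you the direction of the association and its magnitude tells you how tightly the points follow a straight line. It says nothing about why the variables move together.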

Illustration of a positive correlation, and a negative correlation.

What Is Causation?

Causation means that one event (or variable) produces an effect on another event (or variable). So, beyond mere association, it implies a cause-and-effect relationship.

What Is the Difference Between Correlation and Causation?

Correlation is often confused with causation, but they are fundamentally different because mere association (correlation) does not imply a cause-and-effect relationship (causation).

To make this difference clear, let me illustrate it in a simple context. There is a clear correlation between ice cream sales and shark attacks, but no one would seriously think that ice cream sales cause shark attacks (or vice versa). Now, let’s bring in a third variable: weather. Weather is also correlated with both ice cream sales and shark attacks, but with a little common sense we can go beyond mere association and realize that weather is the cause of the changes in both, because as it gets warmer more people want to buy ice cream and cool off in the sea.
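The ice cream and shark example can be simulated. The sketch below is a toy model under assumptions of my own: temperature drives both variables, the coefficients and noise levels are arbitrary, and neither variable influences the other. It then removes the straight-line effect of temperature from each variable and correlates what is left (a simple form of partial correlation).

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed toy model: temperature is the only common driver.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream_sales = [5.0 * t + rng.gauss(0, 20) for t in temperature]
shark_attacks = [0.2 * t + rng.gauss(0, 1) for t in temperature]

raw = pearson(ice_cream_sales, shark_attacks)
controlled = pearson(residuals(ice_cream_sales, temperature),
                     residuals(shark_attacks, temperature))

print(f"raw correlation:              {raw:+.2f}")
print(f"after controlling for weather: {controlled:+.2f}")
```

In this toy world the raw correlation is strong, yet it shrinks to roughly zero once temperature is accounted for. The catch in real life is that you can only control for the variables you thought to measure.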

In the previous example the difference is easy to see, but it gets harder as we move to more complex domains, where many variables and dimensions interact and unseen factors can be critical to understanding what’s going on. Combine this with the fact that humans overvalue the knowledge they have acquired and undervalue what they don’t understand (or don’t know exists), and we often end up getting harmful advice from domain “experts.”

One example of a complex domain is medicine, and one instance where correlation may be misinterpreted as causation is the relationship between Coronary Heart Disease and cholesterol. There is a well-documented positive correlation between the two, but the causal link is far more debatable. It has long been held that high levels of cholesterol lead to a higher risk of developing Coronary Heart Disease, but some researchers now dispute this causal link.

Dr. David Perlmutter, author of the New York Times best-seller Drop Acid, said the following in an interview (2022):

The issue that relates to risk for Coronary Heart Disease is unrelated to cholesterol. Cholesterol shows up when coronary arteries are inflamed, it shows up to heal the coronary arteries. That’s why when a person dies of a heart attack and you section their coronary arteries and you look under the microscope… you’ll see cholesterol there! [But] it’s trying to heal this inflammation, it’s like blaming the fireman because they are at the fire.

[Tom Bilyeu interview with Dr. David Perlmutter, 2022]

If, in fact, “cholesterol is there to heal the coronary arteries,” you can see how dangerous it can be to mistake correlation for causation and derive prescriptions for people to follow. This is especially true when the flawed advice comes from trusted experts or institutions, since we have an ingrained tendency to obey authority (a tendency that is also reinforced in schools, universities, and big bureaucratic corporations).

But before diving into why we tend to misinterpret correlation as causation, we first need to explore an issue that is even more fundamental: we perceive meaning in random patterns in data.

How We’re Fooled by Randomness

Our brain is heavily conditioned to notice patterns and assign meaning to them, whether by assuming a correlation that will hold in the future or even a causal link between variables. But the thing is… randomness will always present some detectable pattern!

Dots perfectly distributed (fake randomness) vs. Dots showing detectable patterns (real randomness)

Nassim Nicholas Taleb said it best in his book, Fooled by Randomness:

A programmer helped me build a backtester. It is a software program connected to a database of historical prices, which allows me to check the hypothetical past performance of any trading rule of average complexity. I can just apply a mechanical trading rule, like buy NASDAQ stocks if they close more than 1.83% above their average of the previous week, and immediately get an idea of its past performance. The screen will flash my hypothetical track record associated with the trading rule. If I do not like the results, I can change the percentage to, say, 1.2%. I can also make the rule more complex. I will keep trying until I find something that works well.

What am I doing? The exact same task of looking for the survivor within the set of rules that can possibly work. I am fitting the rule on the data. This activity is called data snooping. The more I try, the more I am likely, by mere luck, to find a rule that worked on past data.

A random series will always present some detectable pattern. I am convinced that there exists a tradable security in the Western world that would be 100% correlated with the changes in temperature in Ulan Bator, Mongolia.

This issue is accentuated when someone has vast data sets and a vested interest in finding patterns (e.g. Big Data researchers): one can simply cherry-pick the data to build a model with a significant “correlation.” Of course, this is not a real correlation, because it won’t hold in the future. It’s just noise! As Nassim Nicholas Taleb puts it: “The researcher gets the upside, truth gets the downside.” [Antifragile].
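Data snooping is easy to reproduce with pure noise. The following is a minimal sketch under my own assumptions (1,000 candidate series, 100 points each, independent Gaussian noise), not Taleb's backtester: it searches for the candidate that best "explains" a target on the first half of the data, then checks how that winner does on the second half.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the run is repeatable
N_POINTS = 100           # length of every series (arbitrary choice)
N_CANDIDATES = 1000      # how many unrelated series we "try" (arbitrary)
HALF = N_POINTS // 2


def noise_series():
    """A series of independent random draws: no signal by construction."""
    return [rng.gauss(0, 1) for _ in range(N_POINTS)]


target = noise_series()
candidates = [noise_series() for _ in range(N_CANDIDATES)]

# Snooping step: keep whichever candidate looks best on the first half.
best = max(candidates, key=lambda c: abs(pearson(c[:HALF], target[:HALF])))

in_sample = pearson(best[:HALF], target[:HALF])
out_of_sample = pearson(best[HALF:], target[HALF:])

print(f"best of {N_CANDIDATES} on the data we searched: {in_sample:+.2f}")
print(f"same series on data we did not search:  {out_of_sample:+.2f}")
```

Every series here is noise, so any correlation found is luck. The winner looks respectable on the half that was searched simply because so many candidates were tried, and there is no reason for that luck to carry over to the other half.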

The Tragedy of Big Data: The more variables, the more correlations that can show significance in the hands of a “skilled” researcher. Falsity grows faster than information.

– Nassim Nicholas Taleb

[Antifragile]

Seeing shallow patterns is not a virtue — it leads to naive interventionism. Some psychologist wrote back to me: “IQ selects for pattern recognition, essential for functioning in modern society.” No. Not seeing patterns except when they are significant is a virtue in real life.

– Nassim Nicholas Taleb

[“IQ is largely a pseudo-scientific swindle”]

How We’re Fooled by Causation: The Narrative Fallacy

In his book Thinking, Fast and Slow, Nobel Prize-winning psychologist Daniel Kahneman argues that we are biased to think along causal lines. He demonstrates this with the following experiment:

Which is more probable?
– That a mother has blue eyes if her daughter has blue eyes?
– Or that the daughter has blue eyes if her mother has blue eyes?

The intuitive response is that it’s more probable that the daughter has blue eyes if her mother has blue eyes than the other way around.

[But] if you stop to do the math on the assumption that the incidence of blue eyes are the same in the two generations… The probabilities are strictly equal.

But even before you do the math your reasoning flows along causal lines. Your thinking flows along causal lines. This happens intuitively and it feels more coherent — and the coherence that we experience can be turned into a [bad] judgment of probability…

[Thinking, Fast and Slow (Talks at Google)]

Something worth noting is that this happens intuitively… We are not aware of it!
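For anyone who wants to "stop and do the math," here is a small sketch. Both conditional probabilities share the same numerator, the probability that mother and daughter both have blue eyes, and differ only in what they divide by. The joint probabilities below are invented for illustration and are not real eye-color statistics.

```python
from fractions import Fraction as F


def conditionals(joint):
    """Return (P(daughter blue | mother blue), P(mother blue | daughter blue)).

    `joint` maps (mother_blue, daughter_blue) to a probability.
    """
    p_both = joint[(True, True)]
    p_mother = p_both + joint[(True, False)]     # P(mother blue)
    p_daughter = p_both + joint[(False, True)]   # P(daughter blue)
    return p_both / p_mother, p_both / p_daughter


# Made-up table where blue eyes are equally common in both generations (1/4).
same_incidence = {
    (True, True): F(3, 20),
    (True, False): F(2, 20),
    (False, True): F(2, 20),
    (False, False): F(13, 20),
}

# Made-up table where daughters have blue eyes slightly more often (6/20).
different_incidence = {
    (True, True): F(3, 20),
    (True, False): F(2, 20),
    (False, True): F(3, 20),
    (False, False): F(12, 20),
}

d_given_m, m_given_d = conditionals(same_incidence)
print(d_given_m, m_given_d)      # 3/5 3/5

d_given_m2, m_given_d2 = conditionals(different_incidence)
print(d_given_m2, m_given_d2)    # 3/5 1/2
```

When the incidence is the same in both generations, the two denominators are equal, so the two probabilities have to be equal. They only diverge when the incidence differs, and even then the direction of causation plays no role in the arithmetic.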

Illustration of our intuitive response to think along causal lines on raw information.

From an evolutionary standpoint, humans developed through narratives and stories passed orally from person to person. Thus, it makes sense that we would prefer stories over raw data or statistics, and that in the absence of a story we tend to create a narrative from the raw data.

The most powerful person in the world is the story teller.

– Steve Jobs

[The Most Powerful Person in the World According to Steve Jobs]

We like stories, we like to summarize, and we like to simplify, i.e., to reduce the dimension of matters. The first of the problems of human nature that we examine in this section, is what I call the narrative fallacy. (It is actually a fraud, but, to be more polite, I will call it a fallacy.)

The fallacy is associated with our vulnerability to over-interpretation and our predilection for compact stories over raw truths. It severely distorts our mental representation of the world; it is particularly acute when it comes to the rare event.

The narrative fallacy addresses our limited ability to look at sequences of facts without weaving an explanation into them, or, equivalently, forcing a logical link, an arrow of relationship, upon them. Explanations bind facts together. They make them all the more easily remembered; they help them make more sense. Where this propensity can go wrong is when it increases our impression of understanding.

– Nassim Nicholas Taleb

[The Black Swan]

Everything should be made as simple as possible, but not simpler.

– Albert Einstein

News outlets know we like narratives, which is why most news headlines come with a reason attached (even when the outlet has no idea whether there is actually a cause-and-effect relationship)…

We harbor a crippling dislike for the abstract.

One day in December 2003, when Saddam Hussein was captured, Bloomberg News flashed the following headline at 13:01: U.S. TREASURIES RISE; HUSSEIN CAPTURE MAY NOT CURB TERRORISM.

Whenever there is a market move, the news media feel obligated to give the “reason.” Half an hour later, they had to issue a new headline. As these U.S. Treasury bonds fell in price (they fluctuate all day long, so there was nothing special about that), Bloomberg News had a new reason for the fall: Saddam’s capture (the same Saddam). At 13:31 they issued the next bulletin: U.S. TREASURIES FALL; HUSSEIN CAPTURE BOOSTS ALLURE OF RISKY ASSETS.

So it was the same capture (the cause) explaining one event and its exact opposite. Clearly, this can’t be; these two facts cannot be linked.

It happens all the time: a cause is proposed to make you swallow the news and make matters more concrete. After a candidate’s defeat in an election, you will be supplied with the “cause” of the voters’ disgruntlement. Any conceivable cause can do. The media, however, go to great lengths to make the process “thorough” with their armies of fact-checkers. It is as if they wanted to be wrong with infinite precision (instead of accepting being approximately right, like a fable writer).

– Nassim Nicholas Taleb

[The Black Swan]

A Three-Step Guide to Thinking Correctly About Correlation and Causation

1. Whenever you detect a pattern in raw data, check if there is actually a meaningful correlation and think critically about it — as we are biased to find meaning even in completely random data.

2. When you see a correlation, don’t assume that it implies causation. Remember that the correlation can be coincidental, or it can be the effect of an unseen variable (or multiple unseen variables).

3. When you identify a causal relationship, try to think about other factors that might also contribute, since we are biased to “reduce the dimension of matters” (i.e. to make things simpler than they actually are).
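For step 1, one way to check whether a pattern is more than luck is a permutation test. This is a technique I am bringing in for illustration, not something prescribed by Kahneman or Taleb. The idea: shuffle one variable many times to break any real link, and count how often pure chance produces a correlation at least as strong as the one you observed. The function name, the number of shuffles, and the data are all assumptions of this sketch.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Share of shuffles whose |correlation| matches or beats the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys any real pairing between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # The +1 counts the observed arrangement itself, so the result is never 0.
    return (hits + 1) / (n_shuffles + 1)


# Made-up data: one series truly depends on xs, the other is pure noise.
data_rng = random.Random(1)
xs = list(range(30))
related = [2.0 * x + data_rng.gauss(0, 5) for x in xs]
unrelated = [data_rng.gauss(0, 5) for _ in xs]

p_related = permutation_p_value(xs, related)
p_unrelated = permutation_p_value(xs, unrelated)

print(f"related series:   p = {p_related:.4f}")
print(f"unrelated series: p = {p_unrelated:.4f}")
```

A small value suggests the association is unlikely to be a fluke of this particular sample. It does not help with steps 2 and 3: a correlation can pass this test and still be driven entirely by an unseen variable. And if you run the test on many candidate patterns and keep only the winners, you are back to data snooping.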

Key Takeaways – Correlation vs. Causation

  • Correlation is a measure of association.
  • Causation goes beyond association, implying a cause-and-effect relationship.
  • As humans, we are prone to detect patterns in data and find meaning in them, without realizing that randomness will always present some detectable pattern. So we often see correlations where there are none.
  • Intuitively, our thinking flows along causal lines. So we are prone to find causation even when no real cause-and-effect relationship exists.

If You Liked This Essay, Check Out These Sources