No doubt you've encountered the image below from Gizmodo in some PowerPoint somewhere this year.
But that same PowerPoint likely didn't bother to answer the next logical question:
How to get to causality?
It's not an easy question to answer. Having really, really good correlation is definitely not the answer. First a couple of counterexamples.
Common ancestral cause
Putting aside spurious correlations such as the one above, the much more common scenario is that of a common cause, such as shown below. Finding the correlation of "street wet" and "hair wet" in some data set does not lead to the conclusion that one follows from the other.
Well, the way to avoid coincidental correlation is apply the "gold standard" of statistics and scientific experimentation, the controlled randomized experiment, right? Consider the following experiment. We set up a bunch of plugged-in microwaves, each with its own cup of room temperature water with a tea bag inserted. For a randomized half of the set of microwaves, we push the "start" button on the microwave (the "treatment" to use the terminology from randomized experimentation), and on the other half we do not push the button.
The results are highly correlated. We've employed the gold standard of scientific experimentation. Can we say that finger pushing causes hot tea? In a sense, yes, but not in the common sense of the word. What happened?
Three things. First, the finger pushing is an indirect cause, as shown below.
Second, to use the terminology from the study of causation, finger pushing is a sufficient cause in the context of the experimental conditions, but it is not necessary. It is not necessary because we could have also arrived at hot tea by opening up the microwave and assaulting the cup with a blowtorch. Although there are a lot of common sense uses of the word "causation" that lack "necessity", the strongest types of causation are both necessary and sufficient.
Third, pushing the button on the microwave is really just a contributory cause, isn't it? Our experimental assumptions included that the microwave was plugged in, so it's getting it's energy from there, as shown below.
And so on... we could trace the right branch all the way back to the sun, the Big Bang, and the Aristotelian First Cause. But just that "Electricity generated from coal" makes a much better common sense "cause" than does finger pushing. It's because that is where the "motion" is coming from -- the turbine spinning at the power plant is causing the water to heat up in our microwave. The finger pushing is merely a contributory cause.
Now the confession, and the heart of the matter. I've (mis)led you down this path to illustrate the complexities of determining causation. Causation is the meat of the philosophical giants around the world and throughout time.
But what about the scientific method? Isn't that supposed to allow us to establish causation through repeated experimentation, at least the "sufficiency" form of causation?
Let's review the process of the scientific method:
It's all scientific and straightforward, right? Pay closer attention to step #1. Hypothesize. What is happening there? That's where the magic of causation is happening, and not in step #3, analysis and statistics, where it is often presumed. Hypothesis forming comes from the advanced intellect of people. People form models of how things work in their minds and form hypothesis, and then try to verify or disprove those hypotheses.
Just to be clear, I am not referring to statistical models, but rather models like the atomic model or an engineering model of a car engine. These models can only come from the minds of people (or AI that mimics in some way the minds of people -- e.g. automated Bayesian network construction or automated semantic analysis). Models don't get spit out from a statistical analysis.
And recall the complexities of the philosophy of causation discussed at the beginning of this post. They are still there. So causation requires modeling and philosophy, both of which are hard and messy.
That can't fit on a single PowerPoint, so it's no wonder there's not a slide on it following the infamous IE/Murder correlation slide.