Correlation vs. Causation – and What Are the Implications for Our Project?

by Michael Reames and Gabriel Kemeny - ProcessGPS on March 3, 2014



In problem solving, accurately establishing and validating root causes is vital to improving processes. A common pitfall for teams of subject matter experts is bias: they are easily drawn to plausible explanations of likely cause-and-effect relationships. What sets an enlightened team apart from unenlightened approaches to root-cause analysis is that it seeks statistically significant relationships between suspected causes and unsatisfactory effects. The unenlightened approach can be described as the intuitive or “seat of the pants” approach; one way of characterizing it is the statement “Don’t bother me with the data – I know what’s going on, and how this problem should be resolved.”

It is very useful, then, for process improvement teams to know the difference between correlation and causation. In doing so, the team recognizes that correlation alone is not enough, and that an effective problem-solver seeks to understand causation.

For purposes of this article, an adequate dictionary definition of correlation is: “a relation existing between two or more things that tend to vary, be associated, or occur together in a way not expected on the basis of chance alone.” These “things” can be described mathematically: hence, correlation refers to how closely two sets of information or data are related.

Causation goes a step further; it is defined as “a relationship in which one action or event is the direct consequence of another.”

Meeting only one of these standards is not sufficient for validation. A typical approach is to use one’s experience with a process to suggest a cause-and-effect relationship. Many leaders have achieved success by being decisive, often without good data to back them up. Short of information (data) demonstrating a significant statistical correlation, this knowledge is simply intuitive – “seat of the pants” – expertise. Solutions based on intuition rather than a demonstrated correlation may improve the process, but only to the extent that the intuitive guess is correct (i.e., underlying data supporting it exist, but were never collected). In other words, sometimes our intuition is right; but not always.

On the other hand, a strong mathematical relationship (correlation) between two variables does not by itself confirm that one causes the other. For example, consider the fallacy that storks bring babies. Indeed, in the villages and towns of nineteenth-century northern Europe, there was a remarkably strong and persistent correlation between the local stork population and the number of babies born. This led to the enduring myth that storks bring babies. How else to explain the correlation?

 

Figure 1: Storks bring babies

Although the fallacy of this myth is obvious, the correlation itself cannot be refuted. Perhaps there is an underlying cause that creates both effects, i.e., one that increases both the stork population and the number of births.

A bit of further knowledge (data gathering) is enlightening:

 

  • Northern Europe experiences cold, harsh winters
  • Storks prefer protected, relatively warm nesting sites
  • Dwellings almost always included fireplaces with chimneys atop the roofs
  • The roofs near the chimneys provided protected, warm nesting sites
  • As families grew (babies), more dwellings were built (hence, more fireplaces and chimneys)
  • Storks were attracted to the abundance of new nesting sites

Hence, the increase in homes accommodated the growing human population while at the same time drawing storks into the same areas to take advantage of good nesting sites.

This simple example demonstrates the risk of automatically assuming that one indicator has an impact on another. Variables and indicators often have mutual relationships that are easily demonstrated mathematically (correlation). Even so, that does not necessarily mean that one thing has had an effect on the other.
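
To see how a lurking variable can manufacture a correlation, consider the minimal simulation sketch below (Python, with entirely synthetic numbers chosen purely for illustration; the coefficients are assumptions, not historical data):

    import numpy as np

    rng = np.random.default_rng(42)

    # Synthetic "lurking variable": the number of dwellings in each of 50 villages
    dwellings = rng.integers(50, 500, size=50)

    # Both measures are driven by the number of dwellings, not by each other
    births = 0.3 * dwellings + rng.normal(0, 5, size=50)   # more homes -> more families -> more babies
    storks = 0.1 * dwellings + rng.normal(0, 3, size=50)   # more chimneys -> more nesting sites -> more storks

    # The correlation between storks and births is strong and positive,
    # even though neither variable has any direct effect on the other
    r = np.corrcoef(storks, births)[0, 1]
    print(f"Correlation between storks and births: r = {r:.2f}")

Storks and births track each other closely in this simulation, yet the only causal driver is the number of dwellings.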

Many statistical tools assist in establishing a statistical correlation. Scatter diagrams and regression analysis visualize and quantify the relationship between pairs of variables (Figures 2 and 3 are examples of scatter diagrams). The strength of the linear relationship is measured by the correlation coefficient “r,” which ranges from +1 to -1:

  • Perfect positive relationship: +1
  • No linear relationship: 0
  • Perfect negative relationship: -1
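
As a minimal sketch of how a team might quantify and visualize such a relationship (the paired data below are hypothetical), Python’s scipy.stats.pearsonr returns both the correlation coefficient r and a p-value, and matplotlib can draw the scatter diagram:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    # Hypothetical paired measurements of a suspected cause (x) and an effect (y)
    x = np.array([1.2, 2.4, 3.1, 4.8, 5.0, 6.3, 7.7, 8.1])
    y = np.array([2.0, 2.9, 3.8, 5.1, 5.4, 6.8, 8.2, 8.5])

    # Pearson correlation coefficient r (between -1 and +1) and its p-value
    r, p_value = stats.pearsonr(x, y)
    print(f"r = {r:.3f}, p-value = {p_value:.4f}")

    # A quick scatter diagram to visualize the relationship
    plt.scatter(x, y)
    plt.xlabel("suspected cause (x)")
    plt.ylabel("effect (y)")
    plt.title(f"Scatter diagram, r = {r:.2f}")
    plt.show()

A value of r near +1 or -1 signals a strong linear relationship; it says nothing, by itself, about which variable (if either) is the cause.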

Hypothesis tests help us handle uncertainty objectively. These include the t-test, ANOVA (Analysis of Variance), the proportions test, Chi-Square analysis, and logistic regression. The choice of test depends on the characteristics of the gathered data. Applying the appropriate test allows us to confirm or disprove assumptions and to control our risk of making wrong decisions. Such tests help teams make fact-based decisions about process improvements, rather than intuitive guesses.
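
For instance, a two-sample t-test comparing process performance before and after a change might look like the sketch below (the cycle-time data and the 0.05 significance level are hypothetical choices):

    import numpy as np
    from scipy import stats

    # Hypothetical cycle-time samples (minutes) before and after a process change
    before = np.array([14.2, 15.1, 13.8, 16.0, 15.4, 14.9, 15.7, 14.4])
    after = np.array([13.1, 13.9, 12.8, 14.2, 13.5, 13.0, 14.0, 13.3])

    # Welch's two-sample t-test (does not assume equal variances)
    t_stat, p_value = stats.ttest_ind(before, after, equal_var=False)

    alpha = 0.05  # accepted risk of wrongly declaring a difference (Type I error)
    if p_value < alpha:
        print(f"p = {p_value:.4f} < {alpha}: the difference is statistically significant")
    else:
        print(f"p = {p_value:.4f} >= {alpha}: no statistically significant difference detected")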

As the storks-and-babies example shows, a verified statistical correlation between two variables X and Y is not by itself conclusive. Rather, it may lead to a number of possible alternative conclusions:

  1. X affects Y (i.e., the cause creates the effect)
  2. Y affects X (i.e., the effect creates the cause)
  3. X interacts with Y (each affects the other)
  4. Other (or unknown) variables may affect both X and Y
  5. A combination of some or all of the above
  6. A pure coincidence (highly unlikely)

Thus, although it is exciting for a process improvement team to discover a significant statistical correlation, the team needs to investigate further for possible causation, even if the “statistically correlated” relationship has a large effect. 

Proven past process or scientific knowledge (from subject matter experts) may help the team eliminate some (or all) of these alternative conclusions. Frequently, however, no expert knowledge is available regarding the factors that may affect a particular response. When such knowledge is unavailable, the team can use the scientific method to acquire new knowledge and to establish causation (or to refute it) through experimentation.

The technique known as Design of Experiments (DOE) allows the experimenter to manipulate controllable factors (independent variables) at different levels to see their effect on a response (dependent variable). By manipulating the inputs and observing how the output changes, the experimenter begins to understand and model the dependent variable (Y) as a function of the independent variables (X).
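
As a rough sketch of the idea (the factor names, levels, and responses below are hypothetical), a two-factor, two-level full factorial experiment can be modeled with an ordinary least-squares fit:

    import numpy as np

    # Hypothetical 2x2 full factorial design: two controllable factors coded -1/+1
    # X1 = temperature (low/high), X2 = pressure (low/high)
    X1 = np.array([-1, +1, -1, +1])
    X2 = np.array([-1, -1, +1, +1])

    # Hypothetical measured response Y at each experimental run (e.g., yield %)
    Y = np.array([71.0, 78.5, 74.2, 86.9])

    # Model Y = b0 + b1*X1 + b2*X2 + b12*X1*X2 via least squares
    design = np.column_stack([np.ones_like(X1), X1, X2, X1 * X2])
    coeffs, *_ = np.linalg.lstsq(design, Y, rcond=None)

    for name, b in zip(["intercept", "X1", "X2", "X1*X2"], coeffs):
        print(f"{name:10s} estimated effect: {b:+.2f}")

Because the factors are deliberately manipulated rather than merely observed, a well-run experiment supports causal conclusions in a way that a correlation found in historical data cannot.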

In summary, just because two things occur together does not prove that one caused the other, even if it seems to make sense. Our intuition often leads us astray when it comes to distinguishing between causality and correlation. Validation of a root cause is achieved only when two standards are met:

  1. There is a statistically significant relationship between the suspected root cause and effect (i.e., a correlation); and
  2. Knowledge of the process assures that a causal relationship is feasible and likely to exist.

ProcessGPS has compiled many examples of interesting correlations. All have been drawn from a general Bing/Google search. Consider:

Table of Correlation Examples


Figure 2: Annual Per Capita Chocolate Consumption and the Number of Nobel Laureates per 10 Million Population


Figure 3: Lemon Importation from Mexico (Metric Tons) vs. U.S. Highway Fatality Rate
