Data Science: Business, Meet Science
Nearly all IT professionals can explain “data” and where it lives. But what about the “science” part? Do you know enough about Hypothesis Testing? Falsifiability? P-values? If not, that’s okay. IT professionals, meet science.
Falsifiability: The Sun Will Explode At The End Of The Month
Falsifiability is the principle that any proposed explanation made on the basis of limited evidence must be inherently disprovable before it can be accepted as a scientific theory.
As Shuttleworth and Wilson put it, “no theory is completely correct, but if it can be shown both to be falsifiable and supported with evidence that shows it’s true, it can be accepted as truth” (Shuttleworth & Wilson, 2008). In other words, an idea is falsifiable if observations could, at least in principle, show it to be false.
Example: “The sun will explode at the end of the month.” This claim has not yet been falsified because it’s a prediction about the future. However, we can begin making observations, and as the date draws closer, we can observe whether the sun explodes and falsify the claim. Hopefully! (In Einstein’s case, the decisive observation was a solar eclipse. +1 if you knew this.)
In science, we “make claims or statements about a property of a population” (Triola, 2015), which we call a hypothesis. Again, it must be inherently falsifiable. We test a hypothesis using a model, and we attempt to disprove that model. Our models are not truth; rather, they are considered evidence to support our claim about the population.
Testing The Hypothesis
Making Claims: The Null And The Alternative
Again, we are attempting to make claims about a property of the population(s) we are studying. However, except in very rare circumstances, we cannot know the true properties of those populations. This is why scientists rely on statistics and a rigorous set of procedures, and it is why we perform hypothesis testing.
The process is simple and repeatable. The first step is to create a pair of claims: the Null Hypothesis (H0) and the Alternative Hypothesis (H1).
The Null Hypothesis is the assumption we are making about the population. Maybe we’re saying, “There is a 50% chance of landing heads in a coin flip.”
The Alternative Hypothesis states that the model we are testing is different from the Null (H0). If we’re skeptical of coins produced in 1987 and claim they have a tendency for heads, then we might say, “There is a greater than 50% chance of landing heads in a coin flip.”
This would look like:
H0: p = 0.5 (Null Hypothesis)
H1: p > 0.5 (Alternative Hypothesis)
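If you’d like to see this in code, here’s a minimal sketch of the coin-flip test using SciPy’s binomtest (available in SciPy 1.7+). The flip counts are hypothetical, purely for illustration:

```python
from scipy.stats import binomtest

heads = 58   # hypothetical: heads observed in our sample of 1987 coins
flips = 100  # hypothetical: total flips

# H0: p = 0.5 (the coin is fair)
# H1: p > 0.5 (the coin favors heads)
result = binomtest(heads, flips, p=0.5, alternative="greater")
print(result.pvalue)  # probability of seeing >= 58 heads if the coin is fair
```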
Significance
Next, we want to select the Significance Level (alpha) based on the seriousness of making an error. The error in question is a Type 1 error: mistakenly rejecting the Null Hypothesis when it is actually true. Using the example above, we would be saying, “Coins produced in 1987 have a tendency for heads” when, in reality, there’s nothing wrong with the coins.
You might be asking yourself, “If we can’t accept the Alternative Hypothesis, do we accept the Null Hypothesis?” Unfortunately, no. Remember, the true properties of the population are impossible to know, and the Alternative Hypothesis is being used to provide evidence for our original claim. Again, we are creating a model to assess whether there is evidence to reject our assumption about the population; when the evidence falls short, we fail to reject the Null rather than accept it.
Is that confusing? I hope not, but one metaphor I find useful is a courtroom and the rule of ‘innocent until proven guilty beyond a reasonable doubt’… let’s say with 95% certainty. If there is “evidence presented that doesn’t prove the defendant is guilty, you have not proven that the defendant is innocent; but, based on the evidence, you can’t reject that possibility” (Martz, 2013). Our job is to prove guilt beyond a reasonable doubt; if we can’t, the defendant stays presumed innocent, just as the Null stays standing.
Most hypothesis tests use a Significance Level (alpha) of 0.05, which corresponds to 95% confidence. Scientists are saying, “we need to be 95% confident; however, we accept a 5% risk of rejecting the Null Hypothesis when we really shouldn’t.” That 5% is the Type 1 error discussed above.
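One way to build intuition for that 5% risk is a quick simulation: even when the Null Hypothesis is true (the coin really is fair), a test at alpha = 0.05 will still reject it about 5% of the time. The sample sizes below are arbitrary choices for the sketch:

```python
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(0)
alpha, flips, trials = 0.05, 100, 2000

false_rejections = 0
for _ in range(trials):
    heads = rng.binomial(flips, 0.5)  # H0 is true: the coin is fair
    p = binomtest(heads, flips, p=0.5, alternative="greater").pvalue
    if p < alpha:
        false_rejections += 1  # a Type 1 error

print(false_rejections / trials)  # hovers around 0.05
```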
Test Statistics And P-Values
There is no single test statistic that works for every application. Generally, though, a test statistic measures how far an observation falls from the mean (aka “the average”). The further an observation falls from the mean, the less it conforms to the norms of the population we assume in our model. If we flip coins from 1987 and they land on heads about 80% of the time, while coins from other years land closer to 50%, our test statistic would show that the 1987 coins sit several deviations away from the population of all coins flipped.
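As a concrete (and hypothetical) illustration, a one-proportion z statistic counts how many standard errors the observed heads rate falls from the rate assumed under the Null:

```python
import math

p0 = 0.5      # heads rate assumed under H0
p_hat = 0.80  # hypothetical observed heads rate for 1987 coins
n = 100       # hypothetical number of flips

standard_error = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error
print(z)  # 6.0: the observation sits 6 standard errors above the H0 mean
```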
In a normal distribution, about 95% of all observations fall within 2 standard deviations of the mean. So if we get a test statistic, like a t-score, of 3.25, our observation sits 3.25 standard deviations away from the mean. That’s important because it helps us decide whether to reject the Null Hypothesis in favor of the Alternative (and 3.25 standard deviations away from the mean is a pretty big deal).
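A quick sanity check with the standard normal distribution (a reasonable stand-in for a t distribution at large sample sizes) shows why 3.25 is a big deal:

```python
from scipy.stats import norm

print(norm.cdf(2) - norm.cdf(-2))  # ~0.954: ~95% of observations within 2 SD
print(1 - norm.cdf(3.25))          # ~0.0006: chance of landing above 3.25 SD
```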
The next thing we look at is the p-value. The p-value is the probability of getting a result at least as extreme as the one we observed, assuming the Null Hypothesis is true. Generally, when the p-value is less than 0.05, we’re in good shape to reject the Null, because 0.05 is exactly the Type 1 risk we agreed to assume.
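Tying it all together, here’s a sketch that converts the hypothetical z statistic from the earlier example into a one-sided p-value and makes the decision against alpha:

```python
from scipy.stats import norm

alpha = 0.05
z = 6.0  # hypothetical test statistic from the earlier sketch

# probability of a result at least this extreme, assuming H0 is true
p_value = 1 - norm.cdf(z)
if p_value < alpha:
    print(f"p = {p_value:.2e}: reject H0; the evidence supports H1")
else:
    print(f"p = {p_value:.2e}: fail to reject H0")
```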
Share your thoughts on data science and statistics in a comment below. If you’d like to have a discussion, please contact me or connect with me on social media!
References
Aschwanden, C. (2015, November 24). Not Even Scientists Can Easily Explain P-values. Retrieved February 2018, from FiveThirtyEight: http://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/
Aschwanden, C. (2015, August 19). Science Isn’t Broken. Retrieved February 2018, from FiveThirtyEight: https://fivethirtyeight.com/features/science-isnt-broken/#part1
Aschwanden, C. (2016, March 07). Statisticians Found One Thing They Can Agree On: It’s Time To Stop Misusing P-Values. Retrieved February 2018, from FiveThirtyEight: http://fivethirtyeight.com/features/statisticians-found-one-thing-they-can-agree-on-its-time-to-stop-misusing-p-values/
Becker, K. (2015, February 11). Does Science Need Falsifiability? Retrieved February 2018, from PBS: http://www.pbs.org/wgbh/nova/blogs/physics/2015/02/falsifiability/
Martz, E. (2013, January 30). Bewildering Things Statisticians Say: “Failure to Reject the Null Hypothesis”. Retrieved from The Minitab Blog: http://blog.minitab.com/blog/understanding-statistics/things-statisticians-say-failure-to-reject-the-null-hypothesis
Resnick, B. (2017, July 31). What a nerdy debate about p-values shows about science — and how to fix it. Retrieved February 2018, from Vox: https://www.vox.com/science-and-health/2017/7/31/16021654/p-values-statistical-significance-redefine-0005
Shuttleworth, M., & Wilson, L. T. (2008, September 21). Falsifiability. Retrieved February 2018, from Explorable: https://explorable.com/falsifiability
Triola, M. F. (2015). Essentials of Statistics (5th ed.). Boston, MA: Pearson Education, Inc.