HYPOTHESIS TESTS: A beginner's guide to statistical tests + top tips.

In statistics, hypothesis tests allow us to use data that we have collected to make predictions and draw conclusions about the real world.

AN EXAMPLE: Amy and Bill are playing snakes and ladders, but Amy thinks Bill is cheating because he keeps rolling a six. Bill insists he is just being lucky. They decide to perform a Hypothesis Test to decide between the two hypotheses (Amy: “Bill’s dice is weighted”; Bill: “the dice is fine, I’m just throwing a lot of sixes”).

THE TEST: they decide to roll the dice 20 times and count how many sixes come up. Let’s suppose all twenty rolls come up with a six. Assuming it’s a normal dice, the probability of this happening is $(\frac{1}{6})^{20} \approx 0.0000000000000003$ – almost impossibly small! If this probability is less than some predetermined cut-off point (often 0.05, largely by convention) we conclude that Bill is indeed using a dodgy dice.
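To make the arithmetic concrete, here is a minimal Python sketch of the twenty-sixes calculation. The names `ALPHA` and `prob_all_sixes` are made up for this illustration, and the 0.05 cut-off is simply the conventional value mentioned above.

```python
from fractions import Fraction

ALPHA = 0.05  # assumed cut-off point, chosen before looking at the data


def prob_all_sixes(rolls: int) -> Fraction:
    """Probability that a fair six-sided dice shows a six on every one of `rolls` rolls."""
    return Fraction(1, 6) ** rolls


p_twenty = prob_all_sixes(20)

print(float(p_twenty))          # a number of the order of 3e-16
print(float(p_twenty) < ALPHA)  # True: far below the cut-off
```

Using `Fraction` keeps the calculation exact until the final conversion to a decimal, which is handy when the numbers are this tiny.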

Note that Bill could be using a fair dice. It’s just that the evidence suggests very strongly that his dice is a trick one, because getting 20 sixes in a row with a fair dice is extremely unlikely (and we know this because we calculated just how unlikely). A good statistician will not say “the dice is weighted”; they will instead say “the evidence strongly suggests that…”.

THE TRICKY BIT: what if, out of 20 rolls, we had thrown a six 5 times? This is a little higher than the three or four sixes we’d expect from a fair dice, but also fairly plausible. So do we conclude the dice is a trick one or not? The answer is to work out the exact probability of getting 5 or more sixes out of twenty with a fair dice (using a Binomial Distribution) and compare that probability with the pre-determined cut-off point. If the outcome is sufficiently unlikely then we throw out the null hypothesis (the one that says everything is fine) and conclude that, in this case, the evidence suggests Bill is cheating.
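The "5 or more sixes out of twenty" probability can be sketched in a few lines of Python. This assumes the null hypothesis of a fair dice (chance of a six is 1/6 on each roll) and a cut-off of 0.05; the names `binomial_tail`, `ALPHA` and `reject_null` are invented for this example.

```python
from math import comb

ALPHA = 0.05  # assumed cut-off point, fixed before the experiment


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) when X counts successes in n independent trials, each with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Probability of 5 or more sixes in 20 rolls of a fair dice
p_value = binomial_tail(5, 20, 1 / 6)
reject_null = p_value < ALPHA

print(round(p_value, 2))  # about 0.23
print(reject_null)        # False
```

With these assumptions the probability comes out at roughly 0.23, well above 0.05, so five sixes in twenty rolls would not be enough evidence to throw out the null hypothesis.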

A BETTER APPLICATION: Hypothesis Tests could be used to test whether a new medical drug is more effective than an existing drug, or whether men are more (or less) likely than women to suffer from some condition. The entire insurance industry is based on analysing historical data to estimate how likely the insured-against event is to happen – and so set a premium.