Chapter 3 Hypothesis Testing
In this chapter, Dr. H.G.J. van Mil explains the concept of a test statistic.
3.1 Introduction
After having discussed the basic concepts that motivate the use of statistics and their relation to research questions, hypotheses, experimental design and data, we are now ready to look at hypothesis tests based on data. First we investigate, in some detail, the t-test. The reason is that the t-test is the work-horse of this course: it will reappear in different forms in a variety of tests, and it illustrates elements of statistical testing that are common to other test statistics. Chapter 4 deals with the morning lecture and exercises, whereas the afternoon lectures and exercises are presented in chapter 5.1 and deal with the practical application of the t-test.
3.2 Statistics, test-statistics & hypothesis tests
This paragraph briefly revisits the standard statistics of the population and sample mean and variance to explain the concept of normalization; the normalization occurs through the sample size or the degrees of freedom. This is important because a good normalization allows us to compare different systems with each other. For instance, it makes no sense to compare the sums of squares of two samples with unequal sample sizes (why?). However, we can compare the normalized sums of squares, such as the variance, also known as the mean sum of squares.
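The point about normalization can be sketched numerically. Below is a minimal Python illustration (the course itself uses R; the sample sizes and population values here are made up for illustration): the raw sums of squares of two samples from the *same* population differ wildly when the sample sizes differ, while the variances, normalized by the degrees of freedom, estimate the same population quantity.

```python
import random
import statistics

random.seed(1)

# Two samples from the same population (mean 100, sd 15), unequal sizes.
small = [random.gauss(100, 15) for _ in range(10)]
large = [random.gauss(100, 15) for _ in range(1000)]

def sum_of_squares(sample):
    """Raw sum of squared deviations from the sample mean."""
    m = statistics.mean(sample)
    return sum((x - m) ** 2 for x in sample)

# The raw sum of squares grows with the sample size, so the two
# values are not comparable...
print(sum_of_squares(small), sum_of_squares(large))

# ...but dividing by the degrees of freedom (n - 1) gives the sample
# variance, and both estimate the population variance 15**2 = 225.
print(statistics.variance(small), statistics.variance(large))
```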
The method of normalization is also applied in test statistics like the t-test. Once this normalization is combined with the properties of the null hypothesis (no effect), the actual t-test becomes possible and the t-distribution can be derived. In the screencast below we will make this link.
The line of reasoning used is:
- Compare one mean to a value or two means with each other (effect size).
- Normalize the effect size with a measure for the reliability of the estimated sample means (the standard error, SE).
- Being a random variable, the t-value has a distribution (the t-distribution).
- Utilizing the property of the null hypothesis that the expected effect size is 0, we can derive that the expected t-value is 0; for large degrees of freedom its distribution approaches a standard normal with variance 1. This holds for any data to which a t-test is applied.
- We can use the t-distribution, in conjunction with the t-value and the degrees of freedom to calculate \(P(\text{Data}|H_0)\).
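The first steps of this reasoning can be made concrete with a small numerical sketch. This is a hedged Python illustration, not the course's R code, and the two samples are made-up values; the final step, computing \(P(\text{Data}|H_0)\), needs the t-distribution given in the formal derivation that follows.

```python
import math
import statistics

# Hypothetical measurements for two groups (values made up for illustration).
group_a = [5.1, 4.9, 5.4, 5.0, 5.3, 5.2]
group_b = [4.6, 4.8, 4.5, 4.9, 4.4, 4.7]

# Step 1: the effect size is the difference between the sample means.
effect = statistics.mean(group_a) - statistics.mean(group_b)

# Step 2: normalize by the standard error of that difference
# (here the simple unpooled form SE = sqrt(s_a^2/n_a + s_b^2/n_b)).
se = math.sqrt(statistics.variance(group_a) / len(group_a)
               + statistics.variance(group_b) / len(group_b))

# Step 3: the t-value is the normalized effect size.
t = effect / se
print(effect, se, t)
```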
Or, in a more formal way:
- \(\overline{y}_{A} - \overline{y}_{B} = \text{effect size}\)
- \(\frac{\overline{y}_{A} - \overline{y}_{B}}{SE} = t\)
- \(\frac{\overline{y}_{A} - \overline{y}_{B}}{SE} = t \implies \frac{N(\overline{y}_{A} - \overline{y}_{B}, SE)}{SE} = t\text{-distribution} = \frac{\Gamma ( \frac{\text{df}+1}{2}) }{\sqrt{\text{df}\cdot\pi}\Gamma( \frac{\text{df}}{2})}(1+\frac{t^2}{\text{df}})^{-\frac{\text{df}+1}{2}}\)
- \(\frac{\overline{y}_{A} - \overline{y}_{B}}{SE} = t_{H_0} \implies \frac{N(0, SE)}{SE} = t\text{-distribution}(t,\text{df}|\alpha)\)
3.3 One sample t-test: step by step
In this screencast we will look more closely at the output of the R function for the t-test. First I will perform all the individual steps and then use the t.test function in R. The actual use of this function will be discussed in more detail in paragraph 5.1.
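For readers who want to see the individual steps outside R, here is a hedged Python sketch of a one-sample t-test. The blood-pressure readings and the null value of 82 mmHg are hypothetical, made up for illustration; the p-value is obtained by crude numerical integration of the t-density rather than a library call, to keep every step visible.

```python
import math
import statistics

# Hypothetical diastolic blood pressure readings (mmHg), made up for illustration.
sample = [84.0, 87.5, 83.2, 88.1, 86.4, 85.0, 89.3, 84.8, 86.9, 85.6]
mu0 = 82.0  # null-hypothesis value for the population mean

# Individual steps of the one-sample t-test:
n = len(sample)
ybar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)     # sample standard deviation
se = s / math.sqrt(n)            # standard error of the mean
t = (ybar - mu0) / se            # t-value
df = n - 1                       # degrees of freedom

def t_pdf(x, df):
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x ** 2 / df) ** (-(df + 1) / 2)

def p_two_sided(t, df, steps=100000, upper=60.0):
    """Two-sided P(Data|H0): integrate the tail beyond |t| numerically
    (midpoint rule), then double it for the two-sided test."""
    a, h = abs(t), (60.0 - abs(t)) / steps
    tail = sum(t_pdf(a + (i + 0.5) * h, df) for i in range(steps)) * h
    return 2 * tail

print(t, df, p_two_sided(t, df))
```

R's `t.test(sample, mu = 82)` bundles exactly these steps (plus a confidence interval) into one call.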
Question:
If we performed another experiment producing a new confidence interval that includes 82 mmHg, what would that mean in relation to the population mean?
3.4 Assumptions
In all statistical tests we make some kind of assumptions. For instance, because the t-distribution is derived from a normal distribution, we make the explicit assumption that the data are normally distributed. As we shall see, all the methods discussed in this course have assumptions that need to be tested. In the case of the t-test we can test the assumptions before the actual test is performed; for the other tests we can only test the assumptions after we run the test.
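The course's assumption checks are done in R (e.g. with normality tests and plots); as a rough illustration of what such checks look at, here is a hedged Python sketch using only crude heuristics: a skewness estimate (near 0 for roughly normal data) and the 1.5 × IQR rule for flagging outliers. The sample values are made up for illustration.

```python
import statistics

# Hypothetical sample with one suspicious value (made up for illustration).
sample = [5.0, 5.2, 4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 9.8]

# Crude symmetry check: for roughly normal data the sample skewness is near 0.
m = statistics.mean(sample)
s = statistics.stdev(sample)
n = len(sample)
skew = (n / ((n - 1) * (n - 2))) * sum(((x - m) / s) ** 3 for x in sample)

# Outlier flag via the 1.5 * IQR rule.
q1, q2, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1
outliers = [x for x in sample if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(round(skew, 2), outliers)  # the value 9.8 should be flagged
```

A strongly positive skewness and a flagged point like 9.8 are exactly the kind of warning signs that make the normality assumption of the t-test doubtful.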
Question:
- Why do we need to test if the sample distribution is similar to a normal distribution?
- Why do we test for outliers?