Chapter 9 ANCOVA & Model Selection

ANCOVA is multiple linear regression with both categorical and continuous explanatory variables.

Model selection is a set of guidelines for choosing the right model. The best model depends on what the model is intended to be used for.


9.1 Introduction

We now turn our attention to the case where the explanatory variables include at least one numeric variable and at least one factor. We call this type of model ANCOVA, or ANalysis of COVAriance. A simple example is the model Y ~ F + X + F:X, with F a factor and X a numeric variable.
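As a minimal sketch of what fitting such a model looks like in R, the block below simulates a small data set and fits Y ~ F + X + F:X with lm(). The group labels, sample size, intercepts, slopes and noise level are all assumptions made up for illustration; they do not come from the course data.

```r
# Simulated data; every name and parameter value here is an illustrative assumption.
set.seed(1)
n_per_group <- 10
dat <- data.frame(
  F = factor(rep(c("a", "b", "c"), each = n_per_group)),
  X = runif(3 * n_per_group, min = 0, max = 10)
)

# Assumed true intercept and slope for each level of F
intercepts <- c(a = 5, b = 4, c = 3)
slopes     <- c(a = 0.1, b = 0.5, c = 0.3)
grp        <- as.character(dat$F)
dat$Y <- unname(intercepts[grp] + slopes[grp] * dat$X) +
  rnorm(nrow(dat), sd = 0.5)

# ANCOVA with interaction: a separate intercept and slope for each level of F
fit <- lm(Y ~ F + X + F:X, data = dat)
coef(fit)
```

With the default treatment coding, the coefficients are the intercept and slope of the reference level ("a"), followed by the differences in intercept and slope of the other levels relative to that reference.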

Most of the groundwork has been covered in the screencasts on one-way ANOVA, linear regression and factorial ANOVA. This set of screencasts on ANCOVA is therefore more applied: it follows the same steps as the factorial ANOVA and emphasizes the differences where needed. Try to identify the similarities and differences between these multiple regression models.

The first screencast discusses how the interpretation of an ANCOVA model differs from that of a factorial ANOVA model. The second screencast runs through the analysis of an ANCOVA step by step.

9.2 The ANCOVA explained by simulations

Simulations are used to explain the challenges faced when analyzing an ANCOVA: when are interactions significant in the ANCOVA context, and what does this mean? First we simulate the same model without interaction multiple times, using different set.seed() values, to investigate the uncertainty in the estimates of the slopes. Then we simulate a model with interaction.
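The simulation idea can be sketched roughly as follows. This is not the code used in the screencast: the helper simulate_fit() is hypothetical, and the number of groups, sample size, slopes and noise level are assumed values chosen only for illustration.

```r
# Hypothetical helper: simulate one data set and return the fitted ANCOVA model.
# Group A always has slope 0.5; group B has slope `slope_b`.
# Equal slopes (the default) means there is no true interaction.
simulate_fit <- function(seed, slope_b = 0.5) {
  set.seed(seed)
  n <- 12                                    # observations per group (assumed)
  Treat <- factor(rep(c("A", "B"), each = n))
  X <- runif(2 * n, min = 0, max = 10)
  slope <- ifelse(Treat == "A", 0.5, slope_b)
  Y <- 2 + slope * X + rnorm(2 * n, sd = 1)
  lm(Y ~ Treat * X)
}

# Same model without interaction, refitted for several seeds
seeds <- 1:20
slope_diff <- sapply(seeds, function(s) coef(simulate_fit(s))[["TreatB:X"]])
p_values <- sapply(seeds, function(s) {
  summary(simulate_fit(s))$coefficients["TreatB:X", "Pr(>|t|)"]
})
range(slope_diff)      # estimated slope differences scatter around 0
sum(p_values < 0.05)   # number of "significant" interactions without a true one

# One simulation with a true interaction (slope of B is 1.5 instead of 0.5)
fit_int <- simulate_fit(seed = 1, slope_b = 1.5)
anova(fit_int)
```

Even without a true interaction, the estimated slope difference is never exactly zero, and an occasional seed may give a p-value below 0.05. Comparing that scatter with the result of the last fit shows what a real difference in slopes looks like.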

9.3 The analysis of ANCOVA data step by step

An ANCOVA is performed step by step, following the tutorial for ANCOVA. In this screencast the differences with a factorial ANOVA are emphasized.

9.3.1 Exercises

Below is part of the output of an ANCOVA.

Residuals:
    Min      1Q  Median      3Q     Max 
-0.8528 -0.3010  0.0563  0.2708  0.8555 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)    
(Intercept)                     5.09856    0.36516  13.963 4.25e-11 ***
TreatB                               ?1    0.51641  -1.405  0.17712    
TreatC                         -1.30794      ?2     -2.533  0.02084 *  
Concentration                   0.06073    0.08729   0.696  0.49547    
TreatmentB:Concentration        0.43544    0.12345     ?3   0.00241 ** 
TreatmentC:Concentration        0.24936    0.12345   2.020  0.05852 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5657 on 18 degrees of freedom
Multiple R-squared:  0.7831,    Adjusted R-squared:  0.7229 
F-statistic:    13 on 5 and 18 DF,  p-value: 1.892e-05
  1. Calculate the missing numbers ?1, ?2 and ?3.
  2. How much of the variance is explained by this model?
  3. What was the total sample size of this experiment?
  4. Based on this output, do you think there is an interaction? Explain.
  5. What is the estimate for the intercept of TreatA? What about TreatB and TreatC?
  6. Draw what this model would look like. You may use R, or pen and paper.
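If you want to check your reasoning, the relations between the numbers in a regression summary can be explored on a different data set. The sketch below uses R's built-in ToothGrowth data (an assumption made for illustration; it is not the data behind the output above) with one factor, supp, and one numeric variable, dose.

```r
# Built-in example data: tooth length by supplement type (factor) and dose (numeric)
fit <- lm(len ~ supp * dose, data = ToothGrowth)
tab <- summary(fit)$coefficients

# The t value is the estimate divided by its standard error
all.equal(tab[, "t value"], tab[, "Estimate"] / tab[, "Std. Error"])

# Sample size = residual degrees of freedom + number of estimated coefficients
df.residual(fit) + length(coef(fit)) == nrow(ToothGrowth)

# Treatment coding: the line for the reference level (OJ) and for the other level (VC)
b <- coef(fit)
line_OJ <- c(intercept = b[["(Intercept)"]], slope = b[["dose"]])
line_VC <- c(intercept = b[["(Intercept)"]] + b[["suppVC"]],
             slope     = b[["dose"]] + b[["suppVC:dose"]])

# Draw the data and the two fitted lines
plot(len ~ dose, data = ToothGrowth,
     col = as.integer(ToothGrowth$supp), pch = 19)
abline(a = line_OJ[["intercept"]], b = line_OJ[["slope"]], col = 1)
abline(a = line_VC[["intercept"]], b = line_VC[["slope"]], col = 2)
```

The same three relations apply to the output in the exercise: each row satisfies t = estimate / standard error, the residual degrees of freedom plus the number of coefficients gives the sample size, and the intercept of a non-reference level is the reference intercept plus that level's coefficient.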

9.4 Model Selection

9.4.1 Exercises

A set of exercises can be downloaded here and its required data set here.
(If you can’t knit, click here for a PDF version of the exercises.)


9.4.2 Exercises (hard)

In order to study the effect of caviar on health, a survey is distributed in a city asking whether the respondents have ever eaten caviar and, if so, how frequently they eat it. After receiving the responses, 100 random individuals from each of the following groups are invited for a health check-up:

  • Group A: Has never consumed caviar;
  • Group B: Has tried caviar once, or a few times in their life;
  • Group C: Eats caviar about once per year;
  • Group D: Eats caviar multiple times per year;
  • Group E: Eats caviar about once per month.
  1. What problems do you think there might be with this study design? (HINT: There are at least two major flaws.)

  2. Can you come up with a better design for the research question?

  3. Can you come up with a minimal sample size needed to conduct this research? You can estimate the minimal required sample size for both the original design and your own version.
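One possible starting point for the sample size question is a power calculation for a one-way comparison of the five groups, using power.anova.test() from R's stats package. The sketch below is not a model answer. The group means, the within-group standard deviation, the significance level and the desired power are all assumed values that you would have to justify yourself, and a one-way ANOVA is only one of several designs you might consider.

```r
# All numbers below are assumed, purely for illustration.
# Group means of a health score, expressed in units of the within-group SD.
group_means <- c(A = 0, B = 0, C = 0.25, D = 0.25, E = 0.5)
within_sd   <- 1

res <- power.anova.test(
  groups      = length(group_means),
  between.var = var(group_means),   # variance between the assumed group means
  within.var  = within_sd^2,        # assumed variance within each group
  sig.level   = 0.05,
  power       = 0.80
)
ceiling(res$n)  # participants needed per group under these assumptions
```

Changing the assumed means or standard deviation changes the required sample size considerably, so the assumptions matter more than the code.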