Chapter 8 Factorial ANOVA

A model for comparing all combinations of the levels of two or more categorical variables.


In this set of lectures we move from models with one explanatory variable, one-way ANOVA and simple linear regression (Y ~ F and Y ~ X), to multiple regression models with more than one explanatory variable (e.g. Y ~ F1 + F2 + F1:F2 or Y ~ F + X + F:X), where F is a factor and X a numerical variable. This introduces two new aspects to our analysis:

  • Interaction between the explanatory variables: the effect of one explanatory variable on the response variable depends on the value of the second explanatory variable, and vice versa; e.g. a treatment might have a different effect in different species.
  • More models can be fitted to the data, and we need to select the model that fits the data best; e.g. the following options are possible:
    • Y ~ F1 + F2 + F1:F2 + residuals
    • Y ~ F1 + F2 + residuals
    • Y ~ F1 + residuals
    • Y ~ F2 + residuals
    • Y ~ residuals

In the first screencast I will explain the concept of interaction and the difference between factorial ANOVA and ANCOVA. The next screencast explains why model selection is in order by applying factorial ANOVA to three different data sets:

  • One data set representing the null hypothesis,
  • One data set representing data without interaction,
  • and finally one data set representing data with interaction.

In doing so we can investigate the different criteria for selecting the “best” model.

8.1 Interaction

In this screencast I argue why we might need additional degrees of freedom to analyze data with interactions between the explanatory variables. For example, the compound term F1:F2 in Y ~ F1 + F2 + F1:F2 can be read as F1 modulating the effect of F2 on Y (or, equivalently, as F2 modulating the effect of F1 on Y). Estimating this extra term is what costs the additional degrees of freedom.
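The reading of F1:F2 given above can be made concrete with a small sketch. It assumes a 2x2 design and uses made-up cell means (the level names a1, a2, b1, b2 and all numbers are hypothetical, not course data). The interaction is computed as a difference of differences, once from the point of view of F2 and once from the point of view of F1, to show that both readings give the same number.

```python
# Hypothetical 2x2 cell means of Y, keyed by (level of F1, level of F2).
additive = {("a1", "b1"): 10.0, ("a1", "b2"): 12.0,
            ("a2", "b1"): 13.0, ("a2", "b2"): 15.0}
interacting = {("a1", "b1"): 10.0, ("a1", "b2"): 12.0,
               ("a2", "b1"): 13.0, ("a2", "b2"): 19.0}


def f1_modulates_f2(means):
    """Effect of F2 at level a2 minus effect of F2 at level a1."""
    effect_f2_at_a1 = means[("a1", "b2")] - means[("a1", "b1")]
    effect_f2_at_a2 = means[("a2", "b2")] - means[("a2", "b1")]
    return effect_f2_at_a2 - effect_f2_at_a1


def f2_modulates_f1(means):
    """Effect of F1 at level b2 minus effect of F1 at level b1."""
    effect_f1_at_b1 = means[("a2", "b1")] - means[("a1", "b1")]
    effect_f1_at_b2 = means[("a2", "b2")] - means[("a1", "b2")]
    return effect_f1_at_b2 - effect_f1_at_b1


print(f1_modulates_f2(additive), f2_modulates_f1(additive))        # 0.0 0.0
print(f1_modulates_f2(interacting), f2_modulates_f1(interacting))  # 4.0 4.0
```

In the additive table the effect of F2 is the same at both levels of F1, so the contrast is zero. In the second table one extra number is needed to describe the cell means, which in a 2x2 design corresponds to the one extra degree of freedom used by F1:F2.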

8.2 Three example factorial ANOVA data sets

In this screencast I introduce three data sets to explain the different qualities of factorial ANOVA models:

  • One data set results in a so-called null-model, a model where none of the treatment combinations affect the response variable (\(H_0\)).
  • One data set that belongs to a factorial ANOVA model without interaction between the two explanatory factors F1 and F2.
  • One data set that belongs to a factorial ANOVA model with interaction between the two explanatory factors. Here we need the most complex model, also known as the maximal model, to fit the data correctly.

It is shown that the maximal model with interaction needs more degrees of freedom (is more complex) than the model without interaction, and that the model without interaction needs more degrees of freedom than the null-model.
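As a rough illustration of this ordering, the sketch below fits the null-model, the model without interaction and the maximal model to one made-up balanced 2x2 data set and reports the number of estimated mean parameters, the residual degrees of freedom and the residual sum of squares. The data, the level names and the helper `marginal_mean` are assumptions made for this example only. The shortcut used for the additive fit is valid only for balanced designs.

```python
from statistics import mean

# Hypothetical balanced 2x2 data set (three replicates per cell); not course data.
data = {
    ("a1", "b1"): [9.0, 10.0, 11.0],
    ("a1", "b2"): [11.0, 12.0, 13.0],
    ("a2", "b1"): [12.0, 13.0, 14.0],
    ("a2", "b2"): [18.0, 19.0, 20.0],
}
n_f1 = len({f1 for f1, _ in data})  # number of levels of F1
n_f2 = len({f2 for _, f2 in data})  # number of levels of F2
n_obs = sum(len(ys) for ys in data.values())

grand = mean(y for ys in data.values() for y in ys)
cell_mean = {cell: mean(ys) for cell, ys in data.items()}


def marginal_mean(position, level):
    """Mean of Y over all observations with this level of F1 (0) or F2 (1)."""
    return mean(y for cell, ys in data.items()
                if cell[position] == level for y in ys)


# For each model: (number of mean parameters, fitted value per cell).
# "row mean + column mean - grand mean" is the least-squares fit of the
# additive model only because the design is balanced.
models = {
    "Y ~ residuals": (1, lambda cell: grand),
    "Y ~ F1 + F2 + residuals": (
        1 + (n_f1 - 1) + (n_f2 - 1),
        lambda cell: marginal_mean(0, cell[0]) + marginal_mean(1, cell[1]) - grand,
    ),
    "Y ~ F1 + F2 + F1:F2 + residuals": (n_f1 * n_f2, lambda cell: cell_mean[cell]),
}

summary = {}
for name, (n_params, fitted) in models.items():
    rss = sum((y - fitted(cell)) ** 2 for cell, ys in data.items() for y in ys)
    summary[name] = {"parameters": n_params,
                     "residual_df": n_obs - n_params,
                     "rss": rss}
    print(name, summary[name])
```

Each step up in complexity spends more parameters and leaves fewer residual degrees of freedom, while the residual sum of squares can only go down or stay the same.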

8.3 Model specification and diagnostics

In this screencast the model specification for the three data sets is discussed together with the model diagnostics. The diagnostics follow the same steps as for the one-way ANOVA discussed in chapter 6, because we only need to analyze the behavior of the residuals of the different models, a property shared by all statistical models.
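To show what "the behavior of the residuals" refers to, here is a minimal numeric sketch. It is a stand-in for the diagnostic plots, not a replacement for them: it computes the residuals of the maximal model (observation minus cell mean) for a made-up 2x2 data set and compares the residual spread between cells. The data and the variance-ratio summary are assumptions made for this example; the ratio is an informal check, not a formal test.

```python
from statistics import mean, variance

# Hypothetical 2x2 data set (three replicates per cell); not course data.
data = {
    ("a1", "b1"): [9.4, 10.1, 10.8],
    ("a1", "b2"): [11.2, 12.3, 12.5],
    ("a2", "b1"): [12.6, 13.0, 13.7],
    ("a2", "b2"): [18.1, 19.2, 19.4],
}

# Residuals of the maximal model: each observation minus its cell mean.
residuals = {cell: [y - mean(ys) for y in ys] for cell, ys in data.items()}
all_residuals = [r for rs in residuals.values() for r in rs]

# The residuals of a least-squares fit with an intercept average to zero.
residual_mean = mean(all_residuals)

# Informal check on equal variances: compare residual spread between cells.
cell_variance = {cell: variance(rs) for cell, rs in residuals.items()}
variance_ratio = max(cell_variance.values()) / min(cell_variance.values())

print("mean residual:", residual_mean)
print("per-cell residual variance:", cell_variance)
print("largest / smallest variance:", variance_ratio)
```

A ratio far from 1 would suggest that the spread differs between treatment combinations. With only three replicates per cell such a ratio is very noisy, which is one reason to inspect the residuals graphically as well.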

8.4 Model selection

After it was shown in the previous screencast that the data seem to adhere to the assumptions of the model, the search for the best model for each individual data set can start. The model selection procedure results in three different models for the three different data sets.
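The text does not name the selection criteria at this point, so the following is only a sketch of two common ones, assuming they are applicable here: an F statistic for comparing nested models, and AIC for a Gaussian linear model (computed up to an additive constant). The residual sums of squares and parameter counts are hypothetical inputs for a 2x2 design with 12 observations.

```python
import math

n_obs = 12  # hypothetical number of observations

# Hypothetical fit summaries: number of mean parameters and residual sum of squares.
candidates = {
    "Y ~ residuals": {"parameters": 1, "rss": 143.0},
    "Y ~ F1 + F2 + residuals": {"parameters": 3, "rss": 20.0},
    "Y ~ F1 + F2 + F1:F2 + residuals": {"parameters": 4, "rss": 8.0},
}


def aic(rss, n_params, n):
    """Gaussian AIC up to an additive constant; +1 counts the residual variance."""
    return n * math.log(rss / n) + 2 * (n_params + 1)


def f_statistic(small, big, n):
    """F statistic for a nested comparison of a smaller against a bigger model."""
    df_extra = big["parameters"] - small["parameters"]
    df_residual_big = n - big["parameters"]
    return ((small["rss"] - big["rss"]) / df_extra) / (big["rss"] / df_residual_big)


aic_values = {name: aic(m["rss"], m["parameters"], n_obs)
              for name, m in candidates.items()}
best_by_aic = min(aic_values, key=aic_values.get)

f_interaction = f_statistic(candidates["Y ~ F1 + F2 + residuals"],
                            candidates["Y ~ F1 + F2 + F1:F2 + residuals"],
                            n_obs)

for name, value in aic_values.items():
    print(f"{name}: AIC = {value:.2f}")
print("lowest AIC:", best_by_aic)
print("F statistic for the interaction term:", f_interaction)
```

The F statistic still has to be compared with an F distribution (here with 1 and 8 degrees of freedom) to obtain a p-value. That step is left out because the Python standard library has no F distribution.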

In one of Wednesday's screencasts we will go into more detail on model selection.

8.5 Some final thoughts and visualisation

Here we will look at what the effect on the model diagnostic plots can be when the factorial ANOVA model is misspecified, e.g. when one term (or degree of freedom) is missing. Some data visualizations of factorial ANOVA are also briefly discussed.
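A small sketch of what a missing term does to the residuals, under the assumption of a balanced 2x2 design with made-up data that contain an interaction. The additive model (without F1:F2) is fitted next to the maximal model, and the mean residual per treatment combination is compared. The data and the helper `level_mean` are hypothetical.

```python
from statistics import mean

# Hypothetical balanced 2x2 data with interaction; not course data.
data = {
    ("a1", "b1"): [9.0, 10.0, 11.0],
    ("a1", "b2"): [11.0, 12.0, 13.0],
    ("a2", "b1"): [12.0, 13.0, 14.0],
    ("a2", "b2"): [18.0, 19.0, 20.0],
}
grand = mean(y for ys in data.values() for y in ys)


def level_mean(position, level):
    """Mean of Y for one level of F1 (position 0) or F2 (position 1)."""
    return mean(y for cell, ys in data.items()
                if cell[position] == level for y in ys)


def fitted_additive(cell):
    # Least-squares fit of Y ~ F1 + F2 for a balanced design (F1:F2 is missing).
    return level_mean(0, cell[0]) + level_mean(1, cell[1]) - grand


def fitted_maximal(cell):
    # Y ~ F1 + F2 + F1:F2 fits every cell by its own mean.
    return mean(data[cell])


def mean_residual_per_cell(fitted):
    return {cell: mean(y - fitted(cell) for y in ys) for cell, ys in data.items()}


misspecified = mean_residual_per_cell(fitted_additive)
correct = mean_residual_per_cell(fitted_maximal)

print("without F1:F2:", misspecified)
print("with F1:F2:   ", correct)
```

Under the misspecified model the residuals are shifted up in two cells and down in the other two. In a plot of residuals against fitted values this shows up as groups of residuals that are not centered on zero, whereas the maximal model centers every group on zero.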