How ANOVA analyzes the variance

Author

therimalaya

Published

March 29, 2021

We often use Analysis of Variance (ANOVA) to find out whether different cases result in similar outcomes and, if they differ, whether the difference is significant.

Such questions arise both when data are collected by setting up an experiment and when they are collected through sampling. This article tries to explain how ANOVA analyzes the variance, and in what situations the differences are significant, through both simulated and real data.

Consider the following model with three groups, indexed by \(i\), and \(n\) observations per group, indexed by \(j\),

\[y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \; i = 1, 2, 3 \text{ and } j = 1, 2, \ldots, n\]

Here, \(\tau_i\) is the effect corresponding to group \(i\) and \(\varepsilon_{ij} \sim \mathrm{N}(0, \sigma^2)\), the usual assumption of a linear model. To understand how ANOVA finds differences between groups, and how the group means and their standard deviations influence its results, let us consider the following four cases:

Case 1: Similar group means with high variation within the groups
Case 2: Similar group means with low variation within the groups
Case 3: Distant group means with high variation within the groups
Case 4: Distant group means with low variation within the groups
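
To see why both the group means and the within-group variation matter in these cases, it may help to recall the standard decomposition behind the one-way ANOVA F-test. The symbols \(\bar{y}_{i\cdot}\) (mean of group \(i\)) and \(\bar{y}_{\cdot\cdot}\) (overall mean) are introduced here for illustration and are not part of the model statement above.

\[\mathrm{SS}_{\text{between}} = n\sum_{i=1}^{3}\left(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}\right)^2, \qquad \mathrm{SS}_{\text{within}} = \sum_{i=1}^{3}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_{i\cdot}\right)^2\]

\[F = \frac{\mathrm{SS}_{\text{between}} / 2}{\mathrm{SS}_{\text{within}} / \left(3(n - 1)\right)}\]

Under the null hypothesis \(\tau_1 = \tau_2 = \tau_3\), this ratio follows an F-distribution with \(2\) and \(3(n-1)\) degrees of freedom. Distant group means inflate the numerator, while high variation within the groups inflates the denominator, so one would expect the largest \(F\) in Case 4 and the smallest in Case 1.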

Simulating data resembling these cases
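
The simulation code is not shown in this text, so the following is only a minimal sketch in Python of how data resembling the four cases could be generated. The group means, standard deviations, sample size, seed, and the names `CASES` and `simulate_case` are all illustrative assumptions, not the values behind the original figures.

```python
import random

# Hypothetical settings: means and standard deviations are illustrative
# assumptions chosen to mimic the four cases, not the original values.
CASES = {
    "Case 1": {"means": [48, 50, 52], "sd": 10},  # similar means, high variation
    "Case 2": {"means": [48, 50, 52], "sd": 2},   # similar means, low variation
    "Case 3": {"means": [30, 50, 70], "sd": 10},  # distant means, high variation
    "Case 4": {"means": [30, 50, 70], "sd": 2},   # distant means, low variation
}


def simulate_case(means, sd, n, rng):
    """Draw n observations per group from N(mean_i, sd^2)."""
    return [[rng.gauss(mean, sd) for _ in range(n)] for mean in means]


rng = random.Random(2021)  # fixed seed (assumed) so the sketch is reproducible
data = {
    name: simulate_case(cfg["means"], cfg["sd"], n=50, rng=rng)
    for name, cfg in CASES.items()
}

# Each entry of `data` is a list of three groups with 50 observations each.
for name, groups in data.items():
    print(name, [round(sum(g) / len(g), 1) for g in groups])
```

Each group here shares the same standard deviation within a case, which matches the model's assumption of a common \(\sigma^2\).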

Fitting an ANOVA model for each case
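
The fitting code is not shown in this text either. As a rough sketch, the one-way ANOVA F-statistic can be computed by hand from the between-group and within-group sums of squares. The function names and the two toy datasets below are hypothetical, and the p-value shortcut is a closed form that holds only for three groups (two numerator degrees of freedom); a statistics library would normally be used instead.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]
    # Variation of the group means around the overall mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def p_value_three_groups(f_stat, df_within):
    """Upper-tail p-value of F(2, df_within); valid only when df_between == 2."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


# Toy data (hypothetical): both sets have group means 2, 3 and 4,
# but the second has much more variation within each group.
low_variation = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
high_variation = [[-2, 2, 6], [-1, 3, 7], [0, 4, 8]]

for label, groups in [("low", low_variation), ("high", high_variation)]:
    f_stat, df1, df2 = one_way_anova(groups)
    p = p_value_three_groups(f_stat, df2)
    print(f"{label} variation: F({df1}, {df2}) = {f_stat:.4f}, p = {p:.4f}")
```

With identical group means, the set with more variation within the groups yields the smaller F-statistic, which is the contrast between the high-variation and low-variation cases.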

Prepare data for plotting


Discussion