When the goal is to compare more than two groups, analysis of variance (ANOVA) is a commonly used procedure. The goal of this tutorial is to provide an introduction to the simplest form of ANOVA, the One-Way ANOVA, which has a continuous outcome and one categorical variable indicating the groups to be compared.
When we have more than two groups to compare, we run into the problem of multiple comparisons. Let \(k\) be the number of independent groups we wish to compare.
There are two goals when comparing more than two groups. First, we want to evaluate the omnibus test of whether there is systematic variability among the population means. If there is evidence of such variability, the second goal is to determine which means are different.
This tutorial focuses on the first goal, and the tutorial on multiple comparisons addresses the second.
We used the t test to compare 2 groups.
How can we compare more than 2 groups?
Why not just conduct multiple t tests? Because each test carries its own risk of a Type I error, running several of them inflates the familywise error rate, the probability of at least one false rejection across the whole set of comparisons.
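To see the size of the problem, here is a small illustrative calculation (my own sketch, not part of the original example). If we treat the pairwise t tests as independent, which is only an approximation, the probability of at least one Type I error across \(m\) comparisons is \(1 - (1 - \alpha)^m\):

groups <- 3:6                         # number of groups to compare
m      <- choose(groups, 2)           # number of possible pairwise t tests
alpha  <- .05
data.frame(groups,
           comparisons = m,
           familywise_error = round(1 - (1 - alpha)^m, 3))

With only 4 groups we are already looking at a familywise error rate of roughly .26 rather than .05.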
The difference between ANOVA and the t test is primarily related to the independent variable (IV). While a t test requires an IV with only two levels, ANOVA allows us to use IVs with more than two levels.
We can also test multiple IVs and their combinations (more on this later).
So, in short, ANOVA extends the t test to designs with more than two groups. Like the t test, it rests on several assumptions:
Normality - We assume a normal population distribution for the dependent variable for each group. ANOVA is generally robust to violations of normality.
Homogeneity of Variances - We assume that the population variance is the same for each group. ANOVA is generally robust to mild violations of this assumption.
Statistical Independence - We assume that the scores are independent of each other. ANOVA is NOT robust to violations of independence. So, if this assumption is questionable, your results could be very misleading.
Random Sampling - We assume that individuals in each group were randomly sampled from their respective populations. ANOVA is also NOT very robust to violations of this assumption. If your sample does not represent your population of interest, you must be very careful with the inferences you make.
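As a quick illustration of how the normality assumption might be examined (this check is not part of the original example, and the names dat, y, and g below are made up), one common approach is to look at the residuals from the fitted model:

# minimal sketch with simulated data; in practice substitute your own data frame
set.seed(1)
dat <- data.frame(y = rnorm(30), g = factor(rep(1:3, each = 10)))

fit <- aov(y ~ g, data = dat)
qqnorm(residuals(fit))        # points near a straight line suggest approximate normality
qqline(residuals(fit))
shapiro.test(residuals(fit))  # formal test; it has little power in small samples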
The homogeneity of variances assumption can be checked with the leveneTest() function from the car package in R (demonstrated below).

The omnibus hypotheses for the one-way ANOVA can be stated in terms of the variability among the population means:

\[ H_0: \sigma^2_{\mu} = 0. \]

\[ H_a: \sigma^2_\mu > 0. \]

Equivalently, the null hypothesis says that all of the population means are equal.
Symbol | Description |
---|---|
\(k\) | number of groups |
\(n\) | number in each group |
\(N\) | total number of participants (\(N = n \times k\)) |
\(df_{BG}\) | between group degrees of freedom |
\(df_{E}\) | error or within-group degrees of freedom |
\(SS_{BG}\) | sum of squares between group |
\(SS_E\) | sum of squares error |
\(SS_T\) | sum of squares total |
\(MS_{BG}\) | mean square between group |
\(MS_E\) | mean square error |
We are interested in comparing how well different types of schools prepare students for college. We have sampled 3 students from a public school, 3 from a parochial school, and 3 from a charter school. We will compare these three groups on the students' high school GPA. Note that this sample is far too small for a real study, but it will allow us to make simple calculations before moving to larger data sets.
# small offsets used below to jitter the points along the x axis
jits <- c(-.2, 0, .2)

achieve <- data.frame(
  gpa = c(2.5, 3.0, 3.5,
          3.0, 3.5, 3.3,
          3.6, 3.3, 4.0),
  schtype = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   labels = c("public", "parochial", "charter"))
)

achieve <- transform(achieve,
  jst = as.numeric(schtype) + rep(jits, 3),   # jittered x position for plotting
  grandMean = mean(gpa),                      # overall mean of gpa
  groupMean = ave(gpa, schtype, FUN = mean)   # mean of gpa within each school type
)
# achieve <- achieve %>%
# mutate(jst = as.numeric(schtype) + rep(jits, 3),
# grandMean = mean(gpa)) %>%
# group_by(schtype) %>%
# mutate(groupMean = mean(gpa)) %>%
# ungroup
N <- nrow(achieve)                    # total number of observations
k <- length(levels(achieve$schtype))  # number of groups
n <- N/k                              # number in each group (balanced design)
kable(achieve[, 1:2])
gpa | schtype |
---|---|
2.5 | public |
3.0 | public |
3.5 | public |
3.0 | parochial |
3.5 | parochial |
3.3 | parochial |
3.6 | charter |
3.3 | charter |
4.0 | charter |
describe(achieve[, -(2:3)], fast = TRUE)
vars n mean sd min max range se
gpa 1 9 3.3 0.43 2.5 4.00 1.50 0.14
grandMean 2 9 3.3 0.00 3.3 3.30 0.00 0.00
groupMean 3 9 3.3 0.28 3.0 3.63 0.63 0.09
var(achieve$gpa)
[1] 0.185
\[ n = 3 \]
\[ k = 3 \]
\[ df_{G} = k - 1 = 2 \]
\[ df_E = df_W = N - k = 6 \]
\[ df_T = N - 1 = 8 \]
ggplot(achieve, aes(x = schtype, gpa, color = schtype)) + geom_point()
Here I jitter the data (add a little noise along the x axis) to make it easier to see what is going on in the plots below.
d <- ggplot(achieve, aes(x = jst, y = gpa, color = schtype)) + geom_point() +
  scale_x_continuous(name = "School Type",
                     breaks = 1:3,
                     labels = c("Public", "Parochial", "Charter"))
d
dgnd <- d + geom_hline(aes(yintercept = grandMean))
dgnd
dgnd <- dgnd + geom_segment(aes(x = jst, xend = jst,
y = grandMean, yend = gpa ))
dgnd
attach(achieve)
(SST <- sum((gpa - grandMean)^2))
[1] 1.48
(dfT <- N-1 )
[1] 8
(Sigma2 <- SST/dfT)
[1] 0.185
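As a quick sanity check (not in the original write-up), SST divided by its degrees of freedom is just the ordinary sample variance of gpa reported earlier:

# SST / (N - 1) reproduces var(achieve$gpa) = 0.185
all.equal(Sigma2, var(achieve$gpa))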
dgrp <- d + geom_hline(aes(yintercept = groupMean, color = schtype))
dgrp
dgrp <- dgrp + geom_segment(aes(x = jst, xend = jst,
y = groupMean, yend = gpa))
dgrp
(SSW <- sum((gpa - groupMean)^2))
[1] 0.8733333
(dfW <- N - k)
[1] 6
(MSW <- SSW/dfW)
[1] 0.1455556
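Another way to see the mean square within (my own aside, not in the original): because the groups are the same size, MSW is simply the average of the three group variances.

# group-by-group variances; with balanced groups MSW is their simple average
(groupVars <- tapply(achieve$gpa, achieve$schtype, var))
mean(groupVars)   # should match MSW = 0.1455556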
gdat <- data.frame(
  groupMean = unique(achieve$groupMean),
  grandMean = 3.3,
  schtype = factor(c("public", "parochial", "charter"),
                   levels = c("public", "parochial", "charter"))  # keep the same ordering (and colors) as in the plots above
)
ggplot(gdat, aes(x = schtype, y = groupMean, color = schtype)) +
geom_hline(aes(yintercept = groupMean, color = schtype)) +
geom_hline(aes(yintercept = grandMean), color = "black") +
geom_segment(aes(x = schtype, xend = schtype,
y = groupMean, yend = grandMean)) +
ylim(2.5, 4)
(SSG <- sum((groupMean - grandMean)^2))
[1] 0.6066667
(dfG <- k - 1)
[1] 2
(MSG <- SSG/dfG)
[1] 0.3033333
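Note that because groupMean is repeated for every student in a group, the sum above already includes the multiplication by \(n\); it is the same as \(n\) times the sum of squared deviations of the three group means from the grand mean. A quick check (not shown in the original) also confirms that the between and within pieces partition the total sum of squares:

# n times the squared deviations of the three group means from the grand mean
n * sum((unique(groupMean) - mean(gpa))^2)   # same value as SSG

# the sums of squares partition the total: SST = SSG + SSW
all.equal(SST, SSG + SSW)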
\[ F_{obt} = \frac{\text{variance between groups}}{\text{variance within groups}} = \frac{MS_{BG}}{MS_{W}} \]
(F <- MSG/MSW)
[1] 2.083969
(1 - pf(F, dfG, dfW))
[1] 0.2054731
(1 - pf(2.084, 2, 6))
[1] 0.2054694
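For reference (this step is not in the original), the critical value that \(F_{obt}\) would need to exceed at \(\alpha = .05\) can be found with qf():

# critical F at alpha = .05 with 2 and 6 degrees of freedom (about 5.14);
# F_obt = 2.08 falls well below it, consistent with p = .205
qf(.95, df1 = dfG, df2 = dfW)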
We will fit the model with the aov() function, but first we check the homogeneity of variance assumption with Levene's test. This test is often underpowered, so don't take it too seriously. In the current context I certainly do not think it gives us any information (N = 9), but you should know how to do it.
leveneTest(gpa ~ schtype, data = achieve)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 0.4222 0.6737
6
Recall that the null hypothesis is that the population variances are equal, so we fail to reject the null, which supports the assumption being reasonable. But, once again, don't take this test too seriously.
To test the research hypothesis, conduct the one-way ANOVA:
mod <- aov(gpa ~ schtype, data = achieve)
summary(mod)
Df Sum Sq Mean Sq F value Pr(>F)
schtype 2 0.6067 0.3033 2.084 0.205
Residuals 6 0.8733 0.1456
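As a side check (not part of the original write-up), the same F test can be reproduced with base R's oneway.test() when the equal-variance assumption is imposed:

# with var.equal = TRUE this is the classic one-way ANOVA F test and should
# reproduce F = 2.084 and p = 0.205 from the aov() fit above
oneway.test(gpa ~ schtype, data = achieve, var.equal = TRUE)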
The p-value suggests that group means varying at least as much as these would be expected about 20% of the time if the three groups were randomly sampled from the same population. That is not typically rare enough to reject the null hypothesis (e.g., if we mindlessly set \(\alpha\) to .05). Said differently, the p-value is too large to reject the null hypothesis that \(\sigma^2_\mu = 0\).