First, I list some issues to think about with longitudinal modeling. With most longitudinal data, we have time points or occasions nested within individuals. So, unlike previous data we look at, individuals are now level 2 instead of level 1.
Time can be measured at fixed intervals, in which every individual is measured at the same intervals and or times. Alternatively, individuals can be measured at varying times. In this post we will look at longitudinal data with fixed occasions..
Recall from the text that there are advantages to collecting longitudinal data over cross-sectional data. These include:
There are a number of advantages to using multilevel models to model longitudinal data. These include:
But, the measures must be calibrated across time points.
For this example, we are using data from chapter 5 of Hox, Moerbeek, and van de Schoot (2018). The data contain the college GPA of 200 college students collected over six occasions. A time varying covariate, the job status of each student, was also collected at the six occasions. Two time invariant variables, high school GPA and the students gender were also collected. There is another outcome variable named admitted
, that is dichotomous and indicates whether the student was admitted to the university of their choice. We will not use this last variable in this post.
The purpose of this example study is to study the impact of having a job on college GPA. But, we will want to control for gender and high school GPA in our analyses.
First, we need to load the correct data. There are at least two ways we can structure Load all packages to be used. Note that the data for this project are in SPSS format, so you my need to install the haven
package which allow for importing SPSS, Stata, and SAS data files into R. So, you may need to run install.packages('haven')
before running the code chunk below.
# install the haven package which allows importing SPSS data, among other
# formats (Stata, SAS)
library(haven)
library(lme4)
library(lmerTest)
library(ggplot2)
library(texreg)
library(performance)
library(interactions)
Import the gpa2long.sav
file from the data directory of the longidtudinal_pgaCurran project.
gpa2long <- read_sav("data/gpa2long.sav")
Note that the variables in this data have preserved the SPSS labeling, which can be used to determine the factor labels. To see this use the str()
function on the data frame.
The data is in good shape, all we need to do is convert the categorical variables to factors. Here, I leave occasion as numeric, to reflect the assumption that this variable measure evenly spaced time points, that start with zero at the first time point.
gpa2long <- within(gpa2long, {
student <- factor(student)
gender <- factor(sex, labels = c("male", "female"))
admitted <- factor(admitted, labels = c("no", "yes"))
})
This first plot is a spaghetti plot, which just plots the outcome at each occasion with the points for each individual connected with a line. Its hard to see what is going on with each individual, but we can see a general trend in increases in GPA across occasion.
ggplot(gpa2long, aes(y = gpa, x = occas, group = student)) +
geom_line()
Another way to look at the general trend is to get a smoothed line summarizing the trend across individuals.
ggplot(gpa2long, aes(y = gpa, x = occas)) +
geom_point() + geom_smooth(method = "loess")
The loess smoothing algorithm allows for a local average curve to be fit to the data, which would allow for a curvy line. The fact that the line looks very straight suggest a linear relation between gpa across occasions. This plot also demonstrate the fixed nature of the occasions, as they line up in descrete columns.
We can also look at the linear relation across occasions for each individual.
ggplot(gpa2long, aes(y = gpa, x = occas, group = student)) +
geom_smooth(method = "lm", se = FALSE)
Here we see that descriptively our sample demonstrates differences in the estimated slopes across individuals. There even seem to be some that are negative.
We can look at the mean GPA across time in a few ways:
boxplot(gpa ~ occas, gpa2long)
This plot actually shows the medians and other quartiles, but we see the upward trend in GPA.
Next, I calculate the mean GPA for each occasion, and print and plot them.
gpameans_occas <- aggregate(gpa ~ occas, gpa2long, mean)
gpameans_occas
occas gpa
1 0 2.5935
2 1 2.7155
3 2 2.8100
4 3 2.9180
5 4 3.0190
6 5 3.1340
plot(gpa ~ occas, gpameans_occas)
Now we are ready to run a series of model of increasing complexity.
mod0 <- lmer(gpa ~ 1 + (1 | student), gpa2long, REML = FALSE)
modtime <- lmer(gpa ~ occas + (1 | student), gpa2long, REML = FALSE)
modtjob <- lmer(gpa ~ occas + job + (1 | student), gpa2long, REML = FALSE)
modfull <- lmer(gpa ~ occas + job + highgpa + gender + (1 | student), gpa2long,
REML = FALSE)
summary(modfull)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: gpa ~ occas + job + highgpa + gender + (1 | student)
Data: gpa2long
AIC BIC logLik deviance df.resid
296.8 332.4 -141.4 282.8 1193
Scaled residuals:
Min 1Q Median 3Q Max
-2.92840 -0.59427 -0.02739 0.63407 2.96127
Random effects:
Groups Name Variance Std.Dev.
student (Intercept) 0.04497 0.2121
Residual 0.05514 0.2348
Number of obs: 1200, groups: student, 200
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.64147 0.09752 289.77132 27.086 < 2e-16 ***
occas 0.10245 0.00399 994.90795 25.679 < 2e-16 ***
job -0.17221 0.01806 1100.57001 -9.534 < 2e-16 ***
highgpa 0.08470 0.02776 194.17579 3.051 0.0026 **
genderfemale 0.14725 0.03305 194.13594 4.455 1.42e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) occas job highgp
occas -0.145
job -0.430 0.102
highgpa -0.875 0.003 0.029
genderfemal -0.251 0.003 0.027 0.073
htmlreg(list(mod0, modtime, modtjob, modfull), digits = 3,
custom.model.names = c("$M_1$: null model", "$M_2$:+occas.",
"$M_3$:+jobsat", "$M_4$:+gpa, gender"))
\(M_1\): null model | \(M_2\):+occas. | \(M_3\):+jobsat | \(M_4\):+gpa, gender | |
---|---|---|---|---|
(Intercept) | 2.865*** | 2.599*** | 2.970*** | 2.641*** |
(0.019) | (0.022) | (0.044) | (0.098) | |
occas | 0.106*** | 0.102*** | 0.102*** | |
(0.004) | (0.004) | (0.004) | ||
job | -0.171*** | -0.172*** | ||
(0.018) | (0.018) | |||
highgpa | 0.085** | |||
(0.028) | ||||
genderfemale | 0.147*** | |||
(0.033) | ||||
AIC | 919.456 | 401.649 | 318.399 | 296.760 |
BIC | 934.726 | 422.009 | 343.849 | 332.390 |
Log Likelihood | -456.728 | -196.825 | -154.200 | -141.380 |
Num. obs. | 1200 | 1200 | 1200 | 1200 |
Num. groups: student | 200 | 200 | 200 | 200 |
Var: student (Intercept) | 0.057 | 0.063 | 0.052 | 0.045 |
Var: Residual | 0.098 | 0.058 | 0.055 | 0.055 |
p < 0.001; p < 0.01; p < 0.05 |
modtimepoly <- lmer(gpa ~ poly(occas, 2) + (1 | student), gpa2long, REML = FALSE)
summary(modtimepoly)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: gpa ~ poly(occas, 2) + (1 | student)
Data: gpa2long
AIC BIC logLik deviance df.resid
403.6 429.1 -196.8 393.6 1195
Scaled residuals:
Min 1Q Median 3Q Max
-3.6165 -0.6382 -0.0012 0.6389 2.8352
Random effects:
Groups Name Variance Std.Dev.
student (Intercept) 0.06336 0.2517
Residual 0.05803 0.2409
Number of obs: 1200, groups: student, 200
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.86500 0.01911 200.00000 149.927 <2e-16 ***
poly(occas, 2)1 6.28964 0.24089 1000.00000 26.109 <2e-16 ***
poly(occas, 2)2 -0.01389 0.24089 1000.00000 -0.058 0.954
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) p(,2)1
ply(ccs,2)1 0.000
ply(ccs,2)2 0.000 0.000
modfullrs <- lmer(gpa ~ occas + job + I(highgpa - mean(highgpa)) + gender + (occas |student),
gpa2long, REML = FALSE)
summary(modfullrs)
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: gpa ~ occas + job + I(highgpa - mean(highgpa)) + gender + (occas |
student)
Data: gpa2long
AIC BIC logLik deviance df.resid
188.1 233.9 -85.1 170.1 1191
Scaled residuals:
Min 1Q Median 3Q Max
-3.0175 -0.5379 -0.0002 0.5431 3.3656
Random effects:
Groups Name Variance Std.Dev. Corr
student (Intercept) 0.038233 0.19553
occas 0.003837 0.06194 -0.21
Residual 0.041542 0.20382
Number of obs: 1200, groups: student, 200
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 2.822e+00 4.474e-02 1.034e+03 63.077 < 2e-16
occas 1.034e-01 5.586e-03 1.996e+02 18.506 < 2e-16
job -1.311e-01 1.726e-02 1.038e+03 -7.595 6.84e-14
I(highgpa - mean(highgpa)) 8.854e-02 2.628e-02 1.976e+02 3.369 0.000907
genderfemale 1.157e-01 3.130e-02 1.978e+02 3.696 0.000284
(Intercept) ***
occas ***
job ***
I(highgpa - mean(highgpa)) ***
genderfemale ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) occas job I(-m()
occas -0.227
job -0.846 0.069
I(-mn(hgh)) -0.045 0.002 0.022
genderfemal -0.392 0.002 0.030 0.073
htmlreg(list(modtime, modfullrs),
custom.model.names = c("$M_5$:+occas rand",
"$M_6$:+cross-level int."))
\(M_5\):+occas rand | \(M_6\):+cross-level int. | |
---|---|---|
(Intercept) | 2.60*** | 2.82*** |
(0.02) | (0.04) | |
occas | 0.11*** | 0.10*** |
(0.00) | (0.01) | |
job | -0.13*** | |
(0.02) | ||
highgpa - mean(highgpa) | 0.09*** | |
(0.03) | ||
genderfemale | 0.12*** | |
(0.03) | ||
AIC | 401.65 | 188.12 |
BIC | 422.01 | 233.93 |
Log Likelihood | -196.82 | -85.06 |
Num. obs. | 1200 | 1200 |
Num. groups: student | 200 | 200 |
Var: student (Intercept) | 0.06 | 0.04 |
Var: Residual | 0.06 | 0.04 |
Var: student occas | 0.00 | |
Cov: student (Intercept) occas | -0.00 | |
p < 0.001; p < 0.01; p < 0.05 |
ranova(modfullrs)
ANOVA-like table for random-effects: Single term deletions
Model:
gpa ~ occas + job + I(highgpa - mean(highgpa)) + gender + (occas |
student)
npar logLik AIC LRT Df Pr(>Chisq)
<none> 9 -85.059 188.12
occas in (occas | student) 7 -141.380 296.76 112.64 2 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(modfull, modfullrs) # equivalent
Data: gpa2long
Models:
modfull: gpa ~ occas + job + highgpa + gender + (1 | student)
modfullrs: gpa ~ occas + job + I(highgpa - mean(highgpa)) + gender + (occas |
modfullrs: student)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
modfull 7 296.76 332.39 -141.380 282.76
modfullrs 9 188.12 233.93 -85.059 170.12 112.64 2 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
modfullrsx <- lmer(gpa ~ occas + job + I(highgpa - mean(highgpa)) + gender +
gender:occas +
(occas |student),
gpa2long, REML = FALSE)
htmlreg(list(modfull, modfullrs, modfullrsx), digits = 3)
Model 1 | Model 2 | Model 3 | |
---|---|---|---|
(Intercept) | 2.641*** | 2.822*** | 2.845*** |
(0.098) | (0.045) | (0.045) | |
occas | 0.102*** | 0.103*** | 0.088*** |
(0.004) | (0.006) | (0.008) | |
job | -0.172*** | -0.131*** | -0.132*** |
(0.018) | (0.017) | (0.017) | |
highgpa | 0.085** | ||
(0.028) | |||
genderfemale | 0.147*** | 0.116*** | 0.076* |
(0.033) | (0.031) | (0.035) | |
highgpa - mean(highgpa) | 0.089*** | 0.089*** | |
(0.026) | (0.026) | ||
occas:genderfemale | 0.030** | ||
(0.011) | |||
AIC | 296.760 | 188.117 | 182.971 |
BIC | 332.390 | 233.928 | 233.872 |
Log Likelihood | -141.380 | -85.059 | -81.486 |
Num. obs. | 1200 | 1200 | 1200 |
Num. groups: student | 200 | 200 | 200 |
Var: student (Intercept) | 0.045 | 0.038 | 0.038 |
Var: Residual | 0.055 | 0.042 | 0.042 |
Var: student occas | 0.004 | 0.004 | |
Cov: student (Intercept) occas | -0.002 | -0.002 | |
p < 0.001; p < 0.01; p < 0.05 |
VarCorr(modfullrsx)
Groups Name Std.Dev. Corr
student (Intercept) 0.194449
occas 0.060118 -0.188
Residual 0.203851
interact_plot(modfullrsx, occas, gender)
sim_slopes(modfullrsx, occas, gender)
Warning: Johnson-Neyman intervals are not available for factor moderators.
SIMPLE SLOPES ANALYSIS
Slope of occas when gender = female:
Est. S.E. t val. p
------ ------ -------- ------
0.12 0.01 15.53 0.00
Slope of occas when gender = male:
Est. S.E. t val. p
------ ------ -------- ------
0.09 0.01 11.05 0.00
# create a smaller data set by sampling individuals from the full data:
n <- 12 # level 2 sample size. You can change this to play around with n.
# Index of sampled students
set.seed(20200702) # set the random seed for reproducibility
sampidx <- as.numeric(sample(unique(gpa2long$student), replace = FALSE, size = n))
# Create smaller sample data
gpa2long12 <- gpa2long[gpa2long$student %in% sampidx, ]
Hox, J.J., Moerbeek, M. and Van de Schoot, R. (2018), Multilevel Analysis: Techniques and applications, Sage, New York, NY.