I will use the following packages in this tutorial:
library(psych)
library(car)
library(arm)
library(texreg)
After building an intuitive understanding of simple regression we will now expand our model to include more than one predictor. But keep this as simple as possible, I will focus mostly on models with two predictors.
First, let’s expand the simple regression equation to accommodate multiple predictors. Equation 1 below is the simple regression equation for the sample.
\[ Y_i = a + b X_i + e_i \tag{1} \] Here \(Y_i\) is the value on the outcome variable for the \(i\)th subject, while \(X_i\) is the value of the predictor variable for the same subject. The estimated parameters \(a\) and \(b\) are the intercept and slope for \(X\), which are common for the whole sample (e.g. we only estimate one intercept and one slope that is constant across all individual). The \(e_i\) is the \(i\)th subjects residual, or the difference between their model predicted score and their observed score.
To expand this we need to add more predictors and be able to distinguish them.
\[ Y_i = a + b_1 X_{1i} + b_2 X_{2i} + ...+ b_k X_{ki} + e_i \tag{2} \] This is done, as in equation 2, by adding numeric subscripts to the slope coefficients, and to the predictors. Note that equation 2 is generic and can accommodate any number of predictors (\(k\)).
Recall from simple regression that the squared correlation between the predictor and outcome, \(r^2\), is the proportion of the variance in the outcome accounted for by the predictor, and that \(1-r^2\), is the proportion of variance in the outcome not accounted for by the predictor. With more than one predictor is used we will calculate the squared multiple correlation = \(R^2\). For two predictors, \(X_1\) and \(X_2\) predicting \(Y\), we might write this as \[R^2_{y.12}\], where, following Pedhazur (1997), I drop the \(X\)’s to make the notation more compact. Similarly, the correlation between \(X_1\) and \(X_2\) is represented as \(r_{12}\), and the squared correlation between these two predictors is \(r^2_{12}\)
What is meant by squared multiple correlation above? It is similar to the squared correlation in simple regression, but instead of being the proportion of variance in the outcome accounted for by the predictor, it is the proportion of variance in the outcome accounted for by the model as a whole, or the proportion of variance accounted for by all the predictors together. If all the predictors are independent (i.e. uncorrelated) then the squared multiple correlation, also simply called the \(R^2\) (R squared), is the sum of the squared correlations between each of the predictors and the outcome:
\[ R^2_{y.12} = r^2_{y1} + r^2_{y2} \quad (\text{when } r_{12} = 0) \] See the left diagram in Figure 5.1 below. Note there is no overlap between \(X_1\) and \(X_2\) in that diagram, suggesting the two predictors are uncorrelated, but both are related to the outcome \(Y\). The overlap between each predictor and \(Y\) represents the variance in the outcome accounted for the that predictor.
When the two predictors are not independent (i.e. they are correlated) the equation above does not hold, because there is redundant information, provided by the two predictors. This is depicted in the right diagram in Figure 5.1. Note that a portion of the variance in \(Y\) accounted for by the two predictors overlaps both \(X_1\) and \(X_2\). We have to remove the part of the \(Y\) circle that is covered by both predictors, because it is counted twice, once in each of the sections overlapping each predictor and the outcome.
I will simulate both scenarios to demonstrate this. First, I will simulate the situation where the two predictors are not correlated:
n <- 10000
set.seed(1234)
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- .5*x1 + .5*x2 + rnorm(n)
Then I will run two simple regressions and one multiple regressions. Here we will focus on the \(R^2\) values. Recall that the \(R^2\) values for the simple regression are equivalent to the squared correlation between the single predictor and the outcome.
round(cor(cbind(x1, x2, y)),2)
x1 x2 y
x1 1.00 0.01 0.40
x2 0.01 1.00 0.42
y 0.40 0.42 1.00
mody.1 <- lm(y ~ x1)
mody.2 <- lm(y ~ x2)
mody.12 <- lm(y ~ x1 + x2)
htmlreg(list(mody.1, mody.2, mody.12),
custom.model.names = c("y ~ x1", "y ~ x2", "y ~ x1 + x2"),
caption = "Table 1: Uncorrelated Predictors")
y ~ x1 | y ~ x2 | y ~ x1 + x2 | |
---|---|---|---|
(Intercept) | 0.00 | 0.01 | 0.00 |
(0.01) | (0.01) | (0.01) | |
x1 | 0.49*** | 0.49*** | |
(0.01) | (0.01) | ||
x2 | 0.50*** | 0.50*** | |
(0.01) | (0.01) | ||
R2 | 0.16 | 0.17 | 0.33 |
Adj. R2 | 0.16 | 0.17 | 0.33 |
Num. obs. | 10000 | 10000 | 10000 |
p < 0.001; p < 0.01; p < 0.05 |
The Multiple \(R^2\) in the third column, \(R^2 = .33\) is equal to the sum of the \(R^2\) values of the simple regressions \(.33 - .16 + .17\).
Next, I simulate data where the predictors are correlated.
n <- 10000
set.seed(1234)
x1 <- rnorm(n)
x2 <- .5*x1 + rnorm(n)
y <- .5*x1 + .5*x2 + rnorm(n)
Then, I run the same models as above.
cor(cbind(x1, x2, y))
x1 x2 y
x1 1.0000000 0.4483634 0.5474017
x2 0.4483634 1.0000000 0.5812799
y 0.5474017 0.5812799 1.0000000
mody.1 <- lm(y ~ x1)
mody.2 <- lm(y ~ x2)
mody.12 <- lm(y ~ x1 + x2)
htmlreg(list(mody.1, mody.2, mody.12),
custom.model.names = c("y ~ x1", "y ~ x2", "y ~ x1 + x2"),
caption = "Table 2: Correlated Predictors")
y ~ x1 | y ~ x2 | y ~ x1 + x2 | |
---|---|---|---|
(Intercept) | 0.00 | 0.01 | 0.00 |
(0.01) | (0.01) | (0.01) | |
x1 | 0.74*** | 0.49*** | |
(0.01) | (0.01) | ||
x2 | 0.69*** | 0.50*** | |
(0.01) | (0.01) | ||
R2 | 0.30 | 0.34 | 0.44 |
Adj. R2 | 0.30 | 0.34 | 0.44 |
Num. obs. | 10000 | 10000 | 10000 |
p < 0.001; p < 0.01; p < 0.05 |
Here it is clear that the multiple R squard in the third column is NOT the sum of the squared correlations of the other two models \(.44 \ne .30 + .34\).
We will keep these insights in mind as we work through an example of multiple regression.
Next, I will build on what we covered related to simple regression by introducing multiple regression with an example. I will use data from Pedhazur (1997), chapter 5, table 1 to model the use of verbal aptitude and achievement motivation to predict reading achievement.
Below is R code to create a data frame from the data in the book.
dat51 <- data.frame(
readAch = c(2, 4, 4, 1, 5, 4, 7, 9, 7, 8, 5, 2, 8, 6, 10, 9, 3, 6, 7, 10),
verbalApt = c(1, 2, 1, 1, 3, 4, 5, 5, 7, 6, 4, 3, 6, 6, 8, 9, 2, 6, 4, 4),
achMot = c(3, 5, 3, 4, 6, 5, 6, 7, 8, 4, 3, 4, 6, 7, 7, 6, 6, 5, 6, 9)
)
Let’s look at the data. We can look at the distribution of each variable.
hist(dat51$readAch, main = "Reading Achievement")
hist(dat51$verbalApt, main = "Verbal Aptitude")
hist(dat51$achMot, main = "Achievement Motivation")
This is a very small data set, so not much going on here, but we do not see any major values falling far away from the others.
Next, let’s look at the relation between the variables.
pairs(dat51)
We see a stronger relation between reading achievement and verbal aptitude than between reading achievement and achievement motivation, as evidenced by the tighter pattern of points around an imagined straight line describing the linear correlation between variables. We also see a positive relation between our two predictors, students with higher verbal aptitude tend to also have higher achievement motivation.
We can estimate the linear relation between the three variables,
round(cor(dat51),2)
readAch verbalApt achMot
readAch 1.00 0.79 0.68
verbalApt 0.79 1.00 0.52
achMot 0.68 0.52 1.00
Now that we have looked at all the data, we can feel better about looking at descriptive statistics.
describe(dat51, fast = TRUE)
vars n mean sd min max range se
readAch 1 20 5.85 2.72 1 10 9 0.61
verbalApt 2 20 4.35 2.32 1 9 8 0.52
achMot 3 20 5.50 1.67 3 9 6 0.37
We see that all three variables are on similar scales, with range from 1 (or very close) to about 10, and the standard deviations are also very similar. This is useful to know when interpreting models.
I will, again conduct two simple regressions before the multiple regression. First, we look at how well verbal aptitude predicts reading achievement.
mody.1 <- lm(readAch ~ verbalApt, data = dat51)
display(mody.1)
lm(formula = readAch ~ verbalApt, data = dat51)
coef.est coef.se
(Intercept) 1.82 0.83
verbalApt 0.93 0.17
---
n = 20, k = 2
residual sd = 1.71, R-Squared = 0.63
We see that, according to this simple model, people who differ by one point in verbal aptitude would be predicted to differ a little less than one point (\(b = .93\)) on reading achievement on average, in the same direction.
Next we look at the relation between achievement motivation and reading achievement.
mody.2 <- lm(readAch ~ achMot, data = dat51)
display(mody.2)
lm(formula = readAch ~ achMot, data = dat51)
coef.est coef.se
(Intercept) -0.22 1.62
achMot 1.10 0.28
---
n = 20, k = 2
residual sd = 2.05, R-Squared = 0.46
This model suggests that two people that differ in achievement motivation by one point would be predicted to differ by a little over 1 point (\(b = 1.10\)) in reading achievement.
So, the two predictors have similar relations to the outcome.
But we also know that verbal aptitude and achievement motivation are correlated. We might want to know what the relation between verbal aptitude and reading achievement would be if we compare two people with the same level of achievement motivation. Or, we might want to know what is the relation between achievement motivation and reading achievement for two people with the same verbal aptitude. We can estimate those effects with multiple regression.
All we need to do to run a multiple regression in R is add the additional predictors to the right side of the equation.
mody.12 <- lm(readAch ~ verbalApt + achMot, data = dat51)
display(mody.12)
lm(formula = readAch ~ verbalApt + achMot, data = dat51)
coef.est coef.se
(Intercept) -0.47 1.19
verbalApt 0.70 0.18
achMot 0.59 0.24
---
n = 20, k = 3
residual sd = 1.51, R-Squared = 0.72
I put all three models into Table 3, in a format that is common in the literature. Notice the slope coefficients are different for the multiple regression model compared to the two simple regression models.
That’s because they are answering different questions. The coefficient for verbal aptitude in Model 1 of Table 3 answers the questions “what is the relation between verbal aptitude and reading achievement ignoring any other differences between individuals?”, while the verbal aptitude coefficient in the third model answers the questions “what is the relation between verbal aptitude and reading achievement for individuals with the same value on achievement motivation?” The respective achievement motivation coefficients have a similar interpretation.
htmlreg(list(mody.1, mody.2, mody.12), caption = "Table 3: Predicting Reading Achievement")
Model 1 | Model 2 | Model 3 | |
---|---|---|---|
(Intercept) | 1.82* | -0.22 | -0.47 |
(0.83) | (1.62) | (1.19) | |
verbalApt | 0.93*** | 0.70*** | |
(0.17) | (0.18) | ||
achMot | 1.10** | 0.59* | |
(0.28) | (0.24) | ||
R2 | 0.63 | 0.46 | 0.72 |
Adj. R2 | 0.61 | 0.43 | 0.69 |
Num. obs. | 20 | 20 | 20 |
p < 0.001; p < 0.01; p < 0.05 |