lexdec I: LMERknitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
library(languageR)
library(lme4)
library(lmerTest)
library(dplyr)
library(tidyr)
library(ggplot2)
library(tibble)
library(performance)
library(see)
library(emmeans)
library(knitr)
set.seed(1234)
theme_set(theme_minimal(base_size = 12))
In my recent paper, Du., B. Pfiffner., A. (In Revision) The co-variance of acoustic cues and lip gestures in the production of the Mandarin sibilant contrast. Journal of Laboratory Phonology, we went through LMER, GLMER (which is not included in the final paper), and GAMM. I think it’s time to summarize this process as a note to my future self, also for my fellow PhD students who might be struggling with any modeling. Therefore, this is the first of this series of stats modeling tutorial, which I hope is helpful for anyone who comes across this.
This tutorial fits a linear mixed-effects model with
lmer() using the built-in lexdec data from the
languageR package. The outcome variable is RT,
the log-transformed reaction time. The factor of our interest is
Correct, which compares correct and incorrect lexical
decision responses.
For any LMER modeling, there is this generic templatic workflow that you can adapt. In my previous years of running LMER since 2021, I have not seen too much deviation from these steps.
performance::check_model().For step 5 in this tutorial, we pretend that the reduced model is nicely fitted. But in real life, this is often the most iterative part.
data("lexdec", package = "languageR")
# I use df_print: paged, so there should no worries about printing gigantic df object.
lexdec
lexdec_lmer <- lexdec %>%
filter(
!is.na(RT),
!is.na(Correct),
!is.na(PrevType),
!is.na(Trial),
!is.na(Frequency),
!is.na(Length),
!is.na(Subject),
!is.na(Word)
) %>%
mutate(
Subject = factor(Subject),
Word = factor(Word),
Correct = factor(Correct, levels = c("correct", "incorrect")),
PrevType = factor(PrevType, levels = c("nonword", "word")),
Trial_z = as.numeric(scale(Trial)),
Frequency_z = as.numeric(scale(Frequency)),
Length_z = as.numeric(scale(Length))
) %>%
droplevels()
lexdec_lmer
For fixed-effect tests with factors, be explicit about contrasts. I
use sum coding for Correct and PrevType. With
two-level sum coding, each model coefficient is a deviation from the
grand mean. For direct level comparisons, emmeans is easier
to interpret than raw coefficients.
If you care more about the direct comparison between two levels, maybe treatment coding is better. But that can make the interpretation of simple effects difficult when interaction is involved. So I almost always choose sum coding.
# Contrast set to sum coding. conrt.sum(levels_of_factor)
contrasts(lexdec_lmer$Correct) <- contr.sum(2)
contrasts(lexdec_lmer$PrevType) <- contr.sum(2)
contrasts(lexdec_lmer$Correct)
## [,1]
## correct 1
## incorrect -1
contrasts(lexdec_lmer$PrevType)
## [,1]
## nonword 1
## word -1
lexdec_lmer %>%
summarise(
n_observations = n(),
n_subjects = n_distinct(Subject),
n_words = n_distinct(Word),
mean_rt = mean(RT),
sd_rt = sd(RT)
)
lexdec_lmer %>%
count(Correct)
lexdec_lmer %>%
count(PrevType)
And basic data pattern for eyeballing the difference.
ggplot(lexdec_lmer, aes(x = Correct, y = RT, fill = Correct)) +
geom_boxplot(outlier.alpha = 0.2) +
labs(
x = "Response accuracy",
y = "Log reaction time",
title = "Lexical decision RT by response accuracy"
) +
guides(fill = "none")
The maximal structure includes random intercepts for subjects and words. It also includes random slopes for predictors that vary within each grouping factor.
For subjects, Correct, PrevType,
Trial_z, Frequency_z, and
Length_z vary across trials. For words,
Frequency_z and Length_z are item-level
properties, so they do not vary within a word and are not used as
word-level random slopes.
Note: This can be the most confusing part of LMER modeling. How do you determine which random slope to include for which grouping variable? The answer is, if a predictor varies between levels of your grouping variable, then you do not include that predictor in the random slope. An intuitive way to think about it is that: If I know my item is X, does that lead to that my predictor (e.g., word type) must be one specific level? If your answer is yes, then you should NOT include the predictor as your random slope. If your answer is no, then you probably want to include that predictor as your random slope (you should also want to make sure that this matches your assumption of the predictor’s behavior)
This is because, by definition, random slope is to assume that the effect of that predictor is dependent on the grouping factor. This won’t happen if holding that grouping variable constant already determine uniquely the level of the predictor.
If the maximal model does not converge or is singular, simplify the random structure. In a real analysis, this reduction should be theoretically motivated: remove the least important or least stable random slopes first, while keeping the fixed-effects structure aligned with the research question.
By the way, this is a gigantic model that probably takes a minute or so to fit.
lmer_control <- lme4::lmerControl(
optimizer = "bobyqa",
optCtrl = list(maxfun = 200000)
)
lmer_maximal <- lmerTest::lmer(
RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z +
(1 + Correct + PrevType + Trial_z + Frequency_z + Length_z | Subject) +
(1 + Correct + PrevType + Trial_z | Word),
data = lexdec_lmer,
REML = FALSE,
control = lmer_control
)
summary(lmer_maximal)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z +
## (1 + Correct + PrevType + Trial_z + Frequency_z + Length_z |
## Subject) + (1 + Correct + PrevType + Trial_z | Word)
## Data: lexdec_lmer
## Control: lmer_control
##
## AIC BIC logLik -2*log(L) df.resid
## -1057.8 -852.1 566.9 -1133.8 1621
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.4489 -0.6094 -0.1059 0.4465 6.4097
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Word (Intercept) 2.502e-03 0.050023
## Correct1 2.357e-03 0.048545 -0.37
## PrevType1 5.790e-04 0.024062 0.27 -0.22
## Trial_z 1.754e-05 0.004188 -0.65 0.28 -0.90
## Subject (Intercept) 3.521e-02 0.187654
## Correct1 2.213e-03 0.047043 -0.92
## PrevType1 2.751e-04 0.016587 -0.15 0.33
## Trial_z 8.630e-04 0.029377 -0.17 -0.18 -0.06
## Frequency_z 4.142e-04 0.020351 -0.29 0.15 -0.08 0.50
## Length_z 4.403e-04 0.020984 0.93 -0.89 -0.30 -0.19 -0.14
## Residual 2.488e-02 0.157744
## Number of obs: 1659, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 6.364607 0.043306 23.576572 146.968 < 2e-16 ***
## Correct1 0.022349 0.017437 30.214507 1.282 0.210
## PrevType1 0.030562 0.006040 27.520260 5.060 2.46e-05 ***
## Trial_z -0.010617 0.007574 20.957927 -1.402 0.176
## Frequency_z -0.052375 0.009085 52.891275 -5.765 4.30e-07 ***
## Length_z 0.014729 0.009144 54.041526 1.611 0.113
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Crrct1 PrvTy1 Tril_z Frqnc_
## Correct1 -0.738
## PrevType1 -0.068 0.075
## Trial_z -0.140 -0.086 -0.056
## Frequency_z -0.103 -0.042 -0.026 0.206
## Length_z 0.443 -0.264 -0.084 -0.083 0.291
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
cat(
"Model singular fit: ",
lme4::isSingular(lmer_maximal),
"\n",
sep = ""
)
## Model singular fit: TRUE
cat(
"Model failed to converge: ",
length(lmer_maximal@optinfo$conv$lme4$messages) > 0,
"\n",
sep = ""
)
## Model failed to converge: TRUE
For the rest of this walkthrough, pretend that the maximal model above is too complex because it has a convergence warning, a singular fit, or both. (I did not check whether this is actually the case or not). The next step is to fit a simpler random-effects structure.
Here I keep the subject random slopes and simplify the word structure to a random intercept. This is a common kind of reduction when the item-level slope structure is too ambitious for the data.
lmer_reduced <- lmerTest::lmer(
RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z +
(1 + Correct + PrevType + Trial_z + Frequency_z + Length_z | Subject) +
(1 | Word),
data = lexdec_lmer,
REML = FALSE,
control = lmer_control
)
summary(lmer_reduced)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z +
## (1 + Correct + PrevType + Trial_z + Frequency_z + Length_z |
## Subject) + (1 | Word)
## Data: lexdec_lmer
## Control: lmer_control
##
## AIC BIC logLik -2*log(L) df.resid
## -1063.0 -906.0 560.5 -1121.0 1630
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.4775 -0.6255 -0.1142 0.4398 6.3372
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Word (Intercept) 0.0028324 0.05322
## Subject (Intercept) 0.0340412 0.18450
## Correct1 0.0020135 0.04487 -0.90
## PrevType1 0.0002738 0.01655 -0.10 0.15
## Trial_z 0.0008767 0.02961 -0.22 -0.10 -0.07
## Frequency_z 0.0004210 0.02052 -0.32 0.19 -0.01 0.47
## Length_z 0.0004785 0.02188 0.92 -0.89 -0.24 -0.26 -0.12
## Residual 0.0257842 0.16057
## Number of obs: 1659, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 6.351440 0.042333 22.179728 150.035 < 2e-16 ***
## Correct1 0.035080 0.015291 10.189801 2.294 0.0442 *
## PrevType1 0.030431 0.005437 20.891510 5.597 1.51e-05 ***
## Trial_z -0.010377 0.007621 20.717015 -1.362 0.1879
## Frequency_z -0.048824 0.009150 52.400327 -5.336 2.07e-06 ***
## Length_z 0.017715 0.009280 53.073739 1.909 0.0617 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Crrct1 PrvTy1 Tril_z Frqnc_
## Correct1 -0.746
## PrevType1 -0.065 0.058
## Trial_z -0.179 -0.057 -0.041
## Frequency_z -0.132 0.012 -0.005 0.197
## Length_z 0.449 -0.297 -0.077 -0.116 0.289
cat(
"Model singular fit: ",
lme4::isSingular(lmer_reduced),
"\n",
sep = ""
)
## Model singular fit: FALSE
cat(
"Model failed to converge: ",
length(lmer_reduced@optinfo$conv$lme4$messages) > 0,
"\n",
sep = ""
)
## Model failed to converge: FALSE
For this walkthrough, I treat the reduced model as the final working model. In a real analysis, if this reduced model still had convergence or singularity problems, I would continue simplifying the random-effects structure and document each step.
lmer_fit <- lmer_reduced
selected_lmer_name <- "reduced model: subject random slopes + word random intercept"
selected_lmer_name
## [1] "reduced model: subject random slopes + word random intercept"
formula(lmer_fit)
## RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z +
## (1 + Correct + PrevType + Trial_z + Frequency_z + Length_z |
## Subject) + (1 | Word)
Don’t jump to interpret the coefficients yet. We first need to check whether individual factors are significant or not.
summary(lmer_fit)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
## method [lmerModLmerTest]
## Formula: RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z +
## (1 + Correct + PrevType + Trial_z + Frequency_z + Length_z |
## Subject) + (1 | Word)
## Data: lexdec_lmer
## Control: lmer_control
##
## AIC BIC logLik -2*log(L) df.resid
## -1063.0 -906.0 560.5 -1121.0 1630
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.4775 -0.6255 -0.1142 0.4398 6.3372
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Word (Intercept) 0.0028324 0.05322
## Subject (Intercept) 0.0340412 0.18450
## Correct1 0.0020135 0.04487 -0.90
## PrevType1 0.0002738 0.01655 -0.10 0.15
## Trial_z 0.0008767 0.02961 -0.22 -0.10 -0.07
## Frequency_z 0.0004210 0.02052 -0.32 0.19 -0.01 0.47
## Length_z 0.0004785 0.02188 0.92 -0.89 -0.24 -0.26 -0.12
## Residual 0.0257842 0.16057
## Number of obs: 1659, groups: Word, 79; Subject, 21
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 6.351440 0.042333 22.179728 150.035 < 2e-16 ***
## Correct1 0.035080 0.015291 10.189801 2.294 0.0442 *
## PrevType1 0.030431 0.005437 20.891510 5.597 1.51e-05 ***
## Trial_z -0.010377 0.007621 20.717015 -1.362 0.1879
## Frequency_z -0.048824 0.009150 52.400327 -5.336 2.07e-06 ***
## Length_z 0.017715 0.009280 53.073739 1.909 0.0617 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Crrct1 PrvTy1 Tril_z Frqnc_
## Correct1 -0.746
## PrevType1 -0.065 0.058
## Trial_z -0.179 -0.057 -0.041
## Frequency_z -0.132 0.012 -0.005 0.197
## Length_z 0.449 -0.297 -0.077 -0.116 0.289
For the LMER, this walkthrough uses the anova() method
provided by lmerTest with Type III tests and Satterthwaite
degrees of freedom. Because the factors were sum coded, the main effect
tests are aligned with the usual Type III interpretation.
If you use treatment coding, Type-III ANOVA does not automatically become “wrong”, especially if your preductors are mostly binary.
# These are the default anyway, but making them explicit is better.
lmer_anova <- anova(
lmer_fit,
type = 3,
ddf = "Satterthwaite"
)
lmer_anova
lmer_anova["Correct", , drop = FALSE]
For LMERs, check residual normality, linearity, influential
observations, and heterogeneity. performance::check_model()
gives a compact visual diagnostic panel.
performance::check_model(lmer_fit)
performance::check_collinearity(lmer_fit)
performance::check_singularity(lmer_fit)
## [1] TRUE
If these checks look poor, common next steps include a different outcome transformation, a different likelihood family, nonlinear terms, stronger outlier handling, or a revised random-effects structure.
The fixed-effect table gives coefficients on the log-RT scale. With sum coding, factor coefficients are deviations from the grand mean. Estimated marginal means and pairwise contrasts are usually clearer for interpreting factor effects.
lmer_fixed_effects <- coef(summary(lmer_fit)) %>%
as.data.frame() %>%
tibble::rownames_to_column("term") %>%
as_tibble()
lmer_fixed_effects %>%
knitr::kable(digits = 3)
| term | Estimate | Std. Error | df | t value | Pr(>|t|) |
|---|---|---|---|---|---|
| (Intercept) | 6.351 | 0.042 | 22.180 | 150.035 | 0.000 |
| Correct1 | 0.035 | 0.015 | 10.190 | 2.294 | 0.044 |
| PrevType1 | 0.030 | 0.005 | 20.892 | 5.597 | 0.000 |
| Trial_z | -0.010 | 0.008 | 20.717 | -1.362 | 0.188 |
| Frequency_z | -0.049 | 0.009 | 52.400 | -5.336 | 0.000 |
| Length_z | 0.018 | 0.009 | 53.074 | 1.909 | 0.062 |
lmer_fixed_ci <- confint(
lmer_fit,
parm = "beta_",
method = "Wald"
) %>%
as.data.frame() %>%
tibble::rownames_to_column("term") %>%
as_tibble() %>%
rename(
lower = `2.5 %`,
upper = `97.5 %`
)
lmer_fixed_ci %>%
knitr::kable(digits = 3)
| term | lower | upper |
|---|---|---|
| (Intercept) | 6.268 | 6.434 |
| Correct1 | 0.005 | 0.065 |
| PrevType1 | 0.020 | 0.041 |
| Trial_z | -0.025 | 0.005 |
| Frequency_z | -0.067 | -0.031 |
| Length_z | 0.000 | 0.036 |
lmer_correct_emmeans <- emmeans::emmeans(lmer_fit, ~ Correct)
lmer_correct_contrast <- pairs(lmer_correct_emmeans)
lmer_correct_emmeans
## Correct emmean SE df lower.CL upper.CL
## correct 6.39 0.0333 23.5 6.32 6.46
## incorrect 6.32 0.0587 23.4 6.20 6.44
##
## Results are averaged over the levels of: PrevType
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
lmer_correct_contrast
## contrast estimate SE df t.ratio p.value
## correct - incorrect 0.0702 0.0357 21.1 1.964 0.0628
##
## Results are averaged over the levels of: PrevType
## Degrees-of-freedom method: kenward-roger
lmer_correct_emmeans_df <- as.data.frame(lmer_correct_emmeans)
ggplot(lmer_correct_emmeans_df, aes(x = Correct, y = emmean)) +
geom_point(size = 2) +
geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.12) +
labs(
x = "Response accuracy",
y = "Estimated marginal mean log RT",
title = "Estimated marginal means for Correct"
)
We fit a linear mixed-effects model predicting log reaction time from response accuracy, previous trial type, trial order, word frequency, and word length. Factors were sum coded. We started with the maximal theoretically relevant random-effects structure and checked whether it had convergence warnings or a singular fit. To demonstrate the usual reduction workflow, I then fit a reduced random-effects structure and used it as the working model. The working random-effects structure was reduced model: subject random slopes + word random intercept. The fixed effect of response accuracy was evaluated with a Type III ANOVA using Satterthwaite degrees of freedom. Model assumptions were checked with residual diagnostics from the
performancepackage. Coefficients are reported on the log-RT scale, and factor effects are interpreted with estimated marginal means and pairwise contrasts.