knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

library(languageR)
library(lme4)
library(lmerTest)
library(dplyr)
library(tidyr)
library(ggplot2)
library(tibble)
library(performance)
library(see)
library(emmeans)
library(knitr)

set.seed(1234)
theme_set(theme_minimal(base_size = 12))

1 Overview

In my recent paper, Du., B. Pfiffner., A. (In Revision) The co-variance of acoustic cues and lip gestures in the production of the Mandarin sibilant contrast. Journal of Laboratory Phonology, we went through LMER, GLMER (which is not included in the final paper), and GAMM. I think it’s time to summarize this process as a note to my future self, also for my fellow PhD students who might be struggling with any modeling. Therefore, this is the first of this series of stats modeling tutorial, which I hope is helpful for anyone who comes across this.

This tutorial fits a linear mixed-effects model with lmer() using the built-in lexdec data from the languageR package. The outcome variable is RT, the log-transformed reaction time. The factor of our interest is Correct, which compares correct and incorrect lexical decision responses.

2 Workflow

For any LMER modeling, there is this generic templatic workflow that you can adapt. In my previous years of running LMER since 2021, I have not seen too much deviation from these steps.

  1. Prepare tidy data, that means “one row per observation”.
  2. Set the factor coding scheme.
  3. Start with a maximally specified random-effects structure.
  4. Fit the maximal model and check convergence and singular fit.
  5. Fit a reduced random-effects model when the maximal model is problematic, e.g., non-convergence or singular fit.
  6. Use the reduced model as the working model for inference.
  7. Use an ANOVA table for the fixed effect of interest.
  8. Check model assumptions with performance::check_model().
  9. Interpret fixed effects and estimated marginal means.

For step 5 in this tutorial, we pretend that the reduced model is nicely fitted. But in real life, this is often the most iterative part.

3 Prepare the Data

3.1 Load the sample data from languageR

data("lexdec", package = "languageR")
# I use df_print: paged, so there should no worries about printing gigantic df object. 
lexdec

3.2 Cleaning, data type coercing, etc.

lexdec_lmer <- lexdec %>%
  filter(
    !is.na(RT),
    !is.na(Correct),
    !is.na(PrevType),
    !is.na(Trial),
    !is.na(Frequency),
    !is.na(Length),
    !is.na(Subject),
    !is.na(Word)
  ) %>%
  mutate(
    Subject = factor(Subject),
    Word = factor(Word),
    Correct = factor(Correct, levels = c("correct", "incorrect")),
    PrevType = factor(PrevType, levels = c("nonword", "word")),
    Trial_z = as.numeric(scale(Trial)),
    Frequency_z = as.numeric(scale(Frequency)),
    Length_z = as.numeric(scale(Length))
  ) %>%
  droplevels()

lexdec_lmer

4 Set the Coding Scheme

For fixed-effect tests with factors, be explicit about contrasts. I use sum coding for Correct and PrevType. With two-level sum coding, each model coefficient is a deviation from the grand mean. For direct level comparisons, emmeans is easier to interpret than raw coefficients.

If you care more about the direct comparison between two levels, maybe treatment coding is better. But that can make the interpretation of simple effects difficult when interaction is involved. So I almost always choose sum coding.

# Contrast set to sum coding. conrt.sum(levels_of_factor)
contrasts(lexdec_lmer$Correct) <- contr.sum(2)
contrasts(lexdec_lmer$PrevType) <- contr.sum(2)

contrasts(lexdec_lmer$Correct)
##           [,1]
## correct      1
## incorrect   -1
contrasts(lexdec_lmer$PrevType)
##         [,1]
## nonword    1
## word      -1

5 Summary statistics

lexdec_lmer %>%
  summarise(
    n_observations = n(),
    n_subjects = n_distinct(Subject),
    n_words = n_distinct(Word),
    mean_rt = mean(RT),
    sd_rt = sd(RT)
  )
lexdec_lmer %>%
  count(Correct)
lexdec_lmer %>%
  count(PrevType)

And basic data pattern for eyeballing the difference.

ggplot(lexdec_lmer, aes(x = Correct, y = RT, fill = Correct)) +
  geom_boxplot(outlier.alpha = 0.2) +
  labs(
    x = "Response accuracy",
    y = "Log reaction time",
    title = "Lexical decision RT by response accuracy"
  ) +
  guides(fill = "none")

6 Start With the Maximal Random Structure

The maximal structure includes random intercepts for subjects and words. It also includes random slopes for predictors that vary within each grouping factor.

For subjects, Correct, PrevType, Trial_z, Frequency_z, and Length_z vary across trials. For words, Frequency_z and Length_z are item-level properties, so they do not vary within a word and are not used as word-level random slopes.

Note: This can be the most confusing part of LMER modeling. How do you determine which random slope to include for which grouping variable? The answer is, if a predictor varies between levels of your grouping variable, then you do not include that predictor in the random slope. An intuitive way to think about it is that: If I know my item is X, does that lead to that my predictor (e.g., word type) must be one specific level? If your answer is yes, then you should NOT include the predictor as your random slope. If your answer is no, then you probably want to include that predictor as your random slope (you should also want to make sure that this matches your assumption of the predictor’s behavior)

This is because, by definition, random slope is to assume that the effect of that predictor is dependent on the grouping factor. This won’t happen if holding that grouping variable constant already determine uniquely the level of the predictor.

If the maximal model does not converge or is singular, simplify the random structure. In a real analysis, this reduction should be theoretically motivated: remove the least important or least stable random slopes first, while keeping the fixed-effects structure aligned with the research question.

7 Fit the Maximal Model

By the way, this is a gigantic model that probably takes a minute or so to fit.

lmer_control <- lme4::lmerControl(
  optimizer = "bobyqa",
  optCtrl = list(maxfun = 200000)
)

lmer_maximal <- lmerTest::lmer(
  RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z +
    (1 + Correct + PrevType + Trial_z + Frequency_z + Length_z | Subject) +
    (1 + Correct + PrevType + Trial_z | Word),
  data = lexdec_lmer,
  REML = FALSE,
  control = lmer_control
)

8 Check model fit and convergence.

summary(lmer_maximal)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z +  
##     (1 + Correct + PrevType + Trial_z + Frequency_z + Length_z |  
##         Subject) + (1 + Correct + PrevType + Trial_z | Word)
##    Data: lexdec_lmer
## Control: lmer_control
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##   -1057.8    -852.1     566.9   -1133.8      1621 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4489 -0.6094 -0.1059  0.4465  6.4097 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr                         
##  Word     (Intercept) 2.502e-03 0.050023                              
##           Correct1    2.357e-03 0.048545 -0.37                        
##           PrevType1   5.790e-04 0.024062  0.27 -0.22                  
##           Trial_z     1.754e-05 0.004188 -0.65  0.28 -0.90            
##  Subject  (Intercept) 3.521e-02 0.187654                              
##           Correct1    2.213e-03 0.047043 -0.92                        
##           PrevType1   2.751e-04 0.016587 -0.15  0.33                  
##           Trial_z     8.630e-04 0.029377 -0.17 -0.18 -0.06            
##           Frequency_z 4.142e-04 0.020351 -0.29  0.15 -0.08  0.50      
##           Length_z    4.403e-04 0.020984  0.93 -0.89 -0.30 -0.19 -0.14
##  Residual             2.488e-02 0.157744                              
## Number of obs: 1659, groups:  Word, 79; Subject, 21
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)  6.364607   0.043306 23.576572 146.968  < 2e-16 ***
## Correct1     0.022349   0.017437 30.214507   1.282    0.210    
## PrevType1    0.030562   0.006040 27.520260   5.060 2.46e-05 ***
## Trial_z     -0.010617   0.007574 20.957927  -1.402    0.176    
## Frequency_z -0.052375   0.009085 52.891275  -5.765 4.30e-07 ***
## Length_z     0.014729   0.009144 54.041526   1.611    0.113    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Crrct1 PrvTy1 Tril_z Frqnc_
## Correct1    -0.738                            
## PrevType1   -0.068  0.075                     
## Trial_z     -0.140 -0.086 -0.056              
## Frequency_z -0.103 -0.042 -0.026  0.206       
## Length_z     0.443 -0.264 -0.084 -0.083  0.291
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
cat(
  "Model singular fit: ",
  lme4::isSingular(lmer_maximal),
  "\n",
  sep = ""
)
## Model singular fit: TRUE
cat(
  "Model failed to converge: ",
  length(lmer_maximal@optinfo$conv$lme4$messages) > 0,
  "\n",
  sep = ""
)
## Model failed to converge: TRUE

9 Fit a Reduced Random-Effects Model

For the rest of this walkthrough, pretend that the maximal model above is too complex because it has a convergence warning, a singular fit, or both. (I did not check whether this is actually the case or not). The next step is to fit a simpler random-effects structure.

Here I keep the subject random slopes and simplify the word structure to a random intercept. This is a common kind of reduction when the item-level slope structure is too ambitious for the data.

lmer_reduced <- lmerTest::lmer(
  RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z +
    (1 + Correct + PrevType + Trial_z + Frequency_z + Length_z | Subject) +
    (1 | Word),
  data = lexdec_lmer,
  REML = FALSE,
  control = lmer_control
)

10 Check reduced model fit and convergence

summary(lmer_reduced)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z +  
##     (1 + Correct + PrevType + Trial_z + Frequency_z + Length_z |  
##         Subject) + (1 | Word)
##    Data: lexdec_lmer
## Control: lmer_control
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##   -1063.0    -906.0     560.5   -1121.0      1630 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4775 -0.6255 -0.1142  0.4398  6.3372 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr                         
##  Word     (Intercept) 0.0028324 0.05322                               
##  Subject  (Intercept) 0.0340412 0.18450                               
##           Correct1    0.0020135 0.04487  -0.90                        
##           PrevType1   0.0002738 0.01655  -0.10  0.15                  
##           Trial_z     0.0008767 0.02961  -0.22 -0.10 -0.07            
##           Frequency_z 0.0004210 0.02052  -0.32  0.19 -0.01  0.47      
##           Length_z    0.0004785 0.02188   0.92 -0.89 -0.24 -0.26 -0.12
##  Residual             0.0257842 0.16057                               
## Number of obs: 1659, groups:  Word, 79; Subject, 21
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)  6.351440   0.042333 22.179728 150.035  < 2e-16 ***
## Correct1     0.035080   0.015291 10.189801   2.294   0.0442 *  
## PrevType1    0.030431   0.005437 20.891510   5.597 1.51e-05 ***
## Trial_z     -0.010377   0.007621 20.717015  -1.362   0.1879    
## Frequency_z -0.048824   0.009150 52.400327  -5.336 2.07e-06 ***
## Length_z     0.017715   0.009280 53.073739   1.909   0.0617 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Crrct1 PrvTy1 Tril_z Frqnc_
## Correct1    -0.746                            
## PrevType1   -0.065  0.058                     
## Trial_z     -0.179 -0.057 -0.041              
## Frequency_z -0.132  0.012 -0.005  0.197       
## Length_z     0.449 -0.297 -0.077 -0.116  0.289
cat(
  "Model singular fit: ",
  lme4::isSingular(lmer_reduced),
  "\n",
  sep = ""
)
## Model singular fit: FALSE
cat(
  "Model failed to converge: ",
  length(lmer_reduced@optinfo$conv$lme4$messages) > 0,
  "\n",
  sep = ""
)
## Model failed to converge: FALSE

For this walkthrough, I treat the reduced model as the final working model. In a real analysis, if this reduced model still had convergence or singularity problems, I would continue simplifying the random-effects structure and document each step.

11 Record model structure (for later reporting)

lmer_fit <- lmer_reduced
selected_lmer_name <- "reduced model: subject random slopes + word random intercept"

selected_lmer_name
## [1] "reduced model: subject random slopes + word random intercept"
formula(lmer_fit)
## RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z + 
##     (1 + Correct + PrevType + Trial_z + Frequency_z + Length_z | 
##         Subject) + (1 | Word)

12 Check the Working Model

Don’t jump to interpret the coefficients yet. We first need to check whether individual factors are significant or not.

summary(lmer_fit)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: RT ~ Correct + PrevType + Trial_z + Frequency_z + Length_z +  
##     (1 + Correct + PrevType + Trial_z + Frequency_z + Length_z |  
##         Subject) + (1 | Word)
##    Data: lexdec_lmer
## Control: lmer_control
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##   -1063.0    -906.0     560.5   -1121.0      1630 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4775 -0.6255 -0.1142  0.4398  6.3372 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr                         
##  Word     (Intercept) 0.0028324 0.05322                               
##  Subject  (Intercept) 0.0340412 0.18450                               
##           Correct1    0.0020135 0.04487  -0.90                        
##           PrevType1   0.0002738 0.01655  -0.10  0.15                  
##           Trial_z     0.0008767 0.02961  -0.22 -0.10 -0.07            
##           Frequency_z 0.0004210 0.02052  -0.32  0.19 -0.01  0.47      
##           Length_z    0.0004785 0.02188   0.92 -0.89 -0.24 -0.26 -0.12
##  Residual             0.0257842 0.16057                               
## Number of obs: 1659, groups:  Word, 79; Subject, 21
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)  6.351440   0.042333 22.179728 150.035  < 2e-16 ***
## Correct1     0.035080   0.015291 10.189801   2.294   0.0442 *  
## PrevType1    0.030431   0.005437 20.891510   5.597 1.51e-05 ***
## Trial_z     -0.010377   0.007621 20.717015  -1.362   0.1879    
## Frequency_z -0.048824   0.009150 52.400327  -5.336 2.07e-06 ***
## Length_z     0.017715   0.009280 53.073739   1.909   0.0617 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Crrct1 PrvTy1 Tril_z Frqnc_
## Correct1    -0.746                            
## PrevType1   -0.065  0.058                     
## Trial_z     -0.179 -0.057 -0.041              
## Frequency_z -0.132  0.012 -0.005  0.197       
## Length_z     0.449 -0.297 -0.077 -0.116  0.289

13 ANOVA for the Fixed Effect

For the LMER, this walkthrough uses the anova() method provided by lmerTest with Type III tests and Satterthwaite degrees of freedom. Because the factors were sum coded, the main effect tests are aligned with the usual Type III interpretation.

If you use treatment coding, Type-III ANOVA does not automatically become “wrong”, especially if your preductors are mostly binary.

# These are the default anyway, but making them explicit is better.
lmer_anova <- anova(
  lmer_fit,
  type = 3,
  ddf = "Satterthwaite"
)

lmer_anova

14 Factor-of-interest test

lmer_anova["Correct", , drop = FALSE]

15 Check Model Assumptions

For LMERs, check residual normality, linearity, influential observations, and heterogeneity. performance::check_model() gives a compact visual diagnostic panel.

performance::check_model(lmer_fit)

performance::check_collinearity(lmer_fit)
performance::check_singularity(lmer_fit)
## [1] TRUE

If these checks look poor, common next steps include a different outcome transformation, a different likelihood family, nonlinear terms, stronger outlier handling, or a revised random-effects structure.

16 Interpret the Model

The fixed-effect table gives coefficients on the log-RT scale. With sum coding, factor coefficients are deviations from the grand mean. Estimated marginal means and pairwise contrasts are usually clearer for interpreting factor effects.

lmer_fixed_effects <- coef(summary(lmer_fit)) %>%
  as.data.frame() %>%
  tibble::rownames_to_column("term") %>%
  as_tibble()

lmer_fixed_effects %>%
  knitr::kable(digits = 3)
term Estimate Std. Error df t value Pr(>|t|)
(Intercept) 6.351 0.042 22.180 150.035 0.000
Correct1 0.035 0.015 10.190 2.294 0.044
PrevType1 0.030 0.005 20.892 5.597 0.000
Trial_z -0.010 0.008 20.717 -1.362 0.188
Frequency_z -0.049 0.009 52.400 -5.336 0.000
Length_z 0.018 0.009 53.074 1.909 0.062
lmer_fixed_ci <- confint(
  lmer_fit,
  parm = "beta_",
  method = "Wald"
) %>%
  as.data.frame() %>%
  tibble::rownames_to_column("term") %>%
  as_tibble() %>%
  rename(
    lower = `2.5 %`,
    upper = `97.5 %`
  )

lmer_fixed_ci %>%
  knitr::kable(digits = 3)
term lower upper
(Intercept) 6.268 6.434
Correct1 0.005 0.065
PrevType1 0.020 0.041
Trial_z -0.025 0.005
Frequency_z -0.067 -0.031
Length_z 0.000 0.036
lmer_correct_emmeans <- emmeans::emmeans(lmer_fit, ~ Correct)
lmer_correct_contrast <- pairs(lmer_correct_emmeans)

lmer_correct_emmeans
##  Correct   emmean     SE   df lower.CL upper.CL
##  correct     6.39 0.0333 23.5     6.32     6.46
##  incorrect   6.32 0.0587 23.4     6.20     6.44
## 
## Results are averaged over the levels of: PrevType 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95
lmer_correct_contrast
##  contrast            estimate     SE   df t.ratio p.value
##  correct - incorrect   0.0702 0.0357 21.1   1.964  0.0628
## 
## Results are averaged over the levels of: PrevType 
## Degrees-of-freedom method: kenward-roger
lmer_correct_emmeans_df <- as.data.frame(lmer_correct_emmeans)

ggplot(lmer_correct_emmeans_df, aes(x = Correct, y = emmean)) +
  geom_point(size = 2) +
  geom_errorbar(aes(ymin = lower.CL, ymax = upper.CL), width = 0.12) +
  labs(
    x = "Response accuracy",
    y = "Estimated marginal mean log RT",
    title = "Estimated marginal means for Correct"
  )

17 Example Reporting Language

We fit a linear mixed-effects model predicting log reaction time from response accuracy, previous trial type, trial order, word frequency, and word length. Factors were sum coded. We started with the maximal theoretically relevant random-effects structure and checked whether it had convergence warnings or a singular fit. To demonstrate the usual reduction workflow, I then fit a reduced random-effects structure and used it as the working model. The working random-effects structure was reduced model: subject random slopes + word random intercept. The fixed effect of response accuracy was evaluated with a Type III ANOVA using Satterthwaite degrees of freedom. Model assumptions were checked with residual diagnostics from the performance package. Coefficients are reported on the log-RT scale, and factor effects are interpreted with estimated marginal means and pairwise contrasts.