---
title: "Week 09 lab: Partial pooling and random slopes"
subtitle: "Complete pooling, no pooling, BLUPs, fitted values, and model comparison"
format:
  html:
    toc: true
    embed-resources: true
execute:
  echo: true
  warning: false
  message: false
---

## Goal

By the end of this lab you will be able to:

1. explain why repeated observations from the same participant are not independent,
2. compare complete pooling, no pooling, and partial pooling,
3. fit a mixed-effects model with participant-specific intercepts and slopes,
4. inspect fixed effects, random effects, BLUPs, coefficients, and fitted values, and
5. test fixed and random parts of a mixed-effects model with `anova()` and `ranova()`.

## Reporting reminder

As in previous labs, each exercise should leave behind two things: code that reproduces the analysis, and a short write-up in your own words. For mixed-effects models, do not report only that a model "was significant." Say what the fixed effect means, what varies across participants, and how the model structure follows from the repeated-measures design.

## 0) Setup

Place this data file in the same folder as the worksheet:

- `week09_lab_visual_search.csv`

Load packages:

```{r}
library(tidyverse)
library(lme4)
library(lmerTest)
```

This week we will work with a synthetic visual-search dataset. A perception lab asked participants to find a target shape on displays with different amounts of visual clutter. The variable `distractor_load` goes from `0` to `5`; each one-step increase means a more cluttered display. The outcome `rt_ms` is response time in milliseconds.

The important design feature is that each participant appears many times. Some participants are generally faster or slower, and participants may also differ in how strongly clutter slows them down.

```{r}
search <- read.csv("week09_lab_visual_search.csv") |>
  mutate(participant = factor(participant))

glimpse(search)
summary(search)
```

## 1) Exercise 1

Start by checking the repeated-measures structure before fitting any model.
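The kind of structure check meant here can be sketched on a tiny made-up data frame first (the names `demo`, `p1`, and `p2` below are hypothetical and have nothing to do with the lab data):

```r
# Hypothetical two-participant frame, only to show the shape of the check
demo <- data.frame(
  participant     = rep(c("p1", "p2"), each = 3),
  distractor_load = rep(0:2, times = 2)
)

# A balanced repeated-measures design shows up as a constant count per cell
table(demo$participant, demo$distractor_load)
```

The same idea applies to the real dataset, with `participant` and `distractor_load` in place of the toy columns.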
Count how many observations there are for each participant and each value of `distractor_load`. Then make a plot with one panel for each participant. Put `distractor_load` on the x-axis and `rt_ms` on the y-axis. Inside each participant panel, add a separate ordinary regression line for that participant.

These separate lines correspond to the no-pooling idea: each participant gets their own intercept and slope. The point is to see whether participants differ only in baseline speed, or whether they also differ in the slope of the clutter effect.

```{r}
#| lab: student
#| include: true
participant_counts <- NULL
```

Questions:

- What is repeated in this dataset?
- Why would ordinary regression treat the data too optimistically?
- Do the participant panels suggest different intercepts, different slopes, or both?

*Put your answer here*

## 2) Exercise 2

Fit a complete-pooling model that ignores participants:

`rt_ms ~ distractor_load`

Then fit the no-pooling model by treating `participant` as a fixed-effect predictor and interacting it with `distractor_load`. Because `participant` is a factor, R will create the necessary dummy-coded terms. The interaction allows the intercept and slope to differ across participants.

```{r}
#| lab: student
#| include: true
fit_pool <- NULL
fit_nopool <- NULL
```

Now make a faceted plot that compares the complete-pooling line with participant-specific no-pooling lines. Use one panel per participant; otherwise the lines are too hard to read.

```{r}
#| lab: student
#| include: true
```

Questions:

- What does the complete-pooling slope estimate?
- What does no pooling estimate that complete pooling cannot estimate?
- Why is no pooling also unsatisfactory if we want to generalize beyond these exact participants?

*Put your answer here*

## 3) Exercise 3

Fit the partial-pooling model. This model estimates a population intercept and slope, while also allowing each participant to have their own intercept and their own slope for `distractor_load`.
`rt_ms ~ distractor_load + (distractor_load | participant)`

After fitting the model, print the model object, extract the fixed effects with `fixef()`, and inspect the variance components with `VarCorr()`.

```{r}
#| lab: student
#| include: true
fit_mix <- NULL
```

Questions:

- Interpret the fixed effect of `distractor_load`.
- What does the random-intercept standard deviation represent?
- What does the random-slope standard deviation represent?
- What does the intercept-slope correlation tell us about the participant-level deviations?

*Put your answer here*

## 4) Exercise 4

The random effects returned by `ranef()` are BLUPs: participant-specific deviations from the fixed effects. They are not the full participant-specific coefficients. Use `ranef()` to inspect the deviations, and use `coef()` to inspect the resulting participant-specific intercepts and slopes.

```{r}
#| lab: student
#| include: true
participant_blups <- NULL
participant_coefs <- NULL
```

For the first row of the dataset, compute the fitted value by hand from the fixed effects and the participant-specific random-effect adjustments. Then compare your calculation with `fitted(fit_mix)[1]`.

```{r}
#| lab: student
#| include: true
row1 <- NULL
manual_fit <- NULL
```

Finally, draw the partial-pooling fitted lines for all participants.

```{r}
#| lab: student
#| include: true
```

Questions:

- What is the difference between `ranef(fit_mix)` and `coef(fit_mix)`?
- Why does the manual fitted-value calculation include both fixed and random parts?
- In plain language, what does partial pooling do to the participant-specific lines?

*Put your answer here*

## 5) Exercise 5

Use `anova()` to test the fixed effect of `distractor_load`. Because `lmerTest` is loaded, this gives an ANOVA-style test using Satterthwaite's degrees of freedom. Then use `ranova()` to test the random-effects structure.
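As a reminder of the call pattern only, here is a sketch on simulated data, not the lab file (the names `toy` and `fit_demo`, and all the simulation parameters, are made up):

```r
library(lme4)
library(lmerTest)  # provides ranova() and Satterthwaite tests for lmer fits

set.seed(2)
# Ten simulated "participants", six "load" levels each, with true
# participant-level variation in both intercept and slope
toy <- data.frame(id = factor(rep(1:10, each = 6)),
                  x  = rep(0:5, times = 10))
toy$y <- 400 + 25 * toy$x +
  rnorm(10, 0, 30)[toy$id] +          # participant baseline shifts
  rnorm(10, 0, 6)[toy$id] * toy$x +   # participant slope shifts
  rnorm(nrow(toy), 0, 15)             # trial noise

fit_demo <- lmer(y ~ x + (x | id), data = toy)
anova(fit_demo)    # F test of the fixed slope, Satterthwaite df
ranova(fit_demo)   # tests dropping each random term in turn
```

On the lab data the same two calls apply to your fitted `fit_mix` object.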
Here the key question is whether the model is improved by allowing the effect of `distractor_load` to vary across participants.

```{r}
#| lab: student
#| include: true
```

As an alternative check, compare a random-intercept model and a random-intercept-plus-random-slope model directly with `anova()`. If you compare two REML-fitted `lmer` models this way, R will refit them with maximum likelihood for the likelihood-ratio comparison. For this lab, the main point is simpler: `ranova()` is the clearest direct test of the random-slope term.

```{r}
#| lab: student
#| include: true
fit_ri <- NULL
```

Questions:

- What does `anova(fit_mix)` say about the population-level effect of distractor load?
- What does `ranova(fit_mix)` say about the random slope?
- What does the direct model comparison add?
- In one paragraph, describe the substantive conclusion from the mixed model.

*Put your answer here*

## Checklist

- You can explain why this is a repeated-measures dataset.
- You can distinguish complete pooling, no pooling, and partial pooling.
- You can fit `lmer(y ~ x + (x | participant), data = ...)`.
- You can interpret `fixef()`, `VarCorr()`, `ranef()`, `coef()`, and `fitted()`.
- You can use `anova()` and `ranova()` for fixed-effect and random-effect tests.
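Optional reference: the `lmer` bookkeeping behind Exercise 4 can be verified on simulated data. Everything below (the names `toy` and `fit`, and the simulation parameters) is made up for illustration; it is not the lab solution.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for the lab data: 8 "participants", 6 "load" levels
toy <- expand.grid(id = factor(1:8), load = 0:5)
toy$rt <- rnorm(8, 600, 40)[toy$id] +     # participant baselines
  rnorm(8, 30, 8)[toy$id] * toy$load +    # participant slopes
  rnorm(nrow(toy), 0, 20)                 # trial noise

fit <- lmer(rt ~ load + (load | id), data = toy)

fixef(fit)       # population intercept and slope
ranef(fit)$id    # BLUP deviations, one row per participant
coef(fit)$id     # participant-specific lines: fixed effects + deviations

# The identity behind Exercise 4: coef() is fixef() plus ranef(), row by row
all.equal(coef(fit)$id$load,
          fixef(fit)[["load"]] + ranef(fit)$id$load)
```

The same identity holds for the intercept column, which is why the by-hand fitted value in Exercise 4 must add the fixed part and the random adjustments.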