---
title: "Week 10 lab: Random slopes and hierarchical structure"
subtitle: "Class-level random effects, cross-level interactions, and nested designs"
format:
  html:
    toc: true
    embed-resources: true
execute:
  echo: true
  warning: false
  message: false
---

## Goal

By the end of this lab you will be able to:

1. fit random-intercept and random-slope models with `lmer()`,
2. inspect random effects with `VarCorr()`, `ranef()`, and `coef()`,
3. compare random-effects structures with `ranova()` and `anova()`,
4. interpret a class-level predictor and a cross-level interaction, and
5. decide whether grouping factors are nested or crossed from the design.

## Reporting reminder

As in previous labs, each exercise should leave behind two things: code that reproduces the analysis, and a short write-up in your own words. For mixed-effects models, do not report only that a model "was significant." Say what varies, at which level it varies, and how the model structure follows from the design.

## 0) Setup

Put these files next to the worksheet:

- `popular.csv`
- `schools.csv`

Load packages:

```{r}
library(tidyverse)
library(lme4)
library(lmerTest)
```

This week we will use a synthetic classroom dataset. A cognitive-science research group studies social-cognitive development in school classrooms. Each pupil completed a short self-report extraversion scale and a peer-status questionnaire. The outcome `popular` is a popularity score. The predictor `extrav` is pupil extraversion. The variable `sex` is coded as pupil sex, and `texp` is the teacher's years of experience.

The important design feature is that pupils are nested in classes. Pupils from the same class share the same classroom context and the same teacher, so their popularity scores should not be treated as fully independent.

First load the raw data and inspect the variables. We will prepare the centered predictors in Exercise 1.

```{r}
pop_raw <- read.csv("popular.csv", row.names = 1) |>
  select(pupil, class, extrav, sex, texp, popular)

glimpse(pop_raw)
summary(pop_raw)
```

## 1) Exercise 1

Prepare the data for modeling. Convert `class` to a factor, recode `sex` from `0` to `"boy"` and from `1` to `"girl"`, and create two centered predictors: `extrav_c` and `texp_c`. Centering means subtracting the variable mean from each observed value. Check that the centered variables have means very close to zero.

Then check the hierarchical structure by counting how many pupils there are in each class.

For the exploratory plot, use only classes `1` through `12`. Create `shown_classes <- as.character(1:12)` and use it only for this plot; the models should still use all classes. Create a plot of `popular` against `extrav`, using one panel per class. Show the raw points only. Use this plot to inspect the scale, spread, and class structure before fitting the model.

Then fit a random-intercept model:

`popular ~ extrav_c + sex + (1 | class)`

This model estimates one average extraversion effect and one average sex difference, while allowing classes to differ in their baseline popularity.

```{r}
#| lab: student
#| include: true
pop <- NULL
class_counts <- NULL
shown_classes <- NULL
fit_ri <- NULL
```

Questions:

- What is repeated or clustered in this dataset?
- Why do we create `extrav_c` and `texp_c` before fitting these models?
- What does the fixed effect of `extrav_c` mean?
- What does the random-intercept standard deviation for `class` mean?
- Why would an ordinary regression be too optimistic here?

*Put your answer here*

## 2) Exercise 2

The random-intercept model assumes that the extraversion-popularity relationship is the same in every class. That may be too simple. In some classrooms, extraverted pupils may become much more popular; in other classrooms, extraversion may matter less.

Fit a random-slope model:

`popular ~ extrav_c + sex + (extrav_c | class)`

Then inspect the variance components, extract the class-specific coefficients with `coef()`, and compare the random-intercept and random-slope models. `ranova()` gives an ANOVA-like table for random-effect terms. `anova(model1, model2)` compares two fitted models directly.

```{r}
#| lab: student
#| include: true
fit_rs <- NULL
class_coefs <- NULL
```

Now return to the same 12 classes from Exercise 1. Make the same faceted scatterplot, but this time add the class-specific fitted lines from the random-slope mixed model. These are not separate ordinary regression lines. They are the partial-pooling lines implied by `coef(fit_rs)$class`. Because the model also includes `sex`, draw the lines for the reference sex level, `"boy"`.

```{r}
#| lab: student
#| include: true
```

Questions:

- What does the random slope for `extrav_c` allow the model to do?
- What is the difference between `ranef(fit_rs)$class` and `coef(fit_rs)$class`?
- Do the model comparisons support keeping the random slope?
- In plain language, what do the largest and smallest class-specific slopes mean?

*Put your answer here*

## 3) Exercise 3

Now add a class-level predictor. The variable `texp_c` is teacher experience centered around the average teacher experience in the dataset. Because all pupils in the same class have the same teacher, this predictor varies at the class level, not at the pupil level.

First fit a model with teacher experience as an additional fixed effect:

`popular ~ extrav_c + texp_c + sex + (extrav_c | class)`

Then fit a model with the cross-level interaction between pupil extraversion and teacher experience:

`popular ~ extrav_c * texp_c + sex + (extrav_c | class)`

The interaction asks whether the extraversion-popularity relationship changes depending on teacher experience.

```{r}
#| lab: student
#| include: true
fit_texp <- NULL
fit_cross <- NULL
```

Visualize the interaction by drawing predicted lines from `fit_cross`. Teacher experience is continuous, so use three representative values: low experience (10th percentile of `texp_c`), average experience (median of `texp_c`), and high experience (90th percentile of `texp_c`).

Create a small table called `teacher_levels` with these three values and their labels. Then create `pred_grid`, containing values of `extrav_c` across its observed range crossed with the three teacher-experience values. The function `expand_grid()` creates all combinations of the supplied values.

Add `sex = "girl"` to `pred_grid` so that predictions are drawn for one fixed sex level. Use `predict(fit_cross, newdata = pred_grid, re.form = NA)` to get fixed-effect predictions. Then plot predicted popularity against `extrav_c`, with one line for each teacher-experience value.

```{r}
#| lab: student
#| include: true
teacher_levels <- NULL
pred_grid <- NULL
```

Questions:

- What does the main effect of `texp_c` mean in this centered model?
- What does the `extrav_c:texp_c` interaction mean?
- Does the comparison of `fit_texp` and `fit_cross` support the interaction?
- Why is this called a cross-level interaction?

*Put your answer here*

## 4) Exercise 4

The last exercise returns to the design question from the lecture: are random effects nested or crossed?

Use `schools.csv`. The dataset contains pupils from six schools and four classes within each school. The labels `a`, `b`, `c`, and `d` repeat in every school, so the label `class == "a"` does not identify one unique classroom by itself.

Create a `school` by `class` table and inspect it. Then fit the nested model:

`extro ~ open + agree + social + (1 | school/class)`

The expression `(1 | school/class)` is shorthand for a school random intercept plus a class-within-school random intercept. Next, add an explicit class-within-school ID column called `class_id` with `interaction(school, class)`. The function `interaction()` combines factor labels into a new factor, such as `I.a`, `I.b`, `II.a`, and so on. Fit the equivalent explicit-ID model:

`extro ~ open + agree + social + (1 | school) + (1 | class_id)`

```{r}
#| lab: student
#| include: true
schools <- NULL
school_class_table <- NULL
fit_nested <- NULL
fit_explicit <- NULL
```

Questions:

- Why is `class` nested within `school` in the design, even though the same class labels appear in every school?
- What does `(1 | school/class)` expand to?
- Why is it safer to create a unique `class_id` when class labels repeat?
- Are the nested shorthand model and the explicit-ID model estimating the same thing here?

*Put your answer here*

## Checklist

- You can fit `lmer(y ~ x + (x | group), data = ...)`.
- You can interpret fixed effects, random intercepts, and random slopes.
- You can distinguish `ranef()` from `coef()`.
- You can compare random-effects structures with `ranova()` and `anova()`.
- You can explain what makes an interaction "cross-level."
- You can recognize nested grouping factors and create unique grouping IDs when labels repeat.