Goal
Produce a fully reproducible statistical report, in R, that demonstrates your mastery of the techniques covered in class.
You are free to choose any empirical dataset or study as long as the data allow you to apply (and justify) at least two advanced methods from the course.
This final project is worth 40% of your final course grade (see the Syllabus page for course-wide assessment details).
How to Choose Your Dataset
Public data repositories
- Sources: Open Science Framework, Zenodo, Figshare, ICPSR, Kaggle, NIH/NASA archives
Your own or lab data
- Sources: Datasets maintained by your research lab; data from your experiments - current or old (perhaps re-analysing data from your BA project?)
Published articles
- Sources: Any journal related to cognitive science (look for papers with supplementary materials)
Optional: project topic ideas (simulation/replication)
- See Project Topic Bank (Optional) in this week's section.
What to look for:
Datasets with two or more levels (e.g., students nested in schools), repeated measures, or multiple variables suitable for a complex linear model.
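A quick way to check whether a candidate dataset has this kind of structure is to count the groups and the observations per group before committing to it. The sketch below uses the built-in ChickWeight data purely as an illustration (it is not a suggested project dataset), and the object names are arbitrary.

```r
# Illustration only: ChickWeight has repeated weight measurements per chick.
data("ChickWeight", package = "datasets")

# Level 2 units (chicks) and level 1 units (measurements within chick).
n_chicks <- nlevels(ChickWeight$Chick)
obs_per_chick <- table(ChickWeight$Chick)
range(obs_per_chick)  # unequal counts signal dropout / missing occasions

# Is Diet a between-chick variable (constant within each chick)?
diets_per_chick <- tapply(
  ChickWeight$Diet, ChickWeight$Chick,
  function(d) length(unique(d))
)
all(diets_per_chick == 1)
```

If each group contributes several observations, as here, a mixed model with at least a random intercept per group is usually a candidate method.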
Submission format (single-file Quarto)
- Your graded submission is one Quarto source file (.qmd), uploaded via this page.
- The .qmd must contain your full report: methods, results, and final written interpretation.
- Your .qmd must render without errors on my machine (fresh R session).
- Do not use setwd() and do not use absolute paths.
- Week 15 presentation materials are separate and do not replace the .qmd submission.
Use the provided template: final_project_template.qmd.
Submission rules: see How to Submit Assignments - Quarto (.qmd) Basics (Introduction section).
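Before uploading, it can help to scan your source for setwd() calls and absolute paths. The helper below is a rough sketch in base R and is not part of the course template: the function name find_path_problems and the regular expressions are assumptions made for this illustration, and the check is heuristic, so it can miss or over-flag unusual cases.

```r
# Hypothetical helper: return the line numbers that call setwd() or that
# contain a quoted string starting like an absolute path (/, ~/, or C:/).
find_path_problems <- function(lines) {
  uses_setwd <- grepl("setwd[[:space:]]*\\(", lines)
  abs_path <- grepl("[\"'](/|~/|[A-Za-z]:[/\\\\])", lines)
  which(uses_setwd | abs_path)
}

# Example on a few made-up lines of R code.
example_lines <- c(
  'setwd("C:/Users/me/project")',
  'd <- read.csv("data/scores.csv")',
  'd <- read.csv("/home/me/scores.csv")',
  'd <- read.csv("https://example.org/scores.csv")'
)
find_path_problems(example_lines)  # flags lines 1 and 3

# On a real file (file name is a placeholder):
# find_path_problems(readLines("final_project.qmd"))
```

Relative paths and URLs pass the check; only the setwd() call and the path rooted at / are flagged.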
Data access (important)
Your project must be reproducible from the .qmd:
- If you use a built-in dataset, state package + dataset clearly in your .qmd.
- If you use a public dataset, your .qmd must download/read it from a stable public source (OSF/Zenodo/DOI/direct URL).
- If you use your own/lab data that are not publicly shareable, contact the instructor before the deadline to arrange an approved reproducible access method.
- If data access is not possible, I cannot reproduce your analysis and your grade may be capped.
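The first two cases might look roughly like the sketch below. It is one possible pattern, not a required one: the function name read_public_csv, the relative "data" cache folder, and the example.org URL are all placeholders invented for this illustration.

```r
# Case 1: built-in dataset. Name the package and the dataset explicitly.
data("ToothGrowth", package = "datasets")

# Case 2: public dataset. Download once into a relative folder, then read it.
# The function name and the "data" cache folder are assumptions of this sketch.
read_public_csv <- function(url, cache = file.path("data", basename(url))) {
  if (!file.exists(cache)) {
    dir.create(dirname(cache), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = cache, mode = "wb", quiet = TRUE)
  }
  read.csv(cache)
}

# Hypothetical placeholder URL; replace it with your dataset's stable link.
# my_data <- read_public_csv("https://example.org/path/to/my_data.csv")
```

Because the cache path is relative to the .qmd, the same code works on any machine without setwd(), and the download is skipped on later renders.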
Minimum statistical content (syllabus-aligned)
Your project must include at least 2 advanced components from this list (you may include more):
1) Diagnostics & robustness
   - model diagnostics (residuals, influence) and at least one robustness step (e.g., log/Box-Cox, bootstrap CI, sensitivity check)
2) Generalized linear model (GLM)
   - logistic regression (binary) and/or Poisson regression (counts), interpretation on the response scale
3) Mixed model (LMM/GLMM)
   - random intercepts and/or random slopes, justification of the random-effects structure, convergence/singularity checks
4) SEM / CFA (lavaan)
   - CFA with fit indices + interpretation of loadings, and/or a simple SEM/path model
5) Meta-analysis (metafor)
   - compute effect sizes and run a small meta-analysis (forest plot + at least one bias/sensitivity check)
At least one of your components must be from (2)–(5) (i.e., beyond “standard regression/ANOVA only”).
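As a rough illustration of how two components can be combined, the sketch below pairs a Poisson GLM with a diagnostics and robustness step, using only base R and the built-in warpbreaks data. The dataset, the model formula, the seed, and the number of bootstrap resamples are choices made for this example, not course requirements.

```r
# GLM component: Poisson regression for counts (datasets::warpbreaks).
fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Interpretation on the response scale: rate ratios and expected counts.
rate_ratios <- exp(coef(fit))
grid <- expand.grid(wool = c("A", "B"), tension = c("L", "M", "H"))
pred_counts <- predict(fit, newdata = grid, type = "response")

# Diagnostics: Pearson dispersion (values well above 1 suggest
# overdispersion) and influence via Cook's distance.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
cooks <- cooks.distance(fit)

# Robustness: percentile bootstrap CI for the tensionH rate ratio,
# resampling rows with replacement.
set.seed(2024)
boot_coef <- replicate(500, {
  idx <- sample(nrow(warpbreaks), replace = TRUE)
  refit <- glm(breaks ~ wool + tension, data = warpbreaks[idx, ],
               family = poisson)
  coef(refit)[["tensionH"]]
})
boot_ci <- exp(quantile(boot_coef, c(0.025, 0.975)))
```

In a report, the dispersion value would motivate the robustness step: if the counts are overdispersed, the default Poisson standard errors are too small, and a bootstrap interval (or a quasi-Poisson or negative binomial refit) is the more defensible thing to report.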
Report Structure
- Introduction
  - Background & theory
  - Clear specification of the research question(s)
  - Clear specification of dependent/independent variables and the structure of the dataset (levels, repeated measures, items)
  - Clear hypotheses that will be tested (it does not matter whether exploratory or confirmatory)
- Methods
  - Pre-processing steps (handling of missing data, transformations)
  - Model specification choices (coding, random effects, links, etc.) and why
- Results
  - At least 2 tables (descriptives + model results)
  - At least 2 figures (model-based plot(s), diagnostics, forest plot, etc.)
  - Precise reporting of statistics (estimates, CI, p, effect sizes where applicable)
- Discussion
  - Interpretation in light of hypotheses
  - Discussion of the statistical methods that were used
  - Limitations and what you would do next
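For the model-results table, one possible way to collect estimates, confidence intervals, and p-values into a single data frame is sketched below in base R. The model (a plain linear regression on the built-in mtcars data) and the column names are placeholders chosen for the illustration; the same pattern would need adapting for mixed models or SEM output.

```r
# Placeholder model on a built-in dataset (datasets::mtcars).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Collect estimates, 95% CIs, and p-values in one table.
est <- coef(summary(fit))
ci <- confint(fit, level = 0.95)
report <- data.frame(
  term = rownames(est),
  estimate = round(est[, "Estimate"], 2),
  ci_low = round(ci[, 1], 2),
  ci_high = round(ci[, 2], 2),
  p_value = format.pval(est[, "Pr(>|t|)"], digits = 3, eps = 0.001),
  row.names = NULL
)

# A simple effect size for the whole model.
r_squared <- summary(fit)$r.squared

report
# In a .qmd, a table function such as knitr::kable(report) can format this.
```

Rounding at the table stage, rather than in the model object, keeps the full-precision values available for any in-text statistics.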
Week 15 presentation requirements
You will present your final project in Week 15.
Bring:
- a short slide deck (PDF recommended), or a Quarto HTML report you can navigate live
- your main result figure/table ready to show
Format (default; may be adjusted depending on class size):
- 8–10 minutes presentation
- 3–5 minutes Q&A
Your presentation should answer (in order):
1) What is the research question and what is the dataset?
2) What model(s) did you fit and why are they appropriate?
3) What is the main result (with one key figure/table)?
4) What did you check (diagnostics/robustness/convergence/bias) and what did you conclude?
Grading rubric (40% of course grade)
This rubric is only for the final project (separate from homework rubrics).
Graded on a 0–10 scale (then weighted as 40% of the course grade):
| Component | Points (max) | What “full points” means |
|---|---|---|
| Research question + design clarity | 1.5 | Clear question, variables, and data structure; appropriate scope |
| Data prep + transparency | 1.5 | Clear preprocessing; missing data handled; transformations justified |
| Methods + justification | 3.0 | At least 2 advanced components; correct implementation; choices justified |
| Diagnostics / robustness / bias checks | 1.5 | Appropriate checks for the methods used; conclusions reflect checks |
| Reporting + interpretation | 2.0 | Clear tables/figures; correct interpretation in plain language; limitations noted |
| Reproducibility | 0.5 | Renders cleanly; no absolute paths; data access is reproducible |
Retake rule (if failed)
If you fail the project, you must submit a revised version (or a new analysis) during the retake period.
Completion of all homework assignments is required to submit the final project.