---
title: "Assignment: correlation"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# General remarks

This week you are asked to report the results of your statistical analyses in a more "professional" way. A good place to start is the APA standard (or "APA style"). There are many materials on the web, but I like this short guide from the University of Washington (read it!): https://psych.uw.edu/storage/writing_center/stats.pdf

# Sub-Saharan Africa and infant mortality

In Sub-Saharan Africa, more than half of mothers lose at least one child before the child's first birthday. The file `infmort.xlsx` contains data on 36 countries in the region: country, infant mortality, per capita income (in U.S. dollars), percentage of births to mothers under 20, percentage of births to mothers over 40, percentage of births less than 2 years apart, percentage of married women using contraception, and percentage of women with unmet family planning need.

# Exercise 0 (prerequisite, not graded)

Load the data from the `xlsx` file. To do that you will need to look for an appropriate library. Take this opportunity to do some research on loading different types of files into R. What files can you load? What are the most comprehensive packages to do that? Can you, for example, load SPSS, SAS or Stata files? How? What about more exotic file types? Let's say you need to work with HDF5 files. Can you find (and install!) a library that makes this possible?

```{r}
# Your code here
```

# Exercise 1

Make a scatter diagram of `InfMort` and `Income` and draw the line that best fits the data (the regression line). We did not cover this in class, so treat it as an opportunity to do independent research into the capabilities of R!

```{r}
# Code here duh
```

# Exercise 2

Calculate the correlations among all numeric variables in the dataset. Prepare a visualization of the resulting correlation matrix.
What are the strongest predictors of infant mortality?

```{r}
# Code here duh
```

*Your answer here*

# Exercise 3

Run a statistical test for the two strongest predictors of infant mortality and describe the results of your analysis. What can you conclude from these findings? What are the limitations of the data?

```{r}
# Code here
```

*Your answer here*

# Exercise 4 (SHARKS)

How large a correlation would you need for the relationships shown in the previous exercises to be significant? You need to use the formula for the $t$ statistic. I like to do these kinds of tasks using the `uniroot` function because my high-school math was terrible, but you can probably just rearrange the formula and compute the answer directly.

```{r}
# Code here
```

# Exercise 5 (SHARKS)

If you have multivariate data at hand, the `cor` function is useful but has some limitations. For example, in contrast to `cor.test`, it does not compute p-values or confidence intervals. Of course, you can run multiple `cor.test`s, but that is rather cumbersome. Several packages aim to streamline this process. My favorite is the `rstatix` package, which contains the `cor_test` function. Using this function, calculate the correlations among all numeric variables in the previous exercises.

```{r}
# Code here
```

# Exercise 6 (SHARKS)

We focused on testing a null hypothesis stating that the "true" correlation (population correlation) is equal to 0 ($H_0: \rho = 0$). There are, however, other interesting hypotheses about the correlation that we may want to test. For example, we can ask whether two population correlations differ ($H_0: \rho_1 = \rho_2$) or whether the population correlation is different from some specified value other than 0 (e.g. $H_0: \rho = 0.1$). To run those tests you need to implement them yourself (hard) or find an appropriate library (easy).
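
To give a feel for the "implement it yourself" route, here is a minimal sketch of one common approach to $H_0: \rho = \rho_0$: Fisher's $z$-transformation, where $\operatorname{atanh}(r)$ is treated as approximately normal with standard error $1/\sqrt{n-3}$. The helper name `fisher_z_test` is hypothetical (it is not from any package), and the value of `r` is made up for illustration; only `n = 36` echoes the number of countries mentioned above, and missing values in the real data could lower it.

```{r}
# Hypothetical helper, not part of any package: one-sample test of
# H0: rho = rho0 based on Fisher's z-transformation.
fisher_z_test <- function(r, n, rho0 = 0) {
  z_obs  <- atanh(r)           # Fisher transform of the sample correlation
  z_null <- atanh(rho0)        # Fisher transform of the hypothesized value
  se     <- 1 / sqrt(n - 3)    # approximate standard error on the z scale
  z_stat <- (z_obs - z_null) / se
  p      <- 2 * pnorm(-abs(z_stat))  # two-sided p-value from the normal distribution
  c(z = z_stat, p.value = p)
}

# Made-up correlation for illustration only; NOT a result from infmort.xlsx
res <- fisher_z_test(r = 0.5, n = 36, rho0 = 0.1)
res
```

Note that this sketch covers only the single-correlation case. Comparing two correlations that come from the same sample and share a variable (both involve infant mortality) calls for a test designed for dependent correlations, so the function above should not be reused for that comparison as is.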

a) Run a statistical test to determine whether the two highest correlations between demographic variables and infant mortality are different.

```{r}
# Code here
```

*Your answer here*

b) Run a statistical test to determine whether the highest correlation differs significantly from $0.1$.

```{r}
# Code here
```

*Your answer here*