--- title: "Additional exercise 25" author: "Jonas Moss and Per August Jarval Moen" date: "2024" output: pdf_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` We load the \textit{nlme} package and print out a description of the dataset: ```{r} library(nlme) # Transform Subject to the values 1-27 Orthodont[,"ID"] = as.factor(as.numeric(Orthodont[,"Subject"])) help(Orthodont) head(Orthodont) ``` \subsection*{a)} We do some explorative analyses of the data set. First we plot age versus distance (the response) for some subjects: ```{r} # Exploratory analysis plot(Orthodont) #Plot Age againts Distance for each Subject ``` \leavevmode\newline\newline Then we look at boxplots of the distance for each age group: ```{r} distance.by.age = split(Orthodont[,"distance"], Orthodont[,"age"]) boxplot(distance.by.age) ``` \subsection*{b)} We plot \textbf{distance} against \textbf{age} for each subject in the same plot: ```{r} # Plot distance against age. For every group, add a regression line plot(Orthodont[,"age"], Orthodont[,"distance"], col = Orthodont[,"ID"]) for(i in 1:length(unique(Orthodont[,"ID"]))){ Orthodont.per.ID = Orthodont[Orthodont[,"ID"] == i, ] abline(lm(distance ~ age, data = Orthodont.per.ID), col = i) } ``` \leavevmode\newline\newline We observe that \textbf{distance} (the response variable) increases with \textbf{age} for most subjects. \subsection*{c)} We do as we are told: ```{r} # Fit linear mixed effects model with a random intercept and no constraints on the structure of covariance matrix. fit1 = lme(distance ~ age, data = Orthodont, random = ~1|ID) ``` In the above, \textit{random = $\sim$ 1} is used to indicate that $Z_i$ in equation (9.6) is set to $\mathbf{1}$. Moreover, \textit{|ID} tells the function how the data is grouped (i.e., all responses for which \textit{ID} is the same are grouped together and indexed by a common $i$ in terms of equation (9.6)). Continuing on: ```{r} # Plot the data and the results from the LME plot(Orthodont[,"age"], Orthodont[,"distance"], col = Orthodont[,"ID"]) beta.0.hat = fit1$coef$fixed["(Intercept)"] beta.1.hat = fit1$coef$fixed["age"] abline(a = beta.0.hat, b = beta.1.hat, lwd = 4) # y.hat no conditioning on random effect for (i in 1:length(unique(Orthodont[,"ID"]))){ b.hat.i = fit1$coef$random$ID[i,] abline(a = beta.0.hat + b.hat.i, b = beta.1.hat, col = i) # y.hat|b_i conditioned on random effect } ``` \leavevmode\newline\newline The black line is the estimated mean \textbf{distance} plotted against \textbf{age}. The colored lines are means of \textbf{distance} conditional on the estimated random intercept $\hat{u}_i$ for each individual. We see that including a random intercept for each individual fits the data well (at least visually). \subsection*{d)} We find the estimates of $\sigma_u^2$ and $\sigma_{\epsilon}^2$ from the summary output: ```{r} summary(fit1) ``` \leavevmode\newline The estimate of $\sigma_u^2$ is $2.114724^2 = 4.47$ and the estimate of $\sigma_{\epsilon}^2$ is $1.431593^2 = 2.05$. \subsection*{e)} We add the covariate \textbf{sex} to the model as a covariate: ```{r} fit2 = lme(distance ~ age + Sex,data = Orthodont, random = ~1|ID) summary(fit2) ``` The coefficient for \textbf{sex} has a p-value (based on the Wald test) of $0.0054$, which is quite small. \newline \newline The interpretation of the coefficients for the fixed effects is the same as in the linear model. A (somewhat unprecise) interpretation of the random intercept is that it captures the variation between individuals after the fixed effects are taken into account.