---
title: "Additional exercise 25"
author: "Jonas Moss and Per August Jarval Moen"
date: "2024"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
We load the \textit{nlme} package and print out a description of the dataset:
```{r}
library(nlme)
# Transform Subject to the values 1-27
Orthodont[,"ID"] = as.factor(as.numeric(Orthodont[,"Subject"]))
help(Orthodont)
head(Orthodont)
```
\subsection*{a)}
We do some explorative analyses of the data set. First we plot age versus distance (the response) for some subjects:
```{r}
# Exploratory analysis
plot(Orthodont) #Plot Age againts Distance for each Subject
```
\leavevmode\newline\newline Then we look at boxplots of the distance for each age group:
```{r}
distance.by.age = split(Orthodont[,"distance"], Orthodont[,"age"])
boxplot(distance.by.age)
``` 
\subsection*{b)}
We plot \textbf{distance} against \textbf{age} for each subject in the same plot:
```{r}
# Plot distance against age. For every group, add a regression line
plot(Orthodont[,"age"], Orthodont[,"distance"], col = Orthodont[,"ID"])
for(i in 1:length(unique(Orthodont[,"ID"]))){
  Orthodont.per.ID = Orthodont[Orthodont[,"ID"] == i, ]
  abline(lm(distance ~ age, data = Orthodont.per.ID), col = i)
}
```
\leavevmode\newline\newline
We observe that \textbf{distance} (the response variable) increases with \textbf{age} for most subjects.
\subsection*{c)}
We do as we are told:
```{r}
# Fit linear mixed effects model with a random intercept and no constraints on the structure of covariance matrix.
fit1 = lme(distance ~ age, data = Orthodont, random = ~1|ID)
```
<!-- \leavevmode -->
<!-- \newline\newline -->
In the above, \textit{random = $\sim$ 1} is used to indicate that $Z_i$ in equation (9.6) is set to $\mathbf{1}$. Moreover, \textit{|ID} tells the function how the data is grouped (i.e., all responses for which \textit{ID} is the same are grouped together and indexed by a common $i$ in terms of equation (9.6)). Continuing on:
```{r}
# Plot the data and the results from the LME
plot(Orthodont[,"age"], Orthodont[,"distance"], col = Orthodont[,"ID"])
beta.0.hat = fit1$coef$fixed["(Intercept)"]
beta.1.hat = fit1$coef$fixed["age"]
abline(a = beta.0.hat, b = beta.1.hat, lwd = 4) # y.hat no conditioning on random effect
for (i in 1:length(unique(Orthodont[,"ID"]))){
  b.hat.i = fit1$coef$random$ID[i,]
  abline(a = beta.0.hat + b.hat.i, b = beta.1.hat, col = i) # y.hat|b_i conditioned on random effect
}
```
\leavevmode\newline\newline The black line is the estimated mean \textbf{distance} plotted against \textbf{age}. The colored lines are means of \textbf{distance} conditional on the estimated random intercept $\hat{u}_i$ for each individual. We see that including a random intercept for each individual fits the data well (at least visually).
\subsection*{d)}
We find the estimates of $\sigma_u^2$ and $\sigma_{\epsilon}^2$ from the summary output:
```{r}
summary(fit1)
```
\leavevmode\newline
The estimate of $\sigma_u^2$ is $2.114724^2 = 4.47$ and the estimate of $\sigma_{\epsilon}^2$ is $1.431593^2 = 2.05$.

\subsection*{e)}
We add the covariate \textbf{sex} to the model as a covariate:
```{r}
fit2 = lme(distance ~ age + Sex,data = Orthodont, random = ~1|ID)
summary(fit2)
```
The coefficient for \textbf{sex} has a p-value (based on the Wald test) of $0.0054$, which is quite small. 
\newline
\newline
The interpretation of the coefficients for the fixed effects is the same as in the linear model. A (somewhat unprecise) interpretation of the random intercept is that it captures the variation between individuals after the fixed effects are taken into account.