---
title: "Exercise 5.17"
author: "Per August Moen"
date: "Fall 2024"
output:
  pdf_document: default
  html_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

\section*{Exercise 5.17}
We illustrate that our findings in exercise 5.16 hold by considering a toy data set.
\newline \newline
We consider two nested logistic regression models with canonical link (logit), letting $\pi_i$ denote the probability of success within group $i$:
\begin{itemize}
\item Model $\text{M}_0: \text{logit}(\pi_i) = \beta_0$ for some $\beta_0$ (intercept-only model),
\item Model $\text{M}_1: \text{logit}(\pi_i) = \beta_0 + \beta_1 x_i$ for some $\beta_0,\beta_1$.
\end{itemize}
We may consider the data as un-grouped Bernoulli trials,
$$
Y_{i,j} \overset{\text{ind}}{\sim} \text{Bin}(1, \pi_i) \ \text{for } j=1,\dots,n_i, \ i = 1, \dots, N,
$$
or grouped, with $\tilde{Y}_i := \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{i,j}$,
$$
\tilde{Y}_{i} \overset{\text{ind}}{\sim} \frac{1}{n_i}\text{Bin}(n_i, \pi_i) \ \text{for } i = 1, \dots, N.
$$
\newline \newline
We load both the un-grouped data and the grouped data into R.
\newline \newline
```{r cache = TRUE}
# Grouped data: success proportion y out of n_i trials per group
x = c(0, 1, 2)
n_i = c(4, 4, 4)
y = c(1, 2, 4)/4
grouped = data.frame(x, n_i, y)

# Un-grouped data: one Bernoulli outcome per row
x = c(rep(0, 4), rep(1, 4), rep(2, 4))
n_i = rep(1, 12)
y = c(1, rep(0, 3), 1, 1, 0, 0, rep(1, 4))
ungrouped = data.frame(x, n_i, y)
```
\leavevmode \newline
\subsection*{a)}
Let us fit the models $\text{M}_0$ and $\text{M}_1$ to both the grouped and the un-grouped data. We begin with the grouped data:
\newline
```{r cache = TRUE}
# Grouped data, fit models:
M0.grouped = glm(y ~ 1, data = grouped, weights = n_i, family = binomial(link = "logit"))
M1.grouped = glm(y ~ x, data = grouped, weights = n_i, family = binomial(link = "logit"))
summary(M0.grouped)
summary(M1.grouped)
```
\leavevmode \newline \newline
Note that (what we call) the deviance is called "Residual deviance" in the R output.
\newline \newline
Let us fit the models $\text{M}_0$ and $\text{M}_1$ to the un-grouped data:
\newline
```{r cache = TRUE}
# Un-grouped data, fit models:
M0.ungrouped = glm(y ~ 1, data = ungrouped, weights = n_i, family = binomial(link = "logit"))
M1.ungrouped = glm(y ~ x, data = ungrouped, weights = n_i, family = binomial(link = "logit"))
summary(M0.ungrouped)
summary(M1.ungrouped)
```
\leavevmode \newline \newline
By inspecting the output, we see that the deviance of model $\text{M}_0$ differs between the grouped and un-grouped data, and the same is true for model $\text{M}_1$. This is because the deviance is computed relative to the saturated model, which is different for the two data formats; we explained this phenomenon in exercise 5.16.
\subsection*{b)}
The difference between the deviances of model $\text{M}_0$ and model $\text{M}_1$ using the grouped data is given by $D(\tilde{Y};\hat{\mu}_0) - D(\tilde{Y};\hat{\mu}_1)=$ `r M0.grouped$deviance` $-$ `r M1.grouped$deviance` $=$ `r M0.grouped$deviance - M1.grouped$deviance`.
\newline \newline
The difference between the deviances of model $\text{M}_0$ and model $\text{M}_1$ using the un-grouped data is given by $D(Y;\hat{\mu}_0) - D(Y;\hat{\mu}_1)=$ `r M0.ungrouped$deviance` $-$ `r M1.ungrouped$deviance` $=$ `r M0.ungrouped$deviance - M1.ungrouped$deviance`.
\newline \newline
So the difference in deviance between the two models is the same regardless of whether the data are grouped or un-grouped. This is in agreement with what we found in exercise 5.16.
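\newline \newline
As a sanity check, we can verify numerically why the constant cancels: the deviances computed from the grouped and the un-grouped data differ by a constant that does not depend on the fitted model (cf. exercise 5.16). This is a minimal check using the model objects fitted above, so the two numbers printed below should coincide:
\newline
```{r cache = TRUE}
# The grouped and un-grouped deviances of each model should differ by the
# same model-independent constant, so these two differences should be equal:
M0.grouped$deviance - M0.ungrouped$deviance
M1.grouped$deviance - M1.ungrouped$deviance
```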
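\leavevmode \newline \newline
The deviance difference can also be obtained directly with R's built-in `anova()` function, which produces an analysis-of-deviance table for nested GLMs. The "Deviance" column should reproduce the differences computed by hand above, for both data formats:
\newline
```{r cache = TRUE}
# Analysis of deviance (likelihood ratio) tests of M0 vs M1;
# the "Deviance" column matches the hand-computed differences above.
anova(M0.grouped, M1.grouped, test = "Chisq")
anova(M0.ungrouped, M1.ungrouped, test = "Chisq")
```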