\documentclass[10pt,UKenglish]{article}
\RequirePackage{amsthm,amsmath,amsfonts}
\RequirePackage{bm}
\usepackage{enumerate}
\usepackage{paralist}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{color}
\newcommand{\E}{\mbox{E}}
\newcommand{\sd}{\mbox{sd}}
\newcommand{\V}{\mbox{V}}
\usepackage{url}
\usepackage{parskip}
\newcommand{\B}{\boldsymbol}
\newcommand{\Bb}{\mathbf}

\begin{document}

\begin{center}
\section*{STK3100/4100 -- Introduction to Generalized Linear Models}
\subsection*{Mandatory assignment 1 of 2}
\end{center}

\subsubsection*{Submission deadline}
Thursday September 26 2024, 14:30 in Canvas (\url{canvas.uio.no}).

\subsubsection*{Instructions}
Note that you have \textbf{one attempt} to pass the assignment. This means that there are no second attempts.

You can choose between scanning handwritten notes or typing the solution directly on a computer (for instance with \LaTeX). The assignment must be submitted as \textbf{a single PDF file}. Scanned pages must be clearly legible. The submission must contain your name, course and assignment number.

It is expected that you give a clear presentation with all necessary explanations. Remember to include all relevant plots and figures.

All aids, including collaboration, are allowed, but the submission must be written by you and reflect your understanding of the subject. If we doubt that you have understood the content you have handed in, we may request that you give an oral account.

In exercises where you are asked to write a computer program, you need to hand in the code along with the rest of the assignment. It is important that the submitted program contains a trial run, so that it is easy to see the result of the code.
\subsubsection*{Application for postponed delivery}
If you need to apply for a postponement of the submission deadline due to illness or other reasons, you have to contact the Student Administration at the Department of Mathematics (e-mail: \href{mailto:studieinfo@math.uio.no}{studieinfo@math.uio.no}) no later than the same day as the deadline.

All mandatory assignments in this course must be approved in the same semester, before you are allowed to take the final examination.

\subsubsection*{Specifically about this assignment}
In order to get the assignment accepted you need to fulfil the following requirements:
\begin{itemize}
\item Make a real attempt at all (sub-)questions.
\item Give satisfactory answers to at least 60\% of the (sub-)questions.
\item Include relevant R output in your report.
\end{itemize}

\subsubsection*{Complete guidelines about delivery of mandatory assignments:}
\url{www.uio.no/english/studies/admin/compulsary-activities/mn-math-mandatory.html}

\begin{center}
GOOD LUCK!
\end{center}

\subsubsection*{Problem 1}
In this problem, we will consider the relationship between a person's wingspan and height. The wingspan is the horizontal measurement from fingertip to fingertip with outstretched arms. The data below show the wingspan and height in cm for $16$ Australian women, and are also recorded in the file\\
{\scriptsize\url{http://www.uio.no/studier/emner/matnat/math/STK3100/data/wingspan.txt}}:\\
\begin{verbatim}
   Height (x)  Wingspan (y)
1        63.0          62.0
2        63.0          62.0
3        65.0          64.0
4        64.0          64.5
5        68.0          67.0
6        69.0          69.0
7        71.0          70.0
8        68.0          72.0
9        68.0          70.0
10       72.0          72.0
11       73.0          73.0
12       73.5          75.0
13       70.0          71.0
14       70.0          70.0
15       72.0          76.0
16       74.0          76.5
\end{verbatim}
We will return to the relationship between height and wingspan in question f), but first we will consider the problem more generally.
To this end we consider a simple linear regression model with the single covariate $\Bb{x}=(x_{1},\ldots,x_{n})^{T}$ and the response $\Bb{Y}=(Y_{1},\ldots,Y_{n})^{T}$, where $Y_{1},\ldots,Y_{n}$ are independent and $Y_{i}\sim N(\mu_{i},\sigma^{2})$. We will consider two models, $M_{0}$ and $M_{1}$. Model $M_{1}$ is the standard linear regression model
\[
\mu_{i} = \beta_{0}+\beta_{1}x_{i},
\]
whereas $M_{0}$ is the same model, but without the intercept, i.e.
\[
\mu_{i} = \beta_{1}^{*}x_{i}.
\]
Now, let $\B{\mu}=(\mu_{1},\ldots,\mu_{n})^{T}$ be the $n \times 1$ vector of mean values. Further, the model matrices for models $M_{0}$ and $M_{1}$ are denoted $\Bb{X}_{0}$ and $\Bb{X}_{1}$ and the model spaces are denoted $C(\Bb{X}_{0})$ and $C(\Bb{X}_{1})$. We use the notation $\B{1}_{k}$ to denote a $k \times 1$ vector of $1$s.
\begin{enumerate}[a)]
\item Give the model matrices for models $M_{0}$ and $M_{1}$. What are the ranks of the two model matrices? Explain what it means that the models are nested.
\item Give the projection matrices $\Bb{P}_{0}$ and $\Bb{P}_{1}$ onto the two model spaces.
\item Use the projection matrices to show that the vectors of fitted values for models $M_{0}$ and $M_{1}$ may be given, respectively, as
\[
\hat{\B{\mu}}_{0} = \hat{\beta}_{1}^{*}\Bb{x}
\]
and
\[
\hat{\B{\mu}}_{1} = \bar{Y}\B{1}_{n}+\hat{\beta}_{1}(\Bb{x}-\bar{x}\B{1}_{n}),
\]
with
\[
\hat{\beta}_{1}^{*}=\frac{\sum_{i=1}^{n}x_{i}Y_{i}}{\sum_{i=1}^{n}x_{i}^{2}}
\quad \mbox{and} \quad
\hat{\beta}_{1} = \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})Y_{i}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}.
\]
\item Show that
\[
\Bb{Y}^{T}(\Bb{P}_{1}-\Bb{P}_{0})\Bb{Y}/\sigma^{2}
\quad \mbox{and} \quad
\Bb{Y}^{T}(\Bb{I}-\Bb{P}_{1})\Bb{Y}/\sigma^{2}
\]
are independent and determine their distributions (Hint: Use Cochran’s theorem). It is sufficient to determine the distributions under $M_{0}$.
\item Show that the $F$-statistic for testing the null hypothesis that model $M_{0}$ holds versus the alternative hypothesis that model $M_{1}$ holds may be given as
\[
F = \frac{\|\hat{\B{\mu}}_{1}-\hat{\B{\mu}}_{0}\|^{2}}{\|\Bb{Y}-\hat{\B{\mu}}_{1}\|^{2}/(n-2)}
= \frac{\sum_{i=1}^{n}\left(\bar{Y}-\hat{\beta}_{1}\bar{x}+\left(\hat{\beta}_{1}-\hat{\beta}_{1}^{*}\right)x_{i}\right)^{2}}{\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}-\hat{\beta}_{1}(x_{i}-\bar{x})\right)^{2}/(n-2)},
\]
and determine the distribution of the $F$-statistic under model $M_{0}$.\\[2ex]
\noindent We now return to the example on wingspan and height, considered at the beginning of the problem.
\item Read the data in the table given at the beginning of the problem into R. Use the \texttt{lm} command to fit models $M_{0}$ and $M_{1}$, and use the \texttt{anova} command to perform the $F$-test. Discuss your results.
\end{enumerate}

\subsubsection*{Problem 2}
In this problem, we will consider how the survival of a passenger on the Titanic depended on ticket class and the passenger's age. The file\\
{\scriptsize\url{/studier/emner/matnat/math/STK3100/data/titanic.txt}}\\
consists of three columns with the following information about a subset of $70$ passengers:\\
\begin{itemize}
\item \texttt{survived}: survived the shipwreck (0 = no, 1 = yes)
\item \texttt{age}: age of the passenger in years
\item \texttt{pclass}: ticket class, grouped as either 3rd class or 1st/2nd class (0 = 1st/2nd class, 1 = 3rd class)
\end{itemize}
We will use logistic regression to analyse the data using R. You may read the data into R by the commands:
\begin{verbatim}
# Read the whitespace-separated file; the first row holds the column names
data <- "http://www.uio.no/studier/emner/matnat/math/STK3100/data/titanic.txt"
titanic <- read.table(data, header = TRUE)
\end{verbatim}
\begin{enumerate}[a)]
\item Fit a logistic regression model with \texttt{survived} as response and \texttt{pclass} as the only covariate. Is there a significant effect of \texttt{pclass}?
\item Denote by $\beta_{1}$ the regression coefficient for \texttt{pclass} in the logistic regression model. Give an interpretation of $e^{\beta_{1}}$, and estimate it.
\item Fit a logistic regression model where you also include \texttt{age} as a covariate. Denote by $\beta_{2}$ the regression coefficient for \texttt{age} in this model. Give an interpretation of $e^{\beta_{2}}$, and estimate it.
\item Use the Wald test, the likelihood ratio test and the score test to test the hypothesis that $\beta_{2}=0$ in the model in question c). How well do the tests agree? What are the conclusions of the tests?
\item Find $95\%$ confidence intervals for $e^{\beta_{1}}$ from b) and for $e^{\beta_{2}}$ from c). Give interpretations of these intervals.
\end{enumerate}

\end{document}