STK4100 - Introduction to generalized linear models

Book

Book: Foundations of Linear and Generalized Linear Models; Agresti, Alan. Hoboken, N.J.: Wiley, cop. 2015. Total number of pages: XIII, 444.

1 Introduction to linear and generalized linear models

Syllabus: All (except 1.4.2).

1.1 Components of a generalized linear model

1.2 Quantitative/qualitative explanatory variables and interpreting effects

1.3 Model matrices and model vector spaces

1.4 Identifiability and estimability

1.5 Example: Using software to fit a GLM

Exercises

2 Linear models: Least squares theory

Syllabus: Section 2.1 (except 2.1.6), section 2.2 (except proofs in 2.2.1 and 2.2.2, and except 2.2.4), section 2.3 (except 2.3.4), and section 2.4 (only 2.4.1).

2.1 Least squares model fitting

2.2 Projections of data onto model spaces

2.3 Linear model examples: projections and SS decompositions

2.3.2

2.3.3

2.4 Summarizing variability in a linear model

2.4.2

3 Normal linear models: Statistical inference

Syllabus: Section 3.1 (except page 84 in 3.1.4 and except 3.1.5), and section 3.2 (except 3.2.7, 3.2.8, and 3.2.9).

3.1 Distribution theory for normal variates

3.2 Significance tests for normal linear models

3.2.1

3.2.4

4 Generalized linear models: model fitting and inference

Syllabus: Section 4.1, section 4.2 (except 4.2.6), section 4.3, section 4.4 (except 4.4.5), section 4.5 (except 4.5.3), section 4.6 (except 4.6.4 and 4.6.5), section 4.7.

4.1 Exponential dispersion family distributions for a GLM

4.1.1

4.1.2

4.1.3

4.2 Likelihood and asymptotic distributions for GLMs

4.2.1

4.2.2

4.2.3

4.2.4

4.3 Likelihood-ratio/Wald/score methods of inference for GLM parameters

4.3.1

4.3.2

4.3.3

4.3.5

4.4 Deviance of a GLM, model comparison, and model checking

4.4.1

4.4.2

4.4.6

4.5 Fitting generalized linear models

4.5.1

4.5.2

4.5.4

4.6 Selecting explanatory variables for a GLM

4.6.1

4.6.2

4.6.3

4.7 Example: Building a GLM

4.7.1

5 Models for binary data

Syllabus: Section 5.1, section 5.2 (except 5.2.3), section 5.3 (except 5.3.3 and 5.3.4), section 5.5 (except 5.5.4), section 5.6 (except 5.6.2), section 5.7 (except 5.7.1).

6 Multinomial response models

Syllabus: Section 6.1 (except 6.1.5 and 6.1.6), section 6.2 (except 6.2.3, 6.2.4, and 6.2.5), and section 6.3 (except 6.3.1).

7 Models for count data

Syllabus: Section 7.1 (except 7.1.2 and 7.1.5), section 7.3 (except 7.3.5), section 7.4.1, and section 7.5.

7.1 Poisson GLMs for counts and rates

7.1.1

7.1.3

7.1.4

7.1.6

7.1.7

7.3 Negative binomial GLMs

7.3.1

7.3.2

7.3.3

7.4 Models for zero-inflated data

7.5 Example: modelling count data

8 Quasi-likelihood methods

Syllabus: Section 8.1, section 8.2.4 (only to middle of page 276), and section 8.3.

9 Modelling correlated responses

Syllabus: Section 9.1 (except 9.1.4 and 9.1.6), section 9.2, section 9.3, section 9.4 (except 9.4.3 and 9.4.4), section 9.5 (except 9.5.3, 9.5.4, and 9.5.5), section 9.6 (except 9.6.2 and 9.6.5), and section 9.7.

Notes

A data set of observations that come from \(c\) groups.

A GLM with identity link function and \(Y|X\) normal.

Given observations \(y\) and \(X\), and given a choice of model, we will fit a model. The model itself is known,

The one-way layout is a model for comparing the means of \(c\) groups.

A subject in group \(i\) has linear predictor \(\eta_i = \beta_0 + \beta_i\).
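As a minimal sketch of the one-way layout in R, the following uses made-up data for \(c = 3\) groups. The data values and the names `y`, `group`, `X`, and `fit` are illustrative assumptions, not taken from the course material. R's default treatment coding sets the effect of the first group to zero so that the parameters are identifiable, which means the intercept estimates the mean of group 1 and the remaining coefficients estimate differences from group 1.

```r
# Hypothetical data: three groups with three observations each.
y     <- c(1, 2, 3,  4, 5, 6,  10, 11, 12)
group <- factor(rep(c("g1", "g2", "g3"), each = 3))

# Model matrix: an intercept column plus indicator columns for groups 2 and 3.
X <- model.matrix(~ group)

# Least squares fit of the one-way layout.
fit <- lm(y ~ group)
coef(fit)                # intercept = mean of g1; others = differences from g1
tapply(y, group, mean)   # sample group means, for comparison
```

The fitted value for every subject in a group is that group's sample mean, which can be checked with `fitted(fit)`.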

Randomly assign test subjects to one of \(c\) groups.

The error variance is the

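The total sum of squares is \(\Op{TSS} = \sum_i (y_i - \bar y)^2\), the variability of the observations around their overall mean.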
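For a linear model with an intercept, the total sum of squares splits into a regression part and a residual part, TSS = SSR + SSE. The sketch below checks this numerically on R's built-in `mtcars` data. The choice of data set and model is an arbitrary assumption for illustration, and `TSS`, `SSR`, and `SSE` are local variable names.

```r
# Fit a linear model with an intercept on built-in data.
fit <- lm(mpg ~ wt, data = mtcars)
y   <- mtcars$mpg

TSS <- sum((y - mean(y))^2)             # total sum of squares
SSR <- sum((fitted(fit) - mean(y))^2)   # regression (model) sum of squares
SSE <- sum(residuals(fit)^2)            # residual (error) sum of squares

all.equal(TSS, SSR + SSE)               # checks the decomposition TSS = SSR + SSE
SSR / TSS                               # R-squared, compare summary(fit)$r.squared
```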

A normal variable but the variance is unknown.

The \(F\) quantile with \(q\), \(r\) degrees of freedom.

 > qf(0.95, q, r)
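A small usage sketch, with \(q = 2\) and \(r = 10\) chosen arbitrarily for illustration: `qf` returns the quantile and `pf` is the corresponding distribution function, so applying `pf` to the quantile recovers the probability.

```r
q <- 2    # numerator degrees of freedom (illustrative value)
r <- 10   # denominator degrees of freedom (illustrative value)

crit <- qf(0.95, q, r)   # 0.95 quantile of the F distribution
crit
pf(crit, q, r)           # distribution function at the quantile: recovers 0.95

# Upper tail probability beyond the quantile, i.e. the 5% significance level.
pf(crit, q, r, lower.tail = FALSE)
```

An observed \(F\) statistic larger than `crit` would be significant at the 5% level.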

The random component of a GLM is the response variable \(y\), whose distribution belongs to the exponential dispersion family.

The exponential dispersion family is the collection of distributions that have a density or mass function of the form \[f(y_i, \theta_i, \phi ) = \Exp{\Cr{\frac{y_i \theta_i - b(\theta_i)}{a(\phi )} + c(y_i, \phi )}}\]

\(\theta\) in the exponential dispersion family is the natural parameter.

\(\phi\) in the exponential dispersion family is the dispersion parameter.

A natural exponential family is an exponential dispersion family with \(a(\phi ) = 1\) and \(c(y, \phi ) = c(y)\): \[f(y, \theta ) = \Exp{\Cr{y \theta{} - b(\theta ) + c(y)}}\]

The cumulant function is the function \(b\) in the natural exponential family.

It is called the cumulant function because its derivatives determine the cumulants of the distribution.
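A short derivation sketch for the first two cumulants, assuming the usual regularity conditions under which the standard likelihood identities hold. Write \(\ell (\theta ) = \frac{y \theta{} - b(\theta )}{a(\phi )} + c(y, \phi )\) for the log-density of one observation from the exponential dispersion family. The score has mean zero, which gives the mean:

\[\Ex{\Sq{\partial_\theta\ell}} = \frac{\E{y} - b'(\theta )}{a(\phi )} = 0 \quad\Longrightarrow\quad \E{y} = b'(\theta )\]

The information identity \(-\Ex{\Sq{\partial^2_\theta\ell}} = \Ex{\Sq{(\partial_\theta\ell )^2}}\) gives the variance:

\[\frac{b''(\theta )}{a(\phi )} = \frac{\Var{y}}{a(\phi )^2} \quad\Longrightarrow\quad \Var{y} = b''(\theta )\, a(\phi )\]

In a natural exponential family \(a(\phi ) = 1\), so the variance is simply \(b''(\theta )\).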

The Poisson distribution is an exponential dispersion family distribution with natural parameter \(\theta{} = \Log\mu\) and cumulant function \(b(\theta ) = \Exp\theta\), so that \(\E{y} = b'(\theta ) = \Exp\theta{} = \mu\) and \(\Var{y} = b''(\theta ) = \Exp\theta{} = \mu\).
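As a worked step, the Poisson mass function can be rewritten in exponential dispersion form by moving every factor into the exponent:

\[\cfrac{e^{-\mu} \mu^y}{y!} = \Exp{\Cr{y \Log\mu{} - \mu{} - \Log{(y!)}}}\]

Matching terms with the general form gives \(\theta{} = \Log\mu\), \(b(\theta ) = \Exp\theta{} = \mu\), \(a(\phi ) = 1\), and \(c(y, \phi ) = -\Log{(y!)}\).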

The response function is the inverse of the link function.

The canonical link is the link function that maps the mean \(\mu\) to the natural parameter \(\theta\).

Normal: Identity link \(g(\mu ) = \mu{} = \theta\)
Logistic: Logit link \(g(\mu ) = \Log{\frac\mu{1 - \mu}} = \theta\)
Loglinear: Log link \(g(\mu ) = \Log\mu{} = \theta\)
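The three canonical links can be inspected through R's family objects, whose default links are the canonical ones for the Gaussian, binomial, and Poisson families. This is a small sketch; the value `mu = 0.25` is an arbitrary illustrative choice.

```r
mu <- 0.25   # illustrative mean value in (0, 1)

# Family objects carry the link g (linkfun) and the response function (linkinv).
gaussian()$linkfun(mu)   # identity link: returns mu
binomial()$linkfun(mu)   # logit link: log(mu / (1 - mu))
poisson()$linkfun(mu)    # log link: log(mu)

# The response function is the inverse of the link.
eta <- binomial()$linkfun(mu)
binomial()$linkinv(eta)  # recovers mu
```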

For large \(n\), \(\hat\beta\) has approximately the distribution \[\mathcal{N} \Cr{\beta , (X^\intercal WX)^{-1}}\] where \(W\) is the diagonal matrix with elements \(w_i = \frac{(\partial_{\eta_i}\mu_i)^2}{\Var{y_i}}\).
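The covariance matrix \((X^\intercal WX)^{-1}\) can be computed by hand and compared with what R reports. The sketch below uses a Poisson GLM with log link on made-up data; the data values and the names `x`, `y`, `fit`, `w`, and `V` are illustrative assumptions. In practice \(W\) is evaluated at the fitted means.

```r
# Hypothetical count data with one explanatory variable.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 3, 6, 7, 8, 9, 10, 12)

fit <- glm(y ~ x, family = poisson)   # Poisson GLM with the canonical log link
X   <- model.matrix(fit)
mu  <- fitted(fit)

# For the log link, d mu / d eta = mu and Var(y) = mu, so w_i = mu_i^2 / mu_i = mu_i.
w <- mu

V <- solve(t(X) %*% (w * X))   # estimated (X' W X)^{-1}; w * X scales row i by w_i
V
vcov(fit)                      # R's estimate, for comparison
sqrt(diag(V))                  # standard errors of the coefficient estimates
```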

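The likelihood-ratio test compares the maximized likelihood under the null hypothesis, \(\ell_0\), with the maximized likelihood under the alternative, \(\ell_1\), for example to test whether a parameter \(\beta\) is significant. With \(L_0 = \Log{\ell_0}\) and \(L_1 = \Log{\ell_1}\) denoting the maximized log-likelihoods, the test statistic is \[-2\Log\Lambda{} = -2\Log{\frac{\ell_0}{\ell_1}} = -2(L_0 - L_1)\] (In this formula \(\ell\) denotes the likelihood itself, not the log-likelihood.)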
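A minimal sketch of the likelihood-ratio statistic \(-2(L_0 - L_1)\) in R, for a Poisson GLM on made-up data. The data and the names `m0`, `m1`, `L0`, `L1`, and `lr` are illustrative assumptions. The null model drops the explanatory variable, so the statistic is compared with a chi-squared distribution with 1 degree of freedom.

```r
# Hypothetical count data with one explanatory variable.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 3, 6, 7, 8, 9, 10, 12)

m0 <- glm(y ~ 1, family = poisson)   # null model: effect of x set to zero
m1 <- glm(y ~ x, family = poisson)   # alternative: effect of x unrestricted

L0 <- as.numeric(logLik(m0))         # maximized log-likelihood under H0
L1 <- as.numeric(logLik(m1))         # maximized log-likelihood under H1

lr <- -2 * (L0 - L1)                 # likelihood-ratio statistic
lr
deviance(m0) - deviance(m1)          # the same number, as a difference of deviances

# Approximate p-value from the chi-squared distribution with 1 degree of freedom.
pchisq(lr, df = 1, lower.tail = FALSE)
```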

The score function is the derivative of the log-likelihood function.

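The score statistic for testing \(H_0\colon \beta{} = \beta_0\), where \(\ell\) is the log-likelihood, is \[\cfrac{\Sq{\partial_\beta\ell (\beta_0)}^2}{-\Ex{\Sq{\partial^2_\beta\ell (\beta_0)}}}\]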

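The likelihood-ratio confidence interval for \(\beta\) at confidence level \(1 - \alpha\) consists of the values \(\beta_0\) that satisfy \[-2\Sq{\ell (\beta_0) - \ell (\hat\beta )} < \chi^2_1 (\alpha )\]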

\[\Op{MSE} = \Op{Var} + \Op{Bias}^2\]

\[\Op{AIC} = -2\Pa{\hat\ell{} - p} = -2\hat\ell{} + 2p\] where \(p\) is the number of parameters of the model and \(\hat\ell\) is the maximized value of the log-likelihood \(\ell (\beta )\).

\[\Op{KL} \Sq{p, p_M(\hat\beta_M)} = \Ex{\Sq{\Log{\cfrac{p(y^*)}{p_M(y^*;\hat\beta_M)}}}}\]

\[\Op{BIC} = -2\hat\ell{} + p\Log{n}\] where \(p\) is the number of parameters of the model, \(n\) is the number of observations and \(\hat\ell\) is the maximized value of the log-likelihood \(\ell (\beta )\).
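Both criteria can be computed by hand from the maximized log-likelihood, using the standard definitions \(-2\hat\ell{} + 2p\) for AIC and \(-2\hat\ell{} + p\Log{n}\) for BIC, and then compared with R's `AIC()` and `BIC()`. The sketch uses a Poisson GLM on made-up data; the data and the names `fit`, `ll`, `aic`, and `bic` are illustrative assumptions.

```r
# Hypothetical count data with one explanatory variable.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2, 3, 6, 7, 8, 9, 10, 12)

fit <- glm(y ~ x, family = poisson)

ll <- as.numeric(logLik(fit))   # maximized log-likelihood
p  <- attr(logLik(fit), "df")   # number of estimated parameters
n  <- nobs(fit)                 # number of observations

aic <- -2 * ll + 2 * p
bic <- -2 * ll + log(n) * p

c(aic, AIC(fit))   # hand computation next to R's AIC()
c(bic, BIC(fit))   # hand computation next to R's BIC()
```

Smaller values are preferred for both criteria. BIC penalizes each parameter more heavily than AIC once \(\Log{n} > 2\), that is for \(n \ge 8\).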

attach(d)

makes the columns of the data frame d available as variables by name.

anova(m1, m2)

For linear models fitted with lm, performs the \(F\) test comparing the nested models m1 and m2.
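A usage sketch on R's built-in `mtcars` data, where the choice of data set and variables is an arbitrary assumption for illustration. The \(F\) statistic reported by `anova` is also reproduced by hand from the residual sums of squares of the two models.

```r
# Two nested linear models on R's built-in mtcars data.
m1 <- lm(mpg ~ wt, data = mtcars)        # smaller model
m2 <- lm(mpg ~ wt + hp, data = mtcars)   # larger model, contains m1

a <- anova(m1, m2)
a

# The same F statistic by hand.
rss1 <- sum(residuals(m1)^2)
rss2 <- sum(residuals(m2)^2)
q <- df.residual(m1) - df.residual(m2)   # number of extra parameters in m2
r <- df.residual(m2)                     # residual degrees of freedom of m2
Fstat <- ((rss1 - rss2) / q) / (rss2 / r)

pf(Fstat, q, r, lower.tail = FALSE)      # p-value, compare with the anova table
```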

The Poisson probability mass function with mean \(\mu\) is \[\cfrac{e^{-\mu} \mu^y}{y!}, \quad y = 0, 1, 2, \ldots\]

A confidence interval is the set of parameter values that, when used as the null hypothesis value, are not rejected by the corresponding test given the data.

A prediction error estimate is a measurement of how well a model fits the data. It can be used to compare different models.

Examples are AIC, BIC, \(R^2\).