\[\newcommand\R[0]{\mathbb{R}} \newcommand\Z[0]{\mathbb{Z}} \newcommand\Q[0]{\mathbb{Q}} \newcommand\Sq[1]{\left[#1\right]} \newcommand\Cr[1]{\left\{#1\right\}} \newcommand\Pa[1]{\left(#1\right)} \newcommand\Br[1]{\{#1\}} \newcommand\Vb[1]{\lvert #1\rvert} \newcommand\Op[1]{\operatorname{#1}} \newcommand\Pr[1]{\operatorname{\mathbb{P}} #1} \newcommand\Ex[1]{\operatorname{\mathbb{E}} #1} \newcommand\E[1]{\operatorname{\mathbb{E}} #1} \newcommand\Var[1]{\operatorname{Var} #1} \newcommand\Lim[1]{\underset{#1}{\operatorname{Lim}}} \newcommand\LimSup[0]{\operatorname{LimSup}} \newcommand\LimInf[0]{\operatorname{LimInf}} \newcommand\Log[1]{\operatorname{Log} #1} \newcommand\Exp[1]{\operatorname{Exp} #1} \newcommand\hash[0]{\#} \newcommand\PowsP[0]{\operatorname{\mathcal{P}}} \newcommand\MeasurableSpace[2]{\mathcal{M}_{#1, #2}} \newcommand\MeasureSpace[3]{\mathcal{M}_{#1, #2, #3}}\]

STK4100 - Introduction to generalized linear models

Book

Book: Foundations of Linear and Generalized Linear Models; Agresti, Alan. Hoboken, N.J.: Wiley, 2015. XIII + 444 pp.

1 Introduction to linear and generalized linear models

Syllabus: All (except 1.4.2).

1.1 Components of a generalized linear model
Generalized linear model
Random component
Linear predictor
Link function
Exponential family
Model matrix · Design matrix
Natural parameter
Canonical link
Identity link function
Linear model
Homoscedasticity
Ordinary linear model
Bernoulli trial
Logit function
Logistic model · Logit model
Poisson distribution
Poisson loglinear model
1.2 Quantitative/qualitative explanatory variables and interpreting effects
Quantitative variable
1.3 Model matrices and model vector spaces
Vector space
Column space
Rank
Basis
Dimension
Full rank
Null space
Aliasing
Extrinsic aliasing
Perfect collinearity
Intrinsic aliasing
One-way layout
One-way ANOVA test
1.4 Identifiability and estimability
Identifiable
1.5 Example: Using software to fit a GLM
Exercises

2 Linear models: Least squares theory

Syllabus: Section 2.1 (except 2.1.6), section 2.2 (except proofs in 2.2.1 and 2.2.2, and except 2.2.4), section 2.3 (except 2.3.4), and section 2.4 (only 2.4.1).

Normal linear model
Link function
2.1 Least squares model fitting
Model fitting
Least squares
Normal equations
2.2 Projections of data onto model spaces
Projection matrix
2.3 Linear model examples: projections and SS decompositions
Null model
Linear model for the one-way layout
2.3.2
Completely randomized experimental design
2.3.2
Orthogonal decomposition
2.3.3
Sum of squares decomposition
2.3.3
Between-groups sum of squares
2.3.3
Within-groups sum of squares
2.3.3
ANOVA table
2.3.3
Corrected total sum of squares
2.3.3
2.4 Summarizing variability in a linear model
Error variance
Error mean square · Residual mean square
2.4.2
TSS · Total sum of squares
2.4.2

3 Normal linear models: Statistical inference

Syllabus: Section 3.1 (except page 84 in 3.1.4 and except 3.1.5), and section 3.2 (except 3.2.7, 3.2.8, and 3.2.9).

Normal linear model
3.1 Distribution theory for normal variates
Multivariate normal distribution
\(\chi^2\) distribution
\(t\) distribution
\(F\) distribution
Noncentral \(\chi^2\) distribution
Noncentral \(t\) distribution
Noncentral \(F\) distribution
Cochran's theorem
3.2 Significance tests for normal linear models
Likelihood ratio test
ANOVA for the one-way layout
3.2.1
One-way ANOVA test
ANOVA table
Likelihood ratio test
Test for any effect
3.2.4
qf · \(F\) quantile

4 Generalized linear models: model fitting and inference

Syllabus: Section 4.1, section 4.2 (except 4.2.6), section 4.3, section 4.4 (except 4.4.5), section 4.5 (except 4.5.3), section 4.6 (except 4.6.4 and 4.6.5), section 4.7.

4.1 Exponential dispersion family distributions for a GLM
Random component
4.1.1
Exponential dispersion family · Exponential dispersion model
4.1.1
Natural parameter
4.1.1
Dispersion parameter
4.1.1
Natural exponential family
4.1.1
Cumulant function
4.1.1
Poisson dispersion family derivation
4.1.2
Poisson dispersion family
4.1.2
Binomial dispersion family derivation
4.1.2
Binomial dispersion family
4.1.2
Normal dispersion family derivation
4.1.2
Normal dispersion family
4.1.2
Link function
4.1.3
Response function
4.1.3
Canonical link
4.1.3
Canonical link for common GLMs
4.1.3
4.2 Likelihood and asymptotic distributions for GLMs
GLM likelihood equations derivation
4.2.1
GLM likelihood equations
4.2.1
GLM likelihood matrix equation
4.2.1
Poisson loglinear likelihood equations
4.2.2
Mean-variance GLM characterization
4.2.3
Asymptotic ML distribution derivation
4.2.4
Asymptotic ML distribution
4.2.4
4.3 Likelihood-ratio/Wald/score methods of inference for GLM parameters
Likelihood-ratio test statistic
4.3.1
Likelihood-ratio test
4.3.1
Wald statistic
4.3.2
Wald test
4.3.2
Score function
4.3.3
Score statistic
4.3.3
Score test · Lagrange multiplier test
4.3.3
Wald confidence interval
4.3.5
Score confidence interval
4.3.5
Likelihood-ratio confidence interval
4.3.5
4.4 Deviance of a GLM, model comparison, and model checking
Saturated model
Scaled deviance
4.4.1
Deviance
4.4.1
Deviance
4.4.2
Pearson residual
4.4.6
Sum of squared Pearson residuals
4.4.6
Deviance residual
4.4.6
Sum of squared deviance residuals
4.4.6
Standardized residual
4.4.6
4.5 Fitting generalized linear models
Newton-Raphson method · Newton's method
4.5.1
Hessian matrix
4.5.1
Fisher scoring method
4.5.2
Expected information
4.5.2
Observed information
4.5.2
Weighted least squares
4.5.4
4.6 Selecting explanatory variables for a GLM
Best subset selection
4.6.1
Forward selection
4.6.1
Backward elimination
4.6.1
Bias-variance decomposition
4.6.2
AIC · Akaike information criterion
4.6.3
Kullback–Leibler divergence
4.6.3
BIC · Bayesian information criterion
4.6.3
4.7 Example: Building a GLM
attach
anova · \(F\) test
4.7.1
update
4.7.1
step
4.7.1
AIC
4.7.1
BIC
4.7.1

5 Models for binary data

Syllabus: Section 5.1, section 5.2 (except 5.2.3), section 5.3 (except 5.3.3 and 5.3.4), section 5.5 (except 5.5.4), section 5.6 (except 5.6.2), section 5.7 (except 5.7.1).

6 Multinomial response models

Syllabus: Section 6.1 (except 6.1.5 and 6.1.6), section 6.2 (except 6.2.3, 6.2.4, and 6.2.5), and section 6.3 (except 6.3.1).

7 Models for count data

Syllabus: Section 7.1 (except 7.1.2 and 7.1.5), section 7.3 (except 7.3.5), section 7.4.1, and section 7.5.

7.1 Poisson GLMs for counts and rates
Poisson distribution
7.1.1
Poisson GLM
7.1.3
Poisson loglinear model
7.1.3
Poisson loglinear log-likelihood
7.1.3
Poisson loglinear mean-data relation
7.1.3
Poisson parameter interpretation
7.1.3
Poisson GLM deviance
7.1.4
Poisson loglinear mean-data relation
7.1.4
Poisson
7.1.4
Loglinear rate model
7.1.6
Poisson loglinear rate model
7.1.6
Poisson linear rate model
7.1.6
Example
7.1.7
7.3 Negative binomial GLMs
Overdispersion
7.3.1
Mixture model
7.3.2
Negative binomial distributions
7.3.2
Negative binomial GLM
7.3.3
7.4 Models for zero-inflated data
7.5 Example: modelling count data

8 Quasi-likelihood methods

Syllabus: Section 8.1, section 8.2.4 (only to middle of page 276), and section 8.3.

9 Modelling correlated responses

Syllabus: Section 9.1 (except 9.1.4 and 9.1.6), section 9.2, section 9.3, section 9.4 (except 9.4.3 and 9.4.4), section 9.5 (except 9.5.3, 9.5.4, and 9.5.5), section 9.6 (except 9.6.2 and 9.6.5), and section 9.7.

Notes

Confidence interval
Prediction error estimate
Test statistic
A data set of observations that come from \(c\) groups.
A GLM with identity link function and \(Y|X\) normal.
Given observations \(y\) and \(X\), and a choice of model, we fit the model by estimating its parameters; the form of the model itself is specified in advance.
The one-way layout is a model for comparing the means of \(c\) groups.
A subject in group \(i\) has linear predictor \(\eta_i = \beta_0 + \beta_i\).
Randomly assign test subjects to one of \(c\) groups.
The error variance is the variance \(\sigma^2\) of the response about the true regression function.
The total sum of squares is \(\Op{TSS} = \sum_i (y_i - \bar{y})^2\), the variability of the data about the overall mean.
The \(t\) distribution describes a standardized normal variable when the variance is unknown and must be estimated: if \(Z \sim \mathcal{N}(0, 1)\) and \(V \sim \chi^2_r\) are independent, then \(Z / \sqrt{V / r}\) has a \(t\) distribution with \(r\) degrees of freedom.
The \(F\) quantile with \(q\), \(r\) degrees of freedom.
qf(0.95, q, r)
The random component of a GLM is a random variable with an exponential dispersion family.
The exponential dispersion family is the collection of distributions that have density or mass of the form: \[f(y_i, \theta_i, \phi ) = \Exp{\Cr{\frac{y_i \theta_i{} - b(\theta_i )}{a(\phi )} + c(y_i, \phi )}}\]
\(\theta\) in the exponential dispersion family is the natural parameter.
\(\phi\) in the exponential dispersion family is the dispersion parameter.
A natural exponential family is the exponential dispersion family where \(a(\phi ) = 1\) and \(c(y, \phi ) = c(y)\): \[f(y, \theta ) = \Exp{\Cr{y \theta{} - b(\theta ) + c(y)}}\]
The cumulant function is the function \(b\) in the natural exponential family.
It is called the cumulant function because its derivatives determine the cumulants of the distribution.
A Poisson distribution is an exponential dispersion family distribution with natural parameter \(\theta{} = \Log\mu\), mean \(\E{y} = b'(\theta ) = \Exp\theta{} = \mu\), and variance \(\Var{y} = b''(\theta ) = \Exp\theta{} = \mu\).
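The "Poisson dispersion family derivation" listed in section 4.1.2 above amounts to rewriting the Poisson mass function in this form, a standard calculation sketched here:

```latex
f(y; \mu) = \frac{e^{-\mu} \mu^{y}}{y!}
          = \exp\left\{ y \log\mu - \mu - \log y! \right\}
```

which matches the natural exponential family with \(\theta = \Log\mu\), \(b(\theta ) = \Exp\theta{} = \mu\), \(a(\phi ) = 1\), and \(c(y) = -\Log{y!}\).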
The response function is the inverse of the link function.
The canonical link is the link function that maps the mean \(\mu\) to the natural parameter \(\theta\).
  • Normal: Identity link \(g(\mu ) = \mu{} = \theta\)
  • Logistic: Logit link \(g(\mu ) = \Log{\frac\mu{1 - \mu}} = \theta\)
  • Loglinear: Log link \(g(\mu ) = \Log\mu{} = \theta\)
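That the logit is the canonical link for the logistic model follows from writing the Bernoulli mass function in natural exponential family form, a standard calculation sketched here:

```latex
f(y; \pi) = \pi^{y} (1 - \pi)^{1 - y}
          = \exp\left\{ y \log\frac{\pi}{1 - \pi} + \log(1 - \pi) \right\}
```

so the natural parameter is \(\theta = \Log{\frac{\pi}{1 - \pi}}\), the logit of \(\pi\).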
As \(n \to{} \infty\), \(\hat\beta\) is approximately distributed as \[\mathcal{N} \Pa{\beta , (X^\intercal WX)^{-1}}\] where \(W\) is the diagonal matrix with elements \(w_i = \frac{(\partial_{\eta_i}\mu_i)^2}{\Var{y_i}}\)
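As a concrete illustration of this covariance formula, a minimal pure-Python sketch with a made-up design matrix, for a Poisson loglinear model with the canonical log link (for which the weight simplifies to \(w_i = \mu_i\)):

```python
import math

# Hypothetical toy design matrix (intercept + one covariate) and coefficients.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
beta = [0.5, 0.3]

# Canonical log link: mu_i = exp(eta_i), Var(y_i) = mu_i, and
# d(mu_i)/d(eta_i) = mu_i, so w_i = mu_i^2 / mu_i = mu_i.
eta = [sum(x * b for x, b in zip(row, beta)) for row in X]
w = [math.exp(e) for e in eta]

# Form M = X^T W X (here 2x2) and invert it to obtain the
# asymptotic covariance matrix of beta-hat.
n, p = len(X), 2
M = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
     for r in range(p)]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
cov = [[M[1][1] / det, -M[0][1] / det],
       [-M[1][0] / det, M[0][0] / det]]
```

The covariance matrix is symmetric with positive diagonal, as a variance matrix must be.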
The likelihood-ratio test tests whether the parameter \(\beta\) is significant by comparing the maximized likelihood \(\ell_0\) under the null model with the maximized likelihood \(\ell_1\) under the full model, where \(L_i = \Log{\ell_i}\): \[-2\Log\Lambda{} = -2\Log{\frac{\ell_0}{\ell_1}} = -2(L_0 - L_1)\]
The derivative of the log-likelihood function.
\[\cfrac{\Sq{\partial_\beta\ell (\beta_0)}^2}{-\Ex{\Sq{\partial^2_\beta\ell (\beta_0)}}}\]
The likelihood-ratio confidence interval is the set of null values \(\beta_0\) satisfying \[-2\Sq{\ell (\beta_0) - \ell (\hat\beta )} < \chi^2_1 (\alpha )\]
\[\Op{MSE} = \Op{Var} + \Op{Bias}^2\]
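The decomposition above can be checked numerically; a small sketch with made-up repeated estimates of a true parameter value:

```python
# Made-up repeated estimates of a true parameter value t.
estimates = [2.1, 1.9, 2.4, 2.2, 1.8]
t = 2.0

n = len(estimates)
mean_est = sum(estimates) / n
mse = sum((e - t) ** 2 for e in estimates) / n          # mean squared error
var = sum((e - mean_est) ** 2 for e in estimates) / n   # variance
bias = mean_est - t                                     # bias

# The decomposition MSE = Var + Bias^2 holds exactly for these averages.
assert abs(mse - (var + bias ** 2)) < 1e-12
```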
\[\Op{AIC} = -2\Pa{\hat\ell{} - p} = -2\hat\ell{} + 2p\] where \(p\) is the number of parameters of the model and \(\hat\ell\) is the maximized value of the log-likelihood \(\ell (\beta )\).
\[\Op{KL} \Sq{p, p_M(\hat\beta_M)} = \Ex{\Sq{\Log{\cfrac{p(y^*)}{p_M(y^*;\hat\beta_M)}}}}\]
\[\Op{BIC} = -2\hat\ell{} + p \Log{n}\] where \(p\) is the number of parameters of the model, \(n\) is the number of observations and \(\hat\ell\) is the maximized value of the log-likelihood \(\ell (\beta )\).
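A minimal sketch comparing two hypothetical fitted models with the standard definitions \(\Op{AIC} = -2\hat\ell + 2p\) and \(\Op{BIC} = -2\hat\ell + p \Log{n}\) (the log-likelihoods are made up):

```python
import math

def aic(loglik, p):
    # AIC = -2 * (maximized log-likelihood) + 2 * (number of parameters)
    return -2 * loglik + 2 * p

def bic(loglik, p, n):
    # BIC replaces AIC's penalty 2p with the stronger p * log(n)
    return -2 * loglik + p * math.log(n)

# Made-up maximized log-likelihoods for a smaller and a larger model.
n = 100
aic_small, bic_small = aic(-120.0, 3), bic(-120.0, 3, n)
aic_large, bic_large = aic(-118.5, 6), bic(-118.5, 6, n)
# The larger model fits slightly better (-118.5 > -120.0) but pays a
# larger penalty, so both criteria prefer the smaller model (lower is better).
```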
attach(d)
makes the names in d available as variables.
anova(m1, m2)
performs the \(F\) test, comparing the nested models m1 and m2.
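What anova computes for nested linear models can be sketched in pure Python from the residual sums of squares of the two fits (the numbers here are made up):

```python
def f_statistic(rss0, p0, rss1, p1, n):
    # F statistic comparing nested linear models, where model 0
    # (p0 parameters) is a special case of model 1 (p1 > p0 parameters):
    # numerator = reduction in RSS per extra parameter,
    # denominator = error mean square of the larger model.
    return ((rss0 - rss1) / (p1 - p0)) / (rss1 / (n - p1))

# Made-up fits: n = 50 observations, nested models with 2 and 4 parameters.
F = f_statistic(rss0=120.0, p0=2, rss1=100.0, p1=4, n=50)
# Reject the smaller model if F exceeds the F quantile with
# (p1 - p0, n - p1) degrees of freedom, e.g. qf(0.95, 2, 46) in R.
```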
\[\cfrac{e^{-\mu} \mu^y}{y!}\]
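A quick numeric check of this mass function, in a small pure-Python sketch: truncating the support far enough out, the probabilities sum to 1 and the mean equals \(\mu\).

```python
import math

def poisson_pmf(y, mu):
    # P(Y = y) = e^(-mu) * mu^y / y!
    return math.exp(-mu) * mu ** y / math.factorial(y)

# Truncate the infinite support at 60 terms; the tail is negligible here.
mu = 3.0
probs = [poisson_pmf(y, mu) for y in range(60)]
mean = sum(y * p for y, p in enumerate(probs))
```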
A confidence interval is the range of null hypothesis parameters that are not rejected by the data.
A prediction error estimate is a measurement of how well a model fits the data. It can be used to compare different models.
Examples are AIC, BIC, \(R^2\).