Syllabus: Section 6.1 (except 6.1.5 and 6.1.6), section 6.2 (except 6.2.3, 6.2.4, and 6.2.5), and section 6.3 (except 6.3.1).
7 Models for count data
Syllabus: Section 7.1 (except 7.1.2 and 7.1.5), section 7.3 (except 7.3.5), section 7.4.1, and section 7.5.
7.1 Poisson GLMs for counts and rates
Poisson distribution
7.1.1
Poisson GLM
7.1.3
Poisson loglinear model
7.1.3
Poisson loglinear log-likelihood
7.1.3
Poisson loglinear mean-data relation
7.1.3
Poisson parameter interpretation
7.1.3
Poisson GLM deviance
7.1.4
Poisson loglinear mean-data relation
7.1.4
Poisson
7.1.4
Loglinear rate model
7.1.6
Poisson loglinear rate model
7.1.6
Poisson linear rate model
7.1.6
Example
7.1.7
7.3 Negative binomial GLMs
Overdispersion
7.3.1
Mixture model
7.3.2
Negative binomial distributions
7.3.2
Negative binomial GLM
7.3.3
7.4 Models for zero-inflated data
7.5 Example: modelling count data
8 Quasi-likelihood methods
Syllabus: Section 8.1, section 8.2.4 (only to middle of page 276), and section 8.3.
9 Modelling correlated responses
Syllabus: Section 9.1 (except 9.1.4 and 9.1.6), section 9.2, section 9.3, section 9.4 (except 9.4.3 and 9.4.4), section 9.5 (except 9.5.3, 9.5.4, and 9.5.5), section 9.6 (except 9.6.2 and 9.6.5), and section 9.7.
Notes
Confidence interval
Prediction error estimate
Test statistic
Generalized linear model
Random component
Linear predictor
Link function
Exponential family
Model matrix · Design matrix
Natural parameter
Canonical link
Identity link function
Linear model
Homoscedasticity
Ordinary linear model
Bernoulli trial
Logit function
Logistic model · Logit model
Poisson distribution
Poisson loglinear model
Quantitative variable
Vector space
Column space
Rank
Basis
Dimension
Full rank
Null space
Vector space
Aliasing
Extrinsic aliasing
Perfect collinearity
Intrinsic aliasing
One-way layout
A data set of observations that come from \(c\) groups.
One-way ANOVA test
Identifiable
Normal linear model
A GLM with identity link function and \(Y|X\) normal.
Link function
Model fitting
Given observations \(y\) and \(X\), and given a choice of model, we will fit a model. The model itself is known,
Least squares
Normal equations
Projection matrix
Null model
Linear model for the one-way layout
The one-way layout is
A model for comparing the means of \(c\) groups.
A subject in group \(i\) has linear predictor \(\eta_i = \beta_0 + \beta_i\).
Completely randomized experimental design
Randomly assign test subjects to one of \(c\) groups.
Orthogonal decomposition
Sum of squares decomposition
Between-groups sum of squares
Within-groups sum of squares
ANOVA table
Corrected total sum of squares
Error variance
The error variance is the
Error mean square · Residual mean square
TSS · Total sum of squares
The total sum of squares
Normal linear model
Multivariate normal distribution
\(\chi^2\) distribution
\(t\) distribution
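The distribution of a standardized normal variable when the unknown variance is replaced by an estimate: if \(Z \sim \mathcal{N}(0, 1)\) and \(V \sim \chi^2_\nu\) are independent, then \(Z / \sqrt{V / \nu}\) has the \(t\) distribution with \(\nu\) degrees of freedom.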
\(F\) distribution
Noncentral \(\chi^2\) distribution
Noncentral \(t\) distribution
Noncentral \(F\) distribution
Cochran's theorem
Likelihood ratio test
ANOVA for the one-way layout
One-way ANOVA test
ANOVA table
Likelihood ratio test
Test for any effect
qf · \(F\) quantile
The quantile function of the \(F\) distribution with \(q\) and \(r\) degrees of freedom, here evaluated at probability 0.95.
> qf(0.95, q, r)
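As a small usage sketch, the call above can be tried with concrete numbers. The degrees of freedom below are hypothetical (for instance \(c = 3\) groups and \(n = 30\) observations in a one-way layout); they are not taken from the course material.

```r
# Hypothetical degrees of freedom, chosen only for illustration.
q <- 2   # numerator degrees of freedom
r <- 27  # denominator degrees of freedom

# Critical value: reject H0 at the 5% level when the observed F exceeds it.
f_crit <- qf(0.95, q, r)

# pf is the F distribution function, so the upper-tail probability
# at the critical value recovers the significance level 0.05.
p_at_crit <- pf(f_crit, q, r, lower.tail = FALSE)
```

In practice the same `pf` call with an observed \(F\) statistic in place of `f_crit` gives the p-value of the test.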
Random component
The random component of a GLM is the response variable \(y\), whose distribution belongs to an exponential dispersion family.
Exponential dispersion family · Exponential dispersion model
The exponential dispersion family is the collection of distributions that have a density or mass function of the form: \[f(y_i; \theta_i, \phi ) = \Exp{\Cr{\frac{y_i \theta_i - b(\theta_i )}{a(\phi )} + c(y_i, \phi )}}\]
Natural parameter
\(\theta\) in the exponential dispersion family is the natural parameter.
Dispersion parameter
\(\phi\) in the exponential dispersion family is the dispersion parameter.
Natural exponential family
A natural exponential family is an exponential dispersion family with \(a(\phi ) = 1\) and \(c(y_i, \phi ) = c(y_i)\): \[f(y_i; \theta_i ) = \Exp{\Cr{y_i \theta_i - b(\theta_i ) + c(y_i)}}\]
Cumulant function
The cumulant function is the function \(b\) in the natural exponential family.
It is called the cumulant function because its derivatives determine the cumulants of the distribution.
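The first two cumulants, the mean and the variance, can be worked out as a short sketch. It uses the standard likelihood identities for the score and is written in plain LaTeX rather than the macros used elsewhere in these notes; with \(a(\phi) = 1\) the variance reduces to \(b''(\theta)\).

```latex
\begin{align*}
\log f(y; \theta, \phi) &= \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi), \\
\frac{\partial \log f}{\partial \theta} &= \frac{y - b'(\theta)}{a(\phi)},
\qquad
\frac{\partial^2 \log f}{\partial \theta^2} = -\frac{b''(\theta)}{a(\phi)}, \\
\operatorname{E}\!\left[\frac{\partial \log f}{\partial \theta}\right] = 0
&\implies \operatorname{E}(y) = b'(\theta), \\
-\operatorname{E}\!\left[\frac{\partial^2 \log f}{\partial \theta^2}\right]
= \operatorname{E}\!\left[\left(\frac{\partial \log f}{\partial \theta}\right)^{2}\right]
&\implies \frac{b''(\theta)}{a(\phi)} = \frac{\operatorname{Var}(y)}{a(\phi)^{2}}
\implies \operatorname{Var}(y) = b''(\theta)\, a(\phi).
\end{align*}
```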
Poisson dispersion family derivation
Poisson dispersion family
The Poisson distribution is an exponential dispersion family with natural parameter \(\theta = \Log\mu\), cumulant function \(b(\theta ) = \Exp\theta\), and \(a(\phi ) = 1\), so that \(\E{y} = b'(\theta ) = \Exp\theta = \mu\) and \(\Var{y} = b''(\theta ) = \Exp\theta = \mu\).
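A sketch of the derivation behind this statement: rewriting the Poisson mass function in exponential form and matching terms with the exponential dispersion family. Plain LaTeX notation is used instead of the macros of these notes.

```latex
\begin{align*}
\frac{e^{-\mu} \mu^{y}}{y!}
  &= \exp\{\, y \log\mu - \mu - \log y! \,\} \\
  &= \exp\{\, y\theta - e^{\theta} - \log y! \,\},
  \qquad \theta = \log\mu .
\end{align*}
% Matching terms with exp{ (y*theta - b(theta)) / a(phi) + c(y, phi) }:
%   b(theta) = e^theta,   a(phi) = 1,   c(y, phi) = -log y!
```

Differentiating \(b(\theta) = e^{\theta}\) once and twice gives the mean and the variance, both equal to \(\mu\).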
Binomial dispersion family derivation
Binomial dispersion family
Normal dispersion family derivation
Normal dispersion family
Link function
Response function
The response function is the inverse of the link function.
Canonical link
The canonical link is the link function that maps the mean \(\mu\) to the natural parameter \(\theta\).
Canonical link for common GLMs
Normal: Identity link \(g(\mu ) = \mu{} = \theta\)
Loglinear: Log link \(g(\mu ) = \Log\mu{} = \theta\)
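As an illustration of link and response functions, the family objects in base R (package `stats`) carry both. The sketch below assumes nothing beyond base R; the value of `mu` is hypothetical.

```r
# Family objects store the link g and its inverse, the response function.
fam <- poisson()             # default link is the canonical log link
mu  <- 3.5                   # hypothetical mean, for illustration only

eta     <- fam$linkfun(mu)   # link: g(mu) = log(mu)
mu_back <- fam$linkinv(eta)  # response function: exp(eta), recovers mu

# Default links of three common families; each default is the canonical link.
default_links <- c(
  normal   = gaussian()$link,  # "identity"
  poisson  = poisson()$link,   # "log"
  binomial = binomial()$link   # "logit"
)
```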
GLM likelihood equations derivation
GLM likelihood equations
GLM likelihood matrix equation
Poisson loglinear likelihood equations
Mean-variance GLM characterization
Asymptotic ML distribution derivation
Asymptotic ML distribution
As \(n \to \infty\), \(\hat\beta\) is approximately distributed as \[\mathcal{N} \Cr{\beta , (X^\intercal WX)^{-1}}\] where \(W\) is the diagonal matrix with elements \(w_i = \frac{(\partial \mu_i / \partial \eta_i)^2}{\Var{y_i}}\).
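The covariance matrix \((X^\intercal WX)^{-1}\) can be checked numerically. The following is a minimal sketch on simulated data; the seed, sample size, and true coefficients are arbitrary assumptions. For the Poisson loglinear model, \(\partial \mu_i / \partial \eta_i = \mu_i\) and \(\Var{y_i} = \mu_i\), so \(w_i = \mu_i\).

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 500
x <- runif(n)
y <- rpois(n, exp(0.5 + 1.0 * x))  # hypothetical true beta = (0.5, 1.0)

fit <- glm(y ~ x, family = poisson)

X  <- model.matrix(fit)            # model matrix with intercept column
mu <- fitted(fit)                  # estimated means

# Log link: d mu / d eta = mu and Var(y) = mu, so w_i = mu_i^2 / mu_i = mu_i.
w <- mu

cov_manual <- solve(t(X) %*% (w * X))  # (X' W X)^{-1}; w * X scales row i by w_i
cov_glm    <- vcov(fit)                # covariance matrix reported by glm
```

The two matrices should agree up to the convergence tolerance of the fitting algorithm; the square roots of the diagonal are the standard errors shown by `summary(fit)`.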
Likelihood-ratio test statistic
Likelihood-ratio test
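The likelihood-ratio test compares a null model \(M_0\) with a more general model \(M_1\) in which it is nested, for example to test whether a parameter \(\beta\) is significant. The test statistic is \[-2\Log\Lambda = -2\Log{\frac{\ell_0}{\ell_1}} = -2(L_0 - L_1)\] where \(\ell_0\) and \(\ell_1\) are the maximized likelihoods under \(M_0\) and \(M_1\), and \(L_0\) and \(L_1\) are the corresponding maximized log-likelihoods. Under \(M_0\) the statistic is approximately \(\chi^2\) distributed, with degrees of freedom equal to the difference in the number of parameters.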
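A minimal sketch of this test in R, on simulated Poisson data (seed, sample size, and coefficients are arbitrary assumptions). It computes \(-2(L_0 - L_1)\) from the maximized log-likelihoods and compares it with the difference in deviances, which gives the same statistic.

```r
set.seed(2)                         # arbitrary seed
n <- 200
x <- runif(n)
y <- rpois(n, exp(0.3 + 0.8 * x))   # hypothetical true coefficients

m0 <- glm(y ~ 1, family = poisson)  # null model: no effect of x
m1 <- glm(y ~ x, family = poisson)  # alternative model

L0 <- as.numeric(logLik(m0))        # maximized log-likelihood under m0
L1 <- as.numeric(logLik(m1))        # maximized log-likelihood under m1

lr_stat <- -2 * (L0 - L1)
df      <- length(coef(m1)) - length(coef(m0))
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

# The saturated-model terms cancel, so this equals lr_stat.
dev_diff <- deviance(m0) - deviance(m1)
```

`anova(m0, m1, test = "Chisq")` reports the same statistic and p-value for GLM fits.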
\[\Op{AIC} = -2\Pa{\hat L - p} = -2\hat L + 2p\] where \(p\) is the number of parameters of the model and \(\hat L\) is the maximized value of the log-likelihood \(L(\beta )\).
\[\Op{BIC} = -2\hat L + p\Log{n}\] where \(p\) is the number of parameters of the model, \(n\) is the number of observations and \(\hat L\) is the maximized value of the log-likelihood \(L(\beta )\).
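As a check on the usual definitions, AIC as \(-2\) times the maximized log-likelihood plus \(2p\), and BIC with \(\log n\) in place of 2, both criteria can be computed by hand and compared with R's built-in `AIC` and `BIC`. The data below are simulated under arbitrary, assumed settings.

```r
set.seed(3)                        # arbitrary seed
n <- 100
x <- runif(n)
y <- rpois(n, exp(1 + 0.5 * x))    # hypothetical true coefficients

fit <- glm(y ~ x, family = poisson)

L_hat <- as.numeric(logLik(fit))   # maximized log-likelihood
p     <- length(coef(fit))         # number of parameters (here 2)

aic_manual <- -2 * L_hat + 2 * p
bic_manual <- -2 * L_hat + log(n) * p

aic_builtin <- AIC(fit)
bic_builtin <- BIC(fit)
```

For both criteria, smaller values indicate the preferred model; BIC penalizes additional parameters more heavily than AIC once \(\log n > 2\).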
attach
attach(d)
makes the names in d available as variables.
anova · \(F\) test
anova(m1, m2)
performs the \(F\) test, comparing the nested models m1 and m2.
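A sketch of what `anova` computes, assuming `m1` and `m2` are nested `lm` fits; the data are simulated and every number is illustrative. The \(F\) statistic is the drop in residual sum of squares per extra parameter, divided by the error mean square of the larger model.

```r
set.seed(4)                        # arbitrary seed
n  <- 60
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)        # x2 has no true effect in this simulation

m1 <- lm(y ~ x1)                   # smaller model
m2 <- lm(y ~ x1 + x2)              # larger model, m1 is nested in it

tab <- anova(m1, m2)               # F test comparing the two models

# The same statistic from residual sums of squares.
rss1 <- sum(resid(m1)^2)
rss2 <- sum(resid(m2)^2)
df1  <- df.residual(m1)
df2  <- df.residual(m2)

f_manual <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_manual <- pf(f_manual, df1 - df2, df2, lower.tail = FALSE)
```

The values `f_manual` and `p_manual` correspond to the `F` and `Pr(>F)` columns in the second row of `tab`.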
update
step
AIC
BIC
Poisson distribution
\[p(y; \mu ) = \cfrac{e^{-\mu} \mu^y}{y!}, \qquad y = 0, 1, 2, \ldots\]
Poisson GLM
Poisson loglinear model
Poisson loglinear log-likelihood
Poisson loglinear mean-data relation
Poisson parameter interpretation
Poisson GLM deviance
Poisson loglinear mean-data relation
Poisson
Loglinear rate model
Poisson loglinear rate model
Poisson linear rate model
Example
Overdispersion
Mixture model
Negative binomial distributions
Negative binomial GLM
Confidence interval
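A \(100(1 - \alpha )\%\) confidence interval for a parameter is the set of null hypothesis values \(\beta_0\) for which the test of \(H_0\colon \beta = \beta_0\) does not reject at significance level \(\alpha\).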
Prediction error estimate
A prediction error estimate measures how far a fitted model's predictions are from observed responses, ideally for observations not used in the fitting. It can be used to compare different models.