
STK4021 – Applied Bayesian analysis


Lecture notes

1 The Bayesian pipeline

1.1 Introduction
Bayesian statistics
Bayesian model
Prior
Likelihood
Posterior
Marginal likelihood
Exercise
Exercise 1
Posterior mean
Posterior variance
Exercise
Exercise 2
Exercise
Exercise 3
Normalization trick · Functional form trick
Functional form
Proportional Bayes theorem
The predictive distribution
Predictive distribution
Functional form
Exercise
Exercise 4
Exercise
Exercise 5
1.2 Bayesian decision theory
Decision theory
Action
Decision function
Loss function · Cost function
Frequentist risk · Risk
Bayes risk
Bayes estimator
Posterior expected loss
Bayes risk minimization theorem
Exercise
Exercise 6
Exercise
Exercise 7
Exercise
Exercise 8
Exercise
Exercise 9
Exercise
Exercise 10

2 Choosing the prior distribution

Prior selection
2.1 Conjugate priors
Conjugate prior
Exercise
Exercise 11
Exercise
Exercise 12
Exercise
Exercise 13
2.1.1 The multivariate Gaussian distribution
Multivariate Gaussian distribution
Symmetric matrix
Positive definite matrix
Exercise
Exercise 14
Exercise
Exercise 15
Exercise
Exercise 16
Conditional Gaussian distributions
Covariance matrix
Precision matrix
Exercise
Exercise 17
Woodbury matrix identity
Exercise 17
Marginal Gaussian distributions
Exercise
Exercise 18
Bayes' theorem for Gaussian variables
Linear Gaussian model
Exercise
Exercise 19
Exercise
Exercise 20
Exercise
Exercise 21
Sherman-Morrison formula
Exercise 21
2.2 Empirical Bayes
Hyperparameter
Hyperparameter selection
Empirical Bayes
Empirical Bayes
Exercise
Exercise 22
Exercise
Exercise 23
2.3 The Jeffreys prior
Fisher information
Jeffreys prior
Improper prior
Exercise
Exercise 24
Exercise
Exercise 25
Exercise
Exercise 26
Score
Exercise 26

3 The Laplace approximation (Lazy Bayes)

Laplace approximation derivation
Laplace approximation
MAP · Maximum a posteriori
Observed Fisher information
Note
Exercise
Exercise 27
Exercise
Exercise 28
Exercise
Exercise 29

4 Model selection and model averaging

Model space
Model prior
Model-level likelihood
Bayesian model space
Conditioning on model gives model
Model posterior
Model selection
Bayes factor derivation
Uniform model selection · Bayes factor selection
Bayes factor
Exercise
Exercise 30
Exercise
Exercise 31
4.1 The Bayesian information criterion (BIC)
BIC · Bayesian information criterion
4.2 Derivation of the BIC
Derivation of the BIC

5 Regression and classification

5.1 Linear models for regression
5.1.1 The frequentist solution: least squares and penalization
5.1.2 Bayesian linear regression
5.1.3 Model comparison
5.1.4 Empirical Bayes
5.2 Linear models for classification
5.2.1 Bayesian classification

6 Exchangeability and de Finetti's theorem

6.1 Exchangeability
Permutation
Exchangeable variables
6.1
Exchangeable sequence
6.2
Exercise
Exercise 44
iid sequence is exchangeable
Exercise 44
Exercise - Pólya's urn
Exercise 45
De Finetti's theorem
Theorem 6.1
De Finetti's theorem - Integral form
Exercise
Exercise 46
Exercise
Exercise 47

7 The Bernstein–von Mises theorem

Asymptotic normality of MLE
Total variation distance
Exercise
Exercise 48
Total variation integral expression
Exercise 48
Bernstein–von Mises theorem
Theorem 7.1
Exercise
Exercise 49

8 Markov chain Monte Carlo (MCMC)

MCMC methods
8.1 General state space Markov chains
Markov chain
Definition 8.1
Homogeneous chain
Definition 8.2
Kernel
Definition 8.2
Kernel density
Exercise
Exercise 50
Example 2
Exercise
Exercise 51
\(m\)-step kernel
Exercise 51
\(m\)-step kernel density
Exercise 51
8.2 Key properties
\(μ\)-irreducible
Definition 8.3
Exercise
Exercise 52
Periodic chain
Definition 8.4
Aperiodic chain
Definition 8.4
Exercise
Exercise 53
Stationary distribution · Invariant distribution
Definition 8.5
Detailed balance
Definition 8.6
Exercise
Exercise 54
Ergodicity
Irreducible & stationary chain is ergodic
Theorem 8.1
Aperiodic ergodic chain converges to stationary
Theorem 8.2
8.3 The Metropolis–Hastings algorithm
Metropolis–Hastings idea
Target density
Proposal distribution
Acceptance probability
Metropolis–Hastings algorithm
Exercise
Exercise 55
Metropolis–Hastings kernel derivation
Metropolis–Hastings kernel
Exercise
Exercise 56
Metropolis–Hastings target is stationary proof
Metropolis–Hastings target is stationary
Metropolis–Hastings Bayesian posterior target derivation
Metropolis–Hastings Bayesian posterior target
Exercise
Exercise 57
Exercise
Exercise 58
8.4 Gibbs sampling
Gibbs sampling idea
Gibbs target density
Exercise
Exercise 59
Gibbs sampling
Gibbs kernel
Exercise
Exercise 60
Gibbs joint distribution is stationary
Proposition 8.1
Proof
Partial Gibbs kernel
Exercise
Exercise 61
Joint satisfies detailed balance for partial kernel
Composed partial kernels
Proposition 8.2
Chain composition preserve stationary
Proposition 8.2
Proof
Partial Gibbs kernel as Metropolis–Hastings kernel
Proof
Exercise
Exercise 62
Exercise
Exercise 63
8.5 Convergence diagnostics
Empirical estimate
Burn-in
Trace plot
Exercise
Exercise 64
Autocorrelation-lag plot
\(r\)-lag autocorrelation
\(r\)-lag autocovariance
\(r\)-lag autocovariance estimator
\(r\)-lag autocorrelation estimator
Exercise
Exercise 65
ESS · Effective sample size
Effective sample size estimator
Exercise
Exercise 66
MCMC quality assurance checklist

Auxiliary

Fubini's theorem
Spectral theorem
The prior \(π(θ)\) in a Bayesian model encodes the belief about the parameters \(θ\) before the data are observed.
The likelihood \(π(y|θ)\) in a Bayesian model is the distribution of the data \(y\) conditional on the parameters \(θ\).
The marginal likelihood \(π(y)\) in a Bayesian model is the distribution of the data with the parameters integrated out; it is the normalization constant of the posterior.
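For reference, these three pieces combine through Bayes' theorem (using the same \(π\) notation as above):
\[
π(θ \mid y) = \frac{π(y \mid θ)\,π(θ)}{π(y)}, \qquad π(y) = \int π(y \mid θ)\,π(θ)\,dθ.
\]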
Make the optimal decision based on the (sampled) data.
An action \(a(y)\) is a function of the observed data.
A measure of the cost of taking an action when the true parameter is \(θ\).
The average cost of using a decision function.
\[R(a, θ) = \int_\mathcal{Y} L(a(y)|θ)π(y|θ)dy\]
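As a sketch of the related Bayes-risk quantities listed in the outline (the symbols \(r\) and \(ρ\) are notational choices made here, not necessarily those of the notes):
\[
r(a) = \int_Θ R(a, θ)\,π(θ)\,dθ, \qquad ρ(a \mid y) = \int_Θ L(a(y) \mid θ)\,π(θ \mid y)\,dθ,
\]
and the Bayes estimator minimizes \(r\); by Fubini's theorem this is achieved by minimizing the posterior expected loss \(ρ(\cdot \mid y)\) for each \(y\).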
Select the hyperparameters that maximize the marginal likelihood \(π(y)\).
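Written out (with \(η\) denoting the hyperparameters, a notational choice made here), empirical Bayes selects
\[
\hat{η} = \arg\max_{η}\, π(y \mid η) = \arg\max_{η} \int π(y \mid θ)\,π(θ \mid η)\,dθ.
\]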
Select the model \(m\) with the highest posterior probability \(π(m \mid y)\) given the data \(y\).
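In symbols (a standard formulation; \(m_1, m_2\) here denote two candidate models):
\[
π(m \mid y) \propto π(y \mid m)\,π(m), \qquad B_{12} = \frac{π(y \mid m_1)}{π(y \mid m_2)}.
\]
With a uniform model prior, selecting the highest-posterior model reduces to comparing marginal likelihoods, i.e. Bayes factors.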
Ergodicity ensures that large-sample properties hold for the chain.
In particular, time averages along the chain converge to expectations under the stationary distribution.
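Concretely, for an ergodic chain \((X_t)\) with stationary distribution \(π\) and a \(π\)-integrable function \(f\) (notation chosen here for illustration),
\[
\frac{1}{N} \sum_{t=1}^{N} f(X_t) \;\longrightarrow\; \mathbb{E}_{π}[f(X)] \quad \text{as } N \to \infty.
\]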
In general, computing the normalization constant of a distribution is difficult or expensive; we may only have access to an unnormalized density.
Metropolis–Hastings allows us to sample from a distribution even when all we have is this unnormalized density.
Construct a Markov chain whose stationary distribution is the target distribution.
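A minimal Python sketch of random-walk Metropolis–Hastings, assuming a one-dimensional target known only up to a constant; the example target, Gaussian proposal, and step size are illustrative choices, not taken from the notes:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=1.0, rng=None):
    """Random-walk Metropolis-Hastings targeting exp(log_target), known up to a constant."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    samples = np.empty(n_samples)
    for t in range(n_samples):
        proposal = x + step * rng.standard_normal()       # symmetric Gaussian proposal
        log_alpha = log_target(proposal) - log_target(x)  # log acceptance ratio (q cancels)
        if np.log(rng.uniform()) < log_alpha:
            x = proposal                                   # accept
        samples[t] = x                                     # on reject, keep the current state
    return samples

# Example: unnormalized standard normal, log pi(x) = -x^2 / 2 up to a constant.
draws = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_samples=5000)
```

Working on the log scale avoids overflow, and the symmetric proposal makes the proposal densities cancel in the acceptance ratio.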
It is hard to sample from the joint, but it is easy to sample from the conditionals.
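A minimal Python sketch of Gibbs sampling for a bivariate Gaussian target with correlation \(ρ\), where both full conditionals are Gaussian; the specific target and parameter value are illustrative assumptions:

```python
import numpy as np

def gibbs_bivariate_normal(n_samples, rho=0.8, rng=None):
    """Gibbs sampler for (X, Y) ~ N(0, [[1, rho], [rho, 1]]).

    Both full conditionals are Gaussian: X | Y = y ~ N(rho * y, 1 - rho^2), and symmetrically.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y = 0.0, 0.0
    cond_sd = np.sqrt(1.0 - rho**2)          # conditional standard deviation
    samples = np.empty((n_samples, 2))
    for t in range(n_samples):
        x = rng.normal(rho * y, cond_sd)     # draw from pi(x | y)
        y = rng.normal(rho * x, cond_sd)     # draw from pi(y | x)
        samples[t] = (x, y)
    return samples

draws = gibbs_bivariate_normal(5000)
```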
If a function is Lebesgue integrable on a rectangle, then the double integral equals the iterated integrals, and the order of integration can be interchanged.
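In symbols, for \(f\) integrable on \(A \times B\):
\[
\int_{A \times B} f(x, y)\, d(x, y) = \int_A \left( \int_B f(x, y)\, dy \right) dx = \int_B \left( \int_A f(x, y)\, dx \right) dy.
\]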
A real symmetric matrix can be diagonalized: it has real eigenvalues and orthonormal eigenvectors \((λ_i, u_i)\), and the diagonalization is the diagonal matrix with the eigenvalues on the diagonal.
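In symbols, with \(U = (u_1, \dots, u_n)\) orthogonal and \(Λ = \operatorname{diag}(λ_1, \dots, λ_n)\):
\[
A = U Λ U^{\top} = \sum_{i=1}^{n} λ_i\, u_i u_i^{\top}.
\]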