
STK4021 – Applied Bayesian analysis


Lecture notes

1 The Bayesian pipeline

1.1 Introduction
Bayesian statistics
Bayesian model
Prior
Likelihood
Posterior
Marginal likelihood
Exercise
Exercise 1
Posterior mean
Posterior variance
Exercise
Exercise 2
Exercise
Exercise 3
Normalization trick · Functional form trick
Functional form
Proportional Bayes theorem
The predictive distribution
Predictive distribution
Functional form
Exercise
Exercise 4
Exercise
Exercise 5
1.2 Bayesian decision theory
Decision theory
Action
Decision function
Loss function · Cost function
Frequentist risk · Risk
Bayes risk
Bayes estimator
Posterior expected loss
Bayes risk minimization theorem
Exercise
Exercise 6
Exercise
Exercise 7
Exercise
Exercise 8
Exercise
Exercise 9
Exercise
Exercise 10

2 Choosing the prior distribution

Prior selection
2.1 Conjugate priors
Conjugate prior
Exercise
Exercise 11
Exercise
Exercise 12
Exercise
Exercise 13
2.1.1 The multivariate Gaussian distribution
Multivariate Gaussian distribution
Symmetric matrix
Positive definite matrix
Exercise
Exercise 14
Exercise
Exercise 15
Exercise
Exercise 16
Conditional Gaussian distributions
Covariance matrix
Precision matrix
Exercise
Exercise 17
Woodbury matrix identity
Exercise 17
Marginal Gaussian distributions
Exercise
Exercise 18
Bayes' theorem for Gaussian variables
Linear Gaussian model
Exercise
Exercise 19
Exercise
Exercise 20
Exercise
Exercise 21
Sherman-Morrison formula
Exercise 21
2.2 Empirical Bayes
Hyperparameter
Hyperparameter selection
Empirical Bayes
Empirical Bayes
Exercise
Exercise 22
Exercise
Exercise 23
2.3 The Jeffreys prior
Fisher information
Jeffreys prior
Improper prior
Exercise
Exercise 24
Exercise
Exercise 25
Exercise
Exercise 26
Score
Exercise 26

3 The Laplace approximation (Lazy Bayes)

Laplace approximation derivation
Laplace approximation
MAP · Maximum a posteriori
Observed Fisher information
Note
Exercise
Exercise 27
Exercise
Exercise 28
Exercise
Exercise 29

4 Model selection and model averaging

Model space
Model prior
Model-level likelihood
Bayesian model space
Conditioning on model gives model
Model posterior
Model selection
Bayes factor derivation
Uniform model selection · Bayes factor selection
Bayes factor
Exercise
Exercise 30
Exercise
Exercise 31
4.1 The Bayesian information criterion (BIC)
BIC · Bayesian information criterion
4.2 Derivation of the BIC
Derivation of the BIC

5 Regression and classification

5.1 Linear models for regression
5.1.1 The frequentist solution: least squares and penalization
5.1.2 Bayesian linear regression
5.1.3 Model comparison
5.1.4 Empirical Bayes
5.2 Linear models for classification
5.2.1 Bayesian classification

6 Exchangeability and de Finetti's theorem

6.1 Exchangeability
Permutation
Exchangeable variables
6.1
Exchangeable sequence
6.2
Exercise
Exercise 44
iid sequence is exchangeable
Exercise 44
Exercise - Pólya's urn
Exercise 45
De Finetti's theorem
Theorem 6.1
De Finetti's theorem - Integral form
Exercise
Exercise 46
Exercise
Exercise 47

7 The Bernstein–von Mises theorem

Asymptotic normality of MLE
Total variation distance
Exercise
Exercise 48
Total variation integral expression
Exercise 48
Bernstein–von Mises theorem
Theorem 7.1
Exercise
Exercise 49

8 Markov chain Monte Carlo (MCMC)

MCMC methods
8.1 General state space Markov chains
Markov chain
Definition 8.1
Homogeneous chain
Definition 8.2
Kernel
Definition 8.2
Kernel density
Exercise
Exercise 50
Example 2
Exercise
Exercise 51
\(m\)-step kernel
Exercise 51
\(m\)-step kernel density
Exercise 51
8.2 Key properties
\(μ\)-irreducible
Definition 8.3
Exercise
Exercise 52
Periodic chain
Definition 8.4
Aperiodic chain
Definition 8.4
Exercise
Exercise 53
Stationary distribution · Invariant distribution
Definition 8.5
Detailed balance
Definition 8.6
Exercise
Exercise 54
Ergodicity
Irreducible & stationary chain is ergodic
Theorem 8.1
Aperiodic ergodic chain converges to stationary
Theorem 8.2
8.3 The Metropolis–Hastings algorithm
Metropolis–Hastings idea
Target density
Proposal distribution
Acceptance probability
Metropolis–Hastings algorithm
Exercise
Exercise 55
Metropolis–Hastings kernel derivation
Metropolis–Hastings kernel
Exercise
Exercise 56
Metropolis–Hastings target is stationary proof
Metropolis–Hastings target is stationary
Metropolis–Hastings Bayesian posterior target derivation
Metropolis–Hastings Bayesian posterior target
Exercise
Exercise 57
Exercise
Exercise 58
8.4 Gibbs sampling
Gibbs sampling idea
Gibbs target density
Exercise
Exercise 59
Gibbs sampling
Gibbs kernel
Exercise
Exercise 60
Gibbs joint distribution is stationary
Proposition 8.1
Proof
Partial Gibbs kernel
Exercise
Exercise 61
Joint satisfies detailed balance for partial kernel
Composed partial kernels
Proposition 8.2
Chain composition preserve stationary
Proposition 8.2
Proof
Partial Gibbs kernel as Metropolis–Hastings kernel
Proof
Exercise
Exercise 62
Exercise
Exercise 63
8.5 Convergence diagnostics
Empirical estimate
Burn-in
Trace plot
Exercise
Exercise 64
Autocorrelation-lag plot
\(r\)-lag autocorrelation
\(r\)-lag autocovariance
\(r\)-lag autocovariance estimator
\(r\)-lag autocorrelation estimator
Exercise
Exercise 65
ESS · Effective sample size
Effective sample size estimator
Exercise
Exercise 66
MCMC quality assurance checklist

Auxiliary

Fubini's theorem
Spectral theorem
The prior \(π(θ)\) in a Bayesian model encodes the belief about the parameters \(θ\) before the data are observed.
The likelihood \(π(y|θ)\) in a Bayesian model is the distribution of the data \(y\) conditional on the parameters \(θ\).
The marginal likelihood \(π(y)\) in a Bayesian model is the distribution of the data with the parameters integrated out; it is the normalization constant of the posterior.
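For reference, these three pieces combine through Bayes' theorem (using the same \(π\) notation as above):
\[
π(θ \mid y) = \frac{π(y \mid θ)\,π(θ)}{π(y)}, \qquad π(y) = \int π(y \mid θ)\,π(θ)\,dθ.
\]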
Make the optimal decision based on the (sampled) data.
An action \(a(y)\) is a function of the observed data.
A measure of the cost of taking an action when the true parameter is \(θ\).
The average cost of using a decision function.
\[R(a, θ) = \int_\mathcal{Y} L(a(y)|θ)π(y|θ)dy\]
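As a sketch of the related Bayes-risk quantities listed in the outline (the symbols \(r\) and \(ρ\) are notational choices made here, not necessarily those of the notes):
\[
r(a) = \int_Θ R(a, θ)\,π(θ)\,dθ, \qquad ρ(a \mid y) = \int_Θ L(a(y) \mid θ)\,π(θ \mid y)\,dθ,
\]
and the Bayes estimator minimizes \(r\); by Fubini's theorem this is achieved by minimizing the posterior expected loss \(ρ(\cdot \mid y)\) for each \(y\).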
Select the hyperparameters that maximize the marginal likelihood \(π(y)\).
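Written out (with \(η\) denoting the hyperparameters, a notational choice made here), empirical Bayes selects
\[
\hat{η} = \arg\max_{η}\, π(y \mid η) = \arg\max_{η} \int π(y \mid θ)\,π(θ \mid η)\,dθ.
\]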
Select the model \(m\) with the highest posterior probability \(π(m \mid y)\) given the data \(y\).
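In symbols (a standard formulation; \(m_1, m_2\) here denote two candidate models):
\[
π(m \mid y) \propto π(y \mid m)\,π(m), \qquad B_{12} = \frac{π(y \mid m_1)}{π(y \mid m_2)}.
\]
With a uniform model prior, selecting the highest-posterior model reduces to comparing marginal likelihoods, i.e. Bayes factors.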
Ergodicity ensures that large-sample properties hold for the chain.
In particular, time averages along the chain converge to expectations under the stationary distribution.
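Concretely, for an ergodic chain \((X_t)\) with stationary distribution \(π\) and a \(π\)-integrable function \(f\) (notation chosen here for illustration),
\[
\frac{1}{N} \sum_{t=1}^{N} f(X_t) \;\longrightarrow\; \mathbb{E}_{π}[f(X)] \quad \text{as } N \to \infty.
\]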
In general, computing the normalization constant of a distribution is difficult or expensive; we may only have access to an unnormalized density.
Metropolis–Hastings allows us to sample from a distribution even when all we have is this unnormalized density.
Construct a Markov chain whose stationary distribution is the target distribution.
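A minimal Python sketch of random-walk Metropolis–Hastings, assuming a one-dimensional target known only up to a constant; the example target, Gaussian proposal, and step size are illustrative choices, not taken from the notes:

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, step=1.0, rng=None):
    """Random-walk Metropolis-Hastings targeting exp(log_target), known up to a constant."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    samples = np.empty(n_samples)
    for t in range(n_samples):
        proposal = x + step * rng.standard_normal()       # symmetric Gaussian proposal
        log_alpha = log_target(proposal) - log_target(x)  # log acceptance ratio (q cancels)
        if np.log(rng.uniform()) < log_alpha:
            x = proposal                                   # accept
        samples[t] = x                                     # on reject, keep the current state
    return samples

# Example: unnormalized standard normal, log pi(x) = -x^2 / 2 up to a constant.
draws = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0, n_samples=5000)
```

Working on the log scale avoids overflow, and the symmetric proposal makes the proposal densities cancel in the acceptance ratio.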
It is hard to sample from the joint, but it is easy to sample from the conditionals.
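A minimal Python sketch of Gibbs sampling for a bivariate Gaussian target with correlation \(ρ\), where both full conditionals are Gaussian; the specific target and parameter value are illustrative assumptions:

```python
import numpy as np

def gibbs_bivariate_normal(n_samples, rho=0.8, rng=None):
    """Gibbs sampler for (X, Y) ~ N(0, [[1, rho], [rho, 1]]).

    Both full conditionals are Gaussian: X | Y = y ~ N(rho * y, 1 - rho^2), and symmetrically.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y = 0.0, 0.0
    cond_sd = np.sqrt(1.0 - rho**2)          # conditional standard deviation
    samples = np.empty((n_samples, 2))
    for t in range(n_samples):
        x = rng.normal(rho * y, cond_sd)     # draw from pi(x | y)
        y = rng.normal(rho * x, cond_sd)     # draw from pi(y | x)
        samples[t] = (x, y)
    return samples

draws = gibbs_bivariate_normal(5000)
```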
If a function is Lebesgue integrable on a rectangle, then the double integral equals the iterated integrals, and the order of integration can be interchanged.
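In symbols, for \(f\) integrable on \(A \times B\):
\[
\int_{A \times B} f(x, y)\, d(x, y) = \int_A \left( \int_B f(x, y)\, dy \right) dx = \int_B \left( \int_A f(x, y)\, dx \right) dy.
\]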
A real symmetric matrix can be diagonalized: it has real eigenvalues and orthonormal eigenvectors \((λ_i, u_i)\), and the diagonalization is the diagonal matrix with the eigenvalues on the diagonal.
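In symbols, with \(U = (u_1, \dots, u_n)\) orthogonal and \(Λ = \operatorname{diag}(λ_1, \dots, λ_n)\):
\[
A = U Λ U^{\top} = \sum_{i=1}^{n} λ_i\, u_i u_i^{\top}.
\]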