\[\newcommand\R[0]{\mathbb{R}} \newcommand\Z[0]{\mathbb{Z}} \newcommand\Q[0]{\mathbb{Q}} \newcommand\Sqm[2]{[ #1 \mid #2 ]} \newcommand\Sq[1]{ \[ #1 \] } \newcommand\P[0]{\operatorname{\mathbb{P}}} \newcommand\E[0]{\operatorname{\mathbb{E}}} \newcommand\Lim[1]{\underset{#1}{\operatorname{Lim}}} \newcommand\LimSup[0]{\operatorname{LimSup}} \newcommand\LimInf[0]{\operatorname{LimInf}} \newcommand\Pa[1]{\left(#1\right)} \newcommand\Br[1]{\{#1\}} \newcommand\Vb[1]{\lvert #1\rvert} \newcommand\hash[0]{\#} \newcommand\PowsP[0]{\operatorname{\mathcal{P}}} \newcommand\MeasurableSpace[2]{\mathcal{M}_{#1, #2}} \newcommand\MeasureSpace[3]{\mathcal{M}_{#1, #2, #3}}\]

Theoretical statistics

The lecture notes are the primary material. The book is supplementary material and not required.

Lecture notes

The notes are located at the link.

1 Introduction

Interpretation of \(\P (A)\)

Classical
Frequentist
Subjectivist
Propensity
Deterministic
Axiomatic

2 Preliminary material

A gathering of concepts that should be familiar.

2.1 Set theory and operations

Set element
Set
\(\mathbb{N}\)
\(\mathbb{C}\)
Subset
Proper subset
Cardinality
Countability
Empty set
Exa
2.1
Natural numbers
Exa
2.1
Integers
Exa
2.1
Rationals
Exa
2.1
Reals
Exa
2.1
Complex numbers
Exa
2.1
Finite set
Exa
2.1
Set of sets
Exa
2.1
Set of \(n\times m\) matrices
Exa
2.1
Set union
Def
2.1.1
Set intersection
Def
2.1.1
Set complement in Ω
Def
2.1.1
Set difference
Def
2.1.1
Set symmetric difference
Def
2.1.1
Cartesian product
Def
2.1.1
Power set
Def
2.1.1
Example
Exa
2.2
Irrationals
Exa
2.3
De Morgan's law for sets
Prp
2.1.1
Proof

2.2 Sequences of sets

Set sequence
Set sequence union
Set sequence intersection
Countable set sequence
Union or intersection of countable set sequence
Increasing set sequence
Def
2.2.1
Decreasing set sequence
Def
2.2.1
Increasing set sequence limit
Def
2.2.1
Decreasing set sequence limit
Def
2.2.1
Limit index arithmetic
Increasing & decreasing
Exa
2.4
Limit superior of a set sequence
Limit inferior of a set sequence
Limit of a set sequence
Exercise
Exr
2.1
Exercise
Exr
2.2
Exercise
Exr
2.3
Exercise
Exr
2.4

2.3 Convergence of functions

Pointwise convergence
Def
2.3.1
Example
Exa
2.5
Example
Exa
2.6
Uniform convergence
Def
2.3.2
Example
Exa
2.7
Example
Exa
2.8
Uniform convergence properties
Thm
2.3.1
Corollary
Cor
2.3.2
\(L^1\) · Lebesgue integrable
\(L^1\) convergence · Lebesgue convergence
Def
2.3.3
Example
Exa
2.9
Corollary
Cor
2.3.3
Proof

3 A primer on measure and integration

Measure theory: measures, spaces and integration.

Measure idea
Weakness of Riemann integral idea
Lebesgue integration idea
Probability and measure idea

3.1 Measures and measure spaces

σ-algebra
Def
3.1.1
Sigma algebra idea
Measurable set
Def
3.1.2
Trivial σ-algebra
Exa
3.1
Power set
Exa
3.2
σ-algebra generated by \(F \subset{} E\)
Exa
3.3
Sub-σ-algebra
Exa
3.4
σ-algebra coarseness idea
σ-algebra generated by subsets
Def
3.1.3
Intersection of σ-algebras
Borel σ-algebra on \(\mathbb{R}\)
Exa
3.5
Borel σ-algebra on \(\mathbb{R}\)-subset
Exa
3.6
Measurable space
Def
3.1.4
Measure
Def
3.1.5
σ-additivity
Def
3.1.5
Finite additivity
Prp
3.1.1
Monotonicity
Prp
3.1.1
σ-subadditivity
Prp
3.1.1
Proposition
Prp
3.1.1
Proof
Continuity of measures
Prp
3.1.2
Proof
Characterization of measures
Thm
3.1.3
Remark
Finite measure
Def
3.1.6
σ-finite measure
Def
3.1.7
Remark
Trivial measure
Exa
3.7
Borel measures
Exa
3.8
Lebesgue measure
Exa
3.9
Lebesgue-Stieltjes measures
Exa
3.10
Dirac measure
Exa
3.11
Counting measures
Exa
3.12
Angular measure
Exa
3.13
Probability measures
Exa
3.14
Measure space
Def
3.1.8
Example
Exa
3.15
Probability space
Exa
3.15
Negligible set
Def
3.1.9
Complete measure
Def
3.1.10
Example
Exa
3.16
Complete measure space
Def
3.1.11
Remark
Example
Exa
3.17
\(\mathcal{L} (\R )\) · Lebesgue σ-algebra
Exa
3.17
The Cantor set
Exa
3.18
Product measure
Def
3.1.12
Remark
Remark
Lebesgue measure in \(\mathbb{R}^d\)
Exa
3.19
Example
Exa
3.19

3.2 Measurable functions

Measurable
Def
3.2.1
Set indicator function
Exa
3.20
Example
Exa
3.21
Measurability via generating set
Thm
3.2.1
Example
Exa
3.22
Borel measurable
Def
3.2.2
Corollary
Cor
3.2.2
σ-algebra generated by a function
Def
3.2.3
Measurability preserving operations
Prp
3.2.3
Measurable composition is measurable
Prp
3.2.4
Proof
Almost everywhere
Def
3.2.4
Almost surely
Def
3.2.5
Push-forward measure
Def
3.2.6

3.3 Integration with respect to a measure

Simple function
Def
3.3.1
Integral of simple function
Def
3.3.2
Approximation by simple functions
Thm
3.3.1
Remark
Contour partition function
Approximation by contours
Approx by contours
Integral of non-negative measurable function
Def
3.3.3
µ-integrable
Def
3.3.4
Integral of measurable function
Def
3.3.4
Theorem
Thm
3.3.2
Proof
Integrable if finite
Prp
3.3.3
Proof
Theorem
Thm
3.3.4
Proof
Remark
Integration wrt. pushforward measure
Thm
3.3.5
Corollary
Cor
3.3.6

3.4 Lebesgue's convergence theorems

If \(f_n \to{} f\), when does \(\int{} f_n \, d\mu{} \to{} \int{} f \, d\mu\)?

Proposition
Prp
3.4.1
Proof
Lebesgue's monotone convergence theorem
Thm
3.4.2
Proof
Fatou's lemma
Thm
3.4.3
Proof
Reverse Fatou's lemma
Thm
3.4.4
Proof
Fatou-Lebesgue theorem
Cor
3.4.5
Example
Exa
3.23
Example
Exa
3.24
Lebesgue's dominated convergence theorem
Thm
3.4.6
Proof
Example
Exa
3.25
Example
Exa
3.26
Example
Exa
3.27
TODO LimSup
Fatou's lemmas

3.5 Riemann integral

Riemann integral

3.6 Lebesgue integral

Lebesgue integral
Lebesgue vs. Riemann
Thm
3.6.1
Remark

3.7 Riemann-Stieltjes integral

Riemann-Stieltjes integral
Intuition behind Riemann-Stieltjes I
Exa
3.28
Intuition behind Riemann-Stieltjes II
Exa
3.29

3.8 Lebesgue-Stieltjes integral

Lebesgue-Stieltjes integral

3.9 Absolute continuity and the Radon-Nikodym theorem

Absolutely continuous measure
Def
3.9.1
Equivalent measures
Def
3.9.1
Example
Exa
3.30
Radon-Nikodym
Thm
3.9.1
Radon-Nikodym derivative
Thm
3.9.1
Remark

3.10 Exercises

Exercise
Exr
3.1
Exercise
Exr
3.2
Exercise
Exr
3.3
Exercise
Exr
3.4
Exercise
Exr
3.5
Exercise
Exr
3.6
Exercise
Exr
3.7
Exercise
Exr
3.8
Exercise
Exr
3.9
Exercise
Exr
3.10
Exercise
Exr
3.11
Exercise
Exr
3.12
Exercise
Exr
3.13
Exercise
Exr
3.14
Exercise
Exr
3.15
Exercise
Exr
3.16
Exercise
Exr
3.17
Exercise
Exr
3.18
Exercise
Exr
3.19
Exercise
Exr
3.20
Exercise
Exr
3.21
Exercise
Exr
3.22
Exercise
Exr
3.23
Exercise
Exr
3.24
Exercise
Exr
3.25
Exercise
Exr
3.26
Exercise
Exr
3.27
Exercise
Exr
3.28

4 An axiomatic approach to probability

Axiom

4.1 Sample space and events

Trial · Experiment
Def
4.1.1
Random trial
Def
4.1.1
Bernoulli trial
Def
4.1.2
Sample space
Def
4.1.3
Example
Exa
4.1
Power set
Example
Exa
4.2
Event
Example
Exa
4.3
Exercise
Exr
4.1
Exercise
Exr
4.2

4.2 σ-algebras of events

σ-algebra of events
Remark
Trivial σ-algebra
Exa
4.4
Power set
Exa
4.5
σ-algebra generated by event
Exa
4.6
Borel σ-algebra on \(\R\)
Exa
4.7
Borel σ-algebra on \(\R^d\)
Exa
4.8
Examples
Exa
4.9
σ-algebra generated by a set
Def
4.2.2
Remark
Borel σ-algebra
Thm
4.2.1
Proof
Remark
Remark
Exercise
Exr
4.3
Exercise
Exr
4.4
Exercise
Exr
4.5
Exercise
Exr
4.6
Exercise
Exr
4.7
Exercise
Exr
4.8
Exercise
Exr
4.9
Exercise
Exr
4.10

4.3 Probability measure

Probability measure
Def
4.3.1
σ-additivity
Def
4.3.1
Finite additivity
Def
4.3.1
Remark
Remark
Finite additivity
Thm
4.3.1
Proof
Corollary
Cor
4.3.2
Proof
σ-additivity equivalences
Thm
4.3.3
Proof
Exercise
Exr
4.11

4.4 Conditional probability and independence

Conditional probability
Independence · Mutual independence
Def
4.4.1
Pairwise independence
Def
4.4.1
Remark
Complement independences
Thm
4.4.1
Proof
Dependent & independent examples
Exa
4.10
Conditional probability
Def
4.4.2
Theorem
Thm
4.4.2
Conditional probability measure
Thm
4.4.2
Proof
Chain rule
Thm
4.4.3
Proof
Partition
Def
4.4.3
Finite partition
Def
4.4.3
Countable partition
Def
4.4.3
Law of total probability
Thm
4.4.4
Proof
Bayes' theorem
Thm
4.4.5
Proof

4.5 The Borel-Cantelli lemmas

Event sequences
Borel-Cantelli idea
First Borel-Cantelli lemma
Thm
4.5.1
Proof
Second Borel-Cantelli lemma
Thm
4.5.2
Proof
Exercise
Exr
4.12
Exercise
Exr
4.13
Exercise
Exr
4.14
Exercise
Exr
4.15
Exercise
Exr
4.16
Exercise
Exr
4.17
Exercise
Exr
4.18
Exercise
Exr
4.19
Exercise
Exr
4.20
Exercise
Exr
4.21
Exercise
Exr
4.22
Exercise
Exr
4.23
Exercise
Exr
4.24
Exercise
Exr
4.25
Exercise
Exr
4.26
Exercise
Exr
4.27

5 Random variables

Random variable motivation

5.1 Random variables

Random variable · Real random variable
Def
5.1.1
Random variable name
Probability measure pushforward
Random variables
Exa
5.1
Not random variables
Exa
5.2
Generalized random variable
Exa
5.2
Random vector
Def
5.1.2
Stochastic process
Def
5.1.3
Exercise
Exr
5.1

5.2 Law and distribution of a random variable

Example
Exa
5.3
Law
Def
5.2.1
Distribution · CDF · Cumulative distribution function
Def
5.2.2
Law & CDF equivalence
Distribution properties
Def
5.2.1
Proof
Random variable via CDF
Exercise
Exr
5.2

5.3 Types of random variables

Discrete random variable
Def
5.3.1
Continuous random variable
Def
5.3.1
Mixed random variable
Def
5.3.1
Absolutely continuous random variable
Def
5.3.2
Absolutely continuous function
Density function
Def
5.3.2
Integral of density
CDF Lebesgue derivative
Non-absolutely continuous function
Examples of different distribution functions
Exa
5.4

5.4 Transformation of random variables

Change of variables in Lebesgue integral
Thm
5.4.1
Random variable under diffeomorphism
Thm
5.4.2
Proof
Corollary
Cor
5.4.3
Proof
Remark
Exercise
Exr
5.3
Exercise
Exr
5.4
Exercise
Exr
5.5
Exercise
Exr
5.6
Exercise
Exr
5.7
Exercise
Exr
5.8

5.5 Expectation

Expectation
Def
5.5.1
Remark
Example
Exa
5.5
Example
Exa
5.6
Example
Exa
5.7
Law of the unconscious statistician
Thm
5.5.1
Proof
Remark
Remark
Exercise
Exr
5.9
Exercise
Exr
5.10
Exercise
Exr
5.11
Exercise
Exr
5.12
Exercise
Exr
5.13
Exercise
Exr
5.14
Exercise
Exr
5.15
Exercise
Exr
5.16
Exercise
Exr
5.17

5.6 Independence of random variables

Independent of sigma algebras
Def
5.6.1
Independence of two sigma algebras
Def
5.6.2
Remark
Independence of random variables
Def
5.6.3
Independence of random variables
Thm
5.6.1
Proof
Corollary
Cor
5.6.2
Exercise
Exr
5.18
Exercise
Exr
5.19
Exercise
Exr
5.20
Exercise
Exr
5.21

Assignment

Laplace's definition.
Probability is the ratio of favourable outcomes to possible outcomes.
\[\P{} A = \frac{N_A}{N}\]
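For instance (a standard illustration, not from the notes): rolling a fair die, \(N = 6\) and the event "even" has \(N_A = 3\), so \(\P{} A = \frac{3}{6} = \frac{1}{2}\).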
Probability is the frequency with which the outcome occurs when the experiment is repeated.
\[\P{} A = \Lim{n \to{} \infty} f_n A\]
Also known as Bayesian.
\[\P{} \Sqm{A}{F}\]
No experiment is the same; the generating conditions change.
An experiment is predetermined based on the environment, and has probability 0 or 1.
\[\Vb{A}\]
\[\hash A\]
Finite, countably infinite, or uncountably infinite.
Can be constructed by the Peano axioms.
The ratios of the integers.
Can be constructed as the limits of Cauchy sequences of rationals.
\[B \setminus{} A = B \cap{} A^\complement{} = A^\complement{} \setminus{} B^\complement\]
The union with the intersection removed: \(A \triangle{} B = (A \cup{} B) \setminus{} (A \cap{} B)\)
\(\mathcal{P} A\), \(2^A\). Cardinality \(2^{\hash A}\) when \(A\) is finite.
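For example (illustrative): if \(A = \Br{1, 2, 3}\), then \(\hash \mathcal{P} A = 2^3 = 8\): the empty set, the three singletons, the three pairs, and \(A\) itself.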
\(\R{} \setminus{} \Q\)
Defined as the union or intersection "limit" of a set sequence. Can similarly be defined for Cartesian products.
\(A_n \subset{} A_{n+1}\)
\(A_n \supset{} A_{n+1}\)
\(A_n \uparrow{} A\) means \(\cup A_n = A\) for increasing \(A_n\).
\(A_n \downarrow{} A\) means \(\cap A_n = A\) for decreasing \(A_n\).
These limits can be started from any finite index.
\(\underset{n}{\operatorname{LimSup}} A_n = \bigcap_{n=1}^{\infty} \bigcup_{k \geq{} n} A_k\)
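Dually (standard companion definition, stated here for completeness): \(\underset{n}{\operatorname{LimInf}} A_n = \bigcup_{n=1}^{\infty} \bigcap_{k \geq{} n} A_k\). For example, with \(A_n = (0..1)\) for odd \(n\) and \(A_n = (0..2)\) for even \(n\): \(\LimSup{} A_n = (0..2)\) while \(\LimInf{} A_n = (0..1)\).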
Uniform convergence on a compact domain implies \(L^1\) convergence.
Intuitively a measure assigns a number to things like line segments, areas and volumes.
The Riemann integral handles functions with unbounded oscillation poorly, e.g. the indicator of the rationals (the Dirichlet function).
The idea behind the Lebesgue integral is that it is invariant under changing the function on a set of measure 0.
Probability theory can be seen as a special case of measure theory: a probability function is a measure, and expectation (given a probability function) is an integral wrt. that measure.
σ-algebra of \(\mathcal{E}\) over \(E\)
Let \(\mathcal{E} \subseteq{} \PowsP{} E\). Then \(\mathcal{E}\) is a σ-algebra if it satisfies:
  • 1) \(\mathcal{E}\) is nonempty.
  • 2) \(\mathcal{E}\) contains the total set \(E\).
  • 3) \(\mathcal{E}\) is closed under countable union.
  • 4) \(\mathcal{E}\) is closed under complement.
2) is replaceable by \(\emptyset{} \in{} \mathcal{E}\) thanks to 4), via the complement.
3) is replaceable by closure under countable intersection, thanks to complement and De Morgan's law.
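A concrete check (illustrative, not from the notes): with \(E = \Br{1,2,3,4}\) and \(F = \Br{1,2}\), the collection \(\Br{\emptyset , \Br{1,2}, \Br{3,4}, E}\) is nonempty, contains \(E\), and is closed under complement and (countable) union, so it is a σ-algebra over \(E\).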
Formalizes the idea that pieces of measurable sets each contribute a set "volume".
An element of a sigma algebra.
\(\Br{\emptyset , E}\); The smallest σ-algebra over \(E\).
\(\PowsP{E}\); The largest σ-algebra over \(E\).
\(\sigma (F) = \Br{\emptyset , F, F^c, E}\)
The bigger (or finer) a σ-algebra is, the more sets it is possible to measure.
The smallest sigma algebra containing the subsets. Always exists.
The intersection of σ-algebras over \(E\) is a σ-algebra.
\(\mathcal{B} \mathbb{R}\), generated by all intervals \([a..b]\).
The collection of all intersections of the subset with the Borel sets of \(\R\).
\(E\) with σ-algebra \(\mathcal{E}\) is a measurable space.
Measure of empty is zero and σ-additivity
The measure of the union of partition pieces equals the sum of the measures of the partition pieces.
If the sets are not disjoint, equality weakens to \(\leq\) (σ-subadditivity).
\(\mathcal{M}_{\Omega , \mathcal{A}, \P}\), the measure space of events over outcomes with a probability measure.
A negligible set is a subset of a measurable set with measure zero.
\(f\) is \(\mathcal{E} \to{} \mathcal{F}\)-measurable if \(\forall B \in{} \mathcal{F} : f^{-1}(B) \in{} \mathcal{E}\).
"Every measurable set in \(\mathcal{F}\) was measurable in \(\mathcal{E}\)."
Let \(\mathcal{G}\) generate \(\mathcal{F}\).
Then \(f\) is measurable iff \(\forall G \in{} \mathcal{G} : f^{-1}(G) \in{} \mathcal{E}\)
"We just need to check if the generating set was measurable."
\(f : \MeasurableSpace{E}{\mathcal{E}} \to{} \R\) is Borel measurable if \(f : \MeasurableSpace{E}{\mathcal{E}} \to{} \MeasurableSpace\R{\mathcal{B} \R}\) is measurable.
\(f : E \to{} \R\) is Borel measurable iff the preimage of every \((-\infty ..a]\) is measurable.
Let \(f : E \to{} F\) and \(\mathcal{F}\) a σ-algebra.
\(\sigma (f)\) is the σ-algebra generated by \(f\) on \(E\) which makes \(f\) measurable.
The σ-algebra generated by the preimages of the measurable sets.
Linear combination: \(af + bg\)
Multiplication: \(fg\)
Division: \(\frac{f}{g}\) (\(g \ne{} 0\))
TODO
TODO
Let \(E, F, G\) be measurable spaces, and \(f, g\) measurable where \(\operatorname{Img} f \subseteq{} \operatorname{Dmn} g\).
Then \(g \circ{} f\) is measurable.
A property \(P(x)\) holds almost everywhere if it is only false on a negligible set.
Almost everywhere in the context of a probability space.
A property \(P(x)\) holds almost surely if it is only false within an impossible event.
Let \(E = \MeasureSpace{E}{\mathcal{E}}\mu\), \(F = \MeasurableSpace{F}{\mathcal{F}}\) and \(f : E \to{} F\) a measurable map.
\(\mu^* = \mu{} \circ{} f^{-1}\) is the pushforward measure via \(f\) (Sometimes denoted \(f\hash\mu\)).
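A small example (illustrative, not from the notes): let \(\mu\) be the Lebesgue measure on \([0..1]\) and \(f(x) = 2x\). Then for \([a..b] \subseteq{} [0..2]\), \(\mu^* [a..b] = \mu \Pa{f^{-1}[a..b]} = \mu{} [\tfrac{a}{2} .. \tfrac{b}{2}] = \tfrac{b-a}{2}\), so the pushforward is half the Lebesgue measure on \([0..2]\).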
Intuitively: each value of the simple function is multiplied by the measure ("size") of the piece on which it is taken; these products are then summed.
A non-negative measurable function can be written as the pointwise limit of an increasing sequence of simple functions.
Let \(E = \MeasureSpace{E}{\mathcal{E}}\mu\) and \(f : E \to{} \R\) a non-negative measurable function.
\(\exists\) a sequence of simple functions \(f_\underline{n}\) with \(f_n \uparrow{} f\) pointwise.
A simple function defined on \(n\) contours of \(f\).
\[f_n(x) = \begin{cases} \frac{1-1}{2^n}&: f(x) \in{} [ \frac{1-1}{2^n} .. \frac{1}{2^n} )\\\frac{2-1}{2^n}&: f(x) \in{} [ \frac{2-1}{2^n} .. \frac{2}{2^n} )\\\vdots\\\frac{n2^n-1}{2^n}&: f(x) \in{} [ \frac{n2^n-1}{2^n} .. \frac{n2^n}{2^n} )\\n&: f(x) \geq{} n\\ \end{cases}\]
The construction:
  1. The contours of a function \(f\) determines a \(n\)-partition of \(E\).
  2. The contour values and the partitions determine a simple function that approximates \(f\).
  3. As \(n \to{} \infty\), the simple function converges pointwise to \(f\).
  4. Simultaneously, the integral of the simple function converges to a value, which we define to be the integral of \(f\).
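Equivalently (a compact way of writing the same approximation, not the notes' formulation): \[f_n (x) = \min \Pa{\frac{\lfloor 2^n f(x) \rfloor}{2^n}, \, n},\] which rounds \(f(x)\) down to the nearest multiple of \(2^{-n}\) and caps it at \(n\).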
\(µ\)-integrable function on \(A\)
A measurable function \(f\) is \(µ\)-integrable if \(\int_E f^+ dµ < \infty\) and \(\int_E f^- dµ < \infty\).
Integral of measurable function with respect to \(\mu\)
Let \(f : E \to{} \R\) be a measurable function. \(f = f^+ - f^-\).
\[\int_E f d\mu{} = \int_E f^+ d\mu{} - \int_E f^- d\mu\]
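Worked example (illustrative): for \(f(x) = x\) on \([-1..2]\) with the Lebesgue measure, \(\int_E f^+ d\mu{} = \int_0^2 x \, dx = 2\) and \(\int_E f^- d\mu{} = \int_{-1}^0 (-x) \, dx = \tfrac{1}{2}\), so \(\int_E f \, d\mu{} = \tfrac{3}{2}\).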
Let \(f, g\) be \(\mu\)-integrable functions.
  1. \[\Vb{\int_E f \, d\mu} \leq{} \int_E \Vb{f} \, d\mu\]
Let \(f_n\) be an increasing sequence of non-negative measurable functions with pointwise limit \(f\).
\[\int_E f_n d\mu{} \to{} \int_E f d\mu\]
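Illustration (standard example, not from the notes): \(f_n (x) = x^{-1/2} \cdot{} 1_{[1/n..1]}(x)\) increases pointwise to \(f(x) = x^{-1/2} \cdot{} 1_{(0..1]}(x)\), and \(\int f_n \, d\mu{} = 2 - \tfrac{2}{\sqrt{n}} \to{} 2 = \int f \, d\mu\).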
As \(n \to{} \infty\), a sequence converges, diverges to \(\pm\infty\) or oscillates.
We can look at \(\LimInf\) and \(\LimSup\) to determine this.
Fatou's lemmas are relevant to a function sequence \(f_n\) that oscillates as \(n \to{} \infty\).
It is possible that \(\LimInf{} f_n \neq{} \LimSup{} f_n\), i.e. the sequence has no pointwise limit.
Note that if \(f_n\) converges, this cannot happen because \(\LimSup{} = \Lim{} = \LimInf\) and we get the monotone convergence theorem instead.
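A standard example of strict inequality in Fatou's lemma (not from the notes): \(f_n = n \cdot{} 1_{(0..1/n)}\) on \([0..1]\). Then \(f_n \to{} 0\) pointwise, so \(\int \LimInf{} f_n \, d\mu{} = 0\), while \(\int f_n \, d\mu{} = 1\) for every \(n\), so \(\LimInf{} \int f_n \, d\mu{} = 1\).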
Lebesgue-Stieltjes integral of \(f\) on \(A\) with respect to Lebesgue-Stieltjes measure \(\mu_g\)
\(\nu\) is absolutely continuous with respect to \(\mu\), \(\nu{} \ll{} \mu\)
\[\nu{} \ll{} \mu{} \iff{} \Pa{\mu (A) = 0 \implies{} \nu (A) = 0 \text{ for all measurable } A}\]
"\(\nu\) measures at least the zeroes of \(\mu\)."
\(\mu\) and \(\nu\) are equivalent if \(\mu{} \ll{} \nu\) and \(\nu{} \ll{} \mu\).
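Examples (illustrative): the law of a standard normal variable is \(\ll\) the Lebesgue measure \(\lambda\) (it has a density), while the Dirac measure \(\delta_0\) is not, since \(\lambda (\Br{0}) = 0\) but \(\delta_0 (\Br{0}) = 1\).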
Let \(\mu , \nu\) be σ-finite measures on \(\MeasurableSpace{E}{\mathcal{E}}\) with \(\nu{} \ll{} \mu\). Then there exists a non-negative measurable function \(f\) such that \(\nu (A) = \int_A f \, d\mu\) for all \(A \in{} \mathcal{E}\).
The Radon-Nikodym derivative is the function \(f\) in the Radon-Nikodym theorem.
Notation: \(f = \frac{d\nu}{d\mu}\).
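For instance (illustrative): if \(\nu (A) = \int_A e^{-x} 1_{(0..\infty)} (x) \, dx\), then \(\nu{} \ll{} \lambda\) and \(\frac{d\nu}{d\lambda} (x) = e^{-x} 1_{(0..\infty)} (x)\).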
σ-additivity (given finite additivity) is equivalent to continuity: for increasing and decreasing set sequences, the probabilities converge to the probability of the limit set.
...
In a probability space, a sequence of measurable sets corresponds to a sequence of events.
\(\LimSup{} A_n\) can be seen as the collection of outcomes that occur in infinitely many of the \(A_n\). It is also the event that infinitely many of the \(A_n\) occur.
\(\LimInf{} A_n\) can be seen as the collection of outcomes that occur in every \(A_n\) except finitely many. It can also be seen as the event that all but finitely many of the \(A_n\) occur.
Tells us about the probability of the event \(\LimSup{} A_n\), given the probabilities of \(A_n\).
The first Borel-Cantelli lemma says that if \(\sum_n \P{} A_n < \infty\), then almost surely only finitely many of the \(A_n\) occur.
The Borel-Cantelli lemmas relate the probabilities of the \(A_n\) to the probability of \(\LimSup{} A_n\).
...
Let \(A_\underline{n}\) be a sequence of events such that \(\sum_{n=1}^{\infty} \P{} A_n\) is finite. Then
\[\P{} \LimSup{} A_n = 0\]
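For example (illustrative): if \(\P{} A_n = \frac{1}{n^2}\), then \(\sum_n \P{} A_n = \frac{\pi^2}{6} < \infty\), so with probability 1 only finitely many of the \(A_n\) occur.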
A random variable is a map \(X : Ω \to{} \R\) from a probability space to \(\R\) where \(\Br{ω \in{} Ω : X(ω) \le{} x}\) is an event for every \(x\). "The collection of outcomes mapped to \(x\) or below is an event."
Alternatively, \(X^{-1}(-\infty ..x]\) is an event for every \(x\). Recall that intervals of this form generate the Borel σ-algebra.
A random variable is in reality a function. However, it is very common to omit any mentions of \(Ω\) and \(ω\) and treat them as implicit. Doing so, \(X = X(ω)\) looks like a variable.
The fact that \(X^{-1}(-\infty ..x]\) is an event means that \(X\) is a measurable map. This gives us a pushforward measure of \(\P\) on \(\R\).
A random variable but with a range \(\R{} \cup{} \{-\infty , \infty \}\).
A random vector of dimension \(n\) is a function \(X : Ω \to{} \R^n\) with \(X(ω) = [X_1(ω), \dots , X_n(ω)]\).
A stochastic process is a collection \(X_\underline{n}\) of random variables.
The law of a random variable \(X\) is the pushforward measure of \(\P\) that acts on Borel sets.
\(μ_X = \P{} \circ{} X^{-1}\).
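For example (illustrative): for a fair coin flip encoded as \(X \in{} \Br{0, 1}\), \(μ_X = \frac{1}{2} \delta_0 + \frac{1}{2} \delta_1\), so \(μ_X (\Br{1}) = \frac{1}{2}\) and \(μ_X ([0..1]) = 1\).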
The distribution of a random variable is the function \(\R{} \to{} [0..1]\) that maps a real number \(x\) to the law of the Borel set \((-\infty ..x]\).
The distribution of a random variable is the function \(F_X\) defined by \(F_X (x) = μ_X (-\infty ..x] = \P [ X \le{} x ]\)
The law and the CDF of a random variable are equivalent. Given one, the other can be constructed.
The distribution function \(F_X\) of \(X\) satisfies:
  1. \(\Lim{x \to{} -\infty} F_X (x) = 0\) and \(\Lim{x \to{} \infty} F_X (x) = 1\)
  2. \(F_X\) is an increasing function (not necessarily strictly).
  3. \(F_X\) is right-continuous. (because of the \(\le\))
A function satisfying the CDF properties can serve as a CDF. This will then induce a law and a probability measure on \(Ω\).
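For example (illustrative): \(F(x) = (1 - e^{-x}) \cdot{} 1_{[0..\infty)} (x)\) satisfies properties 1–3 above, so it is a valid CDF (the exponential distribution).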
\(F_X\) is a step function, or equivalently, the image of \(F_X\) is countable.
\(X\) is absolutely continuous if \(F_X\) can be written as \(F_X (x) = \int_{-\infty}^x f_X (y) dy\) for a non-negative integrable function \(f_X\).
The density function of \(X\) is the function \(f_X\) if it exists in the definition of absolutely continuous random variable above.
The integral of the density on \(\R\) must be \(1\). This follows from the fact that \(\Lim{x \to{} \infty} F_X (x) = 1\).
By Lebesgue's differentiation theorem, \(F'_X (x) = f_X (x)\) and \(f_X\) is unique almost everywhere.
Let \(Ω = \MeasureSpace{Ω}{\mathcal{A}}\P\) be a probability space and \(X\) a random variable.
The expectation of \(X\) is \(\mathbb{E} [X] = \int_Ω X d\P\).
\[\E{} [g(X)] = \int_\R{} g dF_X\]
The integral is a Riemann-Stieltjes or Lebesgue-Stieltjes integral with respect to the increasing function \(F_X\), which induces a Lebesgue-Stieltjes measure.
When \(X\) is absolutely continuous with density \(f_X\), then \(\E{} [g(X)] = \int_\R{} g(x) f_X (x) dx\)
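Worked example (illustrative): for \(X\) uniform on \([0..1]\), \(f_X = 1_{[0..1]}\) and \(\E{} [X^2] = \int_0^1 x^2 \, dx = \frac{1}{3}\).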