
STK-IN4300 – Statistical Learning Methods in Data Science

Course page

Book

The Elements of Statistical Learning by Hastie, Tibshirani and Friedman

Chapter 1: All
Chapter 2: Sections 2.1-2.6; Carmichael & Marron (2018)
Chapter 3: Sections 3.2-3.6 (not 3.5.2), 3.8
Chapter 4: Section 4.4.4; Zou (2006) (only the results described in the slides for lecture 4)
Chapter 5: Sections 5.1, 5.2, 5.4, 5.5, 5.7
Chapter 6: Sections 6.1-6.4, 6.6.1, 6.8; Hjort & Glad (1995) (only what is shown on the slides)
Chapter 7: Sections 7.1-7.5, 7.7
Chapter 8: Section 8.7
Chapter 9: Sections 9.1, 9.2
Chapter 10: Sections 10.1-10.5, 10.9, 10.10, 10.11, 10.12.1; Bühlmann & Yu (2003) (only supplementary material); Chen & Guestrin (2016) (only supplementary material)
Chapter 11: Schmidhuber (2015) (only supplementary material)
Chapter 15: Sections 15.1, 15.2
Chapter 16: Sections 16.1, 16.2
Chapter 18: Sections 18.1, 18.7

1 Introduction

Learning · Statistical learning
Supervised learning
Unsupervised learning
Prediction model · Learner
Classification
Regression

2 Overview of Supervised Learning

2.1 Introduction
Input · Predictor · Independent variable · Feature
Output · Response · Dependent variable
Qualitative variable · Categorical variable · Discrete variable · Factor
Quantitative variable
2.2 Variable Types and Terminology
Regression
Classification
Ordered categorical variable
Target
Dummy variable
Training data
2.3 Two Simple Approaches to Prediction
Linear model
Intercept · Bias
Least squares
Residual sum of squares · RSS
Normal equations
Linear classification model
Decision boundary
Mixture · Mixture model
\(k\)-nearest neighbour model · \(k\)-nearest neighbour fit
Voronoi tessellation
Effective number of parameters
Enhancement of linear and local models
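
A minimal sketch of the two approaches above, on simulated data (the data, the intercept handling and \(k = 15\) are illustrative choices, not from the book):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                           # training inputs
    y = X @ np.array([1.5, -2.0]) + rng.normal(size=100)    # training responses

    # Least squares: solve the normal equations X'X beta = X'y
    Xb = np.column_stack([np.ones(len(X)), X])              # prepend intercept column
    beta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

    def knn_fit(x0, k=15):
        """k-nearest-neighbour fit: average of the k closest training responses."""
        d = np.linalg.norm(X - x0, axis=1)
        return y[np.argsort(d)[:k]].mean()

    x0 = np.array([0.5, 0.5])
    print(np.concatenate([[1.0], x0]) @ beta)               # linear prediction at x0
    print(knn_fit(x0))                                      # 15-NN prediction at x0

The linear fit summarizes all the data in three parameters; the kNN fit has an effective number of parameters of roughly \(N/k\).
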
2.4 Statistical Decision Theory
Decision theory
Loss function
Squared error loss
EPE · Expected prediction error
EPE minimized pointwise
Regression function
Squared error loss optimizer is conditional mean
kNN models conditional mean
kNN large sample properties
LS vs kNN
Additive model
\(L_1\) loss
\(L_1\) loss optimizer is conditional median
Categorical loss
Categorical loss matrix
Zero-one loss
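
Written out, the pointwise minimization above is a short derivation. Under squared error loss,

\[
\operatorname{EPE}(f) = \operatorname{E}\big[Y - f(X)\big]^2 = \operatorname{E}_X \operatorname{E}_{Y|X}\Big(\big[Y - f(X)\big]^2 \,\Big|\, X\Big),
\]

which can be minimized pointwise,

\[
f(x) = \operatorname{argmin}_c \operatorname{E}_{Y|X}\big([Y - c]^2 \mid X = x\big) = \operatorname{E}(Y \mid X = x),
\]

the conditional mean (the regression function). The same argument under \(L_1\) loss gives the conditional median instead.
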
2.5 Local Methods in High Dimensions
Supervised learning as learning by example
2.6 Statistical Models, Supervised Learning and Function Approximation
Supervised learning as function approximation
Parameter
Linear basis expansion
Nonlinear basis expansion
Sigmoid transformation
Least squares
Maximum likelihood estimation
Maximum likelihood principle
Log-likelihood · Cross-entropy
LS and ML equivalent for normal additive model
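
The last point above can be made explicit. For the additive error model \(Y = f_\theta(X) + \varepsilon\) with \(\varepsilon \sim N(0, \sigma^2)\), the log-likelihood of the training data is

\[
\ell(\theta) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\big(y_i - f_\theta(x_i)\big)^2,
\]

so maximizing \(\ell(\theta)\) in \(\theta\) is the same as minimizing the residual sum of squares: least squares and maximum likelihood coincide.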

3 Linear Methods for Regression

3.2 Linear Regression Models and Least Squares
3.3 Subset Selection
3.4 Shrinkage Methods
3.5 Methods Using Derived Input Directions
3.6 Discussion: A Comparison of the Selection and Shrinkage Methods
3.8 More on the Lasso and Related Path Algorithms
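
As a concrete instance of the shrinkage methods of Section 3.4: ridge regression has the closed-form solution \(\hat{\beta}^{\text{ridge}} = (X^{T}X + \lambda I)^{-1} X^{T} y\). A minimal sketch on simulated data (the data and \(\lambda\) are illustrative; inputs and response are centred so the intercept is not penalized):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 10))
    y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=50)

    # Centre so the intercept drops out of the penalized problem
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    lam = 1.0
    p = Xc.shape[1]
    beta_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    print(beta_ridge.round(2))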

4 Linear Methods for Classification

4.4.4 \(L_1\) Regularized Logistic Regression
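
There is no closed form here; the \(L_1\)-penalized logistic likelihood is maximized numerically. A sketch using scikit-learn on simulated data (the regularization strength C is illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    X = rng.normal(size=(200, 20))
    y = (X[:, 0] - X[:, 1] + rng.normal(size=200) > 0).astype(int)

    # penalty="l1" gives lasso-type shrinkage; C is the inverse penalty strength,
    # so small C means heavy shrinkage and more coefficients set exactly to zero
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(X, y)
    print((clf.coef_ != 0).sum(), "nonzero coefficients")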

5 Basis Expansions and Regularization

5.1 Introduction
5.2 Piecewise Polynomials and Splines
5.4 Smoothing Splines
5.5 Automatic Selection of the Smoothing Parameters
5.7 Multidimensional Splines
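
A smoothing spline minimizes \(\sum_i (y_i - f(x_i))^2 + \lambda \int f''(t)^2\, dt\). As a rough illustration of the penalized least-squares idea (a discrete analogue on a grid, not the book's basis construction), one can penalize squared second differences:

    import numpy as np

    rng = np.random.default_rng(3)
    x = np.linspace(0, 1, 100)
    y = np.sin(4 * np.pi * x) + rng.normal(scale=0.3, size=100)

    # Minimize ||y - f||^2 + lam * ||D2 f||^2, where D2 takes second
    # differences: a discrete version of the curvature penalty
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)        # (n-2) x n difference matrix
    lam = 10.0
    f_hat = np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)

Larger \(\lambda\) forces the fit towards a straight line; \(\lambda = 0\) interpolates the data. Section 5.5 is about choosing \(\lambda\) automatically.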

6 Kernel Smoothing Methods

6.1 One-Dimensional Kernel Smoothers
6.2 Selecting the Width of the Kernel
6.3 Local Regression in \(\mathbb{R}^p\)
6.4 Structured Local Regression Models in \(\mathbb{R}^p\)
6.6.1 Kernel Density Estimation
6.8 Mixture Models for Density Estimation and Classification
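
The one-dimensional smoothers of Section 6.1 are locally weighted averages. A minimal Nadaraya-Watson sketch with a Gaussian kernel (the bandwidth is fixed arbitrarily here; Section 6.2 is about choosing it):

    import numpy as np

    rng = np.random.default_rng(4)
    x = np.sort(rng.uniform(0, 1, 100))
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=100)

    def nw_smoother(x0, bandwidth=0.1):
        """Nadaraya-Watson estimate: kernel-weighted average of the responses."""
        w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)   # Gaussian kernel weights
        return np.sum(w * y) / np.sum(w)

    print([round(nw_smoother(g), 3) for g in np.linspace(0, 1, 5)])

Local linear regression (also Section 6.1) replaces this weighted mean with a weighted least-squares line at each point, which removes the boundary bias of the plain weighted average.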

7 Model Assessment and Selection

7.1 Introduction
7.2 Bias, Variance and Model Complexity
7.3 The Bias–Variance Decomposition
7.4 Optimism of the Training Error Rate
7.5 Estimates of In-Sample Prediction Error
7.7 The Bayesian Approach and BIC

8 Model Inference and Averaging

8.7 Bagging
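
Bagging averages a base learner over bootstrap resamples of the training set. A minimal regression sketch (the base learner, B and the data are illustrative):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(5)
    X = rng.uniform(-2, 2, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

    B = 50                                             # bootstrap samples
    trees = []
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))     # sample N indices with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

    # Bagged prediction: the average of the B individual tree predictions
    x_new = np.array([[0.5]])
    print(np.mean([t.predict(x_new)[0] for t in trees]))

Averaging leaves the bias of the base learner roughly unchanged but reduces its variance, which is why bagging helps most for unstable learners such as deep trees.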

9 Additive Models, Trees, and Related Methods

9.1 Generalized Additive Models
9.2 Tree-Based Models

10 Boosting and Additive Trees

10.1 Boosting Methods
10.2 Boosting Fits an Additive Model
10.3 Forward Stagewise Additive Modeling
10.4 Exponential Loss and AdaBoost
10.5 Why Exponential Loss?
10.9 Boosting Trees
10.10 Numerical Optimization via Gradient Boosting
10.11 Right-Sized Trees for Boosting
10.12.1 Shrinkage
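
A minimal gradient-boosting sketch with squared error loss and stumps, including the shrinkage factor of Section 10.12.1 (all settings are illustrative): each round fits the current residuals, which for squared loss are the negative gradient.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(6)
    X = rng.uniform(-2, 2, size=(200, 1))
    y = np.sin(2 * X[:, 0]) + rng.normal(scale=0.3, size=200)

    M, nu = 200, 0.1                    # boosting rounds and shrinkage
    f = np.full(len(y), y.mean())       # initial fit: the constant mean
    stumps = []
    for _ in range(M):
        r = y - f                                             # residuals
        stump = DecisionTreeRegressor(max_depth=1).fit(X, r)  # right-sized tree
        stumps.append(stump)
        f += nu * stump.predict(X)                            # shrunken update

    def boost_predict(X_new):
        return y.mean() + nu * sum(s.predict(X_new) for s in stumps)

    print(boost_predict(np.array([[0.5]]))[0])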

11 Neural Networks

15 Random Forests

15.1 Introduction
15.2 Definition of Random Forests
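
A random forest is bagging with one extra tweak: at each split the tree may only consider a random subset of the features, which de-correlates the trees and improves the variance reduction from averaging. In scikit-learn that subset size is max_features (the hyperparameters below are illustrative):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(7)
    X = rng.normal(size=(200, 10))
    y = X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)

    rf = RandomForestRegressor(n_estimators=500, max_features=3, random_state=0)
    rf.fit(X, y)
    print(rf.feature_importances_.round(2))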

16 Ensemble Learning

16.1 Introduction
16.2 Boosting and Regularization Paths

18 High-Dimensional Problems: \(p \gg N\)

18.1 When \(p\) is Much Bigger than \(N\)
18.7 Feature Assessment and the Multiple-Testing Problem
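
Section 18.7 is about testing many features at once while controlling the false discovery rate. A minimal Benjamini-Hochberg sketch (the p-values are simulated):

    import numpy as np

    def benjamini_hochberg(pvals, alpha=0.05):
        """Boolean mask of hypotheses rejected at FDR level alpha."""
        p = np.asarray(pvals)
        m = len(p)
        order = np.argsort(p)
        thresh = alpha * np.arange(1, m + 1) / m    # BH step-up thresholds
        below = p[order] <= thresh
        reject = np.zeros(m, dtype=bool)
        if below.any():
            k = np.nonzero(below)[0].max()          # largest index meeting the bound
            reject[order[: k + 1]] = True
        return reject

    rng = np.random.default_rng(8)
    pvals = np.concatenate([rng.uniform(0, 0.001, 10), rng.uniform(0, 1, 990)])
    print(benjamini_hochberg(pvals).sum(), "features declared significant")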