
STK-IN4300 – Statistical Learning Methods in Data Science

Course page

Book

The Elements of Statistical Learning by Hastie, Tibshirani and Friedman

Chapter 1: All
Chapter 2: Sections 2.1-2.6; Carmichael & Marron (2018)
Chapter 3: Sections 3.2-3.6 (not 3.5.2), 3.8
Chapter 4: Section 4.4.4; Zou (2006) (only the results described in the slides for lecture 4)
Chapter 5: Sections 5.1, 5.2, 5.4, 5.5, 5.7
Chapter 6: Sections 6.1-6.4, 6.6.1, 6.8; Hjort & Glad (1995) (only what is shown on the slides)
Chapter 7: Sections 7.1-7.5, 7.7
Chapter 8: Section 8.7
Chapter 9: Sections 9.1, 9.2
Chapter 10: Sections 10.1-10.5, 10.9, 10.10, 10.11, 10.12.1; Bühlmann & Yu (2003) (only supplementary material); Chen & Guestrin (2016) (only supplementary material)
Chapter 11: Schmidhuber (2015) (only supplementary material)
Chapter 15: Sections 15.1, 15.2
Chapter 16: Sections 16.1, 16.2
Chapter 18: Sections 18.1, 18.7

1 Introduction

Learning · Statistical learning
Supervised learning
Unsupervised learning
Prediction model · Learner
Classification
Regression

2 Overview of Supervised Learning

2.1 Introduction
Input · Predictor · Independent variable · Feature
Output · Response · Dependent variable
Qualitative variable · Categorical variable · Discrete variable · Factor
Quantitative variable
2.2 Variable Types and Terminology
Regression
Classification
Ordered categorical variable
Target
Dummy variable
Training data
2.3 Two Simple Approaches to Prediction
Linear model
Intercept · Bias
Least squares
Residual sum of squares · RSS
Normal equations
Linear classification model
Decision boundary
Mixture · Mixture model
\(k\)-nearest neighbour model · \(k\)-nearest neighbour fit
Voronoi tessellation
Effective number of parameters
Enhancement of linear and local models
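
A minimal sketch of the two approaches above, on simulated data (the data, the intercept handling and \(k = 15\) are illustrative choices, not from the book):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                           # training inputs
    y = X @ np.array([1.5, -2.0]) + rng.normal(size=100)    # training responses

    # Least squares: solve the normal equations X'X beta = X'y
    Xb = np.column_stack([np.ones(len(X)), X])              # prepend intercept column
    beta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

    def knn_fit(x0, k=15):
        """k-nearest-neighbour fit: average of the k closest training responses."""
        d = np.linalg.norm(X - x0, axis=1)
        return y[np.argsort(d)[:k]].mean()

    x0 = np.array([0.5, 0.5])
    print(np.concatenate([[1.0], x0]) @ beta)               # linear prediction at x0
    print(knn_fit(x0))                                      # 15-NN prediction at x0

The linear fit summarizes all the data in three parameters; the kNN fit has an effective number of parameters of roughly \(N/k\).
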
2.4 Statistical Decision Theory
Decision theory
Loss function
Squared error loss
EPE · Expected prediction error
EPE minimized pointwise
Regression function
Squared error loss optimizer is conditional mean
kNN models conditional mean
kNN large sample properties
LS vs kNN
Additive model
\(L_1\) loss
\(L_1\) loss optimizer is conditional median
Categorical loss
Categorical loss matrix
Zero-one loss
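
Written out, the pointwise minimization above is a short derivation. Under squared error loss,

\[
\operatorname{EPE}(f) = \operatorname{E}\big[Y - f(X)\big]^2 = \operatorname{E}_X \operatorname{E}_{Y|X}\Big(\big[Y - f(X)\big]^2 \,\Big|\, X\Big),
\]

which can be minimized pointwise,

\[
f(x) = \operatorname{argmin}_c \operatorname{E}_{Y|X}\big([Y - c]^2 \mid X = x\big) = \operatorname{E}(Y \mid X = x),
\]

the conditional mean (the regression function). The same argument under \(L_1\) loss gives the conditional median instead.
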
2.5 Local Methods in High Dimensions
Supervised learning as learning by example
2.6 Statistical Models, Supervised Learning and Function Approximation
Supervised learning as function approximation
Parameter
Linear basis expansion
Nonlinear basis expansion
Sigmoid transformation
Least squares
Maximum likelihood estimation
Maximum likelihood principle
Log-likelihood · Cross-entropy
LS and ML equivalent for normal additive model
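
The last point above can be made explicit. For the additive error model \(Y = f_\theta(X) + \varepsilon\) with \(\varepsilon \sim N(0, \sigma^2)\), the log-likelihood of the training data is

\[
\ell(\theta) = -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\big(y_i - f_\theta(x_i)\big)^2,
\]

so maximizing \(\ell(\theta)\) in \(\theta\) is the same as minimizing the residual sum of squares: least squares and maximum likelihood coincide.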

3 Linear Methods for Regression

3.2 Linear Regression Models and Least Squares
3.3 Subset Selection
3.4 Shrinkage Methods
3.5 Methods Using Derived Input Directions
3.6 Discussion: A Comparison of the Selection and Shrinkage Methods
3.8 More on the Lasso and Related Path Algorithms
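
As a concrete instance of the shrinkage methods of Section 3.4: ridge regression has the closed-form solution \(\hat{\beta}^{\text{ridge}} = (X^{T}X + \lambda I)^{-1} X^{T} y\). A minimal sketch on simulated data (the data and \(\lambda\) are illustrative; inputs and response are centred so the intercept is not penalized):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 10))
    y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=50)

    # Centre so the intercept drops out of the penalized problem
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    lam = 1.0
    p = Xc.shape[1]
    beta_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    print(beta_ridge.round(2))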

4 Linear Methods for Classification

4.4.4 \(L_1\) Regularized Logistic Regression
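
There is no closed form here; the \(L_1\)-penalized logistic likelihood is maximized numerically. A sketch using scikit-learn on simulated data (the regularization strength C is illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    X = rng.normal(size=(200, 20))
    y = (X[:, 0] - X[:, 1] + rng.normal(size=200) > 0).astype(int)

    # penalty="l1" gives lasso-type shrinkage; C is the inverse penalty strength,
    # so small C means heavy shrinkage and more coefficients set exactly to zero
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(X, y)
    print((clf.coef_ != 0).sum(), "nonzero coefficients")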

5 Basis Expansions and Regularization

5.1 Introduction
5.2 Piecewise Polynomials and Splines
5.4 Smoothing Splines
5.5 Automatic Selection of the Smoothing Parameters
5.7 Multidimensional Splines
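
A smoothing spline minimizes \(\sum_i (y_i - f(x_i))^2 + \lambda \int f''(t)^2\, dt\). As a rough illustration of the penalized least-squares idea (a discrete analogue on a grid, not the book's basis construction), one can penalize squared second differences:

    import numpy as np

    rng = np.random.default_rng(3)
    x = np.linspace(0, 1, 100)
    y = np.sin(4 * np.pi * x) + rng.normal(scale=0.3, size=100)

    # Minimize ||y - f||^2 + lam * ||D2 f||^2, where D2 takes second
    # differences: a discrete version of the curvature penalty
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)        # (n-2) x n difference matrix
    lam = 10.0
    f_hat = np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)

Larger \(\lambda\) forces the fit towards a straight line; \(\lambda = 0\) interpolates the data. Section 5.5 is about choosing \(\lambda\) automatically.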

6 Kernel Smoothing Methods

6.1 One-Dimensional Kernel Smoothers
6.2 Selecting the Width of the Kernel
6.3 Local Regression in \(\mathbb{R}^p\)
6.4 Structured Local Regression Models in \(\mathbb{R}^p\)
6.6.1 Kernel Density Estimation
6.8 Mixture Models for Density Estimation and Classification
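
The one-dimensional smoothers of Section 6.1 are locally weighted averages. A minimal Nadaraya-Watson sketch with a Gaussian kernel (the bandwidth is fixed arbitrarily here; Section 6.2 is about choosing it):

    import numpy as np

    rng = np.random.default_rng(4)
    x = np.sort(rng.uniform(0, 1, 100))
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=100)

    def nw_smoother(x0, bandwidth=0.1):
        """Nadaraya-Watson estimate: kernel-weighted average of the responses."""
        w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)   # Gaussian kernel weights
        return np.sum(w * y) / np.sum(w)

    print([round(nw_smoother(g), 3) for g in np.linspace(0, 1, 5)])

Local linear regression (also Section 6.1) replaces this weighted mean with a weighted least-squares line at each point, which removes the boundary bias of the plain weighted average.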

7 Model Assessment and Selection

7.1 Introduction
7.2 Bias, Variance and Model Complexity
7.3 The Bias–Variance Decomposition
7.4 Optimism of the Training Error Rate
7.5 Estimates of In-Sample Prediction Error
7.7 The Bayesian Approach and BIC

8 Model Inference and Averaging

8.7 Bagging
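
Bagging averages a base learner over bootstrap resamples of the training set. A minimal regression sketch (the base learner, B and the data are illustrative):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(5)
    X = rng.uniform(-2, 2, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

    B = 50                                             # bootstrap samples
    trees = []
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))     # sample N indices with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

    # Bagged prediction: the average of the B individual tree predictions
    x_new = np.array([[0.5]])
    print(np.mean([t.predict(x_new)[0] for t in trees]))

Averaging leaves the bias of the base learner roughly unchanged but reduces its variance, which is why bagging helps most for unstable learners such as deep trees.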

9 Additive Models, Trees, and Related Methods

9.1 Generalized Additive Models
9.2 Tree-Based Models

10 Boosting and Additive Trees

10.1 Boosting Methods
10.2 Boosting Fits an Additive Model
10.3 Forward Stagewise Additive Modeling
10.4 Exponential Loss and AdaBoost
10.5 Why Exponential Loss?
10.9 Boosting Trees
10.10 Numerical Optimization via Gradient Boosting
10.11 Right-Sized Trees for Boosting
10.12.1 Shrinkage
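
A minimal gradient-boosting sketch with squared error loss and stumps, including the shrinkage factor of Section 10.12.1 (all settings are illustrative): each round fits the current residuals, which for squared loss are the negative gradient.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(6)
    X = rng.uniform(-2, 2, size=(200, 1))
    y = np.sin(2 * X[:, 0]) + rng.normal(scale=0.3, size=200)

    M, nu = 200, 0.1                    # boosting rounds and shrinkage
    f = np.full(len(y), y.mean())       # initial fit: the constant mean
    stumps = []
    for _ in range(M):
        r = y - f                                             # residuals
        stump = DecisionTreeRegressor(max_depth=1).fit(X, r)  # right-sized tree
        stumps.append(stump)
        f += nu * stump.predict(X)                            # shrunken update

    def boost_predict(X_new):
        return y.mean() + nu * sum(s.predict(X_new) for s in stumps)

    print(boost_predict(np.array([[0.5]]))[0])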

11 Neural Networks

15 Random Forests

15.1 Introduction
15.2 Definition of Random Forests
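
A random forest is bagging with one extra tweak: at each split the tree may only consider a random subset of the features, which de-correlates the trees and improves the variance reduction from averaging. In scikit-learn that subset size is max_features (the hyperparameters below are illustrative):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(7)
    X = rng.normal(size=(200, 10))
    y = X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)

    rf = RandomForestRegressor(n_estimators=500, max_features=3, random_state=0)
    rf.fit(X, y)
    print(rf.feature_importances_.round(2))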

16 Ensemble Learning

16.1 Introduction
16.2 Boosting and Regularization Paths

18 High-Dimensional Problems: \(p \gg N\)

18.1 When \(p\) is Much Bigger than \(N\)
18.7 Feature Assessment and the Multiple-Testing Problem
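
Section 18.7 is about testing many features at once while controlling the false discovery rate. A minimal Benjamini-Hochberg sketch (the p-values are simulated):

    import numpy as np

    def benjamini_hochberg(pvals, alpha=0.05):
        """Boolean mask of hypotheses rejected at FDR level alpha."""
        p = np.asarray(pvals)
        m = len(p)
        order = np.argsort(p)
        thresh = alpha * np.arange(1, m + 1) / m    # BH step-up thresholds
        below = p[order] <= thresh
        reject = np.zeros(m, dtype=bool)
        if below.any():
            k = np.nonzero(below)[0].max()          # largest index meeting the bound
            reject[order[: k + 1]] = True
        return reject

    rng = np.random.default_rng(8)
    pvals = np.concatenate([rng.uniform(0, 0.001, 10), rng.uniform(0, 1, 990)])
    print(benjamini_hochberg(pvals).sum(), "features declared significant")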