What is SL?
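Statistical learning is a set of approaches for estimating $f$ in $Y = f(X) + \epsilon$
$X = (X_1, \dots, X_p)$: predictors
also called independent variables
also called features
$Y$: response
also called dependent variable
$f$: fixed but unknown function of $X_1, \dots, X_p$
represents the systematic information that $X$ provides about $Y$
$\epsilon$: random error term
independent of $X$, with mean zero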
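A minimal sketch of the setup $Y = f(X) + \epsilon$, using simulated data. The function `f`, the noise level, and the sample size are all assumptions made for illustration; in a real problem `f` is unknown and only `X` and `Y` are observed.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def f(x):
    # Hypothetical "true" function; in practice this is unknown.
    return 2.0 + 3.0 * x


n = 1000
X = rng.uniform(0.0, 10.0, size=n)   # predictor
eps = rng.normal(0.0, 1.0, size=n)   # random error: mean zero, drawn independently of X
Y = f(X) + eps                       # response = systematic part + random error

# Because eps has mean zero, the average of Y - f(X) should be close to 0.
residual_mean = float(np.mean(Y - f(X)))
print(residual_mean)
```

Since the error is drawn separately from `X`, nothing in `X` can help predict it, which is what "independent of X" means here.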
Why estimate $f$?
Prediction
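The accuracy of the prediction $\hat{Y} = \hat{f}(X)$ depends on two quantities
reducible error
which can be reduced by using a more appropriate statistical learning method to estimate $f$
irreducible error
Because of $\epsilon$
unmeasured variables that could influence $Y$
unmeasurable variation
Which can't be predicted by $X$
This book focuses on techniques for minimizing the reducible error
The irreducible error cannot be reduced by any choice of $\hat{f}$; it could shrink (but not to zero) only if theory suggests additional variables to measure and include as predictors.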
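Understanding through the formula
$E(Y - \hat{Y})^2 = E[f(X) + \epsilon - \hat{f}(X)]^2 = [f(X) - \hat{f}(X)]^2 + \mathrm{Var}(\epsilon)$
Reducible error: $[f(X) - \hat{f}(X)]^2$
$f(X)$: the true (most appropriate) model
$\hat{f}(X)$: the model we estimated
Irreducible error: $\mathrm{Var}(\epsilon)$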
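The split of the expected squared prediction error into a reducible part $[f(X) - \hat{f}(X)]^2$ and an irreducible part $\mathrm{Var}(\epsilon)$ can be checked numerically. The sketch below is only an illustration: `f`, `f_hat`, the point `x0`, and the noise level `sigma` are made-up assumptions, and `f_hat` is deliberately imperfect.

```python
import numpy as np

rng = np.random.default_rng(1)


def f(x):
    # Hypothetical true function (unknown in practice).
    return np.sin(x)


def f_hat(x):
    # Hypothetical, deliberately imperfect estimate of f.
    return 0.5 * x


x0 = 1.0      # fixed predictor value at which we predict
sigma = 0.5   # standard deviation of the random error

# Draw many responses at x0 and compare them with the single prediction f_hat(x0).
eps = rng.normal(0.0, sigma, size=200_000)
y = f(x0) + eps
y_hat = f_hat(x0)

mse = float(np.mean((y - y_hat) ** 2))        # empirical E(Y - Y_hat)^2
reducible = float((f(x0) - f_hat(x0)) ** 2)   # [f(x0) - f_hat(x0)]^2
irreducible = sigma ** 2                      # Var(eps)

print(mse, reducible + irreducible)  # the two numbers should be close
```

A better `f_hat` would shrink `reducible` toward zero, while `irreducible` stays at `sigma ** 2` no matter how `f_hat` is chosen.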
Inference
Identify the important predictors among a large set of variables.
Understand the relationship between the response and each predictor (positive or negative; main effects and interaction effects).
Determine whether the relationship between the predictors and the response is linear or needs a more complex non-linear form.
The choice between Prediction and Inference
Prediction
- A complex model can predict more accurately.
- But a complex model is hard to interpret.
Inference
- A simple model is easy to interpret.
- But a simple model may not predict as accurately as a complex one.
How to estimate $f$
parametric methods
Steps
select model
- make an assumption about the functional form of $f$
- e.g. a linear model: $f(X) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$
fitting model
Ordinary least squares
- Chapter 03
Other estimation methods
- Chapter 06
Advantages and Disadvantages
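- reduces the problem of estimating $f$ to estimating a set of parameters, which is easier
- the choice of model is very important: if the assumed form is far from the true $f$, the estimate will be poor
- a more flexible model can get closer to the true $f$ but needs more parameters and may overfit (follow the noise too closely)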
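The two parametric steps (select a model, then fit it) can be sketched as follows. This is a minimal illustration, assuming a linear form for $f$ and simulated data; the coefficients in `beta_true` are hypothetical, and `numpy.linalg.lstsq` is used here as one way to compute the ordinary least squares fit.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data from a hypothetical linear truth with two predictors.
n = 200
X = rng.uniform(0.0, 10.0, size=(n, 2))
beta_true = np.array([1.0, 2.0, -0.5])   # intercept, slope for X1, slope for X2
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0.0, 0.5, size=n)

# Step 1 (select model): assume f(X) = b0 + b1*X1 + b2*X2.
# Step 2 (fit model): ordinary least squares on the design matrix [1, X1, X2].
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

print(beta_hat)  # estimates of b0, b1, b2
```

Only three numbers are estimated here, which is the main advantage of the parametric approach; the cost is that the fit can only be as good as the assumed linear form.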
non-parametric methods
Definition
- make no explicit assumption about the functional form of $f$; seek an estimate of $f$ that gets as close to the data points as possible without being too rough or wiggly
Advantages and Disadvantages
- can accurately fit a wider range of possible shapes for $f$
- need a very large number of observations to get an accurate estimate
- risk of overfitting when the fit is too flexible
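As an illustration of a non-parametric estimate, the sketch below uses k-nearest-neighbours averaging, one simple method of this kind (chosen for the example, not taken from these notes). The helper `knn_predict`, the sine-shaped truth, and all constants are assumptions for illustration.

```python
import numpy as np


def knn_predict(x_train, y_train, x_new, k=5):
    """Predict at each x_new by averaging the responses of the k nearest training points."""
    # Distance from every new point to every training point (1-D predictor).
    dists = np.abs(x_new[:, None] - x_train[None, :])
    # Column indices of the k closest training points for each new point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


rng = np.random.default_rng(3)
x_train = rng.uniform(0.0, 2.0 * np.pi, size=500)
y_train = np.sin(x_train) + rng.normal(0.0, 0.2, size=500)  # hypothetical truth plus noise

x_new = np.array([np.pi / 2, np.pi, 3 * np.pi / 2])
pred = knn_predict(x_train, y_train, x_new, k=15)
print(pred)  # should be near sin(x_new)

# With k=1 the fit reproduces every training response exactly, noise included.
train_fit = knn_predict(x_train, y_train, x_train, k=1)
```

No functional form is assumed, so the estimate can follow the sine shape. The `k=1` case shows the overfitting risk: the fit passes through every training point, so it tracks the noise as well as the signal.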
The trade-off between Prediction Accuracy and Model Interpretability
- comparison of methods by accuracy (flexibility) and interpretability: more flexible methods are generally less interpretable

Supervised vs. Unsupervised Learning
Supervised Learning
For each observation of the predictors there is an associated response; the goal is to predict the response or understand its relationship with the predictors.
Unsupervised Learning
The situation in which we lack a response variable to supervise the analysis.
The goal is to understand the relationships between the variables or between the observations
Cluster Analysis
Semi-supervised Learning Problem
We have a set of $n$ observations, but only $m < n$ of them have both predictors and a response; the remaining $n - m$ observations have predictors but no response.
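A small sketch of what semi-supervised data can look like: `n` observations with predictors, of which only the first `m` have a recorded response. The sizes, the simulated values, and the use of `NaN` to mark a missing response are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

n, m = 10, 4                      # n observations, only m of them have a response
X = rng.normal(size=(n, 2))       # predictors are available for every observation

y = np.full(n, np.nan)            # NaN marks "no response measured"
y[:m] = X[:m, 0] + rng.normal(0.0, 0.1, size=m)   # hypothetical responses for the first m

labeled = ~np.isnan(y)            # True where both predictors and response are present
print(labeled.sum(), (~labeled).sum())  # counts of labeled and unlabeled observations
```

A supervised method could use only the rows where `labeled` is true, while a semi-supervised method would also try to use the predictors of the remaining rows.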
Regression vs. Classification Problems
Regression
with quantitative response
Classification
with qualitative response