CHAPTER 1  INTRODUCTION

Statistical Learning
• What is statistical learning?
  – machine learning, data mining
  – supervised vs. unsupervised
• How is it different from traditional inference?
  – different objectives
  – different statistical procedures
  – supervised learning ↔ regression
  – unsupervised learning ↔ density estimation

High dimensional data
• What does "high-dimension" mean?
  – relative to sample sizes
  – curse of dimensionality
  – possibly ultra-high: $p = \exp\{O(n)\}$
• What can we do with "high-dimension" data?
  – two-stage procedure with dimension reduction
  – regularized procedure

Overview of this course
– Learning methods
– Learning theory
– Methods for high-dimensional data

Acknowledgments
– data examples, figures from the HTF book
– learning theories extracted from the DGL book
– the part on high-dimensional data relies on published references
– errors are mine

CHAPTER 2  STATISTICAL DECISION THEORY

Set-up in decision theory
– X: feature variables
– Y: outcome variable (continuous, categorical, ordinal)
– (X, Y) follows some distribution
– goal: determine f : X → Y to minimize some expected loss $E[L(Y, f(X))]$

Loss function L(y, x)
– squared loss: $L(y, x) = (y - x)^2$
– absolute deviation loss: $L(y, x) = |y - x|$
– Huber loss: $L(y, x) = (y - x)^2 I(|y - x| < \delta) + (2\delta|y - x| - \delta^2)\, I(|y - x| \ge \delta)$
– zero-one loss: $L(y, x) = I(y \ne x)$
– preference loss: $L(y_1, y_2, x_1, x_2) = 1 - I(y_1 < y_2,\ x_1 < x_2)$

[Figure: the loss functions above plotted over x ∈ [−2, 2].]

Optimal f(x)
– squared loss: $f(X) = E[Y \mid X]$
– absolute deviation loss: $f(X) = \mathrm{med}(Y \mid X)$
– Huber loss: ???
– zero-one loss: $f(X) = \arg\max_k P(Y = k \mid X)$
– preference loss: ???
– not all loss functions have explicit solutions

Estimate f(x)
– empirical data: $(X_i, Y_i)$, $i = 1, \dots, n$
– direct learning: estimate f directly via parametric, semi-parametric, or nonparametric methods
– indirect learning: estimate f by minimizing the empirical risk $\sum_{i=1}^{n} L(Y_i, f(X_i))$

Candidate set for f(x)
– too small: underfits the data
– too large: overfits the data
– even more important with high-dimensional X

Why is high dimensionality an issue?
– data are sparse
– local approximation is infeasible
– bias and variability increase with dimensionality
– curse of dimensionality

Common considerations for f(x)
– linear functions or local linear functions
– linear combinations of basis functions: polynomials, splines, wavelets
– let the data choose f by penalizing roughness

CHAPTER 3  DIRECT LEARNING: PARAMETRIC APPROACHES

Parametric learning
– One of the direct learning methods.
– Estimate f(x) using parametric models.
– Linear models are often used.

Linear regression model
– Target the squared loss or the zero-one loss.
– Assume $f(X) = E[Y \mid X] = X^T\beta$.
– The least squares estimate is $\hat f(x) = x^T (X^T X)^{-1} X^T Y$.

Shrinkage methods
– Sacrifice a little bias to reduce variance, which may improve prediction accuracy.
– Help to determine the important features (variable selection), if any.
– Include subset selection, ridge regression, LASSO, etc.

Subset selection
– Search for the best subset of size k in terms of RSS.
– Use the leaps-and-bounds procedure.
– Computationally intensive with large dimension.
– The best choice of the size k is based on Mallows' Cp.

Ridge regression
– Minimize $\sum_{i=1}^{n} (Y_i - X_i^T\beta)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$.
– Equivalently, minimize $\sum_{i=1}^{n} (Y_i - X_i^T\beta)^2$ subject to $\sum_{j=1}^{p} \beta_j^2 \le s$.
– The solution is $\hat\beta = (X^T X + \lambda I)^{-1} X^T Y$.
– Has a Bayesian interpretation.

LASSO
– Minimize $\sum_{i=1}^{n} (Y_i - X_i^T\beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$.
– Equivalently, minimize $\sum_{i=1}^{n} (Y_i - X_i^T\beta)^2$ subject to $\sum_{j=1}^{p} |\beta_j| \le s$.
– This is a convex optimization problem.
– When the columns of X are orthonormal, $\hat\beta_j = \mathrm{sign}(\hat\beta_j^{\,\mathrm{lse}})\,\big(|\hat\beta_j^{\,\mathrm{lse}}| - \lambda/2\big)_+$.
– Nonlinear shrinkage property.
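The ridge and LASSO formulas above lend themselves to a short numerical check. Below is a minimal Python sketch (not part of the original slides; NumPy, the simulated data, and all variable names are illustrative assumptions). It computes the ridge solution $(X^TX + \lambda I)^{-1}X^TY$ and applies the soft-thresholding formula to a design with orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative simulated data: n observations, p features (assumption, not from the slides).
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

lam = 1.0  # penalty parameter lambda

# Ordinary least squares: (X'X)^{-1} X'Y
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# Ridge regression closed form: (X'X + lambda I)^{-1} X'Y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# LASSO when the design has orthonormal columns (Q'Q = I): coordinate-wise
# soft thresholding of the least squares estimate,
#   beta_j = sign(beta_ols_j) * (|beta_ols_j| - lambda/2)_+
Q, _ = np.linalg.qr(X)      # Q has orthonormal columns
beta_ols_orth = Q.T @ Y     # least squares fit for the orthonormal design Q
beta_lasso_orth = np.sign(beta_ols_orth) * np.maximum(np.abs(beta_ols_orth) - lam / 2, 0.0)

print("OLS:               ", np.round(beta_ols, 3))
print("Ridge:             ", np.round(beta_ridge, 3))
print("LASSO (orthonorm.):", np.round(beta_lasso_orth, 3))
```

Ridge pulls every coefficient toward zero but leaves all of them nonzero, while the soft-thresholded LASSO coefficients can land exactly at zero, which is the nonlinear shrinkage property noted above.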
Summary
– Subset selection is L0-penalty shrinkage but computationally intensive.
– Ridge regression is L2-penalty shrinkage and shrinks all coefficients in the same way.
– LASSO is L1-penalty shrinkage and is a nonlinear shrinkage.

Other shrinkage methods
– Lq-penalty with $q \in [1, 2]$: minimize $\sum_{i=1}^{n} (Y_i - X_i^T\beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j|^q$.
– Weighted LASSO (aLASSO): minimize $\sum_{i=1}^{n} (Y_i - X_i^T\beta)^2 + \lambda \sum_{j=1}^{p} w_j |\beta_j|$, where $w_j = |\hat\beta_j^{\,\mathrm{lse}}|^{-q}$.
– SCAD penalty $\sum_{j=1}^{p} J_\lambda(|\beta_j|)$, defined through $J_\lambda'(x) = \lambda\left( I(x \le \lambda) + \frac{(a\lambda - x)_+}{(a-1)\lambda}\, I(x > \lambda)\right)$.

[Figure: thresholding functions, penalized coefficient versus original coefficient: (a) hard threshold, (b) adaptive LASSO (weighted L1 with alpha = 3), (c) SCAD.]

Compare different penalties
– All penalties have shrinkage properties.
– Some penalties give an oracle property, as if the true zeros were known (aLASSO, SCAD).
– But aLASSO needs a consistent initial estimate (not suitable for high dimensions).
– SCAD generally needs a large sample size and may suffer computational difficulty (due to its non-convexity).

Logistic discriminant analysis
– Often used when Y is dichotomous or categorical.
– Assume $P(Y = k \mid X) = \dfrac{\exp\{\beta_{k0} + X^T\beta_k\}}{1 + \sum_{l=1}^{K-1} \exp\{\beta_{l0} + X^T\beta_l\}}$.
– Then $\hat f(x) = \arg\max_k \{\hat\beta_{k0} + x^T\hat\beta_k\}$.

Discriminant analysis
– Assume that X given Y = k follows a normal distribution with mean $\mu_k$ and covariance $\Sigma_k$.
– For K = 2, the decision rule (quadratic discriminant analysis) is based on the sign of
  $\log\dfrac{\pi_2}{\pi_1} - \dfrac{1}{2}\log\dfrac{|\hat\Sigma_2|}{|\hat\Sigma_1|} - \dfrac{1}{2}(x - \hat\mu_2)^T \hat\Sigma_2^{-1}(x - \hat\mu_2) + \dfrac{1}{2}(x - \hat\mu_1)^T \hat\Sigma_1^{-1}(x - \hat\mu_1)$
  (see the sketch at the end of this excerpt).
– If we assume $\Sigma_1 = \Sigma_2$, this results in linear discriminant analysis.

Generalization
– In parametric methods, the features X can be replaced by basis functions of X (a sketch follows below).
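A minimal sketch of this generalization (illustrative code, not from the slides; the polynomial basis, simulated data, and variable names are assumptions): the least squares fit from earlier in this chapter is applied unchanged after replacing a scalar feature x with the basis functions 1, x, x^2, x^3.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative one-dimensional example with a nonlinear regression function (assumption).
n = 300
x = rng.uniform(-2.0, 2.0, size=n)
y = np.sin(2.0 * x) + rng.normal(scale=0.3, size=n)

# Replace the raw feature x by the polynomial basis functions 1, x, x^2, x^3
# and fit by ordinary least squares, exactly as in the linear model.
B = np.vander(x, N=4, increasing=True)          # columns: 1, x, x^2, x^3
beta_hat = np.linalg.solve(B.T @ B, B.T @ y)

# Predictions on a small grid: f_hat(x0) = b(x0)' beta_hat
x0 = np.linspace(-2.0, 2.0, 9)
B0 = np.vander(x0, N=4, increasing=True)
print(np.round(B0 @ beta_hat, 2))
```

The same substitution works for splines or wavelets; only the construction of the basis matrix B changes.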
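Finally, a minimal sketch of the two-class quadratic discriminant rule from the discriminant analysis slide above (illustrative, not from the slides; the simulated classes and plug-in estimates are assumptions). A new point is assigned to class 2 when the score below is positive.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative two-class training data (assumption, not from the slides).
n1, n2 = 100, 120
X1 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=n1)
X2 = rng.multivariate_normal([1.5, 1.0], [[1.5, -0.2], [-0.2, 0.8]], size=n2)

# Plug-in estimates of the priors, means, and covariances.
pi1, pi2 = n1 / (n1 + n2), n2 / (n1 + n2)
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S1, S2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)
S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)

def qda_score(x):
    """Quadratic discriminant score; positive values favor class 2."""
    d1, d2 = x - mu1, x - mu2
    return (np.log(pi2 / pi1)
            - 0.5 * np.log(np.linalg.det(S2) / np.linalg.det(S1))
            - 0.5 * d2 @ S2_inv @ d2
            + 0.5 * d1 @ S1_inv @ d1)

x_new = np.array([1.0, 0.5])
score = qda_score(x_new)
print("QDA score:", round(float(score), 3), "-> class", 2 if score > 0 else 1)
```

Assuming a common covariance (replacing S1 and S2 by a pooled estimate) makes the quadratic and log-determinant terms cancel and yields the linear discriminant rule.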