UNC-Chapel Hill BIOS 740 - CHAPTER 12 CONSISTENCY OF INDIRECT LEARNING METHODS

CHAPTER 12 CONSISTENCY OF INDIRECT LEARNING METHODS

Intuition of proof ideas
• The decision rule is not explicit.
• However, we know that the best classifier minimizes some loss function or regularized loss function.
• Thus,
  $P(L(g_n) - L(g^*) > \epsilon) \le P(L(g_n) - L_n(g_n) - L(g^*) + L_n(g^*) > \epsilon) \le 2 P\big(\sup_{g \in \mathcal{F}} |L_n(g) - L(g)| > \epsilon/2\big).$
• We need to control the stochastic error of such loss functions over the model space, $\sup_{g \in \mathcal{F}} |L_n(g) - L(g)|$.
• This uses concentration inequalities from empirical processes and relies on the model size of $\mathcal{F}$.

Example 1: Maximum likelihood principle
• Let $\mathcal{F}$ be the class of decision functions with values in $[0, 1]$.
• The estimated decision function $\eta_n(x) \in \mathcal{F}$ maximizes
  $l_n(\eta) = n^{-1} \sum_{i=1}^n [Y_i \log \eta(X_i) + (1 - Y_i) \log(1 - \eta(X_i))],$
  so the decision rule is $g_n(x) = I(\eta_n(x) > 1/2)$.
• Consistency result: suppose the true $\eta \in \mathcal{F}$ and $N_{[\,]}(\epsilon, \mathcal{F}, L_2(P)) \le (K/\epsilon)^V$. Then $L(g_n) \to L(g^*)$ with probability one.

Steps of the proof
• Define $\mathcal{F}_\epsilon = \{\tilde\eta : \tilde\eta \in \mathcal{F},\ L(\tilde g) > L(g^*) + \epsilon\}$, where $\tilde g = I(\tilde\eta > 1/2)$.
• Note
  $P(L(g_n) - L(g^*) > \epsilon) = P(\eta_n \in \mathcal{F}_\epsilon) \le P\big(\sup_{\tilde\eta \in \mathcal{F}_\epsilon} (l_n(\tilde\eta) - l_n(\eta)) > 0\big).$
• On the other hand,
  $E[l_n(\eta)] - E[l_n(\tilde\eta)] = KL(\eta, \tilde\eta) \ge E\big[\eta(X)\big(\sqrt{\eta(X)} - \sqrt{\tilde\eta(X)}\big)^2 + (1 - \eta(X))\big(\sqrt{1 - \eta(X)} - \sqrt{1 - \tilde\eta(X)}\big)^2\big] \ge E[(\eta(X) - \tilde\eta(X))^2]/4,$
  and $\epsilon \le L(\tilde g) - L(g^*) \le 2 E[(\eta(X) - \tilde\eta(X))^2]$.

Completing the proof
• Hence,
  $P(L(g_n) - L(g^*) > \epsilon) \le P(\eta_n \in \mathcal{F}_\epsilon) \le P\big(\sup_{\tilde\eta \in \mathcal{F}_\epsilon} \{(l_n(\tilde\eta) - l_n(\eta)) - E[l_n(\tilde\eta) - l_n(\eta)]\} > \epsilon/8\big).$
• We then need to apply the concentration inequality from empirical processes to obtain an upper bound of the form $n^a e^{-n\epsilon^2}$.

Example 2: Empirical risk minimization
• $g_n$ minimizes the empirical Bayes risk
  $L_n(g) = n^{-1} \sum_{i=1}^n I(Y_i \ne g(X_i))$
  for $g \in \mathcal{C}$.
• Strong consistency: if $\mathcal{C}$ has a finite VC dimension $v$, then $L(g_n) - \inf_{g \in \mathcal{C}} L(g) \to_{a.s.} 0$.
• Remark: if $g^* \in \mathcal{C}$, then $L(g_n) \to L(g^*)$ with probability one.

Proof
• Note
  $L(g_n) - \inf_{g \in \mathcal{C}} L(g) \le L(g_n) - L_n(g_n) + \sup_{g \in \mathcal{C}} (L_n(g) - L(g)) \le 2 \sup_{g \in \mathcal{C}} |L_n(g) - L(g)|.$
• Thus,
  $P\big(L(g_n) - \inf_{g \in \mathcal{C}} L(g) > \epsilon\big) \le P\big(\sup_{g \in \mathcal{C}} |L_n(g) - L(g)| > \epsilon/2\big) \le 8 (en/v)^v e^{-n\epsilon^2/128}.$
• The last step uses the concentration inequality for empirical processes.

Concentration inequalities for VC classes
• For a class $\mathcal{F}$ with finite VC dimension $v$,
  $P\big(\sup_{f \in \mathcal{F}} |P_n f - P f| > \epsilon\big) \le 8 (en/v)^v e^{-n\epsilon^2/32}.$
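The uniform-deviation bound above can be illustrated numerically. The following is a minimal Python sketch, not part of the original slides: it runs empirical risk minimization over the threshold class $\{g_t(x) = I(x > t)\}$ (VC dimension 1) under an invented data-generating model ($X$ uniform on $[0,1]$, $P(Y=1 \mid X=x)$ equal to 0.8 for $x > 0.5$ and 0.2 otherwise), and compares the excess risk with $2\sup_g |L_n(g) - L(g)|$ and with the VC tail bound.

```python
import numpy as np

# Minimal sketch (not from the slides): ERM over the threshold class
# C = {g_t(x) = I(x > t)}, which has VC dimension 1.  The data-generating
# model is invented for illustration: X ~ Uniform(0, 1) and
# P(Y = 1 | X = x) = 0.8 if x > 0.5, else 0.2, so the Bayes rule is
# g*(x) = I(x > 0.5) with L(g*) = 0.2 and L(g_t) = 0.2 + 0.6 |t - 0.5|.

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(0.0, 1.0, size=n)
Y = rng.binomial(1, np.where(X > 0.5, 0.8, 0.2))

# Candidate thresholds on a fine grid (an approximation to sup over all t).
ts = np.linspace(0.0, 1.0, 2001)

# Empirical risk L_n(g_t) for every threshold, and the exact true risk.
preds = (X[None, :] > ts[:, None]).astype(int)        # shape (len(ts), n)
L_n = (preds != Y[None, :]).mean(axis=1)
L_true = 0.2 + 0.6 * np.abs(ts - 0.5)

# ERM pick and the quantities appearing in the consistency proof.
t_hat = ts[np.argmin(L_n)]
excess = 0.6 * abs(t_hat - 0.5)                       # L(g_n) - inf_C L(g)
sup_dev = np.max(np.abs(L_n - L_true))                # sup_g |L_n(g) - L(g)|

print(f"ERM threshold: {t_hat:.3f}")
print(f"excess risk  : {excess:.4f}  (proof bound 2*sup|L_n - L| = {2 * sup_dev:.4f})")

# The VC tail bound 8 (en/v)^v exp(-n eps^2 / 32) with v = 1.  It is very
# loose for moderate n (often larger than 1) but decays exponentially in
# n * eps^2, which is all the consistency argument needs.
for eps in (0.25, 0.5):
    bound = 8 * (np.e * n / 1) * np.exp(-n * eps**2 / 32)
    print(f"VC bound at eps = {eps}: {bound:.3g}")
```

With this synthetic model the excess risk is always dominated by twice the observed uniform deviation, exactly as in the two-line proof of Example 2; the printed VC bound shows why the exponential factor, not the polynomial one, drives the almost-sure convergence.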
Example 3: Empirical risk minimization with complexity regularization
• Consider a sequence of models $\mathcal{C}^{(1)} \subset \mathcal{C}^{(2)} \subset \cdots$.
• Each $\mathcal{C}^{(k)}$ has a finite VC dimension $v_k$.
• Let $g_n^{(k)}$ be the best decision rule in model $\mathcal{C}^{(k)}$, minimizing $L_n(g)$ for $g \in \mathcal{C}^{(k)}$.
• The selected rule $\tilde g$ is the one among $g_n^{(1)}, g_n^{(2)}, \ldots$ that minimizes $L_n(g_n^{(k)}) + R(k, n)$ over $k$, where $R(k, n) = \sqrt{32 v_k \log(en)/n}$.
• Consistency result: suppose $\lim_k \inf_{g \in \mathcal{C}^{(k)}} L(g) = L(g^*)$ and $\sum_{k=1}^\infty e^{-v_k} < \infty$; then $L(\tilde g) \to L(g^*)$ with probability one.

Proof
• Note
  $L(\tilde g) - L(g^*) = \Big(L(\tilde g) - \inf_{k \ge 1} \{L_n(g_n^{(k)}) + R(k, n)\}\Big) + \Big(\inf_{k \ge 1} \{L_n(g_n^{(k)}) + R(k, n)\} - L(g^*)\Big).$

First term on the RHS
• For the first term,
  $P\big(L(\tilde g) - \inf_{k \ge 1} \{L_n(g_n^{(k)}) + R(k, n)\} > \epsilon\big) = P\big(L(\tilde g) - L_n(\tilde g) - R(\tilde k, n) > \epsilon\big)$
  $\le \sum_{k \ge 1} P\big(L(g_n^{(k)}) - L_n(g_n^{(k)}) > \epsilon + R(k, n)\big)$
  $\le \sum_{k \ge 1} P\big(\sup_{g \in \mathcal{C}^{(k)}} |L(g) - L_n(g)| > \epsilon + R(k, n)\big)$
  $\le \sum_{k \ge 1} 8\, n^{v_k} e^{-n(\epsilon + R(k, n))^2/32}$
  $\le 8\, e^{-n\epsilon^2/32} \sum_k e^{-v_k}.$
• The last step uses the concentration inequality for empirical processes with VC classes.

Second term on the RHS
• First, there exists some $m$ such that $\inf_{g \in \mathcal{C}^{(m)}} L(g) < L(g^*) + \epsilon/4$ and $R(m, n) < \epsilon/4$.
• Therefore,
  $P\big(\inf_{k \ge 1} \{L_n(g_n^{(k)}) + R(k, n)\} - L(g^*) > \epsilon\big) \le P\big(L_n(g_n^{(m)}) - L(g_n^{(m)}) > \epsilon/2\big) \le 8\, n^{v_m} e^{-n\epsilon^2/128}.$
• It follows because $v_m < \infty$.

Remarks
• If $L(g^*) < \lim_k \inf_{g \in \mathcal{C}^{(k)}} L(g)$, then the above proof implies
  $L(\tilde g) \to_{a.s.} \lim_k \inf_{g \in \mathcal{C}^{(k)}} L(g)$
  instead of $L(g^*)$.

CHAPTER 13 CONVERGENCE RATES

Some results
• The goal is to find an inequality like $E[L(g_n)] - L(g^*) < c\, n^{-\alpha}$.
• Unfortunately, this is impossible in general: let $a_n$ decrease to zero with $a_1 < 1/16$. For every sequence of classifiers $\{g_n\}$, there exists some distribution of $(X, Y)$ such that $L(g^*) = 0$ but $E[L(g_n)] \ge a_n$.
• However, such rates exist if we impose restrictive (smoothness) assumptions on the distribution of $(X, Y)$.
• Many results on obtaining accurate convergence rates rely on concentration inequalities, as seen in the consistency proofs.
• Open question: inference on $E[L(g_n)]$?

CHAPTER 14 CLASSIFICATION ERROR ESTIMATION

Resubstitution estimate
• Apparent error: $n^{-1} \sum_{i=1}^n I(\hat g(X_i) \ne Y_i)$.
• It is biased towards zero.
• Using validation data, one obtains $m^{-1} \sum_{j=1}^m I(\hat g(X_{n+j}) \ne Y_{n+j})$.
• This requires a large validation data set.

Leave-one-out cross-validation estimate
• Assume $g_n$ to be a symmetric classifier.
• An upper bound result:
  $E\big\{(\hat L_{CV} - L(g_n))^2\big\} \le n^{-1} + 6\, P(g_n(X) \ne g_{n-1}(X)).$
• For the $k$-NN rule, $P(g_n(X) \ne g_{n-1}(X)) \le k/n$.
• For the kernel rule with $K(x)$ having support in the unit sphere and $K(x) \ge \beta$ for $\|x\| \le \rho$,
  $P(g_n(X) \ne g_{n-1}(X)) \le \frac{C_p}{\beta \sqrt{n}} (1 + \rho^{-1})^{p/2}.$

Other error estimates
• Smooth error count:
  $n^{-1} \sum_{i=1}^n [Y_i (1 - r(\eta_n(X_i))) + (1 - Y_i)\, r(\eta_n(X_i))],$
  where $r$ is an increasing function satisfying $r(1/2 - u) + r(1/2 + u) = 1$.
• It tends to have smaller variability than the resubstitution estimate, which corresponds to $r(u) = I(u \ge 1/2)$.
• Posterior probability estimate:
  $n^{-1} \sum_{i=1}^n [I(\eta_n(X_i) \le 1/2)\, \eta_n(X_i) + I(\eta_n(X_i) > 1/2)(1 - \eta_n(X_i))],$
  or its cross-validation version, replacing $\eta_n(X_i)$ by $\eta_{n,-i}(X_i)$.
• $K$-fold cross-validation estimate.
• Bootstrap estimate: use a random sample to construct the classifier, then predict the errors on the observations not selected; repeat many times and take the average.

CHAPTER 15 CONCENTRATION INEQUALITIES

Concentration inequalities
• What are concentration inequalities?
• They are essentially inequalities for the tail probability of the deviation of a random variable from its mean (or median),
  $P\big(\xi(X_1, \ldots, X_n) - E[\xi(X_1, \ldots, X_n)] \ge x\big) \le \exp\{-x^2/(2\nu)\}.$
• Large deviation or moderate deviation inequalities.

Concentration inequalities
• The oldest one:
  $P(f(Z) - E[f(Z)] \ge x) \le \exp\{-x^2/(2L^2)\},$
  where $Z \sim N(0, I_d)$ and $f$ is Lipschitz continuous with Lipschitz constant $L$. Also, $E[f(Z)]$ can be replaced by the median.

Technique 1 for obtaining concentration inequalities
• Cramer-Chernoff method (Chernoff inequality):
  $P(Z \ge x) \le \exp\{-\psi^*_Z(x)\},$
  where $\psi^*_Z(x) = \sup_\lambda \{\lambda x - \log E[e^{\lambda Z}]\}$ (the Cramer transform).

Resulting inequalities
• Hoeffding's inequality: $X_1, \ldots, X_n$ are independent with $X_i \in [a_i, b_i]$, and
  $P\Big(\sum_{i=1}^n (X_i - E[X_i]) \ge x\Big) \le \exp\Big\{-\frac{2x^2}{\sum_{i=1}^n (b_i - a_i)^2}\Big\}.$
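To make the Chapter 14 comparison concrete, here is a minimal Python sketch, again not part of the original slides: it contrasts the resubstitution (apparent) error with the leave-one-out cross-validation estimate for a $k$-NN rule, using a large independent test set as a stand-in for the true error $L(g_n)$. The two-class Gaussian model and the helper knn_predict are assumptions made only for this illustration.

```python
import numpy as np

# Minimal sketch (not from the slides): resubstitution error versus the
# leave-one-out CV estimate for a k-NN rule, illustrating the downward bias
# of the apparent error.  The data-generating model and knn_predict are
# hypothetical choices for this example.

rng = np.random.default_rng(1)

def knn_predict(X_train, Y_train, x, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]
    return int(Y_train[nn].mean() > 0.5)

# Two-class Gaussian mixture in R^2: class 1 centered at (1, 1), class 0 at (-1, -1).
n, k = 200, 5
Y = rng.binomial(1, 0.5, size=n)
X = rng.normal(loc=np.where(Y[:, None] == 1, 1.0, -1.0), scale=1.5, size=(n, 2))

# Resubstitution estimate: each point is its own nearest neighbour, so the
# apparent error is optimistically small.
resub = np.mean([knn_predict(X, Y, X[i], k) != Y[i] for i in range(n)])

# Leave-one-out CV estimate: delete point i before predicting it.
mask = np.ones(n, dtype=bool)
errs = []
for i in range(n):
    mask[i] = False
    errs.append(knn_predict(X[mask], Y[mask], X[i], k) != Y[i])
    mask[i] = True
loo = np.mean(errs)

# Independent test set as a stand-in for the true error L(g_n).
m = 2000
Y_test = rng.binomial(1, 0.5, size=m)
X_test = rng.normal(loc=np.where(Y_test[:, None] == 1, 1.0, -1.0), scale=1.5, size=(m, 2))
test = np.mean([knn_predict(X, Y, X_test[j], k) != Y_test[j] for j in range(m)])

print(f"resubstitution: {resub:.3f}   leave-one-out CV: {loo:.3f}   test error: {test:.3f}")
```

On typical runs the resubstitution estimate sits well below both the leave-one-out estimate and the test error, which is the bias the slides warn about; the CV estimate tracks the test error much more closely, consistent with the bound $E\{(\hat L_{CV} - L(g_n))^2\} \le n^{-1} + 6\, P(g_n(X) \ne g_{n-1}(X))$ and the $k$-NN bound $P(g_n(X) \ne g_{n-1}(X)) \le k/n$.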

