
Machine Learning, 15, 201-221 (1994)
© 1994 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

Improving Generalization with Active Learning*

DAVID COHN ([email protected])
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139

LES ATLAS
Department of Electrical Engineering, University of Washington, Seattle, WA 98195

RICHARD LADNER
Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195

Editor: Alex Waibel

* A preliminary version of this article appears as Cohn et al. (1990).

Abstract. Active learning differs from "learning from examples" in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples.

In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning called selective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers "useful." We test our implementation, called an SG-network, on three domains and observe significant improvement in generalization.

Keywords: queries, active learning, generalization, version space, neural networks.

1. Introduction: Random sampling vs. active learning

Most neural network generalization problems are studied only with respect to random sampling: the training examples are chosen at random, and the network is simply a passive learner. This approach is generally referred to as "learning from examples." Baum and Haussler (1989) examine the problem analytically for neural networks; Cohn and Tesauro (1992) provide an empirical study of neural network generalization when learning from examples. There have also been a number of empirical efforts, such as those of Le Cun et al. (1990), aimed at improving neural network generalization when learning from examples.

Learning from examples is not, however, a universally applicable paradigm. Many natural learning systems are not simply passive, but instead make use of at least some form of active learning to examine the problem domain. By active learning, we mean any form of learning in which the learning program has some control over the inputs on which it trains. In natural systems (such as humans), this phenomenon is exhibited at both high levels (e.g., active examination of objects) and low, subconscious levels (e.g., Fernald and Kuhl's (1987) work on infant reactions to "Motherese" speech).

Within the broad definition of active learning, we will restrict our attention to the simple and intuitive form of concept learning via membership queries. In a membership query, the learner queries a point in the input domain and an oracle returns the classification of that point. Much work in formal learning theory has been directed to the study of queries (see, e.g., Angluin, 1986; Valiant, 1984), but only very recently have queries been examined with respect to their role in improving generalization behavior.

In many formal problems, active learning is provably more powerful than passive learning from randomly given examples. A simple example is that of locating a boundary on the unit line interval. In order to achieve an expected position error of less than ε, one would need to draw O((1/ε) ln(1/ε)) random training examples. If one is allowed to make membership queries sequentially, then binary search is possible and, assuming a uniform distribution, a position error of ε may be reached with only O(ln(1/ε)) queries.
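To make the gap concrete, here is a minimal Python sketch of the two strategies (our illustration, not code from the article; the boundary value 0.618 and the function names are invented for the example):

import random

def random_sampling(oracle, n, seed=0):
    """Draw n random examples; the boundary must lie between the
    largest point labeled 0 and the smallest point labeled 1."""
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0
    for _ in range(n):
        x = rng.random()
        if oracle(x):
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return (lo + hi) / 2  # best guess: midpoint of the uncertain interval

def binary_search(oracle, n):
    """Each membership query halves the uncertain interval, so n
    queries leave an interval of width 2**-n."""
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = (lo + hi) / 2
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

boundary = 0.618                  # hypothetical target boundary
oracle = lambda x: x >= boundary  # membership query: 1 iff x is past the boundary
print(abs(random_sampling(oracle, 20) - boundary))  # error roughly 1/n
print(abs(binary_search(oracle, 20) - boundary))    # error roughly 2**-21

Note that binary search only ever queries inside the interval (lo, hi) where the boundary might still lie; this "query only where you are still uncertain" idea is exactly what the selective sampling formalism below generalizes.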
One can imagine any number of algorithms for employing membership queries to do active learning. We have been studying the problem of learning binary concepts in an error-free environment. For such problems, a learner may proceed by examining the information already given and determining a region of uncertainty, an area in the domain where it believes misclassification is still possible. The learner then asks for examples exclusively from that region (a toy sketch of such a loop appears at the end of this excerpt). This article discusses a formalization of this simple approach, which we call selective sampling.

In section 2, we describe the concept-learning problem in detail and give a formal definition of selective sampling, describing the conditions necessary for the approach to be useful. In section 3 we describe the SG-network, a neural network implementation of this technique inspired by version-space search (Mitchell, 1982). Section 4 contains the results of testing this implementation on several different problem domains, and section 5 discusses some of the limitations of the selective sampling approach. Sections 6 and 7 contain references to related work in the field and a concluding discussion.

2. Concept learning and selective sampling

Given an arbitrary domain X, we define a concept c to be some subset of points in the domain. For example, X might be a two-dimensional space, and c might be the set of all points lying inside a fixed rectangle in the plane. We classify a point x ∈ X by its membership in concept c: we write c(x) = 1 if x ∈ c, and c(x) = 0 otherwise. A popular use of artificial neural networks is as concept classifiers: x is presented as the input to an appropriately trained network, which then activates a designated output node above some threshold if and only if x ∈ c, that is, if x is an instance of concept c. Formally, a concept class C is a set of concepts, usually described by some description language. In the above example, our class C may be the set of all two-dimensional, axis-parallel rectangles (see figure 1). In the case of neural networks, the concept class is usually the set of all concepts that the network may be trained to classify.

Figure 1. A concept class defined as the set of all axis-parallel rectangles in two dimensions. Several positive and negative examples are depicted, as are several consistent concepts in the class.

2.1. Generalization

For target concept t, a training example is a pair (x, t(x)) consisting of a point x (usually drawn from some distribution P), and the point's classification t(x). If x ∈ t, then t(x) = 1, and we say that (x, t(x)) is a positive example. Otherwise, t(x) = 0, and (x, t(x)) is a negative example.
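As a concrete rendering of these definitions, the following Python fragment (ours, with an invented target rectangle; not code from the article) defines an axis-parallel-rectangle concept and draws training examples (x, t(x)) from a uniform distribution:

import random

def make_rectangle_concept(x0, y0, x1, y1):
    """A concept c from class C: c(p) = 1 iff p lies inside the rectangle."""
    return lambda p: 1 if (x0 <= p[0] <= x1 and y0 <= p[1] <= y1) else 0

# Hypothetical target concept t: x in [0.3, 0.7], y in [0.4, 0.8].
t = make_rectangle_concept(0.3, 0.4, 0.7, 0.8)

# Training examples (p, t(p)), with p drawn from a uniform distribution P.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(100)]
examples = [(p, t(p)) for p in points]
positives = [p for p, label in examples if label == 1]  # p in t
negatives = [p for p, label in examples if label == 0]  # p not in t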

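Continuing the same toy domain, the sketch below (again ours; the paper's SG-network approximates this behavior with a neural network rather than computing it exactly) implements the selective sampling loop described in section 1. For axis-parallel rectangles the region of uncertainty can be characterized exactly: a candidate point is certainly positive if it lies inside the bounding box of the known positives, since every consistent rectangle contains that box; it is certainly negative if the smallest rectangle containing it and all known positives also contains a known negative, since then no consistent rectangle can contain it; otherwise its label is still undetermined and it is worth a query.

import random

def bbox(points):
    """Smallest axis-parallel rectangle (x0, y0, x1, y1) containing points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def contains(rect, p):
    x0, y0, x1, y1 = rect
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

def in_region_of_uncertainty(p, positives, negatives):
    """True iff two rectangles consistent with the data disagree on p."""
    if positives and contains(bbox(positives), p):
        return False  # every consistent rectangle classifies p as positive
    if any(contains(bbox(positives + [p]), n) for n in negatives):
        return False  # no consistent rectangle can contain p: certainly negative
    return True       # p's classification is still undetermined

def selective_sampling(oracle, n_queries, seed=0):
    """Query the oracle only on candidates inside the region of uncertainty."""
    rng = random.Random(seed)
    positives, negatives = [], []
    while len(positives) + len(negatives) < n_queries:
        p = (rng.random(), rng.random())  # candidate drawn from distribution P
        if not in_region_of_uncertainty(p, positives, negatives):
            continue                      # label already implied; skip the query
        (positives if oracle(p) else negatives).append(p)
    return positives, negatives

# Hypothetical target rectangle, as in the previous sketch.
target = (0.3, 0.4, 0.7, 0.8)
pos, neg = selective_sampling(lambda p: contains(target, p), n_queries=50)

As labeled examples accumulate, the uncertain region shrinks toward the boundary of the target rectangle, so queries concentrate exactly where misclassification is still possible.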
