Exploiting Task Relatedness for Multiple Task Learning

Shai Ben-David
Department of Computer Science, Technion, Haifa 32000, Israel
and Cornell University, Ithaca, NY 14853
shai@cs.cornell.edu

Reba Schuller
Department of Mathematics, Cornell University, Ithaca, NY 14853
ras51@cornell.edu

Abstract

The approach of learning multiple related tasks simultaneously has proven quite successful in practice; however, theoretical justification for this success has remained elusive. The starting point of previous work on multiple task learning has been that the tasks to be learnt jointly are somehow algorithmically related, in the sense that the results of applying a specific learning algorithm to these tasks are assumed to be similar. We take a logical step backwards and offer a data generating mechanism through which our notion of task relatedness is defined. We provide a formal framework for task relatedness that captures a certain sub-domain of the wide scope of issues in which one may apply a multiple task learning approach. Our notion of similarity between tasks is relevant to a variety of real-life multi-task learning scenarios, and it allows the formal derivation of strong generalization bounds: bounds that are strictly stronger than the previously known bounds for both the learning-to-learn and the multi-task learning scenarios. We provide general conditions under which our bounds guarantee a smaller sample size per task than the known bounds for the single task learning approach.

1 Introduction

Most approaches to machine learning focus on the learning of a single, isolated task. While great success has been achieved in this type of framework, it is clear that it neglects certain fundamental aspects of human learning. Human beings face each new learning task equipped with knowledge gained from previous learning tasks. There is no question that mankind would be seriously hindered if we simply threw away the knowledge gained from one learning task before commencing another, rather than using each learning task to become a better learner. Furthermore, human learning frequently involves approaching several learning tasks simultaneously; in particular, humans take advantage of the opportunity to compare and contrast similar categories when learning to classify entities into those categories. For example, most of us probably learned the alphabet by learning several similar letters at the same time.

It is natural to attempt to apply these observations to machine learning: what kind of advantage is there in setting a learner to work on several tasks sequentially or simultaneously? Intuitively, there should certainly be some advantage, especially if the tasks are closely related in some way. And indeed, much experimental work [1, 5, 6] has validated this intuition. However, thus far there has been relatively little progress on any sort of theoretical justification for these results.

Relatedness of tasks is key to the multi-task learning (MTL) approach. Obviously, one cannot expect that information gathered through the learning of a set of tasks will be relevant to the learning of another task that has nothing in common with the already learned set of tasks. Previous work on MTL (or "learning to learn") treated the notion of relatedness using a functional approach. Consider, for example, Baxter's learning-to-learn work (e.g., [2]), which is, to our knowledge, the most systematic theoretical analysis of the simultaneous learning approach. In Baxter's work, the similarity between jointly learned tasks is manifested solely through a model selection criterion; namely, the
advantage of learning tasks together relies on the assumption that the tasks share a common optimal hypothesis class, or inductive bias.

We take a step backwards. We introduce a data generating framework through which a notion of task relatedness is defined. Not surprisingly, by limiting the discussion to problems that can be modelled by our data generating mechanism, we leave many potential MTL scenarios outside the scope of our discussion. However, there are several interesting problems that can be treated within our framework. For these problems we can reap the benefits of having a mathematical notion of relatedness and prove sample size upper bounds for MTL that are far better than any previously proven bounds.

The rest of the paper is organized as follows. Section 2 formally introduces multiple task learning and describes our notion of task similarity. We state our generalization error bound for this framework in Section 3, and in Section 4 we compare these results for multiple task learning to the known bounds for the single task approach. We close with concluding remarks and directions for future work in Section 5.

2 A Data Generation Model for Related Tasks

Formally, the typical classification learning problem is framed as follows. Given a domain $X$ and a random sample $S$ drawn from some unknown distribution $P$ on $X \times \{0,1\}$, find a hypothesis $h : X \to \{0,1\}$ which approximates $P$, i.e., an $h$ such that, for a randomly drawn pair $(x, b)$, with high probability $h(x) = b$. This problem is sometimes referred to as statistical regression.

The multiple task learning problem is the analogous problem for multiple distributions. That is, given a domain $X$ and a sequence of random samples $S_1, \ldots, S_n$ drawn from some unknown distributions $P_1, \ldots, P_n$, respectively, on $X \times \{0,1\}$, find hypotheses $h_1, \ldots, h_n : X \to \{0,1\}$ which approximate $P_1, \ldots, P_n$, respectively.

As we have mentioned previously, it is intuitive that the advantage of the multiple task approach depends on the relatedness between the different tasks. While there has been empirical success with sets of tasks related in various ways, thus far no formal definition of relatedness has provided any theoretical results to this effect.

2.1 Our Notion of Relatedness Between Learning Tasks

We define a data generation mechanism which serves to determine our notion of related tasks. Our data generation model is an extension of the agnostic learning framework. The basic ingredient in our definition is a set $F$ of transformations $f : X \to X$. We say that tasks are $F$-related if, for some fixed probability distribution over $X \times \{0,1\}$, the data in each of these tasks is generated by applying some $f \in F$ to this fixed distribution.

Definition 2.1 Let $F$ be a set of transformations $f : X \to X$, and let $P_1, P_2$ be probability distributions over $X \times \{0,1\}$. We say that $P_1, P_2$ are $F$-related distributions if there exists some $f \in F$ such that, for any $T \subseteq X \times \{0,1\}$, $T$ is $P_1$-measurable iff $f(T) = \{(f(x), b) : (x, b) \in T\}$ is $P_2$-measurable, and $P_1(T) = P_2(f(T))$.

Note that the strength of this definition depends on the richness of
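To make this data generation mechanism concrete, the following Python snippet is a minimal sketch of sampling from a family of $F$-related tasks. The base distribution (a toy Gaussian model over $X = \mathbb{R}$), the family $F$ of translations, and the helper names sample_base and sample_task are illustrative assumptions, not constructions from the paper; the only property the sketch tries to capture is that each task's data is obtained by applying one fixed $f \in F$ to draws from a single underlying distribution $P$, so that $P_i(f(T)) = P(T)$ as in Definition 2.1.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_base(m):
    """Draw m labelled examples (x, b) from a fixed base distribution P over X x {0,1}.
    Here X = R and, purely for illustration, x | b ~ N(2b, 1) with b uniform on {0,1}."""
    b = rng.integers(0, 2, size=m)           # labels b in {0, 1}
    x = rng.normal(loc=2.0 * b, scale=1.0)   # inputs x, shifted according to the label
    return x, b

# A family F of transformations f : X -> X; translations are one simple bijective choice.
F = [lambda x, s=s: x + s for s in (0.0, 0.5, 1.0, 1.5)]

def sample_task(f, m):
    """Generate an m-example sample S_i for a task whose distribution P_i is F-related
    to the base P via f: draw (x, b) from P and emit (f(x), b), so P_i(f(T)) = P(T)."""
    x, b = sample_base(m)
    return f(x), b

# The multi-task learner receives one sample per F-related task: S_1, ..., S_n.
samples = [sample_task(f, m=100) for f in F]
for i, (x, b) in enumerate(samples, start=1):
    print(f"task {i}: {len(x)} examples, empirical mean of x = {x.mean():.2f}")
```

Under this model, all $n$ samples carry information about the one underlying distribution (up to a transformation from $F$), which is the intuition for why learning the tasks jointly can require a smaller sample size per task than learning each task in isolation.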

