MIT OpenCourseWare http://ocw.mit.edu
6.080 / 6.089 Great Ideas in Theoretical Computer Science, Spring 2008
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms

6.080/6.089 GITCS                                        Feb 5, 2008
Lecture 21
Lecturer: Scott Aaronson        Scribe: Scott Aaronson / Chris Granade

1 Recap and Discussion of Previous Lecture

Theorem 1 (Valiant) m = O((1/ε) log(|C|/δ)) samples suffice for (ε, δ)-learning.

Theorem 2 (Blumer et al.) m = O((1/ε) VCdim(C) log(1/δε)) samples suffice.

In both cases, the learning algorithm that achieves the bound is just "find any hypothesis h compatible with all the sample data, and output it."

You asked great, probing questions last time about what these theorems really mean. For example, "why can't I just draw little circles around the 'yes' points, and expect that I can therefore predict the future?" It's unfortunately a bit hidden in the formalism, but what these theorems are "really" saying is that to predict the future, it suffices to find a succinct description of the past: a description that takes many fewer bits to write down than the past data itself. Hence the dependence on |C| or VCdim(C), the size or dimension of the concept class from which our hypothesis is drawn.

We also talked about the computational problem of finding a small hypothesis that agrees with the data. Certainly we can always solve this problem in polynomial time if P = NP. But what if P ≠ NP? Can we show that "learning is NP-hard"? Here we saw that we need to distinguish two cases:

Proper learning problems (where the hypothesis has to have a certain form): Sometimes we can show these are NP-hard. Example: finding a DNF expression that agrees with the data.

Improper learning problems (where the hypothesis can be any Boolean circuit): It's an open problem whether any of these are NP-hard. (Incidentally, why do we restrict the hypothesis to be a Boolean circuit?
It's equivalent to saying, we should be able to compute in polynomial time what a given hypothesis predicts.)

So, if we can't show that improper (or "representation-independent") learning is NP-complete, what other evidence might there be for its hardness? The teaser from last time: we could try to show that finding a hypothesis that explains past data is at least as hard as breaking some cryptographic code! But how would we actually do this? How would we reduce a cryptanalysis problem to a learning problem?

To be concrete, let's just consider the RSA cryptosystem. Can any of you give me a PAC-learning problem, such that if you could solve it, then you could also break RSA?

How about this: our concept class C will have one concept c for each product of prime numbers N = pq, with p − 1 and q − 1 not divisible by 3. (Technically, for each N expressible with at most n bits.) Our sample space S will consist of pairs of the form (y, i), where 1 ≤ y ≤ N − 1 and 1 ≤ i ≤ log N. Here's the trick: (y, i) will be in c if and only if the ith bit of y^(1/3) mod N is a 1. The sample distribution D will be uniform over S.

So basically, you (the learner) will be given a bunch of encrypted messages of the form x^3 mod N, and for each one, you'll also be told some bit of the plaintext x. Based on this "training" data, you need to infer the general rule for going from x^3 mod N to some given bit of x.

First question: is there such a rule, which is expressible by a polynomial-size circuit? Sure there is! Namely, the rule that someone who knew the trapdoor information, who knew p and q, would use to decrypt the messages! On the other hand, if you don't already know this rule, is there an efficient algorithm to infer it from sample data? Well, not if RSA is secure! The sample data, the set of (y, i) pairs, is stuff that an eavesdropper could not only plausibly have access to, but could actually generate itself!
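As a toy illustration of how an eavesdropper could manufacture this training data entirely on its own, here is a sketch with artificially small primes. The parameter choices and function names are mine, not from the lecture; a real RSA modulus would be thousands of bits, and bit positions are 0-indexed here for convenience.

```python
import random

# Toy RSA parameters (illustration only; real N would be thousands of bits).
p, q = 11, 17            # p - 1 = 10 and q - 1 = 16, neither divisible by 3
N = p * q                # public modulus, 187
phi = (p - 1) * (q - 1)  # 160; gcd(3, phi) = 1, so cubing is invertible mod N
d = pow(3, -1, phi)      # trapdoor: the decryption exponent (Python 3.8+)

def sample(rng):
    """One labeled example (y, i) -> bit, exactly as an eavesdropper could
    generate it: pick a plaintext x, encrypt it, reveal one bit of x."""
    x = rng.randrange(1, N)
    y = pow(x, 3, N)               # the "encrypted message" x^3 mod N
    i = rng.randrange(N.bit_length())
    label = (x >> i) & 1           # the ith bit of y^(1/3) mod N, i.e. of x
    return (y, i), label

def trapdoor_concept(y, i):
    """The polynomial-size rule that exists: decrypt using d, read off bit i."""
    x = pow(y, d, N)
    return (x >> i) & 1

rng = random.Random(0)
# The trapdoor rule agrees with every sample: a succinct hypothesis exists,
# but finding it from the (y, i) pairs alone is exactly breaking RSA.
assert all(trapdoor_concept(*point) == label
           for point, label in (sample(rng) for _ in range(1000)))
```

The point of the assert: a compact consistent hypothesis certainly exists, yet anyone who could infer it from the samples alone would be decrypting RSA ciphertexts.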
So if, by examining that data, the adversary could gain the ability to go from x^3 mod N to a desired bit of x—well, then RSA is toast. (Today, it's actually known how to base the hardness of improper learning on the existence of any one-way function, not just the RSA function.) A beautiful connection between learning theory and cryptography—typical of the connections that abound in theoretical computer science.

1.1 RSA and Language Learning in Infants: The Argument Chomsky Should've Made

What is Noam Chomsky famous for, besides hating America? Linguistics, of course—among other things, what's called the "Poverty of the Stimulus Argument." This is an argument that tries to show, more or less from first principles, that many of the basic ingredients of grammar (nouns, verbs, verb conjugation, etc.) must be hardwired into the human brain. They're not just taught to children by their parents: the children are "pre-configured" to learn grammar.

The argument says: suppose that weren't the case; suppose instead that babies started out as blank slates. Before it has to start speaking, a baby will hear, what, a hundred thousand sentences? Chomsky claims, with a bit of handwaving, that that isn't nearly enough sentences to infer the general rules of grammar, the rules separating grammatical sentences from ungrammatical ones. The sample data available to the baby are just too impoverished for it to learn a language from scratch.

But here's the problem: the sample complexity bounds we saw earlier today should make us skeptical of any such argument! These bounds suggested that, in principle, it really is possible to predict the future given a surprisingly small amount of sample data. As long as the VC-dimension of your concept class is small—and I know I haven't convinced you of this, but in "most" practical cases it is—the amount of data needed for learning will be quite reasonable. So the real stumbling block would seem to be not sample complexity, but computational complexity!
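To put rough numbers on that skepticism, here is a back-of-the-envelope sketch plugging into Theorem 2. The constant hidden in the O(·) is taken to be 1, and the accuracy/confidence parameters and the VC-dimension figures are illustrative choices of mine, not claims from the lecture.

```python
import math

def blumer_bound(vc_dim, eps, delta):
    """Theorem 2: m ~ (1/eps) * VCdim(C) * log(1/(delta*eps)), constant dropped."""
    return math.ceil((1 / eps) * vc_dim * math.log(1 / (delta * eps)))

def supportable_vc_dim(m, eps, delta):
    """Invert the same bound: roughly the largest VCdim learnable from m samples."""
    return math.floor(m * eps / math.log(1 / (delta * eps)))

eps, delta = 0.1, 0.01   # 90% accuracy with 99% confidence (illustrative)

# If "grammar" were a concept class of VC dimension, say, 100, the bound
# asks for only a few thousand example sentences:
print(blumer_bound(100, eps, delta))            # on the order of 7,000

# Conversely, the ~100,000 sentences a baby hears would support concept
# classes of VC dimension in the thousands:
print(supportable_vc_dim(100_000, eps, delta))  # on the order of 1,400
```

So unless the VC dimension of natural-language grammar is enormous, a hundred thousand sentences is plenty of data in the information-theoretic sense; the question is what computation turns that data into a grammar.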
In other words, if the basic ingredients of language weren’t hardwired into every human baby, then even if in principle a baby has heard enough sentences spoken by its parents to infer the rules of the English language, how would the baby actually do the computation? It’s just a baby! More concretely, let’s say I give you

