# Berkeley CS 182 - Inductive and statistical learning of formal grammars (28 pages)


## Inductive and statistical learning of formal grammars


- Pages: 28
- School: University of California, Berkeley
- Course: CS 182 - Neural Basis of Thought and Language


Inductive and statistical learning of formal grammars. Pierre Dupont (pdupont@info.ucl.ac.be). Typeset by FoilTeX, 2002.

### Outline

- Grammar induction: definition
- Learning paradigms
- DFA learning from positive and negative examples
- The RPNI algorithm
- Probabilistic DFA learning
- Application to a natural language task
- Links with Markov models
- Smoothing issues
- Related problems and future work

### Machine learning

Goal: to give a machine the ability to learn, i.e. to design programs whose performance improves over time. Inductive learning is a particular instance of machine learning; its goal is to find a general law from examples. It is a subproblem of theoretical computer science, artificial intelligence, and pattern recognition.

### Grammar induction, or grammatical inference

Grammar induction is a particular case of inductive learning:

- The general law is represented by a formal grammar or an equivalent machine.
- The set of examples, known as the positive sample, is usually made of strings or sequences over a specific alphabet.
- A negative sample, i.e. a set of strings not belonging to the target language, can sometimes help the induction process.

Example: from the data {aaabbb, ab}, induction may yield the grammar S → aSb, S → ε.

### Examples of sequential data

- A natural language sentence
- Speech
- A chronological series
- Successive actions of a web user
- Successive moves during a chess game
- A musical piece
- A program
- A form characterized by a chain code
- A biological sequence (DNA, proteins)

### Pattern recognition

[Figure: a handwritten digit traced on a grid and encoded with an 8-directional chain code (8dC).]

Chain code: `000077766676666555545444443211000710112344543311001234454311`

### Chromosome classification

[Figure: grey-density profile and its derivative along the median axis of chromosome 2a, with the centromere marked.]

String of primitives: `CDFDCBBBBBBBA bcdc DGFB bccb cffc CCC cdb BCB dfdcb`

### A modeling hypothesis

An unknown grammar G0 generates the data; induction produces a grammar G. The aim is to find G as close as possible to G0. The induction process does not prove the existence of G0: it is a modeling hypothesis.

### Learning paradigms

How can learning be characterized?

- Which concept classes can or cannot be learned?
- What is a good example?
- Is it possible to learn in polynomial time?

### Identification in the limit

Given a presentation d1, d2, …, dn of the data, the learner produces hypotheses G1, G2, …, converging in finite time to G∗, where G∗ is a representation of L(G0). This is exact learning.

### PAC learning

The learner's hypotheses converge to a G∗ that is close enough to G0 with high probability: Probably Approximately Correct learning, with polynomial time complexity.

### Identification in the limit: good and bad news

The bad news:

**Theorem 1.** No superfinite class of languages (one containing all finite languages and at least one infinite language) is identifiable in the limit from positive data only.

The good news:

**Theorem 2.** Any admissible class of languages is identifiable in the limit from positive and negative data.

### Other learnability results

- Identification in the limit in polynomial time: DFAs cannot be efficiently identified in the limit, unless we can ask equivalence and membership queries to an oracle.
- PAC learning: DFAs are not PAC-learnable (under a cryptographic-limitation assumption), unless we can ask membership queries to an oracle.
- PAC learning with simple examples, i.e. examples drawn according to the conditional Solomonoff-Levin distribution P_c(x) ∝ 2^(−K(x|c)), where K(x|c) denotes the Kolmogorov complexity of x given a representation c of the concept to be learned: regular languages are PACS-learnable from positive examples only, but Kolmogorov complexity is not computable.

### Cognitive relevance of learning paradigms

A largely unsolved question. Learning paradigms seem irrelevant as models of human learning:

- Gold's identification-in-the-limit framework has been criticized, since children seem to learn natural language without negative examples.
- All learning models assume a known representation class.
- Some learnability results are based on enumeration.

However, learning models do show that:

- an oracle can help;
- some examples are useless, while others are good ones (characteristic samples, typical examples);
- learning well is learning efficiently;
- example frequency matters;
- good examples are simple examples (cognitive economy).

### Regular inference from positive and negative data

Additional hypothesis: the underlying theory is a regular grammar or, equivalently, a finite-state automaton.

**Property 1.** Any regular language has a canonical automaton A(L), which is deterministic and minimal (the minimal DFA).

[Figure: the canonical automaton of an example language L, a minimal DFA with states 0, 1, 2 over the alphabet {a, b}.]

### A few definitions

**Definition 1.** A positive sample S is structurally complete with respect to an automaton A if, when generating S from A, every transition of A is used at least once and every final state is used as the accepting state of at least one string.

Example: the sample {ba, baa, baba} is structurally complete with respect to the automaton above.

### Merging is fun

[Figure: an automaton A1 and the automaton A2 obtained from A1 by merging states.]
Merging amounts to defining a partition π on the set of states, e.g. π = {{0, 1}, {2}}. If A2 = A1/π, then L(A1) ⊆ L(A2): merging states generalizes the language.

### A theorem

The positive data can be represented by a prefix tree acceptor (PTA).

Example: the PTA of {aa, abba, baa} has states 0 through 8.

**Theorem 3.** If the positive sample is structurally complete with respect to a canonical automaton A(L0), then there exists a partition π of the state set of the PTA such that PTA/π = A(L0).
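The PTA construction behind Theorem 3 can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: the function names `build_pta` and `accepts` and the dict-based automaton encoding are assumptions made here. It builds the PTA of the sample {aa, abba, baa} discussed above.

```python
# Illustrative sketch of a prefix tree acceptor (PTA); names and the
# (state, symbol) -> state encoding are assumptions, not from the slides.

def build_pta(sample):
    """Build the PTA of a positive sample as (transitions, accepting)."""
    trans = {}           # (state, symbol) -> state
    accepting = set()    # states where some sample string ends
    next_state = 1       # state 0 is the root (empty prefix)
    for s in sorted(sample):
        state = 0
        for sym in s:
            if (state, sym) not in trans:
                trans[(state, sym)] = next_state
                next_state += 1
            state = trans[(state, sym)]
        accepting.add(state)
    return trans, accepting

def accepts(trans, accepting, s):
    """Deterministic acceptance test from the root state 0."""
    state = 0
    for sym in s:
        if (state, sym) not in trans:
            return False
        state = trans[(state, sym)]
    return state in accepting

trans, accepting = build_pta({"aa", "abba", "baa"})
states = {0} | set(trans.values())
assert len(states) == 9                      # states 0..8
assert accepts(trans, accepting, "abba")
assert not accepts(trans, accepting, "ab")   # a proper prefix is rejected
```

The PTA accepts exactly the sample and nothing else; more general hypotheses are then obtained by merging its states, as in the merging slide above.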

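The generalization effect of merging, L(A1) ⊆ L(A2) when A2 = A1/π, can be checked concretely. The sketch below uses assumed names (`merge`, `nfa_accepts`) and a toy automaton invented here rather than the one in the slides; since merging states can introduce nondeterminism, acceptance is tested NFA-style on sets of states.

```python
# Sketch of state merging as a quotient by a partition; names and the
# example automaton are illustrative assumptions, not from the slides.

def merge(trans, accepting, partition):
    """Quotient automaton: map each state to its block's representative."""
    block = {}
    for rep, states in partition.items():
        for q in states:
            block[q] = rep
    new_trans = {}
    for (q, sym), r in trans.items():
        new_trans.setdefault((block[q], sym), set()).add(block[r])
    new_acc = {block[q] for q in accepting}
    return new_trans, new_acc

def nfa_accepts(trans, accepting, s, start=0):
    """NFA-style acceptance: track the set of reachable states."""
    states = {start}
    for sym in s:
        states = set().union(*(trans.get((q, sym), set()) for q in states))
        if not states:
            return False
    return bool(states & accepting)

# A1: 0 -a-> 1 -a-> 2, accepting {2}, so L(A1) = {aa}.
t1 = {(0, "a"): 1, (1, "a"): 2}
acc1 = {2}
# Merge states 1 and 2, i.e. take the partition {{0}, {1, 2}}:
t2, acc2 = merge(t1, acc1, {0: {0}, 1: {1, 2}})
# A2 gains a self-loop on state 1, so L(A2) = {a, aa, aaa, ...},
# which strictly contains L(A1): merging generalizes the language.
assert nfa_accepts(t2, acc2, "aa")           # everything in L(A1) survives
assert nfa_accepts(t2, acc2, "aaa")          # new strings are accepted
assert not nfa_accepts(t2, acc2, "b")
```

RPNI-style learners exploit exactly this effect: they merge PTA states to generalize, and use the negative sample to reject merges that generalize too far.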