Finite-State MorphologyCMSC 723: Computational Linguistics I ― Session #3Jimmy LinJimmy LinThe iSchoolUniversity of MarylandWednesday, September 16, 2009Today’s Agenda| Computational toolsz Regular expressionsz Finite-state automata (deterministic vs. non-deterministic)z Finite-state transducers|Overview of morphological processes|Overview of morphological processes| Computational morphology with finite-state methodsRegular Expressions| A metalanguage for specifying simple classes of stringsz Very useful in searching and matching text strings| Everyone does it!z Implementations in the shell, Perl, Java, Python, …Regular Expressions| Basic regular expressions/happy/ → happy/[abcd]/ → a, b, c, d/[a-d]/ → a, b, c, d/[^a-d]/→e, f, g, … z/[ ad]/ e, f, g, … z/[Tt]he/ → the, The/(dog|cat)/ → dog, cat| Special metacharacters/colou?r/ → color, colour/oo*h!/ → oh! ooh!oooh!/ooh!/ → oh!, ooh!, oooh!, …/oo+h!/ → ooh!, oooh!, ooooh!, …/beg.n/ → began, begin, begun, begbn, …NLP*with Regular ExpressionsTranscript with Eliza, simulation of a Rogerian psychotherapist (Weizenbaum, 1966)User: Men are all alikeELIZA: IN WHAT WAYUser: They’re always bugging us about something or otherELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE?User: Well, my boyfriend made me come hereELIZA: YOUR BOYFRIEND MADE YOU COME HEREUser: He says I’m depressed much of the timeUser: He says Im depressed much of the timeELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSEDHow did it work?| .* all .* → IN WHAT WAY| .* always .* → CAN YOU THINK OF A SPECIFIC EXAMPLE| .* I’m (depressed|sad) .* → I AM SORRY TO HEAR YOU ARE \1 | .* I’m (depressed|sad) .* → WHY DO YOU THINK YOU ARE \1?Aside…| What is intelligence?| What does Eliza tell us about intelligence?at does a te us about te ge ceEquivalence Relations| We can say the followingz Regular expressions describe a regular languagez Regular expressions can be implemented by finite-state automataz Regular languages can be generated by regular grammars|So what?|So what?RegularLanguagesRegular ExpressionsFinite-State AutomataLanguagesRegular GrammarsSheeptalk!baa!b!Language:RlE ibaaa!baaaa!baaaaa!.../baa+!/Regular Expression:FiniteState Automaton:baa!Finite-State Automaton:q0q1q2q3q4aFinite-State Automata| What are they?| What do they do?at do t ey do| How do they work?FSA: What are they?| Q: a finite set of N states z Q = {q0, q1, q2, q3, q4}z The start state: q0z The set of final states: F = {q4}|Σ: a finite input alphabet of symbols|Σ: a finite input alphabet of symbolsz Σ = {a, b, !}|δ(qi): transition function|δ(q,i): transition function z Given state q and input symbol i, return new state q'z δ(q3,!) → q4q0q1q2q3q4baa!q0q1q2q3q4aFSA: State Transition TableInputStateba!Stateba!0 1∅∅1∅2∅1∅2∅2∅3∅3∅343∅344∅∅ ∅q0q1q2q3q4baa!q0q1q2q3q4aFSA: What do they do?| Given a string, a FSA either rejects or accepts itz ba! → rejectz baa! → acceptz baaaz! → rejectzbaaaa!→acceptbaaaa! acceptz baaaaaa! → acceptz baa → rejectmoooorejectzmoooo→ reject| What does this have to do with NLP?zThink grammaticality!zThink grammaticality!FSA: How do they work?q0q1q2q3q3q4baaa!ACCEPTbaa!q0q1q2q3q4aFSA: How do they work?q0q1q2ba!!!REJECTbaa!q0q1q2q3q4aD-RECOGNIZEAccept or Generate?| Formal languages are sets of stringsz Strings composed of symbols drawn from a finite alphabet| Finite-state automata define formal languages z Without having to enumerate all the strings in the language| Two views of FSAs:z Acceptors that can tell you if a string is in the languageGenerators to produce all and only the strings in the languagezGenerators to produce all and only the strings in the languageSimple NLP with FSAsIntroducing Non-Determinism| Deterministic vs. Non-deterministic FSAs| Epsilon (ε) transitionsUsing NFSAs to Accept Strings| What does it mean?z Accept: there exist at least one path (need not be all paths)z Reject: no paths exist| General approaches:z Backup: add markers at choice points, then possibly revisit unexplored arcs at marked choice pointz Look-ahead: look ahead in input to provide cluesz Parallelism: look at alternatives in parallel| Recognition with NFSAs as search through state space()z Agenda holds (state, tape position) pairsND-RECOGNIZEND-RECOGNIZEState Orderings| Stack (LIFO): depth-first|Queue (FIFO): breadth-firstQueue ( O) b eadtstND-RECOGNIZE: ExampleACCEPTWhat’s the point?| NFSAs and DFSAs are equivalentz For every NFSA, there is a equivalent DFSA (and vice versa)| Equivalence between regular expressions and FSAz Easy to show with NFSAs| Why use NFSAs?Regular Language: Definition| ∅ is a regular language| a א Σ ε, {a} is a regular language aε,{a} s a egu a a guage| If L1and L2are regular languages, then so are:zL1·L2={xy|xאL1,yאL2}, theconcatenationof L1and L2L1 L2 {xy| xאL1, yאL2}, the concatenationof L1and L2z L1 L2, the union or disjunction of L1and L2z L1כ, the Kleene closure of L1Regular Languages: Starting PointsRegular Languages: ConcatenationRegular Languages: DisjunctionRegular Languages: Kleene ClosureFinite-State Transducers (FSTs)| A two-tape automaton that recognizes or generates pairs of strings| Think of an FST as an FSA with two symbol strings on each arcz One symbol string from each tapeFour-fold view of FSTs| As a recognizer|As a generatorsage eato| As a translator|As a set relater|As a set relaterSummary: Computational Tools| Regular expressions| Finite-state automata (deterministic vs. non-deterministic)testate auto ata (dete st c s odete st c)| Finite-state transducersComputational Morphology| Definitions and problemsz What is morphology?z Topology of morphologies| Computational morphologyz Finite-state methodsMorphology| Study of how words are constructed from smaller units of meaning| Smallest unit of meaning = morphemez fox has morpheme foxz cats has two morphemes cat and –sz Note: it is useful to distinguish morphemes from orthographic rules|Two classes of morphemes:|Two classes of morphemes:z Stems: supply the “main” meaningz Affixes: add “additional” meaningTopology of Morphologies| Concatenative vs. non-concatenative| Derivational vs. inflectionale ato a s ecto a| Regular vs. irregularConcatenative Morphology| Morpheme+Morpheme+Morpheme+…|Stems (also called lemma, base form, root, lexeme):Ste s (a so ca ed e a, base o , oot, e e e)z hope+ing → hopingz hop+ing → hopping| Affixes:z Prefixes:
View Full Document