Unformatted text preview:

Finite-State ProgrammingFinite-state “programming”Slide 3Slide 4Some Xerox ExtensionsContainmentRestrictionReplacementReplacement is NondeterministicSlide 10Slide 11More Replace Operators@-> Left-to-right, Longest-match ReplacementUsing “…” for markingSlide 15Example: Finnish SyllabificationConditional ReplacementHand-Coded Example: Parsing DatesSource code: Language of DatesObject code: All Dates from 1/1/1 to 12/31/9999Parser for DatesProblem of ReferenceRefinement by IntersectionDefining Valid DatesParser for Valid and Invalid DatesMore Engineering ApplicationsFASTUS – Information Extraction Appelt et al, 1992-?FASTUS: Successive Markups (details on subsequent slides)FASTUS: TokenizationFASTUS: MultiwordsFASTUS : Basic phrasesFASTUS: Noun GroupsSlide 33FASTUS: Semantic patterns600.465 - Intro to NLP - J. Eisner 1Finite-State ProgrammingSome Examples600.465 - Intro to NLP - J. Eisner 2Finite-state “programming”600.465 - Intro to NLP - J. Eisner 3Finite-state “programming”Function composition (Weighted) compositionHigher-order function OperatorFunction inversion(available in Prolog)Function inversionStructuredprogrammingOps + small regexps600.465 - Intro to NLP - J. Eisner 4Finite-state “programming”Parallelism Apply to set of strings Nondeterminism Nondeterminism Stochasticity Prob.-weighted arcs600.465 - Intro to NLP - J. Eisner 5Some Xerox Extensions$ containment=> restriction-> @-> replacementMake it easier to describe complex languages and relations without extending the formal power of finite-state systems.slide courtesy of L. Karttunen (modified)600.465 - Intro to NLP - J. Eisner 6Containment$[ab*c]$[ab*c]““Must contain a substringMust contain a substringthat matches that matches ab*cab*c.”.”Accepts Accepts xxxacyyxxxacyyRejects Rejects bcbabcba?* [ab*c] ?* ?* [ab*c] ?* Equivalent expression Equivalent expression bbccaaa,b,c,?a,b,c,?a,b,c,?a,b,c,?Warning: ?? in regexps means “any character at all.”But ?? in machines means “any character not explicitly mentioned anywhere in the machine.”600.465 - Intro to NLP - J. Eisner 7Restriction??ccbbbbcc??aacca => b _ ca => b _ c““AnyAny aa must be preceded bymust be preceded by bband followed byand followed by cc.”.”Accepts Accepts bacbbacdebacbbacdeRejects Rejects bacbacaa~[~[~[?* b] a ?*~[?* b] a ?*]] & & ~[~[?* a ~[c ?*]?* a ~[c ?*]]] Equivalent expression Equivalent expression slide courtesy of L. Karttunen (modified)contains aa not preceded by bbcontains aa not followed by cc600.465 - Intro to NLP - J. Eisner 8Replacementa:ba:bbbaa????b:ab:aaaa:ba:ba b -> b a““Replace ‘ab’ by ‘ba’.”Replace ‘ab’ by ‘ba’.”Transduces Transduces abcdbaba to to bacdbbaa [[~$[a b] ~$[a b] [[a b] .x. [b a]][[a b] .x. [b a]]]*]* ~$[a b]~$[a b] Equivalent expression Equivalent expression slide courtesy of L. Karttunen (modified)600.465 - Intro to NLP - J. Eisner 9Replacement is Nondeterministica b -> b a | x““Replace ‘ab’ by ‘ba’ or ‘x’, nondeterministically.”Replace ‘ab’ by ‘ba’ or ‘x’, nondeterministically.”Transduces Transduces abcdbaba to { to {bacdbbaa,, bacdbxa,, xcdbbaa,, xcdbxa}}600.465 - Intro to NLP - J. Eisner 10Replacement is Nondeterministic[ a b -> b a | x ] .o. [ x => _ c ] ““Replace ‘ab’ by ‘ba’ or ‘x’, nondeterministically.”Replace ‘ab’ by ‘ba’ or ‘x’, nondeterministically.”Transduces Transduces abcdbaba to { to {bacdbbaa,, bacdbxa,, xcdbbaa,, xcdbxa}}600.465 - Intro to NLP - J. Eisner 11Replacement is Nondeterministica b | b | b a | a b a -> xapplied to “aba”Four overlapping substrings match; we haven’t told it which one to replace so it chooses nondeterministicallya b a a b a a b a a b aa x a a x x a xslide courtesy of L. Karttunen (modified)600.465 - Intro to NLP - J. Eisner 12More Replace OperatorsOptional replacement: a b (->) b a Directed replacementguarantees a unique result by constraining the factorization of the input string byDirection of the match (rightward or leftward)Length (longest or shortest)slide courtesy of L. Karttunen600.465 - Intro to NLP - J. Eisner 13@-> Left-to-right, Longest-match Replacementa b | b | b a | a b a @-> xapplied to “aba”a b a a b a a b a a b aa x a a x x a xslide courtesy of L. Karttunen@-> left-to-right, longest match @> left-to-right, shortest match ->@ right-to-left, longest match >@ right-to-left, shortest match600.465 - Intro to NLP - J. Eisner 14Using “…” for marking0:[0:[[[0:]0:]??aaeeiioouu]]a|e|i|o|u -> [ ... ]p o t a t op o t a t op[o]t[a]t[o]p[o]t[a]t[o]Note: actually have to write as Note: actually have to write as -> %[ ... %] or or -> “[” ... “]”since [] are parens in the regexp languagesince [] are parens in the regexp languageslide courtesy of L. Karttunen (modified)600.465 - Intro to NLP - J. Eisner 15Using “…” for marking0:[0:[[[0:]0:]??aaeeiioouu]]a|e|i|o|u -> [ ... ]p o t a t op o t a t op[o]t[a]t[o]p[o]t[a]t[o]Which way does the FST transduce potatoe? slide courtesy of L. Karttunen (modified)How would you change it to get the other answer? p o t a t o ep o t a t o ep[o]t[a]t[o][e]p[o]t[a]t[o][e]p o t a t o ep o t a t o ep[o]t[a]t[o e]p[o]t[a]t[o e]vs.600.465 - Intro to NLP - J. Eisner 16Example: Finnish Syllabificationdefine C [ b | c | d | f ...define C [ b | c | d | f ...define V [ a | e | i | o | u ];define V [ a | e | i | o | u ];s t r u k t u r a l i s m s t r u k t u r a l i s m iis t r u k - t u - r a - l i s - m s t r u k - t u - r a - l i s - m ii[C* V+ C*] @-> ... "-" || _ [C V][C* V+ C*] @-> ... "-" || _ [C V]““Insert a hyphen after the longest instance of theInsert a hyphen after the longest instance of the C* V+ C*C* V+ C* pattern in front of a pattern in front of a C VC V pattern.” pattern.”slide courtesy of L. Karttunenwhy?why?600.465 - Intro to NLP - J. Eisner 17Conditional ReplacementThe relation that replaces A by B between L and R leaving everything else unchanged.A -> BA -> BReplacementL _ RL _ RContextSources of complexity: Replacements and contexts may overlap Alternative ways of interpreting “between left and right.”slide courtesy of L. Karttunen600.465 - Intro to NLP - J. Eisner 18Hand-Coded Example: Parsing DatesToday is [Tuesday, July 25, 2000]. Today is Tuesday, [July 25, 2000].Today is [Tuesday, July 25], 2000.Today is


View Full Document

Johns Hopkins EN 600 465 - Finite-State Programming

Download Finite-State Programming
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Finite-State Programming and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Finite-State Programming 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?