DOC PREVIEW
Johns Hopkins EN 600 465 - Finite-State Programming

This preview shows page 1-2-16-17-18-33-34 out of 34 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 34 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

600.465 - Intro to NLP - J. Eisner 1 Finite-State Programming Some Examples600.465 - Intro to NLP - J. Eisner 2 Finite-state “programming” Function Function on (set of) strings Source code Regular expression Object code Finite state machine Compiler Regexp compiler Optimization of object code Determinization, minimization, pruning600.465 - Intro to NLP - J. Eisner 3 Finite-state “programming” Function composition(Weighted) compositionHigher-order functionOperatorFunction inversion(available in Prolog)Function inversionStructuredprogrammingOps + small regexps600.465 - Intro to NLP - J. Eisner 4 Finite-state “programming” Parallelism Apply to set of strings Nondeterminism Nondeterminism Stochasticity Prob.-weighted arcs600.465 - Intro to NLP - J. Eisner 5 Some Xerox Extensions $ containment => restriction -> @-> replacement Make it easier to describe complex languages and relations without extending the formal power of finite-state systems. slide courtesy of L. Karttunen (modified)600.465 - Intro to NLP - J. Eisner 6 Containment $[ab*c] “Must contain a substring that matches ab*c.” Accepts xxxacyy Rejects bcba ?* [ab*c] ?* Equivalent expression b c a a,b,c,? a,b,c,? Warning: ? in regexps means “any character at all.” But ? in machines means “any character not explicitly mentioned anywhere in the machine.”600.465 - Intro to NLP - J. Eisner 7 Restriction ? c b b c ? a c a => b _ c “Any a must be preceded by b and followed by c.” Accepts bacbbacde Rejects baca ~[~[?* b] a ?*] & ~[?* a ~[c ?*]] Equivalent expression slide courtesy of L. Karttunen (modified)600.465 - Intro to NLP - J. Eisner 8 Replacement a:b b a ? ? b:a a a:b a b -> b a “Replace ‘ab’ by ‘ba’.” Transduces abcdbaba to bacdbbaa [~$[a b] [[a b] .x. [b a]]]* ~$[a b] Equivalent expression slide courtesy of L. Karttunen (modified)600.465 - Intro to NLP - J. Eisner 9 Replacement is Nondeterministic a b -> b a | x “Replace ‘ab’ by ‘ba’ or ‘x’, nondeterministically.” Transduces abcdbaba to {bacdbbaa, bacdbxa, xcdbbaa, xcdbxa}600.465 - Intro to NLP - J. Eisner 10 Replacement is Nondeterministic [ a b -> b a | x ] .o. [ x => _ c ] “Replace ‘ab’ by ‘ba’ or ‘x’, nondeterministically.” Transduces abcdbaba to {bacdbbaa, bacdbxa, xcdbbaa, xcdbxa}600.465 - Intro to NLP - J. Eisner 11 Replacement is Nondeterministic a b | b | b a | a b a -> x applied to “aba” Four overlapping substrings match; we haven’t told it which one to replace so it chooses nondeterministically a b a a b a a b a a b a a x a a x x a x slide courtesy of L. Karttunen (modified)600.465 - Intro to NLP - J. Eisner 12 More Replace Operators  Optional replacement: a b (->) b a  Directed replacement  guarantees a unique result by constraining the factorization of the input string by  Direction of the match (rightward or leftward)  Length (longest or shortest) slide courtesy of L. Karttunen600.465 - Intro to NLP - J. Eisner 13 @-> Left-to-right, Longest-match Replacement a b | b | b a | a b a @-> x applied to “aba” a b a a b a a b a a b a a x a a x x a x slide courtesy of L. Karttunen @-> left-to-right, longest match @> left-to-right, shortest match ->@ right-to-left, longest match >@ right-to-left, shortest match600.465 - Intro to NLP - J. Eisner 14 Using “…” for marking 0:[ [ 0:] ? a e i o u ] a|e|i|o|u -> [ ... ] p o t a t o p[o]t[a]t[o] Note: actually have to write as -> %[ ... %] or -> “[” ... “]” since [] are parens in the regexp language slide courtesy of L. Karttunen (modified)600.465 - Intro to NLP - J. Eisner 15 Using “…” for marking 0:[ [ 0:] ? a e i o u ] a|e|i|o|u -> [ ... ] p o t a t o p[o]t[a]t[o] Which way does the FST transduce potatoe? slide courtesy of L. Karttunen (modified) How would you change it to get the other answer? p o t a t o e p[o]t[a]t[o][e] p o t a t o e p[o]t[a]t[o e] vs.600.465 - Intro to NLP - J. Eisner 16 Example: Finnish Syllabification define C [ b | c | d | f ... define V [ a | e | i | o | u ]; s t r u k t u r a l i s m i s t r u k - t u - r a - l i s - m i [C* V+ C*] @-> ... "-" || _ [C V] “Insert a hyphen after the longest instance of the C* V+ C* pattern in front of a C V pattern.” slide courtesy of L. Karttunen why?600.465 - Intro to NLP - J. Eisner 17 Conditional Replacement The relation that replaces A by B between L and R leaving everything else unchanged. A -> B Replacement L _ R Context Sources of complexity:  Replacements and contexts may overlap  Alternative ways of interpreting “between left and right.” slide courtesy of L. Karttunen600.465 - Intro to NLP - J. Eisner 18 Hand-Coded Example: Parsing Dates Today is [Tuesday, July 25, 2000]. Today is Tuesday, [July 25, 2000]. Today is [Tuesday, July 25], 2000. Today is Tuesday, [July 25], 2000. Today is [Tuesday], July 25, 2000. Best result Bad results Need left-to-right, longest-match constraints. slide courtesy of L. Karttunen600.465 - Intro to NLP - J. Eisner 19 Source code: Language of Dates Day = Monday | Tuesday | ... | Sunday Month = January | February | ... | December Date = 1 | 2 | 3 | ... | 3 1 Year = %0To9 (%0To9 (%0To9 (%0To9))) - %0?* from 1 to 9999 AllDates = Day | (Day “, ”) Month “ ” Date (“, ” Year)) slide courtesy of L. Karttunen600.465 - Intro to NLP - J. Eisner 20 Object code: All Dates from 1/1/1 to 12/31/9999 , , Feb Jan Mar May Jun Jul Apr Aug Oct Nov Dec Sep 3 , , 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 0 1 0 2 1 Tue Mon Wed Fri Sat Sun Thu May Jan Feb Mar Apr Jun Jul Aug Oct Nov Dec Sep 13 states, 96 arcs 29 760 007 date expressions slide courtesy of L. Karttunen600.465 - Intro to NLP - J. Eisner 21 Parser for Dates AllDates @-> “[DT ” ... “]” Compiles into an unambiguous transducer (23 states, 332 arcs). Today is [DT Tuesday, July 25, 2000] because yesterday was [DT Monday] and it was [DT July 24] so tomorrow must be [DT Wednesday, July 26] and not [DT July 27] as it says on the program. Xerox left-to-right replacement operator slide courtesy of L. Karttunen (modified)600.465 - Intro to NLP - J. Eisner 22 Problem of Reference Valid dates Tuesday, July 25, 2000 Tuesday, February 29, 2000 Monday, September 16, 1996 Invalid dates


View Full Document

Johns Hopkins EN 600 465 - Finite-State Programming

Download Finite-State Programming
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Finite-State Programming and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Finite-State Programming 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?