600.465 - Intro to NLP - J. Eisner 1 Building Finite-State Machines600.465 - Intro to NLP - J. Eisner 3 Xerox Finite-State Tool You’ll use it for homework … Commercial (we have license; open-source clone is Foma) One of several finite-state toolkits available This one is easiest to use but doesn’t have probabilities Usage: Enter a regular expression; it builds FSA or FST Now type in input string FSA: It tells you whether it’s accepted FST: It tells you all the output strings (if any) Can also invert FST to let you map outputs to inputs Could hook it up to other NLP tools that need finite-state processing of their input or output600.465 - Intro to NLP - J. Eisner 4 Common Regular Expression Operators (in XFST notation) concatenation EF * + iteration E*, E+ | union E | F & intersection E & F ~ \ - complementation, minus ~E, \x, F-E .x. crossproduct E .x. F .o. composition E .o. F .u upper (input) language E.u “domain” .l lower (output) language E.l “range”600.465 - Intro to NLP - J. Eisner 5 Common Regular Expression Operators (in XFST notation) concatenation EF EF = {ef: e E, f F} ef denotes the concatenation of 2 strings. EF denotes the concatenation of 2 languages. To pick a string in EF, pick e E and f F and concatenate them. To find out whether w EF, look for at least one way to split w into two “halves,” w = ef, such that e E and f F. A language is a set of strings. It is a regular language if there exists an FSA that accepts all the strings in the language, and no other strings. If E and F denote regular languages, than so does EF. (We will have to prove this by finding the FSA for EF!)600.465 - Intro to NLP - J. Eisner 6 Common Regular Expression Operators (in XFST notation) concatenation EF * + iteration E*, E+ E* = {e1e2 … en: n0, e1 E, … en E} To pick a string in E*, pick any number of strings in E and concatenate them. To find out whether w E*, look for at least one way to split w into 0 or more sections, e1e2 … en, all of which are in E. E+ = {e1e2 … en: n>0, e1 E, … en E} =EE*600.465 - Intro to NLP - J. Eisner 7 Common Regular Expression Operators (in XFST notation) concatenation EF * + iteration E*, E+ | union E | F E | F = {w: w E or w F} = E F To pick a string in E | F, pick a string from either E or F. To find out whether w E | F, check whether w E or w F.600.465 - Intro to NLP - J. Eisner 8 Common Regular Expression Operators (in XFST notation) concatenation EF * + iteration E*, E+ | union E | F & intersection E & F E & F = {w: w E and w F} = E F To pick a string in E & F, pick a string from E that is also in F. To find out whether w E & F, check whether w E and w F.600.465 - Intro to NLP - J. Eisner 9 Common Regular Expression Operators (in XFST notation) concatenation EF * + iteration E*, E+ | union E | F & intersection E & F ~ \ - complementation, minus ~E, \x, F-E ~E = {e: e E} = * - E E – F = {e: e E and e F} = E & ~F \E = - E (any single character not in E) is set of all letters; so * is set of all strings600.465 - Intro to NLP - J. Eisner 10 Regular Expressions A language is a set of strings. It is a regular language if there exists an FSA that accepts all the strings in the language, and no other strings. If E and F denote regular languages, than so do EF, etc. Regular expression: EF*|(F & G)+ Syntax: E F * F G concat & + | Semantics: Denotes a regular language. As usual, can build semantics compositionally bottom-up. E, F, G must be regular languages. As a base case, e denotes {e} (a language containing a single string), so ef*|(f&g)+ is regular.600.465 - Intro to NLP - J. Eisner 11 Regular Expressions for Regular Relations A language is a set of strings. It is a regular language if there exists an FSA that accepts all the strings in the language, and no other strings. If E and F denote regular languages, than so do EF, etc. A relation is a set of pairs – here, pairs of strings. It is a regular relation if there exists an FST that accepts all the pairs in the language, and no other pairs. If E and F denote regular relations, then so do EF, etc. EF = {(ef,e’f’): (e,e’) E, (f,f’) F} Can you guess the definitions for E*, E+, E | F, E & F when E and F are regular relations? Surprise: E & F isn’t necessarily regular in the case of relations; so not supported.600.465 - Intro to NLP - J. Eisner 12 Common Regular Expression Operators (in XFST notation) concatenation EF * + iteration E*, E+ | union E | F & intersection E & F ~ \ - complementation, minus ~E, \x, F-E .x. crossproduct E .x. F E .x. F = {(e,f): e E, f F} Combines two regular languages into a regular relation.600.465 - Intro to NLP - J. Eisner 13 Common Regular Expression Operators (in XFST notation) concatenation EF * + iteration E*, E+ | union E | F & intersection E & F ~ \ - complementation, minus ~E, \x, F-E .x. crossproduct E .x. F .o. composition E .o. F E .o. F = {(e,f): m. (e,m) E, (m,f) F} Composes two regular relations into a regular relation. As we’ve seen, this generalizes ordinary function composition.600.465 - Intro to NLP - J. Eisner 14 Common Regular Expression Operators (in XFST notation) concatenation EF * + iteration E*, E+ | union E | F & intersection E & F ~ \ - complementation, minus ~E, \x, F-E .x. crossproduct E .x. F .o. composition E .o. F .u upper (input) language E.u “domain” E.u = {e: m. (e,m) E}600.465 - Intro to NLP - J. Eisner 15 Common Regular Expression Operators (in XFST notation) concatenation EF * + iteration E*, E+ | union E | F & intersection E & F ~ \ - complementation, minus ~E, \x, F-E .x. crossproduct E .x. F .o. composition E .o. F .u upper (input) language E.u “domain” .l lower (output) language E.l “range”Function from strings to ... a:x/.5 c:z/.7 e:y/.5 .3 Acceptors (FSAs)
View Full Document