DOC PREVIEW
UMD CMSC 723 - Intro to Computational Linguistics

This preview shows page 1-2-3-4-24-25-26-50-51-52-53 out of 53 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 53 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CMSC 723 / LING 645: Intro to Computational LinguisticsComputational Morphology (continued)Lexicon-Free Morphology: Porter StemmerPorter StemmerSlide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Why (not) Statistics for NLP?Weighted Automata/TransducersPronunciation network for “about”Noisy ChannelProbability DefinitionsMore DefinitionsConditional ProbabilityIndependenceBayes TheoremWhat does this have to do with the Noisy Channel Model?Noisy Channel Applied to Word RecognitionWhat is the most likely word given [ni]?Why N-grams?Next Word Prediction [borrowed from J. Hirschberg]Next Word Prediction (cont)Human Word PredictionClaimWhy would we want to do this?Why is this useful?Handwriting RecognitionReal Word Spelling ErrorsFor Spell CheckersLanguage ModelWord Prediction: Simple vs. SmartChain RuleRelative Frequencies and Conditional ProbabilitiesFor a Word StringMarkov AssumptionCounting Words in CorporaCorporaTraining and TestingTerminologySimple N-GramsUsing N-GramsA Bigram Grammar Fragment from BERPAdditional BERP GrammarComputing Sentence ProbabilityBERP Bigram CountsBERP Bigram Probabilities: Use Unigram CountLearning a Bigram GrammarWhat do we learn about the language?Readings for next timeCMSC 723 / LING 645: Intro to Computational LinguisticsSeptember 22, 2004: DorrPorter Stemmer,Intro to Probabilistic NLP and N-grams (chap 6.1-6.3)Prof. Bonnie J. DorrDr. Christof MonzTA: Adam LeeComputational Morphology (continued)The Rules and the Lexicon–General versus Specific–Regular versus Irregular–Accuracy, speed, space–The Morphology of a language Approaches–Lexicon only–Lexicon and Rules•Finite-state Automata•Finite-state Transducers–Rules onlyLexicon-Free Morphology:Porter StemmerLexicon-Free FST ApproachBy Martin Porter (1980)http://www.tartarus.org/%7Emartin/PorterStemmer/ Cascade of substitutions given specific conditionsGENERALIZATIONS GENERALIZATIONGENERALIZEGENERALGENERPorter StemmerDefinitions–C = string of one or more consonants, where a consonant is anything other than A E I O U or (Y preceded by C)–V = string of one or more vowels–M = Measure, roughly with number of syllables–Words = (C)*(V*C*)M(V)*• M=0 TR, EE, TREE, Y, BY• M=1 TROUBLE, OATS, TREES, IVY• M=2 TROUBLES, PRIVATE, OATEN, ORRERYConditions–*S - stem ends with S–*v* - stem contains a V–*d - stem ends with double C, e.g., -TT, -SS –*o - stem ends CVC, where second C is not W, X or Y, e.g., -WIL, HOPPorter StemmerStep 1: Plural Nouns and Third Person Singular VerbsSSES  SS caresses  caressIES  I ponies  poni ties  tiSS  SS caress  caressS  cats  cat*<S> = ends with <S> *v* = contains a V*d = ends with double C*o = ends with CVC second C is not W, X or YStep 2a: Verbal Past Tense and Progressive Forms(M>0) EED  EE feed  feed, agreed  agreei (*v*) ED  plastered  plaster, bled  bledii (*v*) ING  motoring  motor, sing  singStep 2b: If 2a.i or 2a.ii is successful, Cleanup AT  ATE conflat(ed)  conflate BL  BLE troubl(ed)  trouble IZ  IZE siz(ed)  size(*d and not (*L or *S or *Z)) hopp(ing)  hop, tann(ed)  tan  single letter hiss(ing)  hiss, fizz(ed)  fizz (M=1 and *o)  E fail(ing)  fail, fil(ing)  filePorter StemmerStep 3: Y  I(*v*) Y  I happy  happi sky  sky*<S> = ends with <S> *v* = contains a V*d = ends with double C*o = ends with CVC second C is not W, X or YPorter StemmerStep 4: Derivational Morphology I: Multiple Suffixes (m>0) ATIONAL -> ATE relational -> relate (m>0) TIONAL -> TION conditional -> condition rational -> rational (m>0) ENCI -> ENCE valenci -> valence (m>0) ANCI -> ANCE hesitanci -> hesitance (m>0) IZER -> IZE digitizer -> digitize (m>0) ABLI -> ABLE conformabli -> conformable (m>0) ALLI -> AL radicalli -> radical (m>0) ENTLI -> ENT differentli -> different (m>0) ELI -> E vileli - > vile (m>0) OUSLI -> OUS analogousli -> analogous (m>0) IZATION -> IZE vietnamization -> vietnamize (m>0) ATION -> ATE predication -> predicate (m>0) ATOR -> ATE operator -> operate (m>0) ALISM -> AL feudalism -> feudal (m>0) IVENESS -> IVE decisiveness -> decisive (m>0) FULNESS -> FUL hopefulness -> hopeful (m>0) OUSNESS -> OUS callousness -> callous (m>0) ALITI -> AL formaliti -> formal (m>0) IVITI -> IVE sensitiviti -> sensitive (m>0) BILITI -> BLE sensibiliti -> sensiblePorter StemmerStep 5: Derivational Morphology II: More Multiple Suffixes (m>0) ICATE -> IC triplicate -> triplic (m>0) ATIVE -> formative -> form (m>0) ALIZE -> AL formalize -> formal (m>0) ICITI -> IC electriciti -> electric (m>0) ICAL -> IC electrical -> electric (m>0) FUL -> hopeful -> hope (m>0) NESS -> goodness -> goodPorter StemmerStep 6: Derivational Morphology III: Single Suffixes (m>1) AL -> revival -> reviv (m>1) ANCE -> allowance -> allow (m>1) ENCE -> inference -> infer (m>1) ER -> airliner -> airlin (m>1) IC -> gyroscopic -> gyroscop (m>1) ABLE -> adjustable -> adjust (m>1) IBLE -> defensible -> defens (m>1) ANT -> irritant -> irrit (m>1) EMENT -> replacement -> replac (m>1) MENT -> adjustment -> adjust (m>1) ENT -> dependent -> depend (m>1 and (*S or *T)) ION -> adoption -> adopt (m>1) OU -> homologou -> homolog (m>1) ISM -> communism -> commun (m>1) ATE -> activate -> activ (m>1) ITI -> angulariti -> angular (m>1) OUS -> homologous -> homolog (m>1) IVE -> effective -> effect (m>1)


View Full Document

UMD CMSC 723 - Intro to Computational Linguistics

Documents in this Course
Lecture 9

Lecture 9

12 pages

Smoothing

Smoothing

15 pages

Load more
Download Intro to Computational Linguistics
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Intro to Computational Linguistics and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Intro to Computational Linguistics 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?