UMass Amherst CS 585 - Context Free Grammars (43 pages)


Context Free Grammars




Lecture Notes


Pages: 43
School: University of Massachusetts Amherst
Course: CS 585 - Introduction to Natural Language Processing


Lecture 2: Context Free Grammars
Introduction to Natural Language Processing
CS 585, Fall 2004
Andrew McCallum
(Also includes material from Chris Manning)
September 14, 2004

Today's Main Points

- Demo of hands-on with text using Unix tools
- Zipf's Law
- A brief introduction to syntax in NLP
- Define context free grammars. Give some examples.
- Chomsky normal form. Converting to it.
- Parsing as search: top-down, bottom-up, and the problems with each
- Hand out homework 1. Read M&S Ch. 3 if you haven't already.

Administration

- I will be giving research presentations in D.C. next Tuesday and Thursday.
    Tuesday: Wei Li, with review of probability
    Thursday: Aron Culotta, information theory and naive Bayes
- Be sure to subscribe to the cs585 mailing list if you haven't already. I sent a test message Sunday night.

Hands-on with Text

- Get a large body of electronic text (called a "corpus").
- Browse it.
- Use perl and other Unix tools to look at counts of words.
- What does this distribution look like?

Messy, but the following works:

    cat * | perl -pe 's/[^a-zA-Z]+/\n/g' | egrep '.' | awk '{print tolower($0)}' | sort | uniq -c | sort -nr | less

    cat sawyr10.txt | perl -pe 's/[^a-zA-Z]+/\n/g' | egrep '.' | awk '{print tolower($0)}' | sort | uniq -c | sort -nr | perl -pe 's/[^0-9\n]//g' | uniq -c | less

Word frequencies in Tom Sawyer

    Word   Freq.   Use
    the    3332    determiner (article)
    and    2972    conjunction
    a      1775    determiner
    to     1725    preposition, verbal infinitive marker
    of     1440    preposition
    was    1161    auxiliary verb
    it     1027    (personal/expletive) pronoun
    in      906    preposition
    that    877    complementizer, demonstrative
    he      877    (personal) pronoun
    I       783    (personal) pronoun
    his     772    (possessive) pronoun
    you     686    (personal) pronoun
    Tom     679    proper noun
    with    642    preposition

Frequencies of frequencies in Tom Sawyer

    Word Frequency   Frequency of Frequency
    1                3993
    2                1292
    3                 664
    4                 410
    5                 243
    6                 199
    7                 172
    8                 131
    9                  82
    10                 91
    11-50             540
    51-100             99
    > 100             102

71,730 word tokens; 8,018 word types.

Zipf's Law in Tom Sawyer

    Word    Freq. (f)   Rank (r)   f · r
    the     3332          1         3332
    and     2972          2         5944
    a       1775          3         5325
    he       877         10         8770
    but      710         20        14200
    be       294         30         8820
    there    222         40         8880
    one      172         50         8600
    about    158         60         9480
    more     138         70         9660
    never    124         80         9920
    Oh       116         90        10440
    two      104        100        10400
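The three Tom Sawyer tables (word frequencies, frequencies of frequencies, and the Zipf's Law rank table with the product f · r) can also be produced in a few lines of Python. The following is a minimal sketch, not the pipeline used in the lecture. It assumes a word is any maximal run of ASCII letters, lowercased, and the function names `word_counts`, `frequency_of_frequencies`, and `rank_frequency` are hypothetical names chosen for this illustration. Ranks among words with equal counts are broken alphabetically, which is an arbitrary choice.

```python
import re
from collections import Counter


def word_counts(text):
    """Count words, where a word is a maximal run of ASCII letters, lowercased.

    Assumption: this tokenization is a simplification chosen for the sketch.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def frequency_of_frequencies(counts):
    """Map each word frequency to the number of word types with that frequency."""
    return Counter(counts.values())


def rank_frequency(counts):
    """Return (rank, word, f, f * rank) rows, most frequent word first.

    Ties in frequency are broken alphabetically (an arbitrary choice).
    """
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, freq * rank)
            for rank, (word, freq) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # A toy sentence stands in for the corpus; to use a real file, read it
    # with open(path, encoding="utf-8").read() and pass the text in instead.
    sample = "The cat saw the dog. The dog saw a cat; a bird saw nothing."
    counts = word_counts(sample)
    print("tokens:", sum(counts.values()), "types:", len(counts))
    for freq, n_types in sorted(frequency_of_frequencies(counts).items()):
        print(f"{n_types} word type(s) occur {freq} time(s)")
    for rank, word, freq, product in rank_frequency(counts):
        print(rank, word, freq, product)
```

Under Zipf's Law, frequency is roughly inversely proportional to rank, so on a large corpus the last column (f · r) should stay within the same order of magnitude across ranks rather than remain exactly constant.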


