Instructor: Saif Mohammad
Co-instructor: Nitin Madnani
Coordinator: Professor Bonnie Dorr
Teaching Assistant: Sajib Dasgupta

Guest Lectures:
◦ Bonnie Dorr
◦ Philip Resnik
◦ Doug Oard

Competent programmers
Do not have to be linguists
◦ Have high-school English behind you
◦ Know parts of speech, syntactic parse trees, subject, object, …
◦ Read material on word classes and context-free grammars from J&M chapters 5 and 12 for background

Text:
◦ Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics, second edition (published in 2008), by Daniel Jurafsky and James H. Martin

Course webpage:
◦ http://www.umiacs.umd.edu/~saif/WebPages/CS723.htm

Class:
◦ Wednesdays, 4 to 6:30 pm (5–10 min break in between)

Exams: 50%
◦ Midterm exam: 25%
◦ Final exam: 25%
Class assignments/projects: 45%
◦ Assignments 1 through 4: 10%, 12.5%, 10%, 12.5%
◦ Assignment 0: no credit; designed to calibrate programming skills
Class participation: 5%
◦ Showing up for class, demonstrating preparedness, and contributing to class discussions

Office hours:
◦ Saif: by appointment
◦ Sajib: TA room 1112; Mondays 4 to 5:30 pm, Tuesdays 2 to 3:30 pm

Forum:
◦ https://forum.cs.umd.edu/forumdisplay.php?f=113

Focus on statistical models
◦ HMMs, EM, N-gram LMs, TAGs (approx. 4 lectures)

Assignments
◦ All written in Python/NLTK
◦ Python/NLTK tutorial next week (show up!)

Assignment 0 (not for credit)
◦ Purpose: introspection and practice
◦ Try to solve problem 1 before the tutorial next week, problem 2 after

Forums
◦ Register unless already registered for another class
◦ Preferred way to ask questions
◦ Feel free to start discussion threads, if necessary
◦ Subscribe to notifications!
Study of computer processing, understanding, and generation of human languages
Interdisciplinary field
◦ Linguistics, machine learning and artificial intelligence, statistics, cognitive science, psychology, and others
Common applications:
◦ Machine translation, information retrieval, text summarization, question answering

Disambiguation decisions of word sense, word category, syntactic structure, …
Maximize coverage, minimize errors (false positives)
Robust
Generalize well

AI approaches with deep understanding had hand-coded rules
◦ Creating the rules is time-consuming
◦ One may miss rules; sometimes there are too many rules to encode
◦ May not scale to different domains
◦ Brittle in the face of metaphor ("I swallowed his story")

Counting things
Determining patterns that occur in language use
Features:
◦ Learn rules and patterns automatically
◦ Statistical models are robust, generalize well, and behave gracefully when faced with less-than-perfect conditions

Corpus: a collection of natural language documents
◦ British National Corpus, Wall Street Journal, Google's web-indexed corpus, Switchboard corpus
Can we learn how language works from this text?
◦ Look for patterns in the corpus

Properties of a corpus:
◦ Size
◦ Balanced or domain-specific
◦ Written or spoken
◦ Raw or annotated (senses, POS, structure)
◦ Electronically available or hard copy
◦ Free to use or one needs to pay for a license

Example corpora: Brown, Susanne, Penn Treebank, Canadian Hansards

Dictionaries
◦ Gloss, example sentence
Thesauri
◦ Categories, paragraphs, semicolon units
WordNet
◦ Synsets, gloss
◦ Hypernyms, holonyms, troponyms

Most frequent words in Tom Sawyer:

word  freq  part of speech
the   3332  determiner (article)
and   2972  conjunction
a     1775  determiner
to    1725  preposition, verbal infinitive marker
of    1440  preposition
was   1161  auxiliary verb
it    1027  (personal/expletive) pronoun
in     906  preposition

Tom Sawyer
◦ Tokens: 71,370
◦ Types: 8,018
◦ Memory: half a megabyte
◦ Average frequency of a word: # tokens / # types ≈ 8.9

Frequency of frequencies in Tom Sawyer:

freq    freq of freq
1       3993
2       1292
3        664
4        410
5        243
6        199
7        172
8        131
9         82
10        91
11–50    540
51–100    99
> 100    102

Hapax legomena
◦ word types that occur only once in the corpus

Direct applications of simple word counts
◦ cryptography, style of authorship
Indirectly, counts are used pervasively in NLP

Why is statistical NLP difficult?
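Before turning to that question: the token/type counts, average frequency, and hapax legomena described above can be reproduced with a short script. A minimal sketch in plain Python, using a crude regex tokenizer in place of a real one such as NLTK's (the sample sentence is invented for illustration):

```python
import re
from collections import Counter

def word_stats(text):
    """Token count, type count, average type frequency, and hapax legomena."""
    # Crude tokenizer: lowercase, keep alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    n_tokens = len(tokens)   # total word occurrences
    n_types = len(counts)    # distinct word forms
    avg_freq = n_tokens / n_types  # e.g. 71,370 / 8,018 ≈ 8.9 for Tom Sawyer
    hapax = sorted(w for w, c in counts.items() if c == 1)  # types occurring once
    return n_tokens, n_types, avg_freq, hapax

n_tokens, n_types, avg_freq, hapax = word_stats(
    "the cat sat on the mat and the dog sat too"
)
print(n_tokens, n_types, hapax)  # 11 tokens, 8 types, 6 hapax legomena
```

On a real corpus the hapax legomena dominate the vocabulary: in Tom Sawyer, 3,993 of the 8,018 types occur exactly once.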
◦ Hard to predict much about the behavior of words that occur rarely (if at all)

The Principle of Least Effort: "people will act so as to minimize their probable average rate of work"
Evidence: underlying statistical distributions in language
◦ Count up words in a corpus
◦ List (rank) words in order of frequency

frequency ∝ 1/rank

Example:
◦ The 50th most common word should occur three times as often as the 150th
First observed by Estoup (1916)

There are a few very common words, a middling number of medium-frequency words, and many low-frequency words
The speaker and the hearer are both trying to minimize their effort

[Figure: rank vs. frequency, plotted on regular (non-logarithmic) scales]

Other regularities:
◦ # meanings ∝ √frequency ∝ 1/√rank
◦ Length of a word ∝ 1/frequency

Often, we deal with the occurrences and frequencies of sets of strings:
◦ Given a sentence with the word bank, did the words teller or tellers occur in the sentence?
◦ How many times did the various forms of the word dissect (dissect, dissection, dissected, dissectible) occur in a book?
◦ What are the different dates mentioned in a history book?
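Questions like the dissect example above are typically answered by pattern matching over the text. A hedged sketch in Python — the sample text and the hard-coded suffix list are invented for illustration; a real system would lemmatize rather than enumerate suffixes:

```python
import re

# Invented sample text containing several forms of "dissect".
text = ("She chose to dissect the frog; the dissection took an hour. "
        "Earlier she had dissected a worm, which proved easily dissectible.")

# \b marks word boundaries; the non-capturing group lists allowed suffixes.
pattern = re.compile(r"\bdissect(?:s|ed|ion|ions|ible)?\b", re.IGNORECASE)
forms = pattern.findall(text)

print(len(forms), sorted({f.lower() for f in forms}))
# 4 occurrences: dissect, dissected, dissectible, dissection
```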
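The rank–frequency relation above (frequency ∝ 1/rank) can also be sanity-checked numerically: writing it as f(r) = c/r, the constant c cancels in any ratio of frequencies, which is exactly the 50th-vs-150th example. A minimal sketch:

```python
def zipf_freq(rank, c=1.0):
    """Predicted frequency of the word at a given rank, assuming freq = c / rank."""
    return c / rank

# The 50th most common word should occur three times as often as the 150th,
# regardless of the constant c (it cancels in the ratio): 150 / 50 = 3.
ratio = zipf_freq(50) / zipf_freq(150)
print(ratio)
```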