CMSC723/LING645 MIDTERM SAMPLE QUESTIONS: 10/12/04

READINGS: Chapters 1, 2, 3, 5, 6.1-6.3, 7.1-7.3, 21; Rabiner's HMM tutorial (pp. 257-266, not including Section IV)

0. When and where is the midterm exam? Answer: Wed 10/19/04, CSIC 1122.

1. Give examples of 3 types of ambiguity (syntactic, lexical, semantic, pragmatic, ...).

2. Why did early MT systems fail? What makes systems more successful nowadays?

3. Describe the tradeoffs among Direct, Transfer, and Interlingual systems. Consider the following characteristics: speed, specificity of rules, size of rule set, depth of coverage, breadth of coverage.

4. MT Evaluation: Is there a relation between human metrics and automatic metrics? Consider the notions of accuracy and fluency in your answer.

5. BLEU: Compute unigram and bigram scores for the following (see the sketch after the question list):
   MT:     Now the time attempts to boggle the mind for the man.
   Ref #1: Now is the time that tries mens' souls.
   Ref #2: The the the the the.
   Ref #3: War makes life difficult for the mind.
   Note: Ignore capitalization. Also, mens' counts as one word. How effective is BLEU in measuring the correctness of the translation, given these references? In your answer, you may allude to the notions of ambiguity, synonymy, differences between long and short references, duplication of words in the reference, etc.

6. Finite State Machinery: What are the settings of the 5 parameters (Q, q0, S, F, d) for an FSA that accepts cow language, i.e., strings defined by the regular expression mo*? You must draw out your automaton. State what changes would be needed to accept the regular expression mo+. (See the sketch after the question list.)

7. What is an example of how ELIZA uses regular expressions? (See the sketch after the question list.)

8. What is the difference between inflectional and derivational morphology? What is the difference between templatic and concatenative morphology? (If reasonable) illustrate the difference with examples that have not been presented in class. What are we modeling in the morphology lab -- inflectional or derivational morphology?

9. What are the pros and cons of compiling out the effects of rules into the lexicon, i.e., a lexicon-only system? What would be lost by doing this? What would be gained? In your answer, consider different languages, e.g., morphologically rich languages vs. morphologically poor languages. Also, think about the downstream processes that take morphologically analyzed tokens as input -- how does the compilation of morphological rules into the lexical entries impact these later processes?

10. How are FSTs different from FSAs? What is a feasible pair? What is a state transition table? What is an example of an FST rule interaction? (Think back to your lab -- what sort of rules interacted with each other?) What happens when multiple rules are applicable to a particular string? Can you think of a case where rule ordering would have made it easier to build the automata for two interacting rules?

11. Can you express a two-level (Kimmo) style rule in C&H notation on pages 77-78 (e.g., slides 41-42, lecture 3 -- Intermediate-to-Surface transducer)?

12. How does the Porter Stemmer work? What applications is it useful for? Why?

13. Bayes' Rule: How do we derive this? That is,
    Given: (1) P(A|B) = P(A,B)/P(B); (2) P(A,B) = P(B,A)
    Prove: P(B|A) = [P(A|B) * P(B)] / P(A)
    How does this relate to the noisy channel? (See the derivation sketch after the question list.)

14. Describe how the Noisy Channel model can be used for a particular application. (Some of the ones we've talked about are MT, POS tagging, speech recognition, OCR.) Be sure to specify what the different components of the noisy channel refer to in this application, i.e., w, O, P(w), P(O), P(w|O), P(O|w). For this application, how would each of these be derived?

15. Consider the problem of speech recognition. Assume you have a corpus of size 3,715,820, where the word "about" occurs 3,725 times and "a_bow" occurs 38 times. Describe the speech-recognition problem for a given pronunciation, e.g., [ax b aw]. To simplify this problem, assume the only two "words" pronounced this way are "about" and "a_bow". Compute the prior and likelihood for each of these two words and then determine which word is most likely for [ax b aw]. You may assume you have access to only one pronunciation rule: {t,d} -> 0 / V __ #, which has a probability of 0.48. (Recall that, if no pronunciation rules are applicable to a given word, only one pronunciation is available, so the probability of that pronunciation, given the word, is 1.) (See the sketch after the question list.)

16. Consider N-grams combined with the noisy channel model. Above we looked at one word at a time (unigram). But even though the word "about" has the highest probability for the pronunciation [ax b aw] (independent of context), the word "a_bow" has a higher probability next to words like "and". N-grams allow us to think about context. Let's imagine there are only 4 words in the vocabulary: about, a_bow, and, and the. Describe how to compute the probability that [ax b aw] is "about" when followed by "and" vs. "the". Compare this to the probability that [ax b aw] is "a_bow" when followed by "and" vs. "the". Assume that the word "about" is followed by the word "the" with 0.99 probability and the word "and" with 0.01 probability. Conversely, assume the word "a_bow" is followed by the word "the" with 0.01 probability and the word "and" with 0.99 probability. (See the sketch after the question list.)

17. What is the Markov Assumption?

18. How do we use a bigram grammar to compute P(S) -- the probability of a given sentence S? (You may also hear: "compute P(S) using a Maximum Likelihood Estimate".) For example, what is P(like, Chinese, food) using an MLE, where "like" occurs 15,000 times in a corpus of 1,000,000 words, "like Chinese" occurs 10,000 times in the same corpus, "Chinese" occurs 80,000 times in the same corpus, and "Chinese food" occurs 56,000 times in the same corpus? (See the sketch after the question list.)

19. Sparse data problem -- what is it? What can we do about it? What is the problem with add-one smoothing? Given a vocabulary of 1,000 words (types) and a corpus of 100,000 words (tokens), we want to estimate the bigram probabilities. We know that half of the bigrams do not occur in the corpus. What is the unsmoothed probability of a bigram that occurs 100 times in the corpus? What is its probability if we apply add-one smoothing? You are allowed to round off probabilities. (See the sketch after the question list.)

20. What are the characteristics of a 'hidden' variable, i.e., what is the reason
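SKETCHES FOR SELECTED QUESTIONS

For question 5, a minimal sketch of the clipped (modified) n-gram precision that BLEU's unigram and bigram scores are built on; the sentences are the ones from the question, lowercased as instructed, and the helper names are illustrative rather than anything from the course materials.

    from collections import Counter

    def ngrams(tokens, n):
        """All n-grams of a token list, as tuples."""
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def modified_precision(candidate, references, n):
        """Clipped n-gram precision: each candidate n-gram count is
        clipped to its maximum count in any single reference."""
        cand = Counter(ngrams(candidate, n))
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        return clipped / sum(cand.values())

    mt = "now the time attempts to boggle the mind for the man".split()
    refs = [
        "now is the time that tries mens' souls".split(),
        "the the the the the".split(),
        "war makes life difficult for the mind".split(),
    ]
    print(modified_precision(mt, refs, 1))  # unigram precision
    print(modified_precision(mt, refs, 2))  # bigram precision

Note how Ref #2's repeated "the" raises the clip ceiling for "the" in the candidate, which is exactly the reference-duplication issue the question asks about.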
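For question 6, a small simulation of one way the mo* automaton can be encoded; the state names and the dictionary encoding of the transition function are illustrative, not the course's notation.

    # Deterministic FSA for the "cow language" mo*: an 'm' followed by zero
    # or more 'o's.  Q = {q0, q1}, start state q0, final states F = {q1},
    # alphabet {m, o}, transition function DELTA below.
    # (For mo+, add a state q2, route q1 -o-> q2 -o-> q2, and make q2 the
    # only final state, so at least one 'o' is required.)
    DELTA = {
        ("q0", "m"): "q1",
        ("q1", "o"): "q1",
    }
    START, FINAL = "q0", {"q1"}

    def accepts(s):
        """Run the FSA; reject as soon as a transition is undefined."""
        state = START
        for ch in s:
            if (state, ch) not in DELTA:
                return False
            state = DELTA[(state, ch)]
        return state in FINAL

    for w in ["m", "mo", "moooo", "", "om", "moom"]:
        print(repr(w), accepts(w))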
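For question 7, one ELIZA-style pattern/response rule written with Python's re module; the particular pattern and canned response are made up for illustration and are not taken from the original ELIZA script.

    import re

    # ELIZA-style rule: match a sentence shape, capture part of the user's
    # input, and splice it back into a templated response.
    pattern = re.compile(r".*\bI am (.*)", re.IGNORECASE)

    def respond(utterance):
        m = pattern.match(utterance)
        if m:
            return "Why do you say you are " + m.group(1).rstrip(".!?") + "?"
        return "Please go on."

    print(respond("I am worried about the midterm."))
    # -> Why do you say you are worried about the midterm?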
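For question 13, the derivation can be written out directly from the two identities the question supplies (shown here in LaTeX):

    % (1) P(A|B) = P(A,B)/P(B)   and   (2) P(A,B) = P(B,A)
    \begin{align*}
    P(B \mid A) &= \frac{P(B,A)}{P(A)}            && \text{by (1), with $A$ and $B$ swapped} \\
                &= \frac{P(A,B)}{P(A)}            && \text{by (2)} \\
                &= \frac{P(A \mid B)\,P(B)}{P(A)} && \text{by (1), rearranged as } P(A,B) = P(A \mid B)\,P(B).
    \end{align*}

In the noisy channel setting this is what lets us score a hypothesis w for an observation O by P(O|w) P(w) instead of modeling P(w|O) directly.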
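For question 15, the arithmetic under the usual noisy-channel reading (score each word by prior times likelihood); the variable names are illustrative. The likelihoods assume, as the question sets up, that "about" surfaces as [ax b aw] only via the final {t,d}-deletion rule, while "a_bow" has a single pronunciation.

    N = 3_715_820                          # corpus size in tokens
    count = {"about": 3_725, "a_bow": 38}
    prior = {w: c / N for w, c in count.items()}

    # P([ax b aw] | word): t-deletion rule probability for "about";
    # "a_bow" has no applicable rule, so its one pronunciation gets 1.0.
    likelihood = {"about": 0.48, "a_bow": 1.0}

    for w in count:
        print(w, prior[w], likelihood[w], prior[w] * likelihood[w])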
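For question 16, the same scores extended with the bigram probabilities the question supplies, i.e., comparing P(word) * P([ax b aw] | word) * P(next | word) for each candidate word and each following word; again a sketch under the question's simplifying assumptions, not a full answer.

    prior = {"about": 3_725 / 3_715_820, "a_bow": 38 / 3_715_820}
    likelihood = {"about": 0.48, "a_bow": 1.0}

    # Bigram probabilities given in the question: P(next | word)
    bigram = {
        ("about", "the"): 0.99, ("about", "and"): 0.01,
        ("a_bow", "the"): 0.01, ("a_bow", "and"): 0.99,
    }

    for nxt in ("the", "and"):
        for w in ("about", "a_bow"):
            score = prior[w] * likelihood[w] * bigram[(w, nxt)]
            print(f"{w} followed by '{nxt}': {score:.2e}")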
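For question 18, the bigram MLE arithmetic with the counts the question gives, ignoring sentence-boundary symbols (question 17's Markov assumption is what licenses dropping the longer histories):

    N = 1_000_000                   # corpus size in words
    c_like, c_chinese = 15_000, 80_000
    c_like_chinese, c_chinese_food = 10_000, 56_000

    # P(like, Chinese, food) ~= P(like) * P(Chinese | like) * P(food | Chinese)
    p = (c_like / N) * (c_like_chinese / c_like) * (c_chinese_food / c_chinese)
    print(p)                        # 0.015 * (2/3) * 0.7 = 0.007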
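For question 19, one common way the numbers are set up: treat the 100,000-token corpus as roughly 100,000 bigram tokens and the V^2 possible bigram types as the event space for add-one smoothing. If your lecture instead normalizes per history word, i.e., (C(w1 w2)+1)/(C(w1)+V), the numbers change, so take this strictly as a sketch of the mechanics.

    V = 1_000                 # vocabulary size (types)
    N = 100_000               # corpus size, taken as the number of bigram tokens
    c = 100                   # count of the bigram in question

    unsmoothed = c / N                    # 100 / 100,000 = 0.001
    add_one    = (c + 1) / (N + V * V)    # 101 / 1,100,000, about 9.2e-5
    print(unsmoothed, add_one)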

