CMU CS 15780 - Homework

Grad AI, 15-780, Fall 2006
Homework 3

• Homework deadline: 10:30am on Nov. 1
• Please print your code and hand it in with the hard copy of your homework. Also send a copy of your code by e-mail to both TAs.

1. Bayesian Networks [33 pts]. This problem involves a theoretical analysis of admissible Bayesian networks. Recall from lecture that an admissible Bayesian network must be a Directed Acyclic Graph (DAG).

(a) [3 pts] Draw all admissible Bayesian networks with 3 variables A, B, and C.

(b) [5 pts] Group all possible three-node Bayesian networks with variables A, B, and C into equivalence classes. Networks in the same equivalence class should convey the same independence assumptions. In other words, if two networks are in the same class then the joint probability distribution implied by the first network is the same as the one implied by the second. For example, for networks with two nodes A and B there are 2 classes: the first contains one network (two nodes and no edges), and the second contains two networks, A → B and B → A. These two networks are equivalent since P(A)P(B|A) = P(B)P(A|B) = P(A, B).

(c) [10 pts] Derive an upper bound on the number of admissible Bayesian networks with n nodes (please explain your answer). Note that smaller upper bounds will gain more credit (i.e. an upper bound of ∞ will get no points).

(d) [15 pts] Derive a lower bound on the number of admissible Bayesian networks with n nodes (please explain your answer). Note that larger lower bounds will gain more credit (i.e. a lower bound of 0 will get no points). Hint: if you can show that it is always possible to create f(n) admissible networks with n nodes, then f(n) is a valid lower bound. Consider how many admissible networks can be constructed after an arbitrary ordering is imposed on the variables. Alternatively, you may consider an inductive argument by assuming your bound holds for n − 1 variables and showing that it continues to hold when you add the n'th.

2. Maximum Likelihood Estimation [23 pts].
In class we derived the Maximum Likelihood Estimator (MLE) for the single parameter of a Binomial distribution (e.g. the probability that a coin lands heads after observing the outcome of n independent flips of the coin). In this problem we will derive the MLE for the parameters of a multinomial distribution where the variable of interest, X, can take on k values rather than 2.

(a) [5 pts] Given data describing n independent identically distributed observations of X, d = {d_1, ..., d_n}, each of which can be one of k values, express the likelihood of the data given k − 1 parameters for the distribution over X. Let n_i represent the number of times X takes on value i in the data.

(b) [6 pts] Find the MLE for one of the k − 1 parameters, θ_j, in terms of the other parameters.

(c) [12 pts] At this point you should have k − 1 equations describing MLEs of different parameters. Show how those equations imply that the MLE for a parameter θ_j, representing the probability that X takes on value j, is equal to n_j/n. You may find the following hint useful: in order to remove the k'th parameter from the likelihood in part (a) you had to represent it with an equation, θ_k = f(). At this point you may find it helpful to replace all occurrences of f() with θ_k. After replacing f() with θ_k, you can substitute all occurrences of each other parameter in f() with its MLE from part (b). This should allow you to solve for the MLE of θ_k, which can then be used to simplify all of the other equations.

3. Hidden Markov Models [44 pts]. Many of you may be familiar with the T9 input paradigm commonly found on cell phones for interpreting a series of key presses on the 9-button keypad as text. Typically, each of the keys 2-9 represents about 3 different letters (Figure 1 provides the exact mapping). When the user inputs a series of key presses, such as 3-6-4, the T9 system provides a list of words from its dictionary that potentially match the sequence (e.g.
“dog” and “fog” both match the sequence above). In this problem you will build a next-generation text messaging cell phone input system, SmarT9, by training an HMM on an English text corpus. For simplicity, we will ask you to build your system only for 5-letter words, and we will provide a small corpus on the course website.

Figure 1: A typical phone keypad.

(a) [5 pts] Describe the hidden states and observable outputs of an HMM designed to interpret 5-letter words based on 9-button keypad interactions as described above.

(b) [3 pts] In your model from part (a), describe the probability that a state emits each particular output (e.g. P(O_t = o_i | S_t = s_t)).

(c) [10 pts] Write a program that reads in a list of English words and constructs a single table indicating the probability that state t + 1 will have each of its possible values given the value of state t (assume that the transition probability is not dependent on the value of t). The value at the i'th row and j'th column of your table should be the probability P(S_{t+1} = s_j | S_t = s_i). Read in the list of words given in the support code and provide a printout of the table constructed by your program. You may find the following Matlab facts and functions helpful:

• textscan(): this command scans a file and returns cells with its contents; view the Matlab help description for more information.
• Strings in Matlab are treated as vectors of characters. A string S = “dog” can be indexed to access its individual characters as follows: S(1) = 'd', S(2) = 'o', S(3) = 'g'.
• char2number(): this function is provided in the support code; it takes as input a character 'a'-'z' (not case sensitive) and returns its position in the alphabet, 1-26. If the input is not a character 'a'-'z', the function returns -1.
• number2char(): this function is provided in the support code; it is the inverse of char2number(). It returns the character corresponding to the letter in the position given as input (1-26).
Any other input results in a return value of -1.

(d) [6 pts] Write a program that reads in a list of English words and learns the probability that the initial state takes each of its possible values, P(S_1 = s_i). Provide the output of your program when run on the list of words provided with the support code.

(e) [10 pts] Write a program using the Hidden Markov Model and parameters learned from the list of words provided with the support code (parts a-d) to predict the probability that state k takes each of its possible values given observations 1 through t
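The estimation in parts (c) and (d) comes down to normalized counting over the word list: count first letters for the initial distribution, count adjacent letter pairs for the transition table, then divide each count by its row total. The assignment asks for Matlab, but purely as an illustration of the counting involved (using a made-up three-word list rather than the course corpus, and plain dictionaries rather than a 26×26 table), a sketch of the idea in Python might look like:

```python
from collections import Counter, defaultdict

def learn_hmm_parameters(words):
    """Estimate initial-state and transition probabilities for an HMM whose
    hidden states are letters, from a list of words.

    Returns (initial, trans) where initial[c] approximates P(S_1 = c) and
    trans[a][b] approximates P(S_{t+1} = b | S_t = a)."""
    initial_counts = Counter()
    trans_counts = defaultdict(Counter)
    for w in words:
        w = w.lower()
        initial_counts[w[0]] += 1          # count first letters
        for a, b in zip(w, w[1:]):         # count adjacent letter pairs
            trans_counts[a][b] += 1

    # Normalize counts into probabilities.
    n = len(words)
    initial = {c: cnt / n for c, cnt in initial_counts.items()}
    trans = {}
    for a, row in trans_counts.items():
        total = sum(row.values())
        trans[a] = {b: cnt / total for b, cnt in row.items()}
    return initial, trans

# Toy example (not the course corpus): "dog", "fog", "dig".
init, trans = learn_hmm_parameters(["dog", "fog", "dig"])
# init["d"] is 2/3 (two of three words start with 'd');
# trans["d"]["o"] is 1/2 ('d' is followed by 'o' once and by 'i' once).
```

Note that letter pairs never seen in the corpus simply get no entry here (an implicit probability of zero); for the actual assignment you would build the full table over all 26 states instead.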

