Berkeley COMPSCI 182 - Model Merging

Assignment 6, Part 4: Model Merging (Non-Computational Credit)
Assigned Thursday, March 11th
Due Monday, March 29th, 11:59pm

Model merging is a technique for building complex structured models of some process from data. The idea behind model merging is to begin with a completely unstructured model and then to incrementally modify it to reflect regularities in the data. In doing this, there is a tradeoff between the simplicity of the model and how well it accounts for the data.

In this problem you will apply a simplified version of the model merging algorithm to the task of learning right-regular grammars.

1 Right-regular grammars

• A grammar is a tuple G = (Nonterminals, Terminals, Rules, StartSymbol), where Nonterminals and Terminals are sets of symbols used to write the rules in Rules, and StartSymbol is a unique start symbol from Nonterminals. Each nonterminal symbol can be expanded according to any rule in the grammar that has that symbol on the left side. Terminal symbols cannot be expanded and appear only on the right side of rules in the grammar.

• A right-regular grammar is one in which every rule looks either like

    A → x_1 x_2 ... x_n B

or like

    A → x_1 x_2 ... x_n

where the x_i are terminal symbols, and A and B are nonterminal symbols. That is, each rule has one nonterminal symbol on the left side; its expansion on the right side contains any number of terminal symbols followed by at most one nonterminal symbol.

An example grammar G0 has Nonterminals = {S, Y}, Terminals = {the, dog, ran, away, down, street}, StartSymbol = S, and Rules consisting of the following:

    S → the dog ran Y
    Y → away
    Y → down the street

G0 produces exactly two distinct sentences.

2 The model merging algorithm

A simplified greedy version of the model merging algorithm presented in class can be applied to learn a right-regular grammar G that produces a given set of sentences, Data. The algorithm has two main procedures:

• Data incorporation: Given a sentence from Data, add a set of rules to the grammar that generates this sentence from the start symbol. For example, the sentence ‘the boy eats carrots’ would be incorporated by adding the rule

    S → the boy eats carrots

• Merging: Find and perform a merge of grammar rules that improves the grammar by decreasing its cost; stop when no such merge can be found. The next two sections describe how merging works (Section 3) and how the cost of a grammar is calculated (Section 4).

The algorithm proceeds by alternating between incorporation and merging as follows:

1. Initialize the grammar to be empty (i.e., contain no rules).
2. Incorporate the next sentence from Data by adding the necessary rules to the grammar.
3. If the next possible merge decreases the cost of the grammar on the (seen) data, perform it.
4. If there are still possible merges left, go to step 3.
5. If there are sentences in Data that haven't yet been incorporated, go to step 2.

3 Merging operations

We will incrementally improve the grammar by using two types of merge operations:

• A prefix merge operates on a set of rules with the same left side, and whose right sides share a common prefix. For example, the rules

    N → a b c d
    N → a b c e X
    N → a b c f g

would be replaced, in a prefix merge, by:

    N → a b c N′
    N′ → d
    N′ → e X
    N′ → f g

• A suffix merge operates on a set of rules whose right sides share a common suffix. For example, the rules

    N_1 → a b c d e
    N_2 → f c d e

would be replaced, in a suffix merge, by:

    N_1 → a b N′
    N_2 → f N′
    N′ → c d e

4 Calculating the effect of merging

The full-fledged Bayesian version of the model merging algorithm is based on maximizing the posterior probability of the model given the data. In this assignment, however, we use a simpler cost function, which we attempt to minimize. Since we take a greedy approach to model merging, any merge that decreases the cost is accepted.

The cost of a grammar G given data D is defined as:

    c(G) = α s(G) + d(G, D)

Here is an explanation of the terms in this function:

• s(G) denotes the size of the grammar, and is equal to the number of symbols (i.e., terminals plus nonterminals) occurring on the right side of rules in the grammar, counting repeats. So, for example, the size of the simple grammar G0 from Section 1 would be 8.

• d(G, D) denotes the total derivation length of the data given this grammar, which is a reflection of how well the model accounts for the data. The derivation length of a sentence is defined to be the number of rules that have to be applied to the start symbol S to produce the sentence. The total derivation length of the data is the sum of the derivation lengths of all the sentences.

• α is a constant indicating the importance of the grammar size (relative to the total derivation length) and can be seen as controlling the algorithm's tendency to generalize. (A smaller grammar tends to generalize better to new examples, and a larger α leads to a greater cost being associated with large grammars.)

One of the good things about this cost function is that it allows us to efficiently calculate the effect of different merges without actually applying them and reparsing the data each time.
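As a quick sanity check on these definitions, the short Python sketch below computes s(G), d(G, D), and c(G) for the example grammar G0 from Section 1 and its two sentences. This is illustration only, not part of the assignment; the dictionary encoding of the grammar and the helper names used here are assumptions made for concreteness.

    # A minimal sketch (illustration only): computing c(G) = alpha * s(G) + d(G, D)
    # for the example grammar G0 from Section 1. The encoding below, a dict mapping
    # each nonterminal to a list of right-hand sides (lists of symbols), is an
    # assumption made for this sketch, not something specified by the assignment.
    G0 = {
        "S": [["the", "dog", "ran", "Y"]],
        "Y": [["away"], ["down", "the", "street"]],
    }

    def size(grammar):
        # s(G): number of symbols on the right sides of all rules, counting repeats.
        return sum(len(rhs) for rhss in grammar.values() for rhs in rhss)

    def derivation_length(grammar, sentence, symbol="S"):
        # Number of rule applications needed to derive `sentence` (a list of
        # terminals) from `symbol`; brute-force search, fine for tiny grammars.
        # Returns None if the sentence cannot be derived from `symbol`.
        best = None
        for rhs in grammar.get(symbol, []):
            # Right-regular form: terminals followed by at most one nonterminal.
            if rhs and rhs[-1] in grammar:
                terminals, tail = rhs[:-1], rhs[-1]
            else:
                terminals, tail = rhs, None
            if sentence[:len(terminals)] != terminals:
                continue
            rest = sentence[len(terminals):]
            if tail is None:
                steps = 1 if not rest else None
            else:
                sub = derivation_length(grammar, rest, tail)
                steps = None if sub is None else 1 + sub
            if steps is not None and (best is None or steps < best):
                best = steps
        return best

    def cost(grammar, data, alpha):
        # c(G) = alpha * s(G) + d(G, D), assuming every sentence in data is derivable.
        d = sum(derivation_length(grammar, sentence) for sentence in data)
        return alpha * size(grammar) + d

    data = [
        "the dog ran away".split(),
        "the dog ran down the street".split(),
    ]
    print(size(G0))             # 8, the grammar size quoted in Section 4
    print(cost(G0, data, 1.0))  # 12.0: alpha * 8 plus a 2-step derivation per sentence

With α = 1.0 the cost of G0 on its two sentences comes out to 8 + 4 = 12: the grammar size of 8 noted above plus a two-step derivation for each sentence. The cost-change formulas in Problems 1 and 2 below can be spot-checked numerically with a helper like this.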
Problem 1. Cost change for prefix merges

Consider the set of rules

    N → x_1 x_2 ... x_k y_{1,1} y_{1,2} ... y_{1,l_1}
    N → x_1 x_2 ... x_k y_{2,1} y_{2,2} ... y_{2,l_2}
    ...
    N → x_1 x_2 ... x_k y_{m,1} y_{m,2} ... y_{m,l_m}

(Don't be put off by the notation; it is just a precise way of expressing that these m rules share a common prefix of length k, followed by strings of varying length l_i.)

(a) Using this notation, write down the new rules that would result from applying a prefix merge.

(b) Calculate the change in size resulting from this merge (your answer should depend on k and m but not on the l_i).

(c) Assume now that the i-th rule in the original grammar was used r_i times when parsing the data. Calculate the number of times each rule in the modified grammar would be used.

(d) Use the result of part (c) to calculate the change in the total derivation length that would result from this merge.

(e) Show that the total change in cost is α(k + 1 − mk) + Σ_{i=1}^{m} r_i.

Problem 2. Cost change for suffix merges

We can do the same thing for suffix merges. Consider the rules

    N_1 → y_{1,1} y_{1,2} ... y_{1,l_1} x_1 x_2 ... x_k
    N_2 → y_{2,1} y_{2,2} ... y_{2,l_2} x_1 x_2 ... x_k
    ...
    N_m → y_{m,1} y_{m,2} ... y_{m,l_m} x_1 x_2 ... x_k

Perform an analysis similar to the one in the previous problem and show that the change in cost after applying a suffix merge to these rules is α(m + k − mk) + Σ_{i=1}^{m} r_i.

Problem 3. Example grammar

Consider the following grammar:

    S → The cat went Y (1)
    S → The cat

