Berkeley COMPSCI 182 - CS182 Assignment 7 - D2928927

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 182> CS182 Assignment 7

DOC PREVIEW

Berkeley COMPSCI 182 - CS182 Assignment 7

School name University of California, Berkeley

Course Compsci 182- Neural Basis of Thought and Language

Pages 6

This preview shows page 1-2 out of 6 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

CS182 Assignment 7 (Computational): Model MergingAssigned Tuesday, March 13thDue Tuesday, April 3rd, by 11:59pmModel merging is a technique for building complex structured models of some process from data. Theidea behind model merging is to begin with a completely unstructured model and then to incrementallymodify it to reflect regularities in the data. In doing this, there is a tradeoff between the simplicity of themodel and how well it accounts for the data.In this assignment you will apply a simplified version of the model merging algorithm to the task oflearning right-regular grammars.1 Right-regular grammars• A grammar is a tuple G = (N onterminals, T erminal s, Rules, StartSymbol), where Nonterminals orT erminal s are sets of symbols used to write the rules in Rules, and StartSymbol is a unique startsymbol from N onterminals. Each nonterminal symbol can be expanded according to any rule in thegrammar that has that symbol on the left side. Terminal symbols cannot be expanded and appear onlyon the right side of rules in the grammar.• A right-regular grammar is one in which every rule looks either likeA → x1x2. . . xnBor likeA → x1x2. . . xnwhere the xiare terminal symbols, and A and B are nonterminal symbols. That is, each rule has onenonterminal symbol on the left side; its expansion on the right side contains any number of terminalsymbols followed by at most one nonterminal symbol.(Note: there may be 0 symbols on the right-hand side! The following is a valid rule:A →An example grammar G0has N onterminals = {S,Y}, T erminals = {the, dog, ran, away, down, the, street},StartSymbol = S, and Rules consisting of the following:S → the dog ran YY → awayY → down the streetG0produces exactly two distinct sentences.2 The model-merging algorithmA simplified greedy version of the model-merging algorithm presented in class can be applied to learna right-regular grammar G that produces a given set of sentences Data. The algorithm has two mainprocedures:1• Data incorporation: Given a sentence from Data, add a set of rules to the grammar that generates thissentence from the start symbol. For example, the sentence ‘the boy eats carrots’ would be incorporatedby adding the ruleS → the boy eats carrots• Merging: Find and perform a merge of grammar rules that improves the grammar by decreasing itscost; stop when no such merge can be found. The next two sections describe how merging works(Section 3) and how the cost of a grammar is calculated (Section 4).The algorithm proceeds as follows:1. Initialize the grammar by incorporating each sentence of Data, as shown above.2. Find the cost reduction of each potential merge using your prefix and suffix tree data structures.3. If at least one of the merges decreases the cost of the grammar on the data, perform the one thatreduces the cost the most. If so, return to step 2. Otherwise, terminate.3 Merging operationsWe will incrementally improve the grammar by using two types of merge operations:• A prefix merge operates on a set of rules with the same left side, and whose right sides share a commonprefix. For example, the rulesN → a b c dN → a b c e XN → a b c f gwould be replaced, in a prefix merge, by:N → a b c N0N0→ dN0→ e XN0→ f g• A suffix merge operates on a set of rules whose right sides share a common suffix. For example, therulesN1→ a b c d eN2→ f c d ewould be replaced, in a suffix merge, by:N1→ a b N0N2→ f N0N0→ c d e24 Calculating the effect of mergingThe full-fledged Bayesian version of the model-merging algorithm is based on maximizing the posteriorprobability of the model given the data. In this assignment, however, we use a simpler cost function, whichwe attempt to minimize. Since we take a greedy approach to model merging, any merge that decreases thecost is accepted.The cost of a grammar G given data D is defined as:c(G) = αs(G) + d(G, D)Here is an explanation of the terms in this function:• s(G) denotes the size of the grammar, and is equal to the number of symbols (i.e., terminals plusnonterminals) occurring on the right side of rules in the grammar, counting repeats. So, for example,the size of the simple grammar G0from Section 1 would be 8.• d(G, D) denotes the total derivation length of the data given this grammar, which is a reflection of howwell the model accounts for the data. The derivation length of a sentence is defined to be the numberof rules that have to be applied to the start symbol S to produce the sentence. The total derivationlength of the data is the sum of the derivation lengths of all the sentences.• α is a constant indicating the importance of the grammar size (relative to the total derivation length)and can be seen as controlling the algorithm’s tendency to generalize. (A smaller grammar tends togeneralize better to new examples, and a larger α leads to a greater cost being associated with largegrammars.)One of the good things about this cost function is that it allows us to efficiently calculate the effect ofdifferent merges without actually applying them and reparsing the data each time.Affix treesAffix trees (prefix and suffix trees) are data structures that make the cost change calculation much easierand more efficient. The idea is that a prefix tree has nodes that represent the beginning of a rule, and itsleaves are the actual rules. Each node represents the first few symbols of a rule; some nodes represent fullrules, while others only represent the beginnings of rules. For instance, if we have the rules S → abN,S → ab, and S → ac, then we would construct the following prefix tree:The advantage of this representation is that it allows us to easily compute both the potential mergepoints and the cost of the merge. The first part is easy: any node of the tree (excluding the root and leaves)is a potential merge point, where this prefix becomes its own rule, with a variable at the end of the rule thatcan expand into the children of this node.3The second part, that costs are easy to compute, is a bit harder to see. Suppose that at each node, wenote down the number of rules that start with this prefix and the total number of times those rules areused. Then, let’s work with a particular node, with a prefix length of k, containing m rules. We will wantto store in the node both the prefix and the set of rules that start with that prefix. By accessing the objectsrepresenting those rules, we can find out that they are each lilong and used ritimes

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 6 pages.

Berkeley COMPSCI 182 - CS182 Assignment 7

Sign up for free to view:

Please select your school