MSU CSE 842 - Probabilistic top-down parsing and language modeling
Course: CSE 842
Pages: 28


arXiv:cs.CL/0105016 v1, 8 May 2001

Probabilistic top-down parsing and language modeling

Brian Roark*
Brown University

Note that the page numbers and placement and formatting of figures and tables differ from the version printed in the journal.

This paper describes the functioning of a broad-coverage probabilistic top-down parser, and its application to the problem of language modeling for speech recognition. The paper first introduces key notions in language modeling and probabilistic parsing, and briefly reviews some previous approaches to using syntactic structure for language modeling. A lexicalized probabilistic top-down parser is then presented, which performs very well, in terms of both the accuracy of returned parses and the efficiency with which they are found, relative to the best broad-coverage statistical parsers. A new language model which utilizes probabilistic top-down parsing is then outlined, and empirical results show that it improves upon previous work in test corpus perplexity. Interpolation with a trigram model yields an exceptional improvement relative to the improvement observed by other models, demonstrating the degree to which the information captured by our parsing model is orthogonal to that captured by a trigram model. A small recognition experiment also demonstrates the utility of the model.

1. Introduction

With certain exceptions, computational linguists have in the past generally formed a separate research community from speech recognition researchers, despite some obvious overlap of interest. Perhaps one reason for this is that, until relatively recently, few methods have come out of the natural language processing community that were shown to improve upon the very simple language models still standardly in use in speech recognition systems.
In the past few years, however, some improvements have been made over these language models through the use of statistical methods of natural language processing; and the development of innovative, linguistically well-motivated techniques for improving language models for speech recognition is generating more interest among computational linguists. While language models built around shallow local dependencies are still the standard in state-of-the-art speech recognition systems, there is reason to hope that better language models can and will be developed by computational linguists for this task.

This paper will examine language modeling for speech recognition from a natural language processing point of view. Some of the recent literature investigating approaches that use syntactic structure in an attempt to capture long-distance dependencies for language modeling will be reviewed. A new language model, based on probabilistic top-down parsing, will be outlined and compared with the previous literature, and extensive empirical results will be presented which demonstrate its utility.

* Department of Cognitive and Linguistic Sciences, Box 1978, Brown University, Providence, RI 02912

© 2001 Association for Computational Linguistics

Computational Linguistics, Volume 27, Number 2

Two features of our top-down parsing approach will emerge as key to its success. First, the top-down parsing algorithm builds a set of rooted candidate parse trees from left to right over the string, which allows it to calculate a generative probability for each prefix string from the probabilistic grammar, and hence a conditional probability for each word given the previous words and the probabilistic grammar. A left-to-right parser whose derivations are not rooted, i.e.
with derivations that can consist of disconnected tree fragments, such as an LR or shift-reduce parser, cannot incrementally calculate the probability of each prefix string being generated by the probabilistic grammar, because their derivations include probability mass from unrooted structures. Only at the point when their derivations become rooted (at the end of the string) can generative string probabilities be calculated from the grammar. These parsers can calculate word probabilities based upon the parser state – as in Chelba and Jelinek (1998a) – but such a distribution is not generative from the probabilistic grammar.

A parser that is not left-to-right, but which has rooted derivations, e.g. a head-first parser, will be able to calculate generative joint probabilities for entire strings; however, it will not be able to calculate probabilities for each word conditioned on previously generated words, unless each derivation generates the words in the string in exactly the same order. For example, suppose that there are two possible verbs that could be the head of a sentence. For a head-first parser, some derivations will have the first verb as the head of the sentence, and the second verb will be generated after the first; hence the second verb's probability will be conditioned on the first verb. Other derivations will have the second verb as the head of the sentence, and the first verb's probability will be conditioned on the second verb. In such a scenario, there is no way to decompose the joint probability calculated from the set of derivations into the product of conditional probabilities using the chain rule.
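The prefix-probability argument above can be stated compactly. For a left-to-right parser with rooted derivations, the generative probability of a prefix is the total probability of all rooted partial derivations whose yield begins with that prefix, and per-word conditional probabilities then follow by the chain rule (the notation here is ours, not the paper's):

```latex
\Pr(w_1 \dots w_i) \;=\; \sum_{d \in D_i} \Pr(d)
\qquad
\Pr(w_i \mid w_1 \dots w_{i-1}) \;=\; \frac{\Pr(w_1 \dots w_i)}{\Pr(w_1 \dots w_{i-1})}
```

where \(D_i\) is the set of rooted partial derivations consistent with the prefix \(w_1 \dots w_i\). The full string probability decomposes as \(\Pr(w_1 \dots w_n) = \prod_{i=1}^{n} \Pr(w_i \mid w_1 \dots w_{i-1})\). For a head-first parser, different derivations generate the words in different orders, so the sum over derivations does not factor into such a product.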
Of course, the joint probability can be used as a language model, but it cannot be interpolated on a word-by-word basis with, say, a trigram model, which we will demonstrate is a useful thing to do. Thus, our top-down parser allows for the incremental calculation of generative conditional word probabilities, a property it shares with other left-to-right parsers with rooted derivations such as Earley parsers (Earley, 1970) or left-corner parsers (Rosenkrantz and Lewis II, 1970).

A second key feature of our approach is that top-down guidance improves the efficiency of the search as more and more conditioning events are extracted from the derivation for use in the probabilistic model. Because the rooted partial derivation is fully connected, all of the conditioning information that might be extracted from the top-down left context has already been specified, and a conditional probability model built on this information will not impose any additional burden on the search. In contrast, an Earley or left-corner parser will underspecify certain connections between constituents in the left context, and if some of the underspecified information is used in the conditional probability model, it will have to become specified. Of course, this can be done, but at the expense of search
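Because the parsing model assigns a probability to each word conditioned on the preceding words, it can be mixed with a trigram on a per-word basis. The following is a minimal sketch of that word-level linear interpolation; the function names and the fixed mixing weight are our own illustration, not taken from the paper:

```python
import math

def interpolated_logprob(words, parser_prob, trigram_prob, lam=0.5):
    """Log probability of a word sequence under a per-word linear
    interpolation of a parser-based model and a trigram model.

    parser_prob(w, history) and trigram_prob(w, history) each return
    P(w | history); lam is the (arbitrary, illustrative) weight on
    the parser model.
    """
    total = 0.0
    for i, w in enumerate(words):
        history = words[:i]
        p = lam * parser_prob(w, history) + (1 - lam) * trigram_prob(w, history)
        total += math.log(p)
    return total

# Toy stand-in models: uniform over a 10-word vocabulary.
uniform = lambda w, h: 0.1
lp = interpolated_logprob(["the", "dog", "barks"], uniform, uniform)
```

Note that this mixing is only well defined because both component models supply a distribution over the next word given the same left context; a model that yields only joint string probabilities, like the head-first parser above, offers no such per-word term to interpolate.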

