Penn CIS 700 - TEXTUAL COHERENCE USING LATENT SEMANTIC ANALYSIS

Running head: TEXTUAL COHERENCE USING LATENT SEMANTIC ANALYSIS

The Measurement of Textual Coherence with Latent Semantic Analysis

Peter W. Foltz
New Mexico State University

Walter Kintsch and Thomas K. Landauer
University of Colorado

Foltz, P. W., Kintsch, W. & Landauer, T. K. (1998). The measurement of textual coherence with Latent Semantic Analysis. Discourse Processes, 25, 2&3, 285-307.

Abstract

Latent Semantic Analysis is used as a technique for measuring the coherence of texts. By comparing the vectors for two adjoining segments of text in a high-dimensional semantic space, the method provides a characterization of the degree of semantic relatedness between the segments. We illustrate the approach for predicting coherence through re-analyzing sets of texts from two studies that manipulated the coherence of texts and assessed readers' comprehension. The results indicate that the method is able to predict the effect of text coherence on comprehension and is more effective than simple term-term overlap measures. In this manner, LSA can be applied as an automated method that produces coherence predictions similar to propositional modeling. We describe additional studies investigating the application of LSA to analyzing discourse structure and examine the potential of LSA as a psychological model of coherence effects in text comprehension.

The Measurement of Textual Coherence with Latent Semantic Analysis

In order to comprehend a text, a reader must create a well connected representation of the information in it. This connected representation is based on linking related pieces of textual information that occur throughout the text.
The linking of information is a process of determining and maintaining coherence. Because coherence is a central issue to text comprehension, a large number of studies have investigated the process readers use to maintain coherence and to model the readers' representation of the textual information as well as of their previous knowledge (e.g., Lorch & O'Brien, 1995).

There are many aspects of a discourse that contribute to coherence, including coreference, causal relationships, connectives, and signals. For example, Kintsch and van Dijk (Kintsch, 1988; Kintsch & van Dijk, 1978) have emphasized the effect of coreference in coherence through propositional modeling of texts. While coreference captures one aspect of coherence, it is highly correlated with other coherence factors such as causal relationships found in the text (Fletcher, Chrysler, van den Broek, Deaton, & Bloom, 1995; Trabasso, Secco & van den Broek, 1984).

Although a propositional model of a text can predict readers' comprehension, a problem with the approach is that in-depth propositional analysis is time consuming and requires a considerable amount of training. Semi-automatic methods of propositional coding (e.g., Turner, 1987) still require a large amount of effort. This degree of effort limits the size of the text that can be analyzed. Thus, most texts analyzed and used in reading comprehension experiments have been small, typically from 50 to 500 words, and almost all are under 1000 words. Automated methods such as readability measures (e.g., Flesch, 1948; Klare, 1963) provide another characterization of the text; however, they do not correlate well with comprehension measures (Britton & Gulgoz, 1991; Kintsch & Vipond, 1979). Thus, while the coherence of a text can be measured, it can often involve considerable effort.

In this study, we use Latent Semantic Analysis (LSA) to determine the coherence of texts.
A more complete description of the method and approach to using LSA may be found in Deerwester, Dumais, Furnas, Landauer and Harshman (1990), Landauer and Dumais (1997), as well as in the preceding article by Landauer, Foltz and Laham (this issue). LSA provides a fully automatic method for comparing units of textual information to each other in order to determine their semantic relatedness. These units of text are compared to each other using a derived measure of their similarity of meaning. This measure is based on a powerful mathematical analysis of direct and indirect relations among words and passages in a large training corpus. Semantic relatedness so measured should correspond to a measure of coherence since it captures the extent to which two text units are discussing semantically related information.

Unlike methods which rely on counting literal word overlap between units of text, LSA's comparisons are based on a derived semantic relatedness measure which reflects semantic similarity among synonyms, antonyms, hyponyms, compounds, and other words that tend to be used in similar contexts. In this way, it can reflect coherence due to automatic inferences made by readers as well as to literal surface coreference. In addition, since LSA is automatic, there are no constraints on the size of the text analyzed. This permits analyses of much larger texts to examine aspects of their discourse structure.

In order for LSA to be considered an appropriate approach for modeling text coherence, we first establish how well LSA captures elements of coherence that are similar to modeling methods such as propositional models. A re-analysis of two studies that examined the role of coherence in readers' comprehension is described. This re-analysis of the texts produces automatic predictions of the coherence of texts which are then compared to measures of the readers' comprehension.
We next describe the application of the method to investigating other features of the discourse structure of texts. Finally, we illustrate how the approach applies both as a tool for text researchers and as a theoretical model of text coherence.

General approach for using LSA to measure coherence

The primary method for using LSA to make coherence predictions is to compare some unit of text to an adjoining unit of text in order to determine the degree to which the two are semantically related. These units could be sentences, paragraphs or even individual words or whole books. This analysis can then be performed for all pairs of adjoining text units in order to characterize the overall coherence of the text. Coherence predictions have typically been performed at a propositional level, in which a set of propositions all contained within working memory are compared or connected to each other (e.g., Kintsch, 1988, In press). For LSA coherence
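
The adjoining-unit comparison described under the general approach can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each text unit has already been mapped to a vector in some semantic space, that the cosine between vectors serves as the relatedness measure, and that the mean over adjoining pairs summarizes overall coherence. The function names and the three-dimensional vectors are hypothetical and exist only for illustration.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero vector is treated as unrelated to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def adjacent_coherence(segment_vectors):
    """Relatedness of each text unit to the unit that immediately follows it."""
    return [cosine(a, b) for a, b in zip(segment_vectors, segment_vectors[1:])]


def overall_coherence(segment_vectors):
    """Mean relatedness over all adjoining pairs (an assumed summary statistic)."""
    scores = adjacent_coherence(segment_vectors)
    if not scores:
        raise ValueError("need at least two text units")
    return sum(scores) / len(scores)


# Hypothetical vectors for three consecutive text units. In practice these
# would come from a semantic space derived from a large training corpus.
vectors = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
scores = adjacent_coherence(vectors)
mean_score = overall_coherence(vectors)
```

In this toy input the first two units share a dimension and so receive a positive score, while the second and third share none and score zero. A low score between adjoining units would mark a possible coherence break at that point in the text.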

