CS 10701: Final Project Report

Textual Entailment in the Domain of Physics

Maxim Makatchev
Robotics Institute
Carnegie Mellon University
Pittsburgh, PA
[email protected]

Abstract

While methods for semantic text classification and textual entailment have seen some successful applications [3], their straightforward application is known to break when the training data is sparse, the number of classes is large, or classes do not have clear syntactic boundaries (for example, when negation or conditional sentence markers significantly affect classification). These are, however, properties of a typical semantic classification problem in the domain of natural language tutoring systems. Recently, formal methods have been evaluated for reasoning about entailment using logical representations of natural language propositions [5]. This work extends those methods to account for uncertainty in generating logical representations of natural language sentences by using Bayesian networks with observable nodes representing the logical propositions in the domain of the tutorial dialogue corpus, latent nodes corresponding to domain rule applications, and semantic class label nodes. The problem of sparseness of training data is addressed by using a logical inference engine to generate the network structure and by using informative priors for parameter estimation. The results demonstrate improved performance over the formal reasoning approaches and other baselines.

1 Introduction

1.1 Problem

Modern intelligent tutoring systems attempt to explore relatively unconstrained interactions with students, for example via a natural language (NL) dialogue. The rationale behind this is that allowing students to provide unrestricted input to a system would trigger meta-cognitive processes that support learning (i.e.
self-explaining) and help expose misconceptions. WHY2-ATLAS is designed to elicit NL explanations in the domain of qualitative physics [6]. The system presents the student with a qualitative physics problem and asks the student to type an essay with an answer and an explanation. A typical problem and the corresponding essay are shown in Figure 1.

After the student submits the first draft of an essay, the system analyzes it for errors and missing statements and starts a dialogue that attempts to remediate misconceptions and elicit missing propositions.

Although there is a limited number of classes of possible student beliefs that are of interest to the system (of the 20 statements representing semantic classes for the Pumpkin problem, the approach described here targets 16, selected as described in Section 2), there are many possible NL sentences that are semantically close enough to be classified as representative of one of these classes by an expert. Typically the expert will classify a statement as belonging to a certain class of student beliefs if either (1) the statement is a rephrasing of the textual description of the belief class, or (2) the statement is a consequence (or, more rarely, a condition) of an inference rule involving the belief. An example of the first case is the sentence "pumpkin has no horizontal acceleration" as a representative of the belief class "the horizontal acceleration of the pumpkin is zero." An example of the second case is the sentence "the horizontal velocity of the pumpkin doesn't change" as a representative of the same belief class: the former can be derived in one step from the latter via a physics domain rule. These examples suggest that a model of an expert's classification of student beliefs would have to account not only for syntactic, but also for inferential proximity of the statements. Note that, in general, syntactic proximity alone appears to be insufficient to predict inferential proximity.
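The gap between syntactic and inferential proximity can be made concrete with a crude bag-of-words comparison (a hypothetical sketch: the tokenization and the Jaccard measure are our illustrative choices, not part of the system described here):

```python
def bow_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercased word sets: a crude
    proxy for syntactic proximity between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# Two statements one physics domain rule apart (inferentially very close):
s1 = "the horizontal velocity of the pumpkin doesn't change"
s2 = "the horizontal acceleration of the pumpkin is zero"
sim = bow_similarity(s1, s2)  # overlap is only "the", "of", "horizontal", "pumpkin"
```

Under this measure the two sentences share less than half their vocabulary, most of it function words, even though they are a single rule application apart.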
In this paper we attempt to augment syntactic proximity analysis with a graph of semantic relationships over the set of domain statements. We will compare deterministic and probabilistic inference algorithms that use this graph for sentence classification.

1.2 Existing system overview

The sequence of natural language processing is as follows:

• A combination of a semantic-syntactic parser, a template-filling classifier, and a bag-of-words statistical classifier generates a first-order predicate logic (FOPL) representation of the input sentence [4].

• Based on the semantic representation of the student's input, the completeness and correctness analyzer attempts to classify whether the input sentence corresponds to any of the pre-specified classes of student beliefs. For example, if the student types "pumpkin has no horizontal acceleration," the analyzer may infer that the student believes that the horizontal force on the pumpkin is zero.

In the early versions of WHY2-ATLAS, reasoning about the student's beliefs was done by generating abductive proofs of the observed student input on the fly. More recently we have used a pre-generated deductive closure as a graph of semantic relationships in the space of problem-specific domain statements, with a deterministic inference mechanism based on graph matching.

Question: Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land? Explain.

Explanation: Once the pumpkin leaves my hand, the horizontal force that I am exerting on it no longer exists, only a vertical force (caused by my throwing it). As it reaches it's maximum height, gravity (exerted vertically downward) will cause the pumpkin to fall. Since no horizontal force acted on the pumpkin from the time it left my hand, it will fall at the same place where it left my hands.

Figure 1: The statement of the problem and a verbatim explanation from a student who received no follow-up discussions on any problems.
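The deterministic graph-matching step can be sketched as reachability search over a toy deductive-closure graph. This is a minimal illustration under our own assumptions: the node names, rules, and depth bound are placeholders, not the system's actual propositions or inference mechanism.

```python
from collections import deque

# Toy deductive-closure graph: each edge is one domain-rule application.
# Node names are illustrative placeholders.
CLOSURE = {
    "horiz_accel_zero": ["horiz_velocity_constant"],
    "horiz_velocity_constant": ["equal_horiz_displacement"],
    "equal_horiz_displacement": [],
}

def classify(proposition, belief_classes, max_depth=2):
    """Label `proposition` with every belief class whose node reaches it
    within `max_depth` rule applications in the closure graph."""
    labels = []
    for cls in belief_classes:
        frontier = deque([(cls, 0)])
        seen = {cls}
        while frontier:
            node, d = frontier.popleft()
            if node == proposition:
                labels.append(cls)
                break
            if d < max_depth:
                for nxt in CLOSURE.get(node, []):
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append((nxt, d + 1))
    return labels
```

A depth bound of zero recovers case (1) of the expert's criteria (the statement matches the class description itself), while larger bounds admit statements that are consequences of the belief, as in case (2).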
We will compare the new approach with these existing deterministic approaches in the experiments described in this report.

1.3 Desired extension

The deterministic mapping from the formal representation of the input to the graph of deductive closure does not account for the uncertainty in generating the formal semantic representation. It is desirable to extend the graph of logical relations over the domain statements (a subset of the deductive closure of givens and false assumptions) into a probabilistic graphical model, such as a Bayesian network, and estimate its parameters based on the actual expert labeling of student sentences. In this project, we implement such a model.
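The flavor of such a probabilistic extension can be sketched with a three-node chain mirroring the proposed network structure: a class belief node C, a latent rule-application node R, and an observed proposition node O. All probabilities below are illustrative priors chosen for the sketch, not estimated parameters from the report.

```python
import itertools

# Toy chain C -> R -> O with hand-picked (illustrative) parameters.
P_C = {True: 0.5, False: 0.5}             # prior on the student holding the belief
P_R_GIVEN_C = {True: 0.9, False: 0.1}     # rule fires mostly when the belief holds
P_O_GIVEN_R = {True: 0.8, False: 0.2}     # noisy FOPL extraction of the proposition

def posterior_class(observed_o=True):
    """P(C = True | O) by enumeration over the latent rule node R."""
    joint = {True: 0.0, False: 0.0}
    for c, r in itertools.product([True, False], repeat=2):
        p = P_C[c]
        p *= P_R_GIVEN_C[c] if r else 1 - P_R_GIVEN_C[c]
        p *= P_O_GIVEN_R[r] if observed_o else 1 - P_O_GIVEN_R[r]
        joint[c] += p
    return joint[True] / sum(joint.values())
```

Observing the proposition raises the posterior on the belief class above its prior, while its absence lowers it; the real network replaces this chain with the structure generated by the logical inference engine.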