CS 10701 Final Project Report

Textual Entailment in the Domain of Physics

Maxim Makatchev
Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213
maxim.makatchev@cs.cmu.edu

Abstract

Bag-of-words methods for the problems of semantic text classification and textual entailment have seen some successful applications [3]. However, their straightforward applications are known to break down when the training data is sparse, the number of classes is large, or classes do not have clear syntactic boundaries, for example when negation or conditional sentence markers significantly affect classification. These, however, are the properties of a typical semantic classification problem in the domain of natural language tutoring systems. Recently, formal methods have been evaluated for reasoning about entailment using logical representations of natural language propositions [5]. This work extends those methods to account for uncertainty in generating logical representations of natural language sentences by using Bayesian networks with observable nodes representing the logical propositions in the domain of the tutorial dialogue corpus, latent nodes corresponding to domain rule applications, and semantic class label nodes. The problem of sparseness of the training data is dealt with by using a logical inference engine to generate the network structure and by using informative priors for parameter estimation. The results demonstrate improved performance over the formal reasoning approaches and other baselines.

1 Introduction

1.1 Problem

Modern intelligent tutoring systems attempt to explore relatively unconstrained interactions with students, for example via a natural language (NL) dialogue. The rationale behind this is that allowing students to provide unrestricted input to a system would trigger the metacognitive processes that support learning (i.e., self-explaining) and help expose misconceptions. WHY2-ATLAS is designed to elicit NL explanations in the domain of qualitative physics [6]. The system presents the student with a qualitative physics problem and asks the student to type an essay with an answer and an explanation. A typical problem and the corresponding essay are shown in Figure 1. After the student submits the first draft of an essay, the system analyzes it for errors and missing statements and starts a dialogue that attempts to remediate misconceptions and elicit missing propositions.

Although there is only a limited number of classes of possible student beliefs that are of interest to the system (of the 20 statements representing semantic classes for the Pumpkin problem, the approach described here will target 16, selected as described in Section 2), there are many possible NL sentences that are semantically close enough to be classified as representative of one of these classes by an expert. Typically, the expert will classify a statement as belonging to a certain class of student beliefs if either (1) the statement is a rephrasal of the textual description of the belief class, or (2) the statement is a consequence (or, more rarely, a condition) of an inference rule involving the belief. An example of the first case is the sentence "pumpkin has no horizontal acceleration" as a representative of the belief class "the horizontal acceleration of the pumpkin is zero." An example of the second case is the sentence "the horizontal velocity of the pumpkin doesn't change" as a representative of the same belief class "the horizontal acceleration of the pumpkin is zero": the former can be derived in one step from the latter via a physics domain rule. These examples suggest that a model of an expert's classification of student beliefs would have to account not only for syntactic but also for inferential proximity of the statements. Note that, in general, syntactic proximity alone appears to be insufficient to predict inferential proximity. In this paper we attempt to augment syntactic proximity analysis with a graph of semantic relationships over the set of domain statements. We will compare deterministic and probabilistic inference algorithms that use this graph for sentence classification.
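To make the two expert criteria concrete, the following toy sketch combines syntactic proximity, measured as bag-of-words overlap with the textual description of a belief class, with inferential proximity, measured as reachability over domain-rule edges. The two belief classes, the single rule edge, the overlap threshold, and all identifiers are invented for illustration; this is a sketch, not the classifier used in WHY2-ATLAS.

```python
from collections import deque

# Textual descriptions of two belief classes (invented identifiers).
BELIEF_CLASSES = {
    "zero_horiz_accel": "the horizontal acceleration of the pumpkin is zero",
    "const_horiz_vel": "the horizontal velocity of the pumpkin does not change",
}

# One toy physics rule as a directed edge: zero acceleration => constant velocity.
RULE_EDGES = {
    "zero_horiz_accel": ["const_horiz_vel"],
    "const_horiz_vel": [],
}

def bag_of_words_overlap(a, b):
    """Case 1 (rephrasal): Jaccard overlap of the word sets of two statements."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def rule_distance(src, dst):
    """Case 2 (consequence): number of rule applications needed to derive dst
    from src, by breadth-first search over the rule graph; -1 if unreachable."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == dst:
            return depth
        for nxt in RULE_EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return -1

def classify(statement, overlap_threshold=0.5):
    """Attribute a statement to every belief class whose description is a close
    rephrasal, or from which the best surface match is derivable in one step."""
    scores = {c: bag_of_words_overlap(statement, d) for c, d in BELIEF_CLASSES.items()}
    best = max(scores, key=scores.get)
    labels = {c for c, s in scores.items() if s >= overlap_threshold}
    labels |= {c for c in BELIEF_CLASSES if 0 <= rule_distance(c, best) <= 1}
    return labels

print(classify("the pumpkin has no horizontal acceleration"))
# -> {'zero_horiz_accel'}
print(classify("the horizontal velocity of the pumpkin doesn't change"))
# -> {'const_horiz_vel', 'zero_horiz_accel'} (set order may vary)
```

Section 1.2 describes how the deployed system obtains the corresponding structures: a formal representation of the input sentence and a pre-generated deductive closure in place of the single toy rule edge.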
1.2 Existing system overview

The sequence of natural language processing is as follows. A combination of a semantic-syntactic parser, a template-filling classifier, and a bag-of-words statistical classifier generates a first-order predicate logic (FOPL) representation of the input sentence [4]. Based on the semantic representation of the student's input, the completeness and correctness analyzer attempts to classify whether the input sentence corresponds to any of the pre-specified classes of student beliefs. For example, if the student types "pumpkin has no horizontal acceleration," the analyzer may infer that the student believes that the horizontal force on the pumpkin is zero. In the early versions of WHY2-ATLAS, the reasoning about the student's beliefs was done by generating abductive proofs of the observed student's input on the fly. More recently, we have used a pre-generated deductive closure as a graph of semantic relationships in the space of problem-specific domain statements, together with a deterministic inference mechanism based on graph matching. We will compare the new approach with these existing deterministic approaches in the experiments described in this report.

Question: Suppose you are running in a straight line at constant speed. You throw a pumpkin straight up. Where will it land? Explain.

Explanation: Once the pumpkin leaves my hand, the horizontal force that I am exerting on it no longer exists, only a vertical force caused by my throwing it. As it reaches it's maximum height, gravity, exerted vertically downward, will cause the pumpkin to fall. Since no horizontal force acted on the pumpkin from the time it left my hand, it will fall at the same place where it left my hands.

Figure 1: The statement of the problem and a verbatim explanation from a student who received no follow-up discussions on any problems.

1.3 Desired extension

The deterministic mapping from the formal representation of the input to the graph of the deductive closure does not account for the uncertainty in generating the formal semantic representation. It is desirable to extend the graph of logical relations over the domain statements (a subset of the deductive closure of the givens and false assumptions) into a probabilistic graphical model, such as a Bayesian network, and to estimate its parameters based on the actual expert labeling of student sentences. In this project we implement such an extension.
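The following is a minimal, self-contained sketch of the kind of network proposed above, not the implementation evaluated in this report: one belief-class node C, one latent rule-application node R (the rule "zero horizontal acceleration implies constant horizontal velocity"), and two observable proposition nodes of the sort produced by the NL-to-FOPL front end. The structure mirrors a single chain of the deductive closure, the conditional probabilities are invented stand-ins for the informative priors, and inference is done by brute-force enumeration; the actual model spans the full graph of domain statements and estimates its parameters from expert-labeled sentences.

```python
from itertools import product

# P(node = 1 | parent value); all numbers are invented priors for illustration.
P_C = 0.5                              # prior on the belief class C
P_R_GIVEN_C = {0: 0.1, 1: 0.7}         # latent rule application R given C
P_PACC_GIVEN_C = {0: 0.1, 1: 0.85}     # observed proposition "acceleration is zero"
P_PVEL_GIVEN_R = {0: 0.05, 1: 0.8}     # observed proposition "velocity is constant"

def bernoulli(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one

def joint(c, r, p_acc, p_vel):
    """Joint probability of a full assignment in the network C -> R, C -> P_acc, R -> P_vel."""
    return (bernoulli(P_C, c)
            * bernoulli(P_R_GIVEN_C[c], r)
            * bernoulli(P_PACC_GIVEN_C[c], p_acc)
            * bernoulli(P_PVEL_GIVEN_R[r], p_vel))

def posterior_class(evidence):
    """P(C = 1 | evidence) by enumeration, marginalizing out the latent rule
    node and any proposition node that was not observed."""
    num = den = 0.0
    for c, r, p_acc, p_vel in product((0, 1), repeat=4):
        assignment = {"C": c, "R": r, "P_acc": p_acc, "P_vel": p_vel}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(c, r, p_acc, p_vel)
        den += p
        if c == 1:
            num += p
    return num / den

# The front end extracted only the proposition "the horizontal velocity of the
# pumpkin doesn't change"; the class is still supported through the latent rule node.
print(round(posterior_class({"P_vel": 1}), 3))   # ~0.821 with these toy numbers
```

Even in this toy case, observing only the "constant velocity" proposition raises the posterior of the belief class through the latent rule-application node, which is the kind of inferential link that the deterministic graph matching of Section 1.2 can represent only as a hard, all-or-nothing edge.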
1.4 Related work

Bayesian networks have been gaining popularity as a tool of choice for user