Computational Linguistics and Intelligent Text Processing: Proceedings of the 8th International Conference, CICLing 2007, Mexico City (invited paper), A. Gelbukh (Ed.), pp. 311-324, Springer, Berlin, Germany, February 2007.

Learning for Semantic Parsing

Raymond J. Mooney
Department of Computer Sciences, University of Texas at Austin
1 University Station C0500, Austin, TX 78712-0233, USA
mooney@cs.utexas.edu

Abstract. Semantic parsing is the task of mapping a natural language sentence into a complete, formal meaning representation. Over the past decade, we have developed a number of machine learning methods for inducing semantic parsers by training on a corpus of sentences paired with their meaning representations in a specified formal language. We have demonstrated these methods on the automated construction of natural-language interfaces to databases and robot command languages. This paper reviews our prior work on this topic and discusses directions for future research.

1 Introduction

Semantic parsing is the task of mapping a natural language (NL) sentence into a complete, formal meaning representation (MR) or logical form. A meaning representation language (MRL) is a formal, unambiguous language that allows for automated inference and processing, such as first-order predicate logic. In particular, our research has focused on applications in which the MRL is executable and can be directly used by another program to perform some task, such as answering questions from a database or controlling the actions of a real or simulated robot. This distinguishes the task from related tasks such as semantic role labeling [8] and other forms of shallow semantic parsing, which do not generate complete, formal representations.

Over the past decade, we have developed a number of systems for learning parsers that map NL sentences to a pre-specified MRL [44,35,37,24,17,39,23]. Given a training corpus of sentences annotated with their correct semantic interpretation in a given MRL, the goal of these systems is to induce an efficient
and accurate semantic parser that can map novel sentences into this MRL. Some of the systems require extra training input in addition to NL/MR pairs, such as syntactic parse trees or semantically annotated parse trees.

In this paper, we first describe the applications we have explored and their corresponding MRLs, and then review the parsing and learning systems that we have already developed for these applications, along with experimental results on their performance. We then discuss important areas for future research in learning for semantic parsing.

2 Sample Applications and their MRLs

We have previously considered two MRLs for performing useful, complex tasks. The first is a database query language, primarily using a sample database on U.S. geography. The second MRL is a coaching language for robotic soccer developed for the RoboCup Coach Competition, in which AI researchers compete to provide effective instructions to a coachable team of agents in a simulated soccer domain [9].

When exploring NL interfaces for databases, the MRL we have primarily used is a logical query language based on Prolog. We have primarily focused on queries to a small database on U.S. geography. This domain, Geoquery, was originally chosen to test corpus-based semantic parsing due to the availability of a hand-built natural-language interface, Geobase, supplied with Turbo Prolog 2.0 [3]. The language consists of Prolog queries augmented with several meta-predicates [44]. Below is a sample query with its English gloss:

    answer(A, count(B, (state(B), const(C, riverid(mississippi)), traverse(C, B)), A))
    "How many states does the Mississippi run through?"

The same query language has also been used to build NLI's for databases of restaurants and CS job openings, including a component that translates our logical queries to standard SQL database queries [36,35]. The resulting formal queries can be executed to generate answers to the corresponding questions.

RoboCup (www.robocup.org) is an international AI research initiative using robotic soccer as
its primary domain. In the Coach Competition, teams of agents compete on a simulated soccer field and receive advice from a team coach in a formal language called CLang. In CLang, tactics and behaviors are expressed in terms of if-then rules. As described in [9], its grammar consists of 37 nonterminal symbols and 133 productions. Below is a sample rule with its English gloss:

    ((bpos (penalty-area our)) (do (player-except our {4}) (pos (half our))))
    "If the ball is in our penalty area, all our players except player 4 should stay in our half."

The robots in the simulator can interpret the CLang instructions, which then strongly affect their behavior while playing the game. The semantic parsers we have developed for this MRL were part of a larger research project on advice-taking reinforcement learners that can accept advice stated in natural language [25].

3 Systems for Learning Semantic Parsers

Fig. 1. The SAPT and its Compositional MR Construction for a CLang Sentence (the example sentence is "our player 2 has the ball").

Our earliest system for learning semantic parsers, called Chill [44,35], uses Inductive Logic Programming (ILP) [26] to learn a deterministic parser written in Prolog. In our more recent work, we have developed three different approaches to learning statistical semantic parsers that are more robust and scale more effectively to larger training sets. Each exploits a different advanced technology in statistical natural language processing. Scissor [17,18] adds detailed semantics to a state-of-the-art statistical syntactic parser (i.e., the Collins parser [12]), Wasp [39] adapts statistical machine translation methods to map from NL to MRL, and Krisp [23] uses Support Vector Machines (SVMs) [13] with a subsequence kernel specialized for text learning [27]. We briefly review each of these systems below. A version of our Geoquery data
has also been used to evaluate a system for learning semantic parsers using probabilistic Combinatory Categorial Grammars (CCG) [45].

3.1 Scissor

Scissor (Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations) [17,18] learns a statistical parser that generates a semantically augmented parse tree (SAPT), in which each internal node is given both a syntactic and a semantic label. We augment Collins' head-driven model 2 [12] to incorporate a semantic label on each internal node. By