Kwon-Dissertation (125 pages)

KORPAR: A RULE-BASED DEPENDENCY PARSER FOR KOREAN IMPLEMENTED IN PROLOG

by SOYOUNG KWON

Under the Direction of Michael A. Covington

ABSTRACT

Natural language parsing is the process of analyzing an input sentence by determining its syntactic structure and representing that structure according to a given formal grammar. However, it is often difficult to parse sentences correctly, since the nature of language is ambiguous and has many irregularities. If the word order is totally or partially free, the task of parsing becomes more challenging. The process of parsing can be based on hand-coded heuristic rules, probability, or a hybrid of both. KorPar, described in this dissertation, is a parser for Korean based on hand-coded heuristic rules represented in unification-based dependency grammar and implemented in Prolog. The dependency grammar provides an efficient way to parse the free word order of Korean, while the unification-based features express complex grammatical facts without complicating the parsing algorithm; as a result, the parser can be easily modified for grammar correction, implementation of probabilities for the grammar rules, and application to other languages. KorPar analyzes the structure of a Korean natural language sentence by representing it as a set of dependency pairs. Since Korean is a partially free word order language, KorPar accounts for restrictions on totally free order of the words in a sentence, recognizes subcategorization features, restricts the order of dependents for a single head, matches long-distance dependencies, and parses nouns that lack case markers. KorPar has been tested with 100 consecutive sentences (more than 2,000 words) from articles in the Chosun Ilbo Newspaper. The F-score (harmonic mean of precision and recall rates) was 96.3%.

INDEX WORDS: Dependency grammar, Unification-based grammar, Parsing, Natural language processing, Korean, Prolog

KORPAR: A RULE-BASED DEPENDENCY PARSER FOR KOREAN IMPLEMENTED IN PROLOG

by SOYOUNG KWON

B.A., Ewha Womans University,
Korea, 1996
M.A., Ewha Womans University, Korea, 1998

A Dissertation Submitted to the Graduate Faculty of The University of Georgia in Partial Fulfillment of the Requirements for the Degree

DOCTOR OF PHILOSOPHY

ATHENS, GEORGIA
2006

© 2006 Soyoung Kwon. All Rights Reserved.

KORPAR: A RULE-BASED DEPENDENCY PARSER FOR KOREAN IMPLEMENTED IN PROLOG

by SOYOUNG KWON

Electronic Version Approved:
Maureen Grasso
Dean of the Graduate School
The University of Georgia
December 2006

Major Professor: Michael A. Covington
Committee: Marlyse Baptista, Hyangsoon Yi

DEDICATION

To my parents, parents-in-law, my husband, and my lovely two children, who gave me love and support.

ACKNOWLEDGEMENTS

I would like to thank Dr. Covington for believing in me and giving me endless support throughout my courses, exams, and dissertation. I remember the first course that I took from you and how poorly I wrote my paper in the class. I got the worst grade of all my courses, but I learned the most valuable thing: to write only what you know and to express your knowledge with concise and clear sentences. I am so grateful that you gave me another chance and supported me during my graduate years. I will always cherish those years.

Also, I would like to thank Dr. Baptista for always being there for me and supporting me like a sister. I have always admired your sincere, warm heart and how you encourage us to pursue our careers. I hope I have made you proud, and I also wish the very best for you and your family.

I am also so grateful to Dr. Yi for all her encouragement, objective opinions, and support in choosing my career. I have learned to plan my future with confidence thanks to you, and I hope that I can someday give something in return.

Most of all, I would like to thank my family: my mom and dad, who have sacrificed so much for my education; my mother-in-law, who has always supported and encouraged me to finish my degree; my best friend and husband, without whom I could not have made it; and my lovely son and daughter, who are
the best children that any parent can ever have. I love you all so dearly.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1 Introduction
    1.1 Statement of Thesis
    1.2 Overview of Dissertation
2 Related Work on Parsers Focusing on Korean
    2.1 Rule-based Framework
    2.2 Probability Framework
    2.3 Hybrid Framework
3 Technical Background of KorPar
    3.1 Probability and Accuracy
    3.2 Dependency Grammar and Efficiency
    3.3 Rule-based Parser and Generative Power
    3.4 Satisfying 3.1, 3.2, and 3.3 with KorPar
4 Theoretical Basis of KorPar
    4.1 Dependency Grammar
    4.2 Unification-based Grammar and GULP
    4.3 Characteristics of Korean
5 KorPar: Korean Parser
    5.1 Lexicon
    5.2 Grammar Rules
    5.3 Algorithm
    5.4 Troubleshooting Cases
6 Results and Evaluations
7 Conclusion
    7.1 Contributions of KorPar
    7.2 Possible Future Improvements
REFERENCES
APPENDICES
    A Prolog Code of KorPar
    B Several Examples of Input and Output of Test Sentences

LIST OF TABLES

Table 1: Categories of Part-of-Speech
Table 2: Head and Dependent Pairs of KorPar
Table 3: Head and Dependent Relations as proposed by Kim, Kim, Seo, and Kim (1994)
Table 4: Evaluation of the Overall Dependencies
Table 5: Evaluation of Different Dependencies

LIST OF FIGURES

Figure 1: Overall Algorithm of KorPar in pseudo-code
Figure 2: Parsing Process of Sentence 12a
Figure 3: Parsing Process of Sentence 73a

CHAPTER 1
INTRODUCTION

Natural language parsing is an integral part of natural language processing, since it is related to semantic interpretation, speech production, machine translation, information extraction, question answering, and so forth. Parsers based on various formalisms have been developed on both theoretical and statistical principles throughout the past decades. In recent years, statistical parsers have been widely used to offset the highly ambiguous property of language. However, the
probabilistic parsers generally provide the most likely analysis, and in order to improve the accuracy of results, a large corpus of annotated sentences is required. As a result, the latest trend in parsing methods is to emphasize a rule-based framework for accurate analysis and to combine it with a probability-based framework for wide coverage and efficient analysis. The changes in the methods of parsing are also reflected in Korean parsers. In the 1980s, parsers based on various grammar formalisms, such as phrase structure

