MIT 6.863J - Two-level morphology Introduction

Massachusetts Institute of Technology
6.863J/9.611J, Natural Language Processing, Spring, 2003
Department of Electrical Engineering and Computer Science
Department of Brain and Cognitive Sciences

Laboratory 1a: Two-level morphology Introduction
Handed out: February 10, 2003
Due: February 19, 2003

Goals: This laboratory explores a computational approach to dictionary and morphological analysis: how to “parse” words and assign them feature and part-of-speech labels for use in further natural language processing. Part 1 (this document) is designed to help you gain familiarity with the Athena computers and the PC-KIMMO system, in preparation for the second part of the laboratory. It is really meant to take only a half-hour or so of computer time, and just a bit more of thinking. In the second part (handed out on Wednesday), you will design Kimmo automata and Kimmo lexicons to do a morphological analysis of a foreign language. (Well, perhaps foreign to a majority of you...)

• For this lab you will use the PC-KIMMO program. Importantly, we will be using a slightly older incarnation of this program (version 1.0.8). If you go to the PC-KIMMO web site, http://www.sil.org/pckimmo/, you will see that the current version is 2.1.8. While the current version has many advantages, some of which we will explore later on, the older version retains a clear division into rules and lexicon (dictionary), and does not have as steep a learning curve.
• The reference manual for the version of PC-KIMMO we will be using is on the course website, at http://www.ai.mit.edu/courses/6.863/kimmoman.txt.
• To work in the Athena environment you should have an Athena account, know how to log on, and be able to use a text editor.
  As far as I know, the few (non-MIT) students who do not have an Athena account have already been notified how to obtain one; if not, please email Karen Kohl so that she can take care of this as soon as possible.

1 Part 1: Using PC-Kimmo

In Part 1, we will just crank up the PC-KIMMO machinery to make sure it works, and get you to think a bit about the logic of the two-level system. (Part 2 is the real laboratory.) To begin, log in to a SUN Athena workstation. Then:

    athena% attach 6.863
    athena% cd /mit/6.863
    athena% cd pckimmo-old
    athena% pckimmo
    PC-KIMMO TWO-LEVEL PROCESSOR
    Version 1.0.8 (18 February 1992), Copyright 1992 SIL
    Type ? for help
    PC-KIMMO>

Now you have to load a set of rules (two-level automata) for a particular language if you want to generate surface words, or rules and a lexicon for that language if you want to recognize (i.e., parse) words. Rule files have the suffix .rul, while lexicon files have the suffix .lex. You can combine the loading of both in a .tak file. The rule and lexicon files would normally be in your own directory, and you would modify the .tak file accordingly; in this first part, the files are in the directory where the program itself resides.
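The division of labor just described (rules alone for generation, rules plus a lexicon for recognition) can be illustrated with a small Python sketch. It is not PC-KIMMO's code and does not use the .rul or .lex file formats. It hand-codes one simplified spelling rule, epenthesis (+:e <=> {s,x,z} _ s, so that fox+s surfaces as foxes), as a deterministic automaton over lexical:surface pairs, in the spirit of a two-level rule table. The names FEASIBLE, EPENTHESIS, generate and recognize, the state numbering, and the six-word lexicon are all hypothetical. The lexical forms also omit the stress mark used in the English session. The recognizer takes a naive shortcut: it generates every lexicon entry and compares the results. PC-KIMMO's recognizer instead consults the lexicon while it runs the rule automata.

```python
from itertools import product

# Hypothetical, heavily simplified feasible pairs: each lexical symbol maps to
# the surface symbols it may be realized as ("0" means "deleted").
# Any symbol not listed here is realized as itself.
FEASIBLE = {"+": ("0", "e")}

def category(lex, surf):
    """Classify a lexical:surface pair into a column of the rule table."""
    if lex == "+":
        return "+:e" if surf == "e" else "+:0"
    if lex == "s":
        return "s"        # s:s is a sibilant and also the right context
    if lex in ("x", "z"):
        return "sib"      # the other sibilants in this toy rule
    return "other"

# Simplified epenthesis rule  +:e <=> {s,x,z} _ s  as a deterministic automaton.
# States: 1 = neutral, 2 = just read a sibilant,
#         3 = sibilant then +:e (an s must follow),
#         4 = sibilant then +:0 (an s must NOT follow).
# A missing table entry means the automaton blocks, i.e. rejects the pairing.
EPENTHESIS = {
    1: {"s": 2, "sib": 2, "+:0": 1, "other": 1},
    2: {"s": 2, "sib": 2, "+:e": 3, "+:0": 4, "other": 1},
    3: {"s": 2},
    4: {"sib": 2, "+:0": 1, "other": 1},
}
FINAL_STATES = {1, 2, 4}

def accepts(pairs):
    """Run the rule automaton over a sequence of (lexical, surface) pairs."""
    state = 1
    for lex, surf in pairs:
        state = EPENTHESIS[state].get(category(lex, surf))
        if state is None:
            return False
    return state in FINAL_STATES

def generate(lexical):
    """Rules only: list every surface form the rule allows for a lexical string."""
    choices = [FEASIBLE.get(ch, (ch,)) for ch in lexical]
    surfaces = []
    for realization in product(*choices):
        if accepts(zip(lexical, realization)):
            surfaces.append("".join(c for c in realization if c != "0"))
    return surfaces

# Hypothetical toy lexicon of lexical forms (no stress mark, no features).
LEXICON = ["fox", "fox+s", "cat", "cat+s", "miss", "miss+s"]

def recognize(surface):
    """Rules plus lexicon, done naively by analysis-by-synthesis."""
    return [entry for entry in LEXICON if surface in generate(entry)]

print(generate("fox+s"))    # ['foxes']
print(generate("cat+s"))    # ['cats']
print(recognize("foxes"))   # ['fox+s']
```

With more rules, generate would run every automaton over the same sequence of pairs and keep a realization only if all of them accept. That parallel, all-must-accept arrangement is the core of the two-level model.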
You can now proceed as follows; the last three items are tests of the recognition and word-generation machinery that are invoked in the .tak file.

    PC-KIMMO>load rules english
    Rules being loaded from english.rul
    PC-KIMMO>load lexicon english
    Lexicon being loaded from english.lex
    PC-KIMMO>generate `fox+s
    foxes
    PC-KIMMO>recognize foxes
    `fox+s [ N(fox)+PL ]
    `fox+s [ V(fox)+3sg.PRES ]
    PC-KIMMO>generate `spy+s
    spies
    PC-KIMMO>recognize spies
    `spy+s [ N(spy)+PL ]
    `spy+s [ V(spy)+3sg.PRES ]
    PC-KIMMO>
    PC-KIMMO>recognize flies
    `fly+s `fly+PL
    `fly+s `fly+3SG
    `fly+s `fly+PL
    `fly+s `fly+3SG
    PC-KIMMO>generate fly+s
    flies

When you are ready to get out, type:

    PC-KIMMO>quit

Of course, there is a facility for running entire files in and out via log files; see your PC-KIMMO documentation for this, or type HELP at the prompt. Be kind and make sure you write the log file to your own directory.

Now, on to the simple warmup questions.

Question 1
Recognize the surface string antibody. What is the result? Does it make sense? Explain your answer in a few sentences.

Question 2
Generate from the lexical string refer+ing. What is the result? What is going wrong? (Hint: take a look at the recognizer, and the rules it is using.)

Question 3
Recognize the surface string traveler. What is the result? Now explain why this result obtains by observing the sequence of “Lexicons” traversed by the PC-KIMMO engine. You must turn tracing on. To turn on tracing, enter:

    set tracing on

at the PC-KIMMO> prompt. You probably SHOULD have the session copied to a file by entering:

    log <~your-dir/output-filename>

OK, what fix-up does this suggest? (Hint: Compare this to doer. You must still be able to recognize traveled correctly. It isn’t as easy to do this correctly as one might think; I simply want to get you to think about the organization of the Lexicons.)

Question 4
Recognize the surface string flier with tracing turned on.
This WILL generate a LOT of output, so please direct your output to your own Athena directory! Keep in mind that you can refer to the PC-KIMMO documentation manual if you need to.

Paying attention to the sequences of lexical and surface characters processed in this example, comment briefly on the statement that “the recognizer only makes a single left-to-right pass through the string as it homes in on its target in the lexicon.” Is this so? Try to characterize as precisely as you can the kind of situation in which the behavior that you observe will arise.

Writing up this part

We’d like to get by with as little paper as possible for assignments. (Call it ecological necessity.) To this end, we’d like you to write up a web page on Athena for your lab report, and then just email the URL to the TA, Karen Kohl, [email protected]. If you don’t know how to create a web page, now is as good a time as any to learn.

Please remember that collaboration is encouraged, but do write down the names of your collaborators at the beginning of the report. Also, please remember that cloned reports are not