MIT 6 863J - Warmup Exercises on Word Parsing


MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Department of Electrical Engineering and Computer Science
Department of Brain & Cognitive Sciences
6.863/9.611J Natural Language Processing, Spring 2004

Laboratory 1 – Word Parsing
Component II: Warmup Exercises on Word Parsing

Handed out: Feb 9, 2004
Due: Feb 17, 2004

Goals of Lab 1, Component II

This laboratory will explore a computational approach to dictionary and morphological analysis – how to “parse” words, and assign them feature and part of speech labels for use in further natural language processing. Component II (this document) is designed for you to gain familiarity with the Athena computers and how to run our system for word parsing, Kimmo. In addition, this lab component is designed to teach you how to think about the two main parts of word parsing: (1) finite-state machines to model morpheme sequences; and (2) finite-state machines to model spelling change rules. The idea behind the Kimmo system is that given this basic two-part machinery, in order to build a dictionary system for a new language ‘all’ you have to do is write down the descriptions of these two machines – different spelling change rules and different morpheme machines. In particular, we want you to understand how the finite-state machines operate, how rules interact, and how the rules working in parallel model multiple rules.

In the last part of this laboratory (handed out on Wednesday), you will do more substantive work, building Kimmo spelling change automata and Kimmo morpheme lexicons to carry out the morphological analysis of a foreign language. (Well, perhaps foreign to a majority of you.)

Reading Preparation: In preparation for understanding the Kimmo system and both parts of this laboratory, you should read the following.

• Component I of this laboratory, which provides background on word parsing and the Kimmo system; Lecture Slides 1 & 2; and Notes 1, all on the course website.
• The Pc-Kimmo website, at http://www.sil.org/pckimmo/ , contains several documents that you should read to understand the system:
  o The general description of the approach: http://www.sil.org/pckimmo/two-level_phon.html
  o http://www.sil.org/pckimmo/v2/doc/rules.html gives in-depth coverage of how the finite-state machines implement spelling-change rules; at a minimum, you should read the section on how the automata actually work, described here: http://www.sil.org/pckimmo/v2/doc/Rules_2.html#subsec:3.2.1
  o The instructions for writing the spelling-change rules at: http://www.sil.org/pckimmo/v2/doc/Rules_4.html
  o A summary of the format for writing rules in section 7 of the online version of Pc-Kimmo that we use, at: www.ai.mit.edu/courses/6.863/pckimmoman.txt
• Optionally, depending on your background:
  o If you are unfamiliar with finite-state machines, read this section: http://www.sil.org/pckimmo/v2/doc/Rules_2.html#subsec:3.2.2
  o If you want to know more about rule conflicts (see also Question 5 below): http://www.sil.org/pckimmo/v2/doc/Rules_3.html#subsec:3.3.1

What you must do and what you must turn in

There are five questions (and an optional bonus question) that you must answer below, in Section 4. We'd like to get by with as little paper as possible for assignments. (Call it ecological necessity.) To this end, we'd like you to write up your answers as a web page for your lab report, and then just email the URL to the TA Catherine Havasi, [email protected]. Please remember that collaboration is encouraged, but please do write down the names of your collaborators at the beginning of the report. Also, please remember that cloned reports are not acceptable.

2. Running Kimmo: the basics

The Software

We provide two different versions of the Kimmo program, one with an older, command-line interface, pckimmo, and one, newly ported to Python, with a graphical interface, pykimmo.
They both provide nearly the same functionality for generating and recognizing words, tracing a parse, and printing pictures of finite-state machines. The graphical interface has distinct advantages in on-the-fly editing and display. Pckimmo has additional tracing functionality and the advantage of long-time stability. There is also a command-line, terminal-only version of pykimmo. The command-line versions of both programs are notably faster than the graphical version, especially with a large lexicon. Either version should give the same results for doing the laboratory, and we will alert you should there be any possibility that this might not be so. Here are the brief instructions for running the two.

• pykimmo: log in to an Athena workstation and then:

    athena% add 6.863
    athena% pykimmo

  This will spit out the commands to use the program in command-line mode if you should want, and then bring up the graphical user interface window, with a toy version of English rules and lexicon already loaded, and a panel into which you can type strings to either generate or recognize (see the picture on the next page). You should load the lab1a rules and lexicon by clicking on the ‘load’ button and then loading in the rules and lexicon pykimmo.rul and pykimmo.lex that will show up in a dialog box. You can then proceed with the rest of the exercises below.

• pckimmo: log in to an Athena workstation and then:

    athena% add 6.863
    athena% cd /mit/6.863/pckimmo-old
    athena% ./pckimmo

  This will bring up the command line interface:

    PC-KIMMO TWO-LEVEL PROCESSOR
    Version 1.0.8 (18 February 1992), Copyright 1992 SIL
    Type ? for help
    PC-KIMMO>

  You will now have to load English rules and a lexicon explicitly. You can then proceed with the rest of the exercises.

    PC-KIMMO>load rules englishpck.rul
    Rules being loaded from englishpck.rul
    PC-KIMMO>load lexicon englishpck.lex
    Lexicon being loaded from englishpck.lex

3. Trying out word recognition and word generation

You should now try out generating and recognizing the following strings; check that you get the results we give below. We give the pckimmo form for entering strings; in the pykimmo graphical version, you just type the strings into the lower left-hand text box and click ‘generate’ or ‘recognize’.

    PC-KIMMO>generate fox+s
    foxes
    PC-KIMMO>recognize foxes
    `fox+s [ N(fox)+PL ] `
    PC-KIMMO>generate fly+s
    flies
    PC-KIMMO>generate dogg+s
    doggs

(This
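
To make the generate and recognize results above concrete, here is a minimal Python sketch that reproduces them with two hand-written spelling changes. It is only a toy under stated assumptions: it is not the Kimmo algorithm, it does not use finite-state tables, and it is not the pckimmo or pykimmo interface. The names `generate`, `recognize`, `TOY_LEXICON`, and `SIBILANT_ENDINGS` are hypothetical, and the three-word lexicon is invented for illustration.

```python
# Toy illustration only: NOT the Kimmo two-level algorithm and NOT the
# pckimmo/pykimmo API. All names and the tiny lexicon are hypothetical.

SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")
VOWELS = "aeiou"

# Hypothetical morpheme lexicon: root -> part-of-speech label.
TOY_LEXICON = {"fox": "N", "fly": "N", "dog": "N"}


def generate(lexical):
    """Map a lexical form such as 'fox+s' to a surface form such as 'foxes'.

    Generation here does not consult the lexicon, so an unknown stem
    such as 'dogg' simply passes through with the suffix attached.
    """
    if "+" not in lexical:
        return lexical
    stem, suffix = lexical.split("+", 1)
    if suffix == "s":
        # Epenthesis: insert 'e' between a sibilant-final stem and 's'.
        if stem.endswith(SIBILANT_ENDINGS):
            return stem + "es"
        # y -> ie before 's' when the 'y' follows a consonant.
        if len(stem) >= 2 and stem[-1] == "y" and stem[-2] not in VOWELS:
            return stem[:-1] + "ies"
    # Default: drop the morpheme boundary and concatenate.
    return stem + suffix


def recognize(surface):
    """Return (lexical form, gloss) pairs whose generated form equals `surface`.

    Recognition is done by brute force: try each root alone and with the
    plural suffix, and keep the candidates that generate the surface form.
    """
    analyses = []
    for root, pos in TOY_LEXICON.items():
        if root == surface:
            analyses.append((root, f"{pos}({root})"))
        if generate(root + "+s") == surface:
            analyses.append((root + "+s", f"{pos}({root})+PL"))
    return analyses


if __name__ == "__main__":
    for form in ("fox+s", "fly+s", "dogg+s"):
        print(form, "->", generate(form))
    print("foxes ->", recognize("foxes"))
```

Unlike this sketch, which applies its rules one after another inside `generate`, the Kimmo system described in this handout expresses each spelling change as a finite-state machine and runs the machines in parallel over paired lexical and surface characters.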

