Appeared in: Proceedings of the Class of 2003 Senior Conference, pages 35–39
Computer Science Department, Swarthmore College

Machine Translation Evaluation by Document Classification and Clustering

Feng He and Pascal Troemel
Swarthmore College, 500 College Avenue, Swarthmore PA 19081, USA
{feng,troemel}@cs.swarthmore.edu

Abstract

We propose a Machine Translation evaluation system which does not require human-translated reference texts. The system compares the performance of a computer's execution of NLP tasks on source text and on translated text to judge how effective the translation is. The difference in accuracy of the NLP task executions is used as a gauge for judging the competence of the Babelfish online translation system.

Keywords: Machine Translation Evaluation, Document Classification, Clustering.

1 Introduction

1.1 Machine Translation Evaluation

Machine translation research has been going on for several decades, and there are a number of systems available for use, mostly between English and a European or Asian language. Notable examples are products from Systran, which are used in Altavista's Babelfish online translation service. Machine translation evaluation has long been an extremely arduous task which requires much human input; more recently, the BLEU evaluation system [3] has made use of a much more automated, and thus more practical, approach. However, the BLEU system still requires the presence of several correct, human-translated reference texts (see Section 2.1 for an overview of the BLEU system). We propose a system which does not have this requirement, a system that is capable of judging the competence of a translation simply by comparing the source and target texts. We believe that this freedom from human input is important; human translation is a time-consuming and costly step in the MT evaluation process, and cutting it out altogether will undoubtedly save resources.

We attempt to either prove or disprove the notion that although a machine translation may seem ineffective to a human reader, it still holds sufficient correct information to allow a computer to adequately perform the NLP tasks of text classification and clustering on it. If this is indeed the case, then even though machine translations may not yet be acceptable as accurate representations of works of literature in their original language, they may be far from useless to a computer capable of interpreting ("understanding") them.

Ultimately, a translation should be judged on how much information it retains from the original text. Following this notion, we judge the effectiveness of a translation system by comparing the accuracy results of a computer's NLP task execution on the source text and on the target text. We expect a drop in performance that can then be interpreted as "acceptable" or "unacceptable," which serves as an evaluation of the system. Indeed, the drop in performance gives us a quantitative measure of the translation's effectiveness.

In Section 2, we discuss a few relevant examples of previous work in the area of machine translation evaluation. Section 3 describes how we collect data. The various experiments we performed are discussed in Section 4. In Sections 5 and 7 we share our results and conclusions.
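To make the accuracy-drop measure described above concrete, the sketch below trains the same kind of text classifier on the source-language documents and on their machine translations, then reports the difference in held-out accuracy. This is only an illustration of the idea: the scikit-learn bag-of-words naive Bayes classifier, the train/test split, and the variable names are assumptions of this sketch, not the setup used in the paper (whose experiments rely on the Bow toolkit described in Section 2.2).

# Sketch of the accuracy-drop measure (illustration only; the paper's
# experiments use the Bow toolkit, not this code). `source_docs` and
# `translated_docs` are assumed to be parallel lists of document strings,
# aligned with a single list of class labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def classification_accuracy(docs, labels):
    # Hold out 30% of the documents, train a bag-of-words naive Bayes
    # classifier on the rest, and score it on the held-out portion.
    train_docs, test_docs, train_y, test_y = train_test_split(
        docs, labels, test_size=0.3, random_state=0)
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(train_docs, train_y)
    return model.score(test_docs, test_y)

def accuracy_drop(source_docs, translated_docs, labels):
    # A smaller drop means the translation preserved more task-relevant
    # information; this difference is the evaluation score.
    return (classification_accuracy(source_docs, labels)
            - classification_accuracy(translated_docs, labels))

The same comparison could be made with a clustering-quality measure in place of classification accuracy; only the task changes, not the source-versus-translation structure of the evaluation.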
1.2 Text Classification and Clustering

Text classification and clustering are two common NLP tasks that have been shown to obtain good results with statistical approaches. Classification refers to the assigning of new documents to existing classes. The models for the existing classes are built from documents known to belong to those classes. Usually a document is assigned to a single class, but it is also possible for a document to have multiple class tags.

Clustering refers to the dividing of a collection of documents into groups (clusters). These clusters are not pre-defined, although the number of clusters can be specified. It is also possible to create a hierarchy of clusters, in which a cluster is further divided into sub-clusters.

Classification and clustering have long been studied, and there are many effective toolkits for both tasks. These two NLP tasks are natural choices for our experiments because they can be performed effectively on our data sets.

2 Related Work

2.1 MT Evaluation

The BLEU machine translation evaluation system [3], proposed in 2002, produced very respectable results, effectively emulating a human translation judge. The system produced a score for a given translation by comparing it to a group of "perfect" human-translated reference texts using n-gram precision values (sketched below). After a few necessary details such as a brevity penalty had been added, the BLEU system's scores were found to correlate closely with those given by a variety of human judges on a group of test data. The main weakness of this system is its dependency on human-translated reference texts. Although it is far more automated than older approaches, which relied entirely on human evaluation, the BLEU method still requires several correct translations. This means that, for every new text that the MT system is tested on, the BLEU evaluation system must first be presented with good reference texts, which must be produced by a group of humans. This can get expensive when a machine translation system is tested on numerous documents, a case that is clearly possible during the production of a truly effective MT system.

2.2 Tools

The Bow toolkit [1] was designed for statistical language modeling, text retrieval, classification, and clustering. It provides a simple means of performing NLP tasks on a body of newsgroup postings, and was thus a very useful tool for us. We produced our results for text classification and clustering with the help of this system.

2.3 Other Related Works

In [4], Weiss et al. showed that newsgroup postings can be reasonably classified using statistical models.

Palmer et al. [2] investigated the effect of Chinese word segmentation on information retrieval. This work suggests that well-segmented Chinese text will improve the performance of NLP tasks. Chinese segmentation is an active area of research, partly because current systems produce very poor segmentations. As we do not have a working segmenter for Chinese text, we expect our results to be affected accordingly.

Finally, Yang [5] gives a good overview of statistical text classification.

3 Data

The
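For concreteness, the n-gram scoring scheme behind BLEU [3], as summarized in Section 2.1, combines clipped n-gram precisions with a brevity penalty. The sketch below is a schematic single-reference version with equal n-gram weights, written purely for illustration; it is not the official BLEU implementation and is not part of the paper.

# Schematic single-reference BLEU (illustration only).
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    # candidate and reference are token lists; returns a score in [0, 1].
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count at its count in the reference,
        # so repeating a correct word cannot inflate the precision.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        log_precisions.append(math.log(max(clipped, 1e-9) / total))
    # Brevity penalty: translations shorter than the reference are penalized.
    c_len, r_len = len(candidate), len(reference)
    bp = 1.0 if c_len > r_len else math.exp(1 - r_len / max(c_len, 1))
    # Equal-weight geometric mean of the n-gram precisions, scaled by bp.
    return bp * math.exp(sum(log_precisions) / max_n)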

