
Adaptable, Community Controlled Language Technologies


Slides in this deck:
- The double life of an endangered language researcher
- Outline
- Suggested Research Program
- Endangered Languages
- Importance of Endangered Languages
- Three Language Communities
- Other sources of information
- North Slope Iñupiat
- Properties of Iñupiaq (from notes by Lawrence Kaplan)
- Type-token curves
- Type-token ratio curves
- Iñupiaq Orthography and Fonts
- Mapuche
- Properties of Mapudungun (Zúñiga 2000)
- Type-token curve
- Mapudungun Orthography
- Anishinaabe
- Low (Digital) Resources
- Beyond Low Resources
- Language technologies in informal registers (language styles)
- Rapid change
- Attitudes toward change: examples from Ojibwe
- Attitudes toward change
- Many small varieties
- Support for many small varieties
- Many small varieties
- Morphosyntactic divergences
- What language technologies are useful?
- What do language communities want?
- What about MT?
- Suggested Research Program
- AVENUE: Mapudungun and Iñupiaq
- AVENUE Architecture
- Transfer Rule Formalism
- Transfer Rule Formalism (II)
- Mapudungun
- Mapudungun-to-Spanish
- Mapudungun morphemes → Spanish words
- Mapudungun dual → Spanish plural
- Kofketun → I eat bread
- Morphemes that correspond to Spanish tense, aspect, and mood
- Feature manipulation before transfer
- Test suite
- Evaluation
- Sample Output
- Iñupiaq
- Iñupiaq resources
- Iñupiaq XFST transducer
- Morphophonemics
- A call to action

Lori Levin
Language Technologies Institute, Carnegie Mellon University
Pictures by Rodolfo Vega and Laura Tomokiyo

The double life of an endangered language researcher
Researchers urgently need to try new things. [endangered [language researcher]]
Speakers of endangered languages urgently need tools that work. [[endangered language] researcher]
(Picture by Laura Tomokiyo)

Outline
- The needs of language communities
- The AVENUE project's experience with:
  - Iñupiaq (Alaska)
  - Mapudungun (Chile)
- Suggested research program:
  - Beyond bootstrapping from low resources
  - Genre and register adaptation
  - Translation between related languages and dialects
  - Non-synchronous grammars to handle extreme agglutination and polysynthesis
  - Technologies based on mobile phones
  - New techniques: learning in the wild (in the context of use), active learning, self-training, etc.

Endangered Languages
- Around 6000 human languages are currently spoken.
- 90% are not expected to survive the next century.
- In the US, about 200 indigenous languages are still spoken.
- Only a few will survive the next 30 years (Noori, p.c.).

Importance of Endangered Languages
- Cultural loss: stories, songs, ethnic identity
- Scientific loss: the study of human language will suffer from losing 90% of the samples
- Another kind of scientific loss: names of places, geological formations, plants, animals, etc.

Three Language Communities
- North Slope Iñupiat (Alaska)
  - Edna MacLean (linguist, lexicographer, native speaker)
  - Larry Kaplan (linguist, Alaska Native Language Center, University of Alaska Fairbanks)
  - Aric Bills (linguistics student, UAF)
- Mapuche (Chile, Argentina)
  - Rosendo Huisca (language expert, lexicographer, native speaker)
  - Eliseo Cañulef (bilingual education and language maintenance)
- Anishinaabe (Ojibwe, Potawatomi, Odawa) (Great Lakes)
  - Margaret Noori (linguist, language revitalization)

Other sources of information
- Delyth Prys: Welsh; native speaker; language technologies developer, terminologist, language revitalization
- Jonathan Amith: Nahuatl (Mexico); anthropologist, linguist; language technologies developer
- Per Langgaard: Kalaallisut (Greenland); Greenlandic Government; language technologies developer

North Slope Iñupiat
- Language: North Slope Iñupiaq
- About 5000 people
- Almost all native speakers are over 40 years old
- Some bilingual education and second-language education
- Status: endangered
- Related to languages whose status is better: Inuktitut (Canada), Kalaallisut (Greenland)
- Related to languages that are also endangered: Kobuk Pass Inupiaq

Properties of Iñupiaq (from notes by Lawrence Kaplan)
Vowels: a i u aa ii uu ai ia au ua iu ui
Consonants:
  p t ch k q ‘
  (f) ł ł s sr kh (x) qh (X) h
  v l ļ z y g (ɣ) ġ (ʁ)
  m n ñ ŋ

Properties of Iñupiaq: word structure
Stem (noun or verb) – postbase(s) (optional) – inflection – enclitic (optional)
  Niġi – ñiaq – tu(q) – guuq
  eat – will – s/he – it is said
  'It is said that s/he will eat.'

Properties of Iñupiaq: dual number
  Niġi-ruŋa. 'I am eating' or 'I ate.' (singular)
  Niġi-ruguk. 'We two are eating' or 'We two ate.' (dual)
  Niġi-rugut. 'We are eating' or 'We ate.' (plural)

Properties of Iñupiaq: ergative case (transitive sentences)
  Aŋuti-m tuttu niġi-gaa.
  man-Rel. caribou-Abs. eat-trans.3s-3s
  'The man ate/is eating caribou.'
  Tuttu-m aŋun niġi-gaa.
  caribou-Rel. man-Abs. eat-trans.3s-3s
  'The caribou ate the man.'

Properties of Iñupiaq: anti-passive (indefinite object)
  Tuttu-mik tautuk-tuŋa.
  'I ate caribou.' or 'I am eating caribou.'
  Compare: Aŋuti-m tuttu niġi-gaa.
  man-Rel. caribou-Abs. eat-trans.3s-3s
  'The man ate/is eating caribou.'

Properties of Iñupiaq: long, multi-morphemic words
  Tauqsiġñiaġviŋmuŋniaŋitchugut.
  'We won't go to the store.'
  Kalaallisut (Greenlandic; Per Langgaard, p.c.):
  Pittsburghimukarthussaqarnavianngilaq
  Pittsburgh+PROP+Trim+SG+kar+tuq+ssaq+qar+naviar+nngit+v+IND+3SG
  'It is not likely that anyone is going to Pittsburgh.'

Type-token curves
[Figure: number of types vs. number of tokens (0 to 10,000 tokens) for English, Arabic, Hocąk, Iñupiaq, and Finnish.]

Type-token ratio curves
[Figure: type/token ratio vs. number of tokens (10 to about 9,500 tokens) for English, Arabic, Hocąk, and Iñupiaq.]

Iñupiaq Orthography and Fonts

