Enriching CHILDES for Morphosyntactic Analysis

1. Introduction
2. Analysis by Transcript Scanning
3. Analysis by Lexical Tracking
4. Measures of Morphosyntactic Development
5. Generative Frameworks
6. Analysis based on automatic morphosyntactic coding
6.1. MOR and FST
6.2. Understanding MOR
6.3. Compounds and Complex Forms
6.4. Lemmatization
6.5. Errors and Replacements
7. Using MOR with a New Corpus
8. Affixes and Control Features
9. MOR for Bilingual Corpora
10. Training POST
11. Difficult decisions
12. Building MOR Grammars
13. Chinese MOR
14. GRASP
15. Research Using the New Infrastructure
16. Next Steps
17. Conclusion
References

Brian MacWhinney
Carnegie Mellon University

1. Introduction

The modern study of child language development owes much to the methodological and conceptual advances introduced by Brown (1973). In his study of the language development of Adam, Eve, and Sarah, Roger Brown focused on a variety of core measurement issues, such as acquisition sequence, growth curves, morpheme inventories, productivity analysis, grammar formulation, and sampling methodology. The basic question that Brown was trying to answer was how one could use transcripts of interactions between children and adults to test theoretical claims regarding the child’s learning of grammar. Like many other child language researchers, Brown considered the utterances produced by children to be a remarkably rich data source for testing theoretical claims. At the same time, Brown realized that one needed to specify a highly systematic methodology for collecting and analyzing these spontaneous productions.

Language acquisition theory has advanced in many ways since Brown (1973), but we are still dealing with many of the same basic methodological issues he confronted. Elaborating on Brown’s approach, researchers have formulated increasingly reliable methods for measuring the growth of grammar, or morphosyntax, in the child.
These new approaches serve to extend Brown’s vision into the modern world of computers and computational linguistics. New methods for tagging parts of speech and grammatical relations now open up new and more powerful ways of testing hypotheses and models regarding children’s language learning.

The current paper examines a particular approach to morphosyntactic analysis that has been elaborated in the context of the CHILDES (Child Language Data Exchange System) database. Readers unfamiliar with this database and its role in child language acquisition research may find it useful to download and study the materials (manuals, programs, and database) that are available for free over the web at http://childes.psy.cmu.edu. However, before doing this, users should read the "Ground Rules" for proper usage of the system. This database now contains over 44 million spoken words from 28 different languages. In fact, CHILDES is the largest corpus of conversational spoken language data currently in existence. In terms of size, the next largest collection of conversational data is the British National Corpus, with 5 million words. What makes CHILDES a single corpus is the fact that all of the data in the system are consistently coded using a single transcript format called CHAT. Moreover, for several languages, all of the corpora have been tagged for part of speech using an automatic tagging program called MOR.

When Catherine Snow and I proposed the formation of the CHILDES database in 1984, we envisioned that the construction of a large corpus base would allow child language researchers to improve the empirical grounding of their analyses. In fact, the overwhelming majority of new studies of the development of grammatical production rely on the programs and data in the CHILDES database. In 2002, we conducted a review of articles based on the use of the database and found that more than 2000 articles had used the data and/or programs.
The fact that CHILDES has had this effect on the field is enormously gratifying to all of us who have worked to build the database. At the same time, the quality and size of the database constitute a testimony to the collegiality of the many researchers in child language who have contributed their data for the use of future generations.

For the future, our goal is to build on these successful uses of the database to promote even more high-quality transcription, analysis, and research. In order to move in this direction, it is important for the research community to understand why we have devoted so much attention to the improvement of morphosyntactic coding in CHILDES.

To communicate effectively regarding this new morphosyntactic coding, we need to address the interests of three very different types of readers. Some readers are already very familiar with CHILDES and have perhaps already worked with the development and use of tools like MOR and POST. For these readers, this chapter is designed to highlight problematic areas in morphosyntactic coding and areas of new activity. These readers should be aware that there have been major improvements in the programs and database over the last ten years. As a result, commands that worked with an earlier version of the programs will no longer work in the same way. All active researchers may therefore wish to use this chapter as a way of refreshing their understanding of the CHILDES and TalkBank tools. A second group of readers will have extensive background in computational methods, but little familiarity with the CHILDES corpus. For these readers, this chapter is an introduction to the possibilities that CHILDES offers for the development of new computational approaches and analyses.
Finally, for child language researchers who are not yet familiar with the use of CHILDES for studying grammatical development, this chapter should be approached as an introduction to the possibilities that are now available. Readers in this last group will find some of the sections rather technical. Beginning users do not need to master all of these technical details at once. Instead, they can treat the chapter as an overview of modes of analysis that they may wish to use at some point in the future.

Before embarking on our review of computational tools in CHILDES, it is helpful to review briefly the ways in which researchers have come to use transcripts to study morphosyntactic development. When Brown collected his corpora back in the 1960s, the application of generative grammar to language development