Path Diversity Media Streaming over Best Effort Packet Switched Networks

A Comparison of Phylogenetic Reconstruction Methods on an IE Dataset

Luay Nakhleh, Dept. of Computer Science, Rice University
Tandy Warnow, Dept. of Computer Sciences, University of [email protected]
[email protected] Ringe, Dept. of Linguistics, University of Pennsylvania
Steven N. Evans, Dept. of Statistics, University of [email protected]

[email protected] interested in the history of the Indo-European family of languages have used a variety of methods to estimate the phylogeny of the family, and have obtained widely differing results. In this paper we explore the reconstructions of the Indo-European phylogeny obtained by using the major phylogeny estimation procedures on an existing database of 336 characters (including lexical, phonological, and morphological characters) for 24 Indo-European languages. Our study finds that the different methods agree in part, but that there are also several striking differences. We discuss the reasons for these differences, and make proposals with respect to phylogenetic reconstruction in historical linguistics.

1 Introduction

Reconstruction of the phylogenies of language families is a part of historical linguistics which has recently received significant attention from the non-linguistic scientific research community, some of whom are interested in seeing if phylogenetic reconstruction methods originally designed for biological data can be used on linguistic data to good effect. In this paper we examine the results of using phylogenetic reconstruction methods from both biology and linguistics on the character database we have used over the last decade to analyze the diversification of the Indo-European family. In addition to varying the methods we use to analyze the dataset, we study the consequences for phylogenetic reconstruction of restricting the data to lexical characters alone, and of screening the data to eliminate characters that might have evolved with borrowing or have undergone parallel evolution.
Our study shows that the differences in the phylogenies obtained by different reconstruction methods are due at least in part to data selection, with analyses based upon datasets that use only lexical characters being probably less accurate than analyses based upon datasets that include morphological and phonological characters and that give these additional characters extra weight. We also find significant differences between methods, even on the same dataset. Finally, we find that equal treatment of characters is probably unwise, with improved results obtained by recognizing that some characters (notably characters derived from inflectional morphology and complex phonological characters) are less likely to evolve in parallel or with back mutation.

Our paper is organized as follows. We begin by defining the concepts and terminology in Section 2. The methods we use to analyze linguistic datasets are described in Section 3. In Section 4 we discuss the dataset we use to compare reconstruction methods, briefly discussing how the characters were selected and coded. The results of our phylogenetic analyses are presented in Section 5. We summarize our results and make recommendations about phylogenetic reconstruction in Section 6.

2 Basics

2.1 Characters

A (linguistic) character is any feature of languages that can take one or more forms; these different forms are called the “states” of the character. Our characters are of three types. For lexical characters the different states are cognate classes, so that two languages exhibit the same state for the lexical character if and only if they have cognates for the meaning associated with the lexical character. Phonological characters record the occurrence of sound changes within the (pre)history of the language; thus a typical phonological character has two states, depending on whether or not the sound change (or, more often, constellation of changes) has occurred in the development of each language.
Most of our morphological characters represent inflectional markers; like lexical characters, they are coded by cognation. Thus each character defines an equivalence relation on the language family, such that two languages are equivalent if they exhibit the same state for the character. Given a partition of a set into disjoint subsets, we can define an equivalence relation by making two languages equivalent if and only if they are in the same subset; thus, a partition of a set into disjoint subsets defines an equivalence relation (and the converse holds as well).

For each character, we can assign numbers to the states of the character so that the character is defined to be a function that assigns every language in a set L of languages a real number; the number assigned to the language is called the “state” of the character for that language. Thus, the states of all our characters are real numbers, and when we write c(L) for a language L and a character c, we mean the state of the character c exhibited by the language L. However, the particular real number used to label a state is irrelevant, and all that matters is whether two states are equal or different.

2.2 Homoplasy, Character Compatibility, and Perfect Phylogenies

The phenomenon of back–mutation and/or parallel evolution is called “homoplasy”. When there is no homoplasy in a character, then all changes of state for that character result in new states. When all the characters evolve without homoplasy down a tree, then the tree is called a “perfect phylogeny”, and each of the characters is said to be “compatible” on the tree.

For example, the characters c1 and c2 in Figure 1(b) are compatible with the tree T in Figure 2, whereas character c3 is not.

(a)
      c1  c2
L1    0   0
L2    0   0
L3    1   0
L4    1   1
L5    1   1

(b)
      c1  c2  c3
L1    0   0   0
L2    0   0   1
L3    1   0   1
L4    1   1   0
L5    1   1   0

Figure 1: (a) Five languages L1, ..., L5, with two characters c1 and c2.
(b) The same five languages with a third character c3.

[Figure 2 tree diagram: leaves L1, ..., L5; nodes labeled with (c1, c2) state pairs.]

Figure 2: A perfect phylogeny T for the languages and character states of Figure 1(a).

2.3 Study design

We examine the performance of six phylogeny reconstruction methods (two distance-based methods and four character-based methods) on four versions of an IE database. We evaluate the accuracy of these methods with respect to established aspects of the Indo-European history, and also with respect to the number and type of characters that are incompatible with each of the trees returned. We use the IE dataset we have developed over the last decade as the basic
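
The compatibility notion of Section 2.2 can be checked mechanically on the data of Figure 1(b). The sketch below is an illustration only, not the authors' procedure. It swaps in Fitch's small-parsimony count, a standard technique, and treats a character as compatible on a tree when the minimum number of state changes equals the number of distinct states minus one, so that every change produces a new state. The topology ((L1, L2), (L3, (L4, L5))) is an assumption, because the drawing of T is not recoverable here, and the names `TREE`, `CHARACTERS`, `fitch`, and `is_compatible` are hypothetical.

```python
# Assumed rooted binary topology for T (an assumption, not taken from Figure 2).
# Internal nodes are 2-tuples; leaves are language names.
TREE = (("L1", "L2"), ("L3", ("L4", "L5")))

# Character states copied from Figure 1(b).
CHARACTERS = {
    "c1": {"L1": 0, "L2": 0, "L3": 1, "L4": 1, "L5": 1},
    "c2": {"L1": 0, "L2": 0, "L3": 0, "L4": 1, "L5": 1},
    "c3": {"L1": 0, "L2": 1, "L3": 1, "L4": 0, "L5": 0},
}


def fitch(tree, states):
    """Return (candidate state set, minimum number of changes) for a subtree."""
    if isinstance(tree, str):  # a leaf carries exactly its observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:  # children can agree: no extra change at this node
        return common, left_cost + right_cost
    # children cannot agree: one more change is unavoidable
    return left_set | right_set, left_cost + right_cost + 1


def is_compatible(tree, states):
    """True if the character can evolve on the tree without homoplasy."""
    _, changes = fitch(tree, states)
    return changes == len(set(states.values())) - 1


if __name__ == "__main__":
    for name, states in CHARACTERS.items():
        print(name, "compatible" if is_compatible(TREE, states) else "incompatible")
```

On the assumed tree, c1 and c2 each need a single change, while c3 needs two changes for its two states, which agrees with the statement that c3 is not compatible with T. Because only equality of states is tested, relabeling the states with other numbers leaves the result unchanged.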

