VCU HGEN 619 - Software for Behavioral Genetics - D343143

School name Virginia Commonwealth University

Course Hgen 619- Quantitative Genetics

Software for Behavioral Genetics

Historical Background

The term software was coined in the 1950s by the eminent statistician John Tukey (1915–2000). It usually refers to the program and algorithms used to control the electronic machinery (hardware) of a computer, and may include the documentation. Typically, software consists of source code, which is then compiled into machine-executable code which the end user applies in a specific application. This general scenario applies to software for genetically informative studies, and might be considered to have existed before Professor Tukey invented the term in the mid-twentieth century. Algorithms are at the heart of software, and this term dates back to the ninth-century Iranian mathematician, Al-Khawarizmi. Although formal analysis of data collected from twins did not begin until the 1920s, it was, nevertheless, algorithmic in form. A heuristic estimate of heritability, such as twice the difference between the MZ (monozygotic) and the DZ (dizygotic) twin correlations, may be implemented using mental arithmetic, the back of an envelope, or a supercomputer. In all cases the algorithm constitutes software; it is only the hardware that differs.

Software for Model-fitting

Much current behavior genetic analysis is built upon the statistical framework of maximum likelihood, attributed to Ronald Fisher [4]. As its name suggests, maximum likelihood requires an algorithm for optimization, of which there are many: some general and some specific to particular applications. All such methods use input data whose likelihood is computed under a particular statistical model. The values of the parameters of this model are not known, but it is often possible to obtain the set of values that maximize the likelihood.
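The heuristic estimate mentioned under Historical Background is simple enough to write as a single function. The sketch below is illustrative only: the function name and the example correlations are hypothetical and are not taken from any study or software package.

```python
def heuristic_heritability(r_mz, r_dz):
    """Twice the difference between the MZ and DZ twin correlations."""
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("a correlation must lie between -1 and 1")
    # The result is a rough estimate and can fall outside the 0-1 range
    # when the observed correlations do not fit the simple genetic model.
    return 2.0 * (r_mz - r_dz)


# Hypothetical correlations, chosen only to illustrate the arithmetic.
h2 = heuristic_heritability(r_mz=0.75, r_dz=0.5)
print(h2)  # 0.5
```

Model-fitting by maximum likelihood replaces this arithmetic with a search for the parameter values under which the observed data are most probable.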
These maximum likelihood estimates have two especially desirable statistical properties: they are asymptotically unbiased, and they have minimum variance of all asymptotically unbiased estimates. Therefore, in the analysis of both genetic linkage (see Linkage Analysis) using genetic markers, and of twin studies to estimate variance components, there was motivation to pursue these more complex methods. This section focuses on twin studies and their extensions.

Before the advent of high-speed computers, maximum likelihood estimation would typically involve: (a) writing out the formula for the likelihood; (b) finding the first and second derivatives of this function with respect to the parameters of the model; and (c) solving the (often nonlinear) simultaneous equations to find those values of the parameters that maximize the likelihood, that is, where the first derivatives are zero and the second derivatives are negative. The first of these steps is often relatively simple, as it typically involves writing out the formula for the probability density function (pdf) (see Catalogue of Probability Density Functions) of the data as a function of the parameters of the model. In many cases, however, the second and third steps can prove to be challenging or intractable. Therefore, the past 25 years have seen the advent of software designed to estimate parameters under increasingly general conditions.

Early applications of software for numerical optimization to behavior genetic data primarily consisted of purpose-built computer programs, which were usually written in the high-level language FORTRAN, originally developed in the 1950s by John Backus. From the 1960s to the 1980s this was very much the language of choice, primarily because a large library of numerical algorithms had been developed with it. The availability of these libraries saved behavior geneticists from having to write quite complex code for optimization themselves.
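To give a sense of what hand-written optimization code involves, steps (a) to (c) above can be followed for a deliberately simple case: estimating the variance of normally distributed scores whose mean is assumed to be zero. This toy problem is an assumption made for illustration, not a twin model; it was chosen because the answer is known in closed form (the mean of the squared scores), so the iterative solution can be checked. All function names and the data are hypothetical.

```python
import math


def log_likelihood(v, data):
    """Step (a): log-likelihood of zero-mean normal data with variance v."""
    n = len(data)
    s = sum(x * x for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * v) - s / (2.0 * v)


def first_derivative(v, data):
    """Step (b): derivative of the log-likelihood with respect to v."""
    n = len(data)
    s = sum(x * x for x in data)
    return -n / (2.0 * v) + s / (2.0 * v ** 2)


def second_derivative(v, data):
    """Step (b): second derivative of the log-likelihood with respect to v."""
    n = len(data)
    s = sum(x * x for x in data)
    return n / (2.0 * v ** 2) - s / v ** 3


def newton_mle(data, start=1.0, tol=1e-12, max_iter=50):
    """Step (c): Newton-Raphson search for the zero of the first derivative."""
    v = start
    for _ in range(max_iter):
        step = first_derivative(v, data) / second_derivative(v, data)
        v -= step
        if v <= 0.0:
            raise ValueError("variance left the admissible range; try another start")
        if abs(step) < tol:
            break
    return v


# Hypothetical scores; their mean square is 8.5 / 5 = 1.7.
sample = [1.0, -2.0, 0.5, 1.5, -1.0]
v_hat = newton_mle(sample)
print(v_hat)                             # close to 1.7
print(second_derivative(v_hat, sample))  # negative, so this is a maximum
```

Even in this one-parameter case the derivatives must be worked out and coded by hand, and a poor starting value can send the search outside the admissible range. With several parameters the derivatives become matrices, which is where general-purpose optimization libraries earn their keep.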
Two widely used libraries were MINUIT from the Centre Européen de Recherche Nucléaire (CERN) and certain routines from the E04 library of the Numerical Algorithms Group (NAg). The latter were developed by Professor Murray and colleagues in the Systems Optimization Laboratory at Stanford University. A key advantage of these routines was that they incorporated methods to obtain numerical estimates of the first and second derivatives, rather than requiring the user to provide them. Relieved of the burden of finding algebraic expressions for the derivatives, behavior geneticists in the 1970s and 1980s were able to tackle a wider variety of both statistical and substantive problems [3, 6].

Nevertheless, some problems remained which curtailed the widespread adoption of model-fitting by maximum likelihood. Not least of these was that geneticists had to learn to use FORTRAN or a similar programming language in order to fit models to their data, particularly if they wished to fit models for which no suitable software was already available. Those skilled in programming were able to assemble loose collections of programs, but these typically involved idiosyncratic formats for data input, program control and interpretation of output. These limitations in turn made it difficult to communicate use of the software to other users, difficult to modify the code for alternative types of data or pedigree structure, and difficult to fit alternative statistical models.

Fortunately, the development by Karl Jöreskog and Dag Sörbom of a more general program for maximum likelihood estimation, called LISREL, alleviated many of these problems [1, 7]. Although other programs existed, such as COSAN, developed by C. Fraser & R. P. McDonald, these proved to be less popular with the behavior genetic research community. In part, this was because they did not facilitate the simultaneous analysis of data collected from multiple groups, such as from MZ and DZ twin pairs, which is a prerequisite for estimating heritability and other components of variance. The heart of LISREL's flexibility was its matrix algebra formula for the specification of what are now usually called structural equation models. In essence, early versions of the program allowed the user to specify the elements of matrices in the formula:

$$
\Sigma =
\begin{pmatrix}
\Lambda_y A (\Gamma \Phi \Gamma' + \Psi) A' \Lambda_y' + \Theta_\varepsilon & \Lambda_y A \Gamma \Phi \Lambda_x' + \Theta_{\delta\varepsilon}' \\
\Lambda_x \Phi \Gamma' A' \Lambda_y' + \Theta_{\delta\varepsilon} & \Lambda_x \Phi \Lambda_x' + \Theta_\delta
\end{pmatrix},
\tag{1}
$$

where $A = (I - B)^{-1}$. This somewhat cumbersome expression is the predicted covariance within a set of dependent y variables (upper left), within a set of independent variables, x (lower right), and between these two sets (lower left and upper right). Using this framework, a wide variety of models may be specified.
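The block structure of the predicted covariance matrix can be made concrete with a small numerical sketch. It assumes the commonly cited LISREL form of the four blocks (factor loadings for y and x, regressions among and onto the dependent latent variables, and residual covariance matrices) and is not LISREL's own code or input syntax. The helper functions, argument names and numerical values are all hypothetical, and plain Python lists are used so that the example has no dependencies.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    aug = [list(map(float, row)) + ident for row, ident in zip(a, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def chain(*mats):
    """Multiply several matrices from left to right."""
    result = mats[0]
    for m in mats[1:]:
        result = matmul(result, m)
    return result


def predicted_covariance(ly, lx, b, g, ph, ps, te, td, tde):
    """Assemble the four blocks of the predicted covariance matrix."""
    i_minus_b = [[i - x for i, x in zip(ri, rb)] for ri, rb in zip(identity(len(b)), b)]
    a = inverse(i_minus_b)  # A = (I - B)^-1
    inner = add(chain(g, ph, transpose(g)), ps)
    yy = add(chain(ly, a, inner, transpose(a), transpose(ly)), te)    # upper left
    xy = add(chain(lx, ph, transpose(g), transpose(a), transpose(ly)), tde)  # lower left
    xx = add(chain(lx, ph, transpose(lx)), td)                        # lower right
    yx = transpose(xy)                                                # upper right
    return [ry + ryx for ry, ryx in zip(yy, yx)] + [rxy + rx for rxy, rx in zip(xy, xx)]


# Hypothetical model: two dependent latent variables, one independent.
sigma = predicted_covariance(
    ly=identity(2), lx=[[1.0]],
    b=[[0.0, 0.0], [0.5, 0.0]], g=[[1.0], [0.0]],
    ph=[[1.0]], ps=[[0.5, 0.0], [0.0, 0.5]],
    te=[[0.25, 0.0], [0.0, 0.25]], td=[[0.25]], tde=[[0.0, 0.0]],
)
for row in sigma:
    print(row)
```

Fitting a model then amounts to adjusting the free elements of these matrices until the predicted covariance matrix is as close as possible, in the maximum likelihood sense, to the one observed in each group.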
