Interpreting anonymous DNA samples from mass disasters

BIOINFORMATICS
Vol. 22 no. 14 2006, pages e298–e306
doi:10.1093/bioinformatics/btl200

Interpreting anonymous DNA samples from mass disasters: probabilistic forensic inference using genetic markers

Tien-ho Lin1, Eugene W. Myers2 and Eric P. Xing1,*
1School of Computer Science, Carnegie Mellon University, Pittsburgh, PA and 2HHMI Janelia Farms Research Campus, Ashburn, VA

ABSTRACT

Motivation: The problem of identifying victims in a mass disaster using DNA fingerprints involves a scale of computation that requires efficient and accurate algorithms. In a typical scenario there are hundreds of samples taken from remains that must be matched to the pedigrees of the alleged victims' surviving relatives. Moreover, the samples are often degraded due to heat and exposure. To develop a competent method for this type of forensic inference problem, the complicated quality issues of DNA typing need to be handled appropriately, the matches between every sample and every family must be considered, and the confidence of each match needs to be provided.

Results: We present a unified probabilistic framework that efficiently clusters samples, conservatively eliminates implausible sample-pedigree pairings, and handles both degraded samples (missing values) and experimental errors in producing and/or reading a genotype. We present a method that confidently excludes forensically unambiguous sample-family matches from the large hypothesis space of candidate matches, based on posterior probabilistic inference. Due to the high confidentiality of disaster DNA data, simulation experiments are commonly performed and are used here for validation. Our framework is shown to be robust to these errors at levels typical in real applications.
Furthermore, the flexibility in the probabilistic models makes it possible to extend this framework to include other biological factors such as interdependent markers, mitochondrial sequences, and blood type.

Availability: The software and data sets are available from the authors upon request.

Contact: [email protected]

INTRODUCTION

Rapid advances in genotyping technology and mathematical theories of pedigrees have made their use commonplace in traditional forensic applications such as victim or perpetrator identification and paternity testing, even when family structures are complex or sample mixtures and mutations are involved (Mortera et al., 2003). A natural next step is to enlarge the scale of genetic forensic inference to mass disasters, such as airplane crashes, terrorist bombings, or battlefields, in which hundreds or even thousands of remains, usually highly degraded, have to be identified for all the victims according to DNA evidence from candidate family members (Egeland et al., 2000; Lauritzen and Sheehan, 2003). In addition to issues related to the increased scale of the problem, such a problem also poses new technical challenges such as the presence of errors in the genotypes and pedigrees, incomplete genetic information, and the need for decision making with very high confidence. (This last issue is typical of forensic cases, where a seemingly low-probability event such as an incorrect victim/family match can have serious legal consequences, and must be determined with a confidence much more stringent than that usually adopted in experimental biology.)

DNA typing has long been used in forensic investigations, but until a decade ago, mass disaster victim identification generally relied on dental and medical records, fingerprints, and even photographic evidence and personal effects (Ballantyne, 1997). These techniques require comparison between ante mortem (AM) information for the victim and post mortem (PM) information of the remains.
However, in most mass disaster scenarios, AM information is not available for all victims and bodies are not intact, rendering such methods ineffective. Whitaker et al. (1995) established the use of short tandem repeat (STR) typing, or microsatellite markers, in mass disaster identification, and Olaisen et al. (1997) applied it to victim identification in the 1996 Spitsbergen aircraft accident, in which it proved to be highly reliable. A fingerprint set of thirteen STR loci called the Combined DNA Index System (CODIS) is now in routine use by the FBI, and has become a major tool in difficult disaster victim identification cases (Hsu et al., 1999; Cash et al., 2003).

While the basic problem of computing the likelihood ratio that a given sample is part of a given pedigree versus the null hypothesis of a random sample has been extensively studied (Olaisen et al., 1997), the inference problem of matching many pedigrees against many samples has not. Specialized software tools have been developed for large scale mass disaster identification (Cash et al., 2003), including the use of mitochondrial DNA (mtDNA) and single nucleotide polymorphisms (SNPs), but the matching algorithms they use only rank the likely samples for each victim and the likely victims for each sample. The complex interactions of all family evidence and all samples are not explored, and a great amount of expert involvement is still required. Moreover, there is currently no systematic solution that addresses all the complicating factors: body part clustering, arbitrary pedigrees and their vetting, experimental genotyping error for the samples, partial genotypes due to heat and pressure damage of the DNA, and confidence of a cluster to family match based on other likely and [...] family. This paper presents an architecture for the problem and a probabilistic framework that incorporates these uncertainties and scales to the required problem sizes.

We consider the following problem. We are given N family pedigrees

*To whom correspondence should be addressed.

The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact [email protected]
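
The likelihood ratio referred to in the introduction, which compares the hypothesis that a sample belongs to a given pedigree against the null hypothesis of a random sample, can be illustrated in its simplest form. The sketch below is not the framework of this paper: it assumes the smallest possible pedigree (one typed parent, with the sample as the alleged child), independent loci, Hardy-Weinberg proportions, and no mutation or genotyping error. The locus names, allele frequencies, and function names are all hypothetical. An untyped locus in a degraded sample is simply skipped, which corresponds to a factor of 1.

```python
from math import prod

# Hypothetical allele frequencies for two made-up loci (illustrative values only).
ALLELE_FREQ = {
    "locus_A": {"12": 0.2, "13": 0.3, "14": 0.5},
    "locus_B": {"8": 0.25, "9": 0.75},
}


def random_genotype_prob(genotype, freq):
    """Hardy-Weinberg probability of an unordered genotype in the population."""
    a, b = genotype
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]


def child_given_parent_prob(child, parent, freq):
    """P(child genotype | one typed parent), other parent drawn from the population.

    Assumes no mutation and no typing error (a simplification).
    """
    a, b = child
    total = 0.0
    for transmitted in parent:  # each parental allele is passed on with probability 1/2
        if a == b:
            if transmitted == a:
                total += 0.5 * freq[a]
        else:
            if transmitted == a:
                total += 0.5 * freq[b]
            if transmitted == b:
                total += 0.5 * freq[a]
    return total


def likelihood_ratio(sample, parent, allele_freq):
    """Product over loci of P(sample | child of parent) / P(sample | random person)."""
    ratios = []
    for locus, freq in allele_freq.items():
        s, p = sample.get(locus), parent.get(locus)
        if s is None or p is None:
            continue  # untyped locus: no information, contributes a factor of 1
        ratios.append(child_given_parent_prob(s, p, freq) / random_genotype_prob(s, freq))
    return prod(ratios)


parent = {"locus_A": ("12", "13"), "locus_B": ("9", "9")}
degraded_sample = {"locus_A": ("12", "14"), "locus_B": None}  # locus_B failed to type
print(likelihood_ratio(degraded_sample, parent, ALLELE_FREQ))
```

A ratio above 1 favours the relationship, a ratio of 0 excludes it under these assumptions, and a sample with no typed loci yields exactly 1. A realistic treatment would also need to allow for genotyping error and mutation, so that a single mismatching locus does not force an exclusion.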