Bayesian Analysis of Haplotypes for Linkage Disequilibrium Mapping

Jun S. Liu,1,6 Chiara Sabatti,2 Jun Teng,3 Bronya J.B. Keats,4 and Neil Risch5

1 Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, USA; 2 Department of Statistics, University of California, Los Angeles, California 90095, USA; 3 JP Morgan, New York, New York 10036, USA; 4 Louisiana State University, Department of Genetics, Health Science Center, New Orleans, Louisiana 70112, USA; 5 Department of Genetics, Stanford University, Stanford, California 94305, USA

6 Corresponding author. E-MAIL [email protected]; FAX (617) 496-8057.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.194801.
Genome Research 11:1716–1724 ©2001 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/01 $5.00; www.genome.org

Haplotype analysis of disease chromosomes can help identify probable historical recombination events and localize disease mutations. Most available analyses use only marginal and pairwise allele frequency information. We have developed a Bayesian framework that utilizes full haplotype information to overcome various complications such as multiple founders, unphased chromosomes, data contamination, and incomplete marker data. A stochastic model is used to describe the dependence structure among several variables characterizing the observed haplotypes, for example, the ancestral haplotypes and their ages, mutation rate, recombination events, and the location of the disease mutation. An efficient Markov chain Monte Carlo algorithm was developed for computing the estimates of the quantities of interest. The method is shown to perform well in both real data sets (cystic fibrosis data and Friedreich ataxia data) and simulated data sets. The program that implements the proposed method, BLADE, as well as the two real datasets, can be obtained from http://www.fas.harvard.edu/~junliu/TechRept/01folder/diseq_prog.tar.gz.

In the quest to identify genes responsible for specific illnesses, it has been observed in many cases that a large portion of the carriers of the disease gene in the current population are descendant from a small number of “founders” in whose genomes the deleterious mutation appeared some generations ago. This translates into inhomogeneity between the allele frequencies in the general population and those with the disease for genetic markers close to the location of the disease gene(s). The reason is that the allele frequencies of these markers in the disease population still reflect those originally carried by the founder chromosome(s), with modifications introduced by recombinations and mutations. This phenomenon, known as linkage disequilibrium (LD), can be exploited to identify the location of a disease gene by measuring the dependence between disease status and allele distributions among a set of markers.

Simply looking at the marginal dependency between each marker and disease status in a case/control sample of chromosomes is clearly inefficient. For an LD mapping strategy to be optimal in fine mapping, it is essential to consider the information observed in a set of contiguous markers (i.e., haplotypes). The primary goal of our Bayesian analysis is the localization of a gene responsible for the disease within the considered set of markers. Secondary goals are the determination of ancestral haplotypes, the separation of distinct founders of the disease, the construction of haplotypes from unphased chromosomes, and inference on the ages of the mutations causing the disease. Our method, like any others based on LD, is appropriate when there are reasons to assume the existence of a founder effect in at least a significant proportion of the diseased individuals. We note that several attempts along the lines of our approach have been discussed in the literature, and we compare these methods with our approach herein.

By employing a Bayesian approach, we explicitly model positions of the historical recombinations and mutation events that produced the observed haplotypes from an initial set of founders. As a result, our Bayesian LinkAge DisEquilibrium mapping (BLADE) algorithm produces the posterior distribution of the location of the disease mutation by accounting for all sources of uncertainties. A major advantage of our approach is its flexibility in treating various complications such as missing marker data, multiple founders, and unphased chromosomes. For example, the algorithm provides not only the estimation of the mutation location but also the haplotype construction in the case when part or all of the disease chromosomes are unphased. Our methodology is well suited for the fine mapping of a disease gene within a previously identified linked region. The main idea presented here can also be extended to LD genome screens and single nucleotide polymorphism (SNP) studies.

RESULTS

The BLADE algorithm can be regarded as a specialized expert system: It takes as input the prior knowledge such as mutation rate, the range of founders’ ages, etc., and produces the posterior distributions of the location of the disease mutation(s), ancestral haplotypes, founder ages, cluster indicators, and haplotypes of unphased chromosomes. All of these output components can be inspected directly by the researcher for further validation.

The centerpiece of the BLADE algorithm is an explicit stochastic model describing the dependence structure among the many variables related to the generation of the observed disease haplotypes. This model is closely related to the hidden Markov model employed by McPeek and Strahs (1999) and Morris et al. (2000) but appears to be simpler and more transparent. Our model assumes that the disease haplotypes can be grouped into k+1 clusters, corresponding to k founder chromosomes in the current disease population and 1 “null” cluster for all other disease chromosomes. Each non-null cluster is characterized by an ancestral haplotype associated with a single disease-causing mutation coalescing to a single time point (age). These k ancestral mutations are assumed to be at the same (or very close) location. Although BLADE relies on the simplifying assumption that the disease haplotypes of the current generation within each cluster are mutually independent given the ancestral haplotype, allowing for multiple clusters and for different founder ages alleviates the need for a faithful (and very complex) model of the underlying genealogy. A Markov chain Monte Carlo strategy was developed
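
The k+1 cluster structure described above can be made concrete with a small forward simulation. The sketch below is only an illustration under stated assumptions and is not the BLADE model or its code: the function names, the `founders` dictionary layout, the cluster weights, and the choice to place the disease mutation at a marker index are all hypothetical. It also ignores marker mutation and uses the standard approximation that an interval of d Morgans escapes recombination over `age` generations with probability exp(-age * d).

```python
import math
import random


def draw_allele(freqs, rng):
    """Draw one allele from a {allele: population frequency} dict."""
    alleles, weights = zip(*sorted(freqs.items()))
    return rng.choices(alleles, weights=weights)[0]


def simulate_descendant(ancestral, disease_idx, positions_cM, age, pop_freqs, rng):
    """Simulate one present-day haplotype descended from `ancestral`.

    Assumption (not from the paper): walking outward from the marker that
    carries the disease mutation, the ancestral segment survives each marker
    interval with probability exp(-age * d), d being the interval length in
    Morgans. After the first break on a side, the remaining alleles on that
    side are drawn from the population frequencies.
    """
    n = len(ancestral)
    hap = [None] * n
    hap[disease_idx] = ancestral[disease_idx]
    for step in (-1, 1):  # left side, then right side
        intact = True
        i = disease_idx + step
        while 0 <= i < n:
            if intact:
                d = abs(positions_cM[i] - positions_cM[i - step]) / 100.0
                intact = rng.random() < math.exp(-age * d)
            hap[i] = ancestral[i] if intact else draw_allele(pop_freqs[i], rng)
            i += step
    return hap


def simulate_sample(founders, null_weight, n_chrom, disease_idx,
                    positions_cM, pop_freqs, seed=0):
    """Draw disease chromosomes from k founder clusters plus one null cluster.

    Each chromosome is generated independently given its cluster's ancestral
    haplotype. Label k (the last label) marks the null cluster, whose
    haplotypes come entirely from population allele frequencies.
    """
    rng = random.Random(seed)
    k = len(founders)
    weights = [f["weight"] for f in founders] + [null_weight]
    sample = []
    for _ in range(n_chrom):
        z = rng.choices(range(k + 1), weights=weights)[0]
        if z == k:
            hap = [draw_allele(freqs, rng) for freqs in pop_freqs]
        else:
            f = founders[z]
            hap = simulate_descendant(f["haplotype"], disease_idx,
                                      positions_cM, f["age"], pop_freqs, rng)
        sample.append((z, hap))
    return sample


if __name__ == "__main__":
    positions = [0.0, 0.1, 0.2, 0.3, 0.4]        # marker map in cM (made up)
    pop_freqs = [{1: 0.5, 2: 0.5}] * 5           # biallelic markers (made up)
    founders = [
        {"haplotype": [1, 2, 1, 2, 1], "age": 50, "weight": 0.6},
        {"haplotype": [2, 2, 2, 2, 2], "age": 200, "weight": 0.2},
    ]
    for z, hap in simulate_sample(founders, 0.2, 8, 2, positions, pop_freqs):
        print(z, hap)
```

In this toy setup, chromosomes from the older founder tend to retain shorter ancestral segments around the shared disease marker than those from the younger founder, which is the kind of signal an inference procedure would work backward from to estimate the mutation location, ancestral haplotypes, and founder ages.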

