UCF CAP 5937 - Computational Approaches to Haplotype Inference

Computational Approaches to Haplotype Inference
Ravi Vijaya Satya
Amar Mukherjee

Overview
  - SNPs & Haplotypes
  - The HapMap Project
  - Why “Infer” Haplotypes?
  - Computational Methods
  - Maximum Resolution
  - Perfect Phylogeny Haplotyping
  - Haplotyping with Pedigree information
  - Haplotyping via sequencing
  - Direct Approach for PPH (Bafna, Gusfield, et al.)

Genetic Variations
  - underlie phenotypic differences
  - cause inherited diseases
  - allow tracking ancestral human history
Source: Gabor T. Marth, www.vanbug.org/talk_ppts/Gabor_2004.ppt

SNP: Single Nucleotide Polymorphism
“Loci in the human genome in which a considerable percentage of the population differs from the rest.”
    …CATGATCACGTCGACGATCGAT…
    …CATGATCACGTCGACGATCGAT…
    …CATGATCATGTCGACGATCGAT…
    …CATGATCACGTCGACGGTCGAT…
  - Allele: one of the possible states of a given locus
  - The locations, or loci, are also called ‘markers’

Types of SNPs
  - Number of alleles:
      - Bi-allelic: a site is called bi-allelic if there are only two possible states for that site.
      - Multi-allelic: a site is called multi-allelic if there are more than two possible states for that site.
      - Almost all SNPs are bi-allelic.
  - Coding / Non-coding:
      - Coding (cSNP), if the SNP occurs in an exon
      - Non-coding, if it occurs in an intron or in a non-coding region

Types of SNPs (contd…)
Coding SNPs can be:
  - Silent (the encoded amino acid is unchanged):
      …aca gat cag atc atg…    … T D Q I M …
      …aca gat caa atc atg…    … T D Q I M …
  - Non-silent (the encoded amino acid changes):
      …aca gat cag atc atg…    … T D Q I M …
      …aca gaa cag atc atg…    … T E Q I M …

Haplotypes
Definition 1: “The sequence of a copy of the chromosome”
  - Over 10 million SNPs in total
  - 1 SNP every 300 base pairs
  - If each SNP were independent, there would be 2^10,000,000 possible combinations.
  - Limited variation:
      - Adjacent SNPs are interdependent
      - ‘A’ at SNP1 → ‘G’ at SNP2, and ‘C’ at SNP1 → ‘T’ at SNP2

Haplotypes (contd…)
Definition 2: Each individual form taken by a block of adjacent, interdependent SNPs is called a ‘haplotype’.
  - A block consisting of 15 SNPs might in fact have only five or six common haplotypes.
  - One possible reason: a limited number of loci where recombinations are possible

The International HapMap Project
“multi-country effort to identify and catalog genetic similarities and differences in human beings” - HapMap.org
  - Target: a complete map of genetic variations in different populations
  - Countries currently involved: United States, Japan, China, Canada, UK and Nigeria

HapMap Goals
  - To provide tools and data for ‘association studies’
  - The HapMap will help in:
      - Linking diseases to genetic variations
      - Diagnosing diseases
      - Preventing diseases
      - Estimating response to drugs
      - Designing ‘custom’ drugs

Construction of HapMap
  - Identification of SNPs
  - Compilation of SNPs into haplotypes
  - Finding ‘tag’ SNPs
Picture Source: HapMap.org

Sample Populations
  - Yoruba in Ibadan, Nigeria: individuals having four Yoruba grandparents
  - Japanese in Tokyo, Japan: individuals from different parts of Japan
  - Han Chinese in Beijing, China: individuals having at least three out of four Han grandparents
  - CEPH (Centre d'Etude du Polymorphisme Humain): Utah residents with Northern and Western European ancestry

Sample Populations …
  - 270 individuals in total:
      - Yoruba: 30 ‘trios’ (two parents and an adult child)
      - Japanese: 45 unrelated individuals
      - Han Chinese: 45 unrelated individuals
      - CEPH: 30 ‘trios’, collected in the 1980s
  - The samples are anonymous with regard to individual identity

Why ‘infer’ Haplotypes?
  - Humans are diploid:
      - Two copies of each chromosome
      - One from each parent
  - A site is called homozygous if it has the same allele on both chromosomes
  - A site is called heterozygous if it has different alleles on the two chromosomes
  - It is expensive to sequence each chromosome separately
      - The chromosomes are sequenced together, producing the ‘genotype’ information.

Genotype Data
  - Genotype data tells whether each site is:
      - Heterozygous (Aa, unordered)
      - Homozygous with the major allele (AA)
      - Homozygous with the minor allele (aa)
  - Haplotype data gives the actual alleles at each site on each chromosome
  - Need to infer haplotypes from genotypes.
Haplotype Inference Problem: Given a set of genotypes, can the underlying haplotypes be determined computationally?

Types of Genotype data
  - With pedigree information
      - Relationships between at least some of the individuals are known
      - E.g., trios
  - Without pedigree information
      - Unrelated individuals
      - Relationship information not available.

Haplotyping: Definitions
  - All sites are bi-allelic
  - The two alleles are represented by ‘0’ and ‘1’
      - ‘0’ generally indicates the more frequent allele
      - ‘1’ indicates the less frequent, or minor, allele
  - A haplotype of length m:
      - Is a vector h = <h_1, …, h_m> over {0,1}^m
      - Each position i is a site, or locus

Haplotyping: Definitions
  - A genotype represents two haplotypes:
      - Each site (position) is an unordered pair over {0,1}
      - Can be written as g = <g_1, …, g_m> over {0,1,2}^m
      - ‘0’ indicates the pair (0,0), ‘1’ indicates (1,1)
      - ‘2’ indicates the pairs (0,1) or (1,0)

    0 1 1 1 0 0 1 1 0    (the two haplotypes)
    1 1 0 1 0 0 1 0 0
    2 1 2 1 0 0 1 2 0    (the genotype)

Haplotyping: Definitions
  - Resolution of a genotype g = <g_1, …, g_m>:
      - A pair <h, k> of haplotypes such that, for each i, 1 ≤ i ≤ m:
          h_i = k_i = g_i   if g_i = 0 or 1
          h_i ≠ k_i         if g_i = 2
  - A haplotype h is compatible with a genotype g if there exists another haplotype h’ such that the pair <h, h’> resolves g
      - h’ is called the realization of g by h
      - h’ is denoted as R(g,h)

Haplotyping: Definitions
  - Given h and g, there can be only one h’:
      - h’[i] = h[i]      if g[i] is homozygous
      - h’[i] = 1 - h[i]  if g[i] is heterozygous

    g  = 2 1 2 1 0 0 1 2 0
    h  = 0 1 1 1 0 0 1 0 0
    h’ = 1 1 0 1 0 0 1 1 0    Compatible

    g  = 2 1 2 1 0 0 1 2 0
    h  = 0 0 1 1 0 0 1 0 0    Incompatible (h has 0 at site 2, where g is homozygous 1)

Haplotype inference problem
  - Input: a set G = {g_1, …, g_n} of genotypes
  - Output: for each g ∈ G, a pair <h, h’> of haplotypes resolving g.
  - Simple solution:
      - Find h by randomly assigning ‘1’ or ‘0’ for each ‘2’ in g
      - h’ ← R(g,h)

    g  = 2 1 2 1 0 0 1 2 0
    h  = 0 1 0 1 0 0 1 0 0
    h’ = 1 1 1 1 0 0 1 1 0

  - If there are p heterozygous sites, there are 2^(p-1) different solutions
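
The definitions of genotype, compatibility, realization R(g,h), and the simple random resolution can be illustrated with a minimal Python sketch. The function names (`genotype`, `is_compatible`, `realization`, `random_resolution`, `count_resolutions`) are illustrative choices, not names from the slides, and haplotypes and genotypes are assumed to be stored as plain lists of integers. The resolution count treats <h, h’> as an unordered pair, so p ≥ 1 heterozygous sites give 2^(p-1) resolutions, and a genotype with no heterozygous site has exactly one.

```python
import random


def genotype(h, k):
    """Combine two haplotypes over {0,1} into a genotype over {0,1,2}.

    Equal alleles keep their value (0 or 1); differing alleles become 2.
    """
    if len(h) != len(k):
        raise ValueError("haplotypes must have the same length")
    return [a if a == b else 2 for a, b in zip(h, k)]


def is_compatible(g, h):
    """h is compatible with g if it agrees with g at every homozygous site."""
    return len(g) == len(h) and all(gi == 2 or gi == hi for gi, hi in zip(g, h))


def realization(g, h):
    """Return R(g, h): the unique haplotype h' such that <h, h'> resolves g."""
    if not is_compatible(g, h):
        raise ValueError("h is not compatible with g")
    # Copy h at homozygous sites, flip it at heterozygous sites.
    return [1 - hi if gi == 2 else hi for gi, hi in zip(g, h)]


def random_resolution(g, rng=random):
    """Simple solution: pick 0 or 1 at random for each '2', then take R(g, h)."""
    h = [rng.randint(0, 1) if gi == 2 else gi for gi in g]
    return h, realization(g, h)


def count_resolutions(g):
    """Number of distinct unordered resolutions of g."""
    p = g.count(2)  # number of heterozygous sites
    return 1 if p == 0 else 2 ** (p - 1)


if __name__ == "__main__":
    g = [2, 1, 2, 1, 0, 0, 1, 2, 0]
    h = [0, 1, 1, 1, 0, 0, 1, 0, 0]
    print(realization(g, h))
    print(is_compatible(g, [0, 0, 1, 1, 0, 0, 1, 0, 0]))
    print(count_resolutions(g))
    print(random_resolution(g))
```

Because every random choice yields a valid resolution, the simple solution always succeeds but gives no reason to prefer one of the 2^(p-1) candidates over another; choosing among them is what the computational methods listed in the overview address.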

