Machine Learning, Srihari

Genetic Inheritance and Bayesian Networks
Sargur Srihari

Genetics Pedigree Example
• One of the earliest uses of Bayesian networks
  – Predates the definition of the general framework
• Local independencies are intuitive
• Models the transmission of certain properties, such as blood type, from parent to child

Phenotype and Genotype
• Some background in genetics is needed to model this properly
• Blood type is an observable quantity that depends on a person's genetic makeup
  – Such an observable quantity is called a phenotype
• The genetic makeup of a person is called a genotype

DNA Basics
[Figure: actual electron photomicrograph of a chromosome, with a locus marked]
• Single chromosome: ~10⁸ base pairs
• Genome: a sequence of 3×10⁹ base pairs (nucleotides A, C, G, T)
  – Represents the full set of chromosomes
  – Has 46 chromosomes: 22 pairs of autosomes plus one pair of sex chromosomes (XX or XY)
• Large portions of DNA (about 98.5%) have no known survival function, and their variations are useful for identification
• TH01 is a locus on the short arm of chromosome 11
  – It consists of short tandem repeats (STRs) of the same four-base sequence, AATG
  – Its variant forms (alleles) differ between individuals

Genetic Model
• Human genetic material
  – 22 pairs of autosomal chromosomes
  – One pair of sex chromosomes (XX in females, XY in males)
• Each chromosome contains genetic material that determines a person's properties
• Locus: a region of a chromosome that is of interest
  – Blood type corresponds to a particular locus
• Alleles: variants of a locus
  – The blood-type locus has three variants: A, B, O

Independence Assumptions
• Arise from biology
• Once we know the genotype of a person,
  – additional evidence about other members of the family provides no new information about that person's blood type
• Once we know the genotypes of both parents,
  – they determine what is passed to the offspring
  – additional ancestral information is not needed
• These independencies can be captured in a BN for a family tree
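As a rough illustration of how these independencies become a BN, the Python sketch below builds the parent sets for a hypothetical two-parent, two-child pedigree. The names `dad`, `mom`, `child1`, `child2` and the helpers `build_parents` and `factorization` are illustrative assumptions, not from the slides. Each blood-type node B gets only its own genotype G as parent, each child's G gets only the two parental G nodes, and founders get no parents.

```python
# Hypothetical pedigree: child -> (father, mother). Founders do not appear as keys.
PEDIGREE = {
    "child1": ("dad", "mom"),
    "child2": ("dad", "mom"),
}
PEOPLE = ["dad", "mom", "child1", "child2"]


def build_parents(people, pedigree):
    """Parent sets of the pedigree BN (illustrative helper).

    G(c) has parents G(father), G(mother), or none for a founder;
    B(c) has the single parent G(c).
    """
    parents = {}
    for person in people:
        if person in pedigree:
            father, mother = pedigree[person]
            parents[f"G({person})"] = [f"G({father})", f"G({mother})"]
        else:
            parents[f"G({person})"] = []
        parents[f"B({person})"] = [f"G({person})"]
    return parents


def factorization(parents):
    """Joint distribution as a product of one CPD per node, given the parent sets."""
    terms = []
    for var, pa in parents.items():
        terms.append(f"P({var} | {', '.join(pa)})" if pa else f"P({var})")
    return " ".join(terms)


if __name__ == "__main__":
    print(factorization(build_parents(PEOPLE, PEDIGREE)))
```

Running it prints a product that starts `P(G(dad)) P(B(dad) | G(dad)) P(G(mom))`, with each child's genotype conditioned only on the two parental genotypes. That sparsity is exactly the pair of independence assumptions above.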
A small family tree
[Figure: a small family tree, including Harry]

BN for Genetic Inheritance
[Figure: BN for the family tree; G: genotype, B: blood type]

Autosomal Chromosomes
• In each pair,
  – Paternal: inherited from the father
  – Maternal: inherited from the mother
• A person's genotype is an ordered pair (X, Y)
  – each element has three possible values (A, B, O)
  – so there are nine values, such as (A, B)
• The blood-type phenotype is a function of both copies
  – E.g., genotype (A, O) gives blood type A
  – Genotype (O, O) gives blood type O

CPDs for Genetic Inheritance
• Penetrance model P(B(c) | G(c))
  – Probabilities of the different phenotypes given a person's genotype
  – Deterministic for blood type
• Transmission model P(G(c) | G(p), G(m))
  – Each parent is equally likely to transmit either of its two alleles to the child
• Genotype priors P(G(c))
  – Genotype frequencies in the population, used for founders (individuals whose parents are not in the pedigree)

Real Models Are More Complex
• Phenotypes for late-onset diseases are not a deterministic function of genotype
  – A particular genotype may only confer a higher probability of the disease
• An individual's genetic makeup is determined by many genes
• Some phenotypes depend on many genes
• Multiple phenotypes of interest may together depend on many genes

Modeling Multi-Locus Inheritance
• Inheritance patterns of different genes are not independent of each other
• Need to take adjacent loci into account
• Introduce selector variables S(l, c, m) for locus l on the maternal chromosome of person c:
  – 1 if locus l in c's maternal chromosome was inherited from c's maternal grandmother
  – 2 if it was inherited from c's maternal grandfather
• Model correlations between the selector variables of adjacent loci l and l′, since nearby loci tend to be inherited together unless a crossover occurs between them

Use of the Genetic Inheritance Model
• Used extensively in
  1. Genetic counseling and prediction
  2. Linkage analysis
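A minimal sketch of the penetrance and transmission CPDs for the ABO locus, assuming genotypes are ordered (paternal, maternal) pairs and that A and B are codominant over the recessive O, so (A, B) yields type AB (a standard fact, though the slides do not list it). The helper names `blood_type`, `transmitted`, `transmission` and `phenotype_distribution` are hypothetical; exact fractions keep the probabilities readable.

```python
from fractions import Fraction
from itertools import product

ALLELES = ("A", "B", "O")

# A genotype is an ordered pair (paternal allele, maternal allele): 3 x 3 = 9 values.
GENOTYPES = list(product(ALLELES, repeat=2))


def blood_type(genotype):
    """Deterministic penetrance model P(B(c) | G(c)): A and B codominant, O recessive."""
    present = set(genotype)
    if present == {"O"}:
        return "O"
    if present == {"A", "B"}:
        return "AB"
    return "A" if "A" in present else "B"


def transmitted(parent_genotype):
    """Allele a parent passes on: each of its two copies with probability 1/2."""
    dist = {}
    for allele in parent_genotype:
        dist[allele] = dist.get(allele, Fraction(0)) + Fraction(1, 2)
    return dist


def transmission(father, mother):
    """Transmission model P(G(c) | G(p), G(m)) as a dict over ordered child genotypes."""
    from_father = transmitted(father)
    from_mother = transmitted(mother)
    return {
        (x, y): px * py
        for x, px in from_father.items()
        for y, py in from_mother.items()
    }


def phenotype_distribution(genotype_dist):
    """Push a genotype distribution through the deterministic penetrance model."""
    out = {}
    for g, p in genotype_dist.items():
        t = blood_type(g)
        out[t] = out.get(t, Fraction(0)) + p
    return out
```

For example, `transmission(("A", "O"), ("B", "O"))` assigns probability 1/4 to each of (A, B), (A, O), (O, B) and (O, O), so blood types AB, A, B and O are equally likely for the child.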
Genetic Counseling and Prediction
• Given a phenotype whose loci are known, and observed phenotype and genotype data for some individuals in the pedigree,
  – infer the genotype and phenotype of another person (e.g., a planned child)
• Genetic data
  – Direct measurements of the relevant disease loci, or of nearby loci that are correlated with the disease loci

Linkage Analysis
• A harder task
• Identifying disease genes from pedigree data, using several pedigrees
  – Several individuals exhibit the disease phenotype
  – Available data
    • Phenotype information for many individuals in the pedigree
    • Genotype information for known locations in the chromosome
  – Use the inheritance model to evaluate the likelihood of the observed data
  – Pinpoint the area linked to the disease, so that the genes in that area can be analyzed further
• Allows focusing on 1/10,000 of the genome

Sparse BN in Genetic Inheritance
• Allows reasoning about large pedigrees and multiple loci
• Allows the use of model-learning algorithms to understand recombination rates in different regions and penetrance probabilities for different diseases
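Genetic counseling amounts to probabilistic inference in this network. The sketch below uses a hypothetical setup: it predicts a planned child's blood type when the father's blood type is observed as A and the mother's genotype has been measured directly as (O, O). It assumes made-up allele frequencies and Hardy-Weinberg genotype priors for the founders, and it uses exact enumeration, which is practical only for tiny pedigrees. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

ALLELES = ("A", "B", "O")
GENOTYPES = list(product(ALLELES, repeat=2))  # ordered (paternal, maternal) pairs

# Hypothetical allele frequencies, for illustration only (not real population data).
ALLELE_FREQ = {"A": Fraction(3, 10), "B": Fraction(1, 10), "O": Fraction(6, 10)}


def genotype_prior(g):
    # Assumption: both copies drawn independently (Hardy-Weinberg equilibrium).
    return ALLELE_FREQ[g[0]] * ALLELE_FREQ[g[1]]


def blood_type(g):
    # Deterministic penetrance: A and B codominant, O recessive.
    present = set(g)
    if present == {"O"}:
        return "O"
    if present == {"A", "B"}:
        return "AB"
    return "A" if "A" in present else "B"


def posterior_genotype(observed_type):
    """P(G | B = observed_type) for a founder, via Bayes' rule."""
    unnorm = {g: genotype_prior(g) for g in GENOTYPES if blood_type(g) == observed_type}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}


def allele_passed_on(genotype_dist):
    """Marginal distribution of the allele a parent transmits (each copy w.p. 1/2)."""
    out = {}
    for g, p in genotype_dist.items():
        for allele in g:
            out[allele] = out.get(allele, Fraction(0)) + p / 2
    return out


def child_blood_type(father_dist, mother_dist):
    """Predicted blood type of a child of two unrelated parents, by enumeration."""
    from_father = allele_passed_on(father_dist)
    from_mother = allele_passed_on(mother_dist)
    out = {}
    for x, px in from_father.items():
        for y, py in from_mother.items():
            t = blood_type((x, y))
            out[t] = out.get(t, Fraction(0)) + px * py
    return out


if __name__ == "__main__":
    father = posterior_genotype("A")      # father observed to have blood type A
    mother = {("O", "O"): Fraction(1)}    # mother's genotype measured directly
    print(child_blood_type(father, mother))
```

Under these assumed frequencies, the father's posterior is (A, A) with probability 1/5 and (A, O) or (O, A) with 2/5 each, so he passes on A with probability 3/5 and the child has type A with probability 3/5 and type O with probability 2/5. Marginalizing each parent separately is valid here only because the two parents are unrelated founders.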