Comparative Genome MapsWhat is a comparative map?Why construct comparative maps?Why automate?DefinitionsInput/OutputMap constructionChromosome labelingA natural model?ScoringAssumptionsSlide 12Slide 13Slide 14Slide 15Dynamic programmingRecurrence relationProblem with linear modelThe stack modelSlide 20Slide 21Slide 22Results: infers evolutionary eventsProblem: Incomplete inputThe reordering algorithmSlide 26Slide 27Slide 28Slide 29Slide 30Slide 31Slide 32Slide 33Slide 34Results: Fewer mismatchesResults: Mismatches placed between segmentsResults: Detects new segmentsSummarySlide 39Future DirectionsSlide 41AcknowledgmentsComparative Genome MapsCSCI 7000-005: Computational GenomicsDebra [email protected] is a comparative map?Why construct comparative maps?Identify & isolate genes•Crops: drought resistance, yield, nutrition...•Human: disease genes, drug response,…Infer ancestral relationshipsDiscover principles of evolution•Chromosome•Gene family“key to understanding the human genome”Why automate?Time consuming, laborious•Needs to be redone frequentlyCodify a common set of principlesNadeau and Sankoff: warn of “arbitrary nature of comparative map construction”DefinitionsMarker: identifiable chromosomal locusHomology: genes with common ancesterHomeology: chromosomal regions derived from a common ancestral linkage groupSynteny: loci on the same chromosomeColinearity: syntenic regions with conserved gene orderInput/OutputInput: •genetic maps of 2 species•marker/gene correspondences (homologs)Output:•a comparative map•homeologies identifiedMap construction3S8L10L3LMaize 1 (target), Rice (base)Wilson et al. Genetics 1999pds1 (3S)rz742a (2S)rz103b (2L)cdo1387b (3S)isu040 (3)rz574 (3S)cdo38a (7L)cdo938a (3S)rz585a (3S)rz672a (3S)isu081b (3S 10L)rz323a (8L)cdo344c (12L)rz296a (5L)bcd734b (3S)rz500 (10L)rz421 (10L)isu74 (3S)cdo464a (8L)isu73 (3S)cdo475b (6S)cdo595 (8L)cdo116 (8L)rz28a (8L)cdo99 (8L)rz698a (9L)bcd207a (10L)cdo94b (10L)bcd386a (10L)isu78 (5L)csu77 (10L)cdo98b (10L)rz630e (3L)rz403 (3L)cdo795a (3L)bcd1072c (5C)isu92b (3L)cdo122a (3L)rz912a (3L)bcd808a (11S)cdo246 (3L)adh1 (11S)cdo353b (3L)isu106a (3L)phi1 (3L)Go from thisto thisChromosome labelingMaize 1 (target), Rice (base)Wilson et al. Genetics 1999Maize 1pds1 (3S)rz742a (2S)rz103b (2L)cdo1387b (3S)isu040 (3)rz574 (3S)cdo38a (7L)cdo938a (3S)rz585a (3S)rz672a (3S)isu081b (3S 10L)rz323a (8L)cdo344c (12L)rz296a (5L)bcd734b (3S)rz500 (10L)rz421 (10L)isu74 (3S)cdo464a (8L)isu73 (3S)cdo475b (6S)cdo595 (8L)cdo116 (8L)rz28a (8L)cdo99 (8L)rz698a (9L)bcd207a (10L)cdo94b (10L)bcd386a (10L)isu78 (5L)csu77 (10L)cdo98b (10L)rz630e (3L)rz403 (3L)cdo795a (3L)bcd1072c (5C)isu92b (3L)cdo122a (3L)rz912a (3L)bcd808a (11S)cdo246 (3L)adh1 (11S)cdo353b (3L)isu106a (3L)phi1 (3L)Rice3S8L10L3LA natural model?Maize 1 (target), Rice (base)Wilson et al. Genetics 1999Maize 1pds1 (3S)rz742a (2S)rz103b (2L)cdo1387b (3S)isu040 (3)rz574 (3S)cdo38a (7L)cdo938a (3S)rz585a (3S)rz672a (3S)isu081b (3S 10L)rz323a (8L)cdo344c (12L)rz296a (5L)bcd734b (3S)rz500 (10L)rz421 (10L)isu74 (3S)cdo464a (8L)isu73 (3S)cdo475b (6S)cdo595 (8L)cdo116 (8L)rz28a (8L)cdo99 (8L)rz698a (9L)bcd207a (10L)cdo94b (10L)bcd386a (10L)isu78 (5L)csu77 (10L)cdo98b (10L)rz630e (3L)rz403 (3L)cdo795a (3L)bcd1072c (5C)isu92b (3L)cdo122a (3L)rz912a (3L)bcd808a (11S)cdo246 (3L)adh1 (11S)cdo353b (3L)isu106a (3L)phi1 (3L)Rice3S8L10L3LScoring10L3Lsmbcd207a (10L)cdo94b (10L)bcd386a (10L)isu78 (5L)csu77 (10L)cdo98b (10L)rz630e (3L)rz403 (3L)cdo795a (3L)isu92b (3L)AssumptionsAccept published marker orderAll linkage groups of base are uniqueSimplistic homeology criteriaAt least one homeologous regionA natural model?A natural model?A natural model?A natural model?Dynamic programmingli = location of homolog to marker iS[i,a] = penalty (score) for an optimal labeling of the submap from marker i to the end, when labeling begins with label aa 1 ... i ... nRecurrence relationS[n,a] = m (a, ln)S[i,a] = m (a, li) + min (S[i+1,b] + s (a,b) )bLa b ... i i+1 ... nlili+1lna ... n... lnProblem with linear models = 2a-b-c motif:a b c score: 2s = 4 a a a b b b c c ca-b-a motif:a score: 3m = 3 a a a b b b a a aThe stack modelSegment at top of the stack can be:•pushed (remembered), later popped•replacedPush and replace cost s -- pop is free.b b bfedcacScorings9L7L7L“free” popmmm uaz265a (7L) isu136 (2L) isu151 (7L) rz509b (7L) cdo59c (7L) rz698c (9L) bcd1087a (9L) rz206b (9L) bcd1088c (9L) csu40 (3S) cdo786a (9L) csu154 (7L) isu113a (7L) csu17 (7L) cdo337 (3L) rz530a (7L)Dynamic programmingS[i,j,a] = score for an optimal labeling of:•submap from marker i to marker j•when labeling begins with label a -- i.e., marker i is labeled aa 1 ... i ... j ... nRecurrence relationS[i,i,a] = m (a, li) S[i,j,a] = min: m (a, li) + min (S[i+1,j,b] + s (a,b) ) min S[i,k,a] + S[k+1,j,a] i<k<jbLa a1 ... i ... k+1... j ... na1 ... i i+1 ... na b1 ... i i+1 ... nResults: infers evolutionary eventsMaize 1 (target) Rice (base)Wilson et al.StackProblem: Incomplete inputGene order not always fully resolved.Co-located genes can be ordered to give most parsimonious labeling.8p19p33.0 Atp6b1 (8p)33.0 Comp (19)33.0 Jak3 (19p)33.0 Jund1 (19p)33.0 Lpl (8p)33.0 Mel (19p)33.0 Npy1r (4q)33.0 Pde4c (19)33.033.0 Srebf1 (17p)Slc18a1 (8p)Atp6b1 (8p)Lpl (8p)Npy1r (4q)Srebf1 (17p)Comp (19)Jak3 (19p)Jund1 (19p)Mel (19p)Pde4c (19)Slc18a1 (8p)=8p19pThe reordering algorithmUses a compression scheme•Within a megalocus, group genes by location of related gene.•Order these groups•First, last groups interact with nearby genes•Any ordering of internal groups is equally parsimoniousThe reordering algorithmThe reordering algorithmDefinitions extended to distance to a set A of labels0 if a A, 1 otherwiseS = the set of indices of supernode start elementsFor simplicity, call supernode i S (a, A) =DefinitionsFor i S: ni = # markers in ini(a) = # markers in i with a homolog on ali = set of labels matching markers in i•li = {a L | ni(a) 1},Definitionspi(c) gives mismatched marker and segment boundary penalties for label cpi(c) = s : m ni(c) sm ni(c) : m ni(c) sDefinitionsp(i,a,b) gives the total mismatched
View Full Document