CU-Boulder CSCI 7000 - Comparative Genome Maps - D1895578

School name University of Colorado at Boulder

Course Csci 7000- Current Topics in Computer Science

Pages 42

Outline: Comparative Genome Maps; What is a comparative map?; Why construct comparative maps?; Why automate?; Definitions; Input/Output; Map construction; Chromosome labeling; A natural model?; Scoring; Assumptions; A natural model?; Dynamic programming; Recurrence relation; Problem with linear model; The stack model; Scoring; Dynamic programming; Recurrence relation; Results: infers evolutionary events; Problem: Incomplete input; The reordering algorithm; Definitions; Recurrence relation; Results: Fewer mismatches; Results: Mismatches placed between segments; Results: Detects new segments; Summary; Future Directions; Acknowledgments

Comparative Genome Maps
CSCI 7000-005: Computational Genomics
Debra [email protected]

What is a comparative map?

Why construct comparative maps?
• Identify & isolate genes
  • Crops: drought resistance, yield, nutrition, ...
  • Human: disease genes, drug response, ...
• Infer ancestral relationships
• Discover principles of evolution
  • Chromosome
  • Gene family
• “key to understanding the human genome”

Why automate?
• Time consuming, laborious
  • Needs to be redone frequently
• Codify a common set of principles
• Nadeau and Sankoff: warn of “arbitrary nature of comparative map construction”

Definitions
• Marker: identifiable chromosomal locus
• Homology: genes with a common ancestor
• Homeology: chromosomal regions derived from a common ancestral linkage group
• Synteny: loci on the same chromosome
• Colinearity: syntenic regions with conserved gene order

Input/Output
• Input:
  • genetic maps of 2 species
  • marker/gene correspondences (homologs)
• Output:
  • a comparative map
  • homeologies identified

Map construction
Maize 1 (target), Rice (base). Wilson et al., Genetics 1999.
Go from this (maize chromosome 1 markers, each with the rice location of its homolog):
pds1 (3S), rz742a (2S), rz103b (2L), cdo1387b (3S), isu040 (3), rz574 (3S), cdo38a (7L), cdo938a (3S), rz585a (3S), rz672a (3S), isu081b (3S 10L), rz323a (8L), cdo344c (12L), rz296a (5L), bcd734b (3S), rz500 (10L), rz421 (10L), isu74 (3S), cdo464a (8L), isu73 (3S), cdo475b (6S), cdo595 (8L), cdo116 (8L), rz28a (8L), cdo99 (8L), rz698a (9L), bcd207a (10L), cdo94b (10L), bcd386a (10L), isu78 (5L), csu77 (10L), cdo98b (10L), rz630e (3L), rz403 (3L), cdo795a (3L), bcd1072c (5C), isu92b (3L), cdo122a (3L), rz912a (3L), bcd808a (11S), cdo246 (3L), adh1 (11S), cdo353b (3L), isu106a (3L), phi1 (3L)
to this (rice segment labels): 3S, 8L, 10L, 3L

Chromosome labeling
[Figure: the Maize 1 marker list above beside its Rice segment labels 3S, 8L, 10L, 3L. Maize 1 (target), Rice (base). Wilson et al., Genetics 1999.]

A natural model?
[Figure: the same Maize 1 marker list and Rice segment labels 3S, 8L, 10L, 3L. Wilson et al., Genetics 1999.]

Scoring
[Figure: markers bcd207a (10L), cdo94b (10L), bcd386a (10L), isu78 (5L), csu77 (10L), cdo98b (10L), rz630e (3L), rz403 (3L), cdo795a (3L), isu92b (3L), labeled with segments 10L and 3L; the segment boundary is marked s and the mismatched marker is marked m.]

Assumptions
• Accept published marker order
• All linkage groups of base are unique
• Simplistic homeology criteria
• At least one homeologous region

A natural model?

Dynamic programming
• $l_i$ = location of homolog to marker $i$
• $S[i,a]$ = penalty (score) for an optimal labeling of the submap from marker $i$ to the end, when labeling begins with label $a$
[Figure: markers 1 ... i ... n, with marker i labeled a.]

Recurrence relation
$$S[n,a] = m\,\delta(a, l_n)$$
$$S[i,a] = m\,\delta(a, l_i) + \min_{b \in L}\bigl(S[i+1,b] + s\,\delta(a,b)\bigr)$$

Problem with linear model
s = 2
• a-b-c motif: markers aaa bbb ccc, labeled a, b, c. Score: 2s = 4
• a-b-a motif: markers aaa bbb aaa, labeled a throughout. Score: 3m = 3

The stack model
• Segment at top of the stack can be:
  • pushed (remembered), later popped
  • replaced
• Push and replace cost s; pop is free.
[Figure: example stack of segment labels.]

Scoring
[Figure: markers uaz265a (7L), isu136 (2L), isu151 (7L), rz509b (7L), cdo59c (7L), rz698c (9L), bcd1087a (9L), rz206b (9L), bcd1088c (9L), csu40 (3S), cdo786a (9L), csu154 (7L), isu113a (7L), csu17 (7L), cdo337 (3L), rz530a (7L), labeled with segments 7L, 9L, 7L; annotated with one segment penalty s, a “free” pop, and three mismatch penalties m.]

Dynamic programming
• $S[i,j,a]$ = score for an optimal labeling of:
  • submap from marker $i$ to marker $j$
  • when labeling begins with label $a$, i.e., marker $i$ is labeled $a$
[Figure: markers 1 ... i ... j ... n, with marker i labeled a.]

Recurrence relation
$$S[i,i,a] = m\,\delta(a, l_i)$$
$$S[i,j,a] = \min \begin{cases} m\,\delta(a, l_i) + \min_{b \in L}\bigl(S[i+1,j,b] + s\,\delta(a,b)\bigr) \\ \min_{i<k<j}\bigl(S[i,k,a] + S[k+1,j,a]\bigr) \end{cases}$$

Results: infers evolutionary events
[Figure: Maize 1 (target), Rice (base); labeling by Wilson et al. compared with the stack model.]

Problem: Incomplete input
• Gene order not always fully resolved.
• Co-located genes can be ordered to give most parsimonious labeling.
[Figure: ten markers co-located at position 33.0: Atp6b1 (8p), Comp (19), Jak3 (19p), Jund1 (19p), Lpl (8p), Mel (19p), Npy1r (4q), Pde4c (19), Slc18a1 (8p), Srebf1 (17p). Shown equal to the reordered list Atp6b1 (8p), Lpl (8p), Npy1r (4q), Srebf1 (17p), Comp (19), Jak3 (19p), Jund1 (19p), Mel (19p), Pde4c (19), Slc18a1 (8p), with segment labels 8p and 19p.]

The reordering algorithm
• Uses a compression scheme
  • Within a megalocus, group genes by location of related gene.
  • Order these groups
  • First, last groups interact with nearby genes
  • Any ordering of internal groups is equally parsimonious

Definitions
• $\delta$ extended to distance to a set $A$ of labels:
$$\delta(a, A) = \begin{cases} 0 & \text{if } a \in A \\ 1 & \text{otherwise} \end{cases}$$
• $S$ = the set of indices of supernode start elements
• For simplicity, a supernode is referred to by its start index $i \in S$

Definitions
For $i \in S$:
• $n_i$ = # markers in $i$
• $n_i(a)$ = # markers in $i$ with a homolog on $a$
• $l_i$ = set of labels matching markers in $i$: $l_i = \{a \in L \mid n_i(a) \ge 1\}$

Definitions
• $p_i(c)$ gives mismatched marker and segment boundary penalties for label $c$:
$$p_i(c) = \begin{cases} s & \text{if } m\,n_i(c) \ge s \\ m\,n_i(c) & \text{if } m\,n_i(c) \le s \end{cases}$$

Definitions
• $p(i,a,b)$ gives the total mismatched marker and segment boundary penalties attributed to “hidden markers”:
$$\sum \bigl(p_i(c)\bigr) + m \dots$$
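
The two recurrence relations given earlier in the slides (linear model and stack model) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the function names `linear_score` and `stack_score` are hypothetical, markers are 0-indexed, δ(a, b) is assumed to be 0 when a = b and 1 otherwise (consistent with the set version defined in the slides), and the overall score is assumed to be the minimum of S over all starting labels. The penalties m = 1 and s = 2 follow the "Problem with linear model" slide, where m = 1 is inferred from "3m = 3".

```python
from functools import lru_cache


def delta(a, b):
    # Assumed base case of the slides' delta: 0 on a match, 1 otherwise.
    return 0 if a == b else 1


def linear_score(homologs, labels, m=1, s=2):
    """Linear model: S[i,a] = m*delta(a,l_i) + min_b(S[i+1,b] + s*delta(a,b))."""
    # Base case S[n,a] for the last marker.
    S = {a: m * delta(a, homologs[-1]) for a in labels}
    # Fill from the second-to-last marker back to the first.
    for i in range(len(homologs) - 2, -1, -1):
        S = {
            a: m * delta(a, homologs[i])
            + min(S[b] + s * delta(a, b) for b in labels)
            for a in labels
        }
    # Assumption: the final answer takes the best starting label.
    return min(S.values())


def stack_score(homologs, labels, m=1, s=2):
    """Stack model: S[i,j,a] adds a split case so a return to label a is free."""

    @lru_cache(maxsize=None)
    def S(i, j, a):
        here = m * delta(a, homologs[i])
        if i == j:
            return here  # S[i,i,a] = m*delta(a, l_i)
        # Case 1: label marker i with a, continue with any label b.
        best = here + min(S(i + 1, j, b) + s * delta(a, b) for b in labels)
        # Case 2: split at k (i < k < j); both parts begin with label a.
        for k in range(i + 1, j):
            best = min(best, S(i, k, a) + S(k + 1, j, a))
        return best

    # Assumption: the final answer takes the best starting label.
    return min(S(0, len(homologs) - 1, a) for a in labels)


# The two motifs from the "Problem with linear model" slide.
print(linear_score("aaabbbccc", "abc"))  # a-b-c motif: 2s = 4
print(linear_score("aaabbbaaa", "ab"))   # a-b-a motif: 3m = 3
print(stack_score("aaabbbaaa", "ab"))    # one push (s = 2), free pop: 2
```

On the a-b-a motif the linear model prefers three mismatches (3m = 3) over two segment boundaries (2s = 4), while the stack model pays s once for the push and nothing for the pop, which gives 2 in this sketch.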
