
Outline
Comparative Genome Maps; What is a comparative map?; Why construct comparative maps?; Why automate?; Definitions; Input/Output; Map construction; Chromosome labeling; A natural model?; Scoring; Assumptions; A natural model?; Dynamic programming; Recurrence relation; Problem with linear model; The stack model; Scoring; Dynamic programming; Recurrence relation; Results: infers evolutionary events; Problem: Incomplete input; The reordering algorithm; Definitions; Recurrence relation; Results: Fewer mismatches; Results: Mismatches placed between segments; Results: Detects new segments; Summary; Future Directions; Acknowledgments

Comparative Genome Maps
CSCI 7000-005: Computational Genomics
Debra [email protected]

What is a comparative map?
[Figure slide; image not recoverable]

Why construct comparative maps?
- Identify & isolate genes
  • Crops: drought resistance, yield, nutrition, ...
  • Human: disease genes, drug response, ...
- Infer ancestral relationships
- Discover principles of evolution
  • Chromosome
  • Gene family
- “key to understanding the human genome”

Why automate?
- Time consuming, laborious
  • Needs to be redone frequently
- Codify a common set of principles
- Nadeau and Sankoff: warn of “arbitrary nature of comparative map construction”

Definitions
- Marker: identifiable chromosomal locus
- Homology: genes with a common ancestor
- Homeology: chromosomal regions derived from a common ancestral linkage group
- Synteny: loci on the same chromosome
- Colinearity: syntenic regions with conserved gene order

Input/Output
- Input:
  • genetic maps of 2 species
  • marker/gene correspondences (homologs)
- Output:
  • a comparative map
  • homeologies identified

Map construction
Maize 1 (target), Rice (base). Wilson et al. Genetics 1999
Rice segment labels: 3S, 8L, 10L, 3L
Maize 1 markers in map order, with the rice location of each homolog in parentheses:
pds1 (3S), rz742a (2S), rz103b (2L), cdo1387b (3S), isu040 (3), rz574 (3S), cdo38a (7L), cdo938a (3S), rz585a (3S), rz672a (3S), isu081b (3S 10L), rz323a (8L), cdo344c (12L), rz296a (5L), bcd734b (3S), rz500 (10L), rz421 (10L), isu74 (3S), cdo464a (8L), isu73 (3S), cdo475b (6S), cdo595 (8L), cdo116 (8L), rz28a (8L), cdo99 (8L), rz698a (9L), bcd207a (10L), cdo94b (10L), bcd386a (10L), isu78 (5L), csu77 (10L), cdo98b (10L), rz630e (3L), rz403 (3L), cdo795a (3L), bcd1072c (5C), isu92b (3L), cdo122a (3L), rz912a (3L), bcd808a (11S), cdo246 (3L), adh1 (11S), cdo353b (3L), isu106a (3L), phi1 (3L)
[Figure: “Go from this” (the marker list) “to this” (the map labeled with rice segments)]

Chromosome labeling
Maize 1 (target), Rice (base). Wilson et al. Genetics 1999
[Figure: the Maize 1 marker list above, shown beside the rice segment labels 3S, 8L, 10L, 3L]

A natural model?
[Figure: the same Maize 1 marker list and rice segment labels]

Scoring
- s: segment boundary penalty
- m: mismatched marker penalty
[Figure: markers bcd207a (10L), cdo94b (10L), bcd386a (10L), isu78 (5L), csu77 (10L), cdo98b (10L), rz630e (3L), rz403 (3L), cdo795a (3L), isu92b (3L), labeled with segments 10L and 3L and annotated with one s and one m]

Assumptions
- Accept published marker order
- All linkage groups of base are unique
- Simplistic homeology criteria
- At least one homeologous region

A natural model?
[Four figure slides; images not recoverable]

Dynamic programming
- l_i = location of homolog to marker i
- S[i,a] = penalty (score) for an optimal labeling of the submap from marker i to the end, when labeling begins with label a
[Figure: markers 1 ... i ... n, with label a at marker i]

Recurrence relation
\[ S[n,a] = m\,\delta(a, l_n) \]
\[ S[i,a] = m\,\delta(a, l_i) + \min_{b \in L} \bigl( S[i+1,b] + s\,\delta(a,b) \bigr) \]
[Figure: markers i, i+1, ..., n with homolog locations l_i, l_{i+1}, ..., l_n and labels a, b]

Problem with linear model
s = 2
- a-b-c motif: markers aaa bbb ccc, labeled a, b, c; score: 2s = 4
- a-b-a motif: markers aaa bbb aaa, labeled a throughout; score: 3m = 3
The linear model therefore scores the b markers in the a-b-a motif as three mismatches rather than as a segment inserted within a.

The stack model
- Segment at top of the stack can be:
  • pushed (remembered), later popped
  • replaced
- Push and replace cost s; pop is free.
[Figure: stack of segment labels; image not recoverable]

Scoring
[Figure: markers uaz265a (7L), isu136 (2L), isu151 (7L), rz509b (7L), cdo59c (7L), rz698c (9L), bcd1087a (9L), rz206b (9L), bcd1088c (9L), csu40 (3S), cdo786a (9L), csu154 (7L), isu113a (7L), csu17 (7L), cdo337 (3L), rz530a (7L), labeled with segments 7L, 9L, 7L and annotated with one s, one “free” pop, and three m]

Dynamic programming
- S[i,j,a] = score for an optimal labeling of:
  • submap from marker i to marker j
  • when labeling begins with label a, i.e., marker i is labeled a
[Figure: markers 1 ... i ... j ... n, with label a at marker i]

Recurrence relation
\[ S[i,i,a] = m\,\delta(a, l_i) \]
\[ S[i,j,a] = \min \begin{cases} m\,\delta(a, l_i) + \min_{b \in L} \bigl( S[i+1,j,b] + s\,\delta(a,b) \bigr) \\ \min_{i<k<j} \bigl( S[i,k,a] + S[k+1,j,a] \bigr) \end{cases} \]
[Figure: the two cases, one with labels a and b at markers i and i+1, one with label a at markers i and k+1]

Results: infers evolutionary events
Maize 1 (target), Rice (base)
[Figure: labeling by Wilson et al. compared with the stack model labeling]

Problem: Incomplete input
- Gene order not always fully resolved.
- Co-located genes can be ordered to give most parsimonious labeling.
Genes co-located at position 33.0: Atp6b1 (8p), Comp (19), Jak3 (19p), Jund1 (19p), Lpl (8p), Mel (19p), Npy1r (4q), Pde4c (19), Srebf1 (17p), Slc18a1 (8p)
Regrouped (order as extracted from the slide): Atp6b1 (8p), Lpl (8p), Npy1r (4q), Srebf1 (17p), Comp (19), Jak3 (19p), Jund1 (19p), Mel (19p), Pde4c (19), Slc18a1 (8p)
Segment labels: 8p, 19p

The reordering algorithm
- Uses a compression scheme
  • Within a megalocus, group genes by location of related gene.
  • Order these groups
  • First, last groups interact with nearby genes
  • Any ordering of internal groups is equally parsimonious
[Two further figure slides with the same title; images not recoverable]

Definitions
- δ extended to distance to a set A of labels:
\[ \delta(a, A) = \begin{cases} 0 & \text{if } a \in A \\ 1 & \text{otherwise} \end{cases} \]
- S = the set of indices of supernode start elements
- For simplicity, call supernode i ∈ S

Definitions
For i ∈ S:
- n_i = # markers in i
- n_i(a) = # markers in i with a homolog on a
- l_i = set of labels matching markers in i
  • l_i = {a ∈ L | n_i(a) ≥ 1}

Definitions
- p_i(c) gives mismatched marker and segment boundary penalties for label c
\[ p_i(c) = \begin{cases} s & \text{if } m\,n_i(c) \ge s \\ m\,n_i(c) & \text{if } m\,n_i(c) \le s \end{cases} \]

Definitions
- p(i,a,b) gives the total mismatched marker and segment boundary penalties attributed to “hidden markers”
Σ (p_i(c)) + m
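The linear-model recurrence S[i,a] from the first "Recurrence relation" slide can be sketched in Python as below. This is a minimal illustration, not the author's implementation: the function names `delta` and `linear_score` are hypothetical, each marker is represented only by the label of its homolog's location, and the defaults m = 1 and s = 2 are taken from the "Problem with linear model" slide (s = 2 is stated there; m = 1 is implied by "3m = 3").

```python
def delta(a, b):
    """Return 0 if the two labels agree, 1 otherwise."""
    return 0 if a == b else 1


def linear_score(homologs, labels, m=1, s=2):
    """Best total penalty under the linear model (hypothetical helper).

    homologs[i] plays the role of l_i, the location of the homolog of marker i.
    labels is the label set L. m is the mismatch penalty and s is the segment
    boundary penalty.
    """
    # Base case: S[n, a] = m * delta(a, l_n)
    score = {a: m * delta(a, homologs[-1]) for a in labels}
    # Fill S[i, a] from the second-to-last marker back to the first.
    for i in range(len(homologs) - 2, -1, -1):
        score = {
            a: m * delta(a, homologs[i])
            + min(score[b] + s * delta(a, b) for b in labels)
            for a in labels
        }
    # The first marker may start with any label.
    return min(score.values())


abc_motif = linear_score("aaabbbccc", "abc")  # two boundaries: 2s
aba_motif = linear_score("aaabbbaaa", "abc")  # three mismatches: 3m
```

With these penalties the sketch reproduces the two scores quoted on the slide: 4 for the a-b-c motif and 3 for the a-b-a motif. Only the previous column of the table is kept, since S[i,a] depends only on S[i+1,b].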
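The stack-model recurrence S[i,j,a] from the second "Recurrence relation" slide can be sketched the same way. This is an illustrative sketch under stated assumptions: `stack_score` is a hypothetical name, markers are again reduced to the label of their homolog's location, and the defaults m = 1, s = 2 are carried over from the "Problem with linear model" slide. Memoized recursion is used here for brevity; the slides do not say how the table is filled.

```python
from functools import lru_cache


def stack_score(homologs, labels, m=1, s=2):
    """Best total penalty under the stack model (hypothetical helper)."""
    homologs = tuple(homologs)
    labels = tuple(labels)

    @lru_cache(maxsize=None)
    def S(i, j, a):
        # Penalty for labeling marker i with a when its homolog lies elsewhere.
        mismatch = m * (a != homologs[i])
        if i == j:
            # Base case: S[i, i, a] = m * delta(a, l_i)
            return mismatch
        # Case 1: label marker i, then continue with label b (cost s if b != a).
        extend = mismatch + min(S(i + 1, j, b) + s * (a != b) for b in labels)
        # Case 2: split at k with i < k < j; label a resumes at k + 1 for free,
        # which models popping back to a remembered segment.
        split = min(
            (S(i, k, a) + S(k + 1, j, a) for k in range(i + 1, j)),
            default=float("inf"),
        )
        return min(extend, split)

    return min(S(0, len(homologs) - 1, a) for a in labels)


aba_motif = stack_score("aaabbbaaa", "abc")  # push b once, pop for free
abc_motif = stack_score("aaabbbccc", "abc")  # two replacements
```

Under these assumptions the a-b-a motif costs a single s (2) instead of the 3m charged by the linear model, while the a-b-c motif still costs 2s (4).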
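The supernode quantities defined above, n_i, n_i(a), l_i, p_i(c) and the extended δ(a, A), can be computed for one megalocus as in the sketch below. The names `delta_set` and `summarize_supernode` are hypothetical, the penalties m = 1 and s = 2 are assumed from the earlier "Problem with linear model" slide, and the input is the list of homolog locations from the position-33.0 example. The piecewise definition of p_i(c) is written as min(s, m * n_i(c)), which gives the same value in both cases.

```python
from collections import Counter


def delta_set(a, label_set):
    """Extended delta: 0 if label a is in the set A, 1 otherwise."""
    return 0 if a in label_set else 1


def summarize_supernode(homolog_labels, m=1, s=2):
    """Return (n_i, n_i(.), l_i, p_i(.)) for one supernode (hypothetical helper)."""
    counts = Counter(homolog_labels)  # n_i(a): markers with a homolog on a
    label_set = set(counts)           # l_i = {a in L | n_i(a) >= 1}
    # p_i(c): either pay a boundary (s) or count the markers as mismatches.
    penalty = {c: min(s, m * k) for c, k in counts.items()}
    return len(homolog_labels), counts, label_set, penalty


# Homolog locations of the ten genes co-located at position 33.0.
megalocus = ["8p", "19", "19p", "19p", "8p", "19p", "4q", "19", "17p", "8p"]
n_i, n_i_of, l_i, p_i = summarize_supernode(megalocus)
```

Here the labels "19" and "19p" are treated as distinct strings, exactly as they appear on the slide; whether the original method merges them is not stated in the source.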



CU-Boulder CSCI 7000 - Comparative Genome Maps
