
Comparing Mouse vs Human Genomes

Comparisons at the genome level are a much harder computational and theoretical problem.
From International Human Genome Sequencing Consortium (2001), Nature.

At the finer scale, we can start to see patterns.
From Gregory et al. (2002), Nature.

Within the genome of a single species, there are many duplications, translocations, and inversions.
From The Arabidopsis Genome Initiative (2000), Nature.

How genomes evolve through duplication.
From Deonier, Tavaré and Waterman, 2005.

How much of the genome is conserved?
- The yeast genome contains 70% coding sequence.
- The human genome contains 1.2% protein-coding sequence.

Does the stationarity assumption work?
From Venter J.C. et al. (2001), Science.

Definition of Terms
- Homology (of genes) = similarity due to common ancestry. There are two types of homology; the distinction depends on the ordering of the speciation and gene duplication dates.
- Orthologues = the "same" gene in different organisms, that is, common ancestry goes back to a speciation event.
- Paralogues = different genes in the same organism, that is, common ancestry goes back to a gene duplication.
- There are other forms of homology, such as those arising from lateral gene transfer.

Synteny
- linked genes = genes that reside on the same chromosome.
- conserved synteny = a group of linked genes that are highly conserved and hypothesized to be homologous.
- syntenic segment = a group of landmarks that appear in the same order on a single chromosome in each of the two species.
- syntenic block = a set of adjacent syntenic segments.

Genome Alignment
1. To align a whole genome we assume that the syntenic regions have already been found through homologous genes. Next, the vast non-coding regions need to be aligned.
2. Alignment of non-coding regions is much harder, due to the low conservation.
3. To combine speed and sensitivity, most programs use an anchored-alignment approach: in a first step, a fast search tool is used to identify a chain of high-scoring sequence similarities. These similarities are then used as anchor points for the final alignment, where a more sensitive method aligns the regions that are left over between the identified anchor points.
4. This is the strategy used by the fast pairwise alignment algorithms BLAST and FASTA. For genome alignment, the programs differ in the details of how the anchors are strung together, how many anchors to use, etc.

For example, CHAOS, which was developed here by Batzoglou's group, uses a seed-and-extension scheme.

Questions to think about
1. How should one frame the null hypothesis in genome alignment? Is it relevant?
2. How should one choose the parameters for the alignment?
3. How sensitive is the "optimal" alignment to the alignment parameters?
4. What does "homology" mean when it applies to non-coding regions? What is the unit of measurement? Can it possibly be inferred at the nucleotide level?
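
The anchored-alignment approach described under Genome Alignment can be illustrated with a small sketch. This is not the CHAOS, BLAST, or FASTA algorithm: it assumes exact k-mer matches as seeds and a simple quadratic-time chaining step, and the function names (find_seeds, chain_seeds, gaps_between) are hypothetical names chosen for illustration. The regions returned by gaps_between are the ones a more sensitive method would then align.

```python
from collections import defaultdict


def find_seeds(a, b, k):
    """Return sorted (i, j) pairs with a[i:i+k] == b[j:j+k] (exact k-mer seeds)."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    seeds = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            seeds.append((i, j))
    return sorted(seeds)


def chain_seeds(seeds, k):
    """Pick a longest chain of non-overlapping seeds that is increasing in
    both sequences. Assumption: simple O(n^2) dynamic programming that
    counts anchors; real tools use faster and more refined scoring."""
    if not seeds:
        return []
    best = [1] * len(seeds)   # best chain length ending at each seed
    prev = [-1] * len(seeds)  # back-pointer to the previous seed in the chain
    for t, (i, j) in enumerate(seeds):
        for s in range(t):
            pi, pj = seeds[s]
            if pi + k <= i and pj + k <= j and best[s] + 1 > best[t]:
                best[t] = best[s] + 1
                prev[t] = s
    t = max(range(len(seeds)), key=best.__getitem__)
    chain = []
    while t != -1:
        chain.append(seeds[t])
        t = prev[t]
    return chain[::-1]


def gaps_between(chain, k, len_a, len_b):
    """Return (a_start, a_end, b_start, b_end) for the regions left over
    before, between, and after the anchors."""
    gaps = []
    a_pos, b_pos = 0, 0
    for i, j in chain:
        gaps.append((a_pos, i, b_pos, j))
        a_pos, b_pos = i + k, j + k
    gaps.append((a_pos, len_a, b_pos, len_b))
    return gaps
```

A typical call sequence would be seeds = find_seeds(a, b, k), then chain = chain_seeds(seeds, k), then gaps_between(chain, k, len(a), len(b)). The choice of k and of how many anchors to keep are exactly the kinds of parameters raised in the questions above: a larger k gives fewer, more reliable anchors, while a smaller k gives more anchors at the cost of more spurious matches.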



Stanford STATS 345 - Comparing Mouse vs Human Genomes
