Computational Biology, Part 2
Searching with Entrez/Sequence Motifs
Robert F. Murphy
Copyright 1996, 1999-2008. All rights reserved.

Outline
  - Example Entrez Session
  - Example Entrez Session: home of Entrez
  - Example Entrez Session: search OMIM for ‘cystic fibrosis’
  - Example Entrez Session: first hit is CFTR
  - Example Entrez Session: after clicking Links -> Nucleotide
  - Example Entrez Session: after clicking Links -> Protein
  - Example Entrez Session: Protein sequence from original cDNA
  - Example Entrez Session: change ‘Send to’ to ‘File’
  - Example Entrez Session: Links -> PubMed
  - Example Entrez Session: paper in PubMed that is related
  - Example Entrez Session: Related Articles
  - Computation of related articles
  - Computation of related articles: words considered
  - View the MeSH terms: change ‘Display’ to ‘Citation’
  - Computation of related articles: weight of each word
  - Computation of related articles: similarity score of two articles
  - Example Entrez Session: search Nucleotide for cftr
  - Example Entrez Session: 1249 hits related to cftr
  - Example Entrez Session: set limits as title and mRNA
  - Example Entrez Session: 46 hits with limits
  - Example Entrez Session: further narrow it down to human
  - Block Diagram for Entrez Literature Searching
  - Sequence Analysis Tasks
  - Definition
  - Sequence features
  - Consensus sequences
  - Finding occurrences of consensus sequences
  - Interactive Demonstration
  - Block Diagram for Search with a Consensus Sequence
  - Describing features using frequency matrices
  - Frequency matrices (continued)
  - Frequency Matrices, PSSMs, and Profiles
  - Methods for converting frequency matrices to PSSMs
  - Pseudo-counts
  - Finding occurrences of a sequence feature using a Profile
  - Block Diagram for Building a PSSM
  - Block Diagram for Searching with a PSSM
  - Block Diagram for Searching for sequences related to a family with a PSSM
  - Consensus sequences vs. frequency matrices
  - Reading for next class

Example Entrez Session
Goal: Find literature and sequences for cystic fibrosis genes
  - Use OMIM with Keyword searching.
  - Switch to Nucleotide database to see sequence.
  - Switch to Protein database to see sequence.
  - Change to GenPept format to save sequence.
  - Use Links to find related literature in PubMed.
  - Use Related Articles to find similar articles.
  - Search the Nucleotide database by gene name.
  - Set Limits to narrow down the search.

Example Entrez Session: home of Entrez

Example Entrez Session: search OMIM for ‘cystic fibrosis’

Example Entrez Session: first hit is CFTR

Example Entrez Session: after clicking Links -> Nucleotide

Example Entrez Session: after clicking Links -> Protein

Example Entrez Session: Protein sequence from original cDNA

Example Entrez Session: change ‘Send to’ to ‘File’

Example Entrez Session: Links -> PubMed

Example Entrez Session: paper in PubMed that is related

Example Entrez Session: Related Articles

Computation of related articles
Similarity between documents is measured by the words they have in common:
  - Which words are considered?
  - What is the weight of each word?
  - How do we calculate a similarity score of two articles?

Computation of related articles: words considered
  - Remove stopwords: uninformative
  - Stem words
  - Words from the abstract are “text words”
  - Words from the title are put in twice
  - Words from the MeSH terms
      - U.S. National Library of Medicine
      - Vocabulary used for indexing articles
      - Consistent way to retrieve information

View the MeSH terms: change ‘Display’ to ‘Citation’

Computation of related articles: weight of each word
  - Global weight:
      - Greater, if the word is less frequent in the whole database
  - Local weight:
      - Greater, if the word is more frequent in the document
      - Longer document is not favored

Computation of related articles: similarity score of two articles
  - Weight of one pair of common words:
      local wt1 * local wt2 * global wt
  - Similarity of two articles: sum of weights of all common words
  - The higher the score, the closer the two articles
  - Similarity scores are pre-computed

Example Entrez Session: search Nucleotide for cftr

Example Entrez Session: 1249 hits related to cftr

Example Entrez Session: set limits as title and mRNA

Example Entrez Session: 46 hits with limits
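
The related-articles scoring described in the "Computation of related articles" slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formula Entrez actually uses: the slides say the global weight grows as a word becomes rarer in the database and the local weight grows with frequency in the document without favoring longer documents, but they give no formulas. Here the global weight is assumed to be log(N / document frequency) and the local weight is assumed to be the word count divided by document length. The stopword list, the `stem` helper, and the three toy documents are hypothetical stand-ins.

```python
import math
from collections import Counter

# Illustrative stopword list only; a real list would be much longer.
STOPWORDS = {"the", "of", "in", "a", "and", "is", "for", "to"}


def stem(word):
    # Crude suffix stripping, a hypothetical stand-in for a real stemmer.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(title, abstract):
    # Title words are entered twice, abstract words once (per the slides).
    words = title.lower().split() * 2 + abstract.lower().split()
    cleaned = (w.strip(".,;:()") for w in words)
    return Counter(stem(w) for w in cleaned if w and w not in STOPWORDS)


def local_weights(counts):
    # Assumed form: relative frequency, so longer documents are not favored.
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def global_weights(docs):
    # Assumed form: log(N / df), larger for words that are rarer overall.
    n = len(docs)
    df = Counter(t for d in docs for t in d)
    return {t: math.log(n / k) for t, k in df.items()}


def similarity(doc1, doc2, gw):
    # Sum over common words of local wt1 * local wt2 * global wt.
    l1, l2 = local_weights(doc1), local_weights(doc2)
    return sum(l1[t] * l2[t] * gw[t] for t in l1.keys() & l2.keys())


# Toy corpus (made up for illustration).
doc_a = terms("CFTR mutations in cystic fibrosis",
              "Mutations of the CFTR gene cause cystic fibrosis.")
doc_b = terms("Cystic fibrosis gene cloning",
              "Cloning of the cystic fibrosis gene CFTR.")
doc_c = terms("Yeast cell cycle regulation",
              "Regulation of the cell cycle in yeast.")

gw = global_weights([doc_a, doc_b, doc_c])
score_ab = similarity(doc_a, doc_b, gw)  # two cystic fibrosis papers
score_ac = similarity(doc_a, doc_c, gw)  # no words in common
```

In this toy corpus the two cystic fibrosis documents share several stemmed words and get a positive score, while the yeast document shares none with them and scores zero. Because the slides note that similarity scores are pre-computed, a real system would store such scores rather than recompute them per query.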