Stanford CS 374 - Lecture 10


Properties of Interaction Networks

Lecturer: Jason Turner-Maier
Scribe: Jerome Ku

Based on:
1. Yu et al. Annotation Transfer Between Genomes: Protein-Protein Interologs and Protein-DNA Regulogs. Genome Res. 2004 14: 1107-1118.
2. Stumpf et al. Subnets of scale-free networks are not scale free: sampling properties of networks. PNAS 2005 102: 4221-4224.

Additional:
Matthews et al. Identification of potential interaction networks using sequence-based searches for conserved protein–protein interactions or “interologs.” Genome Res. 2001 11: 2120-2126.
Li, S. et al. A map of the interactome network of the metazoan C. elegans. Science 2004 303: 540–543.

I. Introduction

There is a deluge of data on biological systems. To make sense of this information, we need a framework on which to structure our interpretation and understanding of the data. Given the highly complex networks of protein interactions, graphs are a natural model. By appropriately applying graph theory to these interaction networks, we can not only clarify our understanding of system-level genetic function but also deduce additional insights into biological systems. For example, Schaub et al. demonstrated that additional links in signal cascade pathways could be inferred by modeling protein interaction networks.

Source: Schaub et al. Qualitative Crosstalk Analysis of Wnt and Notch Signaling in Mammalian Skin, RECOMB Satellite Conference on Systems Biology (2007).

II. Transferring Annotation Between Genomes

For many organisms, there is a good deal of sequence data but relatively little interaction data. We can use the sequence data to infer interaction information through comparative genomics. By mapping well-studied genomes onto less well-understood ones, with sequence homology as the metric for comparison, we can improve our understanding of interactions even with incomplete information. This is the thrust of Yu et al.’s paper.
Before diving in, we first need to define a few terms:

Homolog – Proteins that are statistically similar in sequence. There are two kinds of homologs:

Ortholog – Two sequences derived from a common ancestor and separated by a speciation event. Orthologs therefore typically serve the same function in different species.

Paralog – Two sequences derived from a gene duplication event. Paralogs therefore typically serve different functions within the same species.

[Figure: A and A’ are paralogs; A1 and A2, A1’ and A2’ are orthologs. Source: Maier PowerPoint presentation]

Interolog – A pair of interacting proteins whose orthologs in another organism also interact.

[Figure: interologs. Source: Yu et al.]

Similarity Metric

We need a means to measure the similarity of these protein pairs.

1) Joint Similarity
IA is the percent identity of the orthologs A and IB is the percent identity of the orthologs B, where percent identity is a commonly used measure of sequence similarity. JI is then the geometric mean of the two percent identities: JI = sqrt(IA × IB).

2) Joint E-Value
JE is the analogous joint measure computed from the E-values of the two ortholog pairs. E-values take the length of the sequences into account when scoring similarity, which percent identity does not.

3) Minimal Individual Similarity
The minimum of the two individual similarities, min(IA, IB), is taken as the measure of similarity.

Note that the results do not change much with the choice of similarity metric.

Interolog Mapping

Interolog mapping transfers interactions in the source organism onto a target organism to find possible interactions there.

Methods

Unidirectional Best Hit Mapping – The original mapping procedure, which selects the best-matching homolog in the target organism for each source protein.

Reciprocal (Bidirectional) Best Hit Mapping – Refines the unidirectional procedure by considering only reciprocal best matches.
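The similarity metrics and the two best-hit mappings described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure from Yu et al.: the function names, the score dictionaries keyed by (query, subject), and the toy proteins a, b, x, y are all hypothetical, and ties are broken by keeping the first hit seen.

```python
import math


def joint_identity(identity_a, identity_b):
    """Joint similarity JI: geometric mean of two percent identities."""
    return math.sqrt(identity_a * identity_b)


def minimal_identity(identity_a, identity_b):
    """Minimal individual similarity: the smaller of the two identities."""
    return min(identity_a, identity_b)


def best_hits(scores):
    """Unidirectional best hits.

    scores maps (query, subject) -> similarity score (higher is better).
    Returns a dict mapping each query to its highest-scoring subject.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(source_to_target, target_to_source):
    """Keep only pairs that are each other's best hit in both directions."""
    forward = best_hits(source_to_target)
    backward = best_hits(target_to_source)
    return {s: t for s, t in forward.items() if backward.get(t) == s}


# Toy scores (made up): source proteins a, b; target proteins x, y.
source_to_target = {("a", "x"): 90, ("a", "y"): 40,
                    ("b", "x"): 70, ("b", "y"): 30}
target_to_source = {("x", "a"): 90, ("x", "b"): 70,
                    ("y", "a"): 40, ("y", "b"): 30}

unidirectional = best_hits(source_to_target)
reciprocal = reciprocal_best_hits(source_to_target, target_to_source)
```

In this toy input both a and b pick x as their best hit, but x picks only a in return, so the reciprocal mapping keeps a single pair. This illustrates why the reciprocal procedure predicts fewer pairs than the unidirectional one.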
Both of these methods suffer from insufficient coverage of the interactome and low prediction accuracy.

Generalized Interolog Mapping addresses the first problem by using all possible homologs of the interacting proteins, as opposed to only the “best matches.”

[Figure: generalized interolog mapping. Source: Yu et al.]

Gold Standard

Positives
We need a benchmark for measuring the accuracy of interolog mapping. The best-studied and most reliable interaction data sets are for S. cerevisiae, so it makes sense to use it as the target organism and the MIPS complex catalog as the gold standard positives for verifying interactions.

Negatives
Previous work has shown that pairs of proteins located in different subcellular compartments serve as a good standard for non-interaction.

Source Data

The organisms C. elegans, D. melanogaster, and H. pylori were chosen as source data sets because they are the only organisms for which large-scale interaction information exists.

C. elegans – 410 interactions
D. melanogaster – 4786 interactions
H. pylori – 1465 interactions

Schema and Validation

[Figure: mapping and validation schema. Source: Yu et al.]

The idea is to use known interactions in the source organism to predict interactions in a target organism. These predictions are validated by comparing them to the gold standard positives and negatives: true positives are predicted interactions that overlap with the gold standard positives; false positives are predictions that overlap with the gold standard negatives.

Results

[Figure: likelihood of interaction as a function of similarity. Source: Yu et al.]

These graphs are a weighted average of the 4 mappings. The key takeaway is that the results give us a threshold for determining when interactions can or cannot be reliably predicted. For example, for JI > 80 there is a high likelihood of interaction, whereas for JI < 40 interactions are unlikely. Similarly, for JE < 10^-70 there is a high likelihood of interaction.
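Generalized mapping with a JI cut-off, followed by validation against the gold standards, can be sketched as below. This is an illustration only: the function names, the homolog table layout, the toy proteins, and the toy gold standard sets are all assumptions and do not come from Yu et al.

```python
import math
from itertools import product


def generalized_interologs(source_interactions, homologs, ji_cutoff):
    """Map each source interaction onto all pairs of target homologs.

    source_interactions: iterable of (a, b) source protein pairs.
    homologs: dict mapping a source protein to a list of
              (target protein, percent identity) tuples (hypothetical layout).
    Returns a dict mapping frozenset({ta, tb}) to the best joint identity
    found, keeping only pairs with JI >= ji_cutoff.
    """
    predicted = {}
    for a, b in source_interactions:
        candidates = product(homologs.get(a, []), homologs.get(b, []))
        for (ta, ia), (tb, ib) in candidates:
            if ta == tb:
                continue  # self-pairs are skipped in this sketch
            ji = math.sqrt(ia * ib)  # joint similarity (geometric mean)
            if ji >= ji_cutoff:
                pair = frozenset((ta, tb))
                predicted[pair] = max(ji, predicted.get(pair, 0.0))
    return predicted


def validate(predicted, gold_positives, gold_negatives):
    """Count predictions that overlap the gold positives and gold negatives."""
    true_pos = sum(1 for pair in predicted if pair in gold_positives)
    false_pos = sum(1 for pair in predicted if pair in gold_negatives)
    return true_pos, false_pos


# Toy data (made up): one source interaction and its target homologs.
source_interactions = [("a", "b")]
homologs = {"a": [("X", 90.0), ("Y", 50.0)],
            "b": [("Z", 80.0), ("W", 20.0)]}
gold_positives = {frozenset(("X", "Z"))}
gold_negatives = {frozenset(("X", "W"))}

loose = generalized_interologs(source_interactions, homologs, ji_cutoff=40)
strict = generalized_interologs(source_interactions, homologs, ji_cutoff=80)
loose_counts = validate(loose, gold_positives, gold_negatives)
strict_counts = validate(strict, gold_positives, gold_negatives)
```

With the looser cut-off the toy example predicts three pairs, one of which lands in the gold negatives; with the stricter cut-off only the highest-similarity pair survives. Representing each pair as a frozenset makes the comparison against the gold sets independent of protein order.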
                          Unidirectional   Bidirectional   Generalized, all   Generalized, top 5%
# of Interacting Pairs    410              410             410                410
# of Interologs           84               33              9317               112
# of True Positives       25               18              162                35
Accuracy                  30%              54%             2%                 31%

Source: Yu et al.

From these results, we see that adjusting the JE threshold produces a trade-off between predictive power and accuracy. The greater the JE threshold, the more relationships we predict, but the lower the accuracy; the smaller the threshold, the fewer relationships are predicted, but with greater accuracy.

The table above summarizes the results of the paper (source: Yu et al.). Of importance are the JE cut-off values, which are thresholds for reliably predicting interaction.

III. Subnets of scale-free networks

