Unformatted text preview:

Network Archaeology Uncovering Ancient Networks from Present day Interactions Saket Navlakha Department of Computer Science University of Maryland College Park USA Joint work with Carl Kingsford Yeast PPI network 2 599 proteins 8 275 interaction edges Did these proteins interact 10 million years ago Which model of evolution best characterizes the structure of this network How have these protein complexes reconfigured over evolutionary time Last fm social network 2 957 users 9 659 friendship edges When did this user enter the network How do growth principles of social networks differ from those of biological networks Who recruited these users to join the network How are networks changing over time Difficult to study in biology because we do not have access to ancestral networks Extant networks are insufficient to track fine level changes Difficult to study in sociology if networks are not crawled regularly or if data remains hidden privacy issues 2002 human fly worm yeast 1 billion years widely separated in time 2010 What can we do with ancestor networks Estimate the age of a node or a edge Track the emergence of clusters and motifs 1 3 DMC 2 4 5 Down sample to create realistic smaller network Validate growth models by comparing histories t 100 t 500 Networks vs sequence view of evolution PA FF ER Privacy implications of predicting user activity Truth Sequence Networks Actual Predicted Challenge Given a present day network G and a probabilistic model by which G putatively evolved can we reconstruct an ancestral version of G Related work Do not handle node labels Leskovec and Faloutsos ICML 2007 and Hormozdiari PLoS Comp Biol 2007 find seeds graphs putative ancestral subnetworks under the Kronecker and a duplication based model Do not model the evolution of interactions over time Flannick Genome Res 2006 Kelley PNAS 2003 and Singh RECOMB 2007 align multiple static networks to find conserved substructures Do not reconstruct ancestral networks Middendorf PNAS 2005 Wiuf PNAS 2006 Bez kov ICML 2006 and Leskovec KDD 2008 compute the likelihood that a growth model produced a given network Do not apply to general graphs Dutkowski Bioinformatics 2007 and Gibson and Goldberg PSB 2009 use the history encoded by the gene sequence phylogeny to reconstruct ancient networks Ahmed and Xing PNAS 2009 use node attributes e g gene expression Senator voting patterns to recover a hidden underlying static network Mithani Bioinformatics 2009 use principles from metabolic networks to model edge gain loss amongst fixed set of nodes Network growth models Preferential Attachment Forest Fire Barab si Science 1999 v Leskovec KDD 2005 Degree distribution 1 v 1 3 2 4 5 2 3 4 5 new node v links to k existing nodes according to the histogram u node v chooses anchor u starts probabilistic re links to burnt nodes Popular technique to study how graph properties evolve under various growth mechanisms Duplication Mutation with Complementarity model A biological growth mechanism that closely characterizes PPI networks1 2 3 u Step 1 v u Step 2 v u v Step 3 Step 1 Node v enters chooses random anchor node u and copies its links Step 2 For each neighbor x of v delete either u x or v x with prob qmod Step 3 Add edge u v with prob qcon 1Vazquez Complexus 2003 2Pereira Leal Phil Trans R Soc Lond B Biol 2006 3Middendorf PNAS 2005 BUT the growth model produces a random and unlabeled graph whose history does not correspond to the history of a real network Network archaeology Main idea reverse a network backwards in time as per the model a ab cg efd Network Model b e f g Ancestor graph at time t 4 Duplication based DMC Preferential Attachment PA Forest Fire FF Erd s Renyi ER Watts Strogatz c d Input graph at time t maintains labels models interactions likelihood estimate Given Gt and DMC find the most likely precursor network G t t present day network Gt t Zme steps argmax Pr Gt t Gt M t Gt t ancestor network DMC model Given Gt and DMC find the most likely precursor network G t t present day network Gt t Zme steps argmax Pr Gt t Gt M t Gt t ancestor network DMC model BUT number of possible paths is n G1 G1 G1 Gt 1 Gt Gt 1 G1 High likelihood path Gt 1 n 1 n Find most likely path of graphs G1 G2 Gt 1 that produced Gt under DMC Given Gt and DMC find the most likely precursor network G t t present day network Gt t argmax Pr Gt t Gt M t Gt t ancestor network Instead set t 1 greedily approximate Gt 1 Zme steps DMC model prior probability uniform Pr Gt Gt 1 M Pr Gt 1 M argmax Pr Gt M Gt 1 Bayes rule ignore denominator constant over all candidates What is the likelihood that u is a duplicate of v 1 LDMC u v n 1 N u N v Prob that u was chosen as v s anchor E g v u uv Gt 1 1 qmod Common neighbors of u and v no divergence 1 q 1 qmod 2 mod 2 qcon n 1 2 N u N v qmod uv 2 qcon if u v exists Symmetric difference 1 qcon else of u and v divergence In each step merge the pair with the highest likelihood of duplicating argmax LDMC u v u v Gt Gt Synthetic DMC experiments 100 graphs 100 nodes each DMC 0 1 0 1 Assess performance when evolutionary history is known DMC 0 1 0 3 DMC 0 1 0 5 DMC 0 1 0 7 DMC 0 1 0 9 Reverse Gt 100 Compare histories Repeat 100x DMC 0 3 0 1 DMC 0 9 0 9 Store known node anchor phylogeny and node arrival Zmes Synthetic DMC results 50 60 of node anchor relationships correctly identified Randomly replace 10 of true edges in Gt 100 Better ordering with low qmod less divergence and high qcon connected to anchor Recovery of ancient PPI networks Started only with extant PPI network for yeast Validating arrival times older proteins have orthologs in more eukaryotic species COG age class1 Result The biologicallyinspired DMC model best matches the sequence based age classes P 0 01 PA DMC 0 4 0 7 FF 0 3 Evolutionary principles reflected in best DMC parameters med low qmod 0 4 med high qcon 0 7 new proteins are usually connected to their duplicates2 3 and share some interaction partners 1Clusters of Orthologous Genes Tatusov BMC Bioinforma cs 2003 2Ispolatov NAR 2005 3Pereira Leal Genome Biol 2007 Validating node anchors If two proteins are duplicates they likely share a functional annotation tested using MIPS complexes u Gt 1 u v Gt Result 84 of identified DMC node anchors belong to same complex vs 68 for FF and 55 baseline no anchors for PA The evolution rate of proteins How does the evolution rate of a protein depend on its degree Result Duplication rate is inversely proportional to its of binding partners agreeing with rates found


View Full Document

UMD CMSC 423 - Network Archaeology: Uncovering Ancient Networks from Present-day Interactions

Documents in this Course
Midterm

Midterm

8 pages

Lecture 7

Lecture 7

15 pages

Load more
Loading Unlocking...
Login

Join to view Network Archaeology: Uncovering Ancient Networks from Present-day Interactions and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Network Archaeology: Uncovering Ancient Networks from Present-day Interactions and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?