Unformatted text preview:

Node RankingLidan Wang and Brad SkaggsCMSC 828G1TiTopicsWhat is PageRankWhat is PageRank PageRank algorithm [Brin and Page 2002] Topic-sensitive PageRank [Richardson et al. 2002, Haveliwala 2002] HITS [Kleinberg 2002] Other variationsApplicationsApplicationsIt d tiIntroductionWeb search results: quantity vs qualityWeb search results: quantity v.s. qualityAmount of web pages increases rapidlyHuman ability to inspect information is limitedypUsers only want to see a few very relevant pagesNeeded: a reliable measure for relevance and importanceSolution: analysis of link structurePRkititiPageRank intuitionHow can we tell a page’s importance?How can we tell a pages importance? Relies on the link structure of the web to indicate an page’s value. A hyperlink from page A to page B as a vote, for page B by page A.A page is “important” if:A page is “important” if: There are many pages pointing to it. These pages are considered valuable.ElExampleTaken from http://en.wikipedia.org/wiki/PageRankPageRank (Definition) [Brin and Page 2002]2002] Use the link structure of the web to calculate a qualityUse the link structure of the web to calculate a quality ranking (PageRank) for each web page. Assigns a numerical value to each element of a hyperlinked set of documents, such as WWW. “Measuring” an element’s relative importance within the setthe set The numerical weight assigned to an element E is called the PageRank of E.Li k t (1)Links as votes (1)A link from page B to page A denotes endorsement:A link from page B to page A denotes endorsement:  Page B considers page A an authority on a subject An authority value or PageRank(PR) value is assigned to every page (document)every page (document)025PR(A) = 0.75AB0.250250.25DC0.25Li k t (2)Links as votes (2)PR conferred by an outgoing link is the document’sownPRPR conferred by an outgoing link is the document s own PR divided by the number of outgoing links:PR(A) = PR(B)/2 + PR(C) + PR(D)/3ABPR(B)/2PR(D)/3DCPR(C)PR(D)/3PR(A) = PR(t1)/O(t1)+ +PR(tn)/O(tn); t= page linking to ADCPR(A) = PR(t1)/O(t1)+…+PR(tn)/O(tn); ti= page linking to A, O(ti): number of links from tiSi lifi d P R k l ithSimplified PageRank algorithm0.2A BThe Web as a directed graph G= (V, E). Let the total number of pages be n. The PageRank score of the page i0.4 0.20.20.20.4PR(i) =PR(j)Oj(ji)E∑,Ojis the number of out-link of j(denoted by PR(i)) is defined by:PageRank computations require several iterations until convergenceOj(j,i)∈Eoutlink of jC0.4Probability distribution represents the likelihood that a person randomly clicking th li k ill i t ti lPRk=[0.4; 0.2; 0.4]PRk+1=[0.4; 0.2; 0.4]on the links will arrive at a particular pageStationary PageRank vectorMti tiMatrix notionHyperlink matrixH=[H]in⎧ABHyperlink matrix H=[Hij] in which the entry in the ithrow and jthcolumn is:Hij=1Oiif (i, j) ∈ E0 otherwise⎧ ⎨ ⎪ ⎩ ⎪ C⎩ H=0 12 12001⎡ ⎢ ⎢ ⎤ ⎥ ⎥ ABCABLet PR= (PR(1), …, PR(n))TPR=HTPRH 0 0 1 1 0 0⎣ ⎢ ⎢ ⎦ ⎥ ⎥ BCStochastic matrix: all entries >=0; We can write equations: PRHPReach row sums up to 1Sl f P R kSolve for PageRankABABCA well known mathematicalPRk+1HTPRkABH = 0 12 120 0 1 ⎡ ⎢ ⎢ ⎢⎤ ⎥ ⎥ ⎥ABCABA well known mathematical technique called power iteration can be used to find PR ie PRk+1=HTPRkC1 0 0⎣ ⎢ ⎦ ⎥ C PR0PR1PR2PR3PRki.e.The solution to PR is an eigenvector with thePR PRPRPRPR0.33 0.33 0.33 0.42 0.40 0.33 0.17 0.25 0.17 0.20 eigenvector with the corresponding eigenvalue 1.If some conditions are satisfied, 0.33 0.50 0.42 0.42 0.40 ,1 is the largest eigenvalue and the PageRank vector PR is the principal eigenvector. Problem: the Web graph does not meet the conditions (later).Rd fiRandom surfingMarkov chain.Markov chain. Each web page in the Web graph is a state.  A link is a transition from one state to another.We can model web surfing as a stochastic process.We can model web surfing as a stochastic process.  It models a web surfer randomly clicks links without clicking back buttonclicking back button.  The fraction of time the surfer would spend at a page if the random process were iterated at infinitum is itsthe random process were iterated at infinitum is its PageRank.CConvergence Markov Chain Theorem:  A finite Markov chain defined by the stochastic matrix H has a unique stationary probability distribution if H is irreducible and aperiodic.  After a number of transitions PRkwill converge to a steady-state probability vectorlimk→∞PRk=PR When the steady-state is reached, then PRk= PRk+1= PRPRHTPRk→∞PR=HTPR PR is the principal eigenvector of HTwith eigenvalue 1.Wb hWeb graph Questions: Is His a stochastic matrix? IsHirreducible?Is Hirreducible?  Is Hand aperiodic? If the above conditions are not satisfied, how to extend the simplified equation to the actual PageRank formula?IHthti?(1)IsHstochastic? (1)Is the sum of entries in each row equal to 1?Is the sum of entries in each row equal to 1? If web pages have no out-links, i.e. some rows in transition matrix Hhave all 0’s. They are called the dangling pages:00001⎛ ⎞ v1H =0000110 00001/2 0 01/20012012⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ v1v2 v5f 00 000⎝ ⎜ ⎜ ⎜ ⎠ ⎟ ⎟ ⎟ v4v3Entries of all 0’s!IHthti?(2)IsHstochastic? (2)Is the sum of entries in each row equal to 1?Is the sum of entries in each row equal to 1? If web pages have no out-links, i.e. some rows in transition matrix Hhave all 0’s. They are called the dangling pages:00001⎛ ⎞ v1H =000011000001/20 01/20012012⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ v1v2 v5 1/5 1/5 1/5 1/5 1/5⎝ ⎜ ⎜ ⎜ ⎠ ⎟ ⎟ ⎟ v4v3FixedIHi d bl d i di ?IsHirreducable and aperiodic?Irreducible:Aperiodic: States vi and vj can communicate in the directed graph, i.e. a path exists between them.Aperiodic:State vi is periodic.Aperiodic Markov chain: none Irreducible Markov chain: all states communicateAperiodic Markov chain: none of the states is periodicv2v1v2v1v0v3v4v0v3v4H t fi th t iti t iHow to fix the transition matrixW fi th bl b ddi li k f h tWe can fix these problems by adding a link from each page to every page and give each link a small transition probability controlled by a parameter d.  Then, the


View Full Document

UMD CMSC 828G - Node Ranking

Documents in this Course
Lecture 2

Lecture 2

35 pages

Load more
Download Node Ranking
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Node Ranking and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Node Ranking 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?