DOC PREVIEW
BU CS 565 - More on Rankings

This preview shows page 1-2-3-4-5-6-7-46-47-48-49-50-51-92-93-94-95-96-97-98 out of 98 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 98 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

More on Rankings Query independent LAR Have an a priori ordering of the web pages Q Set of pages that contain the keywords in the query q Present the pages in Q ordered according to order What are the advantages of such an approach InDegree algorithm Rank pages according to in degree wi B i w 3 w 2 w 2 w 1 w 1 1 2 3 4 5 Red Page Yellow Page Blue Page Purple Page Green Page PageRank algorithm BP98 Good authorities should be pointed by good authorities Random walk on the web graph pick a page at random with probability 1 jump to a random page with probability follow a random outgoing link Rank according to the stationary distribution PR q 1 PR p q p F q 1 n 1 2 3 4 5 Red Page Purple Page Yellow Page Blue Page Green Page Markov chains A Markov chain describes a discrete time stochastic process over a set of states S s1 s2 sn according to a transition probability matrix P Pij Pij probability of moving to state j when at state i jPij 1 stochastic matrix Memorylessness property The next state of the chain depends only at the current state and not on the past of the process first order MC higher order MCs are also possible Random walks Random walks on graphs correspond to Markov Chains The set of states S is the set of nodes of the graph G The transition probability matrix is the probability that we follow an edge from one node to another An example 0 0 A 0 1 1 1 1 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 1 0 12 12 0 0 0 0 0 0 1 P 0 1 0 0 0 1 3 1 3 1 3 0 0 1 2 0 0 0 1 2 v2 v1 v3 v5 v4 State probability vector The vector qt qt1 qt2 qtn that stores the probability of being at state i at time t q0i the probability of starting from state i qt qt 1 P An example 0 12 12 0 0 0 0 0 P 0 1 0 0 1 3 1 3 1 3 0 1 2 0 0 12 0 1 0 0 0 v2 v1 v3 qt 11 1 3 qt4 1 2 qt5 qt 12 1 2 qt1 qt3 1 3 qt4 qt 13 1 2 qt1 1 3 qt4 qt 14 1 2 qt5 qt 15 qt2 v5 v4 Stationary distribution A stationary distribution for a MC with transition matrix P is a probability distribution such that P A MC has a unique stationary distribution if it is irreducible the underlying graph is strongly connected it is aperiodic for random walks the underlying graph is not bipartite The probability i is the fraction of times that we visited state i as t The stationary distribution is an eigenvector of matrix P the principal left eigenvector of P stochastic matrices have maximum eigenvalue 1 Computing the stationary distribution The Power Method Initialize to some distribution q0 Iteratively compute qt qt 1P After enough iterations qt Power method because it computes qt q0Pt Rate of convergence determined by 2 The PageRank random walk Vanilla random walk make the adjacency matrix stochastic and run a random walk 0 12 12 0 0 0 0 0 P 0 1 0 0 1 3 1 3 1 3 0 1 2 0 0 12 0 1 0 0 0 The PageRank random walk What about sink nodes what happens when the random walk moves to a node without any outgoing inks 0 12 12 0 0 0 0 0 P 0 1 0 0 1 3 1 3 1 3 0 1 2 0 0 12 0 0 0 0 0 The PageRank random walk Replace these row vectors with a vector v typically the uniform vector 0 0 12 12 0 1 5 1 5 1 5 1 5 1 5 P 0 1 0 0 0 1 3 1 3 1 3 0 0 1 2 0 0 1 2 0 P P dvT 1 if i is sink d 0 otherwise The PageRank random walk How do we guarantee irreducibility add a random jump to vector v with prob typically to a uniform vector 0 0 12 12 0 1 1 5 1 5 1 5 1 5 1 5 1 P 0 1 0 0 0 1 1 1 3 1 3 1 3 0 0 1 1 1 2 0 0 0 1 2 P P 1 uvT where u is the vector of all 1s 5 5 5 5 5 1 1 1 1 1 5 5 5 5 5 1 1 1 1 1 5 5 5 5 5 1 1 1 1 1 5 5 5 5 5 1 1 1 1 1 5 5 5 5 5 Effects of random jump Guarantees irreducibility Motivated by the concept of random surfer Offers additional flexibility personalization anti spam Controls the rate of convergence the second eigenvalue of matrix P is A PageRank algorithm Performing vanilla power method is now too expensive the matrix is not sparse Efficient computation of y P T x q0 v t 1 repeat y P T x qt P qt 1 T qt qt 1 t t 1 until x 1 y1 y y v Random walks on undirected graphs In the stationary distribution of a random walk on an undirected graph the probability of being at node i is proportional to the weighted degree of the vertex Random walks on undirected graphs are not interesting Research on PageRank Specialized PageRank personalization BP98 instead of picking a node uniformly at random favor specific nodes that are related to the user topic sensitive PageRank H02 compute many PageRank vectors one for each topic estimate relevance of query with each topic produce final PageRank as a weighted combination Updating PageRank Chien et al 2002 Fast computation of PageRank numerical analysis tricks node aggregation techniques dealing with the Web frontier Topic sensitive pagerank HITS based scores are very inefficient to compute PageRank scores are independent of the queries Can we bias PageRank rankings to take into account query keywords Topic sensitive PageRank Topic sensitive PageRank Conventional PageRank computation r t 1 v u N v r t u d v N v neighbors of v d v degree of v r Mxr M 1 P 1 n nxn r 1 P r 1 n nxnr 1 P r p p 1 n nx1 Topic sensitive PageRank r 1 P r p Conventional PageRank p is a uniform vector with values 1 n Topic sensitive PageRank uses a non uniform personalization vector p Not simply a post processing step of the PageRank computation Personalization vector p introduces bias in all iterations of the iterative computation of the PageRank vector Personalization vector In the random walk model the personalization vector represents the addition of a set of transition edges where the probability of an artificial edge u v is pv Given a graph the result of …


View Full Document

BU CS 565 - More on Rankings

Documents in this Course
Load more
Download More on Rankings
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view More on Rankings and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view More on Rankings 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?