DOC PREVIEW
UCI INF 141 - The web as a directed graph

This preview shows page 1-2-3-4-5-6 out of 19 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

•Pages are nodes•Links are directed edges•<a href=”http://www.ics.uci.edu”>ICS</a>•...this links my page to the ICS home page•LinkAnalysis/PageRank has its origins in bibliometrics•“Measurement of influence among publications based on citations”•Just as citing a paper confers authority upon it, linking to a page confers authority to it.The web as a directed graphLink Analysis•Two ways of measuring similarity of scientific articles:•Cocitation similarity: The two articles are cited by the same articles•Bibliographic coupling similarity: The two articles site the same articlesBibliometricsLink AnalysisCo-citation similarityLink AnalysisPaper APaper BBibliographic coupling similarityLink AnalysisPaper APaper B•Citation frequency can be used to measure impact•Each article gets one vote•Not a very accurate measure•Better measure: weighted citation frequency/ citation rank•An article’s vote is weighted according to its citation impact.•Sounds circular, but can be formalized in a well-defined way•This is basically PageRank•Invited for citation analysis in the 1960’s by Pinsker and NarinBibliometricsLink Analysis•A citation in scientific literature is like a link on the webKey ObservationLink Analysis•First retrieve all pages meeting the query•First generation: •Then order them by their link popularity•citation frequency•Easy to spam. Why?•Second generation:•Order them by their weighted link popularity•PageRankLink based query processingLink Analysis•First retrieve all pages meeting the query•First generation: •Then order them by their link popularity•citation frequency•Easy to spam. Why?•Second generation:•Order them by their weighted link popularity•PageRankLink based query processingLink Analysis•A full search engine balances many different scores•cosine similarity•term proximity•zone scoring•PageRank•contextual relevance (implicit queries)Link based query processingLink Analysis•Every webpage gets a score•between 0 and 1•it’s PageRank•The random walk•Start at a random page•Follow an out edge with equal probability•In the long run each page has a long-term visit rate.PageRankLink Analysis33%33%33%•PageRank is a page’s long-term steady state visit rate based on a random walk model.PageRankLink Analysis•The web is full of dead-ends•A random walk can get stuck in dead-ends•Makes no sense to talk about long-term visit ratesVisit Rate not quite enoughLink Analysis•At a dead end, jump to a random web page•at any non-dead end, with probability 10% jump to a random web page anyway•the other 90% choose a random out link•“10%” is a tunable parameterTeleportingLink Analysis•Now we cannot get stuck locally•There is a long-term visit rate at which any page is visited.•How do we computer the visit rate?•aka How do we compute PageRank?•(By the way this is a Markov Model)TeleportingLink Analysis•Now we cannot get stuck locally•There is a long-term visit rate at which any page is visited.•How do we computer the visit rate?•aka How do we compute PageRank?•(By the way this is a Markov Model)TeleportingLink Analysis•A Markov Chain is a mathematical “game”•It consists of n states •corresponds to web pages•And a transition probability matrix•corresponds to links•it is like an adjacency matrixMarkov ChainsLink Analysis•At any moment in the game we are in one of the states•In the next step we move to a new state•We use the transition matrix to decide which state to move into.•If you are in state “i” then the probability of moving into state “j” is P(i -> j)Markov ChainsLink Analysis•Compute the parameters of the Markov Chain for this graphical modelExerciseLink


View Full Document

UCI INF 141 - The web as a directed graph

Download The web as a directed graph
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view The web as a directed graph and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view The web as a directed graph 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?