DOC PREVIEW
Stanford CS 276 - Link Analysis

This preview shows page 1-2-3-4-25-26-27-51-52-53-54 out of 54 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

PowerPoint PresentationToday’s lecture – hypertext and linksLinks are everywhereExample 1: Good/Bad/UnknownSimple iterative logicSlide 6Slide 7Example 2: In-links to pages – unusual patterns Many other examples of link analysisOur primary interest in this courseThe Web as a Directed GraphAssumption 1: reputed sitesAssumption 2: annotation of targetAnchor Text WWW Worm - McBryan [Mcbr94]Indexing anchor textSlide 16Getting at all that link information InexpensivelyConnectivity ServerBoldi and Vigna 2004Adjacency listsAdjaceny list compressionMain ideas of Boldi/VignaBoldi/VignaGap encodingsMain advantages of BVSlide 26Citation AnalysisThe web isn’t scholarly citationPagerank scoringNot quite enoughTeleportingResult of teleportingMarkov chainsSlide 34Ergodic Markov chainsProbability vectorsChange in probability vectorHow do we compute this vector?Slide 39Hyperlink-Induced Topic Search (HITS)Hubs and AuthoritiesThe hopeHigh-level schemeBase setVisualizationDistilling hubs and authoritiesIterative updateScalingHow many iterations?Proof of convergenceHub/authority vectorsRewrite in matrix formIssuesResourcesIntroduction to Information RetrievalIntroduction to Information Retrieval Introduction toInformation RetrievalCS276Information Retrieval and Web SearchChris Manning and Pandu NayakLink analysisIntroduction to Information RetrievalIntroduction to Information Retrieval Today’s lecture – hypertext and linksWe look beyond the content of documentsWe begin to look at the hyperlinks between themAddress questions likeDo the links represent a conferral of authority to some pages? Is this useful for ranking?How likely is it that a page pointed to by the CERN home page is about high energy physicsBig application areasThe WebEmailSocial networksIntroduction to Information RetrievalIntroduction to Information Retrieval Links are everywherePowerful sources of authenticity and authorityMail spam – which email accounts are spammers?Host quality – which hosts are “bad”?Phone call logsThe Good, The Bad and The Unknown3????GoodBadIntroduction to Information RetrievalIntroduction to Information Retrieval Example 1: Good/Bad/UnknownThe Good, The Bad and The UnknownGood nodes won’t point to Bad nodesAll other combinations plausible4????GoodBadIntroduction to Information RetrievalIntroduction to Information Retrieval Simple iterative logicGood nodes won’t point to Bad nodesIf you point to a Bad node, you’re BadIf a Good node points to you, you’re Good5????GoodBadIntroduction to Information RetrievalIntroduction to Information Retrieval Simple iterative logicGood nodes won’t point to Bad nodesIf you point to a Bad node, you’re BadIf a Good node points to you, you’re Good6??GoodBadIntroduction to Information RetrievalIntroduction to Information Retrieval Simple iterative logicGood nodes won’t point to Bad nodesIf you point to a Bad node, you’re BadIf a Good node points to you, you’re Good7GoodBadSometimes need probabilistic analogs – e.g., mail spamIntroduction to Information RetrievalIntroduction to Information Retrieval Example 2:In-links to pages – unusual patterns 8Spammers violating power laws!Introduction to Information RetrievalIntroduction to Information Retrieval Many other examples of link analysisSocial networks are a rich source of grouping behaviorE.g., Shoppers’ affinity – Goel+Goldstein 2010Consumers whose friends spend a lot, spend a lot themselveshttp://www.cs.cornell.edu/home/kleinber/networks-book/9Introduction to Information RetrievalIntroduction to Information Retrieval Our primary interest in this courseLink analysis for most IR functionality thus far based purely on textScoring and rankingLink-based clustering – topical structure from linksLinks as features in classification – documents that link to one another are likely to be on the same subjectCrawlingBased on the links seen, where do we crawl next?10Introduction to Information RetrievalIntroduction to Information Retrieval The Web as a Directed GraphHypothesis 1: A hyperlink between pages denotes a conferral of authority (quality signal)Hypothesis 2: The text in the anchor of the hyperlink on page A describes the target page BPage AhyperlinkPage BAnchorSec. 21.1Introduction to Information RetrievalIntroduction to Information Retrieval Assumption 1: reputed sites12Introduction to Information RetrievalIntroduction to Information Retrieval Assumption 2: annotation of target13Introduction to Information RetrievalIntroduction to Information Retrieval Anchor Text WWW Worm - McBryan [Mcbr94] For ibm how to distinguish between:IBM’s home page (mostly graphical)IBM’s copyright page (high term freq. for ‘ibm’)Rival’s spam page (arbitrarily high term freq.)www.ibm.com“ibm” “ibm.com”“IBM home page”A million pieces of anchor text with “ibm” send a strong signalSec. 21.1.1Introduction to Information RetrievalIntroduction to Information Retrieval Indexing anchor textWhen indexing a document D, include (with some weight) anchor text from links pointing to D.www.ibm.comArmonk, NY-based computergiant IBM announced todayJoe’s computer hardware linksSunHPIBMBig Blue today announcedrecord profits for the quarterSec. 21.1.1Introduction to Information RetrievalIntroduction to Information Retrieval Indexing anchor textCan sometimes have unexpected effects, e.g., spam, miserable failureCan score anchor text with weight depending on the authority of the anchor page’s websiteE.g., if we were to assume that content from cnn.com or yahoo.com is authoritative, then trust (more) the anchor text from themIncrease the weight of off-site anchors (non-nepotistic scoring)Sec. 21.1.1Introduction to Information RetrievalIntroduction to Information Retrieval Getting at all that link informationInexpensivelyConnectivity servers17Introduction to Information RetrievalIntroduction to Information Retrieval Connectivity ServerSupport for fast queries on the web graphWhich URLs point to a given URL?Which URLs does a given URL point to?Stores mappings in memory fromURL to outlinks, URL to inlinksApplicationsLink analysisWeb graph analysisConnectivity, crawl optimizationCrawl controlSec. 20.4Introduction to Information RetrievalIntroduction to Information Retrieval Boldi


View Full Document

Stanford CS 276 - Link Analysis

Documents in this Course
Load more
Download Link Analysis
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Link Analysis and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Link Analysis 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?