Duke CPS 049s - The anatomy of a Large-Scale

The Anatomy of a Large-Scale Hypertextual Web Search Engine

What we want from a search engine
• Speed
• Quantity of results
• Efficient storage space
• Quality of results
Google attempts to deliver all of these aspects of search.

Precision of results
• PageRank
• Anchor text

Second Generation Search Engine

PageRank
PR(A) = (1-d) + d(PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
Here T1 … Tn are the pages that link to page A, C(T) is the number of links going out of page T, and d is a damping factor between 0 and 1.
• The more links that point to a page (from other pages), the higher its PageRank will be.
• PageRank models the probability that a random internet surfer will reach the page by randomly clicking links.
• It is also determined by the pages that do the linking. A link from page A to page B is worth more the higher A's own PageRank is, and worth less the more outgoing links A has, because A's rank is divided among all of its links (the PR(T)/C(T) term).

Anchor Text
Every link on the web has anchor text attached to it. This text is written by the page creator and explains what the link does, where it leads, or what it attempts to explain. By collecting the anchor text of links to a page from many different sites, Google can provide more relevant search results.

Proximity Search and Others
Google keeps track of how close the related words are to each other, and also keeps track of their visual presentation (font size, color, boldness, etc.).

Crawling and Indexing
• Google typically ran about 3 crawlers.
• Each crawler keeps roughly 300 connections open at once.
• At peak performance, with 4 crawlers, Google can crawl 100 web pages per second.
• This amounts to roughly 600K of data per second.
• Parsing
• Indexing documents into
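
The PageRank formula from the slides can be tried out on a small link graph. The sketch below is only an illustration, not Google's implementation: the function name pagerank, the links dictionary, the fixed number of iterations, and the default d = 0.85 are all assumptions made for the example. It simply applies PR(A) = (1-d) + d(PR(T1)/C(T1) + … + PR(Tn)/C(Tn)) to every page over and over until the values settle.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) to every page.

    links maps each page to the list of pages it links to (hypothetical input).
    d and iterations are illustrative defaults, not values from the slides.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # C(T): number of distinct outgoing links of each page.
    out_count = {page: len(set(links.get(page, []))) for page in pages}

    # T1 ... Tn: for each page, the pages that link to it.
    inbound = {page: [] for page in pages}
    for source, targets in links.items():
        for target in set(targets):
            inbound[target].append(source)

    # Start every page with the same rank.
    ranks = {page: 1.0 for page in pages}

    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Each linking page T passes on its rank split over its C(T) links.
            received = sum(ranks[t] / out_count[t] for t in inbound[page])
            new_ranks[page] = (1 - d) + d * received
        ranks = new_ranks
    return ranks


# Tiny made-up graph: A links to B and C, B links to C, C links to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
example_ranks = pagerank(example_links)
for page in sorted(example_ranks, key=example_ranks.get, reverse=True):
    print(page, round(example_ranks[page], 3))
```

In this made-up graph, C ends up ranked highest because both A and B link to it, while B is lowest because its only inbound link comes from A, which splits its rank between two outgoing links.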

