Yale CPSC 155 - Web Searching and Google

This preview shows pages 1-3 and 26-28 of 28 pages.



CS155a: E-Commerce
Lecture 20: November 27, 2001
Web Searching and Google

Finding Information on the Internet
The Internet is so successful partly because it is so easy to publish information on the World Wide Web.
• No central authority on what pages exist, where they exist, or when they exist.
• Too much to sort through, anyway.
• Question: How do we find what we need on the web?

WWW Search Engines
• Answer: Set up websites that people can use to search for information by performing a search query.
• Not such an easy solution! In addition to the technical problems, we have these business questions:
– How do people know about the search engine websites?
– How do you make money off of this? (Especially now that the service is free.)

Examples of Search Engines
• Yahoo!
• Lycos
• MSN
• Excite
• AltaVista
(Have become portal sites with many other services)
• AOL/Netscape (ISP / software site that incorporated a search engine and portal)
• InfoSpace/MetaCrawler (“Search engine searcher”)
• Google (Remains dedicated to searching)

Solutions (?) to Technical Problems
• How do we keep track of what pages are on the WWW?
– Have a crawler or spider scan the web and links between pages to find new, updated, and removed pages.
• How do we store the content we find?
– Design a way to map keywords in queries to documents so we can return a usefully ordered list to the user.
• What happens when pages are temporarily unavailable?
– Use caching: keep a local copy of documents as we crawl the web. (Need lots of space!)

Solutions (?) to Technical Problems (continued)
• How do we store all the information?
– Use a large network of disks (and maybe a clever method of compression) that can be easily searched.
• How do we handle so many different requests?
– Use a cluster of computers that work together to process queries.
There is still ongoing research to find better ways to solve these problems!

WWW Digraph
• More than 1.6 Billion Nodes (Pages)
• Average Degree (links/Page) is 5-15. (Hard to Compute!)
• Massive, Distributed, Explicit Digraph (Not Like Call Graphs)

“Hot” Research Area
• Graph Representation
• Duplicate Elimination
• Clustering
• Ranking Query Results

“Abundance” Problem
http://simon.cs.cornell.edu/home/kleinber/kleinber.html
• Given a query find:
– Good Content (“Authorities”)
– Good Sources of Links (“Hubs”)
• Mutually Reinforcing
• Simple (Core) Algorithm
[Figure: diagram with nodes labeled A and H]

$T = \{n \text{ Pages}\}$, $A = \{\text{Links}\}$
$X_p \in \mathbb{R}_{\ge 0},\ p \in T$: non-negative “Authority Weights”
$Y_p \in \mathbb{R}_{\ge 0},\ p \in T$: non-negative “Hub Weights”
I operation (Update Authority Weights): $X_p \leftarrow \sum_{(q,p) \in A} Y_q$
O operation (Update Hub Weights): $Y_p \leftarrow \sum_{(p,q) \in A} X_q$
Normalize: $\sum_{p \in T} X_p^2 = \sum_{p \in T} Y_p^2 = 1$

Core Algorithm
$Z \leftarrow (1, 1, \ldots, 1)$
$X \leftarrow Y \leftarrow Z$
Repeat until Convergence
    Apply I /* Update Authority Weights */
    Apply O /* Update Hub Weights */
    Normalize
Return Limit $(X^*, Y^*)$

Convergence of $(X_i, Y_i) = (OI)^i (Z, Z)$
$A$ = $n \times n$ “Adjacency Matrix”
Rewrite I and O: $X \leftarrow A^T Y$ ; $Y \leftarrow A X$
$X_i = (A^T A)^{i-1} A^T Z$ ; $Y_i = (A A^T)^i Z$
$A A^T$ Symm., Non-negative and $Z = (1, 1, \ldots, 1)$ ⇒
$X^* = \lim_{i \to \infty} X_i = \omega_1(A^T A)$
$Y^* = \lim_{i \to \infty} Y_i = \omega_1(A A^T)$

Whole Algorithm (k, d, c)
q ⇒ Search Engine ⇒ $|S| < k$
Base Set T: (pages in S, pages that S links to, pages that link to S) and < d links/page
Remove “Internal Links”
Run Core Algorithm on T
From Result (X, Y), Select
    c pages with max X* values
    c pages with max Y* values

Examples (k = 200, d = 5)
q = censorship + net
    www.EFF.org
    www.EFF.org/BlueRib.html
    www.CDT.org
    www.VTW.org
    www.ACLU.org
q = Gates
    www.roadahead.com
    www.microsoft.com
    www.ms.com/corpinfo/bill-g.html
[Compares well with Yahoo, Galaxy, etc.]

Approach to “Massiveness”: Throw Out Most of G!!
• Non-principal Eigenvectors correspond to “Non-principal Communities”
• Open (?):
– Objective Performance Criteria
– Dependence on Search Engine
– Nondeterministic Choice of S and T

Google
• Full name: Google, Inc.
• Privately held company. Funding partners include Kleiner Perkins Caufield & Byers and Sequoia Capital.
• Employees: over 260 (more than 50 with Ph.D.)
• Mission: “[To] deliver the best search experience on the Internet by making the world’s information universally accessible and useful.”
• Award-winning search engine that has indexed 1.6 billion web pages.

Google History
• 1998: Founders Larry Page and Sergey Brin (Ph.D. students at Stanford) raise $1 million from family, friends, and angel investors. Google is incorporated Sept. 7. Site receives 10,000 queries per day and is listed in PC Magazine’s top 100 search websites list.
• 1st half 1999: Google has 8 employees and answers 500,000 queries/day. Red Hat (Linux distributor) becomes first customer. Google gets $25 million equity funding.

Google History (continued)
• 2nd half 1999: 39 employees, 3 million queries/day. Partners with Virgilio of Italy to provide search services.
• 2000: Becomes largest web search engine, having indexed 1 billion documents. Answers 18 million queries/day. Gains more partners, including Yahoo! Starts web directory.

Google History (continued)
• 2001: Acquires Deja.com’s Usenet archive, adding newsgroups to Google’s index. Improves and adds services including browser plug-ins, image searching, PDF searching, cell-phone and handheld compatibility, and queries and document searches in many languages. Advertising services used by over 350 Premium Sponsorship customers.
• Current: 1.6 billion web pages, 22 million PDF files, 650 million newsgroup messages, and 250 million images indexed. Serves 150 million queries/day.

Google Partners
• Yahoo!
• Palm
• Nextel
• Netscape
• Cisco Systems
• Virgin Net
• Netease.com
• RedHat
• Virgilio
• Washingtonpost.com

Google’s Business Model
Scalable Search Services:
• Google provides customized search services for websites.
• Has become the primary search engine used by popular portal and ISP websites.
Advertising:
• Premium Sponsorship: sponsored text links at the top of search results based on search category.
• AdWords: keyword-targeted, self-service advertising method. Choose keywords or phrases where text ads will appear to the right of the search result list.
• No banner ads or graphics!

Google Advertising Screenshot
[Figure: screenshot]

Technical Highlights
• PageRank Technology: Heavily mathematical (linear algebra!), objective calculation of the PageRank (=importance?) of a page.
– A link from Page A to Page B is a “vote” for B.
– The importance of A is factored into the vote.
– PageRank results are not modified by sponsors or
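
Looking back at the Core Algorithm slide (authority and hub weights), the I and O operations can be sketched in a few lines of Python. This is only an illustrative sketch, not the lecture's implementation: the function names `normalize` and `hits`, the stopping tolerance `tol`, the iteration cap `max_iter`, and the four-page example graph are all assumptions introduced here.

```python
import math


def normalize(weights):
    """Scale weights so that the sum of their squares is 1."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm > 0 else list(weights)


def hits(links, n, tol=1e-10, max_iter=1000):
    """Iterate the I and O operations on pages numbered 0..n-1.

    links is a collection of (p, q) pairs meaning "page p links to page q".
    Returns (authority_weights, hub_weights).
    """
    links = list(links)
    x = [1.0] * n  # authority weights, X <- Z = (1, 1, ..., 1)
    y = [1.0] * n  # hub weights, Y <- Z
    for _ in range(max_iter):
        # I operation: X_p <- sum of Y_q over all links (q, p)
        new_x = [0.0] * n
        for q, p in links:
            new_x[p] += y[q]
        # O operation: Y_p <- sum of X_q over all links (p, q)
        new_y = [0.0] * n
        for p, q in links:
            new_y[p] += new_x[q]
        # Normalize so that sum X_p^2 = sum Y_p^2 = 1
        new_x, new_y = normalize(new_x), normalize(new_y)
        # Assumed convergence test: stop once no weight moves more than tol
        change = max(abs(a - b) for a, b in zip(new_x + new_y, x + y))
        x, y = new_x, new_y
        if change < tol:
            break
    return x, y


# Hypothetical example: pages 0 and 1 act as hubs, pages 2 and 3 as authorities.
example_links = [(0, 2), (0, 3), (1, 2)]
authority, hub = hits(example_links, 4)
print("authority weights:", authority)
print("hub weights:", hub)
```

In this toy graph, page 2 is linked from both hubs and ends up with the largest authority weight, while page 0 links to both authorities and ends up with the largest hub weight. That is the "mutually reinforcing" behavior the slides describe.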

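
The two PageRank bullets in Technical Highlights (a link is a "vote", and the voter's own importance is factored in) can be illustrated with a short power-iteration sketch. The slides do not give a formula, so everything specific here is an assumption: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages with no outgoing links, the function name `pagerank`, and the four-page `toy_web` graph. It should not be read as Google's actual computation.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Estimate page importance by repeatedly passing "votes" along links.

    out_links maps each page to the list of pages it links to.
    Every link target must also appear as a key of out_links.
    """
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        # Every page keeps a small baseline share (assumed damping model)
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = out_links[p]
            if targets:
                # A page splits its current importance among the pages it votes for
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumed handling of pages with no links: spread rank evenly
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked from A, B, and D; nothing links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Page C collects the most votes and ranks highest, and page A ranks above B even though each has a single incoming link, because A's one vote comes from the important page C. Page D, with no incoming links, keeps only the baseline share.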
