Duke CPS 049s - The Computer Science Within and its Impact on Society


CPS 49S Google: The Computer Science Within and its Impact on Society - Spring 2007
Homework 2

• Due date: Friday, Feb 16, 2007, 11:59 PM. Late submissions will not be accepted (unless there are documented excuses from the dean).
• Submission: In class, via email to [email protected], or via Blackboard’s digital dropbox.
• Indicate your name on your submission.
• Email questions to [email protected] and to [email protected]
• Total points = 100.

Question 1 (2 points)
What is googlewhacking?

Question 2 (5 points)
If we search for “duke football” using the Archie search engine, will the http://www.goduke.com web page be part of the results produced? Why or why not?

Question 3 (5 points)
When a user of Google says that Google “has good performance”, what could be some measures of performance that the user is referring to? List at least two such measures.

Question 4 (5 points)
One of the problems with the WWW Wanderer was that “it ate up too many processing and bandwidth cycles as it indexed a site’s contents.” Suggest two techniques that a crawler can use to avoid this problem.

Question 5 (5 points)
Battelle says that Altavista was the Google of its era. However, it would be fair to say that Altavista was ultimately a failure. In your opinion, what were the three main causes of Altavista’s downfall?

Question 6 (5 points)
Even in its very early days, Altavista was able to “set a thousand crawlers loose at once.”
1. What made this feature possible?
2. Why was this feature useful?

Question 7 (5 points)
Excite was the first search engine that grouped web pages based on their underlying concepts. Give one concrete example that illustrates how such groupings can improve the quality of search results.

Question 8 (5 points)
Imagine that you have graduated from Duke, and you are now working at Google. Your boss says “Garfield’s impact factor is an excellent measure of the impact of a journal”. You disagree, and you claim that Garfield’s impact factor is flawed. Give two arguments that you will use to support your claim.

Question 9 (5 points)
In its early days, Google (or back then, BackRub) caused numerous problems for other web servers on the Internet. List four types of problems that Google caused.

Question 10 (5 points)
In Chapter 4 of the textbook, Steve Hansen recommends that BackRub should do “self-policing”. Give one way in which BackRub could have policed itself.

Question 11 (5 points)
What is “search-engine optimization”? How is it related to (search-engine) spamming?

Question 12 (5 points)
One of the readings says: “any search engine on the Web must address the heterogeneity of HTML documents.”
1. What does the term “heterogeneity of HTML documents” mean in this context?
2. Give three ways in which Google deals with this heterogeneity.

Question 13 (5 points)
June Levy, managing director of Cinahl, says that “manual indexers are able to pick up on the nuances of human language that machines simply cannot do.” Would you agree? Justify your answer using one or more concrete examples.

Question 14 (5 points)
One of our readings suggests that it is hard for an automated indexer to “accurately forge relationships between documents that on the surface are not lexically linked”. Justify this statement using a concrete example.

Question 15 (5 points)
One of our readings says that “Google’s web crawler grabs around 100K of text per webpage, while Yahoo pulls about 500K.” What are the pros and cons of grabbing 100K vs. 500K? (Here, K stands for kilobytes.)

Question 16 (5 points)
What is the Metathesaurus? Is the Metathesaurus useful for a search engine? Justify briefly.

Question 17 (5 points)
In class we discussed that stemming has both positive effects and negative effects.
1. List two positive effects of stemming.
2. List two negative effects of stemming.

Question 18 (5 points)
Imagine that you have graduated from Duke, and you are now working at Google. You have been made part of the “stop-word selection team” at Google. This team’s job is to determine which words from the English language should be made stop words for Google’s indexer and search engine. What criteria would you suggest to determine whether a word should be made a stop word or not?

Question 19 (5 points)
One of the readings states that “there is a trade-off in systems that support contiguous word phrases and proximity measures—higher storage requirements and computational costs.” Justify this statement briefly.

Question 20 (8 points)
Based on the discussion in the “Anatomy of a ... Web Search Engine” paper, explain step-by-step how Google would process the search query “duke freshman seminar” to generate ranked

