Duke CPS 049s - Meta Crawlers vs. Single Search Engines

Searching the World Wide Web: Meta Crawlers vs. Single Search Engines
By: Voris Tejada

The Beginning
• Since its inception, the internet has grown at a staggering rate, with an extremely large number of pages added every day
• Search engines such as Alta Vista, Excite, HotBot, Infoseek, Lycos, and Northern Light attempted to turn the internet into a “15-billion-word encyclopedia”

Performance Measures For a Search Engine
• Coverage: also called “recall” in IR
• Relevance: also called “precision” in IR
• Freshness of pages in the index
• Speed

What is the main problem with this?
• Coverage
  – Despite their claims, no single search engine could index the entire web
  – Traditional IR systems were really designed for static collections
    • They could not keep up with the growth of the internet

Experiment by Selberg and Etzioni
• They ran an experiment using the logs from their MetaCrawler web sites.
  – “unique documents”
• What were the problems with this experiment?
  – They only took the first X pages returned from each engine
  – Each search engine used a different ranking system

Experiment by Lawrence and Giles
• Produced statistics on the coverage of the major web search engines and on the estimated size of the web
• Compared the number of documents returned by each engine and analyzed the results
• Problems
  – They did not know whether they were indexing unique URLs or subsets of the same URLs
  – Only the first X documents returned were used

They found…
• Using the estimate that the web contains 320 million pages, they calculated the following:
  – HotBot: 34% coverage
  – Alta Vista: 28% coverage
  – Northern Light: 20% coverage
  – Excite: 14% coverage
  – Infoseek: 10% coverage
  – Lycos: 3% coverage
*Note: both experiments were concerned with coverage

What method did they find that could increase coverage?
• Combining results from multiple engines
  – Combining all six search engines yielded about 3.5 times as many documents as a single engine, on average
  – Selberg and Etzioni had created a MetaCrawler, which gathered a “market share” of the results of each engine
*A solution better than MetaCrawler?

The First MetaCrawler
• Softbot
  – Invented by Selberg and Etzioni at the University of Washington
  – What important qualities did it provide?
    • A single interface for querying multiple search engines, such as Lycos and Alta Vista
    • Higher-quality results, as opposed to simply combining results

Modular Design
• User Interface
  – Translates user queries and options into the appropriate parameters
• Aggregation Engine
  – Obtains references, eliminates duplicates, collates and outputs results
• Parallel Web Interface
  – Downloads HTML pages from the Web, sends queries, and obtains results
• Harness
  – Where service-specific information is kept

Motivation
• Growth of the Web
• Difficulty in finding information
• Search engines index different documents and use different ranking algorithms
  – By using a single search engine, you could miss over 77% of the most relevant references
• The interfaces of many search engines were difficult to use

Softbot Addresses These Problems
• Aggregates web search services under a unified interface
  – The interface was much easier to use
  – Forwards queries to the individual search engines and ranks the results in one composite list
• Obtains higher-quality results
  – Allows users to be more specific
  – Eliminates duplicates using a comparison algorithm
  – Adapts to a rapidly changing environment

Formatting and Ranking
• MetaCrawler translates each query into the appropriate format for each search engine
• Uses a “confidence score” to rank
  – Allows each service to vote on the relevance of a particular document
  – Higher total score = higher ranking on the final list

Speed
• Has user-modifiable timeouts
• References are downloaded only when needed or when the user chooses
• Shows partial results
  – Doesn’t wait for the full results list to be generated before showing you something

Adaptability, Portability, Scalability
• Modular design allows services to be added, modified, and removed quickly
• Does not require large databases or large amounts of memory; can run on most machines
• Can scale without adding more machines

MetaCrawlers Today
• “The Big
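
The Aggregation Engine and the Formatting and Ranking slides describe one pipeline: collect references from every service, eliminate duplicates, let each service vote with a confidence score, and rank by the total. Below is a minimal Python sketch of that idea under stated assumptions; it is not MetaCrawler's actual code. The duplicate test (a simple URL normalization), the rescaling of each engine's scores to a 0 to 1000 range relative to its best hit, and the names `normalize_url` and `aggregate` are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Hypothetical duplicate key: lower-case scheme and host, drop trailing '/' and fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    key = f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"
    if parts.query:
        key += "?" + parts.query
    return key


def aggregate(results_by_engine: dict[str, list[tuple[str, float]]]) -> list[tuple[str, float]]:
    """Merge per-engine (url, raw_score) lists into one ranked list.

    Assumption: each engine's raw scores are rescaled to 0..1000 relative to
    that engine's best hit, so engines with different scales vote fairly.
    """
    totals: dict[str, float] = defaultdict(float)
    for hits in results_by_engine.values():
        if not hits:
            continue
        best = max(score for _, score in hits) or 1.0  # avoid dividing by zero
        seen: set[str] = set()
        for url, score in hits:
            key = normalize_url(url)
            if key in seen:  # an engine votes at most once per document
                continue
            seen.add(key)
            totals[key] += 1000.0 * score / best  # this engine's vote
    # Higher total score = higher ranking; ties broken by URL for determinism.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    hypothetical = {
        "engine_a": [("http://Example.com/page/", 9.0), ("http://example.com/other", 4.5)],
        "engine_b": [("http://example.com/page", 12.0), ("http://example.org/", 6.0)],
    }
    for url, total in aggregate(hypothetical):
        print(f"{total:7.1f}  {url}")
```

In the demo, both engines return the same page under slightly different URLs, so it is merged into a single entry whose two votes add up and place it first.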

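The Speed slide (user-modifiable timeouts, partial results) and the Parallel Web Interface module suggest a pattern like the following: send the query to every service at once and keep whatever has arrived when the user's deadline passes. This is a hedged sketch built on Python's standard `concurrent.futures`; `query_all`, `fast_engine`, and `slow_engine` are hypothetical stand-ins, and real services would need HTTP requests and result parsing that are not shown here.

```python
import concurrent.futures as cf
import time
from typing import Callable, Optional

# A "fetcher" stands in for one search service: query in, list of URLs out.
Fetcher = Callable[[str], list[str]]


def query_all(
    engines: dict[str, Fetcher],
    query: str,
    timeout_s: float,
    on_result: Optional[Callable[[str, list[str]], None]] = None,
) -> dict[str, list[str]]:
    """Query every engine in parallel and return what arrived before timeout_s."""
    results: dict[str, list[str]] = {}
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(engines)))
    futures = {pool.submit(fetch, query): name for name, fetch in engines.items()}
    try:
        # as_completed yields each engine's answer as soon as it is ready,
        # which lets a caller show partial results before the slowest engine finishes.
        for fut in cf.as_completed(futures, timeout=timeout_s):
            name = futures[fut]
            try:
                hits = fut.result()
            except Exception:
                continue  # a failing engine simply contributes nothing
            results[name] = hits
            if on_result is not None:
                on_result(name, hits)
    except cf.TimeoutError:
        pass  # deadline reached: keep the partial results gathered so far
    finally:
        # Return without waiting for slow engines (cancel_futures needs Python 3.9+).
        pool.shutdown(wait=False, cancel_futures=True)
    return results


# Hypothetical stand-in engines for the demo below.
def fast_engine(query: str) -> list[str]:
    return [f"http://fast.example/{query}"]


def slow_engine(query: str) -> list[str]:
    time.sleep(1.0)  # misses a 0.3 s deadline
    return [f"http://slow.example/{query}"]


if __name__ == "__main__":
    partial = query_all(
        {"fast": fast_engine, "slow": slow_engine},
        "metasearch",
        timeout_s=0.3,
        on_result=lambda name, hits: print(f"{name} answered: {hits}"),
    )
    print(partial)
```

Passing `on_result` lets a caller display each engine's hits as soon as they arrive, which mirrors the slide's point about not waiting for the full results list.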

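The Lawrence and Giles coverage figures add up to 109%, which is possible only because the engines' indexes overlap: combining engines raises coverage, but never by the simple sum. The sketch below computes precision and recall as the Performance Measures slide defines them, plus the coverage of a union of indexes. It is a hedged illustration on made-up toy data, not the method used in either experiment. The names `precision`, `recall`, and `combined_coverage` are hypothetical, and the second value `combined_coverage` returns is the kind of ratio behind the "3.5 times as many documents" figure.

```python
from statistics import mean


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """'Relevance' on the slides: fraction of retrieved pages that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """'Coverage' on the slides: fraction of relevant pages that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def combined_coverage(per_engine: dict[str, set[str]], web_size: int) -> tuple[float, float]:
    """Return (union coverage of the web, union size / average single-engine size).

    A page indexed by several engines is counted once in the union, so
    per-engine coverage percentages cannot simply be added together.
    """
    if not per_engine or web_size <= 0:
        raise ValueError("need at least one engine and a positive web size")
    union = set().union(*per_engine.values())
    avg_single = mean(len(pages) for pages in per_engine.values())
    gain = len(union) / avg_single if avg_single else 0.0
    return len(union) / web_size, gain


if __name__ == "__main__":
    # Hypothetical toy indexes; page IDs stand in for URLs.
    toy = {"a": {"p1", "p2", "p3"}, "b": {"p3", "p4"}, "c": {"p1", "p5"}}
    print(combined_coverage(toy, web_size=10))
    print(precision({"p1", "p2", "p6"}, {"p1", "p2", "p3", "p4"}))
    print(recall({"p1", "p2", "p6"}, {"p1", "p2", "p3", "p4"}))
```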