GT CS 4440 - Web Spam Detection Using the Web Topology (26 pages)



Pages: 26
School: Georgia Institute of Technology
Course: CS 4440 - Database Technologies


Know your Neighbors: Web Spam Detection Using the Web Topology

Carlos Castillo (1), Debora Donato (1), Aristides Gionis (1), Vanessa Murdock (1), Fabrizio Silvestri (2)
(1) Yahoo! Research Barcelona, Catalunya, Spain
(2) ISTI-CNR, Pisa, Italy
ACM SIGIR, 25 July 2007, Amsterdam

Presented by Soumo Gorai

Soumo's Biography
4th-year CS major, graduating May 2008.
Interesting about me: lived in India, Australia, and the U.S.
CS interests: databases, HCI, web programming, networking, graphics, gaming.

Here's all that you can find on the web.
Here's just some of what really is out there. And more.

Why so many different things?
There is fierce competition for your attention.
Ease of publication: for personal publication as well as commercial publication, advertisements, and economic activity.
And there's lots, lots, lots, lots, lots of spam.

What's Spam?
Hidden text.
Only hidden text.
Here's a whole fake search engine.

Why is Spam bad? Costs
Costs for users: lower precision for some queries.
Costs for search engines: wasted storage space, network resources, and processing cycles.
Costs for the publishers: resources invested in cheating and not in improving their contents.
Every undeserved gain in ranking for a spammer is a loss of search precision for the search engine.

How Do We Detect Spam?
Machine Learning / Training
Link-based Detection
Content-based Detection
Using Links and Contents
Using Web-based Topology

Machine Learning / Training: ML Challenges
Machine Learning challenges:
Instances are not really independent (graph).
Training set is relatively small.
Information Retrieval challenges:
It is hard to find out which features are relevant.
It is hard for search engines to provide labeled data.
Even if they do, it will not reflect a consensus on what is Web Spam.

Link-based Detection
Single-level farms can be detected by searching for groups of nodes sharing their out-links (Gibson et al., 2005).
Why use it?
Degree-related measures
PageRank
TrustRank (Gyöngyi et al., 2004)
Truncated PageRank (Becchetti et al., 2006): similar to PageRank, it limits a page to the PageRank
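
The link-based detection slide notes that single-level farms can be found by looking for groups of nodes that share their out-links. A minimal Python sketch of that idea follows. It is a simplification for illustration only, not the method of Gibson et al. or of the paper's authors: it flags only nodes whose out-link sets are exactly identical. The function name, the thresholds `min_group_size` and `min_links`, and the toy graph are all assumptions introduced here.

```python
from collections import defaultdict


def groups_sharing_out_links(out_links, min_group_size=3, min_links=2):
    """Return groups of nodes whose out-link sets are exactly identical.

    out_links: dict mapping a node name to an iterable of link targets.
    min_group_size and min_links are hypothetical thresholds chosen for
    this illustration; they do not come from the slides.
    """
    by_targets = defaultdict(list)
    for node, targets in out_links.items():
        key = frozenset(targets)  # link order and duplicates are ignored
        if len(key) >= min_links:
            by_targets[key].append(node)
    # Keep only target sets shared by enough distinct nodes.
    return [sorted(nodes) for nodes in by_targets.values()
            if len(nodes) >= min_group_size]


# Hypothetical toy graph: three hosts point at the same two targets.
toy_graph = {
    "farm1.example": ["target.example", "hub.example"],
    "farm2.example": ["hub.example", "target.example"],
    "farm3.example": ["target.example", "hub.example"],
    "blog.example": ["news.example", "wiki.example"],
    "news.example": ["wiki.example"],
}

suspicious = groups_sharing_out_links(toy_graph)
print(suspicious)
```

Exact matching is the simplest possible notion of "sharing"; a more tolerant variant would compare out-link sets by overlap instead, at the cost of comparing many more pairs of nodes.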


