Filtering Spam with Behavioral Blacklisting

Anirudh Ramachandran, Nick Feamster, and Santosh Vempala
College of Computing, Georgia Tech
801 Atlantic Drive, Atlanta, GA - 30332, USA
{avr,feamster,vempala}@cc.gatech.edu

Outline: Introduction; Motivation; The Behavior of Spamming IP Addresses; Persistence: "New" IP addresses every day; Distribution: Some IPs target many domains; The Performance of IP-Based Blacklists; Background: DNS-Based Blacklists (DNSBLs); Completeness; Responsiveness; The Case for Behavioral Blacklisting; Clustering Algorithm; Spectral Clustering; SpamTracker: Clustering Email Senders; Design; Overview; Clustering; Classification; Tracking Changes in Sending Patterns; Evaluation; Data; Clustering and Classification; Detecting "New" Spammers; Discussion; Improving Classification; Incorporating with Existing Systems; Deployment Challenges; Evasion; Sensor Placement; Related Work; Conclusion

ABSTRACT

Spam filters often use the reputation of an IP address (or IP address range) to classify email senders. This approach worked well when most spam originated from senders with fixed IP addresses, but spam today is also sent from IP addresses for which blacklist maintainers have outdated or inaccurate information (or no information at all). Spam campaigns also involve many senders, reducing the amount of spam any particular IP address sends to a single domain; this method allows spammers to stay “under the radar”. The dynamism of any particular IP address begs for blacklisting techniques that automatically adapt as the senders of spam change.

This paper presents SpamTracker, a spam filtering system that uses a new technique called behavioral blacklisting to classify email senders based on their sending behavior rather than their identity. Spammers cannot evade SpamTracker merely by using “fresh” IP addresses because blacklisting decisions are based on sending patterns, which tend to remain more invariant. SpamTracker uses fast clustering algorithms that react quickly to changes in sending patterns.
We evaluate SpamTracker’s ability to classify spammers using email logs for over 115 email domains; we find that SpamTracker can correctly classify many spammers missed by current filtering techniques. Although our current datasets prevent us from confirming SpamTracker’s ability to completely distinguish spammers from legitimate senders, our evaluation shows that SpamTracker can identify a significant fraction of spammers that current IP-based blacklists miss. SpamTracker’s ability to identify spammers before existing blacklists suggests that it can be used in conjunction with existing techniques (e.g., as an input to greylisting). SpamTracker is inherently distributed and can be easily replicated; incorporating it into existing email filtering infrastructures requires only small modifications to mail server configurations.

Categories and Subject Descriptors: C.2.0 [Computer Communication Networks]: Security and protection
General Terms: Security, Design, Algorithms
Keywords: spam, botnets, blacklist, security, clustering

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CCS’07, October 29–November 2, 2007, Alexandria, Virginia, USA.
Copyright 2007 ACM 978-1-59593-703-2/07/0011 ...$5.00.

1. INTRODUCTION

More than 75% of all email traffic on the Internet is spam [25]. To date, spam-blocking efforts have taken two main approaches: (1) content-based filtering and (2) IP-based blacklisting. Both of these techniques are losing their potency as spammers become more agile.
To evade content-based filters, spammers have adopted techniques such as image spam and emails explicitly designed to mislead filters that “learn” certain keyword patterns; spammers are also evading IP-based blacklists with nimble use of the IP address space (e.g., stealing IP addresses on the same local network [19], stealing IP address blocks with BGP route hijacking [30]). To make matters worse, as most spam is now being launched by bots [30], spammers can send a large volume of spam in aggregate while only sending a small volume of spam to any single domain from a given IP address.

This “low and slow” spam sending pattern and the ease with which spammers can quickly change the IP addresses from which they are sending spam have rendered today’s methods of blacklisting spamming IP addresses less effective than they once were [11]. For example, our study in Section 2 shows that, of the spam received at our spam “traps”, as much as 35% was sent from IP addresses that were not listed by either Spamhaus [37] or SpamCop [36], two reputable blacklists. Further, 20% of these IP addresses remained unlisted even after one month. Most of the IP addresses that were eventually blacklisted evaded the blacklist for about two weeks, and some evaded the blacklists for almost two months.

Two characteristics make it difficult for conventional blacklists to keep pace with spammers’ dynamism. First, existing blacklists are based on non-persistent identifiers. An IP address does not suffice as a persistent identifier for a host: many hosts obtain IP addresses from dynamic address pools, which can cause aliasing both of hosts (i.e., a single host may assume different IP addresses over time) and of IP addresses (i.e., a single IP address may represent different hosts over time). Malicious hosts can steal IP addresses and still complete TCP connections, which allows spammers to introduce more dynamism. IP blacklists cannot keep up.
Second, information about email-sending behavior is compartmentalized by domain and not analyzed across domains. Today, a large fraction of spam comes from botnets, large groups of compromised machines controlled by a single entity. With a much larger group of machines at their disposal, spammers now disperse their “jobs” so that each IP address sends spam at a low rate to any single domain. By doing so, spammers can remain below the radar, since no single domain may deem any single spamming IP address as suspicious.

IP blacklists must be continually updated to keep pace with campaigns mounted by armies of “fresh” IP addresses. Unfortunately, a spam campaign may complete by the time the IP addresses are blacklisted, at which time a new

