CS155b: E-Commerce
Lecture 23: April 17, 2003
E-Mail Abuse: Spam and Viruses
Acknowledgements: V. Ramachandran (Yale) and C. Dwork (Microsoft)

What is Spam?
Source: Mail Abuse Prevention System, LLC

- Spam is unsolicited bulk e-mail, primarily used for advertising.
- An electronic message is spam IF:
  1. the recipient's personal identity and context are irrelevant because the message is equally applicable to many other potential recipients; AND
  2. the recipient has not verifiably granted deliberate, explicit, and still-revocable permission for it to be sent; AND
  3. the transmission and reception of the message appears to the recipient to give a disproportionate benefit to the sender.

Spam About Spam

Why is Spam such a problem?

- Simple answer: People don't like it!
- Cost: Postal mail and telephone calls cost money. Sending e-mail does not (in general).
- Speed: Messages are created and sent to many users instantaneously, without human effort. The sender gets (almost) instant notification of success or failure to reach the destination.

Consequences of Spam

- Large amounts of network traffic
  - Network congestion
  - Mail servers can be overloaded with network requests, which could slow mail delivery
- Wasted time and storage
  - Downloading headers and checking mail takes longer
  - More unwanted mail to delete
  - E-mail must be stored at servers (Microsoft: 65-85% of storage costs go to spam)

How is E-mail Sent?
Source: RFC 821 (SMTP)

Example Mail Exchange

    vijayr@cyndra$ telnet netra 25
    Trying 128.36.229.21...
    Connected to netra.cs.yale.edu (128.36.229.21).
    Escape character is '^]'.
    220 netra.cs.yale.edu ESMTP Postfix
    HELO cyndra
    250 netra.cs.yale.edu
    MAIL FROM: vijayr@cs.yale.edu
    250 Ok
    RCPT TO: vijayr@whigclio.princeton.edu
    250 Ok
    DATA
    354 End data with <CR><LF>.<CR><LF>
    This is a test.
    .
    250 Ok: queued as EE0A5D728E
    QUIT
    221 Bye
    Connection closed by foreign host.

Tracking Spam

- SMTP runs on top of TCP.
  - Packets are acknowledged.
  - The source of packets is known in any successful mail session.
- SMTP servers add the IP address and hostname of every mail server or host involved in the sending process to the e-mail's message header.
- But dynamic IP addresses and large ISPs can make it difficult to identify senders.

E-Mail Headers

Spoofing E-mail Headers

- Most e-mail programs use, and most people see, only the standard "To", "Cc", "From", "Subject", and "Date" headers.
- All of these are provided as part of the mail data by the mail sender's client.
- Any of this information can be falsified.
- The only headers you can always believe are message-path headers from trusted SMTP servers.

Open Mail Relays

- An open mail relay is an SMTP server that will send mail even when neither the sender nor the recipient is in the server's domain.
- These servers can be used to obfuscate the mail-sending path of messages.
- Mail-sending cost can be offloaded to servers not under the spammers' control.
- Most servers are now configured to reject relays, and many servers will not accept mail from known open mail relays.

Relay Rejection

    vijayr@cyndra$ telnet mail.cloud9.net 25
    Trying 168.100.1.4...
    Connected to russian-caravan.cloud9.net (168.100.1.4).
    Escape character is '^]'.
    220 russian-caravan.cloud9.net ESMTP Postfix
    MAIL FROM: user@cloud9.net
    250 Ok
    RCPT TO: vijayr@cs.yale.edu
    554 <vijayr@cs.yale.edu>: Relay access denied
    QUIT
    221 Bye
    Connection closed by foreign host.

- SpamAssassin is a spam-fighting tool.
- Primary development efforts exist for the open-source, UNIX-compatible version.
- The source code and select Linux binaries are available for free download for noncommercial use.
- Commercial and Windows-compatible products are available that use the technology.
- SpamAssassin is installed on many ISP mail servers and is used by the CS dept. at Yale.

SpamAssassin Overview

- Filtering is done at the mail server. (But the technology can also be used to create plug-ins for mail clients.)
- Messages receive a score.
  - Message content and headers are parsed.
  - The more occurrences of spam-like items in the message, the higher the score.
- Messages with scores above a threshold are automatically moved from the user's INBOX.
- Tolerance for spam is user-configurable.

[Slides: Judging Spam, Example 1; Judging Spam, Results 1; Judging Spam, Example 2; Judging Spam, Results 2]

SpamAssassin Techniques
Source: SpamAssassin.org developers' website

The spam-identification tactics used include:

- header analysis: spammers use a number of tricks to mask their identities, fool you into thinking they've sent a valid mail, or fool you into thinking you must have subscribed at some stage. SpamAssassin tries to spot these.
- text analysis: again, spam mails often have a characteristic style (to put it politely), and some characteristic disclaimers and CYA text. SpamAssassin can spot these, too.
- blacklists: SpamAssassin supports many useful existing blacklists, such as mail-abuse.org, ordb.org or others.
- Razor: Vipul's Razor is a collaborative spam-tracking database, which works by taking a signature of spam messages. Since spam typically operates by sending an identical message to hundreds of people, Razor short-circuits this by allowing the first person to receive a spam to add it to the database, at which point everyone else will automatically block it.

Once identified, the mail can then be optionally tagged as spam for later filtering using the user's own mail user-agent application.

Tricks to Avoid Filters

- Use MIME/UU encoding for messages.
  - E-mail messages can be in complex formats; this allows messages to contain multiple parts and attachments.
  - To prevent warping of content, message parts and attachments can be transformed using a standard encoding method.
  - E-mail clients are supposed to decode message parts when they are presented to the reader.
  - Basic filters often do not process encoded text.
- Insert HTML comments between words.

Examples of Tricks
Source: spam-stopper.net

Proposals to Eliminate Spam

- Charge a micro-payment for e-mail.
- Computational method: force senders to prove that they spend some minimum amount of time per sender per message.
  - 86,400 sec/day / 10 sec/msg = 8,640 msgs/day
  - Hotmail receives 1 billion msgs/day
  - Would need 125,000 computers
  - Up-front capital cost for all of Hotmail's spam: $150M
  - The spammers can't afford it!

Source: C. Dwork
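
The computational proposal above can be illustrated with a small proof-of-work sketch. This is not the specific pricing function from the lecture's source; it assumes a hashcash-style puzzle (find a nonce whose SHA-256 digest starts with a given number of zero bits), and every function name here is hypothetical. The last helper redoes the slide's throughput arithmetic: dividing 1 billion msgs/day by 8,640 msgs/day per machine gives about 115,741 machines, the same order as the slide's rounded 125,000.

```python
import hashlib
import math
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the front of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() of a nonzero byte is 1..8, so this adds 0..7 zeros.
        bits += 8 - byte.bit_length()
        break
    return bits


def stamp_digest(sender: str, recipient: str, message: str, nonce: int) -> bytes:
    """Hash the fields the stamp is bound to (assumed format, for illustration)."""
    payload = f"{sender}|{recipient}|{message}|{nonce}".encode("utf-8")
    return hashlib.sha256(payload).digest()


def mint_stamp(sender: str, recipient: str, message: str, difficulty_bits: int) -> int:
    """Sender side: search nonces from 0 upward until the digest is small enough."""
    for nonce in count():
        digest = stamp_digest(sender, recipient, message, nonce)
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce


def verify_stamp(sender: str, recipient: str, message: str,
                 nonce: int, difficulty_bits: int) -> bool:
    """Receiver side: a single hash checks the sender's work."""
    digest = stamp_digest(sender, recipient, message, nonce)
    return leading_zero_bits(digest) >= difficulty_bits


def machines_needed(msgs_per_day: int, seconds_per_msg: int) -> int:
    """Machines required if each message costs seconds_per_msg of CPU time."""
    msgs_per_machine_per_day = 86_400 // seconds_per_msg
    return math.ceil(msgs_per_day / msgs_per_machine_per_day)


if __name__ == "__main__":
    nonce = mint_stamp("a@example.com", "b@example.com", "hello", 12)
    print("nonce:", nonce)
    print("valid:", verify_stamp("a@example.com", "b@example.com", "hello", nonce, 12))
    print("machines:", machines_needed(1_000_000_000, 10))
```

The asymmetry is the point of the design: minting takes about 2^difficulty_bits hashes on average, while verifying takes one. A deployment along these lines would presumably tune the difficulty so that minting costs on the order of the 10 seconds per message used in the estimate; the 12 bits in the demo are only there to keep it fast.
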
(Microsoft)

Prove You Are a Human

- CAPTCHA: Completely Automated Public Turing test for telling Computers and Humans Apart
- Require people to pass CAPTCHAs to sign up for free e-mail accounts.
  - Perform some easy-for-human but difficult-for-computer computation (e.g., identify words or find objects in pictures).
- The future: build into the e-mail sending process some way to prove e-mail senders are humans (or authorized automated agents).

The Yahoo CAPTCHA

Viruses

A computer virus is a piece of code (often malicious) that