U-M EECS 530 - Automated Classification and Analysis of Internet Malware


Automated Classification and Analysis of Internet Malware

Michael Bailey,* Jon Oberheide,* Jon Andersen,* Z. Morley Mao,* Farnam Jahanian,*† Jose Nazario†

* Electrical Engineering and Computer Science Department, University of Michigan
{mibailey, jonojono, janderse, zmao, farnam}@umich.edu

† Arbor Networks
{farnam, jose}@arbor.net

April 26, 2007

Abstract

Numerous attacks, such as worms, phishing, and botnets, threaten the availability of the Internet, the integrity of its hosts, and the privacy of its users. A core element of defense against these attacks is anti-virus (AV), a service that detects, removes, and characterizes these threats. The ability of these products to successfully characterize these threats has far-reaching effects, from facilitating sharing across organizations, to detecting the emergence of new threats, and assessing risk in quarantine and cleanup. In this paper, we examine the ability of existing host-based anti-virus products to provide semantically meaningful information about the malicious software and tools (or malware) used by attackers. Using a large, recent collection of malware that spans a variety of attack vectors (e.g., spyware, worms, spam), we show that different AV products characterize malware in ways that are inconsistent across AV products, incomplete across malware, and that fail to be concise in their semantics. To address these limitations, we propose a new classification technique that describes malware behavior in terms of system state changes (e.g., files written, processes created) rather than in sequences or patterns of system calls. To address the sheer volume of malware and diversity of its behavior, we provide a method for automatically categorizing these profiles of malware into groups that reflect similar classes of behaviors and demonstrate how behavior-based clustering provides a more direct and effective way of classifying and analyzing Internet malware.

1 Introduction

Many of the most visible and serious problems facing the Internet today depend on a vast ecosystem of malicious software and tools. Spam, phishing, denial of service attacks, botnets, and worms largely depend on some form of malicious code, commonly referred to as malware. Malware is often used to infect the computers of unsuspecting victims by exploiting software vulnerabilities or tricking users into running malicious code. Understanding this process and how attackers use the backdoors, key loggers, password stealers and other malware functions is becoming an increasingly difficult and important problem.

Unfortunately, the complexity of modern malware is making this problem more difficult. For example, Agobot [3] has been observed to have more than 580 variants since its initial release in 2002. Modern Agobot variants have the ability to perform denial of service attacks, steal bank passwords and account details, propagate over the network using a diverse set of remote exploits, use polymorphism to evade detection and disassembly, and even patch vulnerabilities and remove competing malware from an infected system [3]. Making the problem even more challenging is the increase in the number and diversity of Internet malware. A recent Microsoft survey found more than 43,000 new variants of backdoor trojans and bots during the first half of 2006 [22]. Automated and robust approaches to understanding malware are required in order to successfully stem the tide.

Dataset  Date                       Number of    Number of Unique Labels
Name     Collected                  Unique MD5s  McAfee  F-Prot  ClamAV  Trend  Symantec
legacy   01 Jan 2004 - 31 Dec 2004  3,637        116     1,216   590     416    57
small    03 Sep 2006 - 22 Oct 2006  893          112     379     253     246    90
large    03 Sep 2006 - 18 Mar 2007  3,698        310     1,544   1,102   2,035  50

Table 1: The datasets used in this paper: A large collection of legacy binaries from 2004, a small 6 week collection from 2006, and a large 6 month collection of malware from 2006/2007. The number of unique labels provided by 5 AV systems is listed for each dataset.

Previous efforts to automatically classify and analyze malware (e.g., AV, IDS) focused primarily on content-based signatures. Unfortunately, content-based signatures are inherently susceptible to inaccuracies due to polymorphic and metamorphic techniques. In addition, the signatures used by these systems often focus on a specific exploit behavior, an approach increasingly complicated by the emergence of multi-vector attacks. As a result, IDS and AV products characterize malware in ways that are inconsistent across products, incomplete across malware, and that fail to be concise in their semantics. This creates an environment in which defenders are limited in their ability to share intelligence across organizations, to detect the emergence of new threats, and to assess risk in quarantine and cleanup of infections.

To address the limitations of existing automated classification and analysis tools, we have developed and evaluated a dynamic analysis approach based on the execution of malware in virtualized environments and the causal tracing of the operating system objects created as a result of the malware's execution. The reduced collection of these user visible system state changes (e.g., files written, processes created) is used to create a fingerprint of the malware's behavior. These fingerprints are more invariant and directly useful than abstract code sequences representing programmatic behavior and can be directly used in assessing the potential damage incurred, enabling detection and classification of new threats, and assisting in the risk assessment of these threats in mitigation and clean up. To address the sheer volume of malware and diversity of its behavior, we provide a method for automatically categorizing these profiles of malware into groups that reflect similar classes of behaviors. These methods are thoroughly evaluated in the context of a malware dataset that is large, recent, and diverse in the set of attack vectors it represents (e.g., spam, worms, bots, spyware).

This paper is organized as follows: Section 2 describes the shortcomings of existing AV software and enumerates requirements for effective malware classification. We present our behavior-based fingerprint extraction and fingerprint clustering algorithm in Section 3. Our detailed evaluation is shown in Section 4. We present existing work in Section 5, offer limitations and future directions in Section 6, and conclude in Section 7.

2 Anti-virus clustering of malware

Host-based AV systems detect and remove malicious threats from end
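
As an illustration of the idea outlined in the introduction (reducing observed system state changes to a fingerprint, then grouping samples with similar fingerprints), the following is a minimal Python sketch. It is not the algorithm of Section 3, which is not part of the text shown here. The event format, the sample names and paths, the use of Jaccard distance, the single-linkage grouping rule, and the 0.6 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def fingerprint(events):
    """Reduce a raw trace to the set of distinct state changes.

    `events` is a hypothetical list of (kind, target) pairs, e.g.
    ("file_written", "c:/windows/a.exe"). Duplicates collapse and
    targets are lowercased so repeated actions count once.
    """
    return frozenset((kind, target.lower()) for kind, target in events)


def jaccard_distance(a, b):
    """Assumed distance: 1 - |a & b| / |a | b| (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def cluster(fingerprints, threshold=0.6):
    """Single-linkage grouping with union-find.

    Two samples land in the same group when a chain of pairs, each
    with distance <= threshold, connects them.
    """
    names = list(fingerprints)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for x, y in combinations(names, 2):
        if jaccard_distance(fingerprints[x], fingerprints[y]) <= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical traces; none of these names come from the paper's datasets.
samples = {
    "sample_a": fingerprint([
        ("file_written", "C:/WINDOWS/a.exe"),
        ("file_written", "c:/windows/a.exe"),  # duplicate after lowercasing
        ("process_created", "a.exe"),
        ("registry_set", "run/a"),
    ]),
    "sample_b": fingerprint([
        ("file_written", "c:/windows/a.exe"),
        ("process_created", "a.exe"),
        ("registry_set", "run/b"),
    ]),
    "sample_c": fingerprint([
        ("file_written", "c:/temp/log.txt"),
    ]),
}

print(cluster(samples))
```

In this toy input, sample_a and sample_b share two of four distinct state changes and fall into one group, while sample_c shares none and stays alone. Working on sets of state changes rather than byte content means two binaries that differ on disk but behave alike can still be grouped, which is the motivation the introduction gives for behavior-based fingerprints.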

