UTD CS 4398 - LECTURE NOTES


- Slide 1
- Outline
- What is Data Mining?
- What’s going on in data mining?
- Data Mining for Intrusion Detection: Problem
- Misuse Detection
- Problem: Anomaly Detection
- Our Approach: Overview
- Slide 9
- Results
- Introduction: Detecting Malicious Executables using Data Mining
- State of the Art in Automated Detection
- Our New Ideas (Khan, Masud and Thuraisingham)
- Feature Extraction
- The Hybrid Feature Retrieval Model
- Hybrid Feature Retrieval (HFR)
- Slide 17
- Feature Extraction
- Slide 19
- Feature Selection
- Experiments
- Slide 22
- Slide 23
- Slide 24
- Future Plans
- Data Mining for Buffer Overflow Introduction
- Background
- Background (cont...)
- Slide 29
- Slide 30
- CodeBlocker (Our approach)
- Severity of the problem
- Our solution
- CodeBlocker Model
- Slide 36
- Disassembly
- Feature extraction
- Feature extraction (cont...)
- Slide 40
- Putting it together
- Slide 42
- Slide 43
- Novelty, Advantages, Limitations, Future
- Slide 45
- Traffic Mining
- Slide 47
- Worm Detection: Introduction
- Email Worm Detection using Data Mining
- Assumptions
- Feature sets
- Data Mining Approach
- Data set
- Our Implementation and Analysis
- Digital Forensics and UTD Work
- Algorithms for Digital Forensics
- Digital Forensics
- How do you detect that a problem has occurred?

Prof.
Bhavani Thuraisingham
The University of Texas at Dallas
September 2, 2009
Lecture #4
01/14/19 06:23

Outline
- Data mining overview
- Intrusion detection and malicious code detection (worms and viruses)
- Digital forensics and UTD work
- Algorithms for Digital Forensics

What is Data Mining?
Data Mining, Knowledge Mining, Knowledge Discovery in Databases, Data Archaeology, Data Dredging, Database Mining, Knowledge Extraction, Data Pattern Processing, Information Harvesting, Siftware

The process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data, often previously unknown, using pattern recognition technologies and statistical and mathematical techniques (Thuraisingham, Data Mining, CRC Press 1998)

What’s going on in data mining?
- What are the technologies for data mining?
  - Database management, data warehousing, machine learning, statistics, pattern recognition, visualization, parallel processing
- What can data mining do for you?
  - Data mining outcomes: Classification, Clustering, Association, Anomaly detection, Prediction, Estimation, ...
- How do you carry out data mining?
  - Data mining techniques: Decision trees, Neural networks, Market-basket analysis, Link analysis, Genetic algorithms, ...
- What is the current status?
  - Many commercial products mine relational databases
- What are some of the challenges?
  - Mining unstructured data, extracting useful patterns, web mining, Data mining, security and privacy

Data Mining for Intrusion Detection: Problem
- An intrusion can be defined as “any set of actions that attempt to compromise the integrity, confidentiality, or availability of a resource”.
- Attacks are:
  - Host-based attacks
  - Network-based attacks
- Intrusion detection systems are split into two groups:
  - Anomaly detection systems
  - Misuse detection systems
- Use audit logs
  - Capture all activities in network and hosts.
  - But the amount of data is huge!

Misuse Detection
- Misuse Detection

Problem: Anomaly Detection
- Anomaly Detection

Our Approach: Overview
[Figure: flow diagram with labels Training Data, Class, Hierarchical Clustering (DGSOT), Testing, Testing Data, SVM Class Training]
DGSOT: Dynamically growing self organizing tree

Our Approach: Hierarchical Clustering
[Figure: hierarchical clustering with SVM flow chart]

Results
Training Time, FP and FN Rates of Various Methods

Method                  Average Accuracy   Total Training Time   Average FP Rate (%)   Average FN Rate (%)
Random Selection        52%                0.44 hours            40                    47
Pure SVM                57.6%              17.34 hours           35.5                  42
SVM + Rocchio Bundling  51.6%              26.7 hours            44.2                  48
SVM + DGSOT             69.8%              13.18 hours           37.8                  29.8

Introduction: Detecting Malicious Executables using Data Mining
- What are malicious executables?
  - Harm computer systems
  - Virus, Exploit, Denial of Service (DoS), Flooder, Sniffer, Spoofer, Trojan etc.
  - Exploits software vulnerability on a victim
  - May remotely infect other victims
  - Incurs great loss. Example: Code Red epidemic cost $2.6 Billion
- Malicious code detection: Traditional approach
  - Signature based
  - Requires signatures to be generated by human experts
  - So, not effective against “zero day” attacks

State of the Art in Automated Detection
- Automated detection approaches:
  - Behavioural: analyse behaviours like source, destination address, attachment type, statistical anomaly etc.
  - Content-based: analyse the content of the malicious executable
    - Autograph (H. Ah-Kim – CMU): Based on automated signature generation process
    - N-gram analysis (Maloof, M.A.
      et al.): Based on mining features and using machine learning.

Our New Ideas (Khan, Masud and Thuraisingham)
✗ Content-based approaches consider only machine-codes (byte-codes).
✗ Is it possible to consider higher-level source codes for malicious code detection?
✗ Yes: Disassemble the binary executable and retrieve the assembly program
✗ Extract important features from the assembly program
✗ Combine with machine-code features

Feature Extraction
✗ Binary n-gram features
  - Sequence of n consecutive bytes of binary executable
✗ Assembly n-gram features
  - Sequence of n consecutive assembly instructions
✗ System API call features
  - DLL function call information

The Hybrid Feature Retrieval Model
- Collect training samples of normal and malicious executables.
- Extract features
- Train a Classifier and build a model
- Test the model against test samples

Hybrid Feature Retrieval (HFR)
- Training

Hybrid Feature Retrieval (HFR)
- Testing

Feature Extraction
Binary n-gram features
- Features are extracted from the byte codes in the form of n-grams, where n = 2, 4, 6, 8, 10 and so on.

Example: Given an 11-byte sequence: 0123456789abcdef012345,
The 2-grams (2-byte sequences) are: 0123, 2345, 4567, 6789, 89ab, abcd, cdef, ef01, 0123, 2345
The 4-grams (4-byte sequences) are: 01234567, 23456789, 456789ab, ..., ef012345 and so on.

Problem:
- Large dataset. Too many features (millions!).
Solution:
- Use secondary memory, efficient data structures
- Apply feature selection

Assembly n-gram features
- Features are extracted from the assembly programs in the form of n-grams, where n = 2, 4, 6, 8, 10 and so on.

Example: three instructions “push eax”; “mov eax,
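
The binary n-gram example given earlier in these notes can be reproduced with a short Python sketch. It assumes a sliding window that advances one byte at a time, which is what the slide's 2-gram and 4-gram lists imply. The function name `byte_ngrams` and the use of `Counter` to turn n-grams into occurrence counts are illustrative choices, not part of the lecture material.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> list[str]:
    """Return every run of n consecutive bytes as a hex string (window moves 1 byte)."""
    return [data[i:i + n].hex() for i in range(len(data) - n + 1)]


# The 11-byte sequence from the slide example.
sample = bytes.fromhex("0123456789abcdef012345")

two_grams = byte_ngrams(sample, 2)
four_grams = byte_ngrams(sample, 4)

# Hypothetical next step: count occurrences to get a simple per-file feature vector.
two_gram_counts = Counter(two_grams)

print(two_grams)
print(four_grams[0], "...", four_grams[-1])
print(two_gram_counts.most_common(2))
```

Because an executable of length L yields L - n + 1 n-grams for each n, the number of distinct features grows quickly across a corpus, which is the "too many features" problem the slide raises.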

