
A Characterization of Data Mining Algorithms on a Modern Processor




Amol Ghoting, Gregory Buehrer, and Srinivasan Parthasarathy
Data Mining Research Laboratory, The Ohio State University

Daehyun Kim, Anthony Nguyen, Yen-Kuang Chen, and Pradeep Dubey
Architecture Research Laboratory, Intel Corporation

Roadmap
  Motivation and Contributions
  Algorithms under study
  Performance characterization
  Case study: Improving performance of FP-Growth
  Related work
  Conclusions

Motivation
  KDD applications constitute a rapidly growing segment of the commercial and scientific computing domains
    Interactive process: response times
    Memory and compute intensive
  Modern architectures
    Memory wall issues
    Latency-tolerating mechanisms: prefetching, SMT
  Objective here is to characterize such applications on a modern architecture
    Can we leverage the above mechanisms effectively?

Contributions
  Specifically, we study:
    Performance and memory access behavior of eight data mining algorithms
    Impact of processor technologies such as hardware prefetching and simultaneous multithreading (SMT)
    How to leverage latency-tolerating mechanisms to improve performance of frequent pattern mining

Algorithms under study (1)
  Frequent itemset mining
    Finds groups of items that co-occur frequently in a transactional data set
    Example: Item A and Item B are purchased together 90% of the time
    FPGrowth (FP-tree)
    MAFIA (tid-list as a bit vector)
  Sequence mining
    Discovers sets of items that are shared across time
    Example: 70% of the customers who buy item A also buy item B within 1 month
    SPADE (tid-list)

Algorithms under study (2)
  Graph mining
    Finds frequent subgraphs in a graph data set
    FSG (tid-list)
  Clustering
    Partitions data points into groups or clusters such that intra-cluster distance is minimized and inter-cluster distance is maximized
    kMeans and vCluster

Algorithms under study (3)
  Outlier detection
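
The slides list "tid-list as a bit vector" as the data structure behind MAFIA for frequent itemset mining. The sketch below illustrates only that representation, not MAFIA itself: each item gets one integer whose bit t is set when transaction t contains the item, and the support of an itemset is the population count of the AND of its items' vectors. The function names (build_tid_bitvectors, support, frequent_pairs) and the toy transactions are assumptions made for illustration and do not come from the slides.

```python
from itertools import combinations


def build_tid_bitvectors(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(itemset, vectors, n_transactions):
    """Count the transactions that contain every item in itemset."""
    bits = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        bits &= vectors.get(item, 0)  # keep only transactions that also hold this item
    return bin(bits).count("1")


def frequent_pairs(transactions, min_support):
    """Return every item pair whose support is at least min_support (hypothetical helper)."""
    vectors = build_tid_bitvectors(transactions)
    n = len(transactions)
    result = {}
    for a, b in combinations(sorted(vectors), 2):
        count = support((a, b), vectors, n)
        if count >= min_support:
            result[(a, b)] = count
    return result


# Toy data, invented for this example.
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B"},
]
vectors = build_tid_bitvectors(transactions)
pairs = frequent_pairs(transactions, min_support=3)
print(pairs)
```

Because support counting reduces to bitwise AND plus a population count, the inner loop touches memory sequentially, which is one reason a bit-vector tid-list is a natural subject for the prefetching and memory-behavior study the slides describe.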


