View Full Document

Tree Partition based Parallel Frequent Pattern



View the full content.
View Full Document
View Full Document

2 views

Unformatted text preview:

Tree Partition based Parallel Frequent Pattern mining on Shared Memory Systems Dehao Chen1 Chunrong Lai2 Wei Hu2 WenGuang Chen1 Yimin Zhang2 and Weimin Zheng1 1 2 Tsinghua University Intel Corporation Dept of Computer Science Intel China Research Center Beijing China 100084 No 2 Kexueyuan South Road chendh05 mails tsinghua edu cn Beijing China 100080 cwg zwm dcs tsinghua edu cn chunrong lai wei hu yimin zhang intel com Abstract In this paper we present a tree partition algorithm for parallel mining of frequent patterns Our work is based on FP Growth algorithm which is constituted of tree building stage and mining stage The main idea is to build only one FP Tree in the memory partition it into several independent parts and distribute them to di erent threads A heuristic algorithm is devised to balance the workload Our algorithm can not only alleviate the impact of locks during the tree building stage but also avoid the overhead that do great harm to the mining stage We present the experiments on di erent kinds of datasets and compare the results with other parallel approaches The results suggest that our approach has great advantage in e ciency especially on certain kinds of datasets As the number of processors increases our parallel algorithm shows good scalability 1 Introduction Association rule mining searches for interesting relationships among items in a given data set One of the most famous examples of association rule mining is the market basket problem which was introduced in 1 Since the invention of Apriori algorithm 1 2 a lot of algorithms have been devised to cope with the problem of frequent pattern mining which is the most time consuming part of association rule mining process However as for extremely large datasets the currently proposed frequent pattern mining algorithms 1 4244 0054 6 06 20 00 2006 IEEE still consume too much time One solution is to design more e cient mining algorithms to reduce the repeated I O scans as well as to minimize the memory



Access the best Study Guides, Lecture Notes and Practice Exams

Loading Unlocking...
Login

Join to view Tree Partition based Parallel Frequent Pattern and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Tree Partition based Parallel Frequent Pattern and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?