CISpan: Comprehensive Incremental Mining Algorithms of Closed Sequential Patterns for Multi-Versional Software Mining

Ding Yuan, Kyuhyung Lee, Hong Cheng, Gopal Krishna, Zhenmin Li, Xiao Ma, Yuanyuan Zhou, and Jiawei Han
University of Illinois at Urbana-Champaign, Urbana, Illinois, USA; CleanMake Inc., Urbana, Illinois, USA
{dyuan3, kyuhlee, hcheng3, gkrishn2, zli4, xiaoma2, yyzhou, hanj}@cs.uiuc.edu

Abstract

Recently, frequent sequential pattern mining algorithms have been widely used in the software engineering field to mine various source code or specification patterns. In practice, software evolves from one version to another over its life span. The effort of mining frequent sequential patterns across multiple versions of a software system can be substantially reduced by efficient incremental mining. This problem is challenging in this domain because the databases are usually updated in all kinds of ways, including insertion, various modifications, and removal of sequences. In addition, different mining tools may impose different mining constraints, such as a low minimum support. None of the existing work can be applied effectively because of its various limitations. For example, our recent work IncSpan failed to solve the problem because it could handle neither a low minimum support nor the removal of sequences from the database. In this paper, we propose CISpan (Comprehensive Incremental Sequential Pattern mining), a novel comprehensive incremental mining algorithm for frequent sequential patterns. CISpan supports both closed and complete incremental frequent sequence mining under all kinds of updates to the database. Compared to IncSpan, CISpan tolerates a wide range of minimum support thresholds, as low as 2. Our performance study shows that, in addition to handling test cases on which IncSpan fails, CISpan outperforms IncSpan in all test cases that IncSpan could handle, across various sequence lengths, numbers of sequences, modification ratios, etc., with an average speedup of 3.4 times. We also tested CISpan's performance on databases transformed from 20 consecutive versions of the Linux kernel source code. On average, CISpan outperforms the non-incremental CloSpan by 42 times.

Keywords: Incremental mining, Software Engineering, Cross-Module Mining, Frequent pattern

1 Introduction

1.1 Motivation

Frequent sequential pattern mining [16, 13, 12, 15] is an important and active research topic in data mining, with broad applications including web-log mining, customer shopping-transaction analysis, and DNA sequence analysis. Recent years have also seen an increasing trend of applying frequent pattern mining to source code mining [5, 7, 14, 1, 9, 6] and software specification mining [8]. These tools tokenize the source code in tool-specific ways into a sequence database representation and mine the frequent patterns in order to extract information such as copy-pasted code segments [5], API usage [14, 1], and programming rules [6]. For example, CP-Miner [5] is a tool that effectively detects copy-pasted code segments and copy-paste-related bugs in source code. It first

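As a rough illustration of the problem stated in the abstract (not of CISpan itself), the Python sketch below mines closed frequent sequential patterns from scratch. It performs a PrefixSpan-style pattern-growth pass and then applies a naive closedness filter, where a pattern is closed if no proper super-sequence has the same support. It assumes each sequence is a tuple of single tokens and that minimum support is an absolute count. It also models a new software version as the old database with one sequence removed and one inserted. The token names and the helpers mine_frequent and closed_only are hypothetical. Re-mining every version in full like this is the non-incremental baseline whose cost incremental mining aims to reduce.

```python
# Toy, non-incremental baseline for closed sequential pattern mining.
# Hypothetical illustration only; this is NOT the CISpan algorithm.
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]


def is_subsequence(pattern: Sequence, seq: Sequence) -> bool:
    """True if pattern's items occur in seq in the same order (gaps allowed)."""
    it = iter(seq)
    # `item in it` advances the iterator, so item order is enforced.
    return all(item in it for item in pattern)


def mine_frequent(db: List[Sequence], min_sup: int) -> Dict[Sequence, int]:
    """PrefixSpan-style pattern growth: every pattern with support >= min_sup.

    Support is the number of sequences in db that contain the pattern.
    """
    result: Dict[Sequence, int] = {}

    def grow(prefix: Sequence, projected: List[Sequence]) -> None:
        # Count each item at most once per projected suffix (one per sequence).
        counts: Dict[str, int] = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, sup in counts.items():
            if sup < min_sup:
                continue
            pattern = prefix + (item,)
            result[pattern] = sup
            # Project onto the part after the first occurrence of item.
            next_proj = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(pattern, next_proj)

    grow((), list(db))
    return result


def closed_only(freq: Dict[Sequence, int]) -> Dict[Sequence, int]:
    """Keep patterns that have no proper super-pattern with equal support."""
    return {
        p: sup
        for p, sup in freq.items()
        if not any(
            len(q) > len(p) and sup_q == sup and is_subsequence(p, q)
            for q, sup_q in freq.items()
        )
    }


# Hypothetical token sequences for two versions of a program.
version1 = [
    ("lock", "read", "unlock"),
    ("lock", "read", "write", "unlock"),
    ("open", "read", "close"),
]
# Version 2: the third sequence is removed and a new one is inserted.
version2 = version1[:2] + [("lock", "write", "unlock")]

for name, db in (("v1", version1), ("v2", version2)):
    # Baseline: re-mine each version from scratch with min_sup = 2.
    closed = closed_only(mine_frequent(db, min_sup=2))
    print(name, sorted(closed.items()))
```

With these inputs, version 1 yields the closed patterns (lock, read, unlock) with support 2 and (read) with support 3. Version 2 yields (lock, unlock) with support 3, plus (lock, read, unlock) and (lock, write, unlock) with support 2 each. A single removal and insertion thus both invalidates an existing closed pattern and creates new ones, which is the kind of change an incremental miner has to track without redoing the whole computation.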