

Cao et al. / J Zhejiang Univ SCI 2005 6A(11):1290-1296

Enhancing the usage pattern mining performance with temporal segmentation of QPop Increment in digital libraries

CAO San-xing (曹三省)†1,4, KLEIN R. Rody2,3,4, LIU Jian-bo (刘剑波)1

(1Information Engineering School, Communication University of China, Beijing 100024, China)
(2RCID, Zhejiang University, Hangzhou 310027, China)
(3SysCom Lab, University of Savoie, 73376 Le Bourget-du-Lac cedex, France)
(4Media Research, METIS Global Network, http://www.metis-global.org/)
†E-mail: [email protected]; [email protected]
Received Aug. 5, 2005; revision accepted Sept. 10, 2005

Abstract: The convergence of next-generation networks and the emergence of new media systems have made media-rich digital libraries popular in both application and research. The discovery of the usage patterns of media content objects, with QPop Increment as the characteristic feature under study, is the basis of intelligent data migration scheduling, which is a key issue for these systems in managing effectively the massive storage facilities in their backbones. In this paper, a clustering algorithm is established on the basis of temporal segmentation of QPop Increment, so as to improve the mining performance. We employed the standard C-Means algorithm as the clustering kernel, and carried out the experimental mining process with segmented QPop Increments obtained in actual applications. The results indicated that the improved algorithm outperforms the basic one on important indices such as clustering cohesion. The experimental study in this paper is based on a Media Assets Library prototype developed for the advertainment movie production project for the 2008 Olympics, with the support of both the Humanistic Olympics Study Center in Beijing and the China State Administration of Radio, Film and TV.
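The pipeline summarized in the abstract (temporal segmentation, then C-Means clustering, then a cohesion index) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: this excerpt does not define QPop Increment, so it is modeled here as the increase of a cumulative access counter within each time window; windows are assumed to be of equal length; "standard C-Means" is taken to mean hard C-Means (k-means); and cohesion is taken to be the mean distance from a point to its cluster center. The names segment_increments, c_means and cohesion are hypothetical.

```python
import math
import random


def segment_increments(cumulative, n_segments):
    """Per-window increase of a cumulative access counter.

    ASSUMPTION: stands in for a temporally segmented QPop Increment;
    `cumulative` holds counts sampled at equal time steps.
    """
    last = len(cumulative) - 1
    # Indices of the window boundaries, spread evenly over the series.
    bounds = [round(i * last / n_segments) for i in range(n_segments + 1)]
    return [cumulative[b] - cumulative[a] for a, b in zip(bounds, bounds[1:])]


def _nearest(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


def c_means(points, c, n_iter=100, seed=0):
    """Hard C-Means (k-means): alternate assignment and center update."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, c)]
    for _ in range(n_iter):
        labels = [_nearest(p, centers) for p in points]
        new_centers = []
        for k in range(c):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:  # assignments stopped changing
            break
        centers = new_centers
    labels = [_nearest(p, centers) for p in points]
    return labels, centers


def cohesion(points, labels, centers):
    """Mean distance from each point to its own center (lower is tighter)."""
    total = sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels))
    return total / len(points)
```

In such a sketch each content object contributes one vector of per-window increments, so objects whose popularity rises and falls at similar times end up in the same cluster; comparing the cohesion value with and without segmentation is one way the two variants could be contrasted.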
Key words: Media-rich, Digital library, Data migration, Media content, Log mining, QPop
doi:10.1631/jzus.2005.A1290    Document code: A    CLC number: TP391
Journal of Zhejiang University SCIENCE, ISSN 1009-3095, http://www.zju.edu.cn/jzus, E-mail: [email protected]

INTRODUCTION

With the development of Multimedia Data Compression, Content-based Retrieval, Grid-based Information Processing, High-speed Internet and Massive Storage in recent years, media-rich digital libraries have become technically feasible and commercially mature. The convenience of designing, implementing, deploying and upgrading their applications is the most important factor making content platforms practical. Application models of these portals can now be found in broadcasters' websites, online multimedia content providers and many Internet businesses (Song, 2001; Cao and Lu, 2001; Cao et al., 2003).

As indicated in Fig.1, an important issue facing these systems is an effective data migration model for the Hierarchical Storage schema of the media contents. Although HSM (Hierarchical Storage Management) and VSM (Virtual Storage Management) have respectively realized the multi-level model of data/content storage and the consistency of storage access and application, they have also given rise to the problem of hierarchical data migration (Cao et al., 2004). In a massive storage system, the multiple storage modes/levels necessitate frequent migration of media data among them, according to the requirements of applications. Moreover, the need for data migration is further heightened by data warehousing, disaster-recovery backups, and heterogeneous integration.

No data migration scheme with intelligent and effective scheduling has yet been proposed in industry. As a result, frequent and random pushing and pulling of massive media data has been undermining the robustness and usability of massive-storage-based systems in most Web application environments. It is therefore essential that an intelligent data migration scheduling model be established on the basis of content objects' usage patterns, with the use of feature extraction and knowledge discovery, so as to ensure the effective functioning of massive media content portals on the Web.

RELATED WORK REVIEW

Data Migration has attracted the attention of many researchers in Information Processing and Computer Science since the last decades of the 20th century. Todd (1978) of IBM posed the problem of data migration in geographically distributed databases, with the support of rights management, after which studies on different aspects of data migration were carried out. The IEEE Storage System Standards Working Group published the model and infrastructure of massive storage in 1994; since then, Data Migration studies have largely been carried out in the environment of distributed massive storage, with consideration of network storage architectures such as SAN, NAS and iSCSI. Current research on Data Migration is concentrated in 3 directions: study of data migration models based on engineering experience; study of scheduling algorithms based on cybernetics; and study of system policies combining data migration with related technologies.

Khuller et al. (2004) established a polynomial-based temporal analysis model of data migration, and their implementation of the scheduling algorithm yielded a worst-case bound of 9.5. Gandhi (2004) established a 5.06 approximation algorithm for the Open Shop problem, which takes the complete migration time as the cost variable. This is a considerable improvement over typical algorithms with bounds of 9.0 and 5.83.
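The figures quoted above can be read as approximation ratios. This reading is an assumption about the cited results, since the text itself does not define the bounds, and the symbols below are introduced only for illustration. A scheduling algorithm A has ratio ρ when, on every instance I, the cost of its schedule is at most ρ times the cost of an optimal schedule:

```latex
\[
  \frac{C_{A}(I)}{C_{\mathrm{OPT}}(I)} \;\le\; \rho
  \qquad \text{for every instance } I, \quad \rho \ge 1 .
\]
% C_A(I):     cost of the schedule produced by algorithm A on instance I
% C_OPT(I):   cost of an optimal schedule for I
% Smaller rho is a stronger guarantee, so under this reading
% 5.06 improves on 5.83, 9.0 and 9.5.
```

Under this reading the ratio is a worst-case guarantee only; it says nothing about how close to optimal the schedules are on typical workloads.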
Driven by digitized and networked broadcasting media and by convergent multimedia information services, the Content Management Platform has become an important part of the information industry's infrastructure. This makes Hierarchical Data Migration one of the key problems within the domain of multimedia information processing. Research on intelligent data migration in the integrated content service environment was presented in (Cao et al., 2004; Hu et al., 2005). Many application case studies have also been carried out; for example, Hu et al. (2004) studied an automatic data migration schema based on TSM and DIVA of the IBM storage platform. With the concrete progress of intelligence and cognitive sciences, data migration research is introducing the Ontology-based

