Chico CSCI 693 - The Research & Application of ETL tool in Business Intelligence Project

Unformatted text preview:

The Research & Application of ETL tool in Business Intelligence Project TangJun1 Cui Kai2 Feng Yu2 Tong Gang2 1.Petrochina Northeast Refining ﹠ Chemicals Engineering Company Limited,Shenyang Liaoning10016,China 2.College of Automation and Electrical Engineering, Qingdao University of Science and Technology, Qingdao Shandong 266042, China [email protected] Abstract In modern business, vast amounts of data are accumulated, which makes the decision-making process complicated. How to change the existing situation of "mass data, poor knowledge", support better business decision-making and help enterprises increase profits and market share become the business and IT sector issues of mutual concern. Business intelligence technologies emerge as the times require. ETL plays an important part role in BI project, which realize the technical service and decision-making support. This article sets out an overview of ETL, the main module of ETL, the optimization scheme of ETL, as well as the specific implementation of the ETL process. Keywords: BI; ETL; technical service; decision-making; optimization scheme 1. Introduction By now, the development of BI (Business Intelligence) [1-4] was relatively mature in foreign countries and BI has been widely applied in government, finance, insurance, retail and manufacturing industries. From the domestic point of view, BI has been acknowledged by the managers of financial, telecommunications, retail and other industries. Therefore, BI became a key point in the enterprise enformationalization engineering. The implementation of ERP, SCM and CRM system has accumulated a large amount of data resources, which laid a good foundation for the implementation of BI system. ETL process is the basis of BI and it matters the success or failure of BI project, therefore an urgent need to improve the design and implementation of the ETL level. 2. The overview of ETL As the core and soul of BI, ETL (Extract-Transform-Load) [5-7] can integrate and increase the value of data according to the uniformed rules. After data extract, data clearing and data arrangement, data will be loaded into the data warehouse, which is pre-defined precisely. In brief, ETL is a transfer process from data source to the target data warehouse and an important step of data warehouse implementation. From the practical experience at home and abroad, it is a common sense that ETL rules design and implementation is a key element in BI project. 3. The main module of ETL Accurate way to design ETL process to make it efficient, flexible and maintainability. ETL can be divided into five modules: data extraction, data validation, data cleaning, data conversion and data loading. ETL process is in Fig.1: Figure 1: ETL process 3.1 Data extraction This step requires a great deal of research preparation work. First of all, we need to find out which business system does the business data come from, what kind of database management system the business database server runs. Secondly, we need to know what kind of table structure does the database have and the corresponding meaning of each table structure. Thirdly, we need to check whether there exist the manual data and the quantity of the data. Fourthly, we should define whether there exist the unstructured data. After collecting all of the information, we give out he data explanation file. 3.2 Data validation Data validation involves a lot of checking work, including the effective value of the property, foreign key checking and so on. As for the low-quantity data, we can 2009 International Forum on Information Technology and Application978-0-7695-3600-2/09 $25.00 © 2009 IEEEDOI 10.1109/IFITA.2009.486202009 International Forum on Information Technology and Application978-0-7695-3600-2/09 $25.00 © 2009 IEEEDOI 10.1109/IFITA.2009.486202009 International Forum on Information Technology and Application978-0-7695-3600-2/09 $25.00 © 2009 IEEEDOI 10.1109/IFITA.2009.486202009 International Forum on Information Technology and Application978-0-7695-3600-2/09 $25.00 © 2009 IEEEDOI 10.1109/IFITA.2009.486202009 International Forum on Information Technology and Application978-0-7695-3600-2/09 $25.00 © 2009 IEEEDOI 10.1109/IFITA.2009.486202009 International Forum on Information Technology and Application978-0-7695-3600-2/09 $25.00 © 2009 IEEEDOI 10.1109/IFITA.2009.486202009 International Forum on Information Technology and Applications978-0-7695-3600-2/09 $25.00 © 2009 IEEEDOI 10.1109/IFITA.2009.486202009 International Forum on Information Technology and Applications978-0-7695-3600-2/09 $25.00 © 2009 IEEEDOI 10.1109/IFITA.2009.486202009 International Forum on Information Technology and Applications978-0-7695-3600-2/09 $25.00 © 2009 IEEEDOI 10.1109/IFITA.2009.486202009 International Forum on Information Technology and Applications978-0-7695-3600-2/09 $25.00 © 2009 IEEEDOI 10.1109/IFITA.2009.48620refuse them firstly, and then these data will be stored in order to be fixed in the field of the amendment [8-9]. 3.3 Data cleaning The task of data cleaning is to filter out the undesirable data, and then send them to the business operation department. These undesirable data include: incomplete data, wrong data, duplicated data and so on [10]. 3.4 Data conversion From the micro-details perspective, data conversion involves the following types: direct mapping, field operations, character string processing, null value determination, date conversion, date operation, and assemble operations and so on [11]. 3.5 Data loading Data will be moved to the center of the target data warehouse table, it is usually the last step in the process of ETL. As to the best way to load data, the implementation depends on the type of operation and the quantity of data. There are two ways to insert or update data in the database table: SQL insert/update/delete or batch loading application program. 4. The optimization scheme of ETL According to the characteristic of ETL and the demands of practical BI project, the optimization scheme of ETL is given out. The highlight spot of BI project is the suitable scheme of ETL, which associate with the traditional ETL solution and apply in the practical BI project. 4.1 Optimize the solution method There are two solution methods to realize the ETL process. The two ETL solution methods are in table1: Table 1: Two ETL solution methods flexibility complexity efficiency develop cycle work load price ETL tools good easy


View Full Document

Chico CSCI 693 - The Research & Application of ETL tool in Business Intelligence Project

Download The Research & Application of ETL tool in Business Intelligence Project
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view The Research & Application of ETL tool in Business Intelligence Project and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view The Research & Application of ETL tool in Business Intelligence Project 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?