DOC PREVIEW
Chico CSCI 693 - A Comparative Study between ETL and E-LT approach

This preview shows page 1-2-3-4-5-6 out of 19 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

KEYWORDS: ETL, E-LT, DATA WAREHOUSE, PERFORMANCE MEASUREMENTSINTRODUCTIONRELATED WORK TECHNICAL BACKGROUNDEXPERIMENTSRESULTSCONCLUSION AND FUTURE WORK ACKNOWLEDGEMENTREFERENCESCSCI 693 Research Paper: Ranjan Vikas Abstract – Many organizations rely heavily on their data warehouse for enterprise level decision making. Since a data warehouse pulls data from various heterogeneous sources, the extract and load processes play very important roles in building an optimized data warehouse solution. In this paper we compare Extract, Transform and Load (ETL) approach and Extract, Load and Transform (E-LT) approach for loading data into a data warehouse. We will gauge the performance difference for both ETL and E-LT approaches. KEYWORDS: ETL, E-LT, DATA WAREHOUSE, PERFORMANCE MEASUREMENTSINTRODUCTIONIn today’s world everyone including commercial companies, non profit organizations,  A Comparative Study between ETL (Extract-Transform-Load) and E-LT (Extract-Load-Transform) approach for loading data into a Data Warehouse ByVikas RanjanMS Candidate in Computer Science at California State University, Chico, CA 95929 ([email protected]), 12/17/20091CSCI 693 Research Paper: Ranjan Vikasuniversities and government agencies are spending millions of dollars on building data warehouse solutions for various reasons like forecasting, profitability, process improvement, reporting etc. Since a data warehouse is a central repository for reporting and analytical purposes, the extraction of data from various heterogeneous sources and loading it into a target data warehouse plays a very important role in building an optimized data warehouse. There are different approaches that can be used for extracting data from various heterogeneous source systems and loading this data into an Enterprise Data Warehouse. The aim of this paper is to compare traditional Extract, Transform and Load (ETL) approach vs. Extract, Load and Transform (E-LT) approach for loading data into data warehouse. In this paper, we have also performed and compared performance benchmark tests between ETL and E-LT approaches. RELATED WORK The research presented in [1] covered the Extract, Transform and Load (ETL) approach for building a data warehouse solution. The old materialized view approach is discussed and compared to show how the ETL approach is better. It further focused on how to build a data warehouse solution by optimizing the ETL approach. In contrast, [2] demonstrated how to build a successful data warehouse solution using the Extract, Load and Transform (E-LT) approach. It presented the advantages in terms of cost and time in building the E-LT solution over the traditional ETL approach. The next generation of building data warehouse solutions based on Corporate Information 2CSCI 693 Research Paper: Ranjan VikasFactory is discussed in [3]. It also presents the importance of the ETL approach in building the proposed solution. The ETL approach and its key features like metadata, audit trails and data quality are well presented in [3].Both the ETL and the E-LT approaches are thoroughly discussed in [4], where the pros and the cons of both approaches are also presented.TECHNICAL BACKGROUNDExtract Transform and Load (ETL) Approach: The traditional Extract, Transform and Load (ETL) approach operates by first extracting data from various heterogeneous sources like databases, flat files, ERP systems, CRM systems and main frame systems [2]. Different business rules are applied on the data extracted from various sources by the proprietary, middle-tier ETL engine [2]. This massaged and transformed data is finally loaded into the target data warehouse system or integration system. The process is often designed from the end backwards, in that the required output is designed first. In so doing, this informs exactly what data is required from the source. The routines designed and developed to implement the process are written specifically for the purpose of achieving the desired output, and only the data required for the output is included in the extraction process [4].Business rules that define how aggregations are achieved and the relationships between the various entities in both the source and target, are designed and therefore coded into the routines that implement the ETL process. This approach leads to tight dependencies in the routines at each stage of the process [4].3CSCI 693 Research Paper: Ranjan VikasFigure 1 (from [4])Strengths: • Designing from the output backwards ensures that only data relevant to the solution is extracted and processed, potentially reducing development, extract, and processing overhead, thus reducing the time to build the solution. [4]• Due to the targeted nature of the load process, the data warehouse contains only data relevant to the presentation.[4]• ETL can perform more complex operations in single data flow diagrams (data maps).Weaknesses:• The data transformation step of the ETL approach is the most compute-intensive and is performed entirely by the proprietary ETL engine on a dedicated server. This increases the job’s runtime as well as more hardware costs. • The ETL engine performs data transformations and sometimes data quality checks on a row-by-row basis. This can easily become the bottleneck in the overall process [4].4CSCI 693 Research Paper: Ranjan Vikas• The data is moved over the network twice – once between sources and the ETL server and again between the ETL server and the target data warehouse [2].• Since only the relevant data in captured in the data warehouse, data needed for any future requirements might not exists in the data warehouse and will need to be added to the ETL routines. Due to nature of tight dependency between the routines developed, this often leads to a need for fundamental re-design and development. As a result this increases the time and costs involved.Extract, Load and Transform (E-LT) Approach: The Extract, Load and Transform (E-LT) approach incorporates both the manual coding as well as leveraging ETL approach in the same solution [2]. The data is extracted in the same way as in the ETL approach. This data extracted from different sources is now loaded into the target data warehouse system. Once loaded, the transformations and business logics are applied using native SQL drivers. This helps in saving cost and extra


View Full Document

Chico CSCI 693 - A Comparative Study between ETL and E-LT approach

Download A Comparative Study between ETL and E-LT approach
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view A Comparative Study between ETL and E-LT approach and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view A Comparative Study between ETL and E-LT approach 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?