KSU CS 8630 - Datawarehouse

Outline (10-30-2008, Datawarehouse)
• Example of Datawarehouse
• Typical Daily Operations
• Need for Data Warehousing
• Data Warehouse Architectures
• Dependent Datawarehouse
• Independent Data Mart
• Data Reconciliation
• The ETL Process
• Derived Data
• Issues Regarding Star Schema
• The User Interface Metadata (data catalog)
• On-Line Analytical Processing (OLAP)
• Data Mining and Visualization
• Summary: Data Warehouse Characteristics
• Typical Data Warehouse Functions
• Summary: Guidelines
• End of Lecture

CS 8630 Database Administration, Dr. Guimaraes

10-30-2008, Datawarehouse
Class Will Start Momentarily…
CS8630 Database Administration
Dr. Mario Guimaraes

• Data warehouse:
  – Integrated
  – Time-variant
  – Non-updatable (read-only, periodically refreshed)
• Data mart: a subset of a data warehouse

[Figure: Example of a data warehouse]

Typical Daily Operations
• OLTP: insert, update, delete, select
• Data warehouse: inserts in batch; selects retrieving many records

Need for Data Warehousing
• Integrated, company-wide view of high-quality information (from disparate databases)
• Separation of operational and informational systems and data (for improved performance)

Data Warehouse Architectures
• Generic two-level architecture
• Independent data mart
• Dependent data mart and operational data store
• Logical data mart and @ctive warehouse
• Three-layer architecture
All involve some form of extraction, transformation, and loading (ETL).

[Figure: Dependent data warehouse]
[Figure: Independent data mart]
[Figure: Logical data mart and @ctive data warehouse]

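The "Typical Daily Operations" slide contrasts OLTP work (single-row inserts, updates, deletes, and selects) with data warehouse work (batch inserts and selects that retrieve many records). The sketch below is a minimal illustration using Python's standard `sqlite3` module, with two in-memory SQLite databases standing in for the operational system and the warehouse; the `orders` and `order_history` tables and the `batch_load` helper are hypothetical.

```python
import sqlite3

# Hypothetical operational (OLTP) table, changed one row at a time.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")

# Typical OLTP work: many small single-row transactions.
with oltp:
    oltp.execute("INSERT INTO orders VALUES (1, 'Ann', 20.0)")
    oltp.execute("INSERT INTO orders VALUES (2, 'Bob', 35.0)")
    oltp.execute("INSERT INTO orders VALUES (3, 'Cy', 15.0)")
with oltp:
    oltp.execute("UPDATE orders SET amount = 25.0 WHERE order_id = 1")
with oltp:
    oltp.execute("DELETE FROM orders WHERE order_id = 3")
row = oltp.execute("SELECT amount FROM orders WHERE order_id = 2").fetchone()

# Hypothetical warehouse table: appended in periodic batches, otherwise only read.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE order_history (load_date TEXT, order_id INTEGER, customer TEXT, amount REAL)")

def batch_load(load_date):
    """Copy a snapshot of the operational rows into the warehouse in one batch insert."""
    snapshot = oltp.execute("SELECT order_id, customer, amount FROM orders").fetchall()
    with dw:
        dw.executemany(
            "INSERT INTO order_history VALUES (?, ?, ?, ?)",
            [(load_date, *r) for r in snapshot],
        )

batch_load("2008-10-30")

# Typical warehouse query: a SELECT that scans and summarizes many rows.
total = dw.execute(
    "SELECT load_date, COUNT(*), SUM(amount) FROM order_history GROUP BY load_date"
).fetchall()
print(row, total)
```

Because the warehouse table only receives appended batches tagged with `load_date`, it keeps history (time-variant) and is never updated row by row.
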
[Figure: Three-layer architecture]

Data Reconciliation
• Typical operational data is:
  – Transient: not historical
  – Not normalized (perhaps due to denormalization for performance)
  – Restricted in scope: not comprehensive
  – Sometimes poor quality: inconsistencies and errors
• After ETL, data should be:
  – Detailed: not summarized yet
  – Historical: periodic
  – Normalized: 3rd normal form or higher
  – Comprehensive: enterprise-wide perspective
  – Quality controlled: accurate with full integrity

The ETL Process
• Capture
• Scrub (data cleansing)
• Transform
• Load and index
ETL = extract, transform, and load

Steps in Data Reconciliation
Capture = extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
• Static extract: capturing a snapshot of the source data at a point in time
• Incremental extract: capturing changes that have occurred since the last static extract

Steps in Data Reconciliation (cont.)
Scrub = cleanse: uses pattern recognition and AI techniques to upgrade data quality
• Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
• Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data

Steps in Data Reconciliation (cont.)
Transform = convert data from the format of the operational system to the format of the data warehouse
• Record-level:
  – Selection: data partitioning
  – Joining: data combining
  – Aggregation: data summarization
• Field-level:
  – Single-field: from one field to one field
  – Multi-field: from many fields to one, or from one field to many

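The capture, scrub, and transform steps can be sketched as plain Python functions. This is a minimal sketch under stated assumptions: the row layout, the `updated` last-modified timestamp used to detect changes for the incremental extract, the `CITY_FIXES` lookup table, and all function names are hypothetical. Scrubbing here uses only simple rules (duplicate detection, date validation, a misspelling table) rather than the pattern-recognition or AI techniques the slide mentions, and record-level joining is not shown.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical operational rows; `updated` is an assumed last-modified timestamp.
SOURCE = [
    {"id": 1, "city": "Atlanta", "sale_date": "2008-09-30", "amount": "20.00", "updated": datetime(2008, 9, 30, 9)},
    {"id": 2, "city": "Atlnta", "sale_date": "2008-10-02", "amount": "35.50", "updated": datetime(2008, 10, 2, 9)},
    {"id": 2, "city": "Atlnta", "sale_date": "2008-10-02", "amount": "35.50", "updated": datetime(2008, 10, 2, 9)},
    {"id": 3, "city": "Marietta", "sale_date": "2008-02-30", "amount": "10.00", "updated": datetime(2008, 10, 3, 9)},
    {"id": 4, "city": "Marietta", "sale_date": "2008-10-04", "amount": "12.25", "updated": datetime(2008, 10, 4, 9)},
]

def static_extract(source):
    """Capture: copy every chosen row as a snapshot at one point in time."""
    return [dict(row) for row in source]

def incremental_extract(source, last_extract):
    """Capture: copy only rows changed since the previous extract."""
    return [dict(row) for row in source if row["updated"] > last_extract]

CITY_FIXES = {"Atlnta": "Atlanta"}  # illustrative table of known misspellings

def scrub(rows, error_log):
    """Scrub: drop duplicates, log and drop bad dates, correct misspellings."""
    seen, clean = set(), []
    for row in rows:
        signature = (row["id"], row["sale_date"], row["amount"])
        if signature in seen:                      # duplicate data
            continue
        seen.add(signature)
        try:
            row["sale_date"] = date.fromisoformat(row["sale_date"])
        except ValueError:                         # erroneous date, e.g. Feb 30
            error_log.append(("bad date", row["id"]))
            continue
        row["city"] = CITY_FIXES.get(row["city"], row["city"])
        clean.append(row)
    return clean

def transform(rows, month):
    """Transform: record-level selection and aggregation, field-level conversion."""
    totals = {}
    for row in rows:
        if row["sale_date"].month != month:        # selection: keep one month
            continue
        cents = int(Decimal(row["amount"]) * 100)  # field-level: text -> integer cents
        totals[row["city"]] = totals.get(row["city"], 0) + cents  # aggregation by city
    return totals

errors = []
clean = scrub(static_extract(SOURCE), errors)
october = transform(clean, month=10)
changed_ids = [row["id"] for row in incremental_extract(SOURCE, datetime(2008, 10, 2, 12))]
print(october, errors, changed_ids)
```

The static extract copies all five rows, while the incremental extract (changes after noon on 2008-10-02) returns only rows 3 and 4.
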
Steps in Data Reconciliation (cont.)
Load/Index = place transformed data into the warehouse and create indexes
• Refresh mode: bulk rewriting of target data at periodic intervals
• Update mode: only changes in source data are written to the data warehouse

Single-Field Transformation
• In general, some transformation function translates data from the old form to the new form
• Algorithmic transformation uses a formula or logical expression
• Table lookup is another approach

Multifield Transformation
• M:1 (many-to-one): from many source fields to one target field
• 1:M (one-to-many): from one source field to many target fields

Derived Data
• Objectives
  – Ease of use for decision support applications
  – Fast response to predefined user queries
  – Customized data for particular target audiences
  – Ad-hoc query support
  – Data mining capabilities
• Characteristics
  – Detailed (mostly periodic) data
  – Aggregate (for summary)
  – Distributed (to departmental servers)
Most common data model = star schema (also called “dimensional model”)

Components of a Star Schema
• Fact tables contain factual or quantitative data
• Dimension tables contain descriptions about the subjects of the business
• 1:N relationship between dimension tables and fact tables
• Excellent for ad-hoc queries, but bad for online transaction processing
• Dimension tables are denormalized to maximize performance

Star Schema Example
[Figure: Fact table provides statistics for sales broken down by product, period, and store dimensions]

[Figure: Star schema with sample data]

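Refresh mode and update mode differ in how much of the target is rewritten on each load. A minimal sketch with Python's `sqlite3` module, assuming a hypothetical `customer_dim` table and the hypothetical helpers `refresh_load` and `update_load`; `INSERT OR REPLACE` is SQLite's conflict-resolution syntax, and other databases use different statements (for example `MERGE`).

```python
import sqlite3

def refresh_load(dw, rows):
    """Refresh mode: bulk-rewrite the whole target table from a full snapshot."""
    with dw:  # one transaction: commit on success, roll back on error
        dw.execute("DELETE FROM customer_dim")
        dw.executemany("INSERT INTO customer_dim (customer_id, city) VALUES (?, ?)", rows)

def update_load(dw, changed_rows):
    """Update mode: write only the rows that changed in the source."""
    with dw:
        # INSERT OR REPLACE overwrites a row with the same primary key, else inserts it.
        dw.executemany(
            "INSERT OR REPLACE INTO customer_dim (customer_id, city) VALUES (?, ?)",
            changed_rows,
        )

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, city TEXT)")

refresh_load(dw, [(1, "Atlanta"), (2, "Marietta"), (3, "Kennesaw")])
dw.execute("CREATE INDEX IF NOT EXISTS idx_customer_city ON customer_dim (city)")  # the index step

update_load(dw, [(2, "Atlanta"), (4, "Smyrna")])  # one changed row, one new row
after_update = dw.execute("SELECT customer_id, city FROM customer_dim ORDER BY customer_id").fetchall()

refresh_load(dw, [(1, "Atlanta"), (5, "Acworth")])  # next periodic full snapshot
after_refresh = dw.execute("SELECT customer_id, city FROM customer_dim ORDER BY customer_id").fetchall()
print(after_update, after_refresh)
```

This update-mode sketch only applies inserted or changed source rows; propagating deletions from the source would need extra logic, whereas a refresh simply replaces everything.
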
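The single-field and multifield transformation slides can be illustrated with small pure functions. In this sketch, all field names, the `STATE_NAMES` table, and the function names are hypothetical: date reformatting stands for an algorithmic single-field transformation, state-code expansion for a table lookup, name assembly for M:1, and splitting a location string for 1:M.

```python
from datetime import datetime

# Single-field, algorithmic: a formula or expression computes the new value.
def to_iso_date(us_date):
    """Convert 'MM/DD/YYYY' text to an ISO 'YYYY-MM-DD' string."""
    return datetime.strptime(us_date, "%m/%d/%Y").date().isoformat()

# Single-field, table lookup: the new value is looked up, not computed.
STATE_NAMES = {"GA": "Georgia", "AL": "Alabama", "FL": "Florida"}  # illustrative table

def state_name(code):
    return STATE_NAMES.get(code, "Unknown")

# Multifield M:1: several source fields feed one target field.
def full_name(first, last):
    return f"{last.strip().upper()}, {first.strip().title()}"

# Multifield 1:M: one source field feeds several target fields.
def split_location(location):
    """Split 'City, ST 12345' into (city, state code, ZIP code)."""
    city, rest = location.split(",", 1)
    state, zip_code = rest.split()
    return city.strip(), state, zip_code

source = {"first": " ann ", "last": "smith", "hired": "10/30/2008", "location": "Marietta, GA 30060"}
city, state, zip_code = split_location(source["location"])
target = {
    "employee_name": full_name(source["first"], source["last"]),
    "hire_date": to_iso_date(source["hired"]),
    "city": city,
    "state": state_name(state),
    "zip": zip_code,
}
print(target)
```

Note that one target record can combine several kinds of transformation: the `state` field here is produced by a 1:M split followed by a table lookup.
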
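The star schema example (sales statistics broken down by product, period, and store) can be sketched in SQLite through Python's `sqlite3` module. Table and column names and the sample rows are hypothetical, not the ones in the slide's figure. The query shows the typical ad-hoc pattern: join the fact table to its dimension tables, filter on dimension attributes, and aggregate the measures.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables: descriptive, denormalized attributes (names are illustrative).
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT, category TEXT);
CREATE TABLE period_dim  (period_key  INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Fact table: quantitative measures; each dimension row relates to many fact rows (1:N).
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim,
    period_key   INTEGER REFERENCES period_dim,
    store_key    INTEGER REFERENCES store_dim,
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (product_key, period_key, store_key)
);
""")
con.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")])
con.executemany("INSERT INTO period_dim VALUES (?, ?, ?, ?)",
                [(1, 2008, 3, 9), (2, 2008, 4, 10)])
con.executemany("INSERT INTO store_dim VALUES (?, ?, ?)",
                [(1, "Atlanta", "South"), (2, "Chicago", "Midwest")])
con.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 10, 100.0),
    (1, 2, 1, 12, 120.0),
    (2, 2, 1, 5, 75.0),
    (2, 2, 2, 7, 105.0),
    (3, 2, 2, 3, 30.0),
])

# Ad-hoc query: fourth-quarter 2008 sales by product category and store region.
rows = con.execute("""
    SELECT p.category, s.region, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN period_dim  t ON t.period_key  = f.period_key
    JOIN store_dim   s ON s.store_key   = f.store_key
    WHERE t.year = 2008 AND t.quarter = 4
    GROUP BY p.category, s.region
    ORDER BY p.category, s.region
""").fetchall()
print(rows)
```

Changing the question (say, units by month and city) only changes the SELECT, GROUP BY, and WHERE clauses; the schema stays the same, which is why the slide calls star schemas excellent for ad-hoc queries.
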
Issues Regarding Star Schema
• Dimension table keys must be surrogate (non-intelligent and non-business-related), because:
  – Keys may change over time
  – Length/format consistency
• Granularity of the fact table: what level of detail do you want?
  – Transactional grain: finest level
  – Aggregated grain: more summarized
  – Finer grain → better market basket analysis capability
  – Finer grain → more dimension tables, more rows in the fact table

Modeling Dates
• Fact tables contain time-period data
• Date dimensions are important

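Surrogate keys, a date dimension, and fact-table grain can be sketched together in plain Python. The `SurrogateKeyMap` class, the `build_date_dimension` function, business keys such as `"WID-001"`, and the sample facts are all hypothetical. The sketch shows that fact rows store meaningless integer keys, so a change to a business key does not ripple into the fact table, and that rolling transaction-grain facts up to a monthly grain yields fewer, more summarized rows.

```python
from collections import defaultdict
from datetime import date, timedelta

class SurrogateKeyMap:
    """Hands out sequential integer keys that carry no business meaning."""

    def __init__(self):
        self._next = 1
        self._by_business_key = {}

    def key_for(self, business_key):
        # Reuse the existing surrogate, or assign the next integer.
        if business_key not in self._by_business_key:
            self._by_business_key[business_key] = self._next
            self._next += 1
        return self._by_business_key[business_key]

    def rename(self, old_business_key, new_business_key):
        # The source system changed its key; the surrogate stays the same,
        # so existing fact rows need no update.
        self._by_business_key[new_business_key] = self._by_business_key.pop(old_business_key)

def build_date_dimension(start, end):
    """One row per calendar day, with attributes used for grouping and filtering."""
    rows = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        rows.append({
            "date_key": offset + 1,            # surrogate, not derived from the date
            "full_date": day,
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "weekday": day.weekday(),          # 0 = Monday ... 6 = Sunday
            "is_weekend": day.weekday() >= 5,
        })
    return rows

products = SurrogateKeyMap()
widget = products.key_for("WID-001")
gadget = products.key_for("GAD-002")
products.rename("WID-001", "WIDGET-0001")    # hypothetical business-key change

dates = build_date_dimension(date(2008, 10, 1), date(2008, 11, 30))
date_by_key = {d["date_key"]: d for d in dates}

# Transaction grain: one fact row per sale as (date_key, product_key, dollars).
txn_facts = [(1, widget, 20.0), (1, gadget, 35.0), (30, widget, 15.0), (32, widget, 40.0)]

# Aggregated grain: roll the same facts up to product by month.
monthly = defaultdict(float)
for date_key, product_key, dollars in txn_facts:
    d = date_by_key[date_key]
    monthly[(d["year"], d["month"], product_key)] += dollars

print(products.key_for("WIDGET-0001"), len(dates), len(txn_facts), dict(monthly))
```

The four transaction-grain rows collapse to three monthly rows; the coarser table is smaller but can no longer answer questions about individual sales or days, which is the trade-off the granularity bullet describes.
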

