
MIS 385/MBA 664 Systems Implementation with DBMS/Database Management
Dave
http://www.davesalisbury.com/ (web site)

Objectives
- Definition of terms
- Describe the importance and measures of data quality
- Define the characteristics of quality data
- Describe reasons for poor data quality in organizations
- Describe a program for improving data quality
- Describe three types of data integration approaches
- Describe the purpose and role of master data management
- Describe the four steps and activities of ETL for data integration for a data warehouse
- Explain various forms of data transformation for data warehouses

Importance of Data Quality
- Minimize IT project risk
- Make timely business decisions
- Ensure regulatory compliance
- Expand customer base

Characteristics of Quality Data
- Uniqueness
- Accuracy
- Consistency
- Completeness
- Timeliness
- Currency
- Conformance
- Referential integrity

Causes of poor data quality
- External data sources
  - Lack of control over data quality
- Redundant data storage and inconsistent metadata
  - Proliferation of databases with uncontrolled redundancy and metadata
- Data entry
  - Poor data capture controls
- Lack of organizational commitment
  - Failure to recognize poor data quality as an organizational issue

Data quality improvement
- Perform a data quality audit
- Improve data capture processes
- Establish a data stewardship program
- Apply total quality management (TQM) practices
- Apply modern DBMS technology
- Estimate return on investment
- Start with a high-quality data model

Improving Data Capture Processes
- Automate data entry as much as possible
- For manual data entry, select values from preset options
- Use trained operators when possible
- Follow good user interface design principles
- Validate entered data immediately

Data Stewardship Program
- Data steward: a person responsible for ensuring that organizational applications properly support the organization's data quality goals
- Data governance: high-level organizational groups and processes overseeing data stewardship across the organization

Principles for High Quality Data Models
- Entity types represent the underlying nature of an object
- Entity types are part of a subtype/supertype hierarchy for universal context
- Activities and associations are represented by (event) entity types, not relationships
- Relationships are used only to represent the involvement of entity types with activities or associations
- Candidate attributes should be suspected of representing relationships to other entity types
- Entity types should have a single attribute as the primary unique identifier

Example of a many-to-many relationship as an entity type

Data Integration
- Data integration creates a unified view of business data
- Other possibilities:
  - Application integration
  - Business process integration
  - User interaction integration
- Any approach requires changed data capture (CDC)
  - Indicates which data have changed since the previous data integration activity

Techniques for Data Integration
- Consolidation (ETL)
  - Consolidating all data into a centralized database (like a data warehouse)
- Data federation (EII)
  - Provides a virtual view of data without actually creating one centralized database
- Data propagation (EAI and EDR)
  - Duplicate data across databases, with near real-time delay

Comparing Consolidation, Federation, & Propagation as Forms of Data Integration

Master Data Management (MDM)
- The disciplines, technologies, and methods to ensure the currency, meaning, and quality of reference data within and across various subject areas
- Three main approaches:
  - Identity registry
  - Integration hub
  - Persistent

Before ETL, operational data is…
- Transient–not historical
- Not normalized (perhaps due to denormalization for performance)
- Restricted in scope–not comprehensive
- Sometimes poor quality–inconsistencies and errors

After ETL, data should be…
- Detailed–not summarized yet
- Historical–periodic
- Normalized–3rd normal form or higher
- Comprehensive–enterprise-wide perspective
- Timely–data should be current enough to assist decision-making
- Quality controlled–accurate with full integrity

The ETL Process
- Capture/Extract
- Scrub or data cleansing
- Transform
- Load and Index
ETL = Extract, transform, and load

Capture/Extract…obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse
- Static extract = capturing a snapshot of the source data at a point in time
- Incremental extract = capturing changes that have occurred since the last static extract

Scrub/Cleanse…uses pattern recognition and AI techniques to upgrade data quality
- Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies
- Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data

Transform = convert data from format of operational system to format of data warehouse
- Record-level:
  - Selection–data partitioning
  - Joining–data combining
  - Aggregation–data summarization
- Field-level:
  - Single-field–from one field to one field
  - Multi-field–from many fields to one, or one field to many

Load/Index = place transformed data into the warehouse and create indexes
- Refresh mode: bulk rewriting of target data at periodic intervals
- Update mode: only changes in source data are written to the data warehouse

Figure 12-2 Steps in data reconciliation (cont.)

In general–some transformation
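The data quality audit and the immediate validation of entered data listed above can be sketched as a few simple checks. This is a minimal Python sketch, not the course's method. It makes several assumptions. Records are plain dicts. The fields `customer_id`, `email`, and `state`, the preset list `VALID_STATES`, and the simple email pattern are all hypothetical. Each characteristic (uniqueness, completeness, conformance, referential integrity) is scored as the share of rows that pass, which is an illustrative metric of our choosing.

```python
import re

# Hypothetical sample data: customer records and orders that reference them.
customers = [
    {"customer_id": 1, "email": "ann@example.com", "state": "OH"},
    {"customer_id": 2, "email": None, "state": "OH"},
    {"customer_id": 2, "email": "bob@example", "state": "Ohio"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 9},  # refers to a customer that does not exist
]

VALID_STATES = {"OH", "IN", "KY"}  # hypothetical preset list of allowed codes
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple format rule


def ratio(good, total):
    """Share of good rows; an empty table counts as fully passing."""
    return good / total if total else 1.0


def audit(customers, orders):
    """Data quality audit: score a few characteristics of quality data from 0 to 1."""
    ids = [c["customer_id"] for c in customers]
    # Uniqueness: rows whose primary key appears exactly once.
    unique = sum(1 for i in ids if ids.count(i) == 1)
    # Completeness: rows with no missing (None) values.
    complete = sum(1 for c in customers if all(v is not None for v in c.values()))
    # Conformance: rows whose values follow the expected domain and format.
    conforming = sum(
        1 for c in customers
        if c["state"] in VALID_STATES
        and c["email"] is not None
        and EMAIL_PATTERN.match(c["email"])
    )
    # Referential integrity: orders that point at an existing customer.
    known = set(ids)
    valid_refs = sum(1 for o in orders if o["customer_id"] in known)
    return {
        "uniqueness": ratio(unique, len(customers)),
        "completeness": ratio(complete, len(customers)),
        "conformance": ratio(conforming, len(customers)),
        "referential_integrity": ratio(valid_refs, len(orders)),
    }


def validate_entry(record):
    """Immediate validation at capture time; an empty list means accept the entry."""
    problems = []
    if record.get("state") not in VALID_STATES:  # value must come from the preset options
        problems.append("state must be one of the preset options")
    email = record.get("email")
    if email is None or not EMAIL_PATTERN.match(email):
        problems.append("email is missing or malformed")
    return problems


print(audit(customers, orders))
print(validate_entry({"customer_id": 3, "email": "cy@example.com", "state": "KY"}))  # []
```

The audit runs after the fact over stored data. `validate_entry` instead rejects a bad value at the moment of entry, which is the cheaper place to stop it.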

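The difference between a static and an incremental extract, and between refresh and update load modes, can be shown in a small Python sketch. It assumes the following:

- Each source snapshot is a dict keyed by primary key.
- The warehouse target is an in-memory dict.
- Changes are detected by comparing the previous and current snapshots. This is only one simple way to do changed data capture; real CDC tools typically read database logs or timestamps instead.
- Deleted source rows are not handled.

All function names are hypothetical.

```python
def static_extract(source):
    """Static extract: capture a snapshot of the source data at a point in time."""
    return dict(source)


def incremental_extract(previous, current):
    """Incremental extract: capture rows that are new or changed since the last snapshot.

    Changes are found by comparing two snapshots; deleted rows are not handled here.
    """
    return {key: row for key, row in current.items() if previous.get(key) != row}


def load_refresh(snapshot):
    """Refresh mode: bulk-rewrite the whole target from a full snapshot."""
    return dict(snapshot)


def load_update(target, changes):
    """Update mode: write only the changed source rows into the existing target."""
    target.update(changes)
    return target


# Hypothetical source table on two days, keyed by customer number.
day1 = {1: "Ann, Dayton", 2: "Bob, Columbus"}
day2 = {1: "Ann, Cincinnati", 2: "Bob, Columbus", 3: "Cy, Akron"}

warehouse = load_refresh(static_extract(day1))
changes = incremental_extract(day1, day2)
print(changes)  # {1: 'Ann, Cincinnati', 3: 'Cy, Akron'}
load_update(warehouse, changes)
print(warehouse == load_refresh(static_extract(day2)))  # True
```

Both paths end with the same target contents. Update mode moves only the two changed rows, while refresh mode rewrites everything, which is why refresh is usually reserved for periodic bulk loads.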

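The scrub/cleanse and transform steps described above can be illustrated on a few hypothetical order rows. This is a minimal Python sketch. The cleansing rules stand in for the pattern-recognition and AI techniques that real cleansing tools use: trimming, case normalization, a small hypothetical table of misspelled city names, and dropping exact duplicates. The field names, `CITY_FIXES`, and `STATUS_CODES` are assumptions. After cleansing, the sketch shows the following transformations:

- Record-level selection, joining, and aggregation.
- A single-field transformation (decoding a status code).
- Multi-field transformations in both directions (many fields to one, one field to many).

```python
from collections import defaultdict

# Hypothetical operational rows with typical quality problems.
raw_orders = [
    {"order_id": 1, "cust": 7, "city": " dayton ", "status": "S", "amount": 20.0},
    {"order_id": 1, "cust": 7, "city": " dayton ", "status": "S", "amount": 20.0},  # duplicate
    {"order_id": 2, "cust": 8, "city": "Dayten", "status": "C", "amount": 5.0},  # misspelling
    {"order_id": 3, "cust": 7, "city": "Akron", "status": "S", "amount": 15.0},
]
customers = {7: {"first": "Ann", "last": "Lee"}, 8: {"first": "Bob", "last": "Ray"}}

CITY_FIXES = {"dayten": "dayton"}  # hypothetical table of known misspellings
STATUS_CODES = {"S": "Shipped", "C": "Cancelled"}  # hypothetical code table


def scrub(rows):
    """Scrub/cleanse: standardize city names, fix known misspellings, drop duplicates."""
    seen, clean = set(), []
    for row in rows:
        city = row["city"].strip().lower()
        city = CITY_FIXES.get(city, city).title()
        fixed = {**row, "city": city}
        key = tuple(sorted(fixed.items()))  # whole-row identity for duplicate detection
        if key not in seen:
            seen.add(key)
            clean.append(fixed)
    return clean


def transform(rows):
    """Apply field-level and record-level transformations."""
    # Single-field: decode a status code into its description (one field to one field).
    rows = [{**r, "status": STATUS_CODES[r["status"]]} for r in rows]
    # Record-level selection: keep only shipped orders.
    shipped = [r for r in rows if r["status"] == "Shipped"]
    # Record-level joining plus multi-field (many to one): add the customer's full name.
    joined = []
    for r in shipped:
        person = customers[r["cust"]]
        joined.append({**r, "customer_name": f'{person["first"]} {person["last"]}'})
    # Record-level aggregation: total shipped amount per customer.
    totals = defaultdict(float)
    for r in joined:
        totals[r["customer_name"]] += r["amount"]
    return joined, dict(totals)


def split_name(full_name):
    """Multi-field (one to many): split one name field into first and last name."""
    first, _, last = full_name.partition(" ")
    return {"first": first, "last": last}


clean = scrub(raw_orders)
print([r["city"] for r in clean])  # ['Dayton', 'Dayton', 'Akron']
joined, totals = transform(clean)
print(totals)  # {'Ann Lee': 35.0}
print(split_name("Ann Lee"))  # {'first': 'Ann', 'last': 'Lee'}
```

Cleansing runs before transformation so that duplicates and misspellings do not distort the joins and totals computed afterward.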
Dayton MIS 385 - Importance of Data Quality
