GT CS 4440 - Lineage Tracing for General Data  Warehouse Transformations



Lineage Tracing for General Data Warehouse Transformations
Yingwei Cui and Jennifer Widom
Computer Science Department, Stanford University
Presentation by Aaron St.Clair

Outline
• What is lineage tracing?
• Why is tracing lineage data important?
• How can we find lineage data?
• Performance results

Data Warehouses
• Integrate data from multiple sources
• Data undergoes a series of transformations
• Transformations vary in complexity
[Diagram: Data Source 1, Data Source 2, …, Data Source N → Transformation → Summarized Data]

Lineage Tracing
• Identifying the specific data items in the sources that derive a given data item in the warehouse
• Allows
  – In-depth data analysis
  – Data mining
  – Authorization management
  – View update
  – Efficient warehouse recovery

An Example
• Selects items whose last-quarter sales are more than twice the average of the last three quarters' sales

Lineage Granularity
• Coarse-grained
  – Schema-level, attribute mapping
• Fine-grained
  – Set of source data items

Existing Work
• Mostly coarse-grained lineage
• Existing methods for fine-grained lineage
  – Extra annotation
  – Developer-defined weak inverses
  – Statistical estimation
• Can't handle complex procedural transformations

Tracing Lineage - Definitions
• Data set: set of data items without duplicates
• Transformation: any procedure that takes data sets as input and produces data sets as output
  – Stable (no spurious output)
  – Deterministic (under some conditions)
• Lineage of a data item: set of input data items that contribute to that item

Determining Contributions
• Need to find relevant data items
  – Easy for simple relational operators
  – Difficult for procedural transformations
• Select positives vs. Aggregation and sum

Lineage Tracing
• Use of hierarchical model
  – Transformation classes
  – Schema mappings
  – Defined inverses

Transformation Classes
• Transformation class defines the procedure for lineage determination
• For a dispatcher:
  – Iteratively apply the transformation to the inputs
  – If T(I) is in the output set, add I to the lineage of the output set

Schema Mappings
• Defined schema for input and output of a transformation
• Backward key-maps
  – A_key ← g(B)
  – T1
• Forward key-maps
  – f(A) → B_key
  – T4
• Backward total-maps
  – A ← g(B)
  – T5

Provided Inverses/Tracing Procedures
• Best case: someone has defined a function mapping output items to their deriving lineage items
• Know nothing about efficiency of function

Property Hierarchy

Finding Lineage
• Recursively apply algorithms based on the transformation type until we reach top level

Optimizations
• Indexing input data set improves performance
• Functional index using the schema optimizes queries of the form F(i) = v
• Store auxiliary or intermediate views in the warehouse
• Reduce number by composing transformations

Transformation Graphs
• Create a tracing sequence for each path from input to output in the graph
• Combine the results of each sequence

Performance
• 1GB warehouse
• Schema mapping better than transformation class-specific algorithms
• Indexing helps
• Combining attributes reduces trace
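
The dispatcher procedure from the Transformation Classes slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a dispatcher processes each input item independently, so an input item belongs to the lineage when applying the transformation to that item alone produces one of the traced output items. The names trace_dispatcher and split_orders, and the sample data, are hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple

Item = Tuple  # assumption: a data item is modeled as a plain tuple
Transformation = Callable[[Set[Item]], Set[Item]]


def trace_dispatcher(transform: Transformation,
                     inputs: Iterable[Item],
                     traced_outputs: Iterable[Item]) -> Set[Item]:
    """Return the input items whose individual output overlaps the traced outputs."""
    wanted = set(traced_outputs)
    lineage = set()
    for item in inputs:
        # Apply the transformation to a single-item data set.
        if transform({item}) & wanted:
            lineage.add(item)
    return lineage


def split_orders(items: Set[Item]) -> Set[Item]:
    """Hypothetical dispatcher: one output item per product listed in an order."""
    out = set()
    for order_id, products in items:
        for product in products.split(","):
            out.add((order_id, product))
    return out


orders = {(1, "pen,ink"), (2, "ink"), (3, "paper")}
lineage_of_ink = trace_dispatcher(split_orders, orders, {(1, "ink")})
```

The cost is one call to the transformation per input item, which is consistent with the later remark that nothing is known about efficiency unless extra properties such as schema mappings are available.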
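
To illustrate how a forward key-map f(A) → B_key and a functional index fit together, here is a small sketch. It assumes the lineage of an output item is the set of input items i with f(i) equal to the output item's key, and that the index answers queries of the form F(i) = v with a dictionary lookup. The function names and the sales data are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Item = Tuple  # assumption: a data item is modeled as a plain tuple


def build_functional_index(inputs: Iterable[Item],
                           f: Callable[[Item], Hashable]) -> Dict[Hashable, Set[Item]]:
    """Group input items by f(item) so that queries F(i) = v become lookups."""
    index: Dict[Hashable, Set[Item]] = defaultdict(set)
    for item in inputs:
        index[f(item)].add(item)
    return index


def trace_forward_key_map(index: Dict[Hashable, Set[Item]],
                          output_key: Hashable) -> Set[Item]:
    """Lineage of the output item with the given key: all inputs mapped to that key."""
    return set(index.get(output_key, set()))


# Hypothetical source: (product, quarter, amount). The transformation is assumed
# to total sales per product, so the output key is the product name.
sales = {("pen", "Q1", 10), ("pen", "Q2", 20), ("ink", "Q1", 5)}
index = build_functional_index(sales, lambda s: s[0])
pen_lineage = trace_forward_key_map(index, "pen")
```

Building the index costs one pass over the input; each later trace is a single lookup, so the transformation itself never has to be re-run.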
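
Tracing through a sequence of transformations, as on the Finding Lineage and Transformation Graphs slides, might look like the sketch below. It assumes the intermediate results of each step are stored (or recomputed) and that every step comes with its own tracing procedure; lineage is then traced backward one step at a time. All names here (trace_sequence, trace_per_item, split_orders, keep_ink) are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Item = Tuple  # assumption: a data item is modeled as a plain tuple
Transformation = Callable[[Set[Item]], Set[Item]]
Tracer = Callable[[Transformation, Set[Item], Set[Item]], Set[Item]]


def trace_per_item(transform: Transformation, inputs: Set[Item],
                   traced: Set[Item]) -> Set[Item]:
    """Tracer for steps that process each input item independently."""
    return {i for i in inputs if transform({i}) & traced}


def trace_sequence(steps: List[Tuple[Transformation, Tracer]],
                   source: Set[Item], traced_outputs: Set[Item]) -> Set[Item]:
    """Trace backward through a linear sequence of (transformation, tracer) steps."""
    # Forward pass: keep the input data set of every step (intermediate views).
    intermediates = [set(source)]
    for transform, _ in steps:
        intermediates.append(transform(intermediates[-1]))
    # Backward pass: the lineage found at one step is traced at the previous step.
    current = set(traced_outputs)
    for (transform, tracer), step_input in zip(reversed(steps),
                                               reversed(intermediates[:-1])):
        current = tracer(transform, step_input, current)
    return current


def split_orders(items: Set[Item]) -> Set[Item]:
    return {(oid, p) for oid, products in items for p in products.split(",")}


def keep_ink(items: Set[Item]) -> Set[Item]:
    return {(oid, p) for oid, p in items if p == "ink"}


orders = {(1, "pen,ink"), (2, "ink"), (3, "paper")}
steps = [(split_orders, trace_per_item), (keep_ink, trace_per_item)]
source_lineage = trace_sequence(steps, orders, {(1, "ink")})
```

For a graph with several paths from input to output, one such sequence would be traced per path and the resulting lineage sets combined.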

