Mt Holyoke CS 341 - Data Warehousing and Decision Support

Unformatted text preview:

Data Warehousing and Decision SupportViews and Decision SupportView Modification (Evaluate On Demand)View Materialization (Precomputation)Materialized ViewsIssues in View MaterializationView MaintenanceDeferred MaintenanceSnapshots in Oracle 7Issues in View Maintenance (1)Issues in View Maintenance (2)Issues in View Maintenance (3)Incremental Maintenance Alg: One Rule, InsertsIncremental Maintenance Alg: One Rule, DeletesIncremental Maintenance Alg: GeneralMaintaining Warehouse ViewsExample of Warehouse View Maint.Warehouse View MaintenanceDatabase Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 1Data Warehousing and Decision SupportChapter 25, Part BDatabase Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 2Views and Decision SupportOLAP queries are typically aggregate queries.Precomputation is essential for interactive response times.The CUBE is in fact a collection of aggregate queries, and precomputation is especially important: lots of work on what is best to precompute given a limited amount of space to store precomputed results.Warehouses can be thought of as a collection of asynchronously replicated tables and periodically maintained views.Has renewed interest in view maintenance!Database Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 3View Modification (Evaluate On Demand)CREATE VIEW RegionalSales(category,sales,state)AS SELECT P.category, S.sales, L.state FROM Products P, Sales S, Locations L WHERE P.pid=S.pid AND S.locid=L.locidSELECT R.category, R.state, SUM(R.sales)FROM RegionalSales AS R GROUP BY R.category, R.stateSELECT R.category, R.state, SUM(R.sales)FROM (SELECT P.category, S.sales, L.stateFROM Products P, Sales S, Locations LWHERE P.pid=S.pid AND S.locid=L.locid) AS RGROUP BY R.category, R.stateViewQueryModifiedQueryDatabase Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 4View Materialization (Precomputation)Suppose we precompute RegionalSales and store it with a clustered B+ tree index on [category,state,sales].Then, previous query can be answered by an index-only scan.SELECT R.state, SUM(R.sales)FROM RegionalSales RWHERE R.category=“Laptop”GROUP BY R.stateSELECT R.state, SUM(R.sales)FROM RegionalSales RWHERE R. state=“Wisconsin”GROUP BY R.categoryIndex on precomputed view is great!Index is less useful (must scan entire leaf level).Database Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 5Materialized ViewsA view whose tuples are stored in the database is said to be materialized.Provides fast access, like a (very high-level) cache.Need to maintain the view as the underlying tables change.Ideally, we want incremental view maintenance algorithms.Close relationship to data warehousing, OLAP, (asynchronously) maintaining distributed databases, checking integrity constraints, and evaluating rules and triggers.Database Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 6Issues in View MaterializationWhat views should we materialize, and what indexes should we build on the precomputed results?Given a query and a set of materialized views, can we use the materialized views to answer the query?How frequently should we refresh materialized views to make them consistent with the underlying tables? (And how can we do this incrementally?)Database Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 7View MaintenanceTwo steps:Propagate: Compute changes to view when data changes.Refresh: Apply changes to the materialized view table.Maintenance policy: Controls when we do refresh.Immediate: As part of the transaction that modifies the underlying data tables. (+ Materialized view is always consistent; - updates are slowed)Deferred: Some time later, in a separate transaction. (- View becomes inconsistent; + can scale to maintain many views without slowing updates)Database Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 8Deferred MaintenanceThree flavors:Lazy: Delay refresh until next query on view; then refresh before answering the query. Periodic (Snapshot): Refresh periodically. Queries possibly answered using outdated version of view tuples. Widely used, especially for asynchronous replication in distributed databases, and for warehouse applications.Event-based: E.g., Refresh after a fixed number of updates to underlying data tables.Database Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 9Snapshots in Oracle 7A snapshot is a local materialization of a view on data stored at a master site.Periodically refreshed by re-computing view entirely.Incremental “fast refresh” for “simple snapshots” (each row in view based on single row in a single underlying data table; no DISTINCT, GROUP BY, or aggregate ops; no sub-queries, joins, or set ops)•Changes to master recorded in a log by a trigger to support this.Database Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 10Issues in View Maintenance (1)What information is available? (Base relations, materialized view, ICs). Suppose parts(p5,5000) is inserted:Only materialized view available: Add p5 if it isn’t there.Parts table is available: If there isn’t already a parts tuple p5 with cost >1000, add p5 to view.•May not be available if the view is in a data warehouse!If we know pno is key for parts: Can infer that p5 is not already in view, must insert it.expensive_parts(pno) :- parts(pno, cost), cost > 1000Database Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 11Issues in View Maintenance (2)What changes are propagated? (Inserts, deletes, updates). Suppose parts(p1,3000) is deleted:Only materialized view available: If p1 is in view, no way to tell whether to delete it. (Why?)•If count(#derivations) is maintained for each view tuple, can tell whether to delete p1 (decrement count and delete if = 0).Parts table is available: If there is no other tuple p1 with cost >1000 in parts, delete p1 from view.If we know pno is key for parts: Can infer that p1 is currently in view, and must be deleted.expensive_parts(pno) :- parts(pno, cost), cost > 1000Database Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke 12Issues in View Maintenance (3)View definition language? (Conjunctive queries, SQL subset, duplicates, aggregates, recursion)Suppose parts(p5,5000) is inserted:Can’t tell whether to insert p5 into view if we’re only


View Full Document

Mt Holyoke CS 341 - Data Warehousing and Decision Support

Download Data Warehousing and Decision Support
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Data Warehousing and Decision Support and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Data Warehousing and Decision Support 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?