Duke CPS 296.1 - WebView Materialization and Maintenance

WebView Materialization and Maintenance
CPS 296.1 Topics in Database Systems

Roadmap
⇒ Where to materialize dynamic Web content
  – Labrinidis and Roussopoulos. “WebView Materialization.” SIGMOD, 2000
• When to refresh materialized Web content
  – Labrinidis and Roussopoulos. “Update Propagation Strategies for Improving the Quality of Data on the Web.” VLDB, 2001

Multi-tier Web architecture
[Figure: page request flowing through the tiers; labels: request page, caches page, formats page, processes query, query result]

WebView caching
• WebView: a page (or fragment of a page) dynamically generated from data in a DBMS
  – Base data
  – Query results
  – Result pages
• Question: what to materialize, and where?
  – Virtual (do not materialize)
  – Materialize query results inside the DBMS
  – Materialize result pages at the Web server
[Figure labels: Query at DBMS; Format at Web server]

Virtual
• Access
  – Query at DBMS
  – Format at Web server
• Update
  – Update base tables at DBMS
⇒ Contention at DBMS between queries and updates

Materialize inside DBMS
• Access
  – Read materialized query result at DBMS
  – Format at Web server
• Update
  – Update base tables at DBMS
  – Update materialized query results at DBMS, by either
    • Re-computing affected queries, or
    • Incrementally maintaining materialized results
⇒ Contention at DBMS between reading and updating of materialized query results

Materialize at Web server
• Access
  – Read materialized result page at Web server
• Update
  – Update base tables at DBMS
  – Re-compute queries at DBMS
  – Re-format materialized result pages at Web server
⇒ The last two steps can be pipelined
⇒ Incremental maintenance is very difficult, if possible at all
⇒ Contention at Web server between reading and writing materialized result pages

Performance metric: response time
Average response time of an access
• Not simply the average access time over all WebViews
• Account for different access frequencies
  – Access time of a WebView is weighted by its access frequency
• Account for contention between accesses and updates
  – DBMS is likely to be the bottleneck
  – If all WebViews are materialized at the Web server, then updates do not impact accesses
    • Contention at the Web server between reading and writing pages is ignored
  – Otherwise, update time is also counted

Performance metric: staleness
Staleness of WebViews
• The virtual policy does not necessarily provide the lowest staleness, because query time also contributes to staleness

Experimental setup
• Synthetic load: single-table selections on indexed columns
  – Materialization and incremental maintenance do not buy us much in this case
  ⇒ Expect bad performance for the mat-db policy
• Updater processes run in the background to refresh WebViews
• Interesting tidbits
  – Do not spawn a process to handle each request (as CGI does) → an order of magnitude performance improvement
  – Database connection pooling → another order of magnitude performance improvement

Scaling up access rate (no updates)
• Mat-web clearly wins because it does not repeat any work
• Mat-db ≈ virtual, indicating that for this query load, re-computing queries is as cheap as reading pre-computed results (or the cost is dominated by the overhead of interacting with a DBMS)

Scaling up access rate (with updates)
• Mat-db is even worse than virtual, because of the extra work of refreshing materialized query results
• Mat-web wins again: the trip to the DBMS and/or the re-formatting of the result page are worth saving

Other experiments
• Scaling up the update rate
  – Access time under mat-web hardly changes because updates are handled in the background
  – Virtual is worse because of access/update contention at the DBMS
  – Mat-db is even worse because of the extra work of refreshing materialized query results
• Scaling up the number of WebViews
  – Also making 10% of the queries simple two-table key joins
  ⇒ Materialization makes more sense in this case
  – Mat-db works better, but is still bad with lots of updates
• Scaling up the WebView size, Zipf distribution, mixing the three policies, etc.

Staleness
• Inferred from the results of the experiments
  – Under light load, virtual provides the lowest staleness
  – Under heavy load, mat-web works better because it is able to maintain fast response time

Summary of WebView materialization
• Experiments indicate mat-web is the best
• But have the experiments covered all practical cases?
  – If queries are more expensive to compute (e.g., aggregates), then mat-db should outperform virtual
  – If queries are cheap to maintain incrementally yet expensive to re-compute (e.g., aggregates), then mat-db could outperform mat-web
⇒ Perhaps the decision of which materialization policy to use should be made on a per-WebView basis?

Roadmap
• Where to materialize dynamic Web content
  – Labrinidis and Roussopoulos. “WebView Materialization.” SIGMOD, 2000
⇒ When to refresh materialized Web content
  – Labrinidis and Roussopoulos. “Update Propagation Strategies for Improving the Quality of Data on the Web.” VLDB, 2001

Serving dynamic Web content
• Too many accesses?
  – WebView materialization
• Too many updates? Surges in update rate?
  – Schedule refreshes of materialized WebViews to maximize their freshness
  ⇒ Freshness should degrade gracefully if there are not enough resources to keep up with the updates
  ⇒ Freshness should recover quickly after update surges

Freshness metric
• Freshness of a view v at time t: $f(v, t) = 0$ if v is stale at time t, and $f(v, t) = 1$ otherwise
  – The metric is binary: being 1 day stale is no worse than being 1 second stale
• Freshness probability during the observation interval $[t_s, t_e]$:
  $pf(v, [t_s, t_e]) = \frac{\int_{t_s}^{t_e} f(v, t)\,dt}{t_e - t_s}$
• Overall freshness during $[t_s, t_e]$:
  $pf(db, [t_s, t_e]) = \sum_{v \in db} pf(v, [t_s, t_e]) \times \text{access-freq}(v)$
⇒ Related work: Cho and Garcia-Molina, SIGMOD, 2000

Scheduling refreshes
• Assume that refresh operations do not overlap (parallelism not considered)
• Updates on base tables should be applied in order
• If U → V, then refresh U before V
  – V is recomputed from U
⇒ How about refreshing v6 directly from r1 and r2?
[Figure legend: Base tables: need to apply updates; Virtual views: no need to refresh; Materialized views: need refresh]

FIFO refresh schedule
• Overall freshness: 0.51
• Refresh all affected views after the base table is updated
[Figure: view dependency graph with per-view values 0.12, 0.37, 0.19, 0.07, 0.06, 0.09, 0.05, 0.05]

Optimal static refresh schedule
• Overall freshness: 0.68
• Merge updates
  – v3, v5, v6 are refreshed once after two base table updates
[Figure: view dependency graph with per-view values 0.12, 0.37, 0.19, 0.07, 0.06, 0.09, 0.05, 0.05]
• Favor views with the biggest (access frequency /
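
The response-time metric weights each WebView's access time by how often that WebView is accessed. A minimal Python sketch of that weighting, with hypothetical WebView names and timings; it does not model the access/update contention the metric also accounts for, so response times are simply given as inputs:

```python
from typing import Dict


def weighted_avg_response_time(access_freq: Dict[str, float],
                               response_time: Dict[str, float]) -> float:
    """Average response time of an access: each WebView's response time is
    weighted by its share of all accesses instead of being counted once."""
    total = sum(access_freq.values())
    if total <= 0:
        raise ValueError("access frequencies must sum to a positive value")
    return sum(freq * response_time[view]
               for view, freq in access_freq.items()) / total


# Hypothetical WebViews: a popular, fast page and a rarely read, slow one.
freq = {"home": 90.0, "report": 10.0}      # accesses per second (assumed)
resp_ms = {"home": 5.0, "report": 200.0}   # response time in ms (assumed)

print(weighted_avg_response_time(freq, resp_ms))  # 24.5
print(sum(resp_ms.values()) / len(resp_ms))        # 102.5 (naive per-WebView average)
```

The naive average overstates the typical cost of an access because it gives the rarely read page as much weight as the popular one.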

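The freshness definitions can be computed directly once each view's stale periods are known. This sketch assumes stale periods are given as non-overlapping (start, end) pairs and that access frequencies are fractions summing to 1 (the slide does not state a normalization); the function and view names are hypothetical:

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def freshness_probability(stale: List[Interval], ts: float, te: float) -> float:
    """pf(v, [ts, te]): fraction of the observation window in which f(v, t) = 1.

    `stale` lists the (start, end) intervals during which v is stale
    (f(v, t) = 0); they are assumed not to overlap each other.
    """
    if te <= ts:
        raise ValueError("observation interval must have positive length")
    stale_time = 0.0
    for start, end in stale:
        # Clip each stale interval to the observation window.
        lo, hi = max(start, ts), min(end, te)
        if hi > lo:
            stale_time += hi - lo
    # The integral of f over [ts, te] equals the window length minus stale time.
    return (te - ts - stale_time) / (te - ts)


def overall_freshness(stale_by_view: Dict[str, List[Interval]],
                      access_freq: Dict[str, float],
                      ts: float, te: float) -> float:
    """pf(db, [ts, te]) = sum over v of pf(v, [ts, te]) * access-freq(v)."""
    return sum(freshness_probability(stale_by_view.get(v, []), ts, te) * freq
               for v, freq in access_freq.items())


# Hypothetical example over the observation window [0, 8].
stale = {"v1": [(0.0, 2.0)], "v2": [(1.0, 5.0)]}
freq = {"v1": 0.75, "v2": 0.25}
print(freshness_probability(stale["v1"], 0.0, 8.0))  # 0.75
print(overall_freshness(stale, freq, 0.0, 8.0))      # 0.6875
```

Because f is 0 or 1, the integral reduces to the total time the view is fresh, so no numerical integration is needed.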

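The ordering constraint on the "Scheduling refreshes" slide (if U → V, refresh U before V) is a topological order of the view dependency graph. A sketch using Python's standard graphlib module (Python 3.9+); the function name and the graph are hypothetical, not the ones in the lecture's figure, and the code only produces one valid order without trying to maximize freshness:

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def refresh_order(derived_from: Dict[str, Set[str]],
                  virtual: Set[str]) -> List[str]:
    """Return an order for applying base-table updates and view refreshes
    such that whenever U -> V (V is recomputed from U), U comes before V.

    `derived_from` maps each node to the set of nodes it is computed from;
    base tables map to an empty set. Virtual views stay in the graph so that
    transitive dependencies are respected, but they are dropped from the
    output because they need no refresh.
    """
    order = TopologicalSorter(derived_from).static_order()
    return [node for node in order if node not in virtual]


# Hypothetical dependency graph: base tables r1, r2; v1 is virtual;
# v2 and v3 are materialized.
graph = {
    "r1": set(),
    "r2": set(),
    "v1": {"r1"},          # virtual view over r1
    "v2": {"v1", "r2"},    # materialized, recomputed from v1 and r2
    "v3": {"v2"},          # materialized, recomputed from v2
}
print(refresh_order(graph, virtual={"v1"}))
```

Several orders are usually valid; which valid order to pick is exactly the choice that separates the FIFO schedule from the optimal static schedule.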

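The FIFO and optimal static schedules differ in which refreshes run first, and the freshness metric rewards refreshing frequently accessed, cheap views early. The sketch below evaluates a schedule in a deliberately simplified model, which is an assumption and not the lecture's setup: all views go stale at time 0 from one batch of updates, views are independent (no U → V dependencies and no merging of updates), and refreshes run one at a time with known costs. The ratio-based ordering is a hypothetical greedy heuristic (Smith's weighted-shortest-processing-time rule) and is not necessarily the criterion on the truncated last slide:

```python
from typing import Dict, List


def schedule_freshness(order: List[str], cost: Dict[str, float],
                       access_freq: Dict[str, float], horizon: float) -> float:
    """Overall freshness over [0, horizon] when every view in `order` becomes
    stale at time 0 and refreshes run one at a time in the given order.

    A view stays stale until its refresh completes at time c, so its
    freshness probability is (horizon - c) / horizon (0 if c > horizon).
    """
    clock, total = 0.0, 0.0
    for view in order:
        clock += cost[view]  # refreshes do not overlap
        pf = max(horizon - clock, 0.0) / horizon
        total += access_freq[view] * pf
    return total


def ratio_order(cost: Dict[str, float], access_freq: Dict[str, float]) -> List[str]:
    """Hypothetical heuristic: refresh views with the largest
    access-frequency-to-cost ratio first."""
    return sorted(cost, key=lambda v: access_freq[v] / cost[v], reverse=True)


# Hypothetical views: "a" is cheap and popular, "c" is expensive and rarely read.
cost = {"c": 4.0, "b": 2.0, "a": 1.0}
freq = {"c": 0.125, "b": 0.375, "a": 0.5}
arrival = ["c", "b", "a"]  # FIFO: refresh in the order the views were affected

print(schedule_freshness(arrival, cost, freq, horizon=8.0))                   # 0.21875
print(schedule_freshness(ratio_order(cost, freq), cost, freq, horizon=8.0))  # 0.6875
```

In this model, as long as every refresh finishes before the horizon, maximizing overall freshness is equivalent to minimizing the access-frequency-weighted sum of refresh completion times, which is what ordering by frequency-to-cost ratio achieves.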