DOC PREVIEW
USC CSCI 585 - GammaPart3

This preview shows page 1-2-3-19-20-38-39-40 out of 40 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Homework 1 Common Mistakes Memory Leak Storing of memory pointers instead of data Memory Leak A program that uses new malloc without delete free suffers from memory leak Why C new C malloc allocate space from heap If a program loses access to this space then this memory space remains unused until the program stops executing Example pseudo code For cntr ranging from 1 to 100 do BEGIN Record new byte 1024 Set Record fields Insert the record into the database END Memory Leak Why is memory leak bad As the program executes the available heap space shrinks This space is allocated from the virtual memory managed by the operating system If the virtual address space exceeds the available memory space and starts to thrash the program becomes very very very slow Memory Leak A program that uses new malloc without delete free suffers from memory leak Why C new C malloc allocate space from heap If a program loses access to this space then this memory space remains unused until the program stops executing Correct pseudo code For cntr ranging from 1 to 100 do BEGIN Record new byte 1024 Set Record fields Insert the record into the database delete record END Returns the 1 kilobyte of data back to the heap Bad Design Store Pointers Insertion of record 1 inserts pointers not data Shahram id Record 1 Age Name 5 25 Why is this design bad When the program stops execution all the memory addresses pointers stored in the database become invalid Bad Design Store Pointers Insertion of record 1 inserts pointers not data Shahram id Record 1 0101010101010 Age Name 5 25 Why is this design bad When the program stops execution all the memory addresses pointers stored in the database become invalid The operating system may move Shahram around invalidating the stored memory address Good Design Store Data Serialize the record to generate data id Record 1 Age 5 25 Name Shahram Insertion of record 1 inserts data into the DBMS This is the right design Homework 2 Posted on the 585 web site http dblab usc edu csci585 and is due on Feb 24th Objective Use of primary and secondary indexes Highlight a few limitations of BDB when configured in main memory Use of transactions to maintain integrity of a main memory database Read the homework description for in class review this Thursday Gamma DBMS Part 3 Function Shipping versus Data Shipping Evaluation Shahram Ghandeharizadeh Computer Science Department University of Southern California Data Shipping Client retrieves data from the node Client performs computation locally Limitation Dumb servers utilizes the limited network bandwidth Process f x Xmit Data Data A Node Function Shipping Client ships the function to the node for processing Relevant data is sent to client Function f x should produce less data than the original data stored in the database Minimizes demand for the network bandwidth Process function f x Output of f x A Node Gamma Gamma is based on function shipping Hybrid hash join partitions the referenced tables across the nodes of the shared nothing architecture Data does not leave the realm of the shared nothing hardware Service Time Focus on query service time only 1 request executing in the system as a function of input table size Hash partition the table Store the results of each query back in the database Why Why Seek time is a function of the distance traveled by the disk head Join Queries Join tables A and Bprime A is 10X Bprime Produces the same number of records as BPrime Note that re partitioning the table is not that expensive Join Queries Join tables A and Bprime A is 10X Bprime Produces the same number of records as BPrime Why How to Evaluate Focus on use of parallelism and scalability of the system How Speedup Given a table with r rows and a query if the service time of the system is X with one node does it speedup by a factor of n with n nodes Scaleup Speedup If the service time of a query referencing a table with r rows and a system with n nodes is X does the service time remain X with a table of mr rows and mn nodes of Nodes Scaleup Both metrics measure service time of the system because only one request is submitted to the system of Nodes Selection Predicates Speedup Super linear speed up with 1 non clustered index and 10 clustered index selection Referenced table consists of 1 million rows Selection Predicates Scaleup Join Predicates Speedup 1 Bucket starting with 5 nodes Results would have been superlinear if Bprime did not fit in main memory of 5 nodes Join Predicates Scaleup Overhead of parallelism The scheduler coordinating the activation coordination and de activation of different operators 2009 Evolution of Gamma Shared nothing architecture consisting of thousands of nodes A node is an off the shelf commodity PC Yahoo s Pig Latin Google s Map Reduce Framework Google s Bigtable Data Model Google File System Gamma in 2009 Shared nothing architecture consisting of thousands of nodes A node is an off the shelf commodity PC Yahoo s Pig Latin Google s Map Reduce Framework Google s Bigtable Data Model Google File System Divide Conquer Gamma in 2009 Source code for Pig and hadoop are available for free download Yahoo s Pig Latin Google s Map Reduce Framework Google s Bigtable Data Model Google File System Pig Hadoop References Pig Latin Map Reduce Dean and Ghemawat MapReduce Simplified Data Processing on Large Clusters Communications of the ACM Vol 51 No 1 January 2008 Bigtable Olston et al Pig Latin A Not So Foreign Language for Data Processing SIGMOD 2008 Chang et al Bigtable A Distributed Storage System for Structured Data In OSDI 2006 GFS Ghemawat et al The Google File System In SOSP 2003 Overview Pig Latin A high level program that specifies a query execution plan Example For each sufficiently large category retrieve the average pagerank of high pagerank urls in that category SQL assuming a table urls url category pagerank SELECT FROM WHERE GROUP BY HAVING category AVG pagerank urls pagerank 0 2 category count 1 000 000 Overview Pig Latin A high level program that specifies a query execution plan Example For each sufficiently large category retrieve the average pagerank of high pagerank urls in that category Pig Latin 1 2 3 4 Good urls FILTER urls BY pagerank 0 2 Groups GROUP Good urls BY category Big groups FILTER Groups by COUNT Good urls 1 000 000 Output FOREACH Big groups GENERATE category AVG Good urls AVG Good urls pagerank Overview Map Reduce Hadoop A programming model to make parallelism transparent to a programmer Programmer


View Full Document

USC CSCI 585 - GammaPart3

Loading Unlocking...
Login

Join to view GammaPart3 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view GammaPart3 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?