Berkeley COMPSCI C267 - Parallel Database Primer

This preview shows page 1-2-3-20-21-40-41-42 out of 42 pages.



Slide titles:
  Parallel Database Primer
  Today
  A Little History
  Relational Data Model
  Two Levels of Indirection
  Example: University Database
  Data Independence
  Structure of a DBMS
  Relational Query Languages
  Slide 10
  Formal Relational Query Languages
  Preliminaries
  Relational Algebra
  Projection
  Selection
  Cross-Product
  Joins
  Slide 18
  Basic SQL
  Conceptual Evaluation Strategy
  Query Optimization & Processing
  Workloads
  Slide 23
  Parallelizing Sort
  Hash Join
  Parallelizing Hash Join
  Themes in Parallel QP
  Disk Layout
  Handling Skew
  Query Optimization
  Parallel Query Optimization
  Parallel Query Scheduling
  Slide 33
  Case Study: Teradata
  History and Status
  TeraData Data Layout
  TeraData Query Execution
  More on TeraData QP
  Lessons to Learn
  More Lessons
  Moving Onward
  History & Resources

Parallel Database Primer
Joe Hellerstein

Today
• Background:
  – The Relational Model and you
  – Meet a relational DBMS
• Parallel Query Processing: sort and hash-join
  – We will assume a “shared-nothing” architecture
  – Supposedly hardest to program, but actually quite clean
• Data Layout
• Parallel Query Optimization
• Case Study: Teradata

A Little History
• In the Dark Ages of databases, programmers reigned
  – data models had explicit pointers (C on disk)
  – brittle Cobol code to chase pointers
• Relational revolution: raising the abstraction
  – Christos: “as clear a paradigm shift as we can hope to find in computer science”
  – declarative languages and data independence
  – key to the most successful parallel systems
• Rough Timeline
  – Codd’s papers: early 70’s
  – System R & Ingres: mid-late 70’s
  – Oracle, IBM DB2, Ingres Corp: early 80’s
  – rise of parallel DBs: late 80’s to today

Relational Data Model
• A data model is a collection of concepts for describing data.
• A schema is a description of a particular collection of data, using a given data model.
• The relational model of data:
  – Main construct: relation, basically a table with rows and columns.
  – Every relation has a schema, which describes the columns, or fields.
  – Note: no pointers, no nested structures, no ordering, no irregular collections

Two Levels of Indirection
• Many views, single conceptual (logical) schema and physical schema.
  – Views describe how users see the data.
  – Conceptual schema defines logical structure.
  – Physical schema describes the files and indexes used.
[Figure: View 1, View 2, and View 3 sit on top of the Conceptual Schema, which sits on top of the Physical Schema.]

Example: University Database
• Conceptual schema:
  – Students(sid: string, name: string, login: string, age: integer, gpa: real)
  – Courses(cid: string, cname: string, credits: integer)
  – Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
  – Relations stored as unordered files.
  – Index on first column of Students.
• External Schema (View):
  – Course_info(cid: string, enrollment: integer)

Data Independence
• Applications insulated from how data is structured and stored.
• Logical data independence:
  – Protection from changes in logical structure of data.
  – Lets you slide parallel (||) systems under traditional apps
• Physical data independence:
  – Protection from changes in physical structure of data.
  – Minimizes constraints on processing, enabling clean parallelism

Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• This is one of several possible architectures; each system has its own variations.
[Figure: layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation on the figure: “Parallel considerations mostly here”.]

Relational Query Languages
“By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental power of the race.”
-- Alfred North Whitehead (1861 - 1947)
• Query languages: Allow manipulation and retrieval of data from a database.
• Relational model supports simple, powerful QLs:
  – Strong formal foundation based on logic.
  – Allows for much optimization/parallelization
• Query Languages != programming languages!
  – QLs not expected to be “Turing complete”.
  – QLs not intended to be used for complex calculations.
  – QLs support easy, efficient access to large data sets.

Formal Relational Query Languages
Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation:
• Relational Algebra: More operational, very useful for representing internal execution plans. “Database byte-code”. Parallelizing these is most of the game.
• Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-operational, declarative -- SQL comes from here.)

Preliminaries
• A query is applied to relation instances, and the result of a query is also a relation instance.
  – Schemas of input relations for a query are fixed (but query will run regardless of instance!)
  – The schema for the result of a given query is also fixed! Determined by definition of query language constructs.
  – Languages are closed (can compose queries)

Relational Algebra
• Basic operations:
  – Selection ($\sigma$): Selects a subset of rows from relation.
  – Projection ($\pi$): Hides columns from relation.
  – Cross-product ($\times$): Concatenates tuples from 2 relations.
  – Set-difference ($-$): Tuples in reln. 1, but not in reln. 2.
  – Union ($\cup$): Tuples in reln. 1 or in reln. 2.
• Additional operations:
  – Intersection, join, division, renaming: Not essential, but (very!) useful.

Projection
• Deletes attributes that are not in projection list.
• Schema of result:
  – exactly the fields in the projection list, with the same names that they had in the (only) input relation.
• Projection operator has to eliminate duplicates! (Why??)
  – Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it. (Why not?)

$\pi_{sname,rating}(S2)$:
  sname   rating
  yuppy   9
  lubber  8
  guppy   5
  rusty   10

$\pi_{age}(S2)$:
  age
  35.0
  55.5

Selection
• Selects rows that satisfy selection condition.
• No duplicates in result!
• Schema of result:
  – identical to schema of (only) input relation.
• Result relation can be the input for another relational algebra operation! (Operator composition.)

$\sigma_{rating>8}(S2)$:
  sid  sname  rating  age
  28   yuppy  9       35.0
  58   rusty  10      35.0

$\pi_{sname,rating}(\sigma_{rating>8}(S2))$:
  sname  rating
  yuppy  9
  rusty  10

Cross-Product
$\rho(C(1 \rightarrow sid1,\ 5 \rightarrow sid2),\ S1 \times R1)$

  (sid)  sname   rating  age   (sid)  bid  day
  22     dustin  7       45.0  22     101  10/10/96
  22     dustin  7       45.0  58     103  11/12/96
  31     lubber  8       55.5  22     101  10/10/96
  31     lubber  8       55.5  58     103
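
The selection, projection, and cross-product slides above can be tried out with a few lines of Python. This is only a minimal sketch, not how a real DBMS implements these operators. The helper names (`select`, `project`, `cross_product`) are hypothetical, relations are modeled as lists of dicts, and clashing field names are renamed with the suffixes `1` and `2` to mimic `sid1`/`sid2`. The guppy row's `sid` and `age` do not appear in the preview text and are assumed here; `S1` contains only the two rows visible in the cross-product table.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]
Relation = List[Row]

# S2 as implied by the projection/selection results on the slides.
# ASSUMPTION: guppy's sid (44) and age (35.0) are not shown in the preview.
S2: Relation = [
    {"sid": 28, "sname": "yuppy", "rating": 9, "age": 35.0},
    {"sid": 31, "sname": "lubber", "rating": 8, "age": 55.5},
    {"sid": 44, "sname": "guppy", "rating": 5, "age": 35.0},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]

# Only the rows of S1 and R1 that are visible in the cross-product table.
S1: Relation = [
    {"sid": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"sid": 31, "sname": "lubber", "rating": 8, "age": 55.5},
]
R1: Relation = [
    {"sid": 22, "bid": 101, "day": "10/10/96"},
    {"sid": 58, "bid": 103, "day": "11/12/96"},
]


def select(rel: Relation, pred: Callable[[Row], bool]) -> Relation:
    """Selection: keep the rows that satisfy the predicate; schema unchanged."""
    return [row for row in rel if pred(row)]


def project(rel: Relation, fields: Sequence[str]) -> Relation:
    """Projection with set semantics: keep only `fields`, drop duplicate rows."""
    seen = set()
    out: Relation = []
    for row in rel:
        key = tuple(row[f] for f in fields)
        if key not in seen:  # duplicate elimination, first occurrence wins
            seen.add(key)
            out.append({f: row[f] for f in fields})
    return out


def cross_product(left: Relation, right: Relation) -> Relation:
    """Cross-product: pair every left row with every right row.

    Field names present on both sides get the suffix 1 (left) or 2 (right).
    """
    out: Relation = []
    for l in left:
        for r in right:
            row: Row = {}
            for k, v in l.items():
                row[k + "1" if k in r else k] = v
            for k, v in r.items():
                row[k + "2" if k in l else k] = v
            out.append(row)
    return out


ages = project(S2, ["age"])
high_rated = project(select(S2, lambda t: t["rating"] > 8), ["sname", "rating"])
pairs = cross_product(S1, R1)

print(ages)
print(high_rated)
print(len(pairs), pairs[1])
```

Because every operator takes relations and returns a relation, the calls nest freely, which is the closure property the Preliminaries slide describes. Removing the `seen` check in `project` would give the bag semantics that the Projection slide says real systems default to.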

