1 Introduction to Database Systems CSE 444 Lecture 18: Relational Algebra Magda Balazinska - CSE 444, Fall 2010Outline • Motivation and sets v.s. bags • Relational Algebra • Translation from SQL to the Relational Algebra • Read Sections 2.4, 5.1, and 5.2 – [Old edition: 5.1 through 5.4] – These book sections go over relational operators 2 Magda Balazinska - CSE 444, Fall 2010The WHAT and the HOW • In SQL, we write WHAT we want to get from the data • The database system needs to figure out HOW to get the data we want • The passage from WHAT to HOW goes through the Relational Algebra 3 Magda Balazinska - CSE 444, Fall 20104 SQL = WHAT SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’ Product(pid, name, price) Purchase(pid, cid, store) Customer(cid, name, city) It’s clear WHAT we want, unclear HOW to get it Magda Balazinska - CSE 444, Fall 20105 Relational Algebra = HOW Product Purchase pid=pid price>100 and city=‘Seattle’ x.name,z.name δ"Product(pid, name, price) Purchase(pid, cid, store) Customer(cid, name, city) cid=cid Customer Π"σ"T1(pid,name,price,pid,cid,store) T2( . . . .) T4(name,name) Final answer T3(. . . ) Temporary tables T1, T2, . . .Relational Algebra = HOW The order is now clearly specified: • Iterate over PRODUCT… • …join with PURCHASE… • …join with CUSTOMER… • …select tuples with Price>100 and City=‘Seattle’… • …eliminate duplicates… • …and that’s the final answer ! 6 Magda Balazinska - CSE 444, Fall 2010Relations • A relation is a set of tuples – Sets: {a,b,c}, {a,d,e,f}, { }, . . . • But, commercial DBMS’s implement relations that are bags rather than sets – Bags: {a, a, b, c}, {b, b, b, b, b}, . . . Magda Balazinska - CSE 444, Fall 2010 7Sets v.s. Bags Relational Algebra has two flavors: • Over sets: theoretically elegant but limited • Over bags: needed for SQL queries + more efficient – Example: Compute average price of all products We discuss set semantics • We mention bag semantics only where needed 8 Magda Balazinska - CSE 444, Fall 2010Outline • Motivation and sets v.s. bags • Relational Algebra • Translation from SQL to the Relational Algebra • Read Sections 2.4, 5.1, and 5.2 – [Old edition: 5.1 through 5.4] – These book sections go over relational operators 9 Magda Balazinska - CSE 444, Fall 2010Magda Balazinska - CSE 444, Fall 2010 Relational Algebra • Query language associated with relational model • Queries specified in an operational manner – A query gives a step-by-step procedure • Relational operators – Take one or two relation instances as argument – Return one relation instance as result – Easy to compose into relational algebra expressions 10Relational Algebra (1/3) Five basic operators: • Union (∪) and Set difference (–) • Selection: : σcondition(S) – Condition is Boolean combination (∧,∨) of terms – Term is: attribute op constant, attr. op attr. – Op is: <, <=, =, ≠, >=, or > • Projection: πlist-of-attributes(S) • Cross-product or cartesian product (×) Magda Balazinska - CSE 444, Fall 2010 11Relational Algebra (2/3) Derived or auxiliary operators: • Intersection (∩), Division (R/S) • Join: R θ S = σθ(R × S) • Variations of joins – Natural, equijoin, theta-join – Outer join and semi-join • Rename ρ B1,…,Bn (S) Magda Balazinska - CSE 444, Fall 2010 12Relational Algebra (3/3) Extensions for bags • Duplicate elimination: δ • Group by: γ [Same symbol as aggregation] – Partitions tuples of a relation into “groups” • Sorting: τ Other extensions • Aggregation: γ (min, max, sum, average, count) Magda Balazinska - CSE 444, Fall 2010 13Union and Difference • R1 ∪ R2 • Example: – ActiveEmployees ∪ RetiredEmployees • R1 – R2 • Example: – AllEmployees – RetiredEmployees Magda Balazinska - CSE 444, Fall 2010 14 Be careful when applying to bags!What about Intersection ? • It is a derived operator • R1 ∩ R2 = R1 – (R1 – R2) • Also expressed as a join (will see later) • Example – UnionizedEmployees ∩ RetiredEmployees Magda Balazinska - CSE 444, Fall 2010 15Relational Algebra (1/3) Five basic operators: • Union (∪) and Set difference (–) • Selection: : σcondition(S) – Condition is Boolean combination (∧,∨) of terms – Term is: attribute op constant, attr. op attr. – Op is: <, <=, =, ≠, >=, or > • Projection: πlist-of-attributes(S) • Cross-product or cartesian product (×) Magda Balazinska - CSE 444, Fall 2010 1617 Selection • Returns all tuples that satisfy a condition • Notation: σc(R) • Examples – σSalary > 40000 (Employee) – σname = “Smith” (Employee) • The condition c can be – Boolean combination (∧,∨) of terms – Term is: attribute op constant, attr. op attr. – Op is: <, <=, =, ≠, >=, or > Magda Balazinska - CSE 444, Fall 201018 σSalary > 40000 (Employee) SSN Name Salary 1234545 John 200000 5423341 Smith 600000 4352342 Fred 500000 SSN Name Salary 5423341 Smith 600000 4352342 Fred 500000 Magda Balazinska - CSE 444, Fall 2010Projection • Eliminates columns • Notation: Π A1,…,An (R) • Example: project social-security number and names: ‒ Π SSN, Name (Employee) – Output schema: Answer(SSN, Name) Magda Balazinska - CSE 444, Fall 2010 19 Semantics differs over set or over bags20 Π Name,Salary (Employee) SSN Name Salary 1234545 John 200000 5423341 John 600000 4352342 John 200000 Name Salary John 20000 John 60000 Set semantics: duplicate elimination automatic21 Π Name,Salary (Employee) SSN Name Salary 1234545 John 200000 5423341 John 600000 4352342 John 200000 Name Salary John 20000 John 60000 John 20000 Bag semantics: no duplicate elimination; need explicit δ"Magda Balazinska - CSE 444, Fall 2010 Selection & Projection Examples no name zip disease 1 p1 98125 flu 2 p2 98125 heart 3 p3 98120 lung 4 p4 98120 heart Patient σdisease=‘heart’(Patient) no name zip disease 2 p2 98125 heart 4 p4 98120 heart zip disease 98125 flu
View Full Document