1 Lecture 16: Relational Algebra Monday, May 10, 2010 Dan Suciu -- 444 Spring 20102 Outline Relational Algebra: • Chapters 5.1 and 5.2 Dan Suciu -- 444 Spring 2010The WHAT and the HOW • In SQL we write WHAT we want to get form the data • The database system needs to figure out HOW to get the data we want • The passage from WHAT to HOW goes through the Relational Algebra 3 Dan Suciu -- 444 Spring 2010 Data Independence4 SQL = WHAT SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = y.cid and x.price > 100 and z.city = ‘Seattle’ Product(pid, name, price) Purchase(pid, cid, store) Customer(cid, name, city) Dan Suciu -- 444 Spring 2010 It’s clear WHAT we want, unclear HOW to get it5 Relational Algebra = HOW Product Purchase pid=pid price>100 and city=‘Seattle’ x.name,z.name δ"Product(pid, name, price) Purchase(pid, cid, store) Customer(cid, name, city) cid=cid Customer Π"σ"T1(pid,name,price,pid,cid,store) T2( . . . .) T4(name,name) Final answer T3(. . . ) Temporary tables T1, T2, . . .Relational Algebra = HOW The order is now clearly specified: 6 Dan Suciu -- 444 Spring 2010 Iterate over PRODUCT…!…join with PURCHASE…!…join with CUSTOMER…!…select tuples with Price>100 and City=‘Seattle’…!…eliminate duplicates…!…and that’s the final answer !!Sets v.s. Bags • Sets: {a,b,c}, {a,d,e,f}, { }, . . . • Bags: {a, a, b, c}, {b, b, b, b, b}, . . . Relational Algebra has two semantics: • Set semantics • Bag semantics 7 Dan Suciu -- 444 Spring 2010Dan Suciu -- 444 Spring 2010 Extended Algebra Operators • Union ∪, intersection ∩, difference - • Selection σ"• Projection Π"• Join ⨝ • Rename ρ"• Duplicate elimination δ"• Grouping and aggregation γ"• Sorting τ"89 Relational Algebra (1/3) The Basic Five operators: • Union: ∪ • Difference: - • Selection: σ • Projection: Π • Join: ⨝ Dan Suciu -- 444 Spring 201010 Relational Algebra (2/3) Derived or auxiliary operators: • Renaming: ρ • Intersection, complement • Variations of joins – natural, equi-join, theta join, semi-join, cartesian product Dan Suciu -- 444 Spring 2010Relational Algebra (3/3) Extensions for bags: • Duplicate elimination: δ!• Group by: γ!• Sorting: τ!11 Dan Suciu -- 444 Spring 2010Union and Difference Dan Suciu -- 444 Spring 2010 12 What do they mean over bags ? R1 ∪ R2!R1 – R2!13 What about Intersection ? • Derived operator using minus • Derived using join (will explain later) Dan Suciu -- 444 Spring 2010 R1 ∩ R2 = R1 – (R1 – R2)!R1 ∩ R2 = R1 ⨝ R2!14 Selection • Returns all tuples which satisfy a condition • Examples – σSalary > 40000 (Employee) – σname = “Smith” (Employee) • The condition c can be =, <, ≤, >, ≥, <> Dan Suciu -- 444 Spring 2010 σc(R)!15 σSalary > 40000 (Employee) SSN Name Salary 1234545 John 200000 5423341 Smith 600000 4352342 Fred 500000 SSN Name Salary 5423341 Smith 600000 4352342 Fred 500000 Dan Suciu -- 444 Spring 2010 Employee!16 Projection • Eliminates columns • Example: project social-security number and names: – Π SSN, Name (Employee) – Answer(SSN, Name) Semantics differs over set or over bags Dan Suciu -- 444 Spring 2010 Π A1,…,An (R)!17 Π Name,Salary (Employee) SSN Name Salary 1234545 John 200000 5423341 John 600000 4352342 John 200000 Name Salary John 20000 John 60000 John 20000 Dan Suciu -- 444 Spring 2010 Employee!Name Salary John 20000 John 60000 Bag semantics!Set semantics!Which is more efficient to implement ?!18 Cartesian Product • Each tuple in R1 with each tuple in R2 • Very rare in practice; mainly used to express joins Dan Suciu -- 444 Spring 2010 R1 × R2!19 Dan Suciu -- 444 Spring 2010 Name! SSN!John! 999999999!Tony! 777777777!Employee!EmpSSN! DepName!999999999! Emily!777777777! Joe!Dependent!Employee ✕ Dependent!Name! SSN! EmpSSN! DepName!John! 999999999! 999999999! Emily!John! 999999999! 777777777! Joe!Tony! 777777777! 999999999! Emily!Tony! 777777777! 777777777! Joe!20 Renaming • Changes the schema, not the instance • Example: – ρN, S(Employee) Answer(N, S) Dan Suciu -- 444 Spring 2010 ρ B1,…,Bn (R)!21 Natural Join • Meaning: R1⨝ R2 = ΠA(σ(R1 × R2)) • Where: – The selection σ checks equality of all common attributes – The projection eliminates the duplicate common attributes Dan Suciu -- 444 Spring 2010 R1 ⨝ R2!Natural Join Dan Suciu -- 444 Spring 2010 22 A B X Y X Z Y Z Z V B C Z U V W Z V A B C X Z U X Z V Y Z U Y Z V Z V W R! S!R ⨝ S =!ΠABC(σR.B=S.B(R × S)) !23 Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⨝ S ? • Given R(A, B, C), S(D, E), what is R ⨝ S ? • Given R(A, B), S(A, B), what is R ⨝ S ? Dan Suciu -- 444 Spring 201024 Theta Join • A join that involves a predicate • Here θ can be any condition Dan Suciu -- 444 Spring 2010 R1 ⨝θ R2 = σ θ (R1 × R2)!25 Eq-join • A theta join where θ is an equality • This is by far the most used variant of join in practice Dan Suciu -- 444 Spring 2010 R1 ⨝A=B R2 = σA=B (R1 × R2)!So Which Join Is It ? • When we write R ⨝ S we usually mean an eq-join, but we often omit the equality predicate when it is clear from the context 26 Dan Suciu -- 444 Spring 201027 Semijoin • Where A1, …, An are the attributes in R Dan Suciu -- 444 Spring 2010 R ⋉C S = Π A1,…,An (R ⨝C S)!Semijoins in Distributed Databases 28 SSN Name Stuff . . . . . . . . . . EmpSSN DepName Age Stuff . . . . . . . . . . . . . Employee Dependent network Employee ⨝SSN=EmpSSN (σ age>71 (Dependent)) Task: compute the query with minimum amount of data transfer !Semijoins in Distributed Databases 29 SSN Name Stuff . . . . . . . . . . EmpSSN DepName Age Stuff . . . . . . . . . . . . . Employee Dependent network Employee ⨝SSN=EmpSSN (σ age>71 (Dependent)) T(SSN) = Π SSN σ age>71 (Dependents)Semijoins in Distributed Databases 30 SSN Name Stuff . . . . . . . . . . EmpSSN DepName Age Stuff . . . . . . . . . . . . . Employee Dependent network Employee
View Full Document