3/9/20091Thap PanitanarakCS6260 Spring 2009Solving Linear System using Parallel Gaussion Eliminationa0,0x0+ a0,1x1+ a0,2x2... + a0,n-1xn-1= b0a1,0x0+ a1,1x1+ a1,2x2... + a1,n-1xn-1= b1... ... ... ... ... ...an-1,0x0+ an-1,1x1+ an-1,2x2... + an-1,n-1xn-1= bn-1Linear System Linear System or System of Linear Equations n equations of n variables ai,jis a coefficient of xjin equation i3/9/20092Linear System Matrix representationa0,0a0,1a0,2... a0,n-1x0b0a1,0a1,1a1,2... a1,n-1x1= b1... ... ... ... ... ...an-1,0an-1,1an-1,2... an-1,n-1xn-1bn-1AxbLinear System Matrix representation – Augmented matrixa0,0a0,1a0,2... a0,n-1b0a1,0a1,1a1,2... a1,n-1b1... ... ... ... ... ...an-1,0an-1,1an-1,2... an-1,n-1bn-1[A:b]3/9/20093Linear System Matrix representation – Triangular matrixa0,0a0,1a0,2... a0,n-10 a'1,1a'1,2... a'1,n-1... ... ... ... ...0 0 0 ... 0 a'n-1,n-1TSolving Linear SystemAx = bOriginal Linear SystemTx = cBy Gaussian Eliminationx = dBy Back Substitution3/9/200941x0+ 1x1- 1x2= 0- 2x1- 3x2= 32x2= 62x3= 41x0+ 1x1- 1x2+ 4x3= 8- 2x1- 3x2+ 1x3= 52x2- 3x3= 02x3= 4Back Substitution Solve a linear system Tx = c, where T is upper triangular Substitute x3= 2 Next x2and x11x0= 3- 2x1= 122x2= 62x3= 41x0+ 1x1= 3- 2x1= 122x2= 62x3= 4Back Substitution Substitute x2= 3 Substitute X1= -6 We get x0= 3, x1= -6, x2= 3 and x3= 23/9/20095Gaussian Elimination General Concept “Well-Known” algorithm for solving the linear system Ax = b Reduce Ax = b to Tx = c where T is an upper triangular matrix Back substitution Tx = c to solve for x Row Operations used Multiply any row with a nonzero constant Row swap Add/Subtract one row with another4x0+ 6x1+ 2x2- 2x3= 8- 3x1+ 4x2- 1x3= 0+ 3x1- 3x2+ 2x3= 9+ 6x1- 6x2+ 7x3= 244x0+ 6x1+ 2x2- 2x3= 82x0+ 5x2- 2x3= 4- 4x0- 3x1- 5x2+ 4x3= 18x0+ 18x1- 2x2+ 3x3= 40Gaussian Elimination Consider the linear system m1,0= a1,0/a0,0= 0.5, m2,0= a2,0/a0,0= -1 and m3,0= a3,0/a0,0= 2 Lower elements of column 0 are eliminated !!!3/9/200964x0+ 6x1+ 2x2- 2x3= 8- 3x1+ 4x2- 1x3= 0+ 1x2+ 1x3= 9+ 2x2+ 5x3= 244x0+ 6x1+ 2x2- 2x3= 8- 3x1+ 4x2- 1x3= 0+ 1x2+ 1x3= 9+ 3x3= 6Gaussian Elimination m2,1= a2,1/a1,1= -1 and m3,1= a3,1/a1,1= -2 m3,2= a3,2/a2,2= 2 We get Tx = c+ 6x1+ 2x2- 2x3= 82x0+ 5x2- 2x3= 4- 4x0- 3x1- 5x2+ 4x3= 18x0+ 18x1- 2x2+ 3x3= 40Gaussian Elimination There is a problem !!! Consider the system We cannot compute mi,0= ai,0/a0,0, i = 1..3 Divided by Zero Solution: Partial Pivoting3/9/20097+ 6x1+ 2x2- 2x3= 82x0+ 5x2- 2x3= 4- 4x0- 3x1- 5x2+ 4x3= 18x0+ 18x1- 2x2+ 3x3= 40Gaussian Elimination Partial Pivoting Use the row that has the biggest absolute value of a pivot column as a pivot row (row swap needed) Also make the computation more accuracy8x0+ 18x1- 2x2+ 3x3= 402x0+ 5x2- 2x3= 4- 4x0- 3x1- 5x2+ 4x3= 1+ 6x1+ 2x2- 2x3= 8Gaussian Elimination Sequential Algorithmfor i from 0 to n-1 do//Find Pivot Rowpmax = 0for j from i to n-1 doif pmax < |a[j,i]| thenpmax = |a[j,i]|prow = jend ifend dorswap(i,prow)//Gaussian Eliminationfor j from i+1 to n-1 dom = a[j,i]/a[i,i]for k from i to n doa[j,k] = a[j,k]-a[i.k]*mend doend doend do//Back Substitutionfor i from n-1 to 0x[i] = a[i,n]/a[i,i]for j from 0 to i-1 doa[i,n] = a[j,n]-x[i]*a[j,i]end doend doNote: To make row swap more efficiently, we can also use indirect index loc[i] = j where “physical row” j is indexed by “virtual row” i3/9/20098+ 6x1+ 2x2- 2x3= 82x0+ 5x2- 2x3= 4- 4x0- 3x1- 5x2+ 4x3= 18x0+ 18x1- 2x2+ 3x3= 40Gaussian Elimination Indirect index for row swap+ 6x1+ 2x2- 2x3= 82x0+ 5x2- 2x3= 4- 4x0- 3x1- 5x2+ 4x3= 18x0+ 18x1- 2x2+ 3x3= 4001233210Gaussian EliminationSequential Algorithm - Analysis Gaussian elimination with partial pivoting In iteration k (column k), Finding pivot row step uses (n-k) Elimination step uses (n-k-1)(n-k) n iterations (0 to n-1) n(n+1)/2 + n(n-1)(n+1)/3 O(n3) Back substitution In iteration k, it takes (n-k-1) n iterations n(n-1)/2 O(n2) All in O(n3)3/9/200991x0= 8 - 4x3+ 1x2- 1x1- 2x1= 5 - 1x3+ 3x22x2= 0 + 3x32x3= 41x0+ 1x1- 1x2+ 4x3= 8- 2x1- 3x2+ 1x3= 52x2- 3x3= 02x3= 4Gaussian EliminationParallel Design Back Substitution Consider We can change to1x0= 8 - 4x3+ 1x2- 1x1- 2x1= 5 - 1x3+ 3x22x2= 0 + 3x32x3= 4Gaussian EliminationParallel Design Back Substitution Processors’ asignment (row-wise) P3computes x3and broadcast x3to others P0, P1and P2update their values (b’s) P2computes x2and broadcast x2to others P0and P1update their values (b’s) P1computes x1and broadcast x1to others P0updates its value and computes its x0P1P2P3P03/9/200910Gaussian EliminationParallel Design Back Substitution – Algorithm & Analysisfor i from n-1 to 0 doP(i) computes x[i]P(i) boardcasts x[i] to others P’sfor j from 0 to i-1 do in parallelP(j) updates b[j]end doend doP(0) computes x[0] With n processors, we can achieve O(n*log2n) Speedup S = n2/nlog2n = n/log2n Example: n = 16 S = 16/4 = 44x0+ 6x1+ 2x2- 2x3= 82x0+ 5x2- 2x3= 4- 4x0- 3x1- 5x2+ 4x3= 18x0+ 18x1- 2x2+ 3x3= 40Gaussian EliminationParallel Design Gaussian elimination with partial pivoting Processors’ asignment (row-wise) Communication needed to find a pivot row in each iteration All-reduce (pmax, prow) Communication needed to zero off (elimination) – all processors need to know all elements in pivot row Parallel elimination can be done after all processors have pivot rowP1P2P3P03/9/200911All-Reduce Communication Hypercube (Find MAX)436912754669947779999777466994777997779 999999999123Gaussian EliminationParallel Design Gaussian elimination with partial pivoting – Algorithm for i from 0 to n-1 do//find pivot rowfind max(pmax,prow) using all-reduceP(i) swaps its row (or index) with P(prow)//eliminationP(i) broadcasts its row to othersfor j from i+1 to n-1 do in parallelP(j) computes m[j]for k from i to n doP(j) computes a[j,k]end doend doend do3/9/200912Gaussian EliminationParallel Design Gaussian elimination with partial pivoting – Analysis With n processors At iteration k, Finding pivot row step takes log2n Elimination step takes (n-k)log2n + (n-k) Total time:Tp = nlog2n + n(n+1)log2n/2 + n(n+1)/2 Speedup:S = (n(n+1)/2 + n(n-1)(n+1)/3)/(nlog2n + n(n+1)log2n/2 + n(n+1)/2)= (n+1)(2n+1)/(3(3log2n+nlog2n+n+1)) Example: n = 16 S ≈ 2Gaussian EliminationParallel Design Other issues ? Another Data
View Full Document