Carnegie MellonIntroduction to Computer Systems15-213/18-243, spring 20099thLecture, Feb. 10thInstructors:Gregory Kesden and Markus PüschelCarnegie Mellon%rax%rbx%rcx%rdx%rsi%rdi%rsp%rbpLast Time%r8%r9%r10%r11%r12%r13%r14%r15Callee savedCallee savedCallee savedCallee savedC: Callee savedCallee savedCallee savedStack pointerUsed for linkingReturn valueArgument #4Argument #1Argument #3Argument #2Argument #6Argument #5Carnegie MellonLast Time Procedures (x86-64): Optimizations No base/frame pointer Passing arguments to functions through registers (if possible) Sometimes: Writing into the “red zone” (below stack pointer) Sometimes: Function call using jmp (instead of call) Reason: Performance use stack as little as possible while obeying rules (e.g., caller/callee save registers)rtn Ptrunused%rsp−8loc[1]loc[0]−16−24Carnegie MellonLast Time Arrays Nested Multi-levelint val[5];1 5 2 1 3xx + 4 x + 8 x + 12 x + 16 x + 20int pgh[4][5];int *univ[3]Carnegie MellonDynamic Nested Arrays Strength Can create matrix of any size Programming Must do index computation explicitly Performance Accessing single element costly Must do multiplicationint * new_var_matrix(int n){return (int *) calloc(sizeof(int), n*n);}int var_ele(int *a, int i, int j, int n){return a[i*n+j];}movl 12(%ebp),%eax # imovl 8(%ebp),%edx # aimull 20(%ebp),%eax # n*iaddl 16(%ebp),%eax # n*i+jmovl (%edx,%eax,4),%eax # Mem[a+4*(i*n+j)]Carnegie MellonDynamic Array Multiplication Per iteration: Multiplies: 3 2 for subscripts 1 for data Adds: 4 2 for array indexing 1 for loop index 1 for data/* Compute element i,k ofvariable matrix product */int var_prod_ele(int *a, int *b,int i, int k, int n){int j;int result = 0;for (j = 0; j < n; j++)result +=a[i*n+j] * b[j*n+k];return result;}a bi-th rowj-th columnxCarnegie MellonOptimizing Dynamic Array Multiplication Optimizations Performed when set optimization level to -O2 Code Motion Expression i*n can be computed outside loop Strength Reduction Incrementing j has effect of incrementing j*n+k by n Operations count 4 adds, 1 mult{int j;int result = 0;for (j = 0; j < n; j++)result +=a[i*n+j] * b[j*n+k];return result;}{int j;int result = 0;int iTn = i*n;int jTnPk = k;for (j = 0; j < n; j++) {result +=a[iTn+j] * b[jTnPk];jTnPk += n;}return result;}4 adds, 3 mults4 adds, 1 multCarnegie MellonToday Structures Alignment Unions Floating pointCarnegie Mellonstruct rec {int i;int a[3];int *p;};IA32 Assembly# %eax = val# %edx = rmovl %eax,(%edx) # Mem[r] = valvoid set_i(struct rec *r,int val){r->i = val;}Structures Concept Contiguously-allocated region of memory Refer to members within structure by names Members may be of different types Accessing Structure MemberMemory Layouti a p0 4 1620Carnegie Mellon# %ecx = idx# %edx = rleal 0(,%ecx,4),%eax # 4*idxleal 4(%eax,%edx),%eax # r+4*idx+4int *find_a(struct rec *r, int idx){return &r->a[idx];}Generating Pointer to Structure Memberstruct rec {int i;int a[3];int *p;};i a p0 4 1620r+4+4*idxrWill disappearblackboard?What does it do?Carnegie Mellon# %ecx = idx# %edx = rleal 0(,%ecx,4),%eax # 4*idxleal 4(%eax,%edx),%eax # r+4*idx+4int *find_a(struct rec *r, int idx){return &r->a[idx];}Generating Pointer to Structure Member Generating Pointer to Array Element Offset of each structure member determined at compile timestruct rec {int i;int a[3];int *p;};i a p0 4 1620r+4+4*idxrCarnegie Mellonstruct rec {int i;int a[3];int *p;};# %edx = rmovl (%edx),%ecx # r->ileal 0(,%ecx,4),%eax # 4*(r->i)leal 4(%edx,%eax),%eax # r+4+4*(r->i)movl %eax,16(%edx) # Update r->pvoid set_p(struct rec *r){r->p =&r->a[r->i];}Structure Referencing (Cont.) C Codei a p0 4 1620i a0 4 1620Element iWhat does it do?Carnegie MellonToday Structures Alignment Unions Floating pointCarnegie MellonAlignment Aligned Data Primitive data type requires K bytes Address must be multiple of K Required on some machines; advised on IA32 treated differently by IA32 Linux, x86-64 Linux, and Windows! Motivation for Aligning Data Memory accessed by (aligned) chunks of 4 or 8 bytes (system dependent) Inefficient to load or store datum that spans quad word boundaries Virtual memory very tricky when datum spans 2 pages Compiler Inserts gaps in structure to ensure correct alignment of fieldsCarnegie MellonSpecific Cases of Alignment (IA32) 1 byte: char, … no restrictions on address 2 bytes: short, … lowest 1 bit of address must be 02 4 bytes: int, float, char *, … lowest 2 bits of address must be 002 8 bytes: double, … Windows (and most other OS’s & instruction sets): lowest 3 bits of address must be 0002 Linux: lowest 2 bits of address must be 002 i.e., treated the same as a 4-byte primitive data type 12 bytes: long double Windows, Linux: lowest 2 bits of address must be 002 i.e., treated the same as a 4-byte primitive data typeCarnegie MellonSpecific Cases of Alignment (x86-64) 1 byte: char, … no restrictions on address 2 bytes: short, … lowest 1 bit of address must be 02 4 bytes: int, float, … lowest 2 bits of address must be 002 8 bytes: double, char *, … Windows & Linux: lowest 3 bits of address must be 0002 16 bytes: long double Linux: lowest 3 bits of address must be 0002 i.e., treated the same as a 8-byte primitive data typeCarnegie Mellonstruct S1 {char c;int i[2];double v;} *p;Satisfying Alignment with Structures Within structure: Must satisfy element’s alignment requirement Overall structure placement Each structure has alignment requirement K K = Largest alignment of any element Initial address & structure length must be multiples of K Example (under Windows or x86-64): K = 8, due to double elementc i[0] i[1] v3 bytes 4 bytesp+0 p+4 p+8 p+16 p+24Multiple of 4 Multiple of 8Multiple of 8 Multiple of 8Carnegie MellonDifferent Alignment Conventions x86-64 or IA32 Windows: K = 8, due to double element IA32 Linux K = 4; double treated like a 4-byte data typec i[0] i[1] v3 bytes 4 bytesp+0 p+4 p+8 p+16 p+24c i[0] i[1] v3 bytesp+0 p+4 p+8 p+12 p+20struct S1 {char c;int i[2];double v;} *p;Carnegie MellonSaving Space Put large data types first Effect (example x86-64, both have K=8)c i[0] i[1] v3 bytes 4 bytesp+0 p+4 p+8 p+16 p+24struct S1 {char c;int i[2];double v;} *p;struct S2
View Full Document