Carnegie Mellon

Introduction to Computer Systems
15-213/18-243, Spring 2009
9th Lecture, Feb. 10th

Instructors: Gregory Kesden and Markus Püschel


Last Time

    %rax  Return value        %r8   Argument #5
    %rbx  Callee saved        %r9   Argument #6
    %rcx  Argument #4         %r10  Caller saved
    %rdx  Argument #3         %r11  Used for linking
    %rsi  Argument #2         %r12  Callee saved
    %rdi  Argument #1         %r13  Callee saved
    %rsp  Stack pointer       %r14  Callee saved
    %rbp  Callee saved        %r15  Callee saved


Last Time: Procedures (x86-64): Optimizations

- No base/frame pointer
- Passing arguments to functions through registers (if possible)
- Sometimes: writing into the "red zone" (below the stack pointer)

    Offset from %rsp    Contents
       0                rtn Ptr    <- %rsp
      -8                unused
     -16                loc[1]
     -24                loc[0]

- Sometimes: function call using jmp (instead of call)
- Reason: performance
  - Use the stack as little as possible
  - While obeying the rules (e.g., caller/callee save registers)


Last Time: Arrays

    int val[5];          values 1, 5, 2, 1, 3 at x, x+4, x+8, x+12, x+16 (array ends at x+20)

- Nested:       int pgh[4][5];
- Multi-level:  int *univ[3];


Dynamic Nested Arrays

- Strength: can create a matrix of any size
- Programming: must do index computation explicitly
- Performance: accessing a single element is costly; must do a multiplication

    int *new_var_matrix(int n)
    {
        return (int *) calloc(sizeof(int), n*n);
    }

    int var_ele(int *a, int i, int j, int n)
    {
        return a[i*n+j];
    }

    movl 12(%ebp),%eax        # i
    movl 8(%ebp),%edx         # a
    imull 20(%ebp),%eax       # n*i
    addl 16(%ebp),%eax        # n*i+j
    movl (%edx,%eax,4),%eax   # Mem[a+4*(i*n+j)]


Dynamic Array Multiplication

Per iteration:
- Multiplies: 3 (2 for subscripts, 1 for data)
- Adds: 4 (2 for array indexing, 1 for loop index, 1 for data)

    /* Compute element i,k of variable matrix product */
    int var_prod_ele(int *a, int *b, int i, int k, int n)
    {
        int j;
        int result = 0;
        for (j = 0; j < n; j++)
            result += a[i*n+j] * b[j*n+k];
        return result;
    }

Element (i,k) of the product is the i-th row of a times the k-th column of b.


Optimizing Dynamic Array Multiplication

Original loop (4 adds, 3 mults per iteration):

    {
        int j;
        int result = 0;
        for (j = 0; j < n; j++)
            result += a[i*n+j] * b[j*n+k];
        return result;
    }

Optimizations (performed when the optimization level is set to -O2):
- Code motion: the expression i*n can be computed outside the loop
- Strength reduction: incrementing j has the effect of incrementing j*n+k by n

Optimized loop (4 adds, 1 mult per iteration):

    {
        int j;
        int result = 0;
        int iTn = i*n;
        int jTnPk = k;
        for (j = 0; j < n; j++) {
            result += a[iTn+j] * b[jTnPk];
            jTnPk += n;
        }
        return result;
    }


Today

- Structures
- Alignment
- Unions
- Floating point


Structures

    struct rec {
        int i;
        int a[3];
        int *p;
    };

Memory layout:

    Offset:  0    4            16   20
    Member:  i    a            p

Concept:
- Contiguously allocated region of memory
- Refer to members within the structure by name
- Members may be of different types

Accessing a structure member:

    void set_i(struct rec *r, int val)
    {
        r->i = val;
    }

IA32 assembly:

    # %eax = val
    # %edx = r
    movl %eax,(%edx)          # Mem[r] = val


Generating Pointer to Structure Member

    struct rec {
        int i;
        int a[3];
        int *p;
    };

Generating a pointer to an array element:
- The offset of each structure member is determined at compile time
- With r pointing at offset 0, element a[idx] is at address r + 4 + 4*idx

    int *find_a(struct rec *r, int idx)
    {
        return &r->a[idx];
    }

    # %ecx = idx
    # %edx = r
    leal 0(,%ecx,4),%eax      # 4*idx
    leal 4(%eax,%edx),%eax    # r+4*idx+4


Structure Referencing (Cont.)

C code:

    struct rec {
        int i;
        int a[3];
        int *p;
    };

    void set_p(struct rec *r)
    {
        r->p = &r->a[r->i];
    }

After the call, p (at offset 16) points to element a[r->i] of the same structure.

    # %edx = r
    movl (%edx),%ecx          # r->i
    leal 0(,%ecx,4),%eax      # 4*(r->i)
    leal 4(%edx,%eax),%eax    # r+4+4*(r->i)
    movl %eax,16(%edx)        # Update r->p


Alignment

Aligned data:
- A primitive data type requires K bytes
- Its address must be a multiple of K
- Required on some machines; advised on IA32
  - Treated differently by IA32 Linux, x86-64 Linux, and Windows

Motivation for aligning data:
- Memory is accessed in (aligned) chunks of 4 or 8 bytes (system dependent)
  - Inefficient to load or store a datum that spans quad word boundaries
  - Virtual memory is very tricky when a datum spans 2 pages

Compiler:
- Inserts gaps in a structure to ensure correct alignment of fields
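
The alignment rule above (an object that requires K bytes of alignment sits at an address that is a multiple of K, and the compiler inserts gaps to make that true inside structures) can be checked directly in C. The following is a minimal sketch, assuming a C11 compiler for `alignof` and `static_assert`; the struct `gap_demo` and the helpers `is_aligned` and `gap_before_x` are hypothetical names introduced only for illustration and do not appear in the lecture.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignof (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uintptr_t */

/* Hypothetical struct: a 1-byte member followed by an int. */
struct gap_demo {
    char c;
    int  x;
};

/* The compiler must place x at a multiple of int's alignment. */
static_assert(offsetof(struct gap_demo, x) % alignof(int) == 0,
              "x must be aligned for int");

/* Returns 1 if address p is a multiple of k (k > 0), else 0. */
int is_aligned(const void *p, size_t k)
{
    return ((uintptr_t)p % k) == 0;
}

/* Number of padding bytes the compiler inserted between c and x. */
size_t gap_before_x(void)
{
    return offsetof(struct gap_demo, x) - sizeof(char);
}
```

Calling `is_aligned(&v, alignof(T))` on any ordinary variable `v` of type `T` should return 1, and `gap_before_x()` reports how many bytes were skipped so that `x` lands on an aligned address. The exact gap depends on the platform's alignment for `int`.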
Specific Cases of Alignment (IA32)

- 1 byte: char, ...
  - No restrictions on address
- 2 bytes: short, ...
  - Lowest 1 bit of address must be 0 (binary)
- 4 bytes: int, float, char *, ...
  - Lowest 2 bits of address must be 00 (binary)
- 8 bytes: double, ...
  - Windows (and most other OSs and instruction sets): lowest 3 bits of address must be 000 (binary)
  - Linux: lowest 2 bits of address must be 00 (binary), i.e., treated the same as a 4-byte primitive data type
- 12 bytes: long double
  - Windows, Linux: lowest 2 bits of address must be 00 (binary), i.e., treated the same as a 4-byte primitive data type


Specific Cases of Alignment (x86-64)

- 1 byte: char, ...
  - No restrictions on address
- 2 bytes: short, ...
  - Lowest 1 bit of address must be 0 (binary)
- 4 bytes: int, float, ...
  - Lowest 2 bits of address must be 00 (binary)
- 8 bytes: double, char *, ...
  - Windows and Linux: lowest 3 bits of address must be 000 (binary)
- 16 bytes: long double
  - Linux: lowest 4 bits of address must be 0000 (binary), i.e., 16-byte aligned


Satisfying Alignment with Structures

    struct S1 {
        char c;
        int i[2];
        double v;
    } *p;

Within the structure:
- Must satisfy each element's alignment requirement

Overall structure placement:
- Each structure has an alignment requirement K
  - K = largest alignment of any element
- Initial address and structure length must be multiples of K

Example (under Windows or x86-64):
- K = 8, due to the double element

    Address    Contents
    p+0        c
    p+1        3 bytes (padding)
    p+4        i[0]                 multiple of 4
    p+8        i[1]                 multiple of 8
    p+12       4 bytes (padding)
    p+16       v                    multiple of 8
    p+24       end of structure     multiple of 8


Different Alignment Conventions

    struct S1 {
        char c;
        int i[2];
        double v;
    } *p;

x86-64 or IA32 Windows:
- K = 8, due to the double element

    Address    Contents
    p+0        c
    p+1        3 bytes (padding)
    p+4        i[0]
    p+8        i[1]
    p+12       4 bytes (padding)
    p+16       v
    p+24       end of structure

IA32 Linux:
- K = 4; double is treated like a 4-byte data type

    Address    Contents
    p+0        c
    p+1        3 bytes (padding)
    p+4        i[0]
    p+8        i[1]
    p+12       v
    p+20       end of structure


Saving Space

- Put large data types first

    struct S1 {
        char c;
        int i[2];
        double v;
    } *p;

    struct S2 …
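
The layout rules for `struct S1` and the advice to put large data types first can be explored with `offsetof` and `sizeof`. The sketch below assumes a C11 compiler. `struct S1` is declared as on the slides, while `small_first`, `large_first`, `s1_padding`, and `bytes_saved` are hypothetical names introduced here for illustration; the reordered pair is not the elided `S2` from the slides.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignof (C11) */
#include <stddef.h>   /* offsetof, size_t */

/* S1 as declared on the slides. */
struct S1 {
    char   c;
    int    i[2];
    double v;
};

/* Hypothetical pair (not from the slides) with the same members in a
 * different order, to show the effect of ordering on padding. */
struct small_first { char c; int i; char d; };
struct large_first { int i; char c; char d; };

/* Every member offset must be a multiple of that member's alignment. */
static_assert(offsetof(struct S1, i) % alignof(int) == 0, "i misaligned");
static_assert(offsetof(struct S1, v) % alignof(double) == 0, "v misaligned");

/* The total size must be a multiple of the structure's own alignment K. */
static_assert(sizeof(struct S1) % alignof(struct S1) == 0,
              "size is not a multiple of K");

/* Padding bytes in S1: total size minus the sum of the member sizes. */
size_t s1_padding(void)
{
    return sizeof(struct S1)
         - (sizeof(char) + 2 * sizeof(int) + sizeof(double));
}

/* Bytes saved by placing the int before the two chars. */
size_t bytes_saved(void)
{
    return sizeof(struct small_first) - sizeof(struct large_first);
}
```

Under the x86-64 convention shown above, `s1_padding()` should come out as 7 (3 bytes after `c` plus 4 bytes before `v`), and under the IA32 Linux convention as 3. For the hypothetical pair, putting the `int` first lets the two `char` members share one padded slot instead of each getting its own.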