Unformatted text preview:

Carnegie Mellon Program Op miza on 15 213 Introduc0on to Computer Systems 25th Lecture Nov 23 2010 Instructors Randy Bryant and Dave O Hallaron 1 Carnegie Mellon Today Overview Generally Useful Op miza ons Code mo0on precomputa0on Strength reduc0on Sharing of common subexpressions Removing unnecessary procedure calls Op miza on Blockers Procedure calls Memory aliasing Exploi ng Instruc on Level Parallelism Dealing with Condi onals 2 Carnegie Mellon Performance Reali es There s more to performance than asympto1c complexity Constant factors maHer too Easily see 10 1 performance range depending on how code is wriQen Must op0mize at mul0ple levels algorithm data representa0ons procedures and loops Must understand system to op mize performance How programs are compiled and executed How to measure program performance and iden0fy boQlenecks How to improve performance without destroying code modularity and generality 3 Carnegie Mellon Op mizing Compilers Provide e cient mapping of program to machine register alloca0on code selec0on and ordering scheduling dead code elimina0on elimina0ng minor ine ciencies Don t usually improve asympto c e ciency up to programmer to select best overall algorithm big O savings are oWen more important than constant factors but constant factors also maQer Have di culty overcoming op miza on blockers poten0al memory aliasing poten0al procedure side e ects 4 Carnegie Mellon Limita ons of Op mizing Compilers Operate under fundamental constraint Must not cause any change in program behavior OWen prevents it from making op0miza0ons when would only a ect behavior under pathological condi0ons Behavior that may be obvious to the programmer can be obfuscated by languages and coding styles e g Data ranges may be more limited than variable types suggest Most analysis is performed only within procedures Whole program analysis is too expensive in most cases Most analysis is based only on sta1c informa on Compiler has di culty an0cipa0ng run 0me inputs When in doubt the compiler must be conserva ve 5 Carnegie Mellon Generally Useful Op miza ons Op miza ons that you or the compiler should do regardless of processor compiler Code Mo on Reduce frequency with which computa0on performed If it will always produce same result Especially moving code out of loop void set row double a double b long i long n long j for j 0 j n j a n i j b j long j int ni n i for j 0 j n j a ni j b j 6 Carnegie Mellon Compiler Generated Code Mo on void set row double a double b long i long n long j for j 0 j n j a n i j b j long j long ni n i double rowp a ni for j 0 j n j rowp b j Where are the FP operations set row testq jle movq imulq leaq movl rcx rcx L4 rcx rax rdx rax rdi rax 8 rdx 0 r8d movq movq addq addq cmpq jg rsi r8 8 rax rax rdx 1 r8 8 rdx r8 rcx L3 L3 L4 rep ret Test n If 0 goto done rax n rax i rowp A n i 8 j 0 loop t b j rowp t j rowp Compare n j If goto loop done 7 Carnegie Mellon Reduc on in Strength Replace costly opera0on with simpler one ShiW add instead of mul0ply or divide 16 x x 4 U0lity machine dependent Depends on cost of mul0ply or divide instruc0on On Intel Nehalem integer mul0ply requires 3 CPU cycles Recognize sequence of products for i 0 i n i for j 0 j n j a n i j b j int ni 0 for i 0 i n i for j 0 j n j a ni j b j ni n 8 Carnegie Mellon Share Common Subexpressions Reuse por0ons of expressions Compilers oWen not very sophis0cated in exploi0ng arithme0c proper0es Sum neighbors of i j up val i 1 n j down val i 1 n j left val i n j 1 right val i n j 1 sum up down left right 3 multiplications i n i 1 n i 1 n leaq leaq imulq imulq imulq addq addq addq 1 rsi rax 1 rsi r8 rcx rsi rcx rax rcx r8 rdx rsi rdx rax rdx r8 i 1 i 1 i n i 1 n i 1 n i n j i 1 n j i 1 n j long inj i n j up val inj n down val inj n left val inj 1 right val inj 1 sum up down left right 1 multiplication i n imulq addq movq subq leaq rcx rsi i n rdx rsi i n j rsi rax i n j rcx rax i n j n rsi rcx rcx i n j n 9 Carnegie Mellon Op miza on Blocker 1 Procedure Calls Procedure to Convert String to Lower Case void lower char s int i for i 0 i strlen s i if s i A s i Z s i A a Extracted from 213 lab submissions Fall 1998 10 Carnegie Mellon Lower Case Conversion Performance Time quadruples when double string length Quadra0c performance lower 200 180 CPU seconds 160 140 120 100 80 60 40 20 0 0 50000 100000 150000 200000 250000 300000 350000 400000 450000 500000 String length 11 Carnegie Mellon Convert Loop To Goto Form void lower char s int i 0 if i strlen s goto done loop if s i A s i Z s i A a i if i strlen s goto loop done strlen executed every itera0on 12 Carnegie Mellon Calling Strlen My version of strlen size t strlen const char s size t length 0 while s 0 s length return length Strlen performance Only way to determine length of string is to scan its en0re length looking for null character Overall performance string of length N N calls to strlen Require 0mes N N 1 N 2 1 Overall O N2 performance 13 Carnegie Mellon Improving Performance void lower char s int i int len strlen s for i 0 i len i if s i A s i Z s i A a Move call to strlen outside of loop Since result does not change from one itera0on to another Form of code mo0on 14 Carnegie Mellon Lower Case Conversion Performance Time doubles when double string length Linear performance of lower2 200 180 CPU seconds 160 140 120 lower 100 80 60 40 20 lower2 0 0 50000 100000 150000 200000 250000 300000 350000 400000 450000 500000 String length 15 Carnegie Mellon Op miza on Blocker Procedure Calls Why couldn t compiler move strlen out of inner loop Procedure may have side e ects Alters global state each 0me called Func0on may not return same value for given arguments Depends on other parts of global state Procedure lower could interact with strlen Warning Compiler treats procedure call as a black box Weak op0miza0ons near them int lencnt 0 Remedies size t strlen const char s Use of inline func0ons size t length 0 while s 0 s length lencnt length return length GCC does this with O2 See web aside …


View Full Document

UT CS 429H - Program Optimization

Loading Unlocking...
Login

Join to view Program Optimization and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Program Optimization and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?