CS 6340: LLVM Primer
Mayur Naik

Roadmap

Welcome! This primer has four parts:
- Part I: Overview of LLVM
- Part II: Structure of LLVM IR
- Part III: The LLVM API
- Part IV: Navigating the Documentation

Part I: Overview of LLVM

What is LLVM?

A modular and reusable compiler framework supporting multiple front ends and back ends.

A compiler turns human-friendly source code (C, C++, Java, etc.) into machine-friendly binary code (x86, ARM, etc.) in stages:
- Pre-processor: expand #includes and #defines
- Parser: check for syntax mistakes, build the Abstract Syntax Tree (AST)
- IR Generator: convert the AST to an intermediate representation (IR)
- Optimizer: transform the program into an equivalent one that is more efficient
- Compiler Backend: convert IR to target-specific assembly code
- Assembler: convert target-specific assembly code to target-specific machine code
- Linker: combine multiple machine code files into a single image (e.g., an executable)

[Figure labels on the pipeline: Front End, Back End, focus of our course]

Architecture of LLVM

- Front ends (language-specific): C/C++ (Clang), Go (Gollvm), Rust (rustc)
- All front ends emit LLVM IR, which is processed by the LLVM Optimizer
- Back ends (architecture-specific): x86, ARM, MIPS

Example:

    C source code:
        if (b == 0) a = 0;

    Front end -> LLVM IR:
        %cmp = icmp eq i32 %0, 0
        br i1 %cmp, label %if.then, label %if.end

    Back end -> x86 assembly:
        CMP ECX, 0
        SETBZ EAX

LLVM Passes

[Figure: C source file -> Clang frontend -> LLVM IR -> pass 1 -> LLVM IR -> ... -> pass k ("your code here") -> LLVM IR -> ... -> pass N -> LLVM IR -> X86 backend; the passes together form the optimizer]

- The LLVM Optimizer (opt) is a series of "passes" that run one after another.
- Two kinds of passes: analysis and transformation.
  - Analysis pass: analyzes LLVM IR to check program properties
  - Transformation pass: transforms LLVM IR to monitor or optimize the program
  - Analysis passes do not change code; transformation passes do.
- LLVM is typically extended by implementing new passes that look at and change the LLVM IR as it flows through the compilation process.

Example: Factorial Program

Files: Factorial 64.c, Factorial.ll, Factorial.s
Access files via Canvas: Files > Resources

Why LLVM IR?

- Easy to translate from the level above
- Easy to translate to the level below
- Narrow interface: simpler
  phases and optimizations
- The IR language is independent of the source and target languages, which maximizes the compiler's ability to support multiple source and target languages.
- Example: the source language might have "while", "for", and "foreach" loops, while the IR language might have only "while" loops and sequencing. Translation eliminates "for" and "foreach".

LLVM IR Normal Form

Instead of handling the AST of (1 + X4) + (3 + (X1 * 5)):

    Add(Add(Const 1, Var X4),
        Add(Const 3, Mul(Var X1, Const 5)))

we have to handle:

    tmp0 = 1 + X4
    tmp1 = X1 * 5
    tmp2 = 3 + tmp1
    tmp3 = tmp0 + tmp2

- Translation makes the order of evaluation explicit
- Names intermediate values
- Introduced temporaries are never modified

Generate LLVM IR Yourself

    clang factorial 64.c -S -emit-llvm -o <file>

- factorial 64.c: the C source code
- -S -emit-llvm: perform preprocessing and compilation steps only, and emit textual assembly (here, LLVM IR)
- -o <file>: write the output to the target file

History of LLVM

- The LLVM project was initially developed by Vikram Adve and Chris Lattner at the University of Illinois at Urbana-Champaign in 2000. Their original purpose was to develop dynamic compilation techniques for static and dynamic programming languages.
- In 2005, Lattner joined Apple and continued to develop LLVM.
- LLVM originally stood for Low Level Virtual Machine, but as the LLVM family grew larger, the original meaning no longer applied.
- Today, LLVM and Clang together comprise a total of 2.5 million lines of C++ code.

Where is LLVM Used?

- Traditional C/C++ toolchain: Qualcomm Snapdragon LLVM compiler for Android
- Programming languages: Pyston, a performance-oriented Python implementation built on LLVM
- Language runtime systems: LLILC, an LLVM-based .NET MSIL compiler
- GPU: the majority of OpenCL implementations are based on Clang/LLVM
- Linux/FreeBSD: Debian is experimenting with Clang/LLVM as an additional compiler

Source: "Where is LLVM being used today?" https://llvm.org/devmtg/2016-01/slides/fosdem-2016-llvm.pdf

[Figure: contributing companies]

Part II: Structure of LLVM IR

- In-memory: binary in-memory format, used during the compilation process
- Bitcode: binary on-disk format, suitable for fast
  loading. Obtained by: clang -emit-llvm -c factorial.c -o xxx.bc
- Assembly: human-readable format. Obtained by: clang -emit-llvm -S -c factorial.c -o xxx.ll

LLVM IR: Three formats

- Assembly (.ll):

      %tmp2 = sub i32 %a, 1
      %tmp3 = add i32 %b, 1
      ret i32 %tmp3

- In-memory
- Bitcode (.bc):

      33 32 C2 D3 23 0A D1 00 C4 45 82 F2
      A2 21 02 0C A5 79 E5 F6 A2 54 30 FF

Compare to Java: instead of .class bytecode, you get .bc.

Program Structure in LLVM IR

[Figure: nesting of program units: a Module contains Functions, a Function contains Basic Blocks, a Basic Block contains Instructions]

- Module: the top-level container of LLVM IR, corresponding to one translation unit of the front-end compiler.
- Function: a function in the programming language, consisting of a function signature and several basic blocks. The first basic block in a function is called the entry basic block.
- Basic Block: a sequence of instructions executed in order, with exactly one entry and one exit. Control can enter only at the first instruction and leave only at the last, so no instruction in the middle jumps elsewhere or is jumped to.
- Instruction: the smallest executable unit in LLVM IR. In the textual format, each instruction occupies a single line.

LLVM IR Iterators

[Figure: a Module holds Function 1 ... Function n; a Function holds Basic Block 1 ... Basic Block n; a Basic Block holds Instruction 1 ... Instruction n]

Iterator types:
- Module::iterator
- Function::iterator
- BasicBlock::iterator
- Value::use_iterator
- User::op_iterator

Example uses (F is taken to be a Function object here; write F->begin() and F->end() if F is a pointer):

Approach 1: using STL iterators

    for (Function::iterator FI = F.begin(); FI != F.end(); ++FI)       // each basic block
      for (BasicBlock::iterator BI = FI->begin(); BI != FI->end(); ++BI)  // each instruction
        // some operations

Approach 2: using the auto keyword

    for (auto FI = F.begin(); FI != F.end(); ++FI)
      for (auto BI = FI->begin(); BI != FI->end(); ++BI)
        // some operations

Approach 3: using InstIterator

    #include "llvm/IR/InstIterator.h"
    for (inst_iterator It = inst_begin(F), E = inst_end(F); It != E; ++It)
      // some operations

Variables and Types

- Two kinds of variables: local and global
  - % indicates local variables:   %1 = add nsw i32 %a, %tmp
  - @ indicates global variables:  @g = global i32 20, align 4
- Two kinds of types: primitive (e.g., integer, floating point) and
  derived (e.g., pointer, struct)
- Integer type: specifies an integer of the desired bit width

      i1     A single-bit integer
      i32    A 32-bit integer

- Pointer type: specifies memory locations

      i32**          A pointer to a pointer to an integer
      i32 (i32*)*    A pointer to a function that takes as argument a pointer to an integer and returns an integer as result

  Note: these typed pointer forms belong to older LLVM releases. Since LLVM 15, pointers are written as the single opaque type ptr.
- More details at https://llvm.org/docs/LangRef.html#type-system

The SSA Form

The Static Single Assignment (SSA) form requires that every variable be defined only once, but
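The translation described under "LLVM IR Normal Form" can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the Expr type and the helpers leaf, binary, and lower are hypothetical names invented here and are not part of the LLVM API. The sketch walks an expression tree and emits one named temporary per operator, so the order of evaluation becomes explicit and no temporary is ever assigned twice.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression AST for illustration only; not an LLVM data structure.
struct Expr {
    enum Kind { Const, Var, Add, Mul } kind;
    std::string text;                // literal or variable name (leaf nodes only)
    std::unique_ptr<Expr> lhs, rhs;  // operands (Add and Mul nodes only)
};

// Builds a leaf node (constant or variable).
std::unique_ptr<Expr> leaf(Expr::Kind kind, std::string text) {
    auto e = std::make_unique<Expr>();
    e->kind = kind;
    e->text = std::move(text);
    return e;
}

// Builds an operator node (Add or Mul) from two subtrees.
std::unique_ptr<Expr> binary(Expr::Kind kind, std::unique_ptr<Expr> l,
                             std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = kind;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Appends one "tmpN = a op b" line to `out` per operator node and returns
// the name that holds the value of `e`. Operands are lowered left to right,
// and `next` supplies a fresh temporary number for every operator.
std::string lower(const Expr& e, std::vector<std::string>& out, int& next) {
    if (e.kind == Expr::Const || e.kind == Expr::Var) return e.text;
    assert(e.lhs && e.rhs && "operator nodes need two operands");
    const std::string a = lower(*e.lhs, out, next);
    const std::string b = lower(*e.rhs, out, next);
    const std::string tmp = "tmp" + std::to_string(next++);
    const char* op = (e.kind == Expr::Add) ? " + " : " * ";
    out.push_back(tmp + " = " + a + op + b);
    return tmp;
}
```

Calling lower on the tree Add(Add(Const 1, Var X4), Add(Const 3, Mul(Var X1, Const 5))) with an empty output vector and a counter starting at 0 yields the four assignments listed on the slide, tmp0 through tmp3, and returns "tmp3" as the name of the final value. Real LLVM front ends do considerably more (types, control flow, memory), so this should be read as a toy model of the idea, not of Clang's IR generator.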