Today’s Big Adventure
  - How to name and refer to things that don’t exist yet
  - How to merge separate name spaces into a cohesive whole
• Readings
  - a.out & elf man pages, ELF standard
  - Run “nm” or “objdump” on a few .o and a.out files.

Linking as our first naming system
• Naming is a very deep theme that comes up everywhere
• Naming system: maps names to values
• Examples:
  - Linking: Where is printf? How to refer to it? How to deal with synonyms? What if it doesn’t exist?
  - Virtual memory address (name) resolved to physical address (value) using page table
  - File systems: translating file and directory names to disk locations, organizing names so you can navigate, . . .
  - www.stanford.edu resolved to 171.67.216.17 using DNS
  - IP addresses resolved to Ethernet addresses with ARP
  - Street names: translating (elk, pine, . . . ) vs (1st, 2nd, . . . ) to actual location

Perspectives on memory contents
• Programming language view: x += 1; (in assembly: add $1, %eax)
  - Instructions: Specify operations to perform
  - Variables: Operands that can change over time
  - Constants: Operands that never change
• Hardware view:
  - executable: code, usually read-only
  - read only: constants (maybe one copy for all processes)
  - read/write: variables (each process needs its own copy)
• Need addresses to use data:
  - Addresses locate things. Must update them when you move
  - Examples: linkers, garbage collectors, changing apartment
• Binding time: When is a value determined/computed?
  - Early to late: Compile time, Link time, Load time, Runtime

How is a process specified?
• Executable file: the linker/OS interface.
  - What is code? What is data?
  - Where should they live?
• Linker builds executables from object files

How is a program executed?
• On Unix systems, read by “loader”
  - Reads all code/data segments into buffer cache; maps code (read only) and initialized data (r/w) into address space
  - Or . . . fakes process state to look like it was paged out
• Lots of optimizations happen in practice:
  - Zero-initialized data does not need to be read in.
  - Demand load: wait until code is used before getting it from disk
  - Copies of same program running? Share code
  - Multiple programs use same routines: share code (harder)

What does a process look like? (Unix)
• Process address space divided into “segments”
  - text (code), data, heap (dynamic data), and stack
  [Figure: process address space layout; regions labeled kernel, stack, mmapped regions, heap, uninitialized data, initialized data, read-only data, code]
  - Why? (1) different allocation patterns; (2) separate code/data

Who builds what?
• Heap: allocated and laid out at runtime by malloc
  - Compiler, linker not involved other than saying where it can start
  - Namespace constructed dynamically and managed by programmer (names stored in pointers, and organized using data structures)
• Stack: allocated at runtime (procedure call), laid out by compiler
  - Names are offsets relative to the stack (or frame) pointer
  - Managed by compiler (allocated on procedure entry, freed on exit)
  - Linker not involved because the name space is entirely local: the compiler has enough information to build it.
• Global data/code: allocated by compiler, laid out by linker
  - Compiler emits them and names them with symbolic references
  - Linker lays them out and translates references

Example
• Simple program has “printf ("hello world\n");”
• Compile with: cc -m32 -fno-builtin -S hello.c
  - -S says don’t run assembler (-m32 is 32-bit x86 code)
• Output in hello.s has symbolic reference to printf

            .section .rodata
    .LC0:   .string "hello world\n"
            .text
            .globl main
    main:   ...
            subl $4, %esp
            movl $.LC0, (%esp)
            call printf

• Disassemble with objdump -d:

      18:   e8 fc ff ff ff    call 19 <main+0x19>

  - Jumps to PC - 4 = address of the address field within the instruction

Linkers (Linkage editors)
• Unix: ld
  - Usually hidden behind compiler
  - Run gcc -v hello.c to see ld invoked (you may see collect2 instead)
• Three functions:
  - Collect together all pieces of a program
  - Coalesce like segments
  - Fix addresses of code and data so the program can run
• Result: runnable program stored in new object file
• Why can’t the compiler do this?
  - Limited world view: sees one file, rather than all files
• Usually linkers don’t rearrange segments, but can
  - E.g., re-order instructions for fewer cache misses; remove routines that are never called from a.out

Simple linker: two passes needed
• Pass 1:
  - Coalesce like segments; arrange in non-overlapping memory
  - Read file’s symbol table, construct global symbol table with entry for every symbol used or defined
  - Compute virtual address of each segment (at start+offset)
• Pass 2:
  - Patch references using file and global symbol table
  - Emit result
• Symbol table: information about program kept while linker is running
  - Segments: name, size, old location, new location
  - Symbols: name, input segment, offset within segment

Where to put emitted objects?
• Assembler:
  - Doesn’t know where data/code should be placed in the process’s address space
  - Assumes everything starts at zero
  - Emits symbol table that holds the name and offset of each created object
  - Routines/variables exported by file are recorded as global definitions
• Simpler perspective:
  - Code is in a big char array
  - Data is in another big char array
  - Assembler creates (object name, index) tuple for each interesting thing
  - Linker then merges all of these arrays

    Code array:                 Symbol table:
      0  foo:  call printf        foo: 0: T
               ret                bar: 40: t
     40  bar:  ...
               ret

  (T = global text symbol, t = local text symbol, as nm prints them)

Where to put emitted objects
• At link time, linker
  - Determines the size of each segment and the resulting address to place each object at
  - Stores all global definitions in a global symbol table that maps the definition to its final virtual address

Where is everything?
• How to call procedures or reference variables?
  - E.g., call to printf needs a target