Unformatted text preview:

Professor Hubertus Franke Programming Assignment 1 Lab 1 Linker Class CSCI GA 2250 001 Operating Systems Spring2026 In this lab you will be implementing a two pass linker In general a linker takes individually compiled code object modules and creates a single executable by resolving external symbol references e g variables and functions and module relative addressing and assigning global addresses after placing the modules object code at global addresses Rather than dealing with complex x86 tool chains that I demonstrated in class we assume a target machine with the following properties a word addressable b addressable memory of 512 words and c each valid word is represented by an integer 10000 I know that is a really strange machine but core tasks of linkers such relocation relative addressing symbol table maintenance and in particular error checking and reporting is handled in this notation as well The input to the linker is a file containing a sequence of tokens symbols and integers and instruction mode characters Don t assume tokens that make up a section to be on one line nor make assumptions about how much space separates tokens or that lines are non empty for that matter or that each input conforms syntactically Symbols always begin with alpha characters followed by optional alphanumerical characters i e a Z a Z0 9 Valid symbols can be up to 16 characters Integers are decimal based Instruction type characters are M A R I E Token delimiters are t or n The input file to the linker is structured as a series of object module definitions Each object module definition contains three parts in fixed order definition list use list and program text definition list consists of a count defcount followed by defcount pairs S R where S is the symbol being defined and R is the relative word address offset to which the symbol refers in the module 0 based counting use list consists of a count usecount followed by usecount symbols that can be referred to in this module These could include symbols defined in the definition list of any module prior or subsequent or not at all program text consists of a count codecount followed by codecount pairs addrmode instr where addrmode is a single character indicating the addressing mode as Module Absolute Relative Immediate or External instr is the instruction integer Note that codecount defines the length size of the module The instruction integer is comprised of an opcode instruction 1000 and an operand instruction 1000 The opcode always remains unchanged by the linker For the instruction valu read an integer and ensure opcode 10 see errorcodes below The operand is modified retained based on the instruction type in the program text as follows M operand is the number of a valid module and is replaced with the base address of that module A operand is an absolute address which will never be changed in pass2 however it can t be the machine size 512 R operand is a relative address in the module which is relocated by replacing the relative address with the absolute address of that relative address after the module s global address has been determined absolute addr module base relative addr I an immediate operand is unchanged but must be less than 900 E operand is an external address which is represented as an index into the uselist For example a reference in the program text with operand K represents the Kth symbol in the use list using 0 based counting e g if the use list is 2 f g then an instruction E 7000 refers to f and an instruction E 5001 refers to g You must identify to which global address the symbol is assigned and then replace the operand with that global address The linker must process the input twice that is why it is called two pass to preempt the favored question Can I do it in one pass NO because storing tokens makes your program more complex and non conformant to the specification Pass One parses the input and verifies the correct syntax and determines 1 the base address for each module and stores it in a module base table and 2 the absolute address for each defined symbol storing the latter in a symbol table The first module has base address zero the base address for module X 1 is equal to the base address of module X plus the length of module X defined as the number of instructions in a module The absolute address for symbol S defined in module M is the base address of M plus the relative address of S within M After pass one print the symbol table including errors related to it see rule2 later in order of appearance Do not store parsed tokens the only data you should and need to store between passes are the module base table the symbol table Pass Two again parses the input and uses the base addresses and the symbol table entries created in pass one to generate the actual output by relocating relative addresses and resolving external references You should reuse pass 1 parser code just with different actions You must clearly mark your two passes in the code through comments and or proper function naming While parsing you cannot store tokens not between pass one and pass two nor while parsing a single line This practices reentrant code which one encounters frequently in Operating Systems This implies you can only parse and consume one token at a time and can t run through a full line and store tokens per line or file as indicated in the strtok man page or use a C stream class To enforce this we deduct 3 pts if you do not parse and consume one token at a time Page 1 of 8 Professor Hubertus Franke Programming Assignment 1 Lab 1 Linker Class CSCI GA 2250 001 Operating Systems Spring2026 Other requirements error detection limits and space used To receive full credit you must check the input for various errors test inputs will have lots of errors All errors warnings should follow the message catalog provided below We will perform a textual difference against a reference implementation to grade your program Any reported difference will indicate a non compliance with the instructions provided and is reported as an error and results in deductions You should continue processing after encountering an error warning other than a syntax error and you should be able to detect multiple errors in the same run 3 2 4 5 1 You should stop processing if a syntax error is detected in the input print a syntax error message with the line number and the character offset in the input file where observed A syntax error is defined as a missing token e g 4 used


View Full Document

NYU CSCI-GA 2250 - (Lab 1): Linker

Loading Unlocking...
Login

Join to view (Lab 1): Linker and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view (Lab 1): Linker and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?