NYU CSCI-UA 0201 - First Look at ia32 Assembly Language

First Look at ia32 Assembly Language

In this chapter, we will take a first look at the assembly language and machine language of the ia32. Rather than start from scratch, we are going to ask gcc to be our tutor. We will write some very simple C programs, and then ask gcc to show us the assembler code that it generates for them. The task is then to understand why the generated assembly instructions do in fact result in the right behavior given the original C program.

For a first example, we will use the following C code

    unsigned a = 1;
    unsigned b = 2;
    unsigned c = 3;

    void t () {
       a = b + c;
       if (a == 4)
          b = 3;
       else
          c = a & b;
       while (a > 0)
          a--;
    }

For the moment, we avoid the use of signed integers, and we avoid either passing arguments to functions or returning results from functions. Right, so let’s ask gcc to compile this, and instead of generating machine language, let’s ask gcc to show us the assembly language. Normally gcc generates this assembly language in a temporary file, assembles it using the assembler into machine language, and then deletes the temporary file, but by using -S instead of -c, we ask gcc to simply generate the assembly language (into a file called name.s where the C program was name.c), and then we can look at this assembly language. The exact command we use to compile, assuming that the above example is stored in a file called t.c, is

    gcc -S t.c -fomit-frame-pointer -masm=intel

Here, the switch -S asks for assembly language to be generated, as discussed above. The switch -fomit-frame-pointer asks gcc not to use a frame pointer. We don’t know yet what a frame pointer is, and that’s the point. We don’t want to worry about frame pointers, so this option gets rid of them for now. The switch -masm=intel asks gcc to use Intel syntax for the assembly language.

There are two quite different syntaxes in use for ia32 assembly language. The Intel syntax is the one that Intel originally devised for this architecture.
The AT&T syntax is typically used on Unix, and is more similar to the assembly language used by other processors. There is no particular technical reason to prefer one over the other. We choose to use the Intel syntax simply because most textbooks on assembly language for this machine use this syntax, so if you are using some auxiliary reference materials, life will most likely be easier using the Intel syntax.

With this command line, the output of gcc is stored in file t.s and looks like:

        .file   "t.c"
        .intel_syntax
    .globl _a
        .data
        .align 4
    _a:
        .long   1
    .globl _b
        .align 4
    _b:
        .long   2
    .globl _c
        .align 4
    _c:
        .long   3
        .text
    .globl _t
        .def    _t;     .scl    2;      .type   32;     .endef
    _t:
        mov     eax, DWORD PTR _c
        add     eax, DWORD PTR _b
        mov     DWORD PTR _a, eax
        cmp     DWORD PTR _a, 4
        jne     L2
        mov     DWORD PTR _b, 3
        jmp     L3
    L2:
        mov     eax, DWORD PTR _b
        and     eax, DWORD PTR _a
        mov     DWORD PTR _c, eax
    L3:
    L4:
        cmp     DWORD PTR _a, 0
        je      L5
        dec     DWORD PTR _a
        jmp     L4
    L5:
        ret

So now let’s get busy understanding this, line by line. A general note here is that the lines that start with a period are directions to the assembler, and are typically not part of the actual program. It’s as though we wrote down a speech for a politician, and at the start we had a direction saying “remember to smile and don’t sneer”. We don’t expect the politician to read these words at the start of the speech (though you never know these days). The dot lines are similar: typically they are not part of the program proper, but rather they are directions to the assembler.

    .file "t.c"

The .file line simply records the name of the original C file for informational purposes. This is not part of the program, but can be useful for both humans and other computer tools in keeping track of where things came from.

    .intel_syntax

As we discussed above, there are two different syntaxes for ia32 assembly language. The default is AT&T syntax.
This directive tells the assembler that the rest of the file will use the Intel syntax.

    .globl _a

This line is a note to the assembler that the symbol _a can possibly be referenced from other files. The assembler will notify the linker so that the proper inter-file connections can be made. There is no effect on the actual code generated for the program. Note that all symbols in the original C program have an underscore prepended. This avoids name clashes with some existing symbols (at least that was historically the reason for this decision, though probably it is no longer really necessary).

    .data

A program is generally divided into data and code. Generally these two sections should not be mixed up. You don’t want to execute your data as code, and you don’t want to treat your code as data. The .data directive tells the assembler that the following lines generate data rather than code. The assembler and linker will between them arrange to place data and code in separate sections of memory, so that they are kept apart.

    .align 4

On the ia32, there is no requirement for data alignment. A program will work correctly with four-byte integers regardless of where they are located. For example, a four-byte integer could be located at addresses 1, 2, 3, 4. However, the machine executes much more efficiently if, for example, four-byte integers are on a four-byte boundary, so a better choice of starting address for a four-byte integer is an address that is a multiple of 4. The .align directive tells the assembler to bump the location counter (the location of the next data to be generated) to the next four-byte boundary. This may or may not waste space depending on the current value.
Typically the data from a given file always starts on a four-byte boundary, so most likely the alignment directive has no effect in this particular case. It is certainly harmless, though, and in the general case it may improve efficiency by ensuring that the value about to be generated after the label is optimally aligned for the most efficient execution.

    _a:

This is a label. It causes the symbol _a to be assigned to the address of the next data or code to be generated. Later on we can reference this address by using this label name.

    .long 1

This is the first line in the assembler file that actually generates something. The .long directive causes four bytes (a long word) of data to be generated, initialized to the given value. Since this is a little-endian machine, the four bytes generated will contain 1, 0, 0, 0 in sequence.

    .globl _b
    .align 4
    _b:
    .long 2

Similar declarations for the variable b, initialized to 2.

    .globl _c
    .align

