Berkeley COMPSCI 164 - ia32 Assembly Language - D829118

School name University of California, Berkeley

Course Compsci 164- Programming Languages and Compilers

Pages 14

First Look at ia32 Assembly Language

In this chapter, we will take a first look at the assembly language and machine language of the ia32. Rather than start from scratch, we are going to ask gcc to be our tutor. We will write some very simple C programs and then ask gcc to show us the assembler code that it generates for them. The task will then be to understand why the generated assembly instructions do in fact produce the right behavior given the original C program.

For a first example, we will use the following C code:

    unsigned a = 1;
    unsigned b = 2;
    unsigned c = 3;

    void t () {
        a = b + c;
        if (a == 4)
            b = 3;
        else
            c = a & b;
        while (a > 0)
            a--;
    }

For the moment, we avoid the use of signed integers, and we avoid either passing arguments to functions or returning results from functions. So let’s ask gcc to compile this, and instead of generating machine language, let’s ask gcc to show us the assembly language. Normally gcc generates this assembly language in a temporary file, assembles it into machine language using the assembler, and then deletes the temporary file. By using -S instead of -c, we ask gcc to simply generate the assembly language (into a file called name.s where the C program was name.c), so that we can look at it. The exact command we use to compile, assuming that the above example is stored in a file called t.c, is

    gcc -S t.c -fomit-frame-pointer -masm=intel

Here, the switch -S asks for assembly language to be generated, as discussed above. The switch -fomit-frame-pointer asks gcc not to use a frame pointer. We don’t know yet what a frame pointer is, and that’s the point: we don’t want to worry about frame pointers, so this option gets rid of them for now. The switch -masm=intel asks gcc to use Intel syntax for the assembly language.

There are two quite different syntaxes in use for ia32 assembly language. The Intel syntax is the one that Intel originally devised for this architecture.
The AT&T syntax is typically used on Unix, and is more similar to the assembly language used by other processors. There is no particular technical reason to prefer one over the other. We choose the Intel syntax simply because most textbooks on assembly language for this machine use it, so if you are using auxiliary reference materials, life will most likely be easier with the Intel syntax.

With this command line, the output of gcc is stored in the file t.s and looks like:

        .file "t.c"
        .intel_syntax
    .globl _a
        .data
        .align 4
    _a:
        .long 1
    .globl _b
        .align 4
    _b:
        .long 2
    .globl _c
        .align 4
    _c:
        .long 3
        .text
    .globl _t
        .def _t; .scl 2; .type 32; .endef
    _t:
        mov eax, DWORD PTR _c
        add eax, DWORD PTR _b
        mov DWORD PTR _a, eax
        cmp DWORD PTR _a, 4
        jne L2
        mov DWORD PTR _b, 3
        jmp L3
    L2:
        mov eax, DWORD PTR _b
        and eax, DWORD PTR _a
        mov DWORD PTR _c, eax
    L3:
    L4:
        cmp DWORD PTR _a, 0
        je L5
        dec DWORD PTR _a
        jmp L4
    L5:
        ret

So now let’s get busy understanding this, line by line. A general note here is that the lines that start with a period are directions to the assembler, and are typically not part of the actual program. It’s as though we wrote down a speech for a politician, and at the start we had a direction saying “remember to smile and don’t sneer”. We don’t expect the politician to read these words at the start of the speech (though you never know these days). The dot lines are similar: typically they are not part of the program proper, but rather directions to the assembler.

    .file "t.c"

The .file line simply records the name of the original C file for informational purposes. This is not part of the program, but can be useful for both humans and other computer tools in keeping track of where things came from.

    .intel_syntax

As we discussed above, there are two different syntaxes for ia32 assembly language. The default is AT&T syntax.
This directive tells the assembler that the rest of the file will use the Intel syntax.

    .globl _a

This line is a note to the assembler that the symbol _a can possibly be referenced from other files. The assembler will notify the linker so that the proper inter-file connections can be made. There is no effect on the actual code generated for the program. Note that all symbols in the original C program have an underscore prepended. This avoids name clashes with some existing symbols (at least that was historically the reason for this decision, though probably it is no longer really necessary).

    .data

A program is generally divided into data and code, and these two sections should not be mixed up. You don’t want to execute your data as code, and you don’t want to treat your code as data. The .data directive tells the assembler that the following lines generate data rather than code. The assembler and linker will between them arrange to place data and code in separate sections of memory, so that they are kept apart.

    .align 4

On the ia32, there is no requirement for data alignment. A program will work correctly with four-byte integers regardless of where they are located. For example, a four-byte integer could be located at addresses 1, 2, 3, 4. However, the machine executes much more efficiently if, for example, four-byte integers are on a four-byte boundary, so a better choice of starting address for a four-byte integer is an address that is a multiple of 4. The .align directive tells the assembler to bump the location counter (the location of the next data to be generated) to the next four-byte boundary. This may or may not waste space, depending on the current value of the location counter.
Typically the data from a given file always starts on a four-byte boundary, so most likely the alignment directive has no effect in this particular case, but it is certainly harmless, and in the general case it may improve efficiency by ensuring that the value about to be generated after the label is optimally aligned for the most efficient execution.

    _a:

This is a label. It causes the symbol _a to be assigned to the address of the next data or code to be generated. Later on we can reference this address by using this label name.

    .long 1

This is the first line in the assembler file that actually generates something. The .long directive causes four bytes (a long word) of data to be generated, initialized to the given value. Since this is a little-endian machine, the four bytes generated will contain 1, 0, 0, 0 in sequence.

    .globl _b
    .align 4
    _b: .long 2

Similar declarations for the variable b, initialized to 2.

    .globl _c
    .align
