General Info:
To compile a C program, use the following command:
gcc -g -c -o drv.o drv.c
gcc is the compiler command. -g tells the compiler to prepare for the debugger. -c says to compile, but do not link. -o tells the compiler to rename the output to the following string. So this command is compiling the file drv.c into an object file but not linking it, readying it for debugging, and renaming the output to drv.o.
To assemble an assembly program, use the following command:
as -g -o asm.o asm.s
Similarly, as is the assembler command. -g prepares it for debugging, and -o renames the object file. So this command assembles asm.s, readies it for debugging, and renames the output to asm.o.
You can link those outputs using:
gcc -g -o myprog asm.o drv.o
This uses the gcc command to link two object files, ready it for debugging, and renames the output executable to myprog.
Or, you can do it the easy way:
gcc -g -o myprog drv.c asm.s
This does everything in one step. It compiles, links, and readies for debugging both files, puts the output executable in myprog.
You can change input/output names as needed, but C programs need a .c extension, and Assembly programs need a .s extension.
If these commands work, youll get dropped back to the command line with no output. If there are errors, theyll be listed for you. You can either try to fix errors by reading through the code, or you can run the debugger.
You run the program itself by typing either the name of the program (myprog from above) and hitting enter, or typing ./myprog, which tells the terminal to look in the current directory (./) for the executable name (myprog).
To run the debugger, type "gdb myprog" without quotes and substituting whatever program name you used. From there, youll be brought to another prompt for the debugger. To run the program as normal from here, type "run". You can set break points by typing "break [line number]". You can step through the code by typing "step". There are a bunch more options here as well, which you can read about by typing "help" or getting more information at one of the links at the beginning of this email. Typing "quit" exits the debugger. Stepping through the program can be really useful when trying to locate the place where a control structure acted incorrectly.
gcc -g -c -o drv.o drv.c
gcc is the compiler command. -g tells the compiler to prepare for the debugger. -c says to compile, but do not link. -o tells the compiler to rename the output to the following string. So this command is compiling the file drv.c into an object file but not linking it, readying it for debugging, and renaming the output to drv.o.
To assemble an assembly program, use the following command:
as -g -o asm.o asm.s
Similarly, as is the assembler command. -g prepares it for debugging, and -o renames the object file. So this command assembles asm.s, readies it for debugging, and renames the output to asm.o.
You can link those outputs using:
gcc -g -o myprog asm.o drv.o
This uses the gcc command to link two object files, ready it for debugging, and renames the output executable to myprog.
Or, you can do it the easy way:
gcc -g -o myprog drv.c asm.s
This does everything in one step. It compiles, links, and readies for debugging both files, puts the output executable in myprog.
You can change input/output names as needed, but C programs need a .c extension, and Assembly programs need a .s extension.
If these commands work, youll get dropped back to the command line with no output. If there are errors, theyll be listed for you. You can either try to fix errors by reading through the code, or you can run the debugger.
You run the program itself by typing either the name of the program (myprog from above) and hitting enter, or typing ./myprog, which tells the terminal to look in the current directory (./) for the executable name (myprog).
To run the debugger, type "gdb myprog" without quotes and substituting whatever program name you used. From there, youll be brought to another prompt for the debugger. To run the program as normal from here, type "run". You can set break points by typing "break [line number]". You can step through the code by typing "step". There are a bunch more options here as well, which you can read about by typing "help" or getting more information at one of the links at the beginning of this email. Typing "quit" exits the debugger. Stepping through the program can be really useful when trying to locate the place where a control structure acted incorrectly.
Lab2:
Lab 2 introduces simple assignments, variables, registers, and instructions. The eventual goal here is to be able to implement C commands such as "a = ((b + c) - (d + e)) - 10;" in Assembly. For each program youll be working on, you will be given a piece of C driver code, and you will have to implement an Assembly function that implements the a C function described in the assignment.
In most computer programs, you deal with memory locations that store data (variables) and instructions that manipulate that data. In Assembly, you also deal with registers. You can store values in registers, as well as use registers in computations. Registers are fast, temporary variables located within the microprocessor itself, where normal variables exist in memory. Theres a limited number of registers that can be used, but if those can be used, it is desirable due to speed. You cant just use memory variables either. The 8086 processor limits the use of memory variables to a MAXIMUM of ONE memory variable per computation. Im going to repeat that for emphasis: you cannot use two memory variables in a computation. You must use
AT LEAST one register or a constant.
In Intel 8086 assembly language, you can only do one computation at a time, and you can only use one memory variable per computation. So the command from above, "a = ((b + c) - (d + e)) - 10;" cannot be implemented in a single line. In fact, Assembly is so low level that this takes many lines.
For the processor were using, there are four general purpose registers: A, B, C, and D. That previous line is implemented in Assembly like so:
.comm a, 4
.comm b, 4
.comm c, 4
.comm d, 4
.comm e, 4
.text
movl b,%eax ; move b into register ax
addl c,%eax ; add c to register ax
movl d,%ebx ; move d into register bx
addl e,%ebx ; add e to register bx
subl %ebx,%eax ; subtract register bx from register ax
subl $10,%eax ; subtract 10 from register ax
movl %eax,a ; move register ax out to a
Lets go through this bit by bit. Like in C, you need to declare variables before you can use them. There are several assembler directives you can use, depending on what youre trying to do, but the most common one (and the only one we should need this semester) is the .comm command. It takes two arguments. The first is the variable name, and the second is the number of bytes the variables is. Note, there is no type information. The command format for most of what well be doing is "instruction firstarg, secondarg".
A note here about writing in assembly: there is no symbol to end a line of code (in C, you end all lines with a ";" ). To comment your code, multi-line comments use the C-Style /*comment goes here*/, and a single line comment should be able to use either ";" or "#", but I know for a fact that "#" works.
The above commands do not initialize the data either. If you want to initialize your variables, you can use a command like "b: .int 10", which declares an integer b and initializes it to 10. More examples can be found in the manual. So all those .comm commands up above declare variables a,b,c,d, and e of size 4 bytes (integers). For most of the programs Ill be sending you, there should be sections at the bottom labeled "declare variables here." That is where you should put the declarations.
The next bit is arithmetic operators. As you may have noticed, assembly doesnt use standard symbols (+,-,*,/,etc). Each operation has its own instruction. Addition uses the add instruction, subtraction uses sub, multiplication mul, and division div. The closest thing to an "=" is the move command, or mov. It allows you to move the value from one register/variable to another, essentially setting them equal. However, there are suffixes that are attached to each of these depending on the size of the data being operated on. Each piece of data being used above is 4 bytes, which is a "long-word". So each command has an "l" attached to the end of it. For 2 byte data, you would attach a "w" for "word". For 1 byte data, you would use "b" for "byte".
The mov instruction is formatted thus (for 4 byte data): "movl source, destination". This is the equivalent of "destination = source". Only one of the source or destination can be a variable, so the other must most likely be a register. One use of this command is to move data between variables and registers.
Arithmetic operands always operate on a value. So "addl c, %eax" is the equivalent of "%eax = %eax + c" or "%eax += c". For these math instructions, one of the arguments will always be the destination.
The add command is used thus (for 4 byte data): "addl source, destination". This is equivalent to "destination += source".
Similarly, subtraction is "subl source, destination" which is the equivalent of "destination -= source". This is the most complicated you can get with math commands, one addition/subtraction/equals per line.
I mentioned earlier that there are registers that we can use. The general purpose registers are A, B, C, and D. You can access these registers by saying %eax, %ebx, %ecx, and %edx, respectively. These are the 4 Byte registers. However, we can operate on several different sizes of data. 4 byte, 2 byte, and 1 byte. The 2 Byte registers are similarly accessed using %ax, %bx, %cx, and %dx. These are the lower 16 bits of the corresponding 32 bit registers. They are NOT separate. Each of those 16 bit registers can be thought of as a pair of 8 bit registers. So %ax is split into %ah and %al (a, high and low bytes), %bx into %bh, %bl, then you have %ch, %cl, %dh, %dl similarly. These are NOT different registers, they just allow you to access smaller chunks of the larger registers. So, for instance, "addb $2, %al" is an 8 bit instruction, and "addw $2, %ax" is a 16 bit instruction.
Hopefully, by now you can look at the C code and Assembly equivalents above and figure out how they are equivalent. Note, $10 denotes a constant number 10.
Multiplication and division get a little bit more complicated, however. If you take two 32 bit numbers and multiply them, what do you get? A 64 bit number. So you would need two registers to store the result. The multiplication operand well be using is "mul". This takes ONE argument. That argument is multiplied against the A register (its ALWAYS the A register). The result is placed in the A register (and if the number is big enough, the D register). Two 32 bit numbers being multiplied together will put the lower 32 bits of the result into the A register and the upper 32 bits of the result into the D register. To multiply two values, you must first move one of them into the A register. For an 8bit multiplication, the result goes in %ah:%al. For 16 bits, %dx:%ax. 32, %edx:%eax.
Formulas:
mulb X8bit ? %ax = %al * X 8bit
mulw X16bit ? %dx:%ax = %ax * X 16bit
mull X32bit ? %edx:%eax = %eax * X 32bit
Those formulas show the results of each possible operand. For instance, if you use "mull some32bitnumber", the result gets put in the concatenation of %edx:%eax, and that result is %eax * that32bitnumber.
Divide is similar. It takes one operand, and the source and destination registers are chosen automatically. Again, the operation is performed against the A register. However, it also pulls in the D register. So for a 32 bit division, youre actually dividing the 64bit %edx:%eax by a 32 bit number. The 32 bit result gets placed in %eax, and the remainder gets put in %edx. Note, before dividing, you need to zero the second register (D), unless you have something there that you want. If youre storing a totally different value in the D register, it could throw off your division.
Division formulas:
divb X8bit ?%al = %ax / X 8bit , %ah = remainder
divw X16bit ?%ax = %dx:%ax / X 16bit , %dx = remainder
divl X32bit ?%eax = %edx:%eax / X 32bit , %edx = remainder
Example use of formula: If you use the command "divl some32bitnumber", the result that is placed in the register %eax is the concatenation %edx:%eax divided by that32bitnumber. %edx gets the remainder of the division. Note, the remainder is how you do modulus.
In most computer programs, you deal with memory locations that store data (variables) and instructions that manipulate that data. In Assembly, you also deal with registers. You can store values in registers, as well as use registers in computations. Registers are fast, temporary variables located within the microprocessor itself, where normal variables exist in memory. Theres a limited number of registers that can be used, but if those can be used, it is desirable due to speed. You cant just use memory variables either. The 8086 processor limits the use of memory variables to a MAXIMUM of ONE memory variable per computation. Im going to repeat that for emphasis: you cannot use two memory variables in a computation. You must use
AT LEAST one register or a constant.
In Intel 8086 assembly language, you can only do one computation at a time, and you can only use one memory variable per computation. So the command from above, "a = ((b + c) - (d + e)) - 10;" cannot be implemented in a single line. In fact, Assembly is so low level that this takes many lines.
For the processor were using, there are four general purpose registers: A, B, C, and D. That previous line is implemented in Assembly like so:
.comm a, 4
.comm b, 4
.comm c, 4
.comm d, 4
.comm e, 4
.text
movl b,%eax ; move b into register ax
addl c,%eax ; add c to register ax
movl d,%ebx ; move d into register bx
addl e,%ebx ; add e to register bx
subl %ebx,%eax ; subtract register bx from register ax
subl $10,%eax ; subtract 10 from register ax
movl %eax,a ; move register ax out to a
Lets go through this bit by bit. Like in C, you need to declare variables before you can use them. There are several assembler directives you can use, depending on what youre trying to do, but the most common one (and the only one we should need this semester) is the .comm command. It takes two arguments. The first is the variable name, and the second is the number of bytes the variables is. Note, there is no type information. The command format for most of what well be doing is "instruction firstarg, secondarg".
A note here about writing in assembly: there is no symbol to end a line of code (in C, you end all lines with a ";" ). To comment your code, multi-line comments use the C-Style /*comment goes here*/, and a single line comment should be able to use either ";" or "#", but I know for a fact that "#" works.
The above commands do not initialize the data either. If you want to initialize your variables, you can use a command like "b: .int 10", which declares an integer b and initializes it to 10. More examples can be found in the manual. So all those .comm commands up above declare variables a,b,c,d, and e of size 4 bytes (integers). For most of the programs Ill be sending you, there should be sections at the bottom labeled "declare variables here." That is where you should put the declarations.
The next bit is arithmetic operators. As you may have noticed, assembly doesnt use standard symbols (+,-,*,/,etc). Each operation has its own instruction. Addition uses the add instruction, subtraction uses sub, multiplication mul, and division div. The closest thing to an "=" is the move command, or mov. It allows you to move the value from one register/variable to another, essentially setting them equal. However, there are suffixes that are attached to each of these depending on the size of the data being operated on. Each piece of data being used above is 4 bytes, which is a "long-word". So each command has an "l" attached to the end of it. For 2 byte data, you would attach a "w" for "word". For 1 byte data, you would use "b" for "byte".
The mov instruction is formatted thus (for 4 byte data): "movl source, destination". This is the equivalent of "destination = source". Only one of the source or destination can be a variable, so the other must most likely be a register. One use of this command is to move data between variables and registers.
Arithmetic operands always operate on a value. So "addl c, %eax" is the equivalent of "%eax = %eax + c" or "%eax += c". For these math instructions, one of the arguments will always be the destination.
The add command is used thus (for 4 byte data): "addl source, destination". This is equivalent to "destination += source".
Similarly, subtraction is "subl source, destination" which is the equivalent of "destination -= source". This is the most complicated you can get with math commands, one addition/subtraction/equals per line.
I mentioned earlier that there are registers that we can use. The general purpose registers are A, B, C, and D. You can access these registers by saying %eax, %ebx, %ecx, and %edx, respectively. These are the 4 Byte registers. However, we can operate on several different sizes of data. 4 byte, 2 byte, and 1 byte. The 2 Byte registers are similarly accessed using %ax, %bx, %cx, and %dx. These are the lower 16 bits of the corresponding 32 bit registers. They are NOT separate. Each of those 16 bit registers can be thought of as a pair of 8 bit registers. So %ax is split into %ah and %al (a, high and low bytes), %bx into %bh, %bl, then you have %ch, %cl, %dh, %dl similarly. These are NOT different registers, they just allow you to access smaller chunks of the larger registers. So, for instance, "addb $2, %al" is an 8 bit instruction, and "addw $2, %ax" is a 16 bit instruction.
Hopefully, by now you can look at the C code and Assembly equivalents above and figure out how they are equivalent. Note, $10 denotes a constant number 10.
Multiplication and division get a little bit more complicated, however. If you take two 32 bit numbers and multiply them, what do you get? A 64 bit number. So you would need two registers to store the result. The multiplication operand well be using is "mul". This takes ONE argument. That argument is multiplied against the A register (its ALWAYS the A register). The result is placed in the A register (and if the number is big enough, the D register). Two 32 bit numbers being multiplied together will put the lower 32 bits of the result into the A register and the upper 32 bits of the result into the D register. To multiply two values, you must first move one of them into the A register. For an 8bit multiplication, the result goes in %ah:%al. For 16 bits, %dx:%ax. 32, %edx:%eax.
Formulas:
mulb X8bit ? %ax = %al * X 8bit
mulw X16bit ? %dx:%ax = %ax * X 16bit
mull X32bit ? %edx:%eax = %eax * X 32bit
Those formulas show the results of each possible operand. For instance, if you use "mull some32bitnumber", the result gets put in the concatenation of %edx:%eax, and that result is %eax * that32bitnumber.
Divide is similar. It takes one operand, and the source and destination registers are chosen automatically. Again, the operation is performed against the A register. However, it also pulls in the D register. So for a 32 bit division, youre actually dividing the 64bit %edx:%eax by a 32 bit number. The 32 bit result gets placed in %eax, and the remainder gets put in %edx. Note, before dividing, you need to zero the second register (D), unless you have something there that you want. If youre storing a totally different value in the D register, it could throw off your division.
Division formulas:
divb X8bit ?%al = %ax / X 8bit , %ah = remainder
divw X16bit ?%ax = %dx:%ax / X 16bit , %dx = remainder
divl X32bit ?%eax = %edx:%eax / X 32bit , %edx = remainder
Example use of formula: If you use the command "divl some32bitnumber", the result that is placed in the register %eax is the concatenation %edx:%eax divided by that32bitnumber. %edx gets the remainder of the division. Note, the remainder is how you do modulus.
Lab3:
Lab 2 introduces simple assignments, variables, registers, and instructions. The eventual goal here is to be able to implement C commands such as "a = ((b + c) - (d + e)) - 10;" in Assembly. For each program youll be working on, you will be given a piece of C driver code, and you will have to implement an Assembly function that implements the a C function described in the assignment.
In most computer programs, you deal with memory locations that store data (variables) and instructions that manipulate that data. In Assembly, you also deal with registers. You can store values in registers, as well as use registers in computations. Registers are fast, temporary variables located within the microprocessor itself, where normal variables exist in memory. Theres a limited number of registers that can be used, but if those can be used, it is desirable due to speed. You cant just use memory variables either. The 8086 processor limits the use of memory variables to a MAXIMUM of ONE memory variable per computation. Im going to repeat that for emphasis: you cannot use two memory variables in a computation. You must use
AT LEAST one register or a constant.
In Intel 8086 assembly language, you can only do one computation at a time, and you can only use one memory variable per computation. So the command from above, "a = ((b + c) - (d + e)) - 10;" cannot be implemented in a single line. In fact, Assembly is so low level that this takes many lines.
For the processor were using, there are four general purpose registers: A, B, C, and D. That previous line is implemented in Assembly like so:
.comm a, 4
.comm b, 4
.comm c, 4
.comm d, 4
.comm e, 4
.text
movl b,%eax ; move b into register ax
addl c,%eax ; add c to register ax
movl d,%ebx ; move d into register bx
addl e,%ebx &
In most computer programs, you deal with memory locations that store data (variables) and instructions that manipulate that data. In Assembly, you also deal with registers. You can store values in registers, as well as use registers in computations. Registers are fast, temporary variables located within the microprocessor itself, where normal variables exist in memory. Theres a limited number of registers that can be used, but if those can be used, it is desirable due to speed. You cant just use memory variables either. The 8086 processor limits the use of memory variables to a MAXIMUM of ONE memory variable per computation. Im going to repeat that for emphasis: you cannot use two memory variables in a computation. You must use
AT LEAST one register or a constant.
In Intel 8086 assembly language, you can only do one computation at a time, and you can only use one memory variable per computation. So the command from above, "a = ((b + c) - (d + e)) - 10;" cannot be implemented in a single line. In fact, Assembly is so low level that this takes many lines.
For the processor were using, there are four general purpose registers: A, B, C, and D. That previous line is implemented in Assembly like so:
.comm a, 4
.comm b, 4
.comm c, 4
.comm d, 4
.comm e, 4
.text
movl b,%eax ; move b into register ax
addl c,%eax ; add c to register ax
movl d,%ebx ; move d into register bx
addl e,%ebx &