X86 Assembly - Tutorial 3
Controling the Flow
We break programs in logical structures know as functions.Function is logical set of instructions that tansforms data in very particular way. This logical separation is really great form of abstraction building.This construct in programing language is called function. The way we utilize this function is by calling it.We will see in detail what it means to call function.We may want to call function based on some condition.The language construct used for this is branches.Also, transforming data sometime means performing repeatative operations on data.The construct we use for this is called loops.All of this constructs are implemented with help of compilers.I hope at the end of chapter you will have better understanding about workings of branch, loop, function.
Note : Please read Part I and Part II of this tutorial series.
How Program Is Executed?
Processor has special purpose register called as Instruction Pointer that holds the address of the next instruction to execute. The nature of this operation looks very sequential. This was reason why processors were slow. Later inventions of Instruction Prefetch Cache, Out Of Order Execution Engine and Retirement Unit were made to speed up the CPU operations.
Many instructions ahead of current instruction pointer location are prefeched in Instruction Prefetch Cache.Out Of Order Execution Engine takes these instructions and starts executing them and the results of this execution is then placed in Retirement Unit.The Retirement Unit will not execute the result until its time to do so in program logic.So, instruction pointer reached on instruction then, Out Of Order Execution Engine might have already placed the result of it in Retirement Unit. The Retirement Unit then will execute the result of this instruction which was placed by Out Of Order Execution Engine. Once the Retirement Unit executes this result then and then only we say the instruction is executed and instruction pointer will be increamented.This explaination describes what actually happens in processors. For simplicity we will consider that the instruction pointer behaves in sequential manner and processor execute instructions sequentially.
EIP is instruction pointer for processor. As a programmer we cannot directly change the content of EIP register. But, there are instructions who has side effects and by executing these instructions we can indirectly manipulate EIP. Manipulating the EIP is how we alter or control the flow of program. So, Lets dive deep to see these instructions.
Unconditional Branches
The place in logic where we manipulate EIP creates branch in flow of program.Unconditional Branches are used when we want to change the flow or branch out without any condition.Example of this could be function call.Lets see the instructions used to do this
• Jump
This instruction is similar to goto statement for C. It will help the control to go to certain location. When below instruction is executed then EIP is updated to location(address mentioned with JUMP).
jmp location - location is memory address where the jump will be made.
There are multiple types of jumps. Based of on the distance between the current EIP location and jump to location, one of the following jumps should be used.
Short Jump - Used when jump offset is less than 128 bytes.
Far Jump - Used when location of jump is in another segment.
Near Jump - In all places where Short and Far jump is not applied.
GAS will take care of converting to proper jump if you just use mnemonic jmp.But, beware this will have performance implications. In next parts we will see how this might affect the performance.Lets see small example
.section .text
.globl _start
_start:
nop
movl $1, %eax
jmp label
movl $3, %ebx
label:
int $0x80
We are loading value 1 in EAX and jump to make sys_exit() linux system call. We have some instructions after the call to jmp but those will not be executed. Now lets assemble this code and use one more tool called as Objdump to examine the object file generated. Lets examine the output generated by Objdump.Specifically see the address next to jump instruction. Observe for jump you mentioned the label and it is replaced by the address of label. Label is just place holder name given to refer to address of instruction we want to point to. Commands to generate following output are as below
as –32 -o Jump.o Jump.s
objdump -m i386 -D Jump.o
Jump.o: file format elf32-i386
Disassembly of section .text:
00000000 <_start>:
0: 90 nop
1: b8 01 00 00 00 mov $0x1,%eax
6: eb 05 jmp d <label>
8: bb 03 00 00 00 mov $0x3,%ebx
0000000d <label>:
d: cd 80 int $0x80
Note : As explained in previour tutorials x86 systems uses segmented memory model.Hence, we saw different types of jmp.
Please read more about Objdump it is very useful tool. Best way to learn more about them is use man pages.
Addresses you are seeing in above code segment are called Translation Time Addresses.Compiler choses its own origin for
starting addresses called as Translated Origin.Translation Unit is final input to Compiler/Assembler after
preprocessing and other steps.Linker will chose its own address origin called as Linked Origin and addresses
generated by linker are called Linked Time Addresses.Similary Loader will have Load Origin and Load Time Addresses.
These are virtual addresses.When your program is loaded for execution the physical addresses are allocated and virtual address will be mapped to it.
• Calls
CALL instruction is similar to JUMP.The only difference is CALL remembers from where it jumped and if needed it can return.
call address - Address here is label that will be converted to address of first instruction in function.
CALL instruction is used with RET statement.When RET is executed at end of function the EIP will be set to address just after place of CALL instruction. When we execute CALL the current value of EIP and all parameters are pushed on stack.After this is done then EIP is updated to start of function.
Lets look at below program.
.section .data
output:
.asciz "This is %d"
.section .text
.globl _start
_start:
pushl $1 ----------------------------------> Push 1 on stack. (4 bytes)
pushl $output ----------------------------------> Push the address of string output. (4 bytes)
call printf
add $8, %esp ----------------------------------> Clear parameters from stack. (4 bytes + 4 bytes)
call overhere
pushl $2 ----------------------------------> Push 2 on stack (4 bytes).
pushl $output ----------------------------------> Push the address of string output (4 bytes).
call printf
add $8, %esp ----------------------------------> Clear parameters from stack. (4 bytes + 4 bytes)
overhere:
pushl %ebp ----------------------------------> Store current value in EBP.
movl %esp, %ebp ----------------------------------> Store current value of ESP to EBP. We need to restore stack pointer before RET is executed.
pushl $3
pushl $output
call printf
add $8, %esp
movl %ebp, %esp ----------------------------------> Restore the value of ESP as it was before the CALL was made.
popl %ebp ----------------------------------> Restore EBP value.
ret ----------------------------------> RET will make execution continue and print value 2.
Above program will print 1 3 2.Please find the explaination next to program in above section.
• Interrupts
INT instruction is used to issue interrupt and processor will take control to ISR. We saw explaination for interrupts and ISR in Part I of this tutorial series. Hence, we will not discuss much about this instruction here.
Conditional Branches
When ever we want to transfer the flow based on some condition we use these types of instructions.The conditional statement is executed and as side effect EFLAGS is changed. Based on the status of EFLAGS conditional jump instruction will be executed.These types of instruction consults below 5 flags
Carry flag (CF) - bit 0 (lease significant bit)
Overflow flag (OF) - bit 11
Parity flag (PF) - bit 2
Sign flag (SF) - bit 7
Zero flag (ZF) - bit 6
Please see example and explaination for conditional branches in section Indexed Memory Addressing Mode of Part II of the series.
Branch Prediction
Lets see how If statement in C program is implemented by assembler. we will write simple C program given below to implement If and Else.
include<stdio.h>
int main()
{
int a = 100;
int b = 50;
if(a>b)
{
printf("A is greater than B");
}
else
{
printf("B is greater than A");
}
return 0;
}
After assembling we see following output
.file "Branching.c"
.text
.section .rodata
.LC0:
.string "A is greater than B"
.LC1:
.string "B is greater than A"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
endbr32
leal 4(%esp), %ecx
.cfi_def_cfa 1, 0
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
.cfi_escape 0x10,0x5,0x2,0x75,0
movl %esp, %ebp ------------------------> Store current stack pointer to EBP so we can manipulate ESP freely and restore it later.
pushl %ebx
pushl %ecx
.cfi_escape 0xf,0x3,0x75,0x78,0x6
.cfi_escape 0x10,0x3,0x2,0x75,0x7c
subl $16, %esp
call __x86.get_pc_thunk.ax
addl $_GLOBAL_OFFSET_TABLE_, %eax
movl $100, -16(%ebp) ------------------------> -16(%ebp) represents variable a.
movl $50, -12(%ebp) ------------------------> -12(%ebp) represents variable b.
movl -16(%ebp), %edx ------------------------> Move a to EDX.
cmpl -12(%ebp), %edx ------------------------> Compare b with a.
jle .L2 ------------------------> Jump to else part of b <= a. (Find more explaination in section below)
subl $12, %esp ------------------------> Continue with If part.
leal .LC0@GOTOFF(%eax), %edx
pushl %edx
movl %eax, %ebx
call printf@PLT
addl $16, %esp
jmp .L3
.L2:
subl $12, %esp -----------------------> Else part.
leal .LC1@GOTOFF(%eax), %edx
pushl %edx
movl %eax, %ebx
call printf@PLT
addl $16, %esp
.L3:
movl $0, %eax
leal -8(%ebp), %esp
popl %ecx
.cfi_restore 1
.cfi_def_cfa 1, 0
popl %ebx
.cfi_restore 3
popl %ebp
.cfi_restore 5
leal -4(%ecx), %esp
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
We will look at only interesting details.If we observe the flow, compiler has reverse condition than what we have in program.So It might look like reverse of If-Else block we have in program. This is done for some specific purpose.To understand this we will need to look at Branch Prediction Unit.Out Of Order Unit must predict next instructions to be executed but, when we have branches in program this task becomes little bit complicated.Also the original purpose of adding Out Of Order Unit is to optimize the processor but, what if it predicts the branch in wrong way?.This will actually affect the performance a lot.So, Out Of Order Unit take help of Branch Prediction Unit.This unit has some basic rules on how to evaluate the branches in program path.
Backward Branches are always taken - Loop is example of backward branch where we will go back to start of loop to execute instructions those might be previously executed.
Forward Branches are never taken - More details in section below.
Braches previously taken are taken again
Now lets examine our program again. Please see instructions (cmpl -12(%ebp), %edx) and (jle .L2).Compiler changed the code because it saw that If-Then part is likely to be taken so, it changed the logic such that, If-Else part becomes the forward branch. So Branch Prediction Unit will predict this branch will not be executed and hence instead of If-Else part If-Then code will be prefeched and we will get desired performance.So, in short placing code that is most likely to be taken as the fall-through of the JUMP forward statement increases the likelihood that it will be in the instruction prefetch cache when needed.
Note : General Optimization tips you will find online is to avoid branches all together or unroll loops.I would urge you to profile first when ever you are trying to optimize.If profiler shows certain branch is very slow then try to see the assembly and try to figure out what is happening.Same follows for loop unrolling.Most of the platforms provide Conditional Move to implement branching which are not that costly.Lets say, if Conditional Move instruction was not present then, we might have used JUMP and implmented conditions.Thi might have affected the performance.When it comes to loop unrolling we have to be careful.If we unroll loop, it might expand in lots of instruction.As result of this we might end up with no space in instruction prefectch cache and it will again affect the performance.
Loops
We have now idea how loops might have implemented using the JUMP instructions.But, lets see how assembler generates the loops. We will use following simple C program for that.
.file "Branching.c"
.text
.section .rodata
.LC0:
.string "Hello : %d"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
endbr32
leal 4(%esp), %ecx
.cfi_def_cfa 1, 0
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
.cfi_escape 0x10,0x5,0x2,0x75,0
movl %esp, %ebp
pushl %ebx
pushl %ecx
.cfi_escape 0xf,0x3,0x75,0x78,0x6
.cfi_escape 0x10,0x3,0x2,0x75,0x7c
subl $16, %esp
call __x86.get_pc_thunk.bx
addl $_GLOBAL_OFFSET_TABLE_, %ebx
movl $0, -12(%ebp) --------------------------------> -12(%ebp) represent i. Init it to 0.
jmp .L2
.L3:
subl $8, %esp
pushl -12(%ebp) ---------------------------------> Push i on stack.
leal .LC0@GOTOFF(%ebx), %eax
pushl %eax
call printf@PLT
addl $16, %esp
addl $1, -12(%ebp) --------------------------------> Increment the i by 1.
.L2:
cmpl $9, -12(%ebp) --------------------------------> Do comparison i <= 9
jle .L3 --------------------------------> If i <=9 then jump back to start of loop label is L3.
movl $0, %eax
leal -8(%ebp), %esp
popl %ecx
.cfi_restore 1
.cfi_def_cfa 1, 0
popl %ebx
.cfi_restore 3
popl %ebp
.cfi_restore 5
leal -4(%ecx), %esp
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size main, .-main
.section .text.__x86.get_pc_thunk.bx,"axG",@progbits,__x86.get_pc_thunk.bx,comdat
.globl __x86.get_pc_thunk.bx
.hidden __x86.get_pc_thunk.bx
.type __x86.get_pc_thunk.bx, @function
__x86.get_pc_thunk.bx:
.LFB1:
.cfi_startproc
movl (%esp), %ebx
ret
.cfi_endproc
.LFE1:
.ident "GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
.section .note.GNU-stack,"",@progbits
.section .note.gnu.property,"a"
.align 4
.long 1f - 0f
.long 4f - 1f
.long 5
0:
.string "GNU"
1:
.align 4
.long 0xc0000002
.long 3f - 2f
2:
.long 0x3
3:
.align 4
4:
Please see the comment section in above code segment to understand part where loop is implemented. So, with this we have now covered loops, branches and functions.