Why Assembly?

Assembly language is really fun. You will get a tour of some very exotic corners of programming and learn how things work under the hood. Not many of us will program in assembly in our day-to-day jobs, but there are still compelling reasons to learn it. Most of the optimizations for the programs we write in C/C++ are left to the compiler. Compilers are excellent these days, yet they are general purpose tools: they have to make assumptions and may not make the best guess about what we are trying to do. It is always beneficial to look at the assembly the compiler generates for your program. This habit of peeking under the hood can make a lot of difference in the long run. Beyond that, it is good to know how things are actually executed, for example the available addressing modes, calling conventions, stack manipulation and so on. If you are interested in reverse engineering, then assembly is a must-have tool in your arsenal. So without further ado, let's get started.

Note : We start with x86 (32 bit) assembly to get a better understanding of segmented addressing, and then move on to x64.

What is Assembly?

Assembly language is an abstraction layer over machine code instructions. It provides human-readable mnemonics for the machine's opcodes. A special program called an assembler takes the assembly program and converts it into machine code.

Simplest Program Execution Model

The image below depicts the simplest representation of the von Neumann (stored program) architecture. This architecture proposed that the instructions that operate on data reside in the same memory as the data.

Architecture

  • The control unit uses the control bus to fetch an instruction. It then decodes the instruction into micro-code. Micro-code is the low-level code that drives the signals at the gate level.
  • The decoded instruction is then passed to the arithmetic/execution unit. This unit uses the data bus to fetch the data and store it in registers.
  • By referring to the Instruction Pointer (IP), the control and execution units know the address of the current instruction or data.
  • This address is sent on the address bus to tell RAM which location we are interested in, and the data bus and control bus are then used to fetch the actual data or instruction.
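
The fetch, decode and execute cycle described in these points can be sketched in C with a toy machine. This is only an illustration of the stored program idea: the opcodes, the ToyCpu type and toy_run are hypothetical names made up for this sketch and have nothing to do with real x86 encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes; these are NOT real x86 encodings. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

/* One memory array holds both instructions and data (stored program). */
typedef struct {
    uint8_t mem[256];
    uint8_t ip;   /* instruction pointer: address of the next instruction */
    uint8_t acc;  /* a single accumulator register */
} ToyCpu;

/* Fetch, decode and execute until OP_HALT; returns the accumulator. */
uint8_t toy_run(ToyCpu *cpu)
{
    assert(cpu != NULL);
    for (;;) {
        uint8_t opcode = cpu->mem[cpu->ip++];        /* fetch the opcode */
        if (opcode == OP_HALT)
            return cpu->acc;
        uint8_t operand = cpu->mem[cpu->ip++];       /* fetch the operand address */
        switch (opcode) {                            /* decode and execute */
        case OP_LOAD:  cpu->acc = cpu->mem[operand]; break;
        case OP_ADD:   cpu->acc = (uint8_t)(cpu->acc + cpu->mem[operand]); break;
        case OP_STORE: cpu->mem[operand] = cpu->acc; break;
        default:       return cpu->acc;              /* unknown opcode: stop */
        }
    }
}
```

Loading the bytes LOAD 100, ADD 101, STORE 102, HALT at address 0 and placing 2 and 3 at addresses 100 and 101 leaves 5 in the accumulator and at address 102. The program and its data sit in the same array, which is the point of the architecture.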

At the end of this tutorial I will add one more section explaining the control unit components in more detail. Out-of-order execution, branch prediction, the retirement unit and many other components are left out of this section for simplicity. Cache is another kind of memory that is omitted from the figure above for the same reason and will be explained later in separate sections.

Types of Registers

We now understand that RAM is where data resides, but RAM has much higher latency than the CPU. So the CPU has on-chip memory known as registers. Without registers, every time the CPU needed to read the same data it would have to wait for RAM to complete the operation and return the data. This would waste CPU cycles, which is why registers are important.

• General Purpose

  EAX - Accumulator for the operands and result data
  EBX - Pointer to data in data segment
  ECX - Counter for string and loop operations
  EDX - I/O pointer 
  EDI - Data pointer for destination in string operation
  ESI - Data pointer for source in string operation
  ESP - Stack pointer
  EBP - Pointer to data on the stack (base pointer)

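Note : Each of the registers listed above is 32 bits wide. We can also access the lower 16 bits of each one (AX, BX, ...), and for EAX, EBX, ECX and EDX the two lower bytes individually (for example AL and AH). This is done for backward compatibility. Please refer to the register-internals image below.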
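
To make the 8 bit and 16 bit views concrete, here is a small C sketch that extracts the parts of a 32 bit value the same way AX, AL and AH relate to EAX. The function names are hypothetical helpers for this illustration; on the real CPU no masking is needed because the sub-registers are addressed directly.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 0..15 of a 32 bit register value (what AX is to EAX). */
uint16_t reg_low16(uint32_t reg) { return (uint16_t)(reg & 0xFFFFu); }

/* Bits 0..7 (what AL is to EAX). */
uint8_t reg_low8(uint32_t reg) { return (uint8_t)(reg & 0xFFu); }

/* Bits 8..15 (what AH is to EAX). */
uint8_t reg_high8(uint32_t reg) { return (uint8_t)((reg >> 8) & 0xFFu); }

/* Writing AL replaces only the lowest byte; the upper 24 bits are kept. */
uint32_t reg_set_low8(uint32_t reg, uint8_t value)
{
    return (reg & 0xFFFFFF00u) | value;
}

/* Self-check using EAX = 0x12345678 as the example value. */
void subregister_demo(void)
{
    uint32_t eax = 0x12345678u;
    assert(reg_low16(eax) == 0x5678u);           /* AX */
    assert(reg_low8(eax) == 0x78u);              /* AL */
    assert(reg_high8(eax) == 0x56u);             /* AH */
    assert(reg_set_low8(eax, 0xFFu) == 0x123456FFu);
}
```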

• Segment Registers

  Most of the platforms we use have a segmented memory model.
  Each program, when launched, is referred to as a process, and the process is divided into
  separate segments for data, instructions and the stack. These segments are memory areas in RAM.
  Every location has a logical address. The logical address contains the segment address and the
  offset inside that segment of a particular instruction or piece of data. Segment registers
  identify the start of specific segments.

  CS - Code Segment 
  DS - Data Segment
  SS - Stack Segment
  ES/FS/GS - Extra segment pointer

Note : Whenever a process is loaded, the start addresses of its segments are loaded into these registers. This is part of context initialization for the process.
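
The segment plus offset idea can be sketched in C as follows. This is a deliberately simplified model: the Segment type and logical_to_linear are hypothetical names, and the size check is only a stand-in for the protection the hardware provides. In 32 bit protected mode the segment register actually holds a selector that the CPU uses to look up the base address in a descriptor table, and Linux sets up flat segments whose base is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a segment: a start address and a size. */
typedef struct {
    uint32_t base;   /* start address of the segment */
    uint32_t size;   /* size of the segment in bytes */
} Segment;

/* Translate segment:offset into a linear address.
 * Returns 0 on success, -1 if the offset lies outside the segment. */
int logical_to_linear(const Segment *seg, uint32_t offset, uint32_t *linear)
{
    assert(seg != NULL && linear != NULL);
    if (offset >= seg->size)
        return -1;                 /* real hardware would raise a fault here */
    *linear = seg->base + offset;  /* linear address = segment base + offset */
    return 0;
}
```

For a data segment starting at 0x08049000, offset 0x10 resolves to 0x08049010; an offset past the end of the segment is rejected.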

• Instruction Pointer Register

  EIP points to the next instruction to be executed in the code segment. A program
  cannot modify the address in this register directly, but it can control
  it as a side effect of executing instructions like jump.

• Control Registers 

  These registers hold values that determine the operating mode of the processor and report some of its status.
  The operating system sets them to enable features such as paging, and the processor updates some of them itself,
  for example to record which address caused a page fault.
  There are multiple control registers; I am listing just the important ones here.

  CR0 - Controls the state of the processor
  CR2 - Page fault information
  CR4 - Flags enabling CPU features

Note : Control registers cannot be loaded with a value directly. The value has to be moved through a general purpose register.

• Flags

  Flags are the mechanism by which a program can determine the outcome of
  an executed operation.

  Status flags   - describe the result of a mathematical operation (carry, parity, ...)
  Control flags  - control some processor behavior (string processing direction - DF)
  System flags   - control operating system level operations (traps, interrupts, I/O privileges, ...)
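
As a rough illustration of what the status flags report, the C sketch below performs a 32 bit addition and derives the carry, zero, sign and parity flags from the result. The StatusFlags type and add_with_flags are hypothetical names for this example; the CPU sets these bits in hardware as part of the add instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool carry;   /* CF: the unsigned result did not fit in 32 bits */
    bool zero;    /* ZF: the result is zero */
    bool sign;    /* SF: the most significant bit of the result is set */
    bool parity;  /* PF: the low byte of the result has an even number of 1 bits */
} StatusFlags;

/* Add two 32 bit values and report the status flags for the result. */
uint32_t add_with_flags(uint32_t a, uint32_t b, StatusFlags *flags)
{
    assert(flags != NULL);
    uint32_t result = a + b;                 /* wraps around modulo 2^32 */
    int ones = 0;
    for (int bit = 0; bit < 8; bit++)        /* count 1 bits in the low byte */
        ones += (int)((result >> bit) & 1u);

    flags->carry  = result < a;              /* wrap-around means a carry out */
    flags->zero   = (result == 0);
    flags->sign   = (result >> 31) != 0;
    flags->parity = (ones % 2) == 0;
    return result;
}
```

Adding 1 to 0xFFFFFFFF wraps to 0, so both the carry and zero flags are reported; a conditional jump in assembly tests exactly this kind of flag.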

Tools Used

We will be using the GNU toolchain for this tutorial.

  • gcc : GNU Compiler Collection
  • as : GNU assembler (also known as GAS)
  • ld : GNU linker
  • gdb : GNU debugger (we will be using kdbg for the GUI)
  • gprof : GNU profiler
  • objdump : Displays information from object files
  • kdbg : UI front end for gdb

I am using Ubuntu installed on WSL (Windows Subsystem for Linux). Please make sure your system has these tools installed. If it doesn’t, please follow the “Installation” section for details.

Note : I will include special instructions for using UI applications in a WSL setup.

Structure of Assembly Program

Program Structure

• Data Section : This section contains data that is initialized.

• Bss Section : Uninitialized data declarations go here.

• Text Section : Instructions and the logic that manipulates data go here.
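
The same three sections show up when a C program is compiled, which can help build intuition before writing them by hand. The sketch below notes where a typical GCC build for Linux places each item; the variable and function names are made up for this example, and the exact placement can vary with compiler version and options.

```c
#include <assert.h>

int initialized_counter = 42;   /* initialized data: typically placed in .data */
int uninitialized_buffer[16];   /* no initializer: typically placed in .bss */

/* The machine code for functions is placed in .text. */
int sum_buffer(void)
{
    int total = initialized_counter;
    for (int i = 0; i < 16; i++)
        total += uninitialized_buffer[i];
    return total;
}

/* .bss takes no space in the file; it is zero filled when the program loads. */
void check_bss_is_zeroed(void)
{
    for (int i = 0; i < 16; i++)
        assert(uninitialized_buffer[i] == 0);
}
```

Compiling this with gcc -c and running objdump -t on the resulting object file should list each symbol together with the section it was assigned to.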

Hello World Assembly

For fun, we start with a hello world program in assembly. An explanation of each part follows the listing.

.section .data
output:
        .ascii "Hello World..."
.section .text 
.globl _start
_start:
        movl $4, %eax 
        movl $1, %ebx
        movl $output, %ecx
        movl $14, %edx
        int $0x80
        movl $1, %eax
        movl $0, %ebx
        int $0x80

Let's break down each part of the above program to understand it in more detail.

.section .data   
output:
        .ascii "Hello World..."

The data section starts here. We declare a label named output that refers to the string "Hello World..." stored in this section.

.section .text
.globl _start    

_start is the entry point of the program, much like main in C. It is a special label: by default the linker (ld) uses the address of _start as the entry point of the executable. The .globl directive tells the assembler to export this symbol so that it is visible outside this file. Without it the linker cannot find _start, hence it is important to export it.

_start:
        movl $4, %eax   
        movl $1, %ebx           
        movl $output, %ecx       
        movl $14, %edx
        int $0x80                  

We want to print the value to the console, so we use a system call. Each Linux system call is given a unique number. We use 4, which is the sys_write() system call on 32 bit x86. We want to write this string to stdout. Linux treats everything as a file, and 1 is the file descriptor for stdout. So we fill the registers with the following values:

EAX - system call number

EBX - file descriptor number

ECX - address of the string

EDX - length of the string in bytes

After loading the proper values into the registers we issue a software interrupt using the int instruction. There are software interrupts and hardware interrupts. Hardware interrupts have higher priority than software interrupts, and even among software interrupts there are priorities.

When a process issues a software interrupt, the CPU handles it as soon as the int instruction executes. It refers to the interrupt table (interrupt vector table), which holds the address of the interrupt service routine (ISR) corresponding to the interrupt received.

Before it starts executing the ISR, the CPU stores the current context (the register values of the process currently being executed) on the interrupt stack and clears the interrupt source entry. It then runs the ISR to completion and restores the context as it was before the ISR started.

There is a kernel mode stack allocated per process. An ISR does not get its own stack; it uses the kernel mode stack of the process that invoked it. Please note that each process has a user mode stack and a kernel mode stack, allocated by the kernel as part of the per-process data structures.

This ISR reads the value in EAX and calls the sys_write() system call, passing it the values from EBX, ECX and EDX as arguments, and hence we get our string printed on the console.
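
For comparison, the same request can be made from C through the syscall() wrapper that glibc provides on Linux. This is a minimal sketch, assuming a Linux system with glibc; raw_write_stdout is a hypothetical helper name. The wrapper loads the registers and enters the kernel for us, so the code only supplies the system call number and its three arguments.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Write len bytes of text to stdout with the raw write system call.
 * Returns the number of bytes written, or -1 on error. */
long raw_write_stdout(const char *text, unsigned long len)
{
    assert(text != NULL);
    /* SYS_write expands to the number for the target being compiled:
     * 4 on 32 bit x86 (the value loaded into EAX above), 1 on x86-64. */
    return syscall(SYS_write, 1 /* stdout */, text, len);
}
```

Calling raw_write_stdout("Hello World...", 14) should print the string and return 14, mirroring the four movl instructions followed by int $0x80.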

Note : Please refer to the Miscellaneous section for some details on how hardware interrupts are handled.

movl $1, %eax
movl $0, %ebx   
int $0x80

In this part we simply call the sys_exit() system call (number 1) with an exit status of 0 to end the execution of our program.

Compiling and Running

This is the most exciting part (if you don't get any errors!).

Use the following command to build the object file

as --32 -o HelloWorld.o HelloWorld.s

Use the following command to link the object file and get the executable

ld -m elf_i386 -o HelloWorld HelloWorld.o

Use the following command to run it

./HelloWorld

Note : Please note the options --32 and -m elf_i386, which are used to build in 32 bit mode. On most modern systems gcc, as and ld target 64 bit by default, so you have to tell these tools explicitly to generate 32 bit output.

Debugging HelloWorld

We need debug symbols embedded in our executable in order to debug it. Hence, we use the following commands so that the assembler embeds those symbols in the object file.

as --32 -gstabs -o HelloWorld.o HelloWorld.s

ld -m elf_i386 -o HelloWorld HelloWorld.o

As you remember, we are using kdbg to debug, so run the following command

kdbg HelloWorld

You will see the window below. As you set breakpoints and step through the code you can inspect the values in registers and memory locations. In the image below, the bottom-most window on the left shows the values of the different registers.

Debugger View

Using GCC to inspect the generated assembly

Let's write the HelloWorld program in C, look at the generated assembly and compare it with what we have written.

#include <stdio.h>
int main() {

    printf("Hello World!!");
    return 0;
}

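Run the command

gcc -m32 -S HelloWorld.c

This command will generate the file HelloWorld.s.

You will see code like the listing below. The listing is shown in Intel syntax, which gcc emits when -masm=intel is added to the command; by default gcc emits AT&T syntax like the rest of this tutorial. The exact instructions can differ between gcc versions.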

.LC0:
        .string "Hello World!!"
main:
        lea     ecx, [esp+4]
        and     esp, -16                    # Align the stack pointer to a 16 byte boundary.
        push    DWORD PTR [ecx-4]
        push    ebp
        mov     ebp, esp
        push    ecx                         # Save ecx, which points at the original stack position, so it can be restored before returning.
        sub     esp, 4
        sub     esp, 12                     # Reserve padding so the stack stays aligned at the call. Note that sub is used to grow the stack.
        push    OFFSET FLAT:.LC0            # Push the address of the string as the parameter for printf.
        call    printf                      # Call to printf.
        add     esp, 16                     # Restore the stack pointer. On x86 CPUs the stack grows downwards, so every push
        mov     eax, 0                      # decrements the stack pointer. To restore the stack we add back the 12 bytes of
        mov     ecx, DWORD PTR [ebp-4]      # padding plus the 4 byte parameter that were placed on it.
        leave
        lea     esp, [ecx-4]
        ret

The major difference you will observe here is the direct call to the printf function from LibC. We can also use LibC functions from our own assembly code by linking against LibC.
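
The stack arithmetic around the call to printf in the listing can be modelled in C with a small byte array and an offset that plays the role of ESP. This is only a sketch to show the direction of growth: ToyStack and its functions are hypothetical names, and a real stack lives in the process address space rather than in a 64 byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u

/* Hypothetical model: a byte array as stack memory, an offset as ESP. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;   /* offset of the top of the stack */
} ToyStack;

/* An empty stack starts at the highest address. */
void stack_init(ToyStack *s)
{
    assert(s != NULL);
    s->esp = STACK_BYTES;
}

/* "sub esp, N": reserve N bytes by moving ESP down. */
void stack_reserve(ToyStack *s, uint32_t bytes)
{
    assert(s != NULL && bytes <= s->esp);
    s->esp -= bytes;
}

/* "push value": decrement ESP by 4, then store the value at the new top. */
void stack_push(ToyStack *s, uint32_t value)
{
    stack_reserve(s, 4);
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* "add esp, N": the caller discards N bytes by moving ESP back up. */
void stack_release(ToyStack *s, uint32_t bytes)
{
    assert(s != NULL && s->esp + bytes <= STACK_BYTES);
    s->esp += bytes;
}
```

Reserving 12 bytes, pushing one 4 byte parameter and then releasing 16 bytes brings the offset back to where it started, which is the same bookkeeping as sub esp, 12 / push / add esp, 16 in the listing.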

Using LibC function in assembly

We write our own version like below

.section .data
output:
        .asciz "Hello World..."            # The .asciz directive emits a null terminated string. LibC functions need a null terminated string.
.section .text
.globl _start
_start:
        pushl $output
        call printf
        addl $4, %esp
        pushl $0
        call exit

Use the following commands to assemble and then link dynamically against LibC.

as --32 -o HelloWorld.o HelloWorld.s

ld -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -o HelloWorld -lc HelloWorld.o

Note : Please make sure you have the 32 bit version of LibC installed. If you don't have it, please use the following command to install it.

sudo apt-get install build-essential libc6-dev-i386

Now you are set to use any function from LibC in your assembly code.

Installation

Installing GNU Tool Chain

If your Ubuntu installation doesn't have the development tools mentioned in the section above, you can install them. Tools like as, ld, objdump and gprof are part of the binutils package, which can be installed using the following commands

sudo apt-get update -y

sudo apt-get install -y binutils

Installing GCC

You will have to install gcc separately by installing the build-essential package

sudo apt-get install -y build-essential

Installing kdbg

Below are the steps to install kdbg on WSL and to use it with vcxsrv (an X server).

  • sudo apt install extra-cmake-modules
  • git clone -b maint https://github.com/j6t/kdbg.git
  • cd kdbg
  • git tag -l
  • git checkout kdbg-3.0.1
  • sudo apt install libkf5iconthemes-dev libkf5xmlgui-dev
  • sudo apt-get install -y gettext
  • cmake .
  • sudo make install

After following the above steps kdbg is now installed on your system.

Using kdbg on WSL

To use applications with a GUI on WSL follow the steps below.

  • On Windows (host side) install the X server vcxsrv.

  • Launch it with the default options. (Just make sure to check the box saying “Disable Access Control”, or put -ac on the command line, to allow public access.)

  • On the WSL command prompt execute the following command, which sets the display number (you can also put this command in your .bashrc if you don't want to run it every time)

export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0

  • Now you are good to go!

After following the above steps, whenever you launch kdbg you will see the GUI and can debug the application.

Miscellaneous

Hardware Interrupt Handling

Each process is given a time slice on the CPU, for example in round robin fashion. When the CPU is executing a process and its time slice finishes, the clock sends a hardware interrupt to the CPU. The clock interrupt has the highest priority, so the CPU starts executing its ISR. This ISR saves the context of the current process and loads the context of the next scheduled process. This procedure is generally called a context switch.

At the end of each instruction the CPU checks the interrupt queue to see whether a high priority interrupt is pending, and if so it starts handling it.
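
The check that happens after each instruction can be sketched in C as a table of handlers plus a set of pending bits. Everything here is a hypothetical model for illustration: the ToyInterruptController type, the eight vectors and the rule that a lower vector number means higher priority are assumptions of this sketch, not a description of a specific interrupt controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_VECTORS 8

typedef void (*IsrHandler)(void);

typedef struct {
    IsrHandler table[NUM_VECTORS];  /* interrupt vector table: one ISR per vector */
    uint8_t    pending;             /* bit n set means vector n is waiting */
} ToyInterruptController;

/* Example ISR for the clock: it only counts ticks in this sketch. */
int timer_ticks = 0;
void timer_isr(void) { timer_ticks++; }

/* Called after each instruction. Runs the highest priority pending ISR
 * (assumption: lower vector number = higher priority) and returns its
 * vector, or -1 if nothing is pending. */
int service_highest_priority(ToyInterruptController *ic)
{
    assert(ic != NULL);
    for (int vector = 0; vector < NUM_VECTORS; vector++) {
        uint8_t mask = (uint8_t)(1u << vector);
        if (ic->pending & mask) {
            ic->pending &= (uint8_t)~mask;   /* clear the interrupt source */
            if (ic->table[vector] != NULL)
                ic->table[vector]();         /* run the ISR from the table */
            return vector;
        }
    }
    return -1;
}
```

With vectors 0 and 2 pending, the first call services vector 0 (the clock in this sketch) and the second call services vector 2, so the higher priority interrupt always wins.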

Big/Little Endian

These are ways to store numbers or data in memory addresses. Let’s use a 16-bit word as an example, 0xABCD in this case. Let’s also assume we are storing this word starting at address 0x4000.

Little Endian :

Store LSB at smallest memory location.

    -------------------------------
   |               |               |  
   |     CD        |       AB      |
    -------------------------------
        0x4000          0x4001  

Big Endian :

Store MSB at smallest memory location.

    -------------------------------
   |               |               |  
   |      AB       |       CD      |
    -------------------------------
        0x4000          0x4001

x86 is little endian. This becomes important when we deal with strings: how we store them and read them back is affected by the byte order.
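
The two layouts in the diagrams can be reproduced in C by writing the bytes of 0xABCD into a buffer explicitly. The function names below are hypothetical helpers for this sketch; host_is_little_endian simply looks at how the machine running the code lays out the same value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little endian: least significant byte at the smallest address. */
void store_le16(uint8_t *dst, uint16_t value)
{
    assert(dst != NULL);
    dst[0] = (uint8_t)(value & 0xFFu);
    dst[1] = (uint8_t)(value >> 8);
}

/* Big endian: most significant byte at the smallest address. */
void store_be16(uint8_t *dst, uint16_t value)
{
    assert(dst != NULL);
    dst[0] = (uint8_t)(value >> 8);
    dst[1] = (uint8_t)(value & 0xFFu);
}

/* Returns 1 if the machine running this code is little endian, else 0. */
int host_is_little_endian(void)
{
    uint16_t probe = 0xABCD;
    const uint8_t *bytes = (const uint8_t *)&probe;
    return bytes[0] == 0xCD;
}
```

Storing 0xABCD with store_le16 gives the bytes CD AB, and store_be16 gives AB CD, matching the two diagrams with the buffer standing in for addresses 0x4000 and 0x4001.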