Primer on program disassembly and intel x86 assembler

An intro to disassembling a C program and using a debugger (in Linux)

First of all, I assume you have some basic knowledge of the C programming language, Linux (or any UNIX-based OS), and using a shell. Knowing how to code in C is usually enough to make you a decent programmer. You don’t usually need to understand all the inner workings of the CPU to make a program run. Ignorance is bliss. But if you are like me and want a bigger picture of what is happening inside your machine, reading this post will give you a better idea. If you want to crack or exploit programs, these are the basic skills you need to master.

You need to realize that C code is meant to be compiled. The code can’t actually do anything until it’s compiled into an executable binary file. Thinking of C source as a program is a common misconception that is exploited by hackers every day. The instructions in a compiled binary such as a.out are written in machine language, an elementary language the CPU can understand. Compilers are designed to translate C code into machine language for a variety of processor architectures. In this case, the processor belongs to a family that uses the x86 architecture. There are also other processor architectures, such as SPARC. Each architecture has a different machine language, so the compiler acts as a middle ground, translating C code into machine language for the target architecture.

We will use the following C code throughout this guide:
Filename: first.c

#include <stdio.h>

int main() {
	int i;
	for (i=0; i<10; i++)
		printf("Hello world!\n"); //print this 10 times
	return 0; //Indicate normal exit to OS
}

To compile and run the program, type the following commands:


roshan@linuxmint ~ $ gcc first.c -o f
roshan@linuxmint ~ $ ./f
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
roshan@linuxmint ~ $


What does this executable binary look like? The GNU development tools include a program called objdump, which can be used to examine compiled binaries. Let’s start by looking at the machine code the main() function was translated into.

The objdump command produces a lot of output that we are not interested in, so we pipe its output to grep, which displays only the 20 lines following the match for the regular expression main.:. Each byte is represented in hexadecimal notation, which is a base-16 numbering system. The hexadecimal numbers on the far left, starting with 80483b4, are memory addresses. The bits of the machine language instructions must be put somewhere, and this somewhere is called memory. Memory is just a collection of bytes of temporary storage space that are numbered with addresses.

The instructions on the far right are in assembly language. Assembly language is really just a collection of mnemonics for the corresponding machine language instructions. The instruction ret is far easier to remember and make sense of than 0xc3 or 11000011. Assembly is just a way for programmers to represent the machine language instructions that are given to the processor.

The assembly shown above is in AT&T syntax, as just about all of Linux’s disassembly tools use this syntax by default. It’s easy to recognize AT&T syntax by the % symbols prefixing register names and the $ symbols prefixing constant values. The same code can be shown in Intel syntax by providing an additional command-line option, -M intel, to objdump, as shown in the output below.

I find Intel syntax clearer and easier to understand than its AT&T counterpart. Most of the assembly operations move memory around, perform some sort of basic math, or interrupt the processor to get it to do something else.

Processors also have their own set of special variables called registers. Most of the instructions use these registers to read or write data, so understanding the registers of a processor is essential to understanding the instructions.

The x86 Processor
The x86 processor has several registers, which are like internal variables for the processor. The GNU development tools also include a debugger called GDB. Debuggers are used by programmers to step through compiled programs, examine program memory, and view processor registers.

Below, GDB is used to show the state of the processor registers right before the program starts.

A breakpoint is set on the main() function so execution will stop right before our code is executed. Then GDB runs the program, stops at the breakpoint, and is told to display all the processor registers and their current states.

The first four registers are known as the general-purpose registers:
EAX : Accumulator
ECX : Counter
EDX : Data
EBX : Base
They are used for a variety of purposes, but they mainly act as temporary variables for the CPU when it is executing machine instructions.

The second four registers are:
ESP : Stack Pointer
EBP : Base Pointer
ESI : Source Index
EDI : Destination Index
These are also general-purpose registers, but they are sometimes known as pointers and indexes. The first two registers are called pointers because they store 32-bit addresses, which essentially point to a location in memory. These registers are fairly important to program execution and memory management. The last two registers are also technically pointers, which are commonly used to point to the source and destination when data needs to be read from or written to. There are load and store instructions that use these registers, but for the most part, these registers can be thought of as just simple general-purpose registers.

The EIP register is the Instruction Pointer register, which points to the current instruction the processor is reading. Like a child pointing his finger at each word as he reads, the processor reads each instruction using the EIP register as its finger. Naturally, this register is quite important and will be used a lot while debugging.
The remaining EFLAGS register actually consists of several bit flags that are set by comparisons and arithmetic operations. GDB also lists the segment registers (CS, SS, DS, ES, FS, and GS), which deal with memory segmentation: the actual memory is split into several different segments, which will be discussed later, and these registers keep track of that. For the most part, these registers can be ignored since they rarely need to be accessed directly.

Assembly Language
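Inside GDB, the disassembly syntax can be set to Intel with the command set disassembly-flavor intel. GDB accepts any unambiguous abbreviation of a command, so set disassembly intel works as well; the even shorter set dis intel may be rejected as ambiguous by newer GDB versions. You can configure this setting to run every time GDB starts up by putting the command in the file .gdbinit in your home directory.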


roshan@linuxmint ~ $ gdb -q
(gdb) set disassembly intel
(gdb) quit
roshan@linuxmint ~ $ echo "set disassembly intel"> ~/.gdbinit
roshan@linuxmint ~ $ cat ~/.gdbinit
set disassembly intel
roshan@linuxmint ~ $


Now that GDB is configured to use Intel syntax, let’s begin understanding it. The assembly instructions in Intel syntax generally follow this style:



operation {destination}, {source}


The destination and source values will each be a register, a memory address, or a value. The operations are usually intuitive mnemonics: the mov operation will move a value from the source to the destination, sub will subtract, inc will increment, and so forth. For example, mov ebp, esp followed by sub esp, 8 will move the value from ESP to EBP and then subtract 8 from ESP (storing the result in ESP).

If you want to learn more about Intel assembler syntax, a quick Google search for “intel assembler tutorial” returned 23,100 results at the time of writing.

The -g flag can be passed to GCC to include extra debugging information, which gives GDB access to the source code.

First, the source code is listed and the disassembly of the main() function is displayed. Then a breakpoint is set at the start of main(), and the program is run. This breakpoint simply tells the debugger to pause the execution of the program when it gets to that point. Since the breakpoint has been set at the start of the main() function, the program hits the breakpoint and pauses before actually executing any instructions in main(). Then the value of EIP (the Instruction Pointer) is displayed.
Notice that EIP contains a memory address that points to an instruction in the main() function’s disassembly.

The GDB debugger provides a direct method to examine memory, using the command x, which is short for examine. Examining memory is a critical skill for any hacker. The examine command in GDB can be used to look at a certain address of memory in a variety of ways. This command expects two arguments when it’s used: the location in memory to examine and how to display that memory.

The display format also uses a single-letter shorthand, which is optionally preceded by a count of how many items to examine. Some common format letters are as follows:
o Display in octal.
x Display in hexadecimal.
u Display in unsigned, standard base-10 decimal.
t Display in binary.


(gdb) x/o 0x80483bd
0x80483bd : 03411042307
(gdb) x/x $eip
0x80483bd : 0x1c2444c7
(gdb) x/u $eip
0x80483bd : 472138951
(gdb) x/t $eip
0x80483bd : 00011100001001000100010011000111


The memory the EIP register is pointing to can be examined by using the address stored in EIP. The debugger lets you reference registers directly, so $eip is equivalent to the value EIP contains at that moment.

A number can also be prepended to the format of the examine command to examine multiple units at the target address.


(gdb) x/2x $eip
0x80483bd : 0x1c2444c7 0x00000000
(gdb) x/12x $eip
0x80483bd : 0x1c2444c7 0x00000000 0x04c711eb 0x0484b024
0x80483cd : 0xff1de808 0x4483ffff 0x83011c24 0x091c247c
0x80483dd : 0x00b8e87e 0xc9000000 0x909090c3 0x90909090
(gdb)


This concludes our topic. You now know how to disassemble a C program and read the values of the processor’s registers. You also know how to set a breakpoint in a debugger to pause a program’s execution and examine its registers and memory. If you want to learn more, check out the book Hacking: The Art of Exploitation, from which this post is adapted.

//roshans89
