Linux Assembly and Disassembly an Introduction

EDB-ID: 13220
CVE: N/A
Author: lhall
Type: papers
Platform: Multiple
Published: 2006-04-08

|=------------=[ Linux Assembly and Disassembly an Introduction ]=------------=|
|=----------------------------------------------------------------------------=|
|=-------------------------=[ lhall@telegenetic.net ]=------------------------=|


----[ Introduction to gcc, gdb and objdump.

Here we start with a very simple C program. All it does is
write() the fourteen-character string "Hello, World!\n" to STDOUT,
which is file descriptor 1. Normally, with no redirection going on,
STDOUT is your terminal - your standard output. If we `man 2 write`
(section 2 of the man pages covers system calls), we can see that
write's prototype is:

	ssize_t write(int fd, const void *buf, size_t count);

So write returns a ssize_t (the number of bytes written, or -1 on
error), its first argument is the file descriptor to write to, its
second argument is a pointer to (the address of) a buffer, and its
third argument is how many bytes to write, starting at the address
of the buffer.

entropy@phalaris asm $ cat hello.c 

#include <unistd.h>   /* declares write() */

int main(void) {
   write (1,"Hello, World!\n", 14);
   return 0;
} 

---[ gcc

Compile the program.

entropy@phalaris asm $ gcc hello.c 

gcc produces a file called a.out if you don't specify a filename for it
to output to. a.out stands for assembler output and is the default name
for executable output from many compilers, especially UNIX ones. a.out is
also the name of an old object file format for executables.
 
entropy@phalaris asm $ ./a.out 
Hello, World!

Specify the executable file name gcc should output to with -o.
 
entropy@phalaris asm $ gcc hello.c -o hello

entropy@phalaris asm $ ./hello 
Hello, World!

Use the -S switch to generate the assembly.

entropy@phalaris asm $ gcc hello.c -S -o hello.s

We output with -o to the file hello.s. The -S switch makes gcc stop after
generating AT&T syntax assembly from our code, the same assembly that gcc
would otherwise hand to the assembler (as) before linking with ld.

entropy@phalaris asm $ cat hello.s
        .file   "hello.c"
        .section        .rodata
.LC0:
        .string "Hello, World!\n"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        movl    $0, %eax
        subl    %eax, %esp
        subl    $4, %esp
        pushl   $14
        pushl   $.LC0
        pushl   $1
        call    write
        addl    $16, %esp
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.5  (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3)"

Everything that starts with a "." like ".file" is a directive - it
directs the assembler to do something while assembling, instead of
being an instruction that the CPU executes itself.

.file "hello.c"	

This directive tells the assembler the name of the source file.

.section .rodata

This starts the section .rodata, which stands for read-only data.
Sections break up your program into pieces that are easier to manage,
and keeping code and data in separate sections helps keep things in
their place. Some other sections are .data, which is initialized data;
.bss, which is uninitialized data; and .text, which is your program code.

.LC0:

This is a text label for an address, also called a symbol, so we don't have
to refer to hex digits to access an address. A label tells the assembler
to make the symbol's value the address of whatever instruction or data
element comes next. A label is a symbol followed by a colon. It does _not_
have to start with a ".".

.string "Hello, World!\n"

This is data of type string. The label's value is the address of the "H" at the
beginning of the string.

.text

This begins the .text section, which is where our code begins. It doesn't
say .section before it because .text is a directive in its own right,
shorthand for .section .text.

.globl main

main is a symbol that is going to be replaced by an address during either
assembly or linking. Symbols are used to mark locations of addresses, such
as addresses of data or addresses of functions. .globl tells the
assembler that it shouldn't get rid of the symbol after assembly because
the linker needs it. main is the symbol where our part of the program starts.

.type   main, @function

main's type is a function.

main:

The main label, which marks where our code starts executing. Strictly
speaking the kernel starts the program at the C runtime's startup code,
which then calls main.

pushl   %ebp
movl    %esp, %ebp

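This is called a procedure prolog. It saves the caller's base pointer on
the stack and makes ebp point at the top of the stack, setting up a new
stack frame so functions don't corrupt each other's data or the stack.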

subl    $8, %esp
andl    $-16, %esp
movl    $0, %eax
subl    %eax, %esp
subl    $4, %esp

This is all unimportant for now; it just reserves some stack space and
aligns the stack pointer.


pushl   $14
pushl   $.LC0
pushl   $1
call    write

This is our call to write(). It pushl's each of the function's
arguments onto the stack. Notice it pushes from right to left, while
the function call looks like write(1, STRING, 14). So we push the immediate
value $14, which is the length of our string, then the value of the
label (which is the address of our string "Hello, World!\n"), then 1
for STDOUT. Then it calls write, the C library's wrapper around the write
system call, to write it out.
 
addl    $16, %esp

Clean up the stack. We pushed three things for write, and the earlier
subl $4, %esp reserved one more slot, so that's 4*4(bytes) = 16.

movl    $0, %eax

movl the immediate value $0 into the register eax. eax is used
for the return value of functions, and main's return value becomes
the exit status of the program.

leave
ret

leave undoes the prolog: it copies ebp back into esp and pops the saved
ebp. ret then returns control of the cpu to whoever called main, because
the program is done doing what it was written to do.

.size   main, .-main
.section        .note.GNU-stack,"",@progbits
.ident  "GCC: (GNU) 3.3.5  (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3)"

Extra stuff, like the size of main, the .note.GNU-stack section and
an identification string to show what version of gcc and which
Linux distribution it was compiled on.

Get rid of all the old compiled programs.

entropy@phalaris asm $ rm a.out hello

Compile the assembly file.

entropy@phalaris asm $ gcc hello.s

And execute.

entropy@phalaris asm $ ./a.out
Hello, World!

Compile with an output name using -o, and execute.

entropy@phalaris asm $ gcc hello.s -o hello
entropy@phalaris asm $ ./hello
Hello, World!

Fun times.

Now for the disassembly.
Compile with debugging symbols.

entropy@phalaris asm $ gcc -gstabs hello.s -o hello

---[ gdb

Start gdb. Gentoo has something going on that I haven't yet figured out;
it prints this:

warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.

Yours may or may not do this; it doesn't affect us for now.

entropy@phalaris asm $ gdb hello
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db 
library "/lib/libthread_db.so.1".

Here we list the source of the file we are debugging. We can do this because
we compiled with -g or -gstabs so that debugging information is included.

(gdb) list main
4               .string "Hello, World!\n"
5               .text
6       .globl main
7               .type   main, @function
8       main:
9               pushl   %ebp
10              movl    %esp, %ebp
11              subl    $8, %esp
12              andl    $-16, %esp
13              movl    $0, %eax
(gdb) <hit enter> 
14              subl    %eax, %esp
15              subl    $4, %esp
16              pushl   $14
17              pushl   $.LC0
18              pushl   $1
19              call    write
20              addl    $16, %esp
21              movl    $0, %eax
22              leave
23              ret
(gdb) <hit enter>
24              .size   main, .-main
25              .section        .note.GNU-stack,"",@progbits
26              .ident  "GCC: (GNU) 3.3.5  (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3)"

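Now set a breakpoint at the address of the symbol main. The "*" before
main tells gdb to treat it as an exact address, so we stop on main's very
first instruction instead of after the prolog, which is where a plain
"break main" would normally stop. A breakpoint is where execution of the
program will stop when it is hit, so this is saying execute until you hit
the address of main.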

(gdb) break *main
Breakpoint 1 at 0x8048370: file hello.s, line 9.

Now run the program. If you wanted to run with arguments you would 
do run argv1 argv2.

(gdb) run
Starting program: /home/entropy/asm/hello 
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.

Breakpoint 1, main () at hello.s:9
9               pushl   %ebp
Current language:  auto; currently asm

Ok, at this point we have stopped at line 9, the first instruction after the
label main. step means take one step forward: execute one line of source, which
here is exactly one instruction since our source is assembly.

(gdb) step
10              movl    %esp, %ebp

Here we can see that the next step will move esp into ebp, so look
at both of their values before this instruction executes.

(gdb) print $esp     
$1 = (void *) 0xbfd10578
(gdb) print $ebp
$2 = (void *) 0xbfd105a8

Ok they are different, so type step to have the instruction at line 10 execute.

(gdb) step

main () at hello.s:11
11              subl    $8, %esp

The line 11 showing here is a bit confusing: gdb shows the line that
will execute when you type step. So it has just executed line 10 at this
point and will execute line 11 when you type step again.

Now we examine their values again.

(gdb) print $esp
$3 = (void *) 0xbfd10578
(gdb) print $ebp
$4 = (void *) 0xbfd10578

They are the same after the movl.
Now step past all the things deemed unimportant for now.

(gdb) step
main () at hello.s:12
12              andl    $-16, %esp
(gdb) step
13              movl    $0, %eax
(gdb) step
14              subl    %eax, %esp
(gdb) step
15              subl    $4, %esp
(gdb) step
main () at hello.s:16
16              pushl   $14

Ok, so now we're going to check out the stack. Remember that esp always
points to the last thing pushed onto the stack. Here we execute line 16,
which pushl's the value 14 onto the stack.

(gdb) step
main () at hello.s:17
17              pushl   $.LC0

Now we examine(x) a decimal(d) at the address esp points to.

(gdb) x/d $esp
0xbfd10568:     14

And you can see it has a 14 there.
If we step again we will execute line 17, pushing the address of 
the string on the stack.

(gdb) step
main () at hello.s:18
18              pushl   $1

Ok, let's see the address it pushed. We again examine(x), but this time in hex(x).

(gdb) x/x $esp
0xbfd10564:     0x0804849c

Ok, so it has the address 0x0804849c in it. That's the address of our string, so
let's examine(x) with the type string(s).

(gdb) x/s 0x0804849c
0x804849c <_IO_stdin_used+4>:    "Hello, World!\n"

And we can see our Hello, World!\n string.
step again to push the $1 onto the stack.

(gdb) step
main () at hello.s:19
19              call    write

And examine it as decimal.

(gdb) x/d $esp
0xbfd10560:     1

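We can also still examine the stack at other places + or - our current position.
To see two pushes up, which would be the $14, we would examine decimal at esp+8,
because a long is four bytes and each pushl pushes four bytes, so
2(pushes)*4(bytes each) = 8(bytes total). (The addresses in the next two
outputs differ from the ones above, presumably because they were captured in
a separate run; the stack can be placed at a different address each run.)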

(gdb) x/d $esp+8 
0xbf9b9af8:     14

Or to examine the hex address one pushl up we would examine hex at esp+4.

(gdb) x/x $esp+4
0xbf9b9af4:     0x0804849c

The next instruction is the call to write, see line "19 	call write" above.
If we were to step into this we would step into the C library's write
function and from there into the syscall, which is a mess to trace when just
beginning. Instead we will use the gdb command "next", which also executes
one line but treats a call as a single step and runs it until it returns.
What this means is the syscall will fire, write its message, then return to us.
 
(gdb) next
Hello, World!
20              addl    $16, %esp

So it wrote out the string.

(gdb) step
main () at hello.s:21
21              movl    $0, %eax

movl $0 into eax, 0 is the return value.

(gdb) step
22              leave

Prepare to leave.

(gdb) step
main () at hello.s:23
23              ret

ret(urn) control to the caller of main.

(gdb) step
0x4003a28e in __libc_start_main () from /lib/libc.so.6

And we are done, let the program continue executing until it exits.

(gdb) continue
Continuing.

Program exited normally.
(gdb) quit

---[ objdump

A little objdump. 

Let's check out another part of our executable by disassembling 
with objdump.

entropy@phalaris asm $ objdump -d hello

hello:     file format elf32-i386

This will show the sections that contain code, disassembled with their
addresses, opcodes and asm.

[...snip...]

We'll just look at main.

08048370 <main>:
 8048370:       55               push   %ebp
 8048371:       89 e5            mov    %esp,%ebp
 8048373:       83 ec 08         sub    $0x8,%esp
 8048376:       83 e4 f0         and    $0xfffffff0,%esp
 8048379:       b8 00 00 00 00   mov    $0x0,%eax
 804837e:       29 c4            sub    %eax,%esp
 8048380:       83 ec 04         sub    $0x4,%esp
 8048383:       6a 0e            push   $0xe
 8048385:       68 9c 84 04 08   push   $0x804849c
 804838a:       6a 01            push   $0x1
 804838c:       e8 07 ff ff ff   call   8048298 <write@plt>
 8048391:       83 c4 10         add    $0x10,%esp
 8048394:       b8 00 00 00 00   mov    $0x0,%eax
 8048399:       c9               leave  
 804839a:       c3               ret    
 804839b:       90               nop    
 804839c:       90               nop    
 804839d:       90               nop    
 804839e:       90               nop    
 804839f:       90               nop    
       ^        ^                ^
       |        |                Assembly
       |        Opcodes
       Virtual address when loaded          
  
[...snip...]

Virtual addresses are the addresses the program thinks it is loaded
at. The kernel maps them to physical addresses behind the scenes, which
is how every program can start at the same virtual address.

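The opcodes are the machine code that is generated from the assembly.
This is a good way to generate shellcode, although nulls are bad
times, bad times: a 0x00 byte ends a C string, so shellcode delivered
through a string copy gets cut off at the first null.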

Then we have the assembly. It looks similar, except that $-16 is written
as $0xfffffff0, which is the two's complement of -16; and'ing with it
clears the low four bits, rounding esp down to a multiple of 16. Also
pushl $14 shows up in hex as push $0xe, and the label .LC0 in
pushl $.LC0 has been translated to its address, namely the line
push   $0x804849c (remember, this is the address we examined before
with (gdb) x/s 0x0804849c, which displayed our string).

That's it for this one.

# milw0rm.com [2006-04-08]