|=------------=[ Linux Assembly and Disassembly an Introduction ]=------------=|
|=----------------------------------------------------------------------------=|
|=-------------------------=[ lhall@telegenetic.net ]=------------------------=|


----[ Introduction to gcc, gdb and objdump.

Here we start with a very simple C program. All it does is write() the
fourteen-character string "Hello, World!\n" to STDOUT, which is file
descriptor 1. With no redirection going on, STDOUT is your terminal, your
standard output. If we `man 2 write` (section 2 of the man pages covers
system calls), we can see that write's prototype is:

ssize_t write(int fd, const void *buf, size_t count);

So write returns an ssize_t, its first argument is the file descriptor to
write to, its second argument is a pointer to (the address of) a buffer, and
its third argument is how many bytes to write, starting from the address of
the buffer.

entropy@phalaris asm $ cat hello.c
#include <unistd.h>

int main(void)
{
    write(1, "Hello, World!\n", 14);
    return 0;
}

---[ gcc

Compile the program.

entropy@phalaris asm $ gcc hello.c

gcc produces a file called a.out if you don't specify a filename for it to
output to. a.out stands for assembler output and is the default name for
executable output from many compilers, especially UNIX ones. a.out is also
the name of an old object file format for executables; the file gcc writes
here keeps the name but is in ELF format.

entropy@phalaris asm $ ./a.out
Hello, World!

Specify the executable file name gcc should output to with -o.

entropy@phalaris asm $ gcc hello.c -o hello
entropy@phalaris asm $ ./hello
Hello, World!

Use the -S switch to generate the assembly.

entropy@phalaris asm $ gcc hello.c -S -o hello.s

We output with -o to the file hello.s. The -S switch makes gcc stop after
generating AT&T-syntax assembly from our code, the same assembly that gcc
would otherwise hand to as and then ld.
entropy@phalaris asm $ cat hello.s
        .file   "hello.c"
        .section        .rodata
.LC0:
        .string "Hello, World!\n"
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        andl    $-16, %esp
        movl    $0, %eax
        subl    %eax, %esp
        subl    $4, %esp
        pushl   $14
        pushl   $.LC0
        pushl   $1
        call    write
        addl    $16, %esp
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3)"

Everything that starts with a "." like ".file" is a directive. It directs the
assembler to do something, rather than being an instruction the CPU executes.

        .file   "hello.c"

This directive tells the assembler the name of the original source file.

        .section        .rodata

This starts the section .rodata, which stands for read-only data. Sections
break your program up into pieces that are easier to manage, and keeping code
and data in separate sections helps keep things in their place. Some other
sections are .data, which is initialized data; .bss, which is uninitialized
data; and .text, which is your program code.

.LC0:

This is a text label for an address, also called a symbol, so we don't have
to refer to hex digits to access an address. A label tells the assembler to
make the symbol's value the address of whatever instruction or data element
comes next. A label is a symbol followed by a colon. It does _not_ have to
start with a ".".

        .string "Hello, World!\n"

This is data of type string. The label's value is the address of the "H" at
the beginning of the string.

        .text

This begins the .text section, which is where our code begins. It doesn't say
.section in front of it because .text is a directive of its own, shorthand
for .section .text.

.globl main

main is a symbol that is going to be replaced by an address during either
assembly or linking. Symbols mark addresses, such as the addresses of data or
of functions. .globl tells the assembler that it shouldn't get rid of the
symbol after assembly because the linker needs it. main is the symbol where
our part of the program starts.

        .type   main, @function

main's type is a function.
main:

The main label. This is where the C startup code hands control to our
program. The kernel itself jumps to the startup code's entry point, and that
code eventually calls main.

        pushl   %ebp
        movl    %esp, %ebp

This is called a procedure prolog. It saves the caller's base pointer and
starts a new stack frame, so functions don't corrupt each other's data on the
stack.

        subl    $8, %esp
        andl    $-16, %esp
        movl    $0, %eax
        subl    %eax, %esp
        subl    $4, %esp

This is all unimportant for now; it just reserves and aligns stack space.

        pushl   $14
        pushl   $.LC0
        pushl   $1
        call    write

This is our call to write(). It pushl's each of the function's arguments onto
the stack. Notice it pushes from right to left, while the function call reads
write(1, STRING, 14). So we push the immediate value $14, which is the length
of our string, then the value of the label (which is the address of our
string "Hello, World!\n"), then 1 for STDOUT, and then we call write, the C
library function that makes the actual system call, to write it out.

        addl    $16, %esp

Clean up the stack. We pushed three things for write, and just before them we
subtracted 4 from esp (the subl $4, %esp above), so that's 4*4(bytes) = 16 to
add back. The pushl %ebp from the prolog is not part of this; leave undoes
that one.

        movl    $0, %eax

movl the immediate value $0 into the register eax. eax is used for the return
value of functions, and main's return value becomes the program's exit
status.

        leave
        ret

Leave the function. leave undoes the prolog by restoring esp and ebp, and ret
returns control of the CPU to whoever called main, because the program is
done doing what it was written to do.

        .size   main, .-main
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3)"

Extra stuff, like the size of main, the .note.GNU-stack section and an
identification string to show which version of gcc and which Linux it was
compiled on.

Get rid of all the old compiled programs.

entropy@phalaris asm $ rm a.out hello

Assemble and link the assembly file.

entropy@phalaris asm $ gcc hello.s

And execute.

entropy@phalaris asm $ ./a.out
Hello, World!

Build and execute with an output name given by -o.

entropy@phalaris asm $ gcc hello.s -o hello
entropy@phalaris asm $ ./hello
Hello, World!

Fun times. Now for the disassembly. Build with debugging symbols.

entropy@phalaris asm $ gcc -gstabs hello.s -o hello

---[ gdb

Start gdb.
Gentoo has something going on that I haven't figured out yet; it prints this
warning:

Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.

Yours may or may not do this. It doesn't affect us for now.

entropy@phalaris asm $ gdb hello
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".

Here we list the source of the file we are debugging. We can do this because
we built with -g or -gstabs so that debugging information is included.
Pressing Enter at an empty (gdb) prompt repeats the last command, which here
lists the next ten lines.

(gdb) list main
4               .string "Hello, World!\n"
5               .text
6       .globl main
7               .type   main, @function
8       main:
9               pushl   %ebp
10              movl    %esp, %ebp
11              subl    $8, %esp
12              andl    $-16, %esp
13              movl    $0, %eax
(gdb)
14              subl    %eax, %esp
15              subl    $4, %esp
16              pushl   $14
17              pushl   $.LC0
18              pushl   $1
19              call    write
20              addl    $16, %esp
21              movl    $0, %eax
22              leave
23              ret
(gdb)
24              .size   main, .-main
25              .section        .note.GNU-stack,"",@progbits
26              .ident  "GCC: (GNU) 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3)"

Now set a breakpoint at the address of the symbol main. The "*" before main
tells gdb to treat it as an exact address, so we stop on the very first
instruction of main; a plain `break main` would normally stop just after the
prolog instead. A breakpoint is where execution of the program will stop when
it is hit, so this is saying execute until you hit the address of main.

(gdb) break *main
Breakpoint 1 at 0x8048370: file hello.s, line 9.

Now run the program. If you wanted to run with arguments you would do
run argv1 argv2.

(gdb) run
Starting program: /home/entropy/asm/hello
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.

Breakpoint 1, main () at hello.s:9
9               pushl   %ebp
Current language:  auto; currently asm

Ok, at this point we have stopped at line 9, one line after the label main.
step means take one step forward: execute one source line, which in an
assembly file is exactly one instruction.

(gdb) step
10              movl    %esp, %ebp

Here we can see that if we type step, esp will be moved into ebp, so look at
both of their values before this instruction executes.

(gdb) print $esp
$1 = (void *) 0xbfd10578
(gdb) print $ebp
$2 = (void *) 0xbfd105a8

Ok, they are different, so type step to have the instruction at line 10
execute.

(gdb) step
main () at hello.s:11
11              subl    $8, %esp

This is a bit confusing with line 11 showing: gdb shows the line that will
execute when you type step. So it has just executed line 10 at this point and
will execute line 11 when you type step again. Now we examine their values
again.

(gdb) print $esp
$3 = (void *) 0xbfd10578
(gdb) print $ebp
$4 = (void *) 0xbfd10578

They are the same after the movl. Now step past all the things deemed
unimportant for now.

(gdb) step
main () at hello.s:12
12              andl    $-16, %esp
(gdb) step
13              movl    $0, %eax
(gdb) step
14              subl    %eax, %esp
(gdb) step
15              subl    $4, %esp
(gdb) step
main () at hello.s:16
16              pushl   $14

Ok, so now we're going to check out the stack. Remember that esp points to
the last thing pushed onto the stack. Here we execute line 16, which pushl's
the value 14 onto the stack.

(gdb) step
main () at hello.s:17
17              pushl   $.LC0

Now we examine(x) a decimal(d) at the address esp points to.

(gdb) x/d $esp
0xbfd10568:     14

And you can see it has a 14 there. If we step again we will execute line 17,
pushing the address of the string onto the stack.

(gdb) step
main () at hello.s:18
18              pushl   $1

Ok, let's see the address it pushed. We again examine(x), but this time in
hex(x).
(gdb) x/x $esp
0xbfd10564:     0x0804849c

Ok, so it has the address 0x0804849c in it. That's the address of our string,
so let's examine(x) it with the type string(s).

(gdb) x/s 0x0804849c
0x804849c <_IO_stdin_used+4>:    "Hello, World!\n"

And we can see our Hello, World!\n string. step again to put the $1 onto the
stack.

(gdb) step
main () at hello.s:19
19              call    write

And examine it as decimal.

(gdb) x/d $esp
0xbfd10560:     1

We can also still examine the stack at other places + or - our current
position. To see two pushes up, which would be the $14, we would examine
decimal at esp+8, because a long is four bytes and each pushl pushes four
bytes, so 2(pushes)*4(bytes each) = 8(bytes total).

(gdb) x/d $esp+8
0xbf9b9af8:     14

Or to examine the hex address one pushl up we would examine hex at esp+4.

(gdb) x/x $esp+4
0xbf9b9af4:     0x0804849c

(These two outputs were apparently captured during a different run of the
program, so their stack addresses don't line up with the 0xbfd105xx values
above. The offsets and the values found there are what matter.)

The next instruction is the call to write, see line "19 call write" above.
Following that call into the C library and the system call behind it is a
mess to trace when just beginning. Instead we will use the gdb command
"next", which executes the call and everything inside it and stops again once
it returns. What this means is the syscall will fire, write its message, then
return to us.

(gdb) next
Hello, World!
20              addl    $16, %esp

So it wrote out the string.

(gdb) step
main () at hello.s:21
21              movl    $0, %eax

movl $0 into eax; 0 is the return value.

(gdb) step
22              leave

Prepare to leave.

(gdb) step
main () at hello.s:23
23              ret

ret(urn) control to the caller of main.

(gdb) step
0x4003a28e in __libc_start_main () from /lib/libc.so.6

And we are done. We are back in the C library code that called main, so let
it continue executing and exit.

(gdb) continue
Continuing.

Program exited normally.
(gdb) quit

---[ objdump

A little objdump. Let's check out another part of our executable by
disassembling with objdump.

entropy@phalaris asm $ objdump -d hello

hello:     file format elf32-i386

This will show the code sections disassembled, with their addresses, opcodes
and asm.

[...snip...]

We'll just look at main.

08048370 <main>:
 8048370:       55                      push   %ebp
 8048371:       89 e5                   mov    %esp,%ebp
 8048373:       83 ec 08                sub    $0x8,%esp
 8048376:       83 e4 f0                and    $0xfffffff0,%esp
 8048379:       b8 00 00 00 00          mov    $0x0,%eax
 804837e:       29 c4                   sub    %eax,%esp
 8048380:       83 ec 04                sub    $0x4,%esp
 8048383:       6a 0e                   push   $0xe
 8048385:       68 9c 84 04 08          push   $0x804849c
 804838a:       6a 01                   push   $0x1
 804838c:       e8 07 ff ff ff          call   8048298
 8048391:       83 c4 10                add    $0x10,%esp
 8048394:       b8 00 00 00 00          mov    $0x0,%eax
 8048399:       c9                      leave
 804839a:       c3                      ret
 804839b:       90                      nop
 804839c:       90                      nop
 804839d:       90                      nop
 804839e:       90                      nop
 804839f:       90                      nop
    ^             ^                       ^
    |             |                       Assembly
    |             Opcodes
    Virtual address when loaded

[...snip...]

Virtual addresses are the addresses the program thinks it is loaded at; they
are mapped to physical addresses behind its back, so that every program can
start at the same virtual address. The opcodes are the machine code that is
generated from the assembly. This is a good way to generate shellcode,
although nulls are bad times, bad times (a 00 byte ends a C string, so it
cuts off shellcode that gets copied with string functions). Then we have the
assembly. It looks similar, except that $-16 is written as $0xfffffff0, which
is the two's complement representation of -16; and-ing esp with it clears the
low four bits, rounding the stack pointer down to a multiple of 16. Also,
pushl $14 shows up in hex as push $0xe, and the label .LC0 in pushl $.LC0 has
been translated to its address, giving the line push $0x804849c (remember,
this is the address we examined earlier with (gdb) x/s 0x0804849c, which
displayed our string).

That's it for this one.

# milw0rm.com [2006-04-08]