|=----------------------=[ Functions and Linux Assembly ]=--------------------=|
|=----------------------------------------------------------------------------=|
|=-------------------------=[ lhall@telegenetic.net ]=------------------------=|

---[ Intro

Start off with a simple C program with a super simple function:

entropy@phalaris {~/asm/functions} cat function.c
#include <stdlib.h>                     /* exit */
#include <unistd.h>                     /* write */

void functOne (void) {
    write(1,"in functOne\n",12);        /* write out our string */
    return;                             /* and just return */
}

int main (void) {
    write(1,"in _start\n",10);          /* we call main "_start" */
    functOne();                         /* call functOne */
    write(1,"in _start\n",10);          /* same string again after the call */
    exit(0);                            /* call exit return value 0 */
}

All this program does is write a string telling us we are in the function
main (we call it _start as we will be using `as` not `gcc` later), call
function functOne which writes a string, and return back to main to write the
same string again, then call exit with return value 0. You could generate the
assembly here from gcc (with gcc -S -O0 function.c) but the asm that is output
is a bit confusing. For instance if you look at the asm generated by `gcc`
you'll see it reserving stack space as if for local variables, even though
functOne has none:

[...snip...]
functOne:
    pushl   %ebp            /* save the base pointer */
    movl    %esp, %ebp      /* make the stack pointer the base pointer */
    subl    $8, %esp        /* <--- subtract 8 from the stack pointer */
    subl    $4, %esp        /* <--- subtract another 4 from the stack pointer */
    pushl   $12             /* string length */
    pushl   $.LC0           /* address of string */
    pushl   $1              /* to stdout */
    call    write           /* call libc write */
    addl    $16, %esp       /* fix up stack (3 pushl's + the subl $4) */
    leave                   /* see the notes below */
    ret                     /* return to caller */
[...snip...]

Notes:
leave, also known as the procedure epilog, is the same as:
    movl    %ebp, %esp
    popl    %ebp
enter $0, $0, also known as the procedure prolog, is the same as:
    pushl   %ebp
    movl    %esp, %ebp

Nothing is ever stored in the reserved space here. The padding most likely
just keeps the stack aligned for the call to write, and the
`addl $16, %esp` together with `leave` removes it again.
Also, while learning I think it's better to do everything yourself and not rely
on libc calls, using syscalls instead. Still, if you're ever confused or can't
figure a part of asm out, looking at the gcc output is a good way to at least
get an idea of what to do. So here's our simple function call program:

entropy@phalaris {~/asm/functions} cat funct.s
.section .data

    .equ SYS_WRITE, 4
    .equ SYS_EXIT, 1
    .equ LINUX_KERNEL, 0x80
    .equ STDOUT, 1

_startStr:
    .ascii "in _start\n\0"
functStr:
    .ascii "in functOne\n\0"

.section .text

.type functOne, @function
functOne:
                                # begin procedure prolog
    pushl %ebp                  # save the base pointer
    movl %esp, %ebp             # make the stack pointer the base pointer
                                # end procedure prolog
    movl $SYS_WRITE, %eax       # mov WRITE(4) into eax
    movl $12, %edx              # length of the string
    movl $functStr, %ecx        # address of our string
    movl $STDOUT, %ebx          # writing to stdout
    int $LINUX_KERNEL           # call the kernel
                                # begin procedure epilog
    movl %ebp, %esp             # restore the stack pointer
    popl %ebp                   # restore the base pointer
    ret

.globl _start
_start:
    nop                         # so our breakpoint will break in gdb
    movl $SYS_WRITE, %eax       # mov WRITE(4) into eax
    movl $10, %edx              # length of the string
    movl $_startStr, %ecx       # address of our string
    movl $STDOUT, %ebx          # writing to stdout
    int $LINUX_KERNEL           # call the kernel
    call functOne               # call functOne
    movl $SYS_WRITE, %eax       # mov WRITE(4) into eax
    movl $10, %edx              # length of the string
    movl $_startStr, %ecx       # address of our string
    movl $STDOUT, %ebx          # writing to stdout
    int $LINUX_KERNEL           # call the kernel
    movl $SYS_EXIT, %eax        # mov EXIT(1) into eax
    movl $0, %ebx               # 0 is the return value
    int $LINUX_KERNEL           # call the kernel

A couple of things to notice: we use .equ (equates) to make the code a bit
easier to read; these are similar to #define's in C. Again this is pretty
simple. All we do is write a string in _start, call functOne which prints a
string, return, print the same string as before in _start, and then call exit
with return value 0. Everything should be readable, though a few things need
explanation.

---[ Call

The instruction call is how you call functions. What this does is:
1) Pushes the address of the next instruction, the return address, onto the
   stack.
2) Points %eip to the start of the function, the function's symbol.

---[ Procedure Prolog

OK, so our functions have no arguments or parameters, they are just void. The
first thing a function has to do is called the procedure prolog. It first
saves the current base pointer (ebp) with the instruction pushl %ebp (remember
ebp is the register used for accessing function parameters and local
variables). Then it copies the stack pointer (esp) to the base pointer (ebp)
with the instruction movl %esp, %ebp. This allows you to access the function
parameters as offsets from the base pointer. Local variables are always at a
negative offset from ebp, such as -4(%ebp) for the first local variable. The
return address is always at 4(%ebp), and the Nth parameter or argument is at
N*4+4(%ebp), such as 8(%ebp) for the first argument, while the old ebp is at
(%ebp). A more visual diagram of this may be clearer:

    argument 2          12(%ebp)
    argument 1           8(%ebp)
    return address       4(%ebp)
    old ebp               (%ebp)
    local variable 1    -4(%ebp)
    local variable 2    -8(%ebp)

Note: %ebp is the value held in the register (an address), (%ebp) is the
memory at that address.

Moving the stack pointer into the base pointer allows the base pointer to be a
constant reference to the stack frame while in a function. We could not rely
on esp for this, as every push and pop during the execution of the function
changes it.

---[ Procedure Epilog

The procedure epilog must do the opposite of the prolog before a function can
exit, so everything is returned to how it was at the time of the call. Without
restoring the stack pointer, the ret instruction would return to an incorrect
address, because the pushed return address wouldn't be at the top of the
stack.
To reset the stack pointer we do:

    movl %ebp, %esp     # restore the stack pointer
    popl %ebp           # pop the old ebp back into ebp
    ret                 # grab the return address from the stack and jmp to it

Assemble and link funct.s and open it in gdb. (On a 64-bit host, pass --32 to
as and -m elf_i386 to ld to get a 32-bit binary.)

entropy@phalaris {~/asm/functions} as -g funct.s -o funct.o
entropy@phalaris {~/asm/functions} ld funct.o -o funct
entropy@phalaris {~/asm/functions} gdb funct
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".

(gdb) list functOne
13      .section .text
14
15      .type functOne, @function
16      functOne:
17                                      # begin procedure prolog
18          pushl %ebp                  # save the base pointer
19          movl %esp, %ebp             # make the stack pointer the base pointer
20                                      # end procedure prolog
21          movl $SYS_WRITE, %eax       # mov WRITE(4) into eax
22          movl $12, %edx              # length of the string
(gdb)
23          movl $functStr, %ecx        # address of our string
24          movl $STDOUT, %ebx          # writing to stdout
25          int $LINUX_KERNEL           # call the kernel
26                                      # begin procedure epilog
27          movl %ebp, %esp             # restore the stack pointer
28          popl %ebp                   # restore the base pointer
29          ret
(gdb) list _start
28          popl %ebp                   # restore the base pointer
29          ret
30
31      .globl _start
32      _start:
33          nop                         # so our breakpoint will break in gdb
34          movl $SYS_WRITE, %eax       # mov WRITE(4) into eax
35          movl $10, %edx              # length of the string
36          movl $_startStr, %ecx       # address of our string
37          movl $STDOUT, %ebx          # writing to stdout
(gdb)
38          int $LINUX_KERNEL           # call the kernel
39          call functOne               # call functOne
40          movl $SYS_WRITE, %eax       # mov WRITE(4) into eax
41          movl $10, %edx              # length of the string
42          movl $_startStr, %ecx       # address of our string
43          movl $STDOUT, %ebx          # writing to stdout
44          int $LINUX_KERNEL           # call the kernel
45          movl $SYS_EXIT, %eax        # mov EXIT(1) into eax
46          movl $0, %ebx               # 0 is the return value
47          int $LINUX_KERNEL           # call the kernel

Break at the address of _start + 1.

(gdb) break *_start+1
Breakpoint 1 at 0x80480b2: file funct.s, line 34.

Start the program executing.

(gdb) run
Starting program: /home/entropy/asm/functions/funct

Breakpoint 1, _start () at funct.s:34
34          movl $SYS_WRITE, %eax       # mov WRITE(4) into eax
Current language:  auto; currently asm

The breakpoint was hit. Up until the call everything is pretty clear, so I'm
just going to step until the call.

(gdb) step
_start () at funct.s:35
35          movl $10, %edx              # length of the string
(gdb) step
_start () at funct.s:36
36          movl $_startStr, %ecx       # address of our string
(gdb) step
_start () at funct.s:37
37          movl $STDOUT, %ebx          # writing to stdout
(gdb) step
_start () at funct.s:38
38          int $LINUX_KERNEL           # call the kernel
(gdb) step
in _start
_start () at funct.s:39
39          call functOne               # call functOne

At this point we have written out the string "in _start\n", and the next
instruction will be our call functOne. Disassemble _start and find the address
of the call functOne instruction. The address of the instruction that follows
it is the return address that the call instruction should push onto the stack.

(gdb) disassemble _start
Dump of assembler code for function _start:
0x080480b1 <_start+0>:  nop
0x080480b2 <_start+1>:  mov    $0x4,%eax
0x080480b7 <_start+6>:  mov    $0xa,%edx
0x080480bc <_start+11>: mov    $0x80490f0,%ecx
0x080480c1 <_start+16>: mov    $0x1,%ebx
0x080480c6 <_start+21>: int    $0x80
0x080480c8 <_start+23>: call   0x8048094
0x080480cd <_start+28>: mov    $0x4,%eax
0x080480d2 <_start+33>: mov    $0xa,%edx
0x080480d7 <_start+38>: mov    $0x80490f0,%ecx
0x080480dc <_start+43>: mov    $0x1,%ebx
0x080480e1 <_start+48>: int    $0x80
0x080480e3 <_start+50>: mov    $0x1,%eax
0x080480e8 <_start+55>: mov    $0x0,%ebx
0x080480ed <_start+60>: int    $0x80
End of assembler dump.

The call is at address 0x080480c8, as shown by the line

0x080480c8 <_start+23>: call   0x8048094

while the symbol functOne is at 0x8048094. The next address after 0x080480c8
is 0x080480cd, so this is what we should see at the top of the stack
immediately after the call instruction is executed.

(gdb) step
functOne () at funct.s:18
18          pushl %ebp                  # save the base pointer

Our call has been executed, so take a look at the registers.

(gdb) info reg
eax            0xa      10
ecx            0x80490f0        134516976
edx            0xa      10
ebx            0x1      1
esp            0xbfdc17ec       0xbfdc17ec
ebp            0x0      0x0
esi            0x0      0
edi            0x0      0
eip            0x8048094        0x8048094
eflags         0x246    582
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x0      0

We see esp holds the address 0xbfdc17ec. Examine in hex what is stored at
0xbfdc17ec.

(gdb) x/x 0xbfdc17ec
0xbfdc17ec:     0x080480cd

And it's the return address seen in the disassembly. Everything in the
function should now be understandable, so just step through it.

(gdb) step
functOne () at funct.s:19
19          movl %esp, %ebp             # make the stack pointer the base pointer
(gdb) step
functOne () at funct.s:21
21          movl $SYS_WRITE, %eax       # mov WRITE(4) into eax
(gdb) step
22          movl $12, %edx              # length of the string
(gdb) step
23          movl $functStr, %ecx        # address of our string
(gdb) step
24          movl $STDOUT, %ebx          # writing to stdout
(gdb) step
25          int $LINUX_KERNEL           # call the kernel
(gdb) step
in functOne
27          movl %ebp, %esp             # restore the stack pointer
(gdb) step
28          popl %ebp                   # restore the base pointer
(gdb) step
functOne () at funct.s:29
29          ret

Here the instruction ret is going to pop the value at (%esp) into %eip, in
effect a jmp to it, so take a look at the value stored at the address in %esp.

(gdb) x/x $esp
0xbfdc17ec:     0x080480cd

It is the return address from before, so we will return to the instruction
right after the call to functOne.

(gdb) step
_start () at funct.s:40
40          movl $SYS_WRITE, %eax       # mov WRITE(4) into eax
(gdb) list
35          movl $10, %edx              # length of the string
36          movl $_startStr, %ecx       # address of our string
37          movl $STDOUT, %ebx          # writing to stdout
38          int $LINUX_KERNEL           # call the kernel
39          call functOne               # call functOne
40          movl $SYS_WRITE, %eax       # mov WRITE(4) into eax
41          movl $10, %edx              # length of the string
42          movl $_startStr, %ecx       # address of our string
43          movl $STDOUT, %ebx          # writing to stdout
44          int $LINUX_KERNEL           # call the kernel
(gdb)

You can see we are at line 40 now, and in the disassembly:

(gdb) disassemble _start
Dump of assembler code for function _start:
0x080480b1 <_start+0>:  nop
0x080480b2 <_start+1>:  mov    $0x4,%eax
0x080480b7 <_start+6>:  mov    $0xa,%edx
0x080480bc <_start+11>: mov    $0x80490f0,%ecx
0x080480c1 <_start+16>: mov    $0x1,%ebx
0x080480c6 <_start+21>: int    $0x80
0x080480c8 <_start+23>: call   0x8048094
0x080480cd <_start+28>: mov    $0x4,%eax
0x080480d2 <_start+33>: mov    $0xa,%edx
0x080480d7 <_start+38>: mov    $0x80490f0,%ecx
0x080480dc <_start+43>: mov    $0x1,%ebx
0x080480e1 <_start+48>: int    $0x80
0x080480e3 <_start+50>: mov    $0x1,%eax
0x080480e8 <_start+55>: mov    $0x0,%ebx
0x080480ed <_start+60>: int    $0x80
End of assembler dump.

we are at the line

0x080480cd <_start+28>: mov    $0x4,%eax

The rest is pretty easy: we just write out our string and call exit.

(gdb) step
_start () at funct.s:41
41          movl $10, %edx              # length of the string
(gdb)
_start () at funct.s:42
42          movl $_startStr, %ecx       # address of our string
(gdb)
_start () at funct.s:43
43          movl $STDOUT, %ebx          # writing to stdout
(gdb)
_start () at funct.s:44
44          int $LINUX_KERNEL           # call the kernel
(gdb)
in _start
_start () at funct.s:45
45          movl $SYS_EXIT, %eax        # mov EXIT(1) into eax
(gdb)
_start () at funct.s:46
46          movl $0, %ebx               # 0 is the return value
(gdb)
_start () at funct.s:47
47          int $LINUX_KERNEL           # call the kernel
(gdb)

Program exited normally.

# milw0rm.com [2006-04-08]