|=----------------------=[ Functions and Linux Assembly ]=--------------------=|
|=----------------------------------------------------------------------------=|
|=-------------------------=[ lhall@telegenetic.net ]=------------------------=|


---[ Intro

Start off with a simple C program with a super simple function:

entropy@phalaris {~/asm/functions} cat function.c

#include <stdlib.h>   /* exit */
#include <unistd.h>   /* write */

void
functOne (void) {
   write(1,"in functOne\n",12); /* write out our string */
   return;                      /* and just return */
}

int
main (void) {
   write(1,"in _start\n",10); /* main stands in for _start here */
   functOne();                /* call functOne */
   write(1,"in _start\n",10); /* same string again after the call */
   exit(0);                   /* call exit return value 0 */
}

All this program does is write a string telling us we are in main (the string
says _start because we will be using `as` rather than `gcc` later), call
functOne, which writes its own string, return to main to write the first string
again, and then call exit with return value 0.

You could generate the assembly here from gcc (with gcc -S -O0 function.c) but
the asm that is output is a bit confusing. For instance if you look at the asm
generated by `gcc` you'll see it reserving space for local variables:

[...snip...]

functOne:
        pushl   %ebp          /* save the base pointer */
        movl    %esp, %ebp    /* make the stack pointer the base pointer */
        subl    $8, %esp      /* <--- subtract 8 from the stack pointer */
        subl    $4, %esp      /* <--- subtract another 4 from the stack pointer */
        pushl   $12           /* string length */
        pushl   $.LC0         /* address of string */
        pushl   $1            /* to stdout */
        call    write         /* call libc write */
        addl    $16, %esp     /* fix up stack (3 pushl's + the subl $4) */
        leave                 /* *leave */
        ret                   /* return to caller */

[...snip...]

Notes:

leave,  also known as the procedure epilog, is the same as:

 movl  %ebp, %esp
 popl  %ebp

enter $0, $0, also known as the procedure prolog, is the same as:

 pushl   %ebp
 movl    %esp, %ebp

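I suppose the two subl's keep the stack cleaner (most likely gcc padding the
frame so the stack stays aligned at the call). The `addl    $16, %esp` undoes
the second subl and the three pushl's, and leave takes care of the rest. Also,
while learning, I think it's better to do everything yourself and not rely on
libc calls, using syscalls instead. If you're ever confused or can't figure a
part of asm out, looking at gcc's output is a good way to at least get an idea
of what to do.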

So here's our simple function call program:

entropy@phalaris {~/asm/functions} cat funct.s

.section .data

.equ SYS_WRITE, 4
.equ SYS_EXIT, 1
.equ LINUX_KERNEL, 0x80
.equ STDOUT, 1

_startStr:
   .ascii "in _start\n\0"
functStr:
   .ascii "in functOne\n\0"

.section .text

.type functOne, @function
functOne:
                          # begin procedure prolog
   pushl %ebp             # save the base pointer
   movl  %esp, %ebp       # make the stack pointer the base pointer
                          # end procedure prolog
   movl  $SYS_WRITE, %eax # mov WRITE(4) into eax
   movl  $12, %edx        # length of the string
   movl  $functStr, %ecx  # address of our string
   movl  $STDOUT, %ebx    # writing to stdout
   int   $LINUX_KERNEL    # call the kernel
                          # begin procedure epilog
   movl  %ebp, %esp       # restore the stack pointer
   popl  %ebp             # restore the base pointer
   ret

.globl _start
_start:
   nop                    # so our breakpoint will break in gdb
   movl $SYS_WRITE, %eax  # mov WRITE(4) into eax
   movl $10, %edx         # length of the string
   movl $_startStr, %ecx  # address of our string
   movl $STDOUT, %ebx     # writing to stdout
   int  $LINUX_KERNEL     # call the kernel
   call functOne          # call functOne
   movl $SYS_WRITE, %eax  # mov WRITE(4) into eax
   movl $10, %edx         # length of the string
   movl $_startStr, %ecx  # address of our string
   movl $STDOUT, %ebx     # writing to stdout
   int  $LINUX_KERNEL     # call the kernel
   movl $SYS_EXIT, %eax   # mov EXIT(1) into eax
   movl $0, %ebx          # 0 is the return value
   int  $LINUX_KERNEL     # call the kernel

A couple of things to notice: we use .equ (equates) to make the code a bit
easier to read; these are similar to #define's in C. Again this is pretty
simple: we write a string in _start, call functOne, which prints a string,
return, print the same string as before in _start, and then call exit with
return value 0. Everything should be readable, though a few things need
explanation.


---[ Call

The instruction call is how you call functions. What it does is:

1) Pushes the address of the next instruction, the return address, onto the
   stack.
2) Points %eip to the start of the function, the function's symbol.


---[ Procedure Prolog

Ok, so our functions take no arguments or parameters; they are just void. The
first thing a function has to do is the procedure prolog. It first saves the
current base pointer (ebp) with the instruction pushl %ebp (remember ebp is the
register used for accessing function parameters and local variables). It then
copies the stack pointer (esp) to the base pointer (ebp) with the instruction
movl %esp, %ebp. This allows you to access the function parameters as offsets
from the base pointer. Local variables are always at a negative offset from
ebp, such as -4(%ebp) for the first local variable. The return address is
always at 4(%ebp). Argument N (counting from 1) is at N*4+4(%ebp), such as
8(%ebp) for the first argument, and the old ebp is at (%ebp). A diagram may
make this clearer:

 argument 2        12(%ebp)
 argument 1        8(%ebp)
 return address    4(%ebp)
 old ebp           (%ebp)
 local variable 1  -4(%ebp)
 local variable 2  -8(%ebp)

Note:
%ebp is the value held in the register ebp, (%ebp) is the memory at the address
held in ebp.

Copying the stack pointer into the base pointer gives us a constant reference
to the stack frame for the whole function. esp itself makes a poor reference,
as we will most likely change it (every push and pop moves it) during the
execution of the function itself.

---[ Procedure Epilog

The procedure epilog must do the opposite of the prolog before a function can
exit, so everything is returned to how it was at the time of the call. Without
restoring the stack pointer, the ret instruction would have an incorrect value
to return to, because the pushed return address wouldn't be at the top of the
stack. To reset the stack pointer and the base pointer we do:

   movl %ebp, %esp  # restore the stack pointer
   popl %ebp        # pop the old ebp back into ebp
   ret              # grab the return address from the stack and jmp to it

Assemble and link funct.s and open it in gdb.

entropy@phalaris {~/asm/functions} as -g funct.s -o funct.o

entropy@phalaris {~/asm/functions} ld funct.o -o funct

entropy@phalaris {~/asm/functions} gdb funct
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db library
"/lib/libthread_db.so.1".

(gdb) list functOne
13      .section .text
14
15      .type functOne, @function
16      functOne:
17                                # begin procedure prolog
18         pushl %ebp             # save the base pointer
19         movl  %esp, %ebp       # make the stack pointer the base pointer
20                                # end procedure prolog
21         movl  $SYS_WRITE, %eax # mov WRITE(4) into eax
22         movl  $12, %edx        # length of the string
(gdb) <enter>
23         movl  $functStr, %ecx  # address of our string
24         movl  $STDOUT, %ebx    # writing to stdout
25         int   $LINUX_KERNEL    # call the kernel
26                                # begin procedure epilog
27         movl  %ebp, %esp       # restore the stack pointer
28         popl  %ebp             # restore the base pointer
29         ret
(gdb) list _start
28         popl  %ebp             # restore the base pointer
29         ret
30
31      .globl _start
32      _start:
33         nop                    # so our breakpoint will break in gdb
34         movl $SYS_WRITE, %eax  # mov WRITE(4) into eax
35         movl $10, %edx         # length of the string
36         movl $_startStr, %ecx  # address of our string
37         movl $STDOUT, %ebx     # writing to stdout
(gdb) <enter>
38         int  $LINUX_KERNEL     # call the kernel
39         call functOne          # call functOne
40         movl $SYS_WRITE, %eax  # mov WRITE(4) into eax
41         movl $10, %edx         # length of the string
42         movl $_startStr, %ecx  # address of our string
43         movl $STDOUT, %ebx     # writing to stdout
44         int  $LINUX_KERNEL     # call the kernel
45         movl $SYS_EXIT, %eax   # mov EXIT(1) into eax
46         movl $0, %ebx          # 0 is the return value
47         int  $LINUX_KERNEL     # call the kernel

Break at the address of _start + 1.

(gdb) break *_start+1
Breakpoint 1 at 0x80480b2: file funct.s, line 34.

Start the program executing.

(gdb) run
Starting program: /home/entropy/asm/functions/funct

Breakpoint 1, _start () at funct.s:34
34         movl $SYS_WRITE, %eax  # mov WRITE(4) into eax
Current language:  auto; currently asm

Breakpoint was hit. Up until the call everything is pretty clear, so I'm just
going to step until the call.

(gdb) step
_start () at funct.s:35
35         movl $10, %edx         # length of the string
(gdb) step
_start () at funct.s:36
36         movl $_startStr, %ecx  # address of our string
(gdb) step
_start () at funct.s:37
37         movl $STDOUT, %ebx     # writing to stdout
(gdb) step
_start () at funct.s:38
38         int  $LINUX_KERNEL     # call the kernel
(gdb) step
in _start
_start () at funct.s:39
39         call functOne          # call functOne

At this point we have written out the string "in _start\n", and the next
instruction will be our call functOne. Disassemble _start and find the address
of the call functOne instruction. The address of the instruction right after it
is the return address that call should push onto the stack.

(gdb) disassemble _start
Dump of assembler code for function _start:
0x080480b1 <_start+0>:  nop
0x080480b2 <_start+1>:  mov    $0x4,%eax
0x080480b7 <_start+6>:  mov    $0xa,%edx
0x080480bc <_start+11>: mov    $0x80490f0,%ecx
0x080480c1 <_start+16>: mov    $0x1,%ebx
0x080480c6 <_start+21>: int    $0x80
0x080480c8 <_start+23>: call   0x8048094 <functOne>
0x080480cd <_start+28>: mov    $0x4,%eax
0x080480d2 <_start+33>: mov    $0xa,%edx
0x080480d7 <_start+38>: mov    $0x80490f0,%ecx
0x080480dc <_start+43>: mov    $0x1,%ebx
0x080480e1 <_start+48>: int    $0x80
0x080480e3 <_start+50>: mov    $0x1,%eax
0x080480e8 <_start+55>: mov    $0x0,%ebx
0x080480ed <_start+60>: int    $0x80
End of assembler dump.

The call is at 0x080480c8, as shown by the line
0x080480c8 <_start+23>: call   0x8048094 <functOne>, while the symbol functOne
is at 0x8048094. The next instruction after 0x080480c8 is at 0x080480cd, so
that is what we should see at the top of the stack immediately after the call
instruction is executed.

(gdb) step
functOne () at funct.s:18
18         pushl %ebp             # save the base pointer

Our call has been executed, so take a look at the registers. esp is pointing to
the address 0xbfdc17ec; after that we will look at what is stored there.

(gdb) info reg
eax            0xa      10
ecx            0x80490f0        134516976
edx            0xa      10
ebx            0x1      1
esp            0xbfdc17ec       0xbfdc17ec
ebp            0x0      0x0
esi            0x0      0
edi            0x0      0
eip            0x8048094        0x8048094
eflags         0x246    582
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x0      0

Examine in hex the word stored at address 0xbfdc17ec.

(gdb) x/x 0xbfdc17ec
0xbfdc17ec:     0x080480cd

And it's the return address we saw in the disassembly. Everything in the
function should now be understandable, so just step through it.

(gdb) step
functOne () at funct.s:19
19         movl  %esp, %ebp       # make the stack pointer the base pointer
(gdb) step
functOne () at funct.s:21
21         movl  $SYS_WRITE, %eax # mov WRITE(4) into eax
(gdb) step
22         movl  $12, %edx        # length of the string
(gdb) step
23         movl  $functStr, %ecx  # address of our string
(gdb) step
24         movl  $STDOUT, %ebx    # writing to stdout
(gdb) step
25         int   $LINUX_KERNEL    # call the kernel
(gdb) step
in functOne
27         movl  %ebp, %esp       # restore the stack pointer
(gdb) step
28         popl  %ebp             # restore the base pointer
(gdb) step
functOne () at funct.s:29
29         ret

Here the instruction ret is going to pop the address at (%esp) into %eip and
continue from there, so take a look at the value stored at the address in %esp.

(gdb) x/x $esp
0xbfdc17ec:     0x080480cd

It is the return address from before, so we will return to the instruction
right after the call to functOne.

(gdb) step
_start () at funct.s:40
40         movl $SYS_WRITE, %eax  # mov WRITE(4) into eax
(gdb) list
35         movl $10, %edx         # length of the string
36         movl $_startStr, %ecx  # address of our string
37         movl $STDOUT, %ebx     # writing to stdout
38         int  $LINUX_KERNEL     # call the kernel
39         call functOne          # call functOne
40         movl $SYS_WRITE, %eax  # mov WRITE(4) into eax
41         movl $10, %edx         # length of the string
42         movl $_startStr, %ecx  # address of our string
43         movl $STDOUT, %ebx     # writing to stdout
44         int  $LINUX_KERNEL     # call the kernel

(gdb)

You can see we are at line 40 now, and in the disassembly:
(gdb) disassemble _start
Dump of assembler code for function _start:
0x080480b1 <_start+0>:  nop
0x080480b2 <_start+1>:  mov    $0x4,%eax
0x080480b7 <_start+6>:  mov    $0xa,%edx
0x080480bc <_start+11>: mov    $0x80490f0,%ecx
0x080480c1 <_start+16>: mov    $0x1,%ebx
0x080480c6 <_start+21>: int    $0x80
0x080480c8 <_start+23>: call   0x8048094 <functOne>
0x080480cd <_start+28>: mov    $0x4,%eax
0x080480d2 <_start+33>: mov    $0xa,%edx
0x080480d7 <_start+38>: mov    $0x80490f0,%ecx
0x080480dc <_start+43>: mov    $0x1,%ebx
0x080480e1 <_start+48>: int    $0x80
0x080480e3 <_start+50>: mov    $0x1,%eax
0x080480e8 <_start+55>: mov    $0x0,%ebx
0x080480ed <_start+60>: int    $0x80
End of assembler dump.

we are at the line 0x080480cd <_start+28>: mov    $0x4,%eax.

The rest is pretty easy: we just write out our string and call exit.

(gdb) step
_start () at funct.s:41
41         movl $10, %edx         # length of the string
(gdb)
_start () at funct.s:42
42         movl $_startStr, %ecx  # address of our string
(gdb)
_start () at funct.s:43
43         movl $STDOUT, %ebx     # writing to stdout
(gdb)
_start () at funct.s:44
44         int  $LINUX_KERNEL     # call the kernel
(gdb)
in _start
_start () at funct.s:45
45         movl $SYS_EXIT, %eax   # mov EXIT(1) into eax
(gdb)
_start () at funct.s:46
46         movl $0, %ebx          # 0 is the return value
(gdb)
_start () at funct.s:47
47         int  $LINUX_KERNEL     # call the kernel
(gdb)

Program exited normally.

# milw0rm.com [2006-04-08]