Smashing the Modern Stack for Fun and Profit










  When it comes to buffer overflows, 'Smashing The Stack For Fun And Profit' by
Aleph One is still the first resource many people are directed towards, and for
good reason; it is thorough, well written, and chock-full of examples. However,
the GNU C compiler has evolved since 1998, and as a result, many people are left
wondering why they can't get the examples to work for them, or if they do get the
code to work, why they had to make the changes that they did. Having these same
problems myself, and being unable to find an updated version of Aleph One's document
on the web, I set out to identify the source of these variations on my own.

  I have taken the liberty of writing this paper to share my findings with others
who are experiencing the same problems I did, but it is meant only to be a modern
supplement to Aleph One's paper. You should read Smashing The Stack first, as it is
assumed that you understand the concepts and code presented there.

                                  The Problem(s)

  Unfortunately, different versions of gcc act differently when compiling source code
(i.e., the assembly instructions they create from the source code don't always use the
stack in the exact same manner), and many different flavors of Linux use different versions
of gcc; in other words, the number of bytes needed to properly overwrite the return address
changes depending on the version of gcc used to compile the binary. This makes things very
difficult if you are trying to write precise code for use on various flavors of Linux. For
instance, let's try to use one of Aleph One's examples on Debian 3.1 (currently the newest
release of Debian), which comes with gcc version 3.3.5. If you compile Aleph One's example3
(code shown below), you may wonder why it doesn't work for you. Let's identify the possible
reasons why it wouldn't work by looking at the source first:

void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
int *ret;

ret = buffer1 + 12; <----Possible reason #1: Distance from buffer1 to ret
(*ret) += 8; <------Possible reason #2: Number added to ret address

void main() {
int x;

x = 0;
x = 1;

  The object of this code is to change the return address from function() so that it
skips the 'x=1' assignemnt. But, if you compile this on a default Debian 3.1 system, this
does not won't even get a segmentation fault. This tells us that we haven't
pointed ret to the correct address. How can this be? Aleph One states that all local variables
are allocated space on the stack in 4 byte blocks, which today is not always true. The
following is a excerpt from one of GCC's online docs:


The stack is required to be aligned on a 4 byte boundary. On Pentium and PentiumPro, double
and long double values should be aligned to an 8 byte boundary (see `-malign-double') or suffer
significant run time performance penalties. On Pentium III, the Streaming SIMD Extention (SSE)
data type __m128 suffers similar penalties if it is not 16 byte aligned.

To ensure proper alignment of this values on the stack, the stack boundary must be as aligned
as that required by any value stored on the stack. Further, every function must be generated
such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred
stack boundary from a function compiled with a lower preferred stack boundary will most likely
misalign the stack. It is recommended that libraries that use callbacks always use the default

This extra alignment does consume extra stack space. Code that is sensitive to stack space
usage, such as embedded systems and operating system kernels, may want to reduce the preferred
alignment to `-mpreferred-stack-boundary=2'.


  Using the GNU debugger and a little experimentation, I verified that gcc 3.3.5 on Debian 3.1
(have not tested this version of gcc on any other version/flavor of Linux) allocates memory on
the stack in 16 byte blocks. So, instead of buffer1 taking up 8 bytes and buffer2 taking up 12
bytes, they each take up 16 bytes, doubling the actual size of buffer1. But even accounting for
this will not produce our desired result. Compare the below memory map of the stack from Aleph
One's paper, and the memory map which is actually present in our program:

Aleph One:

bottom of                                                                       top of
memory                                                                          memory
               buffer2      buffer1        sfp    ret    a      b      c
<------    [            ][              ][     ][     ][     ][     ][     ]

top of                                                                           bottom of
stack                                                                            stack

Our Program:

bottom of                                                                        top of
memory                                                                           memory
                buffer2    buffer1      Slack    sfp    ret     a      b       c
<------    [            ][            ][       ][     ][     ][     ][     ][     ]

top of                                                                           bottom of
stack                                                                            stack

  Notice the addition of the 'Slack' in our program. In gcc 3.3.5 this random buffer is added
between the last local variable and the sfp, and is nearly invariably 7 bytes in length. So now
our total distance from buffer1 to the return address can be calculated:

                                        buffer1   = 16
                                        Slack     =  7
                                        sfp       =  4
                                        TOTAL     = 27

  So we change 'ret = buffer1 + 12' to 'ret = buffer1 + 27', recompile, and...we get a
segmentation fault! At least we're closer. Now, let's take a look at Possible Reason #2 by
comparing the disassembly of Aleph One's binary to that of ours:

Aleph One:

(gdb) disassemble main
Dump of assembler code for function main:
0x8000490 : pushl %ebp
0x8000491 : movl %esp,%ebp
0x8000493 : subl $0x4,%esp
0x8000496 : movl $0x0,0xfffffffc(%ebp)
0x800049d : pushl $0x3
0x800049f : pushl $0x2
0x80004a1 : pushl $0x1
0x80004a3 : call 0x8000470 <----Call to function
0x80004a8 : addl $0xc,%esp
0x80004ab : movl $0x1,0xfffffffc(%ebp) <----The instruction we want to skip
0x80004b2 : movl 0xfffffffc(%ebp),%eax
0x80004b5 : pushl %eax
0x80004b6 : pushl $0x80004f8
0x80004bb : call 0x8000378
0x80004c0 : addl $0x8,%esp
0x80004c3 : movl %ebp,%esp
0x80004c5 : popl %ebp
0x80004c6 : ret
0x80004c7 : nop

Our Program:

(gdb) disassemble main
Dump of assembler code for function main:
0x080483a2 : push %ebp
0x080483a3 : mov %esp,%ebp
0x080483a5 : sub $0x18,%esp
0x080483a8 : and $0xfffffff0,%esp
0x080483ab : mov $0x0,%eax
0x080483b0 : sub %eax,%esp
0x080483b2 : movl $0x0,0xfffffffc(%ebp)
0x080483b9 : movl $0x3,0x8(%esp)
0x080483c1 : movl $0x2,0x4(%esp)
0x080483c9 : movl $0x1,(%esp)
0x080483d0 : call 0x8048384     <-----Call to function
0x080483d5 : movl $0x1,0xfffffffc(%ebp) <-----The instruction we want to skip
0x080483dc : mov 0xfffffffc(%ebp),%eax
0x080483df : mov %eax,0x4(%esp)
0x080483e3 : movl $0x8048514,(%esp)
0x080483ea : call 0x80482b0 <_init+56>
0x080483ef : leave
0x080483f0 : ret
0x080483f1 : nop
0x080483f2 : nop
0x080483f3 : nop
0x080483f4 : nop

  Due to differences in the way our more modern gcc compiler translates the source code
into assembly instructions, we need to adjust the return address to match our code. The
return address pushed onto the stack is 0x080483d5, but we want to skip that and go to
0x080483dc. A little math tells us the distance is 7 bytes as opposed to Aleph's 8 bytes.
Change '(*ret) += 8' to '(*ret) += 7', and (if you're running a default Debian 3.1 system)
x should stay set to 0.

                                   More Problems...

  What if you aren't using Debian 3.1? Let's say you're using the increasingly popular Ubuntu
5.10. There are two key differences between Ubuntu and Debian which we are concerned with:
A) Ubuntu comes with gcc version 4.0.5, and B) Ubuntu uses Linux kernel 2.6.12 while Debian
uses 2.4.27. The kernel version is particularly important here, because starting with kernel
2.6.12, Linux has stack protection turned on by default. So, if you're trying to learn about
buffer overflows, stick with a 2.4 kernel so you can get practice with some of the easier
exploits first, then move on to some more advanced methods and practice using them to
circumvent the stack protection.

  But since this article's primary focus is to look at how different versions of gcc and Linux
utilize stack memory, let's look at how gcc 4.0.5 in Ubuntu allocates stack space for internal
variables and how much (if any) space is used up by the slack buffer. I used the following program
to do a little experimentation:

int main(int argc, char *argv[])
char buffer1[5];
char buffer2[10];

printf("\nbuff1: %s\nbuff2: %s\n\n",buffer1,buffer2);
return 0;

  We can open this program up in gdb and look at the memory addresses for buffer1 and buffer2
to determine how far apart their starting addresses are, and, subsequently, how much space gcc
has allocated for them on the stack:

(gdb) disassemble main
Dump of assembler code for function main:
0x080483b8 : push %ebp
0x080483b9 : mov %esp,%ebp
0x080483bb : sub $0x18,%esp <--------Allocates 24 bytes on the stack
0x080483be : and $0xfffffff0,%esp
0x080483c1 : mov $0x0,%eax
0x080483c6 : add $0xf,%eax
0x080483c9 : add $0xf,%eax
0x080483cc : shr $0x4,%eax
0x080483cf : shl $0x4,%eax
0x080483d2 : sub %eax,%esp
0x080483d4 : mov 0xc(%ebp),%eax
0x080483d7 : add $0x4,%eax
0x080483da : mov (%eax),%eax
0x080483dc : sub $0x8,%esp
0x080483df : push %eax
0x080483e0 : lea 0xfffffff1(%ebp),%eax <-------- Address of buffer2
0x080483e3 : push %eax
0x080483e4 : call 0x80482e8 <_init+72>
0x080483e9 : add $0x10,%esp
0x080483ec : sub $0x4,%esp
0x080483ef : lea 0xfffffff1(%ebp),%eax
0x080483f2 : push %eax
0x080483f3 : lea 0xfffffffb(%ebp),%eax <---------Address of buffer1
0x080483f6 : push %eax
0x080483f7 : push $0x8048518
0x080483fc : call 0x80482d8 <_init+56>
0x08048401 : add $0x10,%esp
0x08048404 : mov $0x0,%eax
 0x08048409 : leave
0x0804840a : ret

  The difference between the beginning address of buffer2  (0xfffffff1) and buffer1
(0xfffffffb) is 10 bytes, indicating that buffer2 has been given 10 bytes of room on the
stack. We can also set argv[1] to different lengths until we achieve a segmentation fault,
observing how many bytes total are needed to get to the return address, allowing us to
calculate the size of the slack buffer and buffer1. Let's start by supplying 11 bytes of data
(buffer2 is allocated 10 bytes in the source code) and see what we get:

root@ubuntu:~/bof#./test aaaaaaaaaa

buff1: a
buff2: aaaaaaaaaa

  This confirms that only 10 bytes were allotted for buffer2, because the 11th byte was placed in the
address space for buffer1. Let's assume that buffer1 will be handled the same way and given 5
bytes of stack space, and that there is no slack buffer. This means that to overflow into the return
address we will need to supply 20 bytes of data:

                                               buffer2   = 10
                                               buffer1   =  5
                                               sfp       =  4
                                               TOTAL     = 19

...and the 20th byte should overwrite the first byte of the return address. Let's double check
by running it in gdb:

(gdb)run aaaaaaaaaaaaaaaaaaaa
Starting program: /root/bof/test aaaaaaaaaaaaaaaaaaaa

buff1: aaaaaaaaaa
buff2: aaaaaaaaaaaaaaaaaaaa

Program received signal SIGSEGV, Segmentation fault.
0xb7ea0061 in ?? () from /lib/tls/i686/cmov/

  We guessed correctly, and the first byte of the return address is set to 0x61 (ASCII 'a').
So, our memory map of the stack for Ubuntu 5.10 with gcc version 4.0.5 looks like:

bottom of                                                              top of
memory                                                                 memory
                              buffer2       buffer1    sfp   ret
<------                    [            ][           ][     ][     ]

top of                                                                  bottom of
stack                                                                   stack

  Where buffer2 = 10 bytes, buffer1 = 5 bytes and sfp = 4 bytes. Note that the stack is still
aligned on a 4 byte boundary because 24 bytes have been allocated on the stack
('sub $0x18, %esp').


  Hopefully you now have a better idea of how differences in compiler versions affect stack
and variable size, and how to take these changes into account when writing an overflow exploit.
The best way to adjust for these changes over multiple systems is to add a little NOP padding
at the beginning of your shellcode, but if you need more precise calculations, duplicate the
target system as closely as possible and start experimenting.

  Please contact me with any questions, comments, corrections, etc: craig [at]

# [2006-09-04]