Reconstructing Binaries to C for Beginners

EDB-ID: 13216
CVE: N/A
Author: mercy
Type: papers
Platform: Multiple
Published: 2006-04-08

        DTORS   SECURITY  RESEARCH
                (DSR)

        Author: mercy
        Title: Reconstructing binaries to C for beginners.
        Date: 20/01/2003

An example of reverse engineering code:
By the end of this small disassembly dump, we should be able to analyze what is happening in the assembly and construct accurate C code to match it.

To start with, we will talk about the allocation of local variables.
Almost every C program uses variables, so we must take careful note of
how much space is allocated and what data type each variable may have
been.

push %ebp
mov %esp,%ebp
sub $0x18,%esp

These three instructions are known as the procedure prolog; they set up
the stack frame that the rest of the function relies on.

push %ebp 
This pushes the caller's base pointer onto the stack so that we can
retrieve it when the function exits and restore the caller's state
before returning.

mov %esp,%ebp
This copies the current stack pointer into %ebp, which becomes the base
pointer for this function's stack frame.

sub $0x18,%esp
This decreases your stack pointer by 24 bytes (0x18), reserving that space for local variables. (The stack grows down, so subtracting from %esp allocates space.)
This is probably one of the most important things to note, because you will
later see your local variables being accessed via an instruction such as:

movl   $0x0,0xfffffffc(%ebp)

It is important to note the difference between:
mov
movb
movl
movw

The suffix gives the operand size (b = byte, w = 16-bit word, l = 32-bit long; a plain
mov takes its size from its register operand), which hints at the type of variable in use.
0xfffffffc(%ebp) is -4(%ebp): 0xfffffffc is -4 when read as a signed 32-bit value.
If we think about it:
%ebp = %esp          (after mov %esp,%ebp)
%esp = %ebp - 24     (after sub $0x18,%esp)
The stack grows down, so -4(%ebp) falls inside the memory allocated
for local variables, and it is a movl (long, 4 bytes), suggesting an integer.
(On 32-bit x86, int and long are both 4 bytes.)
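
The offset arithmetic above can be checked with a few lines of C. This is only a minimal sketch: the helper names signed_disp and suffix_width are made up for illustration (they are not part of gdb or any library), and it assumes the 32-bit x86 target used in this paper.

```c
#include <stdint.h>

/* Reinterpret a 32-bit displacement, as printed in the dump
 * (e.g. 0xfffffffc), as the signed offset it encodes. */
int32_t signed_disp(uint32_t raw)
{
    /* Two's complement: a set top bit means the value is negative. */
    if (raw & 0x80000000u)
        return -(int32_t)(~raw) - 1;
    return (int32_t)raw;
}

/* Operand width in bytes implied by an AT&T mnemonic suffix.
 * Returns 0 for a suffix this sketch does not know about. */
int suffix_width(char suffix)
{
    switch (suffix) {
    case 'b': return 1;  /* movb: byte, e.g. a char */
    case 'w': return 2;  /* movw: word, e.g. a short */
    case 'l': return 4;  /* movl: long, e.g. an int on 32-bit x86 */
    default:  return 0;
    }
}
```

For example, signed_disp(0xfffffffc) evaluates to -4 and signed_disp(0xfffffff4) to -12, which are the two "large" constants that show up in the dump.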

Exit Status:
You can determine a function's return value from the value held in %eax when it returns.
A statement in C such as:
return(0);
could result in assembly equivalent to:
xor %eax,%eax
ret
(The return value is held in %eax.)
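
To see why an xor can stand in for "return 0", here is a small sketch in C. The function names are hypothetical, and the instructions mentioned in the comments are only what a compiler may emit; the exact output depends on the compiler and its flags.

```c
#include <stdint.h>

/* XOR of any value with itself clears every bit, which is why
 * "xor %eax,%eax" is a common way to load 0 into %eax. */
uint32_t xor_self(uint32_t eax)
{
    return eax ^ eax;
}

/* On 32-bit x86 an int return value is handed back in %eax, so a
 * compiler may translate this body into "xor %eax,%eax" and "ret". */
int return_zero(void)
{
    return 0;
}
```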

Procedure epilog:
mov    %ebp,%esp
pop    %ebp
ret 

mov %ebp,%esp copies ebp back into esp, discarding the local variables. (%esp now points at the saved %ebp.)
pop %ebp pops the pushed base pointer from the stack. (The caller's original %ebp)
ret returns control to the caller.
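
The way the prolog and epilog mirror each other can be modelled on a toy stack. This is a sketch under obvious assumptions: struct toy_cpu and its helpers are invented for illustration and only track the two registers involved, not a real CPU.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Toy machine state: the two registers the prolog and epilog touch,
 * plus a small stack. All names here are hypothetical. */
struct toy_cpu {
    uint32_t esp;                 /* stack pointer, as a byte offset */
    uint32_t ebp;                 /* base (frame) pointer */
    uint32_t stack[STACK_WORDS];  /* memory backing the stack */
};

static void push32(struct toy_cpu *c, uint32_t value)
{
    assert(c->esp >= 4);
    c->esp -= 4;                  /* the stack grows down */
    c->stack[c->esp / 4] = value;
}

static uint32_t pop32(struct toy_cpu *c)
{
    assert(c->esp + 4 <= STACK_WORDS * 4);
    uint32_t value = c->stack[c->esp / 4];
    c->esp += 4;
    return value;
}

/* push %ebp; mov %esp,%ebp; sub $locals,%esp */
void toy_prolog(struct toy_cpu *c, uint32_t locals)
{
    push32(c, c->ebp);            /* save the caller's base pointer */
    c->ebp = c->esp;              /* start a new frame */
    assert(c->esp >= locals);
    c->esp -= locals;             /* reserve room for local variables */
}

/* mov %ebp,%esp; pop %ebp */
void toy_epilog(struct toy_cpu *c)
{
    c->esp = c->ebp;              /* throw away the locals */
    c->ebp = pop32(c);            /* restore the caller's base pointer */
}
```

Starting from any esp and ebp, running toy_prolog(&cpu, 0x18) followed by toy_epilog(&cpu) leaves both registers exactly as they were, which is the whole point of the pairing.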

mercy@hewey:~/examples/gdb > gdb 1 -q
(gdb) disass main
Dump of assembler code for function main:
0x8048410 <main>:       push   %ebp
0x8048411 <main+1>:     mov    %esp,%ebp
0x8048413 <main+3>:     sub    $0x18,%esp
0x8048416 <main+6>:     nop    
0x8048417 <main+7>:     movl   $0x0,0xfffffffc(%ebp)
0x804841e <main+14>:    mov    %esi,%esi
0x8048420 <main+16>:    cmpl   $0x9,0xfffffffc(%ebp)
0x8048424 <main+20>:    jle    0x8048428 <main+24>
0x8048426 <main+22>:    jmp    0x8048440 <main+48>
0x8048428 <main+24>:    add    $0xfffffff4,%esp
0x804842b <main+27>:    push   $0x80484b4
0x8048430 <main+32>:    call   0x8048300 <printf>
0x8048435 <main+37>:    add    $0x10,%esp
0x8048438 <main+40>:    incl   0xfffffffc(%ebp)
0x804843b <main+43>:    jmp    0x8048420 <main+16>
0x804843d <main+45>:    lea    0x0(%esi),%esi
0x8048440 <main+48>:    xor    %eax,%eax
0x8048442 <main+50>:    jmp    0x8048444 <main+52>
0x8048444 <main+52>:    mov    %ebp,%esp
0x8048446 <main+54>:    pop    %ebp
0x8048447 <main+55>:    ret    
End of assembler dump.
(gdb) 


0x8048410 <main>:       push   %ebp    
Push the ebp (extended base pointer) onto the stack.
0x8048411 <main+1>:     mov    %esp,%ebp
Copy the current esp (extended stack pointer) into ebp, in turn setting up the new base pointer.
0x8048413 <main+3>:     sub    $0x18,%esp
Set aside 24 bytes for local variables.
0x8048416 <main+6>:     nop    
Do nothing. ( no operation )
0x8048417 <main+7>:     movl   $0x0,0xfffffffc(%ebp)
Put 0 at -4(%ebp), i.e. four bytes below ebp.
0x804841e <main+14>:    mov    %esi,%esi
Two-byte padding (it does nothing); the next instruction is at main+14+2.
0x8048420 <main+16>:    cmpl   $0x9,0xfffffffc(%ebp)
Compare the long at -4(%ebp) with the value 9.
0x8048424 <main+20>:    jle    0x8048428 <main+24>
If it is less than or equal to 9, jump to address 0x8048428.
0x8048426 <main+22>:    jmp    0x8048440 <main+48>
Otherwise (it is greater than 9), jump to 0x8048440.
0x8048428 <main+24>:    add    $0xfffffff4,%esp
If it was less than or equal to 9, subtract 12 from esp (0xfffffff4 is -12; stack setup before the call).
0x804842b <main+27>:    push   $0x80484b4
Push the address of the string %s\n\000 onto the stack.
0x8048430 <main+32>:    call   0x8048300 <printf>
Make a call to printf.
0x8048435 <main+37>:    add    $0x10,%esp
Add 16 to esp, undoing the 12-byte adjustment and the 4-byte push.
0x8048438 <main+40>:    incl   0xfffffffc(%ebp)
Increment the long at -4(%ebp) (+1 to the local variable).
0x804843b <main+43>:    jmp    0x8048420 <main+16>
Go back to main+16.
0x804843d <main+45>:    lea    0x0(%esi),%esi
Load the effective address 0(%esi) into esi, which changes nothing. (Judging by the addresses, main+45 to main+48, this is three bytes of padding.)
0x8048440 <main+48>:    xor    %eax,%eax
Xor the value in eax with itself, in turn putting 0 in eax; this will be the exit status.
0x8048442 <main+50>:    jmp    0x8048444 <main+52>
Jump to address 0x8048444 (the very next instruction).
0x8048444 <main+52>:    mov    %ebp,%esp
Move the base pointer into the stack pointer.
0x8048446 <main+54>:    pop    %ebp
Pop the saved ebp value (the caller's base pointer, pushed at main+0).
0x8048447 <main+55>:    ret
Return.
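
Before turning the dump into a tidy for loop, it can help to write the jumps down literally. The following sketch mirrors the control flow of main with labels and gotos and also tracks how far %esp moves around the printf call. It is an illustration only: trace_main and struct trace are made-up names, and printf is counted rather than actually called.

```c
#include <assert.h>

/* What we observe when walking the loop from main+7 to main+48. */
struct trace {
    int printf_calls;   /* how many times "call printf" was reached */
    int final_counter;  /* value of -4(%ebp) when the loop exits */
};

struct trace trace_main(void)
{
    struct trace t = {0, 0};
    int counter;        /* the local variable at -4(%ebp) */
    int esp_delta = 0;  /* change in %esp relative to the loop entry */

    counter = 0;                    /* main+7:  movl $0x0,-4(%ebp) */
check:
    if (counter <= 9) goto body;    /* main+16, main+20: cmpl, jle */
    goto done;                      /* main+22: jmp main+48 */
body:
    esp_delta -= 12;                /* main+24: add $0xfffffff4,%esp */
    esp_delta -= 4;                 /* main+27: push $0x80484b4 */
    t.printf_calls++;               /* main+32: call printf (only counted) */
    esp_delta += 16;                /* main+37: add $0x10,%esp */
    assert(esp_delta == 0);         /* the stack is balanced again */
    counter++;                      /* main+40: incl -4(%ebp) */
    goto check;                     /* main+43: jmp main+16 */
done:
    t.final_counter = counter;      /* main+48 onwards: return 0 */
    return t;
}
```

Calling trace_main() reports 10 printf calls and a final counter of 10, so the body runs for counter values 0 through 9.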

From that very basic assembly dump, we were able to construct a fairly accurate image of what the program's source may have been.
Let's start by putting the info together.

put 0 at -4(%ebp), four bytes below ebp. <== tells me that one int-sized local variable has been set aside and initialized to 0.
compare long -4(%ebp) with the value 9. <== compare the int with 9; if it is less than or equal, continue into the loop body, otherwise jump to the exit code.
push the address of %s\n\000 onto the stack <== the format string is the only argument pushed, so printf will print whatever it happens to find where the string argument should be.
make a call to printf <== obviously, printf("%s\n"); (from the above)
increment long, -4(%ebp) (+1 to the int) <== after printing the line above, add 1 to the value.
go back to main+16 <== go back to compare long (compare -4(%ebp) with $0x9).
xor the value in eax with itself, in turn putting 0 in eax, this will be the exit status. <== tells me main returns 0 (success).

The above is the basis of our reconstruction. There are a few other instructions in there, but this is all the info we really
need to be able to reconstruct the binary in C. Here is my attempt:

#include <stdio.h>
int main(int argc, char **argv)
{
     int a;                     // the 4-byte local at -4(%ebp)
     for(a = 0; a <= 9; a++)    // cmpl $0x9 / jle loop, incl increments a
       printf("%s\n");          // only the format string is pushed, as in the binary
     return(0);                 // xor %eax,%eax
}

Above is my reconstructed C code, below is the actual program:

#include <stdio.h>

int main(int argc, char **argv)
{
     int i;
     for(i = 0; i < 10; i++)
       printf("%s\n");
     return(0);
}
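
The only real difference between the two listings is how the loop condition is spelled. A quick sketch (the function names are hypothetical) confirms that both spellings run the body the same number of times:

```c
/* Count iterations of the loop as reconstructed from the binary. */
int iterations_le_9(void)
{
    int n = 0;
    for (int a = 0; a <= 9; a++)
        n++;
    return n;
}

/* Count iterations of the loop as written in the original source. */
int iterations_lt_10(void)
{
    int n = 0;
    for (int i = 0; i < 10; i++)
        n++;
    return n;
}
```

Both functions return 10. For an int counter, i < 10 and a <= 9 are the same test, so the compiler is free to emit the cmpl $0x9 / jle pair for either one.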

As you can see, there is basically no difference besides the name of the local variable and the spelling of the loop condition (a <= 9 and i < 10 are equivalent for an int). Reconstruction of binaries can be very complex for large
programs, which is why I suggest that if you think you have found a hole, you reconstruct the vulnerable function rather than the whole binary.
And that is just about a wrap. If you have any questions, feel free to e-mail me at:
mercy@dtors.net

/* Some people may be wondering where we got our pushed value, %s\n\000. You simply examine the address that was pushed:
(gdb) x/5bc 0x80484b4
0x80484b4 <_IO_stdin_used+4>:   37 '%'  115 's' 10 '\n' 0 '\000'        Cannot access memory at address 0x80484b8
(gdb) 
and that is how I got what I needed. Have fun, and until next time, peace */
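
If you want to double-check what such a byte dump should look like for a string you already know, a few lines of C can produce similar output. This is a rough imitation written for this paper, not gdb's actual code; format_byte is a made-up helper, and only newline and generic non-printable bytes get escaped.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Format one byte roughly the way the dump above shows it:
 * the decimal value, then the character in single quotes.
 * Returns the number of characters snprintf wanted to write. */
int format_byte(unsigned char byte, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    if (byte == '\n')
        return snprintf(out, out_size, "%d '\\n'", byte);
    if (isprint(byte))
        return snprintf(out, out_size, "%d '%c'", byte, byte);
    /* Anything else is shown as a three-digit octal escape. */
    return snprintf(out, out_size, "%d '\\%03o'", byte, (unsigned)byte);
}
```

Feeding it the four bytes of "%s\n" (including the terminator) gives 37 '%', 115 's', 10 '\n' and 0 '\000', matching the values in the dump.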


Paper Two:
Reverse engineering code, a function? a function? a function? sheeeeesh.

# milw0rm.com [2006-04-08]