DTORS SECURITY RESEARCH (DSR) Author: mercy Title: Reconstructing binaries to C for beginners. Date: 20/01/2003 An example of reverse engineering code: Hopefully at the end of this small disassembly dump, we will have been able to analyze what is happening in the assembly, and construct accurate C code to complement it. To start with, we will talk about the allocation of local variables. In our C programs we always use variables, any program does, and for this point we must take carefull note as to the space allocated, and the data type it may have been. push %ebp mov %esp,%ebp sub $0x18,%esp These three steps are known as the procedure prolog, and are essential in the operations of your program. push %ebp This is pushing your base pointer onto the stack, this is so we can retrieve it at program exit, and restore the program back to its original state before return. mov %esp,%ebp This is moving your current stack pointer to become your base pointer for your program. sub $0x18,%esp This is actually increasing your stack pointer by 24 bytes. (Stack is growing down) This is probably one of the most important things to note, because you will later see your local variables being accessed via a command such as: movl $0x0,0xfffffffc(%ebp) It is important to note, the difference between: mov movb movl movw These minor operations can effect which type of variable you will be using. fffffffc(%ebp) is a negative 4 at %ebp. If we think about it: %ebp = %esp %esp + 24 The stack grows down, so negative 4 ebp will be hitting the memory allocated for a local variable, and it is a movl, long indicating an integer. (An int is the same as type long on most systems) Exit Status: You will be able to determine programs return type by the value held in %eax. A statement in C such as: return(0); Would result in the assembly equivelant to: xor %eax,%eax ret (Our return value is obviously held in %eax) Procedure epilog: mov %ebp,%esp pop %ebp ret This is obviously moving our contents of ebp into esp. (The original %esp) pop %ebp Is popping the pushed base pointer from the stack. (The original %ebp) ret is obviously returning control. mercy@hewey:~/examples/gdb > gdb 1 -q (gdb) disass main Dump of assembler code for function main: 0x8048410
: push %ebp 0x8048411 : mov %esp,%ebp 0x8048413 : sub $0x18,%esp 0x8048416 : nop 0x8048417 : movl $0x0,0xfffffffc(%ebp) 0x804841e : mov %esi,%esi 0x8048420 : cmpl $0x9,0xfffffffc(%ebp) 0x8048424 : jle 0x8048428 0x8048426 : jmp 0x8048440 0x8048428 : add $0xfffffff4,%esp 0x804842b : push $0x80484b4 0x8048430 : call 0x8048300 0x8048435 : add $0x10,%esp 0x8048438 : incl 0xfffffffc(%ebp) 0x804843b : jmp 0x8048420 0x804843d : lea 0x0(%esi),%esi 0x8048440 : xor %eax,%eax 0x8048442 : jmp 0x8048444 0x8048444 : mov %ebp,%esp 0x8048446 : pop %ebp 0x8048447 : ret End of assembler dump. (gdb) 0x8048410
: push %ebp Push the ebp (extended base pointer) onto the stack. 0x8048411 : mov %esp,%ebp Move the current esp (extended stack pointer) into the old ebp register, in turn setting up a new bp. 0x8048413 : sub $0x18,%esp Set aside 24 bytes for local variables. 0x8048416 : nop Do nothing. ( no operation ) 0x8048417 : movl $0x0,0xfffffffc(%ebp) put 0 at -4 ebp, or four bytes from the ebp. 0x804841e : mov %esi,%esi two byte padding, next instruction main+14+2. 0x8048420 : cmpl $0x9,0xfffffffc(%ebp) compare long -4 ebp, with the value 9. 0x8048424 : jle 0x8048428 if it is lower than 9, jump to adress 0x8048428 0x8048426 : jmp 0x8048440 if it is equal, then jump to 0x8048440 0x8048428 : add $0xfffffff4,%esp if it was lower than 9, subtract 12 from esp (stack setup) 0x804842b : push $0x80484b4 push %s\n\000 onto the stack 0x8048430 : call 0x8048300 make a call to printf 0x8048435 : add $0x10,%esp change the stack pointer 0x8048438 : incl 0xfffffffc(%ebp) increment long, -4 ebp ( +1 to the value at ebp) 0x804843b : jmp 0x8048420 go back to main+16 0x804843d : lea 0x0(%esi),%esi load effective adress of esi into esi. (I beleive it is 2 byte padding again, I may be wrong though.) 0x8048440 : xor %eax,%eax xor the value in eax with itself, in turn putting 0 in eax, this will be the exit status. 0x8048442 : jmp 0x8048444 jump to adress 0x8048444 0x8048444 : mov %ebp,%esp move the base poiter into the stack pointer. 0x8048446 : pop %ebp pop the inital ebp value. (0x8048410) 0x8048447 : ret return. >From that very basic assembly dump, we were able to construct a fairly accurate image of what the programs source may have been. Lets start with putting the info together. put a null byte at -4 ebp, or four bytes from the ebp. <== that tells me that there is one size int variable been set aside. compare long -4 ebp, with the value 9. <== compare the value in the int with 9, if it isnt jump here: if it is continue here: push %s\n\000 onto the stack <== shows me it is just going to print a random string. make a call to printf <== obviously, printf("%s\n"); (from the above) increment long, -4 ebp ( +1 to the value at ebp) <== if the value is not 9, print the line above, add 1 to the value. go back to main+16 <== go back to compare long (compare ebp with $0x09). xor the value in eax with itself, in turn putting 0 in eax, this will be the exit status. <== tells me to exit with status(0) (success). That above is the basis of our re-construction code, there are a few other values in there, though the above is what info we really need to be able to re-construct the binary into C format, here is my attempt: #include int main(int argc, char **argv) { int a; // int for(a = 0; a <=9; a++) // compare long loop, increment a; printf("%s\n"); // print the pushed value %s\n\000 return(0); // return } Above is my reconstructed C code, below is the actual program: #include int main(int argc, char **argv) { int i; for(i = 0; i < 10; i++) printf("%s\n"); return(0); } As you can see, there is basically no difference besides the name of local variables, re-construction of binaries can be very complex for large programs, thats why i suggest if you think you have found a hole, re-construct the vuln function rather than the whole binary. And that is just about a rap, any questions feel free to e-mail me at: mercy@dtors.net /* Some people may be wondering where we got our pushed values; %s\n\000, you simply do this on the push value: (gdb) x/5bc 0x80484b4 0x80484b4 <_IO_stdin_used+4>: 37 '%' 115 's' 10 '\n' 0 '\000' Cannot access memory at address 0x80484b8 (gdb) and that is how I got what i needed, have fun and until next time, peace */ Paper Two: Reverse engineering code, a function? a function? a function? sheeeeesh. # milw0rm.com [2006-04-08]