|=---------------=[ Linux Assembly and Disassembly the Basics ]=--------------=|
|=----------------------------------------------------------------------------=|
|=-------------------------=[ lhall@telegenetic.net ]=------------------------=|

---[ Introduction to as, ld and writing your own asm from scratch.

First off you have to know what a system call is. A system call is the
mechanism used by an application program to request a service from the
operating system. System calls often use a special machine code instruction
(on i386 Linux, a software interrupt) which causes the processor to change
mode (e.g. from "user mode" to "supervisor mode", also called "kernel mode").
This switch is a mode switch; you will also see it loosely called a context
switch, a term that more strictly means switching between processes. The mode
is the protection and access level that a piece of code is executing in. It's
determined by a hardware-mediated flag (on x86, the current privilege level,
ring 0 through ring 3). If you have ever heard people talking about ring zero,
they are referring to code that executes in supervisor mode, such as all
kernel code. (cr0, which sometimes gets mixed up with it, is a control
register, not a privilege ring.) A mode switch allows the OS to perform
restricted actions such as accessing hardware devices or the memory
management unit.

Generally, operating systems provide a library that sits between normal
programs and the rest of the operating system, usually the C library (libc),
such as glibc, or the Windows API. This library handles the low-level details
of passing information to the kernel and switching to supervisor mode. These
APIs give you access to functions that make your job easier, for instance
printf to print a formatted string or the *alloc family to get more memory.

In Linux the system calls are defined in the file /usr/include/asm/unistd.h.

entropy@phalaris entropy $ cat /usr/include/asm/unistd.h
#ifndef _ASM_I386_UNISTD_H_
#define _ASM_I386_UNISTD_H_

/*
 * This file contains the system call numbers.
 */

#define __NR_restart_syscall      0
#define __NR_exit                 1
#define __NR_fork                 2
#define __NR_read                 3
#define __NR_write                4
#define __NR_open                 5
#define __NR_close                6
[...snip...]

Each system call is shown as the system call name preceded by __NR_ and then
followed by the system call number. The system call number is very important
for writing asm programs that don't use gcc, a compiler or libc. The low-level
handling routines for system calls and faults are contained in the file
/usr/src/linux/arch/i386/kernel/entry.S, although this is over our head for
now.

The text that you type for the instructions of the program is known as the
source code. In order to transform source code into an executable program you
must assemble and link it. These steps are done for you by a compiler, but we
will do them separately. Assembling is the process that transforms your source
code into instructions for the machine. The machine itself only reads numbers
but humans work much better with words. An assembly language is a human
readable form of machine code.

The Linux assembler's name is `as`; you can type `as --help` to see its
arguments. `as` generates an object file out of a source file. An object file
is machine code that has not been fully put together yet. Object files contain
compact, pre-parsed code, often called binaries, that can be linked with other
object files to generate a final executable or code library. An object file is
mostly machine code.

The linker is the program responsible for putting all the object files
together and adding information so the kernel knows how to load and run the
result. `ld` is the name of the linker on Linux.

So to summarize, source code must be assembled and linked in order to produce
an executable program.
On Linux x86 this is accomplished with

as source.s -o object.o
ld object.o -o executable

Where "source.s" is your assembly code, "object.o" is the object file that
`as` produces and writes out (-o), and "executable" is the final executable
produced when the object file has been linked.

In the last tutorial we used gcc to generate the asm and then to compile the
program. When we called write we pushed the length of the string, the address
of the string, and the file descriptor onto the stack and then issued the
instruction "call write". This needs some explanation because how we do it now
is totally different. We pushed values onto the stack that way because we were
using C (hello.c), and gcc generates code that uses the C calling convention.
A calling convention is the way that parameters and return values are
transferred between caller and callee. On i386, C passes its parameters in a
stack frame (e.g. pushl $14, pushl $.LC0, pushl $1, call write) and hands the
return value back in eax. A stack frame is a piece of the stack that holds all
the info needed to call a function. So when we issued the "call write"
instruction we were using the C library (libc), and the write there was really
the system call write, same name, but wrapped in libc (e.g. getpid() is a
wrapper for syscall(SYS_getpid)).

For now, when we write our own asm we will not be using libc. Even though that
way is easier, it's not always possible to use, and it's good to know what's
happening on a lower level.

Here's our first program.

entropy@phalaris asm $ cat hello.s
.section .data
hello:
        .ascii "Hello, World!\n\0"
.section .text
.globl _start
_start:
        movl $4, %eax
        movl $14, %edx
        movl $hello, %ecx
        movl $1, %ebx
        int $0x80
        movl $1, %eax
        movl $0, %ebx
        int $0x80

entropy@phalaris asm $ as hello.s -o hello.o
entropy@phalaris asm $ ld hello.o -o hello
entropy@phalaris asm $ ./hello
Hello, World!

Same output as before, and it accomplishes the same thing, but done very
differently.
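
One detail in hello.s worth a second look is the length 14, which was counted
by hand. Here is a minimal sketch of an alternative, assuming GNU `as`: let
the assembler compute the length from its location counter. The symbol name
hello_len and the file name hello_len.s are made up for this illustration, and
the trailing \0 is dropped so that it isn't counted as part of the length. The
--32 and -m elf_i386 flags in the comment are an assumption about building on
a 64-bit host; the original session did not need them.

```asm
# hello_len.s: hello.s again, with the assembler counting the bytes.
# Hypothetical build on a 64-bit host (on i386 plain as/ld is enough):
#   as --32 hello_len.s -o hello_len.o
#   ld -m elf_i386 hello_len.o -o hello_len
.section .data
hello:
        .ascii "Hello, World!\n"
        hello_len = . - hello           # "." is the current address, so this
                                        # is the size of the string in bytes
.section .text
.globl _start
_start:
        movl $4, %eax                   # __NR_write
        movl $1, %ebx                   # first argument: fd 1, stdout
        movl $hello, %ecx               # second argument: address of string
        movl $hello_len, %edx           # third argument: computed length
        int $0x80                       # call the kernel
        movl $1, %eax                   # __NR_exit
        movl $0, %ebx                   # first argument: exit status 0
        int $0x80                       # call the kernel
```

Assembled and linked like hello.s, this should behave the same way, and the
length follows along if the string is ever edited. The walkthrough that
follows sticks with the original hello.s.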

.section .data

Starts the section .data where all our data goes. We could just as easily have
done .section .rodata like gcc generated in the intro, and then the string
would have been read only, but it's much more common to put initialized data
into the .data section. The .rodata section is more like doing a
#define hello "Hello, World!\n" in C, while the .data section is more similar
to char hello[] = "Hello, World!\n".

hello:

The label hello, which remember is a symbol (a symbol being a string
representation for an address) followed by a colon. A label says: when you
assemble, take the address of the next instruction or data following the colon
and make that the label's value.

.ascii "Hello, World!\n\0"

And here is what the label hello is going to refer to: hello is going to point
to the first character of the string (.ascii defines a string)
"Hello, World!\n\0".

.section .text

Here we start our code section.

.globl _start

The linker `ld` looks for _start by default as the place where execution
begins, while a program built with `gcc` starts at main (libc's startup code
supplies _start and calls main for you). Again, .globl tells the assembler
that it shouldn't get rid of the symbol after assembly because the linker
needs it.

_start:

_start is a symbol that is going to be replaced by an address during either
assembly or linking. _start here is where our program will start to execute
when loaded by the kernel.

movl $4, %eax

When making a system call, the number of the system call you want goes into
the register eax. As we saw above in the file /usr/include/asm/unistd.h, the
write system call was defined as "#define __NR_write 4". So here we are moving
the immediate value 4 into eax, so when we call the kernel to do its work it
will know we want write.

movl $14, %edx
movl $hello, %ecx
movl $1, %ebx

The write system call is expecting three arguments, namely the file descriptor
to write to, the address of the string to write, and the length of the string
to write.
When calling system calls, function arguments are passed in registers, which
differs from the C library (libc) convention, which expects function arguments
to be pushed onto the stack. So the system call number goes into eax, the
first argument into ebx, the second into ecx, and the third into edx. There
can be up to six arguments, in ebx, ecx, edx, esi, edi, ebp, in that order. If
there are more arguments, they are simply passed through a structure whose
address is the first argument. So we fill in the registers that write needs to
do its job: we move 1, which is STDOUT, into ebx, we put the label hello's
value (which is the address of the string "Hello, World!\n\0") into ecx, and
we put the length of the string, 14, into edx. The 14 counts the characters up
to and including the newline; the trailing \0 is not written.

int $0x80

This instruction int(errupts) into the kernel (0x80 is the interrupt vector
Linux uses for system calls on i386) and asks it to do the system function
whose index is in eax. An interrupt interrupts the program's flow and asks the
kernel to do something for us. The kernel will then perform the system
function and then return control to our program. Before the interrupt we were
executing in a user mode context, during the system call we were executing in
a kernel mode context, and when the kernel is done and returns control to our
program we are again executing in a user mode context. So the kernel reads
eax, does a write of our string and returns.

movl $1, %eax

Now we're done and we need to exit, so what number do we use to execute exit?
Look back at unistd.h and we see that exit is "#define __NR_exit 1".

movl $0, %ebx

exit expects one argument, namely the return code (0 means no errors), so we
put that into ebx.

int $0x80

Call the kernel to execute exit with return code 0 and we're done.

On to the disassembly. Assemble with debugging symbols; `as` uses the same -g
or -gstabs that `gcc` does.

entropy@phalaris asm $ as -g hello.s -o hello.o

And link it.

entropy@phalaris asm $ ld hello.o -o hello

Start gdb.

entropy@phalaris asm $ gdb hello
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".

Set a breakpoint at the address of _start so we can step through it.

(gdb) break *_start
Breakpoint 1 at 0x8048094: file hello.s, line 7.
(gdb) run
Starting program: /home/entropy/asm/hello
Hello, World!

Program exited normally.
Current language:  auto; currently asm
(gdb)

The breakpoint didn't work. I'm not sure why this happens, but we can do a
quick fix. Here is the fixed asm.

entropy@phalaris asm $ cat hello.s
.section .data
hello:
        .ascii "Hello, World!\n\0"
.section .text
.globl _start
_start:
        nop
        movl $4, %eax
        movl $14, %edx
        movl $hello, %ecx
        movl $1, %ebx
        int $0x80
        movl $1, %eax
        movl $0, %ebx
        int $0x80

The only difference is the nop, or no operation, right after _start. Now we
can set our breakpoint and it will work. Reassemble and link.

entropy@phalaris asm $ as -g hello.s -o hello.o
entropy@phalaris asm $ ld hello.o -o hello

Start gdb.

entropy@phalaris asm $ gdb hello
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".

List our assembly. I put in comments so it would be easier to follow.

(gdb) list _start
2       hello:                  # label hello, address of the first char
3         .ascii "Hello, World!\n\0"  # .ascii defines a string
4       .section .text          # our code start
5       .globl _start           # the start symbol defined as .globl
6       _start:                 # the start label
7         nop                   # no operation for debugging with gdb
8         movl $4, %eax         # mov 4 into %eax, 4 is write(fd, buf, len)
9         movl $14, %edx        # 14 is the length of our string
10        movl $hello, %ecx     # the address of our string
11        movl $1, %ebx         # 1 is STDOUT, to the screen
(gdb)
12        int $0x80             # call the kernel
13        movl $1, %eax         # move 1 into %eax, 1 is syscall exit()
14        movl $0, %ebx         # move 0 into %ebx, exit's return value
15        int $0x80             # call kernel

Set a breakpoint just past our nop.

(gdb) break *_start+1
Breakpoint 1 at 0x8048095: file hello.s, line 8.

And run it.

(gdb) run
Starting program: /home/entropy/asm/hello

Breakpoint 1, _start () at hello.s:8
8         movl $4, %eax         # mov 4 into %eax, 4 is write(fd, buf, len)
Current language:  auto; currently asm

Now the breakpoint works.

(gdb) step
_start () at hello.s:9
9         movl $14, %edx        # 14 is the length of our string
(gdb) step
_start () at hello.s:10
10        movl $hello, %ecx     # the address of our string
(gdb) step
_start () at hello.s:11
11        movl $1, %ebx         # 1 is STDOUT, to the screen
(gdb) step
_start () at hello.s:12
12        int $0x80             # call the kernel

Check the registers to see if they have the correct information in them.

(gdb) print $edx
$1 = 14
(gdb) x/s $ecx
0x80490b8 <hello>:       "Hello, World!\n"
(gdb) print $ebx
$2 = 1
(gdb) print $eax
$3 = 4
(gdb)

Looks good, so let the kernel do its work.

(gdb) step
Hello, World!
_start () at hello.s:13
13        movl $1, %eax         # move 1 into %eax, 1 is syscall exit()

It has executed the write system call (you can see the printed string) and
returned to gdb. Now we call exit.

(gdb) step
_start () at hello.s:14
14        movl $0, %ebx         # move 0 into %ebx, exit's return value
(gdb) step
_start () at hello.s:15
15        int $0x80             # call kernel
(gdb) step

Program exited normally.

And it's done.

(gdb) q
entropy@phalaris asm $

Check out the objdump output.

entropy@phalaris asm $ objdump -d hello

hello:     file format elf32-i386

Disassembly of section .text:

08048094 <_start>:
 8048094:       90                      nop
 8048095:       b8 04 00 00 00          mov    $0x4,%eax
 804809a:       ba 0e 00 00 00          mov    $0xe,%edx
 804809f:       b9 b8 90 04 08          mov    $0x80490b8,%ecx
 80480a4:       bb 01 00 00 00          mov    $0x1,%ebx
 80480a9:       cd 80                   int    $0x80
 80480ab:       b8 01 00 00 00          mov    $0x1,%eax
 80480b0:       bb 00 00 00 00          mov    $0x0,%ebx
 80480b5:       cd 80                   int    $0x80
entropy@phalaris asm $

Notice the difference from the last tutorial's objdump output: no snipping of
tons of lines of extra sections and such. It's only the code we coded in
there, which is so much cleaner. Notice how easy it would be to take the
opcodes and make some tiny shellcode out of them, again without the nulls
(look at all those 00 bytes in the mov instructions). What? You want some
shellcode to print "Hello, World!\n" for your next 0day? Next time my friend,
next time.

# milw0rm.com [2006-04-08]