Writing Assembler for FreeBSD

EDB-ID: 13222
CVE: N/A
Author: lhall
Type: papers
Platform: Multiple
Published: 2006-04-08

|=----------------------=[ Writing Assembler for FreeBSD ]=-------------------=|
|=----------------------------------------------------------------------------=|
|=-------------------------=[ lhall@telegenetic.net ]=------------------------=|


----[ Intro

 If you are familiar with Linux assembly, there are a couple of things you must 
know first or this could get highly confusing. FreeBSD assembly differs from Linux 
assembly in a couple of ways. FreeBSD uses the C calling convention: the function 
arguments are pushed onto the stack. Like Linux, it puts the syscall number into 
eax and returns either the return value or an error code in eax. Because FreeBSD 
"calls" syscalls using the C calling convention, the kernel expects a return address 
on top of the arguments, so when we invoke a syscall directly we must push an extra 
long onto the stack to stand in for it.
 
*Note: If you know Linux assembly, much of this paper will be redundant because
of the similarities. The most important things to remember are that FreeBSD pushl's
its arguments onto the stack *from right to left* and does not always return 
the system call's return value in eax (on failure eax holds an error code instead). 
If you already understand Linux system calls, context switches, and the stack, you 
may just want to skim this paper for the differences. If not, I recommend you first 
read my intro to Linux assembly and disassembly, which is a more basic paper than 
this one and goes into more detail.
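
The stack layout described above can be pictured with a small model. This is 
only a sketch in C, not kernel code: model_stack, model_pushl and 
model_syscall_frame are hypothetical names made up for illustration, and the 
16-slot array stands in for real stack memory.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_SLOTS 16

/* Hypothetical model of an i386 stack; slot[top] is what %esp points at. */
struct model_stack {
    uint32_t slot[MODEL_SLOTS];
    size_t top;
};

void model_init(struct model_stack *s)
{
    s->top = MODEL_SLOTS;           /* empty stack: %esp at the highest address */
}

/* pushl: the stack grows toward lower addresses. */
void model_pushl(struct model_stack *s, uint32_t value)
{
    s->slot[--s->top] = value;
}

/*
 * Build the frame a FreeBSD syscall expects: arguments pushed right to
 * left, then one extra long standing in for the return address.
 * Returns 0 on success, -1 if the model stack is too small.
 */
int model_syscall_frame(struct model_stack *s, const uint32_t *args,
                        size_t nargs, uint32_t filler)
{
    if (nargs + 1 > s->top)
        return -1;
    for (size_t i = nargs; i > 0; i--)
        model_pushl(s, args[i - 1]);    /* last argument goes first */
    model_pushl(s, filler);             /* the extra long ends up on top */
    return 0;
}
```

For write(1, buf, 14) the model ends with the filler at slot[top], the file 
descriptor at slot[top + 1], the buffer address at slot[top + 2] and the length 
at slot[top + 3], i.e. the first argument sits at 4(%esp) just as it would 
after a real `call`.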

---[ System Calls

 A system call is the mechanism used by an application program to request a 
service from the operating system; on x86 it is commonly entered through a software 
interrupt. System calls use a special machine code instruction which causes the 
processor to change mode or context (e.g. from "user mode" to "supervisor mode", 
also called kernel mode). This switch is known as a context switch, for obvious 
reasons. A context is the protection and access mode that a piece of code is 
executing in; it is determined by a hardware mediated flag. A context switch allows 
the OS to perform restricted actions such as accessing hardware devices or the 
memory management unit. Generally, operating systems provide a library that sits 
between normal programs and the rest of the operating system, usually the C library 
(libc), such as Glibc, or the Windows API. This library handles the low-level 
details of passing information to the kernel and switching to supervisor mode. These 
APIs give you access functions that make your job easier.
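
To make the role of the library concrete, here is a minimal sketch in C that 
reaches the same kernel service two ways: through libc's write() wrapper and 
through the generic syscall(2) function with an explicit syscall number. The 
function names write_via_libc and write_via_syscall are made up for this 
example, and the _DEFAULT_SOURCE define is only an assumption for building on 
glibc systems.

```c
#define _DEFAULT_SOURCE     /* glibc only: exposes syscall(); harmless elsewhere */
#include <sys/types.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The usual route: libc's write() wrapper performs the kernel entry for us. */
ssize_t write_via_libc(int fd, const void *buf, size_t n)
{
    return write(fd, buf, n);
}

/* The generic route: syscall(2) takes the syscall number as its first argument. */
ssize_t write_via_syscall(int fd, const void *buf, size_t n)
{
    return (ssize_t)syscall(SYS_write, fd, buf, n);
}
```

Both functions end up in the same kernel code; the assembly later in this paper 
simply does by hand what these wrappers hide.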


---[ Assembling and linking

 The text that you type for the instructions of the program is known as the source 
code. In order to transform source code into an executable program you must assemble 
and link it. These steps are done for you by a compiler, but we will do them 
separately. Assembling is the process that transforms your source code into 
instructions for the machine. The machine itself only reads numbers, but humans work 
much better with words. An assembly language is a human readable form of machine 
code. The assembler's name is `as` (the same GNU assembler used on Linux); you can 
type `as -h` to see its arguments. `as` generates an object file out of a source 
file. An object file is machine code that has not been fully put together yet. 
Object files contain compact, pre-parsed code, often called binaries, that can be 
linked with other object files to generate a final executable or code library. The 
linker is the program responsible for putting all the object files together and 
adding information so the kernel knows how to load and run the result. `ld` is the 
name of the linker on FreeBSD. 

So to summarize, source code must be assembled and linked in order to 
produce an executable program. On FreeBSD x86 this is accomplished with

	as source.s -o object.o
	ld object.o -o executable

Where "source.s" is your assembly code, "object.o" is the object file 
that `as` writes (named with -o), and "executable" is the final 
executable produced when the object file has been linked.


---[ Identifying system call numbers and function parameters

 The file that contains the syscall numbers is /usr/src/sys/kern/syscalls.master. 
On my 5.4 box it looks something like:

[...snip...]

0       MSTD    { int nosys(void); } syscall nosys_args int
1       MSTD    { void sys_exit(int rval); } exit sys_exit_args void
2       MSTD    { int fork(void); }
3       MSTD    { ssize_t read(int fd, void *buf, size_t nbyte); }
4       MSTD    { ssize_t write(int fd, const void *buf, size_t nbyte); }
5       MSTD    { int open(char *path, int flags, int mode); }
; XXX should be         { int open(const char *path, int flags, ...); }
; but we're not ready for `const' or varargs.
; XXX man page says `mode_t mode'.
6       MSTD    { int close(int fd); }
7       MSTD    { int wait4(int pid, int *status, int options, \
                            struct rusage *rusage); } wait4 wait_args int
8       MCOMPAT { int creat(char *path, int mode); }
9       MSTD    { int link(char *path, char *link); }
10      MSTD    { int unlink(char *path); }
11      OBSOL   execv

[...snip...]

Here we have the syscall number, followed by either MSTD (standard syscall), MCOMPAT
(for compatibility) or OBSOL (for obsolete), followed by the function definition. For 
instance, the entry for the _exit syscall is:

entropy@redcell {~/advsc} cat /usr/src/sys/kern/syscalls.master | grep exit
1       MSTD    { void sys_exit(int rval); } exit sys_exit_args void

which, apart from the names, matches the libc definition:

entropy@redcell {~} man 2 _exit

[...snip...]

SYNOPSIS
     #include <unistd.h>

     void
     _exit(int status);

[...snip...]
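
The syscalls.master lines shown above follow a regular layout (number, type, 
prototype in braces), so the number and prototype can be pulled out 
mechanically. The following is a rough sketch in C, assuming only the 
single-line layout seen in the listing; parse_master_line is a hypothetical 
helper, and entries continued with a backslash (like wait4) are not joined.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one syscalls.master line of the form
 *     <number> <type> { <prototype> } ...
 * Returns 1 and fills the outputs when a prototype is present, 0 for
 * comment lines, OBSOL entries, or anything else that does not match.
 * type needs room for 16 bytes, proto for 256.
 */
int parse_master_line(const char *line, int *number, char *type, char *proto)
{
    size_t len;

    if (sscanf(line, "%d %15s { %255[^}]", number, type, proto) != 3)
        return 0;

    /* Trim the blanks left in front of the closing brace. */
    len = strlen(proto);
    while (len > 0 && (proto[len - 1] == ' ' || proto[len - 1] == '\t'))
        proto[--len] = '\0';
    return 1;
}
```

Fed the write entry from the listing, this yields the number 4, the type MSTD 
and the prototype text, which is everything needed to set up eax and the 
pushes by hand.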

From this we can code a most basic program that just exits. We know sys_exit()'s 
system call number is 1 and that it takes one value, the return value. So all we 
have to do is pushl a return value, put 1 in eax, pushl an extra long to compensate 
for the missing return address push (because we are not using call), and then call 
the kernel. Cake.

entropy@redcell {~/asm} cat exit.s

.section .text           # start of the .text or code section
.globl _start            # _start is a special symbol
_start:                  # the start label
    movl $1, %eax        # move long 1 into eax, 1 is exit's syscall number
    pushl $42            # push long 42 onto the stack for exits return value
    pushl %eax           # push the extra long to compensate for no `call`
    int $0x80            # call the kernel
    addl $4, %esp        # clean up stack from our push, not necessary
                         # because we just exited

Assemble, link and execute.

entropy@redcell {~/asm} as exit.s -o exit.o

entropy@redcell {~/asm} ld exit.o -o exit

entropy@redcell {~/asm} ./exit

entropy@redcell {~/asm} echo $?
42
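
The value the shell prints for $? is the status the parent collects when the 
child exits. As a cross-check, the same behaviour can be observed from C. This 
is a small sketch using the standard fork/waitpid calls; child_exit_status is 
a name invented for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that calls _exit(code) and return the status the parent
 * sees, which is what the shell exposes as $?. Returns -1 on failure.
 */
int child_exit_status(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                /* child: the same syscall exit.s makes */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);     /* only the low 8 bits are reported */
}
```

Calling child_exit_status(42) gives 42, matching the shell output above; values 
above 255 wrap because only the low byte of the status is delivered.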

Looks good. You can use strace to confirm that it did indeed call sys_exit().

entropy@redcell {~/asm} strace ./exit

execve(0xbfbfe864, [0xbfbfed48], [/* 0 vars */]) = 0
exit(42)                                = ?

Trying to break with gdb and assembly poses a small problem:

Use -gstabs with `as` to generate the debugging info in the object file.

entropy@redcell {~/asm} as -gstabs exit.s -o exit.o

entropy@redcell {~/asm} ld exit.o -o exit

entropy@redcell {~/asm} gdb exit

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

(gdb) break *_start+1
Breakpoint 1 at 0x8048075: file exit.s, line 5.

(gdb) run
Starting program: /home/entropy/asm/exit

Program exited with code 052.

(gdb) quit
entropy@redcell {~/asm} nano exit.s

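No break: the program ran straight through and exited (gdb prints the exit code in 
octal, so 052 is our 42). To get gdb to stop at a breakpoint in assembly on both 
Linux and FreeBSD, you need a nop as the first instruction after _start and a 
breakpoint on the address right after it (*_start+1), which gives us code like: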

entropy@redcell {~/asm} cat exit.s

.section .text           # start of the .text or code section
.globl _start            # _start is a special symbol
_start:                  # the start label
    nop                  # nop here so gdb can break
    movl $1, %eax        # move long 1 into eax, 1 is exit's syscall number
    pushl $42            # push long 42 onto the stack for exits return value
    pushl %eax           # push the extra long to compensate for no `call`
    int $0x80            # call the kernel
    addl $4, %esp        # clean up stack from our push, not necessary
                         # because we just exited

Reassemble and link.

entropy@redcell {~/asm} as -gstabs exit.s -o exit.o

entropy@redcell {~/asm} ld exit.o -o exit

entropy@redcell {~/asm} gdb exit

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

(gdb) break *_start+1
Breakpoint 1 at 0x8048075: file exit.s, line 6.

(gdb) list

1
2       .section .text           # start of the .text or code section
3       .globl _start            # _start is a special symbol
4       _start:                  # the start label
5           nop                  # nop here so gdb can break
6           movl $1, %eax        # move long 1 into eax, 1 is exit's syscall number
7           pushl $42            # push long 42 onto the stack for exits return value
8           pushl %eax           # push the extra long to compensate for no `call`
9           int $0x80            # call the kernel
10          addl $4, %esp        # clean up stack from our push, not necessary
(gdb)
11                               # because we just exited
12

(gdb) run
Starting program: /home/entropy/asm/exit

Breakpoint 1, _start () at exit.s:6
6           movl $1, %eax        # move long 1 into eax, 1 is exit's syscall number
Current language:  auto; currently asm

(gdb) step
_start () at exit.s:7
7           pushl $42            # push long 42 onto the stack for exits return value

(gdb)
_start () at exit.s:8
8           pushl %eax           # push the extra long to compensate for no `call`

(gdb)
_start () at exit.s:9
9           int $0x80            # call the kernel

(gdb)

Program exited with code 052.
(gdb) quit

Much better, although with this program it's not so useful. Let's try one that 
write()s. Find the syscall number and the function definition of write(); both are 
in syscalls.master.

entropy@redcell {~/asm} grep write /usr/src/sys/kern/syscalls.master

4       MSTD    { ssize_t write(int fd, const void *buf, size_t nbyte); }

You can also get the function definition from man 2 write, as section 2 of 
the man pages is the FreeBSD system calls section. Keep in mind that the syscalls 
and the libc calls will not always have the same name; in this case they do.

entropy@redcell {~/asm} man 2 write

[...snip...]

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <sys/types.h>
     #include <sys/uio.h>
     #include <unistd.h>

     ssize_t
     write(int d, const void *buf, size_t nbytes);

[...snip...]

write() writes to the file descriptor 'd'; it takes the address of the string to 
write and the number of bytes to write. Remembering that you must pushl the arguments 
from right to left, we pushl the length of the string, the address of the string and 
then the file descriptor (STDOUT, or 1), move 4 into eax, pushl the extra long and 
call the kernel. One way to define a string is a label followed by .ascii followed 
by your string, for example:

myString:
   .ascii "This is my string.\n"

A label is a symbol for an address, written with a trailing colon. The address of 
the label myString is the address of the first character of the string. Now write a 
simple program that writes a string and then uses the exit() syscall from above.

entropy@redcell {~/asm} cat write.s

.section .rodata
hello:
   .ascii "Hello, World!\n"

.equ STDOUT, 1

.section .text           # start of the .text or code section
.globl _start            # _start is a special symbol
_start:                  # the start label
    nop                  # nop here so gdb can break
                         # write syscall
    pushl $14            # the length of our string
    pushl $hello         # the address of our string
    pushl $STDOUT        # the equate of STDOUT which is 1 (file descriptor)
    movl $4, %eax        # move 4 into eax, 4 is the write syscall
    pushl %eax           # push the extra long to compensate for no `call`
    int $0x80            # call the kernel
    add $16, %esp        # clean stack: four long pushes (3 args + extra long), 4*4
                         # exit syscall
    movl $1, %eax        # move long 1 into eax, 1 is exit's syscall number
    pushl $0             # push long 0 onto the stack for exits return value
    pushl %eax           # push the extra long to compensate for no `call`
    int $0x80            # call the kernel
    addl $4, %esp        # clean up stack from our push, not necessary
                         # because we just exited


First off, the order of the argument pushl's and the movl of the syscall number into 
eax does not matter: you can movl 4 into eax and then do the pushl's, or the other 
way around, as long as the extra long is the last thing pushed. I put the string in 
the read only data section (.rodata) as it will never be changed, and also added an 
equate, .equ (similar to a #define), just to make the code more readable. 
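
The length pushed for write() has to match the number of bytes .ascii emits, 
and .ascii does not append a terminating NUL. A quick way to double-check a 
hand-counted length is to let a C compiler count it. This is only a sketch; 
ASCII_LEN and hello_len are names invented here.

```c
#include <stddef.h>

/* Bytes emitted by .ascii for a literal: sizeof counts the C NUL, .ascii has none. */
#define ASCII_LEN(s) (sizeof(s) - 1)

/* Length to push for the string in write.s, computed at compile time. */
size_t hello_len(void)
{
    return ASCII_LEN("Hello, World!\n");
}
```

hello_len() returns 14, the value pushed in write.s. The GNU assembler can 
also compute the length itself if you place `.equ hello_len, . - hello` 
directly after the string, which avoids counting by hand at all.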

Assemble, link and execute.

entropy@redcell {~/asm} as write.s -o write.o

entropy@redcell {~/asm} ld write.o -o write

entropy@redcell {~/asm} ./write
Hello, World!

Use strace to see the syscalls happening.

entropy@redcell {~/asm} strace ./write

execve(0xbfbfe864, [0xbfbfed48], [/* 0 vars */]) = 0

write(1, "Hello, World!\n", 14Hello, World!
)         = 14

exit(0)                                 = ?

And finally, if you want to step through it, assemble with -gstabs. 

 The next program will do something not easily done in higher level languages, 
namely get the OS type from the %gs register, and get the CPU vendor id from 
the CPU. We will figure out what OS we are on by examining the value in the 
segment register %gs: for Linux we're looking for 0x00, for FreeBSD 0x2f and 
for OpenBSD 0x1f. We look at this register and then just write the correct string 
out to STDOUT. The cpuid instruction takes a value in eax which determines what 
information is produced. With the value 0 in eax, the vendor id string is 
returned in ebx, edx and ecx, in that order.
 
The returned registers contain:

 ebx first 4 bytes
 edx middle 4 bytes
 ecx last 4 bytes
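
The register order matters because each register carries four characters, 
lowest byte first. The following C sketch assembles the 12-byte vendor string 
from three register values; it does not execute cpuid itself, and 
cpuid_vendor and store_le32 are hypothetical helper names.

```c
#include <stdint.h>

/* Store a 32-bit register value lowest byte first, as movl does on x86. */
static void store_le32(char *dst, uint32_t reg)
{
    dst[0] = (char)(reg & 0xff);
    dst[1] = (char)((reg >> 8) & 0xff);
    dst[2] = (char)((reg >> 16) & 0xff);
    dst[3] = (char)((reg >> 24) & 0xff);
}

/* Build the vendor id from ebx, edx, ecx (in that order); out holds 13 bytes. */
void cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char *out)
{
    store_le32(out, ebx);       /* first 4 bytes  */
    store_le32(out + 4, edx);   /* middle 4 bytes */
    store_le32(out + 8, ecx);   /* last 4 bytes   */
    out[12] = '\0';             /* C terminator; the asm buffer has none */
}
```

With ebx = 0x756e6547, edx = 0x49656e69 and ecx = 0x6c65746e the buffer reads 
"GenuineIntel". Swapping edx and ecx is the classic mistake and produces 
"GenuntelineI" instead.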

 First set up your strings, get the value of gs and compare it; if it is equal, 
jump to the matching print label, and if none match, print unknown. Then execute 
the cpuid instruction, put the address of the buffer in edi (edi is used as the 
destination for string operations; esi as the source), then mov ebx to the address 
in edi, edx to edi + 4 and ecx to edi + 8, then print. It looks bad but it's easy 
if you break it into sections.

entropy@redcell {~/asm} cat system.s

.section .rodata
linux:
   .ascii "You are running a Linux OS.\n"
freebsd:
   .ascii "You are running a FreeBSD based OS.\n"
openbsd:
   .ascii "You are running a OpenBSD based OS.\n"
unknown:
   .ascii "You are running an UNKNOWN OS.\n"
start_id:
   .ascii "Your processor ID is '"
end_id:
   .ascii "'.\n"
.equ STDOUT, 1

.section .bss
   .lcomm buf, 12         # our 12 byte buffer in .bss (zero-filled)

.section .text
.globl _start
_start:
   nop                    # so we can break with gdb
   movw %gs, %cx          # move the word sized register into cx for compares

   cmpw $0x00, %cx        # compare the word size value 0x00 to cx
   je print_linux         # jump if equal to print_linux

   cmpw $0x2f, %cx        # compare the word size value 0x2f to cx
   je print_freebsd       # jump if equal to print_freebsd

   cmpw $0x1f, %cx        # compare the word size value 0x1f to cx
   je print_openbsd       # jump if equal to print_openbsd

   jmp print_unknown      # nothing matched jump to print_unknown

print_linux:              # print_linux label
   movl $4, %eax          # 4 is write syscall
   pushl $28              # pushl length of string
   pushl $linux           # pushl address of string
   pushl $STDOUT          # pushl file descriptor to write to
   pushl %eax             # pushl fake return address
   int $0x80              # call kernel
   add $16, %esp          # fix up the stack: 4 pushls (3 args + fake return)
   jmp print_cpu          # print the cpu id

print_freebsd:            # print_freebsd label
   movl $4, %eax          # 4 is write syscall
   pushl $36              # pushl length of string
   pushl $freebsd         # pushl address of string
   pushl $STDOUT          # pushl file descriptor to write to
   pushl %eax             # pushl fake return address
   int $0x80              # call kernel
   add $16, %esp          # fix up the stack: 4 pushls (3 args + fake return)
   jmp print_cpu          # print the cpu id

print_openbsd:            # print_openbsd label
   movl $4, %eax          # 4 is write syscall
   pushl $36              # pushl length of string
   pushl $openbsd         # pushl address of string
   pushl $STDOUT          # pushl file descriptor to write to
   pushl %eax             # pushl fake return address
   int $0x80              # call kernel
   add $16, %esp          # fix up the stack: 4 pushls (3 args + fake return)
   jmp print_cpu          # print the cpu id

print_unknown:            # print_unknown label
   movl $4, %eax          # 4 is write syscall
   pushl $31              # pushl length of string
   pushl $unknown         # pushl address of string
   pushl $STDOUT          # pushl file descriptor to write to
   pushl %eax             # pushl fake return address
   int $0x80              # call kernel
   add $16, %esp          # fix up the stack: 4 pushls (3 args + fake return)
   jmp print_cpu          # print the cpu id

print_cpu:                # print_cpu label
   xor %eax, %eax         # set eax to 0, aka cpuid generates Vendor ID
   cpuid                  # execute cpuid
   movl $buf, %edi        # movl the address of buf to edi
   movl %ebx, (%edi)      # move first 4 chars into address of edi
   movl %edx, 4(%edi)     # move next 4 chars into address of edi + 4
   movl %ecx, 8(%edi)     # move next 4 chars into address of edi + 8
                          # the vendor id string is now in the buffer
   movl $4, %eax          # 4 is write syscall
   pushl $22              # pushl the length of the string
   pushl $start_id        # pushl address of string
   pushl $STDOUT          # pushl file descriptor to write to
   pushl %eax             # pushl fake return address
   int $0x80              # call kernel
   addl $16, %esp         # fix up the stack: 4 pushls (3 args + fake return)
   movl $4, %eax          # 4 is write syscall
   pushl $12              # pushl the length of the string
   pushl $buf             # pushl address of string
   pushl $STDOUT          # pushl file descriptor to write to
   pushl %eax             # pushl fake return address
   int $0x80              # call kernel
   addl $16, %esp         # fix up the stack: 4 pushls (3 args + fake return)
   movl $4, %eax          # 4 is write syscall
   pushl $3               # pushl the length of the string
   pushl $end_id          # pushl address of string
   pushl $STDOUT          # pushl file descriptor to write to
   pushl %eax             # pushl fake return address
   int $0x80              # call kernel
   addl $16, %esp         # fix up the stack: 4 pushls (3 args + fake return)

exit:                     # exit label
   movl $1, %eax          # 1 is sys_exit syscall
   pushl $0               # 0 is the return value of exit
   pushl %eax             # pushl fake return address
   int $0x80              # call kernel
   addl $4, %esp          # fix up the stack 1 pushl that counts

Assemble and link.

entropy@redcell {~/asm} as -gstabs system.s -o system.o

entropy@redcell {~/asm} ld system.o -o system

First run it under gdb to look at the gs register, so you get an idea of where the 
value is coming from.

entropy@redcell {~/asm} gdb system

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

(gdb) break *_start+1
Breakpoint 1 at 0x8048075: file system.s, line 23.

(gdb) r
Starting program: /home/entropy/asm/system

Breakpoint 1, _start () at system.s:23
23         movw %gs, %cx          # move the word sized register into cx for compares
Current language:  auto; currently asm

(gdb) info reg
eax            0x0      0
ecx            0x0      0
edx            0x0      0
ebx            0x0      0
esp            0xbfbfed4c       0xbfbfed4c
ebp            0x0      0x0
esi            0x0      0
edi            0x0      0
eip            0x8048075        0x8048075
eflags         0x202    514
cs             0x1f     31
ss             0x2f     47
ds             0x2f     47
es             0x2f     47
fs             0x2f     47
gs             0x2f     47

Here you can see that not only gs but all the segment registers except cs 
hold the FreeBSD value, 0x2f.
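
These values are x86 segment selectors, so they can be taken apart into their 
standard fields: a descriptor table index, a table indicator bit and a 
requested privilege level. The sketch below decodes a selector in C; the 
struct and function names are made up for the example. Which selectors a 
kernel hands to user programs is a kernel implementation detail, so treat the 
0x2f test as a heuristic that holds for the systems examined here rather than 
a guarantee.

```c
#include <stdint.h>

/* The three fields packed into a 16-bit x86 segment selector. */
struct selector {
    unsigned index;     /* bits 3..15: index into the descriptor table */
    unsigned ti;        /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;       /* bits 0..1: requested privilege level */
};

struct selector decode_selector(uint16_t sel)
{
    struct selector s;

    s.index = (unsigned)sel >> 3;
    s.ti = ((unsigned)sel >> 2) & 1u;
    s.rpl = (unsigned)sel & 3u;
    return s;
}
```

Decoding the gdb output above, 0x2f is index 5 with TI = 1 and RPL = 3, and 
the cs value 0x1f is index 3 with TI = 1 and RPL = 3: both are ring 3 
selectors that differ only in which descriptor they pick.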

entropy@redcell {~/asm} ./system

You are running a FreeBSD based OS.
Your processor ID is 'GenuineIntel'.

Finally strace it to see the syscalls happening.

entropy@redcell {~/asm} strace ./system

execve(0xbfbfe864, [0xbfbfed48], [/* 0 vars */]) = 0

write(1, "You are running a FreeBSD based "..., 36You are running a FreeBSD based OS.
) = 36

write(1, "Your processor ID is \'", 22Your processor ID is ') = 22

write(1, "GenuineIntel", 12GenuineIntel)            = 12

write(1, "\'.\n", 3'.
)                    = 3

exit(0)                                 = ?

Writing asm under FreeBSD isn't much different than under Linux as long as you keep 
in mind: the syscall arguments are pushed right to left, the syscall number goes 
in eax, and the syscall return value is not always going to end up in eax.
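
On the last point: when a raw FreeBSD syscall fails on i386, the kernel sets 
the carry flag and leaves the error number in eax instead of a return value, 
so assembly code should test the carry flag before trusting eax. The libc 
wrappers hide this by returning -1 and storing the code in errno. A minimal C 
sketch of the wrapper-side view follows; write_error_code is a hypothetical 
helper name.

```c
#include <errno.h>
#include <unistd.h>

/* Returns 0 if the write succeeded, otherwise the errno value it reported. */
int write_error_code(int fd, const void *buf, size_t n)
{
    if (write(fd, buf, n) >= 0)
        return 0;
    return errno;       /* the code a raw syscall would have left in eax */
}
```

Writing to an invalid descriptor such as -1 reports EBADF here; the hand 
written equivalent would see the carry flag set after int $0x80 and that same 
number in eax.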

# milw0rm.com [2006-04-08]