Writeup about source code auditing
How to to break code by reading it
by kingcope/2006 [kingcope@gmx.net]
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Source code auditing at first is reading source, probably auditing
different kinds of programming languages, to find security related bugs.
The goal of source code auditing is to find critical bugs in for example
open source software and exploit this bugs later on to gain unauthorized
access to computers.
Here are some programming languages which you could audit and the skills
you need to have to find bugs in.
PHP 					Easy to read code and find bugs in no time
PERL					Easy to read but more secure than PHP
C/C++					From easy to really hard depends on the program to audit
Java					Easy to read, but hard to find bugs,
						since Java does not have many critical issues by design
Sometimes known bugs in source code are spread in a vast amount of files,
because sometimes open source code is shared between projects.
So you could find a bug in one software and also in different kinds of 
other projects. We will begin by looking at C code, because this is the most
widespread programming language among the open source community. At first you take
some C based code from the internet (@freshmeat.net for example) unpack it and
preferrably install it on your system. After that you need some kind of cool tool
to surf inside the code. Just plain file auditing without a source code browser
takes too much time.
In my experience PowerGrep is the best tool out there to audit in all languages
available. Source Code Navigator written by RedHat is also an option.
Basically at first you grep for common mistakes and later on digg deeper into the code.
For example here is a list of mistakes in C, which could lead to an exploit later on.
Insecure use of sprintf
sprintf(buffer, "%s", somevariable) if buffer is too small for somevariable than we
have a buffer overflow, also if you see something like that look above this function
call to identify if this could be a stack based or heap based overflow. A char *buffer;
with a later allocation (for example malloc) indicates a heap overflow. A char buffer[1024]
for example indicates a stack based overflow and a static buffer[1024] will result
in a bss overflow. Most of the time you will see many plain sprintf's, but nowadays they
are checked before for enough buffer size.
For example consider this type of code:
buffer = (char*) malloc(strlen(somevariable)+1);
sprintf(buffer, "%s", somevariable);
This is a way to secure the sprintf call by allocating enough space for the buffer
to write to. The secure way is to use snprintf(buffer, sizeof(buffer), "%s", somevariable);.
There could also be security issues if the second parameter can be manipulated
but this is very rare.
Sometimes you can also see code like this:
sprintf(buffer, somevariable) which is really bad, this will result in a
format string vulnerability if somevariable is user supplied.
It is also very important that you look for security vulnerabilities in code paths
where the data to the function calls is really user supplied. By that I mean that
somevariable can be manipulated in some way by the attacker. If you see functions
and parameters which cannot be supplied by the attacker or user you should
not further analyze them. Sometimes it is hard to say if a variable is user supplied,
but by using the source auditing tools you should quickly check if this is the case.
When you grep for insecure function calls then always look where the function is
called and in which function it resides as well. If the function call is checked
before for length or other checks always look if these checks are made well.
Sometimes they are just broken in some way.
If you found something that looks suspicious to yourself, like something that looks
like a buffer overflow then try it out on a real testbed. Program a small script that
just calls this function for example remotely (if the software you are auditing
is a server). gdb can be handy, break at the function where the suspicious function
call is located in so you can always be sure that the right codepath is taken by your
testing script. For this to work you need to compile the C program with debugging support.
Also common mistakes are done in different kinds of parsing code. You could just grep
for while and for loops which do string manipulation on their own. Like grepping
for .*++\s=\s.*++; (yes this is a regular expression) in the code could reveal
a string copy routine. Consider something like the following code:
for (p=buf,q=usersuppliedstring;*q;*p++=*q++); if there is no length check inside
the loop the code is very broken, because it just copies the user supplied string into the buffer,
which results in a buffer overflow. So by looking into loops where string manipulation is done
you can find buffer overflows. You should also look for functions like memcpy and bcopy
which copy byte buffers into other ones with a length. If the destination buffer is smaller
than the source buffer and the length is good you also have buffer overflows.
Something like:
memcpy(buffer, usersuppliedstring, strlen(usersuppliedstring)); could be a buffer overflow
if no checks are done before supporting the memcpy function. One thing to know is that
most of the times buffer copies are checked very often in major software. More
unkown software and badly maintained have such obvious security issues. But even major
software has bugs in the source code.
1.) Very obvious bugs, are rare but often spread over different function calls so not easy to find
2.) Bugs deep inside the code
For remote exploitation of servers you should always look in the handling of the
protocol. Take for example a mail server. Look where the strings which are user supplied
are handled and how the parsing of the client input is done. Especially internal functions
used everywhere in the code are most often insecure. Also code paths of client input can
have insecure function calls like sprintf or strcpy. Finding format string bugs is far more
easy then finding buffer overflows. Just look at the prototype of all functions which
use format strings and after that grep for the calls made to them. Often you see code where
a %s is missing and you have a format string bug inside the code. An example is
syslog(LOG_DEBUG, string). This is a very obvious bug. You should not see this kind of bug
anymore in well maintained projects. But sometimes even believed secure code has
format string bugs, not in the standard function calls like syslog but in functions calls to
functions written by the programmer which use format string specifiers.
Buffer overflows can be quite different. For example off by ones are hard to spot,
because only one byte is written beyond the buffer. But you can try your luck and grep for
something like .*if.*>\ssizeof or .*if.*>MAX.* . These are checks made by the programmer
that no buffer overflow occurs.
Take this for example:

#define MAXBUFFERSIZE 1024

char buffer[MAXBUFFERSIZE];
if (strlen(somevariable)>MAXBUFFERSIZE+1)
	return ERR;
strcpy(buffer, somevariable);

You can see that because of the +1 the security checking code is flawed because during
the strcpy the buffer is overwritten with some bytes. Using regular expressions is an easy
way to spot bugs. Just think about what could be an issue in the source code and grep for
it using a regular expression with some cool tool.
If you are auditing kernel code for example the linux kernel or freebsd kernel
the code is very different from user mode programs of course.
But the auditing process remains the same. Look where you can find entry points where you
can manipulate things, for example system calls or ioctls. In Linux you could look for
copy_to_user and copy_from_user which copy data from kernel space into user space and vice versa.
The equivalent in BSD systems and Solaris is copyout and copyin. Insecure use of copyout
can result in a kernel memory disclosure and copyin in a buffer overflow in kernel land.
Another way to spot bugs in kernel source code are calls to allocation routines.
For example a plain allocation of a variable with a size field supplied by the attacker
can be a real issue because if you allocate 0x80000000 bytes in kernel land the system
crashes immediately and reboots. Most bugs in kernel source codes result due to
integer overflows. Nowadays many security checks are done that no integer
overflows/underflows occur, but be sure that there are some out there unknown to
anyone. So basically there are integer overflows/underflows in kernel source code.
For example:

int attackervalue;
int maxsize;
if (attackervalue > maxsize)
	return ERR;
copyout(buffer, userlandbuffer, attackervalue);
So if attackervalue is negative this results in a copyout of bigger size than buffer
and you have a kernel memory disclosure.
Also look at this code:

int attackervalue;
int maxsize;

if ((attackervalue < 0) || (attackervalue > maxsize)) return ERR;
attackervalue--;
copyout(kernelbuffer, userbuffer, attackervalue);

If you supply attackervalue as 0 then the copyout will have a very large value and
a kernel memory disclosure occurs. As you can see there can be very many ways to
trick the kernel with integer sizes and get buffer overflows and memory disclosures.
Today this kind of signdness issues are very rare or hidden across function calls.
There can also be bugs in kernel code where unknown memory is dereferenced.
It is quite different than in usermode, because this kind of invalid dereference
will result in a panic of the kernel and not in just a crash of the usermode application.
As a note you should start to audit system calls and ioctls or network code in kernels
because these are the most easy to manipulate. You can also get results by just fuzzing.
Suppling values like 0x80000000, negative values or invalid memory like 0x00000000 to the
syscalls,ioctls or network code can have really bad results like crashing the kernel.
For my experience BSD systems and Solaris are more likely to crash or disclose memory.
Linux is a bit hard to audit, it's overaudited, which does not mean that you cannot find
critical bugs in Linux, but you have to be really elite.

# milw0rm.com [2006-12-17]