Writeup about source code auditing
How to break code by reading it
by kingcope/2006 [kingcope@gmx.net]
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Source code auditing is, first of all, reading source code, often in several different programming languages, to find security-related bugs. The goal of source code auditing is to find critical bugs in, for example, open source software and later exploit these bugs to gain unauthorized access to computers.

Here are some programming languages you could audit, and how hard it is to find bugs in each:

PHP      Easy to read; you can find bugs in no time
PERL     Easy to read, but more secure than PHP
C/C++    From easy to really hard, depending on the program being audited
Java     Easy to read, but bugs are hard to find, since Java has few critical issues by design

Known bugs are sometimes spread across a large number of files, because open source code is often shared between projects. A bug you find in one piece of software may therefore also turn up in several other projects.

We will begin by looking at C code, because C is the most widespread programming language in the open source community. First, take some C code from the internet (freshmeat.net, for example), unpack it and preferably install it on your system. After that you need a good tool to navigate the code; auditing plain files without a source code browser takes too much time. In my experience PowerGrep is the best tool available for auditing code in any language. Source Code Navigator, written by Red Hat, is also an option. Basically, you first grep for common mistakes and later dig deeper into the code. Here is a list of mistakes in C that can lead to an exploit.
Insecure use of sprintf

    sprintf(buffer, "%s", somevariable);

If buffer is too small for somevariable, we have a buffer overflow. When you see a call like this, look above it to determine whether it is a stack-based or a heap-based overflow. A char *buffer; with a later allocation (for example with malloc) indicates a heap overflow. A local char buffer[1024]; indicates a stack-based overflow, and a static char buffer[1024]; results in a bss overflow.

Most of the time you will see many plain sprintf calls, but nowadays the buffer size is usually checked beforehand. For example, consider this code:

    buffer = (char*) malloc(strlen(somevariable)+1);
    sprintf(buffer, "%s", somevariable);

This secures the sprintf call by allocating enough space for the buffer being written to. The secure way is to use snprintf(buffer, sizeof(buffer), "%s", somevariable); where buffer is an array. For a pointer such as the malloc'd buffer above, sizeof(buffer) is only the size of the pointer, so the allocated length has to be passed instead. There can also be security issues if the second parameter, the size, can be manipulated, but this is very rare.

Sometimes you will also see code like sprintf(buffer, somevariable); which is really bad: if somevariable is user supplied, this results in a format string vulnerability.

It is also very important to look for vulnerabilities in code paths where the data passed to these function calls is really user supplied, meaning that somevariable can be manipulated by the attacker in some way. If the parameters of a function call cannot be supplied by the attacker or user, you do not need to analyze them further. Sometimes it is hard to tell whether a variable is user supplied, but with source auditing tools you can check this quickly. When you grep for insecure function calls, always look at where the call is made and which function it resides in. If the call is preceded by length checks or other checks, always verify that these checks are done correctly. Sometimes they are broken in some way.
If you find something that looks suspicious, such as a possible buffer overflow, try it out on a real testbed. Write a small script that just calls this function, for example remotely if the software you are auditing is a server. gdb can be handy: set a breakpoint in the function that contains the suspicious call, so you can be sure your test script takes the right code path. For this to work you need to compile the C program with debugging support (for example gcc -g).

Common mistakes are also made in all kinds of parsing code. You can simply grep for while and for loops that do their own string manipulation. For example, grepping for \w+\+\+\s*=\s*\*?\w+\+\+ (yes, this is a regular expression) can reveal a string copy routine. Consider the following code:

    for (p=buf,q=usersuppliedstring;*q;*p++=*q++);

If there is no length check inside the loop, the code is badly broken: it just copies the user-supplied string into the buffer, which results in a buffer overflow. So by looking at loops that manipulate strings you can find buffer overflows.

You should also look for functions like memcpy and bcopy, which copy a given number of bytes from one buffer into another. If the destination buffer is smaller than the source buffer and the length is taken from the source, you also have a buffer overflow. Something like

    memcpy(buffer, usersuppliedstring, strlen(usersuppliedstring));

can be a buffer overflow if no checks are done before the memcpy call.

Keep in mind that buffer copies are usually checked in major software. Less well-known and badly maintained software is more likely to have such obvious security issues. But even major software has bugs in its source code:

1.) Very obvious bugs, which are rare and often spread over different function calls, so they are not easy to find
2.) Bugs deep inside the code

For remote exploitation of servers you should always look at how the protocol is handled.
When auditing a mail server, for example, look at where user-supplied strings are handled and how the client input is parsed. Internal helper functions that are used everywhere in the code are especially often insecure. Code paths that handle client input can also contain insecure function calls like sprintf or strcpy.

Finding format string bugs is far easier than finding buffer overflows. Just look up the prototypes of all functions that take format strings, and then grep for the calls made to them. Often you will see code where a %s is missing, which gives you a format string bug. An example is syslog(LOG_DEBUG, string), which should be syslog(LOG_DEBUG, "%s", string). This is a very obvious bug, and you should no longer see it in well-maintained projects. But sometimes even code believed to be secure has format string bugs, not in calls to standard functions like syslog but in calls to functions written by the programmer that take format string specifiers.

Buffer overflows can take quite different forms. Off-by-ones, for example, are hard to spot, because only one byte is written beyond the buffer. But you can try your luck and grep for something like .*if.*>\s*sizeof or .*if.*>\s*MAX.* . These match checks made by the programmer to prevent buffer overflows. Take this for example:

    #define MAXBUFFERSIZE 1024
    char buffer[MAXBUFFERSIZE];
    if (strlen(somevariable)>MAXBUFFERSIZE+1) return ERR;
    strcpy(buffer, somevariable);

Because of the +1 the check is flawed: a string of up to MAXBUFFERSIZE+1 characters passes it, and strcpy then writes that string plus its terminating null byte, up to two bytes beyond the end of buffer. The correct check is strlen(somevariable) >= MAXBUFFERSIZE.

Regular expressions are an easy way to spot bugs. Just think about what could be an issue in the source code and grep for it using a regular expression in a good tool.

If you are auditing kernel code, for example the Linux or FreeBSD kernel, the code is of course very different from user-mode programs. But the auditing process remains the same.
In the kernel, look for entry points where you can manipulate things, for example system calls or ioctls. In Linux you can look for copy_to_user and copy_from_user, which copy data from kernel space to user space and vice versa. The equivalents in BSD systems and Solaris are copyout and copyin. Insecure use of copyout can result in a kernel memory disclosure, and insecure use of copyin in a buffer overflow in kernel land.

Calls to allocation routines are another way to spot bugs in kernel source code. For example, a plain allocation whose size field is supplied by the attacker can be a real issue, because if you make the kernel allocate 0x80000000 bytes the system can crash immediately and reboot.

Most bugs in kernel source code are due to integer overflows. Nowadays many checks are done to prevent integer overflows and underflows, but you can be sure that there are still some out there, unknown to anyone. For example:

    int attackervalue;
    int maxsize;
    if (attackervalue > maxsize) return ERR;
    copyout(buffer, userlandbuffer, attackervalue);

If attackervalue is negative, it passes the check. Since copyout takes an unsigned length (size_t), the negative value becomes a huge length, so copyout copies far more than the size of buffer and you have a kernel memory disclosure. Also look at this code:

    int attackervalue;
    int maxsize;
    if ((attackervalue < 0) || (attackervalue > maxsize)) return ERR;
    attackervalue--;
    copyout(kernelbuffer, userbuffer, attackervalue);

If you supply 0 as attackervalue, the decrement turns it into -1, which copyout interprets as a very large length, and a kernel memory disclosure occurs.

As you can see, there are many ways to trick the kernel with integer sizes and get buffer overflows and memory disclosures. Today these signedness issues are very rare or hidden across function calls.

Kernel code can also contain bugs where unknown memory is dereferenced.
An invalid memory dereference in the kernel is quite different from one in user mode: it results in a kernel panic, not just a crash of a user-mode application.

As a note, you should start by auditing system calls, ioctls or network code in kernels, because these are the easiest to manipulate. You can also get results just by fuzzing. Supplying values like 0x80000000, negative values or invalid pointers like 0x00000000 to system calls, ioctls or network code can have really bad results, such as crashing the kernel. In my experience BSD systems and Solaris are more likely to crash or disclose memory. Linux is a bit harder to audit because it is over-audited, which does not mean that you cannot find critical bugs in Linux, but you have to be really elite.

# milw0rm.com [2006-12-17]