Understanding a format string vulnerability

Posted: September 18, 2011 in Exploitation, Stack Smashing

To understand this vulnerability, you first need a thorough understanding of the stack layout.

  • What is a format string?

From my understanding (quite naive), a format string is a string which determines the format of the rest of the arguments passed to a function. In C, functions such as printf(), scanf(), fprintf() and vsprintf() take a format string as an argument.

Example: printf("%d", i);      => Here, "%d" is a format string which determines the format of ‘i’ (in this case, an integer).

  • What is a format string vulnerability?

A format string vulnerability arises when unsanitized input is passed as the format string to a function that interprets format strings. From the programmer’s perspective, this input is assumed to be a normal string, but if an attacker supplies conversion specifiers in it, the function can be made to reveal values from the stack and much more.

  • What can be done by exploiting one?

By exploiting such a vulnerability, an attacker can dump the entire program’s stack, write into arbitrary memory locations and even execute custom written machine code.

Let’s take an example and proceed:

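#include <stdio.h>
#include <string.h>

int main ( int argc, char **argv ) {
    char buffer[1024];

    if ( argc >= 2 ) {
        /* Copy at most 1023 bytes and terminate the string explicitly:
           strncpy() does not add a '\0' when the source fills the limit. */
        strncpy ( buffer, argv[1], sizeof(buffer) - 1 );
        buffer[sizeof(buffer) - 1] = '\0';

        /* VULNERABLE on purpose: user input is used as the format string. */
        printf ( buffer );
    }

    return 0;
}

This program should ideally print the first command line argument (truncated to fit the 1024-byte buffer) given to the program. One question we have to ask ourselves is: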

  • Since the input given to the printf() function is unsanitized, what if we give a format string as argument?

Let’s analyze this a bit more. Let’s consider:

printf ( "%d", i );

Here, you can see that two arguments are passed to the printf function. When printf parses the first argument and sees "%d", it assumes that a second argument must have been provided and that it is an integer. So, on the 32-bit x86 system assumed throughout this post, it takes the next argument (the 4 bytes that follow the first argument on the stack, which it assumes the calling function pushed) and prints it.

Let’s take the case:

printf ( buffer );        // Here, 'buffer' contains "%d".

It’s clear from this that if ‘%d’ is present in buffer, printf tries to take the next argument from the stack. The calling function never pushed such an argument, yet printf assumes that it did, takes the next word from the stack as the integer and prints it. The value printed is whatever happens to be on the stack at that position (often a local variable of the calling function). So, since we have the freedom to enter whatever format string we want in ‘buffer’, we have the ability to print the local variables of the calling function, and much more.
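To make the "printf trusts the format string" point concrete, here is a rough sketch of a toy formatter. The name mini_format and its behaviour are made up for illustration and are not how any real libc implements printf; it only understands "%d". The thing to notice is that the number of va_arg() calls is decided by the format string alone, so a "%d" with no matching argument would make it read whatever sits in the next argument slot (undefined behaviour in C).

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy formatter (NOT the real printf): understands only "%d".
 * Every "%d" pulls one int from the variadic list, whether or not the
 * caller actually supplied one. Everything else is copied verbatim.
 * Returns the number of characters stored in `out` (excluding the '\0'). */
int mini_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t used = 0;

    if (size == 0)
        return 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && used + 1 < size; p++) {
        if (p[0] == '%' && p[1] == 'd') {
            /* The format string, not the caller, decides that an int is read. */
            int value = va_arg(ap, int);
            int n = snprintf(out + used, size - used, "%d", value);
            if (n < 0)
                break;
            /* snprintf reports the untruncated length; clamp to what fit. */
            used += (size_t)n < size - used ? (size_t)n : size - used - 1;
            p++;                      /* skip the 'd' */
        } else {
            out[used++] = *p;
        }
    }
    va_end(ap);

    out[used] = '\0';
    return (int)used;
}
```

Calling mini_format(out, sizeof out, "i=%d", 7) fills out with "i=7". Calling it with "%d" and no extra argument compiles just as happily, and that is exactly the situation printf ( buffer ) creates.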

Let’s also learn about some possibilities in format strings:

  • “%p” – prints the value of the pointer provided as argument (essentially a 4 byte hex dump on a 32-bit system).
  • “%n” – stores the number of bytes written till now, in the pointer provided as argument.
  • “%Nd” – prints the integer provided as argument right-aligned in a field at least ‘N’ characters wide, padded on the left with spaces (N is a decimal number, e.g. “%10d”).

So given all this and a vulnerable printf() function:

  • We can dump the values of the stack with an appropriate number of %p’s.
  • We can write the number of bytes written till now into an address provided as argument using ‘%n’.
  • We can use ‘%Nd’ to control the value written through the pointer, since the padding increases the number of bytes written before the ‘%n’.
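The interaction between the field width and ‘%n’ can be checked safely with a constant format string. The sketch below uses a made-up helper, count_after_padding, that formats one integer into a scratch buffer and returns what ‘%n’ stored. It assumes a C library that honours ‘%n’; some libraries restrict or disable it.

```c
#include <stdio.h>

/* Hypothetical helper: formats `value` in a field of at least `width`
 * characters (keep `width` small, the scratch buffer is 64 bytes) and
 * returns the count that %n stored. */
int count_after_padding(int width, int value)
{
    char out[64];
    int written = -1;

    /* "%*d" takes the field width from the argument list, and "%n" stores
     * the number of characters produced so far into `written`. */
    snprintf(out, sizeof out, "%*d%n", width, value, &written);
    return written;
}
```

With a width of 10 and the value 42 the helper returns 10: eight spaces of padding plus two digits. If the number needs more characters than the width, the width is ignored, so a width of 2 with the value 12345 returns 5.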

Coming back to our example, we know that the argument passed to printf is the starting address of the character array ‘buffer’. The memory layout is:

[Figure: Stack Layout of Format String]

Now, let’s see what we can put in ‘buffer’:

  • If ‘buffer’ = “abcd”, nothing new will happen and the function will just print ‘abcd’.
  • If ‘buffer’ = “%p”, then printf will see the next 4 bytes (part of buffer itself) from the stack as the argument and print the value.
  • If ‘buffer’ = “%nAB”, then printf will consider the next argument (first 4 bytes of ‘buffer’) as a pointer to a variable and write the number of bytes printed till now (that’s zero), into the address pointed to by these 4 bytes. Essentially, it should give segmentation fault (SIGSEGV) as the value of the address to be written is “0x42416e25” (the bytes ‘%’, ‘n’, ‘A’, ‘B’ read as a little-endian word) and that address is not accessible by this process.
  • If ‘buffer’ = “\x9c\xd8\xff\xbf%n”, then printf will consider the next argument as a pointer and try to write 4 into it (since \x9c\xd8\xff\xbf is 4 bytes). The next argument itself is 0xbfffd89c, which we assume in this case to be the address where the return address of printf is stored. If it is not, we debug the program, find where the return address of printf is stored, and use that value instead. The program will again give a segmentation fault: when printf returns, its return address is 0x00000004 (overwritten by printf itself), and the CPU tries to execute instructions at that address, resulting in SIGSEGV.
  • If ‘buffer’ =
    • “\x01\x01\x01\x01\x9c\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9d\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9e\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9f\xd8\xff\xbf”+
    • “%10u%n%10u%n%10u%n%10u%n”

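In this case, assuming that the return address occupies the 4 bytes from 0xbfffd89c to 0xbfffd89f, each “%10u” consumes one 0x01010101 filler word and prints it padded to 10 characters, and each “%n” stores the running count at the address that follows the filler word. The first 32 bytes of the buffer are printed as they are, so printf will write 32 + 10 = 42 (0x2a) into the first byte of the return address, 52 (0x34) into the second byte, 62 (0x3e) into the third byte and 72 (0x48) into the fourth byte. Each ‘%n’ actually stores a full 4-byte integer, but the next write starts one byte further along and overwrites everything except the lowest byte of the previous one (the last write also spills three bytes past the return address). Since x86 is little-endian, the return address of printf this time is overwritten as 0x483e342a. It’ll still result in a segmentation fault, but at least we now know that we can write an arbitrary value into the return address of printf.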
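The byte arithmetic here is easy to get wrong, so below is a small model of it. Nothing in it calls printf: le32 and simulate_writes are hypothetical helpers that only do the bookkeeping, assuming a little-endian 32-bit target and that every width is at least as large as the number of digits being printed.

```c
#include <stdint.h>

/* Assemble four bytes stored at increasing addresses into the 32-bit value
 * that a little-endian CPU would read from them. */
uint32_t le32(const unsigned char b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Hypothetical model of four "%<width>u%n" pairs that run after
 * `prefix_len` bytes have already been printed. counts[i] receives the
 * value the i-th %n stores. The i-th write targets byte i of the slot, and
 * only its lowest byte survives the later, overlapping writes. Returns the
 * word left in the 4-byte slot. */
uint32_t simulate_writes(unsigned prefix_len, const unsigned widths[4],
                         unsigned counts[4])
{
    unsigned char slot[4];
    unsigned total = prefix_len;

    for (int i = 0; i < 4; i++) {
        total += widths[i];                       /* padded number printed */
        counts[i] = total;                        /* what %n stores */
        slot[i] = (unsigned char)(total & 0xffu); /* byte that survives */
    }
    return le32(slot);
}
```

For the payload above (32 bytes of prefix and four widths of 10) the counts come out as 42, 52, 62 and 72, and the slot ends up holding 0x483e342a. The same le32 helper applied to the bytes ‘%’, ‘n’, ‘A’, ‘B’ gives 0x42416e25, the address from the “%nAB” case.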

  • If we place some shellcode in the buffer after our format string and we know the starting address of the format string, then we can successfully execute our own code. Let’s assume that the format string starts at address 0xbfffd8a4. The format string below is 59 bytes long (32 bytes of filler words and addresses plus 27 bytes of conversions), so it ends at 0xbfffd8a4 + 59 = 0xbfffd8df and this is where our shellcode starts. Hence, we need to write this address 0xbfffd8df as the return address of printf.
    • buffer =“\x01\x01\x01\x01\x9c\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9d\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9e\xd8\xff\xbf”+
    • “\x01\x01\x01\x01\x9f\xd8\xff\xbf”+
    • “%191u%n%249u%n%39u%n%192u%n”+
    • “<shellcode>”

The above code will write the value 0xbfffd8df into the return address of printf. Since x86 is little-endian, the least significant byte goes into the first (lowest) address. 32 bytes were already written for the filler words and addresses, so 32 + 191 = 223 (decimal of ‘df’) will be written on the first byte of the return address. Subsequently, 223 + 249 = 472 = 0x1d8, whose low byte ‘d8’ lands on the second byte; 472 + 39 = 511 = 0x1ff gives ‘ff’ on the third byte; and 511 + 192 = 703 = 0x2bf gives ‘bf’ on the fourth byte.
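Working out the padding widths by hand is tedious, so here is a sketch of the calculation. The helper padding_widths is hypothetical; it assumes a little-endian target, four byte-by-byte ‘%n’ writes to consecutive addresses, and a min_width large enough to cover the digits of the filler value (8 digits for 0x01010101).

```c
#include <stdint.h>

/* Hypothetical helper: for each of the four bytes of `target`, least
 * significant byte first, compute the field width that brings the running
 * output count to a value whose low byte equals that target byte.
 * `already_written` is the number of bytes printed before the first
 * conversion; widths below `min_width` are bumped by 256, which leaves the
 * low byte of the count unchanged. */
void padding_widths(uint32_t target, unsigned already_written,
                    unsigned min_width, unsigned widths[4])
{
    unsigned count = already_written;

    for (int i = 0; i < 4; i++) {
        unsigned want = (target >> (8 * i)) & 0xffu;  /* byte i, LSB first */
        unsigned width = (want - count) & 0xffu;      /* distance mod 256 */

        while (width < min_width)
            width += 256;

        widths[i] = width;
        count += width;
    }
}
```

For instance, with 32 bytes already printed, a minimum width of 8 and a target of 0xbfffd8df, the helper yields the widths 191, 249, 39 and 192. Keep in mind that the widths change the length of the format string itself, so if the shellcode sits right after it, the target address has to be re-checked once the widths are known.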

Hence, with the above exploit, we’ll be able to execute any code we want using the format string.

How do we make our code secure?

Simple, don’t allow the user to be able to pass a format string to the function. A simple solution would be to write the format string yourself, and expect just the data from the user.

printf ( buffer );          --> INSECURE

printf ( "%s", buffer );    --> SECURE
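As a quick check that a fixed format string neutralizes the input, here is a minimal sketch. The helper format_untrusted is a made-up name; it writes into a caller-supplied buffer with snprintf so that the result can be inspected, but the idea is the same as printf ( "%s", buffer ).

```c
#include <stdio.h>

/* Hypothetical helper: copies `untrusted` into `out` as plain data. The
 * format string is a constant, so any '%' characters in the input are
 * printed literally and never interpreted. Returns what snprintf returns:
 * the length the full output would have had. */
int format_untrusted(char *out, size_t size, const char *untrusted)
{
    return snprintf(out, size, "%s", untrusted);
}
```

Passing the input “%x%n” leaves exactly the four characters %x%n in the output buffer and nothing is read from or written to the stack on the attacker’s behalf. When no formatting is needed at all, fputs ( buffer, stdout ) does the same job, and GCC’s -Wformat-security option can warn about calls like printf ( buffer ).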
