Archive for the ‘Reverse Engineering’ Category

I was introduced to smashthestack.org by Zubin, my partner in crime. 😛 I looked at io.smashthestack.org. At first I didn’t like it much, but later on it got really addictive! It has challenges that start by testing the basics of C, then move on to buffer overflows, format string vulnerabilities, etc. I am currently on level11, and I’m thinking of posting some hints on how to solve the problems.

The way the entire challenge is organized is really cool – simple, yet cool. There’s a remote Linux box, which we can access over SSH on port 2224. The password for level1 is “level1”. The password for each level is stored in the file /home/level<num>/.pass. So we need to be the user “level<num>” in the first place to read that level’s password! At first it doesn’t look simple, but once we see how the challenges are configured on the Linux box, it becomes easy.

$ ssh -p 2224 level1@io.smashthestack.org
level1@io.smashthestack.org's password:
level1@io:~$ cd /levels

level1@io:/levels$ ls -l level01
-r-sr-x--- 1 level2 level1 7500 Nov 16  2007 level01
level1@io:/levels$

Here, we can see that the executable level01 is owned by user ‘level2’, has the setuid bit set (the ‘s’ in the owner’s permissions) and can be executed by members of the group ‘level1’. Hence, if we’re able to get a shell out of the challenge’s executable while it runs (with setuid privileges), then we are ‘level2’ in that shell! From that new shell, we can read the next level’s password and voilà, we can access the next level’s executable.
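To make that permission string a bit more concrete, here is a small Python sketch (my own illustration, not something from the challenge box). It assumes the octal mode 4550, which is what the listing above works out to, and the helper name describe is made up. It rebuilds the string that ls -l prints and checks the setuid and group-execute bits.

```python
import stat

# Mode inferred from the `ls -l` line above: regular file, setuid bit,
# r-x for the owner (level2), r-x for the group (level1), nothing for others.
LEVEL01_MODE = stat.S_IFREG | 0o4550


def describe(mode):
    """Return (permission string, is_setuid, group_can_execute) for a mode."""
    return (
        stat.filemode(mode),
        bool(mode & stat.S_ISUID),
        bool(mode & stat.S_IXGRP),
    )


print(describe(LEVEL01_MODE))  # ('-r-sr-x---', True, True)
```

The setuid bit is what makes the process run with the owner’s user ID (level2) even though it was started by level1.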

This is how it basically works, and some of the challenges themselves are coded in such a way that if you’re able to do the right stuff, it gives you back a shell. It’s really interesting and addictive, so readers, if you get time and you’re interested in reverse engineering/binary analysis, this is definitely the way. 🙂

When I started off on reverse engineering, I noticed that there was a lack of good debuggers for the linux platform. GDB is awesome, but not very user-friendly. DDD aimed at creating a nice GUI for gdb, but somehow nothing seems to match IDA Pro. :-/

So, as an addition to my learning process, I wanted to code dd4linux, an abbreviation for “Disassembler and Debugger for Linux”. With a bit of motivation and advice, I started off coding it. I’ve also set it up as a project on SourceForge. Since I’m only a beginner, I focused only on disassembling ELF files for the 32-bit x86 architecture.

My aim – Develop a full GUI based debugger for the linux platform based on gtk2, mimicking the features of IDA Pro. The debugger will also have extra features such as vulnerability detection.

There are two main steps during disassembly:

  1. File format – Find out the file format of the executable and how the instructions are stored in it.
  2. Instruction Decoding – Once you know where the instructions start, the next step is to decode them. Searching for tutorials on how to decode instructions didn’t yield much, but decoding is just the reverse of encoding an instruction. So you just have to reverse the process, and voilà, there’s your instruction!

How did I do it?

The right way to do it is to always do some background research on what exactly you’re trying to tackle, and then apply that knowledge in the form of code. In this case, that meant reading the ELF specification manual and the x86-32 instruction manual from Intel. I first tried reading the ELF specification manual. Me being me, I read only the basics and started coding in C to decode an ELF header. I was successful, but only later did I realize that without reading it fully, I would not be able to decode the ELF file completely.
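For readers who want to see what decoding an ELF header involves, here is a minimal sketch in Python rather than C (so it is not the dd4linux code). The field names and the 52-byte layout come from the Elf32_Ehdr structure in the ELF specification; the function name parse_elf32_header and the sample header values are made up for illustration, and only 32-bit little-endian files are handled.

```python
import struct

# Layout of Elf32_Ehdr from the ELF specification (little-endian, 52 bytes).
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf32_header(data):
    """Decode the first 52 bytes of a 32-bit little-endian ELF file."""
    if len(data) < ELF32_HEADER.size:
        raise ValueError("too short to hold an ELF32 header")
    header = dict(zip(FIELDS, ELF32_HEADER.unpack_from(data)))
    ident = header["e_ident"]
    if ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    # EI_CLASS must be ELFCLASS32 (1) and EI_DATA must be ELFDATA2LSB (1).
    if ident[4] != 1 or ident[5] != 1:
        raise ValueError("only 32-bit little-endian ELF is handled here")
    return header


# Synthetic header for demonstration only; the values are made up.
sample = ELF32_HEADER.pack(
    b"\x7fELF\x01\x01\x01" + b"\x00" * 9,  # e_ident (16 bytes)
    2,           # e_type: ET_EXEC
    3,           # e_machine: EM_386
    1,           # e_version
    0x08048300,  # e_entry
    52, 0, 0,    # e_phoff, e_shoff, e_flags
    52, 32, 0,   # e_ehsize, e_phentsize, e_phnum
    40, 0, 0,    # e_shentsize, e_shnum, e_shstrndx
)
hdr = parse_elf32_header(sample)
print(hex(hdr["e_entry"]), hdr["e_machine"])
```

On a real binary you would pass in the first 52 bytes read from the file. Note that the header alone only gives the entry point and the offsets of the program and section header tables; locating the code section itself means walking those tables, which this sketch does not do.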

Once I could decode the ELF header and know exactly where the instructions are present, I went on to decode each instruction manually with Intel’s instruction manual as reference. I found a very good tutorial on the internet and that helped me a lot to understand instruction decoding.

It takes a lot of patience and memory to decode instructions successfully, and I’m nowhere near the end. I used the source code of objdump and libdisasm as a reference on how to decode instructions. I then created a table of x86 instructions in one file, and wrote a Python script which parses the table and creates a structure of opcodes and corresponding instructions. So when I encounter an opcode, I check the table to see which instruction it means, get the encoding format of that instruction (operand sizes), read the entire instruction, store it in a character buffer and then print it. Simple, isn’t it? Yes, in fact the process is very simple, but the minute details are what boggle your mind, and yet they matter a lot.
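The table-driven idea can be sketched roughly like this. The table format (opcode | mnemonic | operand kind) is hypothetical and is not the actual dd4linux table; the names load_table and disassemble are made up too. Only a few single-byte x86-32 opcodes are listed, and prefixes and ModRM bytes are ignored completely.

```python
import struct

# Hypothetical table format: "<opcode hex> | <mnemonic> | <operand kind>".
TABLE_TEXT = """
55 | push ebp | none
5d | pop ebp | none
90 | nop | none
c3 | ret | none
c9 | leave | none
6a | push | imm8
68 | push | imm32
b8 | mov eax, | imm32
e8 | call | rel32
eb | jmp | rel8
"""

# operand kind -> (struct format, size in bytes, is it a relative branch?)
OPERANDS = {
    "none": (None, 0, False),
    "imm8": ("<B", 1, False),
    "imm32": ("<I", 4, False),
    "rel8": ("<b", 1, True),
    "rel32": ("<i", 4, True),
}


def load_table(text):
    """Parse the text table into {opcode: (mnemonic, operand kind)}."""
    table = {}
    for line in text.strip().splitlines():
        opcode, mnemonic, kind = (part.strip() for part in line.split("|"))
        table[int(opcode, 16)] = (mnemonic, kind)
    return table


def disassemble(code, base, table):
    """Linear sweep: yield (virtual address, text) for each instruction."""
    offset = 0
    while offset < len(code):
        address = base + offset
        entry = table.get(code[offset])
        if entry is None:
            yield address, "(unknown opcode 0x%02x)" % code[offset]
            return
        mnemonic, kind = entry
        fmt, size, relative = OPERANDS[kind]
        length = 1 + size
        if offset + length > len(code):
            yield address, "(truncated instruction)"
            return
        text = mnemonic
        if fmt is not None:
            (value,) = struct.unpack_from(fmt, code, offset + 1)
            if relative:
                # Branch targets are relative to the next instruction.
                value = (address + length + value) & 0xFFFFFFFF
            text += " 0x%x" % value
        yield address, text
        offset += length


code = bytes.fromhex("55 6a05 e802000000 c9 c3 90")
result = list(disassemble(code, 0x08048000, load_table(TABLE_TEXT)))
for address, text in result:
    print("%08x  %s" % (address, text))
```

The base address 0x08048000 is just a typical load address picked for the example. The call here decodes to target 0x804800a, which is the address of the final nop, because a relative operand is added to the address of the instruction that follows the call.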

Currently, I’m able to decode single-byte x86 instructions with the proper virtual address for each instruction. Multibyte is yet to be done, and the table still lacks the instructions that use XMM registers and also the VMX instructions. I’ve kept that for later. Symbol table decoding has been done, but dynamic symbol table decoding and hash table decoding are yet to be done.

Well, how do I start! :-/

A long time ago, Zubin motivated me to join him in reverse engineering. Well, at first I was like “WTH, I have better stuff to do”. I started looking at it anyway, starting with reverse engineering WIN32 executables. Initially, the look of instructions with their opcodes made my head swirl, but when I actually started understanding it, it was no big deal! Seriously, every program which I have written till now is just a series of “mov”s, “add”s, “sub”s, “lea”s, “call”s, “jmp*”s and definitely a lot of “nop”s.  😛

When it came to debugging, I didn’t know the ABCs of it! Initially, googling it up gave me “OllyDbg”, “Immunity Debugger”, etc. I tried them all, but found them not very appealing to an end user. But along came the king – IDA Pro! Phew, that is one damn good debugger, the best I’ve worked with to date. Man, it almost puts the cake in your mouth! 😀 The view of an entire program as graphs reduces the complexity in a reverser’s brain to almost nothing!

Anyway, back to the point. When I tried reversing applications, I found so much information and so many vulnerabilities in code, which I don’t think would have been possible any other way! From my experience, reverse engineering is definitely worth it!

“Coding a program is one thing, knowing how it works is an entirely different thing – A true programmer knows both.”