When I started off with reverse engineering, I noticed a lack of good debuggers for the Linux platform. GDB is awesome, but not very user-friendly. DDD set out to give GDB a nice GUI, but somehow nothing seems to match IDA Pro. :-/

So, as part of my learning process, I wanted to code dd4linux, an abbreviation for “Disassembler and Debugger for Linux”. With a bit of motivation and advice, I started coding it. I’ve also set it up as a project on SourceForge. Since I’m only a beginner, I focused only on disassembling ELF files for the 32-bit x86 architecture.

My aim: develop a full GUI-based debugger for the Linux platform, built on GTK2 and mimicking the features of IDA Pro. The debugger will also have extra features such as vulnerability detection.

There are two main steps during disassembly:

  1. File format – Find out what the file format of the executable is and how the instructions are stored in it.
  2. Instruction decoding – Once you know where the instructions start, the next step is to decode them. Searching for tutorials on how to decode instructions didn’t yield much, but decoding is just the reverse of encoding an instruction. So you just have to reverse the process, and voilà, there’s your instruction!

How did I do it?

The right way is to always do some background research on what exactly you’re trying to tackle, and then apply that knowledge in the form of code. In this case, that meant reading the ELF specification and the x86-32 instruction manual from Intel. I first tried reading the ELF specification. Me being me, I read only the basics and started coding in C to decode an ELF header. I was successful, but only later did I realize that without reading it fully, I would not be able to decode the whole ELF file.
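To give an idea of what decoding an ELF header involves, here is a minimal sketch in Python (not the C code from dd4linux). The field names follow the ELF specification. The names `parse_elf32_header` and `Elf32Header` are made up for this example, and it assumes a 32-bit little-endian file, which is what an x86-32 executable is.

```python
import struct
from collections import namedtuple

# Elf32_Ehdr layout from the ELF specification: 52 bytes, little-endian on x86.
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")

Elf32Header = namedtuple(
    "Elf32Header",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)


def parse_elf32_header(data):
    """Decode the first 52 bytes of a 32-bit little-endian ELF file."""
    if len(data) < ELF32_HEADER.size:
        raise ValueError("too short to hold an ELF32 header")
    header = Elf32Header._make(ELF32_HEADER.unpack_from(data))
    if header.e_ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number")
    if header.e_ident[4] != 1:  # EI_CLASS: 1 means ELFCLASS32
        raise ValueError("not a 32-bit ELF file")
    if header.e_ident[5] != 1:  # EI_DATA: 1 means little-endian
        raise ValueError("not a little-endian ELF file")
    return header
```

Reading the first 52 bytes of a file opened in binary mode and passing them to `parse_elf32_header` gives back the entry point (`e_entry`) and the offsets of the program and section header tables (`e_phoff`, `e_shoff`), which is where the rest of the decoding has to continue.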

Once I could decode the ELF header and knew exactly where the instructions were, I went on to decode each instruction manually, with Intel’s instruction manual as reference. I found a very good tutorial on the internet, and that helped me a lot in understanding instruction decoding.
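Finding out where the instructions live usually means walking the section header table and looking for the `.text` section. The sketch below shows one way to do that in Python. It assumes a 32-bit little-endian ELF file that still has its section headers (a stripped-down binary may not), and `find_section` is a hypothetical helper name, not something from dd4linux.

```python
import struct

# Elf32_Shdr: ten 32-bit fields, 40 bytes in total.
# Order: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
#        sh_link, sh_info, sh_addralign, sh_entsize
SHDR = struct.Struct("<10I")


def find_section(data, wanted):
    """Return address, file offset and size of the named section, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number")
    # Fixed positions inside the 52-byte Elf32_Ehdr.
    (e_shoff,) = struct.unpack_from("<I", data, 32)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<3H", data, 46)

    headers = [
        SHDR.unpack_from(data, e_shoff + i * e_shentsize)
        for i in range(e_shnum)
    ]
    # Section names are stored in the string table section at index e_shstrndx.
    names_offset = headers[e_shstrndx][4]

    for sh in headers:
        start = names_offset + sh[0]       # sh_name is an index into that table
        end = data.index(b"\x00", start)   # names are NUL-terminated
        if data[start:end].decode("ascii", errors="replace") == wanted:
            return {"addr": sh[3], "offset": sh[4], "size": sh[5]}
    return None
```

With the result for `.text`, the bytes to disassemble are `data[offset:offset + size]`, and `addr` is the virtual address of the first instruction.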

It takes a lot of patience and memory to decode instructions successfully, and I’m nowhere near the end. I used the source code of objdump and libdisasm as a reference on how to decode instructions. I then created a table of x86 instructions in one file and wrote a Python script that parses the table and creates a structure of opcodes and their corresponding instructions. So when I encounter an opcode, I look it up in the table to find which instruction it means, get the encoding format of that instruction (operand sizes), read the entire instruction, store it in a character buffer and then print it. Simple, isn’t it? Yes, in fact the process is very simple, but the minute details are what boggle your mind, and they matter a lot.
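The table-driven idea can be sketched in a few lines of Python. The table format here (opcode, text template, number of immediate bytes, separated by `|`) is invented for this example and is not the actual dd4linux table. Only a handful of well-known single-byte opcodes are listed, and the names `load_table` and `decode_one` are hypothetical.

```python
# Hypothetical table format: opcode (hex) | output template | immediate bytes
TABLE_TEXT = """\
# a tiny excerpt, nowhere near the full x86 opcode map
55 | push ebp | 0
5d | pop ebp | 0
90 | nop | 0
c3 | ret | 0
68 | push {imm:#x} | 4
b8 | mov eax, {imm:#x} | 4
"""


def load_table(text):
    """Parse the table text into {opcode: (template, immediate_size)}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        opcode, template, imm_size = (field.strip() for field in line.split("|"))
        table[int(opcode, 16)] = (template, int(imm_size))
    return table


def decode_one(table, code, offset=0):
    """Decode the instruction at `offset`; return (text, length in bytes)."""
    opcode = code[offset]
    if opcode not in table:
        # Unknown opcode: emit the raw byte and move on by one.
        return f".byte {opcode:#04x}", 1
    template, imm_size = table[opcode]
    end = offset + 1 + imm_size
    if end > len(code):
        raise ValueError("instruction runs past the end of the buffer")
    # x86 immediates are stored little-endian.
    imm = int.from_bytes(code[offset + 1:end], "little")
    return template.format(imm=imm), 1 + imm_size
```

For the bytes `b8 01 00 00 00 c3`, `decode_one` at offset 0 gives `("mov eax, 0x1", 5)`, and calling it again at offset 5 gives `("ret", 1)`. The minute details that a real decoder has to handle (prefixes, ModR/M and SIB bytes, sign-extended immediates, relative jump targets) are deliberately left out here.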

Currently, I’m able to decode single-byte x86 instructions with the proper virtual address for each instruction. Multi-byte instructions are yet to be done, and the table still lacks the instructions that use the XMM registers as well as the VMX instructions. I’ve kept those for later. Symbol table decoding has been done, but dynamic symbol table decoding and hash table decoding are yet to be done.
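For the symbol table part, each entry is a fixed 16-byte `Elf32_Sym` record whose name is an index into a separate string table. Here is a rough Python sketch of decoding such entries, assuming the raw bytes of the symbol table and its string table have already been pulled out of the file. `parse_symbols` and `read_name` are names made up for this example, and only the most common type and binding values are spelled out.

```python
import struct

# Elf32_Sym: st_name, st_value, st_size, st_info, st_other, st_shndx (16 bytes)
SYM = struct.Struct("<IIIBBH")

SYM_TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}
SYM_BINDS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}


def read_name(strtab, index):
    """Return the NUL-terminated string starting at `index` in the string table."""
    end = strtab.index(b"\x00", index)
    return strtab[index:end].decode("ascii", errors="replace")


def parse_symbols(symtab, strtab):
    """Decode every Elf32_Sym entry in `symtab` into a dictionary."""
    symbols = []
    for offset in range(0, len(symtab) - SYM.size + 1, SYM.size):
        st_name, st_value, st_size, st_info, _st_other, st_shndx = (
            SYM.unpack_from(symtab, offset)
        )
        symbols.append({
            "name": read_name(strtab, st_name),
            "value": st_value,  # usually the symbol's virtual address
            "size": st_size,
            # st_info packs the binding in the high nibble, the type in the low one.
            "bind": SYM_BINDS.get(st_info >> 4, str(st_info >> 4)),
            "type": SYM_TYPES.get(st_info & 0xF, str(st_info & 0xF)),
            "section": st_shndx,
        })
    return symbols
```

Matching the `value` of each `FUNC` symbol against the virtual addresses of the decoded instructions is one way to label where functions start in the disassembly.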

Well, how do I start? :-/

A long time ago, Zubin motivated me to join him in reverse engineering. Well, at first I was like “WTH, I have better stuff to do”. I started looking at it anyway, beginning with reverse engineering Win32 executables. Initially, the look of instructions with their opcodes made my head swirl, but once I actually started understanding them, it was no big deal! Seriously, every program I have written till now is just a series of “mov”s, “add”s, “sub”s, “lea”s, “call”s, “jmp*”s and definitely a lot of “nop”s.  😛

When it came to debugging, I didn’t know the ABCs of it! Initially, googling it gave me “OllyDbg”, “Immunity Debugger”, etc. I tried them all, but none was very appealing to an end user. But along came the king: IDA Pro! Phew, that is one damn good debugger, the best I’ve worked with to date. Man, it almost puts the cake in your mouth! 😀 The view of an entire program as graphs reduced the complexity in a reverser’s brain to almost nothing!

Anyway, back to the point. When I tried reversing applications, I found so much information and so many vulnerabilities in code, which I don’t think would have been possible any other way! From my experience, reverse engineering is definitely worth it!

“Coding a program is one thing, knowing how it works is an entirely different thing – A true programmer knows both.”