dd4linux – My very own Disassembler and Debugger for Linux

Posted: September 17, 2011 in Reverse Engineering

When I started off on reverse engineering, I noticed a lack of good debuggers for the Linux platform. GDB is awesome, but not very user-friendly. DDD aimed to put a nice GUI on top of GDB, but somehow nothing seems to match IDA Pro. :-/

So, as part of my learning process, I decided to write dd4linux, short for “Disassembler and Debugger for Linux”. With a bit of motivation and advice, I started coding it, and I’ve also set it up as a project on SourceForge. Since I’m only a beginner, I have focused solely on disassembling ELF files for the 32-bit x86 architecture.

My aim: develop a full GUI-based debugger for the Linux platform, built on GTK2 and mimicking the features of IDA Pro. The debugger will also have extra features such as vulnerability detection.

There are two main steps during disassembly:

  1. File format – Find out the file format of the executable and how the instructions are stored in it.
  2. Instruction decoding – Once you know where the instructions start, the next step is to decode them. Searching for tutorials on how to decode instructions didn’t yield much, but decoding is just the reverse of encoding an instruction. So you just have to reverse the process, and voilà, there’s your instruction!

How did I do it?

The right way to go about this is to first do some background research on exactly what you’re trying to tackle, and then apply that knowledge in code. In this case, that meant reading the ELF specification and Intel’s x86-32 instruction manual. I started with the ELF specification. Me being me, I read only the basics and started coding in C to decode an ELF header. I was successful, but only later did I realize that without reading the whole specification, I would not be able to fully decode an ELF file.
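To give an idea of what decoding an ELF header involves, here is a minimal sketch in Python (not the C code used in dd4linux). The field layout follows the 32-bit ELF header from the ELF specification; the names `parse_elf32_header` and `Elf32Header` are made up for this illustration, and the sketch assumes a 32-bit little-endian file, which is the x86 case discussed here.

```python
import struct
from collections import namedtuple

# Layout of the 52-byte ELF32 header: 16-byte e_ident, then the fixed fields.
ELF32_HEADER_FORMAT = "<16sHHIIIIIHHHHHH"

# Hypothetical container type for this sketch (not from dd4linux).
Elf32Header = namedtuple(
    "Elf32Header",
    "ident type machine version entry phoff shoff flags "
    "ehsize phentsize phnum shentsize shnum shstrndx",
)


def parse_elf32_header(data):
    """Decode the ELF header from the first bytes of a 32-bit LE ELF file."""
    size = struct.calcsize(ELF32_HEADER_FORMAT)
    if len(data) < size:
        raise ValueError("file too short to contain an ELF32 header")
    header = Elf32Header._make(struct.unpack(ELF32_HEADER_FORMAT, data[:size]))
    if header.ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number")
    if header.ident[4] != 1:  # EI_CLASS: 1 means 32-bit objects
        raise ValueError("not a 32-bit ELF file")
    if header.ident[5] != 1:  # EI_DATA: 1 means little-endian
        raise ValueError("not a little-endian ELF file")
    return header
```

Reading the first 52 bytes of a binary and passing them to `parse_elf32_header` gives, among other things, the entry point (`entry`) and the offset and count of the section headers (`shoff`, `shnum`), which are what you need to go looking for the code.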

Once I could decode the ELF header and know exactly where the instructions are present, I went on to decode each instruction manually with Intel’s instruction manual as reference. I found a very good tutorial on the internet and that helped me a lot to understand instruction decoding.

It takes a lot of patience and a good memory to decode instructions successfully, and I’m nowhere near the end. I used the source code of objdump and libdisasm as references on how to decode instructions. I then created a table of x86 instructions in one file and wrote a Python script that parses the table and generates a structure mapping opcodes to their instructions. So when I encounter an opcode, I look it up in the table to see which instruction it is, get that instruction’s encoding format (operand sizes), read the entire instruction, store it in a character buffer and then print it. Simple, isn’t it? Yes, the process itself is very simple, but it’s the minute details that boggle your mind, and they matter a lot.
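The table-lookup idea can be sketched roughly as follows. This is an illustration in Python, not the dd4linux code: the table here holds only a handful of single-byte opcodes with an optional immediate operand, and the names `OPCODE_TABLE` and `disassemble` are invented for the example. Opcodes missing from the table are simply emitted as raw bytes.

```python
# opcode -> (mnemonic text, size in bytes of the immediate that follows)
# A tiny hand-picked subset; a real table covers far more opcodes.
OPCODE_TABLE = {
    0x55: ("push ebp", 0),
    0x5D: ("pop ebp", 0),
    0x68: ("push", 4),      # push imm32
    0x90: ("nop", 0),
    0xB8: ("mov eax,", 4),  # mov eax, imm32
    0xC3: ("ret", 0),
    0xC9: ("leave", 0),
    0xCC: ("int3", 0),
}


def disassemble(code, base_address=0):
    """Return a list of (virtual address, text) pairs for the given bytes."""
    lines = []
    offset = 0
    while offset < len(code):
        opcode = code[offset]
        address = base_address + offset
        entry = OPCODE_TABLE.get(opcode)
        if entry is None:
            # Unknown opcode: show the raw byte and move on by one.
            lines.append((address, ".byte 0x%02x" % opcode))
            offset += 1
            continue
        mnemonic, imm_size = entry
        operand = code[offset + 1:offset + 1 + imm_size]
        if len(operand) < imm_size:
            raise ValueError("truncated instruction at 0x%08x" % address)
        if imm_size:
            # x86 immediates are stored little-endian.
            text = "%s 0x%x" % (mnemonic, int.from_bytes(operand, "little"))
        else:
            text = mnemonic
        lines.append((address, text))
        offset += 1 + imm_size
    return lines


for address, text in disassemble(bytes.fromhex("55b82a0000005dc3"), 0x08048000):
    print("%08x:  %s" % (address, text))
```

Each address is just the base virtual address plus the byte offset, so the only thing that has to be right for the addresses to line up is the instruction length taken from the table.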

Currently, I’m able to decode single-byte x86 instructions with the proper virtual address for each instruction. Multibyte instructions are yet to be done, and the table still lacks the instructions that operate on XMM registers as well as the VMX instructions. I’ve kept those for later. Symbol table decoding is done, but decoding of the dynamic symbol table and the hash table is yet to be done.
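For reference, symbol table decoding boils down to walking fixed-size 16-byte entries and looking each name up in the associated string table. The sketch below is again Python and only an illustration: `parse_symbols` and `Symbol` are hypothetical names, and it assumes you have already extracted the raw bytes of the symbol table section and its string table from a 32-bit little-endian ELF file.

```python
import struct
from collections import namedtuple

# Elf32_Sym: st_name, st_value, st_size, st_info, st_other, st_shndx
ELF32_SYM_FORMAT = "<IIIBBH"
ELF32_SYM_SIZE = struct.calcsize(ELF32_SYM_FORMAT)  # 16 bytes

# Hypothetical container type for this sketch (not from dd4linux).
Symbol = namedtuple("Symbol", "name value size bind type shndx")


def parse_symbols(symtab, strtab):
    """Decode every complete Elf32_Sym entry found in the symtab bytes."""
    symbols = []
    for offset in range(0, len(symtab) - ELF32_SYM_SIZE + 1, ELF32_SYM_SIZE):
        st_name, st_value, st_size, st_info, _st_other, st_shndx = (
            struct.unpack_from(ELF32_SYM_FORMAT, symtab, offset))
        # st_name is an offset into the string table; names end with NUL.
        end = strtab.find(b"\x00", st_name)
        if end == -1:
            end = len(strtab)
        name = strtab[st_name:end].decode("ascii", errors="replace")
        # st_info packs the binding in the high nibble, the type in the low.
        symbols.append(Symbol(name, st_value, st_size,
                              st_info >> 4, st_info & 0x0F, st_shndx))
    return symbols
```

The dynamic symbol table uses the same entry layout, so a routine like this should carry over once the right section and its string table have been located.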
