Archive for the ‘Background Information’ Category

A bit of gyan (knowledge)

The internet is maturing at an extremely fast rate day-by-day, and the world-wide-web (www)  has become a central hub for information available worldwide. Nowadays, communication between the far ends of the world has become trivial. The dot-com boom happened in the mid-1990’s and companies have started depending hugely on the internet since then. This has paved way to a huge number of possibilities, along with risks. Companies and customers and retailers can buy and sell online and e-commerce has become substantially important because of this.

What I’ve found is that however fast technology grows, people’s minds don’t change. No matter how secure you tend to keep your transaction between the client and server, e-commerce’s growth has not increased very much because of the constant fear in people’s minds – “How can I trust this fellow when I cannot even see him? What if I pay online but don’t get my package?”. A typical example is the huge number of credit card frauds over the decades, which has just increased the fear in people’s minds.

Each time a vulnerability is discovered on a particular website, it has been exploited and has incurred huge losses for the company hosting that website. Time and again, people have tried to keep websites as secure as possible. Theoretically, algorithms (used in security) have been proven to be secure (till date) and yet, attackers have always found ways and means to breach security.

In my opinion, it is just plain ignorance of the designer to ignore the security aspects to make his work easier. Though development of technology is rapidly increasing and we learn new things everyday, secure coding practices are not learnt in the process. This in turn leads to security holes in the implementation of software, which are then exploited by attackers causing huge losses to companies.

Let’s try to answer some simple questions:

  • How do you host webpages over the world-wide-web?
    • In most cases, web pages are accessed using the http(s) or (s)ftp protocols. If a person wants to host a website over the world-wide-web, (s)he has to first register his/her domain name. This means that the domain name will get mapped to a particular IP address which is reachable from anywhere in the world (called as ‘public ip’). Next, the person has to enable the website to be accessible from the machine having the assigned IP address, which is generally done using a web server to host his/her website. Now, the website is available to anyone who either knows the public IP or the registered domain name.
  • What programming language can be used while implementing the same?
    • There are a huge number of scripting languages available, which designers can use to create websites. Examples are PHP, JSP, ASP, etc. Programming constructs differ in each language, but end up doing the same things. There is also CGI (common gateway interface) where you can use scripting languages such as Python, Perl, Ruby, etc. to do the same job.
  • What should one do to make my web application secure?
    • This question cannot be answered in one paragraph. Anyway, I’ll try listing a few:
      • Firstly, it requires a good knowledge of the exact working of the code which designers write. Talking with an example, it means that knowing that “strcpy()” function copies one string to another is not enough, but rather the programmer needs to know how exactly it copies and why it is made so.
      • Secondly, the programmer who implements the software needs to have deep knowledge about secure coding practices – what, why and how. Secure coding practices try to ensure that there are minimal security holes in software being designed, thus ensuring safety, security and stability of software. Other factors such as reliability, integrity tag along if these conditions are met.

Now, based on the three questions answered above, we can come to a standpoint as to what factors determine how secure a website is. In decreasing order of importance and difficulty:

  1. Knowledge of the programmer.
  2. Network layout being used.
  3. Configurations being used in software.

We know that the only way to access a website hosted on a public IP is through the internet. Without the internet, the world-wide-web becomes a big joke. When we look at how the internet is designed, we see that networking plays a huge role. Hence, the protocols being implemented during transfer of data have to be secure. No matter how secure the application is, if the networking protocols being implemented are insecure, security is threatened. This is one basic fact that all web designers have to understand. Most of the devices used in the internet today, use the 5 layer hybrid protocol stack. This protocol stack is known to be insecure, and is prone to MITM attacks (DNS cache poisoning, ARP spoofing, IP spoofing, etc.)

Management of a website is normally done through configuration settings. These configuration settings determine how users of the website can access data and with what level of permissions. These configuration settings for the website can be divided into two parts – configurations of web server and the configurations of the user who is accessing the website. Configurations of the web server mean those configurations which affect all users accessing the website, whereas user-specific configurations apply to single users accessing the website. An example of a web-server configuration is the “Directory Listing” option, where a user can list the contents of a directory accessible through the website, without a webpage displaying it. An example of a user-specific configuration is the access control being specified to each user, controlled by an ACL (Access Control List). Programming languages sometimes influence how these user-specific configurations are specified.

Can we make the world-wide-web ‘entirely’ secure?

A simple answer would be “Entirely secure?! I don’t think so!”. But there are a lot of factors to consider while answering this question. Let’s look at some of them.

Firstly, the programmer implementing the software has a good knowledge of secure coding practices. He/she has to know exactly how the code is being implemented and how secure it is. This is where programming languages play an important role. Some programming languages provide very high-level programming constructs to make the job easier for the programmer, but this actually blinds the programmer from the inner implementation of the constructs and how secure they are. Thus security does not only rely on how the the programmer codes, but also how the code is being implemented by the compiler/interpreter of that particular programming language. The programmer has to take care of this, carefully considering the programming language that is being used and how it is actually being implemented.

There isn’t much that can be done about the security level of the entire protocol stack. This is because if we have to modify the protocols in the protocol stack to make it secure (below the application layer), then we would have to change the firmware in every hub, switch, router and computer all around the world. For a long time, people have been changing the protocols at the application layer to secure ones (such as SSL), trying to prevent MITM attacks at the application layer. But then we have to understand that whatever is done on the application layer is specific only to that layer. The security mechanisms used in the application layer are totally blind to attacks happening at the lower layers. Thus, if we actually would have to make the network layout totally secure, that wouldn’t be possible. But what we can do is to provide more encryption mechanisms at the application layer, hoping for the best. So from the network point of view, the world-wide-web is still insecure and will continue to be until the entire protocol stack can be made secure.

In most of today’s websites, vulnerabilities arise due to insecure configurations being used. The programmer is lazy, thus leaving insecure configurations on the website, paving way for information leak and potential exploits. Though this is relatively easier to handle when compared to the other factors, it is important when it comes to security of a website.

What now?

The very need of security arises because of the fact – all of us are not responsible citizens. There would be no need for policemen if there were no thieves. But this is definitely not achievable, because changing hardware and software is a lot easier than changing people! There is a reason that I’ve said that “knowledge of the programmer” is more important and harder to achieve than “making the network layout secure”. What I mean is that it is easier to change all the hubs, switches, routers and computers all over the world to achieve security, than to strive to achieve that every programmer has to have the knowledge of secure coding practices! 😀

During my under-graduation, a professor had once said “It is a never-ending race between designers, attackers and security experts”. Designers keep developing technology, while attackers keep finding security holes in the implementation of that technology, and security experts try to come up with workarounds to patch these holes. This seems to be true, not only with computers, but with any technology used in this world!

We have to do best with what we have. We know that there are attackers prowling in the wild, looking for vulnerable websites to deface, or probably steal data from. So it is our responsibility to secure our data, no matter what. We have talked about some of the factors influencing security, so we will have to look deeper into the same and try to come up with an effective, yet secure implementation.

Advertisements

Let’s discuss a few ways of effectively using GDB while debugging programs. Though DDD aims at providing a reasonable UI for GDB, it still uses the command line interface of GDB for it’s work. So let’s see what options are available in GDB. There are a lot of tutorials which can be found online regarding the same, but I wish to keep this a bit naive for beginners.

This is just a note on my experiences with gdb. You can look at tutorials from betterexplained.com and RMS for more features.

  • What are the prerequisites to be done before starting to debug the program using gdb?
    • Firstly, you have to compile the program with the ‘-g’ option to provide debugging symbols for gdb. This can be done by:
      • user@host:~$ gcc program.c -o program_executable -g
    • Next, you have to invoke the executable using ‘gdb’. In my Ubuntu system, I normally do by:
      • user@host:~$ gdb program_executable
  • How do I set breakpoints in the program?
    • In case of setting breakpoints in source code:
      • (gdb) break program.c:29    (set breakpoint at line number 29 of source file program.c)
      • (gdb) b program.c:29
      • (gdb) b 29 (This case is when there is only one C file compiled)
    • In case of setting breakpoints at specific addresses:
      • (gdb) break *0x8048255
      • (gdb) b *0x8048255
  • How do I disassemble only particular addresses within a program using gdb?
    • In case of disassembly of a function using it’s name:
      • (gdb) disas <name>
    • In case of disassembly using addresses:
      • (gdb) info line <name_of_function> (optional)
      • (gdb) disas <start_addr> <end_addr>
  • How do I print the value of a register?
    • To see values in all registers:
      • info registers
    • To see value in individual registers:
      • print (or) x/bwx <register> (register=$eax/$ebx/$ecx/$edx/$esp/$ebp/$eip/$edi/$esi)
  • In what ways can I view the value of memory address/variable?
    • HEX:
      • x/x <var>           (only 1 byte)
      • x/2x <var>         (2 bytes)
      • x/2bwx <var>   (2 words)
    • As string:
      • x/s <var>
    • As character:
      • x/c <var>
  • How do dump the stack?
    • Get 32 words from stack:
      • x/32bwx $esp
  • How can I disassemble a particular instruction at a particular address?
    • At any address:
      • x/i <address>
    • The next instruction:
      • x/i $esp
  • How do I step into functions?
    • Step by source lines:
      • step (OR) s
    • Step by instructions:
      • stepi (OR) si
  • How do I step over functions?
    • Step over source lines:
      • next (OR) n
    • Step over instructions:
      • nexti (OR) ni
  • How do I continue till I meet another breakpoint?
    • continue (OR) c
  • How do I perform multiple operations with one command?
    • (gdb) define function1
    • > stepi
    • > echo “ebp = “
    • > x/bwx $esp
    • > end
    • (gdb) function1
    • esp = 0xbfffd89c
    • (gdb)

There are a lot more advanced features that can be used, I’ll keep adding new stuff as and when I encounter them. 🙂

Stack Memory Layout

Posted: September 18, 2011 in Background Information

Let’s first understand the memory layout in the stack of a program during execution. When a program loads into memory, an address space is allocated for the program in the stack. One point to note is that all these addresses are virtual addresses we’re talking about, not physical addresses. Then, the stub code is called, which loads the program into memory and does various administration stuff. Next, when the starting function is called, which is normally present in the .text section of an ELF header. The interesting part comes only when the actual program starts (that’s the main() function).

Each function when called, sets up it’s own stack frame on the stack, executes, and upon finishing, gives back the control to the calling function. There are three registers we have to understand properly in order to proceed:

  • EBP register: This register contains an address, which is the base address of the stack frame of the current function in execution. We call this the “base pointer”, as this signifies that this address is the base and all the memory which the executing function uses on the stack starts from here.
  • EIP register: This register holds the address of the next instruction to be executed and thus is called the “instruction pointer”.
  • ESP register: This register contains an address, which signifies the TOS (top-of-stack) of the executing function. Thus the value of “ESP-EBP” gives the size of the stack of a function.

Now, let’s move onto how the stack frames are set up by a function when it loads. Let’s take an example where a function “A” calls another function “B”. Now, ‘A’ has to issue a “call” instruction to the starting address of function ‘B’.

A common question:

What’s the difference between a “call” instruction and a “jmp” instruction? Why cant the calling function just say “jmp <addr>” instead of “call <addr>”?

The answer to this is quite simple. When function ‘A’ calls function ‘B’ and ‘B’ finishes executing, it must return back to the address of the instruction in ‘A’ and continue execution. Now, how does ‘B’ know the address that must be put onto EIP register so that ‘A’ can resume execution? This is done by the “call” and “ret” instructions.

Let’s say ‘A’ has the following instructions:

0x8048350 call 0x8048360 <function ‘B’>

0x8048355 test eax, eax

And ‘B’ has the following instructions:

0x8048360 mov eax, 0x01

0x8048362 ret

When “call <addr>” is executed by ‘A’, the address from which execution must resume (in this case 0x8048355) is pushed onto the stack. Then ‘B’ starts executing, and after it finishes executing, the “ret” instruction pops the value from TOS onto EIP, which in turn is the address of the next instruction to be executed. We call this address the return address. Now, “jmp 0x8048360” will not push the return address on the stack, and hence the system will never be able to know which instruction it has to resume execution from. Thus, we always call functions using the “call” instruction.

Before function ‘A’ calls function ‘B’, it must somehow make the arguments available for ‘B’. This is done by pushing the arguments onto the stack before the “call” instruction is made. The arguments are pushed in reverse order, since ‘B’ must be able to see argument 1 on TOS, argument 2 just below that, etc. Therefore, before ‘A’ calls ‘B’ (say 2 arguments), the following will take place:

push <argument 2>

push <argument 1>

call <B_addr>

The stack layout just as ‘B’ starts execution will be:

Stack Layout when function 'A' calls function 'B'

Stack Layout when function 'A' calls function 'B'

When function ‘A’ calls function ‘B’, the return address (from where execution has to be resumed) is pushed onto the stack, and ‘B’ starts executing. Now, ‘B’ also has to have it’s own stack frame. So the first 2 instructions of any function are:

push ebp

mov ebp, esp

These two instructions set up the stack for the called function ‘B’. The previous base pointer of function ‘A’ is saved, by pushing it onto the stack. It then sets the value of the base pointer as TOS, representing this as the bottom-of-stack for function ‘B’. All local variables come on top of the base pointer of any function.

Now, when the called function ‘B’ exits, it has to restore the previous value of the base pointer in register EBP (base pointer of calling function ‘A’). This is done by popping the value of TOS to the EBP register after execution of ‘B’ is finished. Hence, an instruction which you’ll see in every function just before it returns is:

pop ebp

NOTE: The value of addresses used in the stack decreases as the size of the stack increases. A common phrase is that the program stack grows downwards, but what it actually means is that the ‘value’ of each subsequent address in the stack keeps decreasing as the stack grows.