Stack Memory Layout

Posted: September 18, 2011 in Background Information

Let’s first understand how a program’s stack is laid out in memory during execution. When a program is loaded, the operating system sets up an address space for it, and the stack is one region of that address space. One point to note is that all the addresses we’re talking about are virtual addresses, not physical addresses. Start-up (stub) code then runs and takes care of various administrative tasks. Next, the program’s starting function is called; its code normally lives in the .text section of the ELF file. The interesting part comes only when the actual program starts (that’s the main() function).

Each function, when called, sets up its own stack frame on the stack, executes, and upon finishing hands control back to the calling function. There are three registers we have to understand properly in order to proceed:

  • EBP register: This register contains an address, which is the base address of the stack frame of the current function in execution. We call this the “base pointer”, as this signifies that this address is the base and all the memory which the executing function uses on the stack starts from here.
  • EIP register: This register holds the address of the next instruction to be executed and thus is called the “instruction pointer”.
  • ESP register: This register contains an address that marks the TOS (top-of-stack) of the executing function. Because the stack grows toward lower addresses, ESP is never greater than EBP inside a frame, so the value of “EBP-ESP” gives the size of the current function’s stack frame.
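
As a quick illustration of how the three registers relate, here is a minimal Python sketch. The register values are hypothetical numbers chosen for the example, and frame_size is a helper name invented here; the only point is that the frame size is the distance from EBP down to ESP.

```python
# Hypothetical register values for a function in the middle of execution.
# These addresses are made up for illustration only.
registers = {
    "eip": 0x08048370,  # address of the next instruction to execute
    "ebp": 0xBFFFF6A8,  # base address of the current stack frame
    "esp": 0xBFFFF690,  # top of the stack (lowest stack address in use)
}


def frame_size(regs):
    """Number of bytes between the frame base (EBP) and the top of stack (ESP)."""
    # The stack grows toward lower addresses, so ESP <= EBP inside a frame.
    return regs["ebp"] - regs["esp"]


print(frame_size(registers))  # 24
```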

Now, let’s move on to how a function sets up its stack frame when it is called. Let’s take an example where a function ‘A’ calls another function ‘B’. To do this, ‘A’ has to issue a “call” instruction to the starting address of function ‘B’.

A common question:

What’s the difference between a “call” instruction and a “jmp” instruction? Why can’t the calling function just say “jmp <addr>” instead of “call <addr>”?

The answer to this is quite simple. When function ‘A’ calls function ‘B’ and ‘B’ finishes executing, control must return to the instruction in ‘A’ that follows the call, and execution continues from there. Now, how does ‘B’ know the address that must be put into the EIP register so that ‘A’ can resume execution? This is handled by the “call” and “ret” instructions.

Let’s say ‘A’ has the following instructions:

0x8048350 call 0x8048360 <function ‘B’>

0x8048355 test eax, eax

And ‘B’ has the following instructions:

0x8048360 mov eax, 0x01

0x8048365 ret

When “call <addr>” is executed by ‘A’, the address from which execution must resume (in this case 0x8048355) is pushed onto the stack. Then ‘B’ starts executing, and when it finishes, the “ret” instruction pops the value on TOS into EIP, so that value becomes the address of the next instruction to be executed. We call this address the return address. A plain “jmp 0x8048360”, by contrast, does not push a return address onto the stack, so when ‘B’ finishes there is no saved address from which ‘A’ could resume. Thus, functions are called using the “call” instruction.
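
The difference between “call” and “jmp” can be sketched with a toy model in Python. This is only an illustration, not an x86 emulator: the Machine class, its method names, and the starting ESP value 0xBFFFF700 are assumptions invented for the example. The code addresses are the ones used for ‘A’ and ‘B’ above.

```python
class Machine:
    """Toy model of EIP, ESP and a stack of 32-bit values (illustrative only)."""

    def __init__(self, esp=0xBFFFF700):  # hypothetical initial top of stack
        self.eip = 0
        self.esp = esp
        self.memory = {}  # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                  # the stack grows toward lower addresses
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, target, next_instruction):
        self.push(next_instruction)    # save the return address on the stack
        self.eip = target

    def jmp(self, target):
        self.eip = target              # nothing is saved

    def ret(self):
        self.eip = self.pop()          # resume at the saved return address


m = Machine()
start_esp = m.esp
m.eip = 0x8048350
m.call(0x8048360, next_instruction=0x8048355)
saved = m.memory[m.esp]                # the return address sits on TOS
m.ret()
print(hex(saved), hex(m.eip))          # 0x8048355 0x8048355

j = Machine()
j.eip = 0x8048350
j.jmp(0x8048360)
nothing_saved = j.esp == start_esp and not j.memory
print(nothing_saved)                   # True: "ret" would have nothing to pop
```

In the “jmp” case the stack is untouched, which is exactly why ‘B’ would have no way back to ‘A’.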

Before function ‘A’ calls function ‘B’, it must somehow make the arguments available to ‘B’. In the convention described here, this is done by pushing the arguments onto the stack before the “call” instruction is executed. The arguments are pushed in reverse order, so that argument 1 ends up nearest to TOS, with argument 2 just below it, and so on. (Once “call” pushes the return address, the return address itself is on TOS and argument 1 sits directly beneath it.) Therefore, before ‘A’ calls ‘B’ (say with 2 arguments), the following will take place:

push <argument 2>

push <argument 1>

call <B_addr>

The stack layout just as ‘B’ starts execution will be:

Stack Layout when function 'A' calls function 'B'
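
As a textual stand-in for the layout, the following Python sketch replays the pushes and lists the stack from TOS downward. The function name layout_at_entry, the argument values 10 and 20, and the starting ESP value 0xBFFFF700 are all assumptions chosen for illustration; 0x8048355 is the return address from the earlier example.

```python
def layout_at_entry(args, return_address, esp=0xBFFFF700):
    """Return (address, label, value) tuples, TOS first, as seen when 'B' starts."""
    pushed = []
    # 'A' pushes the arguments in reverse order: last argument first.
    for index in range(len(args), 0, -1):
        esp -= 4
        pushed.append((esp, f"argument {index}", args[index - 1]))
    # The "call" instruction then pushes the return address.
    esp -= 4
    pushed.append((esp, "return address", return_address))
    # The most recent push is the top of the stack, so list it first.
    return list(reversed(pushed))


layout = layout_at_entry([10, 20], return_address=0x8048355)
for address, label, _value in layout:
    print(f"{address:#010x}  {label}")
# 0xbffff6f4  return address
# 0xbffff6f8  argument 1
# 0xbffff6fc  argument 2
```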

When function ‘A’ calls function ‘B’, the return address (from where execution has to be resumed) is pushed onto the stack, and ‘B’ starts executing. Now, ‘B’ also needs its own stack frame. So the first two instructions of a function that uses a base pointer are typically:

push ebp

mov ebp, esp

These two instructions set up the stack frame for the called function ‘B’. The first saves the base pointer of function ‘A’ by pushing it onto the stack. The second sets the base pointer to the current TOS, marking this as the bottom-of-stack for function ‘B’. The local variables of a function are placed on top of its base pointer, that is, at addresses lower than EBP.
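
To see where everything ends up relative to EBP after these two instructions, here is a small Python sketch. It is a hand-rolled model, not real machine code: the starting register values, the argument values 10 and 20, and the single 4-byte local variable (reserved with what would be “sub esp, 4”, an instruction not shown in this post) are assumptions made for the example.

```python
memory = {}
# Hypothetical starting state inside 'A'; 0xBFFFF728 stands for A's base pointer.
regs = {"esp": 0xBFFFF700, "ebp": 0xBFFFF728}


def push(value):
    regs["esp"] -= 4
    memory[regs["esp"]] = value


# In 'A': push the arguments in reverse order, then "call" pushes the return address.
push(20)                   # push <argument 2>
push(10)                   # push <argument 1>
push(0x8048355)            # return address pushed by "call"

# In 'B': the two prologue instructions.
push(regs["ebp"])          # push ebp      (save A's base pointer)
regs["ebp"] = regs["esp"]  # mov ebp, esp  (B's frame starts here)

# Assumed extra step: reserve one 4-byte local and store a value in it.
regs["esp"] -= 4
memory[regs["ebp"] - 4] = 99

ebp = regs["ebp"]
print(hex(memory[ebp]))      # 0xbffff728 -> saved base pointer of 'A'
print(hex(memory[ebp + 4]))  # 0x8048355  -> return address
print(memory[ebp + 8])       # 10         -> argument 1
print(memory[ebp + 12])      # 20         -> argument 2
print(memory[ebp - 4])       # 99         -> first local variable
```

Arguments and the return address sit at positive offsets from EBP, while local variables sit at negative offsets.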

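Now, when the called function ‘B’ exits, it has to restore the previous value of the base pointer in register EBP (the base pointer of calling function ‘A’). This is done by popping the value on TOS into the EBP register once ‘B’ has finished its work. By then ESP must again point at the saved value, so any space taken by local variables has to be released first. Hence, an instruction you’ll typically see in a function just before it returns (unless the compiler emits the equivalent “leave” instruction, which also resets ESP) is: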

pop ebp
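
The whole round trip can be checked with a short Python sketch. Again this is a simplified model with made-up starting register values; the “mov esp, ebp” step that discards two assumed local variables is not shown in this post and is included here only so that “pop ebp” finds the saved base pointer on TOS.

```python
memory = {}
# Hypothetical state in 'A' just before the call instruction at 0x8048350.
regs = {"eip": 0x8048350, "esp": 0xBFFFF6F8, "ebp": 0xBFFFF728}


def push(value):
    regs["esp"] -= 4
    memory[regs["esp"]] = value


def pop():
    value = memory[regs["esp"]]
    regs["esp"] += 4
    return value


before = dict(regs)

# call 0x8048360: push the return address, then jump to 'B'.
push(0x8048355)
regs["eip"] = 0x8048360

# Prologue of 'B'.
push(regs["ebp"])          # push ebp
regs["ebp"] = regs["esp"]  # mov ebp, esp
regs["esp"] -= 8           # assumed: room for two 4-byte locals

# Epilogue of 'B'.
regs["esp"] = regs["ebp"]  # mov esp, ebp  (discard the locals)
regs["ebp"] = pop()        # pop ebp       (restore A's base pointer)
regs["eip"] = pop()        # ret           (resume in 'A')

print(hex(regs["ebp"]), hex(regs["esp"]), hex(regs["eip"]))
# 0xbffff728 0xbffff6f8 0x8048355
```

After the return, EBP and ESP hold the same values as before the call, and EIP points at the instruction following it.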

NOTE: The value of addresses used in the stack decreases as the size of the stack increases. A common phrase is that the program stack grows downwards, but what it actually means is that the ‘value’ of each subsequent address in the stack keeps decreasing as the stack grows.
