In this series I am going to introduce the concept of a stack, explain what it does, how it works and why it’s so important and what can be done when it’s not protected correctly.
I’m going to be explaining how the stack works on Linux based systems on x86-64 CPUs but it’s in most parts similar on Windows and other OSes as well as other processor architectures.
In order to fully understand topics covered you need to know how to do linux command line stuff (basics). You also need to have some experience with C programming. A bit of x86-64 assembly would also be useful but I’ll be trying to explain that part.
Although this post is primarily about the stack, I’ll be also explaining some aspects of the memory in general like for example alignment and endianness
Stack as a data type
If you read any article/blog post on the topic you’ll most likely see something along the lines of (in own words):
A stack is an abstract data type of LIFO (Last In First Out) characteristics that has two main operations: push and pop
Let’s break it down. We start with the part saying that it’s an abstract data type. That means that it is a sort of container for whatever data we need it to hold. It has its own specific behaviour, set of operations that can be executed on it, and specific constraints (more on that later)
Next part is important. Stack has a LIFO (Last In First Out) characteristic and uses PUSH and pop operations to access the data. What it means is that if you have something on the stack and you PUSH there something else you’ll not be able to access the item you put on first until you get rid of the one you put there later. It might seem complicated so let’s show how it works on the real world example.
Let’s say that you have a desk, you start a new project and you got a document printed (let’s call it document1). For easy access you place it next to you computer on that desk and start working on the project. But then you need to print another thing (document2) and you stack it on top of the document1. Then you add another document on top. At this point you have a stack of three documents looking more or less like that:
If you want to access document3 you just take it from the top of the stack. If you want to access document1 on the other hand, you need to first take document3 and document2 off the top of the stack and only when document1 is on top of the stack you can finally access it. That approach scales up. If you put another 20 documents on top of the stack, in order to access the first one you need to take all 20 off.
That is similar to how stack works in computer science. You have a surface (base pointer – RBP (more details later)) and you put data on top of the stack using instruction PUSH. If you want to access something that is not on top of the stack you need to remove everything that is above it by using POP instruction and then you can POP the required value into the register of your choice.
Stack implementation
Here is the simple demo showing how the stack works:
As you can see when you push something to the stack, its size is increased and a new item is added on top of it. When you pop that item, it gets removed.
You might have noticed that I’m talking all that time about stack growing and putting something on top of it but in the image, the item is added to the bottom of the stack. The explanation is a bit counter intuitive but simple. The stack grows downwards (it you add a new element to the stack, its address is going to be lower than that of the previous one). Which means the base of the stack (RBP – base register) has higher address in the memory and the top of the stack (RSP – stack pointer) has lower address. Think about it as like hanging something from the ceiling, if you add new item to the chain you add it to the end (bottom of it), the distance from the ground to the base of the chain (attached to the ceiling) is still the same as it was before adding new item but the end of the chain is now dangling closer to the ground. Same with the stack in memory – if you add new item to the stack the offset between the beginning of the memory and the base of the stack (the address of the base, RBP) stays the same but the end of the stack (top of it, RSP) is now closer to the start of the memory (its address is lower)
Stack as a memory
Although that is the formal definition of the stack, you have to remember one thing. Stack might be an abstract data type with some artificial constraint but it’s still just a chunk of a memory. That means that while in a theoretical model you can only access the top, in reality you can still access it at whatever point you want if you have an address to it. Think about is as like knowing that the book you need in the stack of books on your desk is the 3rd from the bottom. Instead of removing all the books from top you can just cheat a bit and just get it directly. Same in the computer, if you know the exact location of the necessary data on the stack you don’t have to go through the top of the stack but you can access that data directly.
There is just one problem. The whole point of the stack is to have a dynamic memory that can grow or shrink as needed without us having to worry about addresses. That means that we have to calculate the address on the fly. That also means that we have to base our address calculations on something. The two known addresses that have something to do with the stack are stack pointer stored in the RSP register (top of the stack) and base pointer stored in the RBP register (base of the stack). We can’t use a stack pointer cause like mentioned earlier it changes dynamically whenever we add or remove something from the stack. That means that out only option is a base pointer.
Base pointer is going to always (in normal cases) be static for a given function context. This means that if we want to save some data and we place it in the memory on address that is calculated as 8 lower than what RBP points to, we can do the same address calculation to get that data back (as long as we are in the same function context and nothing overwrites it). That’s why if you look at the program disassembly you can often find things like
mov dword [rbp - 0x334], edi
or
movl %edi, -0x334(%rbp)
Those two lines do exactly the same thing which is moving the content of EDI register to the memory location that is 0x334 lower than what RBP points at. The reason why they look different is that they use different assembly syntax.
The first one is Intel assembly and that’s what I’ll be mostly using in this blog from now on. First we have an opcode (MOV) saying what operation should me executed by a processor, next we can have the operand size (can be byte – 1 byte, word – 2 bytes, dword – 4 bytes or qword – 8 bytes). This is used to instruct the processor to operate on that data size. (Note: while in assembly level you can have different sizes with the exact same operand (mov byte, mov word etc.) in byte code they are actually different instructions). Size is followed by a destination operand which is followed by a source operand.
The second example is exactly the same line just using AT&T syntax. In it we first have an opcode (instruction) with the operand size directly appended to it (for movl: mov – instruction, l – size). Possible sizes are: b – byte, w – 2 bytes, l – 4 bytes, q – 8 bytes. This is followed by source operand (notice the % sign for register) followed by destination.
Here is a same stack demo but this time with addresses visible (like on 64-bit system, starting address is not relevant and was chosen at random):
Stack alignment
While I’ve said that the address was chosen at random it’s not completely true. There is one rule that you’ll see followed pretty much always: the stack is aligned to 64 bits (8 bytes) on 64-bit systems and 32 bits (4 bytes) on 32-bit ones.
That means that on 64 bit systems the addresses on the stack will be ending with either 0 or 8 (multiplies of 8) and on 32 bit systems with 0, 4, 8, or C (in base 16). That also means that if you push or pop something from/to stack the stack pointer address will not change by 1 as you might expect but by 4 (32-bit) or 8 (64-bit).
Note: In theory nothing is preventing the programmer from setting the stack pointer to not aligned address but with many instructions not aligned memory will cause (sometimes serious) performance penalties.
Values
Here is the final demo of the stack, this time with the ability to specify your own data and see how they will react:
You might have noticed that when you were typing a number in hex the bytes have been actually stored in the memory in a reversed order. This is due to something called endianness. x86 architecture (and 64-bit variant) is what’s called little-endian by design. That means that the order of bytes is reversed in the memory. As such if you were trying to store a value:
0x12345678
in memory it’ll be stored as
0x78 0x56 0x34 0x12
but whenever there is any operation done on that value it’ll still be treated as if it was 0x12345678, it’s just the matter of encoding, nothing else but something to be aware of.
As such if you examine the memory in a debugger, a chunk of memory holding 0x12345678 can be represented as one of those:
0x12345678
0x5678 0x1234
0x78 0x56 0x34 0x12
That’s also true for assembly where you have any number larger than 1 byte represented in hex. It’s also worth noticing that the order of bits inside the byte are preserver (0b00111010 will not be represented as 0b01011100).
Functions
Now, what is it about that function context? As the stack is mostly used for storing local variables, if you call a function you don’t want to have to worry that that function will mess up your stack. In fact every function wants to have it’s own private stack so that it can do whatever it wants with it. Another important thing is that when you call a function, that function needs to know where to return the execution after it finishes. Both of those problems are solved by something called a stack frame.
CALL/RET
Let’s start from the beginning: what starts when you call a function and when you return from it. When you execute a
CALL address ; for example CALL 0x1234
instruction, the address of the next instruction following the call is pushed onto the stack and the execution jumps to the address specified.
For example when you execute following assembly code: (numbers on the left are addresses, addresses and values are as in 32-bit mode to make numbers smaller)
0x07ff1231 PUSH 0x00001234 0x07ff1236 CALL 0x07ff2390 ; <-- [1] 0x07ff123a INC eax ; <--[return_address] ... 0x07ff2390 DEC ebx 0x07ff2391 INC ecx 0x07ff2392 POP edx ; <-- [2] 0x07ff2393 RET
Just after the processor executed the CALL instruction [1], the execution would jump to 0x00ff2390 (the EIP register (32-bit version of RIP register, instruction pointer) would be set to that value). Stack would look as follows (first as full 32-bit numbers then in single bytes, remember about little-endianness, memory addresses grow from left to the right)
32-bit: 0x07ff123a 0x00001234
--------------------------
8-bit (single bytes): 3a 12 ff 07 34 12 00 00
That means that if you execute the POP instruction later [2], the value that ends up in edx register would not be 0x00001234 but actually the address of INC eax instruction (return pointer) – 0x07ff123a [return_address]
What RET instruction does is the exact opposite to CALL. It pops the value from the stack and jumps to it. For example in the code
0x07ff1231 PUSH 0x00001234 0x07ff1236 CALL 0x07ff2390 ; <--[1] 0x07ff123a INC eax ... 0x07ff2390 DEC ebx 0x07ff2391 INC ecx 0x07ff2392 RET ; <--[3]
after the RET [3] instruction executes the execution would get back to 0x07ff123a (INC eax) and the stack would again only hold 0x00001234 as the return address was removed from it during RET instruction.
As a little challenge try to think what would happen if the RET instruction executed in the previous example.
Stack Frame
If you have ever looked at a disassembly of a c-compiled program you might have seen a structure looking like that:
PUSH RBP MOV RBP,RSP SUB RSP,0x10 ... LEAVE RET
The first three lines of that code are called function prologue and their role is to set up something called stack frame (more on that later). Last two lines are called function epilogue and they restore the stack frame of the previous function.
First let’s discuss what the stack frame is in the first place. Long story short it’s a chunk of stack that holds such information as:
– function parameters
– return address
– local variables
Normally, when you call a function you first push the parameters to the stack (in reversed order, they will be later accessible at [rbp+offset], next call instruction pushes the return address ([rbp], base pointer (rbp) is set to the address of that return address after which there is some space made on the stack for local variables ([rbp-offset])
Let’s analyze it step by step: for a function above called in C with arguments (1,2,3) (just so that we have some parameters) (and assuming that the address of instruction directly following call is 0x998876)
CALL
At the moment we enter the function using call operation we have the following stack layout: (the lowest addresses are on the tom, highest on the bottom for readability, pushing new data would add it to the top), [in square brackets there are locations of where the registers are pointing at], on the left there are addresses
[RSP points here]
0x7fffff11224110 76 88 99 00 00 00 00 00
0x7fffff11224118 01 00 00 00 00 00 00 00
0x7fffff11224120 02 00 00 00 00 00 00 00
0x7fffff11224128 03 00 00 00 00 00 00 00
[RBP somewhere beyond this point, address is let's say 0x7fffff11223300 ]
PUSH RBP
Next we have a push RPB instruction. It might be counter intuitive that we are pushing to the stack one of the stack delimiters but you’ll see that it actually makes sense. This instruction saves the current address of the base pointer to the stack. Here is how the memory looks now:
[RSP points here]
0x7fffff11224108 00 33 22 11 FF 7F 00 00
0x7fffff11224110 76 88 99 00 00 00 00 00
0x7fffff11224118 01 00 00 00 00 00 00 00
0x7fffff11224120 02 00 00 00 00 00 00 00
0x7fffff11224128 03 00 00 00 00 00 00 00
[RBP somewhere beyond this point, address si let's say 0x7fff11223300 ]
MOV RBP, RSP
This is where the magic happens. The base pointer is set to the same value as a top of the stack. This means that the function effectively starts with a clean stack, with its own local variables (accessed as [rbp-x] where x is an offset) and with function parameters accessible as [rbp+x]. Here is how it looks now:
[both RSP and RBP point here]
0x7fffff11224108 00 33 22 11 FF 7F 00 00
0x7fffff11224110 76 88 99 00 00 00 00 00
0x7fffff11224118 01 00 00 00 00 00 00 00
0x7fffff11224120 02 00 00 00 00 00 00 00
0x7fffff11224128 03 00 00 00 00 00 00 00
Notice one thing: while it looks like we have now lost track of where the base was for the previous function, it is just a single POP rbp away!!
SUB RSP, 0x10
All that instruction does is allocate the memory on the stack for the local variables. It is done so that the local variables can be accessed by offset from RBP and at the same time we can PUSH and POP data at the stack without overwriting them (obviously within reason, if we POP more data than we PUSHed it will start messing with the local variables). Our memory now looks like that:
[RSP points here]
0x7fffff11224108 XX XX XX XX XX XX XX XX
0x7fffff11224108 XX XX XX XX XX XX XX XX
[RBP points here]
0x7fffff11224108 00 33 22 11 FF 7F 00 00
0x7fffff11224110 76 88 99 00 00 00 00 00
0x7fffff11224118 01 00 00 00 00 00 00 00 <- beginning of our parameters
0x7fffff11224120 02 00 00 00 00 00 00 00
0x7fffff11224128 03 00 00 00 00 00 00 00
[data of previous function are still here]
And that is the state of the stack when in the function. You might have noticed that I didn’t give the values for our local variables and instead wrote Xs. The reason is simple. While POPing data changes the location of the stack pointer, that data is still there! That means that until you write something to that memory location there can be anything in there, including some malicious payload that will make your application do something else than you planned, that’s why you ALWAYS want to initialize the variables. Another thing to note here is that not in every case there will be parameters to the function on the stack, even if that function has arguments. This can be cause by following a different calling convention, for example using registers instead of the stack!
LEAVE
After we are done with our function we need to clean up and restore the stack frame of the calling function. To do that we use a LEAVE instruction. It is basically an equivalent to the following code:
MOV RSP, RBP POP RBP
What it’ll do is first effectively undoing all the pushes/pops and allocation on the stack by restoring the top of the stack pointer to the base. And then it reads the address from a stack and sets a base register to it (remember when I said that the old base pointer is just a POP away?) That brings us back to this situation:
RET
After that all we have to do is return from the function which will jump back to the calling function and restore the stack to the state that it was in when execution left the caller:
[RSP points here]
0x7fffff11224118 01 00 00 00 00 00 00 00
0x7fffff11224120 02 00 00 00 00 00 00 00
0x7fffff11224128 03 00 00 00 00 00 00 00
[RBP somewhere beyond this point, address si let's say 0x7fff11223300 ]
Summary
And that is the basics of how the stack operates. Quick TL;DR:
- Stack is a LIFO structure
- operated on with PUSH and POP instructions
- delimited with addresses in RSP (top of the stack) and RBP (base of the stack)
- grows towards lower addresses
- RSP changes with PUSH and POP, RPB is static for the function
- local variables can be accessed as [rbp-x] where x is an offset
- Function parameters can be (often) accessed at [rbp+x]
- When function is called, the return address is stored on the stack
Practice
To practice the topics covered in this post you can play with the “assembly playground” which shows how the memory reacts to different instructions and scenarios. The playground is coming soon.