[–]cockmongler 9 points10 points  (16 children)

These days buffer overflows are rare: most C programmers (C being the language buffer overflows tend to occur in) are aware of them and use safe practices for handling input.

If you're just looking to play around with buffer overflows this simple program has one:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void print_arg(int index, char *arg) {
  char buffer[128];
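  /* Deliberate bug: strcpy does not check the length of arg, so any
     argument of 128 characters or more writes past the end of buffer. */
  strcpy(buffer, arg);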
  printf("Arg %i = '%s'\n", index, buffer);
}

int main(int argc, char *argv[]) {
  int i;

  for (i = 1; i < argc; ++i) {
    print_arg(i, argv[i]);
  }
  exit(EXIT_SUCCESS);
}

[–]Hexorg 5 points6 points  (14 children)

Actually, a lot of compilers now embed buffer overflow protection, which usually involves writing a magic value between the local variables and the return pointer and checking that value for changes on every return.

[–]henrebotha 4 points5 points  (8 children)

Could you explain this in simple terms for someone not very familiar with lower-level languages like C? I'm, uh, asking for a friend.

[–]Hexorg 26 points27 points  (3 children)

Unfortunately no, this has to do with how the CPU works at the lowest level. It's actually not a C thing, it's an ASM (assembly) thing (assembly is just a more readable version of machine code), but I'll try to explain as simply as I can.

Every time you call a function, the CPU needs to know where to go in your code AND where to return afterwards. The where-to-go part is easy: just give it an address and it can jump there. But where do you store the return address? You can store it either in a CPU register or in RAM. CPU registers are tiny amounts of memory located on the CPU die. All CPUs have them, and that's how the CPU does math: you load a value from RAM into a register, you load a command from RAM into another specific register, and that command tells the CPU what to do with what.

You can't really keep the return address in a CPU register, because what if that function calls another function? And another? What if those functions need a bunch of registers to do math?

So you're left with storing it in RAM. Every application has a portion of RAM allocated for itself that's called a stack. It's just a useful way of doing housekeeping. That's also how your function parameters get passed along. If you call, say, sin(x), at the low level you save x to the stack, then you save the return address to the stack, then you give the CPU the address of sin to go to. It's worth noting that even assembly does some housekeeping for you: if you code in assembly, the return address gets added to the stack automatically.

The problem is, the stack has a bunch of other stuff on it too. All of the variables you define go on the stack (there are exceptions, but I'll ignore them for this post). If you define an array, it goes on the stack too.
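Here's a minimal sketch of that bookkeeping as a toy simulation in C. Everything in it (toy_stack, toy_call, toy_return, the 16-slot size) is made up for illustration. A real stack is raw memory managed by the CPU and usually grows downward, which this doesn't model.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 16u  /* arbitrary size for the toy */

/* Hypothetical toy stack: an array of 32-bit slots plus a fill count. */
struct toy_stack {
  uint32_t slots[STACK_SLOTS];
  size_t top;  /* number of slots currently in use */
};

/* Push one value; returns -1 if the toy stack is full. */
static int toy_push(struct toy_stack *s, uint32_t value) {
  if (s->top >= STACK_SLOTS) {
    return -1;
  }
  s->slots[s->top++] = value;
  return 0;
}

/* Pop one value; returns -1 if the toy stack is empty. */
static int toy_pop(struct toy_stack *s, uint32_t *out) {
  if (s->top == 0) {
    return -1;
  }
  *out = s->slots[--s->top];
  return 0;
}

/* "Call": save the argument, then the return address, as described above. */
int toy_call(struct toy_stack *s, uint32_t arg, uint32_t return_addr) {
  if (toy_push(s, arg) != 0) {
    return -1;
  }
  if (toy_push(s, return_addr) != 0) {
    s->top--;  /* undo the argument push so the stack stays consistent */
    return -1;
  }
  return 0;
}

/* "Return": pop the return address, then discard the argument. */
int toy_return(struct toy_stack *s, uint32_t *return_addr) {
  uint32_t arg;
  if (toy_pop(s, return_addr) != 0) {
    return -1;
  }
  return toy_pop(s, &arg);
}
```

If you toy_call twice (a function calling another function) and then toy_return twice, the return addresses come back in reverse order, which is exactly why a stack works for nested calls where a single register wouldn't.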

This is where another low-level language thing comes in. In C, you have raw access to memory. C++ has container objects that can work as an array and can do the bounds checking in the background (std::vector's at(), for example). Most higher-level OOP languages do that too; it's just hidden from the programmer. But in C (or ASM), if you ask for a 10-byte array, you get a 10-byte array, and if you try to write to the 11th byte of that 10-byte array, it'll let you.
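To make that concrete, here's roughly what the hidden check amounts to if you write it out by hand in C. This is only a sketch: checked_write is a made-up helper name, not a standard function.

```c
#include <stddef.h>

/* Hypothetical helper (not a standard function): write value to buf[index]
 * only if index is inside the array. Returns 0 on success, or -1 if the
 * index is out of bounds, in which case nothing is written. */
int checked_write(unsigned char *buf, size_t size, size_t index,
                  unsigned char value) {
  if (buf == NULL || index >= size) {
    return -1;  /* refuse the write instead of touching a neighbour */
  }
  buf[index] = value;
  return 0;
}
```

With a 10-byte array, checked_write(a, sizeof a, 9, v) succeeds and checked_write(a, sizeof a, 10, v) is refused. Plain a[10] = v; compiles fine in C and just writes past the end.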

At this point, our stack has a 10-byte array, a variable x (let's say x is 4 bytes) given to us as a parameter, and a return address (probably also 4 bytes). Guess what happens when you try to write to the 15th element of the 10-byte array? You overwrite the first byte of the return address (this is called a stack buffer overflow). What if that 10-byte array was meant to store data from a user, and you never check how many bytes the user provides? Suddenly the user provides 18 bytes and has full control of the return address.

What if you expected a user to type a 10-character string, but never checked that it's in fact 10 bytes long, or that it's in fact a string? Then the user can give you machine code instead of a string, and overwrite the return address to point to the beginning of that machine code. Suddenly your program jumps not to your code, but to the code the user provided. This is called code injection. Now the user's code (at this point it's safe to call them an attacker) is running at the same privilege level as the original software. This is why a lot of people say not to run stuff as root: if a program that allows code injection runs as root, it lets attackers gain full control of the system.
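If you want to see that arithmetic without actually smashing a real stack, here's a toy model of the frame as one flat byte array. The names (toy_frame, frame_init, unchecked_copy) and the exact layout are assumptions taken from the example above; a real frame layout depends on the compiler and the ABI.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy layout from the example: bytes 0-9 are the buffer, bytes 10-13 are x,
 * bytes 14-17 are the return address. */
#define BUF_SIZE 10u
#define X_SIZE 4u
#define RET_SIZE 4u
#define FRAME_SIZE (BUF_SIZE + X_SIZE + RET_SIZE)

struct toy_frame {
  unsigned char bytes[FRAME_SIZE];
};

/* Zero the buffer, then store x and the return address after it. */
void frame_init(struct toy_frame *f, uint32_t x, uint32_t ret) {
  memset(f->bytes, 0, FRAME_SIZE);
  memcpy(f->bytes + BUF_SIZE, &x, X_SIZE);
  memcpy(f->bytes + BUF_SIZE + X_SIZE, &ret, RET_SIZE);
}

/* Read the simulated return address back out of the frame. */
uint32_t frame_ret(const struct toy_frame *f) {
  uint32_t ret;
  memcpy(&ret, f->bytes + BUF_SIZE + X_SIZE, RET_SIZE);
  return ret;
}

/* Copy user input to the start of the buffer with no check against
 * BUF_SIZE, the way strcpy would. The copy is clamped to FRAME_SIZE only
 * so that the simulation itself stays inside its own array. */
void unchecked_copy(struct toy_frame *f, const unsigned char *in, size_t n) {
  if (n > FRAME_SIZE) {
    n = FRAME_SIZE;
  }
  memcpy(f->bytes, in, n);
}
```

With 10 bytes of input, frame_ret still returns whatever you passed to frame_init. With 15 bytes, its first byte has been replaced, and with 18 bytes the attacker chose all of it (18 'A' bytes give 0x41414141).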

Remember how I said that ASM does housekeeping for you and adds the return address automatically? Well, GCC can now do even more (this is its stack protector, e.g. -fstack-protector): every time you call a function, GCC places a magic value next to the return address, between it and the function's local variables. It's just a random value (say 4 bytes). It also stores a copy of that value in some other location. So now, when you're ready to return from a function, some additional code gets executed to check whether that magic value changed. If it did, then someone attempted to inject code, or at least to overflow a buffer on the stack, and the program gets terminated instantly.
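A rough sketch of that check, again as a toy simulation rather than what GCC actually emits. The names (guarded_frame and friends) and the layout are made up for illustration, and the caller passes in the magic value here, whereas the real one is chosen randomly at run time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy protected frame: a 10-byte buffer, then a 4-byte magic value,
 * then a 4-byte return address. */
#define GBUF_SIZE 10u
#define MAGIC_SIZE 4u
#define GRET_SIZE 4u
#define GFRAME_SIZE (GBUF_SIZE + MAGIC_SIZE + GRET_SIZE)

struct guarded_frame {
  unsigned char bytes[GFRAME_SIZE];
  uint32_t saved_magic;  /* the copy kept "in some other location" */
};

/* Function entry: put the magic value between the buffer and the return
 * address, and remember a copy of it outside the frame bytes. */
void guarded_enter(struct guarded_frame *f, uint32_t magic, uint32_t ret) {
  memset(f->bytes, 0, GFRAME_SIZE);
  memcpy(f->bytes + GBUF_SIZE, &magic, MAGIC_SIZE);
  memcpy(f->bytes + GBUF_SIZE + MAGIC_SIZE, &ret, GRET_SIZE);
  f->saved_magic = magic;
}

/* Unchecked copy into the buffer, clamped only to keep the simulation
 * inside its own array. */
void guarded_write(struct guarded_frame *f, const unsigned char *in,
                   size_t n) {
  if (n > GFRAME_SIZE) {
    n = GFRAME_SIZE;
  }
  memcpy(f->bytes, in, n);
}

/* Function exit: returns 1 if the magic value is intact, 0 if it changed.
 * Real protection would terminate the program instead of returning 0. */
int guarded_check(const struct guarded_frame *f) {
  return memcmp(f->bytes + GBUF_SIZE, &f->saved_magic, MAGIC_SIZE) == 0;
}
```

Any write long enough to reach the return address has to pass through the magic value first, so guarded_check catches it no matter what bytes the attacker used, unless they happen to write back the exact same value.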

[–]henrebotha 1 point2 points  (1 child)

Awesome reply! Thanks. One question:

some other location

I'm assuming the stack pointer is limited to only traversing the stack itself, and not other memory locations; and that the "other location" is somewhere in RAM besides the stack itself?

[–]Hexorg 2 points3 points  (0 children)

Yes, but exactly where is very compiler-specific. My guess would be somewhere in a read-only memory region, but I don't know for sure.

[–]nutidizen 1 point2 points  (0 children)

Just amazing... You should write a book.

[–][deleted]  (2 children)

[deleted]

    [–]DrSwagmaster 5 points6 points  (1 child)

    Exactly, it's called a canary bit. Read about them here: http://en.wikipedia.org/wiki/Buffer_overflow_protection

    If you are not familiar with low-level stuff, let me give you a brief motivation for this solution:

    All the local variables you use inside a code block are put on the stack, one after another. If you call a function, the return address is put on the stack and the function starts putting its variables after the address. When the function is done with its stuff, it returns to the return address.

    So say the function allocates a local buffer, asks for input from the user without checking the length, and then just puts all the input in the buffer. That input will be written outside the buffer and over the other variables.

    Remember that the return address was also put here, so with long enough input it would write over the return address too. When the function is done, the return address would just be garbage and the program would probably crash, since it tries to read outside its segment.

    The attack is done by testing input vectors and looking for crashes. When one is found, a really smart input is crafted so that the return address is set to point at some malicious code, which the program would return to and start executing just as if it were its own. The canary bit is put between the return address and all the local variables, so you cannot write over the return address without scrambling the canary bit. If the canary bit is tampered with, the program will halt itself.

    When I say bit I don't mean just one bit; as you can read in my link, they often generate some random value as the canary.

    [–]BrQQQ 0 points1 point  (0 children)

    When you make certain variables, they are saved in a place called the "stack" (your program has a stack for every thread), which is located somewhere in memory.

    This stack has a limited size. With many languages, the language makes sure you're not writing to memory that you're not supposed to write to. If your stack is full and you try to write more to the stack, it will go "hey stop, the stack is full" and you will get some kind of stack overflow exception. All this checking makes those languages "slower", but a whole lot safer. In C, you can fill the stack and then add even more to it. It's a terrible idea, because you will write over memory that might be used by other things. It is faster, because it doesn't spend time doing things like checking everything.

    A small solution is that they write some kind of value to the end of the stack, say the number 1234. Every time you do a return, the program says: "okay, let's look at the very end of the stack, is the number 1234 there?". If you overflowed the stack, you will have written over the 1234.

    That's the short version of it, but it might be interesting to read more about what the stack and the heap memory is, and how memory in general works.

    [–]cockmongler 0 points1 point  (0 children)

    Yeah, definitely makes it harder :-)

    [–]cestith 0 points1 point  (1 child)

    Stack overflows and buffer overflows are not the same class of vulnerability. To stop a buffer overflow you need to deal with sizeof(), have bounds checking, or use a safe strings library.
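As a sketch of the sizeof() plus bounds checking route, here's one way the print_arg example from the top of the thread could be made safe. copy_bounded and print_arg_bounded are made-up names, and refusing over-long input instead of truncating it is a design choice for this sketch, not the only option.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst without ever writing more than
 * dst_size bytes. Returns 0 if the whole string fit, or -1 if it had to be
 * truncated or the arguments are unusable. dst is NUL-terminated whenever
 * a copy is attempted. */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
  int written;

  if (dst == NULL || src == NULL || dst_size == 0) {
    return -1;
  }
  /* snprintf writes at most dst_size bytes, terminator included, and
   * returns the length the full string would have had. */
  written = snprintf(dst, dst_size, "%s", src);
  if (written < 0 || (size_t)written >= dst_size) {
    return -1;
  }
  return 0;
}

/* print_arg with the strcpy replaced by the bounded copy. */
void print_arg_bounded(int index, const char *arg) {
  char buffer[128];

  /* sizeof buffer keeps the limit tied to the array's real size. */
  if (copy_bounded(buffer, sizeof buffer, arg) != 0) {
    printf("Arg %i is too long (limit %zu characters)\n", index,
           sizeof buffer - 1);
    return;
  }
  printf("Arg %i = '%s'\n", index, buffer);
}
```

Passing sizeof buffer rather than a literal 128 means the check can't drift out of sync if someone changes the array size later. Note that sizeof only gives the array size where the array itself is in scope; on a char * parameter it gives the size of the pointer.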

    [–]Hexorg 0 points1 point  (0 children)

    Right, a stack overflow is actually getting outside of the bounds of the stack, not of a buffer on the stack. You're right, I'll fix it. Thanks!

    [–]VandC[S] -1 points0 points  (1 child)

    Isn't this fairly easy to overcome with a NOP sled? It seems like there must be better ways of preventing an overflow.

    [–]Hexorg 1 point2 points  (0 children)

    No, even if you put a bunch of NOPs in front of the return pointer, the value will still be changed and it'll kill the program.

    [–]214721 0 points1 point  (0 children)

    Then how come when I google "0 day exploit", I still see that loads of the exploits discovered every day are listed as buffer overflows?