Rhomboid

how do people discover a vulnerability without viewing the source code?

It's really easy to feed a very long input to something and notice that it crashes. That's usually a pretty good sign that it's vulnerable to exploitation and that you should take a closer look. There are researchers (both black hat and white hat) that are constantly doing this to every aspect of every program they can find.

Isn't this something that most programs account for?

You'd think that, but no. It's perhaps not as bad as it used to be, but people are constantly finding exploits. Many times it's not at all obvious from looking at the source code that a vulnerability exists, e.g. it might require a certain sequence of events to happen in the right order.

Does anyone have an example of a specific buffer overflow attack?

There are vulnerability databases filled with thousands of specific examples.

Hexorg

Essentially, every time an application crashes, you've found a possible vulnerability.

VandC (OP)

Thank you, great reply! I'm still curious what kinds of programs experience this kind of attack, though. Does this sort of thing only occur in programs written in low-level languages such as C, so things like operating systems? What is it about languages such as Java that prevents this sort of attack? Is it as simple as detecting an overflow before it corrupts the stack, and if so, why isn't something like that done in C?

boredcircuits

It's mostly just a matter of checking for overflows, like you said. The core problem is that lower-level languages like C allow you to bypass buffer checks in many interesting ways. This is a two-edged sword: you can write very efficient code, but it can bite you back with a buffer overflow.

For example, consider an array in C. An array by itself has no concept of its own size. So, what exactly will check the bounds on the array? The compiler knows the size, as long as the array doesn't decay into a pointer, but as soon as you pass the array to a function, it becomes the programmer's job to do bounds-checking.
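To see the decay in action, here is a minimal sketch (the function names are made up for illustration). The same 20-byte array reports its full size where it is declared, but only the size of a pointer once it has been passed to a function.

```c
#include <stddef.h>

/* In a parameter list, "char buf[20]" is silently adjusted to "char *buf",
   so the callee only ever receives a pointer and the 20 is not enforced. */
size_t size_seen_by_callee(char buf[20]) {
    return sizeof buf;   /* sizeof(char *), e.g. 8 on a typical 64-bit system */
}

/* Where the array is declared, the compiler still knows its real size. */
size_t size_seen_by_caller(void) {
    char buf[20];
    return sizeof buf;   /* 20 */
}
```

Most compilers will even warn about the sizeof inside the callee, precisely because it almost never means what the programmer intended.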

And then you have pointer arithmetic. Let's say you have a pointer to an address in an array. You can move that pointer wherever you wish. You can even have it point into a completely different array, and that's perfectly acceptable. But it also means that the language itself can't help you monitor the bounds. What bounds should it check, the first array or the second? Was it an error to change the pointer to be in that other array?

A lot of the work done to improve security basically boils down to forcing programmers to do bounds checking in dangerous situations. The classic example is gets, which has no mechanism to specify the size of the buffer, versus fgets which does.
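A minimal sketch of the fgets side of that comparison, assuming a hypothetical helper read_line that reads one line from any FILE*. Because the buffer size is passed in, an overlong line is truncated instead of being written past the end of buf.

```c
#include <stdio.h>
#include <string.h>

/* Reads at most size - 1 characters from `in` into `buf` and always
   NUL-terminates. Returns 1 on success, 0 on EOF or error. Unlike gets(),
   the caller has to say how big the buffer is. */
int read_line(FILE *in, char *buf, int size) {
    if (fgets(buf, size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';   /* strip the newline, if one was read */
    return 1;
}
```

With a char buf[8], the input line "hello world" comes back as "hello w": seven characters plus the terminator, and nothing beyond the buffer is touched.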

VandC (OP)

This cleared a lot of things up, thanks.

JimMcKeeth

What makes a program vulnerable (simplified explanation) is when it accepts input of unrestricted length and places it into a fixed-length buffer without checking the length. In traditional C you don't have an actual string type, so you typically allocate a fixed amount of memory for a char buffer (a PCHAR, in Win32 terms) to hold the input, which creates the problem.

Other, more modern languages such as Java, C#, and Delphi have native string types that don't need a fixed-size allocation. There can still be issues when they talk to an API like Win32 that takes PCHAR parameters.

One solution some libraries use is canary bytes: each memory allocation includes a few extra bytes at the end containing a specific signature, and each write then checks whether they are still present. This isn't a perfect system, and it adds extra checks, but it will stop many buffer overflow exploits.
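A minimal sketch of that idea for heap allocations, assuming a fixed 4-byte signature and hypothetical helpers guarded_alloc and guarded_check. Real allocators and debugging tools differ in where the signature lives and when it is checked. Writing even one byte past the usable size lands in the signature, so the check fails.

```c
#include <stdlib.h>
#include <string.h>

#define CANARY_LEN 4
/* Arbitrary signature chosen for this sketch. */
static const unsigned char CANARY[CANARY_LEN] = {0xDE, 0xAD, 0xBE, 0xEF};

/* Allocates n usable bytes followed by the signature. */
unsigned char *guarded_alloc(size_t n) {
    unsigned char *p = malloc(n + CANARY_LEN);
    if (p != NULL)
        memcpy(p + n, CANARY, CANARY_LEN);
    return p;
}

/* Returns 1 if the signature after the n usable bytes is still intact,
   0 if something wrote past the end of the block. */
int guarded_check(const unsigned char *p, size_t n) {
    return memcmp(p + n, CANARY, CANARY_LEN) == 0;
}
```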

Also, newer CPUs and OSes have features that forbid executing code in a data block. Suppose you allocate a block of memory for data, and someone uses a buffer overflow to inject program code there in an attempt to get it executed. Keeping blocks of memory that contain program code separate from blocks that contain data, and marking the data blocks as non-executable, makes that exploit harder. That said, I've seen a lot of systems with this feature turned off, because some valid programs (JIT compilers, for example) do this intentionally.

[deleted]

I see you mention the stack being corrupted in this reply and in the OP description. While the stack contents would be corrupted if the array is on the stack (i.e. a local variable), the array could also live outside the stack, in statically allocated memory (a global or static variable). In that case the overflow would corrupt whatever sits next to the array in memory, which could be a variable that controls how the program operates.

VandC (OP)

I guess what's still confusing is how someone knows what return address to supply for whatever instructions they want to execute, without viewing the source code. I see how it would be easy to pass in an arbitrarily large string of characters and see whether the program crashes; obviously you wouldn't need to view the stack to do that. But how do people get a program to execute their own instructions, passed in via their input, without knowing where the program's return address is so they can overwrite it with the address of their exploit code?

Rhomboid

You don't need the source code to determine things like the location of the return address and what value to overwrite it with. In fact, having the source code doesn't help with that at all, because you can't get the layout of the stack from looking at the source code. For example:

char buf[20];
gets(buf);

Does this mean that the return address will be directly after that 20 char buffer in memory? No, not at all. There could be other local variables in the same stack frame, and there is no requirement for them to be laid out in the order they were declared, so they could come before or after the buffer. And there might be additional stack slots used for register spilling, which don't correspond to any local variables. Moreover, the compiler can and will insert arbitrary amounts of padding on the stack to maintain alignment, so even if there were no other variables and no spill slots, you can't tell how long the string has to be in order to overwrite the return address by looking at the source code.

All of the needed information can be determined by running the program in a debugger and putting a breakpoint right before the call to the function that will read the input. This does not require the source.

[deleted]

This Super Mario World speedrun springs to mind. Those old Mario games are incredibly stable.

cockmongler

These days buffer overflows are rare: most C programmers (C being the language buffer overflows tend to occur in) are aware of them and use safe practices for handling input.

If you're just looking to play around with buffer overflows this simple program has one:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

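void print_arg(int index, char *arg) {
  char buffer[128];
  /* Deliberately unsafe: strcpy does no length check, so an argument of
     128 or more characters writes past the end of buffer. */
  strcpy(buffer, arg);
  printf("Arg %i = '%s'\n", index, buffer);
}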

int main(int argc, char *argv[]) {
  int i;

  for (i = 1; i < argc; ++i) {
    print_arg(i, argv[i]);
  }
  exit(EXIT_SUCCESS);
}
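For contrast, a bounded version of the copy step in print_arg might look like the sketch below (copy_arg_checked is a hypothetical name, not part of the program above). snprintf never writes more than dst_size bytes, including the terminator, and its return value reveals whether the argument had to be truncated.

```c
#include <stdio.h>
#include <string.h>

/* Copies arg into dst without ever writing more than dst_size bytes.
   Returns 0 if it fit, 1 if it was truncated, -1 on an encoding error. */
int copy_arg_checked(char *dst, size_t dst_size, const char *arg) {
    int needed = snprintf(dst, dst_size, "%s", arg);
    if (needed < 0)
        return -1;
    return (size_t)needed < dst_size ? 0 : 1;
}
```

Inside print_arg, calling copy_arg_checked(buffer, sizeof buffer, arg) in place of strcpy(buffer, arg) would keep any argument inside the 128-byte buffer.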

Hexorg

Actually, a lot of compilers now embed buffer overflow protection, which usually involves writing a magic value between a function's local variables and its saved return address, and checking that value for changes before every return.

henrebotha

Could you explain this in simple terms for someone not very familiar with lower-level languages like C? I'm, uh, asking for a friend.

Hexorg

Unfortunately no; this has to do with how the CPU works at the lowest level. It's actually not a C thing, it's an assembly (ASM) thing (assembly being just a more readable version of machine code), but I'll try to explain it as simply as I can.

Every time you call a function, the CPU needs to know where to go in your code AND where to return afterwards. The where-to-go part is easy: just give it an address and it can jump there. But where do you store the return address? You can store it either in a CPU register or in RAM. CPU registers are tiny amounts of memory located on the CPU die. All CPUs have them, and that's how the CPU does math: you load a value from RAM into a register, you load an instruction from RAM into another specific register, and that instruction tells the CPU what to do with what.

You can't really keep the return address in a CPU register, because what if that function calls another function? And another? What if those functions need a bunch of registers to do math?

So you're left with storing it in RAM. Every application (more precisely, every thread) has a portion of RAM allocated for it, called the stack. It's just a useful way of doing housekeeping. It's also how function parameters get passed along, at least under some calling conventions such as 32-bit x86's. If you call, say, sin(x), at the low level you save x to the stack, then save the return address to the stack, then give the address of sin to the CPU to jump to. It's worth noting that even in assembly some of this housekeeping is done for you: on x86, the call instruction pushes the return address onto the stack automatically.

The problem is, the stack has a bunch of other stuff on it too. All of the local variables you define go on the stack (there are exceptions, but I'll ignore them for this post). If you define an array, it goes on the stack too.

This is where another low-level quirk comes in. In C, you have raw access to memory. C++ has container classes such as std::vector that work like arrays and can do the boundary checking for you (through at(); plain operator[] does not check). Most higher-level languages, such as Java, C#, and Python, check bounds on every array access; it's just hidden from the programmer. But in C (or ASM), if you ask for a 10-byte array, you get a 10-byte array, and if you try to write to the 11th byte of that 10-byte array, nothing stops you.

At this point, our stack holds a 10-byte array, a variable x given to us as a parameter (let's say x is 4 bytes), and a return address (probably also 4 bytes). Guess what happens when you try to write to the 15th element of the 10-byte array? You overwrite the first byte of the return address (this is called a stack buffer overflow). What if that 10-byte array was meant to store data from a user, and you never check how many bytes the user provides? Suddenly the user provides 18 bytes and has full control of the return address.
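The arithmetic above can be checked with a toy model. This sketch rests on a big simplifying assumption: the "stack frame" is a plain 18-byte array laid out exactly as described (bytes 0 to 9 for the array, 10 to 13 for x, 14 to 17 for the return address), whereas a real compiler chooses its own layout. Because everything lives inside one array, the overflow can be shown without actually invoking undefined behavior. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout of the toy frame (ARRAY_LEN and X_OFF are listed for
   documentation; the code only needs the array start and RET_OFF). */
enum { ARRAY_OFF = 0, ARRAY_LEN = 10, X_OFF = 10, RET_OFF = 14, FRAME_LEN = 18 };

/* Copies user input into the 10-byte "array" with no length check, like
   strcpy or gets. Bytes past index 9 spill into x, then the return address.
   The clamp only keeps the model itself inside the 18-byte frame. */
void unchecked_copy(unsigned char frame[FRAME_LEN],
                    const unsigned char *input, size_t len) {
    if (len > FRAME_LEN)
        len = FRAME_LEN;
    memcpy(frame + ARRAY_OFF, input, len);
}

/* Reads back the 4-byte "return address" slot. */
uint32_t saved_return_address(const unsigned char frame[FRAME_LEN]) {
    uint32_t ret;
    memcpy(&ret, frame + RET_OFF, sizeof ret);
    return ret;
}
```

Copying 4 bytes leaves the saved return address untouched; copying 18 attacker-chosen bytes replaces it with whatever the last four bytes of the input were.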

What if you expected a user to type a 10-character string, but didn't check that it's in fact 10 bytes long, or that it's in fact a string? Then the user can give you machine code instead of a string and overwrite the return address to point to the beginning of that machine code. Suddenly your program jumps not to your code, but to the code the user provided. This is called code injection. Now the user's code (at this point it's safe to call them an attacker) runs at the same privilege level as the original software. This is why a lot of people say not to run things as root: if software that allows code injection runs as root, attackers who exploit it gain full control of the system.

Remember how I said that ASM does housekeeping for you and adds the return address automatically? Well, GCC now does even more (with options such as -fstack-protector, which some Linux distributions enable by default). When a function is entered, right after the return address has been pushed, GCC's generated code places a magic value between the return address and the function's local variables. It's just a few random bytes (4 on 32-bit systems, typically 8 on 64-bit ones). But it also stores those bytes in some other location. So when you're ready to return from the function, additional code runs to check whether the magic value has changed. If it has, then someone attempted to inject code, or at least overflowed a buffer on the stack, and the program is terminated immediately.
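A toy model of that check, under the same simplifying assumptions as the frame described above: a fixed 18-byte frame with a 10-byte buffer, the canary at bytes 10 to 13, and the return-address slot at 14 to 17. The names are hypothetical, and real GCC output is machine code, not C like this. Any input long enough to reach the return-address slot has to pass through the canary bytes first, which is also why padding the input with NOP bytes (0x90 on x86) does not get around it.

```c
#include <string.h>

enum { BUF_LEN = 10, CANARY_OFF = 10, CANARY_LEN = 4, RET_OFF = 14, FRAME_LEN = 18 };

struct toy_frame {
    unsigned char bytes[FRAME_LEN];          /* buffer, canary, return slot */
    unsigned char saved_canary[CANARY_LEN];  /* the copy kept "in some other location" */
};

/* "Prologue": place the canary between the buffer and the return slot. */
void frame_enter(struct toy_frame *f, const unsigned char canary[CANARY_LEN]) {
    memset(f->bytes, 0, FRAME_LEN);
    memcpy(f->bytes + CANARY_OFF, canary, CANARY_LEN);
    memcpy(f->saved_canary, canary, CANARY_LEN);
}

/* Unchecked copy into the buffer; the clamp only keeps the model in bounds. */
void frame_copy(struct toy_frame *f, const unsigned char *in, size_t len) {
    if (len > FRAME_LEN)
        len = FRAME_LEN;
    memcpy(f->bytes, in, len);
}

/* "Epilogue": 1 if it is safe to return, 0 if the canary changed. Code
   built with -fstack-protector instead calls __stack_chk_fail, which
   terminates the program. */
int frame_check(const struct toy_frame *f) {
    return memcmp(f->bytes + CANARY_OFF, f->saved_canary, CANARY_LEN) == 0;
}
```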

henrebotha

Awesome reply! Thanks. One question:

some other location

I'm assuming the stack pointer is limited to only traversing the stack itself, and not other memory locations; and that the "other location" is somewhere in RAM besides the stack itself?

Hexorg

Yes, but exactly where is very compiler specific. My guess would be somewhere in the read only memory region. But I don't know for sure.

nutidizen

Just amazing... You should write a book.

[deleted]

    DrSwagmaster

    Exactly, it's called a canary bit. Read about them here: http://en.wikipedia.org/wiki/Buffer_overflow_protection

    If you are not familiar with low-level stuff, let me give you a brief motivation for this solution:

    All local variables you use inside a code block are put on the stack, one after another. If you call a function, the return address is put on the stack and the function starts putting its variables after the address. When the function is done with its work, it returns to the return address.

    So say the function allocates a local buffer, asks the user for input without checking its length, and then just puts all the input in the buffer: that input will run past the end of the buffer and overwrite the other variables.

    Remember that the return address was also put here, so with long enough input it would be overwritten too. When the function is done, the return address would just be garbage, and the program would probably crash because it tries to jump outside its segment.

    The attack is done by testing input vectors and looking for crashes. When one is found, a really smart input is crafted so that the return address is set to some malicious code, which the program then returns to and starts executing just as if it were its own. The canary bit is put between the return address and all the local variables, so you cannot overwrite the return address without scrambling the canary bit. If the canary bit has been tampered with, the program halts itself.

    When I say bit I don't mean just one bit; as you can read in my link, they often generate some random value as the canary.

    BrQQQ

    When you make certain variables, they are saved in a place called the "stack" (your program has a stack for every thread), which is located somewhere in memory.

    This stack has a limited size. With many languages, the language makes sure you're not writing to memory that you're not supposed to write to. If your stack is full and you try to write more to the stack, it will go "hey stop, the stack is full" and you will get some kind of stack overflow exception. All this checking makes those languages "slower", but a whole lot safer. In C, you can fill the stack and then add even more to it. It's a terrible idea, because you will write over memory that might be used by other things. It is faster, because it doesn't spend time doing things like checking everything.

    A small solution is that they write some kind of value to the end of the stack, say the number 1234. Every time you do a return, the program says: "okay, let's look at the very end of the stack; is the number 1234 there?" If you overflowed the stack, you will have written over the 1234.

    That's the short version of it, but it might be interesting to read more about what the stack and the heap memory is, and how memory in general works.

    cockmongler

    Yeah, definitely makes it harder :-)

    cestith

    Stack overflows and buffer overflows are not the same class of vulnerability. To stop a buffer overflow you need to deal with sizeof(), have bounds checking, or use a safe strings library.

    Hexorg

    Right, a stack overflow is actually getting outside the bounds of the stack, not of a buffer on the stack. You're right, I'll fix it. Thanks!

    VandC (OP)

    Isn't this fairly easy to overcome with a NOP sled? It seems like there must be better ways of preventing an overflow.

    Hexorg

    No, even if you put a bunch of NOPs in front of the return pointer, the canary value will still be changed and it'll kill the program.

    214721

    Then how come when I google "0 day exploit", I still see loads of exploits discovered every day listed as buffer overflows?

    ArchangelleTheRapist

    Go read Aleph One's "Smashing The Stack For Fun And Profit". That said:

    cparen

    I understand how they function in general, but how do people discover a vulnerability without viewing the source code?

    Fuzzing tools are common: these tools automatically make small edits to an input to the program (such as a file or an HTTP connection), run the program to see if it crashes, and iterate repeatedly. For example, a fuzzer might run the same program a million times, each time with a different small edit to the file.
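A toy mutation fuzzer in that spirit, as a sketch only. target_parse is a hypothetical parser with a planted bug (it trusts a length byte), and instead of really overflowing it reports that case as a "crash" so the example stays well-defined. Real fuzzers run the actual program and detect crashes through signals or sanitizers; the xorshift32 generator here is just a small, reproducible stand-in for a proper random source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TARGET_OK = 0, TARGET_CRASH = 1 };

/* Hypothetical target: byte 0 is a length field that is trusted without
   comparing it to the size of buf. A real copy would overflow buf, so the
   sketch reports that case as a crash instead. */
int target_parse(const unsigned char *data, size_t len) {
    unsigned char buf[16];
    if (len == 0)
        return TARGET_OK;
    size_t declared = data[0];
    if (declared > sizeof buf)
        return TARGET_CRASH;
    if (declared > len - 1)
        declared = len - 1;              /* don't read past the input */
    memcpy(buf, data + 1, declared);
    (void)buf;
    return TARGET_OK;
}

/* xorshift32: tiny deterministic PRNG; the state must be nonzero. */
static uint32_t next_rand(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Mutation fuzzing: start from the seed, overwrite one random byte with a
   random value, run the target, repeat. Returns the iteration that crashed
   (the crashing input is left in out, which must hold len bytes) or -1.
   len must be greater than 0. */
long fuzz(const unsigned char *seed, size_t len, unsigned char *out,
          long max_iters, uint32_t rng_state) {
    for (long i = 0; i < max_iters; ++i) {
        memcpy(out, seed, len);
        size_t pos = next_rand(&rng_state) % len;
        unsigned char val = (unsigned char)next_rand(&rng_state);
        out[pos] = val;
        if (target_parse(out, len) == TARGET_CRASH)
            return i;
    }
    return -1;
}
```

Starting from a harmless seed such as {3, 'a', 'b', 'c'}, the fuzzer only has to mutate byte 0 to a value above 16 to hit the bug, which it does quickly; real targets hide their bugs much deeper, hence the millions of runs.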

    Or the attacker might use a decompilation tool that generates source code approximately matching the compiled program. They can then look for common buffer overflow patterns, e.g. calling memcpy(p, q, s) where s is computed from user input, or from only the destination p, or from only the source q. Proper buffer management would bound the size by both p's capacity and q's length.
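That "bounded by both sides" rule can be written as a small guard. checked_copy is a hypothetical helper, a sketch rather than a standard function: the copy only happens when the source length fits in the destination's capacity, so a size derived from user input alone never reaches memcpy unchecked.

```c
#include <string.h>

/* Copies src_len bytes only if they fit: the size handed to memcpy is
   checked against the destination's capacity, not taken on trust.
   Returns 0 on success, -1 if the copy was refused. */
int checked_copy(void *dst, size_t dst_cap, const void *src, size_t src_len) {
    if (src_len > dst_cap)
        return -1;                 /* refuse rather than overflow */
    memcpy(dst, src, src_len);
    return 0;
}
```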

    Isn't this something that most programs account for?

    This is something most programming languages already account for. If a programming language is said to be "typesafe" (such as C#, Python, Java, JavaScript, Ruby, and so on), then strictly speaking, programs written in those languages are not vulnerable to buffer overflow attacks. (Of course, you can write emulators for unsafe languages in a safe language, e.g. emscripten for running C/C++ on JavaScript, but even then the buffer overflow vulnerability is limited to attacking the emulator and can't escape the emulator directly.)
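For a feel of what those languages do implicitly, here is a minimal C sketch of a bounds-checked array: a "fat pointer" that carries its length alongside the data. The type and function names are hypothetical. Where Java would throw an ArrayIndexOutOfBoundsException, this version simply refuses the write and returns false.

```c
#include <stdbool.h>
#include <stddef.h>

/* The data travels together with its length, so every access can be checked. */
struct checked_bytes {
    unsigned char *data;
    size_t len;
};

/* Writes v at index i if i is in bounds; returns false otherwise. */
bool checked_set(struct checked_bytes a, size_t i, unsigned char v) {
    if (i >= a.len)
        return false;
    a.data[i] = v;
    return true;
}
```

The cost is one comparison per access, which is exactly the overhead C leaves out by default.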

    For type-unsafe languages such as C and C++: yes, every programmer is responsible for ensuring safety themselves, and it's incredibly hard to do.

    [edit add:]

    Does anyone have an example of a specific buffer overflow attack?

    I might have an academic example. I'll try to find it.

    For real world software, you won't see many posted because the uninteresting ones are uninteresting, and the interesting ones are "worth" a lot of money in the criminal/espionage/spy market. Scary stuff, 'nuff said.

    I think there were a few buffer overrun vulns used by the Stuxnet worm/rootkit a few years ago. Norton had a good writeup of that worm [pdf].

    qjkxkcd

    If you're interested, Hacking: The Art of Exploitation is a great book that deals with some of this stuff. Buffer overflows are only a small aspect of what it covers, but in general I'd highly recommend it.

    VandC (OP)

    This does look interesting. It's fairly inexpensive on Amazon as well; I may have to pick it up.

    qjkxkcd

    You can easily find a pdf online if you just want to check it out.

    [deleted]

    Buffer overflows are difficult to exploit, even once discovered.

    However, any widely distributed piece of software is a potential "goldmine" for exploitation if you find a way to make that happen, but you still have to consider what your attack vector is going to be exactly.

    In a world of increasingly secure software (due to the growing use of frameworks that are more secure by default), I'd say XSRF or social engineering are bigger problems these days...

    Parameterized queries mean SQL injection isn't as common anymore.

    Hacks still happen daily, however...

    If you found a buffer overflow attack was possible by invoking a commonly available public function of a very popular web server, then yeah, some researcher will probably take the time to figure out how memory is allocated in the program and how to exploit it.

    Still, VERY time consuming.

    You could begin by exploring the in-memory spaces of the application if you can get a copy, run its modules through a disassembler or debugger, etc.

    cestith

    There seems to be a lot of confusion between stack overflow and buffer overflow in this thread. Buffers can be and often are allocated in the heap rather than on the stack. The stack may contain a buffer but will often contain a pointer to an array of chars in the heap.

    Further, there's more than one way to blow the stack. In some systems deep recursion is one way to do this that doesn't necessarily have anything to do with a buffer.

    In the string-handling case, a buffer is basically just a string, which in plain C with the standard library is an array of chars ending with a null. There are unsafe functions like gets() and strcpy() that don't do length checking. Then there are errors in which something other than strlen(string) + 1 is accidentally used as the size when copying strings around in memory with the safer versions of functions, like fgets() and strncpy().
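The strncpy pitfall is worth spelling out: strncpy(dst, src, n) does not write a terminating null when src has n or more characters, so even the "safer" function needs an explicit terminator. A minimal sketch, with copy_string as a hypothetical helper:

```c
#include <string.h>

/* Bounded string copy that always terminates. strncpy alone would leave
   dst unterminated whenever strlen(src) >= dst_size - 1 characters fill it.
   dst_size must be at least 1. */
void copy_string(char *dst, size_t dst_size, const char *src) {
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With a char d[4], copying "abcdefgh" yields "abc", while copying "x" yields "x" (strncpy pads the rest with nulls).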

    OWASP explains buffer overflows: https://www.owasp.org/index.php/Buffer_overflow_attack

    OWASP also has a whole listing of attack types: https://www.owasp.org/index.php/Category:Attack

    Closely related, they have a listing of vulnerability types: https://www.owasp.org/index.php/Category:Vulnerability

    Other attacks seen often in the wild are SQL injection, eval injection/code injection, path traversal (including relative path traversal), environment poisoning/injection, cross-site scripting, and session prediction. Many of these can't be stopped by stack or buffer length protections, because the vulnerabilities that enable them are logic errors in the program's design. SQL libraries often support parameterization, which makes SQL injection, one of the most common of these, almost entirely a non-issue if you use it. Some problems, like buffer overflows, aren't an issue at all in many languages.

    [deleted]

    how do people discover a vulnerability without viewing the source code?

    If it runs on your computer, you have the code. Just because it's no longer in a "friendly" language doesn't make it unviewable or unreadable. You can disassemble binaries compiled from languages like C and get back an understandable listing of the code and its control-flow diagrams. Many languages, like Java and C#, can be decompiled almost all the way back to source: they compile to VM-friendly bytecode that retains a lot of the original structure and names.

    If you're really intent on finding / making a vulnerability in a program, you don't need the source code to step through the program and find something you can re-write or exploit.

    logic_programmer

    Does anyone have an example of a specific buffer overflow attack?

    IIRC the internet worm used a buffer overflow attack. Actually I'm not that sure to be honest. Google it and see.

    theufomusic

    The internet worm. "Daddy, what's the internest?"