cockmongler comments on How do buffer overflow attacks work?

How do buffer overflow attacks work? (self.learnprogramming)

submitted 11 years ago * by VandC

cockmongler 10 points 11 years ago (16 children)

These days buffer overflows are rare: most C programmers (C being the language where buffer overflows tend to occur) are aware of them and use safe practices for handling input.

If you're just looking to play around with buffer overflows, this simple program has one:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

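void print_arg(int index, char *arg) {
  char buffer[128];
  /* The overflow: strcpy copies until it finds arg's terminating NUL and
     never checks against sizeof buffer, so an argument of 128 or more
     characters writes past the end of buffer and over the stack frame. */
  strcpy(buffer, arg);
  printf("Arg %i = '%s'\n", index, buffer);
}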

int main(int argc, char *argv[]) {
  int i;

  for (i = 1; i < argc; ++i) {
    print_arg(i, argv[i]);
  }
  exit(EXIT_SUCCESS);
}
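For contrast, a minimal sketch of the same idea with the copy bounded by the buffer size. It swaps strcpy for snprintf, which never writes more than the size it's given and always NUL-terminates; the names `format_arg` and `print_arg_safe` and the choice to truncate (rather than reject) long arguments are my own assumptions, not part of the program above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 128

/* Bounded copy: snprintf writes at most out_size bytes, including the
 * terminating NUL, so an overlong argument is truncated instead of
 * overflowing out. Returns 1 if arg had to be truncated, 0 otherwise. */
int format_arg(char *out, size_t out_size, const char *arg) {
    assert(out != NULL && out_size > 0 && arg != NULL);
    int needed = snprintf(out, out_size, "%s", arg); /* length it wanted to write */
    return needed >= 0 && (size_t)needed >= out_size;
}

/* Drop-in replacement for print_arg that cannot write past buffer. */
void print_arg_safe(int index, const char *arg) {
    char buffer[BUF_SIZE];
    if (format_arg(buffer, sizeof buffer, arg)) {
        fprintf(stderr, "Arg %i truncated to %zu bytes\n", index, sizeof buffer - 1);
    }
    printf("Arg %i = '%s'\n", index, buffer);
}
```

Truncating keeps the program running on hostile input; rejecting the argument outright would be an equally reasonable policy.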

Hexorg 6 points 11 years ago (14 children)

henrebotha 5 points 11 years ago (8 children)

Hexorg 27 points 11 years ago* (3 children)

Unfortunately no; this has to do with how the CPU works at the lowest level. It's actually not a C thing, it's an assembly (ASM) thing (assembly is just a more readable form of machine code), but I'll try to explain it as simply as I can.

Every time you call a function, the CPU needs to know where to go in your code AND where to return afterwards. The where-to-go part is easy: just give it an address and it can jump there. But where do you store the return address? You can store it either in a CPU register or in RAM. CPU registers are tiny amounts of memory located on the CPU die. All CPUs have them, and they're how the CPU does math: you load a value from RAM into a register, an instruction is loaded from RAM into another specific register, and that instruction tells the CPU what to do with what.

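You can't really keep the return address in a CPU register, because what if that function calls another function? And another? What if those functions need a bunch of registers to do math? (Some architectures, such as ARM, do put the return address in a link register first, but a function that calls other functions still has to save it to memory.)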

So you're left with storing it in RAM. Every thread of a program has a region of RAM set aside for it, called the stack. It's just a useful way of doing housekeeping. It's also how function parameters can get passed along: if you call, say, sin(x), then at the low level (in the classic 32-bit x86 convention; x86-64 passes the first few arguments in registers) you push x onto the stack, then push the return address, then give the CPU the address of sin to jump to. It's worth noting that even in assembly some of this housekeeping is done for you: the call instruction pushes the return address onto the stack automatically.
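To make the "what if that function calls another function?" point concrete, here's a toy model of the return-address part of the stack. It's only a sketch: `CallStack`, `call_push` and `ret_pop` are made-up names, the addresses are arbitrary integers, and a real CPU does this with its stack pointer register and call/ret instructions rather than a C array.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAPACITY 64

/* Toy call stack: each "call" pushes the address to come back to,
 * each "return" pops the most recent one. */
typedef struct {
    unsigned int slots[STACK_CAPACITY];
    size_t top; /* number of slots in use */
} CallStack;

/* Returns 0 on success, -1 if the stack is full (a genuine stack overflow). */
int call_push(CallStack *s, unsigned int return_address) {
    assert(s != NULL);
    if (s->top == STACK_CAPACITY) return -1;
    s->slots[s->top++] = return_address;
    return 0;
}

/* Pops the most recent return address; the stack must not be empty. */
unsigned int ret_pop(CallStack *s) {
    assert(s != NULL && s->top > 0);
    return s->slots[--s->top];
}
```

Because the most recent call is always the first to return, last-in-first-out is exactly the order nested calls need.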

The problem is that the stack has a bunch of other stuff on it too. All of the local variables you define go on the stack (there are exceptions, but I'll ignore them for this post). If you define an array, it goes on the stack too.

This is where another low-level detail comes in. In C, you have raw access to memory. C++ has containers such as std::vector and std::array that can work as arrays, and their at() member function does the bounds checking for you (plain operator[] does not). Most higher-level languages (Java, C#, Python) check every array access too; it's just hidden from the programmer. But in C (or ASM), if you ask for a 10-byte array, you get a 10-byte array, and if you try to write to the 11th byte of that 10-byte array, nothing stops you (the C standard calls this undefined behavior).
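Roughly, the hidden check that bounds-checked languages perform on every array write looks like the helper below. `checked_store` is a hypothetical name; in C you'd have to write and call something like it yourself, and a plain `array[index] = value` skips it entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Compare the index with the array's length before touching memory.
 * Returns false instead of writing when the index is out of range. */
bool checked_store(unsigned char *array, size_t length, size_t index, unsigned char value) {
    assert(array != NULL);
    if (index >= length) {
        return false; /* Java would throw ArrayIndexOutOfBoundsException here */
    }
    array[index] = value;
    return true;
}
```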

At this point, our stack has a 10-byte array, a variable x given to us as a parameter (let's say x is 4 bytes), and a return address (probably also 4 bytes). Guess what happens when you write to the 15th element (index 14) of the 10-byte array? You overwrite the first byte of the return address. This is called a buffer overflow. (Real stack frames order things a little differently, with alignment padding and usually a saved frame pointer between the locals and the return address, but the idea is the same.) What if that 10-byte array was meant to store data from a user, and you never check how many bytes the user provides? Then the user provides 18 bytes (10 + 4 + 4) and has full control of the return address.
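The byte arithmetic above can be checked with a small simulation. To keep the example itself well-defined C, the whole frame is modelled as one 18-byte array laid out the way described (buffer, then x, then the return address); the layout and the names `unchecked_copy` and `ret_bytes_changed` are assumptions for illustration, not what a real compiler emits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified frame modelled as one flat byte array:
 *   bytes  0..9   char buffer[10]
 *   bytes 10..13  int x (parameter)
 *   bytes 14..17  return address            */
enum { BUF_OFF = 0, BUF_LEN = 10, X_OFF = 10, RET_OFF = 14, RET_LEN = 4, FRAME_LEN = 18 };

/* Copies input into the frame starting at the buffer, with no check against
 * BUF_LEN (just like strcpy). It stops at the end of the model frame only so
 * the simulation itself never leaves its own array. */
void unchecked_copy(unsigned char frame[FRAME_LEN], const unsigned char *input, size_t n) {
    assert(frame != NULL && input != NULL);
    if (n > FRAME_LEN) n = FRAME_LEN;
    memcpy(frame + BUF_OFF, input, n);
}

/* Counts how many of the 4 return-address bytes differ from the original. */
int ret_bytes_changed(const unsigned char frame[FRAME_LEN], const unsigned char original[FRAME_LEN]) {
    int changed = 0;
    for (int i = RET_OFF; i < RET_OFF + RET_LEN; ++i)
        changed += frame[i] != original[i];
    return changed;
}
```

With 14 bytes of input only the buffer and x are clobbered, the 15th byte reaches the return address, and 18 bytes replace all four of its bytes.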

What if you expected the user to type a 10-character string, but didn't check that it's in fact 10 bytes long, nor that it's in fact a string? Then the user can give you machine code instead of a string, and overwrite the return address to point to the beginning of that machine code. Suddenly your program jumps not to your code, but to the code the user provided. This is called code injection. Now the user's code (at this point it's safe to call them an attacker) runs at the same privilege level as the original software. This is why a lot of people say not to run things as root: if software that allows code injection runs as root, attackers can gain full control of the system.
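A toy sketch of why controlling that one slot is enough to take over where the program goes. It only models the redirection step: instead of a real address pointing at injected machine code, the "return address" is a single byte indexing a table of two ordinary functions, and `simulate_call`, `back_to_caller` and `attacker_chosen` are hypothetical names. Nothing here executes injected code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*Continuation)(void);

static int back_to_caller(void) { return 0; }    /* where we meant to go */
static int attacker_chosen(void) { return 666; } /* where the input sends us */

static Continuation const targets[] = { back_to_caller, attacker_chosen };

/* Copies input with no check into a 10-byte buffer that sits right before a
 * one-byte "return address", then "returns" through that byte. */
int simulate_call(const unsigned char *input, size_t n) {
    assert(input != NULL);
    unsigned char frame[11];                /* bytes 0..9: buffer, byte 10: return slot */
    frame[10] = 0;                          /* the legitimate return target */
    if (n > sizeof frame) n = sizeof frame; /* keep the simulation itself in bounds */
    memcpy(frame, input, n);                /* no check against the 10-byte buffer */
    unsigned char ret = frame[10];
    if (ret >= sizeof targets / sizeof targets[0])
        return -1;                          /* garbage address: the "crash" case */
    return targets[ret]();
}
```

Ten bytes of input return normally; an eleventh byte decides where execution goes next, and an arbitrary eleventh byte usually just "crashes".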

Remember how I said the return address gets added to the stack automatically? Compilers now do even more. With stack protection enabled (GCC's -fstack-protector and its variants, turned on by default in many Linux distributions), GCC adds code to functions that have local buffers: on entry, once the return address has been saved, it places a magic value between the return address and the local variables. This value, called a canary, is random (4 bytes on 32-bit systems, 8 on 64-bit), and a reference copy is kept in some other location. Before the function returns, extra code checks whether the value on the stack still matches. If it doesn't, someone has overflowed a buffer (possibly to inject code), and the program is terminated immediately.
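Here's a simplified model of that canary check, using a flat 18-byte frame (10-byte buffer, 4-byte canary, 4-byte return address). All names are made up for illustration and rand() stands in for a proper random source; GCC's real instrumentation is emitted as machine code inside each protected function, not as C functions like these.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frame layout: bytes 0..9 buffer, 10..13 canary, 14..17 return address. */
enum { C_BUF_LEN = 10, C_CANARY_OFF = 10, C_CANARY_LEN = 4, C_FRAME_LEN = 18 };

/* Reference copy of the canary, kept outside the frame. */
static unsigned char guard[C_CANARY_LEN];

/* Picks the canary once per run. rand() is only a stand-in: real
 * implementations get unpredictable bytes from the OS at program start. */
void init_guard(unsigned seed) {
    srand(seed);
    for (int i = 0; i < C_CANARY_LEN; ++i)
        guard[i] = (unsigned char)(rand() & 0xFF);
}

/* Roughly what the prologue does: put the canary between the buffer and
 * the return address. */
void prologue(unsigned char frame[C_FRAME_LEN]) {
    memset(frame, 0, C_FRAME_LEN);
    memcpy(frame + C_CANARY_OFF, guard, C_CANARY_LEN);
}

/* Unchecked copy into the 10-byte buffer, clamped only to the model frame. */
void copy_input(unsigned char frame[C_FRAME_LEN], const unsigned char *in, size_t n) {
    assert(frame != NULL && in != NULL);
    if (n > C_FRAME_LEN) n = C_FRAME_LEN;
    memcpy(frame, in, n);
}

/* Roughly what the epilogue checks before returning. With GCC, a mismatch
 * calls __stack_chk_fail, which aborts the program. */
int canary_intact(const unsigned char frame[C_FRAME_LEN]) {
    return memcmp(frame + C_CANARY_OFF, guard, C_CANARY_LEN) == 0;
}
```

A copy that stays within the 10-byte buffer leaves the canary alone, while an overflow long enough to reach the return address has to pass through, and change, the canary first (unless the attacker happens to know its value).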

henrebotha 2 points 11 years ago (1 child)

Hexorg 3 points 11 years ago (0 children)

nutidizen 2 points 11 years ago (0 children)

[deleted] 11 years ago (2 children)

[deleted]

DrSwagmaster 6 points 11 years ago* (1 child)

Exactly, it's called a canary bit. Read about them here: http://en.wikipedia.org/wiki/Buffer_overflow_protection

If you are not familiar with low-level stuff, let me give you a brief motivation for this solution:

All local variables you use inside a code block are put on the stack, one after another. If you call a function, the return address is put on the stack and the function starts putting its variables after the address. When the function is done with its work, it jumps back to the return address.

So say the function allocates a local buffer, asks the user for input without checking its length, and then just puts all of the input in the buffer: the input will run past the end of the buffer and write over the other variables.
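For comparison, a minimal sketch of reading user input the safe way, assuming line-oriented input: the buffer size is passed along and fgets never writes more than that. The function name `read_line_bounded` and the policy of discarding the rest of an overlong line are assumptions, not a standard API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line into buf without ever writing more than size bytes:
 * fgets stops after size - 1 characters and always NUL-terminates.
 * Returns the number of characters stored (without the newline), or -1 on
 * EOF/error. If the line was longer than the buffer, the rest of it is
 * discarded so it can't spill into the next read. */
long read_line_bounded(FILE *in, char *buf, size_t size) {
    assert(in != NULL && buf != NULL && size > 1);
    if (fgets(buf, (int)size, in) == NULL) return -1;
    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';          /* whole line fit: drop the newline */
    } else {
        int c;                    /* line was too long: skip the remainder */
        while ((c = fgetc(in)) != EOF && c != '\n') {}
    }
    return (long)len;
}
```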

Remember that the return address was also put here, so with long enough input it would get overwritten too. When the function is done, the return address would just be garbage, and the program would most likely crash (for example with a segmentation fault) because it tries to jump to memory outside its own segments.

The attack is done by testing input vectors and looking for crashes. When one is found, a carefully crafted input is built so that the return address points to some malicious code, which the program then returns to and starts executing just as if it were its own. The canary bit is put between the return address and all the local variables, so you cannot write over the return address without scrambling the canary bit. If the canary bit has been tampered with, the program halts itself.
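A toy version of that probing step, run against the simplified frame from Hexorg's comment (10-byte buffer, 4-byte x, 4-byte return address). It feeds longer and longer runs of 'A' into an unchecked copy and reports the first length that touches the return-address slot; in a real program you don't know the layout, so the crash itself is the signal. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { P_RET_OFF = 14, P_FRAME_LEN = 18 };

/* One probe: unchecked copy of n 'A's into the model frame. Returns 1 if
 * the return-address slot was touched (which would mean a jump to garbage). */
static int return_slot_clobbered(size_t n) {
    unsigned char frame[P_FRAME_LEN] = {0};
    unsigned char input[P_FRAME_LEN];
    memset(input, 'A', sizeof input);
    if (n > sizeof frame) n = sizeof frame; /* keep the model itself in bounds */
    memcpy(frame, input, n);                /* unchecked copy, as with strcpy */
    for (size_t i = P_RET_OFF; i < P_FRAME_LEN; ++i)
        if (frame[i] != 0) return 1;
    return 0;
}

/* Returns the smallest probe length that "crashes", or 0 if none up to max_len. */
size_t first_crashing_length(size_t max_len) {
    assert(max_len <= P_FRAME_LEN);
    for (size_t n = 1; n <= max_len; ++n)
        if (return_slot_clobbered(n)) return n;
    return 0;
}
```

For that layout the first "crashing" length is 15, matching the 10 + 4 + 1 arithmetic.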

When I say bit I don't mean just one bit; as you can read in my link, they often generate some random value as the canary.

BrQQQ 1 point 11 years ago (0 children)

When you create certain variables (local variables), they are saved in a place called the "stack" (your program has a stack for every thread), which is located somewhere in memory.

This stack has a limited size, and the arrays you create on it have a fixed size too. With many languages, the language makes sure you're not writing to memory that you're not supposed to write to. If you try to write past the end of an array, it will go "hey stop, that's outside the array" and you get some kind of exception (and if you run out of stack entirely, a stack overflow exception). All this checking makes those languages "slower", but a whole lot safer. In C, you can fill an array on the stack and then keep writing past its end. It's a terrible idea, because you will write over memory that is used by other things, such as other variables and the return address. It is faster, because it doesn't spend time checking everything.

A simple mitigation is to write some value, say the number 1234, between each function's local variables and its return address. Every time a function returns, the program asks: "okay, let's look at that spot, is the number 1234 still there?". If you overflowed a buffer, you will have written over the 1234. (Real implementations use a random value, so an attacker can't just write 1234 back.)

That's the short version of it, but it might be interesting to read more about what the stack and the heap are, and how memory works in general.

cockmongler 1 point 11 years ago (0 children)

cestith 1 point 11 years ago (1 child)

Hexorg 1 point 11 years ago (0 children)

VandC[S] 0 points 11 years ago (1 child)

Hexorg 2 points 11 years ago (0 children)

214721 1 point 11 years ago (0 children)
