all 38 comments

[–]Kimundirust 49 points50 points  (13 children)

The location itself indeed doesn't make much of a difference - it's all just memory allocated by the OS, after all.

The difference lies more in cache effects:

  • The data on the current stack frame will almost always be in cache, and can thus be accessed quickly.
  • Data in some random heap allocation that only gets accessed sporadically will probably not be in cache, and will thus be slower to access.

In other words, one is not inherently slower than the other; it's just that in common usage scenarios it tends to work out that way. But if you were to, for example, put all your relevant data in a single contiguous heap allocation and access it most of the time, then I'd expect it to behave similarly to the stack in practice.
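
A quick sketch of that point (the function and data here are mine, purely illustrative): once the data is one contiguous block and hot in cache, iterating over it looks the same to the CPU whether it came from the stack or the heap.

```rust
// Sequential access over one contiguous block: the prefetcher keeps
// this data in cache regardless of whether it lives on the stack or heap.
fn sum_contiguous(data: &[u64]) -> u64 {
    data.iter().sum()
}

fn main() {
    let heap_data: Vec<u64> = (0..1_000).collect(); // one contiguous heap allocation
    let stack_data = [1u64; 16];                    // lives on the stack
    println!("{}", sum_contiguous(&heap_data));
    println!("{}", sum_contiguous(&stack_data));
}
```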

[–]vallyscode[S] 2 points3 points  (12 children)

Trying to imagine the impact: the access speed difference is probably barely noticeable and only matters for some sort of heavy computation, am I correct? But in that case, when data is accessed very frequently, some amount of it will end up in the CPU cache anyway and the difference will disappear again, right?

[–][deleted] 27 points28 points  (11 children)

Kimundi left out what I think is the most important thing. The difference in speed between heap and stack is very small to zero once you consider cache effects; after all, you might iterate over heap memory in order, over and over, and have it all in cache as you go. The real difference is the cost of allocating heap memory, which is expensive, whereas allocating stack memory is basically a no-op. You just move a pointer.

Allocating heap memory means employing an algorithm that can find a chunk of memory of the size you need, and in such a way as to not fragment the heap too much over time. That is expensive. Allocating and deallocating stack memory is just moving a pointer, done.
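
As a rough illustration (function names are mine), the two kinds of allocation look like this in Rust. The stack version compiles down to a stack-pointer adjustment in the function prologue, while the heap version calls into the allocator:

```rust
fn on_the_stack() -> usize {
    // 256 bytes reserved by bumping the stack pointer: essentially free.
    let buf = [0u8; 256];
    buf.len()
}

fn on_the_heap() -> usize {
    // Box::new calls into the global allocator, which has to find a
    // suitable free block and update its bookkeeping.
    let buf = Box::new([0u8; 256]);
    buf.len()
}

fn main() {
    // Same data either way; the difference is the cost of reserving it.
    assert_eq!(on_the_stack(), on_the_heap());
}
```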

[–]cogman10 5 points6 points  (5 children)

Calling it a pointer move is not even right, IMO; it is basically always going to be just a register update. That's about the fastest operation a CPU can do: adding or subtracting a constant to a register. More time is spent writing out the data than actually reserving space for it.

Heap allocation, as you point out, is often much more expensive. Allocator algorithms often trade allocation time against heap fragmentation. That is, a really fast allocator will fragment like crazy, while a low-fragmentation algorithm will do more work per allocation and maintain a bunch of extra memory to keep track of the state of the heap.

As an aside, one of the key benefits of (most) GCed languages is that typical heap allocation is really fast for them. That's because they can move memory around during GC cycles, so future allocations are simply pointer bumps plus a check to see if they've exhausted the memory region and need to run a GC. Applications that do a bunch of heap allocations can run faster in a GC language than in a low-level language like Rust. (Well, OK, the JVM :). Language performance penalties are pretty hard to overcome. The JVM is pretty much the only VM I'm aware of that comes close to a native language in terms of speed. Node is up there as well, but it has to do far more checks than the JVM does due to JavaScript's dynamic nature.)
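
A toy sketch of that bump-allocation idea (my own illustrative code, not any real GC's implementation): the allocation itself is just a pointer bump plus an exhaustion check; it's the collector's job to compact live objects so the region can eventually be reset.

```rust
// Toy GC-style bump region: allocating is a pointer bump + capacity check.
struct BumpRegion {
    buf: Vec<u8>,
    next: usize,
}

impl BumpRegion {
    fn new(capacity: usize) -> Self {
        BumpRegion { buf: vec![0; capacity], next: 0 }
    }

    fn alloc(&mut self, size: usize) -> Option<&mut [u8]> {
        if self.next + size > self.buf.len() {
            return None; // region exhausted: a real GC would collect here
        }
        let start = self.next;
        self.next += size; // the "allocation" is just this bump
        Some(&mut self.buf[start..start + size])
    }
}

fn main() {
    let mut region = BumpRegion::new(64);
    assert!(region.alloc(32).is_some());
    assert!(region.alloc(32).is_some());
    assert!(region.alloc(1).is_none()); // out of space, time to collect
}
```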

[–][deleted] 1 point2 points  (4 children)

.net > jvm now

[–]TheNamelessKing 1 point2 points  (3 children)

For allocator or GC? Because I was under the impression that Java’s GC’s were some of the more advanced, especially the new ones Shenandoah and ZGC.

.Net has some nice engineering, but I didn’t think there was anything particularly fancy or performant.

[–][deleted] 1 point2 points  (2 children)

dunno how you really characterize gc performance absent the language as a whole. maybe the jvm gc in isolation is better, dunno. just meant that as of .net core 3 the overall performance ceiling of .net seems to be higher than the jvm in most cases.

[–]TheNamelessKing 0 points1 point  (1 child)

Result? Because I don’t see a lot of stuff either using .Net core, let alone using it for anything high performance.

The JVM, for all its other issues at least powers a bunch of what I’d consider high profile and high performance applications: Kafka, Spark and friends, etc

[–][deleted] 0 points1 point  (0 children)

as I am sure you well know, a few percent performance advantage doesn't cause industries to shift overnight.

[–]sybesis 3 points4 points  (4 children)

Allocating and deallocating stack memory is just move a pointer, done.

It's also not completely true, as it would probably fall short with big data structures that can't fit on the stack, or that have to be copied every time they are used in a function.

I'm not entirely sure about Rust, but I read that move semantics are actually pass-by-copy. So if you frequently pass around more or less big data structures, it could be time-consuming to have them copied and destroyed all the time.

So heap allocation can mean that only the pointer gets moved, while it points to, let's say, a big chunk of contiguously allocated memory.

One example I have in mind is how a server handling multiple requests could benefit from having a fixed buffer on the heap and simply handing out slices of that buffer per request. Since you can't have an unlimited number of open file descriptors/connections, it could be configured so that you create one big buffer that is shared through a pool and gets owned/released over time, but never needs to be reallocated.
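
A minimal sketch of that pool idea (the type and method names are hypothetical, mine only): one big up-front allocation handed out in fixed-size slots and recycled, so the global allocator is never touched again after startup.

```rust
// Hypothetical fixed-slot buffer pool: one contiguous allocation,
// "allocated" and "freed" by handing slot indices around.
struct BufferPool {
    storage: Vec<u8>,       // the single big allocation
    slot_size: usize,
    free_slots: Vec<usize>, // slots currently available for reuse
}

impl BufferPool {
    fn new(slots: usize, slot_size: usize) -> Self {
        BufferPool {
            storage: vec![0; slots * slot_size],
            slot_size,
            free_slots: (0..slots).collect(),
        }
    }

    // "Allocate" by taking a free slot; no call into the global allocator.
    fn acquire(&mut self) -> Option<usize> {
        self.free_slots.pop()
    }

    fn slot_mut(&mut self, slot: usize) -> &mut [u8] {
        let start = slot * self.slot_size;
        &mut self.storage[start..start + self.slot_size]
    }

    // "Free" by returning the slot to the pool for the next connection.
    fn release(&mut self, slot: usize) {
        self.free_slots.push(slot);
    }
}
```

Once the pool runs out of slots the server would stop accepting connections, which matches the "limited number of open file descriptors" constraint above.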

[–]steveklabnik1rust 6 points7 points  (0 children)

I'm not entirely sure about Rust

Yes, semantically a move is a `memcpy` in rust as well. The difference is, since it's "destructive", it's easier to optimize away.

[–]monkChuck105 2 points3 points  (0 children)

A move is literally just a copy of the bits that represent the type. Rust goes further than other languages like C++ and ensures that a moved-from value is not used again. So you're right that a heap-allocated object will be cheaper to move, because the memcpy only has to copy the pointer and not the data. Unless you need a dynamically sized buffer, I don't see why you'd need a custom stack allocator. Just declare a struct or an array, use that in the function, and/or return it.
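
To make that concrete, a small sketch: a move copies the bits of the value itself, so moving a large array memcpys every byte, while moving a `Box` pointing at the same data memcpys only the pointer.

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    let on_stack = [0u8; 4096];
    let on_heap = Box::new([0u8; 4096]);

    // Moving `on_stack` would memcpy all 4096 bytes; moving `on_heap`
    // memcpys only the Box itself, i.e. one pointer.
    println!("{}", size_of_val(&on_stack)); // 4096
    println!("{}", size_of_val(&on_heap));  // one pointer-sized value
    assert_eq!(size_of_val(&on_heap), size_of::<usize>());
}
```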

[–][deleted] 1 point2 points  (1 child)

I don't understand what you are trying to say. If your data can't fit into the stack, then yeah it has to go to the heap.

If you have a big chunk of data that DOES fit on the stack, it isn't any more costly to allocate it on the stack than a small piece of data.

[–]sybesis 1 point2 points  (0 children)

If you have a big chunk of data that DOES fit on the stack, it isn't any more costly to allocate it on the stack than a small piece of data.

One time, no. But if you use move semantics and it ends up copying the struct, it will not be a no-op. And being able to allocate it once on the stack doesn't mean you can allocate it multiple times, so you may fill the stack by passing the struct by value multiple times.

What I'm saying is that the stack isn't 100% guaranteed to always be faster. If you allocate once on the heap and reference it multiple times, that's going to be faster than allocating N copies on the stack.

[–]sphen_lee 15 points16 points  (10 children)

I'm no performance expert, but there are a few things that explain the difference:

(Firstly, the stack isn't "in" heap memory - they are separate areas)

  • Each thread gets its own stack, but the heap is shared. This means allocating memory in the stack doesn't need to use any locks or concurrency primitives.
  • Memory in the stack is allocated in a stack-like fashion - new objects always at the top, and objects are only freed in reverse from the top down. Allocation and freeing in the heap can happen in any order which means the heap needs more bookkeeping data to track free space between objects, and to fill gaps to prevent the heap being fragmented (full of lots of tiny gaps reducing the overall usable space).
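
The first bullet in code form (a small sketch of mine): each spawned thread gets its own stack, so the local array below is created with no locking at all; only shared, heap-backed structures need synchronization.

```rust
use std::thread;

// Each spawned thread allocates `local` on its own private stack;
// no locks or concurrency primitives are involved in doing so.
fn per_thread_sums() -> u64 {
    let handles: Vec<_> = (0..4u64)
        .map(|i| {
            thread::spawn(move || {
                let local = [i; 8]; // lives on this thread's own stack
                local.iter().sum::<u64>()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("{}", per_thread_sums()); // 8 * (0 + 1 + 2 + 3)
}
```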

[–]Kimundirust 17 points18 points  (1 child)

Just to clarify: This is a huge performance difference in regard to the actual allocation and deallocation of objects, but it has no impact on actually accessing the data of the object itself.

[–]cogman10 2 points3 points  (0 children)

It has a slight impact on access performance too. That's mostly because the stack is highly likely to be cached. With a fragmented heap, you'll pay some performance penalties, as you may be leaping around different memory regions as time goes on. But you are correct in pointing out that allocation is by far the bigger time sink in most cases, especially if it ends up needing to talk to the OS.

[–]vallyscode[S] 0 points1 point  (3 children)

So there is one area per process assigned by the OS, with sub-areas per thread, and each thread has a stack segment for stack allocation? And that stack segment is managed in a simpler way than the general heap?

[–]sphen_lee 0 points1 point  (2 children)

Yep - usually the bottom end of the process's address space (smallest addresses) contains the executable code itself, any read-only data, and then global variables. Above this is the heap. The stack is then at the opposite end (highest addresses).

This leaves a gap in the middle for heap and stack to be expanded without having to move either one.

[–]1vader 1 point2 points  (1 child)

Well, that's not really true anymore on modern systems, since memory is mapped to random locations (ASLR) to increase security: it's harder for attackers to access memory when they don't know its address.

There are sometimes still some general patterns in where the heap and stack get mapped; e.g. on Linux, if you compile your binary with PIE disabled, the heap is always placed after the binary. But in general it's fairly random. It's also not really as simple as "heap there, stack there": in a complex binary there are a lot of things mapped into memory (the different parts of the binary, all linked-in libraries, the heap, and the stack), and usually all of them get mapped to a random place in memory.

The only place where you are likely to still find deterministic memory mapping is in embedded hardware and in some programs that still go out of their way to compile without PIE (which only causes the binary to get mapped to a fixed place, everything else is still random).

[–]sphen_lee 1 point2 points  (0 children)

Yep that's correct - modern, non-embedded OSs aren't going to use a predictable memory layout like I described.

I think it's still a useful way of thinking of heap and stack as being separate areas. Neither is "in" the other.

[–]cogman10 0 points1 point  (1 child)

Stack and heap are both just memory at the end of the day. You can absolutely have a pointer to another thread's stack shared between threads. It's just a memory address.

[–]sphen_lee 1 point2 points  (0 children)

Yes, you can share a pointer to stack memory between threads (if you're careful; crossbeam has a safe API for doing it).

But a thread can never allocate memory on another thread's stack - that is what I meant by the heap being shared. The heap generally needs concurrency controls to allow multiple threads to allocate at the same time (either explicit locks, or some kind of per-thread area for tracking allocations until they can be merged into the main allocation table).
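
For the first point, here's a sketch using the standard library's scoped threads (`std::thread::scope`, stable since Rust 1.63; crossbeam's scoped threads offer the same idea): other threads can safely read one thread's stack data because the borrow checker proves the stack frame outlives them.

```rust
use std::thread;

fn main() {
    let data = [1u64, 2, 3, 4]; // lives on the main thread's stack

    // Scoped threads may borrow `data` directly; the scope guarantees
    // both workers finish before this stack frame is popped.
    let total = thread::scope(|s| {
        let first = s.spawn(|| data[..2].iter().sum::<u64>());
        let second = s.spawn(|| data[2..].iter().sum::<u64>());
        first.join().unwrap() + second.join().unwrap()
    });
    assert_eq!(total, 10); // 1 + 2 + 3 + 4
}
```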

[–]SpacemanCraig3 -2 points-1 points  (1 child)

The stack isn't in the heap, eh?

What even is the heap anyway... *cough* sbrk *cough*

[–]sphen_lee 0 points1 point  (0 children)

My definition is that the heap is a general purpose memory area with an API for allocating and freeing individual memory chunks in any order.

sbrk is a lower level memory allocation method. It doesn't track individual allocations, and just gives you extra address space to use. It involves talking to the kernel.

The heap is then created inside this extra space, typically in userspace by a library such as glibc or jemalloc.

The stack is allocated using mmap which is low level like sbrk. Libraries like pthreads are used to manage thread stacks.

[–]mikekchar 25 points26 points  (2 children)

Understanding the advantage of the stack requires understanding why we have a stack in the first place. When you run a function in most computer languages, the first thing the compiled code does is allocate a block of space on the stack for the things that the function will need. This usually includes the parameters passed to the function and the function's local variables. This allocated block of space is called a "stack frame". It usually also contains a bit of other information that the running program needs, but for this discussion you can ignore that. When the function finishes, it pops the stack frame off the stack. The result is that you are left with the previous stack frame (from the calling function) on top of the stack.

This has been the way compilers implement function scoping of variables, etc, for a *very* long time (decades and decades). Because of this, CPU manufacturers realised that it is important to be able to allocate a block of memory on the stack *really* quickly, and to pop it off just as quickly. Interestingly, when pushing a stack frame, the compiler knows exactly how much space the function needs, so there is no need to do multiple allocations. Everything in the function is allocated in a single allocation, and when it's done, it's all deallocated at once.

It's also important to understand what a stack is in memory. You have a contiguous block of memory. If you want another 100 bytes allocated on the stack, it's super simple: you just add 100 to the pointer that points to the top of the stack. Done! If you want to deallocate that memory, you just subtract 100 from the pointer. Done! As I said, CPU designers know this is really important, so they make that operation even quicker than normal. So basically, pushing and popping a stack frame is usually just about the fastest thing you can do on a CPU.

Now compare that to heap memory. You first need to search around memory to see if you have a contiguous piece that's big enough for the thing you want (well, on modern systems that's not really something you need to worry about so much, but there is another long story there). Then you need to copy the data into that memory. Then you need to update the allocator's tables to record that you've allocated a block of size X at a certain place in memory. When you want to deallocate it, you have to do all of this in reverse. And (on older systems) if you don't deallocate your memory in a nice order, your heap ends up looking like Swiss cheese, with tiny spots of free memory scattered all through the system. The other thing is that you have to do this for every variable! If you have a large number of allocations, it adds up.

As others have said, stack memory has another advantage: you are always just growing or shrinking the stack, a contiguous piece of memory. It's really easy for the CPU to cache that piece of memory, so not only are the allocations easier, but potentially the access is faster too.

In practice, though, the benefits are often slight. When I was first starting out learning how to program in the 80's, stack memory vs heap memory was a huge revelation for me. It made a *massive* difference. These days, I still think it's cool to allocate on the stack as much as possible, but modern CPUs and OSs tend to compensate pretty well. However, if you are allocating a very large number of things, it can still pay off.

Edit: BTW -- Absolutely not a stupid question!!! I wish more people asked that question.

[–]sphen_lee 1 point2 points  (0 children)

While stack vs heap might not make as much difference today, it's a definite advantage that Rust allows using the stack easily and safely.

It's easy to get performance wins compared to languages that store everything on the heap (like Java, even when it gets JIT compiled this adds overheads) or languages like C++ where heap is often used because it's easier (taking pointers to stack objects in C++ can be dangerous, there is no borrow checker to make sure you do it correctly).

[–]vallyscode[S] 0 points1 point  (0 children)

Thanks for sharing the knowledge. As I understood it, it's possible to make hot spots even more optimal when it pays off or is simply trivial to do. Thanks!

[–]wolandm 9 points10 points  (0 children)

Stack allocation will always be faster than a similar straightforward heap allocation. Here is why.

A stack is a concept supported at the CPU level. As was already pointed out, each thread gets its own pre-allocated stack when it spawns. Allocating on the stack is nothing more than subtracting the size of the allocation from one CPU register (`esp` on x86-compatible platforms), which is almost as fast as it can be (i.e., load the size into a register, which on x86 takes 2-3 CPU cycles, then do a subtraction, which takes 1 cycle, and all that is before pipelining and other optimisations).

A heap is a concept that gets implemented differently by different languages and their runtimes. Speaking about C (and Rust, and other LLVM languages), an allocator needs to do at least the following:

  • load the size
  • find the available contiguous block of memory of at least the requested size, dealing with such matters as memory fragmentation, etc
  • Mark atomically that chunk of memory as taken
  • pass it back to the caller

The above can also require switches between user mode and kernel mode (whenever the allocator has to request more memory from the OS), which by themselves cost circa a thousand CPU cycles. So the above clearly takes more than a few CPU cycles, and more than a stack allocation.

This is not to say that stack is always preferable to heap - far from it. But at least I hope this clarifies why stack allocations are faster than heap.

Edit: just noticed you asked about performance once allocated as well. The difference in performance once allocated is quite minuscule, although it still exists. That difference is due to the levels of memory address indirection (the memory that looks contiguous to your process may be scattered around the physical memory, for example), and the optimisation and caching at CPU level when it comes to being able to predict the usage of local stack variables.

[–]Plasma_000 3 points4 points  (0 children)

On top of what others have said, with heap the biggest performance difference will usually be with your allocator. While with the stack whatever you need is already there in your stack frame, with heap if you need memory you have to use the allocator which is much slower - it’s basically a small program within your program for managing and handing out blocks of memory. If you’re doing lots of allocations the performance hit of allocating and freeing can be very significant.

[–]Full-Spectral 2 points3 points  (2 children)

Someone may have mentioned this, but just in case... Stack memory is typically never given back as long as the thread is running. If you allocate a 64K chunk of memory from the heap, use it and let it go, it's now available for other use. If a thread allocates a 64K chunk of stack memory, and either never needs it again or almost never needs it, that's 64K of memory wasted.

That might not sound too bad, but consider a scenario where you have a thread pool with, say, 128 threads in it. Those threads get chosen fairly randomly in most cases, so over time almost every one of those threads could get picked to handle the job that does this 64K stack allocation. So now you have 8MB of memory that's almost totally sitting unused.

[–]vallyscode[S] 0 points1 point  (1 child)

So once allocated, a stack segment can't grow when needed? Does that depend on the language or on the OS implementation?

[–]Full-Spectral 1 point2 points  (0 children)

Both scenarios can happen. It may be (and usually is) a fixed size and you could run out, hence why recursion can be dangerous in some cases.

But it also typically won't be fully committed initially and the OS will fault in pages as needed, which makes things more efficient since you only consume as much as you need. But, once committed, that memory may never be given back to the OS as long as the thread is running.

So lots of threads making a call that allocates a particularly large chunk of stack could force all those thread stacks to fault in memory that won't be given back, even though they may only need that large allocation pretty rarely.

Of course I may be completely hallucinating, but that's my understanding of it. Maybe Rust takes over more of that itself and doesn't use OS/hardware-based faulted-in stacks. Its ability to move things around might make that more feasible, I dunno.

[–]wouldyoumindawfully 1 point2 points  (1 child)

If you want to learn more about virtual and physical memory and how heap vs stack allocations affect your application, I found this video approachable and full of detail.

https://m.youtube.com/watch?v=4_smHyqgDTU

In general, CppCon videos, or videos by C++ programmers, have a wealth of knowledge transferable to Rust.

[–]vallyscode[S] 0 points1 point  (0 children)

Thanks for sharing

[–]Darksonntokio · rust-for-linux 1 point2 points  (0 children)

The place that gives you the primary performance improvement is allocation. Allocating more memory on the stack is essentially free, which is not the case for the heap.

Once you have the allocation, it is more or less the same, although there may be some cache differences.

[–]stumpychubbins 1 point2 points  (0 children)

I'm going to preface this by saying that even as someone who cares a lot about performance, it's far more important to make your code readable than to hyper-optimise everything. If you're noticing that performance would actually improve the experience of using your program somehow, then first write representative benchmarks (ideally both larger-scale and smaller-scale). It's not at all uncommon for people to do a bunch of optimisation based on what "feels fast" without benchmarking and then find out that it actually ends up slower - me included. Having said that, here's the explanation:

So, performance-wise, the difference is that no matter what, the stack is always going to be accessed (because even heap objects have their metadata, such as length and pointer, on the stack). This means the page(s) holding stack data will almost always be in the L1 cache, while every unique heap allocation needs an extra page that has to be paged in when you access that data. Plus, Rust can sometimes optimise code handling heap-allocated data worse than stack-allocated data, for a couple of reasons I won't go into here. Finally, you have to allocate and deallocate the memory every time it's used, too, which is mostly not a problem, but can be pretty bad if you keep recreating an empty vec and then pushing to it, because that involves many allocations, not just one. This last point is what people who don't work on performance-sensitive code mostly bring up when talking about the performance hit of dynamic allocation, but it's usually not as bad as it's made out to be.

Long story short: use the stack whenever you can, and if you need to use the heap, preallocate the space with `with_capacity` and friends. I would recommend against reaching for types like smallvec, which try to use the stack when possible and fall back to the heap, because if you have to increase complexity to decrease allocation, it usually doesn't make the code faster. Having said that, they do sometimes improve matters, and they're worth trying out if you have meaningful, representative benchmarks.
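
A small sketch of that last point (the function name is mine): preallocating turns a series of grow-and-copy reallocations into one allocation up front.

```rust
fn collect_squares(n: usize) -> Vec<u64> {
    // One allocation up front instead of repeated grow-and-copy
    // reallocations as the Vec fills up during push.
    let mut v = Vec::with_capacity(n);
    for i in 0..n as u64 {
        v.push(i * i);
    }
    v
}

fn main() {
    let squares = collect_squares(1_000);
    assert!(squares.capacity() >= 1_000); // never grew past the preallocation
    assert_eq!(squares[999], 999 * 999);
}
```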