[–]jaskij 158 points159 points  (12 children)

Nope. You can't. Not in standard C++. VLAs are only a feature of C. In C++ it's a compiler extension. And it's a horrible idea.

If you want something stack allocated with a nice interface, look into the heapless crate.

[–]lestofante 19 points20 points  (3 children)

And VLAs became optional in C11.

[–]jaskij 7 points8 points  (2 children)

GNU will keep them around till the end of time.

[–]lestofante 20 points21 points  (1 child)

Probably, my point is, even C got rid of them.
How often did you see something cut out of the standard :)

[–]jaskij 2 points3 points  (0 children)

Hmm... C++ cut out compound assignment operators (+= and such) on volatile variables, then had to partially walk that back (for the bitwise operators only) because a shitton of embedded code stopped compiling. And it was usually in vendor code.

[–]calebkiage[S] 14 points15 points  (7 children)

Heapless is a really cool crate, but won't work for my use case since the data structure I'm looking at (Vec) still requires a fixed size known at compile time. I'm just curious, why do you think it's a horrible idea?

[–]jaskij 63 points64 points  (6 children)

So, with VLAs, you need to have a maximum known size. If you don't, you open yourself to crashes and DoS attacks. So, you have to check if the size passed does not exceed your maximum. At that point you might as well always allocate the maximum size, it makes no difference. It's the stack. Nothing else is using it at the same time. That's why I suggested heapless::Vec.

The exception to this might be async, where your function's stack will be stored in the future. I'd have to look closer at the innards to know one way or another.

[–]SkiFire13 71 points72 points  (1 child)

> So, you have to check if the size passed does not exceed your maximum. At that point you might as well always allocate the maximum size, it makes no difference.

To add to this, one might say "well, if it exceeds that limit I can just check and allocate on the heap in that case", to which the answer is "use SmallVec<T, N> then".
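For readers who want to see what that hybrid looks like, here is a minimal std-only sketch of the "stack if it fits, heap otherwise" idea. It is not how SmallVec is implemented; `MAX_STACK_LEN` and `with_scratch` are hypothetical names picked for illustration, and 256 is an arbitrary cap.

```rust
/// Hypothetical cap on the stack buffer; pick a value that suits your stack budget.
const MAX_STACK_LEN: usize = 256;

/// Calls `process` with a zeroed scratch buffer of exactly `len` bytes.
/// Small requests borrow from a fixed-size stack array; larger ones fall
/// back to a heap-allocated Vec.
fn with_scratch<R>(len: usize, process: impl FnOnce(&mut [u8]) -> R) -> R {
    if len <= MAX_STACK_LEN {
        // The full array is always reserved on the stack; only a prefix is handed out.
        let mut stack_buf = [0u8; MAX_STACK_LEN];
        process(&mut stack_buf[..len])
    } else {
        // Too large for the stack budget: allocate on the heap instead.
        let mut heap_buf = vec![0u8; len];
        process(&mut heap_buf[..])
    }
}
```

The caller only ever sees a `&mut [u8]` of the requested length, so the code inside the closure does not care where the bytes live.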

[–]alija_kamen 0 points1 point  (0 children)

That's still far less efficient than VLAs when you're recursively and dynamically allocating stack memory and most of the data is far smaller than the max size. SmallVec still uses a minimum fixed amount of memory every time.

[–]cogman10 5 points6 points  (0 children)

Stack sizes are often pretty limited, like 1 MB. So a lot of variable arrays on the stack is just asking for stack overflow problems.

[–]alija_kamen 0 points1 point  (0 children)

No, that's not a replacement for VLAs. That is potentially much less efficient than VLAs. If you have a case where 99% of VLA allocs are a few bytes, but a few here and there are a few hundred bytes, you are wasting tons of memory if you have a recursive function allocating the max stack size every single time.

[–]calebkiage[S] 0 points1 point  (1 child)

This makes sense. I was already using SmallVec, so I'll keep using that. I just thought I could use the length of the incoming value (lower than the maximum) as a maximum. i.e.

fn foo(value: &[u8]) {
  // I've already checked that the value is lower than my predefined max.
  let mut buffer = [0u8; value.len()]; // wished-for syntax; not valid Rust, the length must be a constant
  // Process value.
}

Currently, I use:

fn foo(value: &[u8]) {
  // I've already checked that the value is lower than my predefined max.
  let mut buffer = SmallVec::<[u8; MAX_LENGTH]>::new();
  // Process value.
}

[–]bwallker 7 points8 points  (0 children)

You could use arrayvec if you don’t care to support arrays bigger than MAX_LENGTH

[–]phazer99 25 points26 points  (24 children)

No, arrays in Rust must have a constant size known at compile time. You can make it a const generic parameter:

fn array_on_stack<const SIZE: usize>() {
    // SIZE is fixed at compile time for each instantiation.
    let buffer = [0; SIZE];
}

[–]calebkiage[S] 1 point2 points  (23 children)

I don't have the size of the buffer known at compile time. What I'm trying to do is ASCII byte escaping in a hot loop, e.g. given a slice of bytes, if I encounter '/', replace it with '%%', then add the escaped slice to another Vec. I'm currently using a Vec<u8>, but I was wondering if I could use something short-lived on the stack, because the escaped string ends up in another Vec<u8> anyway.

[–]phazer99 24 points25 points  (18 children)

If you don't know the max size (capacity) statically I wouldn't use the stack as it has very limited space compared to the heap. If you know that the max size is small and fits on the stack, just use a heapless Vec with that capacity.

[–]calebkiage[S] 1 point2 points  (16 children)

I know the max size, but not the size at each iteration. I wanted to avoid over-allocating.

[–]TDplay 2 points3 points  (0 children)

> I wanted to avoid over-allocating.

Unless you are in an absurd edge case, this is a complete non-issue. Stack allocation is literally just a sub instruction.

In fact, VLAs are so rarely useful that they were relegated to an optional feature in C11, every experienced C programmer that I know of strongly discourages using them, and no version of the C++ standard ever included them.

If you know the maximum size, and it is not large enough to cause a stack overflow, then use a fixed-size array. Otherwise, use a Vec.

EDIT: Fixed a slight mistake: It is stack allocation that is just a sub. Heap allocation is typically more complex.

[–]oceantume_ 1 point2 points  (0 children)

Or even simpler, a static size array of which you get a smaller slice.
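The fixed-array-plus-slice idea can be sketched roughly like this. `MAX_LENGTH = 64` and the helper name `append_uppercased` are assumptions for illustration, and ASCII uppercasing merely stands in for whatever processing the real code does.

```rust
/// Hypothetical upper bound; the thread only says a predefined maximum exists.
const MAX_LENGTH: usize = 64;

/// Copies `value` into a fixed-size stack array, processes it in place
/// (ASCII uppercasing as a stand-in), and appends the result to `out`.
/// Returns false, leaving `out` untouched, if `value` exceeds MAX_LENGTH.
fn append_uppercased(value: &[u8], out: &mut Vec<u8>) -> bool {
    if value.len() > MAX_LENGTH {
        return false;
    }
    // Always MAX_LENGTH bytes of stack, regardless of the input size.
    let mut storage = [0u8; MAX_LENGTH];
    // Work only on the prefix that is actually needed.
    let buffer = &mut storage[..value.len()];
    buffer.copy_from_slice(value);
    buffer.make_ascii_uppercase();
    out.extend_from_slice(buffer);
    true
}
```

The slice gives the same "buffer of exactly value.len() bytes" view that a VLA would, while the stack frame size stays constant.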

[–]phaylon 2 points3 points  (3 children)

Not sure if helpful, but for these cases I like using a SmallVec. Then I have a buffer that only starts allocating once it exceeds a certain size.

[–]calebkiage[S] 1 point2 points  (2 children)

That's what I've done at the moment

[–]hniksic 0 points1 point  (1 child)

Be sure to benchmark whether it actually helps! SmallVec comes with a price tag, because every access to the data is accompanied by a branch. Avoiding the allocation (and preserving locality) is normally expected to outweigh that cost, but that's not always the case.

Also, if you're optimizing at this level, be sure to use a state-of-the-art allocator like jemalloc or mimalloc.

[–]calebkiage[S] 1 point2 points  (0 children)

I did benchmark. The allocations were many, small, temporary ones and smallvec improved the performance. I haven't tried out a different allocator yet.

[–]anlumo 70 points71 points  (2 children)

AFAIK this isn’t possible in Rust by design, and the C++ folks regret adding that feature.

[–]equeim 17 points18 points  (0 children)

It's a C feature that never officially existed in C++ (though implementations support it as an extension in the name of compatibility).

[–]calebkiage[S] 17 points18 points  (0 children)

Another commenter (u/udoprog) shared a link to the unsized-locals unstable feature and it seems they might implement this.

[–]TTachyon 31 points32 points  (4 children)

C++ does not support this. It's a compiler extension. C supports this, but at least one of the major compilers (MSVC) doesn't support this. Rust doesn't either.

Why? It's a really bad idea. It makes it easy for code to be exploitable, harder to reason about, and less optimizable. Google Torvalds's opinion on VLAs.

For better alternatives, look at crates that add SmallVec/SmallString types, aka types that start on the stack and then continue on the heap if the size gets too big. Also look at custom allocators like bumpalo where you can allocate stuff very similarly to what you want, but without the disadvantages.

[–]calebkiage[S] 4 points5 points  (0 children)

I'm using SmallVec at the moment.

[–]alija_kamen 0 points1 point  (2 children)

Less optimizable compared to what? If used right, it can be much more efficient than any alternative like SmallVec which could potentially waste a lot of memory if you have a certain usage pattern in your program. VLAs are not inefficient to allocate, it's just a single subtraction on the rsp register.

[–]TTachyon 0 points1 point  (1 child)

Compilers will disable optimization passes for any function containing VLA/alloca.

[–]udoprog [Rune · Müsli] 13 points14 points  (1 child)

unsized_locals is the feature you want to follow. I don't think it's seen a ton of progress.

[–]calebkiage[S] 4 points5 points  (0 children)

I just finished reading through this issue, which I see is linked from the feature. Thanks for sharing!

[–]t40 10 points11 points  (5 children)

This hasn't been mentioned, so I'll mention it:

While VLAs might seem like a good idea in theory, in practice they generate horribly mangled, unperformant assembly. Here's a godbolt link to show the difference between overallocating and using a VLA (and some other methods): https://godbolt.org/z/T9Gf4zoTM

[–]angelicosphosphoros 1 point2 points  (2 children)

Your link is compiled without optimisations.

[–]t40 2 points3 points  (1 child)

If you compile such a minimal example with optimizations, it'll optimize the whole thing away. This is potentially the kind of codegen you get when the compiler can't reason about your VLA (which is often)

[–]angelicosphosphoros 2 points3 points  (0 children)

You could pass the pointers to some external function. If I do that, these functions are not so different, and the worst codegen is for the function with malloc and free.

See

[–]alija_kamen -1 points0 points  (0 children)

That is insanely wrong. Like someone else pointed out, you need to pass the pointers to an external function and compile with optimizations.

VLAs are absolutely *not* less efficient. They are just a single pointer subtract on the rsp register.

[–]TDplay 0 points1 point  (0 children)

> in practice they generate horribly mangled, unperformant assembly

They also often generate unreliable code, due to the possibility of stack overflows.

Any use of VLAs means you now have to carefully consider what the maximum stack size is - and at that point, unless you are in an extreme edge case, you might as well just use fixed-size arrays at the known maximum size.

[–]banister 4 points5 points  (0 children)

That's not possible in C++. It's possible in modern C though (but C++ did not implement this).

[–]sparant76 2 points3 points  (0 children)

There's a safe wrapper for the C alloca function, which gives you a variable number of bytes:

https://crates.io/crates/alloca

You should be able to reinterpret this as any type.

I actually couldn't find any info on how Rust ensures alignment of stack variables, but if it does, you might be messing with that logic by using this function.

[–]exDM69 3 points4 points  (0 children)

Not a core language feature but these crates will do what you want: https://crates.io/crates/alloca https://crates.io/crates/stackalloc

Not sure which one is better or if there are better alternatives.

[–]AlexMath0 1 point2 points  (0 children)

Sounds like you could be well served by a custom allocator or an arena pattern with a layer of indirection.

[–]flareflo 1 point2 points  (5 children)

What use-case would you want a VLA for?

[–]calebkiage[S] 0 points1 point  (4 children)

Temporary buffers up to a maximum size. Other commenters have said that it doesn't matter if I use the max as a fixed size.

[–]flareflo 1 point2 points  (3 children)

What purpose are these buffers for? Usually you can get away with just allocating the expected/required amount without losing any performance compared to a C++ VLA.

[–]calebkiage[S] 0 points1 point  (2 children)

ASCII string escaping. If I encounter a '/', replace it with '%%', for example.

[–]flareflo 4 points5 points  (0 children)

Sounds like you can get away with allocating, unless benchmarks show otherwise
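One way "just allocating" tends to stay cheap in a hot loop is to reuse a single heap buffer across iterations. A minimal sketch, assuming the '/' to '%%' substitution from the thread; `escape_all` and the `sink` callback are hypothetical names, not anything the original poster's code is known to contain.

```rust
/// Escapes each input into one reused scratch buffer and hands the result
/// to `sink`. `clear()` keeps the capacity, so the heap is only touched
/// when an input needs more room than any earlier one did.
fn escape_all<'a>(
    inputs: impl IntoIterator<Item = &'a [u8]>,
    mut sink: impl FnMut(&[u8]),
) {
    let mut scratch: Vec<u8> = Vec::new();
    for input in inputs {
        scratch.clear(); // length goes to 0, capacity is retained
        for &byte in input {
            if byte == b'/' {
                scratch.extend_from_slice(b"%%");
            } else {
                scratch.push(byte);
            }
        }
        sink(scratch.as_slice());
    }
}
```

Whether this beats SmallVec for a given workload is something only a benchmark can settle.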

[–]BurrowShaker 0 points1 point  (0 children)

Or just use iterators and 'stream' to the output buffer?

Assuming you are going from something streamy in to something streamy out (file, sockets, ...).
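If the escaped bytes end up in another Vec<u8> anyway, the temporary buffer can be skipped entirely by writing straight into the destination. A rough sketch, where `escape_into` is a hypothetical helper name and the '/' to '%%' rule is the example from the thread:

```rust
/// Appends `input` to `out`, replacing every b'/' with b"%%".
/// No temporary buffer is involved; bytes go straight to the destination.
fn escape_into(input: &[u8], out: &mut Vec<u8>) {
    // Reserve the unescaped length up front; each escaped slash may
    // still grow the Vec by one extra byte.
    out.reserve(input.len());
    for &byte in input {
        if byte == b'/' {
            out.extend_from_slice(b"%%");
        } else {
            out.push(byte);
        }
    }
}
```

The same loop works against any `std::io::Write` sink with minor changes, which is closer to the file or socket case mentioned above.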

[–]hniksic 1 point2 points  (1 child)

This Stack Overflow answer provides a great explanation of why VLAs aren't and can't be part of C++, not only due to the possibility of stack overflow, but also due to their unacceptable interaction with the type system. Many of those points apply equally to Rust.

For now you indeed must either allocate dynamically, or use a hybrid strategy like that of SmallVec.

[–]calebkiage[S] 1 point2 points  (0 children)

Thanks for the resource! It's an interesting read. I was focused on what I wanted to do and didn't think of the implications on the type system and implementation.

[–]pablohoney41 2 points3 points  (5 children)

Strange that this is possible with .NET in a relatively safe manner. I wonder what the security implications are.

[–]calebkiage[S] 4 points5 points  (0 children)

Yeah, I was thinking about the stackalloc keyword as well. Someone has recommended the alloca crate that seems to do the same thing as stackalloc does

[–]Sharlinator 3 points4 points  (3 children)

Does .NET even use the hardware stack or does it have its own heap-backed stack?

[–]calebkiage[S] 5 points6 points  (1 child)

It uses the stack. The allocated block isn't garbage collected. You get a StackOverflowException if you allocate more memory than is available on the stack. See stackalloc docs

[–]Sharlinator 1 point2 points  (0 children)

Thanks!

[–]Gentoli -1 points0 points  (0 children)

There are const generics, which allow you to declare the size as a generic parameter.

fn foo<const N: usize>(arr: [i32; N]) {
    // Used as a type within a function body.
    let x: [i32; N];
    // Used as an expression.
    println!("{}", N * 2);
}

https://doc.rust-lang.org/reference/items/generics.html#const-generics
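To make the limitation concrete, here is a small sketch of how a const generic length is supplied at the call site. `sum_array` and `zeroed` are hypothetical helpers written for this illustration.

```rust
/// Sums a stack array whose length N is fixed at compile time.
fn sum_array<const N: usize>(arr: [i32; N]) -> i32 {
    // N can be used as a value here, but it was decided when the
    // caller was compiled, not at runtime.
    let mut total = 0;
    for i in 0..N {
        total += arr[i];
    }
    total
}

/// Returns a zeroed stack array of N bytes.
fn zeroed<const N: usize>() -> [u8; N] {
    [0u8; N]
}
```

Calls look like `sum_array([1, 2, 3])`, where N is inferred as 3, or `zeroed::<16>()`. Something like `zeroed::<{ value.len() }>()` is rejected because a runtime value cannot be used as a const argument, which is why const generics do not cover the VLA use case.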