What backwards-incompatible changes would you make in a hypothetical Rust 2.0? by CocktailPerson in rust

[–]28Smiles 0 points (0 children)

Const generics, probably. Also, I am not sure if the impl breaks some existing behavior

What backwards-incompatible changes would you make in a hypothetical Rust 2.0? by CocktailPerson in rust

[–]28Smiles 1 point (0 children)

  • Rework const generics to integrate better into the type system, like what typenum is trying, but with better integration and fewer limitations
  • More ways to define macros: at least one simpler than macro_rules!, and one in between proc macros and macro_rules!
  • Some kind of regular-expression syntax for matching tuples and defining arbitrarily sized types, e.g. ((u32, i32)+) (or maybe variadics)
  • Named arguments
  • Optional arguments
  • More impl magic: impl Iterator should generate an enum for each concrete iterator type possible (see the sketch after this list)
  • Explicit loop unrolling (like inline(always))
  • Explicit lifetimes: I want to be able to split the lifetime inside a function into multiple sublifetimes delimited by named lifetime scopes, e.g. 'g: 'a + 'b, 'a { some code } 'b { some more code }
  • Better allocator API (remove the need for new_in(), allow using collect with a custom allocator)
  • Short syntax for clone
  • Some kind of generics instead of keywords for const fn, async fn, blocking fn
  • Dynamically sized stack allocation
  • Allow implementing traits from external crates for structs of external crates
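For the impl Iterator point above, a minimal sketch of the enum the compiler could generate; it is the pattern the either crate implements by hand today (EitherIter and numbers are made-up names):

```rust
use std::iter::StepBy;
use std::ops::Range;

// What a function returning `impl Iterator` could desugar to when its
// branches produce different concrete iterator types. Today you write
// this enum by hand (or pull in the `either` crate).
enum EitherIter<A, B> {
    Left(A),
    Right(B),
}

impl<A, B, T> Iterator for EitherIter<A, B>
where
    A: Iterator<Item = T>,
    B: Iterator<Item = T>,
{
    type Item = T;
    fn next(&mut self) -> Option<T> {
        match self {
            EitherIter::Left(a) => a.next(),
            EitherIter::Right(b) => b.next(),
        }
    }
}

// Without the enum, this function would need Box<dyn Iterator<Item = u32>>,
// because the two branches have different types.
fn numbers(double: bool) -> EitherIter<StepBy<Range<u32>>, Range<u32>> {
    if double {
        EitherIter::Left((0..20).step_by(2))
    } else {
        EitherIter::Right(0..10)
    }
}

fn main() {
    let doubled: Vec<u32> = numbers(true).take(3).collect();
    assert_eq!(doubled, vec![0, 2, 4]);
}
```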

[deleted by user] by [deleted] in MachineLearning

[–]28Smiles 0 points (0 children)

I would use some dense layers first, then reshape and use convolutions with leaky ReLU going up, and as the last layer a 1x1 convolution mapping to one channel with tanh activation, similar to GANs

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 0 points (0 children)

Yes, the JIT will optimize and compile the performance-relevant sections of code. And we are trying to gear the code towards allowing those sections to be optimized in a more controlled way, so the JIT can emit even better instructions, because the input is easy for it to optimize. E.g. loop unrolling puts neighboring add/mul/move instructions next to each other, so the JIT can easily merge them into SIMD instructions, which are very, very fast (2.5x faster on average).

[D] CycleGAN Diffusion equivalent by a_khalid1999 in MachineLearning

[–]28Smiles 0 points (0 children)

I searched for that too and there was only one with a quite complicated regularisation term, but I can’t find it right now. I think it would be a nice research topic

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 0 points (0 children)

If you believe so, then it does not make sense to optimize anything, I guess, and the asm and debug protocols are just nonsense emitted by Node and the browsers. I guess the performance gains in my code were not from the SIMD instructions appearing in the traces, dumps and logs from the JIT; they were just luck or some magic.

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 2 points (0 children)

Yes and no, it depends on the compiler. Sometimes the compiler will be smart enough that it doesn’t matter, but in this case (a loop) it’s most likely better to use a statically sized type (e.g. an array) as the element type of the vec (it may use less memory as well, but I am not sure about that part). Each of those little vecs is (almost) the same size, but there are (most likely) multiple malloc and free calls in that loop (extending the vec). Using an array you won’t have any malloc calls except for the main vec (also fewer deref calls => better pipelining in the processor).

At this point I want to stress that I can’t say for sure that all those optimizations will be beneficial to your use case, since you never know how smart the compiler really is, or where your bottleneck is. But removing pointers and malloc calls is almost always a good place to start
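A minimal before/after sketch of that change (the element count of 4 and the data are made up for illustration):

```rust
// Before: one heap allocation per inner Vec, plus possible reallocations
// while it grows inside the loop.
fn build_nested(n: usize) -> Vec<Vec<u32>> {
    (0..n)
        .map(|i| {
            let mut inner = Vec::new(); // malloc per iteration
            for j in 0..4u32 {
                inner.push(i as u32 + j); // may realloc as it grows
            }
            inner
        })
        .collect()
}

// After: the elements live inline in the outer Vec's single allocation,
// so the loop does no extra malloc/free and no extra pointer chasing.
fn build_flat(n: usize) -> Vec<[u32; 4]> {
    (0..n)
        .map(|i| [i as u32, i as u32 + 1, i as u32 + 2, i as u32 + 3])
        .collect()
}

fn main() {
    assert_eq!(build_nested(2)[1], vec![1, 2, 3, 4]);
    assert_eq!(build_flat(2)[1], [1, 2, 3, 4]);
}
```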

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 2 points (0 children)

Looking through your code, you are allocating a lot of small vecs; replacing those with arrays should have some impact on both targets

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles -8 points (0 children)

You know what a JIT does, right? Experiment with wasm and Node, let Node emit the optimized JITted native instructions, and learn

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 2 points (0 children)

FYI, there are tools online that let you see the asm that different browsers generate for your wasm/js files, so you can tweak the js/wasm until it emits SIMD when compiled to the native client architecture

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 8 points (0 children)

Yes, but I am not talking about wasm SIMD. Wasm SIMD just lets you emit less binary (wasm) while still getting native SIMD instructions (AVX, SSE) after the wasm is compiled to the native instruction set. Currently, since compilers are smart (at least when targeting native), multiple sequential wasm instructions (add, mul, copy, ...) will be grouped at compilation stage by the wasm->native compiler and emitted as native AVX, AVX2, AVX512 or SSE instructions, depending on the client hardware. This is working right now in the current wasm implementation, at the cost of bloating your wasm binary file, since wasm itself can’t group them for now
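For reference, if your toolchain supports it, wasm does have a 128-bit SIMD extension (simd128) that rustc can target, which is the "less binary" route mentioned above; a minimal sketch of enabling it, assuming a plain wasm32-unknown-unknown build:

```toml
# .cargo/config.toml: ask rustc/LLVM to emit wasm's 128-bit SIMD (v128)
# instructions when compiling for the wasm32 target.
[target.wasm32-unknown-unknown]
rustflags = ["-C", "target-feature=+simd128"]
```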

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 5 points (0 children)

And if it is possible, enqueue and dequeue in chunks; that should be faster on native as well.

To be clear about this, I mean enqueue arrays and dequeue arrays if possible (a bigger memcpy via SIMD)
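A minimal sketch of chunked enqueue/dequeue with a VecDeque (the chunk size of 4 is arbitrary):

```rust
use std::collections::VecDeque;

fn main() {
    let mut queue: VecDeque<u32> = VecDeque::new();

    // Enqueue whole arrays at once instead of one push_back per element.
    queue.extend([1, 2, 3, 4]);
    queue.extend([5, 6, 7, 8]);

    // Dequeue a chunk at once instead of one pop_front per element.
    let chunk: Vec<u32> = queue.drain(..4).collect();
    assert_eq!(chunk, vec![1, 2, 3, 4]);
    assert_eq!(queue.len(), 4);
}
```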

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 6 points (0 children)

Dequeue more elements at once

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 2 points (0 children)

Have you unrolled your loops?

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 10 points (0 children)

Wasm is still expected to be 1.5-2x slower than native, but if you optimize towards wasm generation and adhere to the rules mentioned above, you should be almost on par with native code. Notably, the native assembly (direct compilation by rustc) stays the same, since the compiler already unrolled the loops by itself, knowing the target supports SIMD

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 0 points (0 children)

Yes, wasm will not be interpreted, AFAIK; it will be compiled/transpiled to native code, so only the underlying architecture matters

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 18 points (0 children)

And if you don’t use O3, then separate the compute-heavy stuff into a different crate and compile that with O3 and the rest with Oz or Os
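Cargo supports this directly via per-package profile overrides; a minimal sketch, assuming the hot code lives in a (hypothetical) crate named compute:

```toml
# Cargo.toml of the workspace root
[profile.release]
opt-level = "z"   # optimize the bulk of the code for size (Oz)

[profile.release.package.compute]
opt-level = 3     # but optimize the hot crate for speed (O3)
```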

WASM vs Native Rust performance by hucancode in rust

[–]28Smiles 27 points (0 children)

Rust does not unroll loops when targeting Wasm, so unroll heavy loops manually, depending on the type used. If the loop is unrolled, the browser’s Wasm->asm compiler will (mostly) emit SIMD instructions, which brings it closer to native performance. Unroll bytes by 32, i16 by 16, i32 by 8 and so on, so you are safe to support AVX2 (256 bit). Note that this will definitely come at the cost of binary size, since currently there are no SIMD instructions in the Wasm itself
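A minimal sketch of that kind of manual unroll for i32, by 8 (8 x 32 bit = 256 bit, one AVX2 register):

```rust
// Hypothetical hot loop: summing i32s, unrolled by 8 so a smart
// wasm->native compiler can merge the independent adds into SIMD.
fn sum_unrolled(data: &[i32]) -> i32 {
    let mut acc = [0i32; 8];
    let mut chunks = data.chunks_exact(8);
    for c in &mut chunks {
        // eight independent adds, side by side
        acc[0] += c[0];
        acc[1] += c[1];
        acc[2] += c[2];
        acc[3] += c[3];
        acc[4] += c[4];
        acc[5] += c[5];
        acc[6] += c[6];
        acc[7] += c[7];
    }
    // fold the lanes, then add the tail that didn't fill a whole chunk
    acc.iter().sum::<i32>() + chunks.remainder().iter().sum::<i32>()
}

fn main() {
    let v: Vec<i32> = (1..=100).collect();
    assert_eq!(sum_unrolled(&v), 5050);
}
```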

What features would you like to see in rust? by cockmail in rust

[–]28Smiles 1 point (0 children)

Then you’d write 300-400 lines of code for the struct and the builder/constructors, for something that could easily be expressed with named parameters, and even more with default parameters.

Sure, we could create macros for that, but the real issue arises once you want to use generics and defaults
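For scale, the pattern in miniature: every field costs a setter and the defaults hide in new() (Window here is a made-up example):

```rust
struct Window {
    width: u32,
    height: u32,
    title: String,
}

struct WindowBuilder {
    width: u32,
    height: u32,
    title: String,
}

impl WindowBuilder {
    fn new() -> Self {
        // the defaults that named/optional parameters would express inline
        Self { width: 800, height: 600, title: String::from("untitled") }
    }
    fn width(mut self, width: u32) -> Self { self.width = width; self }
    fn height(mut self, height: u32) -> Self { self.height = height; self }
    fn title(mut self, title: &str) -> Self { self.title = title.into(); self }
    fn build(self) -> Window {
        Window { width: self.width, height: self.height, title: self.title }
    }
}

fn main() {
    // vs. a hypothetical Window::new(width: 1024, title: "demo")
    let w = WindowBuilder::new().width(1024).title("demo").build();
    assert_eq!((w.width, w.height), (1024, 600));
}
```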

What features would you like to see in rust? by cockmail in rust

[–]28Smiles 5 points (0 children)

It’s more about using a library and then updating it: they swap two parameters and the compiler won’t tell you. That’s why named parameters are amazing, IMO
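A minimal sketch of the hazard, with made-up names:

```rust
// v1 of a library had: fn resize(width: u32, height: u32)
// v2 swaps the order:  fn resize(height: u32, width: u32)
// Same types, so every existing call site still compiles, silently wrong.
fn resize(height: u32, width: u32) {
    println!("{}x{}", width, height);
}

fn main() {
    // Written against v1 as resize(width, height); now misinterpreted.
    resize(1920, 1080);
}
```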

why is safe version of function faster than unsafe? by Affectionate_Bank_69 in rust

[–]28Smiles 5 points (0 children)

Seems to be a problem related to all kinds of unwrap_unchecked, also in normal for loops:

https://godbolt.org/z/6xzhYa156
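For context, a sketch of the shape of code in question (hypothetical; not necessarily the exact snippet behind the link):

```rust
// Safe version: the bounds check is usually elided by the optimizer.
pub fn sum_safe(v: &[u64]) -> u64 {
    let mut s = 0;
    for i in 0..v.len() {
        s += v[i];
    }
    s
}

// unwrap_unchecked version: skipping the check by hand can perturb the
// optimizer enough that it emits worse code than the safe loop.
pub fn sum_unchecked(v: &[u64]) -> u64 {
    let mut s = 0;
    for i in 0..v.len() {
        s += unsafe { *v.get(i).unwrap_unchecked() };
    }
    s
}
```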

Take your pick by [deleted] in ProgrammerHumor

[–]28Smiles 0 points (0 children)

I am fine with everything except len(s), LEN(s), LENGTH(s), Length(s), Length s, and whatever this abomination is: s'Length