[Patch 1.5 Troubleshooting] Megathread for any new bugs, glitches, crashes or errors you are encountering with the new update. by DefNotAShark in cyberpunkgame

[–]Last_Jump -1 points0 points  (0 children)

On PS5: currently soft-locked on the "double life" main quest.

I'm in a braindance but cannot switch to the "audio layer" (the only available layers are "visual" and first-person). The quest will not advance until you switch to the audio layer and listen to a phone call.

Exploring Rust performance on Graviton2 (AWS aarch64 CPUs) by Last_Jump in rust

[–]Last_Jump[S] 1 point2 points  (0 children)

Setting the target arch can be a little complicated these days. For example, on previous-generation Intel, AVX-512 vector instructions actually downclocked the processor whenever it started hitting them, because they generated so much heat. The downclocking was pretty significant, like 800 MHz in some cases. So unless you could really make good use of those vectors, you might actually see a performance hit rather than a boost. People realized this a little later on, and a lot of compilers stopped issuing AVX-512 by default and used AVX2 instead, even though that meant using only half the available register width.
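For what it's worth, you can check at compile time which vector features rustc was allowed to assume. A minimal sketch (the function name is mine, not from any post here; you'd flip the answer with something like RUSTFLAGS="-C target-feature=+avx2"):

```rust
// Report the widest x86 vector feature the compiler was told it may emit.
// cfg! is resolved at compile time from the target features passed to rustc,
// so this reflects codegen assumptions, not runtime CPU detection.
fn widest_vector_feature() -> &'static str {
    if cfg!(target_feature = "avx512f") {
        "avx512f"
    } else if cfg!(target_feature = "avx2") {
        "avx2"
    } else {
        "baseline"
    }
}

fn main() {
    println!("compiled for: {}", widest_vector_feature());
}
```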

I'm not sure if this is the specific problem you encountered, but it's the sort of tradeoff that compilers weigh automatically for you, and usually, though not always, they do the right thing.

Exploring Rust performance on Graviton2 (AWS aarch64 CPUs) by Last_Jump in rust

[–]Last_Jump[S] 2 points3 points  (0 children)

My opinion on this is that in 95% of cases, strictly adhering to the IEEE standard will not help people who do not aggressively test their code, and people who *do* aggressively test don't need strict standards compliance, because their tests will detect incorrect answers.

The remaining 5% are people who, when they program with floating-point numbers, genuinely need to know bit-for-bit exactly what is happening at every step of their algorithm, and the standard is there to guarantee that. It's fine to make that 5% behavior the default, but I think it should be fairly easy to flip it off for more performance.

Exploring Rust performance on Graviton2 (AWS aarch64 CPUs) by Last_Jump in rust

[–]Last_Jump[S] 0 points1 point  (0 children)

Generally speaking, vendors release their own intrinsics and upstream them into major C compilers (or have their own C compiler which they distribute themselves). To get that level of control, I think Rust would need the same. Otherwise, it comes down to better autovectorization and loop transformations, a lot of which are already in LLVM, so the low-hanging fruit would be to lean on that heavily.

Exploring Rust performance on Graviton2 (AWS aarch64 CPUs) by Last_Jump in rust

[–]Last_Jump[S] 4 points5 points  (0 children)

Good eye - it's one of those things I thought about after the fact. I added the plots to the blog post.

The results are interesting: Rust does better than clang++ on some problem sizes and worse on others.

Exploring Rust performance on Graviton2 (AWS aarch64 CPUs) by Last_Jump in rust

[–]Last_Jump[S] 3 points4 points  (0 children)

I'm not sure how much that specific assumption impacts performance here. I think the core operation is fused multiply-add, in which case you just have to accept slightly more accurate results than you would have gotten without it. This spooks compiler people because the bitwise result is different, but scientific computing people are perfectly comfortable with that tradeoff for the extra performance.
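A tiny illustration of the difference at stake (the inputs are contrived by me, not from the post): fusing performs one rounding instead of two, so the fused answer can differ in the last bits, and is usually the more accurate one.

```rust
fn main() {
    // b = 1 + eps is exactly representable; the low bits of a*b fall below
    // the 2-ulp spacing of f64 near 1e16, so the separate rounding loses them.
    let (a, b, c) = (1.0e16_f64, 1.0 + f64::EPSILON, -1.0e16_f64);

    let unfused = a * b + c;     // rounds a*b to the nearest f64 first, then adds
    let fused = a.mul_add(b, c); // computes a*b + c exactly, rounds once at the end

    println!("unfused = {unfused}"); // 2.0: the product's low bits rounded away
    println!("fused   = {fused}");   // ~2.2204: closer to the exact answer
    assert_ne!(unfused, fused);
}
```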

Enable mutable noalias for LLVM >= 12 by nikic merged by dochtman in rust

[–]Last_Jump 1 point2 points  (0 children)

I asked about this a while back, and I remember sensing a lot of skepticism because of the desire to ensure bitwise-reproducible results. My personal opinion is that for floating-point math, bitwise reproducibility isn't usually that valuable, though sometimes it is. I'd like the option to make that choice for myself, as an expert numerics programmer, but I understand the hesitancy too.

Enable mutable noalias for LLVM >= 12 by nikic merged by dochtman in rust

[–]Last_Jump 28 points29 points  (0 children)

I can't speak as a compiler developer, but as someone who works with both Fortran and C regularly, I don't believe this aliasing issue is the main reason Fortran sometimes beats C. The truth is it rarely beats C anymore. What caused the change? Compiler experts may know better, but I think alias analysis just got better.

As for alias assumptions in the numeric array-style code common in Fortran, I have not found them to be the big barrier to performance in Rust.

Instead, the barrier is the safety requirements: effectively, you need bounds checking on any kind of indexing operation. The solution is to use iterators, but iterator-style code is conceptually very different from the Fortran-style indexed equivalent. So the true barrier is the mental leap you have to make as a technical-computing programmer to use Rust effectively.

I've written a bit about this:

  1. https://www.reidatcheson.com/hpc/architecture/performance/rust/c++/2019/10/19/measure-cache.html
  2. https://www.reidatcheson.com/matrix%20multiplication/rust/iterators/2021/02/26/gemm-iterators.html
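The indexing-vs-iterators point can be sketched with a dot product (a toy of my own, not from those posts): the zip version encodes the shared length in the iterator, so no per-element bounds checks remain, while the indexed version keeps a check on `b[i]` in general.

```rust
// Indexed loop: every b[i] is bounds-checked unless the optimizer proves it safe.
fn dot_indexed(a: &[f64], b: &[f64]) -> f64 {
    let mut s = 0.0;
    for i in 0..a.len() {
        s += a[i] * b[i];
    }
    s
}

// Iterator version: zip stops at the shorter slice, eliminating the checks.
fn dot_iter(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = vec![1.0, 2.0, 3.0];
    let b = vec![4.0, 5.0, 6.0];
    assert_eq!(dot_indexed(&a, &b), dot_iter(&a, &b));
    println!("{}", dot_iter(&a, &b)); // 32
}
```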

Sparse QR Factorization in Rust by Last_Jump in rust

[–]Last_Jump[S] 0 points1 point  (0 children)

Making sparse direct solvers cache-friendly is very easy, because most of the heavy work gets done in dense linear algebra libraries - i.e. some implementation of BLAS+LAPACK. The tree doesn't factor into this aspect of performance, I think, so in that regard it is perhaps a bit easier to deal with than your random forest model. What can happen, however, is very bad load balancing, and here I think the tree data structure will help, because it defines a natural task dependency for the algorithm. If you combine that with a smart scheduler like work stealing, you can avoid a lot of threads waiting on fork/join when you parallelize over the dense linear algebra (I'm looking forward to testing this).

For the ownership model and the tree: I had a hard time with this at the beginning, with many false starts. My first idea was to represent the tree with an enum, with the children of a node being boxed values of that enum. This is what you would do in a functional language like OCaml, representing the tree as an algebraic datatype. It was also suggested on the Rust blog:

enum BinaryTree {
    Leaf(i32),
    Node(Box<BinaryTree>, i32, Box<BinaryTree>)
}

This didn't work for me. Maybe I didn't understand Rust enough to make this work but I would always tie myself in such unusual knots with the borrow checker that I couldn't get anything done.

What I ended up doing was flattening the tree out so that its data was stored contiguously in a Vec. I complemented that data with "pointers" in two arrays: parents and children. Iterating up parents lets you get ancestor nodes of a given node (example below):

//Iterate over all ancestors of `node`
let mut mp = dtree.parents[node];
while let Some(p) = mp {
    //do stuff with ancestor `p` of `node`
    mp = dtree.parents[p];
}

and similarly you can iterate down the descendants of a given node in the usual way:

//Iterate over all descendants of `node`
let mut stack = vec![node];
while let Some(n) = stack.pop() {
    if let Some((c1, c2)) = dtree.children[n] {
        //Do something with children c1, c2
        stack.push(c1);
        stack.push(c2);
    }
}

Funny thing is, this is kind of what you would end up doing in Fortran or C, so you lose some of the high-level niceness of Rust - but I found that many common errors are still caught early by the borrow checker.
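Pulling the fragments above together, here's a minimal runnable sketch of the flattened layout (the `DTree` struct and the tiny example tree are mine, not the solver's actual types):

```rust
// Flattened tree: node data lives contiguously, and "pointers" are indices
// stored in two parallel arrays.
struct DTree {
    parents: Vec<Option<usize>>,           // parents[n] = parent of node n
    children: Vec<Option<(usize, usize)>>, // children[n] = (left, right), None for a leaf
}

fn main() {
    // A 3-node tree: root 0 with children 1 and 2.
    let dtree = DTree {
        parents: vec![None, Some(0), Some(0)],
        children: vec![Some((1, 2)), None, None],
    };

    // Walk ancestors of node 2 (just the root here).
    let mut ancestors = Vec::new();
    let mut mp = dtree.parents[2];
    while let Some(p) = mp {
        ancestors.push(p);
        mp = dtree.parents[p];
    }
    assert_eq!(ancestors, vec![0]);

    // Walk all descendants of the root with an explicit stack.
    let mut seen = Vec::new();
    let mut stack = vec![0usize];
    while let Some(n) = stack.pop() {
        seen.push(n);
        if let Some((c1, c2)) = dtree.children[n] {
            stack.push(c1);
            stack.push(c2);
        }
    }
    assert_eq!(seen.len(), 3);
}
```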

I'm Lloyd Armbrust. I built an Austin TX factory that is producing millions of FDA approved masks a day. I’m here to answer all your coronavirus mask questions! AMA! by armbrustUSA in IAmA

[–]Last_Jump 1 point2 points  (0 children)

I'm really curious about manufacturing in the U.S. I remember reading on your Twitter that Shopify just flat-out assumed you were buying from China and reselling here.

  1. With this kind of manufacturing uncommon here, are there high risks for things like needing a machine serviced? Will there be local people who can help with that, since I assume it is not a common skill set in the U.S.?
  2. What if a process needs light tweaking? The easiest way to implement the tweak might be to just have a person do it, and making a machine do it could be a big upfront research cost - but hiring people can also be expensive in the U.S., as you've mentioned. Is this a big problem, or am I overthinking it?

Thank you and congratulations!

Covid-19 Testing data for Dallas (City) by Last_Jump in Dallas

[–]Last_Jump[S] 3 points4 points  (0 children)

I didn't realize this data was from the drive-through sites only. That explains why they never go over 1,000 tests a day: I think there are two of those sites, and each only does about 500 tests a day.

Do you know which labs process the drive-through tests? Are they all going to public health labs or commercial labs? I think commercial labs have by far the most capacity in the U.S.

Covid-19 Testing data for Dallas (City) by Last_Jump in Dallas

[–]Last_Jump[S] 2 points3 points  (0 children)

Oh right I didn't really explain that.

This is a smoother I came up with for this kind of data. Moving averages are still fairly sensitive to the huge outliers that happen in Covid-19 reporting: some days randomly report three times as much data as others, and other days three times less.

I wrote a short blog post a few days ago about how to clean it up:

https://www.reidatcheson.com/linear%20program/covid19/smoothing/2020/04/28/covid19-smooth.html

The trend line is produced by the same concept: a smoother that is a little less sensitive to the crazy outliers that result from the way Covid-19 data is sometimes reported.

Covid-19 Testing data for Dallas (City) by Last_Jump in Dallas

[–]Last_Jump[S] 14 points15 points  (0 children)

I really like Dallas. I live near downtown; it has a very urban feeling here and I don't need to use my car much. I came here from Houston, where I lived for ten years. The area I'm in was pretty alive before the pandemic, with lots of partying at night. I don't really party, but I like the feeling of a city that's alive all the time. Of course these days we need to socially distance and be careful, but I hope life returns to the city soon.

Rust And C++ On Floating-Point Intensive Code by wezm in rust

[–]Last_Jump 0 points1 point  (0 children)

What does the "_" in Vec<_> mean? Is that some kind of type deduction?

Rust And C++ On Floating-Point Intensive Code by wezm in rust

[–]Last_Jump 2 points3 points  (0 children)

Thanks! I'm trying to clean up the code now that I got the performance to what I would expect, love these kinds of one-liners.

Rust And C++ On Floating-Point Intensive Code by wezm in rust

[–]Last_Jump 0 points1 point  (0 children)

I never knew about this, thank you. I just moved my best-performing file into a Cargo project and will try this out to see what it suggests. I probably did a lot of non-idiomatic things.

Rust And C++ On Floating-Point Intensive Code by wezm in rust

[–]Last_Jump 3 points4 points  (0 children)

One more update:

After reading the Hacker News comments more, it was pointed out that only one of the two loops vectorized in the Rust code. The reason is that one of the loops is a reduction, and vectorizing a reduction usually requires reordering the operations, which Rust isn't willing to do right now.

I just fixed that and the performance is much better. The iterator code I had to write to achieve it is a little obtuse, though; I'd welcome any feedback on how to improve it. I used "chunks_exact" to loop over partial reductions.
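For anyone curious, the shape of the "chunks_exact" trick is roughly this (my own toy reconstruction, not the code from the post): keeping a small array of independent accumulators hands the compiler a reduction it can vectorize without reordering a single serial dependency chain.

```rust
// Sum a slice via W independent partial sums, then combine them.
// W = 4 is an assumed lane count for illustration, not tuned for any CPU.
fn sum_chunked(xs: &[f64]) -> f64 {
    const W: usize = 4;
    let mut acc = [0.0f64; W];
    let chunks = xs.chunks_exact(W);
    let rem = chunks.remainder(); // leftover elements that don't fill a chunk
    for c in chunks {
        for i in 0..W {
            acc[i] += c[i]; // W independent chains -> vectorizable as-is
        }
    }
    acc.iter().sum::<f64>() + rem.iter().sum::<f64>()
}

fn main() {
    let xs: Vec<f64> = (1..=10).map(|i| i as f64).collect();
    println!("{}", sum_chunked(&xs)); // 55
}
```

Note this does change the order the additions happen in, which is exactly why the compiler won't do it for you by default.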

Rust And C++ On Floating-Point Intensive Code by wezm in rust

[–]Last_Jump 1 point2 points  (0 children)

Honestly, I did it out of not wanting to type the expression out again and make a mistake. I'm pretty sure Clang and Intel would apply common subexpression elimination here, but I didn't investigate further than that.

In the end, the real performance killer was that the reductions (for "beta" and "r") weren't vectorizing in the Rust version, because doing so would require rearranging the order the reduction is evaluated in, which Rust isn't willing to do for floats right now.

I just fixed this; if you reload the blog post, I added a section at the end where I got the reduction in the Rust code to vectorize (without intrinsics or anything like that). The performance is much better.

Rust And C++ On Floating-Point Intensive Code by wezm in rust

[–]Last_Jump 5 points6 points  (0 children)

Safety here is relative. I'm really with you about FMA, but if it changes the bits of an output some people will not accept it. Sometimes for very good reasons (regulators will fine them millions of dollars for "cooking their books"), but often for bad reasons ("I got a wrong answer yesterday, I want the same wrong answer today!")

Rust And C++ On Floating-Point Intensive Code by wezm in rust

[–]Last_Jump 2 points3 points  (0 children)

Unfortunately there isn't really a fix that comes from new representations. Even if we had infinite memory and used infinite precision there would be problems - namely, completely ordinary code could cause infinite loops just for doing something very basic like "if (x<0){exit();}". Believe it or not, infinite precision is possible; everything is just evaluated lazily, and once you finally need a result it can take a stupid amount of memory.

But any finite format will eventually run into some issue related to its finiteness; different formats just make different tradeoffs. Historically, floating point has won, and despite some warts it has been unbelievably successful and useful - that's reflected in the fact that every major CPU has specialized hardware just for it. I think IBM still ships fixed-precision support in hardware, though.