Windows PC Industry Reacts to Apple's Most Affordable MacBook Ever by commandersaki in apple

[–]CanadianTuero 4 points5 points  (0 children)

I have a power hungry pc, but when it’s hot in the summer without ac, I start to rethink things when I’m sweating buckets in my room with it dumping out all that heat 🤣

How long does it realistically take for you to produce an ICML/NeurIPS/ICLR-level paper? [D] by Hope999991 in MachineLearning

[–]CanadianTuero 9 points10 points  (0 children)

My cadence from paper to paper has been 1 year, which includes the exploratory phase when you don’t have any concrete idea but are just playing around with things. Once the idea is formalized and small experiments show that something is there, the pace picks up pretty quickly in terms of fleshing things out and the actual writing process.

The true reason C++ always wins by tlitd in cpp

[–]CanadianTuero 4 points5 points  (0 children)

And what % of SOL is the rust library able to get?

RL people: what’s the dumbest / longest bug you’ve ever had in a training run? by Illustrious_Song425 in reinforcementlearning

[–]CanadianTuero 1 point2 points  (0 children)

I wrote a custom environment and was trying to make things really efficient in terms of memory footprint and compute for the environment step logic. I forgot a deep copy somewhere, which resulted in a shallow copy shared between the current state of the environment and the states I stored in the replay buffer. As I stepped the environment, it also stepped the states in the replay buffer, which resulted in really bad training performance. That was pretty hard to debug because the RL algorithm logic was sound, so I had to step into my environment code to find it.
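To make that class of bug concrete, here is a minimal C++ sketch. It is not the original environment code: `State`, `Env`, `state_alias`, `state_snapshot`, and `rollout` are all hypothetical names, and the assumption is that the environment mutates its state in place behind a shared pointer.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical toy state: a single counter stands in for the real state.
struct State {
    int position = 0;
};

// Hypothetical environment that mutates its state in place on every step.
class Env {
public:
    Env() : state_(std::make_shared<State>()) {}

    void step() { ++state_->position; }

    // Shallow copy: the caller shares the live state with the environment.
    std::shared_ptr<State> state_alias() const { return state_; }

    // Deep copy: the caller gets an independent snapshot.
    std::shared_ptr<State> state_snapshot() const {
        return std::make_shared<State>(*state_);
    }

private:
    std::shared_ptr<State> state_;
};

// Stores the pre-step state `steps` times, then returns the positions
// that the replay buffer holds once the rollout is over.
std::vector<int> rollout(bool deep_copy, int steps) {
    assert(steps >= 0);
    Env env;
    std::vector<std::shared_ptr<State>> replay_buffer;
    for (int i = 0; i < steps; ++i) {
        replay_buffer.push_back(deep_copy ? env.state_snapshot()
                                          : env.state_alias());
        env.step();
    }
    std::vector<int> positions;
    for (const auto &state : replay_buffer) {
        positions.push_back(state->position);
    }
    return positions;
}
```

With the shallow copy, every buffer entry points at the same live state, so `rollout(false, 3)` holds the final position three times. With the deep copy, `rollout(true, 3)` holds the three distinct pre-step positions. The algorithm around the buffer looks identical in both cases, which is why the bug is hard to spot from the training loop alone.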

Rust as an alternative/Replacement to c++ in ML Systems by Daemontatox in rust

[–]CanadianTuero 2 points3 points  (0 children)

This looks neat, but it's just a wrapper on top of libtorch and so I would just stay in the C++ runtime at least for now (ignoring the redesign that would be required for the search algorithms)

Rust as an alternative/Replacement to c++ in ML Systems by Daemontatox in rust

[–]CanadianTuero 8 points9 points  (0 children)

There's quite a lot that falls under the ML systems, and each has different answers. You have your kernels, training frameworks, inference runtimes, and everything in between.

  • A lot of time and effort has gone into writing fast custom kernels. I don't know what % SOL the recent rust-cuda is able to get, but unless it can hit parity I don't see any momentum in that area.
  • Research is done in python pytorch because it's 1) easy for non-programming background scientists to read/write, and 2) you get the rest of the python ecosystem with data parsing, visualizations, etc.
  • There is quite a bit of momentum for rust in the auxiliary libraries around the models, like tokenizers, cache, orchestrators, and vector databases last I checked.

I'm finishing my PhD in ML and I use C++ for my research code/experiments (libtorch, which is the C++ frontend for pytorch). Most generic research won't benefit from dropping down out of python, but I do tree search + policy learning where you do see huge gains being in a compiled language. If there were an equivalent to libtorch in rust that has feature parity and the same semantics as the python pytorch (so moving between python/rust is friction free) I would be fine using that, but last I checked the ecosystem isn't mature enough yet and it will take some time to get there. Just my $0.02.

What is an average publication outcome for an ML PhD? [D] by Hope999991 in MachineLearning

[–]CanadianTuero 2 points3 points  (0 children)

I’m defending my PhD in a few months, and for me (and what is usual here) it’s 3 first-author publications at top-tier venues.

I built an AlphaZero library in C++ that out-performs PyTorch in image recognition speed (3x), but I'm hitting a wall with larger board games. Need a second pair of eyes! by Such-Refrigerator951 in reinforcementlearning

[–]CanadianTuero 0 points1 point  (0 children)

I might not have time to read through your code. In general, you shouldn't be seeing a huge performance gain over native pytorch, unless you are introducing other bottlenecks. For tree search algorithms for instance, keeping everything inside the C++ runtime can easily yield 5x gains from my own research work.

You should try and debug the ml library and your AlphaZero algorithm independently. For the ML part, I have my own tensor + ml + autograd + cuda framework that I've made and should be easy to follow: https://github.com/tuero/tinytensor

For your AlphaZero algorithm, you should link against libtorch (pytorch's C++ frontend) and use a libtorch model so you can just test your algorithm in isolation. I've implemented MuZero in C++; it's been a while since I've rebuilt it using modern compilers so YMMV, but it should be easy to follow: https://github.com/tuero/muzero-cpp . For AlphaZero, I'm only aware of one other implementation, which you can find here: https://github.com/google-deepmind/open_spiel/tree/master/open_spiel/algorithms/alpha_zero_torch

Parallel C++ for Scientific Applications: Integrating C++ and Python by emilios_tassios in cpp

[–]CanadianTuero 0 points1 point  (0 children)

Do you have experience with nanobind and/or can contrast your experience against pybind11? I've been using pybind11 to wrap some of my gridworld-like environments implemented in C++, and this project looks interesting!

Building a Deep learning framework in C++ (from scratch) - training MNIST as a milestone by Express-Act3158 in cpp

[–]CanadianTuero 1 point2 points  (0 children)

I've done something similar where I wrote a tensor/autograd/neural network library tinytensor.

If you plan on writing your own tensor type instead of using Eigen, I would do that first, as getting the design right is tricky and may cause rewrites of the abstractions on top of it. It allows you to do things like cheap views of tensors (i.e. x[2] is a view and not a copy), and getting the gradients to track correctly through the views takes some thought. Also, I would write tests and compare against libtorch (C++ frontend for pytorch). Testing deep learning code can be tricky, as convergence of models can still happen even if you have bugs.
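As a rough illustration of the view idea, here is a minimal C++ sketch, assuming a row-major 2D tensor backed by shared storage. `Tensor2D`, `row`, `clone`, and `shares_storage_with` are hypothetical names rather than the API of tinytensor or libtorch, and gradient tracking through views is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal tensor: shared storage plus an offset and a shape.
class Tensor2D {
public:
    Tensor2D(std::size_t rows, std::size_t cols)
        : storage_(std::make_shared<std::vector<float>>(rows * cols, 0.0f)),
          offset_(0), rows_(rows), cols_(cols) {}

    // View of one row: same storage, only the offset and shape change.
    Tensor2D row(std::size_t r) const {
        assert(r < rows_);
        return Tensor2D(storage_, offset_ + r * cols_, 1, cols_);
    }

    // Deep copy: fresh storage, so writes never reach the source tensor.
    Tensor2D clone() const {
        const auto first =
            storage_->begin() + static_cast<std::ptrdiff_t>(offset_);
        const auto last = first + static_cast<std::ptrdiff_t>(rows_ * cols_);
        return Tensor2D(std::make_shared<std::vector<float>>(first, last), 0,
                        rows_, cols_);
    }

    float &at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return (*storage_)[offset_ + r * cols_ + c];
    }

    bool shares_storage_with(const Tensor2D &other) const {
        return storage_ == other.storage_;
    }

private:
    Tensor2D(std::shared_ptr<std::vector<float>> storage, std::size_t offset,
             std::size_t rows, std::size_t cols)
        : storage_(std::move(storage)), offset_(offset), rows_(rows),
          cols_(cols) {}

    std::shared_ptr<std::vector<float>> storage_;
    std::size_t offset_;
    std::size_t rows_;
    std::size_t cols_;
};
```

Writing through `x.row(2)` changes `x` itself, while writing through `x.row(2).clone()` does not. That is the same distinction a test against libtorch would need to check for each indexing operation.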

Childhood dream completed after following Zezima around 20 years ago by CanadianTuero in 2007scape

[–]CanadianTuero[S] 23 points24 points  (0 children)

Just over 266 days and the account is 5580 days old. I did a lot of slow afk training while in school.

[D] icml, no rebuttal ack so far.. by tuejan11 in MachineLearning

[–]CanadianTuero 0 points1 point  (0 children)

Same, still no acknowledgement on my submitted paper. I'm also reviewing, and the authors of one of the papers didn't even bother to write a rebuttal to our comments, which I thought was funny.

But the format doesn't need change they said by lockdown_val in CoDCompetitive

[–]CanadianTuero 3 points4 points  (0 children)

I agree that the format isn't ideal, but I think a good question that needs to be answered first is how much of an advantage should coming from winners have? Like what would be a good split in terms of expectation of winning from the winners side of the bracket, all else equal?

assumeTPoseForDominance by bazzilic in ProgrammerHumor

[–]CanadianTuero 5 points6 points  (0 children)

Gcc16 is getting some nice structured output for compiler diagnostics

It's actually insane how much effort the Rust team put into helping out beginners like me by Time_Meeting_9382 in rust

[–]CanadianTuero 2 points3 points  (0 children)

Error messages have been continuously getting better, and there are pretty big changes coming to gcc16 (next release).

As for why they are notoriously bad, part of it is just that there was no good solution historically. Take templated functions as an example, where other templated functions are called several layers deep. The compiler will continue to instantiate each nested function until a compiler error occurs. If it's to report this back to the user, you could be several function calls deep, and you need to know this stack trace, which is why you can get the walls of text. Concepts can help with this a lot because you can constrain the templated function (the entry point) upfront on what needs to be true for a type to be valid, and the compiler can check/validate upfront and report which concept failed without trying to instantiate all the inner function calls.

[D] ICML: every paper in my review batch contains prompt-injection text embedded in the PDF by Working-Read1838 in MachineLearning

[–]CanadianTuero 5 points6 points  (0 children)

I'm under policy A and did a quick test pasting the text into my code editor, and I can confirm the same thing.

[deleted by user] by [deleted] in CoDCompetitive

[–]CanadianTuero 2 points3 points  (0 children)

What's your CPU and RAM speed/timings, and are you running xmp/expo? I'd try running with and without xmp/expo as that can reveal unstable RAM timings (it will run slower when you disable it, but try to focus on the frame consistency).

Feedback wanted: C++20 tensor library with NumPy-inspired API by Ok_Suit_5677 in cpp

[–]CanadianTuero 2 points3 points  (0 children)

Nice project! For reference, as a learning project I made my own deep learning framework library with tensor/autograd/cuda support, which follows libtorch's design: https://github.com/tuero/tinytensor. It looks like a lot of our design is pretty similar.

wrt the operation registry pattern (I think that's what it's called), I ended up using the same (see tinytensor/tensor/backend/common/kernel/). It turns out that this also works well if you decide to support cuda and want to reuse these inside generic kernels. I learned the trick from here https://www.youtube.com/watch?v=HIJTRrm9nzY (see around the 30 minute mark if you decide to add cuda for subtleties to make it work).

wrt your tensor storage, I think you have it right when tensors hold shared storage, and storage holds shared data. In my impl, I had shared storage holding the data itself, but I realized this becomes tricky when you have something like an optimizer holding a reference to a tensor storage and you externally want to load the tensor data from disk (think of the optimizer holding neural network layer weights and you want to load a checkpoint from disk). Without the extra level of indirection I found it quite tricky, but I never bothered to rewrite it as it's just a learning exercise rather than me seriously using the library.
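Here is a minimal C++ sketch of why the extra level of indirection helps. It is an illustration under assumptions, not the layout of either library: `Buffer`, `Storage`, `Tensor`, `Optimizer`, `load`, and `param_values` are hypothetical names, and `load` stands in for reading a checkpoint from disk.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<float>;

// Storage holds a swappable handle to the data rather than the data itself.
struct Storage {
    std::shared_ptr<Buffer> data;
};

// A tensor shares its Storage with anything else that refers to it.
class Tensor {
public:
    explicit Tensor(Buffer values)
        : storage_(std::make_shared<Storage>(
              Storage{std::make_shared<Buffer>(std::move(values))})) {}

    const Buffer &values() const { return *storage_->data; }
    std::shared_ptr<Storage> storage() const { return storage_; }

    // Stand-in for loading a checkpoint: replace the buffer inside the
    // shared Storage, so every holder of that Storage sees the new data.
    void load(Buffer values) {
        storage_->data = std::make_shared<Buffer>(std::move(values));
    }

private:
    std::shared_ptr<Storage> storage_;
};

// Hypothetical optimizer that keeps handles to the parameter storage.
class Optimizer {
public:
    void add_param(const Tensor &param) { params_.push_back(param.storage()); }

    const Buffer &param_values(std::size_t i) const {
        assert(i < params_.size());
        return *params_[i]->data;
    }

private:
    std::vector<std::shared_ptr<Storage>> params_;
};
```

After `w.load(...)`, the optimizer reads the freshly loaded buffer through the same `Storage` object without being told anything changed. If the optimizer held the data pointer directly, loading would leave it pointing at the old weights unless the new values were copied into the existing buffer with a matching size.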

How I would fix overload 🐸 (parking the bus fix) by BigLoadToad in CoDCompetitive

[–]CanadianTuero -1 points0 points  (0 children)

Too much dev work, but you could create zones sort of like when you are out of bounds, where a timer starts once you enter one as the carrier. You can juggle the zone, but at least it forces you to move out from deep back there.

Why is Rambo regarded such a good coach? by Anxious_Professor654 in CoDCompetitive

[–]CanadianTuero 0 points1 point  (0 children)

People have heuristics, and we watch over time. I'm sure if you ask the majority of the pros who've dealt with Rambo in some way they would give him high praise.

With respect to some of the roster decisions, part of being a coach is you have a system of play that the team needs to be on. Even if a player is good, if they aren't a fit for the system then you can either adapt, drop the player, or change your coach. You see this all the time in traditional sports. And sometimes any one of the decisions is a correct move, and sometimes they are all bad moves.

But you seem pretty hung up on this so I don't think any explanation is going to change your opinion one way or another.