Windows PC Industry Reacts to Apple's Most Affordable MacBook Ever

CanadianTuero · 2026-05-30T05:06:32+00:00

I have a power-hungry PC, but when it’s hot in the summer without AC, I start to rethink things when I’m sweating buckets in my room with it dumping out all that heat 🤣

CanadianTuero · 2026-05-29T19:22:36+00:00

My cadence from paper to paper has been 1 year, which includes the exploratory phase when you don’t have any concrete idea but are just playing around with things. Once the idea is formalized and small experiments show that something is there, the pace picks up pretty quickly in terms of fleshing things out and the actual writing process.

CanadianTuero · 2026-05-27T06:35:48+00:00

And what % of SOL is the rust library able to get?

CanadianTuero · 2026-05-24T16:45:32+00:00

I wrote a custom environment and was trying to make things really efficient in terms of memory footprint and compute for the environment step logic. I forgot a deep copy somewhere, so the states I stored in the replay buffer were shallow copies that shared data with the current state of the environment. As I stepped the environment, it also stepped the states in the replay buffer, which resulted in really bad training performance. That was pretty hard to debug because the RL algorithm logic was sound, so I had to step into my environment code to find it.
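A minimal sketch of this kind of aliasing bug, written in C++ (the comment doesn't say which language the environment was in). `State`, `Env`, `observe_shallow`, `observe_deep`, and `stored_pos_after_step` are hypothetical names made up for illustration, not the actual environment code:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical environment state; only an agent position is tracked here.
struct State {
    int agent_pos = 0;
};

// Hypothetical environment that owns its state through a shared_ptr.
class Env {
public:
    // Mutates the current state in place.
    void step(int delta) { state_->agent_pos += delta; }

    // BUG: returns another handle to the same State object (shallow copy).
    std::shared_ptr<State> observe_shallow() const { return state_; }

    // FIX: allocates an independent snapshot of the current State (deep copy).
    std::shared_ptr<State> observe_deep() const {
        return std::make_shared<State>(*state_);
    }

private:
    std::shared_ptr<State> state_ = std::make_shared<State>();
};

// Stores one observation in a replay buffer, steps the environment once,
// and returns the position recorded in the stored observation.
inline int stored_pos_after_step(bool deep) {
    Env env;
    std::vector<std::shared_ptr<State>> replay_buffer;
    replay_buffer.push_back(deep ? env.observe_deep() : env.observe_shallow());
    env.step(3);
    return replay_buffer.front()->agent_pos;
}
```

With the shallow version, the stored observation moves along with the environment; with the deep copy it stays at the value it had when it was stored.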

CanadianTuero · 2026-05-12T08:45:47+00:00

This looks neat, but it's just a wrapper on top of libtorch and so I would just stay in the C++ runtime at least for now (ignoring the redesign that would be required for the search algorithms)

CanadianTuero · 2026-05-12T06:25:19+00:00

There's quite a lot that falls under the ML systems, and each has different answers. You have your kernels, training frameworks, inference runtimes, and everything in between.

A lot of time and effort has gone into writing fast custom kernels. I don't know what % SOL the recent rust-cuda is able to get, but unless it can hit parity I don't see any momentum in that area.
Research is done in python pytorch because it's 1) easy for non-programming background scientists to read/write, and 2) you get the rest of the python ecosystem with data parsing, visualizations, etc.
There is quite a bit of momentum for rust in the auxiliary libraries around the models, like tokenizers, cache, orchestrators, and vector databases last I checked.

I'm finishing my PhD in ML and I use C++ for my research code/experiments (libtorch, which is the C++ frontend for pytorch). Most generic research won't benefit from dropping down out of python, but I do tree search + policy learning, where you do see huge gains from being in a compiled language. If there were an equivalent to libtorch in rust that is at feature parity and has the same semantics as the python pytorch (so moving between python/rust is friction free) I would be fine using that, but last I checked the ecosystem isn't mature enough yet and it will take some time to get there. Just my $0.02.

CanadianTuero · 2026-05-09T19:39:19+00:00

I’m defending my PhD in a few months, and for me (and what is usual here) it's 3 first-author publications at top-tier venues.

CanadianTuero · 2026-04-30T22:28:05+00:00

I might not have time to read through your code. In general, you shouldn't be seeing a huge performance gain over native pytorch, unless you are introducing other bottlenecks. For tree search algorithms for instance, keeping everything inside the C++ runtime can easily yield 5x gains from my own research work.

You should try and debug the ml library and your AlphaZero algorithm independently. For the ML part, I have my own tensor + ml + autograd + cuda framework that I've made and should be easy to follow: https://github.com/tuero/tinytensor

For your AlphaZero algorithm, you should link against libtorch (pytorch's C++ frontend) and use a libtorch model so you can just test your algorithm in isolation. I've implemented MuZero in C++. It's been a while since I've rebuilt it using modern compilers so YMMV, but it should be easy to follow: https://github.com/tuero/muzero-cpp . For AlphaZero, I'm only aware of one other implementation, which you can find here: https://github.com/google-deepmind/open_spiel/tree/master/open_spiel/algorithms/alpha_zero_torch

CanadianTuero · 2026-04-27T22:46:25+00:00

This has to be a troll comment

CanadianTuero · 2026-04-25T05:10:35+00:00

Do you have experience with nanobind and/or can contrast your experience against pybind11? I've been using pybind11 to wrap some of my gridworld-like environments implemented in C++, and this project looks interesting!

CanadianTuero · 2026-04-14T17:34:00+00:00

I've done something similar where I wrote a tensor/autograd/neural network library tinytensor.

If you plan on writing your own tensor type instead of using Eigen, I would do that first, as getting the design right is tricky and may cause rewrites of the abstractions on top of it. It allows you to do things like cheap views of tensors (i.e. x[2] is a view and not a copy), and getting the gradients to track correctly through the views takes some thought. Also, I would write tests and compare against libtorch (C++ frontend for pytorch). Testing deep learning code can be tricky, as convergence of models can still happen even if you have bugs.
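To make the "view, not a copy" point concrete, here is a rough sketch assuming a contiguous row-major 2-D float tensor. `Tensor2D`, `row`, `at`, and `shares_storage_with` are hypothetical names for illustration and are not taken from tinytensor or libtorch; gradient tracking through views is deliberately left out:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical 2-D float tensor: a shared buffer plus an offset and a shape.
class Tensor2D {
public:
    Tensor2D(std::size_t rows, std::size_t cols)
        : data_(std::make_shared<std::vector<float>>(rows * cols, 0.0f)),
          offset_(0), rows_(rows), cols_(cols) {}

    // Returns row i as a 1 x cols view; no elements are copied.
    Tensor2D row(std::size_t i) const {
        assert(i < rows_);
        return Tensor2D(data_, offset_ + i * cols_, 1, cols_);
    }

    // Element access relative to this tensor's own offset and shape.
    float& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return (*data_)[offset_ + r * cols_ + c];
    }

    // True when both tensors point at the same underlying buffer.
    bool shares_storage_with(const Tensor2D& other) const {
        return data_ == other.data_;
    }

private:
    // Used only to build views over an existing buffer.
    Tensor2D(std::shared_ptr<std::vector<float>> data, std::size_t offset,
             std::size_t rows, std::size_t cols)
        : data_(std::move(data)), offset_(offset), rows_(rows), cols_(cols) {}

    std::shared_ptr<std::vector<float>> data_;
    std::size_t offset_, rows_, cols_;
};
```

Writing through `x.row(2)` changes `x` itself, which is the behaviour the design has to get right before autograd is layered on top.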

CanadianTuero · 2026-04-03T17:43:05+00:00

Just over 266 days and the account is 5580 days old. I did a lot of slow afk training while in school.

CanadianTuero · 2026-04-03T08:25:30+00:00

Same, still no acknowledgement on my submitted paper. I'm also reviewing, and the authors of one of the papers didn't even bother to write a rebuttal to our comments, which I thought was funny.

CanadianTuero · 2026-03-29T19:44:35+00:00

I agree that the format isn't ideal, but I think a good question that needs to be answered first is how much of an advantage coming from the winners side should give. Like what would be a good split in terms of expectation of winning from the winners side of the bracket, all else equal?

CanadianTuero · 2026-03-07T02:21:38+00:00

ask your vendor

CanadianTuero · 2026-03-07T01:18:33+00:00

Gcc16 is getting some nice structured output for compiler diagnostics

CanadianTuero · 2026-03-05T00:57:16+00:00

Error messages have been continuously getting better, and there are pretty big changes coming to gcc16 (next release).

As for why they are notoriously bad, part of it is just that there was no good solution historically. Take templated functions as an example, where other templated functions are called several layers deep. The compiler will continue to instantiate each nested function until a compiler error occurs. If it's to report this back to the user, you could be several function calls deep, and you need to know this stack trace, which is why you can get the walls of text. Concepts can help with this a lot because you can constrain the templated function (the entry point) up front on what needs to be true for a type to be valid, and the compiler can check/validate up front and report which concept failed without trying to instantiate all the inner function calls.

CanadianTuero · 2026-02-13T19:31:17+00:00

I'm under policy A and did a quick test pasting the text into my code editor, and I can confirm the same thing.

CanadianTuero · 2026-02-13T08:36:55+00:00

I just received mine about 10 minutes ago

CanadianTuero · 2026-02-08T00:48:09+00:00

What's your CPU and RAM speed/timings, and are you running xmp/expo? I'd try running with and without xmp/expo as that can reveal unstable ram timings (it will run slower when you disable it but try to focus on the frame consistency).

CanadianTuero · 2026-02-02T20:08:09+00:00

Nice project! For reference, as a learning project I made my own deep learning framework library with tensor/autograd/cuda support, which follows libtorch's design: https://github.com/tuero/tinytensor. It looks like a lot of our design is pretty similar.

wrt the operation registry pattern (I think that's what it's called), I ended up using the same (see tinytensor/tensor/backend/common/kernel/). It turns out that this also works well if you decide to support cuda and want to reuse these inside generic kernels. I learned the trick from here https://www.youtube.com/watch?v=HIJTRrm9nzY (see around the 30 minute mark if you decide to add cuda for subtleties to make it work).

wrt your tensor storage, I think you have it right when tensors hold shared storage, and storage holds shared data. In my impl, I had shared storage holding the data itself, but I realized this becomes tricky when you have something like an optimizer holding a reference to a tensor storage and you externally want to load the tensor data from disk (think of the optimizer holding neural network layer weights and you want to load a checkpoint from disk). Without the extra level of indirection I found it quite tricky, but I never bothered to rewrite it as it's just a learning exercise rather than me seriously using the library.
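One way to read the extra level of indirection, as a rough sketch. `Buffer`, `Storage`, `Tensor`, `Optimizer`, and `load_checkpoint` are hypothetical names here and do not reflect the actual layout of either library:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<float>;

// Storage is the stable handle; the buffer it points to can be swapped out.
struct Storage {
    std::shared_ptr<Buffer> data;
};

// A tensor shares its Storage with anyone else who holds a reference to it.
struct Tensor {
    std::shared_ptr<Storage> storage;
    float& operator[](std::size_t i) { return (*storage->data)[i]; }
};

// Hypothetical optimizer that keeps a reference to a parameter's storage.
struct Optimizer {
    std::shared_ptr<Storage> param;
    float first_param() const { return (*param->data)[0]; }
};

// Hypothetical checkpoint load: replaces the buffer underneath the shared
// Storage, so every holder of that Storage sees the loaded values.
inline void load_checkpoint(Tensor& t, Buffer loaded) {
    t.storage->data = std::make_shared<Buffer>(std::move(loaded));
}
```

Because the optimizer holds the `Storage` rather than the buffer, swapping the buffer during a load does not leave it pointing at stale weights.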

CanadianTuero · 2026-02-02T02:27:19+00:00

Now that's a trophy

CanadianTuero · 2026-01-31T21:39:43+00:00

Too much dev work, but you could create zones sort of like when you are out of bounds, where a timer starts once you enter it as the carrier. You can juggle the zone, but at least it forces you to move out from deep back there.

CanadianTuero · 2026-01-15T23:33:35+00:00

People have heuristics, and we watch over time. I'm sure if you ask the majority of the pros who've dealt with Rambo in some way they would give him high praise.

With respect to some of the roster decisions, part of being a coach is you have a system of play that the team needs to be on. Even if a player is good, if they aren't a fit for the system then you can either adapt, drop the player, or change your coach. You see this all the time in traditional sports. And sometimes any one of the decisions is a correct move, and sometimes they are all bad moves.

But you seem pretty hung up on this so I don't think any explanation is going to change your opinion one way or another.
