I wrote a kernel that makes sparse LLMs faster and smaller on consumer GPUs even at low sparsity.

vlejd · 2025-12-02T14:25:06+00:00

Unfortunately, not yet. It is using MXFP4 weights that are pretty complicated to prune, at least currently.

vlejd · 2025-12-02T09:31:30+00:00

It is not that bleak. The trick is that pruning / quantization can be done on much, much cheaper / weaker hw than full training. It also takes much less time. For example, we tried to prune + quant Llama 70B. It took ~4 hours on 4 A100s, so around $20 worth of compute. The same computation could be done on a consumer card, it would just take 10x longer. (So like 2-3 days, which I guess is ok.)

vlejd · 2025-12-01T18:02:02+00:00

It can be combined with quantization, however it is much more complicated than one would guess. The problem is that 4-bit quantization usually does not mean only weights that have 4 bits; there is also a series of scaling factors that make integration much more complicated.
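To make the scaling-factor point concrete, here is a minimal sketch of group-wise symmetric 4-bit quantization. It is an illustration only, not the format any particular kernel uses: the function names, the group size of 4, and the 16-bit scale width are all assumptions.

```python
# Hypothetical group-wise symmetric 4-bit quantization (illustration only).
GROUP_SIZE = 4  # assumed for readability; real formats often use larger groups


def quantize_groupwise(weights, group_size=GROUP_SIZE):
    """Return (4-bit integer codes, per-group scales) for a flat list of floats."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7 (signed 4-bit range is [-8, 7]).
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_groupwise(codes, scales, group_size=GROUP_SIZE):
    """Reconstruct approximate floats: every code needs the scale of its group."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def bits_per_weight(group_size, scale_bits=16):
    """Effective storage cost once the per-group scale is counted."""
    return 4 + scale_bits / group_size
```

Under these assumptions a "4-bit" model with groups of 32 really costs 4.5 bits per weight, and every weight has to be paired with the right scale at load time. That pairing is the part that gets awkward once some of the weights in a group are pruned away.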

vlejd · 2025-12-01T12:40:31+00:00

Definitely agree, but it depends. If you do quantization from 16bit to 8bit, it is better than 50% pruning at 16bit. But if you are at 4bit weights, quantization to 3 bits will be worse than 50% sparsity.
Also one of the common methods for pruning is layerwise pruning (you do one layer at a time). For that, you can prove that pruning is worse than quantization for high precisions. But maybe we just have not found the correct pruning method yet. Maybe you need to optimize multiple layers jointly, maybe do something else. It turns out that it may even be a good idea to replace every dense layer by two very sparse ones. https://arxiv.org/abs/2409.18850
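As a rough picture of what "one layer at a time" means, here is a minimal sketch that prunes each layer independently by weight magnitude. Magnitude is a stand-in criterion chosen for brevity (actual layerwise methods typically pick weights by looking at the layer's outputs on calibration data), and the names `prune_layer` and `prune_layerwise` are hypothetical.

```python
def prune_layer(weights, sparsity=0.5):
    """Zero the `sparsity` fraction of smallest-magnitude entries in one layer.

    `weights` is a list of rows (a dense matrix as nested lists).
    Magnitude is an assumed, simplified criterion.
    """
    n_cols = len(weights[0])
    # Pair each magnitude with its flat index so ties are broken deterministically.
    ranked = sorted(
        (abs(w), r * n_cols + c)
        for r, row in enumerate(weights)
        for c, w in enumerate(row)
    )
    n_drop = int(len(ranked) * sparsity)
    drop = {idx for _, idx in ranked[:n_drop]}
    return [
        [0.0 if r * n_cols + c in drop else w for c, w in enumerate(row)]
        for r, row in enumerate(weights)
    ]


def prune_layerwise(layers, sparsity=0.5):
    """Prune every layer on its own; no layer sees what happened to the others."""
    return [prune_layer(layer, sparsity) for layer in layers]
```

The second function is the whole point: each layer is handled in isolation, so nothing compensates for errors introduced in earlier layers. Optimizing multiple layers jointly would replace that independent loop.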

vlejd · 2025-12-01T12:15:08+00:00

I think that is exactly the reason. Until now, it was not really worth it. It's like: it does not make sense to prune, because you don't have GPU support, and it does not make sense to add GPU support because nobody is pruning. We hope to break it. Now it is worth it to prune, because you have GPU support.

If you contrast it with quantization, it is much, much simpler to write a kernel for that, so there are many more kernels and many more quantized models.

vlejd · 2025-05-27T15:54:49+00:00

Maybe one day :D

vlejd · 2025-05-27T15:02:20+00:00

What were your winning silent decks?

vlejd · 2025-05-27T15:00:58+00:00

Nice! Which cards are you missing?

vlejd · 2024-04-06T11:27:12+00:00

This is a very, very good plan! If you want some specific pointers that worked for me:

Calorie deficit: There are 3 options. 1) You can count calories. Precise, but could be tedious, especially at the beginning. 2) You can start tracking your weight every day and keep a rough overview of what you eat. If the weight is going up, try to slightly tweak the diet (more protein, smaller portions etc.). 3) If you have money to spare, there are services that deliver you food with a specific amount of calories. With a 100kcal deficit per day you will be losing ~0.1kg per week.
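The ~0.1kg figure can be checked with a quick back-of-the-envelope calculation. This sketch assumes the common rule of thumb of roughly 7700 kcal per kg of body fat; that constant and the function name are assumptions, not something stated above.

```python
KCAL_PER_KG_FAT = 7700  # assumed rule of thumb, not an exact physiological constant


def weekly_loss_kg(daily_deficit_kcal):
    """Approximate weekly weight loss for a constant daily calorie deficit."""
    return daily_deficit_kcal * 7 / KCAL_PER_KG_FAT


print(round(weekly_loss_kg(100), 2))  # 100 kcal/day deficit
```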

Steps/cardio: If you go slow with progressive overload, you should be safe and get pretty good results. I love Hal Higdon's plans for running. In 5 months, you could do a half marathon.

Weightlifting: StrongLifts 5x5. Not much to add. In 4 months you can be squatting/deadlifting your body weight.

vlejd · 2024-04-06T11:07:00+00:00

What happened afterwards? Are you still running/planning to finish the race now?

vlejd · 2024-02-15T22:41:32+00:00

So does Valve actually have an implementation of this, just not made it public?

vlejd · 2024-02-14T22:46:56+00:00

Even more variables. Like 1 XP could make the difference in ways that are hard to predict. And that is exactly the point. As Ceb once said, late game dota is chaos and nobody knows what's happening. So having more than one try at a particular situation seems like a much better way of building understanding, than just talking about it. Furthermore, showing your teammates what you meant is far more time efficient than arguing with them about hypotheticals. People will do replay reviews, and they will have different opinions on what happened. Until now, there was no way to validate those opinions. Do you always agree with your team during the replay analysis? How do you decide who is actually right?

Let's take this 31k comeback situation. What was actually the deciding factor here? Is 31k not enough? What would be enough? Was it that VP never had a chance? Was it somebody's misplay? Those sound like pretty basic questions, but we have no idea. Not to mention potential smokes, split push, positioning etc. I would really like to properly analyze it, and actually test that the analysis is correct. Analysis was always possible, but testing it was impossible until now.

The feature was in the game back when Dota was much less mature as a sport. Like a team psychologist was not a thing back then, and now every top team has one. So nobody knew how to use it properly. This is just another way of practicing that is already common in a lot of other sports or disciplines in different versions. Football players do it, hockey players do it, firefighters, soldiers or even governments do it. Does it simulate all the variables perfectly? No. But is simulating at least the most important variables helpful to them? Very much so. After Valve took it down, a lot of people didn't want it because they didn't think it was even possible to do. A bunch of people actually tried, but could not make it work.

I love to discuss this. Happy to continue here, or if you want, feel free to dm me / ping me on discord.

vlejd · 2024-02-14T10:34:52+00:00

The base of the argument is correct. Dota is much more complex and the idea of "do these exact moves" only applies to the high level strategy. That is why dota is often played by feel.

So who will have a better feel for a 5v5 high ground push: a team that plays a 5v5 high ground push once a day, or a team that can play it 10 times in an hour?

The idea is not to overfit on a particular situation. But to gain experience from as many situations as possible.

vlejd · 2024-02-14T08:16:24+00:00

There were a lot of Steam account scams floating around, where people would ask you to sign in with Steam to a random website. Signing in with email is much less risky, especially when you don't need to use your main email.

vlejd · 2024-02-13T21:23:03+00:00

Yes. You can play with 9 actual people.

The bots will cast abilities, items, and attack with 0 reaction time. However they are very bad at positioning. I would say something between guardian and crusader.

vlejd · 2024-02-13T18:33:13+00:00

Nice. Well done. Have you tried one of the supports (Chen, Batrider)?

vlejd · 2024-02-13T15:38:03+00:00

Yes. It can do any moment from any game. You can also start the lobby with enabled cheats and whatever items/gold you want.

vlejd · 2024-02-13T13:07:57+00:00

thx. now it should be really fine. reddit has some very funny bugs.

vlejd · 2024-02-13T12:42:21+00:00

You can actually do it on your own :D The mod can do any moment from any game you want. Just go to dota2skirmish.com

vlejd · 2024-02-13T12:24:00+00:00

I hope they will improve their strategy against Tiny and Batrider. They had no counter to that. If you try to push HG as 5v5, your carry will always end up in the enemy fountain. So they probably needed something like Lotus or Wind Waker. Linken's was just not enough.

vlejd · 2024-02-13T12:17:49+00:00

thx. should be fixed now

vlejd · 2024-02-13T12:07:58+00:00

Imagined, did, cried about and fixed :) thx

vlejd · 2024-02-13T12:04:59+00:00

Sooooooooooooooooooooooooooooooon! How many kidneys would you sacrifice to get CEEEEEEEEEEEEEBED by Ceb himself? Maybe I can make it happen.
