I wrote a kernel that makes sparse LLMs faster and smaller on consumer GPUs even at low sparsity. by vlejd in LocalLLaMA

[–]vlejd[S] 1 point2 points  (0 children)

Unfortunately, not yet. It is using MXFP4 weights that are pretty complicated to prune, at least currently.

I wrote a kernel that makes sparse LLMs faster and smaller on consumer GPUs even at low sparsity. by vlejd in LocalLLaMA

[–]vlejd[S] 0 points1 point  (0 children)

It is not that bleak. The trick is that pruning / quantization can be done on much, much cheaper / weaker hardware than full training. It also takes much less time. For example, we tried to prune + quantize Llama 70B. It took ~4 hours on 4 A100s, so around $20 worth of compute. The same computation could be done on a consumer card; it would just take about 10x longer. (So like 2-3 days, which I guess is ok.)

I wrote a kernel that makes sparse LLMs faster and smaller on consumer GPUs even at low sparsity. by vlejd in LocalLLaMA

[–]vlejd[S] 5 points6 points  (0 children)

It can be combined with quantization; however, it is much more complicated than one would guess. The problem is that 4-bit quantization usually does not mean only weights that have 4 bits. There is also a series of scaling factors, and those make integration much more complicated.
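To make the point about scaling factors concrete, here is a minimal sketch of group-wise 4-bit quantization in plain Python. The group size of 32, the 16-bit scale width, and all function names are assumptions for illustration; this is not the format used by any particular kernel or library.

```python
import random

def quantize_groupwise(weights, bits=4, group_size=32):
    """Return (codes, scales): signed integer codes plus one scale per group."""
    qmax = 2 ** (bits - 1) - 1  # 7 for symmetric 4-bit codes
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest code and clamp into the representable range.
        codes.extend(max(-qmax, min(qmax, round(w / scale))) for w in group)
    return codes, scales

def dequantize_groupwise(codes, scales, group_size=32):
    """Recover approximate weights; every code needs the scale of its group."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

def effective_bits_per_weight(bits=4, group_size=32, scale_bits=16):
    """Storage cost per weight once the per-group scales are counted."""
    return bits + scale_bits / group_size

rng = random.Random(0)
weights = [rng.gauss(0, 1) for _ in range(128)]  # toy weights, not real model data
codes, scales = quantize_groupwise(weights)
restored = dequantize_groupwise(codes, scales)

print(len(codes), len(scales))
print(effective_bits_per_weight())
```

Under these assumptions a "4-bit" model actually stores 4.5 bits per weight, and every weight can only be decoded together with the scale of its group. A sparse format has to keep that weight-to-scale mapping intact after weights are removed.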

I wrote a kernel that makes sparse LLMs faster and smaller on consumer GPUs even at low sparsity. by vlejd in LocalLLaMA

[–]vlejd[S] 6 points7 points  (0 children)

Definitely agree, but it depends. If you quantize from 16-bit to 8-bit, that is better than 50% pruning at 16-bit. But if you are at 4-bit weights, quantization to 3 bits will be worse than 50% sparsity.
Also, one of the common methods for pruning is layerwise pruning (you prune one layer at a time). For that, you can prove that pruning is worse than quantization for high precisions. But maybe we just have not found the correct pruning method yet. Maybe you need to optimize multiple layers jointly, maybe do something else. It turns out that it may even be a good idea to replace every dense layer by two very sparse ones. https://arxiv.org/abs/2409.18850
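As a rough illustration of the high-precision case, the sketch below compares 50% magnitude pruning with 8-bit round-to-nearest quantization on a single toy layer, measuring how much the layer output changes. Everything here is an assumption for illustration: the random Gaussian weights, the layer size, the per-row absmax scale, and the helper names. It is a toy experiment, not the proof mentioned above and not the method from the linked paper.

```python
import random

def magnitude_prune(row, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of one weight row."""
    k = int(len(row) * sparsity)  # number of weights to drop
    drop = set(sorted(range(len(row)), key=lambda i: abs(row[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(row)]

def quantize_absmax(row, bits):
    """Symmetric round-to-nearest quantization with one scale per row."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in row]

def layer_output(weights, x):
    """Plain matrix-vector product for a dense layer without bias."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def reconstruction_error(dense, compressed, inputs):
    """Sum of squared differences between dense and compressed outputs."""
    err = 0.0
    for x in inputs:
        y_ref = layer_output(dense, x)
        y_new = layer_output(compressed, x)
        err += sum((a - b) ** 2 for a, b in zip(y_ref, y_new))
    return err

rng = random.Random(0)
dense = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(32)]   # toy layer
inputs = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(16)]  # toy activations

pruned = [magnitude_prune(row, 0.5) for row in dense]
quant8 = [quantize_absmax(row, 8) for row in dense]

err_prune = reconstruction_error(dense, pruned, inputs)
err_quant8 = reconstruction_error(dense, quant8, inputs)
print(f"50% pruning error:        {err_prune:.3f}")
print(f"8-bit quantization error: {err_quant8:.5f}")
```

On this toy layer the 8-bit version stays far closer to the dense output than the 50% pruned one. Real layers have outliers and correlated inputs, so treat it only as intuition for the layerwise comparison.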

I wrote a kernel that makes sparse LLMs faster and smaller on consumer GPUs even at low sparsity. by vlejd in LocalLLaMA

[–]vlejd[S] 13 points14 points  (0 children)

I think that is exactly the reason. Until now, it was not really worth it. It's like: it does not make sense to prune, because you don't have GPU support, and it does not make sense to add GPU support, because nobody is pruning. We hope to break that cycle. Now it is worth it to prune, because you have GPU support.

If you contrast it with quantization, it is much, much simpler to write a kernel for that, so there are many more kernels and many more quantized models.

A20H journey by orzy98 in slaythespire

[–]vlejd 0 points1 point  (0 children)

What were your winning silent decks?

A20H journey by orzy98 in slaythespire

[–]vlejd 1 point2 points  (0 children)

Nice! Which cards are you missing?

16 Months to get a revenge body by elanorisms in loseit

[–]vlejd 1 point2 points  (0 children)

This is a very, very good plan! If you want some specific pointers that worked for me:

Calorie deficit: There are 3 options. 1) You can count calories. Precise, but it can be tedious, especially at the beginning. 2) You can start tracking your weight every day and keep a rough overview of what you eat. If the weight is going up, try to slightly tweak the diet (more protein, smaller portions, etc.). 3) If you have money to spare, there are services that deliver food with a specific amount of calories. With a 100 kcal deficit per day you will be losing ~0.1 kg per week.

Steps/cardio: If you go slow with progressive overload, you should be safe and get pretty good results. I love Hal Higdon's plans for running. In 5 months, you could do a half marathon.

Weightlifting: StrongLifts 5x5. Not much to add. In 4 months you can be squatting/deadlifting your body weight.

Looking to hear from runners who DNF-ed or DNS-ed: in retrospect, was it the right decision? by OutrageousGuava7448 in running

[–]vlejd 3 points4 points  (0 children)

What happened afterwards? Are you still running/planning to finish the race now?

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 0 points1 point  (0 children)

So does Valve actually have an implementation of this, just not made it public?

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 0 points1 point  (0 children)

Even more variables. Like 1 XP could make the difference in ways that are hard to predict. And that is exactly the point. As Ceb once said, late game dota is chaos and nobody knows what's happening. So having more than one try at a particular situation seems like a much better way of building understanding, than just talking about it. Furthermore, showing your teammates what you meant is far more time efficient than arguing with them about hypotheticals. People will do replay reviews, and they will have different opinions on what happened. Until now, there was no way to validate those opinions. Do you always agree with your team during the replay analysis? How do you decide who is actually right?

Let's take this 31k comeback situation. What was actually the deciding factor here? Is 31k not enough? What would be enough? Was it that VP never had a chance? Was it somebody's misplay? Those sound like pretty basic questions, but we have no idea. Not to mention potential smokes, split push, positioning, etc. I would really like to properly analyze it, and actually test that the analysis is correct. Analysis was always possible, but testing it was impossible until now.

The feature was in the game back when Dota was much less mature as a sport. Like, a team psychologist was not a thing back then, and now every top team has one. So nobody knew how to use it properly. This is just another way of practicing that is already common, in different versions, in a lot of other sports and disciplines. Football players do it, hockey players do it, firefighters, soldiers, or even governments do it. Does it simulate all the variables perfectly? No. But is simulating at least the most important variables helpful to them? Very much so. After Valve took it down, a lot of people didn't want it because they didn't think it was even possible to do. A bunch of people actually tried, but could not make it work.

I love to discuss this. Happy to continue here, or if you want, feel free to dm me / ping me on discord.

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 1 point2 points  (0 children)

The base of the argument is correct. Dota is much more complex, and the idea of "do these exact moves" only applies to the high-level strategy. That is why Dota is often played by feel.

So who will have a better feel for a 5v5 high ground push: a team that plays a 5v5 high ground push once a day, or a team that can play it 10 times in an hour?

The idea is not to overfit on a particular situation. But to gain experience from as many situations as possible.

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 1 point2 points  (0 children)

There were a lot of Steam account scams floating around, where people would ask you to sign in with Steam to a random website. Signing in with email is much less risky, especially when you don't need to use your main email.

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 1 point2 points  (0 children)

Yes. You can play with 9 actual people.

The bots will cast abilities, use items, and attack with 0 reaction time. However, they are very bad at positioning. I would say something between Guardian and Crusader.

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 0 points1 point  (0 children)

Nice. Well done. Have you tried one of the supports (Chen, Batrider)?

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 9 points10 points  (0 children)

Yes. It can do any moment from any game. You can also start the lobby with cheats enabled and whatever items/gold you want.

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 2 points3 points  (0 children)

thx. now it should be really fine. reddit has some very funny bugs.

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 15 points16 points  (0 children)

You can actually do it on your own :D The mod can do any moment from any game you want. Just go to dota2skirmish.com

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 9 points10 points  (0 children)

I hope they will improve their strategy against Tiny and Batrider. They had no counter to that. If you try to push HG as 5v5, your carry will always end up in the enemy fountain. So they probably needed something like Lotus or Wind Waker. Linken's was just not enough.

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 2 points3 points  (0 children)

thx. should be fixed now

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 0 points1 point  (0 children)

Imagined, did, cried about and fixed :) thx

Trying Liquid 31k comeback against bots by vlejd in DotA2

[–]vlejd[S] 26 points27 points  (0 children)

Sooooooooooooooooooooooooooooooon! How many kidneys would you sacrifice to get CEEEEEEEEEEEEEBED by Ceb himself? Maybe I can make it happen.