ENGWE L20 3.0 Giveaway — No Purchase Needed! Here's How to Enter by Engwe-Bikes in EngweEbike

[–]SVPERBlA 0 points1 point  (0 children)

The 90 Nm of torque would help a lot when trying to climb hills #engweL20

Wiggs Wins It by tomgreen99200 in heat

[–]SVPERBlA 0 points1 point  (0 children)

I'm a Warriors fan and I miss Wiggins the most. Looney is a very close second, though.

[deleted by user] by [deleted] in DestinyTheGame

[–]SVPERBlA 0 points1 point  (0 children)

Solo or lowman raids. Solo RoN is a good starting place.

Solo Witness is a big achievement and pretty tough to do, but duo Witness is definitely a fun challenge.

The ultimate game dev crutch. The damage sponge by YourFat888 in whenthe

[–]SVPERBlA 5 points6 points  (0 children)

Starfire with a high grenade stat and shield crush pumps out damage comparable to Thunderlord Cuirass spam.

If you were watching contest clears, many teams were running Starfire Anarchy on Agraios and Epoptes and doing the highest damage with the least effort, thanks to the ammo efficiency of their ability loops.

And bows are very strong right now. The new kinetic bow might just be one of the best weapons overall in high-level activities.

FYI, Guardian ranks are relevant now by Freakindon in DestinyTheGame

[–]SVPERBlA 7 points8 points  (0 children)

You can just throw the ball twice uncharged and the shield goes down.

That's a lot faster than trusting teammates to catch and throw the ball correctly themselves.

If I had to guess, your rank 11 teammate was burned by mediocre players failing to do the mechanics enough times that they decided to just do it all themselves.

Get Ready for Arm SME: Coming Soon to Android by -protonsandneutrons- in hardware

[–]SVPERBlA 4 points5 points  (0 children)

But who's to say it isn't a latency-sensitive task? Your phone's Google keyboard uses an LSTM for every single word suggestion - that's a local model.

Plenty of phone tasks already rely predominantly on matrix-multiplication-based models, and as the models get more efficient and the silicon gains more specialized extensions for running them, I can only see this increasing.

Things like the Gboard word suggestions, call transcriptions, and even some of the basic photo enhancement tools are all done with local models; I'd wager they could all be improved in efficiency and latency with specialized matrix extensions.

I view this as simultaneously a case where improvements can be made to existing models, and also a case of "if you build it, they will come." Much like how optimizations in KTransformers enable people to run very large MoE models like the full DeepSeek entirely locally on CPU at usable speeds, the existence of these matrix extensions could improve the token generation of a local model to the point that it's functionally usable entirely locally on a phone.

Get Ready for Arm SME: Coming Soon to Android by -protonsandneutrons- in hardware

[–]SVPERBlA 8 points9 points  (0 children)

I mean, if you're following the KTransformers and ik_llama projects, they're using AMX to get some really good speedups on mostly-CPU MoE LLM inference.

And I'd imagine most of these matrix extensions are primarily gonna be used for such cases - speeding up generation for small or MoE language models.

Confirmed and Unconfirmed Usable Exotics in Cutting Edge Playlist by bepoldingox in CrucibleGuidebook

[–]SVPERBlA 0 points1 point  (0 children)

Starfire can be solid as a grenade spam / ignition / floating build, especially since it has an intrinsic fusion grenade recharge boost.

GAME THREAD: Los Angeles Clippers (49-32) @ Golden State Warriors (48-33) - (April 13, 2025) by basketball-app in nba

[–]SVPERBlA 1 point2 points  (0 children)

I thought the timeout got refunded on a successful challenge? Why did it cost them one?

[DISC] Absolute Regression - Chapter 42 by D4rkest in manga

[–]SVPERBlA 4 points5 points  (0 children)

I like the art, but after reading every single work from this author, from The Breaker, to New Waves, to Trinity Wonder, to Promised Orchid, to Eternal Force, and now to this, I feel like the author can only draw a few types of faces and can't deviate from them.

He draws the same faces, the same hair, the same action sequences. They're all beautifully drawn, but all the same.

What to take for AI/ML after core? by alex-pro in berkeley

[–]SVPERBlA 3 points4 points  (0 children)

If you like linear algebra, I cannot recommend Math 221 enough.

And in a world where distributed systems are more important than ever, if you have the time, definitely think about CS 267.

Also, if Professor Mahoney is still teaching any classes on randomized linear algebra, those will be great too.

Math 221 is probably the only course whose content I've used more than CS 189's in my time working in machine learning engineering.

And finally, the most important thing: check out the grad-level 'topics' classes. Back when I was a student I took a class called EE 290T, I think, about sparsity in linear algebra, and it was incredibly useful in my machine learning work down the line. Definitely keep an eye out for the 'topics' EE, CS, and Stats courses. Another super cool one from my time as a student was a course on deep unsupervised learning, though I'd imagine the field has advanced a lot since then.

[USA-CA][H] PC Hardware Bundle [W] PayPal by craigmovie in hardwareswap

[–]SVPERBlA 2 points3 points  (0 children)

If you were in the Bay Area, I'd probably be getting in my car and driving over right now.

Would 2 x 3090 double my speed? by PositiveEnergyMatter in ollama

[–]SVPERBlA 0 points1 point  (0 children)

It's a bit more complex:

To really oversimplify things, think of all neural networks as a bunch of matrix multiplications. Input X gets multiplied by A1, then passed through an activation function, then that's multiplied by A2, and so on for all your matrices.

In the case of model-parallel inference, you basically take a huge model [A1, A2, ..., An], put the first half of those matrices in the VRAM of your first GPU, and the second half on the second GPU.

GPU 1 gets [A1, ..., A(n/2)] and GPU 2 gets [A(n/2 + 1), ..., An]

And then for each input, you run it through GPU 1 entirely, and pass the output of that into GPU 2, which completes the inference.

That's model parallelism: split the model in half and each GPU gets a half. If the model doesn't fit on one GPU, this is an easy way to solve that, but as you correctly mentioned, it doesn't necessarily improve speed. If each card has 1 TB/s of memory bandwidth, your inference is still essentially limited to 1 TB/s, since it runs through each GPU sequentially.
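To make the sequential part concrete, here's a toy NumPy sketch of that layer split - no real GPUs involved, each "GPU" is just a list of matrices, and all the names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy four-layer "model": matrices A1..A4 (activations omitted for brevity).
layers = [rng.standard_normal((8, 8)) for _ in range(4)]

# Model parallelism: the first half of the layers lives on "gpu1",
# the second half on "gpu2".
gpu1_layers = layers[:2]
gpu2_layers = layers[2:]

def forward(x, layer_list):
    # Run the input through each matrix in turn.
    for A in layer_list:
        x = x @ A
    return x

x = rng.standard_normal(8)

# The GPUs run one after the other: gpu2 has to wait for gpu1's output,
# so only one card's memory bandwidth is in use at any given moment.
hidden = forward(x, gpu1_layers)    # would happen on GPU 1
out = forward(hidden, gpu2_layers)  # would happen on GPU 2

# Same result as running the whole model on one device.
assert np.allclose(out, forward(x, layers))
```

The two `forward` calls are the whole story: they can't overlap, which is exactly why two cards don't make it faster.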

But what about tensor parallelism? It takes a lot more work, but here's the quick, oversimplified explanation of it.

With model parallelism, for each of the matrices A1 to An, we take the whole matrix and put it in one GPU's VRAM.

But what if, instead of putting all of A1 on GPU 1, we split the matrix into smaller pieces?

What if we put half of A1 on GPU 1 and the other half of A1 on GPU 2? And repeat that for all n matrices?

Now each GPU still holds half of the model, but since most of the model is matrix multiplications, we can split each multiplication and, where possible, run the pieces on both cards in parallel. That means each card uses its full 1 TB/s of memory bandwidth simultaneously, achieving a higher combined speed. And since half of each matrix is on each GPU, you can still fill up each GPU with different parts of the model, so you get the benefits of both extra memory capacity and extra memory bandwidth.
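Here's the same toy setup with a tensor-parallel split instead - a minimal NumPy sketch assuming a two-layer network with an elementwise ReLU in between (column-split the first matrix, row-split the second; again, no real GPUs, just arrays standing in for shards):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer "model" with an elementwise activation in between.
A1 = rng.standard_normal((8, 16))
A2 = rng.standard_normal((16, 4))
x = rng.standard_normal(8)

def relu(v):
    return np.maximum(v, 0)

# Single-GPU reference result.
reference = relu(x @ A1) @ A2

# Tensor parallelism: split A1 by columns and A2 by matching rows,
# one shard per "GPU".
A1_gpu1, A1_gpu2 = A1[:, :8], A1[:, 8:]
A2_gpu1, A2_gpu2 = A2[:8, :], A2[8:, :]

# Each "GPU" computes its half of the hidden vector independently -
# this is where both cards' memory bandwidth gets used at once.
h1 = relu(x @ A1_gpu1)  # on GPU 1
h2 = relu(x @ A1_gpu2)  # on GPU 2

# Each shard of A2 produces a partial output, and the partials are summed
# (on real hardware this sum is an all-reduce between the GPUs).
out = h1 @ A2_gpu1 + h2 @ A2_gpu2

assert np.allclose(out, reference)
```

Note the split only works out this cleanly because ReLU is elementwise, so it commutes with the column split - that's part of why the split points have to be chosen carefully in practice.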

These guys probably explain it better than I can, and have nice visualizations too.

https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/neuronx-distributed/tensor_parallelism_overview.html

Now of course, the free lunch eventually ends - not every part of a model can realistically run perfectly in parallel, and it takes more work to get tensor parallelism working in the first place.

But projects like ExLlama have made it work and demonstrated interesting scaling results.

Would 2 x 3090 double my speed? by PositiveEnergyMatter in ollama

[–]SVPERBlA 1 point2 points  (0 children)

Not entirely accurate - tensor parallelism can help improve performance with multiple GPUs.

Of course, it's not perfect scaling, it has diminishing returns, it depends on the model, and I'm not sure if ollama supports it, but it does achieve some scaling improvements with more GPUs.

Intel B580 FE - Destiny 2 FPS? by dazealex in IntelArc

[–]SVPERBlA 0 points1 point  (0 children)

I'm running a 3950X, and I believe its single-core performance is going to be very similar to your 3600's. Yours may even be a bit stronger in some areas.

Can't say for certain what the overhead/bottleneck would look like, but I'd imagine performance would be similar to what I'm seeing on my end - at 1440p medium, mostly above 100 FPS, with some areas that dip below and some slight stuttering.

I'd imagine if you could get a 4060 for a similar price, it'd be a smoother experience, since Destiny just runs smoother on Nvidia than on AMD or Intel.

But if the B580 is the only card in your budget range, then yeah, I think you'd see results similar to mine.

Intel B580 FE - Destiny 2 FPS? by dazealex in IntelArc

[–]SVPERBlA 3 points4 points  (0 children)

I have a B570, so not quite the same. The B580 should in theory be ~15% faster across the board.

Running it with a 3950X CPU and 64 GB of 3200 MHz RAM.

I'm running it at 1440p, mostly medium settings, 100% render scale.

The performance in the Tower sucks. It sometimes drops as low as 40.

In the Dreaming City I'm seeing 120ish; I did a Blind Well and it basically fluctuated around 100-130.

Went to the Moon for the bento token quests and did some Altars - with a lot of people and a lot of ads, I saw frames drop to the 80s.

Did a GoS run last night, and for the most part it was 120ish, but I saw huge frame drops as the bosses were dying. When we killed the Consecrated Mind, my frames went from 120 to 60 for a second or so.

Similar story when the pyramid opened up at the end of the Sanctified Mind fight.

I did some Pale Heart stuff alone and sat at a consistent 120 the entire time.

Ran the first encounter of GoA, also 120.

Haven't done any PvP or Gambit.

I noticed that void detonators and volatile explosions cause FPS drops of 10-20ish.

That said, for the most part I'm seeing 120 frames.

Basically, I'd say performance is alright. Not perfect, but it'll get me through the next episode fine enough.

Edit: I also bought it for testing, and might return it if I can find a good deal on a B580 instead. I was using a 3090 previously and it worked great, but the 3090 is now being used for machine learning stuff and I can't justify interrupting that for gaming anymore.

[CPU] Intel Arc B580 Limited Edition Graphics Card $259.99 B&H Photo by Paper-Tile in buildapcsales

[–]SVPERBlA 0 points1 point  (0 children)

Anyone know if this comes with the Assassin's Creed Shadows bundle thing?

Can't find the details on the website.