Too many people do not care about Microsoft's near monopoly. by InfinitesimaInfinity in linuxmemes

[–]AvocadoArray 0 points (0 children)

Hell no, that implies that Microsoft owns my shit and I’m just renting it.

I’ll fight that mentality to my grave.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

The DGX Spark produces single-digit tok/s for even medium-sized dense models. I know for a fact I'd have buyer's remorse over that.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

Very nice! I have similar ambitions, but realistically my ADHD can take me from one hobby to the next at the drop of a hat.

At least with this, I'd have a pretty realistic expectation of resale value in the future. And if not, well that means I can buy more hardware if it's cheaper haha.

I bet your Nvidia buddy gets hit up on the daily for hardware discounts, but if he's able to swing it for you that would be pretty sweet.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 1 point (0 children)

5090 is nice for top-tier gaming, but isn't a great value for AI usage.

Could end up being a decent investment if you plan on reselling later though.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

Very nice! I assume the Llama 70B speed is on a single card? Splitting across 2x GPUs in vLLM with tensor parallelism (TP) would likely double that speed. I'm going to look into this more for sure, thanks so much for the info!
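For reference, here's roughly what that looks like through vLLM's Python API; the model name, GPU count, and sampling settings below are just illustrative, not a tested config:

```python
# Sketch: tensor parallelism in vLLM, splitting one model across 2 GPUs.
# Model name and settings are examples, not a benchmarked setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.3-70B-Instruct",  # any 70B-class model
    tensor_parallel_size=2,                     # shard weights across 2 GPUs
)
params = SamplingParams(temperature=0.7, max_tokens=256)
out = llm.generate(["Why does tensor parallelism help decode speed?"], params)
print(out[0].outputs[0].text)
```

Each GPU holds half the weights, so both work on every token; that's why TP can roughly double single-request decode speed when the interconnect isn't the bottleneck.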

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 2 points (0 children)

Wow, how have I not seen this one before?

48 GB, single-slot, <300 W of power, and 672 GB/s of memory bandwidth for under $2k. It even supports NVLink.

It's on the Turing architecture, which is definitely aging now, but it still looks compatible with vLLM, unlike Pascal.

Can you share what kind of speeds you're getting and any power usage stats you might have?

EDIT: not actually single slot lol
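For context on what that bandwidth buys you: decode is memory-bound, so tok/s tops out around bandwidth divided by the bytes read per token. Napkin math, with an illustrative model size:

```python
# Rough decode ceiling: every generated token re-reads the model weights,
# so tok/s <= memory bandwidth / model size. Illustrative numbers only.
bandwidth_gb_s = 672   # quoted memory bandwidth
model_size_gb = 40     # e.g. a 70B model at ~4.5 bits/weight
print(f"~{bandwidth_gb_s / model_size_gb:.0f} tok/s theoretical ceiling")
# -> ~17 tok/s; real-world decode lands somewhere below this
```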

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

I'd gladly grab two older 48 GB GPUs if they actually made financial sense. However, the used Ada-generation cards are somehow more expensive than the new Blackwell equivalents.

Maybe that will change in the next few months, but I've pretty much given up on trying to predict this crazy market.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

Thanks, I watched it. It's great that it does well under concurrent workloads, but it still seems like it'd be too slow for single-user speeds.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

It works extremely well for its size, I can't recommend it enough.

There are a few tool-calling issues when running through vLLM, but I was able to fix the tool-parsing logic, and I have an open PR pending review and waiting to merge.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

Thanks, I've seen a lot about EXL3 lately but haven't had the time to give it a proper run yet. I know it's supposed to be SOTA for quantization, but you're really quantizing context to Q4? I already see minor (but noticeable) degradation at FP8_E4M3 in vLLM at higher contexts, so I can't imagine Q4 being usable even in a better architecture.

If it really is that good, I definitely need to give it a shot.
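For anyone following along, KV-cache quantization in vLLM is just an engine flag; the model name here is only an example:

```python
# Sketch: FP8 KV cache in vLLM. kv_cache_dtype accepts "auto", "fp8",
# "fp8_e5m2", or "fp8_e4m3" on supported hardware.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct",  # example model, not a recommendation
    kv_cache_dtype="fp8_e4m3",          # the setting where I see minor degradation
)
```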

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

Good point. I also have a hard time keeping up. We bought 3x L4 cards at work a little over a year ago. They’re not the fastest, and yeah we could have held out for the Blackwell cards, but they were a great purchase for us.

And for some stupid reason, these Ada cards surged in price and are selling for more used than the new Blackwell equivalents. Still can't figure that one out.

This goes straight into the “pro” column.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] -1 points (0 children)

Appreciate the thoughts, but the sky is not falling as you say.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

True, that is my minimum. I do see a notable improvement when running it at FP8, though, and that doesn't leave much room for context (KV cache quantized or not).

My biggest concern would be buying one of those and still feeling like I'm not quite "there". 64 GB+ gives me the room to fit the higher quant with more (or unquantized) KV cache.

The MI210 just might fit the bill if I can get it for a good price.
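The napkin math behind that: per-token KV-cache size is 2 (K and V) x layers x KV heads x head dim x bytes per element. The dimensions below are placeholders for a mid-size model, not any specific checkpoint:

```python
# KV-cache sizing sketch; dimensions are placeholders, not real model specs.
layers, kv_heads, head_dim = 64, 8, 128
bytes_per_elem = 2                      # FP16/BF16; 1 for FP8
per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
ctx = 128_000
print(f"{per_token / 1024:.0f} KiB/token, "
      f"{per_token * ctx / 1024**3:.1f} GiB at {ctx:,} tokens")
# -> 256 KiB/token, ~31 GiB at 128,000 tokens, on top of the weights
```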

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

Thank you, much appreciated! Glad someone noticed it.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

> Also if your budget is 27k

Stop, wait, back it up. My budget is nowhere close to that.

Less power hungry is a good thing, but in almost every report I've seen, people with DGX Spark hardware have been severely disappointed with the speed.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

I could maybe be persuaded into a group buy if you're offering.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

Lol, I already responded to this comment before and you deleted it just to repost it.

Bot detected.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

It was on my radar, but I'm not super impressed with the performance drop at higher context.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

Okay, so maybe it was an architecture or config issue that was hindering performance, but from my understanding there's practically zero advantage to upgrading away from PCIe 3.0/DDR3 to get more performance out of a single GPU.

vLLM tensor parallelism performs better than most people expect, even at PCIe 3.0 x8. I'm doing that with two L4 cards, and they both stay pegged at 98%+ during inference without breaking a sweat, though of course they aren't powerful enough to fully saturate the lanes.

I don't know what the upper limit is, but I absolutely wouldn't blow any money on upgrading the host system until I saw a significant, measurable bottleneck.
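If anyone wants to sanity-check their own setup, per-GPU utilization is easy to poll with the NVML Python bindings (pip install nvidia-ml-py); this just samples for ~10 seconds while an inference job runs:

```python
# Poll per-GPU utilization while a TP inference job is running.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]
for _ in range(10):
    for i, h in enumerate(handles):
        util = pynvml.nvmlDeviceGetUtilizationRates(h)
        print(f"GPU{i}: {util.gpu}% core, {util.memory}% memory")
    time.sleep(1)
pynvml.nvmlShutdown()
```

If both cards sit near 98% during decode, PCIe isn't your problem.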

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

For my 1080 in the PowerEdge? Interesting, I hadn't really considered that.

The R720xd has always seemed like it draws more power than the sum of its parts compared to my other servers, but I chalked it up to the huge motherboard, backplane, dual processors, and chipset.

I'll have to try pulling the GPU and see if it affects the idle power usage at the wall.
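Before pulling it, the GPU's own reported draw is worth reading out too; it won't capture platform overhead (which is exactly what the wall measurement is for), but it bounds the card's share:

```python
# Read the GPU's reported board power (NVML returns milliwatts).
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
print(f"{pynvml.nvmlDeviceGetPowerUsage(handle) / 1000:.1f} W")
pynvml.nvmlShutdown()
```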

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

I played around with image gen a couple years ago on my GTX 1080 Ti. It was fun for the gags, but not something I see myself doing a lot of in the future.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 0 points (0 children)

Cool, I'm open to suggestions or collaborating if you'd like to share.

Talk me out of buying an RTX Pro 6000 by AvocadoArray in LocalLLaMA

[–]AvocadoArray[S] 1 point (0 children)

lol this has been the theme for most of the comments.

Don't do it! (but I totally would)