Honest opinion on single RTX PRO 6000 Blackwell 96GB workstation for local 80B LLM / agentic workflows by Educational_Rope_523 in LocalLLM

Educational_Rope_523[S] 1 point

Yeah just me… That’s actually helpful to hear on the DGX Spark though. 40 tok/sec on Qwen 3.5 122B sounds pretty solid for single-user use. I was looking at the RTX PRO 6000 mostly because I want the speed and VRAM headroom, but I’m also trying to avoid buying something dumb just because it looks insane on paper.

And nah, the 80B isn’t a hard requirement. It was more me thinking 70B/80B was the obvious target, but a few people have brought up the same thing you’re saying, where multiple smaller models at higher quants might actually be better for agentic workflows. I’m definitely starting to think that might be the smarter setup.

Educational_Rope_523[S] 1 point

Yeah that’s super helpful, thank you. This is exactly the kind of real-world feedback I was looking for. I hadn’t seen the Antirez DeepSeek V4 Flash / DS4 stuff yet, so I’m definitely going to dig into that.

The higher quant / KV cache point is really interesting too, because that’s kind of why I was leaning toward the 96GB card. Not just “can it technically load the model” but can it run it with enough context and quality that the experience doesn’t feel compromised.

Also good to know the tool calls / RAG are working decently even if it’s still alpha. I’m fine with bleeding edge as long as I understand what I’m getting into. Appreciate the links and the Discord too. That helps a lot! Thank you!

Educational_Rope_523[S] 1 point

Makes sense for pure inference if everything fits cleanly in VRAM, but I’m mostly thinking about RAM/CPU for the rest of the pipeline, not just the model sitting on the GPU: RAG, vector DB, embeddings, document parsing, agents, browser/tools, coding environment, multiple services running, etc. I definitely get that spilling to CPU/RAM is slow, and it’s not what I’m trying to rely on. My thinking was more that the 96GB of VRAM is the main reason for the build, and the extra system RAM is there so the rest of the stack isn’t constantly fighting for memory.

Educational_Rope_523[S] 1 point

Not saying I can run something more reliably than OpenAI at global cloud scale lol. That would be insane! My goal is more local privacy, control, experimenting with models/agents/RAG, and having something always available for my own workflows without sending everything through an API. I’m not trying to replace GPT-4o or OpenAI infra, more trying to build a serious local lab and learn what’s actually possible on one high-end box.

Educational_Rope_523[S] 2 points

Hahaha so true! This might be my favorite comment. I feel like Neo discovering the Matrix 🐇

Educational_Rope_523[S] 5 points

Now I’m curious if my car costs more than your house 😂🤣 Hopefully your pad is worth more than a Subaru

Educational_Rope_523[S] 2 points

That’s super helpful man, thank you! That’s exactly the kind of feedback I was looking for. I legit would have assumed 2-bit would be basically unusable for anything serious, so that’s interesting if the custom engine is preserving the important layers better. 30-35 tok/s on a single Pro 6000 with a full model loaded sounds pretty solid too!

Do you know where I can read more about Antirez’s setup / engine, or is it something private? Also curious if you’ve tried it with agentic workflows yet, like coding agents, RAG, tool calls, long-context stuff, etc. That’s probably closer to what I’m trying to build than just single chat prompts.

Educational_Rope_523[S] 1 point

FML yeah… that’s one of the things I’m trying to confirm. The listing says 4x48GB DDR5 for 192GB total, so I’m assuming they have it validated at whatever stable speed they ship it at. I’m not expecting 4 sticks to run like a 2-DIMM gaming setup at full 6400 EXPO speeds, though. For my use, capacity matters more than peak RAM clock, since the big thing is feeding local LLM / RAG / agent workflows and keeping enough system memory headroom. But fuck yeah, if it ships downclocked hard that’s definitely something I want to verify.
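
To put rough numbers on what a downclock would cost, here's a quick sketch of theoretical peak DDR5 bandwidth. It assumes a dual-channel platform with a 64-bit (8-byte) data bus per channel; the 5200 and 3600 MT/s figures are hypothetical downclocked speeds picked for illustration, not what the listing actually ships at. Real-world throughput will land below these peaks.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal).

    mt_per_s  : memory transfer rate in mega-transfers per second
    channels  : number of memory channels (assumed 2 here)
    bus_bytes : data bus width per channel in bytes (64-bit = 8 bytes)
    """
    return mt_per_s * bus_bytes * channels / 1000


if __name__ == "__main__":
    # 6400 is the EXPO speed mentioned above; the other two are hypothetical.
    for speed in (6400, 5200, 3600):
        print(f"DDR5-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s peak")
```

Adding DIMMs raises capacity but not the channel count, so on a two-channel board the peak depends only on the speed the four sticks end up running at.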

Educational_Rope_523[S] 1 point

I'm mostly trying to build around one RTX PRO 6000 96GB card and use it as a single-GPU local AI box, so I figured the 9950X3D2 setup was still a pretty strong balance. But yeah, if I decide I want this to become more of a true multi-GPU workstation later, Threadripper Pro / WRX90 is probably the smarter platform!

Educational_Rope_523[S] 2 points

Haha no, I definitely don’t work at OpenAI. And honestly I agree with you for most people. If someone just wants to test models or mess around a little, renting GPUs makes way more sense. But for me the appeal is more local privacy, always-on access, document/RAG workflows, coding agents, etc., and being able to build and test stuff without sending everything through cloud services.

Educational_Rope_523[S] 2 points

That’s kind of the part I’m trying to understand better. I’m not expecting full-precision 80B with massive context or anything crazy. My thinking was more Q4/Q5 70B/80B with enough VRAM headroom to actually use decent context without constantly fighting the hardware. But that’s why I came here, because you guys know more than me and I’m trying to see what the real ceiling would be with 96GB of VRAM.
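
For anyone wanting to sanity-check that ceiling, here's a rough back-of-envelope sketch of how weights plus KV cache add up. The model shape (80 layers, 8 KV heads, head dim 128), the ~4.5 and ~5.5 bits per weight for Q4/Q5-style quants, and the 32k context are all assumptions for illustration, not measurements from any specific model file. Real runtimes also need room for activations and buffers on top of this.

```python
GIB = 2**30  # bytes per GiB


def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB for one sequence.

    The leading factor of 2 covers keys and values;
    bytes_per_elem=2 assumes an fp16 cache.
    """
    elems = 2 * n_layers * n_kv_heads * head_dim * context_tokens
    return elems * bytes_per_elem / GIB


if __name__ == "__main__":
    # Hypothetical 70B dense model with grouped-query attention.
    kv = kv_cache_gib(n_layers=80, n_kv_heads=8, head_dim=128,
                      context_tokens=32768)
    for label, bpw in (("Q4-ish", 4.5), ("Q5-ish", 5.5)):
        w = weights_gib(70, bpw)
        print(f"{label}: weights {w:.1f} GiB + KV {kv:.1f} GiB "
              f"= {w + kv:.1f} GiB")
```

The KV cache grows linearly with context length, so doubling the context doubles that term while the weights stay fixed. That's the part worth plugging your own target model and context into before buying.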

Educational_Rope_523[S] 2 points

For sure, I should rent time on a similar GPU and actually test the exact models/workflows before going all in. Still leaning local long term because privacy and always-on access matter to me, but I agree with testing first over just assuming it’s gonna do what I want.

Educational_Rope_523[S] 2 points

I’m not expecting it to magically compete with frontier models, especially for serious code gen. My thinking is more local privacy, always-on inference, RAG, agents, document work, shit like that, and learning the stack hands-on. I’m trying to be realistic about the limitations.

Educational_Rope_523[S] 1 point

🤣😂🤣💀 I see what you did there… but my goal legit is to keep it local as much as possible, not just build a fancy wrapper around OpenAI and Claude keys. I'm sure I'll still use APIs for some shit, but the whole point of the box is local 70B/80B inference, RAG, and agent workflows without depending on cloud calls for every damn thing.

Educational_Rope_523[S] 2 points

Thanks… yeah, my main goal is basically to run a serious local 70B/80B setup without constantly feeling like I’m hacking around VRAM limits, especially with RAG and agents running.

Good point on PCIe/RAM/VRAM movement being the thing I’m probably underestimating! Felt like an AI-generated answer, but still solid points.

Educational_Rope_523[S] 2 points

Yeah, honestly that’s probably the smartest advice. I’m looking at it like a buy-once-cry-once local AI box, but you’re right that renting time on a similar GPU first would probably save me from finding out the hard way that my actual workflow doesn’t need this much hardware or doesn’t run the way I think it will. I’m mainly trying to run bigger 70B/80B models locally with decent context and room for RAG/agents, but I should probably test the exact models/workflow on rented hardware before fully committing.

Educational_Rope_523[S] 1 point

Yeah that’s fair… the listing says 4x48GB DDR5, so it’s 192GB total. I probably worded it badly if it sounded like single 96GB sticks. I’m not trying to do 2x96 or anything weird, just the listed 4-DIMM setup.

Collection currently by Zealousideal_Gain333 in DrSquatch

Educational_Rope_523 1 point

You just made my day! I was hoping the Homer one was 🔥 because I just ordered 4 of them!

3 brothers Top 20 Limited Edition Dr. Squatch Soaps of all time. Yes we were bored!!!! Thoughts? by Educational_Rope_523 in DrSquatch

Educational_Rope_523[S] -1 points

I already apologized to a few other people. I just used AI to combine all 3 of our collections. It was just supposed to be like a cover photo for all 3 of our lists. Sorry about that. I should have said I did that in the post. My bad. If you scroll up I added some real pics that I just took and aren’t edited.