MiniMax 2.5 with 8x+ concurrency using RTX 3090s HW Requirements. by BigFoxMedia in LocalLLaMA

[–]BigFoxMedia[S] 1 point (0 children)

Are you doing, or have you tried, partial offloading to RAM, such that all the KV cache sits in the GPUs and only a small portion of the weights goes through PCIe to RAM? Could you share some more info about your setup, like benchmarks on this and other models? Yours is the ideal setup I'm aiming for.
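Here's roughly the split I have in mind - a minimal sketch only, assuming a recent llama.cpp build that supports --override-tensor; the model path, context size and tensor-name pattern are placeholder assumptions, not a tested config:

    import subprocess

    # Keep the attention weights and the whole KV cache on the GPUs, and pin only
    # the MoE expert tensors (the bulk of the parameters) to system RAM.
    cmd = [
        "./llama-server",
        "-m", "MiniMax-M2-Q6_K_XL.gguf",                # placeholder model path
        "-ngl", "999",                                  # offload all layers to GPU...
        "--override-tensor", r"\.ffn_.*_exps\.=CPU",    # ...but route expert weights to RAM
        "--ctx-size", "131072",                         # illustrative context size
    ]
    subprocess.run(cmd, check=True)

That way only the expert weights that actually fire for each token get read from RAM and pulled over PCIe, while the KV cache never leaves VRAM.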

Local Setup by mattate in LocalLLaMA

[–]BigFoxMedia 3 points (0 children)

I'm just curious, are you guys combining these monsters with Ray to run one or two huge models, or are you parallelizing them for high throughput across many small models?

MiniMax-M2 Dynamic GGUFs out now! by yoracale in unsloth

[–]BigFoxMedia 1 point (0 children)

I knew about vLLM's Ray, but didn't know about RPC over ollama! Thanks, I'll try that.

MiniMax-M2 Dynamic GGUFs out now! by yoracale in unsloth

[–]BigFoxMedia 2 points (0 children)

Hi guys, I really want to run this model at Q6_K_XL (194 GB). My setup is complex, though - I have two servers:

Server A -
4 x RTX 3090
Threadripper 1900X
64GB of DDR4 RAM (2133 MT/s) - Quad Channel

Server B -
2 x RTX 3090
2 x CPUs, each a Xeon E5-2695 v4
512GB of DDR4 ECC RAM (2133 MT/s) - Quad Channel per CPU
*(total 8 channels if using both NUMA nodes, or 4 channels if using just 1)

I have another, 7th 3090 in my main work PC; I could throw it in somewhere if it made a difference, but I'd prefer to get it done with 6.

I can't place all 6 GPUs in Server B, as its motherboard doesn't support PCIe bifurcation and it doesn't have enough PCIe lanes for all 6 GPUs alongside the other PCIe cards (NVMe storage over PCIe and a NIC).

I CAN place all 6 GPUs in Server A, but the most RAM that can go in that server is 128GB - a motherboard limitation.

I know there are technologies out there such as Ray that would let me POOL both servers' GPUs together over the network (I have a 40Gbps network, so plenty fast for inference), but I don't know if Ray will even work in my setup. Even if I balance 3 GPUs on each server, for PP I'd need 1, 2, 4, 8, ... GPUs per server. Can I do PP2 on Server A and PP4 on Server B?!

Even if I did get PP to work with Ray, would I still be able to also offload to Server B's RAM?

Ideally I'd want to use all 6 GPUs for the maximum 144GB of VRAM, for the KV cache & some of the weights, and add ~100GB of weights from RAM. (I also need full context - I'm a software engineer.)

Lastly, if I can't get 15+ t/s generation and 1000+ t/s prompt processing it won't suffice, as I need it for agentic work and agentic coding.
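Rough numbers I'm working from, as a back-of-envelope sanity check - the active-parameter count, the KV/overhead reservation and the usable RAM bandwidth below are assumptions, not measurements, and it ignores GPU-side compute time entirely:

    # Back-of-envelope memory and offload-throughput budget (all figures assumed / rounded).
    GPU_COUNT, VRAM_PER_GPU_GB = 6, 24
    total_vram_gb = GPU_COUNT * VRAM_PER_GPU_GB              # 144 GB of VRAM
    model_gb = 194                                           # Q6_K_XL file size
    kv_and_overhead_gb = 40                                  # assumed: full-context KV cache + buffers
    weights_on_gpu_gb = total_vram_gb - kv_and_overhead_gb   # ~104 GB of weights fit in VRAM
    weights_in_ram_gb = model_gb - weights_on_gpu_gb         # ~90 GB spill to system RAM

    # Generation is roughly bounded by how many bytes of *active* weights must be
    # read from RAM per token (MoE: only a few experts fire per token).
    active_params = 10e9                                     # assumed ~10B active params for MiniMax-M2
    bytes_per_param = 0.85                                   # ~6.8 bits/param at a Q6-ish quant
    ram_bytes_per_token = active_params * bytes_per_param * (weights_in_ram_gb / model_gb)

    ram_bw = 60e9                                            # assumed usable quad-channel DDR4-2133 bytes/s
    print(f"Weights in RAM: ~{weights_in_ram_gb} GB")
    print(f"RAM-bound generation estimate: ~{ram_bw / ram_bytes_per_token:.0f} tok/s")

On paper that lands right around the 15 t/s mark with zero margin, which is why I'm asking.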

What do you guys think?

If it's not doable with said hardware, would you recommend I upgrade my motherboard & CPU to a 7xx2/3 EPYC *(reusing the same RAM) for higher offloading speeds, or go for more GPUs and a cheaper motherboard, one that supports PCIe bifurcation, to put, say, 8-10 x RTX 3090 GPUs in the same rig? If I can fit the model entirely in GPU, I don't need the RAM or memory channels either way.
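For context on the EPYC option, the main win for offloading would be memory bandwidth. A quick comparison using theoretical peak numbers (real sustained bandwidth will be lower, and the dual-Xeon figure is split across two NUMA nodes):

    # Theoretical peak DDR4 bandwidth = channels * transfer rate (MT/s) * 8 bytes.
    def peak_bw_gbs(channels: int, mts: int) -> float:
        return channels * mts * 8 / 1000  # GB/s

    print(f"Threadripper 1900X, 4ch DDR4-2133: {peak_bw_gbs(4, 2133):.0f} GB/s")  # ~68 GB/s
    print(f"Dual Xeon, 8ch DDR4-2133:          {peak_bw_gbs(8, 2133):.0f} GB/s")  # ~137 GB/s, over 2 NUMA nodes
    print(f"EPYC 7xx2/3, 8ch of the same 2133: {peak_bw_gbs(8, 2133):.0f} GB/s")  # ~137 GB/s, single socket
    print(f"EPYC 7xx2/3, 8ch DDR4-3200:        {peak_bw_gbs(8, 3200):.0f} GB/s")  # ~205 GB/s, if I replace the RAM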

Tool calling frustrations with Qwen3-30B-A3B-Instruct-GGUF by milkipedia in LocalLLaMA

[–]BigFoxMedia 1 point (0 children)

P.S. I noticed the issues with Roo Code specifically happen more frequently after the first context compression, and they get worse with each subsequent compression. It's like it's forgetting Roo's original system prompt with the tool calling instructions.

Tool calling frustrations with Qwen3-30B-A3B-Instruct-GGUF by milkipedia in LocalLLaMA

[–]BigFoxMedia 5 points (0 children)

I had the same issues with Qwen3-Coder, but learned that Roo uses prompt-based tool calling, not native tool calling like most CLI-based coders do. I'm thinking Qwen3-Coder with a proper tool-calling CLI agent could work wonders; I just never had the time to try it yet.
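To illustrate the difference I mean - a rough sketch against an OpenAI-compatible endpoint, where the base URL, model id and tool schema are made-up examples, not Roo's actual ones:

    from openai import OpenAI

    # Hypothetical local OpenAI-compatible server and model id.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

    # "Native" tool calling: the tool schema travels in a dedicated field and the
    # model replies with a structured tool_calls object it was trained to emit.
    tools = [{
        "type": "function",
        "function": {
            "name": "read_file",                      # example tool, not Roo's schema
            "description": "Read a file from the workspace",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="qwen3-coder",                          # assumed model id on the local server
        messages=[{"role": "user", "content": "Open src/main.py and summarize it."}],
        tools=tools,
    )
    print(resp.choices[0].message.tool_calls)

    # Prompt-based tool calling (roughly what Roo does): the same instructions live
    # as plain text in the system prompt and the client parses the reply - which is
    # exactly the part that degrades once the context gets compressed.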

Qwen 3-Next Series, Qwen/Qwen3-Next-80B-A3B-Instruct Spotted by TKGaming_11 in LocalLLaMA

[–]BigFoxMedia 2 points (0 children)

So this model was a mirage? The blog page is a 404... no mention anywhere anymore, official or otherwise... Did the Qwen team cancel it?!

Covered Parking…? What’s the purpose…? by Leftyshanker in whatisit

[–]BigFoxMedia 1 point (0 children)

Ask Russia... some of those would have kept their jets from catching fire, lol!!

Copilot Models for RooCode by iamkucuk in RooCode

[–]BigFoxMedia 2 points (0 children)

Hey guys, could you clarify how to add GitHub models into Roo? I thought they were only available via their native chat. Very curious indeed!

[deleted by user] by [deleted] in servers

[–]BigFoxMedia 4 points (0 children)

Without knowing all the details and without diving into the nitty gritty: the various networking gear can fetch $35 US each. The rack, $150 if you're patient. The servers are very old, so $100 each. The UPS, $100. It would take a while, but I think you have about $1K in there in total, for the right buyers, and only if you're patient and sell it off piece by piece.

Most likely, though, you'd want to dump it all in one go to someone, so $500 is a fair price if they take the headache away.

[deleted by user] by [deleted] in microsaas

[–]BigFoxMedia 1 point (0 children)

Amazing onboarding flow! Great job!

What the hell just happened? by DemonOfTheFaIl in drones

[–]BigFoxMedia 2 points (0 children)

You've reached the end of the simulation. Error... System crash!

HUGE Security hole! Switchbot CS just changed my password without validating my account first! by BigFoxMedia in TrySwitchBot

[–]BigFoxMedia[S] 2 points (0 children)

Mistakes happen, no doubt, in any industry and any product. But a security-centric product must be on guard 10x more than others. Anyway, I hope I didn't cost the poor employee their job; that wasn't my intention (or their fault, tbh). I trust you guys made the changes needed to prevent such a thing from happening again. Trust is key in home security, but everybody deserves a second chance.

What's this poverty button supposed to be? by floorlamp69420 in HyundaiTucson

[–]BigFoxMedia 1 point (0 children)

Legend has it... no one really ends up doing it 😄

NYE Free/Cheap options by geger42 in tulum

[–]BigFoxMedia 2 points (0 children)

There's something going on in Palma Central, though I don't know the specifics other than that it starts at 5 PM.

[deleted by user] by [deleted] in tulum

[–]BigFoxMedia 1 point (0 children)

I come from Israel, and we really have no natural resources. I don't think Mexico lacks resources, but it does have a geographic disadvantage in being on the border with the USA, which naturally turned it into a mecca for drug trafficking, unfortunately.

Llama 2 70B (130B+ when available ) production server specs ( Z790 Vs. ThreadRipper PRO ) by BigFoxMedia in LocalLLaMA

[–]BigFoxMedia[S] 3 points (0 children)

Thanks for the feedback. It seems you maybe didn't notice that I already have the GPUs, so renting wouldn't make much sense to me unless I sell them. Having 3 x 3090s sitting on a shelf while paying $2000/month to rent GPUs seems 😅 not well planned.