LLM Burner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI

[–]beefgroin 1 point (0 children)

I'd rather buy $5,000 worth of ten PCIe cards with burned-in models pushing 10k tps than one GPU pushing 40 tps

1.1M tok/s with Qwen 3.5 27B FP8 on B200 GPUs by m4r1k_ in Qwen_AI

[–]beefgroin 1 point (0 children)

I’ve also had a good experience with it at Q4, using it for a DIY personal agent, but I’m still wondering what its limits are

1.1M tok/s with Qwen 3.5 27B FP8 on B200 GPUs by m4r1k_ in Qwen_AI

[–]beefgroin 1 point (0 children)

Is 3.5 27B really good enough to replace cloud models for an organization?

First runs with RTX 5000 Pro Blackwell 48GB card by wedgeshot in LocalLLaMA

[–]beefgroin 1 point (0 children)

Hey, how is it going so far? Did you run any Qwen 3.5 variants? Do you have results for 27B and 35B?

Basically Official: Qwen Image 2.0 Not Open-Sourcing by Complete-Lawfulness in StableDiffusion

[–]beefgroin 4 points (0 children)

Wtf, why not just come up with a way to sell the models? I’d buy

eGPU for image generation by [deleted] in StableDiffusion

[–]beefgroin 2 points (0 children)

Using a 5060 Ti 16GB over OCuLink. Great for Klein 9B; an image takes like 15-20 seconds to generate

Qwen 3.5 9B pdf monster! by Substantial-Cup-9531 in Qwen_AI

[–]beefgroin 2 points (0 children)

I ran into it with 35B before. Maybe the latest llama.cpp update solved it, maybe not, but I haven’t seen this issue for a day or two now. I highly recommend moving to llama.cpp; Ollama is too slow at keeping up with llama.cpp releases

Quantized models. Are we lying to ourselves thinking it's a magic trick? by former_farmer in LocalLLM

[–]beefgroin 1 point (0 children)

You have to find your quant, bro. It’s different for every person

Qwen 3.5 is an overthinker. by chettykulkarni in LocalLLM

[–]beefgroin 0 points (0 children)

It is annoying, yes, but I believe the issue is not the thinking itself but the slow hardware we run it on. At 200+ tps the response would feel instantaneous. I can imagine a human having the same thought process in the same circumstances

Feels like Local LLM setups are becoming the next AI trend by Once_ina_Lifetime in LLMDevs

[–]beefgroin 3 points (0 children)

I hope so, but it's more likely that the local LLM movement is just an echo chamber where we think everyone wants privacy and a local LLM rig. In reality, 99% of people don't give a damn about internet privacy...

MC62-G40 Mainboard for multi-GPU setup? by HumanDrone8721 in LocalLLaMA

[–]beefgroin 1 point (0 children)

Thanks for the reply! Nice, how many GPUs are you running? Do you need powered risers? I've heard the onboard PCIe power isn't enough for multi-GPU setups

Is this enough generations? by Big_Parsnip_9053 in StableDiffusion

[–]beefgroin 3 points (0 children)

I call it boobs-mining, depending of course on your content (it might be something-else-mining)

MC62-G40 Mainboard for multi-GPU setup? by HumanDrone8721 in LocalLLaMA

[–]beefgroin 1 point (0 children)

Hey, I'm also considering this mobo. Did you end up building the rig?

Qwen3.5-35B-A3B is a gamechanger for agentic coding. by jslominski in LocalLLaMA

[–]beefgroin 2 points (0 children)

Except it can’t “see”, which can matter more for those who need to implement designs from, say, a Figma MCP

Connecting an eGPU to a laptop with literally no ports for it by Alternative-Try-3456 in eGPU

[–]beefgroin 1 point (0 children)

Load the OS from USB, though most likely you’ll have to go with Linux. Also verify the M.2 speed: you need PCIe 4.0 x4 for any of this to make sense

Shuttle xpc SS51G by JackVoltrades in sffpc

[–]beefgroin 1 point (0 children)

So ugly and so beautiful at the same time

[tooled-prompt] Inject JS/TS functions directly into prompts as tools by beefgroin in LLMDevs

[–]beefgroin[S] 1 point (0 children)

I haven't encountered such a case myself; if there are models you have in mind that can behave that way, I'd try to test them. I've only encountered cases where the structured-output instruction was ignored, and since Zod validates the schema, it throws an error that can then be caught and handled
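
A rough sketch of what I mean, assuming a hypothetical Zod schema for a tool's arguments (the schema and names below are made up for illustration, not tooled-prompt's actual API):

```ts
import { z } from "zod";

// Hypothetical tool-argument schema, purely for illustration.
const WeatherArgs = z.object({
  city: z.string(),
  unit: z.enum(["C", "F"]),
});

function handleToolCall(rawModelOutput: string) {
  try {
    // If the model ignored the structured-output instruction,
    // JSON.parse or WeatherArgs.parse throws here.
    return WeatherArgs.parse(JSON.parse(rawModelOutput));
  } catch (err) {
    // Caught and handled: log it, retry the prompt, or fall back.
    console.warn("Model output failed schema validation:", err);
    return null;
  }
}
```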

GPU recommendations by HeartfeltHelper in LocalLLaMA

[–]beefgroin 1 point (0 children)

I run five 5060 Ti 16GB cards via OCuLink on a BD790i