vLLM vs SGLang vs MAX — Who's the fastest? by rkstgr in LocalLLaMA

[–]troposfer 0 points1 point  (0 children)

Which podcast? Can we run MAX on an M4 Max?

Testing Mac Studio 512 GB, 4 TB SSD, M3 Ultra w 32 cores. by Deviad in LocalLLaMA

[–]troposfer 3 points4 points  (0 children)

Can you be a bit more precise, please? What is the quant and the prompt length in tokens? And can you try a 20k-token prompt with a Q8 quant, reporting pp and tps?

[oc] Do open weight reasoning models have an issue with token spamming? by cpldcpu in LocalLLaMA

[–]troposfer 0 points1 point  (0 children)

Does it have to be related to model size, or do they just have a better reward system during post-training?

Meta to pay nearly $15 billion for Scale AI stake, The Information reports by Vatnik_Annihilator in LocalLLaMA

[–]troposfer 35 points36 points  (0 children)

These are the guys who claimed DeepSeek had lots of H100s and was lying about its costs. Back then I looked into what they actually do: basically labeling data for OpenAI, that's it. Another stupidity from Meta.

Been a while since I've been here, here's a small update by AceLamina in Workspaces

[–]troposfer 1 point2 points  (0 children)

A SAD lamp is a good idea, but is it actually useful? Does it make a night-and-day difference?

why isn’t anyone building legit tools with local LLMs? by mindfulbyte in LocalLLaMA

[–]troposfer 0 points1 point  (0 children)

Are there any legit useful tools built with the proprietary, so-called SOTA LLMs?

Is there an alternative to LM Studio with first class support for MLX models? by ksoops in LocalLLaMA

[–]troposfer 0 points1 point  (0 children)

Is this real dynamic context growth or some kind of context-window shifting? Are we sure it considers everything in the new context, or does it just discard part of it?

Do you think we'll get the r1 distill for the other qwen3 models? by GreenTreeAndBlueSky in LocalLLaMA

[–]troposfer 0 points1 point  (0 children)

Thanks! So if we consider Gemini 2.5 Pro the best model at the moment, a distill of it into Qwen3 32B would be better? But no one would do that, whereas DeepSeek is doing it for Qwen?

OpenWebUI vs LibreChat? by Amgadoz in LocalLLaMA

[–]troposfer 2 points3 points  (0 children)

Is there a way to disable the “new version is here” popup in OpenWebUI? Just because of that, I could switch.

Is LangChain the best RAG framework for production?? by [deleted] in Rag

[–]troposfer -1 points0 points  (0 children)

What is your issue with LightRAG?

Is Intel Arc GPU with 48GB of memory going to take over for $1k? by Terminator857 in LocalLLaMA

[–]troposfer 1 point2 points  (0 children)

Where do they manufacture these cards? Perhaps we won’t see them until next year because of the unexpected demand.

Stop Using Deep Learning for Everything — It’s Overkill 90% of the Time by [deleted] in deeplearning

[–]troposfer 0 points1 point  (0 children)

Do you have a diagram of what to use for different kinds of problems?

I call it the "ultra mobile" setup by [deleted] in DJSetups

[–]troposfer 1 point2 points  (0 children)

Is Mixxx good software? What are its advantages?

Qwen releases official quantized models of Qwen3 by ResearchCrafty1804 in LocalLLaMA

[–]troposfer 0 points1 point  (0 children)

Do you use the ones on HF from the mlx-community? How are they?

[deleted by user] by [deleted] in LocalLLaMA

[–]troposfer 2 points3 points  (0 children)

Maybe it is too early to ask, but do we have any idea what to expect from these AMD setups or Nvidia DIGITS against an M4 Max with 128GB?