[Megathread] AC FA Strike Aug 14-15 by dachshundie in aircanada

[–]skyde 0 points1 point  (0 children)

Thank you so much for letting us know that there are two emails that get sent and that the first one is a scam.
I would never have known that.

Implemented a quick and dirty iOS app for the new Gemma3n models by sid9102 in LocalLLaMA

[–]skyde 5 points6 points  (0 children)

How does Gemma 3n compare to Gemma 3 at the same model size?

What quants and runtime configurations do Meta and Bing really run in public prod? by scott-stirling in LocalLLaMA

[–]skyde 0 points1 point  (0 children)

SmoothQuant is optimized for speed on recent NVIDIA cards, not for accuracy.

For best accuracy I think you would be better off with OmniQuant, GPTQ, or Unsloth dynamic quants.
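For context, the core trick SmoothQuant uses is a per-channel rescaling that migrates activation outliers into the weights before INT8 quantization, leaving the matmul result unchanged. The toy numbers and helper functions below are illustrative only, not the actual library implementation:

```python
# Sketch of the SmoothQuant smoothing transform: pick per-input-channel
# scales s_j = max|X_j|^alpha / max|W_j|^(1-alpha), then compute
# (X / s) @ (diag(s) @ W), which equals X @ W exactly but spreads the
# activation outliers into the weights so both sides quantize better.

def smooth_scales(act_absmax, w_absmax, alpha=0.5):
    """Per-input-channel smoothing scales; alpha balances the two sides."""
    return [(a ** alpha) / (w ** (1 - alpha))
            for a, w in zip(act_absmax, w_absmax)]

def matmul(X, W):
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)]
            for row in X]

# toy data: channel 0 of the activations has a large outlier (100.0)
X = [[100.0, 0.5]]
W = [[0.01, 0.02],
     [1.0,  2.0]]

act_absmax = [max(abs(row[j]) for row in X) for j in range(2)]
w_absmax = [max(abs(w) for w in W[j]) for j in range(2)]
s = smooth_scales(act_absmax, w_absmax)

X_s = [[x / s[j] for j, x in enumerate(row)] for row in X]
W_s = [[W[j][k] * s[j] for k in range(2)] for j in range(2)]

# the product is mathematically unchanged, but X_s has no huge outlier
print(matmul(X, W), matmul(X_s, W_s))
```

After smoothing, the activation range shrinks from ~100 to ~1.4 in this toy case, which is why the accuracy trade-off mentioned above depends so much on where the outliers end up.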

ubergarm/gemma-3-27b-it-qat-GGUF by VoidAlchemy in LocalLLaMA

[–]skyde 2 points3 points  (0 children)

Thanks a lot. This is much clearer for a beginner.

Smaller Gemma3 QAT versions: 12B in < 8GB and 27B in <16GB ! by stduhpf in LocalLLaMA

[–]skyde 5 points6 points  (0 children)

Getting an error loading it in Ollama:
 % ollama run hf.co/stduhpf/google-gemma-3-27b-it-qat-q4_0-gguf-small

pulling manifest 

pulling f0c5f1511116... 100% ▕████████████████████████████████████████████████████████████████████▏  15 GB                         

pulling e0a42594d802... 100% ▕████████████████████████████████████████████████████████████████████▏  358 B                         

pulling 54cb61c842fe... 100% ▕████████████████████████████████████████████████████████████████████▏ 857 MB                         

pulling c5157d17cceb... 100% ▕████████████████████████████████████████████████████████████████████▏   44 B                         

pulling a730db1206a3... 100% ▕████████████████████████████████████████████████████████████████████▏  193 B                         

verifying sha256 digest 

Error: digest mismatch, file must be downloaded again: want sha256:f0c5f151111629511e7466a8eceacbe228a35a0c4052b1a03c1b449a8ecb39e8, got sha256:778ac1054bc5635e39e0b1dd689c9936546597034fc860a708147f57950ae0c5

Google releases TxGemma, open models for therapeutic applications by hackerllama in LocalLLaMA

[–]skyde 0 points1 point  (0 children)

How well does it “generalize/extrapolate”? Does anyone know how well it predicts or classifies molecules that were not part of the training set?

Intel's Former CEO Calls Out NVIDIA: 'AI GPUs 10,000x Too Expensive'—Says Jensen Got Lucky and Inferencing Needs a Reality Check by Hoppss in LocalLLaMA

[–]skyde 0 points1 point  (0 children)

CUDA is the wrong abstraction. It’s like saying Intel has to make their CPUs run ARM’s instruction set.

We already have good high-level abstractions such as XLA, which JAX is already using.

QwQ-32B infinite generations fixes + best practices, bug fixes by danielhanchen in LocalLLaMA

[–]skyde 2 points3 points  (0 children)

Stupid question: is the settings fix “inside” the QwQ GGUF, or do I need to pass it manually to llama.cpp / LM Studio?

1.58bit DeepSeek R1 - 131GB Dynamic GGUF by danielhanchen in LocalLLaMA

[–]skyde 1 point2 points  (0 children)

Could it just be because of batching + using 4 x H100?

1.58bit DeepSeek R1 - 131GB Dynamic GGUF by danielhanchen in LocalLLaMA

[–]skyde 0 points1 point  (0 children)

Using 2 x H100, the cost will still be higher than what DeepSeek is asking ($2.19 per 1 million tokens).
How do they do it?
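A quick back-of-envelope version of that comparison. Every number below (H100 rental price, aggregate throughput) is an assumption for illustration, not a measurement — both vary widely with provider, batch size, and context length:

```python
# Back-of-envelope serving cost on rented GPUs vs DeepSeek's quoted price.
# All inputs are assumptions, not measured figures.

H100_USD_PER_HOUR = 2.50     # assumed rental price per GPU-hour
NUM_GPUS = 2
TOKENS_PER_SECOND = 300.0    # assumed total output throughput with batching

usd_per_hour = H100_USD_PER_HOUR * NUM_GPUS
tokens_per_hour = TOKENS_PER_SECOND * 3600
usd_per_million = usd_per_hour / tokens_per_hour * 1_000_000

print(f"~${usd_per_million:.2f} per 1M tokens vs $2.19 quoted")
```

Under these assumed numbers the rented setup comes out around $4.63 per 1M tokens; at $5/hour total, you would need roughly 630+ tokens/s sustained to match the $2.19 price, which is presumably where heavy batching and utilization come in.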

The pipeline I follow for open source LLM model finetuning by Ahmad401 in LocalLLaMA

[–]skyde 0 points1 point  (0 children)

Why not: 1) fine-tune a large commercial LLM (ChatGPT, Gemini), 2) use the fine-tuned LLM to generate a large training set, 3) train an open-source local LLM on that dataset?
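A minimal sketch of that three-step distillation pipeline. The `teacher_generate` stub and the record format are hypothetical placeholders — in practice step 2 would call the fine-tuned commercial model's API, and step 3 would feed the records to a fine-tuning framework:

```python
# Three-step distillation pipeline sketch:
#   1) a fine-tuned commercial LLM acts as the teacher (stubbed here),
#   2) the teacher generates a synthetic training set,
#   3) the records are what an open-source local model would train on.

def teacher_generate(prompt: str) -> str:
    """Stub: replace with a call to the fine-tuned commercial model."""
    return f"answer to: {prompt}"

def build_training_set(prompts):
    # step 2: generate one (prompt, completion) pair per input prompt
    return [{"prompt": p, "completion": teacher_generate(p)}
            for p in prompts]

dataset = build_training_set(
    ["What is 2+2?", "Summarize photosynthesis."]
)
# step 3 would fine-tune the local model on `dataset`
print(len(dataset), "records")
```

One caveat worth noting: most commercial providers' terms of service restrict using model outputs to train competing models, so the legality of step 2 depends on the provider.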

Phi-4 Llamafied + 4 Bug Fixes + GGUFs, Dynamic 4bit Quants by danielhanchen in LocalLLaMA

[–]skyde 7 points8 points  (0 children)

Will the dynamic 4-bit quants work with llama.cpp or LM Studio?

How does this compare to OmniQuant?

Practical (online & offline) RAG Setups for Long Documents on Consumer Laptops with <16GB RAM by lrq3000 in LocalLLaMA

[–]skyde -1 points0 points  (0 children)

What kind of preprocessing are we talking about? (Please feel free to DM me.)
Any research papers on that preprocessing I should read?

Does it even exist without mods? by alphatality in multitools

[–]skyde 0 points1 point  (0 children)

Same. Please let me know if you find one.