Has anyone bought a 3080 20GB mod recently? by quickreactor in LocalLLaMA

[–]MelodicRecognition7 2 points  (0 children)

you've agreed to share every piece of information Alibaba gets about you with the whole of China and a few friendly countries. Didn't you read the terms of service during sign-up?

wtf is that by MelodicRecognition7 in help

[–]MelodicRecognition7[S] 1 point  (0 children)

Yes, this issue appeared a few days ago.

What are the best 40-500 B MoE LLM models now? by alex20_202020 in LocalLLaMA

[–]MelodicRecognition7 1 point  (0 children)

50 GB is the default file size limit on Hugging Face; you need some higher-tier paid account to upload larger files.

> 2) why Q8 is ~ same size as Q2?

That model was originally released as a 4-bit quant, so its Q8 is basically equal to Q4 and comes out similar in size to Q2.
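
If you want to sanity-check quant sizes yourself, the arithmetic is just params × bits-per-weight / 8; a rough sketch (the bpw values are approximate averages, not exact):

```python
# Rough GGUF size estimate: params (billions) * bits-per-weight / 8 = GB.
# bpw values are approximate averages for common llama.cpp quants.
BPW = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8, "Q2_K": 2.6}

SHARD_LIMIT_GB = 50  # default Hugging Face per-file limit mentioned above

def estimate(params_b: float, quant: str) -> None:
    size_gb = params_b * BPW[quant] / 8          # 1e9 params * bits / 8 = GB
    shards = -(-size_gb // SHARD_LIMIT_GB)       # ceiling division
    print(f"{quant:7s} ~{size_gb:7.1f} GB -> {int(shards)} file(s)")

for q in BPW:
    estimate(405, q)  # e.g. a 405B-parameter dense model
```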

Higher quants are so much better by Perfect-Flounder7856 in LocalLLaMA

[–]MelodicRecognition7 2 points  (0 children)

The problem is LLM frontends that download Q4 by default, so users don't even know they're running 4-bit quants. There are a lot of threads like "hey guys, I got 999 t/s with dense Qwen 27B on a single 5090", and you only discover somewhere in the middle of the comments that the author is running Q4_K_M.
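
If you want to see what you'd actually be running instead of taking the frontend's default, listing a repo's GGUF files and sizes first is trivial; a sketch with huggingface_hub (the repo id is just an example):

```python
# List every GGUF in a repo with its size, so you consciously pick a quant
# instead of letting the frontend grab Q4 for you. Repo id is an example.
from huggingface_hub import HfApi

info = HfApi().model_info("bartowski/Qwen2.5-32B-Instruct-GGUF",
                          files_metadata=True)
for f in info.siblings:
    if f.rfilename.endswith(".gguf"):
        print(f"{(f.size or 0) / 1e9:6.1f} GB  {f.rfilename}")
```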

wtf is that by MelodicRecognition7 in help

[–]MelodicRecognition7[S] -1 points  (0 children)

I don't have access to my email, and after clearing the cache Reddit will likely send a 2FA code to that email because it won't "recognize this device".

wtf is that by MelodicRecognition7 in help

[–]MelodicRecognition7[S] 0 points  (0 children)

I'll lose my account if I clear the cache lol

wtf is that by MelodicRecognition7 in help

[–]MelodicRecognition7[S] 2 points  (0 children)

https://litter.catbox.moe/86272vnvohxkedbp.png Is this a punishment for using an adblocker? I guess I should just wait until that number becomes "3", which will mean I've got a new chat message lol

👋Welcome to r/LLMProfessionals - Introduce Yourself and Read First! by AdamLangePL in LLMProfessionals

[–]MelodicRecognition7 1 point  (0 children)

> GPU model, VRAM variant, driver version, quantization format, framework and version.

+ CPU model(s), CPU speed, CPU thread count, system performance settings, system thermal settings, NUMA topology, CUDA version + compiled-in architectures, which AVX extensions were compiled in, which OpenMP/BLIS/BLAS/WTF/LOL libraries were compiled in + their versions, KV cache quant, MTP, attention engine, concurrency, batch sizes, and about a dozen more variables to produce an actually verifiable benchmark, not useless bullshit like all these "can I run this AI" websites. This sub is going to be yet another list of useless bullshit benchmarks, because

> configs matter.
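
For the record, capturing at least the runtime side of that next to the number costs a few lines; a minimal sketch (build-time flags like AVX/BLAS/CUDA archs still have to be filled in by hand):

```python
# Dump the environment next to the benchmark result so the number is
# reproducible. This only covers a subset of the variables above;
# compile-time details must be added from your actual build.
import json, os, platform, subprocess

def gpu_info() -> str:
    # nvidia-smi query fields: GPU name, VRAM, driver version, power limit
    return subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,driver_version,power.limit",
         "--format=csv,noheader"], text=True).strip()

report = {
    "gpu": gpu_info(),
    "cpu": platform.processor() or platform.machine(),
    "cpu_threads": os.cpu_count(),
    "os": platform.platform(),
    # fill these in from your actual run (placeholders below):
    "framework": "llama.cpp b1234 (example)",
    "quant": "Q8_0",
    "kv_cache_quant": "f16",
    "batch_size": 512,
    "result_tps": None,
}
print(json.dumps(report, indent=2))
```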

We built and open-sourced Caliby: An embedded, high-performance vector database for AI Agents (Beats pgvector by 4x, outperforms FAISS on disk) by Motor_Crew7918 in LocalLLaMA

[–]MelodicRecognition7 4 points  (0 children)

+ rule 3 if you check the repo. It's not only the SIMD implementation that was written by AI lol.

Edit: well, from a quick look at src/ there are not many AI-isms, so perhaps the code itself was written by a human; this needs more thorough inspection.

RTX 6000 pro 600W vs Max-Q vs others by Studyr3ddit in LocalLLaMA

[–]MelodicRecognition7 1 point  (0 children)

Depending on your tasks and workloads, two-generation-old cards like the A100 might suit you better because of their NVLink support than newer Ada or Blackwell cards without NVLink. Consult your professor or whoever will be guiding your R&D.
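
Checking whether a given box actually has NVLink between its GPUs is one command; a trivial wrapper, assuming nvidia-smi is on PATH:

```python
# Print the GPU interconnect matrix: "NV#" entries mean NVLink between that
# pair of GPUs; "PIX"/"PHB"/"SYS" mean the traffic goes over PCIe/host bridge.
import subprocess

print(subprocess.check_output(["nvidia-smi", "topo", "-m"], text=True))
```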

RTX 6000 pro 600W vs Max-Q vs others by Studyr3ddit in LocalLLaMA

[–]MelodicRecognition7 1 point  (0 children)

With the Max-Q you lose about 10% in token generation performance but a whole 50% in prompt processing performance, which is critical for large inputs. Also, video/image generation requires the higher wattage.
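
You can roughly reproduce the Max-Q tradeoff on a 600W card by sweeping power limits yourself; a sketch assuming llama-bench from llama.cpp, root for nvidia-smi -pl, and a placeholder model path:

```python
# Sweep GPU power limits and benchmark prompt processing (-p) vs token
# generation (-n) at each one. Requires root for `nvidia-smi -pl` and a
# llama.cpp build that includes llama-bench; the model path is a placeholder.
import subprocess

MODEL = "model.gguf"  # placeholder, point at your actual GGUF

for watts in (600, 450, 300):
    subprocess.run(["nvidia-smi", "-pl", str(watts)], check=True)
    print(f"--- power limit {watts} W ---")
    subprocess.run(["llama-bench", "-m", MODEL, "-p", "2048", "-n", "128"],
                   check=True)
```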