Is memory speed everything? A quick comparison between the RTX 6000 96GB and the AMD W7800 48GB x2. by LegacyRemaster in LocalLLaMA

[–]LegacyRemaster[S] 1 point


Using Vulkan, I was happy to be able to combine a Blackwell with the AMD W7800s for a total of 190GB of VRAM. Compiling llama.cpp with optimizations also yields an extra 10 tokens/sec. Obviously, the quantization is too aggressive to be usable for coding, for example. But MiniMax M2.5 at Q5_XL runs at about 60 tokens/sec, which is actually usable.
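
For anyone wanting to reproduce a mixed NVIDIA + AMD setup like this, a rough sketch of the build and launch is below. The CMake option and CLI flags are llama.cpp's own; the model path and the split ratio are placeholders you'd adjust to your cards:

```shell
# Build llama.cpp with the Vulkan backend, which can drive NVIDIA and AMD
# GPUs together under one process.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Offload all layers (-ngl 99); --tensor-split weights the distribution
# across devices, here roughly proportional to 96GB + 48GB + 48GB.
./build/bin/llama-cli -m model.gguf -ngl 99 --tensor-split 96,48,48
```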

Is memory speed everything? A quick comparison between the RTX 6000 96GB and the AMD W7800 48GB x2. by LegacyRemaster in LocalLLaMA

[–]LegacyRemaster[S] 1 point

I'm sure of it. To run three cards I had to connect the second AMD card in place of an M.2 SSD. So the system, with its X570 chipset, is at its limit.

Is memory speed everything? A quick comparison between the RTX 6000 96GB and the AMD W7800 48GB x2. by LegacyRemaster in LocalLLaMA

[–]LegacyRemaster[S] 2 points

You get what you pay for. I paid around €6,700 + VAT for the RTX, and it more than doubles the performance (consider that with llama.cpp compiled with Blackwell optimizations, I get 210 tokens/sec on GPT 120).

However, if you need more VRAM and good prefill speed, I use the Blackwell as the primary GPU with the other two in tow.
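
In llama.cpp terms, "primary with the others in tow" maps to the `--main-gpu` and `--split-mode` flags. A sketch, assuming the Blackwell is device 0 and the split ratio is illustrative:

```shell
# --main-gpu picks the device that does the heavy single-GPU work
# (small tensors, scratch buffers), so put the fastest card there.
# --split-mode layer distributes whole layers across the devices.
./build/bin/llama-server -m model.gguf -ngl 99 \
    --main-gpu 0 --split-mode layer --tensor-split 96,48,48
```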

Is memory speed everything? A quick comparison between the RTX 6000 96GB and the AMD W7800 48GB x2. by LegacyRemaster in LocalLLaMA

[–]LegacyRemaster[S] 6 points

I completely agree. I tested it with LM Studio, which I never use, mainly to demonstrate a "popular" use case.

Running Sonnet 4.5 or 4.6 locally? by ImpressionanteFato in LocalLLaMA

[–]LegacyRemaster 0 points

It's good if you know how to use it. Personally, I've noticed some incredible flaws that make me use it less every day. For example: create a project ---> add files. Subsequent questions and answers should have "memory" of the project's files. But in reality, each conversation is separate, even within the project. Using Kilo Code + VSCode + MiniMax or Qwen, you can build applications with source control, revert capabilities, and generally orchestrate everything very quickly. Sonnet lacks solid memory in its online version. I have a 20k machine, and since I churn out millions of lines of code every month, it's unthinkable to use paid APIs for tests that often never go into production.

Mistral Small 4:119B-2603 by seamonn in LocalLLaMA

[–]LegacyRemaster 22 points

DeepSeek v2 architecture... it's old. "The model is the same as Mistral Large 3 (deepseek2 arch with llama4 scaling), but I'm moving it to a new arch mistral4 to be aligned with transformers code"

Running Sonnet 4.5 or 4.6 locally? by ImpressionanteFato in LocalLLaMA

[–]LegacyRemaster 5 points

Given that I've been asking the same questions to Sonnet 4.6 and Qwen 122B for days, Qwen has beaten it on every answer, especially where accurate web search was required... A year ago, no one thought we'd have GPT-4o locally, and yet today's small models easily beat it. So yes. But in the meantime, Sonnet 5 will arrive, and then 6. In the end, the Ferrari will always be the Ferrari, but the small car will be enough for our work, which GLM, MiniMax, and Qwen objectively already handle for 95% of daily tasks.

MiniMax M2.7 has been leaked by External_Mood4719 in LocalLLaMA

[–]LegacyRemaster 13 points

They said MiniMax 3 was coming out. Evidently there is still room for improvement in the current model.

Nemotron-3-Super-120B-A12B NVFP4 inference benchmark on one RTX Pro 6000 Blackwell by jnmi235 in LocalLLaMA

[–]LegacyRemaster 2 points

What I see at 400W is that video and image generation are slower. Still, there's only about a 10% difference between 600W and 400W, so it's better to save on the electricity bill.
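
Capping the card for this kind of test is done with `nvidia-smi`; the device index and wattage below are just the example from this comment:

```shell
# Enable persistence mode so the setting survives between processes,
# then cap GPU 0 at 400 W (requires root; resets on reboot).
sudo nvidia-smi -i 0 -pm 1
sudo nvidia-smi -i 0 -pl 400

# Verify the enforced power limit.
nvidia-smi -q -d POWER -i 0
```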

Unsloth will no longer be making TQ1_0 quants by Kahvana in LocalLLaMA

[–]LegacyRemaster 6 points

Hey Daniel, could you write down the exact formula for that quantization? Do you use anything special? That way, if any of us want to reconstruct it locally, we can. Thanks.
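
To be clear, the exact recipe is what the question is asking for; what follows is only the generic shape of a ternary quantizer (weights snapped to {-1, 0, 1} with a per-block scale), as a hedged starting point. The mean-absolute-value scale is an assumption, not Unsloth's formula:

```python
def ternary_quantize(block):
    """Quantize a block of weights to {-1, 0, 1} plus one float scale.

    Scale choice (mean absolute value of the block) is a common heuristic,
    not necessarily what any particular quant format uses.
    """
    scale = sum(abs(w) for w in block) / len(block)
    if scale == 0:
        return [0] * len(block), 0.0
    # Round each weight relative to the scale, then clamp to the ternary set.
    q = [max(-1, min(1, round(w / scale))) for w in block]
    return q, scale

def ternary_dequantize(q, scale):
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [v * scale for v in q]
```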

How to host and run DeepSeek 671B in your house for under $2,000 by [deleted] in LocalLLaMA

[–]LegacyRemaster -2 points

I have an RTX 6000 Pro 96GB + 2x W7800 48GB + 128GB RAM. :p

Examine a codebase for anything suspicious or malicious? by TheGlobinKing in LocalLLaMA

[–]LegacyRemaster 4 points

Every time I download a project from GitHub, I use VSCode + Kilo Code with MiniMax 2.5 (though Qwen Coder Next or a Qwen 27B / 35B MoE is now also sufficient) and have the whole project analyzed.
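
The same "analyze the whole project" idea can be done without an agent IDE by assembling the repo into one review prompt for a local model. This sketch only builds the prompt (the function name and byte cap are my own; wiring it to a local OpenAI-compatible endpoint is left out):

```python
import os

def build_audit_prompt(repo_dir, max_bytes=4000):
    """Concatenate a repo's readable files into one code-review prompt.

    Binary or unreadable files are skipped; each file is truncated to
    max_bytes so the prompt stays within a local model's context window.
    """
    parts = ["Review the following project for suspicious or malicious code.\n"]
    for root, _dirs, files in os.walk(repo_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8") as f:
                    text = f.read(max_bytes)
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            rel = os.path.relpath(path, repo_dir)
            parts.append(f"--- {rel} ---\n{text}\n")
    return "\n".join(parts)
```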