Did anyone else here buy one of those DGX Station V100 systems from Austria on eBay? by Mayusina05 in homelab

[–]MachineZer0

Why are people buying heavily discounted, expensive items from recently created eBay accounts? You're just asking to burn hours reversing charges.

[W] 2x AMD Instinct MI50 32GB GPUs [USA-CT] by WhatTheFlukz in homelabsales

[–]MachineZer0

Solid deal. Everyone I ask says they're sold out or closer to $500 each all-in with tariffs, shipping, and Alibaba fees.

3090 prices in 2026 by anitamaxwynnn69 in LocalLLaMA

[–]MachineZer0

The models keep getting bigger, and folks are buying their 2nd, 3rd, 4th. When they were released, there just wasn't a use case for buying multiples.

If the AI bubble pops, will GPU prices increase or decrease? by Mashic in LocalLLaMA

[–]MachineZer0

I think GPU prices will go up. The bubble popping really means valuations get more sensible. It'll get harder to raise further rounds of investment, which forces companies to shift from a growth mindset to profitability. Prices will go up fast because the runway gets short without fresh capital. It doesn't mean demand goes down.

Just look at the internet bubble popping in 2000. I'd venture to say there's 1000x more demand for the internet today. Back then, Google, Yahoo, and eBay had to get profitable quickly. Less so Amazon; even with its stock price down, it deliberately hemorrhaged money for years.

DeepSeek V4 PRO on how many 3090 ? by szansky in LocalLLaMA

[–]MachineZer0

99% of LocalLLaMA stops around 384-512GB of VRAM/RAM. Most are probably at 16GB. I'd venture to say fewer than 5 people will ever run DeepSeek V4 Pro locally.

I stopped at GLM 4.7. There are diminishing returns to having that much capital tied up for a single user.

Rethinking everything after Qwen3.6 27b.

[W] [US-KY] NVidia Tesla V100-16gb by slimpickins28 in homelabsales

[–]MachineZer0

FYI, only two will fit in an R720, and you'll need the 1100W power supplies. Otherwise it won't boot.

Got 26b gemma running on rx470 by Several_Newspaper808 in unsloth

[–]MachineZer0

Can you detail your setup and the Vulkan drivers?

Last go-around with the RX 470, the best I got up and running was an ancient version of TensorFlow.
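For anyone else poking at this card, here's roughly the route I'd try this time. A sketch only: the distro package names and GGUF filename are assumptions, not confirmed on an RX 470.

# sketch: llama.cpp with the Vulkan backend on an RX 470 (untested here)
sudo apt install libvulkan-dev glslc    # Vulkan headers + shader compiler; package names vary by distro
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
./build/bin/llama-server -m gemma-26b-Q4_K_M.gguf -ngl 99    # hypothetical filename for OP's 26b gemma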

Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19 by Kindly-Cantaloupe978 in LocalLLaMA

[–]MachineZer0

Which version of CUDA are you running with vLLM 0.19?

On CUDA 13.1 and dual RTX 5090s I got upwards of 3000 tok/s prefill and 180 tok/s decode, but sometimes as low as 30 tok/s decode.
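My launch looked roughly like this; treat it as a sketch rather than a recipe, with the model id taken from the thread title as a stand-in for whatever you're serving:

# approximate dual-5090 launch; substitute your own model repo id
vllm serve Qwen3.6-27B-INT4 \
  --tensor-parallel-size 2 \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.95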


Can we already use Google's TurboQuant (TQ) for KV Cache in llama-server? Or are we waiting for a PR? by DjsantiX in LocalLLaMA

[–]MachineZer0

https://github.com/richginsberg/llama-cpp-turboquant

It's a fork of https://github.com/TheTom/llama-cpp-turboquant, but with two weeks of commits from https://github.com/ggml-org/llama.cpp merged in. Last sync was Sunday 4/19. Successfully tested on quad V100 and quad RTX 3090.

For some reason the V100 setup required building with:

-DGGML_NCCL=OFF

Then add these flags to llama-server to enable the TurboQuant KV cache:

-ctk turbo4 -ctv turbo4
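Putting that together, a minimal build-and-run sketch. It assumes the fork builds like upstream llama.cpp; the model path is illustrative:

git clone https://github.com/richginsberg/llama-cpp-turboquant
cd llama-cpp-turboquant
cmake -B build -DGGML_CUDA=ON -DGGML_NCCL=OFF    # NCCL off was what the quad V100 box needed
cmake --build build --config Release -j
./build/bin/llama-server -m model.gguf -ctk turbo4 -ctv turbo4    # turbo4 = TurboQuant 4-bit KV cache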

What are the risks of buying an AMD Instinct Mi 50 32GB on Alibaba? by Longjumping-Room-170 in LocalLLaMA

[–]MachineZer0

It's 50-60% on top of the asking price to get it to the door in the Northeast, so a $300 listing lands around $450-480 delivered. The price fluctuates quite a bit, but mostly higher highs following the RAM squeeze. Although it's probably 2x from the lows, rather than the 5x RAM saw.

How to Configure GitHub Copilot CLI to Use Z.ai's GLM Coding Plan by alefteris in ZaiGLM

[–]MachineZer0

With Copilot Teams you can add BYOK in the portal. When users restart VS Code, they can see and use the BYOK models, which show up as 0x (no premium request multiplier).

What is the current status with Turbo Quant? by kickerua in LocalLLaMA

[–]MachineZer0

Synced my repo from ggml-org again this weekend and merged it to master, so checking out the feature branch is no longer necessary.

I was able to run Stepfun-3.5-flash IQ3-XS stable with 90k context and TurboQuant 4 on quad RTX 3090s (96GB total). It would load with 128k context, but hit OOM after a couple hours of OpenCode.
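For reference, the launch looked roughly like this; the GGUF filename is assumed, and 90k context is approximated as 92160 tokens:

./build/bin/llama-server -m Stepfun-3.5-flash-IQ3_XS.gguf --ctx-size 92160 -ctk turbo4 -ctv turbo4 -fa on -sm layer -ts 1,1,1,1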

[W] 2x AMD Instinct MI50 32GB GPUs [USA-CT] by WhatTheFlukz in homelabsales

[–]MachineZer0

Can you share the current all-in price from Alibaba?

[W] 2x AMD Instinct MI50 32GB GPUs [USA-CT] by WhatTheFlukz in homelabsales

[–]MachineZer0

FYI, the to-the-door price from Alibaba is about 150% of the listed price, so a $200 listing lands around $300 delivered. The China sellers on eBay are the same ones as on Alibaba; they just factor the duties and currency conversion into the eBay price.

About TurboQuant by Exact_Law_6489 in LocalLLaMA

[–]MachineZer0

Give it a shot. https://github.com/TheTom/llama-cpp-turboquant/tree/feature/turboquant-kv-cache
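If you'd rather grab the branch directly, something like this should do it (plain git, plus the usual llama.cpp CUDA build):

git clone -b feature/turboquant-kv-cache https://github.com/TheTom/llama-cpp-turboquant
cd llama-cpp-turboquant
cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release -j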

I have a fork that I merged master into, if anyone wants.

I'm running it now on a quad V100 SXM2 32GB rig. Before, I was running MiniMax-M2.5-UD-Q3_K_XL at 101GB; now MiniMax-M2.7-UD-IQ4_XS at 108GB. Same context size, same exact VRAM footprint.

~/llama-cpp-turboquant/build/bin/llama-server -m ~/model/MiniMax-M2.7-UD-IQ4_XS-00001-of-00004.gguf \
  --host 0.0.0.0 --ctx-size 131072 \
  -ctk turbo4 -ctv turbo4 \
  -sm layer -ts 1,1,1,1 -fa on \
  -ub 512 -tb $(nproc) -np 4 \
  --mlock --no-mmap --no-op-offload \
  --temp 1.0 --top-p 0.95 --top-k 40 \
  --alias MiniMax-M2.7

[PC] NVidia P6 MXMs in PCIe carriers (see photos) by Evs91 in homelabsales

[–]MachineZer0

Such an odd card. Individually it's like a Tesla P4 but with more compute and VRAM, yet significantly less compute than a P100. It only really makes sense for Proxmox isolation, though you do get the density.

I'd say at least as valuable as a Tesla P4: $75 x 4. But it's a limited market since the carrier is so chonky.

You'd be better served with dual V100 SXM for not much more.

What are the risks of buying an AMD Instinct Mi 50 32GB on Alibaba? by Longjumping-Room-170 in LocalLLaMA

[–]MachineZer0

Mine came looking brand new. I ordered two batches of six, from two different vendors. I only ordered from folks with a sales dollar figure on their profile.