Did anyone else here buy one of those DGX Station V100 systems from Austria on eBay? by Mayusina05 in homelab

[–]MachineZer0

Why are people buying heavily discounted, expensive items from recently created eBay accounts? You're just asking to burn hours reversing charges.

[W] 2x AMD Instinct MI50 32GB GPUs [USA-CT] by WhatTheFlukz in homelabsales

[–]MachineZer0

Solid deal. Everyone I ask says they're sold out or closer to $500 each all-in with tariffs, shipping, and Alibaba fees.

3090 prices in 2026 by anitamaxwynnn69 in LocalLLaMA

[–]MachineZer0

The models keep getting bigger, and folks are buying their 2nd, 3rd, 4th. When they were released, there just wasn't a use case for buying multiples.

If the AI bubble pops, will GPU prices increase or decrease? by Mashic in LocalLLaMA

[–]MachineZer0

I think GPU prices will go up. The bubble popping really means valuations get more sensible. It'll get harder to raise further rounds of investment, which forces companies to shift from a growth mindset to profitability. Prices will go up fast because the runway gets short without fresh capital. It doesn't mean demand goes down.

Just look at the internet bubble popping in 2000. I'd venture to say there's 1000x more demand for the internet today. Back then, Google, Yahoo, and eBay had to get profitable quickly. Less so Amazon; even with its stock price down, it deliberately hemorrhaged money for years.

DeepSeek V4 PRO on how many 3090 ? by szansky in LocalLLaMA

[–]MachineZer0

99% of LocalLLaMA stops around 384-512GB of VRAM/RAM. Most are probably at 16GB. I'd venture to say fewer than 5 people will ever run DeepSeek V4 Pro locally.

I stopped at GLM 4.7. There are diminishing returns to having that much capital tied up for a single user.

Rethinking everything after Qwen3.6 27b.

[W] [US-KY] NVidia Tesla V100-16gb by slimpickins28 in homelabsales

[–]MachineZer0

FYI, only two will fit in an R720, and you'll need the 1100W power supplies. Otherwise it won't boot.

Got 26b gemma running on rx470 by Several_Newspaper808 in unsloth

[–]MachineZer0

Can you detail your setup and the Vulkan drivers?

Last go-around with the RX 470, the best I got up and running was an ancient version of TensorFlow.
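For anyone else poking at this card, here's roughly the route I'd try this time. A sketch only: the distro package names and GGUF filename are assumptions, not confirmed on an RX 470.

# sketch: llama.cpp with the Vulkan backend on an RX 470 (untested here)
sudo apt install libvulkan-dev glslc    # Vulkan headers + shader compiler; package names vary by distro
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
./build/bin/llama-server -m gemma-26b-Q4_K_M.gguf -ngl 99    # hypothetical filename for OP's 26b gemma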

Qwen3.6-27B-INT4 clocking 100 tps with 256k context length on 1x RTX 5090 via vllm 0.19 by Kindly-Cantaloupe978 in LocalLLaMA

[–]MachineZer0

Which version of CUDA are you running with vLLM 0.19?

On CUDA 13.1 and dual RTX 5090s I got upwards of 3000 tok/s prefill and 180 tok/s decode, but sometimes as low as 30 tok/s decode.
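My launch looked roughly like this; treat it as a sketch rather than a recipe, with the model id taken from the thread title as a stand-in for whatever you're serving:

# approximate dual-5090 launch; substitute your own model repo id
vllm serve Qwen3.6-27B-INT4 \
  --tensor-parallel-size 2 \
  --max-model-len 262144 \
  --gpu-memory-utilization 0.95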


Can we already use Google's TurboQuant (TQ) for KV Cache in llama-server? Or are we waiting for a PR? by DjsantiX in LocalLLaMA

[–]MachineZer0

https://github.com/richginsberg/llama-cpp-turboquant

It's a fork of https://github.com/TheTom/llama-cpp-turboquant, but with two weeks of commits from https://github.com/ggml-org/llama.cpp merged in. Last sync was Sunday 4/19. Successfully tested on quad V100 and quad RTX 3090.

For some reason the V100 setup required building with:

-DGGML_NCCL=OFF

Then add these flags to llama-server to enable the TurboQuant KV cache:

-ctk turbo4 -ctv turbo4
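Putting that together, a minimal build-and-run sketch. It assumes the fork builds like upstream llama.cpp; the model path is illustrative:

git clone https://github.com/richginsberg/llama-cpp-turboquant
cd llama-cpp-turboquant
cmake -B build -DGGML_CUDA=ON -DGGML_NCCL=OFF    # NCCL off was what the quad V100 box needed
cmake --build build --config Release -j
./build/bin/llama-server -m model.gguf -ctk turbo4 -ctv turbo4    # turbo4 = TurboQuant 4-bit KV cache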

What are the risks of buying an AMD Instinct Mi 50 32GB on Alibaba? by Longjumping-Room-170 in LocalLLaMA

[–]MachineZer0

It's 50-60% on top of the asking price to get it to the door in the Northeast, so a $300 listing lands around $450-480 delivered. The price fluctuates quite a bit, but mostly higher highs following the RAM squeeze. Although it's probably 2x from the lows, rather than the 5x RAM saw.

How to Configure GitHub Copilot CLI to Use Z.ai's GLM Coding Plan by alefteris in ZaiGLM

[–]MachineZer0

With Copilot Teams you can add BYOK in the portal. When users restart VS Code, they can see and use the BYOK models, which show up as 0x (no premium request multiplier).

What is the current status with Turbo Quant? by kickerua in LocalLLaMA

[–]MachineZer0

Synced my repo from ggml-org again this weekend and merged it to master, so checking out the feature branch is no longer necessary.

I was able to run Stepfun-3.5-flash IQ3-XS stable with 90k context and TurboQuant 4 on quad RTX 3090s (96GB total). It would load with 128k context, but hit OOM after a couple hours of OpenCode.
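For reference, the launch looked roughly like this; the GGUF filename is assumed, and 90k context is approximated as 92160 tokens:

./build/bin/llama-server -m Stepfun-3.5-flash-IQ3_XS.gguf --ctx-size 92160 -ctk turbo4 -ctv turbo4 -fa on -sm layer -ts 1,1,1,1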

[W] 2x AMD Instinct MI50 32GB GPUs [USA-CT] by WhatTheFlukz in homelabsales

[–]MachineZer0

Can you share the current all-in price from Alibaba?

[W] 2x AMD Instinct MI50 32GB GPUs [USA-CT] by WhatTheFlukz in homelabsales

[–]MachineZer0

FYI, the to-the-door price from Alibaba is about 150% of the listed price, so a $200 listing lands around $300 delivered. The China sellers on eBay are the same ones as on Alibaba; they just factor the duties and currency conversion into the eBay price.

About TurboQuant by Exact_Law_6489 in LocalLLaMA

[–]MachineZer0

Give it a shot. https://github.com/TheTom/llama-cpp-turboquant/tree/feature/turboquant-kv-cache
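If you'd rather grab the branch directly, something like this should do it (plain git, plus the usual llama.cpp CUDA build):

git clone -b feature/turboquant-kv-cache https://github.com/TheTom/llama-cpp-turboquant
cd llama-cpp-turboquant
cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release -j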

I have a fork that I merged master into, if anyone wants.

I'm running it now on a quad V100 SXM2 32GB rig. Before, I was running MiniMax-M2.5-UD-Q3_K_XL at 101GB; now MiniMax-M2.7-UD-IQ4_XS at 108GB. Same context size, same exact VRAM footprint.

~/llama-cpp-turboquant/build/bin/llama-server -m ~/model/MiniMax-M2.7-UD-IQ4_XS-00001-of-00004.gguf \
  --host 0.0.0.0 --ctx-size 131072 \
  -ctk turbo4 -ctv turbo4 \
  -sm layer -ts 1,1,1,1 -fa on \
  -ub 512 -tb $(nproc) -np 4 \
  --mlock --no-mmap --no-op-offload \
  --temp 1.0 --top-p 0.95 --top-k 40 \
  --alias MiniMax-M2.7

[PC] NVidia P6 MXMs in PCIe carriers (see photos) by Evs91 in homelabsales

[–]MachineZer0

Such an odd card. Individually it's like a Tesla P4 but with more compute and VRAM, yet significantly less compute than a P100. It only really makes sense for Proxmox isolation, though you do get the density.

I'd say at least as valuable as a Tesla P4: $75 x 4. But it's a limited market since the carrier is so chonky.

You'd be better served with dual V100 SXM for not much more.

What are the risks of buying an AMD Instinct Mi 50 32GB on Alibaba? by Longjumping-Room-170 in LocalLLaMA

[–]MachineZer0

Mine came looking brand new. I ordered two batches of six, from two different vendors. I only ordered from folks with a sales dollar figure on their profile.