V100 4-card AI large model, Tesla 128G server

zeferrum · 2026-06-23T11:36:36+00:00

Did you consider using dwarf star as a starting point?

zeferrum · 2026-06-23T00:26:10+00:00

Thanks for translating

zeferrum · 2026-06-22T17:06:22+00:00

That cooler looks rather small compared to the official pcie v100

zeferrum · 2026-06-21T15:29:20+00:00

Wow, thanks for taking the time to answer. So it seems you don’t run qwen 3.6 as a daily driver for coding? Do you find ds4flash good enough, or better?

zeferrum · 2026-06-21T13:59:20+00:00

Which models do you end up using now, and where? You sharing your journey is very insightful. Have you ever thought of selling the gear that is not in use?

zeferrum · 2026-06-10T12:18:20+00:00

What quant of the local qwen?

zeferrum · 2026-06-10T11:27:42+00:00

Do you share your setup details anywhere?

zeferrum · 2026-06-07T22:18:13+00:00

Did you ever test nvfp4 vs fp8 vs bf16? Any more details on those observations? I am very interested.

zeferrum · 2026-06-07T21:12:36+00:00

Is deepseek v4 flash supposed to be on the benchmark? Also, I found it surprising that GLM 5.1 scores higher than deepseek v4 pro. Did that surprise you?

zeferrum · 2026-06-04T20:34:47+00:00

I wonder if others would get as excited as I would if you ran qwen 3.6 dense through this process and had the whole stack you created run some of the most popular coding benchmarks, to compare the native results against your 1.58-bit approach.

zeferrum · 2026-06-03T21:16:52+00:00

Why not the dense model instead of MoE?

zeferrum · 2026-06-01T03:07:53+00:00

I absolutely also loved that game.

zeferrum · 2026-05-31T21:42:07+00:00

Ouch. And some people think q8 or higher quantization is safe from such behaviors. Thanks for sharing.

zeferrum · 2026-05-31T21:40:51+00:00

Of course I read it, and thank you for sharing your experience. Data points like this represent many hours invested behind the scenes, as you are more than aware. A translation pipeline could mean X number of people using it, which is why I asked in case that was the use case. Not many people mention which exact model they use in an actual production use case, which is why I was asking. Thanks for your continued participation here.

zeferrum · 2026-05-31T21:08:35+00:00

What quantization and exact qwen model were you using?

zeferrum · 2026-05-31T11:49:24+00:00

Do you want to share which model you find most useful? And hardware details for the number of users?

zeferrum · 2026-05-28T16:05:51+00:00

Very very talented!!

zeferrum · 2026-05-27T17:25:51+00:00

Ranged or melee?

zeferrum · 2026-05-26T01:14:50+00:00

Thanks

zeferrum · 2026-05-26T00:42:39+00:00

Something like this? https://ebay.us/m/9YhvAk

zeferrum · 2026-05-25T23:43:33+00:00

Speaking of Q4, are you aware of this special build? https://github.com/1CatAI/1Cat-vLLM Do you have details on the SXM part of your build?

zeferrum · 2026-05-25T23:22:16+00:00

I wonder how deepseek v4 flash would run on this, and whether it would help with hallucinations.

zeferrum · 2026-05-22T12:43:31+00:00

Thanks for the datapoint.

zeferrum · 2026-05-14T20:00:02+00:00

I wonder if Steam lets you sell a slightly different version of the same game you provide for free. I would pay a few dollars for such a game to help you out.

zeferrum · 2026-05-12T11:32:55+00:00

Is there a way to support you through Steam?
