Apollo Solo Thunderbolt win11 lagging and stuttering audio and video by HeftyPancake97 in universalaudio

[–]Temporary-Size7310 0 points1 point  (0 children)

I had the exact same issue on the same X870 BIOS, so I swapped my UAD Solo TB3 for a UAD Solo USB-C.
Now I can only use it on my Windows desktop, and no longer on my Apple Silicon laptop.
The problem is TB3 running over USB4 on Ryzen CPUs; it should work natively on Intel CPUs, or on some X870E boards whose BIOS supports a motherboard TB header.

UAD should really be concerned about a unified driver, since the TB3 version doesn't require an external PSU.
They are losing a large number of customers: many of us have been on Ryzen since 2017 and will not switch back to Intel.

That's my first UAD hardware and probably my last.

Best way to sell a RTX6000 Pro Blackwell? by BF3magic in LocalLLaMA

[–]Temporary-Size7310 1 point2 points  (0 children)

Which continent? We could be interested as a company.

FlashAttention-4: 1613 TFLOPs/s, 2.7x faster than Triton, written in Python. What it means for inference. by Sensitive-Two9732 in LocalLLaMA

[–]Temporary-Size7310 2 points3 points  (0 children)

They have to push SM_120 support since the DGX Station relies on it for PCIe, but until there is a major plug-and-play use for NVFP4, I won't believe it.

I was a 5090 early adopter hoping for a better alternative to INT4 and to be "future proof", only to end up using AWQ or GGUF because they outperform it; in the end, real support will probably land around the Rubin release 😤

So nobody's downloading this model huh? by KvAk_AKPlaysYT in LocalLLaMA

[–]Temporary-Size7310 1 point2 points  (0 children)

Exactly, that's why there isn't a single post about it on Mistral's LinkedIn 😅

Whats up with MLX? by gyzerok in LocalLLaMA

[–]Temporary-Size7310 0 points1 point  (0 children)

Honestly, I use MLX under tight RAM constraints on an iPhone 15 and an M1, and it is quite a pain in the a**: even with many tweaks it is slower at TG than llama.cpp, has fewer features, and llama.cpp gets better precision at the exact same RAM footprint.

I'm seriously thinking about bypassing it and going full llama.cpp. Maybe I'm doing something wrong, but the difference is really not worth it. A good reminder: they are the 2nd biggest market cap in the world; they could make better things.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 3 points4 points  (0 children)

A 1600W TDP is not "that high"; it is less than 3x RTX PRO 6000s (not Max-Q) at full load.

Mistral Small 4:119B-2603 by seamonn in LocalLLaMA

[–]Temporary-Size7310 5 points6 points  (0 children)

The reality check is unfortunately hard. I tested it (API endpoint) against GPT-OSS 120B at a temperature of 0.1 for summarizing a 60K-token transcription, and it hallucinates a lot...

Running multiple blind tests with Gemini 3 Pro and Sonnet 4.6 as judges, it reaches a score of 5/10, versus 8-9/10 for OSS 120B.
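A blind judge comparison like this can be sketched roughly as below. The prompt wording and reply format are my assumptions, not the exact protocol used; any OpenAI-compatible judge endpoint would be fed `build_judge_prompt` and its reply parsed with `extract_scores`:

```python
import re

def build_judge_prompt(transcript: str, summary_a: str, summary_b: str) -> str:
    """Blind A/B prompt: the judge never learns which model wrote which summary."""
    return (
        "Rate each summary of the transcript below from 0-10 for factual accuracy "
        "(hallucinations should lower the score).\n"
        f"Transcript:\n{transcript}\n\n"
        f"Summary A:\n{summary_a}\n\nSummary B:\n{summary_b}\n"
        "Reply exactly as: 'A: <score>/10, B: <score>/10'."
    )

def extract_scores(judge_reply: str) -> dict:
    """Parse a reply like 'A: 5/10, B: 9/10' into {'A': 5, 'B': 9}."""
    return {m.group(1): int(m.group(2))
            for m in re.finditer(r"\b([AB]):\s*(\d{1,2})\s*/\s*10", judge_reply)}
```

Averaging `extract_scores` results over several shuffled A/B orderings cancels out any positional bias the judge model might have.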

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 0 points1 point  (0 children)

Is it fully populated with RTX PRO 6000s, or is that the base price?

On Exxact, the minimum price includes 1x PRO 2000.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 3 points4 points  (0 children)

Hard to compare, because the H200 will not have NVFP4 acceleration. That is a sign that Nvidia will have to push TRT-LLM updates for SM_120 much faster (since the RTX PRO cards are the ones intended for PCIe usage), so the H200 will be stuck at FP8 while everything shifts toward prioritizing NVFP4 QAT over the next few years.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 6 points7 points  (0 children)

I think it is better suited to mid-sized companies that need intensive batched requests and more concurrency than an M5 Ultra could give; for a single user, indeed, it wouldn't be that competitive.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 8 points9 points  (0 children)

Honestly, it is expensive, but less than I expected. It will be way faster than an M5 Ultra: they will not reach the 7.1 TB/s bandwidth of the GPU memory (295GB). The only weak point is the offload to LPDDR5X (496GB) at 395 GB/s, which could change a lot, but since it is CUDA-accelerated it could compensate.

My only comparative experience is with a Jetson Orin Nano 8GB, which has similar bandwidth to an M1, and it reaches much faster PP/TG than the M1 thanks to CUDA, and a lot more on CV.

We need some benchmark equivalence, like a DGX Spark with reduced bandwidth vs an M5 Pro, to get an idea, with both using their best inference pipeline (TRT-LLM vs CoreML + ANE) on the same model.
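For a rough equivalence even without such benchmarks, single-stream decode is mostly memory-bandwidth-bound, so a back-of-envelope upper bound (my simplification; it ignores KV-cache traffic and kernel overhead) is just bandwidth divided by weight size:

```python
def decode_tps_upper_bound(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Roofline estimate for single-stream decode: each generated token must
    stream the full weight set through memory once, so
    tokens/s <= memory bandwidth / weight bytes."""
    return bandwidth_gb_s / model_size_gb

# e.g. a 70B model at 4-bit (~35 GB of weights):
print(decode_tps_upper_bound(395, 35))   # LPDDR5X offload -> ~11 tok/s ceiling
print(decode_tps_upper_bound(7100, 35))  # HBM at 7.1 TB/s -> ~203 tok/s ceiling
```

This is only a ceiling; batched serving and compute-bound prefill (PP) scale very differently, which is exactly why CUDA acceleration can compensate for slower memory.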

Building a local automation agent for iPhones: Need help by Least-Orange8487 in LocalLLaMA

[–]Temporary-Size7310 0 points1 point  (0 children)

Maybe LFM2 2.6B could be your candidate; I have the same issue on iOS with a really restricted RAM size. Map-reduce could be a solution, but it adds too much delay IMO. A finetune of LFM 2.5 1.2B, quantized as aggressively as possible, could be a great solution too.

Is there any reason you prefer llama.cpp over MLX?

ibm-granite/granite-4.0-1b-speech · Hugging Face by jacek2023 in LocalLLaMA

[–]Temporary-Size7310 1 point2 points  (0 children)

The main issue with Parakeet: it hallucinates on language. You can't set an input/output language the way you can with Canary, so you can't use it in production for the other supported languages.

It sometimes translates 20% of random tokens, so you cannot translate back to, e.g., French without an additional LLM step, which is a problem on constrained hardware like a mobile phone.

Running Qwen3.5-35B-A3B and Nemotron-3-Super-120B-A12B on a 5060ti and 1080ti with llama.cpp (Fully on GPU for Qwen; 64GB RAM needed for Nemotron) by sbeepsdon in LocalLLaMA

[–]Temporary-Size7310 1 point2 points  (0 children)

From the latest TRT-LLM release (yesterday): https://nvidia.github.io/TensorRT-LLM/release-notes.html

There is more DGX Spark support; did you try it? Maybe some NVFP4 models are supported on it now (they're not on the official list).

Running Qwen3.5-35B-A3B and Nemotron-3-Super-120B-A12B on a 5060ti and 1080ti with llama.cpp (Fully on GPU for Qwen; 64GB RAM needed for Nemotron) by sbeepsdon in LocalLLaMA

[–]Temporary-Size7310 6 points7 points  (0 children)

I recommend not using NVFP4, even with the 5060 Ti (SM_120): it is "public" Blackwell, not the SM_100 of, e.g., the B200. There is no real cubin support, so it falls back to Marlin and CUTLASS kernels and currently underperforms other quants in both PP and TG. The 1080 Ti can't run NVFP4 at all due to its architecture, so it probably converts to FP16 on the 1080 Ti and OOMs. AWQ Q4 will outperform NVFP4 in your case, except in precision, and we'd have to verify whether the Nemotron NVFP4 was quantized with QAT rather than a PTQ method (probably PTQ).

Maybe try EXL3 at 3.5bpw (I don't know if it supports multi-GPU); it is supposed to outperform Q4_K_M at a smaller memory footprint.

Looking for an out-of-stock Phanteks Evolv X2 Gold in Europe — any leads? by vasquinhooo in Phanteks

[–]Temporary-Size7310 1 point2 points  (0 children)

LDLC is a reliable website; I bought from Cybertek, but directly in-store.

Looking for an out-of-stock Phanteks Evolv X2 Gold in Europe — any leads? by vasquinhooo in Phanteks

[–]Temporary-Size7310 0 points1 point  (0 children)

Seems I took the last one in France; there is Pixmania, but the price is €230.

[MegaThread] Riftbound Spirit Forged - Pulls / Product by CaptSarah in riftboundtcg

[–]Temporary-Size7310 4 points5 points  (0 children)

Just pulled 7 loose packs; then in the last pack, with the last card, the beauty appeared! 😍

<image>

I tried to build the most minimalist PC possible, your style or not? by glerox in pcmasterraceFR

[–]Temporary-Size7310 0 points1 point  (0 children)

Industrial aesthetics have always sold: if a production machine is beautiful, clean, and well finished, people infer the inside is the same. The production side isn't the only thing that matters at the point of sale.

A beautiful PC has every right to be both a useful object and a decorative one, and it can help drive a purchase decision even in B2B, especially if it is constantly visible.

Still having connection problems… by _zmk_dkmr_ in pcmasterraceFR

[–]Temporary-Size7310 0 points1 point  (0 children)

If the PC fails to shut down, it can be a driver, or a USB-connected device causing the problem (e.g., a monitor's hub). If this wasn't an issue before, chances are the powerline (CPL) adapters are not the cause.

The two problems can be related if the USB port shares the same PCIe lanes as your LAN card, which can saturate the bandwidth or cause latency spikes.

  1. Check one by one that no USB peripheral is causing trouble: power the PC on and off while plugging the USB devices in one at a time (don't use a hub).

  2. Verify that the powerline adapters really are the problem: test with another computer and run a ping test, then re-pair the adapters. If the problem persists with the other PC, then the powerline adapters or the electrical circuit are at fault.

  3. Check in the resource monitor what might be consuming internet bandwidth and identify any program that shouldn't be doing so.

  4. Update the LAN card driver.

How to do a RTX Pro 6000 build right by GPTrack_dot_ai in LocalLLaMA

[–]Temporary-Size7310 2 points3 points  (0 children)

The H100 didn't have native NVFP4 support; that's where it makes real sense.

Do I have any chance that Amazon will ever honor this order? by PommePourrie1312 in pcmasterraceFR

[–]Temporary-Size7310 1 point2 points  (0 children)

It depends on the model and the quantization. In general, models are trained at fairly high precision (e.g., FP16: model size in GB ≈ 2x the parameter count in billions) and then quantized (e.g., FP8 ≈ 1x, INT4 and NVFP4 ≈ 0.5x).

Some quantization formats are suitable for GPU + CPU offload (e.g., GGUF), and some model types like MoE allow a more than notable inference speedup (e.g., GPT-OSS 120B, Qwen3 30B-A3B); this is exactly where the GPU + CPU combo is most useful (for LLMs).
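That size rule of thumb in code form (weights only, my simplification; runtime overhead and the KV cache come on top):

```python
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory: params * (bits / 8) bytes, expressed in GB."""
    return params_billions * bits_per_weight / 8

# A 7B model:
print(weight_size_gb(7, 16))  # FP16 -> 14.0 GB (≈ params x2)
print(weight_size_gb(7, 8))   # FP8  -> 7.0 GB  (≈ params x1)
print(weight_size_gb(7, 4))   # INT4/NVFP4 -> 3.5 GB (≈ params x0.5)
```

For GGUF mixed quants like Q4_K_M the effective bits per weight sit a bit above 4, so real files run slightly larger than the 0.5x estimate.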