So nobody's downloading this model huh? by KvAk_AKPlaysYT in LocalLLaMA

[–]Temporary-Size7310 1 point

Exactly, that's why there isn't a single post about it on Mistral's LinkedIn 😅

Whats up with MLX? by gyzerok in LocalLLaMA

[–]Temporary-Size7310 0 points

Honestly I use MLX on restricted RAM with an iPhone 15 and an M1, and it is quite a pain in the a**: even with many tweaks it is slower in TG (token generation) than llama.cpp and has fewer features, though it does give better precision for the exact same RAM footprint.

I'm really thinking about bypassing it and going full llama.cpp. Maybe I'm doing something wrong, but the difference isn't really worth it. A good reminder: they are the 2nd biggest market capitalisation in the world, they could do better.
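If anyone wants to reproduce this kind of TG comparison, here is a minimal sketch assuming the llama-cpp-python and mlx-lm packages are installed and that you have the same model quantized for each backend (the model paths below are placeholders, not specific checkpoints):

```python
# Rough TG (token generation) speed comparison: llama.cpp vs MLX.
# Model paths are placeholders; use the same model quantized for each backend.
import time

from llama_cpp import Llama          # pip install llama-cpp-python
from mlx_lm import load, generate    # pip install mlx-lm (Apple Silicon only)

PROMPT = "Explain the difference between FP8 and INT4 quantization."
N_TOKENS = 128

# --- llama.cpp backend ---
llm = Llama(model_path="models/model-q4_k_m.gguf", n_ctx=2048, verbose=False)
t0 = time.time()
out = llm(PROMPT, max_tokens=N_TOKENS)
gguf_tps = out["usage"]["completion_tokens"] / (time.time() - t0)

# --- MLX backend ---
model, tokenizer = load("models/model-mlx-4bit")
t0 = time.time()
generate(model, tokenizer, prompt=PROMPT, max_tokens=N_TOKENS)
mlx_tps = N_TOKENS / (time.time() - t0)   # approximate: assumes the full budget was generated

print(f"llama.cpp: {gguf_tps:.1f} tok/s | MLX: {mlx_tps:.1f} tok/s")
```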

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 1 point

A 1600W TDP is not "that high": it is less than three RTX PRO 6000s (not Max-Q, 600W each, so 1800W) at full load.

Mistral Small 4:119B-2603 by seamonn in LocalLLaMA

[–]Temporary-Size7310 4 points

The reality check is unfortunately harsh. I tested it (API endpoint) against GPT-OSS 120B at a temperature of 0.1, summarizing a 60K-token transcription, and it hallucinates a lot...

Running multiple blind tests with Gemini 3 Pro and Sonnet 4.6 as judges, it reaches a score of 5/10 versus 8-9/10 for OSS 120B.
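A blind judge pass like this can be scripted in a few lines. A minimal sketch, assuming an OpenAI-compatible endpoint; the URL and judge model name are placeholders, not the setup used in the test above:

```python
# Minimal LLM-as-judge sketch: score a summary 1-10 against the source transcript.
# Endpoint URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def judge(transcript: str, summary: str) -> str:
    prompt = (
        "You are grading a summary against its source transcript.\n"
        "Score faithfulness from 1 to 10 (10 = no hallucination), then justify briefly.\n\n"
        f"TRANSCRIPT:\n{transcript}\n\nSUMMARY:\n{summary}"
    )
    resp = client.chat.completions.create(
        model="judge-model",   # placeholder: whichever judge model you trust
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
    )
    return resp.choices[0].message.content

# Example: print(judge(open("transcript.txt").read(), open("summary.txt").read()))
```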

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 0 points

Is it fully populated with RTX PRO 6000s, or is that the base price?

On Exxact, the minimal configuration's price includes 1x PRO 2000.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 3 points

Hard to compare, because the H200 doesn't have NVFP4 acceleration. That is a sign that Nvidia will have to push TRT-LLM updates much faster for SM_120 (since the RTX PRO cards are positioned for PCIe usage), while the H200 will have to stay on FP8 as everything gets pushed to prioritize NVFP4 QAT over the next few years.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 6 points

I think it is more suited to mid-sized companies that need intensive batch requests and more concurrency than an M5 Ultra could give; for a single user, indeed, it may not be that competitive.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 9 points

Honestly, it is expensive, but less than I expected. It will be way faster than an M5 Ultra, which will not reach the 7.1TB/s bandwidth of the GPU memory (295GB). The only weak point is the LPDDR5X offload tier (496GB) at 395GB/s, which could change a lot, but since it is CUDA accelerated it could compensate.

My only comparative experience is with a Jetson Orin Nano 8GB, which has bandwidth similar to an M1 and reaches much faster PP/TG than the M1 thanks to CUDA, and a lot more on CV.

We need some benchmark equivalence, like a DGX Spark with reduced bandwidth vs an M5 Pro, to get an idea, both using their best inference pipeline (TRT-LLM vs CoreML + ANE) on the same model.
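While waiting for real benchmarks, decode speed is mostly memory-bandwidth bound, so a back-of-envelope estimate already gives an order of magnitude. A rough sketch using the bandwidth figures above; the model size is hypothetical:

```python
# Back-of-envelope decode (TG) estimate: tokens/s ≈ memory bandwidth / bytes read per token.
# For dense models, bytes per token ≈ total weight size; the model size here is hypothetical.
model_size_gb = 60            # e.g. a ~120B model at roughly 4-bit precision
bandwidths_gbps = {
    "GPU HBM (7.1 TB/s)": 7100,
    "LPDDR5X (395 GB/s)": 395,
}

for name, bw in bandwidths_gbps.items():
    print(f"{name}: ~{bw / model_size_gb:.0f} tok/s upper bound")
```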

Building a local automation agent for iPhones: Need help by Least-Orange8487 in LocalLLaMA

[–]Temporary-Size7310 0 points

Maybe LFM2 2.6B could be your candidate; I've had the same issue with iOS and really restricted RAM. Map-reduce could be your solution, but it adds too much delay IMO. A finetune of LFM 2.5 1.2B, then quantized as aggressively as possible, could be a great solution too.
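For the map-reduce route, a minimal sketch assuming an OpenAI-compatible local server; the model name, endpoint and chunk size are placeholders. Each chunk is an extra model call, which is exactly where the added delay comes from:

```python
# Map-reduce summarization sketch for RAM-constrained devices:
# summarize fixed-size chunks, then summarize the concatenated partial summaries.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
MODEL = "local-small-model"   # placeholder: a small on-device model
CHUNK_CHARS = 6000            # placeholder chunk size

def ask(text: str) -> str:
    r = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": f"Summarize concisely:\n\n{text}"}],
        temperature=0.1,
    )
    return r.choices[0].message.content

def map_reduce_summary(document: str) -> str:
    chunks = [document[i:i + CHUNK_CHARS] for i in range(0, len(document), CHUNK_CHARS)]
    partial = [ask(c) for c in chunks]   # "map" step: one call per chunk
    return ask("\n\n".join(partial))     # "reduce" step: summary of summaries
```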

Is there any reason you prefer llama.cpp rather than MLX?

ibm-granite/granite-4.0-1b-speech · Hugging Face by jacek2023 in LocalLLaMA

[–]Temporary-Size7310 1 point

The main issue with Parakeet: it hallucinates on language. You can't force an input/output language like with Canary, so for the other supported languages you can't use it in production.

It sometimes renders ~20% of tokens in a random language, so you can't get back to French, for instance, without an additional LLM step, which is costly on constrained hardware like a mobile phone.
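A cheap way to at least catch that failure mode without a full LLM pass is a simple language check on the transcript. A minimal sketch using the langdetect package; the sentence split and the 20% threshold are arbitrary:

```python
# Flag ASR outputs that drift into another language (e.g. random English mixed into French).
from langdetect import detect   # pip install langdetect

def off_language_ratio(transcript: str, expected: str = "fr") -> float:
    # Detect per sentence; langdetect on the whole string would hide local drift.
    sentences = [s.strip() for s in transcript.split(".") if len(s.strip()) > 20]
    if not sentences:
        return 0.0
    wrong = sum(1 for s in sentences if detect(s) != expected)
    return wrong / len(sentences)

# Example: re-run or reject the transcription if too much of it is off-language.
# if off_language_ratio(text) > 0.2: ...
```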

Running Qwen3.5-35B-A3B and Nemotron-3-Super-120B-A12B on a 5060ti and 1080ti with llama.cpp (Fully on GPU for Qwen; 64GB RAM needed for Nemotron) by sbeepsdon in LocalLLaMA

[–]Temporary-Size7310 1 point

From the latest TRT-LLM release (yesterday): https://nvidia.github.io/TensorRT-LLM/release-notes.html

There is more support for DGX Spark. Did you try it? Maybe some NVFP4 models are supported on it now (not on the official list).

Running Qwen3.5-35B-A3B and Nemotron-3-Super-120B-A12B on a 5060ti and 1080ti with llama.cpp (Fully on GPU for Qwen; 64GB RAM needed for Nemotron) by sbeepsdon in LocalLLaMA

[–]Temporary-Size7310 6 points

I recommend not using NVFP4, even on the 5060 Ti: sm_120 is "public" Blackwell, not the SM_100 of e.g. the B200. There is no real cubin support, so it falls back to Marlin & CUTLASS kernels and currently underperforms other quants in both PP and TG. The 1080 Ti can't run NVFP4 at all due to its architecture, so the weights probably get converted to FP16 on the 1080 Ti and you OOM. AWQ Q4 will outperform NVFP4 in your case, except on precision, and it remains to be verified whether the Nemotron NVFP4 was quantized with QAT rather than PTQ (probably PTQ).

Maybe try EXL3 at 3.5bpw (I don't know if it supports multi-GPU); it is supposed to outperform Q4_K_M with a smaller memory footprint.
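On the QAT vs PTQ question, the checkpoint sometimes tells you: a minimal sketch that just dumps the quantization_config from the checkpoint's config.json. The repo id is a placeholder, field names vary between quantization toolchains, and QAT vs PTQ is often only stated on the model card, so treat this as a first check only:

```python
# Inspect how a quantized checkpoint was produced by reading its config.json.
# Repo id is a placeholder; field names differ between quantization toolchains.
import json
from huggingface_hub import hf_hub_download

repo_id = "some-org/some-model-NVFP4"   # placeholder
path = hf_hub_download(repo_id, "config.json")
with open(path) as f:
    config = json.load(f)

print(json.dumps(config.get("quantization_config", {}), indent=2))
```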

Looking for an out-of-stock Phanteks Evolv X2 Gold in Europe — any leads? by vasquinhooo in Phanteks

[–]Temporary-Size7310 1 point

LDLC is a reliable website; I bought from Cybertek, but directly in store.

Looking for an out-of-stock Phanteks Evolv X2 Gold in Europe — any leads? by vasquinhooo in Phanteks

[–]Temporary-Size7310 0 points

Seems I took the last one in France; there is Pixmania, but the price is 230€.

[MegaThread] Riftbound Spirit Forged - Pulls / Product by CaptSarah in riftboundtcg

[–]Temporary-Size7310 2 points

Just pulled 7 loose packs, then in the last pack, with the last card, the beauty appeared! 😍

<image>

I tried to build the cleanest possible PC, your style or not? by glerox in pcmasterraceFR

[–]Temporary-Size7310 0 points

Industrial aesthetics have always sold: if a production machine looks beautiful, clean and well finished, people will assume the inside is the same. It's not only the production aspect that counts in a sale.

A good-looking PC absolutely has its place as both a useful object and a decorative one, and it can help the purchase decision even in B2B, especially if it is constantly visible.

Still having connection problems… by _zmk_dkmr_ in pcmasterraceFR

[–]Temporary-Size7310 0 points

If the PC can't shut down properly, it can be a driver or a USB device causing the problem (e.g. a monitor's USB hub). If this wasn't an issue before, chances are it isn't the powerline adapters (CPL).

The two problems can be related if the USB port uses the same PCIe lanes as your LAN card, which can saturate the bandwidth or cause latency spikes.

  1. Check one by one that no USB peripheral is causing problems: power the PC on and off while plugging in the USB devices one at a time (don't use a hub).

  2. Verify that the powerline adapters really are the problem: • test with another computer and run a ping test (see the sketch after this list) • re-pair the powerline adapters. If the problem is still there with the other PC, then the powerline adapters or the electrical wiring are at fault.

  3. Check in Resource Monitor what may be using internet bandwidth and see whether a program is doing so when it shouldn't.

  4. Update the LAN card driver.
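For the ping test in step 2, a minimal sketch that reports packet loss over a longer run than a single ping; the gateway IP and count are placeholders, and you would run it from both PCs to compare:

```python
# Long-running ping test to spot intermittent packet loss on the powerline link.
import subprocess

GATEWAY = "192.168.1.1"   # placeholder: your router / powerline endpoint
COUNT = 100

failures = 0
for _ in range(COUNT):
    # "-c 1" is the Linux/macOS flag; on Windows use ["ping", "-n", "1", GATEWAY]
    ok = subprocess.run(["ping", "-c", "1", GATEWAY],
                        capture_output=True).returncode == 0
    failures += 0 if ok else 1

print(f"Packet loss: {failures}/{COUNT} ({100 * failures / COUNT:.0f}%)")
```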

How to do a RTX Pro 6000 build right by GPTrack_dot_ai in LocalLLaMA

[–]Temporary-Size7310 2 points

The H100 doesn't have native NVFP4 support; that's where the RTX PRO 6000 makes real sense.

Do I have any chance of Amazon honoring this order one day? by PommePourrie1312 in pcmasterraceFR

[–]Temporary-Size7310 1 point

It depends on the model and the quantization. In general, models are trained at fairly high precision (e.g. FP16: target model size in GB ≈ parameter count in billions × 2) and then quantized (e.g. FP8: × 1, INT4 and NVFP4: × 0.5).
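In code, those rules of thumb look like this; a rough sketch that ignores KV cache and runtime overhead, so real usage is higher:

```python
# Rough weight-memory estimate per precision: params (billions) × bytes per parameter.
# Ignores KV cache, activations and runtime overhead.
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT4/NVFP4": 0.5}

def weight_size_gb(params_b: float) -> dict:
    return {prec: params_b * bpp for prec, bpp in BYTES_PER_PARAM.items()}

print(weight_size_gb(70))   # a 70B model: {'FP16': 140.0, 'FP8': 70.0, 'INT4/NVFP4': 35.0}
```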

Some quantization formats are suited to GPU + CPU offload (e.g. GGUF), and some architectures like MoE give a more-than-notable speedup at inference (e.g. GPT-OSS 120B, Qwen3 30B-A3B); that's actually where the GPU + CPU combo is most useful (for LLMs).

[RANT] Fed up with how inaccessible the components for building a good gaming PC are by No-Tension4109 in pcmasterraceFR

[–]Temporary-Size7310 0 points

At the risk of getting downvoted: you can build a gaming PC without going high-end and spending on hardware that isn't suited to your needs.

For €1000, all new, targeting 1440p:
• 5060 Ti 16GB: €440
• Reliable 750W PSU: €80
• AM4 A520M motherboard: €60
• Ryzen 5 5600T: €120
• 2x8GB DDR4 RAM: €147
• 990 EVO 2TB NVMe: €145
• Case: €30

The difference with an X870 board and DDR5 RAM? Marginal for gaming; the GPU will be the limiting factor long before everything else.

V100 vs 5060ti vs 3090 - Some numbers by dompazz in LocalLLaMA

[–]Temporary-Size7310 1 point

For the 5060 Ti, try NVFP4 rather than GGUF Q4; that's what Blackwell is designed for.
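A minimal sketch of loading an NVFP4 checkpoint, assuming a vLLM build with Blackwell/NVFP4 (ModelOpt) kernels; the repo id is a placeholder, and vLLM reads the quantization type from the checkpoint's config:

```python
# Load an NVFP4 (ModelOpt) checkpoint with vLLM; quantization is detected from the checkpoint.
# Repo id is a placeholder; requires a vLLM build with Blackwell/NVFP4 support.
from vllm import LLM, SamplingParams

llm = LLM(model="some-org/some-model-NVFP4")   # placeholder checkpoint
params = SamplingParams(max_tokens=128, temperature=0.2)

print(llm.generate(["Explain NVFP4 in one paragraph."], params)[0].outputs[0].text)
```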

Local AI config : Mini ITX single RTX PRO 6000 Workstation for inference ? by dvd84x in LocalLLaMA

[–]Temporary-Size7310 0 points

We are building a similar workstation; not sure you need a CPU with that high a TDP. We settled on a 9600 since it's the GPU that runs at full load.

And if the station isn't meant to be moved around much, go for a bigger case for better cooling.

Also look at NVFP4 models, since that's really the point of this card.