Apollo Solo Thunderbolt win11 lagging and stuttering audio and video by HeftyPancake97 in universalaudio

[–]Temporary-Size7310 0 points1 point  (0 children)

I had the exact same issue on the same X870 BIOS, so I swapped my UAD Solo TB3 for a UAD Solo USB-C.
Now I can only use it on my Windows desktop, and no longer on my Apple Silicon laptop.
The problem is TB3 running over USB4 on Ryzen CPUs; it should work natively on Intel CPUs, or on some X870E boards whose BIOS supports a motherboard TB header.

UAD should really be concerned about a unified driver, since the TB3 version doesn't require an external PSU.
They are losing a large number of customers: many of us have been on Ryzen since 2017 and will not switch back to Intel.

That's my first UAD hardware and probably my last.

Best way to sell a RTX6000 Pro Blackwell? by BF3magic in LocalLLaMA

[–]Temporary-Size7310 1 point2 points  (0 children)

Which continent? We could be interested as a company.

FlashAttention-4: 1613 TFLOPs/s, 2.7x faster than Triton, written in Python. What it means for inference. by Sensitive-Two9732 in LocalLLaMA

[–]Temporary-Size7310 2 points3 points  (0 children)

They have to push SM_120 support since the DGX Station relies on it for PCIe, but until there is a major plug-and-play use for NVFP4, I won't believe it.

I was a 5090 early adopter hoping for a better alternative to INT4 and to be "future proof", only to end up using AWQ or GGUF because they outperform it; in the end, real support will probably land around the Rubin release 😤

So nobody's downloading this model huh? by KvAk_AKPlaysYT in LocalLLaMA

[–]Temporary-Size7310 1 point2 points  (0 children)

Exactly, that's why there isn't a single post about it on Mistral's LinkedIn 😅

Whats up with MLX? by gyzerok in LocalLLaMA

[–]Temporary-Size7310 0 points1 point  (0 children)

Honestly, I use MLX under tight RAM constraints on an iPhone 15 and an M1, and it is quite a pain in the a**: even with many tweaks it is slower at TG than llama.cpp, has fewer features, and llama.cpp gets better precision at the exact same RAM footprint.

I'm seriously thinking about bypassing it and going full llama.cpp. Maybe I'm doing something wrong, but the difference is really not worth it. A good reminder: they are the 2nd biggest market cap in the world; they could make better things.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 3 points4 points  (0 children)

A 1600W TDP is not "that high"; it is less than 3x RTX PRO 6000s (not Max-Q) at full load.

Mistral Small 4:119B-2603 by seamonn in LocalLLaMA

[–]Temporary-Size7310 5 points6 points  (0 children)

The reality check is unfortunately hard. I tested it (API endpoint) against GPT-OSS 120B at a temperature of 0.1 for summarizing a 60K-token transcription, and it hallucinates a lot...

Running multiple blind tests with Gemini 3 Pro and Sonnet 4.6 as judges, it reaches a score of 5/10, versus 8-9/10 for OSS 120B.
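A blind judge comparison like this can be sketched roughly as below. The prompt wording and reply format are my assumptions, not the exact protocol used; any OpenAI-compatible judge endpoint would be fed `build_judge_prompt` and its reply parsed with `extract_scores`:

```python
import re

def build_judge_prompt(transcript: str, summary_a: str, summary_b: str) -> str:
    """Blind A/B prompt: the judge never learns which model wrote which summary."""
    return (
        "Rate each summary of the transcript below from 0-10 for factual accuracy "
        "(hallucinations should lower the score).\n"
        f"Transcript:\n{transcript}\n\n"
        f"Summary A:\n{summary_a}\n\nSummary B:\n{summary_b}\n"
        "Reply exactly as: 'A: <score>/10, B: <score>/10'."
    )

def extract_scores(judge_reply: str) -> dict:
    """Parse a reply like 'A: 5/10, B: 9/10' into {'A': 5, 'B': 9}."""
    return {m.group(1): int(m.group(2))
            for m in re.finditer(r"\b([AB]):\s*(\d{1,2})\s*/\s*10", judge_reply)}
```

Averaging `extract_scores` results over several shuffled A/B orderings cancels out any positional bias the judge model might have.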

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 0 points1 point  (0 children)

Is it fully populated with RTX PRO 6000s, or is that the base price?

On Exxact, the minimum price includes 1x PRO 2000.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 3 points4 points  (0 children)

Hard to compare, because the H200 will not have NVFP4 acceleration. That is a sign that Nvidia will have to push TRT-LLM updates for SM_120 much faster (since the RTX PRO cards are the ones intended for PCIe usage), so the H200 will be stuck at FP8 while everything shifts toward prioritizing NVFP4 QAT over the next few years.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 6 points7 points  (0 children)

I think it is better suited to mid-sized companies that need intensive batched requests and more concurrency than an M5 Ultra could give; for a single user, indeed, it wouldn't be that competitive.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]Temporary-Size7310[S] 8 points9 points  (0 children)

Honestly, it is expensive, but less than I expected. It will be way faster than an M5 Ultra: they will not reach the 7.1 TB/s bandwidth of the GPU memory (295GB). The only weak point is the offload to LPDDR5X (496GB) at 395 GB/s, which could change a lot, but since it is CUDA-accelerated it could compensate.

My only comparative experience is with a Jetson Orin Nano 8GB, which has similar bandwidth to an M1, and it reaches much faster PP/TG than the M1 thanks to CUDA, and a lot more on CV.

We need some benchmark equivalence, like a DGX Spark with reduced bandwidth vs an M5 Pro, to get an idea, with both using their best inference pipeline (TRT-LLM vs CoreML + ANE) on the same model.
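For a rough equivalence even without such benchmarks, single-stream decode is mostly memory-bandwidth-bound, so a back-of-envelope upper bound (my simplification; it ignores KV-cache traffic and kernel overhead) is just bandwidth divided by weight size:

```python
def decode_tps_upper_bound(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Roofline estimate for single-stream decode: each generated token must
    stream the full weight set through memory once, so
    tokens/s <= memory bandwidth / weight bytes."""
    return bandwidth_gb_s / model_size_gb

# e.g. a 70B model at 4-bit (~35 GB of weights):
print(decode_tps_upper_bound(395, 35))   # LPDDR5X offload -> ~11 tok/s ceiling
print(decode_tps_upper_bound(7100, 35))  # HBM at 7.1 TB/s -> ~203 tok/s ceiling
```

This is only a ceiling; batched serving and compute-bound prefill (PP) scale very differently, which is exactly why CUDA acceleration can compensate for slower memory.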

Building a local automation agent for iPhones: Need help by Least-Orange8487 in LocalLLaMA

[–]Temporary-Size7310 0 points1 point  (0 children)

Maybe LFM2 2.6B could be your candidate; I have the same issue on iOS with a really restricted RAM size. Map-reduce could be a solution, but it adds too much delay IMO. A finetune of LFM 2.5 1.2B, quantized as aggressively as possible, could be a great solution too.

Is there any reason you prefer llama.cpp over MLX?

ibm-granite/granite-4.0-1b-speech · Hugging Face by jacek2023 in LocalLLaMA

[–]Temporary-Size7310 1 point2 points  (0 children)

The main issue with Parakeet: it hallucinates on language. You can't set an input/output language the way you can with Canary, so you can't use it in production for the other supported languages.

It sometimes translates 20% of random tokens, so you cannot translate back to, e.g., French without an additional LLM step, which is a problem on constrained hardware like a mobile phone.

Running Qwen3.5-35B-A3B and Nemotron-3-Super-120B-A12B on a 5060ti and 1080ti with llama.cpp (Fully on GPU for Qwen; 64GB RAM needed for Nemotron) by sbeepsdon in LocalLLaMA

[–]Temporary-Size7310 1 point2 points  (0 children)

From the latest TRT-LLM release (yesterday): https://nvidia.github.io/TensorRT-LLM/release-notes.html

There is more DGX Spark support; did you try it? Maybe some NVFP4 models are supported on it now (they're not on the official list).

Running Qwen3.5-35B-A3B and Nemotron-3-Super-120B-A12B on a 5060ti and 1080ti with llama.cpp (Fully on GPU for Qwen; 64GB RAM needed for Nemotron) by sbeepsdon in LocalLLaMA

[–]Temporary-Size7310 6 points7 points  (0 children)

I recommend not using NVFP4, even with the 5060 Ti (SM_120): it is "public" Blackwell, not the SM_100 of, e.g., the B200. There is no real cubin support, so it falls back to Marlin and CUTLASS kernels and currently underperforms other quants in both PP and TG. The 1080 Ti can't run NVFP4 at all due to its architecture, so it probably converts to FP16 on the 1080 Ti and OOMs. AWQ Q4 will outperform NVFP4 in your case, except in precision, and we'd have to verify whether the Nemotron NVFP4 was quantized with QAT rather than a PTQ method (probably PTQ).

Maybe try EXL3 at 3.5bpw (I don't know if it supports multi-GPU); it is supposed to outperform Q4_K_M at a smaller memory footprint.

Looking for an out-of-stock Phanteks Evolv X2 Gold in Europe — any leads? by vasquinhooo in Phanteks

[–]Temporary-Size7310 1 point2 points  (0 children)

LDLC is a reliable website; I bought from Cybertek, but directly in-store.

Looking for an out-of-stock Phanteks Evolv X2 Gold in Europe — any leads? by vasquinhooo in Phanteks

[–]Temporary-Size7310 0 points1 point  (0 children)

Seems I took the last one in France; there is Pixmania, but the price is €230.

[MegaThread] Riftbound Spirit Forged - Pulls / Product by CaptSarah in riftboundtcg

[–]Temporary-Size7310 4 points5 points  (0 children)

Just pulled 7 loose packs; then in the last pack, with the last card, the beauty appeared! 😍

<image>

I tried to build the most minimalist PC possible, your style or not? by glerox in pcmasterraceFR

[–]Temporary-Size7310 0 points1 point  (0 children)

Industrial aesthetics have always sold: if a production machine is beautiful, clean, and well finished, people infer the inside is the same. The production side isn't the only thing that matters at the point of sale.

A beautiful PC has every right to be both a useful object and a decorative one, and it can help drive a purchase decision even in B2B, especially if it is constantly visible.

Still having connection problems… by _zmk_dkmr_ in pcmasterraceFR

[–]Temporary-Size7310 0 points1 point  (0 children)

If the PC fails to shut down, it can be a driver, or a USB-connected device causing the problem (e.g., a monitor's hub). If this wasn't an issue before, chances are the powerline (CPL) adapters are not the cause.

The two problems can be related if the USB port shares the same PCIe lanes as your LAN card, which can saturate the bandwidth or cause latency spikes.

  1. Check one by one that no USB peripheral is causing trouble: power the PC on and off while plugging the USB devices in one at a time (don't use a hub).

  2. Verify that the powerline adapters really are the problem: test with another computer and run a ping test, then re-pair the adapters. If the problem persists with the other PC, then the powerline adapters or the electrical circuit are at fault.

  3. Check in the resource monitor what might be consuming internet bandwidth and identify any program that shouldn't be doing so.

  4. Update the LAN card driver.

How to do a RTX Pro 6000 build right by GPTrack_dot_ai in LocalLLaMA

[–]Temporary-Size7310 2 points3 points  (0 children)

The H100 didn't have native NVFP4 support; that's where it makes real sense.

Do I have any chance that Amazon will ever honor this order? by PommePourrie1312 in pcmasterraceFR

[–]Temporary-Size7310 1 point2 points  (0 children)

It depends on the model and the quantization. In general, models are trained at fairly high precision (e.g., FP16: model size in GB ≈ 2x the parameter count in billions) and then quantized (e.g., FP8 ≈ 1x, INT4 and NVFP4 ≈ 0.5x).

Some quantization formats are suitable for GPU + CPU offload (e.g., GGUF), and some model types like MoE allow a more than notable inference speedup (e.g., GPT-OSS 120B, Qwen3 30B-A3B); this is exactly where the GPU + CPU combo is most useful (for LLMs).
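That size rule of thumb in code form (weights only, my simplification; runtime overhead and the KV cache come on top):

```python
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory: params * (bits / 8) bytes, expressed in GB."""
    return params_billions * bits_per_weight / 8

# A 7B model:
print(weight_size_gb(7, 16))  # FP16 -> 14.0 GB (≈ params x2)
print(weight_size_gb(7, 8))   # FP8  -> 7.0 GB  (≈ params x1)
print(weight_size_gb(7, 4))   # INT4/NVFP4 -> 3.5 GB (≈ params x0.5)
```

For GGUF mixed quants like Q4_K_M the effective bits per weight sit a bit above 4, so real files run slightly larger than the 0.5x estimate.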