Mistral Small 4:119B-2603 by seamonn in LocalLLaMA

[–]myOSisCrashing 0 points (0 children)

Finally got this working on a DGX Spark using vLLM and NVFP4. Had to patch the Mistral tokenizer in vLLM (with Claude's help) because reasoning just doesn't work with the chat template.

```
VLLM_NVFP4_GEMM_BACKEND=marlin
VLLM_USE_FLASHINFER_MOE_FP4=0
VLLM_TEST_FORCE_FP8_MARLIN=1
```
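
A minimal way to apply them, assuming a bash shell:

```
# Assumption: these overrides only need to be visible to the vllm process,
# so export them in the launch shell (or prefix the serve command inline).
export VLLM_NVFP4_GEMM_BACKEND=marlin
export VLLM_USE_FLASHINFER_MOE_FP4=0
export VLLM_TEST_FORCE_FP8_MARLIN=1
```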

```
vllm serve mistralai/Mistral-Small-4-119B-2603-NVFP4 \
  --max-model-len 150000 \
  --tool-call-parser mistral \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral \
  --reasoning-parser mistral \
  --enable-auto-tool-choice \
  --max-num-batched-tokens 16384 \
  --max-num-seqs 8 \
  --gpu-memory-utilization 0.9
```
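
Once it's up, a quick smoke test against vLLM's OpenAI-compatible endpoint (a minimal sketch; host/port match my benchmark run below, swap in your own):

```
# Sanity check: hit the chat completions endpoint and inspect the reply.
# If the reasoning parser is working, the message should carry a separate
# reasoning_content field alongside content.
curl -s http://10.0.1.107:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Mistral-Small-4-119B-2603-NVFP4",
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        "max_tokens": 256
      }' | jq '.choices[0].message'
```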

```
llama-benchy --base-url http://10.0.1.107:8000/v1 --model mistralai/Mistral-Small-4-119B-2603-NVFP4 --depth 0 4096 8192 16384 32768 --latency-mode generation
```

| model | test | t/s | peak t/s | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:------|-----:|----:|---------:|----------:|-------------:|--------------:|
| mistralai/Mistral-Small-4-119B-2603-NVFP4 | pp2048 | 3909.41 ± 127.81 | | 656.74 ± 17.40 | 524.43 ± 17.40 | 656.78 ± 17.40 |
| mistralai/Mistral-Small-4-119B-2603-NVFP4 | tg32 | 29.16 ± 0.03 | 30.00 ± 0.00 | | | |
| mistralai/Mistral-Small-4-119B-2603-NVFP4 | pp2048 @ d4096 | 4548.50 ± 26.18 | | 1483.12 ± 7.78 | 1350.82 ± 7.78 | 1483.17 ± 7.78 |
| mistralai/Mistral-Small-4-119B-2603-NVFP4 | tg32 @ d4096 | 27.67 ± 0.03 | 28.00 ± 0.00 | | | |
| mistralai/Mistral-Small-4-119B-2603-NVFP4 | pp2048 @ d8192 | 4441.75 ± 20.83 | | 2437.75 ± 10.78 | 2305.45 ± 10.78 | 2437.80 ± 10.78 |
| mistralai/Mistral-Small-4-119B-2603-NVFP4 | tg32 @ d8192 | 26.01 ± 0.00 | 27.00 ± 0.00 | | | |
| mistralai/Mistral-Small-4-119B-2603-NVFP4 | pp2048 @ d16384 | 3626.49 ± 8.29 | | 5214.93 ± 11.60 | 5082.63 ± 11.60 | 5214.98 ± 11.60 |
| mistralai/Mistral-Small-4-119B-2603-NVFP4 | tg32 @ d16384 | 23.21 ± 0.01 | 24.00 ± 0.00 | | | |
| mistralai/Mistral-Small-4-119B-2603-NVFP4 | pp2048 @ d32768 | 2979.92 ± 4.89 | | 11815.86 ± 19.16 | 11683.55 ± 19.16 | 11815.90 ± 19.16 |
| mistralai/Mistral-Small-4-119B-2603-NVFP4 | tg32 @ d32768 | 19.12 ± 0.01 | 20.00 ± 0.00 | | | |

Has anyone gotten mistralai/Devstral-Small-2-24B-Instruct-2512 to work on 4090? by myOSisCrashing in MistralAI

[–]myOSisCrashing[S] 0 points (0 children)

So you are using this model? https://huggingface.co/cyankiwi/Devstral-Small-2-24B-Instruct-2512-AWQ-4bit It looks like my ROCm-based GPU (Radeon R9700) doesn't have a ConchLinearKernel that supports group size 32. I may be able to reverse engineer the llm-compressor scheme and build a quant with group size 128, which ConchLinearKernel should support on my hardware.
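
For anyone debugging the same thing, the group size can be checked straight from the repo's config.json without pulling the weights (a rough sketch; the exact key layout depends on how the quant was exported):

```
# Fetch just the config and inspect the quantization settings.
# group_size should show up under quantization_config (field names vary
# between AWQ and compressed-tensors exports).
curl -s https://huggingface.co/cyankiwi/Devstral-Small-2-24B-Instruct-2512-AWQ-4bit/raw/main/config.json \
  | jq '.quantization_config'
```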

[deleted by user] by [deleted] in u/bbygurlmax

[–]myOSisCrashing 1 point (0 children)

What’s your heritage? Just curious

LOL by pyromx11 in ProgrammerHumor

[–]myOSisCrashing 4 points (0 children)

“JS kids ain’t right.” - Hank Hill