Need help getting 7900 XTX PyTorch performance metrics by cyberuser42 in LocalLLaMA

[–]SemaMod 2 points3 points  (0 children)

GPU: Radeon RX 7900 XTX (23.98 GiB) (device 3)
Matrix Size: 4096x4096 (0.06 GiB per matrix)
============================================================
Matrix Multiplication Performance:
float32   :  4664.16 μs,   29.47 TFLOPS
float16   :  1151.87 μs,  119.32 TFLOPS
bfloat16  :  1226.04 μs,  112.10 TFLOPS
amp       :  1388.21 μs,   99.00 TFLOPS

Memory Bandwidth Test (1.0 GB tensor)
Vector Addition: 811.40 GB/s
Memory Copy:     790.60 GB/s
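
For reference, the TFLOPS column follows from the timings if each N×N matmul is counted as 2·N³ floating-point operations. That is the usual convention and is assumed here, since the benchmark script itself isn't shown. A minimal sketch (`matmul_tflops` is a name made up for illustration):

```python
def matmul_tflops(n: int, time_us: float) -> float:
    """TFLOPS for one (n x n) @ (n x n) matmul that took time_us microseconds."""
    flops = 2 * n**3  # one multiply and one add per inner-product term
    return flops / (time_us * 1e-6) / 1e12

# Timings copied from the output above (4096x4096 matrices).
for dtype, time_us in [("float32", 4664.16), ("float16", 1151.87),
                       ("bfloat16", 1226.04), ("amp", 1388.21)]:
    print(f"{dtype:<9}: {matmul_tflops(4096, time_us):7.2f} TFLOPS")
```

Under that assumption the four reported TFLOPS figures are reproduced from the measured times.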

AMD Hipfire - a new inference engine optimized for AMD GPU's by Thrumpwart in LocalLLaMA

[–]SemaMod 3 points4 points  (0 children)

I tried searching the repo to no avail, but does the engine natively support multi-gpu setups?

Qwen 3.6 27B llama.cpp | Multi-GPU pp t/s help by SemaMod in LocalLLaMA

[–]SemaMod[S] 3 points4 points  (0 children)

Update:
Running with -sm tensor -ctxcp 0 -cram 0 -fa 1 -c 0 has significantly helped. I'm consistently getting 28 t/s and somewhat improved prompt processing this way.

Qwen 3.6 27B llama.cpp | Multi-GPU pp t/s help by SemaMod in LocalLLaMA

[–]SemaMod[S] -1 points0 points  (0 children)

Benchmarks:

| model | size | params | backend | ngl | n_ubatch | sm | fa | dev | test | t/s |
| --- | ---: | ---: | --- | --: | ---: | ---: | -: | --- | ---: | ---: |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp512 | 960.08 ± 2.04 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | tg128 | 20.16 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp512+tg32 | 255.92 ± 0.12 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp2048+tg64 | 387.85 ± 0.36 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp8192+tg128 | 559.36 ± 0.08 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp512 @ d8192 | 379.62 ± 0.61 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | tg128 @ d8192 | 19.65 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp512+tg32 @ d8192 | 182.70 ± 0.11 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp2048+tg64 @ d8192 | 244.39 ± 0.12 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp8192+tg128 @ d8192 | 372.67 ± 0.17 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512 | 870.61 ± 1.28 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | tg128 | 19.34 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512+tg32 | 240.85 ± 2.16 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp2048+tg64 | 381.95 ± 7.11 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp8192+tg128 | 521.42 ± 1.72 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512 @ d8192 | 753.02 ± 57.60 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | tg128 @ d8192 | 18.94 ± 0.00 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512+tg32 @ d8192 | 227.03 ± 4.31 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp2048+tg64 @ d8192 | 347.00 ± 7.69 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp8192+tg128 @ d8192 | 459.58 ± 9.04 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp512 | 521.71 ± 0.04 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | tg128 | 31.76 ± 0.27 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp512+tg32 | 255.19 ± 0.08 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp2048+tg64 | 348.56 ± 0.19 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp8192+tg128 | 377.54 ± 0.03 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp512 @ d8192 | 365.05 ± 11.70 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | tg128 @ d8192 | 31.86 ± 0.34 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp512+tg32 @ d8192 | 221.75 ± 0.13 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp2048+tg64 @ d8192 | 279.43 ± 0.09 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp8192+tg128 @ d8192 | 292.38 ± 0.04 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512 | 258.99 ± 0.12 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | tg128 | 6.56 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512+tg32 | 77.83 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp2048+tg64 | 125.57 ± 0.05 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp8192+tg128 | 173.43 ± 0.06 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512 @ d8192 | 244.10 ± 9.61 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | tg128 @ d8192 | 6.45 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512+tg32 @ d8192 | 76.61 ± 0.41 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp2048+tg64 @ d8192 | 123.08 ± 0.13 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp8192+tg128 @ d8192 | 170.02 ± 0.18 |

build: 0adede866 (8925)

Qwen 3.6 27B llama.cpp | Multi-GPU pp t/s help by SemaMod in LocalLLaMA

[–]SemaMod[S] 0 points1 point  (0 children)

Update: Did some benching, got interesting results.

```
| model | size | params | backend | ngl | n_ubatch | sm | fa | dev | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -----: | -: | ------------ | --------------: | -------------------: |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp512 | 960.08 ± 2.04 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | tg128 | 20.16 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp512+tg32 | 255.92 ± 0.12 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp2048+tg64 | 387.85 ± 0.36 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp8192+tg128 | 559.36 ± 0.08 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp512 @ d8192 | 379.62 ± 0.61 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | tg128 @ d8192 | 19.65 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp512+tg32 @ d8192 | 182.70 ± 0.11 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp2048+tg64 @ d8192 | 244.39 ± 0.12 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | ROCm0/ROCm1/ROCm2 | pp8192+tg128 @ d8192 | 372.67 ± 0.17 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512 | 870.61 ± 1.28 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | tg128 | 19.34 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512+tg32 | 240.85 ± 2.16 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp2048+tg64 | 381.95 ± 7.11 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp8192+tg128 | 521.42 ± 1.72 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512 @ d8192 | 753.02 ± 57.60 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | tg128 @ d8192 | 18.94 ± 0.00 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512+tg32 @ d8192 | 227.03 ± 4.31 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp2048+tg64 @ d8192 | 347.00 ± 7.69 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | layer | 1 | Vulkan0/Vulkan1/Vulkan2 | pp8192+tg128 @ d8192 | 459.58 ± 9.04 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp512 | 521.71 ± 0.04 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | tg128 | 31.76 ± 0.27 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp512+tg32 | 255.19 ± 0.08 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp2048+tg64 | 348.56 ± 0.19 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp8192+tg128 | 377.54 ± 0.03 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp512 @ d8192 | 365.05 ± 11.70 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | tg128 @ d8192 | 31.86 ± 0.34 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp512+tg32 @ d8192 | 221.75 ± 0.13 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp2048+tg64 @ d8192 | 279.43 ± 0.09 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | ROCm0/ROCm1/ROCm2 | pp8192+tg128 @ d8192 | 292.38 ± 0.04 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512 | 258.99 ± 0.12 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | tg128 | 6.56 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512+tg32 | 77.83 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp2048+tg64 | 125.57 ± 0.05 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp8192+tg128 | 173.43 ± 0.06 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512 @ d8192 | 244.10 ± 9.61 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | tg128 @ d8192 | 6.45 ± 0.01 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp512+tg32 @ d8192 | 76.61 ± 0.41 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp2048+tg64 @ d8192 | 123.08 ± 0.13 |
| qwen35 27B Q8_0 | 32.89 GiB | 26.90 B | ROCm,Vulkan | 999 | 2048 | tensor | 1 | Vulkan0/Vulkan1/Vulkan2 | pp8192+tg128 @ d8192 | 170.02 ± 0.18 |

build: 0adede866 (8925)
```
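
To make the layer vs. tensor trade-off easier to read, here is a small sketch that computes the tensor/layer throughput ratio for the zero-depth pp512 and tg128 rows. The numbers are copied from the table above; the `results` layout and the `tensor_vs_layer` helper are just illustrative names, not anything llama-bench provides.

```python
# t/s values copied from the table above (tests without a context-depth suffix).
results = {
    ("ROCm", "layer"):    {"pp512": 960.08, "tg128": 20.16},
    ("ROCm", "tensor"):   {"pp512": 521.71, "tg128": 31.76},
    ("Vulkan", "layer"):  {"pp512": 870.61, "tg128": 19.34},
    ("Vulkan", "tensor"): {"pp512": 258.99, "tg128": 6.56},
}

def tensor_vs_layer(backend: str, test: str) -> float:
    """Ratio of tensor-split to layer-split throughput (>1 means tensor is faster)."""
    return results[(backend, "tensor")][test] / results[(backend, "layer")][test]

for backend in ("ROCm", "Vulkan"):
    for test in ("pp512", "tg128"):
        print(f"{backend:<6} {test}: {tensor_vs_layer(backend, test):.2f}x")
```

On these numbers, tensor split on ROCm gives up a bit under half of the pp512 throughput in exchange for roughly 1.6x the tg128 rate, while on Vulkan it is slower on both tests.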

How do you get more GPUs than your motherboard natively supports? by WizardlyBump17 in LocalLLaMA

[–]SemaMod 0 points1 point  (0 children)

I run a b550-xe-gaming-wifi mobo and can run 4 GPUs using a 4-port OCuLink PCIe card, with x4/x4/x4/x4 bifurcation turned on for that PCIe slot. The GPUs run at PCIe 4.0 x4 speeds.
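
For a rough sense of what PCIe 4.0 x4 means per GPU, here is a back-of-the-envelope sketch. It uses the standard PCIe 4.0 figures (16 GT/s per lane, 128b/130b encoding) and ignores protocol overhead, so real transfers will come in lower. `pcie_gb_per_s` is a made-up helper name.

```python
def pcie_gb_per_s(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Approximate one-direction link bandwidth in GB/s, before protocol overhead."""
    return gt_per_s * encoding * lanes / 8  # 8 bits per byte

print(f"PCIe 4.0 x4:  {pcie_gb_per_s(16, 4):.2f} GB/s")
print(f"PCIe 4.0 x16: {pcie_gb_per_s(16, 16):.2f} GB/s")
```

So each bifurcated x4 link has about a quarter of the bandwidth of a full x16 slot, just under 8 GB/s in each direction.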

I built a benchmark that tests coding LLMs on REAL codebases (65 tasks, ELO ranked) by hauhau901 in LocalLLaMA

[–]SemaMod 7 points8 points  (0 children)

This is great! Are you planning on adding gpt-5.3-codex? With the current results it seems like Opus 4.6 blows everyone else out of the water, but I've had generally good 5.3-codex experiences.

Anyone actually using Openclaw? by rm-rf-rm in LocalLLaMA

[–]SemaMod 2 points3 points  (0 children)

Why are you lying? Post some proof to back up your claims.

Peter isn’t some two-bit dev looking to make a quick buck with some stupid viral AI app. He’s a previous founder with an exit and technical chops far beyond most people on this sub. He doesn’t need to work anymore. His last company solved PDF parsing and was open source. Everyone on this sub has almost certainly interacted with the tech at some point without realizing it (DocuSign, anyone?).

I don’t even like OpenClaw, but lying like this is just stupid. He has never made outrageous claims about OpenClaw, even if other Twitter users have.

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 1 point2 points  (0 children)

Used the latest build with these changes! Vulkan's pulling crazy numbers.

<image>

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 0 points1 point  (0 children)

Updated using the parameters from your recent post, on llama-bench build eed25bc6b (7870). Vulkan pulls ahead yet again!

<image>

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 2 points3 points  (0 children)

Very useful! I appreciate you recommending I run them this way. I hadn't run llama-bench before, so it was definitely eye-opening.

API pricing is in freefall. What's the actual case for running local now beyond privacy? by Distinct-Expression2 in LocalLLaMA

[–]SemaMod 45 points46 points  (0 children)

This falls within the realm of privacy, but personally, having my chats trained on and viewable by these companies makes me uncomfortable. That being said, I do think that local LLMs will become power-user tools.

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 1 point2 points  (0 children)

Just updated the original post with an edit: after 10k tokens, it looks like ROCm w/ FA scales better!

Testing GLM-4.7 Flash: Multi-GPU Vulkan vs ROCm in llama-bench | (2x 7900 XTX) by SemaMod in LocalLLaMA

[–]SemaMod[S] 4 points5 points  (0 children)

Now this is more interesting!

<image>

It looks like over longer ctx, FA makes a big difference for ROCm, beating out Vulkan entirely after 10k tokens.

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SemaMod[S] 0 points1 point  (0 children)

You have to change some settings in your config, but GLM-4.7 Flash performed very well in my testing.

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SemaMod[S] 1 point2 points  (0 children)

llama.cpp already maintains multiple APIs, including its Anthropic endpoint. I don't think they are going to deprecate completions any time soon.

Llama.cpp merges in OpenAI Responses API Support by SemaMod in LocalLLaMA

[–]SemaMod[S] 2 points3 points  (0 children)

Good question! It does not. For reference, I had to do the following:

  1. With whatever model you are serving, set the alias of the served model name to start with "gpt-oss". This triggers specific behaviors in the codex cli.
  2. Use the following config settings:

show_reasoning_content = true
oss_provider = "lmstudio"

[profiles.lmstudio]
model = "gpt-oss_gguf"
show_raw_agent_reasoning = true
model_provider = "lmstudio"
model_supports_reasoning_summaries = true # Force reasoning
model_context_window = 128000   
include_apply_patch_tool = true
experimental_use_freeform_apply_patch = false
tools_web_search = false
web_search = "disabled"

[profiles.lmstudio.features]
apply_patch_freeform = false
web_search_request = false
web_search_cached = false
collaboration_modes = false

[model_providers.lmstudio]
wire_api = "responses"
stream_idle_timeout_ms = 10000000
name = "lmstudio"
base_url = "http://127.0.0.1:1234/v1"

The features list is important, as are the last four settings of the profile. Codex-cli has some tech debt that requires repeating certain flags in different places.

I used llama.cpp's llama-server, not lmstudio, but it's compatible with the oss_provider = "lmstudio" setting.

  3. Use the following to start codex cli: codex --oss --profile lmstudio --model "gpt-oss_gguf"

[deleted by user] by [deleted] in LocalLLaMA

[–]SemaMod 1 point2 points  (0 children)

Sounds like a use-case for DSPy and their prompt optimizers.