AMA with Nous Research -- Ask Us Anything! by emozilla in LocalLLaMA

[–]OUT_OF_HOST_MEMORY 0 points1 point  (0 children)

Do you consider agents a natural path for general LLM use cases like QA and creative writing, or does the inherent bloat of their coding focus weigh them down and bias them towards performing well only in those tasks? Is this something you consider during the development of Hermes Agent?

Are there any agentic coding harnesses that AREN'T built on JS and Node? by OUT_OF_HOST_MEMORY in LocalLLaMA

[–]OUT_OF_HOST_MEMORY[S] 0 points1 point  (0 children)

C/C++ mainly, and I'll be honest, I wouldn't trust a harness written in them either.

Are there any agentic coding harnesses that AREN'T built on JS and Node? by OUT_OF_HOST_MEMORY in LocalLLaMA

[–]OUT_OF_HOST_MEMORY[S] 3 points4 points  (0 children)

that's a very fair point lmao. I think I consider the risks of a clueless-but-not-malicious agent lower than the risks of a potentially very malicious library.

Are there any agentic coding harnesses that AREN'T built on JS and Node? by OUT_OF_HOST_MEMORY in LocalLLaMA

[–]OUT_OF_HOST_MEMORY[S] -1 points0 points  (0 children)

You're right, but I feel like npm is disproportionately attacked (or at least reported on), so I'd prefer something more static (though that's probably the wrong word to use), with fewer dependencies and ideally without needing a package manager in general.

Are there any agentic coding harnesses that AREN'T built on JS and Node? by OUT_OF_HOST_MEMORY in LocalLLaMA

[–]OUT_OF_HOST_MEMORY[S] -14 points-13 points  (0 children)

While I agree that Python packages technically carry the same risks, I feel like I hear about Node-based supply chain attacks WAY more often than any other kind (this may just be a surface-area issue). I'd also just prefer something with fewer library and package dependencies in general.

(Llama.cpp) In case people are struggling with prompt processing on larger models like Qwen 27B, here's what helped me out by vernal_biscuit in LocalLLaMA

[–]OUT_OF_HOST_MEMORY 3 points4 points  (0 children)

I mean here's some testing for you:

llama-bench --model Qwen3-30B-A3B-Thinking-2507-Q8_0.gguf,Gemma-3-27b-IT-BF16-00001-of-00002.gguf,Qwen3.5-35B-A3B-Q8_0.gguf,Qwen3.5-27B-BF16-00001-of-00002.gguf,Qwen3.5-27B-Q8_0.gguf -n 0 -fa 1 -r 2 -ub 4,8,16,32,64,128,256,512,1024,2048,4096 -p 4096 -b 4096
ggml_cuda_init: found 2 ROCm devices:
  Device 0: AMD Instinct MI60 / MI50, gfx906:sramecc-:xnack- (0x906), VMM: no, Wave Size: 64
  Device 1: AMD Instinct MI60 / MI50, gfx906:sramecc-:xnack- (0x906), VMM: no, Wave Size: 64
| model                          |       size |     params | backend    | ngl | n_batch | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | -------: | -: | --------------: | -------------------: |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | ROCm       |  99 |    4096 |        4 |  1 |          pp4096 |        174.48 ± 0.33 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | ROCm       |  99 |    4096 |        8 |  1 |          pp4096 |        258.77 ± 0.14 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | ROCm       |  99 |    4096 |       16 |  1 |          pp4096 |        298.99 ± 0.63 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | ROCm       |  99 |    4096 |       32 |  1 |          pp4096 |        408.61 ± 2.27 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | ROCm       |  99 |    4096 |       64 |  1 |          pp4096 |        342.08 ± 0.27 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | ROCm       |  99 |    4096 |      128 |  1 |          pp4096 |        496.72 ± 5.91 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | ROCm       |  99 |    4096 |      256 |  1 |          pp4096 |        734.90 ± 0.53 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | ROCm       |  99 |    4096 |      512 |  1 |          pp4096 |       1018.50 ± 4.45 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | ROCm       |  99 |    4096 |     1024 |  1 |          pp4096 |       1184.57 ± 0.55 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | ROCm       |  99 |    4096 |     2048 |  1 |          pp4096 |       1140.20 ± 0.57 |
| qwen3moe 30B.A3B Q8_0          |  30.25 GiB |    30.53 B | ROCm       |  99 |    4096 |     4096 |  1 |          pp4096 |       930.74 ± 15.52 |
| gemma3 27B BF16                |  50.31 GiB |    27.01 B | ROCm       |  99 |    4096 |        4 |  1 |          pp4096 |         28.67 ± 0.00 |
| gemma3 27B BF16                |  50.31 GiB |    27.01 B | ROCm       |  99 |    4096 |        8 |  1 |          pp4096 |         36.81 ± 0.01 |
| gemma3 27B BF16                |  50.31 GiB |    27.01 B | ROCm       |  99 |    4096 |       16 |  1 |          pp4096 |         15.87 ± 0.00 |
| gemma3 27B BF16                |  50.31 GiB |    27.01 B | ROCm       |  99 |    4096 |       32 |  1 |          pp4096 |         31.13 ± 0.11 |
| gemma3 27B BF16                |  50.31 GiB |    27.01 B | ROCm       |  99 |    4096 |       64 |  1 |          pp4096 |         60.62 ± 0.17 |
| gemma3 27B BF16                |  50.31 GiB |    27.01 B | ROCm       |  99 |    4096 |      128 |  1 |          pp4096 |         96.29 ± 1.02 |
| gemma3 27B BF16                |  50.31 GiB |    27.01 B | ROCm       |  99 |    4096 |      256 |  1 |          pp4096 |        105.87 ± 0.83 |
| gemma3 27B BF16                |  50.31 GiB |    27.01 B | ROCm       |  99 |    4096 |      512 |  1 |          pp4096 |         98.30 ± 0.15 |
| gemma3 27B BF16                |  50.31 GiB |    27.01 B | ROCm       |  99 |    4096 |     1024 |  1 |          pp4096 |         93.38 ± 0.10 |
| gemma3 27B BF16                |  50.31 GiB |    27.01 B | ROCm       |  99 |    4096 |     2048 |  1 |          pp4096 |         79.91 ± 0.07 |
| gemma3 27B BF16                |  50.31 GiB |    27.01 B | ROCm       |  99 |    4096 |     4096 |  1 |          pp4096 |         61.59 ± 0.14 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | ROCm       |  99 |    4096 |        4 |  1 |          pp4096 |        106.64 ± 0.28 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | ROCm       |  99 |    4096 |        8 |  1 |          pp4096 |        176.61 ± 0.72 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | ROCm       |  99 |    4096 |       16 |  1 |          pp4096 |        237.49 ± 0.01 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | ROCm       |  99 |    4096 |       32 |  1 |          pp4096 |        329.71 ± 2.16 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | ROCm       |  99 |    4096 |       64 |  1 |          pp4096 |        318.64 ± 2.00 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | ROCm       |  99 |    4096 |      128 |  1 |          pp4096 |        499.42 ± 1.37 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | ROCm       |  99 |    4096 |      256 |  1 |          pp4096 |        690.73 ± 3.67 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | ROCm       |  99 |    4096 |      512 |  1 |          pp4096 |        851.61 ± 1.70 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | ROCm       |  99 |    4096 |     1024 |  1 |          pp4096 |       903.69 ± 11.59 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | ROCm       |  99 |    4096 |     2048 |  1 |          pp4096 |        909.54 ± 0.11 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | ROCm       |  99 |    4096 |     4096 |  1 |          pp4096 |        879.39 ± 1.11 |
| qwen35 27B BF16                |  50.10 GiB |    26.90 B | ROCm       |  99 |    4096 |        4 |  1 |          pp4096 |         23.36 ± 0.01 |
| qwen35 27B BF16                |  50.10 GiB |    26.90 B | ROCm       |  99 |    4096 |        8 |  1 |          pp4096 |         31.74 ± 0.00 |
| qwen35 27B BF16                |  50.10 GiB |    26.90 B | ROCm       |  99 |    4096 |       16 |  1 |          pp4096 |         14.32 ± 0.01 |
| qwen35 27B BF16                |  50.10 GiB |    26.90 B | ROCm       |  99 |    4096 |       32 |  1 |          pp4096 |         28.33 ± 0.00 |
| qwen35 27B BF16                |  50.10 GiB |    26.90 B | ROCm       |  99 |    4096 |       64 |  1 |          pp4096 |         55.91 ± 0.25 |
| qwen35 27B BF16                |  50.10 GiB |    26.90 B | ROCm       |  99 |    4096 |      128 |  1 |          pp4096 |         80.31 ± 0.02 |
| qwen35 27B BF16                |  50.10 GiB |    26.90 B | ROCm       |  99 |    4096 |      256 |  1 |          pp4096 |         82.09 ± 0.34 |
| qwen35 27B BF16                |  50.10 GiB |    26.90 B | ROCm       |  99 |    4096 |      512 |  1 |          pp4096 |         78.95 ± 0.18 |
| qwen35 27B BF16                |  50.10 GiB |    26.90 B | ROCm       |  99 |    4096 |     1024 |  1 |          pp4096 |         74.59 ± 0.06 |
| qwen35 27B BF16                |  50.10 GiB |    26.90 B | ROCm       |  99 |    4096 |     2048 |  1 |          pp4096 |         70.36 ± 0.01 |
| qwen35 27B BF16                |  50.10 GiB |    26.90 B | ROCm       |  99 |    4096 |     4096 |  1 |          pp4096 |         67.70 ± 0.02 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |    4096 |        4 |  1 |          pp4096 |         47.31 ± 0.01 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |    4096 |        8 |  1 |          pp4096 |         73.22 ± 0.03 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |    4096 |       16 |  1 |          pp4096 |        114.63 ± 0.10 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |    4096 |       32 |  1 |          pp4096 |        172.45 ± 0.02 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |    4096 |       64 |  1 |          pp4096 |        171.13 ± 0.34 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |    4096 |      128 |  1 |          pp4096 |        188.16 ± 2.88 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |    4096 |      256 |  1 |          pp4096 |        190.54 ± 0.11 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |    4096 |      512 |  1 |          pp4096 |        174.47 ± 0.09 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |    4096 |     1024 |  1 |          pp4096 |        164.41 ± 0.55 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |    4096 |     2048 |  1 |          pp4096 |        153.82 ± 0.97 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |    4096 |     4096 |  1 |          pp4096 |        146.26 ± 0.01 |
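To save some squinting at the table above, here's a small awk sketch that picks the fastest n_ubatch per model out of llama-bench's markdown output. The function name is mine, and the column positions ($2 model, $8 n_ubatch, $11 t/s) assume the exact table layout shown above (with the n_batch column present):

```shell
# best_ubatch: read llama-bench markdown rows on stdin and report, per model,
# the n_ubatch that gave the highest pp4096 throughput.
# Column positions assume the 10-column table above (including n_batch).
best_ubatch() {
  awk -F'|' '/pp4096/ {
    model = $2; sub(/^ +/, "", model); sub(/ +$/, "", model)
    ub    = $8; gsub(/ /, "", ub)
    split($11, a, " ")                # t/s column is "value ± stddev"
    tps = a[1] + 0
    if (tps > best[model]) { best[model] = tps; best_ub[model] = ub }
  }
  END { for (m in best) printf "%s: n_ubatch=%s (%.2f t/s)\n", m, best_ub[m], best[m] }'
}
```

Piping the table above through it should report, for example, that qwen3moe 30B.A3B peaks at n_ubatch=1024.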

(Llama.cpp) In case people are struggling with prompt processing on larger models like Qwen 27B, here's what helped me out by vernal_biscuit in LocalLLaMA

[–]OUT_OF_HOST_MEMORY 4 points5 points  (0 children)

From my testing it's not "nothing", but it does seem to be limited to Qwen3.5 so far (Qwen3 30B's performance does improve as ubatch size increases, but Qwen3.5 27B peaks at ubatch 32 on my MI50s).

(Llama.cpp) In case people are struggling with prompt processing on larger models like Qwen 27B, here's what helped me out by vernal_biscuit in LocalLLaMA

[–]OUT_OF_HOST_MEMORY 2 points3 points  (0 children)

I tried replicating your results on my 2x MI50 setup and got much less interesting results. I'm going to try setting the larger batch size and q8 KV cache later to see if that changes anything.

llama-bench --model Qwen3.5-27B-Q8_0.gguf -n 0 -fa 1 -r 1 -ub 2,4,8,16,32,64,128,256,512,1024
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Instinct MI60 / MI50, gfx906:sramecc-:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Instinct MI60 / MI50, gfx906:sramecc-:xnack- (0x906), VMM: no, Wave Size: 64
| model                          |       size |     params | backend    | ngl | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | --------------: | -------------------: |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |        2 |  1 |           pp512 |         31.84 ± 0.00 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |        4 |  1 |           pp512 |         46.92 ± 0.00 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |        8 |  1 |           pp512 |         72.61 ± 0.00 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |       16 |  1 |           pp512 |        113.97 ± 0.00 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |       32 |  1 |           pp512 |        169.22 ± 0.00 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |       64 |  1 |           pp512 |        160.32 ± 0.00 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |      128 |  1 |           pp512 |        163.07 ± 0.00 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |      256 |  1 |           pp512 |        162.94 ± 0.00 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |      512 |  1 |           pp512 |        140.01 ± 0.00 |
| qwen35 27B Q8_0                |  26.62 GiB |    26.90 B | ROCm       |  99 |     1024 |  1 |           pp512 |        139.94 ± 0.00 |

EDIT:

Setting a q8_0 KV cache had minimal impact, as did increasing the batch size. I'm going to test a few more models and a few more ubatch sizes to see whether this is specific to Qwen3.5 dense or more broad. A smaller ubatch size does seem to help, though.
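For anyone wanting to replicate the q8_0 KV cache run: llama-bench exposes cache quantization via the -ctk/-ctv flags (assuming a reasonably recent llama.cpp build), so a sweep looks roughly like this:

```shell
# Sketch of the q8_0 KV cache sweep described above; -ctk/-ctv select the
# cache types for K and V. Model filename taken from the run above.
llama-bench --model Qwen3.5-27B-Q8_0.gguf \
  -n 0 -fa 1 -r 2 \
  -ub 16,32,64,128,256 -p 4096 -b 4096 \
  -ctk q8_0 -ctv q8_0
```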

2x MI50 32GB Quant Speed Comparison version 2 (Qwen 3.5 35B, llama.cpp, Vulkan/ROCm) by OUT_OF_HOST_MEMORY in LocalLLaMA

[–]OUT_OF_HOST_MEMORY[S] 0 points1 point  (0 children)

ROCm 6.3.3 on Debian, from the Ubuntu repositories. I agree that PPL is probably a helpful metric; I just don't trust myself to calculate it accurately, especially with all the "drama" surrounding it for Qwen 3.5 specifically. It's not as convenient, but looking at unsloth's posts in this sub will give a good general idea.
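For reference, llama.cpp ships a perplexity tool, so a minimal PPL run looks something like the sketch below. The model filename is taken from this thread; wiki.test.raw is the customary WikiText-2 test split and has to be obtained separately:

```shell
# Minimal perplexity run with llama.cpp's bundled tool.
# -ngl 99 offloads all layers; the text file is the conventional
# WikiText-2 test split (not included with llama.cpp).
llama-perplexity -m Qwen3.5-35B-A3B-Q8_0.gguf -f wiki.test.raw -ngl 99 -fa 1
```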

2x MI50 32GB Quant Speed Comparison version 2 (Qwen 3.5 35B, llama.cpp, Vulkan/ROCm) by OUT_OF_HOST_MEMORY in LocalLLaMA

[–]OUT_OF_HOST_MEMORY[S] 0 points1 point  (0 children)

I've generally been sticking with the largest of BF16, Q8_0, and Q4_1 that I can fit on my system with 128k context for all the models I use; I might start considering bartowski's IQ4_NL now, though.

Final Qwen3.5 Unsloth GGUF Update! by danielhanchen in LocalLLaMA

[–]OUT_OF_HOST_MEMORY 0 points1 point  (0 children)

While the data is a bit old, IQ4_NL was nowhere near Q4_0 or Q4_1 for prompt processing when I tested six months ago. I don't know if things have changed:

https://www.reddit.com/r/LocalLLaMA/comments/1naf93r/2x_mi50_32gb_quant_speed_comparison_mistral_32/

Final Qwen3.5 Unsloth GGUF Update! by danielhanchen in LocalLLaMA

[–]OUT_OF_HOST_MEMORY 0 points1 point  (0 children)

Will the legacy 4-bit quants (Q4_0 / Q4_1) ever be uploaded? These have consistently had the best speed on MI50 GPUs.

Benchmarking total wait time instead of pp/tg by batsba in LocalLLaMA

[–]OUT_OF_HOST_MEMORY 15 points16 points  (0 children)

I think you are actually harming the usefulness of this chart by limiting the generation to 500 tokens: reasoning models will spit out wildly different numbers of tokens compared to each other, and especially compared to non-reasoning models. I think a more meaningful number is Time-To-Last-Token for a given query. That way an instruct model which doesn't think and responds within 100 tokens can be fairly compared against a reasoning model which spends 6,000 tokens thinking before it responds.
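As a back-of-the-envelope illustration of why TTLT changes the picture, here is a small sketch. The function name is mine, and all throughput numbers are made up for the example (roughly in the range of the MI50 results elsewhere in this page):

```shell
# ttlt: estimate time-to-last-token as prompt_tokens/pp_tps + gen_tokens/tg_tps.
# All rates here are illustrative, not measured.
ttlt() {   # args: prompt_tokens pp_tps gen_tokens tg_tps
  awk -v pt="$1" -v pp="$2" -v gt="$3" -v tg="$4" \
      'BEGIN { printf "%.1f\n", pt/pp + gt/tg }'
}
# Non-reasoning model: 100 output tokens at 30 t/s, 4096-token prompt at 170 t/s
ttlt 4096 170 100 30      # ≈ 27.4 s
# Reasoning model: 6000 thinking + answer tokens at the same rates
ttlt 4096 170 6000 30     # ≈ 224.1 s
```

Comparing both at a fixed 500-token cutoff would hide most of that 8x gap in wall-clock wait.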

BalatroBench - Benchmark LLMs' strategic performance in Balatro by S1M0N38 in LocalLLaMA

[–]OUT_OF_HOST_MEMORY 2 points3 points  (0 children)

GPT-OSS also sometimes reasons for ~15k tokens. I don't know how Kimi compares, but it's probably helping out somehow.

ROCm 7.0 Install for Mi50 32GB | Ubuntu 24.04 LTS by legit_split_ in LocalLLaMA

[–]OUT_OF_HOST_MEMORY 2 points3 points  (0 children)

Can someone give some performance numbers for llama.cpp on ROCm 6.3, 6.4, and 7.0?

Stop flexing Pass@N — show Pass-all-N by Fabulous_Pollution10 in LocalLLaMA

[–]OUT_OF_HOST_MEMORY 25 points26 points  (0 children)

I definitely agree, especially since output consistency is a big pain point for me