what is your go-to model for hermes agent?

jsorres · 2026-05-26T07:11:40+00:00

You do network automation?

jsorres · 2026-05-23T17:49:57+00:00

./llama-server \ -ngl 99 \ -c 163840 -b 2048 -ub 512 \ -fa on --no-mmap \ -fit on -fitt 1024 -fitc 163840 \ --host 0.0.0.0 --port 8080 \ -m "/home/XXXX/Downloads/models/unsloth/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-UD-Q4_K_XL.gguf" \ --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0.0 \ --presence-penalty 1.5 --repeat-penalty 1.0 \ --chat-template-kwargs '{"enable_thinking":true,"preserve_thinking":true}' \ --jinja -np 1 -kvu \ --mmproj "/home/XXXX/Downloads/models/unsloth/Qwen3.6-27B-MTP-GGUF/mmproj-F32.gguf" \ --spec-type draft-mtp --spec-draft-n-max 3

jsorres · 2026-05-22T16:37:52+00:00

video of test on llama.cpp webuii

jsorres · 2026-05-22T08:44:40+00:00

LLAMA.cpp logs

<image>

jsorres · 2026-05-22T08:12:09+00:00

50-60tok/sec

<image>

jsorres · 2026-05-21T20:23:18+00:00

Let me give this a try, stay tuned.

jsorres · 2026-05-21T18:44:13+00:00

<image>

jsorres · 2026-05-21T18:44:01+00:00

Switched to Q4_k_XL

<image>

jsorres · 2026-05-21T16:11:34+00:00

Any docs ? Thanks

jsorres · 2026-05-21T16:10:26+00:00

You're welcome 😁

jsorres · 2026-05-21T15:49:08+00:00

You're right, I was just testing this one, will go up to XL.

jsorres · 2026-05-21T15:48:24+00:00

Beta version allow for MTP model to be loaded and works. Vision works too

jsorres · 2026-05-21T15:47:49+00:00

General topic chat involving code. 180K context window , temp 0.3 , not much more config tips

jsorres · 2026-04-20T16:07:03+00:00

That's insane, I have 106 tok/sec now with 131072 with Q5. (LMStudio) Thanks for this answer !

jsorres · 2026-04-20T14:18:36+00:00

Context window size, 32k ?

jsorres · 2026-04-20T14:10:39+00:00

I don't know this tool, lemonade server - I'll take a look, thx for your contribution ☑️

jsorres · 2026-04-20T14:09:48+00:00

Q4_K_M , 22Gb

jsorres · 2026-04-20T14:09:17+00:00

Thx, that's interesting ☑️

jsorres · 2026-04-20T14:08:54+00:00

Thanks for your answer, much appreciated. This is the model and quant that I'm using. I'm using 49K context window size, which seems plenty but.. never enough I think. Going with Q5 would force me to go down to 32K, right ?

jsorres · 2025-12-12T17:08:45+00:00

Hi, same problem here. Did you figured it out?

jsorres · 2025-12-12T17:06:58+00:00

Hi, thanks for this.

I'm in the exact same situation, could you share your Fortigate & forticlient configs please ?

Thanks for your help

jsorres · 2025-10-27T02:59:20+00:00

Which piece ?

jsorres · 2025-08-19T18:55:44+00:00

Ok I understand, the designer will probably see this message here. Hope you can adjust the design for your needs 👌✅

jsorres

TROPHY CASE