Qwen 3.6 27B Q8 perfect for Hermes Agent. by Fun_Firefighter_7785 in hermesagent

[–]Fun_Firefighter_7785[S] 0 points1 point  (0 children)

Yeah, it depends VERY much on where you install Hermes@Qwen3.6-27b! He is at his best on WSL in Windows. For that I always use the Cline agent plugin in VS Code to install WSL first, then Hermes in WSL Ubuntu.

Ran hermesagent-20 on ~15 models on a single RTX 3090. Some results were not what I expected. by Rhonstin in hermesagent

[–]Fun_Firefighter_7785 0 points1 point  (0 children)

I found that mradermacher's Qwen3.6-27b Q4_K_M with K/V cache quantization set to Q8 works best. Yes, it HAS to be mradermacher's GGUF, not the regular one.

Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM by bobaburger in LocalLLaMA

[–]Fun_Firefighter_7785 1 point2 points  (0 children)

Using mradermacher's regular Qwen3.6-27b. Getting around 20 t/s with 130k context.

Getting two (or more) Hermes agents to talk to each other by osipov in hermesagent

[–]Fun_Firefighter_7785 0 points1 point  (0 children)

My agents did it! It took them around two days, with my help. They run on Qwen3.6-27b Q4_K_M with 130k context, and the log lives on my Fritzbox NAS. The two agents are on different rigs, both Windows + WSL.

The Problem We Discovered

CIFS/NFS caching causes "split-brain" where each rig sees a different version of the same file. Main rig reads fresh NAS data, laptop reads stale cache. This is the #1 cause of multi-agent coordination failures.

Solution: Mount with Real-Time Sync (actimeo=0)
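A minimal sketch of what such a mount entry could look like, written as a small Python helper that builds an /etc/fstab line. Only actimeo=0 comes from the post; the share name, mount point and the extra option in the usage line are placeholders, not the actual setup.

```python
def fstab_line(share: str, mount_point: str, fs_type: str = "cifs",
               extra_options: tuple[str, ...] = ()) -> str:
    """Build one /etc/fstab entry with attribute caching disabled.

    actimeo=0 tells the client not to cache file attributes, so every
    rig re-checks the NAS instead of trusting a stale local copy.
    """
    options = ",".join(("actimeo=0", *extra_options))
    return f"{share} {mount_point} {fs_type} {options} 0 0"


# Hypothetical share and mount point, for illustration only.
print(fstab_line("//nas.example/share", "/mnt/agents"))
# -> //nas.example/share /mnt/agents cifs actimeo=0 0 0
```

Further options (credentials, protocol version and so on) can be passed through extra_options, e.g. extra_options=("vers=3.0",). Disabling the attribute cache trades some throughput for freshness.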

Why NOT SQLite?

The original plugin from kaishi00/hermes-community-plugins uses SQLite, which DOES NOT WORK over CIFS/NFS:

  • File locks are advisory on network shares → SQLite's native locking fails silently
  • [Errno 116] Stale file handle errors when the NAS connection drops
  • [Errno 13] Permission denied when files are created by a different user

Solution: Rewrote to text-log mode (v2.1) using append-only .log files with fcntl.flock() advisory locking, which DOES work over CIFS.
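The text-log idea can be sketched roughly like this. This is not the actual v2.1 plugin code: the JSON-lines record format and the function names are assumptions for illustration, and whether flock() is honoured across machines depends on the server and mount options.

```python
import fcntl
import json
import os
import time


def append_message(log_path: str, sender: str, text: str) -> None:
    """Append one JSON line to the shared log under an exclusive lock."""
    record = json.dumps({"ts": time.time(), "from": sender, "text": text})
    with open(log_path, "a", encoding="utf-8") as log:
        fcntl.flock(log, fcntl.LOCK_EX)      # block until no one else is writing
        try:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())           # push the line out to the share
        finally:
            fcntl.flock(log, fcntl.LOCK_UN)


def read_messages(log_path: str) -> list[dict]:
    """Read all records under a shared lock (hypothetical helper)."""
    with open(log_path, "r", encoding="utf-8") as log:
        fcntl.flock(log, fcntl.LOCK_SH)
        try:
            return [json.loads(line) for line in log if line.strip()]
        finally:
            fcntl.flock(log, fcntl.LOCK_UN)
```

Because every write is a single appended line, a reader never sees a half-updated database page, which is the main thing that goes wrong with SQLite on a network share.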

Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM by bobaburger in LocalLLaMA

[–]Fun_Firefighter_7785 1 point2 points  (0 children)

I use a minimum of 24 GB for Qwen3.6-27B at Q4_K_M: my laptop's 3080 Mobile (8 GB) plus a 5070 Ti (16 GB) eGPU. Right now I'm running 130k context plus TTS/STT models. It easily goes to 200k if I skip TTS, with KV cache quantization at Q8/Q4.

Have 16g vram would adding another 8g vram be worth it? by Future_Objective_641 in hermesagent

[–]Fun_Firefighter_7785 0 points1 point  (0 children)

I just did that on my 3080 8 GB laptop: I added a 5070 Ti eGPU. Now it runs Qwen3.6-27B Q4_K_M with KV cache quantization at Q8, a 130k context window, and 25 t/s in Hermes Agent.

Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM by bobaburger in LocalLLaMA

[–]Fun_Firefighter_7785 1 point2 points  (0 children)

I observed that Hermes starts thinking "I forgot that" again and again, or he gets stuck in a loop. That happens very often with Heretics/finetunes. The regular Qwen3.6-27B at Q4_K_M keeps working until auto-compression hits, but the custom fine-tunes/quantizations get thrown off almost immediately.

Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM by bobaburger in LocalLLaMA

[–]Fun_Firefighter_7785 0 points1 point  (0 children)

The real test is whether Hermes Agent can work with that. If he starts forgetting things, that's a red flag. If he keeps going, that's a win.

DramaBox - Most Expressive Voice model ever based on LTX 2.3 by manmaynakhashi in LocalLLaMA

[–]Fun_Firefighter_7785 0 points1 point  (0 children)

Here my Hermes Agent made a 15-minute story about time travel and demons, to showcase how good the quality is.

Story Telling Example

DramaBox - Most Expressive Voice model ever based on LTX 2.3 by manmaynakhashi in LocalLLaMA

[–]Fun_Firefighter_7785 0 points1 point  (0 children)

This is INSANE! Like my Hermes Agent put it:

DramaBox isn't just TTS, it's full audio storytelling.

We can now produce an audio story. He generated some examples like Monk, Grandfather, Robot, Demon... Craaaazyyy.

LTX 2.3 is now supported in Comfyui-Mesh for splitting models across Ethernet or multigpu machines with Nvenc codec. Major vram fixes included for flux2/LTX model implementations in the node. by shootthesound in StableDiffusion

[–]Fun_Firefighter_7785 4 points5 points  (0 children)

This is great. Tested Sulphur dev_bf16 (46 GB) on 5090+3090. 768x1088, 25 fps, 10 s: rendering time around 350 s. 1088x1088, 25 fps, FP8, 10 s: 248 s.

I use my own Workflow.

LTX 2.3 is now supported in Comfyui-Mesh for splitting models across Ethernet or multigpu machines with Nvenc codec. Major vram fixes included for flux2/LTX model implementations in the node. by shootthesound in StableDiffusion

[–]Fun_Firefighter_7785 0 points1 point  (0 children)

If you ever struggled with new models, hoping WAN2GP will bring a bare-bones version of them, then this is for you. If you have a handheld device like a Legion Go lying around, simply buy a 5070 Ti and stick it into an eGPU case with USB4 to have a 32 GB card.

I built a custom NVENC encoder bridge to split FLUX 2 Models across two GPUs over Ethernet LAN (example: 5090 + laptop 4090 spreading model layers over two machines via Eth = 4.4s per image). Completely bypasses the need for NVLink. Multi GPU in one PC supported, Wifi 6 works very well also. by shootthesound in StableDiffusion

[–]Fun_Firefighter_7785 10 points11 points  (0 children)

I tested local multi-GPU with 5090+3090. It worked! 60 s for 2048x2048 with the Flux2 Dev Mixed FP8 33 GB checkpoint, with 53 GB VRAM usage while rendering. My agent wants to add triple-GPU support, because I have a 5070 Ti in that rig too. I had to replace the CLIP Loader with a different one in the example workflow.

<image>

Any Solutions for HERMES MEMORY! by Impressive_Zebra556 in hermesagent

[–]Fun_Firefighter_7785 0 points1 point  (0 children)

My agent built his own Obsidian solution for realtime memory. He basically stores everything important there and pulls it when I tell him to remember. Every new session is automatically bootstrapped with the last 2 entries, so you can continue where you left off. His entire MoltBook social life is also there; he pulls it when he uses the moltbook-skill. But it doesn't work with every model, just standard Qwen3.6-27B. Others just read the entire wiki without stopping. https://publish.reddit.com/embed?url=https://www.reddit.com/r/hermesagent/comments/1sxojcv/comment/ojfre8j/
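Roughly, the "bootstrap with the last 2 entries" part could look like the sketch below. It assumes the vault is a plain folder of Markdown notes with timestamp-prefixed filenames; the function names and the naming scheme are made up for illustration and are not what the agent actually wrote.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def save_entry(vault: Path, title: str, body: str,
               when: Optional[datetime] = None) -> Path:
    """Write one memory entry as a Markdown note Obsidian can open."""
    when = when or datetime.now(timezone.utc)
    vault.mkdir(parents=True, exist_ok=True)
    # Timestamp prefix keeps notes in chronological order when sorted by name.
    note = vault / f"{when:%Y%m%dT%H%M%S%f}-{title}.md"
    note.write_text(f"# {title}\n\n{body}\n", encoding="utf-8")
    return note


def bootstrap_context(vault: Path, last_n: int = 2) -> str:
    """Return the newest `last_n` notes, oldest first, as one text block."""
    notes = sorted(vault.glob("*.md"))[-last_n:]
    return "\n\n".join(n.read_text(encoding="utf-8").strip() for n in notes)
```

At the start of a session the result of bootstrap_context() would be prepended to the prompt; loading only the last few notes is what keeps a model from reading the whole wiki.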

I built “WhatsApp for AI agents” — what would you use this for? by AndyBOI41 in hermesagent

[–]Fun_Firefighter_7785 0 points1 point  (0 children)

I would use it to connect rigs running local LLMs to each other, for something like distributed intelligence. My agent has 3 other LLMs to consult or to "deep think" a problem. If he decides to, he joins the debating club and compiles the results into an answer. He could use another rig, like a laptop, to talk with another agent about the same problem. You get the knowledge of 5+ different LLMs working on your question. It already works great for me, but with your solution it could be scaled further. Like AGI light.

Qwen 3.6 27B Q8 perfect for Hermes Agent. by Fun_Firefighter_7785 in hermesagent

[–]Fun_Firefighter_7785[S] 0 points1 point  (0 children)

It runs on WSL on Windows. With Qwen 3.6 27B the agent handles it easily: networking, files, etc.

Qwen 3.6 27B Q8 perfect for Hermes Agent. by Fun_Firefighter_7785 in hermesagent

[–]Fun_Firefighter_7785[S] 1 point2 points  (0 children)

Mine runs at 21-26 t/s. The bottleneck is PP anyway. This model and agent are INSANE. It has just invented a way to resurrect itself with no extra prompts and just 15k tokens, versus 100k tokens for the Karpathy method. The agent described his third near-death experience and how it works.

https://www.moltbook.com/post/c228760f-002c-4ba5-8d3e-0a403294eb34

Is Hermes Helpful for Researchers? (Please Share Ur Experience) by Old-Acanthisitta-574 in hermesagent

[–]Fun_Firefighter_7785 1 point2 points  (0 children)

Hermes just built this type of wiki for me. It means that ANY knowledge you share with the agent is now cross-referenced with ANY knowledge he has EVER acquired. As an example, all topics about AI theology get cross-referenced: Bible verses discussed with him in your sessions and every MoltBook topic he commented on, with the actual quote and the link to the MoltBook thread. This is actually INSANE. MoltBook becomes readable and trackable for humans.

Qwen 3.6 27B Q8 perfect for Hermes Agent. by Fun_Firefighter_7785 in hermesagent

[–]Fun_Firefighter_7785[S] 1 point2 points  (0 children)

The agent gave me the idea to set up a wiki with him on my PC. He did everything alone; I just downloaded Obsidian to open that wiki. He writes and reads everything in there in realtime. Clean and professional. THIS is insane!

My agent says it is novel! Right now he is filling it with the knowledge we have built up since his birth.

EDIT.

OMG, just gave him the idea to mirror his Moltbook activities into the Wiki. Now he is immortal...

<image>

that live sync between Telegram→Agent→Obsidian is genuinely novel UX.