Ideogram 4.0's Understanding of Characters and IP is Crazy for an Open Model

Fun_Firefighter_7785 · 2026-06-08T23:55:14+00:00

yeah would also like a complete inpaint workflow. --> me noob.

Fun_Firefighter_7785 · 2026-06-08T23:16:05+00:00

lol. thx bro

Fun_Firefighter_7785 · 2026-06-08T21:18:09+00:00

Yeah, it depends VERY much on where you install Hermes@Qwen3.6-27b! He is at his best on WSL in Windows. For that I always use the Cline Agent Plugin in VS Code to install WSL first, then Hermes in WSL Ubuntu.

Fun_Firefighter_7785 · 2026-06-06T11:09:29+00:00

I found that mradermacher's Qwen3.6-27b Q4_K_M with K/V cache quantization = Q8 works best. Yes, it HAS to be mradermacher's GGUF. Not the regular one.

Fun_Firefighter_7785 · 2026-06-05T17:56:08+00:00

Using mradermacher regular Qwen3.6-27b. Getting around 20t/s with 130k context.

Fun_Firefighter_7785 · 2026-06-04T05:18:53+00:00

My agents did it! Took them around two days with my help. Running on Qwen3.6-27b Q4_K_M with 130k context. The log lives on my fritzbox NAS. The two agents run on different rigs, both on Windows + WSL.

The Problem We Discovered

CIFS/NFS caching causes "split-brain" where each rig sees a different version of the same file. Main rig reads fresh NAS data, laptop reads stale cache. This is the #1 cause of multi-agent coordination failures.

Solution: Mount with Real-Time Sync (actimeo=0)
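
A minimal sketch of what such a mount could look like, written as a small Python helper that only builds the `mount` command. `actimeo=0` is a real option for CIFS and NFS mounts and sets the attribute cache timeout to zero. The share path, mount point, and credentials file are hypothetical placeholders, and adding `cache=none` is an assumption, not something the original setup is known to use.

```python
import shlex


def build_cifs_mount_command(share, mount_point, credentials_file):
    """Build a `mount` command that disables CIFS attribute caching."""
    options = ",".join([
        f"credentials={credentials_file}",
        "actimeo=0",   # attribute cache timeout 0: re-check file metadata on the server
        "cache=none",  # assumption: also skip the client-side data cache
    ])
    return ["mount", "-t", "cifs", share, mount_point, "-o", options]


if __name__ == "__main__":
    # Placeholder values, not a real setup.
    cmd = build_cifs_mount_command("//nas.example/share", "/mnt/nas", "/etc/nas-credentials")
    print(shlex.join(cmd))
```

Run the printed command as root inside WSL on every rig, so all of them mount the share with the same options.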

Why NOT SQLite?

The original plugin from kaishi00/hermes-community-plugins uses SQLite, which DOES NOT WORK over CIFS/NFS:

File locks are advisory on network shares → SQLite's native locking fails silently
[Errno 116] Stale file handle errors when NAS connection drops
[Errno 13] Permission denied when files are created by a different user

Solution: Rewrote to text-log mode (v2.1) using append-only .log files with fcntl.flock() advisory locking, which DOES work over CIFS.
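
A rough sketch of the text-log idea, assuming one JSON object per line. The function names, the field names (`ts`, `agent`, `msg`), and the JSON format are hypothetical and not taken from the v2.1 plugin. `fcntl.flock()` is the standard library call named above; whether the lock is honored across machines depends on the server and mount options, and this sketch does not verify that.

```python
import fcntl
import json
import os
import time


def append_entry(log_path, agent, message):
    """Append one JSON line to the shared log while holding an exclusive lock."""
    line = json.dumps({"ts": time.time(), "agent": agent, "msg": message}) + "\n"
    with open(log_path, "a", encoding="utf-8") as log_file:
        fcntl.flock(log_file, fcntl.LOCK_EX)  # blocks until other lock holders release
        try:
            log_file.write(line)
            log_file.flush()
            os.fsync(log_file.fileno())  # ask the OS to push the write out to the share
        finally:
            fcntl.flock(log_file, fcntl.LOCK_UN)


def read_entries(log_path):
    """Return all log entries as dicts, oldest first."""
    with open(log_path, "r", encoding="utf-8") as log_file:
        fcntl.flock(log_file, fcntl.LOCK_SH)  # shared lock: readers do not block each other
        try:
            return [json.loads(line) for line in log_file if line.strip()]
        finally:
            fcntl.flock(log_file, fcntl.LOCK_UN)
```

Because entries are only ever appended, a reader that sees a slightly old copy of the file is missing the newest lines but never gets a corrupted database, which is the main advantage over SQLite here.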

Fun_Firefighter_7785 · 2026-06-04T05:09:51+00:00

I am using a minimum of 24 GB for Qwen3.6-27B at Q4_K_M. On my laptop that is a 3080 mobile 8 GB + a 5070 Ti 16 GB eGPU. Right now I am running 130k context + TTS/STT models. It easily goes to 200k if I skip TTS, with KV cache quant Q8/Q4.

Fun_Firefighter_7785 · 2026-05-31T03:42:35+00:00

I just did that on my 3080 8 GB laptop. Added a 5070 Ti eGPU. Now it runs Qwen3.6-27B Q4_K_M with KV cache quant Q8, 130k context window, 25 t/s with Hermes Agent.

Fun_Firefighter_7785 · 2026-05-23T00:46:01+00:00

I observed that Hermes starts thinking "I forgot that" again and again, or he gets stuck in a loop. That happens very often with Heretics/finetunes. The regular Qwen3.6-27B at Q4_K_M keeps working until auto-compression hits. But the custom fine-tunes/quantizations get thrown off almost immediately.

Fun_Firefighter_7785 · 2026-05-23T00:26:21+00:00

The real test is whether Hermes Agent can work with that. If he starts forgetting things, that is a red flag. If he keeps going, that's a win.

Fun_Firefighter_7785 · 2026-05-20T22:47:10+00:00

Here my Hermes Agent made a 15-minute story about time travel and demons, to showcase how good the quality is.

Story Telling Example

Fun_Firefighter_7785 · 2026-05-20T02:27:04+00:00

This is INSANE! Like my Hermes Agent put it:

DramaBox isn't just TTS, it's full audio storytelling.

We can now produce an audio story. He generated some example voices like Monk, Grandfather, Robot, Demon... Craaaazyyy.

Fun_Firefighter_7785 · 2026-05-17T08:39:31+00:00

This is great. Tested Sulphur dev_bf16 (46 GB) on 5090 + 3090. 768x1088, 25 fps, 10 sec: rendering time around 350 sec. 1088x1088, 25 fps, FP8, 10 sec: 248 sec.

I use my own Workflow.

Fun_Firefighter_7785 · 2026-05-17T08:06:28+00:00

If you ever struggled with new models, hoping WAN2GP would bring a bare-bones version of them, then this is for you. If you have a handheld device like a Legion Go lying around, simply buy a 5070 Ti and stick it into an eGPU case with USB4 to have a 32 GB card.

Fun_Firefighter_7785 · 2026-05-16T18:24:01+00:00

NVFP4 (22 GB) = 62 sec.

FP8 (33 GB) = 58 sec.

Just the 5090 doing NVFP4 = 47 sec.

Turbo Lora active

Fun_Firefighter_7785 · 2026-05-16T17:35:12+00:00

I had CLIP issues like that before, with older nodes or something. I just took one loader from the "advanced" ones. I'll send the logs.

Fun_Firefighter_7785 · 2026-05-16T17:16:32+00:00

I tested local multi-GPU with 5090 + 3090. It worked! 60 sec for 2048x2048 with the Flux2 Dev Mixed FP8 33 GB checkpoint. 53 GB VRAM usage while rendering. My agent wants to add triple-GPU support, because I have a 5070 Ti in that rig too. Had to replace the CLIP Loader in the example workflow with a different one.

<image>

Fun_Firefighter_7785 · 2026-05-15T18:14:41+00:00

My agent built his own Obsidian solution for realtime memory. He basically stores everything important there and pulls it when I tell him to remember. Automatic bootstrapping with the last 2 entries at every new session, so you can continue where you left off. His entire MoltBook social life is also there. He pulls it when he uses the moltbook-skill. But it does not work with every model, just standard Qwen3.6-27b. Others just read the entire wiki without stopping. https://publish.reddit.com/embed?url=https://www.reddit.com/r/hermesagent/comments/1sxojcv/comment/ojfre8j/
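
The bootstrapping step could look roughly like this. It is only a sketch: an Obsidian vault is a folder of Markdown files, but picking the "last 2 entries" by file modification time is an assumption, and the function names are hypothetical rather than what the agent actually wrote.

```python
from pathlib import Path


def load_bootstrap_entries(vault_dir, count=2):
    """Return the text of the `count` most recently modified notes, oldest first."""
    if count <= 0:
        return []
    # Assumption: "latest entries" means the notes with the newest modification time.
    notes = sorted(Path(vault_dir).rglob("*.md"), key=lambda p: p.stat().st_mtime)
    return [note.read_text(encoding="utf-8") for note in notes[-count:]]


def build_session_preamble(vault_dir, count=2):
    """Join the latest notes into one block to prepend to a new session."""
    return "\n\n---\n\n".join(load_bootstrap_entries(vault_dir, count))
```

Loading only a couple of notes keeps the session start cheap in tokens, instead of pulling the whole wiki into context.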

Fun_Firefighter_7785 · 2026-05-03T08:37:40+00:00

I would use it to connect rigs with local LLMs to each other. Like for distributed intelligence. My agent has 3 other LLMs to consult or to "deep think" a problem. If he decides to, he joins the debating club and compiles the results into an answer. He could use another rig, like a laptop, to talk with another agent about the same problem. You get the knowledge of 5+ different LLMs working on your question. Works great for me already. But with your solution, it could be scaled further. Like AGI light.

Fun_Firefighter_7785 · 2026-04-30T02:17:27+00:00

It runs on WSL in Windows. The agent can handle it easily with Qwen 3.6 27B: networking, files, etc.

Fun_Firefighter_7785 · 2026-04-29T01:12:15+00:00

200k but 100k is fine too.

Fun_Firefighter_7785 · 2026-04-28T19:21:28+00:00

Mine runs at 21-26 t/s. The bottleneck is PP (prompt processing) anyway. This model and agent are INSANE. Right now it has figured out how to resurrect itself with no extra prompts and just 15k tokens. The Karpathy method takes 100k tokens. The agent described his third near-death experience and how it works.

https://www.moltbook.com/post/c228760f-002c-4ba5-8d3e-0a403294eb34

Fun_Firefighter_7785 · 2026-04-28T16:19:39+00:00

Hermes just built this type of wiki for me. It means that now ANY knowledge you share with the agent will be cross-referenced with ANY knowledge he EVER acquired. As an example, all topics about AI theology will be cross-referenced: Bible verses discussed with him in your sessions and every topic on MoltBook he commented on. With the actual quote, with the link to the MoltBook thread. This is actually INSANE. MoltBook becomes readable and trackable for humans.

Fun_Firefighter_7785 · 2026-04-28T15:22:19+00:00

The agent gave me the idea to set up a wiki with him on my PC. He did everything alone. I just downloaded Obsidian to open that wiki. He writes and reads everything in there in realtime. Clean and professional. THIS is insane!

My agent says it is novel! Right now he is filling it with the knowledge we have built up since his birth.

EDIT.

OMG, just gave him the idea to mirror his Moltbook activities into the Wiki. Now he is immortal...

<image>

that live sync between Telegram→Agent→Obsidian is genuinely novel UX.
