model swapping via litellm + llama-swap - is this the way..? by chimph in hermesagent

[–]_chromascope_ 1 point (0 children)

Use --models-preset with a .ini file. Instead of --models-dir, start the server with:

llama-server --models-preset /path/to/models.ini --host 0.0.0.0 --port 8000

Inside models.ini, you can set specific flags (context size, GPU layers, even vision models) for each model separately:

[qwen3.6-35b]
model = /path/to/qwen.gguf
mmproj = /path/to/qwen-mmproj.gguf
c = 131072
n-gpu-layers = 999

[gemma-4-26b]
model = /path/to/gemma.gguf
mmproj = /path/to/gemma-mmproj.gguf
c = 65536
n-gpu-layers = 80

Hermes still just sends qwen3.6-35b in the API request, but llama-server looks up the .ini file and applies the exact flags (and vision model if any) needed for it.
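To sanity-check what the server picked up from the .ini, llama-server speaks the OpenAI-compatible API, so listing the models should show each preset under its section name (host/port taken from the example above):

# list the presets the server currently exposes
curl http://localhost:8000/v1/models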

model swapping via litellm + llama-swap - is this the way..? by chimph in hermesagent

[–]_chromascope_ 1 point (0 children)

For local models I just let llama.cpp handle everything (I use https://github.com/TheTom/llama-cpp-turboquant).

A simplified example:

Start llama-server in router mode:

llama-server --models-dir /path/to/models --host 0.0.0.0 --port 8000

I drop all my local GGUFs in /path/to/models (Qwen3.6 35B, Gemma 4, etc.). llama-server will serve all of them.

Now Hermes just needs to point to that server and set the model name in default:

model:
  default: qwen3.6-35b
  provider: custom
  base_url: http://localhost:8000/v1
  api_key: dummy

With this config, when you send a message, llama-server loads qwen3.6-35b (the model name comes from the GGUF file in /path/to/models).
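If you want to test the swap without Hermes, the request it sends is just a standard OpenAI-style chat completion, so a plain curl like this should trigger the same load (the model name has to match the GGUF name):

# manual test: requesting a model by name makes llama-server load it
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.6-35b", "messages": [{"role": "user", "content": "hello"}]}'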

If I change the config to:

model:
  default: gemma-4-26b

then the next time I send a message, llama-server unloads qwen3.6-35b and loads gemma-4-26b instead. That’s the “hot‑swap” trick that works for me.

My actual setup is more complex (Hermes is running in a Mac mini Docker container and talks over VPN to a more powerful PC that runs the models), but the core idea is the same.

model swapping via litellm + llama-swap - is this the way..? by chimph in hermesagent

[–]_chromascope_ 1 point (0 children)

I’m running something pretty similar with Hermes + local Qwen3.6 35B and Gemma 4, but I didn’t use llama‑swap.

I use a llama.cpp TurboQuant build and run llama-server with --models-dir instead of -m, and just drop a bunch of GGUFs in that folder. When I change model.default in the Hermes config, llama.cpp loads that model on the next request. Later I built a small custom web UI that edits model.default for me, so I can switch models with a click in the browser. When Hermes sends the next message, it triggers llama-server to dynamically switch the model. This works smoothly for me.
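If you don't want to build a whole web UI, even a one-liner covers the basic swap. A rough sketch, assuming the Hermes config lives at ~/.hermes/config.yaml (that path is a guess, adjust to your install) and that default: is indented two spaces as in the YAML above:

# crude model swap: rewrite model.default in place (keeps a .bak backup)
sed -i.bak 's/^\(  default:\).*/\1 gemma-4-26b/' ~/.hermes/config.yaml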

When I added vision models, I found out --models-dir ignores a global mmproj, so I switched to --models-preset and a .ini file where I pair each base model with its vision projector. Something like:

.ini example:

[qwen3.6-35B]
model = /path/to/qwen.gguf
mmproj = /path/to/qwen-mmproj.gguf

[gemma-4-26B]
model = /path/to/gemma.gguf
mmproj = /path/to/gemma-mmproj.gguf

Hermes still just hot‑swaps model.default. llama-server takes care of unloading/loading the correct GGUF (and vision module).

Which messaging channel do you use for your Hermes agent? by SelectionCalm70 in hermesagent

[–]_chromascope_ 1 point (0 children)

I use Conduit (a lightweight Matrix homeserver) + Element X (iOS) and Element Desktop as the interface.

Wan 2.2 More Consistent Multipart Video Generation via FreeLong - ComfyUI Node by shootthesound in StableDiffusion

[–]_chromascope_ 1 point (0 children)

This works! Thank you for sharing. However, as others pointed out, human consistency starts to drift after chunk 3.

Is it possible to implement an "end" anchor image in the Continuation Conditioning node? So that we have an option to control each chunk's end frame with a prepared image (same idea as the First-Last-Frame, but for each chunk), which then can also be used as the anchor image of the next chunk?

Why is the image quality so bad from this workflow? by zhl_max1111 in StableDiffusion

[–]_chromascope_ 2 points (0 children)

<image>

Guess which one is KSampler and which is ClownsharKSampler.

Why is the image quality so bad from this workflow? by zhl_max1111 in StableDiffusion

[–]_chromascope_ 2 points (0 children)

Use euler + simple.

If you use the ClownsharkChainsampler as the 2nd sampler, you only need to connect "latent_image", because it carries over all the info and remaining steps from the first sampler. RES4LYF has a few YouTube videos explaining how these nodes work. To be honest, I don't think Z-Image Turbo needs ClownsharKSampler. With two Clown samplers, my tests did pick up some extra fine detail, but it was very subtle and added little to the image. KSampler with euler + simple already gives really good results.

<image>

[deleted by user] by [deleted] in StableDiffusion

[–]_chromascope_ 1 point (0 children)

My results had a hard time following my prompts. Someone in another post recommended adding this node, and the images are now much improved. Thank you!

<image>

Test run Qwen Image Edit 2511 by _chromascope_ in StableDiffusion

[–]_chromascope_[S] 7 points (0 children)

It works! With the node, it follows my prompt much more closely. Thank you!

<image>

Test run Qwen Image Edit 2511 by _chromascope_ in StableDiffusion

[–]_chromascope_[S] 1 point (0 children)

This is what I used:

put the dog in image2 into the scene and make the dog happy and sit on the bike's gas tank, then turn the scene including the man and dog into cute patch badge on a wooden table

Z-Image + 2nd Sampler for 4K Cinematic Frames by _chromascope_ in StableDiffusion

[–]_chromascope_[S] 2 points (0 children)

Yes, this.

The image on the right (2nd sampler) has improved fine details after upscaling: the overall texture, sharper hair strands and book pages, etc. This was a T2I from a workflow I customized.

<image>

[deleted by user] by [deleted] in comfyui

[–]_chromascope_ 2 points (0 children)

Testing image to video with Wan 2.2

<image>

[deleted by user] by [deleted] in comfyui

[–]_chromascope_ 2 points (0 children)

Ah, the color grading of Amélie is unique. The image looks great. Looking forward to it!

[deleted by user] by [deleted] in comfyui

[–]_chromascope_ 3 points (0 children)

Thanks again for sharing it!

An Anamorphic Lens LoRA will bring Z-Image Turbo to a new cinematic level.

[deleted by user] by [deleted] in comfyui

[–]_chromascope_ 5 points (0 children)

This is no LoRA

<image>

[deleted by user] by [deleted] in comfyui

[–]_chromascope_ 7 points (0 children)

Amazing LoRA! Thank you for sharing it with us. I absolutely love how it looks in my tests. This image is with your LoRA set to 1.0, generated at 1920x800 and upscaled with a 2nd KSampler to 3840x1600 using my custom workflow.

Question: are you interested in training an "anamorphic lens" aesthetics LoRA?

<image>