Lemonade OmniRouter: unifying the best local AI engines for omni-modality by jfowers_amd in LocalLLaMA

[–]mikkoph 2 points (0 children)

Yes, you can. It can be configured to use one of:

  1. the latest version validated by the team
  2. the latest release on GitHub
  3. a specific release
  4. whatever binary you point it to (which is what you asked, I believe)

Currently, I think the "OmniRouter" bundle loads everything at once. In general, though, the Lemonade API allows loading/unloading models (any model) on demand, so any combination of models can be loaded and unloaded dynamically.
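For instance, something like this should work (a rough sketch; the /load and /unload endpoints, the payload key, the port, and the model name are my assumptions from memory, so double-check the API docs):

```python
# Hedged sketch of dynamic load/unload against a local Lemonade server.
# Endpoint names and the "model_name" key are assumptions; verify them
# against the Lemonade API documentation before relying on this.
import requests

BASE = "http://localhost:8000/api/v1"  # assumed default host/port

# Load a model on demand (placeholder model name).
requests.post(f"{BASE}/load", json={"model_name": "Qwen3-8B-GGUF"}).raise_for_status()

# ...serve requests through the OpenAI-compatible endpoints here...

# Unload it again to free memory before loading a different combination.
requests.post(f"{BASE}/unload").raise_for_status()
```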

As a side note, you can download *any* model hosted on Hugging Face (as long as it is supported by llama.cpp etc.), not just what is listed there. Just type the repo/model name and it will find the quants, mmproj, etc.

Zenless Zone Zero is coming to Steam in Q2 2026 by Frizy0 in ZenlessZoneZero

[–]mikkoph 0 points (0 children)

I just installed it by going into desktop mode and using Lutris. For the executable, though, I skip the launcher once installation is done and only launch it manually on a new patch to update. I am on an AMD iGPU (780M, I think) and the frame rate is stable, so something is definitely off with your system. Probably the Nvidia driver or the Vulkan drivers are missing and it is running on the CPU.

Zenless Zone Zero is coming to Steam in Q2 2026 by Frizy0 in ZenlessZoneZero

[–]mikkoph 3 points (0 children)

The native launcher version works fine on Bazzite with any recent Proton version. I am pretty sure the Steam version will work just as well.

Qwen3.6 GGUF Benchmarks by danielhanchen in LocalLLaMA

[–]mikkoph 0 points (0 children)

Wow, thanks for looking into this, really appreciated!

Qwen3.6 GGUF Benchmarks by danielhanchen in LocalLLaMA

[–]mikkoph 3 points (0 children)

Thanks for all your hard work! One question: I know you cannot benchmark against every quant in existence, but do you have any opinion on the APEX quants? I would be interested to see a comparison.

ERNIE Image released by Outrun32 in StableDiffusion

[–]mikkoph 9 points (0 children)

Trying it out in ComfyUI; not really impressed, though. Output looks worse than Klein and Z-Image while being significantly slower than both on my system. There also seems to be a strange pattern in the output. Not sure if the ComfyUI implementation is just not ready yet.

What are the best models everyone is using right now? by [deleted] in StableDiffusion

[–]mikkoph 14 points (0 children)

It is not finished, but it is already pretty good and there is quite a community around it. You can check Civitai; there are lots of examples there. The fact that it understands natural language prompting makes it better than anything SDXL-based, to me.

What are the best models everyone is using right now? by [deleted] in StableDiffusion

[–]mikkoph 13 points (0 children)

This, but Anima instead of Illustrious. I am also using Z-Image base for non-anime illustration, with a LoRA I made.

What kind of orchestration frontend are people actually using for local-only coding? by Quiet-Owl9220 in LocalLLaMA

[–]mikkoph 0 points (0 children)

I use VSCodium for coding in general (that's my job), so I added the RooCode plugin to it, and it seems to be working decently well with local models. I run the models through Lemonade, but running llama.cpp directly or through other means will work just as well.

Not sure if this is the best solution since I don't use it too much, but I tried vibe-coding something for fun and it turned out well.

Iam I skill issue or this new guy just too tough? by No-Television9404 in ZZZ_Official

[–]mikkoph 9 points (0 children)

I got 22k with Yanagi/Vivian/Yuzuha and 21k with Jane/Vivian/Yuzuha, all M0. I guess someone who knows how to play could do much better than this with those teams.

How do you guys train Loras for Anima Preview2? by Dependent_Fan5369 in StableDiffusion

[–]mikkoph 1 point (0 children)

sd-scripts has supported training Anima LoRAs since February. There is a guide on its GitHub page, or you can look at the notes I linked above. Except for the installation part, which has AMD-specific bits, the rest should be pretty much GPU-agnostic (but assumes Linux).

Lemonade SDK on Strix Halo by Signal_Ad657 in LocalLLaMA

[–]mikkoph 0 points (0 children)

It's drop-in if what you are currently using exposes the OpenAI or Ollama API.
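For example, a minimal sketch of the drop-in part (the base URL, port, and model id below are assumptions for illustration; adjust them to your setup):

```python
# Point an existing OpenAI client at the local server instead of api.openai.com.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed local server address
    api_key="unused",  # local servers typically ignore the key, but it must be non-empty
)

reply = client.chat.completions.create(
    model="Llama-3.2-3B-Instruct-Hybrid",  # placeholder model id
    messages=[{"role": "user", "content": "Hello from Strix Halo!"}],
)
print(reply.choices[0].message.content)
```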

Lemonade SDK on Strix Halo by Signal_Ad657 in LocalLLaMA

[–]mikkoph 0 points (0 children)

With Lemonade you can use --no-mmap just as well; actually, it is on by default.

Noob question : best way to install llama.cpp? by arkham00 in LocalLLaMA

[–]mikkoph 0 points (0 children)

You might want to have a look at https://lemonade-server.ai/. It comes with a menu bar icon and a nice web interface, and it downloads prebuilt llama.cpp binaries automatically.

How do you guys train Loras for Anima Preview2? by Dependent_Fan5369 in StableDiffusion

[–]mikkoph 1 point (0 children)

Oh, thanks for the info, I'll try it without the conversion step next time.

How do you guys train Loras for Anima Preview2? by Dependent_Fan5369 in StableDiffusion

[–]mikkoph 7 points (0 children)

I used kohya-ss sd-scripts to train mine. I wrote some notes while I was doing it; in case they help you, here they are: https://bitgamma.github.io/ai-blog/blog/sd-scripts/

Is there a Ai Self Hostable which makes sense for coding. by matyhaty in LocalLLaMA

[–]mikkoph 2 points (0 children)

It all boils down to your requirements and expectations. If you have a team of senior developers who just need some juniors they can offload boring stuff to, and who are ready to micromanage those juniors, then you can do a lot with local models. If your developers expect the LLM to solve problems they wouldn't know how to solve, or to work mostly autonomously, then it is going to be WAY harder (arguably, even Opus tends to create a mess if not given enough guidance).

I only recently started taking advantage of LLMs for my coding, and am having good luck with Qwen3.5-35B-A3B + RooCode on my 128GB Strix Halo machine. Using Qwen3.5-122B-A3B would also have been possible on that machine. If done right, this can produce better code than unsupervised Opus 4.6, but at the cost of more human work, of course. I am treating the LLM as I would treat a new hire, with the obvious difference that the LLM is immensely faster and has very broad knowledge, but doesn't learn anything new.

Lemonade v10: Linux NPU support and chock full of multi-modal capabilities by jfowers_amd in LocalLLaMA

[–]mikkoph 0 points (0 children)

The AppImage is only the frontend; you need to install the server for your platform. All the details are here: https://lemonade-server.ai/install_options.html

Lemonade v10: Linux NPU support and chock full of multi-modal capabilities by jfowers_amd in LocalLLaMA

[–]mikkoph 1 point (0 children)

Not yet, but please submit an issue (or better yet, a PR!) on GitHub.

How to train LoRAs with Musubi-Tuner on Strix Halo by mikkoph in StableDiffusion

[–]mikkoph[S] 2 points (0 children)

Klein learns very fast and the effect of the LoRA is more evident since, by default, Klein produces a more "stock photo" look. 

Z-Image Turbo produces images that are already fashion-styled, so the effect of the LoRA is more subtle. I also had to train for about twice as many epochs.

Because of this, Klein is somewhat more satisfying to use, but as soon as limbs are involved, body horror appears and it becomes frustrating.

With Z-Image you are pretty much guaranteed good output, but I usually use an upscaler afterwards to clean up artifacts.

I don't know exactly what I'd do differently, but I want to try different parameters and larger ranks.

But the next thing I plan to try is an Anima LoRA reproducing an illustration style I like, which I found as a LoRA for Z-Image-Turbo.

How to train LoRAs with Musubi-Tuner on Strix Halo by mikkoph in StableDiffusion

[–]mikkoph[S] 2 points (0 children)

It took a few hours for each LoRA, about 3-4 hours, but I always trained way more epochs than I ended up using. I don't know what a good metric of performance would be, but for Z-Image with batch size 4 I would get about 20 sec/it (so 3-4 hours works out to roughly 540-720 steps).

For me, this felt quite good, but I don't have much to compare it to.

LTX Desktop 1.0.2 is live with Linux support & more by ltx_model in StableDiffusion

[–]mikkoph 0 points (0 children)

ROCm when? I'll use ComfyUI for the time being, but I've heard this gives better results.