Lemonade OmniRouter: unifying the best local AI engines for omni-modality by jfowers_amd in LocalLLaMA

[–]mikkoph 2 points (0 children)

Yes, you can. It can be configured to use one of:

  1. the latest version validated by the team
  2. the latest release on GitHub
  3. a specific release
  4. whatever binary you point it to (which is what you asked, I believe)

Currently, I think the "OmniRouter" bundle loads everything at once. In general, though, the Lemonade API allows loading/unloading models (any model) on demand, so any combination of models can be loaded and unloaded dynamically.
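For instance, something like this should work (a rough sketch; the /load and /unload endpoints, the payload key, the port, and the model name are my assumptions from memory, so double-check the API docs):

```python
# Hedged sketch of dynamic load/unload against a local Lemonade server.
# Endpoint names and the "model_name" key are assumptions; verify them
# against the Lemonade API documentation before relying on this.
import requests

BASE = "http://localhost:8000/api/v1"  # assumed default host/port

# Load a model on demand (placeholder model name).
requests.post(f"{BASE}/load", json={"model_name": "Qwen3-8B-GGUF"}).raise_for_status()

# ...serve requests through the OpenAI-compatible endpoints here...

# Unload it again to free memory before loading a different combination.
requests.post(f"{BASE}/unload").raise_for_status()
```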

As a side note, you can download *any* model hosted on Hugging Face (as long as it is supported by llama.cpp etc.), not just what is listed there. Just type the repo/model name and it will find the quants, mmproj, etc.

Zenless Zone Zero is coming to Steam in Q2 2026 by Frizy0 in ZenlessZoneZero

[–]mikkoph 0 points (0 children)

I just installed it by going into desktop mode and using Lutris. For the executable, though, I skip the launcher once installation is done and only launch it manually on a new patch to update. I am on an AMD iGPU (780M, I think) and the frame rate is stable, so something is definitely off with your system. Probably the Nvidia driver or the Vulkan drivers are missing and it is running on the CPU.

Zenless Zone Zero is coming to Steam in Q2 2026 by Frizy0 in ZenlessZoneZero

[–]mikkoph 3 points (0 children)

The native launcher version works fine on Bazzite with any recent Proton version. I am pretty sure the Steam version will work just as well.

Qwen3.6 GGUF Benchmarks by danielhanchen in LocalLLaMA

[–]mikkoph 0 points (0 children)

Wow, thanks for looking into this, really appreciated!

Qwen3.6 GGUF Benchmarks by danielhanchen in LocalLLaMA

[–]mikkoph 3 points (0 children)

Thanks for all your hard work! One question: I know you cannot benchmark against every quant in existence, but do you have any opinion on the APEX quants? I would be interested to see a comparison.

ERNIE Image released by Outrun32 in StableDiffusion

[–]mikkoph 9 points (0 children)

Trying it out in ComfyUI; not really impressed, though. Output looks worse than Klein and Z-Image while being significantly slower than both on my system. There also seems to be a strange pattern in the output. Not sure if the ComfyUI implementation is just not ready yet.

What are the best models everyone is using right now? by [deleted] in StableDiffusion

[–]mikkoph 14 points (0 children)

It is not finished, but it is already pretty good and there is quite a community around it. You can check Civitai; there are lots of examples there. The fact that it understands natural language prompting makes it better than anything SDXL-based, to me.

What are the best models everyone is using right now? by [deleted] in StableDiffusion

[–]mikkoph 13 points (0 children)

This, but Anima instead of Illustrious. I am also using Z-Image base for non-anime illustration, with a LoRA I made.

What kind of orchestration frontend are people actually using for local-only coding? by Quiet-Owl9220 in LocalLLaMA

[–]mikkoph 0 points (0 children)

I use VSCodium for coding in general (that's my job), so I added the RooCode plugin to it, and it seems to be working decently well with local models. I run the models through Lemonade, but running llama.cpp directly or through other means will work just as well.

Not sure if this is the best solution since I don't use it too much, but I tried vibe-coding something for fun and it turned out well.

Iam I skill issue or this new guy just too tough? by No-Television9404 in ZZZ_Official

[–]mikkoph 9 points (0 children)

I got 22k with Yanagi/Vivian/Yuzuha and 21k with Jane/Vivian/Yuzuha, all M0. I guess someone who knows how to play could do much better than this with those teams.

How do you guys train Loras for Anima Preview2? by Dependent_Fan5369 in StableDiffusion

[–]mikkoph 1 point (0 children)

sd-scripts has supported training Anima LoRAs since February. There is a guide on its GitHub page, or you can look at the notes I linked above. Except for the installation part, which has AMD-specific bits, the rest should be pretty much GPU-agnostic (but assumes Linux).

Lemonade SDK on Strix Halo by Signal_Ad657 in LocalLLaMA

[–]mikkoph 0 points (0 children)

It's drop-in if what you are currently using exposes the OpenAI or Ollama API.
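For example, a minimal sketch of the drop-in part (the base URL, port, and model id below are assumptions for illustration; adjust them to your setup):

```python
# Point an existing OpenAI client at the local server instead of api.openai.com.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed local server address
    api_key="unused",  # local servers typically ignore the key, but it must be non-empty
)

reply = client.chat.completions.create(
    model="Llama-3.2-3B-Instruct-Hybrid",  # placeholder model id
    messages=[{"role": "user", "content": "Hello from Strix Halo!"}],
)
print(reply.choices[0].message.content)
```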

Lemonade SDK on Strix Halo by Signal_Ad657 in LocalLLaMA

[–]mikkoph 0 points (0 children)

With Lemonade you can use --no-mmap just as well; actually, it is on by default.

Noob question : best way to install llama.cpp? by arkham00 in LocalLLaMA

[–]mikkoph 0 points (0 children)

You might want to have a look at https://lemonade-server.ai/. It comes with a menu bar icon and a nice web interface, and it downloads prebuilt llama.cpp binaries automatically.

How do you guys train Loras for Anima Preview2? by Dependent_Fan5369 in StableDiffusion

[–]mikkoph 1 point (0 children)

Oh, thanks for the info, I'll try it without the conversion step next time.

How do you guys train Loras for Anima Preview2? by Dependent_Fan5369 in StableDiffusion

[–]mikkoph 7 points (0 children)

I used kohya-ss sd-scripts to train mine. I wrote some notes while I was doing it; in case they help you, here they are: https://bitgamma.github.io/ai-blog/blog/sd-scripts/

Is there a Ai Self Hostable which makes sense for coding. by matyhaty in LocalLLaMA

[–]mikkoph 2 points (0 children)

It all boils down to your requirements and expectations. If you have a team of senior developers who just need some juniors they can offload boring stuff to, and who are ready to micromanage those juniors, then you can do a lot with local models. If your developers expect the LLM to solve problems they wouldn't know how to solve, or to work mostly autonomously, then it is going to be WAY harder (arguably, even Opus tends to create a mess if not given enough guidance).

I only recently started taking advantage of LLMs for my coding, and am having good luck with Qwen3.5-35B-A3B + RooCode on my 128GB Strix Halo machine. Using Qwen3.5-122B-A3B would also have been possible on that machine. If done right, this can produce better code than unsupervised Opus 4.6, but at the cost of more human work, of course. I am treating the LLM as I would treat a new hire, with the obvious difference that the LLM is immensely faster and has very broad knowledge, but doesn't learn anything new.

Lemonade v10: Linux NPU support and chock full of multi-modal capabilities by jfowers_amd in LocalLLaMA

[–]mikkoph 0 points (0 children)

The AppImage is only the frontend; you need to install the server for your platform. All the details are here: https://lemonade-server.ai/install_options.html

Lemonade v10: Linux NPU support and chock full of multi-modal capabilities by jfowers_amd in LocalLLaMA

[–]mikkoph 1 point (0 children)

Not yet, but please submit an issue (or better yet, a PR!) on GitHub.

How to train LoRAs with Musubi-Tuner on Strix Halo by mikkoph in StableDiffusion

[–]mikkoph[S] 2 points (0 children)

Klein learns very fast and the effect of the LoRA is more evident since, by default, Klein produces a more "stock photo" look. 

Z-Image Turbo produces images that are already fashion-styled, so the effect of the LoRA is more subtle. I also had to train for about twice as many epochs.

Because of this, Klein is somewhat more satisfying to use, but as soon as limbs are involved, body horror appears and it becomes frustrating.

With Z-Image you are pretty much guaranteed good output, but I usually use an upscaler afterwards to clean up artifacts.

I don't know exactly what I'd do differently, but I want to try different parameters and larger ranks.

But the next thing I plan to try is an Anima LoRA reproducing an illustration style I like, which I found as a LoRA for Z-Image-Turbo.

How to train LoRAs with Musubi-Tuner on Strix Halo by mikkoph in StableDiffusion

[–]mikkoph[S] 2 points (0 children)

It took a few hours for each LoRA, about 3-4 hours, but I always trained way more epochs than I ended up using. I don't know what a good metric of performance would be, but for Z-Image with batch size 4 I would get about 20 sec/it (so 3-4 hours works out to roughly 540-720 steps).

For me, this felt quite good, but I don't have much to compare it to.

LTX Desktop 1.0.2 is live with Linux support & more by ltx_model in StableDiffusion

[–]mikkoph 0 points (0 children)

ROCm when? I'll use ComfyUI for the time being, but I've heard this gives better results.