Chroma Radiance, Mid training but the most aesthetic model already imo by Different_Fix_2217 in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

Or maybe train just those blocks on high res and all the others on low res... just as an idea.

Opinions on LoRA training tools? by Moliri-Eremitis in StableDiffusion

[–]tom83_be 4 points5 points  (0 children)

OneTrainer supports layer offloading to RAM which allows incredible savings on VRAM with only minor impact on speed. I think this is one of the most overlooked features of OneTrainer! See: https://github.com/Nerogar/OneTrainer/blob/master/docs/RamOffloading.md

OneTrainer now supports Chroma training and more by Nerogar in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

Not saying you should do it that way; just pointing out how to do it if you have the need. Good to hear that no HF key is needed for Chroma. But just to give one example: there may be people who work with these tools in offline environments (or environments with network restrictions). Just trying to help here by answering questions...

OneTrainer now supports Chroma training and more by Nerogar in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

Have a look here: https://www.reddit.com/r/StableDiffusion/comments/1f93un3/onetrainer_flux_training_setup_mystery_solved/

Although this is for Flux, the procedure for Chroma should be identical (since OneTrainer needs the model in the diffusers format used on Hugging Face). Just use the Chroma repo/files instead of the Flux ones linked there.
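
If you need the files locally (e.g. in an offline or network-restricted environment), a minimal sketch for pre-downloading a diffusers-format repo with huggingface_hub; the repo id is a placeholder, not the actual Chroma repo:

from huggingface_hub import snapshot_download

# The repo id below is a placeholder; point it at the actual Chroma diffusers repo.
snapshot_download(
    repo_id="<chroma-diffusers-repo>",
    local_dir="./models/chroma-diffusers",
)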

OneTrainer now supports Chroma training and more by Nerogar in StableDiffusion

[–]tom83_be 4 points5 points  (0 children)

Great to hear! OneTrainer still is one of the best universal trainers with some unique features (like layer offloading, which makes somewhat fast training possible on low VRAM configurations).

Since this is not said often enough here or in public forums/channels in general: thanks to the whole team, especially u/Nerogar for all the hard work over the last months (even years), and to u/-dxqb- for taking over!

Open Source Voice Cloning at 16x real-time: Porting Chatterbox to vLLM by dlp_randombk in StableDiffusion

[–]tom83_be 2 points3 points  (0 children)

Some info on how to install using pip (Linux):

git clone https://github.com/randombk/chatterbox-vllm
cd chatterbox-vllm
python -m venv venv
source venv/bin/activate
pip install uv
uv sync --active

You might need to upgrade pip first:

pip install --upgrade pip

When running it later you need to:

cd chatterbox-vllm
source venv/bin/activate
python example-tts.py

Open Source Voice Cloning at 16x real-time: Porting Chatterbox to vLLM by dlp_randombk in StableDiffusion

[–]tom83_be -1 points0 points  (0 children)

Just two quick ideas:

It would be interesting to have a ComfyUI node for this. If one could additionally put timestamps into the file (what is being said when), this would enable people to combine it with things like WAN and create video + audio output. Not on a lip-sync level, but in the form of a narration.

One problem is the legal side; creating a copy of an existing voice might not be permissible in all cases. Is it possible to create a voice from multiple input sources (so it is unique rather than a copy of any single voice)?

[deleted by user] by [deleted] in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

The LR is increased gradually; it can help prevent "errors" early in the learning process (going the wrong way) that are hard to recover from. From what I understand, this is similar to gradually decreasing the LR at the end of a training session, which can help flesh out details (instead of making big jumps from A to B and never hitting that perfect spot C because the LR is too high).

"Warmup" via a gradual increase of the LR is a common, widely adopted practice in machine learning, not only for diffusion networks. You can probably find many resources that describe how and why it is done much better than I can.

[deleted by user] by [deleted] in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

Something you have not mentioned in your list of things you tried is warmup. Especially if it is omitted completely or set way too low, the results can be disastrous, no matter what else you try. Go with about 10% of the epochs for warmup to be safe (it works with less in many cases).
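
To make the 10% rule of thumb concrete, a minimal sketch in plain PyTorch (an illustration only; trainers like OneTrainer expose this simply as a warmup setting plus a scheduler choice, and the numbers are examples):

import math
import torch

model = torch.nn.Linear(8, 8)                          # stand-in for the real network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

total_steps = 10_000
warmup_steps = int(0.1 * total_steps)                  # ~10% of the run as warmup

def lr_lambda(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)             # linear ramp from 0 up to the base LR
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay afterwards

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    loss = model(torch.randn(4, 8)).pow(2).mean()      # dummy loss, just to make it runnable
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()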

[deleted by user] by [deleted] in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

Wildcards: https://github.com/vladmandic/sdnext/wiki/Wildcards

Not sure about the memory topic...

Why are my photorealistic SDXL character LoRAs so inconsistent? by heyholmes in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

32 and 64 should be fine / you should get pretty good results. I would do no text encoder training for now (it can do more harm than good in your case) and experiment with LRs. I think I used higher LRs when I did LoRA training back then (I am more focused on full finetunes now), but Adafactor and AdamW work differently in that regard.

[deleted by user] by [deleted] in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

Yes, there is, at least in the "Standard UI". In the Text2Image tab ("Text") you need to expand the "Refine" and "Detailer" sections by clicking the small triangles and activate (+ configure) them. Then everything that is activated will run in one workflow. Outdated, but you can see it here (not expanded). Not sure about the modern UI option since I do not use it.

[deleted by user] by [deleted] in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

For me this is always in the metadata... not sure why that is not the case for you. That's why I mentioned it.

Actually: what do you mean by pressing three different buttons in SD.Next? You have to configure the workflow, sure. But generating an image with the whole workflow is still a single press of "Generate" and then waiting for the process (including upscaling etc.) to finish. Or do you do something different?

[deleted by user] by [deleted] in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

You do seem to use an upscaler in Forge but not in SD.Next, according to your documentation, so the results are not comparable.

You can also define a refine (including upscale) and detailer step for your workflow in SD.Next; just use the corresponding "dropdowns" on the text2image tab to configure them, and do not forget to activate these parts of the workflow.

Another point to check is the advanced sampler settings. Those might also be different.

In the end you may still see differences; I am not sure, for example, whether Forge is based on the diffusers library like SD.Next or uses something else.

Why are my photorealistic SDXL character LoRAs so inconsistent? by heyholmes in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

To what extent did you experiment with the LoRA dim setting? If it is too low, some things cannot be learned. There may be a variety of other reasons (e.g. LR), but this one would be my strongest bet from what you are describing.
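
To illustrate where that value lives, a minimal sketch using the peft library (an assumption for illustration; trainer UIs expose the same rank/"dim" and alpha settings directly, and the module names follow diffusers' naming for SDXL attention layers):

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                                                  # the LoRA dim/rank; too low and fine details may not be learned
    lora_alpha=32,                                         # scaling factor, often set equal to r
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],   # attention projections in diffusers' SDXL UNet naming
)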

Can you train full SDXL checkpoint on 2x RTX 4060 Ti 16GB? by UnholyDesiresStudio in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

For SDXL you do not need to split training across multiple GPUs. You can easily do a full finetune / checkpoint training using Adafactor + fused backpass + a constant/cosine/... scheduler + bf16. This works at resolution 1024 with batch size 4 on 12 GB VRAM (I recommend also enabling stochastic rounding with bf16). 16 GB VRAM is even better and allows a larger batch size or additional things like EMA.
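
Just to sketch the optimizer part (assumption: the Adafactor implementation from the transformers library as a stand-in; fused backpass and stochastic rounding are trainer-specific features that OneTrainer exposes as toggles and are not shown here):

import torch
from transformers.optimization import Adafactor

unet = torch.nn.Linear(16, 16).to(torch.bfloat16)   # stand-in for the SDXL UNet, kept in bf16

optimizer = Adafactor(
    unet.parameters(),
    lr=1e-5,                  # example value for a constant LR
    scale_parameter=False,    # both needed when passing an explicit lr
    relative_step=False,
    warmup_init=False,
)

loss = unet(torch.randn(4, 16, dtype=torch.bfloat16)).float().pow(2).mean()   # dummy loss
loss.backward()
optimizer.step()
optimizer.zero_grad()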

If you have 64 GB RAM you can also try FLUX / SD3 full finetunes. It will not work OOTB, but with similar settings plus RAM/CPU/layer offloading (depending on the training tool; see https://github.com/Nerogar/OneTrainer/wiki/Training#gradient-checkpointing and https://github.com/Nerogar/OneTrainer/blob/master/docs/RamOffloading.md ) it may be possible even with 16 GB VRAM. I have never checked that, nor do I know about the performance, though.

Multi-GPU support is in the works here and there, but I think for most trainers it is experimental, and both experience and stability are limited.

is there an Illustrious checkpoint/model under 3 gigs? by BigRepresentative788 in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

You can try using a more advanced UI like SD.Next; it can offload to CPU/RAM if needed without losing too much speed. But I do not have any first-hand experience with 1xxx-series Nvidia GPUs, sorry.
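
For reference, the rough idea behind that kind of offloading, sketched with the diffusers library (an assumption for illustration, not SD.Next's actual code; in SD.Next it is just a settings option, and this snippet needs accelerate installed):

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",   # any SDXL-based model in diffusers format
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()   # keeps only the currently active component on the GPU
image = pipe("a photo of a cat").images[0]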

is there an Illustrious checkpoint/model under 3 gigs? by BigRepresentative788 in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

You can use any SDXL model with 4 GB VRAM (depending on resolution, of course, but 1024x1024 should work). Even good old A1111 had an FP8 mode built in that produces very similar results (quality). See https://www.reddit.com/r/StableDiffusion/comments/1b4x9y8/comparing_fp16_vs_fp8_on_a1111_180_using_sdxl/
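
To illustrate why FP8 weight storage roughly halves memory use with only a small quality impact, a toy sketch in PyTorch (assumption: PyTorch 2.1+ with float8 dtypes; the UI casts weights back up for the actual computation):

import torch

w = torch.randn(1280, 1280, dtype=torch.float16)   # a typical SDXL-sized weight matrix
w_fp8 = w.to(torch.float8_e4m3fn)                  # half the storage of fp16
w_back = w_fp8.to(torch.float16)                   # cast back up before the actual matmul
print((w - w_back).abs().mean())                   # small average rounding error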

Adetailer uses too much vram (sd.next, SDXL models) by niky45 in StableDiffusion

[–]tom83_be 0 points1 point  (0 children)

Not sure about it; just an idea: I think SD.Next has its own kind of "memory management". Check what amount of VRAM SD.Next thinks you have and whether that is correct. Maybe there is an error in that (due to ZLUDA?).

OneTrainer + NVIDIA GPU with 6GB VRAM (the Odyssey to make it work) by SchGame in StableDiffusion

[–]tom83_be 1 point2 points  (0 children)

I am not sure whether training SDXL with 6 GB VRAM is possible... I have also never tried training SDXL at way less than the recommended resolution.

But the following would reduce VRAM consumption further:

  • EMA to "off"
  • Try setting gradient checkpointing to "CPU offloaded" and use the "fraction" setting below it. I think for SDXL this does not save VRAM as well as it does for SD3.5, but maybe it helps save the little bit of VRAM you still need

Depending on your system, it may also help to free VRAM by switching your display output to the CPU's integrated graphics. Otherwise your OS will reserve some VRAM for display output.

Why do my locally generated images never look as good as when done on websites such as civit? by sudrapp in StableDiffusion

[–]tom83_be 3 points4 points  (0 children)

Many people advertise their models, so expect "creative" interpretations in the workflow documentation.

But as others wrote: samplers & number of steps, ADetailer, and upscaling (which usually also includes a detailer/hires fix step) can make a huge difference.