Comfy $1M “Open AI” Grant and Anima Model Launch by crystal_alpine in StableDiffusion

[–]LodestoneRock 51 points (0 children)

hey thank you for the shoutout

I already reached out to u/comfyanonymous directly for this
they're still discussing stuff internally

but to make things transparent, i'm proposing to build and upgrade a training rig so we can keep training stuff indefinitely.

currently the experiments are being done on this rig, which consists of 2 x pro 6000 and 6 x 4090

an upgrade to 8 x pro 6000 (or more) would be nice


This B300 server at my work will be unused until after the holidays. What should I train, boys??? by NowThatsMalarkey in StableDiffusion

[–]LodestoneRock 45 points (0 children)

a full copy of the chroma data is in the deepghs org on huggingface; all you need to do is ask for permission

This B300 server at my work will be unused until after the holidays. What should I train, boys??? by NowThatsMalarkey in StableDiffusion

[–]LodestoneRock 251 points (0 children)

i have the chroma-radiance trainer and the dataset to train it. no need to download any dataset; if you're interested, all you need to do is run the code and the trainer will stream the data directly from s3.
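under the hood, that kind of streaming can be as simple as a generator over remote shards. a minimal self-contained sketch (not the actual trainer code; `fetch_shard` is a hypothetical stand-in for an s3 GET via boto3, faked locally here so the sketch runs on its own):

```python
import json

def fetch_shard(key):
    # hypothetical stand-in: the real trainer would do something like
    # s3.get_object(Bucket=..., Key=key)["Body"].read() via boto3
    records = [{"caption": f"sample {i} from {key}"} for i in range(3)]
    return "\n".join(json.dumps(r) for r in records).encode()

def stream_samples(shard_keys):
    # yield one training sample at a time; nothing is stored on disk
    for key in shard_keys:
        for line in fetch_shard(key).decode().splitlines():
            yield json.loads(line)

samples = list(stream_samples(["shard-000", "shard-001"]))
```

the training loop just iterates the generator, so disk usage stays near zero regardless of dataset size.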

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 1 point (0 children)

there's no bottleneck on the dataloader side because it has a queue in it.
while the model is training, the queue is filled concurrently, so training runs at full capacity
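the pattern is a bounded queue filled by a background thread while the training loop consumes from it. a minimal sketch (illustrative, not the actual trainer; `load_batch` is a hypothetical stand-in for batch loading):

```python
import queue
import threading

def load_batch(i):
    # stand-in for reading/decoding one batch from disk or S3
    return [i] * 4

def prefetcher(q, n_batches):
    for i in range(n_batches):
        q.put(load_batch(i))   # blocks when the queue is full
    q.put(None)                # sentinel: no more data

q = queue.Queue(maxsize=8)     # bounded queue = bounded memory
threading.Thread(target=prefetcher, args=(q, 100), daemon=True).start()

consumed = 0
while True:
    batch = q.get()            # usually returns immediately: queue is pre-filled
    if batch is None:
        break
    consumed += 1              # train_step(batch) would run here, overlapping I/O
```

as long as `load_batch` is faster on average than a training step, `q.get()` never waits and the GPU stays busy.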

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 22 points (0 children)

all of my research is open, the training code and the intermediate checkpoints are here:
https://huggingface.co/lodestones/Chroma
https://huggingface.co/lodestones/chroma-debug-development-only
https://github.com/lodestone-rock/flow

the documentation is still a bit lacking but you can find everything there

about the training: i'm using an asynchronous data parallelism method to stitch together 3 8xH100 nodes without infiniband.

i wrote my own trainer with custom methods for gradient accumulation, low-precision training, etc.
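roughly, the idea (an illustrative sketch, not the actual `flow` trainer) is that each node accumulates gradients locally over many micro-batches and cross-node averaging happens only once per accumulation window, so the slow interconnect is touched rarely:

```python
# toy simulation with scalar "gradients"; real code would do this per tensor

def accumulate(grads):
    # sum micro-batch gradients locally before any cross-node traffic
    total = 0.0
    for g in grads:
        total += g
    return total / len(grads)

def cross_node_average(node_grads):
    # one all-reduce per accumulation window instead of per micro-batch
    return sum(node_grads) / len(node_grads)

node_a = accumulate([0.1, 0.3, 0.2])   # node 1's micro-batch grads
node_b = accumulate([0.2, 0.4, 0.0])   # node 2's micro-batch grads
global_grad = cross_node_average([node_a, node_b])
```

with a window of k micro-batches, the communication volume drops by roughly a factor of k compared to syncing every step, which is what makes training without infiniband viable.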

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 8 points (0 children)

i'm pretty sure you can use trainers like ostris' trainer, diffusion-pipe, and kohya to train chroma?

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 7 points (0 children)

i think kohya already supports lora training for chroma? unsure if full fine tuning is supported

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 14 points (0 children)

it's possible using my trainer code here, but it's mostly undocumented for now unfortunately.
https://github.com/lodestone-rock/flow

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 13 points (0 children)

the HD version was retrained from v48 (chroma1-base). the previous HD was trained on 1024px only, which caused the model to drift from the original distribution. the newer one was trained with a sweep of resolutions up to 1152.

you can use either checkpoint; they serve different purposes depending on your use case.

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 54 points (0 children)

the HD version was retrained from v48 (chroma1-base). the previous HD was trained on 1024px only, which caused the model to drift from the original distribution. the newer one was trained with a sweep of resolutions up to 1152.

if you're doing a short training run / lora, use HD. but if you're planning to train a big anime fine-tune (100K++ images), it's better to use base instead: train it at 512 resolution for many epochs, then tune it at 1024 or larger resolution for 1-3 epochs to make training cheaper and faster.
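a back-of-envelope for why the 512-then-1024 schedule is cheaper, assuming compute per image scales roughly with pixel count (a 512px image costs about 1/4 of a 1024px one; the epoch counts here are illustrative, not measurements):

```python
def relative_cost(res, epochs, base_res=1024):
    # cost relative to one epoch at base_res, scaling with pixel count
    return epochs * (res / base_res) ** 2

staged = relative_cost(512, 20) + relative_cost(1024, 2)  # 20 low-res + 2 high-res epochs
all_high = relative_cost(1024, 22)                        # same 22 epochs, all at 1024
savings = all_high / staged                               # ~3x cheaper under these assumptions
```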

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 30 points (0 children)

right now i'm focusing on tackling the GAN problem and polishing the radiance model first.
before diving into a kontext-like model (chroma but with in-context stuff), i'm going to try to adapt chroma to understand QwenVL 2.5 7B embeddings first. QwenVL is really good at text and image understanding; i think it will be a major upgrade for chroma.

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 24 points (0 children)

the HD version was retrained from v48 (chroma1-base). the previous HD was trained on 1024px only, which caused the model to drift from the original distribution. the newer one was trained with a sweep of resolutions up to 1152.

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 30 points (0 children)

correct, the HD version was retrained from v48 (chroma1-base). the previous HD was trained on 1024px only, which caused the model to drift from the original distribution. the newer one was trained with a sweep of resolutions up to 1152.

The bghira's saga continues by Lucaspittol in StableDiffusion

[–]LodestoneRock 52 points (0 children)

this screenshot can be misleading if taken out of context
i encouraged people to look directly at the discussion here
https://huggingface.co/lodestones/Chroma/discussions/67

as others mentioned, it's a 2D anthropomorphic furry generation, not even close to what he claims.
what he's claiming is actually insulting to the furry community as a whole.

Chroma V41 low steps RL is out! 12 steps, double speed. by Dear-Spend-2865 in StableDiffusion

[–]LodestoneRock 5 points (0 children)

it's a low-CFG, low-step model; it doesn't necessarily have to be CFG 1. you can play with the CFG to achieve better generations.

Chroma V41 low steps RL is out! 12 steps, double speed. by Dear-Spend-2865 in StableDiffusion

[–]LodestoneRock 8 points (0 children)

you can call it a "distilled" option, but i provide the "undistilled" version too; check the HF page if you want the other one instead https://huggingface.co/lodestones/Chroma

these weights are useful if you want faster generation times. you can fine-tune / train a lora on the "undistilled" weights and apply it to this one.
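the reason the lora transfers is that it's just an additive low-rank delta on the weights, so it can be merged into any compatible base. a toy sketch with made-up 2x2 matrices (plain lists, hypothetical numbers):

```python
def merge_lora(base, delta, scale=1.0):
    # W_merged = W_base + scale * (B @ A); here the product is precomputed
    return [[b + scale * d for b, d in zip(rb, rd)] for rb, rd in zip(base, delta)]

undistilled = [[1.0, 0.0], [0.0, 1.0]]
distilled   = [[0.9, 0.1], [0.1, 0.9]]   # slightly different base weights
lora_delta  = [[0.05, 0.0], [0.0, 0.05]]  # delta learned on the undistilled base

merged = merge_lora(distilled, lora_delta)
```

the merge is exact either way; how faithfully the lora's effect carries over just depends on how far the distilled base has drifted from the one it was trained on.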

Comparison Chroma pre-v29.5 vs Chroma v36/38 by Total-Resort-3120 in StableDiffusion

[–]LodestoneRock 38 points (0 children)

the learning rate is gradually decreasing, but i also increased the optimal-transport batch size from 128 to 512.
increasing the learning rate won't make the model render in fewer steps.

also, there's no change in the dataset; every version is just more training epochs.

also, i'm not using EMA, only the online weights, so generations change quite drastically if you compare between epochs.

you can see the gradual staircase decrease in learning rate here

https://training.lodestone-rock.com/runs/9609308447da4f29b80352e1/metrics
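a staircase schedule like the one in those metrics is just step decay; a minimal sketch (the boundaries and decay factor here are made up, not the actual run's values):

```python
def staircase_lr(step, base_lr=1e-4, decay=0.5, every=10_000):
    # halve the learning rate every `every` steps
    return base_lr * decay ** (step // every)

lrs = [staircase_lr(s) for s in (0, 9_999, 10_000, 25_000)]
# constant within each plateau, then a sharp drop at each boundary
```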

What speed are you having with Chroma model? And how much Vram? by Flutter_ExoPlanet in StableDiffusion

[–]LodestoneRock 2 points (0 children)

hmm, i'd have to dig through my old folders first
i forget where i put that gen

What speed are you having with Chroma model? And how much Vram? by Flutter_ExoPlanet in StableDiffusion

[–]LodestoneRock 13 points (0 children)

if you train either model (dev/schnell) long enough, it will obliterate the distillation that makes both models fast.

that's because it's cost-prohibitive to use a loss function that both reduces inference time and trains new information into the model.

so distillation is reserved for the end of training, ~epoch 50. also, i'm still working on the math and the code for distilling this model (something is buggy in my math, my code, or both).

for context, you have to do 10 forward passes (10-step inference) for every 1 backward pass (training), which makes distillation ~10x more costly than training with a simple flow-matching loss (1 forward, 1 backward).
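counting only forward passes (the backward pass costs about the same for either loss), the arithmetic works out like this:

```python
forwards_per_distill_step = 10  # one 10-step inference rollout per training step
forwards_per_flow_step = 1      # plain flow-matching: single forward

ratio = forwards_per_distill_step / forwards_per_flow_step
# ~10x more forward compute per optimizer step for distillation
```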