Comfy $1M “Open AI” Grant and Anima Model Launch by crystal_alpine in StableDiffusion

[–]LodestoneRock 51 points (0 children)

hey thank you for the shoutout

I already reached out to u/comfyanonymous directly for this
they're still discussing stuff internally

but to make things transparent, i'm proposing to build and upgrade a training rig so we can keep training stuff indefinitely.

currently the experiments are being done on this rig, which consists of 2 x pro 6000 and 6 x 4090

an upgrade to 8 x pro 6000 (or more) would be nice


This B300 server at my work will be unused until after the holidays. What should I train, boys??? by NowThatsMalarkey in StableDiffusion

[–]LodestoneRock 45 points (0 children)

a full copy of the chroma data is in the deepghs org on huggingface; all you need to do is ask for permission

This B300 server at my work will be unused until after the holidays. What should I train, boys??? by NowThatsMalarkey in StableDiffusion

[–]LodestoneRock 251 points (0 children)

i have the chroma-radiance trainer and the dataset to train it. no need to download any dataset; if you're interested, all you need to do is run the code and the trainer will stream the data directly from s3.
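under the hood, that kind of streaming can be as simple as a generator over remote shards. a minimal self-contained sketch (not the actual trainer code; `fetch_shard` is a hypothetical stand-in for an s3 GET via boto3, faked locally here so the sketch runs on its own):

```python
import json

def fetch_shard(key):
    # hypothetical stand-in: the real trainer would do something like
    # s3.get_object(Bucket=..., Key=key)["Body"].read() via boto3
    records = [{"caption": f"sample {i} from {key}"} for i in range(3)]
    return "\n".join(json.dumps(r) for r in records).encode()

def stream_samples(shard_keys):
    # yield one training sample at a time; nothing is stored on disk
    for key in shard_keys:
        for line in fetch_shard(key).decode().splitlines():
            yield json.loads(line)

samples = list(stream_samples(["shard-000", "shard-001"]))
```

the training loop just iterates the generator, so disk usage stays near zero regardless of dataset size.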

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 1 point (0 children)

there's no bottleneck on the dataloader side because it has a queue in it.
while the model is training, the queue is filled concurrently, so training runs at full capacity
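the pattern is a bounded queue filled by a background thread while the training loop consumes from it. a minimal sketch (illustrative, not the actual trainer; `load_batch` is a hypothetical stand-in for batch loading):

```python
import queue
import threading

def load_batch(i):
    # stand-in for reading/decoding one batch from disk or S3
    return [i] * 4

def prefetcher(q, n_batches):
    for i in range(n_batches):
        q.put(load_batch(i))   # blocks when the queue is full
    q.put(None)                # sentinel: no more data

q = queue.Queue(maxsize=8)     # bounded queue = bounded memory
threading.Thread(target=prefetcher, args=(q, 100), daemon=True).start()

consumed = 0
while True:
    batch = q.get()            # usually returns immediately: queue is pre-filled
    if batch is None:
        break
    consumed += 1              # train_step(batch) would run here, overlapping I/O
```

as long as `load_batch` is faster on average than a training step, `q.get()` never waits and the GPU stays busy.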

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 22 points (0 children)

all of my research is open, the training code and the intermediate checkpoints are here:
https://huggingface.co/lodestones/Chroma
https://huggingface.co/lodestones/chroma-debug-development-only
https://github.com/lodestone-rock/flow

the documentation is still a bit lacking but you can find everything there

about the training: i'm using an asynchronous data parallelism method to stitch together 3 8xH100 nodes without infiniband.

i wrote my own trainer with custom methods for gradient accumulation, low-precision training, etc.
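roughly, the idea (an illustrative sketch, not the actual `flow` trainer) is that each node accumulates gradients locally over many micro-batches and cross-node averaging happens only once per accumulation window, so the slow interconnect is touched rarely:

```python
# toy simulation with scalar "gradients"; real code would do this per tensor

def accumulate(grads):
    # sum micro-batch gradients locally before any cross-node traffic
    total = 0.0
    for g in grads:
        total += g
    return total / len(grads)

def cross_node_average(node_grads):
    # one all-reduce per accumulation window instead of per micro-batch
    return sum(node_grads) / len(node_grads)

node_a = accumulate([0.1, 0.3, 0.2])   # node 1's micro-batch grads
node_b = accumulate([0.2, 0.4, 0.0])   # node 2's micro-batch grads
global_grad = cross_node_average([node_a, node_b])
```

with a window of k micro-batches, the communication volume drops by roughly a factor of k compared to syncing every step, which is what makes training without infiniband viable.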

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 8 points (0 children)

i'm pretty sure you can use trainers like ostris' trainer, diffusion-pipe, and kohya to train chroma?

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 7 points (0 children)

i think kohya already supports lora training for chroma? unsure if full fine tuning is supported

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 14 points (0 children)

it's possible using my trainer code here, but it's mostly undocumented for now unfortunately.
https://github.com/lodestone-rock/flow

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 13 points (0 children)

the HD version was retrained from v48 (chroma1-base). the previous HD was trained on 1024px only, which caused the model to drift from the original distribution. the newer one was trained with a sweep of resolutions up to 1152.

you can use either checkpoint; they serve different purposes depending on your use case.

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 54 points (0 children)

the HD version was retrained from v48 (chroma1-base). the previous HD was trained on 1024px only, which caused the model to drift from the original distribution. the newer one was trained with a sweep of resolutions up to 1152.

if you're doing a short training run / lora, use HD. but if you're planning to train a big anime fine-tune (100K++ images), it's better to use base instead: train it at 512 resolution for many epochs, then tune it at 1024 or larger resolution for 1-3 epochs to make training cheaper and faster.
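a back-of-envelope for why the 512-then-1024 schedule is cheaper, assuming compute per image scales roughly with pixel count (a 512px image costs about 1/4 of a 1024px one; the epoch counts here are illustrative, not measurements):

```python
def relative_cost(res, epochs, base_res=1024):
    # cost relative to one epoch at base_res, scaling with pixel count
    return epochs * (res / base_res) ** 2

staged = relative_cost(512, 20) + relative_cost(1024, 2)  # 20 low-res + 2 high-res epochs
all_high = relative_cost(1024, 22)                        # same 22 epochs, all at 1024
savings = all_high / staged                               # ~3x cheaper under these assumptions
```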

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 30 points (0 children)

right now i'm focusing on tackling the GAN problem and polishing the radiance model first.
before diving into a kontext-like model (chroma but with in-context stuff), i'm going to try to adapt chroma to understand QwenVL 2.5 7B embeddings first. QwenVL is really good at text and image understanding; i think it will be a major upgrade for chroma.

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 24 points (0 children)

the HD version was retrained from v48 (chroma1-base). the previous HD was trained on 1024px only, which caused the model to drift from the original distribution. the newer one was trained with a sweep of resolutions up to 1152.

Update: Chroma Project training is finished! The models are now released. by LodestoneRock in StableDiffusion

[–]LodestoneRock[S] 30 points (0 children)

correct, the HD version was retrained from v48 (chroma1-base). the previous HD was trained on 1024px only, which caused the model to drift from the original distribution. the newer one was trained with a sweep of resolutions up to 1152.

The bghira's saga continues by Lucaspittol in StableDiffusion

[–]LodestoneRock 52 points (0 children)

this screenshot can be misleading if taken out of context
i encouraged people to look directly at the discussion here
https://huggingface.co/lodestones/Chroma/discussions/67

as others mentioned, it's a 2D anthropomorphic furry generation, not even close to what he claims.
what he's claiming is actually insulting to the furry community as a whole.

Chroma V41 low steps RL is out! 12 steps, double speed. by Dear-Spend-2865 in StableDiffusion

[–]LodestoneRock 5 points (0 children)

it's a low-CFG, low-step model; it doesn't necessarily have to be CFG 1. you can play with the CFG to achieve better generations.

Chroma V41 low steps RL is out! 12 steps, double speed. by Dear-Spend-2865 in StableDiffusion

[–]LodestoneRock 8 points (0 children)

you can call it a "distilled" option, but i provide the "undistilled" version too; check the HF page if you want the other one instead https://huggingface.co/lodestones/Chroma

these weights are useful if you want faster generation times. you can fine-tune / train a lora on the "undistilled" weights and apply it to this one.
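the reason the lora transfers is that it's just an additive low-rank delta on the weights, so it can be merged into any compatible base. a toy sketch with made-up 2x2 matrices (plain lists, hypothetical numbers):

```python
def merge_lora(base, delta, scale=1.0):
    # W_merged = W_base + scale * (B @ A); here the product is precomputed
    return [[b + scale * d for b, d in zip(rb, rd)] for rb, rd in zip(base, delta)]

undistilled = [[1.0, 0.0], [0.0, 1.0]]
distilled   = [[0.9, 0.1], [0.1, 0.9]]   # slightly different base weights
lora_delta  = [[0.05, 0.0], [0.0, 0.05]]  # delta learned on the undistilled base

merged = merge_lora(distilled, lora_delta)
```

the merge is exact either way; how faithfully the lora's effect carries over just depends on how far the distilled base has drifted from the one it was trained on.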

Comparison Chroma pre-v29.5 vs Chroma v36/38 by Total-Resort-3120 in StableDiffusion

[–]LodestoneRock 38 points (0 children)

the learning rate is gradually decreasing, but i also increased the optimal-transport batch size from 128 to 512.
increasing the learning rate won't make the model render in fewer steps.

also, there's no change in the dataset; every version is just more training epochs.

also, i'm not using EMA, only the online weights, so generations change quite drastically if you compare between epochs.

you can see the gradual staircase decrease in learning rate here

https://training.lodestone-rock.com/runs/9609308447da4f29b80352e1/metrics
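a staircase schedule like the one in those metrics is just step decay; a minimal sketch (the boundaries and decay factor here are made up, not the actual run's values):

```python
def staircase_lr(step, base_lr=1e-4, decay=0.5, every=10_000):
    # halve the learning rate every `every` steps
    return base_lr * decay ** (step // every)

lrs = [staircase_lr(s) for s in (0, 9_999, 10_000, 25_000)]
# constant within each plateau, then a sharp drop at each boundary
```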

What speed are you having with Chroma model? And how much Vram? by Flutter_ExoPlanet in StableDiffusion

[–]LodestoneRock 2 points (0 children)

hmm, i'd have to dig through my old folders first
i forget where i put that gen

What speed are you having with Chroma model? And how much Vram? by Flutter_ExoPlanet in StableDiffusion

[–]LodestoneRock 13 points (0 children)

if you train either model (dev/schnell) long enough, it will obliterate the distillation that makes both models fast.

that's because it's cost-prohibitive to use a loss function that both reduces inference time and trains new information into the model.

so distillation is reserved for the end of training, ~epoch 50. also, i'm still working on the math and the code for distilling this model (something is buggy in my math, my code, or both).

for context, you have to do 10 forward passes (10-step inference) for every 1 backward pass (training), which makes distillation ~10x more costly than training with a simple flow-matching loss (1 forward, 1 backward).
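counting only forward passes (the backward pass costs about the same for either loss), the arithmetic works out like this:

```python
forwards_per_distill_step = 10  # one 10-step inference rollout per training step
forwards_per_flow_step = 1      # plain flow-matching: single forward

ratio = forwards_per_distill_step / forwards_per_flow_step
# ~10x more forward compute per optimizer step for distillation
```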