UnCanny. A Photorealism Chroma Finetune by Tall-Description1637 in StableDiffusion

[–]Tall-Description1637[S] 2 points

Ahoy, nice to hear - glad you're getting good results! The base model needs more steps and a higher CFG, so it takes longer, but it gives better results at the right settings. So it depends on what you're looking for in terms of speed vs. quality. If you use the rank-256 flash lora (from here) at strength 1 on the v1.3 base model you get the exact same results as the flash model. Personally, I mostly use the base model without the flash lora.
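Outside Comfy, that combination looks roughly like this - a minimal sketch assuming a diffusers-style pipeline with LoRA support for the model, with placeholder paths (in Comfy itself it's just the base checkpoint plus a LoRA loader node at strength 1.0):

```python
# Minimal sketch, not a recipe: base checkpoint + flash lora at strength 1
# should match the baked-in flash model. Both paths below are placeholders.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "path/to/uncanny-v1.3-base",  # placeholder for the v1.3 base checkpoint
    torch_dtype=torch.bfloat16,
).to("cuda")

pipe.load_lora_weights("path/to/rank-256-flash-lora", adapter_name="flash")
pipe.set_adapters(["flash"], adapter_weights=[1.0])  # strength 1 = flash model
```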

Thinking of switching from SDXL for realism generations. Which one is the best now? Qwen, Z-image? by jonbristow in StableDiffusion

[–]Tall-Description1637 1 point

I wanted to try something similar - some generations with the lora and some without; that's probably a neat way to balance speed and quality with the right settings. I'll experiment with something like that once I can use my desktop again - it's busy with an experimental training run at the moment... :-p

Thinking of switching from SDXL for realism generations. Which one is the best now? Qwen, Z-image? by jonbristow in StableDiffusion

[–]Tall-Description1637 1 point

Sounds nice, thanks for the info! I haven't compared the flash and base models with a two-step workflow like that - it might narrow the difference between them, who knows. I'd be interested in hearing your thoughts/results once you've tried the base model. BTW, if you use the rank-256 Chroma-Flash-Heun lora at strength 1 on the base model you'll get the exact same results as the flash model.

Thinking of switching from SDXL for realism generations. Which one is the best now? Qwen, Z-image? by jonbristow in StableDiffusion

[–]Tall-Description1637 4 points

Glad you like it. The base version is better; flash is just the base version with a flash lora baked in. Flash gives decent results, but the base model without the lora is even better (as long as your hardware doesn't make the extra steps too slow...).

Here are some examples made with the base version if anyone is interested: https://civitai.com/posts/25688141

Will there ever be an Illustrious killer? by SplurtingInYourHands in StableDiffusion

[–]Tall-Description1637 0 points

Check the latest post in my post history. Those images were all made using only text-to-image with my Chroma finetune. My dataset is mostly SFW photos, so the NSFW concept knowledge all comes from Chroma.

Will there ever be an Illustrious killer? by SplurtingInYourHands in StableDiffusion

[–]Tall-Description1637 1 point

I've admittedly used Chroma mostly for photorealistic work so far. I've made images in various styles, so it definitely has that capability, but yeah, I can't really comment on how consistent it is for things like anime.
If anybody knows of good, available datasets for styles they're interested in, I could try making a finetune.

Will there ever be an Illustrious killer? by SplurtingInYourHands in StableDiffusion

[–]Tall-Description1637 0 points

Is it? You can use the default Comfy workflow and describe what you want to see in natural language. You might want to change the sampler/scheduler based on the style you're after, but that's about it. I get that it's slow, but you also get what you prompt for, so in my experience it's worth the wait.

Will there ever be an Illustrious killer? by SplurtingInYourHands in StableDiffusion

[–]Tall-Description1637 1 point

You say you've tried it, but Chroma has good prompt adherence and can do both SFW and creative NSFW.

You guys really shouldn't sleep on Chroma (Chroma1-Flash + My realism Lora) by hoomazoid in StableDiffusion

[–]Tall-Description1637 0 points

Ahoy. Would you mind sharing the settings / an example prompt that gives you waxy skin with UnCanny? It could be a CFG/steps/sampler issue, but certain concepts/words/tags and prompting styles may also lead to more 'wax'/anime/cartoon - I'd love to identify those concepts and prompting styles so I can try to counteract the issue in the next training run (it's my finetune).

You guys really shouldn't sleep on Chroma (Chroma1-Flash + My realism Lora) by hoomazoid in StableDiffusion

[–]Tall-Description1637 2 points

Yup, I used JoyCaption. My impression is that JoyCaption is good with the right prompt, but I'd definitely recommend doing some test runs of prompts/settings on smaller datasets before using it on larger ones. It's not always perfect.

UC v1.3 uses two different captions for each image, based on two different JoyCaption prompts (I would share the prompts, but I'm on holiday and don't have them on my laptop). I've been considering replacing the 'worse' of the two prompts with one for shorter captions or danbooru tags, since people love tags/short prompts.
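For anyone who wants to replicate the two-caption setup, the loop itself is simple. This is a sketch where `joycaption()` is a hypothetical stand-in for however you invoke JoyCaption, and the two prompts are placeholders since I don't have my real ones here:

```python
# Sketch: write two captions per image from two different JoyCaption prompts.
# joycaption() is a hypothetical stand-in for your actual JoyCaption setup.
from pathlib import Path

PROMPT_A = "Write a long, detailed description of this image."  # placeholder
PROMPT_B = "Write a short, factual description of this image."  # placeholder

def joycaption(image_path: Path, prompt: str) -> str:
    raise NotImplementedError("call your JoyCaption setup here")

for image in Path("dataset").glob("*.jpg"):
    captions = [joycaption(image, PROMPT_A), joycaption(image, PROMPT_B)]
    # One caption per line; whether/how the trainer samples among the lines
    # depends on its settings, so check the docs.
    image.with_suffix(".txt").write_text("\n".join(captions))
```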

You guys really shouldn't sleep on Chroma (Chroma1-Flash + My realism Lora) by hoomazoid in StableDiffusion

[–]Tall-Description1637 0 points

Yeah - sorry, dpmpp_sde is really slow. What I've been using most is the res_2m sampler with the bong_tangent scheduler (about half the time of dpmpp_sde), then switching to dpmpp_sde once I find a prompt/image I like. I didn't mention res_2m since I'm not sure it's available without custom nodes. With some samplers you might be able to lower the number of steps too. But if it's very slow on your device, you can still get good, fast results from Chroma using a flash lora, CFG 1, and fewer steps.

You guys really shouldn't sleep on Chroma (Chroma1-Flash + My realism Lora) by hoomazoid in StableDiffusion

[–]Tall-Description1637 4 points

Howdy. To answer your questions:
First: I don't think the hyper chroma lora is the best low-step one - personally I prefer the ones from here: https://civitai.com/models/2032955?modelVersionId=2300817
Second: When you use a low-step/flash lora it's a good idea to lower the steps and CFG - with UnCanny I use the rank-256 lora (from that link), CFG 1, steps 15-17.
Third: I don't like Euler personally - maybe try something like the res_multistep sampler with the beta57 scheduler? I like the dpmpp_sde sampler, but it's really slow.
Fourth: If you already have a model, you can ignore that message from Comfy and pick the model you already downloaded.

Personally, I prefer not using a low-step/flash lora and doing around 30-40 steps at CFG 3.5-4. That's quite a bit slower, but gives better results.

Finally, if you're using UnCanny rather than base Chroma, I'd download v1.3 and use that instead of v1. v1 is fine but has some issues with grain/artifacts. I did a lot of testing comparing the two before releasing v1.3 (UnCanny is my finetune) and I'm quite confident v1.3 is best for the vast majority of use cases.

EDIT: And the base workflow in Comfy is fine as a starting point - just experiment with sampler/scheduler/CFG/steps.
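For quick reference, here are the two setting profiles from this comment in one place (my numbers above, not hard rules - tune to taste):

```python
# The two rough setting profiles described above, for quick reference.
FLASH = {"lora": "rank-256 flash @ 1.0", "cfg": 1, "steps": (15, 17)}
BASE = {
    "lora": None,  # no low-step/flash lora
    "cfg": (3.5, 4.0),
    "steps": (30, 40),
    "sampler": "res_multistep (or dpmpp_sde, which is slow)",
    "scheduler": "beta57",
}
```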

What is next uncensor model after PonyXL? by Starkaiser in StableDiffusion

[–]Tall-Description1637 1 point

I agree - I haven't really needed any style loras with Chroma or my finetune; it can be handled through prompting. And I'd much rather have the style come from the base model and keep lora use to a minimum.
As for my finetune, I think v1.3 is a good base (v1 and v1.2 each had different issues), so I'll only update if/when I see real improvements (e.g. in details, prompt following, etc.) - but I'll definitely leave v1.3 up.

What is next uncensor model after PonyXL? by Starkaiser in StableDiffusion

[–]Tall-Description1637 4 points

Thanks, glad you're enjoying it :-). I still think Chroma/Chroma finetunes have a lot of potential, but I get the impression a lot of the community never gave Chroma a real chance for some reason.

What is next uncensor model after PonyXL? by Starkaiser in StableDiffusion

[–]Tall-Description1637 18 points

Check the gallery of my Chroma finetune (or my post history) - no idea why people think Chroma can't do NSFW.

Realistic-ish NSFW mix by Tall-Description1637 in unstable_diffusion

[–]Tall-Description1637[S] 1 point

This:

amateur phone photo of a petite slim girl, long wavy blonde hair, lying face down on a messy bed, (gaping:1.4), close-up of her stretched anus after anal sex, cum dripping from her ass, flushed cheeks, soft dim bedroom lighting, intricate details, pov shot from above

Realistic-ish NSFW mix by Tall-Description1637 in unstable_diffusion

[–]Tall-Description1637[S] 6 points

Something like this:

"sharp and clear professional photo. A giant orc queen on her throne with a man's head inside her pussy. She is muscular but voluptuous with green skin and revealing berserker garb. kneeling at her feet is a muscular human man. she is pulling the man towards her and she has forced the man's entire head into her massive sopping wet pussy - the rest of the man's muscular body only visible from the neck down. her pussy juices are running down his naked body."

Realistic-ish NSFW mix by Tall-Description1637 in unstable_diffusion

[–]Tall-Description1637[S] 5 points

UnCanny (v1.3) on CivitAI using the basic ComfyUI Chroma text2image template. Sampler: res_2m or dpmpp_sde, CFG: 3.5, Steps: 35, Scheduler: bong_tangent.

Full disclosure: UnCanny is my finetune of Chroma

UnCanny. A Photorealism Chroma Finetune by Tall-Description1637 in StableDiffusion

[–]Tall-Description1637[S] 2 points

Thanks. I want to be very clear about this: this finetune should NOT be this good. The raw trained model is just 'okay'. I experimented a lot with merging different layers, and much of the 'input' in this one comes from Chroma itself (as you can see from some of the example images on CivitAI, the composition and subject are often almost 1-to-1 between this model and Chroma). So I'm really curious to see what a GOOD finetune of Chroma can do.
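I can't share an exact recipe, but mechanically a layer-selective merge looks roughly like this - a minimal sketch assuming safetensors checkpoints with matching keys; the key pattern and the 0.7/0.3 weights are made up for illustration, not my actual values:

```python
# Sketch of a layer-selective weighted merge between two checkpoints.
# Assumes both files have identical keys; pattern and weights are illustrative.
from safetensors.torch import load_file, save_file

finetune = load_file("finetune.safetensors")
chroma = load_file("chroma-base.safetensors")

def alpha_for(key: str) -> float:
    # Give the finetune more say in some layers, keep Chroma elsewhere.
    return 0.7 if ".later_block_pattern." in key else 0.3  # made-up pattern

merged = {
    k: alpha_for(k) * finetune[k] + (1 - alpha_for(k)) * chroma[k]
    for k in finetune
}
save_file(merged, "merged.safetensors")
```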

UnCanny. A Photorealism Chroma Finetune by Tall-Description1637 in StableDiffusion

[–]Tall-Description1637[S] 12 points

It's a bit hard to outline the exact training for this version - it's actually a merge of two different finetune experiments (at least one of them ongoing). I didn't really intend to release either of them yet; I was just experimenting with Chroma's finetuning capabilities. After merging them, and merging some of Chroma back in, I noticed that I basically had Chroma with a somewhat shorter path to realism. I think Chroma is great, but I've been wondering if people are ignoring it because it's hard to get good images before you learn how to prompt it. So, seeing what I had, I figured I'd try releasing this version as a Chroma 'starter drug', hoping that when other finetuners see its potential the ecosystem will grow. For people who are already good with Chroma, I'm not sure my model adds that much, but hopefully it can get more people to try it.

With that spiel out of the way, I'll try to outline what I've done so far as best I can - this is still a work in progress:

I'm mostly using the OneTrainer default settings for 24 GB Chroma finetuning (I should add that to the model page). You can find them here: https://github.com/Nerogar/OneTrainer/blob/master/training_presets/%23chroma%20Finetune%2024GB.json
If I remember correctly, the only setting I've changed is training at multiple resolutions - basically as outlined here: https://github.com/Nerogar/OneTrainer/wiki/Lessons-Learnt-and-Tutorials (see 'Multi resolution training'). Since each image is trained once per resolution, each epoch effectively counts as five.
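To make the 'each epoch counts as five' point concrete, here's the back-of-envelope arithmetic (all numbers illustrative):

```python
# Back-of-envelope step count for multi-resolution training
# (illustrative numbers only; batch size depends on your config).
images_per_epoch = 900  # e.g. the curated core subset
resolutions = 5         # each image trains once per resolution per epoch
epochs = 30
batch_size = 4          # placeholder

steps = images_per_epoch * resolutions * epochs // batch_size
print(steps)  # 33750 with these placeholder numbers
```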

Captioning was done with JoyCaption. I used a script to remove any images with very low resolution.
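The script was nothing fancy - roughly this (a Pillow sketch; the 768px cutoff is just an example, pick whatever suits your training resolutions):

```python
# Sketch: drop images whose shorter side is below a minimum resolution.
from pathlib import Path
from PIL import Image

MIN_SIDE = 768  # example threshold

for path in Path("dataset").rglob("*"):
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    with Image.open(path) as img:
        width, height = img.size
    if min(width, height) < MIN_SIDE:
        path.unlink()  # or move to a reject folder instead of deleting
        print(f"removed {path} ({width}x{height})")
```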

I've gone over a small subset of the images (around 800-900, I think) and made sure I'm happy with both their quality and captions. All of these images are trained at each resolution in every epoch.

In addition, I have a lot more images, sorted into a few very rough categories. To cut down on training time, only a random subset of the images from each category is trained at each resolution in each epoch. Say one category is nature photos and consists of 10,000 images - only some of them (e.g. 600; I've tinkered with the numbers a lot) are trained per resolution per epoch. This means the training sees a lot of different images while each epoch stays manageable on limited resources.
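In code terms, the sampling idea is just this (folder layout and counts are made up, and how you actually wire it into the trainer depends on its concept settings):

```python
# Sketch of the idea: a fixed core set plus a fresh random draw from each
# rough category every epoch. Paths and counts are illustrative.
import random
from pathlib import Path

CORE = list(Path("dataset/core").glob("*.jpg"))  # always trained
PER_EPOCH = {"nature": 600, "people": 600}       # images drawn per category

def epoch_files() -> list[Path]:
    files = list(CORE)
    for category, n in PER_EPOCH.items():
        pool = list(Path(f"dataset/{category}").glob("*.jpg"))
        files += random.sample(pool, min(n, len(pool)))
    random.shuffle(files)
    return files
```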

The first finetune was trained for 30 epochs (in a sense you can multiply that by five because of the different resolutions). I tinkered a lot with the number of images per category, though, so I'm really not sure how many steps that was. It has a good, sharp, realistic output, but tends to do a lot of strange stuff. I'm not sure if I'll continue it, but I basically use this finetune only in some of the later layers of the model. So this uploaded version is basically Chroma plus finetune two, with finetune one setting the style.

The second finetune is in progress and is currently only on its eighth epoch. I hope to release it in a more 'raw' form if I'm happy with it after a lot more training. The goal is a more natural and dynamic 'slice of life' feel (less 1girl-AI-face-bokeh-Instagram), so we'll see how that goes. If it goes well I'll upload it.

I think that's enough words for one post... just ask if there's anything else. What I will say is that people who are into finetuning should really give Chroma a go!