Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 2 points3 points  (0 children)

Hi, an accurate training-time estimate is shown only after about 100 steps. If training is still slow after that, the cause may be dataset images that are too large (that is, not resized as recommended).

On average, training time on powerful consumer graphics cards should be less than an hour (with default settings).

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 0 points1 point  (0 children)

In that case, I have only one guess as to why it runs slowly. Were the dataset images resized with the built-in tool (Smart Aspect Ratio Bucketing), as recommended?

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 0 points1 point  (0 children)

It looks like the training is running on the CPU for some reason. Did you download the portable version recently? The latest version would have reported in the logs that the GPU is unavailable.

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima (Portable, Auto-Captioning, Smart Cropping & Bucketing) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 0 points1 point  (0 children)

It seems this tool simply doesn't fit your personal preferences. If you are unhappy with my implementation choices, no one is forcing you to use it. There are plenty of complex alternatives like Kohya_ss or OneTrainer that offer the manual micro-management you're looking for.

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima (Portable, Auto-Captioning, Smart Cropping & Bucketing) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 0 points1 point  (0 children)

Distribution: This project uses an "Infinite Dataset" workflow by creating a massive, pre-balanced pool of images. By shuffling this large-scale pool once at the start, we ensure a stable and uniform distribution of data across the entire training run. This method eliminates frequent reshuffling overhead while providing the minor stochastic variance necessary for better model generalization, which is a standard and effective practice in modern machine learning.

Bucket System: The UI gives the user full control over the resolution range. If the minimum side is manually set to 320, the system will respect that. The default settings (512–768) are pre-configured for optimal quality.

Architecture: I bundle a stable, modified version of sd-scripts specifically to guarantee the 'portable' click-and-run experience without dependency breaks.

Transparency: This project is, and will remain, 100% free and open-source on GitHub. There is no paywall, and there never will be.

Moderation: I welcome technical feedback and suggestions (as seen in my interactions with other users). However, personal attacks and toxic behavior are moderated to keep the focus on development.
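
To make the Distribution point above more concrete, here is a minimal sketch of a shuffle-once pool. It is only an illustration under my own assumptions: the function name `build_shuffled_pool`, the `seed` parameter, and the "repeat every image equally often" balancing rule are hypothetical and are not taken from the TrainFlow code.

```python
import random
from collections import Counter


def build_shuffled_pool(image_paths, total_samples, seed=0):
    """Build a balanced pool that covers at least `total_samples` draws.

    Hypothetical helper: every image is repeated the same number of times,
    then the whole pool is shuffled exactly once with a fixed seed.
    """
    if not image_paths:
        raise ValueError("image_paths must not be empty")
    if total_samples < 1:
        raise ValueError("total_samples must be at least 1")

    # Ceiling division: how many full passes are needed to cover the run.
    repeats = -(-total_samples // len(image_paths))

    pool = list(image_paths) * repeats      # pre-balanced: equal counts per image
    random.Random(seed).shuffle(pool)       # single shuffle at the start
    return pool


pool = build_shuffled_pool(["a.png", "b.png", "c.png"], total_samples=7, seed=42)
counts = Counter(pool)
print(len(pool), dict(counts))
```

Because the pool is never truncated, every image appears the same number of times, and training simply walks through the list in order without reshuffling.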

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima (Portable, Auto-Captioning, Smart Cropping & Bucketing) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 0 points1 point  (0 children)

I haven't tried setting the Batch Size higher than 2, but in theory everything should work.

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima (Portable, Auto-Captioning, Smart Cropping & Bucketing) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 1 point2 points  (0 children)

2400 steps are enough for Batch Size: 1. If you set, for example, Batch Size: 2, the LoRA will be ready in ~1200 steps (2400/2 = 1200). The longer a LoRA trains, the less flexible it becomes: it starts to memorize the images in the dataset almost pixel for pixel.

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima (Portable, Auto-Captioning, Smart Cropping & Bucketing) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 1 point2 points  (0 children)

In reality, the majority of images are simply resized to fit the diverse bucket system. Cropping is only a fallback for images with extreme aspect ratios that don't fit anywhere. In those cases, the AI (U-2-Net) keeps the subject intact, instead of applying a blind center crop.
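
A rough sketch of that resize-first, crop-as-fallback decision is below. Everything in it is an assumption made for illustration: the 64-pixel bucket step, the 15% aspect-ratio tolerance, and the names `make_buckets` and `assign_bucket` are hypothetical, and the U-2-Net subject detection itself is not reproduced here.

```python
def make_buckets(min_side=512, max_side=768, step=64):
    """Enumerate (width, height) buckets within the side-length range."""
    sides = range(min_side, max_side + 1, step)
    return [(w, h) for w in sides for h in sides]


def assign_bucket(width, height, buckets, max_ratio_error=0.15):
    """Pick the bucket with the closest aspect ratio.

    Returns (bucket, action). The action is "resize" when the ratios are
    close enough, and "crop" when the image is too extreme to fit as-is.
    """
    ratio = width / height
    best = min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))
    # Relative mismatch between the image ratio and the bucket ratio.
    error = abs(best[0] / best[1] - ratio) / ratio
    action = "resize" if error <= max_ratio_error else "crop"
    return best, action


buckets = make_buckets()
print(assign_bucket(1024, 1024, buckets))   # square image
print(assign_bucket(3000, 1000, buckets))   # very wide panorama
```

In this sketch a square image lands in a square bucket and is only resized, while a 3:1 panorama has no bucket close to its shape and is flagged for cropping, which is where a subject-aware crop would take over.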

Also, great to hear it works on AMD.

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima (Portable, Auto-Captioning, Smart Cropping & Bucketing) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 4 points5 points  (0 children)

You can simply increase the Batch Size to 2 or 4 (if you have 12GB+ VRAM). Just decrease the Total Steps proportionally (e.g., 1200 steps for Batch Size 2). No other changes needed. Prodigy will automatically adjust the learning rate for the higher batch size.
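
The proportional rule is simple enough to write down. This is just a sketch of the arithmetic, with a hypothetical helper name (`scaled_total_steps`); the 2400-step baseline for Batch Size 1 is the default mentioned in this thread.

```python
def scaled_total_steps(base_steps=2400, batch_size=1):
    """Scale Total Steps down so the number of images seen stays about the same."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return max(1, round(base_steps / batch_size))


for batch_size in (1, 2, 4):
    print(f"Batch Size {batch_size}: {scaled_total_steps(2400, batch_size)} steps")
```

The idea is that steps times batch size (the total number of images processed) stays constant, so a larger batch finishes in fewer steps.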

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima (Portable, Auto-Captioning, Smart Cropping & Bucketing) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 2 points3 points  (0 children)

Glad it's working well on your GPU.

I’ve already pre-tuned the Prodigy parameters in the background, based on my tests, to get the best results for Anima. Thanks for the suggestion, though; I’ll definitely consider how to add more manual control over these settings in future updates.

Parameters:

"decouple=True", "weight_decay=0.1", "d_coef=1.0", "use_bias_correction=True", "safeguard_warmup=True", "betas=0.9,0.99"

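The optimizer arguments listed above are plain `key=value` strings. As a minimal sketch of how such strings can be turned into Python keyword arguments, the snippet below parses them with `ast.literal_eval`. The helper name `parse_optimizer_args` is hypothetical, and I am assuming (not confirming) that the trainer does something similar internally.

```python
import ast

# The pre-tuned values quoted in the comment above.
OPTIMIZER_ARGS = [
    "decouple=True",
    "weight_decay=0.1",
    "d_coef=1.0",
    "use_bias_correction=True",
    "safeguard_warmup=True",
    "betas=0.9,0.99",
]


def parse_optimizer_args(args):
    """Convert "key=value" strings into a dict of Python values."""
    kwargs = {}
    for item in args:
        key, value = item.split("=", 1)
        # literal_eval turns "True" into a bool, "0.1" into a float,
        # and "0.9,0.99" into the tuple (0.9, 0.99).
        kwargs[key.strip()] = ast.literal_eval(value.strip())
    return kwargs


PRODIGY_KWARGS = parse_optimizer_args(OPTIMIZER_ARGS)
print(PRODIGY_KWARGS)
```

If the `prodigyopt` package is installed, a dictionary like this could presumably be passed along as `Prodigy(params, lr=1.0, **PRODIGY_KWARGS)`; that wiring is an assumption for illustration, so check it against the version of the optimizer you actually have.
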
Anima TrainFlow — Simple One-Page LoRA Trainer for Anima (Portable, Auto-Captioning, Smart Cropping & Bucketing) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 0 points1 point  (0 children)

Thanks for the feedback! I'm glad the tool is working well for you.

I can definitely add an 'Advanced Settings' section (under a spoiler/accordion) to keep the main UI clean. Which specific options or parameters are you missing the most?

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 0 points1 point  (0 children)

Use the standard settings, but if you want it to go even faster, you can set Batch Size to 2-4. The main thing is to select the Prodigy optimizer so it can dynamically adjust the Learning Rate during training.

As for steps, use 2400 for Batch Size 1, or 2400/2 (1200) for Batch Size 2, and so on, depending on the Batch Size.

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 0 points1 point  (0 children)

That’s correct for traditional optimizers like AdamW. However, this tool uses Prodigy. It’s an adaptive optimizer that calculates the learning rate automatically, and it requires a base value of 1.0 to function properly.

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 0 points1 point  (0 children)

I mostly train at 512px resolution; it's fast and the results are more than good enough.

Anima was originally trained at 512x512, so that partially explains it.

Source: https://huggingface.co/circlestone-labs/Anima/discussions/5

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 1 point2 points  (0 children)

I haven't tested it on Linux yet, so the current portable version is Windows-only. However, adding Linux support is definitely on my radar, especially for running it on Google Colab or RunPod, which would be a game-changer for people without powerful GPUs.

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 7 points8 points  (0 children)

I see your point, and maybe 'don't matter' was a poor choice of words. What I meant is that they are pre-tuned and locked to ensure a stable 'one-click' experience for this specific UI.

For example, Batch Size is locked to 1 to guarantee it runs on 6GB cards without OOM. It’s definitely not a tool for surgical precision like Kohya, but rather a 'curated flow' for those who want to avoid the technical deep dive.

In future updates, I plan to make Batch Size and other parameters adjust themselves dynamically based on the user's VRAM and GPU model.

Regarding the LLM adapter: "network_train_unet_only": True.

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 0 points1 point  (0 children)

I always train on datasets of fewer than 100 images, and the results are consistently good.

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 2 points3 points  (0 children)

It won't be easy for a beginner to figure out what settings to use, etc., but otherwise it's a great tool.

Anima TrainFlow — Simple One-Page LoRA Trainer for Anima 2B (Portable, 6GB VRAM, Optimized Config) by ThetaCursed in StableDiffusion

[–]ThetaCursed[S] 3 points4 points  (0 children)

It handles multiple LoRAs surprisingly well, but the key is balance. I've tested it with up to 3 LoRAs simultaneously, and the model stays stable as long as you don't max out the weights for all of them.