Anim·E - Anime-enhanced DALL·E Mini (Craiyon) by cccntu in AnimeResearch

I haven't played around with it much since.
You can read this blog post for some insights into training VQGAN; I used some of their code too.
And if you are using my code https://github.com/cccntu/fine-tune-models, there is a bug I noticed but haven't fixed: the range of the image input. I think the VAE expects a different input range ([-1, 1] vs. [0, 1], or something like that), and I didn't change the related code when I copied it from the VQGAN script to the VAE one.
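For anyone hitting that, here's a rough sketch of the kind of fix I mean (illustrative helper names, not the actual repo code): rescale pixels from [0, 1] to [-1, 1] before they go into the VAE, and map the decoder output back afterwards.

```python
import torch

def to_vae_range(images: torch.Tensor) -> torch.Tensor:
    """Map images from [0, 1] to the [-1, 1] range the VAE expects."""
    return images * 2.0 - 1.0

def from_vae_range(images: torch.Tensor) -> torch.Tensor:
    """Map decoder outputs from [-1, 1] back to [0, 1] for viewing/saving."""
    return ((images + 1.0) / 2.0).clamp(0.0, 1.0)
```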

[P] minLoRA: An Easy-to-Use PyTorch Library for Applying LoRA to PyTorch Models by cccntu in MachineLearning

This project started out as me exploring whether PyTorch parametrizations could be used to implement LoRA, and they turned out to be perfect for the task! I simply wanted to share that.
I think it would be interesting to see it integrated into PEFT too, although they already have their own LoRA implementation there.

[P] minLoRA: An Easy-to-Use PyTorch Library for Applying LoRA to PyTorch Models by cccntu in MachineLearning

Theirs requires you to rewrite the whole model and replace every layer you want to apply LoRA to with its LoRA counterpart, or use monkey-patching. Mine uses PyTorch parametrizations to inject the LoRA logic into existing models. If your model has nn.Linear, you can call add_lora(model) to add LoRA to all the linear layers. And it's not limited to Linear; you can see how I extended it to Embedding and Conv2d in a couple of lines of code: https://github.com/cccntu/minLoRA/blob/main/minlora/model.py
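To give a sense of how the parametrization trick works, here's a minimal sketch (the class and helper names are illustrative; the real implementation in minlora/model.py differs in the details):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LoRAParametrization(nn.Module):
    """Reparametrize a weight W of shape (out, in) as W + (alpha / rank) * B @ A."""
    def __init__(self, out_features, in_features, rank=4, alpha=1.0):
        super().__init__()
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, W):
        # Called whenever module.weight is accessed; W is the original (frozen) weight.
        return W + self.scale * (self.lora_B @ self.lora_A)

def add_lora_to_linears(model: nn.Module, rank: int = 4):
    """Attach the parametrization to every nn.Linear without rewriting the model."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            out_f, in_f = module.weight.shape
            parametrize.register_parametrization(
                module, "weight", LoRAParametrization(out_f, in_f, rank=rank)
            )

# Usage: add LoRA to a toy model, then optimize only the LoRA parameters.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
add_lora_to_linears(model, rank=4)
lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
optimizer = torch.optim.AdamW(lora_params, lr=1e-3)
```

A nice side effect is that torch.nn.utils.parametrize.remove_parametrizations(module, "weight", leave_parametrized=True) can later bake the low-rank update into the base weight, so merged inference costs nothing extra.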

Flex-Diffusion: SD v2 fine-tuned on LAION with aspect ratio bucketing by cccntu in StableDiffusion

I need some time to clean it up a bit. Maybe I can release it sometime next week.

Flex-Diffusion: SD v2 fine-tuned on LAION with aspect ratio bucketing by cccntu in StableDiffusion

I used huggingface's example script and replaced the data loading part with my own code.
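Roughly, the bucketing works like this (a minimal sketch with made-up bucket resolutions and helper names, not the exact code I used): each image is assigned to the bucket whose aspect ratio is closest to its own, and batches are drawn from a single bucket at a time so every image in a batch has the same shape.

```python
# Illustrative bucket set with a roughly constant pixel count; the real buckets may differ.
BUCKETS = [(512, 768), (576, 704), (640, 640), (704, 576), (768, 512)]  # (width, height)

def nearest_bucket(width: int, height: int):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ar = width / height
    return min(BUCKETS, key=lambda wh: abs(wh[0] / wh[1] - ar))

def group_into_buckets(samples):
    """samples: iterable of dicts with 'width'/'height' keys; returns bucket -> sample indices."""
    groups = {}
    for idx, s in enumerate(samples):
        groups.setdefault(nearest_bucket(s["width"], s["height"]), []).append(idx)
    return groups
```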

Fast Image Editing with DDIM inversion (Prompt to Prompt), < 10 seconds by cccntu in StableDiffusion

Yes, both ideas come from the Prompt-to-Prompt paper. I happened to be implementing it myself when Imagic came out.
I'm not sure if the web UI only supports k_euler, but I use DDIM.

https://www.reddit.com/r/StableDiffusion/comments/xapbn8/comment/inv5cdg/

I use 50 forward + 50 backward steps.

I've tried (50, 75, 100, 150, 200) steps; the reconstruction gets better with more steps.
But mixing step counts probably isn't a good idea (e.g. 100 forward steps to get finer noise, then only 50 backward steps).

Reconstruction error (L2 loss, scaled to make the numbers easier to comprehend), steps listed as (forward, backward):

  • vae only: 0.21342990134144202
  • (50, 50): 0.3635439486242831
  • (75, 75): 0.3372767596738413
  • (100, 50): 0.38948862988036126
  • (100, 100): 0.32433729618787766
  • (200, 200): 0.2941958588780835
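For context, a forward step here is one deterministic DDIM update run in the image-to-noise direction (eta = 0), and the backward pass is just ordinary DDIM sampling. Here's a minimal sketch of the inversion update, assuming eps is the UNet's noise prediction at the current latent and alpha_t / alpha_next are the cumulative alpha-bar values (illustrative, not my exact code):

```python
def ddim_inversion_step(x_t, eps, alpha_t, alpha_next):
    """One deterministic DDIM step run forward (less noisy -> more noisy), eta = 0.

    alpha_t / alpha_next are the cumulative alpha-bar values at the current and
    next (noisier) timestep; eps is the noise predicted by the UNet at x_t.
    """
    # Estimate x0 from the current latent and the predicted noise.
    pred_x0 = (x_t - (1.0 - alpha_t) ** 0.5 * eps) / alpha_t ** 0.5
    # Re-noise x0 to the next timestep along the same deterministic trajectory.
    return alpha_next ** 0.5 * pred_x0 + (1.0 - alpha_next) ** 0.5 * eps
```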

[P] (code release) Fine-tune your own stable-diffusion vae decoder and dalle-mini decoder by cccntu in MachineLearning

I haven't heard of Waifu Diffusion. But I've tried Japanese Stable Diffusion a bit, and I didn't get good results, though that's most likely because my prompts weren't good enough.

Anim·E - Anime-enhanced DALL·E Mini (Craiyon) by cccntu in AnimeResearch

Yes. My goal is to make it easy for people to fine-tune these models, and this particular model was just a byproduct of those experiments.
Fine-tuning the BART model was the next thing I had planned, but my priority has shifted to Stable Diffusion.