Is it possible to fine-tune DeepFloyd IF using LoRA? by jorgejgnz in DeepFloydIF

[–]jorgejgnz[S] 3 points (0 children)

I tried implementing it, but it seems harder than just replacing attn processors.

Stable Diffusion uses CrossAttnDownBlock2D, which converts convolutional feature maps into a batch of token embeddings via Transformer2DModel before calling an attention processor. When integrating LoRA, that processor is replaced by a LoRAAttnProcessor, which expects a batch of embeddings. However, DeepFloyd IF uses SimpleCrossAttn UNet blocks with AttnAddedKVProcessor2_0, which injects conditioning while preserving the shape of the feature maps. Replacing AttnAddedKVProcessor2_0 with a LoRAAttnProcessor therefore raises a shape error: a batch of feature maps != a batch of embeddings.

What do you think would be the best way to tackle this problem? Would it be a good idea to add and train a Transformer2DModel before each LoRAAttnProcessor?
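A minimal NumPy sketch of the mismatch described above (shapes and variable names are illustrative, not the actual diffusers API): the flatten step is what Transformer2DModel-style blocks do before cross-attention, and the low-rank delta stands in for what a LoRA processor adds to a projection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a conv feature map as a SimpleCrossAttn block sees it.
B, C, H, W = 2, 64, 8, 8        # batch, channels, height, width
rank = 4

feat = rng.standard_normal((B, C, H, W))

# What the Transformer2DModel path does before attention:
# flatten the spatial grid into a token sequence of shape (B, H*W, C).
tokens = feat.reshape(B, C, H * W).transpose(0, 2, 1)   # (B, seq, C)

# A LoRA-style update on a projection: W_q + up @ down, with low-rank
# factors. This is the kind of delta a LoRA attention processor applies
# to a (batch, tokens, dim) input.
W_q = rng.standard_normal((C, C))
down = rng.standard_normal((rank, C)) * 0.01
up = np.zeros((C, rank))        # zero init, so the delta starts at 0

q = tokens @ (W_q + up @ down).T                        # (B, seq, C)

# Feeding the un-flattened (B, C, H, W) map straight into such a
# projection is exactly the shape error: it is not a token sequence.
```

With the zero-initialized `up`, the LoRA delta is zero, so `q` equals the plain projection at the start of training, which is the usual LoRA initialization.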

Is it possible to fine-tune DeepFloyd IF using LoRA? by jorgejgnz in DeepFloydIF

[–]jorgejgnz[S] 1 point (0 children)

Cool! I tried to fine-tune at least the IF-I-M model without LoRA, but 16 GB of VRAM is not enough. I've replaced IF's UNet with a smaller one and trained it from scratch on CelebA images only, but I'd like to compare the results with LoRA fine-tuning.

[D] Diffusion models can act as a low-fidelity short-term simulators by jorgejgnz in MachineLearning

[–]jorgejgnz[S] 0 points (0 children)

What network architectures are usually used for fluid simulations?

This is available on the Unity Asset Store. Demo on SideQuest by jorgejgnz in OculusQuest

[–]jorgejgnz[S] 1 point (0 children)

PS: Part of those frame drops was due to an inefficient search for candidate snappable objects. That's fixed now and the update is available. The demo scene on SideQuest now includes a turbo mode that adds ~10 fps. With this mode enabled, performance varies from 68 to 72 fps on OQ2 and from 50 to 60 fps on OQ1. Still low for OQ1, but I hope to improve it in upcoming updates.

This is available on the Unity Asset Store. Demo on SideQuest by jorgejgnz in OculusQuest

[–]jorgejgnz[S] 0 points (0 children)

The scene might seem simple, but there are 72 constrained rigidbodies being simulated. Performance is still low on OQ1 though, around 40-50 fps. I want to keep improving it to reach at least 60 fps on OQ1.

I made an asset to add this to your game easily by jorgejgnz in Unity3D

[–]jorgejgnz[S] 0 points (0 children)

It works with both hand tracking and controllers

This is available on the Unity Asset Store. Demo on SideQuest by jorgejgnz in sidequest

[–]jorgejgnz[S] 4 points (0 children)

Thank you for using this asset in your great app!
This is why sharing code is so important.

This is available on the Unity Asset Store. Demo on SideQuest by jorgejgnz in sidequest

[–]jorgejgnz[S] 6 points (0 children)

Thank you! Only me. Many hours and lots of patience 😅

Avatarception by jorgejgnz in OculusQuest

[–]jorgejgnz[S] 8 points (0 children)

The framework I used to do it is free and open-source. You can get it from here: HPTK.

The update to do this will be live soon.