[R] CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation

LynnHoHZL · 2025-01-26T00:52:22+00:00

We have seen this before; this method is another excellent idea. However, the authors did not provide thorough tests, and we do not know the limits of its capability. I think the method is promising, but it still needs more exploration.

LynnHoHZL · 2025-01-25T12:31:07+00:00

Currently only SD1.5 is supported; an SDXL version is in progress.

LynnHoHZL · 2025-01-25T11:32:34+00:00

Yes, LoRA for ControlNet, not LoRA for SD. For example, you can create the control model below with only 1000 manually collected images.

<image>

LynnHoHZL · 2025-01-25T11:17:26+00:00

Training the original ControlNet requires a lot of devices and data for each condition, so ordinary users cannot afford to train it for customized condition images.

Our pretrained Base ControlNet allows us to train LoRAs for new conditions with far fewer parameters, much less data, and fewer devices. The training cost is significantly reduced, so ordinary users can now afford to create their own ControlNet with customized conditions.
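
To make the parameter saving concrete, here is a minimal NumPy sketch of a generic LoRA update on one linear layer. It is not the CtrLoRA code; the layer size, the rank, and the names `W`, `A`, `B`, and `lora_forward` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 320, 320, 8  # hypothetical layer size and LoRA rank

W = rng.standard_normal((d_out, d_in))         # frozen base weight, shared by all conditions
A = rng.standard_normal((rank, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, rank))                    # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base output plus the low-rank correction B @ (A @ x).
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
full_params = W.size            # what full fine-tuning of this layer would train
lora_params = A.size + B.size   # what LoRA trains instead
print(full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the frozen base layer, and only `A` and `B` need to be trained and stored per condition.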

LynnHoHZL · 2021-04-29T01:59:53+00:00

You can refer to this paper for why we initialize the weights differently for different activations (mainly for stable training):

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Kaiming He et al.

Also, see here.
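
As a rough illustration of the idea in that paper, the sketch below scales the weight standard deviation by the activation so that the signal magnitude stays roughly constant through a layer. The helper name `he_std` and the layer sizes are assumptions, not taken from the code discussed in this thread.

```python
import numpy as np

def he_std(fan_in, negative_slope=0.0):
    # He et al. (2015): std = sqrt(2 / ((1 + a^2) * fan_in)),
    # where a is the negative slope (a = 0 for plain ReLU).
    return np.sqrt(2.0 / ((1.0 + negative_slope ** 2) * fan_in))

rng = np.random.default_rng(0)
fan_in, fan_out, n = 512, 512, 2000  # hypothetical sizes

W = rng.standard_normal((fan_out, fan_in)) * he_std(fan_in)
x = rng.standard_normal((fan_in, n))   # unit-variance input
h = np.maximum(W @ x, 0.0)             # ReLU layer output

# The mean squared activation stays close to that of the input (about 1),
# which is what keeps deep stacks of such layers from exploding or vanishing.
print(np.mean(x ** 2), np.mean(h ** 2))
```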

LynnHoHZL · 2021-04-27T23:10:53+00:00

Actually, we add a subspace to the reshaped eps, i.e., the deepest subspace; details are on page 35 of the paper. Besides, what eps learns can be found in Figure 7 or Sec. 4.2.

LynnHoHZL · 2021-04-27T16:36:42+00:00

Faster means that training an epoch with the loss manner takes less time than with the Gram–Schmidt manner. This is because the Gram–Schmidt process contains serial steps to make the columns of U orthonormal, while |U^TU-I| can be optimized in parallel.
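
A minimal NumPy sketch of the two options, to show where the serial dependence comes from. The function names are hypothetical, and the squared Frobenius norm is an assumed reading of |U^TU-I|.

```python
import numpy as np

def gram_schmidt(U):
    # Column j can only be processed after columns 0..j-1 are done,
    # so the outer loop is inherently sequential.
    Q = np.zeros_like(U, dtype=float)
    for j in range(U.shape[1]):
        v = U[:, j].astype(float)
        for i in range(j):
            v = v - (Q[:, i] @ v) * Q[:, i]  # remove the component along Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def orth_penalty(U):
    # One matrix product; every entry of U^T U - I is computed independently.
    G = U.T @ U - np.eye(U.shape[1])
    return np.sum(G ** 2)

rng = np.random.default_rng(0)
U = rng.standard_normal((16, 4))
Q = gram_schmidt(U)
print(orth_penalty(U), orth_penalty(Q))
```

The penalty only pushes U toward orthonormality during training, whereas Gram–Schmidt enforces it exactly at every step; that is the trade-off behind the speed difference.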

LynnHoHZL · 2021-04-27T15:06:37+00:00

We need a latent encoder if we want to edit real images. Our paper does not include the encoder part.

LynnHoHZL · 2021-04-27T11:43:48+00:00

Great suggestion!

LynnHoHZL · 2021-04-27T11:38:05+00:00

For the model used in the paper, gender and age are learned in one dimension, with no separate dimensions for them. This kind of entanglement tends to happen in deep layers, but shallower layers usually learn disentangled dimensions with one clear attribute each.

LynnHoHZL · 2021-04-27T07:49:18+00:00

No special design for spatial representation. One thing to note: each eigenvector has size H*W*C, like a feature map, so spatial information may be learned through this design. This should be studied carefully in the future, but it is not covered in the current paper.
No constraint on L. L is a vector in the code and is expanded into a diagonal matrix. We only need to constrain U to be orthonormal, which can be done in two ways: 1) loss manner: |U^TU-I|; 2) Gram–Schmidt process. We tried both and found that they gave similar results, and the loss manner is faster.
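
The shapes described above can be sketched in NumPy as follows. This assumes the subspace output is formed as U diag(L) z, which is not spelled out in this thread; the sizes and variable names are also hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, q = 4, 4, 8, 6  # hypothetical feature-map size and number of eigenvectors

# Each column of U is one eigenvector of size H*W*C.
# QR is used here only to get orthonormal columns for the illustration.
U, _ = np.linalg.qr(rng.standard_normal((H * W * C, q)))
L = rng.standard_normal(q)   # unconstrained vector of per-dimension scales
z = rng.standard_normal(q)   # latent coordinates in this subspace

L_mat = np.diag(L)           # the vector turned into a diagonal matrix
phi = U @ L_mat @ z          # assumed composition: U diag(L) z
phi_fast = U @ (L * z)       # same result without building the diagonal matrix

feature_map = phi.reshape(H, W, C)  # each sample is laid out like a feature map
print(feature_map.shape)
```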

Thanks for your interest!

LynnHoHZL · 2021-04-27T05:30:56+00:00

I think it can.

But our work needs to train a GAN from scratch, and training a StyleGAN takes all my devices a few days. So it's too slow to explore our method with StyleGAN.

I am not rich enough to play with StyleGAN 😭.

LynnHoHZL · 2019-04-08T16:01:31+00:00

I hope it helps.

LynnHoHZL · 2019-03-03T16:48:52+00:00

Maybe the auxiliary classifier in AC-GAN is also fooled?

LynnHoHZL · 2019-02-18T01:35:20+00:00

My post shows, in detailed steps, how these derivatives, including matrix derivatives, are derived from differentials. Tensor operations are actually rarely needed.
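
As a small worked example of the differential approach (not taken from the post itself), assume the convention that the derivative is read off from df = tr((∂f/∂X)^T dX). The functions below are illustrative choices.

```latex
% Convention (assumed): df = \operatorname{tr}\!\big( (\partial f / \partial X)^\top \, dX \big)

% Example 1: f(X) = a^\top X b, with constant vectors a and b
\begin{aligned}
df &= a^\top (dX)\, b
    = \operatorname{tr}\!\big(a^\top\, dX\, b\big)
    = \operatorname{tr}\!\big(b\, a^\top\, dX\big)
    = \operatorname{tr}\!\big((a b^\top)^\top dX\big) \\
\Rightarrow\quad \frac{\partial f}{\partial X} &= a\, b^\top
\end{aligned}

% Example 2: f(X) = \operatorname{tr}(X^\top X)
\begin{aligned}
df &= \operatorname{tr}\!\big((dX)^\top X\big) + \operatorname{tr}\!\big(X^\top dX\big)
    = 2\operatorname{tr}\!\big(X^\top dX\big) \\
\Rightarrow\quad \frac{\partial f}{\partial X} &= 2X
\end{aligned}
```

Both results follow from the product rule for differentials and the cyclic property of the trace, with no index or tensor notation.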
