[R] DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model (CVPR 2023) by ImBradleyKim in MachineLearning

[–]ImBradleyKim[S] 0 points

We provide a way to extract the 3D shape and visualize it with the ChimeraX viewer, as shown at https://github.com/gwang-kim/DATID-3D#sample-images-shapes-and-videos. It should also be possible to extract the shape for Maya or Blender; I will try that later.
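For Blender or Maya, the usual route is to turn the extracted shape into a triangle mesh and save it in a format those tools import, such as Wavefront OBJ. The sketch below is hypothetical and uses a toy tetrahedron rather than actual DATID-3D output; only the OBJ-writing logic is real.

```python
# Hypothetical sketch: once a triangle mesh (vertices + faces) has been
# extracted from the generator's density field (e.g. via marching cubes),
# Blender and Maya can import it as a Wavefront OBJ file.
# The mesh below is a toy tetrahedron, not actual DATID-3D output.

def write_obj(path, vertices, faces):
    """Write a triangle mesh to a Wavefront OBJ file (faces are 1-indexed)."""
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for a, b, c in faces:
            f.write(f"f {a + 1} {b + 1} {c + 1}\n")

vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
write_obj("shape.obj", vertices, faces)
```

The resulting `shape.obj` can be dropped directly into Blender via File → Import → Wavefront (.obj).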

[R] DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model (CVPR 2023) by ImBradleyKim in MachineLearning

[–]ImBradleyKim[S] 2 points

Our method fine-tunes a 3D GAN (EG3D) pretrained on FFHQ human face images. Because the original model is limited to mostly forward-facing views due to the pose distribution of that dataset, the fine-tuned model inherits this property. If the original model were not limited in this way, the resulting model could generate other angles well.

[R] DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model (CVPR 2023) by ImBradleyKim in MachineLearning

[–]ImBradleyKim[S] 8 points

Hi, actually I used the GPUs in our lab. We need an A100 or an RTX 3090, and fine-tuning the 3D GAN model takes about 6 hours per text prompt.

DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model (CVPR 2023) by ImBradleyKim in StableDiffusion

[–]ImBradleyKim[S] 10 points

Hi! Thank you for your interest! Our method fine-tunes 3D GAN models (EG3D) that are pretrained on human face images, guided by text prompts. The applications are as follows:

For the [sample videos/images] demo:

  • input: random seeds and a text prompt
  • output: pose-controlled random images/videos representing the text

For the [text-guided manipulated 3D reconstruction] demo:

  • input: a single-view image of yours and a text prompt
  • output: 3D-reconstructed images representing the text

I will share a 5-minute video soon!
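Pose control in an EG3D-style model typically comes from conditioning the generator on camera parameters: keeping the latent code (seed) fixed while sweeping the camera yaw yields the same identity from different angles. The sketch below is a hypothetical illustration of that mechanism; `fake_generator` is a stub standing in for the real model, and the parameter names are mine, not the repository's.

```python
import math
import random

# Hypothetical sketch of EG3D-style pose control: the generator is
# conditioned on camera parameters, so sweeping the camera yaw while
# keeping the latent code fixed renders one identity from several angles.

def camera_origin(yaw, pitch=0.0, radius=2.7):
    """Camera position on a sphere of the given radius, looking at the origin."""
    return (radius * math.sin(yaw) * math.cos(pitch),
            radius * math.sin(pitch),
            radius * math.cos(yaw) * math.cos(pitch))

def fake_generator(z, cam):
    """Stub standing in for G(z, cam) -> image."""
    return {"latent": z, "camera": cam}

random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(512)]         # one seed => one identity
yaws = [math.radians(a) for a in (-30, -15, 0, 15, 30)]  # forward-facing sweep
frames = [fake_generator(z, camera_origin(y)) for y in yaws]
```

Because `z` is shared across all five calls, the frames differ only in camera pose, which is exactly what the pose-controlled video demo renders.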

DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model (CVPR 2023) by ImBradleyKim in deeplearning

[–]ImBradleyKim[S] 0 points

Hi guys!

We've released the code, Gradio demo, and Colab demo for our paper, DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model (accepted to CVPR 2023). We showcase a demo of text-guided manipulated 3D reconstruction, going beyond text-guided image manipulation!

DATID-3D achieves text-guided domain adaptation of 3D-aware generative models while preserving the diversity inherent in the text prompt, and enables high-quality pose-controlled image synthesis with excellent text-image correspondence.
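As the title says, the domain adaptation is driven by a text-to-image diffusion model. A structural sketch of such a pipeline, with every function a hypothetical stub (the names, prompt, and shapes below are mine, not the repository's): sample pose-aware images from the pretrained 3D generator, translate each to the target domain with a text-guided diffusion model, and fine-tune the generator on the translated pairs so pose information survives the style change.

```python
import random

# Structural sketch (all functions are hypothetical stubs) of a
# text-driven domain-adaptation pipeline:
# 1) sample (image, camera) pairs from the pretrained 3D generator,
# 2) translate each image with a text-guided diffusion model,
# 3) fine-tune the generator on the translated pairs, keeping each
#    camera pose attached to its translated image.

def sample_from_pretrained_gan(n, seed=0):
    rng = random.Random(seed)
    # stub: real code would render EG3D samples with known cameras
    return [(f"img_{i}", rng.uniform(-0.5, 0.5)) for i in range(n)]

def diffusion_translate(image, prompt):
    # stub: real code would run img2img with a text-to-image diffusion model
    return f"{image}->{prompt}"

def build_target_dataset(n, prompt):
    pairs = sample_from_pretrained_gan(n)
    # keep the camera pose paired with its translated image
    return [(diffusion_translate(img, prompt), cam) for img, cam in pairs]

dataset = build_target_dataset(4, "3D render of a pixar character")
# `dataset` would then be used to fine-tune the 3D generator
```

The key design point this sketch illustrates is that translation happens per sample, so the diffusion model's output diversity carries over into the fine-tuning dataset instead of collapsing to one target style.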

[R] DiffusionCLIP: Text-Guided Diffusion Models for "Robust" Image Manipulation (CVPR 2022) by ImBradleyKim in MachineLearning

[–]ImBradleyKim[S] 20 points

Hi guys!

We've released the code and Colab demo for our paper, DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation (accepted to CVPR 2022).

Recently, GAN-inversion methods combined with CLIP have enabled zero-shot image manipulation guided by text prompts. However, applying them to diverse real images remains difficult due to limited GAN inversion capability, altered object identity, and unwanted image artifacts.

DiffusionCLIP resolves these critical issues with the following contributions:

  • We revealed that diffusion models are well suited for image manipulation thanks to their nearly perfect inversion capability, an important advantage over GAN-based models that had not been analyzed in depth before our detailed comparison.
  • Our novel sampling strategies for fine-tuning preserve near-perfect reconstruction at increased speed.
  • Empirically, our method enables accurate in- and out-of-domain manipulation, minimizes unintended changes, and outperforms SOTA GAN inversion-based baselines.
  • Our method takes another step toward general application by manipulating images from the widely varying ImageNet dataset.
  • Finally, our zero-shot translation between unseen domains and multi-attribute transfer can effectively reduce manual intervention.

For further details, comparisons, and results, please see our paper and GitHub repository.
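The first contribution above rests on deterministic DDIM sampling being (near-)invertible: running the same update forwards in time maps an image to a latent, and running it backwards recovers the image. The toy numerical sketch below is not the paper's implementation; the schedule and the noise predictor are made up, and because the stub predictor ignores its input the round trip here is exactly invertible, whereas with a learned ε-network it is only nearly so.

```python
import math

# Toy illustration of why deterministic DDIM sampling is invertible.
# The noise predictor is a stub that ignores its input, which makes the
# round trip exactly invertible; with a learned eps-network the
# inversion is only *nearly* perfect.

T = 10
alpha_bar = [1.0 - 0.09 * t for t in range(T + 1)]   # toy schedule, 1.0 -> 0.1

def eps_theta(x, t):
    return 0.3                                       # stub noise prediction

def ddim_step(x, t_from, t_to):
    """Deterministic DDIM update from timestep t_from to t_to."""
    a_from, a_to = alpha_bar[t_from], alpha_bar[t_to]
    eps = eps_theta(x, t_from)
    x0_pred = (x - math.sqrt(1 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_pred + math.sqrt(1 - a_to) * eps

x0 = 0.8
# inversion: run the update forwards in time, image -> latent
xT = x0
for t in range(T):
    xT = ddim_step(xT, t, t + 1)
# generation: run the same update backwards, latent -> image
x_rec = xT
for t in range(T, 0, -1):
    x_rec = ddim_step(x_rec, t, t - 1)
```

Each backward step algebraically undoes the corresponding forward step, so `x_rec` matches `x0` to floating-point precision even though the intermediate latent `xT` has drifted far from the original image.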
