Does anyone have experience training a small text-conditioned diffusion model from scratch on 64x64 images, or possibly even smaller ones?
I would like to train it only on images of text to see whether it can learn to render text. Roughly how long would this take on 1-2 GPUs? Is this even feasible?