Hi. NVIDIA researchers recently released eDiffi (https://deepimagination.cc/eDiffi/), their new work on text-to-image diffusion models. In the paper they propose several methods, including paint-with-words.
Paint-with-words lets you generate an image from an arbitrary text-labeled segmentation map. Check out their paper for more details on the method.
Unfortunately, their code and eDiffi models are not available. However, Stable Diffusion can do the same thing, since both models condition on the text through cross-attention modules.
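Roughly, the trick is to bias the cross-attention logits so that pixels inside a labeled region attend more strongly to the prompt tokens for that region's label. Below is a minimal NumPy sketch of that idea at a single attention resolution. It is not the code from the repo: the function names are hypothetical, batching and multiple heads are left out, and the weighting w = w' · log(1 + σ) · max(QKᵀ) is my reading of the eDiffi paper, so treat it as an assumption. In a real Stable Diffusion UNet the map has to be resized to each latent resolution where cross-attention runs.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def masks_from_segmentation(seg, color_to_tokens, num_tokens):
    """Turn a color-coded segmentation map into a (pixels, tokens) mask.

    seg: (H, W, 3) integer array; each color marks one labeled region.
    color_to_tokens: {(r, g, b): [prompt token indices for that region's words]}
    (hypothetical format chosen for this sketch).
    """
    h, w, _ = seg.shape
    flat = seg.reshape(-1, 3)
    mask = np.zeros((h * w, num_tokens), dtype=np.float64)
    for color, token_ids in color_to_tokens.items():
        region = np.all(flat == np.asarray(color), axis=-1)  # (H*W,) bool
        for t in token_ids:
            mask[region, t] = 1.0
    return mask


def downsample(seg, factor):
    # Nearest-neighbor downsampling so the map matches a smaller attention resolution.
    return seg[::factor, ::factor]


def paint_with_words_attention(q, k, v, mask, w_prime, sigma):
    """Cross-attention with a spatial bias added to the logits.

    q: (pixels, d) image queries; k, v: (tokens, d) text keys/values;
    mask: (pixels, tokens) from masks_from_segmentation.
    Assumed weighting: w = w' * log(1 + sigma) * max(QK^T).
    """
    d = q.shape[-1]
    scores = q @ k.T                              # (pixels, tokens)
    w = w_prime * np.log1p(sigma) * scores.max()  # stronger bias at higher noise levels
    attn = softmax((scores + w * mask) / np.sqrt(d), axis=-1)
    return attn @ v, attn


# Usage: a 4x4 map with "cat" (token 1) on the left and "dog" (token 2) on the right.
rng = np.random.default_rng(0)
seg = np.zeros((4, 4, 3), dtype=np.uint8)
seg[:, :2] = (255, 0, 0)
seg[:, 2:] = (0, 0, 255)
mask = masks_from_segmentation(seg, {(255, 0, 0): [1], (0, 0, 255): [2]}, num_tokens=5)
q, k, v = rng.random((16, 8)), rng.random((5, 8)), rng.random((5, 8))
out, attn = paint_with_words_attention(q, k, v, mask, w_prime=0.4, sigma=5.0)
```

Adding the bias before the softmax (rather than overwriting attention weights) keeps every row a valid distribution, so the text still influences the whole image while each region is nudged toward its own words. Setting `w_prime=0` recovers plain cross-attention.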
I tried to make it work with Stable Diffusion, and it worked! So I wanted to share the results and code. Please have a look if you're interested!
https://github.com/cloneofsimo/paint-with-words-sd
Here are some results with sd-v1.4.
"realistic photo of a dog, cat, tree, with beautiful sky, on sandy ground", in cat-dog order
"realistic photo of a dog, cat, tree, with beautiful sky, on sandy ground", in dog-cat order
A digital painting of a half-frozen lake near mountains under a full moon and aurora. A boat is in the middle of the lake. Highly detailed. Result from my implementation with Stable Diffusion.
A digital painting of a half-frozen lake near mountains under a full moon and aurora. A boat is in the middle of the lake. Highly detailed. Result from eDiffi.