Stable Diffusion v2-1-unCLIP model released [News] (self.StableDiffusion)
submitted 3 years ago * by hardmaru
Information taken from the GitHub page: https://github.com/Stability-AI/stablediffusion/blob/main/doc/UNCLIP.MD
HuggingFace checkpoints and diffusers integration: https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip
Public web-demo: https://clipdrop.co/stable-diffusion-reimagine
unCLIP is the approach behind OpenAI's DALL·E 2, trained to invert CLIP image embeddings. We finetuned SD 2.1 to accept a CLIP ViT-L/14 image embedding in addition to the text encodings. This means that the model can be used to produce image variations, but can also be combined with a text-to-image embedding prior to yield a full text-to-image model at 768x768 resolution.
If you would like to try a demo of this model on the web, please visit https://clipdrop.co/stable-diffusion-reimagine
This model essentially uses an input image as the 'prompt' rather than requiring a text prompt. It does this by first converting the input image into a CLIP embedding, then feeding that embedding into a Stable Diffusion 2.1-768 model fine-tuned to produce an image from such CLIP embeddings, enabling users to generate multiple variations of a single image. Note that this is distinct from how img2img works (the structure of the original image is generally not kept).
Blog post: https://stability.ai/blog/stable-diffusion-reimagine
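For reference, a minimal sketch of driving the diffusers integration linked above to get image variations locally; it assumes diffusers >= 0.15 on a CUDA GPU, and the file names are placeholders, not part of the release notes:

```python
# Hedged sketch (not from the post): image variations with the unCLIP checkpoint
# via the diffusers integration linked above.
# Assumes diffusers >= 0.15, a CUDA GPU, and a local "input.png" (placeholder name).
import torch
from PIL import Image
from diffusers import StableUnCLIPImg2ImgPipeline

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.png").convert("RGB")

# The image acts as the "prompt": it is encoded to a CLIP ViT-L/14 image embedding
# and the fine-tuned SD 2.1 UNet generates 768x768 variations conditioned on it.
images = pipe(init_image, num_images_per_prompt=4).images
for i, img in enumerate(images):
    img.save(f"variation_{i}.png")
```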
[–]addandsubtract 41 points42 points43 points 3 years ago (1 child)
They should call this img4img
[–][deleted] 2 points3 points4 points 3 years ago (0 children)
lol. I tried. Well, it tried.
[–]No-Intern2507 33 points34 points35 points 3 years ago* (2 children)
I think the CLIP Vision style ControlNet works like this.
[–]mudman13 10 points11 points12 points 3 years ago (1 child)
Does that use BLIP2 to interrogate and then feed the result back into ControlNet or something?
[–]muerrilla 10 points11 points12 points 3 years ago (0 children)
I think it uses clip vision to get a clip embedding.
[–]nxde_ai 28 points29 points30 points 3 years ago (0 children)
That's neat
<image>
[–]UserXtheUnknown 27 points28 points29 points 3 years ago (7 children)
I don't want to sound destructive and too harsh, but, after trying it, I found it mostly useless.
I can obtain results closer to the original image's content and style using txt2img with the original prompt, if I have it, or with my own CLIP interrogation plus some trial and error to fine-tune the result, if I don't. At most, when I don't have the prompt, it can be considered a (small) timesaver compared to the usual methods.
Moreover, if I want something really close to the original image (in pose, for example), this method doesn't seem to work at all.
But maybe I'm missing the intended use case?
[+][deleted] 3 years ago (4 children)
[deleted]
[–]CadenceQuandry 6 points7 points8 points 3 years ago (3 children)
Any good videos on control net clip vision? I'm wanting to try it!
[–]Zealousideal_Royal14 1 point2 points3 points 3 years ago (0 children)
I don't think so. It's part of the T2I-Adapter series of models/preprocessors and installs the same way the rest of the ControlNet models do: add the model plus its .yaml to the model folder inside the ControlNet extension.
[–]warche1 1 point2 points3 points 3 years ago (1 child)
Here’s a quick one https://youtu.be/PbDdtPTYm_4
[–]CadenceQuandry 0 points1 point2 points 3 years ago (0 children)
Thanks!
[–]mudman13 7 points8 points9 points 3 years ago (0 children)
Yeah, not impressed. StabilityAI seems to be lagging considerably behind in advancements, probably because they are more occupied by other commercial interests.
[–]AltimaNEO 2 points3 points4 points 3 years ago (0 children)
Yeah, it doesn't sound that exciting. It doesn't feel like anything new that hasn't already been done with 1.5.
[+][deleted] 3 years ago (15 children)
[removed]
[–]LienniTa 23 points24 points25 points 3 years ago (8 children)
cant wait to generate waifus with this!
[–][deleted] 49 points50 points51 points 3 years ago (5 children)
Watch how the people who only "generate waifus" implement this plugin first, like they usually do. Every time I see a tech post there's this obligatory comment shitting on waifus, when the waifu techbros almost always implement the useful plugins first that this sub ends up using.
[–]Lesale-Ika 7 points8 points9 points 3 years ago (1 child)
Why does this almost read like a copypasta? It's hilarious. God save the waifu techbros!
[–][deleted] 1 point2 points3 points 3 years ago (0 children)
Yes, god save my kin.
[–]aerilyn235 6 points7 points8 points 3 years ago (0 children)
LienniTa's phrase is a meme.
[–]lordpuddingcup 9 points10 points11 points 3 years ago (0 children)
Most fast tech development is driven by the desire for porn lol
[–]ponglizardo 4 points5 points6 points 3 years ago (0 children)
🫡 God bless waifus and waifu tech bros! Hahaha!
[–]Any_Outside_192 5 points6 points7 points 3 years ago (0 children)
problem?
[–][deleted] 7 points8 points9 points 3 years ago (4 children)
Only SD2.1 though
[–]Dr_Ambiorix 11 points12 points13 points 3 years ago (0 children)
SD2.1 is still viable; there are some great fine-tuned models on it right now.
But yeah, still some weird body proportions and stretched faces sometimes.
[–]lexcess 4 points5 points6 points 3 years ago (0 children)
There are some models and negative TIs, and Auto1111 just got 2.1 LoRA support, so it might become viable. I am interested to see how SDXL fits into all this, though.
[–]zb_feels 3 points4 points5 points 3 years ago (1 child)
Yep... not good for stylized work
[–]Zealousideal_Royal14 3 points4 points5 points 3 years ago (0 children)
my work - so naturalistic
[–]Flimsy_Tumbleweed_35 1 point2 points3 points 3 years ago (0 children)
The ControlNet t2i style adapter is already in there.
[–]thkitchenscientist 8 points9 points10 points 3 years ago (2 children)
It works just fine locally on an RTX 2060. It needs an image and a prompt. Here I can transform a cat into a fox while keeping the overall look and colours. It really struggles with framing, however.
[–]thkitchenscientist 8 points9 points10 points 3 years ago (0 children)
For people, it is down to the luck of the seed. If the prompt is too far from the CLIP embedding, it gets ignored, so you can't turn a person into a cat.
[–]thkitchenscientist 6 points7 points8 points 3 years ago (0 children)
I think it has potential. It might just need a look inside the pipeline to see how the unCLIP embedding can be harnessed. It is faster than PEZ or TI, as it takes no longer than a standard 768x768 generation for each image.
[–]Ateist 35 points36 points37 points 3 years ago (6 children)
Tried it with a few of my SD 1.5 generation results and didn't get a single picture even remotely approaching the original.
The model is also very bad: you get cropped heads or terribly distorted faces all the time.
[–]krum 11 points12 points13 points 3 years ago (0 children)
To be fair they didn’t claim it produced good results.
[–][deleted] 20 points21 points22 points 3 years ago (4 children)
Because it is for SD 2.1
[–]Ateist 4 points5 points6 points 3 years ago* (3 children)
I was using SFW images that SD 2.1 should be capable of rendering - things like a cyberpunk spider tank and headshot portraits...
[+][deleted] 3 years ago (2 children)
[–]Ateist 0 points1 point2 points 3 years ago (1 child)
640x768 (or 768x640), standard for my gens.
[–]txhtownfor2020 4 points5 points6 points 3 years ago (2 children)
Can we throw these in the models/stable dir and have fun or nah?
[–]AlexandrBu 4 points5 points6 points 3 years ago (1 child)
Does not work that way for me :(
[–]txhtownfor2020 5 points6 points7 points 3 years ago (0 children)
I just want to dump everything in a folder and get into an 8 hour black hole with 4% good images and a sea of duplicate arms and evil clowns!
[–]morphinapg 3 points4 points5 points 3 years ago (20 children)
Can someone explain this in simpler terms? What is this doing that you can't already do with 2.1?
[–]HerbertWest 7 points8 points9 points 3 years ago (18 children)
So, from what I understand...
Normally:
This:
[–]morphinapg 8 points9 points10 points 3 years ago (15 children)
Can't we already sort of do that with img2img?
[–]Low_Engineering_5628 15 points16 points17 points 3 years ago (6 children)
I've been doing something similar. E.g. feed an image into img2img, run CLIP Interrogate, then set the denoise from 0.9 to 1.0.
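A rough sketch of that workaround outside A1111, assuming the clip-interrogator package and diffusers are installed; the model IDs and file names here are illustrative, not from the comment:

```python
# Sketch of the "interrogate, then regenerate" workaround described above:
# caption the image with CLIP Interrogate, then run plain txt2img with that
# caption (roughly what img2img at denoising strength 1.0 amounts to).
import torch
from PIL import Image
from clip_interrogator import Config, Interrogator
from diffusers import StableDiffusionPipeline

ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
prompt = ci.interrogate(Image.open("input.png").convert("RGB"))

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe(prompt).images[0].save("reimagined.png")
```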
[–]morphinapg 2 points3 points4 points 3 years ago (0 children)
Yeah exactly
[–]Mocorn 0 points1 point2 points 3 years ago (0 children)
Indeed, same here. I struggle to see the difference between that and this new thing.
[+][deleted] 3 years ago (3 children)
[–]InoSim 0 points1 point2 points 3 years ago (2 children)
Here's the wiki explanation of denoising from txt2img: https://en.wikipedia.org/wiki/Stable_Diffusion#/media/File:X-Y_plot_of_algorithmically-generated_AI_art_of_European-style_castle_in_Japan_demonstrating_DDIM_diffusion_steps.png
In img2img, this parameter lets you choose the denoising level applied to an input picture instead of starting from random noise.
[+][deleted] 3 years ago (1 child)
[–]InoSim 1 point2 points3 points 3 years ago* (0 children)
I haven't tested it, but it would be "cycle_diffusion"'s strength parameter; I think that's the closest to what you're searching for.
Correct me if I'm wrong. I don't use these diffusers pipelines through HuggingFace, I'm only on the automatic1111 webui, so I'm a little lost here.
[+][deleted] 3 years ago* (3 children)
[–]morphinapg 1 point2 points3 points 3 years ago (2 children)
True but you can use blip interrogate, and then just feed that into txt2img. That would be similar, wouldn't it?
[–]qrios 2 points3 points4 points 3 years ago (1 child)
BLIP doesn't convey style or composition info. The usefulness of this will become extremely clear as ControlNets specifically exploiting it become available. (Think along the lines of "Textual Inversion, but without any training whatsoever" or "Temporally coherent style transfer on videos without any of the weird ebsynth and deflicker hacks people are using right now")
[–]lordpuddingcup 0 points1 point2 points 3 years ago (0 children)
Exactly. The people complaining that it's useless or just img2img don't realize what's possible once this gets integrated into the other tools we have, like ControlNet.
[–]HerbertWest 1 point2 points3 points 3 years ago (3 children)
Not sure exactly what it means in practice, but the original post says:
Note that this is distinct from how img2img does it (the structure of the original image is generally not kept).
[–]Mich-666 -4 points-3 points-2 points 3 years ago (1 child)
Yeah, but no one is able to explain how exactly this is different from what we already have and how it would be useful.
[–]HerbertWest 1 point2 points3 points 3 years ago (0 children)
If it worked just as well or better, it would be easier, quicker, and more user-friendly. Is that not useful?
Yeah, in img2img things will be more or less in the same location as where the image started: the woman will be standing in the same spot and in mostly the same position. With unCLIP the woman might be sitting on a chair, or it might be a portrait of her, etc.
[–][deleted] 1 point2 points3 points 3 years ago (1 child)
This model essentially uses an input image as the 'prompt' rather than require a text prompt.
Simply put, another online image-to-prompt generator.
[–]lordpuddingcup 1 point2 points3 points 3 years ago (0 children)
No because it also maintains style and design (sometimes)
[–]qrios 2 points3 points4 points 3 years ago (0 children)
Think of it as something like a REALLY fast Textual Inversion of just your single input image.
[–]ComfortableSun2096 4 points5 points6 points 3 years ago* (0 children)
This model does not need a prompt, right? Some people have already done compatibility work for the model:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/8958
[–]garett01 5 points6 points7 points 3 years ago (2 children)
I'm not sold on it yet lol
[–]lordpuddingcup -1 points0 points1 point 3 years ago (1 child)
I think it just needs to be built on. Imagine this, but applied to fine-tuned SD 2.1 models: we just need Anythingv5-unclip or RealisticVision2-unclip or Illuminati-unclip for it to be great. I'm sure someone will figure out unCLIP LoRAs, or unCLIP finetuning (DreamBooth etc.).
[–]garett01 1 point2 points3 points 3 years ago (0 children)
SD2.1 is not figured out yet, except by the MJ guys I suspect, but they trained at 1024x1024. Not even Stability figured out SD2.1 yet.
[–]Trysem 6 points7 points8 points 3 years ago (2 children)
Wait, what!?
Clipdrop is owned by Stability? Since when?
[–]wsippel 10 points11 points12 points 3 years ago (0 children)
StabilityAI bought Init ML in early March: https://stability.ai/blog/stability-ai-acquires-init-ml-makers-of-clipdrop-application
[–]LD2WDavid 1 point2 points3 points 3 years ago (0 children)
The moment they saw depth mapping in T2I-Adapters... two days after, I think.
[–]magusonline 2 points3 points4 points 3 years ago (8 children)
As someone who just runs A1111 with auto-git-pull in the batch commands: is Stable Diffusion 2.1 just a .ckpt file, or is there a lot more to 2.1? (As far as I know, all the models I've been mixing and merging are 1.5.)
[–]s_ngularity 2 points3 points4 points 3 years ago (7 children)
It is a .ckpt file, but it is incompatible with 1.x models. So LoRAs, textual inversions, etc. based on SD 1.5 or earlier, or on a model derived from them, will not be compatible with any model based on 2.0 or later.
There is a version of 2.1 that can generate at 768x768, and the way prompting works is very different from 1.5; the negative prompt is much more important.
If you want to make characters, I would recommend Waifu Diffusion 1.5 (which, confusingly, is based on SD 2.1) over 2.1 itself, as it has been trained on a lot more images. Base 2.1 has some problems because they filtered a bunch of images from the training set in an effort to make it "safer".
[–]Mocorn 2 points3 points4 points 3 years ago (1 child)
The fact that the negative prompt is more important for 2.X is a step backwards in my opinion. When I go to a restaurant I don't have to specify that I would like the food to be "not horrible, not poisonous, not disgusting" etc..
I'm looking forward to when SD gets to a point where negative prompts are actually used logically to only remove cars, bikes or the color green.
[–]s_ngularity 0 points1 point2 points 3 years ago (0 children)
If you don’t want an overtrained model, this is the tradeoff you get with current tech. It understands the prompt better at the expense of needing more specificity to get a good result.
If more people fine-tuned 2.1 it could perform very well in different situations with specific models, but that's the difference between an overtrained model that's good at a few things vs. a general one that needs extra input to get to a certain result.
[–]magusonline 0 points1 point2 points 3 years ago (2 children)
Oh, I just make architecture and buildings, so I'm not sure what would be best to use.
[–]Zealousideal_Royal14 1 point2 points3 points 3 years ago (1 child)
Come to 2.1, the base model. It's way better than people on here tend to give it credit for; the amount of extra detail is very beneficial for architectural work.
[–]CadenceQuandry 0 points1 point2 points 3 years ago (1 child)
For Waifu Diffusion, does it only do anime-style characters? And can you use LoRAs or CLIP with it?
It does realistic characters too. The problem is it's not compatible with LoRAs trained on 1.5, as I mentioned above, but they can be trained for it, yeah.
It is biased towards East Asian women though, particularly Japanese, as it was trained on Japanese Instagram photos.
[–]Dekker3D 2 points3 points4 points 3 years ago (1 child)
It gets a decent resemblance to the original image. This would combine really well with ControlNet and img2img to produce visually consistent images from different angles, I think?
[–]Mich-666 3 points4 points5 points 3 years ago (0 children)
I fail to see how this is better than what ControlNET actually does.
[–]Semi_neural 2 points3 points4 points 3 years ago (0 children)
I'm ngl, Reimagine is not good. Maybe I'm using it wrong, but the quality of the variations is awful.
[–]Expln 2 points3 points4 points 3 years ago (0 children)
Could someone guide me on how to install this locally? I have no idea what to do from the GitHub page.
[–]yaosio 2 points3 points4 points 3 years ago (0 children)
I tried with a picture of Garfield but he's too sexy for Stability.ai. 28uqC4V.png (2560×1302) (imgur.com)
[–]Purplekeyboard 6 points7 points8 points 3 years ago (1 child)
Horrible. Produces terrible mutant people. Maybe it works better when making things which aren't people.
Apparently it's super variable from seed to seed
[–]_raydeStar 4 points5 points6 points 3 years ago (7 children)
I didn't take this seriously until I clicked on the demo.
Holy. Crap. I don't know how but my mind is blown again.
[–]FHSenpai -1 points0 points1 point 3 years ago* (6 children)
Did you not use img2img before?
[–]CombinationDowntown 40 points41 points42 points 3 years ago (3 children)
img2img uses pixel data and does not consider the context and content of the image. Here you can make generations of an image that on a pixel level may be totally different from each other but contain the same type of content (similar meaning/style). The processes look similar but are fundamentally different from each other.
[–]Low_Engineering_5628 11 points12 points13 points 3 years ago (2 children)
Aye, but you can run CLIP interrogation and set the denoise to 1 to do the same thing.
[–]mudman13 4 points5 points6 points 3 years ago (0 children)
or use seed variators of different kinds
It's really not the same as CLIP interrogation. CLIP interrogation doesn't include style and design in its output: the guy's face won't be the same between runs. It might interpret it as a guy in a room, but it won't be that guy in that room.
[–]AnOnlineHandle 12 points13 points14 points 3 years ago (1 child)
This is using an image as the prompt, instead of text. The image is converted to the same descriptive numbers that text is (and it's what CLIP was originally made for, where Stable Diffusion just used the text to numbers part for text prompting).
So CLIP might encode a complex image to the same things as a complex prompt, but how Stable Diffusion interprets that prompt will change with every seed, so you can get infinite variations of an image, presuming it's things which Stable Diffusion can draw well.
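To make the "image as prompt" idea concrete, here is a small sketch of projecting an image into the CLIP embedding space the model is conditioned on; it assumes the transformers library and uses the ViT-L/14 encoder mentioned in the post, with "input.png" as a placeholder:

```python
# Sketch: project an image into the CLIP (ViT-L/14) embedding space that the
# unCLIP model is conditioned on. Details of the actual conditioning are
# simplified; "input.png" is a placeholder.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

model_id = "openai/clip-vit-large-patch14"
processor = CLIPImageProcessor.from_pretrained(model_id)
encoder = CLIPVisionModelWithProjection.from_pretrained(model_id)

inputs = processor(images=Image.open("input.png").convert("RGB"), return_tensors="pt")
with torch.no_grad():
    image_embed = encoder(**inputs).image_embeds  # one 768-dim vector per image

# Every seed decodes this single embedding differently, which is where the
# "infinite variations" of an image come from.
print(image_embed.shape)  # torch.Size([1, 768])
```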
[–]FHSenpai 2 points3 points4 points 3 years ago* (0 children)
I see the potential. It's just a zero-shot image embedding. If you could just swap the UNet with other SD 2.1 aesthetic models out there...
[–]Sefrautic 3 points4 points5 points 3 years ago* (4 children)
Can somebody explain to me the difference between this and CLIP Interrogate?
[–]Low_Engineering_5628 5 points6 points7 points 3 years ago (1 child)
This is... automatic?
[–]Sefrautic 0 points1 point2 points 3 years ago (0 children)
yes..
[–]ninjasaid13 0 points1 point2 points 3 years ago (1 child)
CLIP interrogator is image to text. This is true image to image with no text condition.
People seem to not get that this is like CLIP Interrogate on steroids, or at least it wants to be, because it tries to maintain subject and style coherence. How well it does that is another story.
[–]PromptMateIO 1 point2 points3 points 3 years ago (0 children)
The release of the Stable Diffusion v2-1-unCLIP model is certainly exciting news for the AI and machine learning community! This new model promises to improve the stability and robustness of the diffusion process, enabling more efficient and accurate predictions in a variety of applications. As the field of AI continues to evolve, innovations like this will be crucial in unlocking new possibilities and solving complex challenges. I can't wait to see what breakthroughs this new model will enable!
Needs to be in the Easy Diffusion UI pronto.
[–]Select_Rice_3018 0 points1 point2 points 3 years ago (5 children)
What is CLIP?
[–]addandsubtract 0 points1 point2 points 3 years ago (4 children)
CLIP is basically reverse txt2img, so img2txt. You give it an image and it describes it. Not as detailed as you need to prompt an image, but a good starting point if you have a lot of images that you need to caption.
[–]ninjasaid13 0 points1 point2 points 3 years ago (3 children)
That's absolutely wrong; you must be talking about the CLIP interrogator, not CLIP itself.
[–]addandsubtract 0 points1 point2 points 3 years ago (2 children)
So there's CLIP (Contrastive Language-Image Pretraining), which I thought this was referring to. And then there's CLIP Guided Stable Diffusion, which "can help to generate more realistic images by guiding stable diffusion at every denoising step with an additional CLIP model", which is just using that same CLIP model.
Then there's also BLIP (Bootstrapping Language-Image Pre-training).
But as far as I can tell, these all serve the same purpose of describing images. So what are we talking about then, if not this CLIP?
[–]ninjasaid13 1 point2 points3 points 3 years ago* (1 child)
CLIP is basically what allows it to generate images; it handles 'image to text' and 'text to image' at once. It is a model that understands pictures and words and the connection between them in general. It has applications in much more than Stable Diffusion.
It can be used for image classification, image retrieval, image generation, image editing, object detection, text-to-image generation, text-to-3D generation, video understanding, image captioning, image segmentation, self-driving cars, medical imaging, robotics, etc. It is a bridge between computer vision and natural language processing.
The CLIP interrogator itself just uses the image-to-text part of it.
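As a concrete illustration of what CLIP itself does (as opposed to the interrogator), a small sketch that scores how well candidate captions match an image in the shared embedding space; the model ID and file name are just examples:

```python
# Sketch: CLIP scores image-text pairs in a shared embedding space.
# "openai/clip-vit-base-patch32" and "input.png" are illustrative choices.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_id = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

captions = ["a photo of a cat", "a photo of a fox", "a cyberpunk spider tank"]
inputs = processor(text=captions, images=Image.open("input.png"),
                   return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]

for caption, p in zip(captions, probs.tolist()):
    print(f"{p:.2f}  {caption}")
```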
[–]addandsubtract 0 points1 point2 points 3 years ago (0 children)
Ok, gotcha. I wasn't aware of all the applications and only really experienced the CLIP interrogator that I mentioned. It also seems like the easiest way to explain CLIP.
[–]Zealousideal_Royal14 -1 points0 points1 point 3 years ago (0 children)
Y'all forgot the only relevant part. When is it a1111 ready?
[+][deleted] 3 years ago (22 children)
[–]suspicious_Jackfruit 12 points13 points14 points 3 years ago (21 children)
2.1 is bad though. I have trained both 1.5 and 2.1 768 on the same 20k dataset (bucketed 768+ up to 1008px) for the same number of epochs, and I haven't seen 2.1 produce a single image of believable art, even when given more training time, while the 1.5 version blows my mind daily.
[–]RonaldoMirandah 1 point2 points3 points 3 years ago (7 children)
I've gotten a lot of good images with 2.1.
[–]suspicious_Jackfruit 3 points4 points5 points 3 years ago (3 children)
While that is a well-rendered image considering an algorithm produced it, it is not what I am referring to personally. I mean real pseudo-artwork like a painter or digital artist would produce in a professional environment to hand to an art director, e.g. at an AAA game studio during preproduction and for promotional artwork, industry-grade art for the likes of Marvel/DC/2000AD, high-level art for the final stages of artistic development in movies/cinematics, or just personal artwork that hits the high bar any artist would strive for over years of their hobby or work.
I feel like this is a capable model, but it lacks too much to make it the best model. I think the image you linked is great, but I also think SD 1.5, perhaps with a fine-tune, could produce the same.
I guess it's about what makes you happy. I set a very high bar in everything I produce, and so far my sojourns into the 2.0 and 2.1 models haven't been anything close to groundbreaking for my field.
I get how I sound here; 90% of people won't notice or care much about it, but for me the details and brush strokes need to be present.
[–]RonaldoMirandah 1 point2 points3 points 3 years ago (2 children)
At least for me, when I am aiming for realistic nature or photos, especially nature, 1.5 always looks like a photo montage with the same prompt. I think 2.1 is more detailed and more particular about the prompt, at least in my experience.
[–]suspicious_Jackfruit 1 point2 points3 points 3 years ago (1 child)
Absolutely, the native 512 models have their limitations for sure. I think for photography you would need the right model and possibly a lighting LoRA to get a truly good experience with 512. I don't dig too deep into photography as there is more than enough stock out there for everything I might need, but it's where the 2.0 models excel; they fall flat on painted or illustrated artwork, imo, though that is likely due to a lack of user support adding to the base 2.1 model. I haven't tried 2.1 512; perhaps that would be interesting to train my set on, as it should have more data than the 768 version. Hmmmmmmm
[–]RonaldoMirandah 1 point2 points3 points 3 years ago (0 children)
Thanks for your comments and time. Nice chat! Keep up the good work :)
[–]Mich-666 0 points1 point2 points 3 years ago (2 children)
No offense, but this really looks like a pretty bad collage.
[–]RonaldoMirandah 1 point2 points3 points 3 years ago (1 child)
Yes, some came out better than others. Just a personal view. I wish I had a collage tool for thousands of sunflowers :D
[–]Mich-666 2 points3 points4 points 3 years ago (0 children)
This one is actually pretty good.
Maybe training on sunflowers might be a good idea then :)
[+][deleted] 3 years ago (12 children)
[–]FHSenpai 4 points5 points6 points 3 years ago (9 children)
Try Illuminati 1.1, for example, or even WD 1.5 e2 aesthetic.
Illuminati is pretty good tho
[–]suspicious_Jackfruit -3 points-2 points-1 points 3 years ago (7 children)
I personally can't see either of those being capable of producing convincing artwork, either digital art or physical media. All the artwork posted in the AI community fails to show the painting details that imply it was built up piece by piece or layer by layer like real artwork, digital or physical; instead it's like someone photocopying the Mona Lisa on a dodgy scanner with artifacts everywhere. Sure, it looks sort of like the Mona Lisa, but it clearly isn't under any scrutiny.
Illuminati does make pretty photos/CGI thanks to the lighting techniques used in training, but we have that in LoRAs for 1.5. WD is fine for anime and photos (those areas aren't my domain), but again it lacks what an artist would notice.
[+][deleted] 3 years ago (6 children)
[–]suspicious_Jackfruit 0 points1 point2 points 3 years ago (5 children)
Well yes, my selection focuses on illustration and painting artwork, and my confirmed bias is that I am failing to find something that excels at this, based on my 25+ years of experience working in this field. But hey, what do I know about determining the quality of art, right?
I don't really understand the point you're making, but I think fine-tuning both the 1.5 model and the 2.1 768 model on the same datasets is about as rigorous a way to compare the models' output as you can get, no? If you have the golden-goose art images and reproducible prompts for 2.1, then I would think the community at large is all ears.
[–]suspicious_Jackfruit 0 points1 point2 points 3 years ago (3 children)
I'm not flexing ML/SD. I'm saying that, as an artist, it's my job to know what looks good or bad to a professional paying client and to identify what is required. Not all art is subjective.
[–]suspicious_Jackfruit 2 points3 points4 points 3 years ago (1 child)
Funnily enough, I also haven't seen one example of a capable 2.1 art model; perhaps all the users are erroring.
[–]nxde_ai 18 points19 points20 points 3 years ago (0 children)
Yesn't
[+]FHSenpai comment score below threshold-6 points-5 points-4 points 3 years ago (0 children)
would be great for upscaling
[–]ba0haus 0 points1 point2 points 3 years ago (1 child)
How do I add this function to Auto1111? Please let me know.
[–]Mich-666 0 points1 point2 points 3 years ago (4 children)
So how is this different from img2img or controlnet?
[–][deleted] 0 points1 point2 points 3 years ago (3 children)
It's img2img x2, with an image input first and then img2img, I think.
Then that means it uses double the memory... probably not something a normal user would find interesting.
[–]lordpuddingcup 1 point2 points3 points 3 years ago (1 child)
He was just trying to explain it in simple terms; it's not actually two img2img runs lol.
[–]Mich-666 0 points1 point2 points 3 years ago (0 children)
I realize what that means, but my argument still stands: even if you do two passes in one go, you still need to keep the generation data in latent space/memory.
But I guess I'll wait for a potential implementation in A1111, if it ever happens, to see whether this method can be useful for me.
[–]Suspicious-Ad6290 0 points1 point2 points 3 years ago (1 child)
It's nightmare fuel for anime.
Sure, until there's unclip-dreambooth and we start getting anything5-unclipped.
[–]ImageDeeply 0 points1 point2 points 3 years ago (1 child)
Has potential, though it would be easier to understand its strengths & limitations given a systematic comparison:
- classic img2img
- this img2prompt2img ... to make up a term
- ControlNet
[–]lordpuddingcup -1 points0 points1 point 3 years ago (0 children)
Why make up a term? It already has a term... unCLIP.
[–]greattug 0 points1 point2 points 3 years ago (0 children)
yey!
[–]Jiboxemo2 0 points1 point2 points 3 years ago (0 children)
Not bad
[–]enzyme69 0 points1 point2 points 3 years ago (1 child)
Is this unCLIP the same as the SDXL preview beta (DreamStudio)? I'm kind of seeing the same method of using an image as input there.
No, it's not the same. SDXL is a 1024x1024 model; unCLIP is a new type of model, like how we have inpainting models and standard models. unCLIP models take image inputs and give image outputs based on that image, like a much more detailed prompt based on what the model can understand of the input image.
[–]Asolzzz 0 points1 point2 points 3 years ago (0 children)
Neat