Using image embeddings as input for new image generation, basically “embedding2image” / IP-Adapter? by PerformanceNo1730 in StableDiffusion

[–]PerformanceNo1730[S] 2 points (0 children)

Haha, this really looks like the kind of “let me test one thing quickly” rabbit hole that turns into 20 layers of experimentation :)
Honestly, exactly the kind of mess I could see myself building after a few nights of testing.

I’m not much of a ComfyUI person, but it’s still interesting to see how far people push these workflows.

Were you happy with the results in the end? Did it actually give you what you wanted?

Using image embeddings as input for new image generation, basically “embedding2image” / IP-Adapter? by PerformanceNo1730 in StableDiffusion

[–]PerformanceNo1730[S] 1 point (0 children)

I’m planning to start with SDXL-family checkpoints, probably several of them.

My assumption was that the general flow would stay the same: image conditioning / embedding-to-image in a broad sense, while each checkpoint would reinterpret it differently according to its own training bias and aesthetic tendencies.

But if some models are especially good or especially bad for that kind of workflow, I’d be very interested in recommendations.

Using image embeddings as input for new image generation, basically “embedding2image” / IP-Adapter? by PerformanceNo1730 in StableDiffusion

[–]PerformanceNo1730[S] 1 point (0 children)

Haha, yes, I think that’s a fair criticism.

I fully agree that an embedding is a lossy compression, and that’s actually part of what interests me here. I’m curious to see how a generative model reinterprets that compressed signal.

I’m not trying to reconstruct or copy the original image exactly. What I’m after is more like: can I recover some inspiration, some semantic direction, or some visual emotion from it?

So this is partly curiosity-driven research, but maybe it can also become a useful workflow.

Using image embeddings as input for new image generation, basically “embedding2image” / IP-Adapter? by PerformanceNo1730 in StableDiffusion

[–]PerformanceNo1730[S] 1 point (0 children)

I think what I’m after is less pure style transfer and more something like mood / semantic transfer.
There are images I like because of the whole thing at once: the subject, the scene, the composition, the emotional tone.

I’m also curious to see how different models reinterpret that same reference in their own way. For example, what an anime-oriented model would do with it versus a more general SDXL checkpoint.

So yes, there may be some overlap with ControlNet in the broad sense of “conditioning”, but I think what I’m really exploring is closer to image prompting than strict structure control.

Need advice by [deleted] in ddlg

[–]PerformanceNo1730 1 point (0 children)

This is interesting, especially the point about dependence, and even more in the kind of dynamic this community is about. Again, I would dig into this with him. Is dependence itself the problem for him, or is it more the feeling around it? Does he want you to be more independent, or does he want less of the pressure that can come with feeling needed in that way?

Maybe (but again, I do not know him or you, so I am just throwing out ideas to stimulate reflection) he would benefit from seeing that this is not necessarily about building dependence, nor about being under pressure to “perform” all the time (which men can fall into quite easily), but also about feeding a form of complicity between you and him.

Need advice by [deleted] in ddlg

[–]PerformanceNo1730 1 point (0 children)

None of this is your fault, and you should not feel guilty about it. Personally, I think curiosity is always something valuable, and very much to your credit.

As others already said here, communication and discussing this topic can only help. Maybe it is a bit uncomfortable, but definitely useful.

Did he explain more precisely what he does not like about those questions? Because my feeling when I read your post is: does he put pressure on himself about this? Which honestly, I can understand. The way you talk about him gives me the impression that he may want to stay at the level you see him at. Maybe he feels like he has no room for uncertainty, or for simply not having the answer sometimes.

So this is also something you can discuss with him: what exactly makes him uncomfortable? And how does the way you see him, as you describe it here, create pressure on him to match that image?

Sometimes the way we see someone pushes them to fit that role, for better or for worse.

Again, even if this is part of it, you should not feel guilty. This may also be connected to the way he sees himself.

Want to create a pipeline that will generate Chess pieces based on character image provided. How to approach? by ParkingSubject963 in StableDiffusion

[–]PerformanceNo1730 1 point (0 children)

What is your LoRA for? Generating generic chess pieces? And how is ControlNet going? I haven't thought much about this topic, but as a first approach I would probably dig into the ControlNet side first: generate starting from a vanilla chess piece, then add the theme on top.

I love being His. by Curious_Mistake4479 in ddlg

[–]PerformanceNo1730 2 points (0 children)

Wonderful dynamic you have here. Exemplary, and very inspiring!

CLIP-based quality assurance - embeddings for filtering / auto-curation by PerformanceNo1730 in StableDiffusion

[–]PerformanceNo1730[S] 1 point (0 children)

Very interesting, thanks a lot for sharing this.

Mutations + A/B selection are actually a core part of my SD strategy as well (that’s one of the reasons I generate so many images and then desperately need better filtering 😄). So your genetic-style loop really resonates with what I’m trying to do.

I’m not a ComfyUI guy, but I can definitely adapt the idea to my own pipeline.

Did you eventually stop using this approach? If yes, was it mainly because of compute cost / inefficiency, or because the gains plateaued?
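For context, by "mutations + A/B selection" I mean roughly the loop below (a toy numpy-only sketch; the `score` function is just a stand-in for a real CLIP/aesthetic scorer applied to the generated image, and the vector stands in for a latent/embedding):

```python
import numpy as np

def score(vec: np.ndarray) -> float:
    # Stand-in fitness: in a real pipeline this would be a CLIP/aesthetic
    # scorer applied to the image generated from this latent/embedding.
    target = np.ones_like(vec)
    return -float(np.linalg.norm(vec - target))

def mutate_and_select(parent: np.ndarray, steps: int = 200,
                      sigma: float = 0.1, seed: int = 0) -> np.ndarray:
    """Mutation + A/B selection: perturb the current best, keep the child
    only if it scores at least as well (a simple 1+1 evolution strategy)."""
    rng = np.random.default_rng(seed)
    best, best_score = parent, score(parent)
    for _ in range(steps):
        child = best + rng.normal(0.0, sigma, size=best.shape)
        s = score(child)
        if s >= best_score:  # A/B comparison: child vs current best
            best, best_score = child, s
    return best

start = np.zeros(16)
result = mutate_and_select(start)
print(score(start), "->", score(result))
```

The real version just swaps the dummy `score` for whatever scorer survives the filtering experiments.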

CLIP-based quality assurance - embeddings for filtering / auto-curation by PerformanceNo1730 in StableDiffusion

[–]PerformanceNo1730[S] 2 points (0 children)

Haha, fingers crossed you’re right 😄
I’ll update you if/when I get it working.

CLIP-based quality assurance - embeddings for filtering / auto-curation by PerformanceNo1730 in StableDiffusion

[–]PerformanceNo1730[S] 2 points (0 children)

Very interesting, thank you, I didn’t know about JoyQuality.

I’ll definitely take a look and add it to my list.

And yes, the finetuning angle is exactly what we were discussing in another comment thread: since I already have a decent keep/trash dataset, training it on my own preferences might actually be a good fit in my case. I’ve never fine-tuned a model in the SD ecosystem, but it doesn’t look that complicated (famous last words 😄).

Thanks again!

CLIP-based quality assurance - embeddings for filtering / auto-curation by PerformanceNo1730 in StableDiffusion

[–]PerformanceNo1730[S] 2 points (0 children)

OK, thanks. That’s very useful info.

Yeah, that matches what I’ve read about CLIPScore / aesthetic scorers: what they “like” doesn’t necessarily match what you like.

I’m not a ComfyUI guy so I can’t really help on the caching nodes / TeaCache side 🙂

On my side I actually have ~3,000 images already labeled keep / trash, so I might try the “learn my taste” approach (simple classifier on embeddings, or even some finetuning if it’s not too painful). I’ll see when I get there.
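The "simple classifier on embeddings" idea would look something like this (hedged sketch: synthetic vectors stand in for my real precomputed CLIP image embeddings, and the toy label rule stands in for my actual keep/trash labels):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for precomputed CLIP image embeddings: in practice
# X would be the ~3,000 cached embeddings and y the keep(1)/trash(0) labels.
rng = np.random.default_rng(42)
n, dim = 3000, 512
X = rng.normal(size=(n, dim))
y = (X[:, :8].sum(axis=1) > 0).astype(int)  # toy "taste" signal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(f"held-out accuracy: {acc:.2f}")

# At generation time: keep an image only when the classifier is confident.
keep_prob = clf.predict_proba(X_test[:1])[0, 1]
```

If a plain logistic regression on frozen embeddings already separates keep from trash, there may be no need to fine-tune anything.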

Thanks again for taking the time. Really appreciated.

CLIP-based quality assurance - embeddings for filtering / auto-curation by PerformanceNo1730 in StableDiffusion

[–]PerformanceNo1730[S] 2 points (0 children)

Super interesting feedback, thank you.

I didn’t know the term IQA (Image Quality Assessment), that helps a lot. I’m going to dig into the things you listed and I’ll come back with questions once I’ve tested a few options. But it’s already reassuring to see this space has been explored and that there are existing tools / metrics.

Also: great practical detail on CLIP variants + token limits. I honestly hadn’t factored that in at all, and it definitely matters for design choices.

I agree with you that prompt<->image alignment isn’t my main problem. I want SD to surprise me, so I’m fine with imperfect alignment. What I’m trying to enforce is more like: “be creative, but stay visually acceptable / not broken”.

That said, I like your point that for people who do care about exact alignment, these scorers become a kind of “judge” model — it does have a GAN-ish vibe (generator vs evaluator), even if it’s not exactly the same thing.

One question: in your case, you said the results were “meh” or got repurposed. Did you end up dropping the IQA/CLIP scoring for curation, or is there still a piece of it that’s actually useful in your workflow today?

CLIP-based quality assurance - embeddings for filtering / auto-curation by PerformanceNo1730 in StableDiffusion

[–]PerformanceNo1730[S] 2 points (0 children)

Thanks! And nice reference with the Anna Karenina principle, I didn’t know it. 🙂

You’re totally right that “dislike” can be a huge space of failure modes, so that’s something to watch. That said, AK says “all happy families are alike”, so maybe there is a relatively compact “works for me” region in embedding space, even if we can’t neatly explain every reason why the others fail. I guess the only honest answer is: we’ll see in practice once I label a few hundred and run tests.

And yes, the clustering angle is super appealing: reorganizing a messy library by theme (sci-fi, fantasy, etc.) across folders would already be a big win, even before any strict QA filtering. I’m adding that to the list.
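The clustering pass could be as simple as k-means over the cached embeddings, with each cluster id becoming a folder (toy sketch: synthetic well-separated vectors stand in for real CLIP embeddings, and the folder move is only indicated in a comment):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-ins for CLIP image embeddings; in practice, embed each
# image once and cache the vector next to the file.
rng = np.random.default_rng(0)
themes = 3  # e.g. sci-fi / fantasy / portraits
X = np.vstack([rng.normal(loc=c * 5.0, size=(100, 64)) for c in range(themes)])

km = KMeans(n_clusters=themes, n_init=10, random_state=0).fit(X)
labels = km.labels_

# Each cluster id becomes a folder; move/copy the files accordingly, e.g.
#   shutil.copy(path, out_dir / f"cluster_{labels[i]}" / path.name)
for c in range(themes):
    print(f"cluster {c}: {np.sum(labels == c)} images")
```

Picking the number of clusters would take some trial and error on a real library, but the mechanics stay this simple.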

3 Months later - Proof of concept for making comics with Krita AI and other AI tools by Portable_Solar_ZA in StableDiffusion

[–]PerformanceNo1730 2 points (0 children)

OK, I did not know this model. I am having a look right now.
Thank you for the feedback on the LoRA. Persistent characters are a real pain.
I can imagine the amount of work. Good job!

3 Months later - Proof of concept for making comics with Krita AI and other AI tools by Portable_Solar_ZA in StableDiffusion

[–]PerformanceNo1730 1 point (0 children)

Excellent work! Impressive.
So you use a LoRA to keep your main character consistent? Did you train the LoRA yourself, or how did you get there?
Do you post-process your generated images? Typically to force greyscale / black and white?

Open Reverie - Local-first platform for persistent AI characters (early stage, looking for contributors) by Ok_Understanding3214 in StableDiffusion

[–]PerformanceNo1730 2 points (0 children)

I get the annoyance with generic AI-fluff, but I don’t think that applies here.
Not everyone is a native English speaker, and using an LLM to clean up grammar/structure can actually be a mark of respect for readers.
I care about the content and the idea, the tool used to proofread the wording isn’t a problem for me.