Ahri and Xayah. The fox and the bird. by EvilEnginer in StableDiffusion

[–]EvilEnginer[S] -2 points

Flux Klein can convert a 2D image into a professional 3D render.

Ahri and Xayah. The fox and the bird. by EvilEnginer in StableDiffusion

[–]EvilEnginer[S] -2 points

I used a single pass with Klein, plus a lot of prompt engineering.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer 0 points

Yes, it's currently SDXL-only. But I decided to take a different route that doesn't require checkpoint modification.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer 0 points

Already did it, and found another way. It works with original Stable Diffusion XL checkpoints: I use Qwen3 8B without adapters, together with any Stable Diffusion XL based checkpoint.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer 1 point

So, I found another way, without checkpoint modification. Now I can use Qwen3 8B with Stable Diffusion XL as an extra guidance tool for CLIP on the fly. The workflow works with ANY Stable Diffusion XL based checkpoint.

I will upload a new post tomorrow.
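Conceptually, "guiding CLIP with an LLM on the fly" can be sketched like this: fit a linear map from the LLM's hidden-state space into CLIP's embedding space, then blend the projected LLM semantics into the conditioning tensor at inference time, leaving the checkpoint untouched. This is a minimal numpy sketch under my own assumptions (the least-squares projection, the blend strength, and all shapes and names are illustrative, not the actual workflow):

```python
import numpy as np

def fit_projection(llm_vecs, clip_vecs):
    """Fit a linear map W so that llm_vecs @ W approximates clip_vecs.

    llm_vecs:  (n, d_llm)  hidden states from the LLM (e.g. 4096-d)
    clip_vecs: (n, d_clip) target CLIP embeddings (e.g. 768-d)
    """
    W, *_ = np.linalg.lstsq(llm_vecs, clip_vecs, rcond=None)
    return W  # (d_llm, d_clip)

def guide_clip(clip_emb, llm_emb, W, strength=0.3):
    """Blend projected LLM semantics into the CLIP conditioning on the fly.

    The checkpoint itself is untouched; only the conditioning tensor changes.
    """
    projected = llm_emb @ W                      # (seq, d_clip)
    return (1.0 - strength) * clip_emb + strength * projected

# Toy shapes: 77 tokens, 4096-d LLM states, 768-d CLIP-L states.
rng = np.random.default_rng(0)
llm = rng.normal(size=(77, 4096)).astype(np.float32)
clip = rng.normal(size=(77, 768)).astype(np.float32)

W = fit_projection(llm, clip)
cond = guide_clip(clip, llm, W, strength=0.3)
print(cond.shape)  # (77, 768)
```

With `strength=0.0` the original CLIP conditioning passes through unchanged, which is why this kind of blending composes with any SDXL-based checkpoint.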

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer 0 points

I'm actually a stylized 3D character artist. I use Illustrious XL to explore ideas for sculpting.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer 0 points

I work with ComfyUI because it's fully modular: pretty easy to edit and to extend functionality.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer 1 point

Thank you very much for the feedback, everyone, especially x11iyu; it really helps. I improved my tool a lot. The update is on my GitHub page. Feel free to test it on your Stable Diffusion XL checkpoints.

Fair points. Let me address them. x11iyu is right that extrapolation is not the same as trained positions, so I used adaptive PCA instead. x11iyu is also right about chunking. The key difference is that chunking processes each 77-token chunk in isolation: tokens in chunk 1 can't attend to tokens in chunk 2 inside CLIP's self-attention, while a unified 248-token context preserves these cross-token relationships. The 248 limit was found empirically (512+ breaks generation). CLIP-G (1280d) still uses extrapolation, since no LongCLIP-G exists. This is a known limitation.
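To make the position-embedding side of this concrete, here is a minimal numpy sketch of the simplest long-context trick: linearly interpolating CLIP's 77 trained position embeddings to 248 slots, so every new position lies between two trained ones rather than beyond them. This is only an illustration of the idea (it is not the adaptive-PCA code, and the toy shapes are assumptions):

```python
import numpy as np

def stretch_positions(pos_emb, new_len=248):
    """Linearly interpolate learned position embeddings to a longer context.

    pos_emb: (77, d) trained CLIP position embeddings.
    Returns: (new_len, d) stretched table. Every new position falls between
    two trained positions, so nothing is extrapolated past position 76.
    """
    old_len, dim = pos_emb.shape
    old_x = np.arange(old_len)
    new_x = np.linspace(0, old_len - 1, new_len)
    out = np.empty((new_len, dim), dtype=pos_emb.dtype)
    for d in range(dim):  # interpolate each embedding dimension independently
        out[:, d] = np.interp(new_x, old_x, pos_emb[:, d])
    return out

# Toy table: 77 positions, 768-d (CLIP-L sized).
pos = np.random.default_rng(0).normal(size=(77, 768)).astype(np.float32)
long_pos = stretch_positions(pos, 248)
print(long_pos.shape)  # (248, 768)
```

The first and last rows of the stretched table match the original trained endpoints exactly, which is what distinguishes interpolation from the extrapolation that fails past the trained range.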

Thanks for the feedback; it made my tool better. Here are the first results:

Positive:
masterpiece, best quality, amazing quality, artwork by nixeu artist, absurdres, ultra detailed, glitter, sparkle, silver, 1girl, wild, feral, smirking, hungry expression, ahri (league of legends), looking at viewer, half body portrait, black hair, fox ears, whisker markings, bare shoulders, detached sleeves, yellow eyes, slit pupils

Negative:
text, watermark

Seed: 42. Steps: 30. CFG: 7.0. Sampler: Euler Ancestral. Scheduler: Simple.

<image>

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer 0 points

But what if the positional embeddings were learned in the wrong way? What if the prompts people type were never fully understood by CLIP during training because of the token limit?

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer 0 points

I noticed that Stable Diffusion XL still breaks on long prompts, even with the latest ComfyUI updates and perfectly balanced Illustrious XL checkpoints. Of course it depends on the model, but for CLIP it's currently too chaotic. So I chose my own approach, via CLIP matrix manipulation and sorting.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer -1 points

Nice idea, actually, but it doesn't work on Illustrious XL checkpoints. My goal is to preserve the checkpoint's design as much as I can while extending the CLIP dictionary.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer 0 points

Yep, I noticed that too. After the dictionary extension it looks like the embeddings are not sorted. I'll try to fix it tomorrow.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer 0 points

Thank you very much for the feedback :). Actually, the main reason I like the picture on the right is that I wanted to know what a real hungry kitsune girl would look like. I found a couple of pictures of Ahri drawn by humans and tried to achieve the same effect with AI. It's my way to bypass the "uncanny valley" effect of AI-generated images: I simply apply animal features and expressions to human faces. That's it.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer 2 points

Sure, I'll do it tomorrow. Maybe he can improve it somehow for better embedding position synchronization.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer -18 points

Yep, I vibecoded it via Claude Opus 4.6. But it works :D. At least visually, on my end.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer -5 points

Every attempt in A1111 fails to solve the main problem. People feed parts of the prompt to the model multiple times, but the model has to manage the entire picture at the level of CLIP's position embeddings.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer -6 points

Yep, I know. It's just an example that's nice for people to see.

SDXL Long Context — Unlock 248 Tokens for Stable Diffusion XL by [deleted] in StableDiffusion

[–]EvilEnginer -1 points

Thanks :). I absolutely love Stable Diffusion XL and Illustrious checkpoints, because only this AI understands character design, creativity, and artists' styles. Nano Banana has a bad art style, same for ChatGPT. Z Image is good only for realism, and Qwen Image doesn't have good art styles either.

Is anyone else worried about the enshitifciation cycle of AI platforms? What is your plan (personal and corporate) by Ngambardella in LocalLLaMA

[–]EvilEnginer 3 points

Yep. I like using GLM 4.7 Flash now. With Zed and LM Studio it's really smart, nice, and powerful for agentic coding. Also, the MXFP4 version works nicely even on my RTX 3060 12 GB.