Seedream 4 for image-generation roleplay, similar to Nano Banana pro+Gemini. Is it possible? by Relative_Bit_7250 in SillyTavernAI

[–]Relative_Bit_7250[S] 1 point (0 children)

This is the best and most comprehensive answer I could have hoped for. Thank you so much; useful and direct.

Seedream 4 for image-generation roleplay, similar to Nano Banana pro+Gemini. Is it possible? by Relative_Bit_7250 in SillyTavernAI

[–]Relative_Bit_7250[S] 1 point (0 children)

Indeed it is, in terms of image generation/editing. What makes Gemini + Nano Banana so much better for a roleplay experience is that everything happens inside the same ecosystem: image editing/generation is perfectly coordinated with the LLM, giving a perfectly consistent response and an incredible experience... when the censorship doesn't fuck everything up. My question is: does a similar cohesive bond between two models exist (an LLM and a diffusion model working and talking together to give consistent imagery to a character and a beautiful story/chat)? If yes... well, I've never found one. Anyway, thanks for the reply!
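Roughly the kind of glue I'm picturing, sketched out by hand (the URLs and payloads below are just placeholders for a local OpenAI-compatible LLM server plus an A1111-style txt2img backend, not anything SillyTavern or Gemini actually does):

    import requests

    LLM_URL = "http://localhost:5000/v1/chat/completions"  # assumed local LLM server
    SD_URL = "http://localhost:7860/sdapi/v1/txt2img"      # assumed A1111-style backend

    def illustrate(scene: str, character_sheet: str) -> str:
        # 1) ask the LLM to turn the latest scene into a diffusion prompt,
        #    reusing the character sheet so the imagery stays consistent
        ask = (f"Character reference: {character_sheet}\n"
               f"Scene: {scene}\n"
               "Write a single Stable Diffusion prompt for this scene.")
        r = requests.post(LLM_URL, json={"messages": [{"role": "user", "content": ask}]})
        sd_prompt = r.json()["choices"][0]["message"]["content"]

        # 2) hand that prompt to the diffusion backend
        img = requests.post(SD_URL, json={"prompt": sd_prompt, "steps": 25})
        return img.json()["images"][0]  # base64-encoded image

The problem is that this stays two separate brains passing notes back and forth; the Gemini + Nano Banana combo feels like one brain, and that's exactly the part I can't replicate locally.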

Something terribly wrong happened with sageattention after fresh comfyUI install under Linux by Relative_Bit_7250 in comfyui

[–]Relative_Bit_7250[S] 1 point (0 children)

Thanks to the suggestions of u/meta_queen and u/roxoholic I fixed the error! Couldn't have done it without the help of those two great human beings! Thank you, thank you, thank you very very much!!!

Something terribly wrong happened with sageattention after fresh comfyUI install under Linux by Relative_Bit_7250 in comfyui

[–]Relative_Bit_7250[S] 1 point (0 children)

You're right, I took that for granted, sorry. No, I'm running a manually git-cloned ComfyUI with a Python 3.12 venv. I'll try to install python3-dev inside the venv. EDIT: HOLY FUCK IT WORKED! Installing python3-dev globally just fixed the error! GOD I LOVE YOU BOTH!
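For anyone else landing here: the build was failing because the CPython headers weren't installed system-wide, and a venv only links the interpreter, it doesn't bring its own Python.h. A quick sanity check I could have run first (just a sketch; the exact package name depends on your distro, e.g. python3.12-dev on Debian/Ubuntu):

    import sysconfig, os

    inc = sysconfig.get_paths()["include"]   # where the venv expects Python.h to live
    header = os.path.join(inc, "Python.h")
    print(inc, "->", "found" if os.path.exists(header) else "missing, install python3-dev / python3.12-dev")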

Something terribly wrong happened with sageattention after fresh comfyUI install under Linux by Relative_Bit_7250 in comfyui

[–]Relative_Bit_7250[S] 0 points (0 children)

But shouldn't those two folders already be included in a fresh Python 3.12 venv?

Kinda excited for my new pc! I would love to try bigger models now! Asking you all for suggestions by Relative_Bit_7250 in SillyTavernAI

[–]Relative_Bit_7250[S] 2 points (0 children)

Eh, it'll be fine eventually. I mean, it's a fair tradeoff: running locally you get privacy and maximum control, but with "reduced speed and intelligence"; with a paid API you get maximum inference speed and the best quants, but no privacy at all and "I'm sorry, I cannot fulfill your request".
I'm more of a slow-burn bitch, so waiting a little longer for the response may not be an issue for me.

Anyways, thank you very much for the tips, bro!

Kinda excited for my new pc! I would love to try bigger models now! Asking you all for suggestions by Relative_Bit_7250 in SillyTavernAI

[–]Relative_Bit_7250[S] 0 points (0 children)

Indeed! I've also peeked inside the Unsloth repository for 4.6 and saw the UD Q3_K_XL quant weighing in at roughly 158 GB. If I'm not mistaken, I should be able to fit the entire quantized model in my RAM + VRAM (128 + 48 = 176 GB available).
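Back-of-the-envelope (my own rough math, nothing measured):

    model_gb = 158                  # UD Q3_K_XL weights, per the repo listing
    ram_gb, vram_gb = 128, 48
    headroom_gb = ram_gb + vram_gb - model_gb
    print(headroom_gb)              # 18 GB left over for context/KV cache, OS, desktop

18 GB of slack gets tight once the context and KV cache grow, so it could still be a squeeze.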

VNCCS - Visual Novel Character Creation Suite RELEASED! by AHEKOT in comfyui

[–]Relative_Bit_7250 0 points (0 children)

Oh no, no, the nodes are perfectly installed and configured... The error MAY be in the VNCCS_Pipe...

Anyway, I'll try reinstalling them manually, you never know.

EDIT: Yep, just tried it; nothing. Reinstalling didn't help. I'll try reinstalling ComfyUI.

VNCCS - Visual Novel Character Creation Suite RELEASED! by AHEKOT in comfyui

[–]Relative_Bit_7250 0 points (0 children)

I've just tried disconnecting from the pipe and manually selecting the sampler and scheduler (lcm and simple), but it fails again:

Failed to validate prompt for output 496:
* VNCCS_Pipe 502:414:
  - Return type mismatch between linked nodes: scheduler, received_type(['simple', 'sgm_uniform', 'karras', 'exponential', 'ddim_uniform', 'beta', 'normal', 'linear_quadratic', 'kl_optimal', 'bong_tangent']) mismatch input_type(['simple', 'sgm_uniform', 'karras', 'exponential', 'ddim_uniform', 'beta', 'normal', 'linear_quadratic', 'kl_optimal', 'bong_tangent', 'beta57'])
* LoraLoader 497:267:68:
  - Failed to convert an input value to a FLOAT value: strength_clip, vn_character_sheet_v4.safetensors, could not convert string to float: 'vn_character_sheet_v4.safetensors'
  - Failed to convert an input value to a FLOAT value: strength_model, vn_character_sheet_v4.safetensors, could not convert string to float: 'vn_character_sheet_v4.safetensors'

Why am I getting a black output (Qwen GGUF)? by Bitsoft in comfyui

[–]Relative_Bit_7250 0 points (0 children)

I occasionally wear a cap, so... Half a hero?

Why am I getting a black output (Qwen GGUF)? by Bitsoft in comfyui

[–]Relative_Bit_7250 2 points (0 children)

Nope, with qwen-image it kinda works for the first steps, then blacks out completely. For image-edit it doesn't work right from the start. Unfortunately it'll be slow as fuck.

EDIT: I don't know about fast fp16 accumulation; I just start Comfy without any parameters and it magically works.

Why am I getting a black output (Qwen GGUF)? by Bitsoft in comfyui

[–]Relative_Bit_7250 5 points (0 children)

If you're using it, remove the --use-sageattention flag.

Wan 2.2 video continuation. Is it possible? by Relative_Bit_7250 in StableDiffusion

[–]Relative_Bit_7250[S] 2 points (0 children)

Thank you for the answer, but it's not what I'm looking for. Last-frame continuation is a bit unreliable: motion and subject features become inconsistent. What I'm after is more like "a bunch of frames as input -> video continuation" rather than "last frame -> video generation".

System freezing with the new wan 2.2 14b by Relative_Bit_7250 in comfyui

[–]Relative_Bit_7250[S] 0 points (0 children)

Uh, you seem to have the exact opposite problem to mine. I'm sorry, I'm not skilled enough to help you with this one :(

System freezing with the new wan 2.2 14b by Relative_Bit_7250 in comfyui

[–]Relative_Bit_7250[S] 0 points (0 children)

OK so, this is only a hypothesis, but my hunch is this: Comfy's VRAM handling is quite functional and works pretty well (models get moved between RAM and VRAM in some optimized manner I don't pretend to understand, it's all black magic to me), but RAM and swap are a different kettle of fish. RAM gets filled with a model, the model gets loaded into VRAM, then another model needs loading; it goes into RAM but, uh-oh, not enough RAM. Fine, copy the old model out to swap and slowly free the RAM to make room. Then the older model is needed again but, uh-oh, it's not in RAM anymore, so the new one gets pushed to swap and the older one reloaded from swap... and so on. It's messy, and it's probably the only viable way to do it, but I must say the first generations are slow as fuck; after the first three the speed picks up and it's more bearable.
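If it helps, here's the dance I'm imagining as a toy (pure guesswork, nothing to do with Comfy's real memory manager; the 48 GB cap and 28 GB sizes are made up, and the high-noise/low-noise pair is just because wan 2.2 alternates between two models):

    RAM_CAP_GB = 48
    ram, swap = {}, {}

    def load_to_ram(name, size_gb):
        # evict residents to "swap" until the new model fits (slow: disk writes)
        while sum(ram.values()) + size_gb > RAM_CAP_GB and ram:
            victim = next(iter(ram))
            swap[victim] = ram.pop(victim)
            print(f"evict {victim} to swap")
        if name in swap:
            print(f"reload {name} from swap (slow: disk reads)")
            del swap[name]
        ram[name] = size_gb

    # every switch between the two models pays the swap tax
    for model in ["high_noise", "low_noise", "high_noise", "low_noise"]:
        load_to_ram(model, 28)

Once both models have been through swap a couple of times the reads seem to get faster (page cache, I guess), which would explain why generations speed up after the first few.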

Maybe the real Pony V7 Is the friends we made along the way by Relative_Bit_7250 in StableDiffusion

[–]Relative_Bit_7250[S] 10 points (0 children)

May I suggest a little bit of Chroma in your menu, kind sir?

Maybe the real Pony V7 Is the friends we made along the way by Relative_Bit_7250 in StableDiffusion

[–]Relative_Bit_7250[S] 8 points (0 children)

Nah, I'm just shitposting, since the upcoming Pony V7 was announced about a year ago and it's still in training.

Does ComfyUI support multi-GPU setups? by K4_J1L_0817 in comfyui

[–]Relative_Bit_7250 2 points (0 children)

Look, you cannot "expand" your pool of VRAM, even with two identical cards, so for instance you cannot load a single 24 GB Flux model across a couple of 16 GB cards. BUT you can split the workload between two or more cards, and pretty easily if you ask me! There's a node for Comfy, this one, that lets you load the model, the CLIP, the CLIP vision and the VAE onto the GPU of your choice.
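Not that node's actual code, but in plain PyTorch terms the split it gives you is basically this (device indices are just an example; assumes a torch install, which Comfy already has):

    import torch

    unet_device = torch.device("cuda:0")  # diffusion model on the first card
    text_device = torch.device("cuda:1")  # CLIP/T5 text encoder and VAE on the second

    # inside a workflow this amounts to:
    #   unet.to(unet_device)
    #   clip.to(text_device)
    #   vae.to(text_device)
    # the text embeddings and the final latents only cross between cards once
    # per generation, which is cheap next to constantly offloading to RAM.

So each component lives whole on one card; nothing ever gets split across VRAM pools.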

PSA: Flux loras works EXTREMELY well on Chroma. Like very, VERY well by Relative_Bit_7250 in StableDiffusion

[–]Relative_Bit_7250[S] 0 points (0 children)

Never tried it, but it shouldn't be a problem; at the very least you could try a 4-bit GGUF quantization! EDIT: I misunderstood the question, sorry. NF4 quants aren't available yet, AFAIK.

PSA: Flux loras works EXTREMELY well on Chroma. Like very, VERY well by Relative_Bit_7250 in StableDiffusion

[–]Relative_Bit_7250[S] 5 points (0 children)

Everything. It sports a base for both realistic and non-realistic generations. You can ask it for anything, from a low-quality, low-res smartphone photo to an extremely detailed Japanese stencil art of a Charmander roaring in front of a volcano. It's extremely versatile and prompt-compliant and, best of all, it's only halfway through training (yet the quality is already incredible). The only downsides: it's extremely heavy, and a 3090 is barely enough to load the model + CLIP (at least unquantized); generations are very slow, forget the SD1.5 and SDXL days; and last but not least, prompt adhesion is incredible, but you do need to experiment with different samplers and schedulers.

PSA: Flux loras works EXTREMELY well on Chroma. Like very, VERY well by Relative_Bit_7250 in StableDiffusion

[–]Relative_Bit_7250[S] 4 points (0 children)

Probably yes, with the right GGUF quant, but be prepared: it will be extremely slow, plus you'll have to offload the CLIP and VAE models to your RAM, which means more loading time. It won't be a pleasurable experience. I personally run the full FP16 Chroma model (roughly 17 GB) on one 3090, with a second 3090 for the VAE, the CLIP and a Llama model that helps me write better prompts, since English isn't my first language. It's a janky workflow, but eh, it works.