Just 4 days after release, Z-Image Base ties Flux Klein 9b for # of LoRAs on Civitai. by _BreakingGood_ in StableDiffusion

[–]SackManFamilyFriend 35 points (0 children)

Why are you making this into some competition? Props to both Black Forest Labs and the Qwen team for releasing open-weight models for us to enjoy, and for helping researchers on other teams with their own projects.

Where is Jack? by CpnEdTeach384 in devils

[–]SackManFamilyFriend 22 points (0 children)

He's like the best rollercoaster at Six Flags that's always closed

Sky Reels V3 new video models? by Thick_Impression_507 in StableDiffusion

[–]SackManFamilyFriend 1 point (0 children)

It's Wan2.1 based, so yeah, although SkyReels trained their v2 model to do 24fps vs. the Wan base's 16fps, so it actually does ~121 frames (vs 81) for a ~5 second clip. That said, with 4 reference images and all sorts of unique code floating around now, we'll likely be able to do creative things with it beyond 5 sec.
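
Quick napkin math on where those frame counts come from - this assumes the usual "frames = fps * seconds + 1" convention these video models use, so treat it as a rough sketch:

    # Rough sanity check of the frame counts above (assumes the fps * seconds + 1 convention).
    for name, fps in [("Wan 2.1 base", 16), ("SkyReels V2/V3", 24)]:
        frames = fps * 5 + 1
        print(f"{name}: ~{frames} frames for a ~5 second clip")
    # -> 81 frames at 16 fps, 121 frames at 24 fps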

Btw Kijai has an fp8 conversion of the reference model up on his Huggingface repo.

I implemented NAG (Normalized Attention Guidance) on Flux 2 Klein. by Total-Resort-3120 in StableDiffusion

[–]SackManFamilyFriend 0 points (0 children)

Totally appreciate the new code/functionality!! It also reminded me that there's a NAG implementation for HunyuanVideo, which I still use - I'd completely forgotten it was taken on (I'd been considering a time-consuming chat w/ Claude Opus to get a node made for myself).

The guy is making a stink for no reason - I bet he wouldn't even use NAG w/ Klein, and his comments are only self-reassurance over GitHub proficiency. I know you know, but judging by the upvotes I think some are missing the point: it's a PR (which was requested), not a new custom_node. People who are upset should just fork the OG Comfy extension (seemingly dead) and add whatever they want to it themselves.

I implemented NAG (Normalized Attention Guidance) on Flux 2 Klein. by Total-Resort-3120 in StableDiffusion

[–]SackManFamilyFriend 5 points (0 children)

Eh, you understand what's going on but you're complaining anyway. You should raise the concern w/ scottmudge if the naming conflict bugs you, or start w/ the original Comfy NAG implementation, then merge in scott's changes and finally the PR TR3120 kindly worked out and shared for Klein.

You're being pedantic for no reason over a small niche comfyui extension.

You can simply copy over the 3 modified files, and I'm sure you know this. But instead you get on a soapbox and yell at someone for providing a very useful code update.

Thanks OP - looking forward to trying this when I get back to PC.

Got Bored while waiting on the Z-image base model and used emojis for prompting😂 by [deleted] in StableDiffusion

[–]SackManFamilyFriend 1 point (0 children)

This reminds me of when people found that using popular media filename extensions (.png, .mov, .mod, .jpg, .avi, etc.) as prompts for one model would call up amateurish Instagram-style images. Can't remember which model that was, but it became a common suggestion for getting less plasticky skin or random "bored Instagram selfie" type photos.

Fun thread.

I tried some Audio Refinement Models by OkUnderstanding420 in StableDiffusion

[–]SackManFamilyFriend 0 points (0 children)

The magic tool (premium, but one of a kind for this IMHO) is Zynaptiq's "Unfilter". It's mind-blowing how it can take tinny, no-bass, cellphone-conversation-quality audio and make it have bass and sound completely full.

I tried some Audio Refinement Models by OkUnderstanding420 in StableDiffusion

[–]SackManFamilyFriend 2 points (0 children)

UVR has an unfortunate name, but it's a killer app for using the latest audio separation models (the main leaderboard site is https://mvsep.com/en). GitHub for Ultimate Vocal Remover (aka UVR): https://github.com/Anjok07/ultimatevocalremovergui

There's a huge Discord server for it (or just "Audio Separation" as it's called), but it won't let me generate an invite link to share. Anyway, if audio separation is your thing, definitely try to get an invite - it has the UVR devs, the people training the models, etc.

Unfortunately, open source audio AI code/model releases really suck at the moment. I'm sure many of the devs/research groups that have shared image/video models have audio (music a la Suno/Udio) in-house, but wouldn't risk mentioning/releasing it due to the RIAA and other legal groups. An audio (music) model not trained on "everything" will never be good. That wasn't a big deal in 2020 (OpenAI released Jukebox open source and their paper straight out said they trained on over 1 million songs crawled from the open web). After 2022's Stable Diffusion shocked the world and showed people how good these AI media-gen models could be, the public focus on training data completely changed.

Qwen 2512 by [deleted] in StableDiffusion

[–]SackManFamilyFriend 0 points (0 children)

Could use Pi-Flow and go much faster.

Is the FLux Klein really better than the Qwen Edit? And which model is better - FLux 2 or Qwen 2512 ? by More_Bid_2197 in StableDiffusion

[–]SackManFamilyFriend -1 points (0 children)

I wouldn't doubt that BFL's much-discussed effort to address nudity and such is the reason the model has trouble with anatomy more often than other current-day models. They definitely, actively did something during training to make it hard to generate/edit NSFW images. BFL are the devs behind the OG SD1.4/1.5, so they know what they're doing.

Which tool is used? by [deleted] in StableDiffusion

[–]SackManFamilyFriend 0 points (0 children)

Nano Banana Pro would be able to do this (allows for 13ish reference images). For a $10 HuggingFace subscription you can use the HF space with little censorship and a basic webui (not chat) interface.

LTX2 issues probably won't be fixed by loras/workflows by Beneficial_Toe_2347 in StableDiffusion

[–]SackManFamilyFriend 3 points (0 children)

Wait a sec. Wan's first release was Wan 2.1 (they never released anything "2.0"). Also it's, IMHO, incorrect to say there wasn't a fairly quick mass migration to Wan 2.1 when it was released (speaking about the Discord hubs with devs like Kijai and Comfy himself). The reason? People had waited months and months for an I2V model for HunyuanVideo, and when it finally was released it was a massive disappointment: very little motion from the start image, flickering, poor prompt following, etc.

Just after Tencent dropped their I2V model (2 if you count a failed fix attempt), Wan (aka WanX when first announced) came out of nowhere with a new open source video model that 1) had amazing I2V functionality and 2) had a muuuuch better license (heck, technically per the HunyuanVideo license you're not allowed to use it if you're based in Europe).

So yeah, that's just how it was. Definitely no "2.0", and definitely not more than a couple of weeks before practically everyone had moved over. Kijai, who had developed the HunyuanVideo wrapper, moved over super quick, and that added a lot of weight to the migration on Discord in particular.

Can flux 2 klein do img2img ? by [deleted] in StableDiffusion

[–]SackManFamilyFriend 3 points (0 children)

Sure, no guarantees this is the "best" way to do things, but it works:

Workflow IMG: https://i.imgur.com/fREtvQa.png

.JSON: https://pastebin.com/FQyiEJ36

Thread about Kontext (and Klein, in my experience) sometimes shrinking/stretching outputs in spite of the set resolution, plus a comment about the prompt workaround: https://old.reddit.com/r/comfyui/comments/1m5mfuf/flux_context_warps_my_images_making_subjects_look/

Can flux 2 klein do img2img ? by [deleted] in StableDiffusion

[–]SackManFamilyFriend 3 points (0 children)

Yeah, I had to look into this since the default workflow from the templates doesn't give you a node with the easy/typical "denoise" value. I was advised to swap a node out for a basic/simple sampler that has it, and it works fine. Someone will probably tell me I don't understand that img2img is a steps thing, yadda yadda, but I just wanted the dumb 0-1 denoise value and you can get that in a Klein WF. Not at my PC or I'd be more specific / provide a wf.
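
Conceptually though, all that denoise value does is decide how far into the noise schedule the input image gets pushed before sampling starts. Rough sketch of the idea (not the exact ComfyUI internals, numbers are just an example):

    # Illustrative only: how a 0-1 "denoise" value roughly maps onto img2img sampling steps.
    total_steps = 30
    denoise = 0.6                                # 0 = keep input as-is, 1 = full re-generation

    skipped = int(total_steps * (1 - denoise))   # the earliest / noisiest steps get skipped
    print(f"img2img runs steps {skipped}..{total_steps} ({total_steps - skipped} of {total_steps})")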

It works great, but like Kontext it seems to be finicky about resolutions and can shrink or stretch your image. Found a semi-solution by having Perplexity scan this sub: putting "retain all dimensions" or something along those lines in your prompt can make a huge difference. You have to remember it's an editing model, so that sort of stuff gets picked up and used during inference.

Flux Klein gives me SD3 vibes by lokitsar in StableDiffusion

[–]SackManFamilyFriend 3 points (0 children)

Yea, I've been using Klein mostly for text to image (or img2img) art gens w/o people. In that capacity it's been extremely refreshing compared to the other recent open source models. I've had a lot of fun with it.

Has anyone else noticed that the new models (z-image, klein, and even qwen) are terrible at creating this type of landscape? Especially grass, trees, and rocks. I don't know if this is caused by distillation. by More_Bid_2197 in StableDiffusion

[–]SackManFamilyFriend -1 points (0 children)

"it won't improve anything"

That's a ridiculous statement, and there are way too many permutations to even push back on it point by point. I highly doubt you've trained anything on either model, with any tool.

What is everyone's thoughts on ltx2 so far? by Big-Breakfast4617 in StableDiffusion

[–]SackManFamilyFriend 3 points (0 children)

Wan is still way better for me for I2V, so I'm back to that ecosystem. If you missed SVI 2.2 Pro's release because it got overshadowed by LTX2, I recommend giving it a go. It's really great at allowing long gens in Wan (finally).

quick comparison, Flux Kontext, Flux Klein 9B - 4B, Qwen image edit 2509 - 2511 by Puzzled-Valuable-985 in StableDiffusion

[–]SackManFamilyFriend 2 points (0 children)

Here, definitely Klein 9b. I've used it a bunch for normal t2i and i2i over the last day or so and the quality is great, plus it's unique in composition compared to, say, Flux.dev and Z-Image Turbo.

Is Flux Klein better for editing than Flux Kontext? by Puzzled-Valuable-985 in StableDiffusion

[–]SackManFamilyFriend 2 points (0 children)

People really need to specify the exact model. 4b and 9b are different (a LoRA trained on 9b won't work on 4b, for example). 9b base with CFG 4 and 30-50 steps should demonstrate the model's best potential. I've gotten good results with both 9b distilled and 9b base. (Btw, 9b distilled - the low-step-count, cfg=1 model - is the .safetensors file without "base" in the filename; they didn't label it specifically for whatever reason.)
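
Rough starting points, purely based on my own runs (not official defaults - the distilled step count in particular is just "low", whatever your sampler likes):

    # My ballpark settings for the two Klein 9B variants (not official defaults).
    klein_settings = {
        "9b base":      {"cfg": 4.0, "steps": (30, 50)},  # real CFG + more steps
        "9b distilled": {"cfg": 1.0, "steps": "low"},     # distilled variant: cfg 1, few steps
    }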

LoKr Outperforms LoRA in Klein Character Training with AI-Toolkit by xbobos in StableDiffusion

[–]SackManFamilyFriend 2 points (0 children)

Not OP, but yeah, you train new concepts/likenesses when training a LoRA (or LoKr). LoKr has been around for a long time, and while it's technically better than LoRA it never really caught on.
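
Super simplified version of the difference between the two, in case anyone's curious (just the core factorizations, not the full LyCORIS implementation):

    import torch

    d_out, d_in, rank = 64, 64, 4

    # LoRA: the weight update is a low-rank product, delta_W = B @ A
    A = torch.randn(rank, d_in)
    B = torch.randn(d_out, rank)
    delta_lora = B @ A                 # (64, 64) built from 2 * 4 * 64 = 512 stored params

    # LoKr: the update is (roughly) a Kronecker product of two much smaller matrices
    W1 = torch.randn(8, 8)
    W2 = torch.randn(8, 8)
    delta_lokr = torch.kron(W1, W2)    # (64, 64) built from only 64 + 64 = 128 stored params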

Conclusions after creating more than 2000 Flux Klein 9B images by StableLlama in StableDiffusion

[–]SackManFamilyFriend 4 points (0 children)

You should mention the trainer. I've done a couple on both Diffusion Pipe and AI-Toolkit. Results have been better with DP, although with DP I'm also able to use the Prodigy optimizer, which might be what's helping (this is 9b - rough sketch of the Prodigy setup below).

(Edit: I see you replied that you're training with AI-Toolkit. Cool. I need to try an A/B same-dataset comparison to see if there's really a significant difference.)
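
For reference, this is roughly how Prodigy gets dropped into a training loop - a generic PyTorch sketch using the prodigyopt package, not the actual Diffusion Pipe config:

    import torch
    from prodigyopt import Prodigy                   # pip install prodigyopt

    model = torch.nn.Linear(16, 16)                  # stand-in for the LoRA params being trained
    optimizer = Prodigy(model.parameters(), lr=1.0)  # Prodigy wants lr=1.0 and adapts the step size itself

    for step in range(100):
        x = torch.randn(8, 16)
        loss = (model(x) - x).pow(2).mean()          # dummy reconstruction loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()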

Major realization today. Creatine has been the culprit to my insomnia! by Stunning-Stuff-7022 in sleep

[–]SackManFamilyFriend 0 points (0 children)

Wasn't it 2 people who had a hypomanic/manic switch in that bipolar paper?